Knowledge Base Refinement and Knowledge Translation with Markov Logic Networks
Shangpu Jiang
Committee: Dejing Dou (chair), Daniel Lowd, Andrzej Proskurowski, Reza Rejaie, Jonathan Brundan
Dissertation Defense (May 2024)

Machine learning and data mining provide many tools for extracting knowledge from data. Yet such knowledge may not be directly applicable to target applications and may require further manipulation: the knowledge may contain too much noise, or the target application may use a different representation or terminology.

In this dissertation, we study three problems related to knowledge management and manipulation. First, given a knowledge base (KB) automatically extracted from text, we explore how to refine it based on the dependencies among the possible KB instances and their confidence values. Second, when the target application to which we want to apply our knowledge uses a different schema, we explore how to translate the knowledge based on the mapping between the schemas. Third, since the mapping between two schemas can sometimes be discovered automatically, we consider whether the knowledge expressed in the two schemas can help us find that mapping more accurately.

We observe that a large fraction of data and knowledge can be represented in relational models, which can be formalized with first-order logic. Moreover, uncertainty is a common feature of these problems: the confidence values associated with KB instances, the probabilistic knowledge rules to be translated, and schemas that are not perfectly aligned with each other. We therefore adopt statistical relational learning, which combines first-order logic with probabilistic models, to solve these problems. In particular, we use Markov logic networks (MLNs), which consist of sets of weighted first-order formulas. MLNs are a powerful and flexible language for representing both hard and soft constraints in relational domains.
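To make the MLN semantics concrete: an MLN defines a distribution over possible worlds, P(x) ∝ exp(Σ_i w_i n_i(x)), where n_i(x) is the number of satisfied groundings of formula i in world x. The following sketch (not from the dissertation; the rule "Smokes(x) ⇒ Cancer(x)", its weight 1.5, and the two-constant domain are illustrative choices) computes this distribution by brute-force enumeration over a tiny domain:

```python
import itertools
import math

# Hypothetical tiny MLN: one soft rule "Smokes(x) => Cancer(x)" with weight 1.5,
# grounded over the constants {A, B}. This yields four ground atoms and 2^4 worlds.
constants = ["A", "B"]
w = 1.5
atoms = [(pred, c) for pred in ("Smokes", "Cancer") for c in constants]

def n_satisfied(world):
    # Count satisfied groundings of Smokes(x) => Cancer(x) in this world.
    return sum((not world[("Smokes", c)]) or world[("Cancer", c)] for c in constants)

def all_worlds():
    # Enumerate every truth assignment to the ground atoms.
    for vals in itertools.product([False, True], repeat=len(atoms)):
        yield dict(zip(atoms, vals))

# Partition function Z normalizes exp(w * n_satisfied) over all worlds.
Z = sum(math.exp(w * n_satisfied(wld)) for wld in all_worlds())

def prob(event):
    # Probability of an event (a predicate over worlds) under the MLN.
    return sum(math.exp(w * n_satisfied(wld))
               for wld in all_worlds() if event(wld)) / Z

# Conditional query: P(Cancer(A) | Smokes(A)).
p_joint = prob(lambda wld: wld[("Cancer", "A")] and wld[("Smokes", "A")])
p_evid = prob(lambda wld: wld[("Smokes", "A")])
print(p_joint / p_evid)  # equals sigmoid(1.5) ~ 0.818 for this one-rule MLN
```

Because only one grounding of the rule mentions A, the conditional probability reduces to the logistic function of the rule's weight; real MLN systems replace this exponential enumeration with approximate inference.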

We develop MLN formulations for each of these problems and solve them using representation, inference, and learning approaches from the literature, with certain adaptations. The experimental results show that MLNs successfully solve these problems or achieve better performance than existing methods.

This dissertation includes previously published and unpublished coauthored material.