Dissertation Defense Details
A Graph-based Approach For Semantic Data Mining
Date: June 01, 2012
Committee: Dejing Dou (Chair)
Data mining, also referred to as knowledge discovery in databases (KDD), is the non-trivial extraction of implicit, previously unknown, and potentially useful information from data. What counts as "useful" depends on the user as well as on the domain in which the data mining system is used. The role of domain knowledge in the discovery process is therefore essential. However, previous research has made only limited attempts to build data mining systems that can incorporate domain knowledge in a principled manner. This motivates us to develop a framework for explicitly incorporating domain knowledge into a data mining system, so that insights can be drawn systematically from both knowledge and data. We call this technology semantic data mining.
Recent research in knowledge representation, particularly in the W3C's Semantic Web effort to embed semantic content in web pages, has produced mature standards such as the Web Ontology Language (OWL) for authoring ontologies. An ontology is an explicit specification of a conceptualization. Semantic Web ontologies have become a key technology for intelligent knowledge processing, providing a framework for sharing conceptual models of a domain. In this dissertation, we make extensive use of ontologies as a means to encode domain knowledge.
The OWL ontology language is built on the W3C's Resource Description Framework (RDF), which provides a simple model for describing information resources as a graph. At the same time, there has been a surge of interest in mining semantically rich datasets, in which objects are linked in a number of ways. Indeed, many datasets of interest today are best described as a linked collection, or graph, of interrelated objects. Graph representations can therefore serve as the interface between domain knowledge and data mining, because kinds of knowledge that have traditionally been represented differently can be structured in a single, unified form. In this dissertation, we thus explore a graph-based approach both for modeling knowledge and data and for analyzing the combined information source, so that insights can be drawn from it systematically.
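To make the idea of a unified graph concrete, here is a minimal Python sketch, assuming a hypothetical toy vocabulary (the `ex:` names and the helpers `build_graph` and `inferred_types` are illustrative, not taken from the dissertation). Schema-level triples (`rdfs:subClassOf`) and data-level triples share one edge list, so a single traversal can move from a data instance up into the ontology's class hierarchy.

```python
from collections import defaultdict

# Hypothetical toy triples (subject, predicate, object); names are illustrative only.
# Schema-level triples (class hierarchy) and data-level triples share one list.
TRIPLES = [
    ("ex:Aspirin", "rdfs:subClassOf", "ex:Drug"),
    ("ex:Drug", "rdfs:subClassOf", "ex:Chemical"),
    ("ex:pill42", "rdf:type", "ex:Aspirin"),
    ("ex:pill42", "ex:prescribedTo", "ex:patient7"),
    ("ex:patient7", "rdf:type", "ex:Patient"),
]

def build_graph(triples):
    """Directed labeled graph: node -> list of (edge label, neighbor)."""
    graph = defaultdict(list)
    for s, p, o in triples:
        graph[s].append((p, o))
    return graph

def inferred_types(graph, resource):
    """All classes of a resource: follow rdf:type once, then rdfs:subClassOf transitively."""
    frontier = [o for p, o in graph[resource] if p == "rdf:type"]
    seen = set()
    while frontier:
        cls = frontier.pop()
        if cls in seen:
            continue
        seen.add(cls)
        frontier.extend(o for p, o in graph[cls] if p == "rdfs:subClassOf")
    return seen

graph = build_graph(TRIPLES)
print(sorted(inferred_types(graph, "ex:pill42")))  # ['ex:Aspirin', 'ex:Chemical', 'ex:Drug']
```

Because the class hierarchy is just more edges in the same graph, no separate knowledge base or join step is needed to reach `ex:Chemical` from the data node `ex:pill42`. This sketch ignores the rest of RDFS/OWL semantics.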
This dissertation presents two novel directions in semantic data mining. First, we describe how a graph interface for both knowledge representation and data mining can be structured. We achieve this by employing the RDF model, which allows a schema and the data structured under that schema to be specified together. We propose the RDF hypergraph (bipartite) model as the unified representation for both data and domain knowledge. Second, we describe a graph-theoretic analysis approach for mining the combined information source of data and knowledge, and we showcase a hypergraph-based analysis method for an important data mining task: frequent pattern mining. We hope these contributions can lead to new principles for semantic data mining.
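The following Python sketch illustrates the two ideas above under stated assumptions; it is not the dissertation's hypergraph-based algorithm. It (1) converts triples into a bipartite graph in which each triple becomes a statement node linked to its subject, predicate and object value nodes, and (2) swaps in a simple taxonomy-aware frequent-pair count (in the spirit of generalized association rules) over that graph. The toy data, the "predicate=object" item encoding, the one-level generalization, and all helper names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy triples: two schema axioms plus three data "records".
TRIPLES = [
    ("ex:Aspirin", "rdfs:subClassOf", "ex:Drug"),
    ("ex:Ibuprofen", "rdfs:subClassOf", "ex:Drug"),
    ("ex:r1", "ex:drug", "ex:Aspirin"),
    ("ex:r1", "ex:outcome", "ex:Relief"),
    ("ex:r2", "ex:drug", "ex:Ibuprofen"),
    ("ex:r2", "ex:outcome", "ex:Relief"),
    ("ex:r3", "ex:drug", "ex:Aspirin"),
    ("ex:r3", "ex:outcome", "ex:Nausea"),
]
SUBCLASS = "rdfs:subClassOf"

def to_bipartite(triples):
    """One statement node per triple, joined to its subject, predicate and
    object value nodes; the role label keeps each triple an ordered hyperedge."""
    edges = []
    for i, (s, p, o) in enumerate(triples):
        st = f"st{i}"
        edges += [(st, "subject", s), (st, "predicate", p), (st, "object", o)]
    return edges

def transactions(edges):
    """Rebuild statements from their role edges, then turn each data subject
    into a set of "predicate=object" items, adding one-level superclass items."""
    stmts = {}
    for st, role, node in edges:
        stmts.setdefault(st, {})[role] = node
    parent = {t["subject"]: t["object"] for t in stmts.values() if t["predicate"] == SUBCLASS}
    txns = {}
    for t in stmts.values():
        if t["predicate"] == SUBCLASS:
            continue  # schema statements act as background knowledge, not data
        items = txns.setdefault(t["subject"], set())
        items.add(f"{t['predicate']}={t['object']}")
        if t["object"] in parent:  # one level of generalization suffices for this toy data
            items.add(f"{t['predicate']}={parent[t['object']]}")
    return txns, parent

def frequent_pairs(txns, parent, min_support):
    """Count 2-itemsets, skipping pairs made of an item and its own generalization."""
    counts = Counter()
    for items in txns.values():
        for a, b in combinations(sorted(items), 2):
            (pa, oa), (pb, ob) = a.split("=", 1), b.split("=", 1)
            if pa == pb and (parent.get(oa) == ob or parent.get(ob) == oa):
                continue
            counts[(a, b)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

txns, parent = transactions(to_bipartite(TRIPLES))
print(frequent_pairs(txns, parent, min_support=2))
# {('ex:drug=ex:Drug', 'ex:outcome=ex:Relief'): 2}
```

In this toy data, neither `ex:Aspirin` nor `ex:Ibuprofen` co-occurs with `ex:Relief` often enough to reach a support of 2 on its own; only after the ontology's `rdfs:subClassOf` knowledge is folded in does the generalized pattern (drug, relief) become frequent. This is the kind of insight that motivates mining data and knowledge together.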