Committee: Dejing Dou (Chair), Arthur Farley, Michal Young
Area Exam (Mar 2009)
Keywords: information extraction; ontologies; knowledge representation; semantic web
Information Extraction (IE) aims to retrieve certain types of information from natural language text by processing it automatically. Ontology-Based Information Extraction (OBIE) has recently emerged as a subfield of Information Extraction. Here, ontologies, which provide formal and explicit specifications of conceptualizations, play a crucial role in the information extraction process. Because of its use of ontologies, this field is related to Knowledge Representation and has the potential to assist the development of the Semantic Web. This paper presents a survey of current research in this field, including a classification of OBIE systems along different dimensions and an attempt to identify a common architecture among these systems. It also presents a definition of an OBIE system that takes several factors into consideration. In addition, this paper presents the details of some implementation work carried out by the author to explore the use of ontology-based information extraction. This work includes a project aimed at extracting information from a set of emails and a project aimed at using multiple ontologies to extract information from a set of university websites. The latter appears to be the first OBIE system to make use of multiple ontologies. Finally, the paper discusses possible directions for future research in this field.