Colloquium Details

Bridging the Gap between Open and Closed Information Extraction

Author: Mathias Niepert, University of Washington
Date: April 25, 2013
Time: 15:30
Location: 220 Deschutes

Abstract

Extracting, integrating, and querying the world's knowledge has been a long-standing goal of AI research. Due to the growing availability of very large open data sets, both structured and unstructured, and recent progress in scalable probabilistic inference, we are closer to this vision than ever before. Current information extraction (IE) projects, however, follow either a more data-driven approach, processing large text corpora to extract relations between entities, or a more schema-driven approach, building on ontologies and structured data. Both approaches have advantages and disadvantages. Open, data-driven information extraction projects, for instance, often provide more factual knowledge but suffer from low-quality extractions. Closed, schema-driven projects facilitate well-defined query formalisms but lack coverage and are challenged by uncertainty in the data. This dichotomy, however, can be overcome by integrating open with closed information extraction. We present some recent advances in probabilistic data integration and efficient probabilistic inference that move closer to the unification of data-driven and schema-driven information extraction.

Biography

Mathias is a postdoctoral research associate with Pedro Domingos at the University of Washington in Seattle. From 2009 to 2012 he was also a member of the Data and Web Science Research Group at the University of Mannheim. He obtained his PhD from Indiana University under the supervision of Dirk Van Gucht.

He and his co-authors were fortunate enough to win awards at international conferences such as UAI, IJCNLP, and ESWC. He is the principal investigator of a Google faculty research award and a bilateral DFG-NEH research award. He is also a co-founder of the Indiana Philosophy Ontology project. His research interests include probabilistic graphical models, statistical relational learning, digital libraries and, more broadly, the extraction, integration, and manipulation of structured data.