Colloquium Details

SystemT: an Algebraic Approach to Declarative Information Extraction

Author: Yunyao Li, IBM Almaden Research Center
Date: April 23, 2013
Time: 15:30
Location: 220 Deschutes
Host: Dejing Dou

Abstract

In recent years, Information Extraction (IE) has become increasingly important to a wide array of enterprise applications, ranging from Business Intelligence and Semantic Search to Data-as-a-Service. Such applications drive three main requirements for IE systems: scalability, accuracy, and usability. In this talk I will give an overview of SystemT, a rule-based IE system designed to address these requirements. SystemT ships with more than 10 IBM products, including Lotus Notes, Omnifind, and eDiscovery Analyzer, and is used in multiple ongoing research projects.

SystemT is built on the basic principle underlying relational database technology: complete separation of specification from execution. SystemT uses a declarative rule language, AQL, and an optimizer that generates high-performance algebraic execution plans for AQL rules. We show that SystemT removes the expressivity and performance limitations of previous state-of-the-art rule-based systems based on cascading grammars, delivering comparable result quality and an order of magnitude higher annotation throughput with a much lower memory footprint.

Biography

Yunyao Li joined the Search & Analytics group at IBM Research - Almaden in July 2007 after obtaining her Ph.D. degree in Computer Science & Engineering from the University of Michigan. Yunyao has broad interests in the areas of databases, natural language processing, human-computer interaction, and machine learning. She is particularly interested in designing, developing, and analyzing scalable and usable systems for a wide spectrum of users. Her current research in this direction focuses on text analytics and enterprise search.