Evaluating a Joint Neural Model with Global Features for Document-Level End-to-End Information Extraction
Haoran Wang
Committee: Thien Nguyen
Master's Thesis (Jun 2021)
Keywords: Information Extraction, Natural Language Processing

Information Extraction (IE) is one of the most important fields in Natural Language Processing (NLP). The goal of IE is to extract structured knowledge from unstructured text. Most existing datasets focus on sentence-level and paragraph-level IE, but research on processing long documents needs a document-level IE dataset. Fortunately, researchers at the Allen Institute for AI published SCIREX, a comprehensive and challenging document-level IE dataset, for the IE research community to study. Performing end-to-end IE on SCIREX requires a global understanding of the full document, because relations can span beyond sentences or even sections. This thesis applies a joint neural model with global features (ONEIE) to two end-to-end IE tasks on SCIREX: named entity recognition (NER) and relation extraction (RE). The performance of ONEIE is compared with that of the SCIREX baseline model and of DYGIE++, the state-of-the-art end-to-end IE model.