CIS607, Spring 2016
Natural Language Processing and Information Extraction
Course Description:
Natural Language Processing (NLP) and Information Extraction (IE) are two closely related research areas. NLP has long been an important research area in Artificial Intelligence and Linguistics. Many challenges in NLP involve natural language understanding, that is, enabling computers to derive meaning from human (natural) language input. IE is a specific natural language understanding task: extracting a particular class of objects from text and identifying relationships among those objects. This course will introduce state-of-the-art Natural Language Processing and Information Extraction techniques. We will focus on applying specific NLP/IE systems to real-world problems.
The instructors will introduce the state of the art in NLP and IE in a couple of lectures. Students are expected to read and discuss papers from journals or conference proceedings. Each student is expected to collect some real-world text data and apply IE systems to it for a small course project. Each student's final report will consist of a short survey of the field and the results of the course project.
Prerequisites:
None. Basic knowledge of AI will be helpful.
Time and Place:
Fridays 12:00-1:20pm, 200 Deschutes Hall.
Instructor:
Dejing Dou, 303 Deschutes, phone 541-346-4572, email dou@cs.uoregon.edu.
Co-Instructor:
Steve Fickas, 309 Deschutes, phone 541-346-3964, email fickas@cs.uoregon.edu.
Evaluation:
There is no exam for this seminar. The course grade will be determined by
attendance and participation, paper reading, paper presentation, and the final
report. Students are encouraged to pursue further research projects on the
topics discussed in this seminar, but this is not required.
Schedule and Lecture Notes:
Papers for Reading (updated regularly):
- Research in Natural Language Processing and Information Extraction
- H. Cunningham, D. Maynard, K. Bontcheva, and V. Tablan. GATE: A framework and graphical development environment for robust NLP tools and applications. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), pp. 168-175, 2002.
- D. Klein and C. D. Manning. Accurate Unlexicalized Parsing. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (ACL), pp. 423-430, 2003.
- E. Charniak and M. Johnson. Coarse-to-Fine n-Best Parsing and MaxEnt Discriminative Reranking. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL), pp. 173-180, 2005.
- A. McCallum, D. Freitag, and F. C. Pereira. Maximum Entropy Markov Models for Information Extraction and Segmentation. In Proceedings of International Conference on Machine Learning (ICML), pp. 591-598, 2000.
- J. Lafferty, A. McCallum, and F. C. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of International Conference on Machine Learning (ICML), pp. 282-289, 2001.
- A. Jain, A. Doan, and L. Gravano. Optimizing SQL Queries over Text Databases. In Proceedings of IEEE International Conference on Data Engineering (ICDE), pp. 636-645, 2008.
- D. Z. Wang, E. Michelakis, M. J. Franklin, M. N. Garofalakis, and J. M. Hellerstein. Probabilistic declarative information extraction. In Proceedings of IEEE International Conference on Data Engineering (ICDE), pp. 173-176, 2010.
- Y. Bengio, R. Ducharme, P. Vincent and C. Janvin. A Neural Probabilistic Language Model. Journal of Machine Learning Research, volume 3, pp. 1137-1155, 2003.
- R. Socher, A. Perelygin, J. Y. Wu, J. Chuang, C. D. Manning, A. Y. Ng, and C. Potts. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1631-1642, 2013.
- K. S. Hasan and V. Ng. Why are You Taking this Stance? Identifying and Classifying Reasons in Ideological Debates. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 751-762, 2014.
- G. Goth. Deep or Shallow, NLP is Breaking Out. Communications of the ACM, Vol. 59, No. 3, pp. 13-16, 2016.
- AQL and SystemT
- R. Fagin, B. Kimelfeld, and F. Reiss. Spanners: A Formal Framework for Information Extraction. In Proceedings of the 32nd Symposium on Principles of Database Systems (PODS), pp. 37-48, 2013.
- L. Chiticariu, R. Krishnamurthy, Y. Li, S. Raghavan, F. Reiss, and S. Vaithyanathan. SystemT: An Algebraic Approach to Declarative Information Extraction. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL), pp. 128-137, 2010.
- F. Reiss, S. Raghavan, R. Krishnamurthy, H. Zhu, and S. Vaithyanathan. An algebraic approach to rule-based information extraction. In Proceedings of IEEE International Conference on Data Engineering (ICDE), pp. 933-942, 2008.
- D. Z. Wang, L. Wei, Y. Li, F. Reiss, and S. Vaithyanathan. Selectivity estimation for extraction operators over text data. In Proceedings of IEEE International Conference on Data Engineering (ICDE), pp. 685-696, 2011.
Useful Links