CIS 410/510    

    Natural Language Processing    

Course Description

We rely on natural languages for knowledge storage, communication, and reasoning. Much of our collective knowledge resides in textual form: books, papers, and articles. A pivotal focus of artificial intelligence (AI) is building computer systems that can comprehend this textual data and emulate human communication and reasoning. This field, known as Natural Language Processing (NLP), is important across many domains because of its wide-ranging applications. Recent advances in AI, powered by large language models such as ChatGPT, GPT-4, and Gemini and by transformer-based deep learning architectures, stem directly from NLP research.

This course will cover several levels of text analysis and understanding, including word- and phrase-level analysis (document retrieval and text classification), syntactic analysis (grammars and parsing), semantic analysis (word and sentence meaning), and discourse analysis (pronoun resolution and text structure). Students will learn to apply these techniques to a range of NLP problems, including part-of-speech tagging, parsing, language modeling, sentiment analysis, information extraction, question answering, machine translation, and text generation. While fundamental technologies will be introduced, the emphasis will be on machine learning methods, particularly deep learning and pre-trained language models, which have demonstrated exceptional performance in recent years and established themselves as the primary tools for solving NLP problems.
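
To give a concrete sense of the first topic, text classification, the sketch below trains a toy sentiment classifier with a bag-of-words model. It is an illustration only, not course material: the four training sentences, their labels, and the test sentence are made up, and the course will later cover the more capable deep learning and pre-trained language model approaches.

    # Toy sentiment classifier: bag-of-words features + logistic regression.
    # The tiny dataset below is hypothetical and for illustration only.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    train_texts = ["a wonderful, moving film", "dull and far too long",
                   "an instant classic", "a tedious mess"]
    train_labels = ["pos", "neg", "pos", "neg"]

    # Turn raw text into word-count vectors, then fit a linear classifier.
    model = make_pipeline(CountVectorizer(), LogisticRegression())
    model.fit(train_texts, train_labels)

    print(model.predict(["a moving classic"]))  # expected: ['pos']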

Instructor

Thien Huu Nguyen, thien@cs.uoregon.edu

Lectures

Two 80-minute lectures are delivered each week.

Prerequisites

Textbooks and Readings

Major Topics

Expected Learning Outcomes

This course covers key challenges in Natural Language Processing (NLP), including text classification, part-of-speech tagging, parsing, information extraction, language modeling, question answering, and text generation. It emphasizes fundamental methods to address these challenges, with a primary focus on machine learning techniques such as word embeddings, deep learning, sequential labeling, supervised learning, semi-supervised learning, sequence-to-sequence models, and pre-trained language models.
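
As a small illustration of the word-embedding idea mentioned above, the sketch below represents words as dense vectors and compares them with cosine similarity. The three-dimensional vectors are invented for the example; real embeddings (e.g., word2vec or GloVe) have hundreds of dimensions learned from large corpora.

    # Illustrative word embeddings: the vectors below are hypothetical.
    import numpy as np

    embeddings = {
        "king":  np.array([0.80, 0.65, 0.10]),
        "queen": np.array([0.75, 0.70, 0.15]),
        "apple": np.array([0.10, 0.20, 0.90]),
    }

    def cosine_similarity(u, v):
        # Cosine of the angle between two vectors; closer to 1.0 means more similar.
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    print(cosine_similarity(embeddings["king"], embeddings["queen"]))  # high (~0.99)
    print(cosine_similarity(embeddings["king"], embeddings["apple"]))  # lower (~0.31)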

Upon successful completion of the course, students will be able to:

Acquired Skills

Upon successful completion of the course, students will have acquired the following skills:

Tentative Schedule

Dates Topics Resources
Apr 1, 3 NLP introduction (Slides), Text classification (Slides) SLP 4
Apr 8, 10 Word embeddings (Slides), deep learning (Slides) SLP 6
Apr 15, 17 Sequential Labeling, HMM, MEMM, CRF, Viterbi and RNN (Slides) SLP 8, 9
Apr 22, 24 Syntax, Constituent (Slides) and Dependency Parsing (Slides) SLP 17, 18
Apr 29, May 1 Information Extraction (Slides) SLP 19
May 6, 8 Information Extraction (continued from previous week)
May 13, 15 Semi-supervised learning, distant supervision, Review (Slides) and Midterm
May 20, 22 Language Modeling, Transformers, Pre-trained Language Models SLP 3, 7, 10
May 27, 29 In-context Learning (No class on May 27 - Memorial Day)
June 3, 5 Tuning and Aligning LLMs

Assignments

Final Project

Course Requirements and Grading

This course will be taught in-person. Please use Piazza and Canvas for communication and discussion.

Grading will be based on the following criteria:

Percentage  Component
40%         Written and programming assignments
30%         Midterm exam
30%         Final project

Note: CIS 410 students will be evaluated separately from CIS 510 students.

Grading Scale

Grade  Range
A+     >= 97.00
A      93.34-96.99
A-     90.00-93.33
B+     86.67-89.99
B      83.34-86.66
B-     80.00-83.33
C+     76.67-79.99
C      73.34-76.66
C-     70.00-73.33
D+     66.67-69.99
D      63.34-66.66
D-     60.00-63.33
F      0.00-59.99