CIS 410/510    

    Natural Language Processing    

Course Description

We rely on natural languages for knowledge storage, communication, and reasoning. Much of our collective knowledge resides in textual form: books, papers, and articles. A pivotal focus of artificial intelligence (AI) is building computer systems that can comprehend this textual data and emulate human communication and reasoning. This field, known as Natural Language Processing (NLP), is important across many domains because of its wide-ranging applications. Recent advances in AI, powered by large language models such as ChatGPT, GPT-4, and Gemini and by transformer-based deep learning architectures, stem directly from NLP research.

This course will cover several levels of text analysis and understanding, including word- and phrase-level analysis (document retrieval and text classification), syntactic analysis (grammars and parsing), semantic analysis (word and sentence meaning), and discourse analysis (pronoun resolution and text structure). Students will learn to apply these techniques to a range of NLP problems, including part-of-speech tagging, parsing, language modeling, sentiment analysis, information extraction, question answering, machine translation, and text generation. While fundamental technologies will be introduced, the emphasis will be on machine learning methods, particularly deep learning and pre-trained language models, which have demonstrated exceptional performance in recent years and established themselves as the primary tools for solving NLP problems.
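
To give a concrete sense of the first topic, text classification, the sketch below trains a toy sentiment classifier with a bag-of-words model. It is an illustration only, not course material: the four training sentences, their labels, and the test sentence are made up, and the course will later cover the more capable deep learning and pre-trained language model approaches.

    # Toy sentiment classifier: bag-of-words features + logistic regression.
    # The tiny dataset below is hypothetical and for illustration only.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    train_texts = ["a wonderful, moving film", "dull and far too long",
                   "an instant classic", "a tedious mess"]
    train_labels = ["pos", "neg", "pos", "neg"]

    # Turn raw text into word-count vectors, then fit a linear classifier.
    model = make_pipeline(CountVectorizer(), LogisticRegression())
    model.fit(train_texts, train_labels)

    print(model.predict(["a moving classic"]))  # expected: ['pos']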

Instructor

Thien Huu Nguyen, thien@cs.uoregon.edu

Lectures

Two 80-minute lectures are delivered each week.

Prerequisites

Textbooks and Readings

Major Topics

Expected Learning Outcomes

This course covers key challenges in Natural Language Processing (NLP), including text classification, part-of-speech tagging, parsing, information extraction, language modeling, question answering, and text generation. It emphasizes fundamental methods to address these challenges, with a primary focus on machine learning techniques such as word embeddings, deep learning, sequential labeling, supervised learning, semi-supervised learning, sequence-to-sequence models, and pre-trained language models.
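
As a small illustration of the word-embedding idea mentioned above, the sketch below represents words as dense vectors and compares them with cosine similarity. The three-dimensional vectors are invented for the example; real embeddings (e.g., word2vec or GloVe) have hundreds of dimensions learned from large corpora.

    # Illustrative word embeddings: the vectors below are hypothetical.
    import numpy as np

    embeddings = {
        "king":  np.array([0.80, 0.65, 0.10]),
        "queen": np.array([0.75, 0.70, 0.15]),
        "apple": np.array([0.10, 0.20, 0.90]),
    }

    def cosine_similarity(u, v):
        # Cosine of the angle between two vectors; closer to 1.0 means more similar.
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    print(cosine_similarity(embeddings["king"], embeddings["queen"]))  # high (~0.99)
    print(cosine_similarity(embeddings["king"], embeddings["apple"]))  # lower (~0.31)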

Upon successful completion of the course, students will be able to:

Acquired Skills

Upon successful completion of the course, students will have acquired the following skills:

Tentative Schedule

Dates Topics Resources
Apr 1, 3 NLP introduction (Slides), Text classification (Slides) SLP 4
Apr 8, 10 Word embeddings (Slides), deep learning (Slides) SLP 6
Apr 15, 17 Sequential Labeling, HMM, MEMM, CRF, Viterbi and RNN (Slides) SLP 8, 9
Apr 22, 24 Syntax, Constituent (Slides) and Dependency Parsing (Slides) SLP 17, 18
Apr 29, May 1 Information Extraction (Slides) SLP 19
May 6, 8 Information Extraction (continued from previous week)
May 13, 15 Semi-supervised learning, distant supervision, Review (Slides) and Midterm
May 20, 22 Language Modeling, Transformers, Pre-trained Language Models SLP 3, 7, 10
May 27, 29 In-context Learning (No class on May 27 - Memorial Day)
June 3, 5 Tuning and Aligning LLMs

Assignments

Final Project

Course Requirements and Grading

This course will be taught in-person. Please use Piazza and Canvas for communication and discussion.

Grading will be based on the following criteria:

Percentage  Component
40%         Written and programming assignments
30%         Midterm exam
30%         Final project

Note: CIS 410 students will be evaluated separately from CIS 510 students.

Grading Scale

Grade  Range
A+     >= 97.00
A      93.34-96.99
A-     90.00-93.33
B+     86.67-89.99
B      83.34-86.66
B-     80.00-83.33
C+     76.67-79.99
C      73.34-76.66
C-     70.00-73.33
D+     66.67-69.99
D      63.34-66.66
D-     60.00-63.33
F      0.00-59.99