CSE 635 Spring 2023

 

NLP and Text Mining

 

Wed 12 – 2:30 pm

Reg # 22454

Alumni 97

 

Instructor: Rohini K. Srihari

Piazza link:  https://piazza.com/buffalo/spring2023/cse635

 

 

Description:

 

This course covers a comprehensive set of topics in natural language processing (NLP). In addition to deep learning approaches, we will cover knowledge-based and traditional feature-based approaches to build a more intuitive understanding of the field. The course begins with fundamental algorithms for the early stages of the NLP pipeline, including language models, part-of-speech (POS) tagging, and named entity recognition. The next section covers deep learning for NLP, including neural embeddings, encoder-decoder models, transformers, and transfer learning with pretrained contextual models. These topics are presented in the context of NLP tasks such as machine translation and sentiment analysis. The later parts of the course cover applications such as chatbots, information extraction, and question answering, along with advanced NLP topics such as semantic role labeling and discourse. Each session consists of a lecture followed by a recitation with interactive code demonstrations for hands-on learning.

 

 

Textbook:

Speech and Language Processing (3rd Edition), Daniel Jurafsky and James Martin, January 2023. [SLP]

https://web.stanford.edu/~jurafsky/slp3/ed3book_jan72023.pdf

 

Project: Students are expected to work on two programming projects: (i) an individual project that involves implementing an NLP algorithm on a standard data set and evaluating it, and (ii) a semester-long group project on a state-of-the-art NLP problem, with students choosing from a set of topics. We will use common data sets wherever possible to facilitate evaluation. The project requirements will be discussed in detail during the first week, and you will receive guidance on data collection, algorithms, and evaluation methodology throughout the semester. Each group will present its final project during the last week of class and write a technical paper describing the project and experiments. The group project will satisfy the department requirements for the MS project.

 

Grading: There is no midterm or final exam for this course. Instead, there will be six quizzes (multiple-choice questions), held roughly every two weeks and based on the preceding lectures. Quiz sessions will be announced in advance. If you attend class regularly and work on the projects, you should find the quizzes easy. The final grade is computed as follows:

 

Quizzes:  40%

Individual Project:  20%

Group Project:  40%

 

 

Prerequisites: The required background is a combination of information retrieval (CSE 535), machine learning (CSE 574), and strong programming skills.

 

Piazza: Students should enroll in the Piazza site for this course using the link provided above. All class-related communication will take place through Piazza, and lecture notes and recitation materials will be posted there.

 

UB Learns:

 

Grades for the quizzes and projects will be available on UB Learns. All class recordings (Zoom, Panopto) will be available on the UB Learns site for this course. Quizzes will also be conducted through UB Learns.

 

Academic Integrity:

 

Students should read the departmental academic integrity policies, which are available at https://engineering.buffalo.edu/computer-science-engineering/information-for-students/graduate-program/cse-graduate-academic-policies/cse-academic-integrity-policy.html

These policies will be strictly enforced.

 

TA/Graders:

 

Souvik Das (TA)

Shalini Agarwal (Grader)

Kartik Sehgal (Grader)

 


 

 

Schedule (subject to change):

 

Lecture Date

Topic

Recitation

Readings

Feb 1

Course Overview

ChatGPT

Language Models

Naïve Bayes Sentiment Analysis

Project Discussion

n-gram language models, Naive Bayes classification

Notes

[SLP] Ch 3, 4

Feb 8

Vector Semantics and Embeddings

Neural Networks

Word2Vec, gensim

Multilingual embeddings

*** Practice Quiz on AI ***

[SLP] Ch 6, 7

Feb 15

Sequence Labeling: POS, Entity tagging

Chunking

 

POS tagging and NER basics using spaCy, Stanza, etc.

Multilingual POS tagging

*** Quiz ***

[SLP] Ch 8

Notes

Feb 22

Deep Learning Architectures for Sequence Processing

RNNs and LSTMs

Encoder-Decoder Models

Machine Translation

** Individual Project assigned **

PyTorch tutorial

[SLP] Ch 9, 13

Mar 1

Transformers and Pretrained Language Models

Attention

Fine-Tuning, Masked LMs

BERT-based sequence classification using the Hugging Face transformers library

*** Quiz ***

[SLP] Ch 10, 11

Jay Alammar's blog

Mar 8

Chatbots and Dialogue Systems

Chatbot demo

[SLP] Ch 15

 

Mar 15

IR Based Applications

Question Answering

Entity Linking

*** Quiz ***

[SLP] Ch 14

Mar 22

*** Spring Break ***

 

 

Mar 29

Parsing

Constituency Parsing

Dependency Parsing

Meaning Representation

 

[SLP] Ch 17, 18, 19

April 5

Information Extraction: Relation/Event Extraction, Coreference Resolution

*** Quiz ***

[SLP] Ch 20, 26

April 12

Word Senses and WordNet

Lexicons for Sentiment, Affect and Connotation

 

*** Quiz ***

[SLP] Ch 23, 25

April 19

Semantic Role Labeling

Textual Entailment

 

[SLP] Ch 24

Handouts

April 26

Discourse Coherence

Speech Recognition and Text to Speech

*** Quiz ***

[SLP] Ch 27, 16

 

May 3

Ethics, Bias in NLP

Course Summary

Previous project demos

Notes

May 10

Project Presentations