CSE 635 Spring 2020

NLP and Text Mining

MW 5 – 6:20 pm NSC 210

Reg # 21768

Instructor:  Rohini K. Srihari


This course will explore various approaches to text, web and social media mining. Since natural language processing (NLP) is the foundation for most text mining solutions, a major focus of the course is on widely used NLP algorithms, including topic models, entity tagging, opinion analysis, information extraction, parsing, summarization, machine translation and question answering. We will cover both traditional, feature-based approaches and recent approaches based on neural embeddings and deep learning. Several applications of text mining will also be covered, including social media mining and recommender systems (such as the algorithms powering Amazon, Facebook and Twitter).


Textbooks:

Speech and Language Processing (3rd Edition), Daniel Jurafsky and James Martin, 2019. [SLP]

Recommender Systems, Charu C. Aggarwal, Springer, 2016. [RS]

Social Media Mining, Zafarani, Abbasi and Liu, Cambridge University Press, 2014. [SMM]

Project: Students are expected to complete two programming projects: (i) an individual project that involves implementing an NLP algorithm on a standard data set, with evaluation, and (ii) a semester-long group project involving text/web mining, for which students will choose among three projects. We will use common data sets to facilitate evaluation. The project requirements will be discussed in detail during the first week, and you will receive guidance on data collection, algorithms and evaluation methodology throughout the semester. Students will be required to present their final group project during the last week of class and to write a technical paper describing their project and experiments. The group project, completed in pairs or groups of three, will satisfy the department requirements for the MS project.

Grading: There is no midterm or final exam for this course. Instead, there will be a weekly or biweekly short in-class quiz (a few multiple-choice questions) based on the previous week's lectures. If you come to class regularly, you should find the quizzes easy. The final grade will be based on (i) quizzes, (ii) the individual project, (iii) the group project and demonstration, and (iv) the final paper.

Prerequisites: The required background is a combination of information retrieval (CSE 535), machine learning and strong programming skills.