LIN/CSE 467/567: Advanced Topics in Computational Linguistics
Instructor Name: Dr. Cassandra Jacobs
Class Day and Time: MW 4-5 PM
Number of Credits: 3-4 units
Email Address: email@example.com
Office Location: 614 Baldy Hall
Office Hours: [Friday 9-10am by Zoom] or [in-person by appointment]
Zoom link: https://buffalo.zoom.us/j/94128376411?pwd=SUtKNDduVTF3UmpPWGRLVEhpWCtaUT09
Magalí López Cortez - firstname.lastname@example.org - Head TA
Office Hours: https://calendly.com/mlopezcortez/office-hours
Sean Afridi - email@example.com - Undergraduate TA
Office Hours: Tuesday 4:30-5:30pm in Discord
This course aims to provide students with an overview of the key areas which make up the field called Computational Linguistics, an understanding of the major challenges of the field as well as the major application areas for language processing techniques, and the skills to implement fundamental language processing algorithms. This course is dual listed between CSE 467/567 and LIN 467/567.
Required Text and Materials
All reading materials will be made available on UBLearns as well as the course webpage and will consist primarily of readings from the 3rd Edition of the Jurafsky and Martin (SLP3) book Speech and Language Processing: https://web.stanford.edu/~jurafsky/slp3/
Theme: Annotation and Error Analysis for Mastery in Computational Linguistics
Modern computational linguistics would not exist without high-quality annotated data and the people who create it. Many resources and tasks are the result of carefully curated datasets. Yet annotation receives very little attention in computational linguistics courses, despite data being a prerequisite for model creation and evaluation. The annotation assignments are individual annotation tasks using the Python annotation software Prodigy. Three assignments will teach you about the challenges of different types of linguistic data, which aspects of these problems can trip up natural language processing systems, and the ways that human error and disagreement can influence our ability to draw conclusions at all. While annotations are performed individually, you will be part of a group that submits a report after each assignment. Each report synthesizes your results and decisions and answers targeted questions about the task.
|Course Learning Outcome||Instructional Methods||Assessment Methods|
|Mastery of definitions and uses of basic NLP concepts||Pre-recorded lecture videos||Watching pre-recorded lectures and completing lecture quizzes in Panopto|
|Mastery of natural language data annotation||In-class data annotation (M/W lectures)||* Submission of in-class annotation results * 3 annotation-centric tasks within a new domain|
|Mastery of basic NLP pipeline tools||In-class collaborative “live coding” sessions||* Submission of in-class coding results * 3 coding-centric homework assignments|
Weekly class/lecture structure
The course is presented as a “flipped classroom”, in which lectures on linguistics and mathematical content are pre-recorded and must be watched prior to the class. During the first half of each meeting period, students should ask questions about the lecture content. The second half of each lecture will be dedicated to coding exercises using Jupyter notebooks which will be used to review core concepts in natural language processing and implement text data processing pipelines.
The course is held synchronously and it is expected that you will contribute to the course community. Documented engagement with the course material and with other students is critical for a fun and fulfilling experience in the classroom for everyone — even asking or answering “basic” questions advances our learning goals.
Coding portions of assignments will be submitted through Gradescope.
|Weight (percent of overall grade)||Assignment|
|45%||Homework assignments (3)|
|40%||Annotation assignments (3): 66% of grade for annotation completion, 33% of grade for group annotation reports|
|15%||Creating value for others via discussion and teamwork|
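As an illustration only, the weights in the table above combine as a weighted average. The sketch below assumes each component is scored on a 0-100 scale. The component names and example scores are hypothetical, and the sketch does not model the curve or the split between completion and reports within the annotation component.

```python
# Weights taken from the grading table (percent of overall grade).
# The dictionary keys are hypothetical labels chosen for this example.
WEIGHTS = {"homework": 45, "annotation": 40, "participation": 15}


def overall_grade(scores):
    """Return the weighted average of component scores (each 0-100)."""
    total_weight = sum(WEIGHTS.values())
    weighted_sum = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return weighted_sum / total_weight


# Hypothetical student: 90 on homework, 100 on annotation, 80 on participation.
example = {"homework": 90, "annotation": 100, "participation": 80}
print(overall_grade(example))  # 92.5
```

Because the result is a numeric grade before any curve, it corresponds to the minimum described below, not necessarily the final letter grade.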
I guarantee minimum grades — students are never curved down below the numeric grade they receive in the course. Undergraduates and graduate students are graded to a slightly different curve and some questions on assignments will be required for graduate students but bonus for undergraduate students. Here are the cutoffs for the grade categories:
Lecture and reading schedule
Week 1 (January 30 & February 1)
M: Course content, course structure, and syllabus
W: Discussion: The promise and pitfalls of ChatGPT
Recorded video lecture on ethics in NLP
Week 2 (February 6 & 8) Gradescope practice submission out
Topic: Mathematical and Python Foundations
- Frequentist statistics
- Probability theory and Bayes’ rule
- Python idiosyncrasies
M: Mathematics practice
W: Python practice and introduction to some NLP toolkits
Gradescope practical assignment (participation only)
Week 3 (February 13 & 15) A1
Statistical language modeling 1
- What is a corpus? What are corpora?
- Computing n-gram statistics
- N-gram smoothing, “history”, and interpolation
- M: In-class data annotation ([Task 1 - Part 1]) instructions and annotation time
- Theme: Topic segmentation
- W: Hands-on statistical language modeling
- Readings: Jurafsky Martin SLP3 Chapter 3
Week 4 (February 20 & 22) PS1 out A1 report due
Statistical language modeling 2
- Why move beyond n-gram language models?
- RNNs, LSTMs, Transformers, and more
- M: In-class data annotation ([Task 1 - Part 2]) instructions and annotation time
- Theme: Topic labeling
- W: Annotation task 1 group report (In-class) - Due Friday February 24, 25% of report grade
- Programming assignment 1 released
- Topic: Language modeling
- Due March 6 by 9am
- Readings: Jurafsky Martin SLP3 Chapter 7
Week 5 (February 27 & March 1)
Tokenization, lemmatization, morphological structure
- Whitespace, Unicode, punctuation - What is a “word”?
- Learning and producing morphological structure
- W: Morfessor demo and inducing morphology from data
- Readings: Jurafsky Martin SLP3 Chapter 2
Week 6 (March 6 & 8) PS1 due A2
- Programming assignment 1 ([Language modeling]) due (March 6 at 9am)
Syntactic structure 1
- Part-of-speech tagging
- Hidden Markov Models and Viterbi
- Dependency parsing
- M: In-class data [Task 2] annotation instructions and annotation time
- Theme: EDU segmentation
- W: Hands-on part-of-speech tagging and error analysis
- Readings: Jurafsky and Martin SLP3 Appendix A, Chapter 8
Week 7 (March 13 & 15) A2 report due
Syntactic structure 2
- Dependency parsing
- Challenges with parsers
- Other grammar formalisms and parsers
- M: Training a PCFG model and generating text
- W: In-class group annotation report [Task 2] - Due Sunday March 19, 25% of report grade
- Readings: Chapter 17, Chapter 18, Appendix E (CCG)
Week 8 (March 20 & 22) Spring Break! No class.
Week 9 (March 27 & 29) PS2 out
- Programming assignment 2 [LLM lexicosyntax] released (due April 10 at 9am)
Lexical semantics 1
- Distributional semantics (static methods)
- Semantic features
- M: Accessing and using WordNet [Sean co-lead]
- W: Creating semantic features with clustering
Week 10 (April 3 & 5)
Lexical semantics 2
- Distributional semantics
- Contextual representations
- M: Clustering word senses across neural models
- W: Clustering word senses across neural models cont.
Week 11 (April 10 & 12) A3 PS2 due
- Programming assignment 2 ([LLM lexicosyntax]) due (April 10 at 9am)
Discourse and discourse structure
- Discourse coherence
- Discourse relations
- Annotated corpora and discourse parsers
- M: In-class data annotation ([Task 3]) instructions and annotation time
- Theme: Multiple discourse relations for EDUs
- Optional readings TBA!
- W: Using and evaluating discourse parsers
- Readings: Chapter 21, Chapter 27
Week 12 (April 17 & 19) PS3 out A3 report due
- Programming assignment 3 [Translation evaluation] released (due May 8 at 9am)
- Topic: Neural Machine Translation evaluation
- History of machine translation
- Neural machine translation
- Linguistic challenges
- Knowledge challenges
- M: Translating using
- W: In-class annotation ([Task 3]) group report - Due Sunday April 23, 50% of report grade
- Readings: Chapter 13
Week 13 (April 24 & 26)
Natural language generation 1
- Classical challenges for NLG
- Applications of NLG
- Historical and rule-based approaches
- M: Generating text and creating an error ontology
- W: Evaluation metrics
- Readings: Chapter 15
Week 14 (May 1 & May 3)
Natural language generation 2
- Hybrid solutions to NLG challenges
- M: Identifying bias in NLG
- W: Identifying bias in NLG, continued
- Readings: Chapter 15
Week 15 (May 8 & May 10) PS3 due
- Programming assignment 3 [Translation evaluation] due (May 8 at 9am)
- Student choice and/or Group discussion on latest technology
- M: Student choice
- W: Student choice
Group annotation project guidelines
Annotation reports help us assess your mastery of the course material, refine your knowledge of language, and give you hands-on experience with the hardest part of doing computational linguistics: making datasets that are useful for machine learning and NLP.
In each annotation assignment, you first annotate data (don’t worry, we are only grading on completion!) and then prepare a write-up that reports your findings as part of a group of 4-6 people. We will ask you and your teammates structured questions about each assignment, and you will turn in one report as a group, along with a statement of contributions for the writing assignments, which will be a component of the participation grade.
Accessibility Services and Student Resources:
If you have a disability and may require some type of instructional and/or examination accommodation, please inform me early in the semester so that we can coordinate the accommodations you may need. If you have not already done so, please contact the Office of Accessibility Services (formerly the Office of Disability Services) University at Buffalo, 60 Capen Hall, Buffalo, NY 14260-1632; email: firstname.lastname@example.org Phone: 716-645-2608 (voice); 716-645-2616 (TTY); Fax: 716-645-3116; and on the web at http://www.buffalo.edu/studentlife/who-we-are/departments/accessibility.html. All information and documentation is confidential.
The University at Buffalo and the Graduate School of Education are committed to ensuring equal opportunity for persons with special needs to participate in and benefit from all of its programs, services and activities.
Academic integrity is critical to the learning process. It is your responsibility as a student to complete your work in an honest fashion, upholding the expectations your individual instructors have for you in this regard. The ultimate goal is to ensure that you learn the content in your courses in accordance with UB’s academic integrity principles, regardless of whether instruction is in-person or remote. Thank you for upholding your own personal integrity and ensuring UB’s tradition of academic excellence.
It is expected that you will behave in an honorable and respectful way as you learn and share ideas. Therefore, recycled papers, work submitted to other courses, and major assistance in preparation of assignments without identifying and acknowledging such assistance are not acceptable. All work for this class must be original for this class. Please be familiar with the University and the School policies regarding plagiarism. Read the Academic Integrity Policy and Procedure for more information. Visit The Graduate School Policies & Procedures page (http://grad.buffalo.edu/succeed/current-students/policy-library.html) for the latest information.
You will have two opportunities to provide anonymous feedback about the course. In the middle of the semester, I will send you a brief questionnaire asking about what activities are contributing to your learning and what might be done to improve your learning. At the conclusion of the semester you will receive an email reminder requesting your participation in the Course Evaluation process. Please provide your honest feedback; it is important to the improvement and development of this course. Feedback received is anonymous and I do not receive copies of the Evaluations until after grades have been submitted for the semester.
As a student you may experience a range of issues that can cause barriers to learning or reduce your ability to participate in daily activities. These might include strained relationships, anxiety, high levels of stress, alcohol/drug problems, feeling down, health concerns, or unwanted sexual experiences. Counseling, Health Services and Health Promotion are here to help with these or other issues you may experience. You can learn more about these programs and services by contacting:
Counseling Services
120 Richmond Quad (North Campus), 716-645-2720
202 Michael Hall (South Campus), 716-829-5900
Health Services
Michael Hall (South Campus), 716-829-3316
Office of Health Promotion
114 Student Union (North Campus), 716-645-2837
UB is committed to providing a safe learning environment free of all forms of discrimination and sexual harassment, including sexual assault, domestic and dating violence and stalking. If you have experienced gender-based violence (intimate partner violence, attempted or completed sexual assault, harassment, coercion, stalking, etc.), UB has resources to help. This includes academic accommodations, health and counseling services, housing accommodations, helping with legal protective orders, and assistance with reporting the incident to police or other UB officials if you so choose. Please contact UB’s Title IX Coordinator at 716-645-2266 for more information. For confidential assistance, you may also contact a Crisis Service Campus Advocate at 716-796-4399.
Please be aware UB faculty are mandated to report violence or harassment on the basis of sex or gender. This means that if you tell me about a situation, I will need to report it to the Office of Equity, Diversity and Inclusion. You will still have options about how the situation will be handled, including whether or not you wish to pursue a formal complaint. Please know that if you do not wish to have UB proceed with an investigation, your request will be honored unless UB’s failure to act does not adequately mitigate the risk of harm to you or other members of the university community. You also have the option of speaking with trained counselors who can maintain confidentiality. UB’s Options for Confidentially Disclosing Sexual Violence provides a full explanation of the resources available, as well as contact information. You may call UB’s Office of Equity, Diversity and Inclusion at 716-645-2266 for more information, and you have the option of calling that office anonymously if you would prefer not to disclose your identity.
To effectively participate in this course, regardless of mode of instruction, the university recommends you have access to a Windows or Mac computer with webcam and broadband. Your best opportunity for success in the blended UB course delivery environment (in-person, hybrid and remote) will require these minimum capabilities.
Public health compliance in a classroom setting
UB student Behavioral Requirements in all Campus Public Spaces include:
- Should a student need to miss class due to illness, isolation or quarantine, they are required to notify their faculty to make arrangements to make up missed work.
- Students are responsible for following any additional directives in settings such as labs, clinical environments etc.