Spring 2024
LIN/CSE 467/567: Computational Linguistics
Instructor Name: Dr. Cassandra Jacobs
Class Day and Time: MWF 11AM-12PM
Location: Clemens 19
Number of Credits: 3-4 units
Email Address: cxjacobs@buffalo.edu
Office Location: Clemens 224 (Computational Linguistics Lab)
Office Hours: Monday 12-2pm or by appointment
Course description
This course aims to provide students with an overview of the key areas which make up the field called Computational Linguistics, an understanding of the major challenges of the field as well as the major application areas for language processing techniques, and the skills to implement fundamental language processing algorithms. This course is dual listed between CSE 467/567 and LIN 467/567.
Required Text and Materials
All reading materials will be made available on Brightspace as well as the course webpage and will consist primarily of readings from the 3rd Edition of the Jurafsky and Martin (SLP3) book Speech and Language Processing: https://web.stanford.edu/~jurafsky/slpdraft/, version published 2024-01-05. The book is freely available. The text is used and referenced in lectures, as well as take-home exams.
Theme: Annotation and Error Analysis
Modern computational linguistics would not exist without high-quality annotated data and the people who create it. Many resources and tasks are the result of carefully curated datasets. Yet annotation receives very little attention in computational linguistics courses, even though data is a prerequisite for model creation and evaluation. The annotation assignments are individual annotation tasks using the Python-based annotation software Doccano.
Three assignments will teach you about the challenges of different types of linguistic data, what about these problems can trip up natural language processing systems, and ways that human error and disagreement can influence our ability to draw conclusions at all. While annotations are performed individually, you will participate in a brief survey on Brightspace about your experience with that task and an assessment of your performance.
Finally, you will be part of a group that will submit a proposal for an annotation study of your own, which will culminate in a short paper and a presentation.
Goals
Course Learning Outcome | Instructional Methods | Assessment Methods |
---|---|---|
Mastery of linguistic constructs | Lectures, data annotation exercises, exams | Attending lectures and completing assignments |
Mastery of concepts in computational linguistics | Lectures, data annotation exercises | Take-home exams |
Mastery of natural language data annotation | Group annotation assignments | Submission of in-class annotation results and reflection assignments |
Mastery of computational linguistics tools | Lectures with live coding, course notebooks | Take-home exams |
Class/lecture structure
The course is held synchronously and it is expected that you will contribute to the course community. Lectures are presented as Jupyter notebooks with Python code. Documented engagement with the course material and with other students is critical for a fun and fulfilling experience in the classroom for everyone; even asking or answering "basic" questions advances our learning goals.
Assignments will be submitted through Brightspace.
Grade composition
Weight | Assignment |
---|---|
75% | Take-home exams (3) |
10% | Post-annotation surveys |
10% | Group annotation project and presentation |
5% | Attendance, participation, and clear communication about any relevant absences |
Grading Scales
I guarantee minimum grades: students are never curved down below the numeric grade they receive in the course. Depending on the distribution of scores, undergraduate and graduate students may be graded on slightly different curves, and some questions on assignments will be required for graduate students but bonus for undergraduate students. Here are the cutoffs for the grade categories:
Letter Grade | Percentage |
---|---|
A | 96–100% |
A- | 90–96% |
B+ | 87–89% |
B | 83–86% |
B- | 80–82% |
C+ | 77–79% |
C | 73–76% |
C- | 70–72% |
D+ | 67–69% |
D | 63–66% |
D- | 60–62% |
F | 0–59% |
Lecture and reading schedule
Week 1 (January 24 & 26) - Welcome!
- W: Course content, course structure, syllabus, ethics
- F: Discussion: The promise and pitfalls of ChatGPT
Week 2 (January 29, January 31, February 2) - Prerequisites
- February 4: Computational linguistics pre-test (participation)
- Frequentist statistics
- Probability theory and Bayes' rule
- Geometry
- Objects
- Functions
- Python idiosyncrasies
- Common NLP toolkits and comparisons
Week 3 (February 5, 7, 9) - Statistical language modeling 1
- What is a corpus? What are corpora?
- Tokenization
- Whitespace, Unicode, punctuation - What is a "word"?
- Computing n-gram statistics
- Readings: Chapter 2
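As a rough illustration of the tokenization and n-gram topics listed for this week, here is a minimal Python sketch. The regular expression, the function names `tokenize` and `ngram_counts`, and the toy sentence are illustrative assumptions rather than course materials; real tokenizers treat Unicode and punctuation far more carefully.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word and punctuation tokens."""
    # \w+ matches a run of letters, digits, or underscores;
    # [^\w\s] matches one punctuation character.
    return re.findall(r"\w+|[^\w\s]", text.lower())

def ngram_counts(tokens, n):
    """Count every contiguous sequence of n tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Toy corpus (hypothetical, for illustration only)
tokens = tokenize("The cat sat on the mat. The cat slept.")
bigrams = ngram_counts(tokens, 2)
print(bigrams.most_common(3))
```

Even this small example raises the week's central question: whether "." is a "word" depends entirely on the tokenization rule chosen.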
Week 5 (February 19, 21, 23) - Tokenization and morphology
- Learning and producing morphological structure
- Hidden Markov Models
- If time: Finite state automata and finite state transducers
- Take-home exam: Tokenization - February 21 12:01 am to February 28 11:59 pm
- Readings: Appendix A
Week 6 (February 26, 28, March 1) - Syntax 1
- Using Doccano for annotation
- Parts-of-speech
- The Viterbi algorithm
- Readings: Chapter 8, Appendix A
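For orientation, the Viterbi algorithm listed above can be sketched for a tiny hidden Markov model part-of-speech tagger. This is a minimal sketch under stated assumptions: the function name `viterbi`, the three-tag tag set, and every probability below are made up for illustration and are not taken from the course notebooks or from SLP3.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for the observations."""
    if not observations:
        return []

    # Work in log space to avoid underflow; impossible events get -inf.
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log-probability of the best path ending in state s at time t
    best = [{s: log(start_p[s]) + log(emit_p[s].get(observations[0], 0))
             for s in states}]
    back = [{}]  # back[t][s]: previous state on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + log(emit_p[s].get(observations[t], 0))
            back[t][s] = prev

    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

# Hypothetical toy model: all numbers are invented for illustration.
states = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
trans_p = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.3, "VERB": 0.2},
}
emit_p = {
    "DET": {"the": 1.0},
    "NOUN": {"dog": 0.5, "barks": 0.1, "walk": 0.4},
    "VERB": {"barks": 0.6, "dog": 0.1, "walk": 0.3},
}
tags = viterbi(["the", "dog", "barks"], states, start_p, trans_p, emit_p)
print(tags)
```

With these invented numbers the tagger prefers DET, NOUN, VERB for "the dog barks"; changing the emission probabilities for "barks" is an easy way to see how the best path shifts.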
Week 7 (March 4, 6, 8) - Syntax 2
- Challenges with parsers
- Dependency parsing
- Annotation (Due 11:59pm March 13): Classifying wordform mistakes
- Readings: Chapter 18, and Appendix D
Week 8 (March 11, 13, 15) - Syntax 3
- Shift reduce parsing
- PCFGs
- The CKY algorithm
- If time: Combinatory Categorial Grammar
- Readings: Chapter 17, Appendix C (Appendix E optional)
Week 9 (March 18, 20, 22) - Spring Break! No class.
Week 10 (March 25, 27, 29) - Lexical semantics 1
- Take-home exam: Syntactic parses across genres - March 27 12:01 am to April 3 11:59 pm
- Propositional representations of word meaning
- Semantic features and semantic knowledge
- WordNet
- Readings: Chapter 19 and Chapter 23 (old); Appendix F and Appendix G (new)
Week 12 (April 8, 10, 12) - Discourse and discourse structure
- April 8 - No class, eclipse day!
- Coreference resolution
- Discourse relations
- Annotated corpora and discourse parsers
- Readings: Chapter 26 and Chapter 27 (old); Chapter 22 and Chapter 23 (new)
Week 13 (April 15, 17, 19) - NLP for low-resource languages
- Effect of typological properties on computational linguistics systems
- Using "high-resource" languages to boost low-resource performance
- Multilingual neural language models
- Annotation (Due 11:59pm April 24): Validating others' annotations
- Readings: Zoph, Yuret, May, and Knight (2016): https://aclanthology.org/D16-1163
Week 14 (April 22, 24, 26) - Evaluation metrics
- Generating text
- Computing performance
- Inter-annotator agreement
- Best practices for statistical NLP
- Readings: Dror et al. (2018): https://aclanthology.org/P18-1128/
- April 28: Computational linguistics post-test (Participation)
- Take-home exam: Computing inter-annotator agreement and validating clusters - May 1 12:01am to May 8 11:59pm
Week 15 (April 29, May 1, May 3) - Student request and Team Project presentations
Week 16 (May 6) - Team Project presentations
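The inter-annotator agreement topic from Week 14 is often introduced through Cohen's kappa for two annotators, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance. The sketch below is one minimal way to compute it; the function name `cohens_kappa` and the toy part-of-speech labels are illustrative assumptions, and the syllabus does not say which agreement measure the exam will use.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty item set.")
    n = len(labels_a)

    # Observed agreement: fraction of items given identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each annotator's label proportions,
    # summed over labels.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)

# Hypothetical annotations of six tokens by two annotators
annotator_1 = ["NOUN", "VERB", "NOUN", "ADJ", "NOUN", "VERB"]
annotator_2 = ["NOUN", "VERB", "ADJ", "ADJ", "NOUN", "NOUN"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(round(kappa, 3))
```

Here the annotators agree on 4 of 6 items (p_o = 24/36) and chance agreement is p_e = 13/36, so kappa = 11/23, roughly 0.478: noticeably lower than the raw 67% agreement suggests.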
Group annotation project guidelines
You will be placed on one of several teams of approximately equal size, each composed of at least one LIN 567 student, one CSE 567 student, and one undergraduate (467). With appropriate written consent of the instructor, this project may be used to fulfill the Capstone requirement for the MS in Computational Linguistics. Students on a team are expected to participate equitably to the best of their ability and must communicate with the instructor about potential team conflicts.
Accessibility Services and Student Resources:
If you have a disability and may require some type of instructional and/or examination accommodation, please inform me early in the semester so that we can coordinate the accommodations you may need. If you have not already done so, please contact the Office of Accessibility Services (formerly the Office of Disability Services) University at Buffalo, 60 Capen Hall, Buffalo, NY 14260-1632; email: stu-accessibility@buffalo.edu Phone: 716-645-2608 (voice); 716-645-2616 (TTY); Fax: 716-645-3116; and on the web at http://www.buffalo.edu/studentlife/who-we-are/departments/accessibility.html. All information and documentation is confidential.
The University at Buffalo and the Graduate School of Education are committed to ensuring equal opportunity for persons with special needs to participate in and benefit from all of its programs, services and activities.
Academic Integrity:
Academic integrity is critical to the learning process. It is your responsibility as a student to complete your work in an honest fashion, upholding the expectations your individual instructors have for you in this regard. The ultimate goal is to ensure that you learn the content in your courses in accordance with UB's academic integrity principles, regardless of whether instruction is in-person or remote. Thank you for upholding your own personal integrity and ensuring UB's tradition of academic excellence.
It is expected that you will behave in an honorable and respectful way as you learn and share ideas. Therefore, recycled papers, work submitted to other courses, and major assistance in preparation of assignments without identifying and acknowledging such assistance are not acceptable. All work for this class must be original for this class. Please be familiar with the University and the School policies regarding plagiarism. Read the Academic Integrity Policy and Procedure for more information. Visit The Graduate School Policies & Procedures page (http://grad.buffalo.edu/succeed/current-students/policy-library.html) for the latest information.
Any use of generative AI (e.g., ChatGPT) is prohibited in this class and will be considered a violation of UB's academic integrity policy. Details of what resources are allowed will be provided for each assignment. If you are unsure if a resource or tool is allowable, be sure to ask.
Course Evaluations:
You will have two opportunities to provide anonymous feedback about the course. In the middle of the semester, I will send you a brief questionnaire asking about what activities are contributing to your learning and what might be done to improve your learning. At the conclusion of the semester you will receive an email reminder requesting your participation in the Course Evaluation process. Please provide your honest feedback; it is important to the improvement and development of this course. Feedback received is anonymous and I do not receive copies of the Evaluations until after grades have been submitted for the semester.
Counseling Services:
As a student you may experience a range of issues that can cause barriers to learning or reduce your ability to participate in daily activities. These might include strained relationships, anxiety, high levels of stress, alcohol/drug problems, feeling down, health concerns, or unwanted sexual experiences. Counseling, Health Services and Health Promotion are here to help with these or other issues you may experience. You can learn more about these programs and services by contacting:
Counseling Services
120 Richmond Quad (North Campus), 716-645-2720
202 Michael Hall (South Campus), 716-829-5900
https://www.buffalo.edu/studentlife/who-we-are/departments/counseling.html
Health Services
Michael Hall (South Campus), 716-829-3316
https://www.buffalo.edu/studentlife/who-we-are/departments/health.html
Office of Health Promotion
114 Student Union (North Campus), 716-645-2837
https://www.buffalo.edu/studentlife/who-we-are/departments/health-promotion.html
Sexual Harassment/Violence:
UB is committed to providing a safe learning environment free of all forms of discrimination and sexual harassment, including sexual assault, domestic and dating violence and stalking. If you have experienced gender-based violence (intimate partner violence, attempted or completed sexual assault, harassment, coercion, stalking, etc.), UB has resources to help. This includes academic accommodations, health and counseling services, housing accommodations, helping with legal protective orders, and assistance with reporting the incident to police or other UB officials if you so choose. Please contact UB's Title IX Coordinator at 716-645-2266 for more information. For confidential assistance, you may also contact a Crisis Service Campus Advocate at 716-796-4399.
Please be aware UB faculty are mandated to report violence or harassment on the basis of sex or gender. This means that if you tell me about a situation, I will need to report it to the Office of Equity, Diversity and Inclusion. You will still have options about how the situation will be handled, including whether or not you wish to pursue a formal complaint. Please know that if you do not wish to have UB proceed with an investigation, your request will be honored unless UB's failure to act does not adequately mitigate the risk of harm to you or other members of the university community. You also have the option of speaking with trained counselors who can maintain confidentiality. UB's Options for Confidentially Disclosing Sexual Violence provides a full explanation of the resources available, as well as contact information. You may call UB's Office of Equity, Diversity and Inclusion at 716-645-2266 for more information, and you have the option of calling that office anonymously if you would prefer not to disclose your identity.
Technology Recommendations
To effectively participate in this course, regardless of mode of instruction, the university recommends you have access to a Windows or Mac computer with webcam and broadband. Your best opportunity for success in the blended UB course delivery environment (in-person, hybrid and remote) will require these minimum capabilities.
You should spend some time following the instructions to install Doccano on your local machine:
Public health compliance in a classroom setting
UB student Behavioral Requirements in all Campus Public Spaces include:
- Should a student need to miss class due to illness, isolation or quarantine, they are required to notify their faculty to make arrangements to make up missed work.
- Students are responsible for following any additional directives in settings such as labs, clinical environments etc.