Week 15: December 7 - 11
Computing With Text
We use Python's text processing functions to explore the distribution of words in some classic texts.
Biopython is a set of Python modules containing tools for computational molecular biology. We install Biopython and learn to use one such tool to identify a set of unknown genetic sequences.
Week 15 Notebook
- String processing - lower, replace, split, strip and join.
- Sorting sequences using sorted.
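The notebook topics above can be tried on a single line of text. This is a minimal sketch, not the notebook's own code; the sample sentence and variable names are made up for illustration.

```python
# A short line of sample text (made up for illustration).
line = "  To be, or not to be: that is the question.  "

# strip removes the outer whitespace, lower folds everything to lowercase.
cleaned = line.strip().lower()

# replace removes simple punctuation, one character at a time.
for mark in ",.:":
    cleaned = cleaned.replace(mark, "")

# split breaks the string into words on runs of whitespace.
words = cleaned.split()

# sorted returns a new list and leaves the original untouched.
unique_sorted = sorted(set(words))                 # alphabetical, no duplicates
by_length = sorted(words, key=len, reverse=True)   # longest words first

# join glues the words back together with a single space between them.
rejoined = " ".join(words)

print(words)
print(unique_sorted)
print(by_length[0])
print(rejoined)
```

Note that sorted accepts an optional key function and a reverse flag, which is what makes ordering by length (or later, by frequency) possible.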
There will be no quiz this week.
Class Activity: Bioinformatics
- Use the tools available in Biopython to identify the origin of six real nucleotide sequences presented in class.
- Determine which nucleotide sequence is the fake one.
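Before running the sequences through Biopython, it can help to look at their length and base composition. The sketch below uses only the standard library and assumes the sequences are supplied in FASTA format, which the activity description does not confirm. The record names and sequences are hypothetical, and read_fasta and gc_fraction are illustrative helpers, not Biopython functions. Biopython's Bio.SeqIO module provides a fuller FASTA reader.

```python
from collections import Counter


def read_fasta(text):
    """Parse FASTA-formatted text into a dict of {header: sequence}."""
    records = {}
    header = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            header = line[1:]             # header line, without the '>'
            records[header] = []
        elif header is not None:
            records[header].append(line.upper())
    # Sequence lines belonging to one record are joined into one string.
    return {name: "".join(parts) for name, parts in records.items()}


def gc_fraction(seq):
    """Fraction of G and C among the A, C, G, T bases in seq."""
    counts = Counter(seq)
    total = sum(counts[base] for base in "ACGT")
    return (counts["G"] + counts["C"]) / total if total else 0.0


# Hypothetical records, made up for illustration only.
sample = """>seq_a hypothetical example
ATGCGC
GGCC
>seq_b hypothetical example
ATATAT
"""

records = read_fasta(sample)
for name, seq in records.items():
    print(name, len(seq), round(gc_fraction(seq), 2))
```

Length and base composition are only a first sanity check. They do not show where a sequence comes from, which is the job of the Biopython tools used in class.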
Assignment 11: Computing with Text
Activity: Word Frequency Analysis
- Download the text of Shakespeare's "Hamlet" from Project Gutenberg.
- Make a list of all the words in this text, ordered by frequency of use.
- Identify the most commonly used words.
- Make a pie chart showing the frequency of the words.
- Plot the frequency of the words against their rank in frequency order, and comment on any patterns seen.
- Perform the word frequency analysis with another text of your choice.
- Generate a random text, and compare its word frequency pattern to those seen for the real texts.
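The counting steps in this activity can be sketched as follows. This is one possible approach under stated assumptions, not the required solution: the function names are hypothetical, a "word" is taken to be a run of letters with an optional internal apostrophe, and the random text is assumed to be letters and spaces drawn uniformly at random. A short made-up sentence stands in for the downloaded text.

```python
import random
import re
from collections import Counter


def word_frequencies(text):
    """Count how often each word occurs, ignoring case and punctuation."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(words)


def rank_frequency(counts):
    """Return (rank, word, count) tuples, most frequent word first."""
    ordered = counts.most_common()
    return [(rank, word, n) for rank, (word, n) in enumerate(ordered, start=1)]


def random_text(n_chars, alphabet="abcdefghijklmnopqrstuvwxyz ", seed=0):
    """Build a text by picking characters uniformly at random (an assumption)."""
    rng = random.Random(seed)             # fixed seed so runs are repeatable
    return "".join(rng.choice(alphabet) for _ in range(n_chars))


# Made-up sample standing in for a real text.
sample = "the cat and the dog and the bird"
table = rank_frequency(word_frequencies(sample))
for rank, word, n in table[:3]:
    print(rank, word, n)

# The same analysis applied to a random text, for comparison.
random_table = rank_frequency(word_frequencies(random_text(20000)))
print(random_table[:3])
```

For a real text, read the downloaded file into a string first (for example with open and read on a file saved under a name of your choice) and pass it to word_frequencies. The rank and count columns of the resulting table are the values to plot.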
Assignment: Report Compilation
How to create a master document
- Include all reports from the entire semester.
- Create one master document with each report inserted as a section.
- Name the file 337f15_yourname.odt.
- Start each report on a separate page.
- Add a title page.
- Add a table of contents.
- Export the document as a PDF.