.. Created by Adam Cunningham on Mon May 30 2016.

**Week 14: May 8 - 12**
=======================

Computing With Text
-------------------

We use Python's text processing functions to explore the probability distributions of words in some classic texts.

Bioinformatics
--------------

`Biopython `_ is a set of Python modules containing tools for computational molecular biology. We `install `_ Biopython and learn to use one such tool to identify a set of :download:`unknown genetic sequences <./sequences.txt>`.

Week 14 Notebook
----------------

- `View online <./Week14Notebook.html>`_
- :download:`Download <../Lessons/Week14/Week14Notebook.ipynb>`

Python
------

- String formatting using **format**.

Class Activity: Bioinformatics
------------------------------

Use the tools available in Biopython to identify the origin of the six real nucleotide sequences presented in class, and determine which of the sequences is the fake one. The code we will use for this is shown below.

.. code-block:: python

    from Bio.Blast import NCBIWWW, NCBIXML

    def identify_sequence(seq_data):
        'Identify a genetic sequence'
        # Submit the sequence to NCBI BLAST over the internet.
        # Second (database) argument can also be "nt"
        results = NCBIWWW.qblast("blastn", "nr", seq_data, hitlist_size=2)
        records = NCBIXML.parse(results)

        # Only report matches whose expect (E) value is below this threshold
        E_VALUE_THRESH = 0.04
        for record in records:
            for alignment in record.alignments:
                for hsp in alignment.hsps:
                    if hsp.expect < E_VALUE_THRESH:
                        print('****Alignment****')
                        print('sequence:', alignment.title)
                        print('length:', alignment.length)
                        print('e value:', hsp.expect)

                        # Show at most nshow characters of each aligned string
                        nshow = 95
                        if len(hsp.query) <= nshow:
                            print(hsp.query)
                            print(hsp.match)
                            print(hsp.sbjct)
                        else:
                            # Abbreviate long alignments: start, '...', last 10 characters
                            print(hsp.query[0:nshow-10] + '...' + hsp.query[-10:])
                            print(hsp.match[0:nshow-10] + '...' + hsp.match[-10:])
                            print(hsp.sbjct[0:nshow-10] + '...' + hsp.sbjct[-10:])

Report 7
---------

:doc:`language_models`
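
The word-probability and string-formatting topics listed for this week can be sketched together in a few lines. This is a minimal illustration and not the code from the course notebook: the function name ``word_probabilities`` and the ``sample`` string are hypothetical, and splitting on whitespace is a simplifying assumption that ignores punctuation.

.. code-block:: python

    from collections import Counter

    def word_probabilities(text):
        'Return a dict mapping each lowercase word to its relative frequency'
        # Assumption: words are separated by whitespace; punctuation is not stripped
        words = text.lower().split()
        counts = Counter(words)
        total = len(words)
        return {word: count / total for word, count in counts.items()}

    # Hypothetical sample text standing in for a classic text
    sample = 'the cat and the dog and the bird'
    probs = word_probabilities(sample)

    # Print the words from most to least probable, using str.format
    # to left-align each word and show its probability to 3 decimal places
    for word, p in sorted(probs.items(), key=lambda item: -item[1]):
        print('{:<6}{:.3f}'.format(word, p))

For a real text, the same function could be applied to the contents of a file read with ``open(...).read()``, after which the most probable words are usually short function words such as "the" and "and".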