Week 15: December 5 - 9

Bioinformatics

Biopython is a set of Python modules containing tools for computational molecular biology. We install Biopython and learn to use one such tool, its interface to the NCBI BLAST search service, to identify a set of unknown genetic sequences.

Class Activity: Bioinformatics

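Use the tools available in Biopython to identify the origin of six real nucleotide sequences presented in class. Determine which nucleotide sequence is the fake one. The code we will use for this is shown below. It submits the sequence to NCBI's online BLAST service, so it needs an internet connection and may take a while to return.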

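from Bio.Blast import NCBIWWW, NCBIXML

def identify_sequence(seq_data):
    """Identify a genetic sequence with an online BLAST search at NCBI."""
    # Second (database) argument can also be "nt"
    results = NCBIWWW.qblast("blastn", "nr", seq_data, hitlist_size=2)
    records = NCBIXML.parse(results)
    # Only report matches whose expect (E) value is below this threshold
    E_VALUE_THRESH = 0.04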
    for record in records:
        for alignment in record.alignments:
            for hsp in alignment.hsps:
                if hsp.expect < E_VALUE_THRESH:
                    print('****Alignment****')
                    print('sequence:', alignment.title)
                    print('length:', alignment.length)
                    print('e value:', hsp.expect)
                    # Print short alignments in full; abbreviate long ones
                    nshow = 95
                    if len(hsp.query) <= nshow:
                        print(hsp.query)
                        print(hsp.match)
                        print(hsp.sbjct)
                    else:
                        print(hsp.query[0:nshow-10] + '...' + hsp.query[-10:])
                        print(hsp.match[0:nshow-10] + '...' + hsp.match[-10:])
                        print(hsp.sbjct[0:nshow-10] + '...' + hsp.sbjct[-10:])
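The three print statements in the else branch above all abbreviate a long string the same way. As a minimal sketch of that step on its own, the helper below (shorten is a hypothetical name, not part of Biopython or of the class code) applies the same slicing, and it can be tried without a network connection. The test string is made up for illustration and is not real BLAST output.

```python
def shorten(text, nshow=95, ntail=10):
    """Abbreviate text the way identify_sequence prints long alignments.

    Strings of at most nshow characters are returned unchanged. Longer
    strings keep their first nshow - ntail characters and their last
    ntail characters, joined by '...'.
    """
    if len(text) <= nshow:
        return text
    return text[:nshow - ntail] + '...' + text[-ntail:]


# Made-up 120-character string, for illustration only (not BLAST output)
query = 'ACGT' * 30
print(shorten(query))
print(len(query), '->', len(shorten(query)))
```

Note that with the default values an abbreviated string is 98 characters long (85 + 3 + 10), slightly more than nshow, which matches what the class code prints.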

Quiz 12: More Plotting

  • loglog
  • semilogx
  • semilogy
  • polar
  • hist
  • contour
  • contourf
  • colorbar
  • axvline
  • axhline

Sample Quiz 12