Functions, STL, and Data Structures

BCH 519
Spring 2021

Andrew E. Bruno
aebruno2@buffalo.edu

Topics Covered

Standard Library and Built-in Functions
User-defined Functions
Lists and Dictionaries
Exercises and Solutions

The Python Standard Library and Built-in Functions

What is a function?

A function is a named, reusable block of code that takes arguments and returns a result
Python has many built-in functions


print	prints objects to stdout
len	returns the length (number of items) of an object
reversed	returns a reverse iterator (e.g. to reverse a string)
float	converts a string or number to a floating point number
str	converts an object to a string
range	returns a sequence of integers
abs	returns the absolute value of a number

https://docs.python.org/3/library/functions.html
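
A minimal sketch of the built-ins listed above in action. The sample values (the short sequence, "3.5", 42, -7) are made up for illustration.

```python
# Built-in functions are always available; no import is needed.
seq = "ATGC"

n_bases = len(seq)                  # number of characters in the string
backwards = "".join(reversed(seq))  # reversed() returns an iterator, so join it back into a string
as_float = float("3.5")             # string -> floating point number
as_text = str(42)                   # integer -> string
first_three = list(range(3))        # range(3) yields 0, 1, 2
distance = abs(-7)                  # absolute value

print(n_bases, backwards, as_float, as_text, first_three, distance)
# prints: 4 CGTA 3.5 42 [0, 1, 2] 7
```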

What is a module?

A module is a collection of related functions/code
Modules are “imported” using the import statement
Python comes with an extensive collection of modules called the Python Standard Library
Modules allow for the reuse and organization of code
Can also create your own custom modules and functions
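
As a rough illustration of the import statement, here are two common ways to bring standard library code into a script. The choice of math and random is only an example.

```python
import math                  # import the whole module; use names as math.<name>
from random import randint   # import a single function by name

root = math.sqrt(16)         # call a function through the module name
roll = randint(1, 6)         # call the imported function directly (1 <= roll <= 6)

print(root)   # prints: 4.0
print(roll)   # a random integer, different from run to run
```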

The Python Standard Library

Python’s standard library is very extensive
Includes modules that provide standardized solutions for many problems that occur in everyday programming
https://docs.python.org/3/library/index.html
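Examples ([module].[function] or [type].[method]):


str.format	format a string (method of the built-in str type)
str.upper	return an uppercase copy of a string
str.strip	remove leading and trailing characters (whitespace by default)
random.randint	return a random integer (function in the random module)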
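
A short sketch of the string operations listed above, called as methods on a string value. The variable names and the sample text are hypothetical.

```python
name = "  ecoRI  "

clean = name.strip()                 # remove leading and trailing whitespace
shout = clean.upper()                # uppercase copy; the original string is unchanged
label = "Enzyme: {}".format(shout)   # substitute a value into the {} placeholder

print(label)   # prints: Enzyme: ECORI
```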

Exercise 1: Modules and functions

import random

seq = "ATGTAATCGGGTAC"
seq_len = len(seq)      # number of characters in string
seq_lower = seq.lower() # lower case all characters

print(f"Length: {seq_len}")
print(f"Lower case: {seq_lower}")

# Generate 5 random numbers between 0 and 100
for i in range(0, 5):
    rand_int = random.randint(0, 100)  # 0 <= rand_int <= 100
    print(f"{i}: {rand_int}")

Exercise 1: Output

Then run in the terminal by typing:

$ python3 ex1.py
Length: 14
Lower case: atgtaatcgggtac
0: 23
1: 84
2: 69
3: 55
4: 45

Exercise 1: What we learned

functions typically take an argument list and return one or more values
Example: random.randint(A, B)
len() returns the length in characters of a string
range(0, 5) returns a sequence of five ints: 0, 1, 2, 3, 4
import random imports the “random” module from the Python Standard Library
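
Two follow-up points, sketched under the assumption that you want repeatable output while testing: range(5) is shorthand for range(0, 5), and random.seed() makes the random sequence reproducible. The seed value 519 is arbitrary.

```python
import random

print(list(range(0, 5)))   # prints: [0, 1, 2, 3, 4]
print(list(range(5)))      # prints: [0, 1, 2, 3, 4]

# Seeding the generator with the same value replays the same numbers.
random.seed(519)
first_run = [random.randint(0, 100) for _ in range(5)]
random.seed(519)
second_run = [random.randint(0, 100) for _ in range(5)]

print(first_run == second_run)   # prints: True
```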

Defining Functions

Functions allow us to reuse blocks of code

Exercise 2: Functions

def power(num, pw):
    result = num ** pw
    return result

sq = power(8, 2)
print(f"8 squared = {sq}")

cu = power(3, 3)
print(f"3 cubed = {cu}")

Exercise 2: Output

Then run in the terminal by typing:

$ python3 ex2.py
8 squared = 64
3 cubed = 27

Exercise 2: What we learned

def keyword is used to define your own functions
code under the def line must be indented consistently (4 spaces by convention)
functions can be passed an argument list
values can be returned from a function using the return keyword
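
One possible extension of the power() function from Exercise 2: a default value for the second argument and a docstring. The default of 2 is an assumption made for this sketch, not part of the exercise.

```python
def power(num, pw=2):
    """Return num raised to the power pw (pw defaults to 2)."""
    return num ** pw

print(power(8))              # prints: 64   (pw falls back to the default)
print(power(3, 3))           # prints: 27   (positional arguments)
print(power(num=2, pw=10))   # prints: 1024 (keyword arguments)
```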

Data Structures

Lists and Dictionaries

Lists

Used to store an ordered sequence of values
Defined using [] (parentheses () define a tuple, which cannot be modified)
Indices are 0 based
The empty list: lst = []
Simple list with one item:
- lst = [32]
- lst = ["DNA Replication"]
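
A small sketch of common list operations, and of how square brackets differ from parentheses. The gene names are placeholder values.

```python
genes = []                 # the empty list
genes.append("BRCA1")      # add items to the end
genes.append("TP53")

print(len(genes))          # prints: 2
print("TP53" in genes)     # prints: True

genes[0] = "BRCA2"         # lists are mutable: items can be replaced in place
print(genes)               # prints: ['BRCA2', 'TP53']

pair = ("BRCA1", "TP53")   # parentheses create a tuple, not a list
try:
    pair[0] = "BRCA2"      # tuples do not support item assignment
except TypeError:
    print("tuples cannot be modified")
```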

Exercise 3: Accessing List Values

nums = [10, 20, 30, 40, 50, 60]

first = nums[0]
second = nums[1]
last = nums[-1]

# str.join() takes an iterable (such as a list) of strings
print(",".join([str(first), str(second), str(last)]))

# Extract a "slice": start at index 1, stop before index 4
#    0   1   2   3   4   5
#  [10, 20, 30, 40, 50, 60]
#        *   *   * 
sub_list = nums[1:4]
for n in sub_list:
    print(n)

Exercise 3: Output

Then run in the terminal by typing:

$ python3 ex3.py
10,20,60
20
30
40

Exercise 3: What we learned

access items of a list using their index: nums[3]
negative indices count from the end of the list: the last item is nums[-1], the second-to-last is nums[-2]
extract a sublist using the slice notation: nums[1:4]
Convert integers to strings using the str() function
lists are 0 based

index:        0   1   2   3   4   5 
list:       [10, 20, 30, 40, 50, 60]    len() = 6
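
A few more slice forms on the same list, as a quick reference. In every case the start index is included and the stop index is excluded.

```python
nums = [10, 20, 30, 40, 50, 60]

print(nums[:3])     # prints: [10, 20, 30]   (from the start, stop before index 3)
print(nums[3:])     # prints: [40, 50, 60]   (from index 3 to the end)
print(nums[-2:])    # prints: [50, 60]       (the last two items)
print(nums[::2])    # prints: [10, 30, 50]   (every second item)
print(nums[::-1])   # prints: [60, 50, 40, 30, 20, 10]   (reversed copy)
```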

Dictionaries

Used to store a collection of key/value pairs
Defined using {}
The empty dict: dct = {}
Example:
- dct = {'key': 'value'}
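
A minimal sketch of looking things up in a dictionary. The two-entry codon table is a made-up illustration and is deliberately incomplete.

```python
codon_table = {'ATG': 'Met', 'TGG': 'Trp'}

print(codon_table['ATG'])                 # prints: Met
print('TGG' in codon_table)               # prints: True  (membership tests check keys)
print(codon_table.get('TAA', 'unknown'))  # prints: unknown  (default instead of a KeyError)

codon_table['TGG'] = 'W'                  # assigning to an existing key overwrites its value
print(len(codon_table))                   # prints: 2
```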

Exercise 4: Accessing Dict items

fruit = {
    'oranges': 10,
    'grapes': 20,
}
# Add new key
fruit['pears'] = 30

keys = fruit.keys()      # view of all keys
values = fruit.values()  # view of all values
print(','.join(keys))
total_grapes = fruit['grapes']
print(f"Total grapes: {total_grapes}")

for key in fruit:
    value = fruit[key]
    print(f"{key} = {value}")

Exercise 4: Output

Then run in the terminal by typing:

$ python3 ex4.py
oranges,grapes,pears
Total grapes: 20
oranges = 10
grapes = 20
pears = 30

Exercise 4: What we learned

dictionaries store a mapping of key/value pairs
Access values using key: fruit['grapes']
Fetch all keys using: fruit.keys()
Iterate over keys: for key in fruit
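
Some related ways to walk through a dictionary, sketched with the same fruit data. Dictionaries in Python 3.7 and later keep keys in insertion order.

```python
fruit = {'oranges': 10, 'grapes': 20, 'pears': 30}

# items() yields (key, value) pairs, so no second lookup is needed.
for key, value in fruit.items():
    print(f"{key} = {value}")

total = sum(fruit.values())   # add up all the values
print(total)                  # prints: 60

print(list(fruit.keys()))     # prints: ['oranges', 'grapes', 'pears']
print(sorted(fruit))          # prints: ['grapes', 'oranges', 'pears']
```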

Printing and manipulating text

Exercises and Solutions

Calculating AT content

Here’s a short DNA sequence:

ACTGATCGATTACGTATAGTATTTGCTATCATACATATATATCGATGCGTTCAT

Write a program that will print out the AT content of this DNA sequence.

Solution 1

seq = 'ACTGATCGATTACGTATAGTATTTGCTATCATACATATATATCGATGCGTTCAT'
print(seq.count('A') + seq.count('T'))

Solution 2

seq = 'ACTGATCGATTACGTATAGTATTTGCTATCATACATATATATCGATGCGTTCAT'
at_counter = 0
for base in seq:
    if base.lower() == 'a' or base.lower() == 't':
        at_counter = at_counter + 1

print(at_counter)

Solution 3

seq = 'ACTGATCGATTACGTATAGTATTTGCTATCATACATATATATCGATGCGTTCAT'
print(sum(1 for b in seq if b == 'A' or b == 'T'))
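
The three solutions above print the number of A and T bases. If "AT content" is taken to mean the fraction of the sequence made up of A and T (an assumption about the intent of the exercise), a sketch could divide that count by the sequence length.

```python
seq = 'ACTGATCGATTACGTATAGTATTTGCTATCATACATATATATCGATGCGTTCAT'

at_count = seq.count('A') + seq.count('T')   # number of A and T bases
at_content = at_count / len(seq)             # fraction between 0 and 1

print(f"AT count: {at_count}")               # prints: AT count: 37
print(f"AT content: {at_content:.3f}")       # prints: AT content: 0.685
```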

Complementing DNA

Here’s a short DNA sequence:

ACTGATCGATTACGTATAGTATTTGCTATCATACATATATATCGATGCGTTCAT

Write a program that will print the complement of this sequence.

Solution 1

seq = 'ACTGATCGATTACGTATAGTATTTGCTATCATACATATATATCGATGCGTTCAT'
comp = ''
for base in seq:
    if base.upper() == 'A':
        comp = comp + 'T'
    elif base.upper() == 'T':
        comp = comp + 'A'
    elif base.upper() == 'C':
        comp = comp + 'G'
    elif base.upper() == 'G':
        comp = comp + 'C'

print(comp)

Solution 2

seq = 'ACTGATCGATTACGTATAGTATTTGCTATCATACATATATATCGATGCGTTCAT'
base_map = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}
comp = ''
for base in seq:
    comp = comp + base_map[base]

print(comp)

# or shorthand version
print(''.join(base_map[base] for base in seq))
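
An alternative sketch using the built-in str.maketrans() and str.translate() methods, which replace characters through a lookup table. This is a different technique from the loop-based solutions above and assumes the sequence is uppercase.

```python
seq = 'ACTGATCGATTACGTATAGTATTTGCTATCATACATATATATCGATGCGTTCAT'

# Map each base to its complement: A->T, T->A, G->C, C->G.
table = str.maketrans('ATGC', 'TACG')
comp = seq.translate(table)

print(comp)
# Complementing twice gives back the original sequence.
print(comp.translate(table) == seq)   # prints: True
```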

Restriction fragment lengths

Here’s a short DNA sequence:

ACTGATCGATTACGTATAGTAGAATTCTATCATACATATATATCGATGCGTTCAT

The sequence contains a recognition site for the EcoRI restriction enzyme, which cuts at the motif G*AATTC (the position of the cut is indicated by an asterisk). Write a program which will calculate the size of the two fragments that will be produced when the DNA sequence is digested with EcoRI.

Solution 1

seq = 'ACTGATCGATTACGTATAGTAGAATTCTATCATACATATATATCGATGCGTTCAT'
index = seq.find('GAATTC')
print(f'Frag 1 size: {len(seq[0:index+1])}')
print(f'Frag 2 size: {len(seq[index+1:])}')

Solution 2

seq = 'ACTGATCGATTACGTATAGTAGAATTCTATCATACATATATATCGATGCGTTCAT'
frag1_size = seq.find('GAATTC') + 1
print(frag1_size)
print(len(seq) - frag1_size)
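
Both solutions rely on str.find(), which returns -1 when the motif is absent, so a missing site would silently give wrong sizes. One possible way to wrap the calculation in a function that handles that case is sketched below. The function name fragment_sizes and its cut_offset parameter are hypothetical.

```python
def fragment_sizes(seq, site, cut_offset):
    """Return fragment lengths after a single cut at the first occurrence of site.

    cut_offset is the number of bases of the site left on the first fragment
    (1 for EcoRI, which cuts G*AATTC).
    """
    index = seq.find(site)
    if index == -1:
        return [len(seq)]          # no recognition site: the sequence stays whole
    cut = index + cut_offset
    return [cut, len(seq) - cut]

seq = 'ACTGATCGATTACGTATAGTAGAATTCTATCATACATATATATCGATGCGTTCAT'
print(fragment_sizes(seq, 'GAATTC', 1))     # prints: [22, 33]
print(fragment_sizes('AAAA', 'GAATTC', 1))  # prints: [4]
```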

Splicing out introns, part one

Here’s a short DNA sequence:

ATCGATCGATCGATCGACTGACTAGTCATAGCTATGCATGTAGCTACTC
GATCGATCGATCGATCGATCGATCGATCGATCGATCATGCTATCATCGA
TCGATATCGATGCATCGACTACTAT

It comprises two exons and an intron. The first exon runs from the start of the sequence to the sixty-third character, and the second exon runs from the ninety-first character to the end of the sequence. Write a program that will print just the coding regions of the DNA sequence.

Solution 1

seq  = 'ATCGATCGATCGATCGACTGACTAGT'
seq += 'CATAGCTATGCATGTAGCTACTCGATCGATCGATCGA'
seq += 'TCGATCGATCGATCGATCGATCATGC'
seq += 'TATCATCGATCGATATCGATGCATCGACTACTAT'

exon1 = seq[0:63]  # characters 1 to 63 (the end index is excluded)
exon2 = seq[90:]   # character 91 to the end
print(exon1 + exon2)
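
The exercise gives positions counting from 1 and including the last character, while Python slices count from 0 and exclude the stop index. A small helper can make that conversion explicit. The function name region and the toy sequence are hypothetical, chosen only to keep the example short.

```python
def region(seq, start, end=None):
    """Return characters start..end of seq, counting from 1, end included.

    If end is omitted, the region runs to the end of the sequence.
    """
    if end is None:
        return seq[start - 1:]
    return seq[start - 1:end]   # the slice stop is exclusive, so end needs no +1

toy = 'AAACCCGGG'
print(region(toy, 1, 3))   # prints: AAA  (first to third character)
print(region(toy, 7))      # prints: GGG  (seventh character to the end)
```

With this helper, the first exon is region(seq, 1, 63) and the second is region(seq, 91).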

Splicing out introns, part two

Using the data from part one, write a program that will calculate what percentage of the DNA sequence is coding.

Solution 1

seq  = 'ATCGATCGATCGATCGACTGACTAGTCATAGCTA'
seq += 'TGCATGTAGCTACTCGATCGATCGATCGA'
seq += 'TCGATCGATCGATCGATCGATCATGCTAT'
seq += 'CATCGATCGATATCGATGCATCGACTACTAT'
print( ( len(seq[0:63]) + len(seq[90:]) ) / len(seq) * 100 )

Splicing out introns, part three

Using the data from part one, write a program that will print out the original genomic DNA sequence with coding bases in uppercase and non-coding bases in lowercase.

Solution 1

seq  = 'ATCGATCGATCGATCGACTGACTAGTCATAGC'
seq += 'TATGCATGTAGCTACTCGATCGATCGATCGA'
seq += 'TCGATCGATCGATCGATCGATCATGCTATCA'
seq += 'TCGATCGATATCGATGCATCGACTACTAT'
print( seq[0:63].upper() + seq[63:90].lower() + seq[90:].upper() )