## Topics Covered

• Standard Library and Built-in Functions
• User-defined Functions
• Lists and Dictionaries
• Exercises and Solutions

## What is a function?

#### Python has many built-in functions

• `print`: prints objects to stdout
• `len`: returns the length (number of items)
• `reversed`: returns a reverse iterator (e.g. to reverse a string)
• `float`: converts a string or number to a floating point number
• `str`: converts an object to a string
• `range`: returns a sequence of integers
• `abs`: returns the absolute value of a number

https://docs.python.org/3/library/functions.html
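As a quick illustration, the built-ins listed above can be tried on small values. This is only a sketch; the example string and numbers are made up for demonstration.

```python
# Each call below uses one of the built-in functions listed above.
seq = "ATGC"                          # hypothetical example string

length = len(seq)                     # number of characters
backwards = "".join(reversed(seq))    # reversed() yields the characters last to first
as_float = float("3.5")               # string -> floating point number
as_text = str(42)                     # integer -> string
first_three = list(range(3))          # range(3) produces 0, 1, 2
distance = abs(-7)                    # absolute value

print(length, backwards, as_float, as_text, first_three, distance)
```

Note that `reversed` returns an iterator rather than a string, which is why the result is passed to `"".join(...)`.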

## What is a module?

• A module is a collection of related functions/code
• Modules are “imported” using the `import` statement
• Python comes with an extensive collection of modules called the Python standard library
• Modules allow for the reuse and organization of code
• You can also create your own custom modules and functions

## The Python Standard Library

• Python’s standard library is very extensive
• Includes modules that provide standardized solutions for many problems that occur in everyday programming
• https://docs.python.org/3/library/index.html
• Examples, written as `[module].[function]` or `[type].[method]`:
• `str.format`: format a string
• `str.upper`: uppercase a string
• `str.strip`: remove leading and trailing characters (whitespace by default)
• `random.randint`: return a random integer
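The calls listed above can be sketched as follows. The string operations are invoked as methods on a string value, while `random.randint` comes from the imported `random` module. The input text is a made-up example.

```python
import random

name = "  dna polymerase \n"          # hypothetical input with stray whitespace
clean = name.strip()                  # removes leading and trailing whitespace
shout = clean.upper()                 # uppercases every character
label = "Enzyme: {}".format(shout)    # fills the {} placeholder

roll = random.randint(1, 6)           # both endpoints are included
print(label)
print("Random integer between 1 and 6: {}".format(roll))
```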

## Exercise 1: Modules and functions

```python
import random

seq = "ATGTAATCGGGTAC"
seq_len = len(seq)      # number of characters in string
seq_lower = seq.lower() # lower case all characters

print("Length: {}".format(seq_len))
print("Lower case: {}".format(seq_lower))

# Generate 5 random numbers between 0 and 100
for i in range(0, 5):
    rand_int = random.randint(0, 100)  # 0 <= rand_int <= 100
    print("{}: {}".format(i, rand_int))
```

## Exercise 1: Output

#### Then run in the terminal by typing:

```
$ python ex1.py
Length: 14
Lower case: atgtaatcgggtac
0: 23
1: 84
2: 69
3: 55
4: 45
```

## Exercise 1: What we learned

• functions typically take an argument list and return one or more values
• Example: `random.randint(A, B)`
• `len()` returns the length in characters of a string
• `range(0, 5)` returns a sequence of five ints: 0, 1, 2, 3, 4
• `import random` imports the `random` module from the Python standard library
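The points above can be checked interactively. The sketch below also uses the built-in `divmod` as an example of a function that returns more than one value, and `random.seed`, which is optional and only there so that repeated runs print the same numbers.

```python
import random

print(list(range(0, 5)))        # [0, 1, 2, 3, 4]; the stop value 5 is excluded
print(len("ATGTAATCGGGTAC"))    # 14 characters
print(len([10, 20, 30]))        # 3 items

# divmod() returns two values at once: the quotient and the remainder
quotient, remainder = divmod(14, 3)
print(quotient, remainder)      # 4 2

random.seed(1)                  # optional: make the random draws repeatable
draws = [random.randint(0, 100) for _ in range(5)]
print(draws)
```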

## Exercise 2: Functions

```python
def power(num, pw):
    result = num ** pw
    return result

sq = power(8, 2)
print("8 squared = {}".format(sq))

cu = power(3, 3)
print("3 cubed = {}".format(cu))
```

## Exercise 2: Output

#### Then run in the terminal by typing:

```
$ python ex2.py
8 squared = 64
3 cubed = 27
```

## Exercise 2: What we learned

• `def` keyword is used to define your own functions
• code under the `def` line must be indented consistently (4 spaces by convention)
• functions can be passed an argument list
• values can be returned from a function using the `return` keyword
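As a further sketch of these points, here are two small functions. The names `count_base` and `shortest_and_longest` are hypothetical and chosen only for illustration. The second one returns two values, which the caller unpacks into two variables.

```python
def count_base(seq, base):
    """Return how many times base occurs in seq, ignoring case."""
    total = seq.upper().count(base.upper())
    return total


def shortest_and_longest(seqs):
    """Return the lengths of the shortest and the longest sequence."""
    lengths = [len(s) for s in seqs]
    return min(lengths), max(lengths)   # two values, returned as a tuple


a_count = count_base("AAGT", "a")
shortest, longest = shortest_and_longest(["ATG", "ATGTAA", "A"])
print("A count = {}".format(a_count))
print("shortest = {}, longest = {}".format(shortest, longest))
```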

## Lists

• Used to store a list of values
• Defined using `[]` (parentheses `()` define a tuple, a similar sequence that cannot be modified)
• Indices are 0 based
• The empty list: `list = []`
• Simple list with one item:
• `list = `
• `list = ["DNA Replication"]`
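A minimal sketch of building and inspecting a list. The variable name `topics` and the second item are made up for this example; `topics` is used instead of `list` so that the built-in `list` function stays available.

```python
topics = []                         # start with the empty list
topics.append("DNA Replication")    # now one item
topics.append("Transcription")      # hypothetical second item

print(len(topics))                  # 2
print(topics[0])                    # index 0 is the first item

point = (3, 4)                      # parentheses create a tuple, which cannot be modified
print(type(point).__name__)         # tuple
```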

## Exercise 3: Accessing List Values

```python
nums = [10, 20, 30, 40, 50, 60]

first = nums[0]
second = nums[1]
last = nums[-1]

# str.join() takes in a list of strings
print(",".join([str(first), str(second), str(last)]))

# Extract a "slice": start at index 1, stop before index 4
#    0   1   2   3   4   5
#  [10, 20, 30, 40, 50, 60]
#        *   *   *
sub_list = nums[1:4]
for n in sub_list:
    print(n)
```

## Exercise 3: Output

#### Then run in the terminal by typing:

```
$ python ex3.py
10,20,60
20
30
40
```

## Exercise 3: What we learned

• access items of a list using their index: `nums[0]`
• negative index starts at end of list: last item `nums[-1]` second last item: `nums[-2]`
• extract a sublist using the slice notation: `nums[1:4]`
• Convert integers to strings using the `str()` function
• lists are 0 based
```
index:        0   1   2   3   4   5
list:       [10, 20, 30, 40, 50, 60]    len() = 6
```
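A few more indexing and slicing cases on the same list, as a sketch. The stop index of a slice is always excluded, so the length of `nums[a:b]` is `b - a` when both indices are within range.

```python
nums = [10, 20, 30, 40, 50, 60]

print(nums[0])       # 10, the first item
print(nums[-2])      # 50, the second to last item
print(nums[1:4])     # [20, 30, 40]; index 4 is excluded
print(nums[:2])      # [10, 20]; an omitted start means "from the beginning"
print(nums[4:])      # [50, 60]; an omitted end means "to the end"
print(len(nums[1:4]) == 4 - 1)   # True: slice length is stop minus start
```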

## Dictionaries

• Used to store a list of key/value pairs
• Defined using `{}`
• The empty dict: `dct = {}`
• Example:
• `dct = {'key': 'value'}`
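A minimal sketch of creating and querying a dictionary. The name `codon_to_aa` and its contents are example data chosen for illustration.

```python
codon_to_aa = {'ATG': 'Met', 'TGG': 'Trp'}   # hypothetical example data
codon_to_aa['TAA'] = 'Stop'                  # add a new key/value pair

print(codon_to_aa['ATG'])       # look up a value by its key
print('GGG' in codon_to_aa)     # test whether a key exists
print(len(codon_to_aa))         # number of key/value pairs
```

Looking up a key that is not present, for example `codon_to_aa['GGG']`, raises a `KeyError`, so the `in` test is useful before accessing keys that may be missing.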

## Exercise 4: Accessing Dict items

```python
fruit = {
    'oranges': 10,
    'grapes': 20,
}
# Add new key
fruit['pears'] = 30

keys = fruit.keys()      # all keys (a list in Python 2, a view object in Python 3)
values = fruit.values()  # all values (same list/view distinction)
print(','.join(keys))
total_grapes = fruit['grapes']
print("Total grapes: {}".format(total_grapes))

for key in fruit:
    value = fruit[key]
    print("{} = {}".format(key, value))
```

## Exercise 4: Output

#### Then run in the terminal by typing:

```
$ python ex4.py
pears,grapes,oranges
Total grapes: 20
pears = 30
grapes = 20
oranges = 10
```

## Exercise 4: What we learned

• dictionaries store a mapping of key/value pairs
• Access values using key: `fruit['grapes']`
• Fetch all keys using: `fruit.keys()`
• Iterate over keys: `for key in fruit` (insertion order in Python 3.7+, arbitrary order in older versions, as in the output above)
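A short sketch of other common ways to walk over a dictionary, using the same `fruit` data. Sorting the keys gives the same output order on every Python version.

```python
fruit = {'oranges': 10, 'grapes': 20, 'pears': 30}

# Iterate over keys in alphabetical order
for key in sorted(fruit):
    print("{} = {}".format(key, fruit[key]))

# Iterate over key/value pairs directly
for key, value in fruit.items():
    print("{} -> {}".format(key, value))

total = sum(fruit.values())         # add up all the values
kiwis = fruit.get('kiwis', 0)       # get() returns a default instead of raising KeyError
print("Total fruit: {}".format(total))
print("Kiwis: {}".format(kiwis))
```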

## Calculating AT content

Here’s a short DNA sequence:

``ACTGATCGATTACGTATAGTATTTGCTATCATACATATATATCGATGCGTTCAT``

Write a program that will print out the AT content of this DNA sequence.

#### Solution 1

```python
seq = 'ACTGATCGATTACGTATAGTATTTGCTATCATACATATATATCGATGCGTTCAT'
print(seq.count('A') + seq.count('T'))
```

#### Solution 2

```python
seq = 'ACTGATCGATTACGTATAGTATTTGCTATCATACATATATATCGATGCGTTCAT'
at_counter = 0
for base in seq:
    if base.lower() == 'a' or base.lower() == 't':
        at_counter = at_counter + 1

print(at_counter)
```

#### Solution 3

```python
seq = 'ACTGATCGATTACGTATAGTATTTGCTATCATACATATATATCGATGCGTTCAT'
print(sum(1 for b in seq if b == 'A' or b == 'T'))
```
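The solutions above print the number of A and T bases. If the AT content is wanted as a fraction of the sequence length, one possible sketch is the function below. The name `at_content` is hypothetical, and the function assumes a non-empty sequence.

```python
def at_content(seq):
    """Return the fraction of bases in seq that are A or T (case-insensitive)."""
    seq = seq.upper()
    at_count = seq.count('A') + seq.count('T')
    # float() keeps the division fractional on Python 2 as well
    return at_count / float(len(seq))


seq = 'ACTGATCGATTACGTATAGTATTTGCTATCATACATATATATCGATGCGTTCAT'
print("AT content: {:.2f}".format(at_content(seq)))
```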

## Complementing DNA

Here’s a short DNA sequence:

``ACTGATCGATTACGTATAGTATTTGCTATCATACATATATATCGATGCGTTCAT``

Write a program that will print the complement of this sequence.

#### Solution 1

```python
seq = 'ACTGATCGATTACGTATAGTATTTGCTATCATACATATATATCGATGCGTTCAT'
comp = ''
for base in seq:
    if base.upper() == 'A':
        comp = comp + 'T'
    elif base.upper() == 'T':
        comp = comp + 'A'
    elif base.upper() == 'C':
        comp = comp + 'G'
    elif base.upper() == 'G':
        comp = comp + 'C'

print(comp)
```

#### Solution 2

```python
seq = 'ACTGATCGATTACGTATAGTATTTGCTATCATACATATATATCGATGCGTTCAT'
base_map = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}
comp = ''
for base in seq:
    comp = comp + base_map[base]

print(comp)

# or shorthand version
print(''.join(base_map[base] for base in seq))
```
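Another option, sketched here as an alternative rather than as one of the original solutions, is a translation table. `str.maketrans` and `str.translate` are used in their Python 3 form; Python 2 would need `string.maketrans` instead.

```python
seq = 'ACTGATCGATTACGTATAGTATTTGCTATCATACATATATATCGATGCGTTCAT'

# Map each base to its complement: A->T, T->A, G->C, C->G (Python 3 only)
table = str.maketrans('ATGC', 'TACG')
comp = seq.translate(table)
print(comp)

# Complementing twice gives back the original sequence
print(comp.translate(table) == seq)
```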

## Restriction fragment lengths

Here’s a short DNA sequence:

``ACTGATCGATTACGTATAGTAGAATTCTATCATACATATATATCGATGCGTTCAT``

The sequence contains a recognition site for the EcoRI restriction enzyme, which cuts at the motif `G*AATTC` (the position of the cut is indicated by an asterisk). Write a program which will calculate the size of the two fragments that will be produced when the DNA sequence is digested with EcoRI.

#### Solution 1

```python
seq = 'ACTGATCGATTACGTATAGTAGAATTCTATCATACATATATATCGATGCGTTCAT'
index = seq.find('GAATTC')
print('Frag 1 size: {}'.format(len(seq[0:index+1])))
print('Frag 2 size: {}'.format(len(seq[index+1:])))
```

#### Solution 2

```python
seq = 'ACTGATCGATTACGTATAGTAGAATTCTATCATACATATATATCGATGCGTTCAT'
frag1_size = seq.find('GAATTC') + 1
print(frag1_size)
print(len(seq) - frag1_size)
```
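Both solutions assume the motif is present. `str.find` returns `-1` when it is not, which would silently produce wrong fragment sizes. The sketch below wraps the calculation in a function that handles that case; the name `ecori_fragments` is hypothetical, and only the first recognition site is considered.

```python
def ecori_fragments(seq):
    """Return the fragment sizes after cutting seq at the first G*AATTC site."""
    site = seq.find('GAATTC')
    if site == -1:
        return [len(seq)]            # no site found: the sequence stays in one piece
    cut = site + 1                   # EcoRI cuts between the G and the first A
    return [cut, len(seq) - cut]


seq = 'ACTGATCGATTACGTATAGTAGAATTCTATCATACATATATATCGATGCGTTCAT'
sizes = ecori_fragments(seq)
print(sizes)
print(ecori_fragments('AAAA'))       # no recognition site
```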

## Splicing out introns, part one

Here’s a short DNA sequence:

```
ATCGATCGATCGATCGACTGACTAGTCATAGCTATGCATGTAGCTACTC
GATCGATCGATCGATCGATCGATCGATCGATCGATCATGCTATCATCGA
TCGATATCGATGCATCGACTACTAT
```

It comprises two exons and an intron. The first exon runs from the start of the sequence to the sixty-third character, and the second exon runs from the ninety-first character to the end of the sequence. Write a program that will print just the coding regions of the DNA sequence.

#### Solution 1

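```python
seq  = 'ATCGATCGATCGATCGACTGACTAGT'
seq += 'CATAGCTATGCATGTAGCTACTCGATCGATCGATCGA'
seq += 'TCGATCGATCGATCGATCGATCATGC'
seq += 'TATCATCGATCGATATCGATGCATCGACTACTAT'

exon1 = seq[0:63]   # characters 1 to 63 (the slice end index is excluded)
exon2 = seq[90:]    # character 91 to the end (index 90 is the 91st character)
print(exon1 + exon2)
```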
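The exercise gives positions counted from 1 with both ends included, while Python slices count from 0 and exclude the end index. The conversion can be sketched with a small helper; the name `region` is hypothetical and the toy string is made up for illustration.

```python
def region(seq, start, end):
    """Return characters start..end of seq, counting from 1 and including both ends."""
    return seq[start - 1:end]


toy = 'ABCDEFGHIJ'
print(region(toy, 1, 3))            # ABC: characters 1 to 3, same as toy[0:3]
print(region(toy, 8, len(toy)))     # HIJ: character 8 to the end, same as toy[7:]
```

By the same rule, characters 1 to 63 correspond to `seq[0:63]` and character 91 to the end corresponds to `seq[90:]`.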

## Splicing out introns, part two

Using the data from part one, write a program that will calculate what percentage of the DNA sequence is coding.

#### Solution 1

```python
# Only need this for Python 2
from __future__ import division

seq  = 'ATCGATCGATCGATCGACTGACTAGTCATAGCTA'
seq += 'TGCATGTAGCTACTCGATCGATCGATCGA'
seq += 'TCGATCGATCGATCGATCGATCATGCTAT'
seq += 'CATCGATCGATATCGATGCATCGACTACTAT'
coding_length = len(seq[0:63]) + len(seq[90:])
print(100 * coding_length / len(seq))   # percentage of bases that are coding
```

## Splicing out introns, part three

Using the data from part one, write a program that will print out the original genomic DNA sequence with coding bases in uppercase and non-coding bases in lowercase.

#### Solution 1

```python
seq  = 'ATCGATCGATCGATCGACTGACTAGTCATAGC'
seq += 'TATGCATGTAGCTACTCGATCGATCGATCGA'
seq += 'TCGATCGATCGATCGATCGATCATGCTATCA'
seq += 'TCGATCGATATCGATGCATCGACTACTAT'
# exon 1 (uppercase) + intron (lowercase) + exon 2 (uppercase)
print(seq[0:63].upper() + seq[63:90].lower() + seq[90:].upper())
```