$ python3
Python 3.7 (default, Sep 16 2015, 09:25:04)
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> the_world_is_flat = True
>>> if the_world_is_flat:
...     print("Be careful not to fall off!")
...
Be careful not to fall off!
https://docs.python.org/3/tutorial/interpreter.html#interactive-mode
Use pip to install packages
$ python3 -m venv /path/to/venv
$ source /path/to/venv/bin/activate
$ pip install numpy
$ python3
>>> import numpy
>>> numpy.__version__
'1.16.1'
argparse
import argparse
parser = argparse.ArgumentParser(description="My program")
parser.add_argument("--verbose", action="store_true")
parser.add_argument("--cutoff", help="The cutoff value")
parser.add_argument("--input", help="Path to input file")
args = parser.parse_args()
if args.verbose:
    print("Verbose option set")
$ python3 cli-advanced.py --verbose
Verbose option set
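By default argparse stores option values as strings, so a numeric option such as --cutoff usually needs a type. The following is a minimal sketch, assuming the cutoff is meant to be a float; the default of 0.5 and the build_parser helper are illustrative assumptions, not part of the original program. Passing a list to parse_args lets you try the parser without a command line.

```python
import argparse


def build_parser():
    # Hypothetical helper that wraps the parser definition shown above
    parser = argparse.ArgumentParser(description="My program")
    parser.add_argument("--verbose", action="store_true")
    # type=float converts the string from the command line to a number;
    # the default of 0.5 is an assumed value for illustration only
    parser.add_argument("--cutoff", type=float, default=0.5, help="The cutoff value")
    parser.add_argument("--input", help="Path to input file")
    return parser


# Parse an explicit argument list instead of sys.argv
args = build_parser().parse_args(["--verbose", "--cutoff", "0.05"])
print(args.verbose, args.cutoff, args.input)
```

Options that are not supplied and have no default, such as --input here, are set to None.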
import argparse
parser = argparse.ArgumentParser(description="Parse data")
parser.add_argument("--input", help="Path to input file")
args = parser.parse_args()
data = []
with open(args.input, "r") as fin:
    for line in fin:
        cols = line.strip().split("\t")
        record = {
            "name": cols[1],
            "chrom": cols[2],
        }
        data.append(record)
for rec in data:
    print(rec["name"])
$ python3 parse-data.py --input refseq-genes.txt
name
NM_032291
NM_052998
NM_001080397
NM_013943
NM_032785
NM_018090
NM_001145278
NM_001145277
NM_001918
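The first line of the output above is the header row of the file, printed as if it were a record. One way to avoid this is csv.DictReader, which consumes the header and uses it for field names. This is a sketch on made-up in-memory data; the column name "bin" and the accession values are assumptions for illustration and are not taken from refseq-genes.txt.

```python
import csv
import io

# Made-up tab-separated data standing in for the real file
text = "bin\tname\tchrom\n0\tNM_000001\tchr1\n1\tNM_000002\tchr2\n"

# DictReader reads the first row as the header, so it never appears as a record
reader = csv.DictReader(io.StringIO(text), delimiter="\t")
data = [{"name": row["name"], "chrom": row["chrom"]} for row in reader]

names = [rec["name"] for rec in data]
print(names)
```

With a real file, the io.StringIO object would be replaced by the handle returned from open(args.input, newline="").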
Here’s a short section of genomic DNA:
ATCGATCGATCGATCGACTGACTAGTCATAGCTATGCATGTAGCTACTCGATCGATCGATCGATC
GATCGATCGATCGATCGATCATGCTATCATCGATCGATATCGATGCATCGACTACTAT
It comprises two exons and an intron. The first exon runs from the start of the sequence to the sixty-third character, and the second exon runs from the ninety-first character to the end of the sequence. Write a program that will split the genomic DNA into coding and non-coding parts, and write these sequences to two separate files.
seq = 'ATCGATCGATCGATCGACTGACTAGTCATAGCTATGCAT'
seq += 'GTAGCTACTCGATCGATCGATCGATCGATCGATCGATCG'
seq += 'ATCGATCATGCTATCATCGATCGATATCGATGCATCGAC'
seq += 'TACTAT'
exon1 = seq[0:63]
exon2 = seq[90:]
intron = seq[63:90]
with open('coding.txt', 'w') as coding:
    coding.write(exon1.upper() + exon2.upper() + '\n')
with open('non-coding.txt', 'w') as non_coding:
    non_coding.write(intron.lower() + '\n')
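Python slices are 0-based and exclude the end index, so the sixty-third character has index 62 and the ninety-first has index 90. A quick way to check slice boundaries is to confirm that the pieces join back into the original sequence with nothing lost or duplicated. This is a small sanity-check sketch, not part of the exercise solution.

```python
seq = ('ATCGATCGATCGATCGACTGACTAGTCATAGCTATGCAT'
       'GTAGCTACTCGATCGATCGATCGATCGATCGATCGATCG'
       'ATCGATCATGCTATCATCGATCGATATCGATGCATCGAC'
       'TACTAT')

exon1 = seq[:63]      # characters 1 to 63 (indices 0 to 62)
intron = seq[63:90]   # characters 64 to 90 (indices 63 to 89)
exon2 = seq[90:]      # character 91 to the end (index 90 onward)

# Adjacent slices share a boundary index, so they must rebuild the sequence
assert exon1 + intron + exon2 == seq
print(len(exon1), len(intron), len(exon2))
```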
The file input.txt contains a number of DNA sequences, one per line. Each sequence starts with the same 14 base pair fragment - a sequencing adapter that should have been removed. Write a program that will (a) trim this adapter and write the cleaned sequences to a new file and (b) print the length of each sequence to the screen.
with open('input.txt') as fin, open('output.txt', 'w') as fout:
    for line in fin:
        # Drop the trailing newline, then strip out the 14 bp adapter
        dna = line.strip()[14:]
        # Write the cleaned sequence to file
        fout.write(dna + '\n')
        # Print the sequence length to screen
        print(len(dna))
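Lines read from a file keep their trailing newline, so slicing a line without removing it makes every reported length one character too long. The sketch below shows the difference on made-up reads; the adapter and the sequences are invented for illustration and are not the contents of input.txt.

```python
adapter = "ATTCGATTATAAGC"  # made-up 14 bp adapter
lines = [adapter + "TCGATCGA\n", adapter + "AT\n"]  # made-up reads

lengths = []
for line in lines:
    with_newline = line[len(adapter):]            # still ends in "\n"
    cleaned = line.rstrip("\n")[len(adapter):]    # newline removed first
    lengths.append((len(with_newline), len(cleaned)))

print(lengths)  # [(9, 8), (3, 2)]
```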
The file genomic_dna.txt contains a section of genomic DNA, and the file exons.txt contains a list of start/stop positions of exons. Each exon is on a separate line and the start and stop positions are separated by a comma. Write a program that will extract the exon segments, concatenate them, and write them to a new file.
Note: You can assume the start/stop positions are 0-based.
dna = ''
with open('genomic_dna.txt') as fin:
    for line in fin:
        dna += line.strip()
exons = ''
with open('exons.txt') as fin:
    for line in fin:
        start, stop = line.strip().split(',')
        exons += dna[int(start):int(stop)]
with open('output.txt', 'w') as fout:
    fout.write(exons + '\n')
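It can help to try the same logic on small in-memory data before running it on files. This sketch uses a made-up sequence and made-up coordinates, and assumes the stop position is exclusive, as in Python slicing; the exercise note only states that positions are 0-based.

```python
dna = "AAACCCGGGTTT"          # made-up genomic sequence
exon_lines = ["0,3", "6,9"]   # made-up start,stop pairs, one per line

segments = []
for line in exon_lines:
    start, stop = line.strip().split(",")
    # The slice includes index start and excludes index stop
    segments.append(dna[int(start):int(stop)])

coding = "".join(segments)
print(coding)  # AAAGGG
```

Collecting the segments in a list and joining once is a common alternative to repeated string concatenation; the result is the same.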
Write a function that takes two arguments - a protein sequence and an amino acid residue code - and returns the percentage of the protein that the amino acid makes up. Use the following assertions to test your function:
assert my_function("MSRSLLLRFLLFLLLLPPLP", "M") == 5
assert my_function("MSRSLLLRFLLFLLLLPPLP", "r") == 10
assert my_function("MSRSLLLRFLLFLLLLPPLP", "L") == 50
assert my_function("MSRSLLLRFLLFLLLLPPLP", "Y") == 0
def amino_acid_pct(seq, residue):
    pct = seq.upper().count(residue.upper()) / len(seq)
    return round(pct * 100)
assert amino_acid_pct("MSRSLLLRFLLFLLLLPPLP", "M") == 5
assert amino_acid_pct("MSRSLLLRFLLFLLLLPPLP", "r") == 10
assert amino_acid_pct("MSRSLLLRFLLFLLLLPPLP", "L") == 50
assert amino_acid_pct("MSRSLLLRFLLFLLLLPPLP", "Y") == 0
Modify the function from part one so that it accepts a list of amino acid residues rather than a single one. If no list is given, the function should return the percentage of hydrophobic amino acid residues (A, I, L, M, F, W, Y and V). Your function should pass the following assertions:
assert my_function("MSRSLLLRFLLFLLLLPPLP", ["M"]) == 5
assert my_function("MSRSLLLRFLLFLLLLPPLP", ['M', 'L']) == 55
assert my_function("MSRSLLLRFLLFLLLLPPLP", ['F', 'S', 'L']) == 70
assert my_function("MSRSLLLRFLLFLLLLPPLP") == 65
def amino_acid_pct(seq, residue=None):
    if residue is None:
        residue = ['A', 'I', 'L', 'M', 'F', 'W', 'Y', 'V']
    count = 0
    for r in residue:
        count += seq.upper().count(r.upper())
    return round((count / len(seq)) * 100)
assert amino_acid_pct("MSRSLLLRFLLFLLLLPPLP", ["M"]) == 5
assert amino_acid_pct("MSRSLLLRFLLFLLLLPPLP", ['M', 'L']) == 55
assert amino_acid_pct("MSRSLLLRFLLFLLLLPPLP", ['F', 'S', 'L']) == 70
assert amino_acid_pct("MSRSLLLRFLLFLLLLPPLP") == 65
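Counting each residue separately scans the sequence once per residue. An alternative is a single pass over the sequence with a set membership test. This is a sketch of that variant, not the reference solution; the name amino_acid_pct_single_pass and the HYDROPHOBIC constant are hypothetical. Note one behavioral difference: a set ignores duplicate entries in the residue list, whereas the loop version would count them twice.

```python
HYDROPHOBIC = ('A', 'I', 'L', 'M', 'F', 'W', 'Y', 'V')


def amino_acid_pct_single_pass(seq, residues=None):
    # Fall back to the hydrophobic residues when no list is given
    chosen = HYDROPHOBIC if residues is None else residues
    # A set removes duplicates and gives fast membership tests
    wanted = {r.upper() for r in chosen}
    # Walk the sequence once, counting residues that are in the set
    matches = sum(1 for aa in seq.upper() if aa in wanted)
    return round(matches / len(seq) * 100)


print(amino_acid_pct_single_pass("MSRSLLLRFLLFLLLLPPLP", ['F', 'S', 'L']))  # 70
```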
import sys
hw3due = "2021-03-02 20:00:00"
print(f"Homework 3 due: {hw3due}")
print("Goodbye!")
sys.exit(1)