CLI Arguments, File I/O, Modules

BCH 519
Spring 2019

Andrew E. Bruno
aebruno2@buffalo.edu

Topics Covered

  • Command line arguments
  • File I/O
  • Modules, PyPI
  • Debugging

Command Line Arguments

Passing arguments to python script

  • Your python script is run from the command line
  • Recall the basic form of a unix command:
    • command options arguments
  • You can pass “arguments” to your python script just like any other unix command
  • Allows you to pass input or “data” into your programs
  • Example:
python myscript.py arg1 arg2 arg3 ..

The special list: sys.argv

  • Python sys module has a number of special pre-defined variables
  • sys.argv is a special list containing the command line arguments to your script
  • For more info:

https://docs.python.org/3/library/sys.html
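
To make the index convention concrete, here is a minimal sketch. The helper name split_argv is hypothetical (it is not part of the sys module), and the list passed to it simulates what sys.argv would hold for the example command above.

```python
def split_argv(argv):
    """Split an argv-style list into (script_name, arguments)."""
    # Index 0 is the script name; everything after it is an argument.
    return argv[0], argv[1:]

# Simulates: python myscript.py arg1 arg2 arg3
script, args = split_argv(["myscript.py", "arg1", "arg2", "arg3"])
print(script)
print(args)
print(len(args))

# In a real script you would `import sys` and call split_argv(sys.argv).
```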

Exercise 1: echo in python

# sys.argv is a list of command line arguments passed to 
# your script. Here, we simply print or "echo" the first 
# argument that was passed to our script.
#
# Note: sys.argv[0] is the name of the script. 
# The first argument starts at: sys.argv[1]
import sys

print(sys.argv[1])

run in the terminal:

$ python ex1.py Hello
Hello
$ python ex1.py Hello World
Hello
$ python ex1.py "Hello World"
Hello World

Exercise 1: What we learned

  • sys.argv is a list of command line arguments passed to your script
  • The arguments start at index 1 (sys.argv[1]). Index 0 is the name of the script
  • What happens when you run ex1.py with no arguments? How can you fix this?

Exercise 2: add two numbers

import sys

# Good to always validate any input to your program
if len(sys.argv) != 3:
    print("Usage: python ex2.py [num1] [num2]")
    sys.exit(1)

# Command line arguments are always strings, so convert before adding
num1 = sys.argv[1]
num2 = sys.argv[2]
total = int(num1) + int(num2)

print("Sum: {}".format(total))

run in the terminal:

$ python ex2.py 2 15
Sum: 17

Exercise 2: What we learned

  • Good practice to always validate any input to your programs
  • sys.exit function exits program immediately
  • What happens when you run ex2.py with the following input? How can you fix this?
$ python ex2.py bob bill
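
One possible answer to the question above, sketched under the assumption that we want to report bad input rather than crash: int() raises ValueError for text such as "bob", so the conversion can be wrapped in try/except. The function name add_args is hypothetical.

```python
def add_args(args):
    """Return the sum of two integer strings, or None if either is not an integer."""
    try:
        return int(args[0]) + int(args[1])
    except ValueError:
        # int("bob") raises ValueError; signal failure instead of crashing
        return None

good = add_args(["2", "15"])
bad = add_args(["bob", "bill"])
print(good)
print(bad)
```

In a script, a None result could be followed by a usage message and sys.exit(1), as in Exercise 2.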

File I/O

Files

  • Data is typically stored in files
  • We can write programs to process and manipulate data stored in files
  • Three basic file operations: read, write, append

File Objects

  • File objects contain functions for working with files
  • Special file objects: sys.stdout and sys.stdin
  • More details:

https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files

sys.stdout = Standard Output

  • We’ve actually been working with this file object all along
  • Standard output refers to the output of a program (what gets printed to the screen)
  • Python has a file object called sys.stdout
  • The print function “writes” to a file object, by default sys.stdout
import sys

print("Hello World")

# Is the same as

print("Hello World", file=sys.stdout)
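
Because print only needs a file object, the same call can write somewhere other than the screen. A small sketch: io.StringIO is used here as an in-memory stand-in for a real file, and the variable names are illustrative.

```python
import io
import sys

# print() writes to sys.stdout unless told otherwise
print("to the screen")

# The file= argument redirects the same call to any file-like object
buffer = io.StringIO()
print("captured", file=buffer)

# sys.stdout.write is the lower-level call; it does not add a newline itself
sys.stdout.write("also to the screen\n")

captured_text = buffer.getvalue()
```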

Opening and closing files

  • File objects are created in a “mode” using open function:
    • 'r' = read
    • 'w' = write
    • 'a' = append
  • When you're done, close your file using file.close():
# Open file for reading
fin = open('input.txt', 'r')

# Open a different file for writing ('w' truncates an existing file)
fout = open('output.txt', 'w')

fin.close()
fout.close()

Writing to Files

  • Open a file object in write mode: 'w'
  • Use the file.write function to write to the file object
fout = open('output.txt', 'w')
fout.write("Hello World\n")
fout.close()
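
Append mode ('a') works the same way as write mode, except that existing content is kept and new text goes at the end. A minimal sketch, assuming a scratch file in a temporary directory so no real data is touched (the path and the text written are made up for illustration):

```python
import os
import tempfile

# Hypothetical scratch file
path = os.path.join(tempfile.mkdtemp(), "output.txt")

fout = open(path, "w")       # 'w' creates the file, or empties it if it exists
fout.write("first line\n")
fout.close()

fout = open(path, "a")       # 'a' keeps what is there and adds to the end
fout.write("second line\n")
fout.close()

fin = open(path, "r")
contents = fin.read()
fin.close()

print(contents, end="")
```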

Reading Files

  • Open a file object in read mode: 'r'
  • Good practice to use the with keyword, which properly closes the file for you
  • File objects in python are iterators

Reading Files Example Code

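with open('input.txt', 'r') as fin:
    # Each call below reads from the current position in the file,
    # so in practice pick the one approach that fits your task.

    # Read entire file into a single string
    contents = fin.read()

    # Read all remaining lines into a list
    lines = fin.readlines()

    # Read a single line
    line = fin.readline()

    # Loop through the lines of the file
    for line in fin:
        # Each line keeps its own newline, so end='' avoids printing a second one
        print(line, end='')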
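
A file object remembers how far it has been read, so the read methods consume the file rather than restarting from the top. The sketch below illustrates this with a hypothetical three-line scratch file:

```python
import os
import tempfile

# Hypothetical scratch file with three lines
path = os.path.join(tempfile.mkdtemp(), "input.txt")
with open(path, "w") as fout:
    fout.write("line 1\nline 2\nline 3\n")

with open(path, "r") as fin:
    first = fin.readline()    # reads only the first line
    rest = fin.readlines()    # reads the lines that are left
    leftover = fin.read()     # nothing remains, so this is an empty string

print(repr(first))
print(rest)
print(repr(leftover))
```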

sys.stdin = Standard Input

  • Standard input refers to the data going into a program (data provided as input)
  • Python uses a file object called sys.stdin

    import sys
    
    # Read line from STDIN
    line = sys.stdin.readline()
    print(line, end='')
  • Example program execution, ‘|’ pipe will send the output of the echo command as input into our python script:

    $ echo "Hello World" | python test.py
    Hello World
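
Since sys.stdin is an ordinary file object, it can also be looped over line by line. A rough sketch: the function name count_lines is hypothetical, and io.StringIO stands in for piped input so the example runs without a pipe.

```python
import io

def count_lines(stream):
    """Count the lines in any file object, including sys.stdin."""
    total = 0
    for _ in stream:
        total += 1
    return total

# io.StringIO simulates three lines arriving on standard input
n = count_lines(io.StringIO("a\nb\nc\n"))
print(n)

# In a script: `import sys`, then print(count_lines(sys.stdin))
```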

Typical I/O Scenario

  • Get command line arguments
  • Open files for reading and/or writing
  • Read data and process
  • Write output
  • Close files
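
The steps above can be sketched in one small program. This is only an illustration: the script name upper.py, the function names, and the choice of upper-casing each line as the "processing" step are all assumptions, and the demo at the bottom uses scratch files in a temporary directory.

```python
import os
import tempfile

def copy_upper(in_path, out_path):
    """Read in_path line by line and write an upper-cased copy to out_path."""
    # with closes both files when the block ends
    with open(in_path, "r") as fin, open(out_path, "w") as fout:
        for line in fin:                 # read data and process
            fout.write(line.upper())     # write output

def main(argv):
    # Get and validate the command line arguments
    if len(argv) != 3:
        print("Usage: python upper.py [infile] [outfile]")
        return 1
    copy_upper(argv[1], argv[2])
    return 0

# Demo with hypothetical scratch files
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "in.txt")
dst = os.path.join(workdir, "out.txt")
with open(src, "w") as f:
    f.write("acgt\n")

status = main(["upper.py", src, dst])
with open(dst, "r") as f:
    result = f.read()
print(status, result, end="")
```

In a real script the last lines would be replaced by sys.exit(main(sys.argv)), after importing sys.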

Exercise 3: cat in python

import sys

if len(sys.argv) != 2:
    print("Usage: python ex3.py [filename]")
    sys.exit(1)

filename = sys.argv[1]
with open(filename, 'r') as fin:
    for line in fin:
        print(line, end='')

run in the terminal:

$ echo "Hello World" > test-file.txt
$ cat test-file.txt
Hello World
$ python ex3.py test-file.txt
Hello World

Exercise 4: grep in python

import sys

if len(sys.argv) != 3:
    print("Usage: python ex4.py [pattern] [filename]")
    sys.exit(1)

pattern = sys.argv[1]
filename = sys.argv[2]
with open(filename, 'r') as fin:
    for line in fin:
        if pattern in line:
            print(line.strip())

Exercise 4: grep in python output

Run in the terminal:

$ python ex4.py NM_001080142 refseq-genes.txt 
187     NM_001080142    chrX    -       120011344
1501    NM_001080142    chrX    -       120106600
1501    NM_001080142    chrX    -       120101740
1501    NM_001080142    chrX    -       120111460
1501    NM_001080142    chrX    -       120067694
1501    NM_001080142    chrX    -       120072555
1501    NM_001080142    chrX    -       120077415
1501    NM_001080142    chrX    -       120082276
1501    NM_001080142    chrX    -       120096880

# Test using refseq-genes.txt from HW1
$ python ex4.py NM_001080142 refseq-genes.txt | wc -l
9
$ grep NM_001080142 refseq-genes.txt | wc -l
9

Exercise 3 & 4: What we learned

  • sys.argv contains the command line arguments
  • with closes the open file object automatically
  • print(line, end='') will omit the newline that print normally adds
  • str.strip() will remove leading and trailing whitespace
  • We can re-write linux commands in Python!!

Modules

What is a Module?

  • A set of related functions in a “library” file
  • Designed to be reusable by other modules or programs
  • Code reuse! Repeating the same code is bad

Simple Python module example

File: mathfunc.py

greeting = 'Hello World'
salutation = 'Goodbye'

# The parameter is named exp so it does not shadow the built-in pow()
def power(num, exp):
    return num ** exp

How to use Python modules?

File: square.py

import mathfunc

sq = mathfunc.power(8, 2)
print(mathfunc.greeting)
print("8 squared = {}".format(sq))
print(mathfunc.salutation)
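
The import statement has a few common forms. The sketch below uses the standard library math module instead of mathfunc so that it runs without any extra files; the same forms would apply to a module of your own.

```python
import math             # access names as math.sqrt
from math import sqrt   # import a single name directly
import math as m        # import the module under a shorter alias

a = math.sqrt(16)
b = sqrt(16)
c = m.sqrt(16)
print(a, b, c)
```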

PyPI

  • The Python Package Index
  • A.K.A “the cheese shop”
  • Contains over 40K Python modules
  • https://pypi.org/

Example: Biopython

  • https://biopython.org
  • Set of freely available tools for biological computation written in Python by an international team of developers
  • Collection of many Python modules and related documentation

Parsing FASTA files

  • Sequencing data is often stored in a simple text based format called FASTA
  • Begins with a single line description, followed by lines of sequencing data
  • Instead of re-inventing the wheel and writing our own FASTA parser, let’s use a module from Biopython
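
To make the format itself concrete, here is a hand-rolled sketch that reads a tiny FASTA text. It is for illustration only: the two records are made up, the function name read_fasta is hypothetical, and for real data the Biopython parser remains the better choice. In FASTA, each description line starts with '>'.

```python
import io

# Hypothetical two-record FASTA text, made up for illustration
fasta_text = """>seq1 example description
ACGTACGT
ACGT
>seq2
TTGACA
"""

def read_fasta(stream):
    """Yield (description, sequence) pairs from a FASTA-formatted file object."""
    header, chunks = None, []
    for line in stream:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new description line ends the previous record
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            # Sequence data may span several lines; collect them all
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

records = list(read_fasta(io.StringIO(fasta_text)))
for header, seq in records:
    print(header, len(seq))
```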

Exercise 5: Parsing FASTA files with Biopython

from Bio import SeqIO

for rec in SeqIO.parse("mirna-targets.fasta", "fasta"):
    print(rec.id)
    print("{} {}".format(len(rec), rec.seq))

Run in the terminal (on vortex):

$ module load biopython
$ python ex5.py
hg19_targetScanS_SAMD11:miR-504
8 CAGGGTCA
...

Debugging - pprint

  • Excellent python module for debugging your code
  • “Data pretty printer”
  • Allows you to “dump” the contents of a variable
  • Example: dump the value of an entire dictionary

Exercise 6: pprint

import pprint

data = [
    {
        'id': 124,
        'name': 'miR-245',
        'chrom': 'chr10'
    },
    {
        'id': 234,
        'name': 'miR-201',
        'chrom': 'chr11'
    },
]

pprint.pprint(data)
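
The pprint module also provides pformat, which returns the formatted text as a string instead of printing it. A small sketch with a single made-up record; the width argument controls how wide a line may get before pprint wraps it.

```python
import pprint

record = {'name': 'miR-245', 'id': 124, 'chrom': 'chr10'}

# pformat returns the formatted text; dictionary keys are shown in sorted order
text = pprint.pformat(record)
print(text)

# A small width forces the dictionary onto several lines
narrow = pprint.pformat(record, width=20)
print(narrow)
```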

Homework #3

Due: 2019-02-26 09:00:00