Exercise 1 recap


Finally, we can wrap everything up in a function that can retrieve the price of any product:

In [2]:
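import requests

def getprice(pid):
    """Return the listed price (in dollars) of the Amazon product with ID pid."""
    # Identify as a desktop browser via the User-Agent header.
    ua = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36'
    url = 'https://www.amazon.com/dp/' + pid
    s = requests.get(url, headers={'User-Agent': ua})
    # The price sits between this opening tag (including the '$') and the next </span>.
    pattern = '<span id="priceblock_ourprice" class="a-size-medium a-color-price">$'
    price = float(s.text.split(pattern)[-1].split('</span>')[0])
    return price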

getprice('B0027YPQEC')
Out[2]:
19.59

Exercise 2: More play with text


Download this list of English words: http://www.acsu.buffalo.edu/~danet/Sp18/MTH448/class1/class1_files/words.txt

Exercise 2a: Sort words by right-to-left alphabetical order

  • Slicing
  • Sorting
    • sorted(x), sorted(x, reverse=True), sorted(x, key=f)
  • string replace
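
The tools listed above can be tried on a handful of words before touching the full file. The following is only a minimal sketch: the four-word list is made up for illustration, and it assumes that "right-to-left alphabetical order" means comparing words by their reversed spelling.

```python
# Made-up sample; the real exercise would use the contents of words.txt.
words = ["banana", "apple", "cherry", "date"]

# Slicing with step -1 reverses a string.
print("apple"[::-1])                      # elppa

# sorted() returns a new list; key= is applied to each item before comparing,
# so the words are ordered by their reversed spelling.
by_ending = sorted(words, key=lambda w: w[::-1])
print(by_ending)                          # ['banana', 'apple', 'date', 'cherry']

# reverse=True flips the order of the result.
descending = sorted(words, reverse=True)
print(descending)                         # ['date', 'cherry', 'banana', 'apple']

# str.replace returns a copy with every occurrence substituted,
# e.g. to drop a trailing newline left over from reading a file.
cleaned = "apple\n".replace("\n", "")
print(cleaned)                            # apple
```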

Exercise 2b: List all the palindromes
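
One possible way to test for a palindrome is to compare a word with its own reversal. This is a sketch on a made-up sample list; `is_palindrome` is a hypothetical helper name, not something defined earlier in the notes.

```python
def is_palindrome(word):
    """Return True if word reads the same forwards and backwards."""
    return word == word[::-1]

# Made-up sample; the real exercise would use the contents of words.txt.
sample = ["level", "python", "noon", "data", "radar"]

palindromes = [w for w in sample if is_palindrome(w)]
print(palindromes)   # ['level', 'noon', 'radar']
```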

Exercise 2c: List all the reversible words

Note that set membership can be tested much faster than list membership: a set looks items up by hash, while a list has to be scanned item by item.
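
As a rough illustration of why this matters here, the sketch below builds a set once and then checks every reversed word against it. The sample list is made up, and treating palindromes as not reversible is an assumption about what the exercise intends.

```python
# Made-up sample; the real exercise would use the contents of words.txt.
sample = ["stressed", "desserts", "level", "python", "drawer", "reward", "noon"]

# Build the set once; each `in word_set` test is then a hash lookup
# instead of a scan through the whole list.
word_set = set(sample)

# A word counts as reversible here if its reversal is a different word in the set.
reversible = sorted(
    w for w in sample
    if w[::-1] in word_set and w != w[::-1]
)
print(reversible)   # ['desserts', 'drawer', 'reward', 'stressed']
```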

Exercise 3: Analyze the first Presidential debate between Hillary Clinton and Donald Trump


The file is at

http://www.acsu.buffalo.edu/~danet/Sp18/MTH448/class2/class2_files/political_transcript.txt

Some questions to answer:

  • How much did each of them speak?
  • How big is the vocabulary of each?
  • Which words did each use most frequently?

Note: You may want to remove punctuation first, so that "jobs," and "jobs" count as the same word.
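
The three questions above can be approached with a word list per speaker. The sketch below is one possible outline, not a solution for the actual file: the speakers ALICE and BOB and their lines are invented, and the "NAME: text" layout is an assumption that should be checked against the real transcript.

```python
import string
from collections import Counter, defaultdict

# Invented stand-in text; the "NAME: text" layout is an assumption.
transcript = """\
ALICE: Thank you. Jobs, jobs, and more jobs.
BOB: Thank you, too. Taxes are too high.
ALICE: We need new jobs.
"""

# Translation table that deletes every ASCII punctuation character.
strip_punct = str.maketrans("", "", string.punctuation)

words_by_speaker = defaultdict(list)
for line in transcript.splitlines():
    speaker, sep, speech = line.partition(": ")
    if not sep:
        continue  # skip lines that do not match the assumed layout
    cleaned = speech.translate(strip_punct).lower()
    words_by_speaker[speaker].extend(cleaned.split())

for speaker, words in words_by_speaker.items():
    counts = Counter(words)
    # total words spoken, vocabulary size, most frequent words
    print(speaker, len(words), len(counts), counts.most_common(3))
```

Lowercasing before counting means "Thank" and "thank" are treated as one word; whether that is desirable depends on the question being asked.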

Report 1 data set: History of first names in the US


  • Consider the National dataset for names given to babies in the US from 1890 through 2015, which is available at the US Social Security Administration page:

https://www.ssa.gov/oact/babynames/limits.html
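
As a starting point for exploring the data, here is a minimal sketch of reading one year's records. It assumes each yearly file holds comma-separated lines of the form name,sex,count; the rows below are made up and are not real counts, so the layout should be confirmed against the downloaded files.

```python
import csv
import io
from collections import defaultdict

# Made-up rows standing in for one yearly file; these are NOT real counts.
sample_file = io.StringIO("Mary,F,120\nJohn,M,100\nAnna,F,30\nJohn,F,2\n")

# Total births per name, combining both sexes.
totals = defaultdict(int)
for name, sex, count in csv.reader(sample_file):
    totals[name] += int(count)

# Names ordered from most to least common.
top = sorted(totals.items(), key=lambda item: item[1], reverse=True)
print(top)   # [('Mary', 120), ('John', 102), ('Anna', 30)]
```

With a real file, `io.StringIO(...)` would be replaced by `open(path, newline="")`, where `path` is wherever the downloaded file was saved.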

Homework (due Saturday, Feb 10 by midnight) - Report 1 will entail extracting something interesting from this data.

Reminder: Semester-long data-collection project


Think about what you'd like to do, and let me know next week.