Nested Queries Wrapped in Python

Download this datafile, which we studied in class 13.

By wrapping the SQL in Python, we can easily construct nested queries. This allows us to construct queries that are as complicated as we like.

In [6]:
import pandas
pandas.set_option('display.max_rows',30)
def sql_request(q):
    cur.execute(q)
    return pandas.DataFrame(cur.fetchall(),columns=[d[0] for d in cur.description])  # just using Pandas for nice tabular display of results, nothing more
In [ ]:
import sqlite3
con = sqlite3.connect('pres2.db')# create database pres2.db (or connect to it)
cur = con.cursor()#generate a 'live' cursor 

df = pandas.read_csv("2016_US_County_Level_Presidential_Results.csv")
df.to_sql("pres", con, if_exists='append', index=False)
In [15]:
#sql_request("SELECT * FROM pres")

Lets aggregate all the data to be at the state level, and compute the fraction democrat and fraction republican

In [9]:
q1 = 'SELECT state_abbr,SUM(votes_dem)/SUM(total_votes) as frac_dem,SUM(votes_gop)/SUM(total_votes) as frac_gop FROM pres GROUP BY state_abbr' 
print(q1+';')
sql_request(q1+';')
SELECT state_abbr,SUM(votes_dem)/SUM(total_votes) as frac_dem,SUM(votes_gop)/SUM(total_votes) as frac_gop FROM pres GROUP BY state_abbr;
Out[9]:
state_abbr frac_dem frac_gop
0 AK 0.377159 0.528870
1 AL 0.345538 0.628884
2 AR 0.341624 0.611487
3 AZ 0.453871 0.495031
4 CA 0.615885 0.327598
5 CO 0.472746 0.443593
6 CT 0.544755 0.411610
7 DC 0.928466 0.041221
8 DE 0.533550 0.419226
9 FL 0.477881 0.490640
10 GA 0.455955 0.513361
11 HI 0.622228 0.300391
12 IA 0.421802 0.517813
13 ID 0.275599 0.591657
14 IL 0.554027 0.394133
... ... ... ...
36 OK 0.289298 0.653272
37 OR 0.516778 0.410548
38 PA 0.476491 0.487921
39 RI 0.555188 0.398606
40 SC 0.407528 0.548641
41 SD 0.317370 0.615330
42 TN 0.348981 0.610700
43 TX 0.434428 0.525830
44 UT 0.278376 0.458991
45 VA 0.498557 0.450261
46 VT 0.611431 0.326180
47 WA 0.544475 0.381725
48 WI 0.470095 0.477882
49 WV 0.264685 0.686501
50 WY 0.224928 0.700517

51 rows × 3 columns

Now, let's use this query to ask a more refined query: Which states voted demicrat?

In [176]:
q2 = 'SELECT * FROM (' + q1+ ') WHERE frac_dem>frac_gop ORDER BY frac_dem desc'
print(q2+';')
sql_request(q2+';')
SELECT * FROM (SELECT state_abbr,SUM(votes_dem)/SUM(total_votes) as frac_dem,SUM(votes_gop)/SUM(total_votes) as frac_gop FROM pres GROUP BY state_abbr) WHERE frac_dem>frac_gop ORDER BY frac_dem desc;
Out[176]:
state_abbr frac_dem frac_gop
0 DC 0.928466 0.041221
1 HI 0.622228 0.300391
2 CA 0.615885 0.327598
3 VT 0.611431 0.326180
4 MA 0.607999 0.335157
5 MD 0.605345 0.353053
6 NY 0.588103 0.374752
7 RI 0.555188 0.398606
8 IL 0.554027 0.394133
9 NJ 0.550154 0.417839
10 CT 0.544755 0.411610
11 WA 0.544475 0.381725
12 DE 0.533550 0.419226
13 OR 0.516778 0.410548
14 VA 0.498557 0.450261
15 NM 0.486159 0.403351
16 NV 0.478858 0.455319
17 ME 0.478556 0.451538
18 NH 0.476413 0.472678
19 CO 0.472746 0.443593
20 MN 0.468617 0.453603

And now for the republican states

In [13]:
q2 = 'SELECT * FROM (' + q1+ ') WHERE frac_dem<=frac_gop ORDER BY frac_gop desc'
print(q2+';')
sql_request(q2+';')
SELECT * FROM (SELECT state_abbr,SUM(votes_dem)/SUM(total_votes) as frac_dem,SUM(votes_gop)/SUM(total_votes) as frac_gop FROM pres GROUP BY state_abbr) WHERE frac_dem<=frac_gop ORDER BY frac_gop desc;
Out[13]:
state_abbr frac_dem frac_gop
0 WY 0.224928 0.700517
1 WV 0.264685 0.686501
2 OK 0.289298 0.653272
3 ND 0.277552 0.641405
4 AL 0.345538 0.628884
5 KY 0.327140 0.625809
6 SD 0.317370 0.615330
7 AR 0.341624 0.611487
8 TN 0.348981 0.610700
9 NE 0.339927 0.603024
10 ID 0.275599 0.591657
11 MS 0.397254 0.583375
12 LA 0.384437 0.580947
13 KS 0.361584 0.571863
14 IN 0.379112 0.571713
15 MO 0.380127 0.571422
16 MT 0.360898 0.566863
17 SC 0.407528 0.548641
18 AK 0.377159 0.528870
19 TX 0.434428 0.525830
20 OH 0.435085 0.520522
21 IA 0.421802 0.517813
22 GA 0.455955 0.513361
23 NC 0.467024 0.505372
24 AZ 0.453871 0.495031
25 FL 0.477881 0.490640
26 PA 0.476491 0.487921
27 WI 0.470095 0.477882
28 MI 0.473436 0.475860
29 UT 0.278376 0.458991

Note that both queries used q1 to be nested in q2.

When using an aggregation such as SUM, AVG, MIN or MAX with this nesting, you must use an alias for whatever gets aggregated.

Such as: SELECT SUM(some_alias) FROM (SELECT column1 AS some_alias FROM my_table)

Exercise

Construct a nested query for the dataset.

Share your query with the class.