Download this datafile, which contains results for the last presidential election (at the county level).
Launch sqlite in a terminal and open/create a database via
sqlite3 pres.db
Load the .csv via the following
sqlite> .mode csv
sqlite> .separator ,
sqlite> .import 2016_US_County_Level_Presidential_Results.csv pres
Enter the following commands in order, noting their effects.
sqlite> SELECT * FROM pres LIMIT 10;
sqlite> .mode column
sqlite> .header on
sqlite> SELECT * FROM pres LIMIT 10;
Try out the following commands, and comment on what you learn.
sqlite> SELECT * from pres WHERE state_abbr = NY;
sqlite> SELECT * from pres WHERE state_abbr = "NY";
sqlite> SELECT votes_dem from pres WHERE state_abbr = "NY";
sqlite> SELECT votes_dem,combined_fips from pres WHERE state_abbr = "NY";
You can use 'GROUP BY' to implement partial aggregations:
SELECT a,b,c,AGG(d) FROM table GROUP BY a,b;
For example, compare the following 2 queries:
sqlite> SELECT state_abbr,MAX(per_dem) from pres;
sqlite> SELECT state_abbr,MAX(per_dem) from pres GROUP BY state_abbr;
Which query above gives the maximum value of per_dem across all counties in the United States, and which gives the maximum of per_dem for each state?
You can create views, which essentially save your search queries as new tables. That is, given a huge database, you can use it to construct smaller tables (i.e., views) to help you search through the database and to help construct more complicated queries.
Consider the following:
sqlite> CREATE VIEW state_view AS SELECT state_abbr,SUM(votes_dem),SUM(votes_gop),SUM(total_votes) from pres GROUP BY state_abbr; sqlite> SELECT * FROM state_view;
Note that you can now access the table state_view as if it was a table in the database. To delete such a view, you can enter
DROP VIEW IF EXISTS state_view;
Now, quit sqlite by entering
.quit
Repoen the database using
sqlite3 pres.db
and enter the following commands:
sqlite> .separator ,
sqlite> .header on
sqlite> .output votes_and_fips.csv
sqlite> SELECT votes_dem,combined_fips from pres;
sqlite> .quit
What happened?
Reopen sqlite and try the following:
sqlite> SELECT COUNT(votes_dem) from pres where state_abbr = "NY";
1. Use COUNT to identify how many counties are in California (CA)
2. Use SUM to calulate the total number of votes in California.
3. Use MAX to calulate the maximum of 'per_gop' across counties in California.
4. Use the sum and group to create a file state_votes.csv that contains 2 columns: the sum of total_votes across a state, and the state abbreviations.