Relational databases and SQL

A \textbf{relation} (a.k.a. table) is a set (strictly a 'multiset' or 'bag') of tuples with elements drawn respectively from a list of specified sets. For example, if we have sets R, S, T, then any collection of tuples {(ri, si, ti)}, where each ri ∈ R, si ∈ S, and ti ∈ T, is a relation on those sets.

Concrete example: two sets

name = {'Adhish', 'Anna', 'Daniel', 'Edward', 'Maggie', 'Sakar' }

age = ℤ ≥ 0

Relation [('Adhish', 21), ('Anna', 20), ('Sakar',20), ('Maggie', 19)]

It's convenient to display these in a table, and in fact in this context, a relation is often called a "table".

In [1]:
import pandas
df = pandas.DataFrame([('Adhish', 21), ('Anna', 20), ('Sakar',20), ('Maggie', 19)],columns=['name','age'])
df
Out[1]:
name age
0 Adhish 21
1 Anna 20
2 Sakar 20
3 Maggie 19

Relational DataBase Management Systems (RDBMSs)

SQL: a "backronym" for "Simple Query Language" or "Structured Query Language" (the latter being actually a whopping misrepresentation).

sqlite3 and some basic SQL commands

SQL commands end with a semicolon (;)

In [ ]:
sqlite> CREATE table ... ;
sqlite> INSERT into ... ;
sqlite> SELECT * FROM ... ;

sql commands ArE nOT cASe sEnsiTIVE.

sqlite3 metacommands are. They always start with a period.

In [ ]:
sqlite> .tables
sqlite> .mode column
sqlite> .headers on
sqlite> .read
sqlite> .import
sqlite> .mode csv
sqlite> .separator

Google the above commands to see what they do.

To quit/exit sqlite3, enter ctrl-d or .quit

Exercise 1a : Tiny Database

Click here to download a small database. Navigate to the folder containing the file and open it in a text editor. Familiarize yourself with the data structure. How many tables does it contain?

All sqlite3 commands will be entered into a bash terminal. The terminal is native to Linux and OSX. For windows, you have a few options.

Using sqlite3

Open a terminal and launch sqlite3 database:

In [ ]:
'sqlite3 tinydb.db'

You should see something like:

In [ ]:
SQLite version 3.8.11.1 2015-07-29 20:00:57
Enter ".help" for usage hints.
sqlite>

If not, then you may not have sqlite3 installed. In my case, sqlite3 installed automatically when I installed anaconda3. You may need to download/install sqlite3 for your operating system.

Load/read the tiny database by entering

In [ ]:
sqlite> .read tinydb.sql

(Note that 'sqlite>' should already be there. It is your prompt)

Try out the following commands:

In [ ]:
sqlite> .schema person
sqlite> SELECT * FROM person;
sqlite> SELECT name FROM person;
sqlite> .schema question
sqlite> SELECT askerid FROM question;
sqlite> SELECT DISTINCT askerid FROM question;

What do the following commands do?

In [ ]:
sqlite> SELECT q FROM question ON askerid=11;
sqlite> SELECT * FROM person ON name='Andrew';
sqlite> SELECT MIN(id) AS smallestID FROM Person;
sqlite> SELECT MAX(askerid) AS whatever_name_I_choose FROM question;
sqlite> SELECT name FROM person WHERE id > 11;
sqlite> SELECT name FROM person WHERE id != 11;
sqlite> SELECT name FROM person WHERE name LIKE '%i%';
sqlite> SELECT name FROM person WHERE name LIKE 'A%' and;
sqlite> SELECT name FROM person WHERE name LIKE 'A%w';

Exercise 1b : Tiny Database

Study the commands on this page.

Identify commands that select \textit{questions} with the following constraints:

  1. contain the word 'When'
  2. contain both 'When' and 'show'
  3. start with a 'W'
  4. don't start with a 'W'
  5. was asked by person id 11
  6. was asked by a person with id >11
  7. was asked by a person with id between 11 and 13

Make sure you know how to use the following commmands:

In [ ]:
select
from
limit
where
= < >
order by
desc
like '%foo%'
as
+-/*
count
max
min
in

Exercise 1b: Joins

Also, teach yourself how to merge databases. See the above link as well as this link.

What do the following do?

In [ ]:
sqlite> SELECT name, q FROM person, question WHERE person.id = question.askerid;
sqlite> SELECT name, q FROM person AS p JOIN question as qq WHERE p.id = qq.askerid;
sqlite> SELECT name, q FROM person NATURAL JOIN question;

Why did that last line above not work? (Hint: Compare the id name for the 2 tables: person and question.)

How would you modify the table 'question' so that the tables can easily join? Try it.

How does the join below differ from the above joins?

In [ ]:
sqlite> SELECT name, q FROM person LEFT JOIN question ON person.id = question.id;

What does the following find?

In [ ]:
sqlite> CREATE TABLE my_new_table AS SELECT name, q FROM person LEFT JOIN question ON person.id = question.id;

sqlite> SELECT name FROM my_new_table AS mnt WHERE mnt.q IS NULL;

sqlite> SELECT q FROM my_new_table WHERE my_new_table.name = 'Aishani';

Exercise 2: SQL Joins

Answer the questions found on this page

Write your answers in a notepad or text editor.

Exercise 3: Presidential Election Results

Download this datafile, which contains results for the last presidential election (at the county level). Launch sqlite and open the .csv via

In [ ]:
sqlite3 pres.db

Explore the following

In [ ]:
sqlite> .mode csv
sqlite> .separator ,
sqlite> .import 2016_US_County_Level_Presidential_Results.csv pres
sqlite> SELECT * FROM pres LIMIT 10;
sqlite> .mode column
sqlite> .header on
sqlite> SELECT * FROM pres LIMIT 10;
select * from pres where state_abbr = NY;
select * from pres where state_abbr = "NY";