Download this file, which is an sql database for movies. It contains information about the the movies as well as reviews for the movies.
Make sure to save the file to an appropriate folder: perhaps 'class14_files'
First, take a quick look at this documentation on using sqlite3 in python.
Next, create an empty database called movie_db.
import sqlite3
# connect to a database (or create it, if it doesn't already exist)
con = sqlite3.connect('movie_db.db')
#generate a 'live' cursor that can execute commands as if you were entering them in the sqlite terminal
cur = con.cursor()
If you just created an empty database, you can load an .sql datafile into it by reading and executing the string contained in the file in sqlite3.
First, you can load the .sql file into a string.
#run the following only once for your .db to load the .sql data into it
with open("movie_db.sql") as f:
sql_data = f.read()
print(sql_data[0:400],10)
Next, you can execute the string in sqlite, which parses the .sql file into a database. Note that the .sql file essentially just contains code for sql. In other words, you could enter the lines in the .sql file, line by line, into the sqlite> prompt, and it would create alse create the correct database. Theres's no need to do that here. We can exucute all the commands from the .sql file with one command:
cur.executescript(sql_data);
Now we can create and run queries using an easier interface.
cur.execute("SELECT name FROM sqlite_master WHERE type='table';")
What happened? You need to handle the output to view it. Use fetchall()
cur.execute("SELECT name FROM sqlite_master WHERE type='table';")
cur.fetchall()
As you can see. There are 3 tables. Let's use pandas to make the viewing of these tables much nicer!
import pandas
pandas.set_option('display.max_rows',6)
def sql_request(q):
cur.execute(q)
return pandas.DataFrame(cur.fetchall(),columns=[d[0] for d in cur.description]) # just using Pandas for nice tabular display of results, nothing more
Now we can just sql_request() to make request and have a nice display
sql_request("SELECT name FROM sqlite_master WHERE type='table';")
sql_request("SELECT * FROM Movie;")
sql_request("SELECT * FROM Reviewer;")
sql_request("SELECT * FROM Rating;")
Now let's consider a more difficult query:
query = "select * from movie,rating,reviewer on movie.movieID=rating.movieID;"
sql_request(query)
Note that the rater ids are not lined up. Lets now force this:
query = "select * from movie,rating,reviewer on movie.movieid=rating.movieid and rating.raterID=reviewer.raterID"
sql_request(query)
We can sort these results by year
query = "select * from movie,rating,reviewer on movie.movieid=rating.movieid and rating.raterID=reviewer.raterID order by year"
sql_request(query)
We can sort these results by descending year
query = "select * from movie,rating,reviewer on movie.movieid=rating.movieid and rating.raterID=reviewer.raterID order by year desc"
sql_request(query)
We can sort the results by year descending, and then by title:
query = "select * from movie,rating,reviewer on movie.movieid=rating.movieid and rating.raterid=reviewer.raterid order by year desc, title"
sql_request(query)
1. Write a query that returns movies released after 2015.
**Hint, you should find 6 of them.
query1 = .....
sql_request(query1)
2. Write a query that returns information about the movies and ratings in which the rating was 5 stars.
**Hint 1: You should find that there are 28 such ratings.
**Hint 2: Make sure the movie ids match.
query2 = .....
sql_request(query2)
3. Write a query that returns all movies related to Harry Potter, and order the results by title.
*Hint: You should find 8 of them.
query3 = ....
sql_request(query3)
4. Write a query that returns all movies related to Harry Potter, and order the results by year (in descending order).
query4 = ....
sql_request(query4)
5. Write a query that returns for only the Harry Potter movies, their (a) movie title, (b) rating in stars, and (c) reviewer name.
*Hint 1: The raterIDs and movieIDs should all match.
**Hint 2: Your request should return only 2 reviews.
***Hint 3: I recommend using ON and placing all constraints in parantheses ().
query5 = ...
sql_request(query5)