JSON Exercise: The Chevy Cobalt

We consider data acquired using the API for the National Highway Traffic Safety Administration

http://www.nhtsa.gov

From Wikipedia article on Chevrolet Cobalt: "Faulty ignition switches in the Cobalts, which cut power to the car while in motion, were eventually linked to many crashes resulting in fatalities, starting with a teenager in 2005 who drove her new Cobalt into a tree. The switch continued to be used in the manufacture of the vehicles even after the problem was known to GM. On February 21, 2014, GM recalled over 700,000 Cobalts for issues traceable to the defective ignition switches. In May 2014 the NHTSA fined the company $35 million for failing to recall cars with faulty ignition switches for a decade, despite knowing there was a problem with the switches. Thirteen deaths were linked to the faulty switches during the time the company failed to recall the cars."

Let's have a look at the complaints made about the Chevy Cobalt:

http://www.nhtsa.gov/webapi/api/Complaints/vehicle/modelyear/2005/make/chevrolet/model/cobalt?format=json

Goal: Identify if this problem was evident from the NHTSA complaint database long before the 2014 recall?

1. Let's first learn a bit about the data structure using JSON

In [43]:
import json
import requests

url = 'http://www.nhtsa.gov/webapi/api/Complaints/vehicle/modelyear/2005/make/chevrolet/model/cobalt?format=json'
s = requests.get(url).text
s[0:1000]
Out[43]:
'{"Count":1577,"Message":"Results returned successfully","Results":[{"ODINumber":10276900,"Manufacturer":"General Motors LLC","Crash":"No","Fire":"No","NumberOfInjured":0,"NumberOfDeaths":0,"DateofIncident":"\\/Date(1247198400000-0400)\\/","DateComplaintFiled":"\\/Date(1247630400000-0400)\\/","VIN":"1G1AK52F457","Component":"STEERING","Summary":"I CURRENTLY OWN A 2005 CHEVY COBALT, AND UP UNTIL NOW, I HAVE BEEN PRETTY SATISFIED.  HOWEVER, RECENTLY SINCE I HAVE HIT OVER 60,000 MILES, MY POWER STEERING LIGHT COMES ON AND THEN I LOSE CONTROL OF THE VEHICLE. THE WHEEL IS NEARLY IMPOSSIBLE TO TURN.  THIS HAS ALMOST CAUSED ME TO WRECK ON 4 DIFFERENT OCCASIONS.  WHEN I TURN THE CAR OFF AND THEN BACK ON, IT GOES AWAY FOR AWHILE, BUT THEN ALWAYS SEEMS TO LOCK BACK UP AGAIN.  YOU WOULD THINK THAT THE ONLY THING YOU WOULD NEED TO DO IS ADD POWER STEERING FLUID, BUT UNFORTUNATELY THIS CAR DOESN\'T HAVE THAT, IT IS ALL ELECTRICAL. I HAVE DONE SOME RESEARCH ONLINE AND HAVE NOTICED THAT I AM NOT THE ONLY O'
In [44]:
data = json.loads(s)
for dd in data:
    print(dd)   
Count
Message
Results
In [45]:
data['Count'] 
Out[45]:
1577
In [46]:
data['Message']
Out[46]:
'Results returned successfully'
In [42]:
print(len(data['Results']))
print(type(data['Results']))
1577
<class 'list'>
In [47]:
data['Results'][0]
Out[47]:
{'Component': 'STEERING',
 'Crash': 'No',
 'DateComplaintFiled': '/Date(1247630400000-0400)/',
 'DateofIncident': '/Date(1247198400000-0400)/',
 'Fire': 'No',
 'Make': 'CHEVROLET',
 'Manufacturer': 'General Motors LLC',
 'Model': 'COBALT',
 'ModelYear': '2005',
 'NumberOfDeaths': 0,
 'NumberOfInjured': 0,
 'ODINumber': 10276900,
 'ProductType': 'VEHICLE',
 'Summary': "I CURRENTLY OWN A 2005 CHEVY COBALT, AND UP UNTIL NOW, I HAVE BEEN PRETTY SATISFIED.  HOWEVER, RECENTLY SINCE I HAVE HIT OVER 60,000 MILES, MY POWER STEERING LIGHT COMES ON AND THEN I LOSE CONTROL OF THE VEHICLE. THE WHEEL IS NEARLY IMPOSSIBLE TO TURN.  THIS HAS ALMOST CAUSED ME TO WRECK ON 4 DIFFERENT OCCASIONS.  WHEN I TURN THE CAR OFF AND THEN BACK ON, IT GOES AWAY FOR AWHILE, BUT THEN ALWAYS SEEMS TO LOCK BACK UP AGAIN.  YOU WOULD THINK THAT THE ONLY THING YOU WOULD NEED TO DO IS ADD POWER STEERING FLUID, BUT UNFORTUNATELY THIS CAR DOESN'T HAVE THAT, IT IS ALL ELECTRICAL. I HAVE DONE SOME RESEARCH ONLINE AND HAVE NOTICED THAT I AM NOT THE ONLY ONE WITH THIS PROBLEM.  THERE ARE IN FACT HUNDREDS, IF NOT MORE WITH THE SAME PROBLEM OCCURRING.  THIS IS A VERY DANGEROUS AND COSTLY PROBLEM($700-$1500), AND OF COURSE I'M NOT COVERED UNDER THE WARRANTY ANYMORE, ALONG WITH ALMOST EVERYONE ELSE EXPERIENCING THIS PROBLEM.  I JUST DON'T UNDERSTAND HOW HEADLIGHTS CAN BE RECALLED, BUT SOMETHING AS SERIOUS AS YOUR STEERING NOT.  THIS IS A SERIOUS ISSUE THAT SHOULD BE ADDRESSED BEFORE SOMEONE LOSES THEIR LIFE OVER A FAULTY ELECTRIC POWER STEERING PROBLEM.  I HAVE CALLED A FEW DEALERSHIPS IN MY AREA AND EVERY SINGLE ONE OF THEM HAVE SAID THAT THERE HAVE BEEN MANY CASES EXACTLY LIKE MINE, IF THEY NOTICE THIS DEFECT, THEN WHY CAN'T THE PEOPLE WHO MAKE THE DECISION TO RECALL DO SOMETHING ABOUT IT. *TR",
 'VIN': '1G1AK52F457'}
In [48]:
data['Results'][0]['Summary']
Out[48]:
"I CURRENTLY OWN A 2005 CHEVY COBALT, AND UP UNTIL NOW, I HAVE BEEN PRETTY SATISFIED.  HOWEVER, RECENTLY SINCE I HAVE HIT OVER 60,000 MILES, MY POWER STEERING LIGHT COMES ON AND THEN I LOSE CONTROL OF THE VEHICLE. THE WHEEL IS NEARLY IMPOSSIBLE TO TURN.  THIS HAS ALMOST CAUSED ME TO WRECK ON 4 DIFFERENT OCCASIONS.  WHEN I TURN THE CAR OFF AND THEN BACK ON, IT GOES AWAY FOR AWHILE, BUT THEN ALWAYS SEEMS TO LOCK BACK UP AGAIN.  YOU WOULD THINK THAT THE ONLY THING YOU WOULD NEED TO DO IS ADD POWER STEERING FLUID, BUT UNFORTUNATELY THIS CAR DOESN'T HAVE THAT, IT IS ALL ELECTRICAL. I HAVE DONE SOME RESEARCH ONLINE AND HAVE NOTICED THAT I AM NOT THE ONLY ONE WITH THIS PROBLEM.  THERE ARE IN FACT HUNDREDS, IF NOT MORE WITH THE SAME PROBLEM OCCURRING.  THIS IS A VERY DANGEROUS AND COSTLY PROBLEM($700-$1500), AND OF COURSE I'M NOT COVERED UNDER THE WARRANTY ANYMORE, ALONG WITH ALMOST EVERYONE ELSE EXPERIENCING THIS PROBLEM.  I JUST DON'T UNDERSTAND HOW HEADLIGHTS CAN BE RECALLED, BUT SOMETHING AS SERIOUS AS YOUR STEERING NOT.  THIS IS A SERIOUS ISSUE THAT SHOULD BE ADDRESSED BEFORE SOMEONE LOSES THEIR LIFE OVER A FAULTY ELECTRIC POWER STEERING PROBLEM.  I HAVE CALLED A FEW DEALERSHIPS IN MY AREA AND EVERY SINGLE ONE OF THEM HAVE SAID THAT THERE HAVE BEEN MANY CASES EXACTLY LIKE MINE, IF THEY NOTICE THIS DEFECT, THEN WHY CAN'T THE PEOPLE WHO MAKE THE DECISION TO RECALL DO SOMETHING ABOUT IT. *TR"

What fraction of 2005 Chevy Cobalt complaints are about steering?

In [53]:
count = 0
string = 'STEER'
for c in data['Results']:
        if string in c['Summary']: count += 1
print(count ,'of',len(data['Results']),'complaints mentioned',string )
1167 of 1577 complaints mentioned STEER

How does that compare to Kia Optima

In [54]:
url = 'https://one.nhtsa.gov/webapi/api/Complaints/vehicle/modelyear/2005/make/kia/model/optima?format=json'
s = requests.get(url).text
import json
d = json.loads(s)
r = d['Results']
count = 0
string = 'STEER'
for c in r:
        if string in c['Summary']: count += 1
print(count ,'of',len(r),'complaints mentioned',string )
3 of 141 complaints mentioned STEER

2. Lets automate scraping the results data for a given: (make, model, year)

In [5]:
import requests

def complaintstring(year,make,model):
    u0 = 'https://one.nhtsa.gov/webapi/api/Complaints/vehicle/modelyear/'
    u1 = '/make/'
    u2 = '/model/'
    u3 = '?format=json'
    url = u0 + str(year) + u1 + make.lower() + u2 + model.lower() + u3
    return requests.get(url).text

3. Lets create a folder and download the data for a few vehicles and a range of years

In [6]:
# first make the cache folder for the data
import os
cache = 'nhtsa_cache'
if not os.path.exists(cache):
    os.mkdir(cache)

    
#next select a range of years and the vehicles to study    
y0,y1 = 2000,2016
# (make,model,firstyear,lastyear)
cars = [('chevrolet','cobalt',y0,y1),
        ('chevrolet','malibu',y0,y1),
        ('ford','fusion',y0,y1),
        ('honda','civic',y0,y1),
        ('hyundai','sonata',y0,y1),
        ('toyota','corolla',y0,y1)]

for car in cars:
    for year in range(car[2],car[3]+1): # why the +1?
        cache_name = cache + '/' + car[0] + '_'+ car[1] + '_' + str(year) + '.json'
        if not os.path.exists(cache_name):
            s = complaintstring(year,car[0],car[1])
            f = open(cache_name,'w')
            f.write(s)
            f.close()
        else:
            print( cache_name, 'already cached' )
nhtsa_cache/chevrolet_cobalt_2000.json already cached
nhtsa_cache/chevrolet_cobalt_2001.json already cached
nhtsa_cache/chevrolet_cobalt_2002.json already cached
nhtsa_cache/chevrolet_cobalt_2003.json already cached
nhtsa_cache/chevrolet_cobalt_2004.json already cached
nhtsa_cache/chevrolet_cobalt_2005.json already cached
nhtsa_cache/chevrolet_cobalt_2006.json already cached
nhtsa_cache/chevrolet_cobalt_2007.json already cached
nhtsa_cache/chevrolet_cobalt_2008.json already cached
nhtsa_cache/chevrolet_cobalt_2009.json already cached
nhtsa_cache/chevrolet_cobalt_2010.json already cached
nhtsa_cache/chevrolet_cobalt_2011.json already cached
nhtsa_cache/chevrolet_cobalt_2012.json already cached
nhtsa_cache/chevrolet_cobalt_2013.json already cached
nhtsa_cache/chevrolet_cobalt_2014.json already cached
nhtsa_cache/chevrolet_cobalt_2015.json already cached
nhtsa_cache/chevrolet_cobalt_2016.json already cached
nhtsa_cache/chevrolet_malibu_2000.json already cached
nhtsa_cache/chevrolet_malibu_2001.json already cached
nhtsa_cache/chevrolet_malibu_2002.json already cached
nhtsa_cache/chevrolet_malibu_2003.json already cached
nhtsa_cache/chevrolet_malibu_2004.json already cached
nhtsa_cache/chevrolet_malibu_2005.json already cached
nhtsa_cache/chevrolet_malibu_2006.json already cached
nhtsa_cache/chevrolet_malibu_2007.json already cached
nhtsa_cache/chevrolet_malibu_2008.json already cached
nhtsa_cache/chevrolet_malibu_2009.json already cached
nhtsa_cache/chevrolet_malibu_2010.json already cached
nhtsa_cache/chevrolet_malibu_2011.json already cached
nhtsa_cache/chevrolet_malibu_2012.json already cached
nhtsa_cache/chevrolet_malibu_2013.json already cached
nhtsa_cache/chevrolet_malibu_2014.json already cached
nhtsa_cache/chevrolet_malibu_2015.json already cached
nhtsa_cache/chevrolet_malibu_2016.json already cached
nhtsa_cache/ford_fusion_2000.json already cached
nhtsa_cache/ford_fusion_2001.json already cached
nhtsa_cache/ford_fusion_2002.json already cached
nhtsa_cache/ford_fusion_2003.json already cached
nhtsa_cache/ford_fusion_2004.json already cached
nhtsa_cache/ford_fusion_2005.json already cached
nhtsa_cache/ford_fusion_2006.json already cached
nhtsa_cache/ford_fusion_2007.json already cached
nhtsa_cache/ford_fusion_2008.json already cached
nhtsa_cache/ford_fusion_2009.json already cached
nhtsa_cache/ford_fusion_2010.json already cached
nhtsa_cache/ford_fusion_2011.json already cached
nhtsa_cache/ford_fusion_2012.json already cached
nhtsa_cache/ford_fusion_2013.json already cached
nhtsa_cache/ford_fusion_2014.json already cached
nhtsa_cache/ford_fusion_2015.json already cached
nhtsa_cache/ford_fusion_2016.json already cached
nhtsa_cache/honda_civic_2000.json already cached
nhtsa_cache/honda_civic_2001.json already cached
nhtsa_cache/honda_civic_2002.json already cached
nhtsa_cache/honda_civic_2003.json already cached
nhtsa_cache/honda_civic_2004.json already cached
nhtsa_cache/honda_civic_2005.json already cached
nhtsa_cache/honda_civic_2006.json already cached
nhtsa_cache/honda_civic_2007.json already cached
nhtsa_cache/honda_civic_2008.json already cached
nhtsa_cache/honda_civic_2009.json already cached
nhtsa_cache/honda_civic_2010.json already cached
nhtsa_cache/honda_civic_2011.json already cached
nhtsa_cache/honda_civic_2012.json already cached
nhtsa_cache/honda_civic_2013.json already cached
nhtsa_cache/honda_civic_2014.json already cached
nhtsa_cache/honda_civic_2015.json already cached
nhtsa_cache/honda_civic_2016.json already cached
nhtsa_cache/hyundai_sonata_2000.json already cached
nhtsa_cache/hyundai_sonata_2001.json already cached
nhtsa_cache/hyundai_sonata_2002.json already cached
nhtsa_cache/hyundai_sonata_2003.json already cached
nhtsa_cache/hyundai_sonata_2004.json already cached
nhtsa_cache/hyundai_sonata_2005.json already cached
nhtsa_cache/hyundai_sonata_2006.json already cached
nhtsa_cache/hyundai_sonata_2007.json already cached
nhtsa_cache/hyundai_sonata_2008.json already cached
nhtsa_cache/hyundai_sonata_2009.json already cached
nhtsa_cache/hyundai_sonata_2010.json already cached
nhtsa_cache/hyundai_sonata_2011.json already cached
nhtsa_cache/hyundai_sonata_2012.json already cached
nhtsa_cache/hyundai_sonata_2013.json already cached
nhtsa_cache/hyundai_sonata_2014.json already cached
nhtsa_cache/hyundai_sonata_2015.json already cached
nhtsa_cache/hyundai_sonata_2016.json already cached
nhtsa_cache/toyota_corolla_2000.json already cached
nhtsa_cache/toyota_corolla_2001.json already cached
nhtsa_cache/toyota_corolla_2002.json already cached
nhtsa_cache/toyota_corolla_2003.json already cached
nhtsa_cache/toyota_corolla_2004.json already cached
nhtsa_cache/toyota_corolla_2005.json already cached
nhtsa_cache/toyota_corolla_2006.json already cached
nhtsa_cache/toyota_corolla_2007.json already cached
nhtsa_cache/toyota_corolla_2008.json already cached
nhtsa_cache/toyota_corolla_2009.json already cached
nhtsa_cache/toyota_corolla_2010.json already cached
nhtsa_cache/toyota_corolla_2011.json already cached
nhtsa_cache/toyota_corolla_2012.json already cached
nhtsa_cache/toyota_corolla_2013.json already cached
nhtsa_cache/toyota_corolla_2014.json already cached
nhtsa_cache/toyota_corolla_2015.json already cached
nhtsa_cache/toyota_corolla_2016.json already cached
In [7]:
%pylab inline
Populating the interactive namespace from numpy and matplotlib

Now use JSON to process data['Results']['Summary]

We will see how many complaints mention the keyword 'steer'

In [56]:
import json

keyword = 'STEER'
nc = len(cars)

figure(figsize=(10,1.9*nc))
barwidth = 0.8
ec = 'k'#none'
lw = 0.5

for i,car in enumerate(cars):
    ax = subplot(nc,1,i+1)
    for year in range(car[2],car[3]+1):
        cachename = cache + '/' + car[0]+'_'+car[1]+'_'+str(year)+'.json'
        with open(cachename) as f:
            data = json.loads(f.read())
            n = data['Count']# number of complaints
            
            # count how many complaint summaries mention STEER for each car/year
            m = 0;
            for c in data['Results']:
                if (keyword in c['Summary']):
                    m = m+1                    
            #m = sum([keyword in c['Summary'] for c in data['Results']])

            if n>0: 
                #print(year,car[0],car[1],n,m,m/n)
                bar(year-barwidth/2,n-m,barwidth,color='#7777ff',edgecolor=ec,linewidth=lw)
                bar(year-barwidth/2,-m ,barwidth,color='#ff4444',edgecolor=ec,linewidth=lw)
        title(car[0].title()+' '+car[1].title(),x = 0.8,y=0.8)
        xlim(y0-barwidth,y1+barwidth)
        if i<nc-1: xticks([])
suptitle('NHTSA Complaints with those mentioning '+keyword+' below the axis in red')
savefig('nhtsa_'+keyword+'.png')    
savefig('nhtsa_'+keyword+'.svg')  

Exercise - find something interesting in the data

In [91]:
data['Results'][0]
Out[91]:
{'Component': 'SEATS | UNKNOWN OR OTHER',
 'Crash': 'No',
 'DateComplaintFiled': '/Date(1444795200000-0400)/',
 'DateofIncident': '/Date(1444795200000-0400)/',
 'Fire': 'No',
 'Make': 'TOYOTA',
 'Manufacturer': 'Toyota Motor Corporation',
 'Model': 'COROLLA',
 'ModelYear': '2016',
 'NumberOfDeaths': 0,
 'NumberOfInjured': 1,
 'ODINumber': 10782262,
 'ProductType': 'VEHICLE',
 'Summary': 'HEAD RESTRAINT/HEADREST IS SO EXTREMELY PAINFUL THAT IT HAS TO BE REMOVED OR REVERSED TO SAFELY DRIVE THE CAR.',
 'VIN': '5YFBURHE9GP'}
In [89]:
keyword = 'STEER'
make = 'kia'
model = 'optima'

for year in range(2000,2018+1):
    url = 'https://one.nhtsa.gov/webapi/api/Complaints/vehicle/modelyear/' + str(year) +'/make/' + make+'/model/'+model +'?format=json'
    s = requests.get(url).text
    import json
    d = json.loads(s)
    r = d['Results']
    count = 0

    for c in r:
        #print(c['Fire'])
        if 'Yes' in c['Fire']: 
            count += 1
    print(count ,'of',len(r),'complaints mentioned fire in ' + str(year) )
0 of 0 complaints mentioned fire in 2000
0 of 140 complaints mentioned fire in 2001
4 of 102 complaints mentioned fire in 2002
2 of 137 complaints mentioned fire in 2003
3 of 219 complaints mentioned fire in 2004
1 of 141 complaints mentioned fire in 2005
1 of 144 complaints mentioned fire in 2006
3 of 143 complaints mentioned fire in 2007
3 of 112 complaints mentioned fire in 2008
2 of 97 complaints mentioned fire in 2009
0 of 56 complaints mentioned fire in 2010
11 of 278 complaints mentioned fire in 2011
17 of 313 complaints mentioned fire in 2012
12 of 287 complaints mentioned fire in 2013
4 of 109 complaints mentioned fire in 2014
1 of 89 complaints mentioned fire in 2015
3 of 27 complaints mentioned fire in 2016
0 of 8 complaints mentioned fire in 2017
0 of 1 complaints mentioned fire in 2018
In [97]:
print(r[0]['DateofIncident'])
print(float(r[0]['DateofIncident'][6:16]))
/Date(1516424400000-0500)/
1516424400.0
In [94]:
import datetime 
r[0]['DateofIncident'][6:16]
datetime.datetime.fromtimestamp(float(r[0]['DateofIncident'][6:16]))
Out[94]:
datetime.datetime(2018, 1, 20, 0, 0)