Intro to Pandas 1¶

https://pandas.pydata.org/

import pandas

Download the following datasets:¶

airport data (see here for info)

fruit sales data

SPARCS data (see here for info)

and download this script

Ways to load data into pandas¶

Load from clip board¶

df = pandas.read_clipboard()
df

Load from csv¶

ddf = pandas.read_csv('airports.csv')
ddf.head()

From an excel file¶

Download foo excel spreadsheet from here

df = pandas.read_excel('foo.xlsx') df

fillna()¶

Exercise - fix airport spreadsheet by reaplacing NA Exercise: fix the airports.csv dataframe by replacing all the nulls by "NA". (Pandas interprets NA for North America as "Not Available" :-/ )

ddf = ddf.fillna('NA')
ddf.head()

Load from a Numpy array¶

from numpy import *
a = vander([3,4,5,6,7])
df = pandas.DataFrame(a)
df

Columns and index¶

On construction of a dataframe, pandas will provide labels for the rows and columns, as seen above.

But we can change them if we like:

df.columns=['aa','b','c','d','z']
df

(and df.index = whatever, for the rows).

Accessing columns¶

Access a column like this:

df['aa']

0      81
1     256
2     625
3    1296
4    2401
Name: aa, dtype: int64

Or a subset of the columns:

df[['aa','z']]

The index of the dataframe labels the rows:

df.index

RangeIndex(start=0, stop=5, step=1)

df.index=['Maggie','Edward','Sanjeevani','Michael','Robert']
df

Accessing rows: loc, iloc, ix¶

loc, iloc, ix are three accessors

loc provides access by labels:

df.loc['Sanjeevani']

aa    625
b     125
c      25
d       5
z       1
Name: Sanjeevani, dtype: int64

iloc provides access by row number:

df.iloc[2]

aa    625
b     125
c      25
d       5
z       1
Name: Sanjeevani, dtype: int64

Labels could be integers:

df.index=['Maggie',4,'Sanjeevani','Michael','Robert']
df

Then

df.loc[4]  # gives row with label 4

aa    256
b      64
c      16
d       4
z       1
Name: 4, dtype: int64

ix provides access by either label or row number. If a row has an integer label, i, and we ask ix for row i, do we get row with label i? Or row number i?

df.ix[4]

/Users/drt/anaconda3/lib/python3.6/site-packages/ipykernel_launcher.py:1: DeprecationWarning: 
.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#ix-indexer-is-deprecated
  """Entry point for launching an IPython kernel.

aa    2401
b      343
c       49
d        7
z        1
Name: Robert, dtype: int64

Answer row number i.

These indexers also support slicing. Beware that unlike every other start:stop slicing in Python, "stop" is included:

df.loc['Maggie':'Michael']

Examples of Pandas¶

fruit_sales

airports

SPARC

	id	ident	type	name	latitude_deg	longitude_deg	elevation_ft	continent	iso_country	iso_region	municipality	scheduled_service	gps_code	iata_code	local_code	home_link	wikipedia_link	keywords
0	6523	00A	heliport	Total Rf Heliport	40.070801	-74.933601	11.0	NaN	US	US-PA	Bensalem	no	00A	NaN	00A	NaN	NaN	NaN
1	323361	00AA	small_airport	Aero B Ranch Airport	38.704022	-101.473911	3435.0	NaN	US	US-KS	Leoti	no	00AA	NaN	00AA	NaN	NaN	NaN
2	6524	00AK	small_airport	Lowell Field	59.949200	-151.695999	450.0	NaN	US	US-AK	Anchor Point	no	00AK	NaN	00AK	NaN	NaN	NaN
3	6525	00AL	small_airport	Epps Airpark	34.864799	-86.770302	820.0	NaN	US	US-AL	Harvest	no	00AL	NaN	00AL	NaN	NaN	NaN
4	6526	00AR	closed	Newport Hospital & Clinic Heliport	35.608700	-91.254898	237.0	NaN	US	US-AR	Newport	no	NaN	NaN	NaN	NaN	NaN	00AR

	id	ident	type	name	latitude_deg	longitude_deg	elevation_ft	continent	iso_country	iso_region	municipality	scheduled_service	gps_code	iata_code	local_code	home_link	wikipedia_link	keywords
0	6523	00A	heliport	Total Rf Heliport	40.070801	-74.933601	11	NA	US	US-PA	Bensalem	no	00A	NA	00A	NA	NA	NA
1	323361	00AA	small_airport	Aero B Ranch Airport	38.704022	-101.473911	3435	NA	US	US-KS	Leoti	no	00AA	NA	00AA	NA	NA	NA
2	6524	00AK	small_airport	Lowell Field	59.949200	-151.695999	450	NA	US	US-AK	Anchor Point	no	00AK	NA	00AK	NA	NA	NA
3	6525	00AL	small_airport	Epps Airpark	34.864799	-86.770302	820	NA	US	US-AL	Harvest	no	00AL	NA	00AL	NA	NA	NA
4	6526	00AR	closed	Newport Hospital & Clinic Heliport	35.608700	-91.254898	237	NA	US	US-AR	Newport	no	NA	NA	NA	NA	NA	00AR

	aa	b	c	d	z
Maggie	81	27	9	3	1
Edward	256	64	16	4	1
Sanjeevani	625	125	25	5	1
Michael	1296	216	36	6	1
Robert	2401	343	49	7	1