import pandas
Download the airport data (see here for info) and the SPARCS data (see here for info), and download this script.
df = pandas.read_clipboard()
df
ddf = pandas.read_csv('airports.csv')
ddf.head()
df = pandas.read_excel('foo.xlsx')
df
Exercise - fix airport spreadsheet by replacing NA
Exercise: fix the airports.csv dataframe by replacing all the nulls with "NA". (Pandas interprets NA, the code for North America, as "Not Available" :-/ )
ddf = ddf.fillna('NA')
ddf.head()
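An alternative to patching the nulls afterwards is to stop pandas from treating "NA" as missing when the file is read. The sketch below assumes a tiny in-memory CSV with made-up column names (`ident`, `continent`) rather than the real airports.csv, and uses the `keep_default_na` parameter of `read_csv`.

```python
import io
import pandas

# Hypothetical two-row stand-in for airports.csv (column names are assumptions).
csv_text = "ident,continent\nKJFK,NA\nEGLL,EU\n"

# Default behaviour: the string "NA" is parsed as a missing value.
default = pandas.read_csv(io.StringIO(csv_text))

# keep_default_na=False turns off the built-in list of missing-value strings,
# so "NA" survives as ordinary text.
literal = pandas.read_csv(io.StringIO(csv_text), keep_default_na=False)

n_missing_default = int(default['continent'].isna().sum())
n_missing_literal = int(literal['continent'].isna().sum())
print(n_missing_default, n_missing_literal)
```

Note that this switch applies to every column, so empty cells are also kept as empty strings instead of nulls; whether that is acceptable depends on the rest of the data.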
import numpy as np
a = np.vander([3,4,5,6,7])
df = pandas.DataFrame(a)
df
On construction of a dataframe, pandas will provide labels for the rows and columns, as seen above.
But we can change them if we like:
df.columns=['aa','b','c','d','z']
df
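Assigning to `df.columns` replaces every label at once and needs a list of the right length. If only one or two labels should change, `DataFrame.rename` may be more convenient. A minimal sketch, rebuilding the same Vandermonde dataframe so it runs on its own (the new name `'a'` is just an illustration):

```python
import numpy as np
import pandas

df = pandas.DataFrame(np.vander([3, 4, 5, 6, 7]))
df.columns = ['aa', 'b', 'c', 'd', 'z']

# rename takes a mapping old label -> new label and returns a new dataframe;
# labels not mentioned in the mapping are left alone.
renamed = df.rename(columns={'aa': 'a'})

print(list(df.columns))       # original is unchanged
print(list(renamed.columns))
```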
df['aa']
Or a subset of the columns:
df[['aa','z']]
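The two selections above return different kinds of object: a single label gives a Series, while a list of labels gives a DataFrame. A small self-contained check, rebuilding the same dataframe:

```python
import numpy as np
import pandas

df = pandas.DataFrame(np.vander([3, 4, 5, 6, 7]),
                      columns=['aa', 'b', 'c', 'd', 'z'])

one = df['aa']          # single label -> Series
two = df[['aa', 'z']]   # list of labels -> DataFrame

print(type(one).__name__, one.shape)
print(type(two).__name__, two.shape)
```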
The index of the dataframe labels the rows:
df.index
df.index=['Maggie','Edward','Sanjeevani','Michael','Robert']
df
df.loc['Sanjeevani']
iloc provides access by row number:
df.iloc[2]
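To see how a label and a row number relate, `Index.get_loc` translates a label into its position. The sketch below rebuilds the dataframe with the names used above and checks that `loc` by label and `iloc` by the looked-up position return the same row.

```python
import numpy as np
import pandas

df = pandas.DataFrame(np.vander([3, 4, 5, 6, 7]),
                      columns=['aa', 'b', 'c', 'd', 'z'])
df.index = ['Maggie', 'Edward', 'Sanjeevani', 'Michael', 'Robert']

# Position of the label 'Sanjeevani' in the index (labels are unique here).
pos = df.index.get_loc('Sanjeevani')

by_label = df.loc['Sanjeevani']
by_position = df.iloc[pos]

print(pos)
print(by_label.equals(by_position))
```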
Labels could be integers:
df.index=['Maggie',4,'Sanjeevani','Michael','Robert']
df
Then
df.loc[4] # gives row with label 4
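ix provided access by either label or row number. (It was deprecated in pandas 0.20 and removed in 1.0, so the cell below only runs on older versions.) If a row has an integer label, i, and we ask ix for row i, do we get the row with label i? Or row number i?
df.ix[4]  # requires pandas < 1.0
Answer: row number i.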
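On current pandas, where `ix` is no longer available, the ambiguity goes away by saying which kind of access is meant: `loc` for the label, `iloc` for the row number. A sketch with the same mixed index, rebuilt here so it runs standalone:

```python
import numpy as np
import pandas

df = pandas.DataFrame(np.vander([3, 4, 5, 6, 7]),
                      columns=['aa', 'b', 'c', 'd', 'z'])
df.index = ['Maggie', 4, 'Sanjeevani', 'Michael', 'Robert']

by_label = df.loc[4]      # the row whose label is 4 (second row)
by_position = df.iloc[4]  # the row at position 4 (last row)

print(by_label.name, by_label.tolist())
print(by_position.name, by_position.tolist())
```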
These indexers also support slicing. Beware that with label-based slicing (loc), unlike every other start:stop slicing in Python, "stop" is included; iloc slices exclude "stop" as usual:
df.loc['Maggie':'Michael']
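The difference is easiest to see by counting rows. This sketch uses the all-string index from earlier (an assumption made to keep the example simple) and compares a label slice with a positional slice:

```python
import numpy as np
import pandas

df = pandas.DataFrame(np.vander([3, 4, 5, 6, 7]),
                      columns=['aa', 'b', 'c', 'd', 'z'])
df.index = ['Maggie', 'Edward', 'Sanjeevani', 'Michael', 'Robert']

# Label slice: both endpoints are included.
label_slice = df.loc['Maggie':'Michael']

# Positional slice: the stop position is excluded, as in plain Python.
position_slice = df.iloc[0:3]

print(list(label_slice.index))
print(list(position_slice.index))
```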