SPARCS dataset - an extended study

SPARCS data (see here for info)

In [4]:
import pandas
from numpy import *
In [5]:
# cleaning the data
mycons = {'Total Costs':lambda x:float(x.replace('$','')),
          'Total Charges':lambda x:float(x.replace('$','')),
          'Length of Stay':lambda x:int(x.replace('+',''))}
s = pandas.read_csv('sparcs2014.csv',converters=mycons)
s.head()
Out[5]:
Health Service Area Hospital County Operating Certificate Number Facility Id Facility Name Age Group Zip Code - 3 digits Gender Race Ethnicity ... Payment Typology 2 Payment Typology 3 Attending Provider License Number Operating Provider License Number Other Provider License Number Birth Weight Abortion Edit Indicator Emergency Department Indicator Total Charges Total Costs
0 Western NY Allegany 226700.0 37.0 Cuba Memorial Hospital Inc 30 to 49 147 F White Not Span/Hispanic ... NaN NaN 90335341.0 NaN NaN 0 N Y 9546.85 12303.20
1 Western NY Allegany 226700.0 37.0 Cuba Memorial Hospital Inc 50 to 69 147 F White Not Span/Hispanic ... NaN NaN 90335341.0 NaN NaN 0 N Y 11462.75 10298.32
2 Western NY Allegany 226700.0 37.0 Cuba Memorial Hospital Inc 18 to 29 147 M White Not Span/Hispanic ... NaN NaN 90335341.0 167816.0 NaN 0 N Y 1609.40 1966.25
3 Western NY Allegany 226700.0 37.0 Cuba Memorial Hospital Inc 18 to 29 147 F White Not Span/Hispanic ... NaN NaN 90335341.0 167816.0 NaN 0 N Y 2638.75 2863.94
4 Western NY Allegany 226700.0 37.0 Cuba Memorial Hospital Inc 18 to 29 147 F White Not Span/Hispanic ... NaN NaN 90335341.0 NaN NaN 0 N Y 3538.25 4656.77

5 rows × 39 columns
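The converters passed to read_csv are ordinary functions that pandas applies to each raw string in the named column. Here is a minimal sketch of what they do, written as named functions so they can be tried by hand. The sample strings (such as '$9546.85' and '120 +') are assumptions about how the raw file is formatted, and the names clean_dollars and clean_stay are made up for this illustration.

```python
# The same cleaning steps as in mycons, as named functions (hypothetical names).
def clean_dollars(x):
    # remove the dollar sign, then parse the rest as a float
    return float(x.replace('$', ''))

def clean_stay(x):
    # a capped stay is assumed to be recorded like '120 +';
    # int() ignores the whitespace left behind after removing '+'
    return int(x.replace('+', ''))

charge = clean_dollars('$9546.85')
stay_short = clean_stay('3')
stay_capped = clean_stay('120 +')
print(charge, stay_short, stay_capped)
```

Note that a capped value such as 120 then means "120 days or more", so averages of Length of Stay computed from it can only underestimate the true value.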

Recall that we can grab the info for the most expensive total charge using

In [6]:
itcmax = s['Total Charges'].idxmax()  # index label of the row in which the charges are the highest
s.loc[itcmax]  # idxmax returns a label, so look the row up by label; this is the most expensive hospital visit
Out[6]:
Health Service Area                                                        New York City
Hospital County                                                                    Bronx
Operating Certificate Number                                                 7.00001e+06
Facility Id                                                                         1169
Facility Name                          Montefiore Medical Center - Henry & Lucy Moses...
Age Group                                                                        0 to 17
Zip Code - 3 digits                                                                  104
Gender                                                                                 M
Race                                                              Black/African American
Ethnicity                                                              Not Span/Hispanic
Length of Stay                                                                       120
Admit Day of Week                                                                    FRI
Type of Admission                                                              Emergency
Patient Disposition                                         Home w/ Home Health Services
Discharge Year                                                                      2014
Discharge Day of Week                                                                TUE
CCS Diagnosis Code                                                                    63
CCS Diagnosis Description                                        WHITE BLOODCELL DISEASE
CCS Procedure Code                                                                    64
CCS Procedure Description                                         BONE MARROW TRANSPLANT
APR DRG Code                                                                           3
APR DRG Description                                               BONE MARROW TRANSPLANT
APR MDC Code                                                                          16
APR MDC Description                    Diseases and Disorders of Blood, Blood Forming...
APR Severity of Illness Code                                                           4
APR Severity of Illness Description                                              Extreme
APR Risk of Mortality                                                            Extreme
APR Medical Surgical Description                                                Surgical
Payment Typology 1                                              Private Health Insurance
Payment Typology 2                                                              Self-Pay
Payment Typology 3                                                                   NaN
Attending Provider License Number                                                 198304
Operating Provider License Number                                                 229870
Other Provider License Number                                                        NaN
Birth Weight                                                                           0
Abortion Edit Indicator                                                                N
Emergency Department Indicator                                                         N
Total Charges                                                                8.59346e+06
Total Costs                                                                  2.96142e+06
Name: 965564, dtype: object
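A point worth keeping in mind here: idxmax returns an index label, not an integer position. With the default 0, 1, 2, ... index that read_csv creates, labels and positions coincide, but after filtering or sorting they no longer do. The sketch below uses a made-up toy table whose labels are deliberately not 0, 1, 2 to show the difference.

```python
import pandas as pd

# toy data (made up); the index labels are deliberately not 0, 1, 2
toy = pd.DataFrame({'Total Charges': [100.0, 900.0, 250.0]},
                   index=[10, 20, 30])

label = toy['Total Charges'].idxmax()                # index label of the largest value
row = toy.loc[label]                                 # label-based lookup
position = toy['Total Charges'].to_numpy().argmax()  # integer position of the largest value
same_row = toy.iloc[position]                        # position-based lookup
print(label, position)
```

Here toy.iloc[label] would raise an IndexError, since there is no row at position 20.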

We can also study all hospital visits for a particular hospital

In [7]:
print(s['Facility Name'][itcmax])
print(s['Facility Id'][itcmax])
Montefiore Medical Center - Henry & Lucy Moses Div
1169.0
In [8]:
Montefiore_Medical_Center_data = s[s['Facility Name']=='Montefiore Medical Center - Henry & Lucy Moses Div']
len(Montefiore_Medical_Center_data)
Out[8]:
40041

This hospital had 40,000+ visits!

Let's see how many births happened at this hospital

In [9]:
mmcbabies = Montefiore_Medical_Center_data[ Montefiore_Medical_Center_data['CCS Diagnosis Description'] == 'LIVEBORN' ]
#babylogcharges = log10(mmcbabies['Total Charges'])
#babymin = babylogcharges.min()
print(len(mmcbabies)) 
mmcbabies
17
Out[9]:
Health Service Area Hospital County Operating Certificate Number Facility Id Facility Name Age Group Zip Code - 3 digits Gender Race Ethnicity ... Payment Typology 2 Payment Typology 3 Attending Provider License Number Operating Provider License Number Other Provider License Number Birth Weight Abortion Edit Indicator Emergency Department Indicator Total Charges Total Costs
950795 New York City Bronx 7000006.0 1169.0 Montefiore Medical Center - Henry & Lucy Moses... 0 to 17 104 F Other Race Not Span/Hispanic ... Medicaid Self-Pay 198374.0 260567.0 NaN 1600 N N 1708560.99 320283.19
951583 New York City Bronx 7000006.0 1169.0 Montefiore Medical Center - Henry & Lucy Moses... 0 to 17 104 M Black/African American Not Span/Hispanic ... Self-Pay NaN 260567.0 260567.0 NaN 3300 N N 135761.10 28406.53
953234 New York City Bronx 7000006.0 1169.0 Montefiore Medical Center - Henry & Lucy Moses... 0 to 17 104 F White Not Span/Hispanic ... Self-Pay NaN 215886.0 191052.0 NaN 4900 N N 1137589.18 265956.32
953247 New York City Bronx 7000006.0 1169.0 Montefiore Medical Center - Henry & Lucy Moses... 0 to 17 104 M Other Race Not Span/Hispanic ... Medicaid Self-Pay 248239.0 191052.0 NaN 3600 N N 1974849.52 497833.71
954312 New York City Bronx 7000006.0 1169.0 Montefiore Medical Center - Henry & Lucy Moses... 0 to 17 104 M White Not Span/Hispanic ... Medicaid Self-Pay 260733.0 240229.0 NaN 3400 N N 629367.26 176809.44
954670 New York City Bronx 7000006.0 1169.0 Montefiore Medical Center - Henry & Lucy Moses... 0 to 17 112 F Other Race Not Span/Hispanic ... Medicaid Self-Pay 156936.0 191052.0 NaN 2100 N N 331016.49 74112.41
954888 New York City Bronx 7000006.0 1169.0 Montefiore Medical Center - Henry & Lucy Moses... 0 to 17 104 M Black/African American Not Span/Hispanic ... Medicaid Self-Pay 248239.0 191052.0 NaN 3200 N N 388244.48 90134.71
956278 New York City Bronx 7000006.0 1169.0 Montefiore Medical Center - Henry & Lucy Moses... 0 to 17 104 M Other Race Spanish/Hispanic ... Medicaid Self-Pay 241550.0 228459.0 NaN 1600 N N 595902.09 180442.20
966869 New York City Bronx 7000006.0 1169.0 Montefiore Medical Center - Henry & Lucy Moses... 0 to 17 100 M Black/African American Not Span/Hispanic ... Self-Pay NaN 271469.0 260567.0 NaN 3800 N N 381524.26 111155.79
967021 New York City Bronx 7000006.0 1169.0 Montefiore Medical Center - Henry & Lucy Moses... 0 to 17 109 F White Not Span/Hispanic ... Medicaid Self-Pay 260733.0 90254770.0 NaN 2600 N N 350724.70 63746.86
968891 New York City Bronx 7000006.0 1169.0 Montefiore Medical Center - Henry & Lucy Moses... 0 to 17 100 F Other Race Not Span/Hispanic ... Self-Pay NaN 250182.0 250182.0 NaN 2500 N N 67388.25 16998.92
973104 New York City Bronx 7000006.0 1169.0 Montefiore Medical Center - Henry & Lucy Moses... 0 to 17 104 M White Spanish/Hispanic ... Self-Pay NaN 172015.0 172015.0 NaN 2500 N N 717343.25 144991.29
973547 New York City Bronx 7000006.0 1169.0 Montefiore Medical Center - Henry & Lucy Moses... 0 to 17 104 M Other Race Not Span/Hispanic ... Medicaid Self-Pay 256091.0 190719.0 NaN 2800 N N 119553.43 30303.36
975992 New York City Bronx 7000006.0 1169.0 Montefiore Medical Center - Henry & Lucy Moses... 0 to 17 104 F Other Race Not Span/Hispanic ... Medicaid Self-Pay 260567.0 120819.0 NaN 2300 N N 68228.61 18510.49
977166 New York City Bronx 7000006.0 1169.0 Montefiore Medical Center - Henry & Lucy Moses... 0 to 17 104 F Other Race Not Span/Hispanic ... Medicaid Self-Pay 269942.0 190719.0 NaN 2300 N N 417421.01 102866.25
979494 New York City Bronx 7000006.0 1169.0 Montefiore Medical Center - Henry & Lucy Moses... 0 to 17 104 F Black/African American Not Span/Hispanic ... Medicaid Self-Pay 270318.0 90254770.0 NaN 2900 N N 406212.07 108074.16
989927 New York City Bronx 7000006.0 1169.0 Montefiore Medical Center - Henry & Lucy Moses... 0 to 17 114 F Black/African American Not Span/Hispanic ... Medicaid Self-Pay 213619.0 213619.0 NaN 2900 N N 244949.56 49361.94

17 rows × 39 columns
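The two filtering steps above (first by hospital, then by diagnosis) can also be written as a single boolean mask. The sketch below shows the idea on a made-up toy table; the facility names 'A' and 'B' are placeholders, not real hospitals.

```python
import pandas as pd

# toy data (made up) with the same column names as the SPARCS table
toy = pd.DataFrame({
    'Facility Name': ['A', 'A', 'B', 'B'],
    'CCS Diagnosis Description': ['LIVEBORN', 'ASTHMA', 'LIVEBORN', 'LIVEBORN'],
})

is_a = toy['Facility Name'] == 'A'                          # True for visits at hospital A
is_birth = toy['CCS Diagnosis Description'] == 'LIVEBORN'   # True for births
births_at_a = toy[is_a & is_birth]                          # & combines masks element by element
n = len(births_at_a)
print(n)
```

When the comparisons are written inline, each one needs its own parentheses, as in toy[(cond1) & (cond2)], because & binds more tightly than ==.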

Make a histogram of the total charges for these births

In [10]:
%pylab inline
n, bins, patches = plt.hist(mmcbabies['Total Charges']/1000, 15, density=1, facecolor='green', alpha=0.25)
Populating the interactive namespace from numpy and matplotlib
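The density option in the histogram call rescales the bar heights so that the total area of the bars is 1, rather than showing raw counts. A minimal sketch of that normalization, using numpy.histogram directly on made-up charge values (in thousands of dollars) so no plot window is needed:

```python
import numpy as np

# made-up charges in $1000s, for illustration only
charges_k = np.array([67.4, 119.6, 135.8, 331.0, 388.2, 629.4, 1708.6])

# 15 equal-width bins, heights normalized so the histogram integrates to 1
heights, edges = np.histogram(charges_k, bins=15, density=True)
widths = np.diff(edges)                       # width of each bin
total_area = float(np.sum(heights * widths))  # should be 1 up to rounding
print(total_area)
```

Because the charges span several orders of magnitude, plotting log10 of the charges (as hinted at in the commented-out lines of the previous cell) is one way to spread the bars out more evenly.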

Make a list of all hospital names

In [11]:
hospital_list = set(s['Facility Name'])
print(hospital_list)
len(hospital_list)
{'St Charles Hospital', 'St James Mercy Hospital - Mercycare', 'Medina Memorial Hospital', 'Ira Davenport Memorial Hospital Inc', 'Ellis Hospital', 'Lenox Hill Hospital', 'Mid-Hudson Valley Division of Westchester Medical Center', 'Bellevue Hospital Center', 'The University of Vermont Health Network - Elizabethtown Community Hos', 'RUMC-Bayley Seton', 'Abortion Record - Facility Name Redacted', 'St Anthony Community Hospital', 'Womans Christian Assoc Hospital - WCA Hosp at Jones Memorial Health Ce', 'Chenango Memorial Hospital Inc', 'Our Lady of Lourdes Memorial Hospital Inc', 'Monroe Community Hospital', 'Westfield Memorial Hospital Inc', 'Sunnyview Hospital and Rehabilitation Center', 'Degraff Memorial Hospital', 'The University of Vermont Health Network - Champlain Valley Physicians', 'Brookhaven Memorial Hospital Medical Center Inc', 'New York Methodist Hospital', 'St Johns Episcopal Hospital So Shore', 'SJRH - St Johns Division', 'Samaritan Medical Center', 'New York Presbyterian Hospital - Columbia Presbyterian Center', 'Syosset Hospital', 'Peconic Bay Medical Center', 'Gouverneur Hospital', 'South Nassau Communities Hospital', "St. Peter's Addiction Recovery Center", 'Mercy Medical Center', 'St Elizabeth Medical Center', 'Eastern Niagara Hospital - Newfane Division', 'St Catherine of Siena Hospital', 'Community Memorial Hospital Inc', 'Carthage Area Hospital Inc', "St. 
Mary's Healthcare - Amsterdam Memorial Campus", 'The Burdett Care Center', 'Interfaith Medical Center', 'Eastern Long Island Hospital', 'The Unity Hospital of Rochester', 'New York Presbyterian Hospital - Allen Hospital', 'Sisters of Charity Hospital - St Joseph Campus', 'Glens Falls Hospital', 'New York Presbyterian Hospital - New York Weill Cornell Center', 'Niagara Falls Memorial Medical Center', 'Mount Sinai Beth Israel', 'Good Samaritan Hospital of Suffern', 'NewYork-Presbyterian/Hudson Valley Hospital', 'New York Community Hospital of Brooklyn, Inc', 'Montefiore New Rochelle Hospital', 'Brooklyn Hospital Center - Downtown Campus', 'NYU Hospitals Center', "St Luke's Cornwall Hospital/Newburgh", 'Cobleskill Regional Hospital', "St. Mary's Hospital", 'Lewis County General Hospital', 'Arnot Ogden Medical Center', 'Montefiore Medical Center-Wakefield Hospital', 'Oswego Hospital - Alvin L Krakau Comm Mtl Health Center Div', 'Crouse Hospital - Commonwealth Division', 'Plainview Hospital', "O'Connor Hospital", 'Millard Fillmore Suburban Hospital', 'New York Hospital Medical Center of Queens', 'Henry J. Carter Specialty Hospital', 'Aurelia Osborn Fox Memorial Hospital', 'Vassar Brothers Medical Center', 'Southside Hospital', "Woman's Christian Association", 'New York Presbyterian Hospital - Westchester Division', 'St James Mercy Hospital', 'Nicholas H Noyes Memorial Hospital', 'Massena Memorial Hospital', 'Phelps Memorial Hospital Assn', 'Saratoga Hospital', 'Erie County Medical Center', 'Franklin Hospital', 'Helen Hayes Hospital', 'Rome Memorial Hospital, Inc', 'Metropolitan Hospital Center', 'Albany Medical Center Hospital', 'Coney Island Hospital', 'Highland Hospital', 'HealthAlliance Hospital Broadway Campus', 'Northern Westchester Hospital', 'Memorial Hosp of Wm F & Gertrude F Jones A/K/A Jones Memorial Hosp', 'North Central Bronx Hospital', 'University Hospital of Brooklyn', 'United Health Services Hospitals Inc. 
- Wilson Medical Center', 'Wyckoff Heights Medical Center', 'Eastern Niagara Hospital - Lockport Division', 'Montefiore Medical Center - Henry & Lucy Moses Div', "Long Island Jewish Schneiders Children's Hospital Division", 'Brookdale Hospital Medical Center', "St Joseph's Medical Center", 'Wyoming County Community Hospital', 'Mount Sinai Hospital - Mount Sinai Hospital of Queens', "Ellis Hospital - Bellevue Woman's Care Center Division", 'SJRH - Dobbs Ferry Pavillion', 'Cortland Regional Medical Center Inc', 'Summit Park Hospital-Rockland County Infirmary', 'Memorial Hospital for Cancer and Allied Diseases', 'St. Joseph Hospital', 'Sisters of Charity Hospital', 'SUNY Downstate Medical Center at LICH', 'North Shore University Hospital', 'Faxton-St Lukes Healthcare St Lukes Division', 'Buffalo General Hospital', 'Auburn Memorial Hospital', 'Blythedale Childrens Hospital', 'Bronx-Lebanon Hospital Center - Fulton Division', "St. Mary's Healthcare", 'St Josephs Hospital Health Center', 'Harlem Hospital Center', 'Montefiore Med Center - Jack D Weiler Hosp of A Einstein College Div', 'Bon Secours Community Hospital', 'United Health Services Hospitals Inc. - Binghamton General Hospital', 'Forest Hills Hospital', 'NYU Lutheran Medical Center', 'Strong Memorial Hospital', 'Woodhull Medical & Mental Health Center', 'Winthrop-University Hospital', 'Clifton-Fine Hospital', 'Mount Sinai Beth Israel Brooklyn', 'Catskill Regional Medical Center - G. 
Hermann Site', 'Albany Medical Center - South Clinical Campus', 'Oswego Hospital', 'Mount Sinai Hospital', 'Bertrand Chaffee Hospital', 'Cayuga Medical Center at Ithaca', 'Alice Hyde Medical Center', 'Geneva General Hospital', 'Northern Dutchess Hospital', 'Claxton-Hepburn Medical Center', 'Clifton Springs Hospital and Clinic', 'Winifred Masterson Burke Rehabilitation Hospital', 'Orange Regional Medical Center-Goshen Campus', 'Mary Imogene Bassett Hospital', 'Hospital for Special Surgery', 'Jamaica Hospital Medical Center', 'University Hospital', 'Southampton Hospital', 'SJRH - Park Care Pavilion', 'Long Island Jewish Medical Center', 'New York-Presbyterian/Lawrence Hospital', 'Mount Sinai St. Lukes', 'Moses-Ludington Hospital', 'Samaritan Hospital', 'Jacobi Medical Center', "Women And Children's Hospital Of Buffalo", 'Kings County Hospital Center', 'Good Samaritan Hospital Medical Center', 'Canton-Potsdam Hospital', 'Richmond University Medical Center', 'Glen Cove Hospital', 'River Hospital, Inc.', 'NYU Hospital for Joint Diseases', 'Schuyler Hospital', 'Delaware Valley Hospital Inc', 'New York-Presbyterian/Lower Manhattan Hospital', 'Elmhurst Hospital Center', 'Soldiers and Sailors Memorial Hospital of Yates County Inc', 'UPSTATE University Hospital at Community General', 'Queens Hospital Center', 'Little Falls Hospital', "St Joseph's MC-St Vincents Westchester Division", 'Olean General Hospital', 'F F Thompson Hospital', 'Flushing Hospital Medical Center', 'Nassau University Medical Center', 'Nathan Littauer Hospital', 'John T Mather Memorial Hospital of Port Jefferson New York Inc', 'Margaretville Hospital', 'Corning Hospital', 'Albany Memorial Hospital', 'Roswell Park Cancer Institute', 'Adirondack Medical Center-Saranac Lake Site', "HealthAlliance Hospital Mary's Avenue Campus", 'Lincoln Medical & Mental Health Center', 'TLC Health Network Lake Shore Hospital', 'Nyack Hospital', 'United Memorial Medical Center Bank Street Campus', 'St Josephs Hospital', 
'Montefiore Mount Vernon Hospital', 'White Plains Hospital Center', 'Crouse Hospital', 'St Francis Hospital', 'Ellenville Regional Hospital', 'Rochester General Hospital', 'Cuba Memorial Hospital Inc', 'University Hospital SUNY Health Science Center', 'Newark-Wayne Community Hospital', 'Kenmore Mercy Hospital', 'Calvary Hospital Inc', 'Mercy Hospital of Buffalo', 'Bronx-Lebanon Hospital Center - Concourse Division', 'Maimonides Medical Center', 'Mount St Marys Hospital and Health Center', 'Catskill Regional Medical Center', 'Westchester Medical Center', 'Columbia Memorial Hospital', 'Putnam Hospital Center', 'St Peters Hospital', 'New York Eye and Ear Infirmary of Mount Sinai', 'Huntington Hospital', 'Brooks Memorial Hospital', 'Oneida Healthcare Center', 'Kingsbrook Jewish Medical Center', 'United Memorial Medical Center North Street Campus', 'Staten Island University Hosp-North', 'The Unity Hospital of Rochester-St Marys Campus', 'SBH Health System', 'Staten Island University Hosp-South', 'Mount Sinai Roosevelt'}
Out[11]:
216
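Building a set is one way to remove duplicates; pandas also has unique() and nunique() for the same purpose. A short sketch on a made-up Series of names:

```python
import pandas as pd

# made-up sample with one repeated name
names = pd.Series(['Oswego Hospital', 'Crouse Hospital', 'Oswego Hospital'])

as_set = set(names)               # unordered collection of distinct values
ordered = sorted(names.unique())  # distinct values, sorted for easier reading
n_distinct = names.nunique()      # number of distinct values
print(ordered, n_distinct)
```

One difference to be aware of: nunique() skips missing values by default, whereas a set built from a column that contains NaN keeps it as an element.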

Let's focus on Erie County

In [12]:
Erie_county_visits = s[s['Hospital County']=='Erie']# reduce dataset to hospitals in Erie County
print(set(Erie_county_visits['Facility Name']))# print them. Why is 'set' used in this line of code?
{'Kenmore Mercy Hospital', 'Mercy Hospital of Buffalo', 'Erie County Medical Center', 'Sisters of Charity Hospital', 'Bertrand Chaffee Hospital', 'Sisters of Charity Hospital - St Joseph Campus', 'Roswell Park Cancer Institute', 'Buffalo General Hospital', "Women And Children's Hospital Of Buffalo", 'Millard Fillmore Suburban Hospital'}

HW 5 - due Thursday May 3 by 11:59pm

Submit to UBLearns a 1-page PDF document that contains a figure related to the SPARCS dataset and a paragraph (3-5 sentences) explaining your figure. Do not submit code. It must fit on 1 page.

For example, one could do the following:

In [13]:
g = Erie_county_visits.groupby('Facility Name')
for name,item in g: 
    print(name)
Bertrand Chaffee Hospital
Buffalo General Hospital
Erie County Medical Center
Kenmore Mercy Hospital
Mercy Hospital of Buffalo
Millard Fillmore Suburban Hospital
Roswell Park Cancer Institute
Sisters of Charity Hospital
Sisters of Charity Hospital - St Joseph Campus
Women And Children's Hospital Of Buffalo

What are all the different diagnoses?

In [14]:
print(set(Erie_county_visits['CCS Diagnosis Description']))
{'CARDIAC ARREST & VF', 'OTHER BENIGN NEOPLASM', 'CYSTIC FIBROSIS', 'DEVELOPMENTAL DISORDERS', 'UMBILICAL CORD COMPL', 'OTHER BACTERIAL INFECTN', 'COMA/BRAIN DAMAGE', 'OPEN WOUND EXTREMITIES', 'THYROID DISORDER', 'CORONARY ATHEROSCLER', 'ABDOMINAL HERNIA', 'INTRACRANIAL INJURY', 'OTHER AFTERCARE', 'GASTRODUODENAL ULCER', 'EARLY LABOR', 'RECTAL/ANAL CANCER', 'NORML PREGNCY/DELIVRY', 'SKIN MELANOMAS', 'MENOPAUSAL DISORDER', 'OTHER KIDNEY DISEASE', 'OTHER URINARY CANCER', 'UNCLASSIFIED', 'PARALYSIS', 'OTH CONN TISSUE DISEASE', 'MULTIPLE MYELOMA', "PARKINSON'S DISEASE", 'BRAIN/NERV SYST CANCER', 'OTHER EYE DISORDER', 'BPHYPERPLASIA', 'RHEUMATOID ARTHRITIS', 'POLYHYDRAMNIOS ET AL', 'PULMNRY HEART DISEASE', 'PREG DIABETES/ABN GLUC', 'OTHR PREGNANCY COMPL', 'BILIARY TRACT DISEASE', 'CRUSH/INTERNAL INJURY', 'SECONDARY MALIGNANCY', 'FETAL DISTRESS', 'MALE GENITAL CANCER', 'GLAUCOMA', 'LUPUS', 'HYPERTENSION W/COMPL', 'MOOD DISORDERS', 'NONMALG BREAST DISORD', 'CHRONIC RENAL FAILURE', 'OTHR VEIN/LYMPH DISEASE', 'IMPULSE CNTRL DISORDRS', 'ARM FRACTURE', 'OTHR UPPR RESP DISEASE', 'MAL-POSITION/PRESNTATN', 'TUBERCULOSIS', 'INFLUENZA', 'BENIGN UTERIN NEOPLASM', 'DIVERTICUL-OSIS/ITIS', 'MOUTH DISORDER', 'OTH CONGENTL ANOMALY', 'PSYCHOTROPIC POISONING', 'UTERINE CANCER', 'BIRTH TRAUMA', 'ENDOMETRIOSIS', 'NON-EPITHELIAL CANCER', 'OTHR ENDOCRINE DISORDR', 'EPILEPSY/CONVULSIONS', 'OTHER CVD', 'ABDOMINAL PAIN', 'DIABETES W/O COMPL', 'MULTIPLE SCLEROSIS', 'SUPERFICL INJRY/CONTUSN', 'GI HEMORRHAGE', 'OTH COMP BIRTH/PUERPRM', 'THYROID CANCER', 'ACUTE POSTHMRG ANEMIA', 'ADMIN/SOCIAL ADMISSION', 'SCREEN/HIST MH/SA CODES', 'PNEUMONIA', 'GASTROENTRTS NONINFCT', 'HIV INFECTION', 'OTITIS MEDIA', 'CERVICAL CANCER', 'OTHR LOWR RESP DISEASE', 'ENTERITIS/ULCER COLITIS', 'ACQUIRD FOOT DEFORMITY', 'OTHER GI DISORDER', 'CHF', 'SCHIZ/OTH PSYCH DISORDR', 'GOUT/CRYSTAL ARTHRPTHY', 'COPD', 'OT FEMALE GENITL DISORD', 'PELVIC OBSTRUCTION', 'ENCEPHALITIS', 'JOINT INJURY', 'INFCTV ARTHRITS/OSTEOMY', 
'CONDUCTION DISORDER', 'OTHER BONE DISEASE', 'SPINAL CORD INJURY', 'INTESTINAL INFECTION', 'PREGNANCY HEMORRHAG', 'LIVER/BILE DUCT CANCER', 'PRECEREBRAL OCCLUSION', 'BACK PROBLEM', 'HYPERLIPIDEMIA', 'MALIGNNT NEOPLASM NOS', 'HEART VALVE DISORDER', 'OTHER FRACTURE', 'PERITONITIS', 'UNKNOWN ORIGIN FEVER', 'PERSONALITY DISORDERS', 'ANAL/RECTAL COND', 'MYCOSES', 'BRONCHITIS', 'CHEMO/RADIATN THERAPY', 'URINARY TRACT INFECTION', 'OTHER BLADDR DISEASE', 'CHILDHOOD DISORDERS', 'OTHR CIRCULATRY DISEASE', 'ARTRL EMBOLISM/THROMB', 'SEXUAL INFECTIONS', 'PLEURISY', 'OTHER LIVER DISEASE', 'OTHR ACQUIRD DEFORMITY', 'ANEURYSM', 'POLIO/OTHER CNS INFECT', 'SYNCOPE', 'GI CONGENITAL ANOMALY', 'RESP DISTRSS SYNDROME', 'LYMPHADENITIS', 'ASTHMA', 'OTHR UPPR RESP INFECTN', 'PHLEBITIS', 'OTHR GI/PERITONL CANCER', 'HIP FRACTURE', 'OB PERINEAL/VULV TRAUMA', 'HYPERTENSION', 'ADJUSTMENT DISORDERS', 'GASTRITIS/DUODENITIS', 'OTHER HEART DISEASE', 'BLADDER CANCER', 'FATIGUE/MALAISE', 'ESOPHAGEAL CANCER', 'LIVEBORN', 'HODGKINS DISEASE', 'PREVIOUS C-SECTION', 'OTHR COGNITIVE DISORDRS', 'FORCEPS DELIVERY', 'OTHER NEOPLASM NOS', 'SKIN/SUBCUT TISS INFECT', 'ECTOPIC PREGNANCY', 'ALLERGIC REACTION', 'ANXIETY DISORDERS', 'SURGCL/MED CARE COMPL', 'OTHER GU COND', 'VIRAL INFECTION', 'BLINDNESS', 'IMMUNITY DISORDER', 'COAG/HEMRRGE DISORDER', 'OTHER INJURY/COND', 'TEETH/JAW DISORDER', 'TESTICULAR CANCER', 'PANCREAS DISORDER', 'TIA', 'CARDITIS/CARDIOMYOPTHY', 'INTESTINAL OBSTRUCTION', 'BEHAVIOR DISORDERS', 'BURNS', 'PREGNANCY HYPERTENSN', 'CARDIO/CIRC CONG ANMLY', 'SKULL/FACE FRACTURE', 'BIRTH ASPHYXIA', 'HEAD/NECK CANCER', 'SHORT GEST/LOW BRTHWT', 'OTH HERED/DEG CNS COND', 'CHEST PAIN', 'OTHR NUTRITION DISORDER', 'NEPHRITIS', 'SHOCK', 'WHITE BLOODCELL DISEASE', 'PROLONGED PREGNANCY', 'POSTABORTION COMPL', 'PERIPH/VSCRL ATHRSCLRS', 'REHAB/PROSTH FIT/ADJUST', 'BREAST CANCER', 'OVARIAN CANCER', 'ASPIRATION PNEUMONITIS', 'KIDNEY/RENAL CANCER', 'OTHR HEMATOLOGIC COND', 'RETINAL PROBLEMS', 'EYE INFECTION', 
'DYSRHYTHMIA', 'LUNG DISEASE XTRN CAUSE', 'ANEMIA', 'BRONCHIAL/LUNG CANCER', 'SPRAIN/STRAIN', 'OTHR STOMACH DISORDER', 'FEMALE GENITAL CANCER', 'PANCREAS CANCER', 'OTHER PRIMARY CANCER', 'STOMACH CANCER', 'ACUTE RENAL FAILURE', 'NON-HODGKINS LYMPHOMA', 'OTHER SCREENING', 'TONSILLITIS', 'OTHER NERVE DISORDER', 'DIABETES W/COMPL', 'NUTRITIONAL DEFICIENCY', 'LEG FRACTURE', 'MENINGITIS', 'PROSTATE CANCER', 'OSTEOARTHRITIS', 'OTHR RESPIRATRY CANCER', 'HEPATITIS', 'FLUID/ELECTRLYT DISORDR', 'ALCOHOL-RELATD DISORDER', 'NERVS SYS CONG ANOMLY', 'DEVICE/IMPLNT/GRFT COMP', 'OTHER INFECTIONS', 'OTH MALE GENITL DISORDR', 'HEMMORHOIDS', 'CHRONIC SKIN ULCER', 'URINARY TRACT STONE', 'DIZZINESS/VERTIGO', 'MISC DISORDERS', 'VARICOSE VEIN', 'BONE/CONN TISSU CANCER', 'OTHER EAR DISORDER', 'ADULT RESPIRTRY FAILURE', 'ACUTE CVD', 'SICKLE CELL ANEMIA', 'ACUTE MYOCARDL INFARCT', 'GANGRENE', 'OPEN WOUND HEAD/TRUNK', 'NAUSEA/VOMITING', 'OTHER PERINATAL COND', 'ESOPHGEAL DISORDER', 'OTHER JOINT DISORDER', 'LEUKEMIAS', 'OVARIAN CYST', 'GU CONGENITAL ANOMALY', 'OTHR INFLAMM SKIN COND', 'PERINATAL JAUNDICE', 'LATE EFFECT CVD', 'MEDICAL EXAM/EVALUATN', 'APPENDICITIS', 'MENSTRUAL DISORDER', 'MALE GENITL INFLAMMTN', 'COLON CANCER', 'HEADACHE/MIGRAINE', 'PID', 'SUBSTANCE-RLTD DISORDER', 'FEMALE GENITL PROLAPSE', 'OTH MEDS/DRUG POISONG', 'SEPTICEMIA', 'PATHOLOGICAL FX', 'NONMEDICINAL POISONING', 'OTHER SKIN DISORDER'}

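How many diagnoses were recorded at each hospital? (Note that count() tallies one diagnosis entry per visit, so these are visit counts, not the number of distinct diagnoses.)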

In [15]:
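# g is the groupby object from the previous cell (Erie County visits grouped by hospital)
names = []
diagnoses_counts = []
for name, item in g:
    names.append(name)
    # count() returns the number of non-null diagnosis entries, i.e. one per visit
    diagnoses_counts.append(item['CCS Diagnosis Description'].count())
print(names)
print(diagnoses_counts)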
['Bertrand Chaffee Hospital', 'Buffalo General Hospital', 'Erie County Medical Center', 'Kenmore Mercy Hospital', 'Mercy Hospital of Buffalo', 'Millard Fillmore Suburban Hospital', 'Roswell Park Cancer Institute', 'Sisters of Charity Hospital', 'Sisters of Charity Hospital - St Joseph Campus', "Women And Children's Hospital Of Buffalo"]
[802, 21928, 16669, 7713, 21186, 16685, 4466, 15153, 3957, 12742]
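The numbers above are the number of visits with a recorded diagnosis at each hospital. To measure how many different diagnoses each hospital saw, nunique() can be used in place of count(). A minimal sketch on a made-up toy table (hospitals 'A' and 'B' are placeholders):

```python
import pandas as pd

# toy data (made up) with the same column names as the SPARCS table
toy = pd.DataFrame({
    'Facility Name': ['A', 'A', 'A', 'B', 'B'],
    'CCS Diagnosis Description': ['ASTHMA', 'ASTHMA', 'CHF', 'CHF', 'CHF'],
})

by_hospital = toy.groupby('Facility Name')['CCS Diagnosis Description']
visits = by_hospital.count()      # non-null entries per hospital (one per visit)
distinct = by_hospital.nunique()  # number of different diagnoses per hospital
print(visits)
print(distinct)
```

Hospital A has 3 visits but only 2 distinct diagnoses, which is the kind of gap that separates "volume" from "diversity".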
In [16]:
%pylab inline

bar(np.arange(len(names)), diagnoses_counts, align='center', alpha=0.5)
xticks(np.arange(len(names)), [name[0:15] for name in names],rotation=90)
ylabel('number of diagnoses')
title('Diversity of Diagnoses in Erie County')
Populating the interactive namespace from numpy and matplotlib
Out[16]:
Text(0.5,1,'Diversity of Diagnoses in Erie County')

value counts

In [17]:
s['Patient Disposition'].value_counts()
Out[17]:
Home or Self Care                        1581401
Home w/ Home Health Services              317027
Skilled Nursing Home                      223621
Expired                                    50913
Left Against Medical Advice                47012
Inpatient Rehabilitation Facility          41524
Short-term Hospital                        40817
Psychiatric Hospital or Unit of Hosp       12565
Hospice - Medical Facility                 12513
Hospice - Home                             10540
Another Type Not Listed                     8441
Facility w/ Custodial/Supportive Care       6268
Medicare Cert Long Term Care Hospital       3789
Court/Law Enforcement                       3731
Cancer Center or Children's Hospital        2504
Hosp Basd Medicare Approved Swing Bed       1544
Federal Health Care Facility                 725
Medicaid Cert Nursing Facility               152
Critical Access Hospital                     121
Name: Patient Disposition, dtype: int64
In [18]:
s['CCS Diagnosis Description'].value_counts()
Out[18]:
LIVEBORN                   231293
SEPTICEMIA                  96517
OSTEOARTHRITIS              59366
MOOD DISORDERS              58624
CHF                         56848
PNEUMONIA                   46113
SCHIZ/OTH PSYCH DISORDR     43908
DYSRHYTHMIA                 43628
ALCOHOL-RELATD DISORDER     41824
OTH COMP BIRTH/PUERPRM      39211
DEVICE/IMPLNT/GRFT COMP     38947
SKIN/SUBCUT TISS INFECT     38645
SUBSTANCE-RLTD DISORDER     36649
DIABETES W/COMPL            35084
ACUTE CVD                   34874
ASTHMA                      34609
BACK PROBLEM                34541
ACUTE MYOCARDL INFARCT      33357
OTHR PREGNANCY COMPL        32975
COPD                        32330
PROLONGED PREGNANCY         32292
CORONARY ATHEROSCLER        31813
CHEST PAIN                  31438
REHAB/PROSTH FIT/ADJUST     30756
SURGCL/MED CARE COMPL       30731
URINARY TRACT INFECTION     30244
PREVIOUS C-SECTION          28651
OB PERINEAL/VULV TRAUMA     26319
FLUID/ELECTRLYT DISORDR     26041
EPILEPSY/CONVULSIONS        25750
                            ...  
MENOPAUSAL DISORDER           350
SEXUAL INFECTIONS             348
SHORT GEST/LOW BRTHWT         347
RETINAL PROBLEMS              339
ACQUIRD FOOT DEFORMITY        331
OTHER URINARY CANCER          292
OTHR HEMATOLOGIC COND         286
MEDICAL EXAM/EVALUATN         268
SKIN MELANOMAS                235
POSTABORTION COMPL            233
RESP DISTRSS SYNDROME         231
VARICOSE VEIN                 225
OTHR RESPIRATRY CANCER        192
DEVELOPMENTAL DISORDERS       167
FORCEPS DELIVERY              156
ADMIN/SOCIAL ADMISSION        149
IMMUNITY DISORDER             140
TESTICULAR CANCER             130
IMMUNIZATION/SCREENING         84
SHOCK                          84
BIRTH ASPHYXIA                 79
SUICIDE/SELF-INFLCT INJ        77
GLAUCOMA                       69
MALE GENITAL CANCER            66
HYPERLIPIDEMIA                 59
BIRTH TRAUMA                   33
CONTRCPTV/PROCRTV MGT          26
CATARACT                       26
OSTEOPOROSIS                   21
FEMALE INFERTILITY             10
Name: CCS Diagnosis Description, Length: 262, dtype: int64

str.contains

In [19]:
sc = s[ s['CCS Diagnosis Description'].str.contains('CANCER') ]
sc['CCS Diagnosis Description'].value_counts()
Out[19]:
BRONCHIAL/LUNG CANCER      9447
COLON CANCER               5902
BREAST CANCER              5736
PROSTATE CANCER            4334
KIDNEY/RENAL CANCER        3322
PANCREAS CANCER            2859
BRAIN/NERV SYST CANCER     2703
RECTAL/ANAL CANCER         2700
UTERINE CANCER             2426
BLADDER CANCER             2384
HEAD/NECK CANCER           2368
LIVER/BILE DUCT CANCER     2167
STOMACH CANCER             2019
THYROID CANCER             1717
OTHR GI/PERITONL CANCER    1629
OVARIAN CANCER             1618
BONE/CONN TISSU CANCER     1226
ESOPHAGEAL CANCER           879
CERVICAL CANCER             838
OTHER PRIMARY CANCER        607
NON-EPITHELIAL CANCER       474
FEMALE GENITAL CANCER       418
OTHER URINARY CANCER        292
OTHR RESPIRATRY CANCER      192
TESTICULAR CANCER           130
MALE GENITAL CANCER          66
Name: CCS Diagnosis Description, dtype: int64
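Two options of str.contains are worth knowing: case controls whether the match is case sensitive, and na sets the result for missing entries. Without na=False, a missing value produces NaN in the mask, which cannot be used for boolean indexing. A sketch on a made-up Series that includes a missing value and a lower-case entry:

```python
import numpy as np
import pandas as pd

# made-up sample: one missing value, one entry in mixed case
diag = pd.Series(['COLON CANCER', 'ASTHMA', np.nan, 'Breast cancer'])

# case=False ignores upper/lower case; na=False treats missing entries as "no match"
mask = diag.str.contains('CANCER', case=False, na=False)
matches = diag[mask].tolist()
print(matches)
```

Keep in mind that a substring search only finds diagnoses whose description contains the word; entries such as LEUKEMIAS or MULTIPLE MYELOMA in the list above would not be picked up by searching for 'CANCER'.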
In [20]:
g = s.groupby('Hospital County')
for name,group in g:
    print(name,len(group))
Albany 63461
Allegany 2052
Bronx 183847
Broome 28213
Cattaraugus 5262
Cayuga 4984
Chautauqua 11132
Chemung 15803
Chenango 1761
Clinton 9868
Columbia 5528
Cortland 3745
Delaware 762
Dutchess 30932
Erie 121301
Essex 410
Franklin 4710
Fulton 3076
Genesee 4412
Herkimer 748
Jefferson 11092
Kings 258712
Lewis 1641
Livingston 2023
Madison 4729
Manhattan 412188
Monroe 104802
Montgomery 6500
Nassau 180741
Niagara 17400
Oneida 29735
Onondaga 76495
Ontario 11487
Orange 39572
Orleans 1372
Oswego 5092
Otsego 12449
Putnam 6650
Queens 199374
Rensselaer 12476
Richmond 58876
Rockland 32211
Saratoga 9167
Schenectady 21963
Schoharie 562
Schuyler 651
St Lawrence 10705
Steuben 6438
Suffolk 159765
Sullivan 4442
Tompkins 7226
Ulster 11537
Warren 13939
Wayne 5315
Westchester 119384
Wyoming 2256
Yates 759
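The loop above prints the number of rows in each group. The same counts can be obtained without a loop using size(), which returns a Series that can then be sorted. A minimal sketch on made-up data:

```python
import pandas as pd

# toy data (made up): one row per visit
toy = pd.DataFrame({'Hospital County': ['Erie', 'Erie', 'Kings', 'Erie', 'Kings', 'Yates']})

# number of rows in each group, largest first
counts = toy.groupby('Hospital County').size().sort_values(ascending=False)
print(counts)
```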
In [21]:
g.agg({'Total Charges':'median'}).sort_values('Total Charges',ascending=False)
Out[21]:
Total Charges
Hospital County
Nassau 33457.220
Manhattan 31191.865
Suffolk 31143.000
Orange 27844.385
Rockland 26242.630
Putnam 25818.580
Albany 24941.200
Dutchess 24442.405
Westchester 22476.255
Richmond 21551.195
Saratoga 21055.910
Kings 20466.985
Bronx 20387.020
Queens 20376.820
Clinton 19832.975
Onondaga 18779.020
Sullivan 18411.120
Ulster 18330.850
Fulton 16380.865
Warren 15494.660
Erie 15248.480
Schenectady 14969.120
Oneida 14747.810
Chenango 14201.500
Broome 14152.400
Rensselaer 13882.760
Madison 13774.910
Monroe 13732.395
Chemung 12829.090
Columbia 12700.515
Montgomery 12375.345
Otsego 12328.140
Cayuga 12146.000
Franklin 11561.195
Jefferson 11143.970
Wayne 10740.780
Genesee 10601.320
Ontario 10369.530
Oswego 10178.085
Steuben 9896.750
Niagara 9362.760
Schoharie 9193.440
St Lawrence 9022.900
Cattaraugus 8897.075
Orleans 8696.010
Delaware 8687.275
Herkimer 8466.495
Yates 8105.930
Tompkins 8004.660
Livingston 7820.470
Essex 7799.880
Chautauqua 7425.375
Cortland 7211.710
Lewis 7115.270
Allegany 6864.775
Wyoming 6592.915
Schuyler 5909.140
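A median by itself hides how many visits it is based on, and the counties above range from a few hundred visits to several hundred thousand. One option is to ask agg for several statistics at once. A sketch on a made-up toy table:

```python
import pandas as pd

# toy data (made up) with the same column names as the SPARCS table
toy = pd.DataFrame({
    'Hospital County': ['Erie', 'Erie', 'Erie', 'Yates', 'Yates'],
    'Total Charges': [100.0, 300.0, 200.0, 50.0, 70.0],
})

# one row per county, one column per statistic, sorted by the median
summary = (toy.groupby('Hospital County')['Total Charges']
              .agg(['median', 'mean', 'count'])
              .sort_values('median', ascending=False))
print(summary)
```

Comparing the median and mean columns also gives a quick hint about skew: a mean far above the median suggests a few very expensive visits.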