JSON

JSON (JavaScript Object Notation) is a lightweight data-interchange format that is useful for non-tabular data. It is easy for humans to read and write, and easy for machines to parse and generate. It is largely self-documenting, because every value in an object is labeled by a key.

It is useful to install a JSON-formatting browser plugin, such as "JSONView", "JSON Viewer", or "JSON Formatter".

Supplementary reference: Jennifer Widom database lectures.

Examples of use: Browser data, NHTSA Complaints database, and ... Jupyter Notebooks.

JSON types correspond closely to Python data structures:

  • object = dictionary (string:value)
  • array = list
  • value = string/number/true/false/null/object/array (in Python: str/int or float/True/False/None/dict/list)
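A quick way to see this correspondence is to decode a small JSON string and look at the Python types that come back. The JSON text below is a made-up example that contains every kind of JSON value:

```python
import json

# A made-up JSON document using every kind of JSON value
text = ('{"name": "Ada", "age": 36, "height": 1.65, '
        '"tags": ["math", "code"], "active": true, "manager": null}')

d = json.loads(text)
print(type(d).__name__)  # dict

# Show the Python type that each JSON value was decoded into
for key, value in d.items():
    print(key, type(value).__name__, value)
# prints:
# name str Ada
# age int 36
# height float 1.65
# tags list ['math', 'code']
# active bool True
# manager NoneType None
```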

Python's built-in json module implements encoding with 'dumps' (Python object to JSON string) and decoding with 'loads' (JSON string to Python object).
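A minimal round trip through 'dumps' and 'loads' looks like this; the dictionary is made-up example data. The optional indent argument produces a multi-line layout like the notebook text in Exercise 1.

```python
import json

# Made-up example data built only from types JSON can represent
record = {"course": "MTH448", "week": 7, "topics": ["json", "requests"], "graded": False}

# Encoding: Python object -> JSON string (note False becomes false)
s = json.dumps(record)
print(s)  # {"course": "MTH448", "week": 7, "topics": ["json", "requests"], "graded": false}

# Decoding: JSON string -> Python object
back = json.loads(s)
print(back == record)  # True

# indent= gives a human-readable, multi-line layout
print(json.dumps(record, indent=1))
```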

Cautions:

  • JSON keys are always strings (Python dictionary keys need not be)
  • JSON text is almost pasteable as Python code, but JSON "true/false/null" map to Python "True/False/None"
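Both cautions can be seen directly. In this small sketch, a dictionary with integer keys does not survive a round trip unchanged, and the JSON literals are translated into their Python counterparts:

```python
import json

# Integer keys are silently converted to strings when encoding...
encoded = json.dumps({1: "one", 2: "two"})
print(encoded)  # {"1": "one", "2": "two"}

# ...so decoding does not give back the original dictionary
decoded = json.loads(encoded)
print(decoded)                          # {'1': 'one', '2': 'two'}
print(decoded == {1: "one", 2: "two"})  # False

# JSON literals true/false/null become Python True/False/None
print(json.loads('[true, false, null]'))  # [True, False, None]
```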

Exercise 1: load a Jupyter notebook

In [5]:
import requests
url = "http://www.acsu.buffalo.edu/~danet/Sp18/MTH448/class7/class7_files/jupyter_example.ipynb"
s = requests.get(url).text
s[0:500]
Out[5]:
'{\n "cells": [\n  {\n   "cell_type": "markdown",\n   "metadata": {},\n   "source": [\n    "# Simple Jupyter Notebook\\n",\n    "\\n",\n    "This is some text in a markdown cell"\n   ]\n  },\n  {\n   "cell_type": "code",\n   "execution_count": 6,\n   "metadata": {},\n   "outputs": [\n    {\n     "data": {\n      "text/plain": [\n       "[\'just\']"\n      ]\n     },\n     "execution_count": 6,\n     "metadata": {},\n     "output_type": "execute_result"\n    }\n   ],\n   "source": [\n    "#Here is some python code\\n",\n    "impor'
In [6]:
import json
d = json.loads(s)
In [8]:
len(d)
Out[8]:
4
In [10]:
type(d)
Out[10]:
dict
In [11]:
for k in d: 
    print(k)
cells
metadata
nbformat
nbformat_minor
In [13]:
d['metadata']
Out[13]:
{'kernelspec': {'display_name': 'Python 3',
  'language': 'python',
  'name': 'python3'},
 'language_info': {'codemirror_mode': {'name': 'ipython', 'version': 3},
  'file_extension': '.py',
  'mimetype': 'text/x-python',
  'name': 'python',
  'nbconvert_exporter': 'python',
  'pygments_lexer': 'ipython3',
  'version': '3.6.3'}}
In [14]:
d['nbformat']
Out[14]:
4
In [15]:
d['nbformat_minor']
Out[15]:
2
In [16]:
d['cells']
Out[16]:
[{'cell_type': 'markdown',
  'metadata': {},
  'source': ['# Simple Jupyter Notebook\n',
   '\n',
   'This is some text in a markdown cell']},
 {'cell_type': 'code',
  'execution_count': 6,
  'metadata': {},
  'outputs': [{'data': {'text/plain': ["['just']"]},
    'execution_count': 6,
    'metadata': {},
    'output_type': 'execute_result'}],
  'source': ['#Here is some python code\n',
   'import re \n',
   '\n',
   "my_string = 'This is just a string'\n",
   "myreg = 'just'\n",
   'matches = re.findall(myreg,my_string) \n',
   'matches']},
 {'cell_type': 'code',
  'execution_count': None,
  'metadata': {'collapsed': True},
  'outputs': [],
  'source': []}]

All of the notebook's content is stored under the key 'cells'; the other keys hold metadata and the notebook format version.
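Once the notebook is a dictionary, ordinary dictionary and list operations are enough to work with its cells. The sketch below builds a tiny, made-up notebook string with the same layout as the downloaded one, so it runs without a network connection; with the real notebook, the dictionary d from above could be used instead.

```python
import json
from collections import Counter

# A tiny, made-up notebook with the same top-level layout as jupyter_example.ipynb
s = json.dumps({
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title\n", "Some text"]},
        {"cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [],
         "source": ["x = 1\n", "print(x)"]},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 2,
})

d = json.loads(s)

# Count how many cells of each type the notebook contains
counts = Counter(cell["cell_type"] for cell in d["cells"])
print(dict(counts))  # {'markdown': 1, 'code': 1}

# Each cell's 'source' is a list of lines; joining them recovers the cell text
for cell in d["cells"]:
    print("----", cell["cell_type"])
    print("".join(cell["source"]))
```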

Homework 1: Due Saturday Feb 25 at midnight

Write a function, saved in a file ipynb_to_py.py, that uses json to convert a Jupyter notebook ('.ipynb' file) into an executable Python script ('.py' file).

It should include all code in the code cells, and it should include all text in the markdown cells as commented text (lines starting with #).

Submit your code as an .ipynb file to UBLearns. I will test your code to make sure it works, that is, that it converts any .ipynb file into a working, executable .py Python script.

Below is a start...

In [64]:
import json

# open the .ipynb file and read it in as one long string
# (the with-block closes the file automatically)
jupyter_notebook_filename = 'jupyter_example.ipynb'
with open(jupyter_notebook_filename) as file_object:
    s = file_object.read()

# decode the string into a dictionary using json
d = json.loads(s)

# print the type of each cell and the lines in it
cells = d['cells']
for cell in cells:
    print('type of cell = ' + cell['cell_type'])

    lines_in_cell = cell['source']
    for line in lines_in_cell:
        print(line)

# step 1: open a new file_object whose name replaces .ipynb with .py

# step 2: write the lines of the cells into the .py file
#         start the line with # if the cell type is markdown
#         otherwise just write the line

# close the new file_object
type of cell = markdown
# Simple Jupyter Notebook



This is some text in a markdown cell
type of cell = code
#Here is some python code

import re 



my_string = 'This is just a string'

myreg = 'just'

matches = re.findall(myreg,my_string) 

matches
type of cell = code
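Two small pieces that steps 1 and 2 need can be sketched as follows, assuming os.path.splitext for the filename and a '# ' prefix for markdown lines. The helper names py_filename and as_comments are hypothetical, and this is not a full solution.

```python
import os

def py_filename(ipynb_filename):
    """Hypothetical helper: 'notes.ipynb' -> 'notes.py'."""
    root, _ext = os.path.splitext(ipynb_filename)
    return root + ".py"

def as_comments(lines):
    """Hypothetical helper: prefix each markdown line with '# ' so Python ignores it."""
    return ["# " + line for line in lines]

print(py_filename("jupyter_example.ipynb"))  # jupyter_example.py
print(as_comments(["# Simple Jupyter Notebook\n", "\n", "This is some text in a markdown cell"]))
```

Note that the last line of each cell's 'source' has no trailing newline (see the output above), so a newline should be written after each cell when building the .py file.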