XML (eXtensible Markup Language)

What is XML? Wikipedia says: Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format which is both human-readable and machine-readable.
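Here is a minimal sketch of what "human-readable and machine-readable" means in practice. It uses Python's standard-library `xml.etree.ElementTree` rather than `lxml`, which the cells below use, only so it runs without a third-party install; the two share most of this API. The `<trip>` document is made up for illustration.

```python
import xml.etree.ElementTree as ET

# A small, made-up XML document: elements nest, and each element
# can carry attributes (key="value") and text content.
text = """<trip name="demo">
  <point lat="49.9" lon="-97.1">start</point>
  <point lat="49.8" lon="-97.2">end</point>
</trip>"""

root = ET.fromstring(text)      # parse the string into an element tree
print(root.tag, root.attrib)    # the root element's name and attribute dict
for pt in root:                 # iterating an element yields its direct children
    print(pt.text, float(pt.get('lat')), float(pt.get('lon')))
```

A person can read the raw text directly, and a program can walk the same structure by tag, attribute, and text.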

Examples of XML standards

GPX is an XML format for exchanging GPS data

Example: Winnipeg.gpx, recorded using MyTracks on a cell phone
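The sketch below pulls track points out of a GPX document. It assumes GPX 1.1, whose elements all live in the `http://www.topografix.com/GPX/1/1` namespace, with points stored as `<trkpt lat=".." lon="..">` inside `<trk>/<trkseg>`. The track is a tiny hypothetical example, not the contents of Winnipeg.gpx, and the standard-library `ElementTree` stands in for `lxml`.

```python
import xml.etree.ElementTree as ET

# GPX 1.1 puts every element in this namespace; map it to a short prefix
NS = {'gpx': 'http://www.topografix.com/GPX/1/1'}

# Hypothetical, heavily trimmed track (NOT the real Winnipeg.gpx data)
gpx_text = """<?xml version="1.0" encoding="UTF-8"?>
<gpx version="1.1" creator="example" xmlns="http://www.topografix.com/GPX/1/1">
  <trk><trkseg>
    <trkpt lat="49.8951" lon="-97.1384"><ele>232.0</ele></trkpt>
    <trkpt lat="49.8960" lon="-97.1370"><ele>233.5</ele></trkpt>
  </trkseg></trk>
</gpx>"""

# Pass bytes so the parser honours the encoding declaration itself
root = ET.fromstring(gpx_text.encode('utf-8'))

# Every <trkpt> anywhere in the document: (latitude, longitude, elevation)
points = [
    (float(p.get('lat')), float(p.get('lon')),
     float(p.findtext('gpx:ele', namespaces=NS)))
    for p in root.iterfind('.//gpx:trkpt', NS)
]
print(points)
```

Because the default namespace applies to every element, a bare path such as `'.//trkpt'` would match nothing; the `gpx:` prefix is required.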

KML, for plotting things on Google Earth (Example: NCEDC earthquakes)
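A minimal sketch of reading point placemarks from KML, assuming the KML 2.2 namespace (`http://www.opengis.net/kml/2.2`) and `<Placemark>` elements that hold a `<name>` and a `<Point><coordinates>` child. The two placemarks are made up and are not NCEDC data.

```python
import xml.etree.ElementTree as ET

NS = {'kml': 'http://www.opengis.net/kml/2.2'}

# Made-up placemarks, shaped like a simple earthquake feed might be
kml_text = """<kml xmlns="http://www.opengis.net/kml/2.2"><Document>
  <Placemark><name>M 3.1 (example)</name>
    <Point><coordinates>-122.80,38.82,0</coordinates></Point></Placemark>
  <Placemark><name>M 2.4 (example)</name>
    <Point><coordinates>-121.50,36.60,0</coordinates></Point></Placemark>
</Document></kml>"""

root = ET.fromstring(kml_text)
quakes = []
for pm in root.iterfind('.//kml:Placemark', NS):
    name = pm.findtext('kml:name', namespaces=NS)
    coords = pm.findtext('.//kml:coordinates', namespaces=NS).strip()
    # KML orders a coordinate tuple as longitude,latitude[,altitude]
    lon, lat, *_ = (float(v) for v in coords.split(','))
    quakes.append((name, lat, lon))
print(quakes)
```

Note the longitude-first order inside `<coordinates>`; swapping it is an easy way to land a point in the wrong hemisphere.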

SVG for vector graphics (Example: simple.svg)

In [1]:
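from lxml import etree

# Read bytes, not str: lxml rejects a str that begins with an
# <?xml ... encoding="..."?> declaration
with open('simple.svg', 'rb') as f:
    doc = etree.fromstring( f.read() )

# Iterating over an element yields its direct children
for item in doc:
    print( item )
    print('\t',item.attrib )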
<Element {http://www.w3.org/2000/svg}circle at 0x1058c65c8>
	 {'cx': '50', 'cy': '50', 'r': '40', 'stroke': 'gray', 'stroke-width': '8', 'fill': '#77cc77'}
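The tag printed above has the form `{namespace}localname` because SVG elements live in the `http://www.w3.org/2000/svg` namespace. The sketch below rebuilds a stand-in for simple.svg from the printed attributes (the real file may contain more) and shows how to strip the namespace and how to search with it. It uses the standard-library `ElementTree`; in `lxml`, `etree.QName(item).localname` gives the local name directly.

```python
import xml.etree.ElementTree as ET

SVG_NS = 'http://www.w3.org/2000/svg'

# Stand-in for simple.svg, rebuilt from the attributes printed above
svg_text = f"""<svg xmlns="{SVG_NS}">
  <circle cx="50" cy="50" r="40" stroke="gray" stroke-width="8" fill="#77cc77"/>
</svg>"""

root = ET.fromstring(svg_text)
circle = root[0]
print(circle.tag)                      # {http://www.w3.org/2000/svg}circle
local = circle.tag.rpartition('}')[2]  # drop the "{namespace}" prefix
print(local)

# Searching needs the namespace too: a bare 'circle' path finds nothing
print(root.find('circle'))
print(root.find('svg:circle', {'svg': SVG_NS}).get('fill'))
```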

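And of course HTML, although strictly speaking only XHTML is XML; ordinary HTML is a close cousin whose parsers accept markup that an XML parser rejects.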
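A quick way to see the difference between HTML and XML: an unclosed `<br>` is fine in HTML, but a strict XML parser treats it as broken nesting. This sketch uses the standard-library `ElementTree`; for real-world HTML, `lxml.html.fromstring` is the more forgiving choice.

```python
import xml.etree.ElementTree as ET

html_snippet = "<p>first line<br>second line</p>"    # valid HTML, not well-formed XML
xhtml_snippet = "<p>first line<br/>second line</p>"  # XHTML-style self-closing tag

try:
    ET.fromstring(html_snippet)
    html_ok = True
except ET.ParseError as err:  # the unclosed <br> makes </p> a mismatched tag
    html_ok = False
    print("XML parser rejected the HTML:", err)

p = ET.fromstring(xhtml_snippet)
print(p.tag, p[0].tag, repr(p[0].tail))  # .tail is the text that follows <br/>
```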

Many sources also provide data in their own ad-hoc XML formats, for example real-time Chicago bus information.

In [2]:
# Read bytes so that lxml can honour any encoding declaration in the file
with open('getBusesForRoute.xml', 'rb') as f:
    doc = etree.fromstring( f.read() )
In [3]:
type(doc)
Out[3]:
lxml.etree._Element
In [4]:
for item in doc:
    print( type(item) )
<class 'lxml.etree._Element'>
<class 'lxml.etree._Element'>
<class 'lxml.etree._Element'>
<class 'lxml.etree._Element'>
<class 'lxml.etree._Element'>
<class 'lxml.etree._Element'>
<class 'lxml.etree._Element'>
<class 'lxml.etree._Element'>
<class 'lxml.etree._Element'>
<class 'lxml.etree._Element'>
<class 'lxml.etree._Element'>
<class 'lxml.etree._Element'>
<class 'lxml.etree._Element'>
<class 'lxml.etree._Element'>
<class 'lxml.etree._Element'>
<class 'lxml.etree._Element'>
<class 'lxml.etree._Element'>
<class 'lxml.etree._Element'>
<class 'lxml.etree._Element'>
<class 'lxml.etree._Element'>
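Printing every child's type says only that each one is an element. A tally of child tags is often more informative. The sketch below assumes a hypothetical, trimmed stand-in for getBusesForRoute.xml: the root tag `buses` and all values are made up, and only the child tag names follow the listing in this notebook. It uses the standard-library `ElementTree`, but the same loop works on the `lxml` `doc` above.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical stand-in for the bus feed: root tag and values are invented
feed = """<buses>
  <time>12:00 PM</time>
  <bus><id>1001</id><rt>22</rt><lat>41.90</lat><lon>-87.63</lon></bus>
  <bus><id>1002</id><rt>22</rt><lat>41.95</lat><lon>-87.65</lon></bus>
</buses>"""

root = ET.fromstring(feed)
counts = Counter(child.tag for child in root)  # tally direct children by tag name
print(len(root), counts)
```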
In [5]:
for item in doc[:4]:
    print( item.tag, item.attrib )
    for jtem in item:
        print('\t',jtem.tag)
time {}
bus {}
	 id
	 rt
	 rtdd
	 d
	 dd
	 dn
	 lat
	 lon
	 pid
	 pd
	 run
	 fs
	 op
	 dip
	 bid
	 wid1
	 wid2
bus {}
	 id
	 rt
	 rtdd
	 d
	 dd
	 dn
	 lat
	 lon
	 pid
	 pd
	 run
	 fs
	 op
	 dip
	 bid
	 wid1
	 wid2
bus {}
	 id
	 rt
	 rtdd
	 d
	 dd
	 dn
	 lat
	 lon
	 pid
	 pd
	 run
	 fs
	 op
	 dip
	 bid
	 wid1
	 wid2
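Once the structure is known, each `<bus>` can be flattened into a dictionary that maps child tag to child text. The sketch assumes the same kind of hypothetical stand-in feed (invented root tag and values) and treats `lat`/`lon` as decimal-degree coordinates, as their names suggest. The other fields are kept as plain strings because this notebook does not define them.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the bus feed: root tag and values are invented
feed = """<buses>
  <time>12:00 PM</time>
  <bus><id>1001</id><rt>22</rt><lat>41.90</lat><lon>-87.63</lon></bus>
  <bus><id>1002</id><rt>22</rt><lat>41.95</lat><lon>-87.65</lon></bus>
</buses>"""

root = ET.fromstring(feed)

# One dict per <bus>: child tag -> child text (every value arrives as a string)
buses = [{field.tag: field.text for field in bus} for bus in root.iterfind('bus')]

# Convert the coordinates so they can be plotted or compared numerically
positions = [(b['id'], float(b['lat']), float(b['lon'])) for b in buses]
print(positions)
```

If some records might lack a field, `bus.findtext('lat')` returns `None` instead of raising a `KeyError`, which makes missing values easier to handle.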