XML Schemas and validation of XML against them

Download my example XML document (myxmls.xml) and its schema (myxmls.xsd).

Below is the XML data document:

<!-- THIS IS MY FIRST HAND-WRITTEN XML DOCUMENT WITH A SCHEMA -->

<mycourses
      xmlns="http://www.acsu.buffalo.edu/~danet/Sp18/MTH448/class11/class11_files"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.acsu.buffalo.edu/~danet/Sp18/MTH448/class11/class11_files myxmls.xsd"
      version="0.0" >

      <course>
              <number>MTH 463</number>
              <name>Data-Oriented Computing</name>
              <semester>201801</semester>
      </course>

      <course>
              <number>MTH 463</number>
              <name>Data-Oriented Computing</name>
              <semester>201801</semester>
      </course>

      <course>
              <number>MTH 649</number>
              <name>Partial Differential Equations</name>
              <semester>201809</semester>
      </course>

</mycourses>

Syntactic validation (i.e., checking that the XML is well-formed) can be done with Firefox, some text editors, or xmllint.
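The same well-formedness check can be sketched with Python's standard-library ElementTree parser. This is a minimal illustration (the function name is_well_formed and the sample strings are made up for this example), and it checks syntax only, never a schema:

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if `text` parses as XML. Syntax only: no schema is consulted."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

good = "<course><number>MTH 463</number></course>"
bad_end_tag = "<course><number>MTH 463</nombre></course>"  # end tag does not match start tag
missing_lt = "!-- comment --><course/>"                    # comment is missing its leading '<'

print(is_well_formed(good))         # True
print(is_well_formed(bad_end_tag))  # False
print(is_well_formed(missing_lt))   # False
```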

Semantic validation (checking the document against the rules of a schema): xmllint, from the libxml2-utils package

Imposing semantic rules with the XML Schema: myxmls.xsd

My schema document:

<?xml version="1.0"?>
<!-- THIS IS ALMOST MY FIRST XML SCHEMA -->

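<!--  "http://www.w3.org/2001/XMLSchema" is a magic phrase, like "Open, Sesame".
              No connection is made to that website; the string is the fixed name of the
              XML Schema vocabulary and must be written exactly as shown. Only the prefix
              bound to it is your choice (xs and xsd are both common).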

              xmlns:xs="http://www.w3.org/2001/XMLSchema"
              is grammatically similar to a Python statement like
              import foo as xs

              targetNamespace="http://www.acsu.buffalo.edu/~danet/Sp18/MTH448/class11/class11_files"
              names the namespace this schema defines. Namespace names must be URIs, and
              using a URL under a domain you control is the usual way to keep them unique.

              elementFormDefault="qualified" makes instance documents use namespace-qualified
              names for every element declared here, including the locally declared ones
              (course, number, name, semester), not just the top-level mycourses.

-->

<xs:schema
      xmlns:xs           ="http://www.w3.org/2001/XMLSchema"
      targetNamespace    ="http://www.acsu.buffalo.edu/~danet/Sp18/MTH448/class11/class11_files"
      elementFormDefault ="qualified">

      <xs:element name="mycourses">
              <xs:complexType>
                      <xs:sequence>
                              <xs:element name="course" maxOccurs="unbounded">
                                      <xs:complexType>
                                              <xs:all>
                                                      <xs:element name="number"   type="xs:string" />
                                                      <xs:element name="name"     type="xs:string" />
                                                      <xs:element name="semester" type="xs:integer" />
                                              </xs:all>
                                      </xs:complexType>
                              </xs:element>
                      </xs:sequence>
                      <xs:attribute name="version" type="xs:decimal" use="required" />
              </xs:complexType>
      </xs:element>


</xs:schema>
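To make the schema's rules concrete, here is a rough hand-written Python paraphrase of what myxmls.xsd requires. It is only a sketch for illustration: check_courses and the sample document are hypothetical, the regular expressions approximate the lexical forms of xs:integer and xs:decimal, and it skips checks a real validator performs (for example, stray text between elements). For actual validation, use xmllint or xmlschema.

```python
import re
import xml.etree.ElementTree as ET

# targetNamespace of myxmls.xsd; ElementTree writes qualified names as {uri}local
NS = "http://www.acsu.buffalo.edu/~danet/Sp18/MTH448/class11/class11_files"
Q = "{" + NS + "}"

# Approximate lexical forms of the built-in types used by the schema.
# Values are stripped first, roughly like XSD's whiteSpace="collapse".
INTEGER = re.compile(r"[+-]?[0-9]+")
DECIMAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")
COURSE_CHILDREN = sorted(Q + name for name in ("number", "name", "semester"))

def check_courses(text):
    """Hand-check the rules of myxmls.xsd; return a list of problems (empty = OK)."""
    root = ET.fromstring(text)
    if root.tag != Q + "mycourses":
        return [f"unexpected root element {root.tag}"]
    problems = []
    version = root.get("version")  # use="required", type="xs:decimal"
    if version is None or not DECIMAL.fullmatch(version.strip()):
        problems.append("attribute 'version' is missing or not an xs:decimal")
    courses = list(root)
    if not courses:  # maxOccurs="unbounded" with the default minOccurs="1"
        problems.append("at least one course element is required")
    for i, course in enumerate(courses, start=1):
        if course.tag != Q + "course":
            problems.append(f"child {i}: unexpected element {course.tag}")
            continue
        # xs:all: number, name and semester each exactly once, in any order
        if sorted(child.tag for child in course) != COURSE_CHILDREN:
            problems.append(f"course {i}: children must be number, name, semester")
            continue
        semester = course.find(Q + "semester").text or ""
        if not INTEGER.fullmatch(semester.strip()):
            problems.append(f"course {i}: {semester!r} is not a valid xs:integer")
    return problems

sample = f"""<mycourses xmlns="{NS}" version="0.0">
  <course>
    <number>MTH 463</number>
    <name>Data-Oriented Computing</name>
    <semester>201601a</semester>
  </course>
</mycourses>"""

print(check_courses(sample))  # ["course 1: '201601a' is not a valid xs:integer"]
```

Each check carries a comment naming the schema construct it imitates, which makes it easier to see which part of the XSD is responsible for which rule.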

Now we validate the XML data document against the XSD schema document (--noout stops xmllint from printing the parsed document back out):

$ xmllint --noout --schema myxmls.xsd myxmls.xml
myxmls.xml validates

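If we modify the data so that it no longer conforms to the schema, xmllint reports what is wrong. (The messages below were recorded with files that used the namespace http://blue.math.buffalo.edu/463/mycourses, so that URI, rather than the targetNamespace above, appears in braces.)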

<course>
  <number>MTH 463</number>
  <name>Data-Oriented Computing</name>
  <semester>201601a</semester>
</course>

$ xmllint --noout --schema myxmls.xsd myxmls.xml
myxmls.xml:12: element semester: Schemas validity error :
Element '{http://blue.math.buffalo.edu/463/mycourses}semester':
'201601a' is not a valid value of the atomic type 'xs:integer'.
myxmls.xml fails to validate

or

<course>
  <nombre>MTH 463</nombre>
  <name>Data-Oriented Computing</name>
  <semester>201601</semester>
</course>

$ xmllint --noout --schema myxmls.xsd myxmls.xml
myxmls.xml:10: element nombre: Schemas validity error :
Element '{http://blue.math.buffalo.edu/463/mycourses}nombre':
This element is not expected.
Expected is one of (
{http://blue.math.buffalo.edu/463/mycourses}number,
{http://blue.math.buffalo.edu/463/mycourses}name,
{http://blue.math.buffalo.edu/463/mycourses}semester ).
myxmls.xml fails to validate
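The {...} in front of each element name in these messages is the element's namespace URI followed by its local name (often called Clark notation). Python's ElementTree reports qualified names the same way, which a short sketch can show; the document string below is a made-up fragment that uses the targetNamespace of myxmls.xsd:

```python
import xml.etree.ElementTree as ET

NS = "http://www.acsu.buffalo.edu/~danet/Sp18/MTH448/class11/class11_files"
doc = f"""<mycourses xmlns="{NS}" version="0.0">
  <course><nombre>MTH 463</nombre></course>
</mycourses>"""

root = ET.fromstring(doc)
# The default namespace declared on <mycourses> is inherited by every descendant,
# so each tag comes back as {namespace-URI}local-name
for element in root.iter():
    print(element.tag)

# Lookups must use the same namespace, supplied here through a prefix map
ns = {"c": NS}
print(root.find("c:course/c:nombre", ns).text)  # MTH 463
print(root.find("course"))                       # None: the bare name does not match
```

Because the default namespace applies to every descendant, looking elements up by their bare names finds nothing; a prefix map (or the full {uri}name form) is needed.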

Note that both modified documents are still well-formed XML: a syntax check alone would accept them, and only validation against the schema catches the errors.

Importantly, you can use regular expressions in your schema (through the xs:pattern facet) to impose a strict structure on element and attribute values.
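As a rough illustration of how such a pattern behaves, suppose (hypothetically) that the schema restricted course numbers with <xs:pattern value="[A-Z]{3} [0-9]{3}"/>. An XSD pattern must match the entire value, so Python's re.fullmatch is a closer analogue than re.search. XSD's regex dialect is not identical to Python's (for example, ^ and $ are not anchors in XSD, and XSD adds escapes such as \i and \c), so this comparison only holds for simple patterns like this one. The helper name matches_xsd_pattern is made up for this sketch:

```python
import re

# Hypothetical facet on the course number:  <xs:pattern value="[A-Z]{3} [0-9]{3}"/>
COURSE_NUMBER = r"[A-Z]{3} [0-9]{3}"

def matches_xsd_pattern(pattern, value):
    """Approximate an xs:pattern check: the regex must match the entire value."""
    return re.fullmatch(pattern, value) is not None

for value in ["MTH 463", "MTH 4630", "mth 463", "Course MTH 463"]:
    # columns: value, whole-value match (schema-like), substring match (re.search)
    print(value, matches_xsd_pattern(COURSE_NUMBER, value),
          re.search(COURSE_NUMBER, value) is not None)
```

The last column shows why anchoring matters: re.search accepts "MTH 4630" and "Course MTH 463" because a matching substring exists, while a schema pattern would reject both.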

2. Homework 2 - XML and Schema

Details

  • Create and submit a matching xml dataset and schema.
  • The xml must be a custom dataset that you invent.
  • For guidance, see the example: myxmlre.xml and myxmlre.xsd

  • To receive full credit, your xml and schema must pass a validation test.

    • terminal-based xml validation:
      • xmllint --noout --schema myxmlre.xsd myxmlre.xml
    • Python also has an XML validation package, xmlschema.
      • You must install this package.
        • I installed it by entering in the terminal: conda install -c conda-forge xmlschema (pip install xmlschema also works)
In [2]:
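import xmlschema  # third-party package; see its documentation for more options
my_schema = xmlschema.XMLSchema('myxmlre.xsd')  # load the schema (raises an error if the schema itself is invalid)
my_schema.is_valid('myxmlre.xml')  # True if the XML conforms to the schema, False otherwise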
Out[2]:
True

Due Date

  • Sunday March 11 at 11:59pm

Submission

  • Submit to UBLearns a zipped folder called: YourLastName_HW2.zip (e.g., Taylor_HW2.zip)
  • Inside the zipped folder should be 2 files:
    • HW2.xml
    • HW2.xsd
  • Points will be deducted if you do not use this naming format.