The eXtensible Markup Language (XML)

XML Schema

XML Schema allow for structural descriptions of XML documents.

Like DTDs, Schema are important because they allow the creation and sharing of abstract structural descriptions.  XML Schema have a number of advantages over DTDs.

Schema use XML syntax, so the same tools for processing XML data (e.g., parsers, DOM, SAX, XSLT, etc.) can be used for processing Schema.

Schema support a richer set of data types, allow users to derive their own data types, and support inheritance.

Schema allow more flexible and powerful definitions of content models.

Schema are extensible, supporting easier reuse of parts of schemas in other schemas (e.g., easier use of Namespaces).

Schema have annotation elements that provide greater documentation of structural design.

XML Schema are powerful but complicated and the Schema Recommendation is long.  The World Wide Web Consortium has a primer on XML Schema at http://www.w3c.org/TR/xmlschema-0.  These notes contain some examples from that primer (as well as from the text).

Parsing

Xerces-J, the validating XML parser from the Apache XML project, is based on technology donated by IBM.

With Schema, a document is parsed (and determined to be well-formed) before it is validated according to the Schema.  XML Schema work on abstract data models (as defined in the Infoset), not on instance documents.

Schema Components is a generic term for the building blocks that comprise the abstract data model of the schema.  There are 13 kinds of components in three groups: Primary, Secondary, and Helper Components.

The Schema root element includes the namespace for the Schema and sets up the xsd: prefix:
        <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
        ...
        </xsd:schema>
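
As a rough sketch of how these pieces fit together, a complete minimal schema and an instance document that points to it might look like the following.  The file name visit.xsd and the element name note are hypothetical, chosen only for illustration.  The xsi:noNamespaceSchemaLocation attribute is the standard hint for locating a schema that has no target namespace:
        <!-- visit.xsd (hypothetical file name) -->
        <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
                <xsd:element name="note" type="xsd:string" />
        </xsd:schema>

        <!-- instance document that asks to be validated against visit.xsd -->
        <note xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                xsi:noNamespaceSchemaLocation="visit.xsd">Checked in at noon.</note>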

Primary Components

There are four primary components:
        element declarations
        simple type definitions
        complex type definitions
        attribute declarations

Element Declarations

The syntax for element declarations is:
        <xsd:element name="element_name" />
For example:
        <xsd:element name="date_of_visit" />

Types can be associated with elements using the following syntax:
        <xsd:element name="element_name" type="element_type" />
For example:
        <xsd:element name="date_of_visit" type="xsd:date" />

Cardinality can be given for the presence of the element in the data:
        <xsd:element name="element_name"
                minOccurs="cardinality" maxOccurs="cardinality" />
For example:
        <xsd:element name="date_of_visit" type="xsd:date"
                minOccurs="0" maxOccurs="unbounded" />
Both minOccurs and maxOccurs have a default of 1.
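
The following sketch shows the cardinality attributes in context.  Note that minOccurs and maxOccurs are only permitted on element declarations nested inside a content model, not on top-level declarations.  The wrapper element patient and the dates are hypothetical:
        <xsd:element name="patient">
                <xsd:complexType>
                        <xsd:sequence>
                                <!-- zero or more visits, each an ISO 8601 date -->
                                <xsd:element name="date_of_visit" type="xsd:date"
                                        minOccurs="0" maxOccurs="unbounded" />
                        </xsd:sequence>
                </xsd:complexType>
        </xsd:element>
Under this declaration, both of the following instances should validate:
        <patient>
                <date_of_visit>2001-03-14</date_of_visit>
                <date_of_visit>2001-06-02</date_of_visit>
        </patient>

        <patient />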

Simple Data Type Definitions

Schema have simple data types and complex data types.  There are three categories of simple types: primitive, derived, and user-defined.

The primitive data types are: string, boolean, decimal (arbitrary precision numbers with support for at least 18 decimal digits), float, double, duration, dateTime, time, date, gYear, gMonth, gDay, gYearMonth, gMonthDay, hexBinary, base64Binary, anyURI, QName, and NOTATION.

The built-in derived types are: normalizedString, token, language, ID, IDREF, IDREFS, ENTITIES, NMTOKEN, NMTOKENS, Name, NCName, ENTITY, integer, positiveInteger, negativeInteger, nonNegativeInteger, nonPositiveInteger, long, int, short, byte, unsignedLong, unsignedInt, unsignedShort, and unsignedByte.

User-defined simple data types can be created from existing simple data types.  For example:
        <xsd:simpleType name="myInteger">
                <xsd:restriction base="xsd:integer">
                        <xsd:minInclusive value="10000"/>
                        <xsd:maxInclusive value="99999"/>
                </xsd:restriction>
        </xsd:simpleType>
The creation of user-defined simple types, lists, and unions is described more fully later.
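
To see the effect of the restriction above, a hypothetical element zipCode (not part of the original notes) could be declared with the myInteger type:
        <xsd:element name="zipCode" type="myInteger" />
A validating parser should then accept the first instance and reject the second, whose value falls below the minInclusive bound:
        <zipCode>68588</zipCode>
        <zipCode>9999</zipCode>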

Complex Data Type Definitions

Complex data types allow elements to have content models with multiple component elements.  They support reuse and inheritance.

One way to build a complex data type is with the sequence element, which lists the child elements in the order in which they must appear.  For example:
        <xsd:complexType name="Address">
                <xsd:sequence>
                        <xsd:element name="Street1" />
                        <xsd:element name="Street2" />
                        <xsd:element name="City" />
                        <xsd:element name="State" />
                        <xsd:element name="Zip" />
                </xsd:sequence>
        </xsd:complexType>
        <xsd:element name="mailingAddress" type="Address" />
        <xsd:element name="billingAddress" type="Address" />
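
As an illustration, an instance of mailingAddress might look like the following (the address values are made up).  Because minOccurs defaults to 1, every child must be present and in the declared order, so Street2 appears even though it is empty:
        <mailingAddress>
                <Street1>256 Avery Hall</Street1>
                <Street2></Street2>
                <City>Lincoln</City>
                <State>NE</State>
                <Zip>68588</Zip>
        </mailingAddress>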

Complex types that are needed in only one element can be declared anonymously (without an explicit name):
        <xsd:element name="Name">
                <xsd:complexType>
                        <xsd:sequence>
                                <xsd:element name="FirstName" />
                                <xsd:element name="MiddleName"
                                        minOccurs="0" maxOccurs="unbounded" />
                                <xsd:element name="LastName" />
                        </xsd:sequence>
                </xsd:complexType>
        </xsd:element>

Attribute Declarations

The syntax for declaring attributes is:
        <xsd:attribute name="attribute_name" />
The declaration can specify various attributes of the attribute.  For example:
        <xsd:attribute name="Paid" type="xsd:boolean" use="required" />
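Of course, attributes are used to describe elements.  A default value is only allowed when use is optional (or omitted), and boolean values are written in lowercase:
        <xsd:element name="Invoice">
                <xsd:complexType>
                        <xsd:attribute name="Paid" type="xsd:boolean"
                                default="false" />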
                </xsd:complexType>
        </xsd:element>
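
As a small illustration, the lexical forms accepted by xsd:boolean are true, false, 1, and 0, so instances such as the following should validate against an Invoice declaration whose Paid attribute is of type xsd:boolean:
        <Invoice Paid="true" />
        <Invoice Paid="0" />
A capitalized value such as "False" is not in the lexical space of xsd:boolean and would be rejected:
        <Invoice Paid="False" />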

Content Models

Consider four content models already seen in DTDs:  Any, Empty, Element-Only, and Mixed.

An element of any type can be declared using the following declaration:
        <xsd:element name="freeForm" type="xsd:anyType" />

An element of empty type can be declared using the following declaration:
        <xsd:element name="Price">
                <xsd:complexType>
                        <xsd:attribute name="value" type="xsd:decimal"/>
                </xsd:complexType>
        </xsd:element>
Types with attributes are necessarily complex.
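
For illustration, the following instances should be valid for the Price declaration above (the value 19.95 is made up).  The attribute is optional by default, so it may be omitted:
        <Price value="19.95" />
        <Price />
Because the complex type declares no child elements and is not mixed, an instance with content should be rejected:
        <Price value="19.95">on sale</Price>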

Examples of elements with element content have been illustrated previously.

Specifying mixed content allows character data between child elements.  For example:
        <xsd:element name="Response">
            <xsd:complexType mixed="true">
                <xsd:sequence>
                    <xsd:element name="Header" />
                    <xsd:element name="Query">
                        <xsd:complexType>
                            <xsd:sequence>
                                <xsd:element name="CID" type="xsd:integer"/>
                                <xsd:element name="OID" type="xsd:integer"/>
                            </xsd:sequence>
                        </xsd:complexType>
                    </xsd:element>
                </xsd:sequence>
            </xsd:complexType>
        </xsd:element>
allows text to appear before, between, and after the Header and Query children of Response.
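
A sketch of an instance that uses the mixed content model of Response follows; the text and the numeric values are invented for illustration.  Character data sits alongside the Header and Query children, while inside Query (which is not declared mixed) only whitespace may separate CID and OID:
        <Response>
            Thank you for your inquiry.
            <Header>Order status</Header>
            The query you submitted was:
            <Query>
                <CID>1138</CID>
                <OID>42</OID>
            </Query>
            We will reply shortly.
        </Response>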

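Types can be derived by extending previously defined types with additional elements or attributes.  Derivation works on named types, so this example assumes that the content model of Price has been defined as a named complex type called PriceType (a hypothetical name), and it uses the extension element rather than restriction.  For example:
        <xsd:complexType name="PriceType">
            <xsd:attribute name="value" type="xsd:decimal"/>
        </xsd:complexType>
        <xsd:element name="internationalPrice">
            <xsd:complexType>
                <xsd:complexContent>
                    <xsd:extension base="PriceType">
                        <xsd:attribute name="currency" type="xsd:string"/>
                    </xsd:extension>
                </xsd:complexContent>
            </xsd:complexType>
        </xsd:element>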

Secondary Components

There are four secondary components of XML Schema:
    Model Group Definitions
    Attribute Groups
    Notation Declarations
    Identity Constraints.

Model Group Definitions

Model Group Definitions are named groups of components (including element declarations, wildcards, and other model groups) that can be used in the content models of complex types.  In this, they are similar to Parameter Entities in DTDs.

A named group must be defined at the top level, as a child of the schema element.  The members of a group are wrapped in a choice, sequence, or all element.  For example:
        <xsd:group name="Services">
                <xsd:choice>
                        <xsd:element name="Room" />
                        <xsd:element name="Board" />
                        <xsd:element name="Package" />
                </xsd:choice>
        </xsd:group>
        <xsd:group name="Payment">
                <xsd:sequence>
                        <xsd:element name="Name" />
                        <xsd:element name="Account" />
                </xsd:sequence>
        </xsd:group>
        <xsd:element name="Record">
                <xsd:complexType>
                        <xsd:sequence>
                                <xsd:group ref="Services" />
                                <xsd:group ref="Payment" />
                        </xsd:sequence>
                </xsd:complexType>
        </xsd:element>

There are aspects of the use of the choice and all elements not described here.  In particular, in XML Schema 1.0 an all group must be the sole, top-level content of a content model, so a group built with all cannot be referenced from inside a sequence or choice; that is why Payment uses sequence in this example.  The choice, sequence, and all elements also can be used outside of group definitions.
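
As an illustration, assuming the content model of Record is the Services group followed by the Payment group, an instance might look like the following (the values are invented).  Exactly one of Room, Board, or Package appears because of the choice, followed by Name and Account:
        <Record>
                <Board>Half board</Board>
                <Name>A. Lovelace</Name>
                <Account>00-1234</Account>
        </Record>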

Attribute Groups

Attributes can be grouped much as elements are grouped.  For example:
        <xsd:attributeGroup name="Options">
                <xsd:attribute name="ExtColor" use="required"
                        type="Colorname"/>
                <xsd:attribute name="IntColor" use="required"
                        type="Colorname"/>
        </xsd:attributeGroup>
        <xsd:element name="Robot">
                <xsd:complexType>
                        <xsd:attribute name="RIN" type="xsd:ID"/>
                        <xsd:attributeGroup ref="Options"/>
                </xsd:complexType>
        </xsd:element>
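
The Colorname type is referenced above but not defined in these notes.  One plausible definition, offered only as an assumption, is an enumeration of color names:
        <!-- hypothetical definition of the Colorname type used above -->
        <xsd:simpleType name="Colorname">
                <xsd:restriction base="xsd:string">
                        <xsd:enumeration value="red"/>
                        <xsd:enumeration value="black"/>
                        <xsd:enumeration value="silver"/>
                </xsd:restriction>
        </xsd:simpleType>
With that definition in place, a Robot instance would carry both required attributes from the Options group plus the optional RIN:
        <Robot RIN="R2" ExtColor="silver" IntColor="black" />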

Notation Declarations

As with DTDs, notation declarations associate a name with an application, e.g., an application for displaying an image.  For example:
    <xsd:notation name="jpg" public="image/jpeg" system="viewer.exe" />
    <xsd:notation name="gif" public="image/gif" system="viewer.exe" />
and
    <xsd:element name="image">
        <xsd:complexType>
                <xsd:simpleContent>
                        <xsd:extension base="xsd:hexBinary">
                                <xsd:attribute name="src" type="xsd:anyURI" />
                                <xsd:attribute name="format">
                                        <xsd:simpleType>
                                                <xsd:restriction base="xsd:NOTATION">
                                                        <xsd:enumeration value="jpg"/>
                                                        <xsd:enumeration value="gif"/>
                                                </xsd:restriction>
                                        </xsd:simpleType>
                                </xsd:attribute>
                        </xsd:extension>
                </xsd:simpleContent>
        </xsd:complexType>
    </xsd:element>
allows:
    <image src="http://cse.unl.edu/ex.gif" format="gif" />

Identity Constraints

Schema support ID, IDREF, and IDREFS as simple types for attributes.  They also provide additional elements for identity constraints.  The unique element is used to ensure unique values for elements and attributes within a defined scope.  Two other powerful elements - key and keyref - can be used to establish relationships between elements, attributes, values, or even element/attribute combinations.
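
A minimal sketch of key and keyref follows.  The element and attribute names (Orders, Customer, Order, cid, customer) are hypothetical and the schema is assumed to have no target namespace.  The selector picks the elements the constraint applies to, relative to the element that holds the constraint, and the field picks the value to compare:
    <xsd:element name="Orders">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="Customer" maxOccurs="unbounded">
                    <xsd:complexType>
                        <xsd:attribute name="cid" type="xsd:integer" use="required"/>
                    </xsd:complexType>
                </xsd:element>
                <xsd:element name="Order" maxOccurs="unbounded">
                    <xsd:complexType>
                        <xsd:attribute name="customer" type="xsd:integer" use="required"/>
                    </xsd:complexType>
                </xsd:element>
            </xsd:sequence>
        </xsd:complexType>
        <!-- every Customer must have a cid, and no two may share one -->
        <xsd:key name="customerKey">
            <xsd:selector xpath="Customer"/>
            <xsd:field xpath="@cid"/>
        </xsd:key>
        <!-- every Order/@customer must match the cid of some Customer -->
        <xsd:keyref name="orderCustomer" refer="customerKey">
            <xsd:selector xpath="Order"/>
            <xsd:field xpath="@customer"/>
        </xsd:keyref>
    </xsd:element>
Unlike ID and IDREF, these constraints are scoped to the Orders element and can use any simple type, not just names.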

Helper Components

Helper Components are parts of other components.  Annotations are helper components that document the schema or pass hints to the application.  For example:
    <xsd:complexType  name="Employee">
        <xsd:annotation>
            <xsd:documentation>
                The Personnel Handbook (rev. 2.0, 04/1999) defines an employee
                ...
            </xsd:documentation>
        </xsd:annotation>
        ...
    </xsd:complexType>
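
Hints for applications go in the appinfo element, a sibling of documentation inside annotation.  In the following sketch the Salary element and the content of appinfo are invented for illustration; the Schema Recommendation does not define what appinfo contains, so its interpretation is left to the application:
    <xsd:element name="Salary" type="xsd:decimal">
        <xsd:annotation>
            <xsd:documentation xml:lang="en">Annual salary.</xsd:documentation>
            <!-- hypothetical hint; meaningful only to an application that looks for it -->
            <xsd:appinfo>display-format: currency</xsd:appinfo>
        </xsd:annotation>
    </xsd:element>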

Creating Data Types

A user-defined simple data type is derived from a built-in primitive data type or a built-in derived data type.  The second edition of the textbook has a useful chart of the built-in primitive and derived simple data types.

Data types are composed of three parts: a value space, a lexical space, and facets.  Value spaces have facets related to value, including equality, order, bounds, and cardinality.  Lexical spaces have facets related to text, including length, pattern, and whitespace normalization.  Some facets relate to both value and lexical spaces.

The constraining facets are:
    length,
    minLength, maxLength,
    pattern,
    enumeration,
    minInclusive, maxInclusive, minExclusive, maxExclusive,
    whiteSpace (rules for whitespace normalization),
    totalDigits, and
    fractionDigits.

New data types can be derived using the restriction element to constrain facets of an existing type as seen previously.  Here is another example:
    <xsd:simpleType  name="USState">
        <xsd:restriction base="xsd:string">
            <xsd:enumeration value="AK"/>
            <xsd:enumeration value="AL"/>
            <xsd:enumeration value="AR"/>
            <!-- and so on ... -->
        </xsd:restriction>
    </xsd:simpleType>
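
Other facets can be applied in the same way.  The following sketch uses pattern on a string and totalDigits with fractionDigits on a decimal; the type names PartNumber and Amount are hypothetical:
    <!-- two uppercase letters, a hyphen, then four digits, e.g. AB-1234 -->
    <xsd:simpleType name="PartNumber">
        <xsd:restriction base="xsd:string">
            <xsd:pattern value="[A-Z]{2}-\d{4}"/>
        </xsd:restriction>
    </xsd:simpleType>
    <!-- at most 8 digits in all, at most 2 of them after the decimal point -->
    <xsd:simpleType name="Amount">
        <xsd:restriction base="xsd:decimal">
            <xsd:totalDigits value="8"/>
            <xsd:fractionDigits value="2"/>
        </xsd:restriction>
    </xsd:simpleType>
A pattern must match the whole value, so no anchors are needed.  Amount would accept 123456.78 but reject 1.234, which has three fraction digits.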

New data types can be defined using the list and union elements.  For example:
    <xsd:simpleType name="USStateList">
        <xsd:list itemType="USState" />
    </xsd:simpleType>
    <xsd:element name="Region" type="USStateList" />
would allow:
    <Region>IA KS NE</Region>
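
A union, by contrast, accepts a value that is valid for any one of its member types.  As a sketch, the hypothetical type StateOrZip below combines the USState type with the myInteger type defined earlier in these notes:
    <xsd:simpleType name="StateOrZip">
        <xsd:union memberTypes="USState myInteger" />
    </xsd:simpleType>
    <xsd:element name="Location" type="StateOrZip" />
would allow either of:
    <Location>AK</Location>
    <Location>68588</Location>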

Extensive Example

There are many syntax errors in the example in the second edition of the textbook.  For example, the closing element of the xsd:schema element lacks the '/'.  This example is from Professional Java XML, K. Ahmed (Wrox, 2001).  The XML document in the file Schema/contract.xml references the Schema file Schema/contract.xsd.  The SAX parser will parse and validate this document.