XML Schema
XML Schema allow for structural descriptions of XML documents.
Like DTDs, Schema are important because they allow the creation and sharing of abstract structural descriptions. XML Schema have a number of advantages over DTDs.
Schema use XML syntax, so the same tools for processing XML data (e.g., parsers, DOM, SAX, XSLT, etc.) can be used for processing Schema.
Schema support a richer set of data types, allow users to derive their own data types, and support inheritance.
Schema allow more flexible and powerful definitions of content models.
Schema are extensible, supporting easier reuse of parts of schemas in other schemas (e.g., easier use of Namespaces).
Schema have annotation elements that provide greater documentation of structural design.
XML Schema are powerful but complicated, and the Schema Recommendation is long. The World Wide Web Consortium has a primer on XML Schema at http://www.w3.org/TR/xmlschema-0/. These notes contain some examples from that primer (as well as from the text).
Parsing
Xerces-j from the Apache XML project is based on technology donated by IBM.
With Schema, a document is parsed (and determined to be well-formed) before it is validated according to the Schema. XML Schema work on abstract data models (as defined in the Infoset), not on instance documents.
Schema Components is a generic term for the blocks that comprise the abstract data model of the schema. XML Schema 1.0 defines 13 components in three groups: Primary, Secondary, and Helper Components.
The Schema root element declares the XML Schema namespace and binds it to the xsd: prefix:
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
...
</xsd:schema>
Primary Components
There are four primary components:
element declarations
simple type definitions
complex type definitions
attribute declarations
Element Declarations
The syntax for element declarations
is:
<xsd:element name="element_name" />
For example:
<xsd:element name="date_of_visit" />
Types can be associated with elements
using the following syntax:
<xsd:element name="element_name" type="element_type" />
For example:
<xsd:element name="date_of_visit" type="xsd:date" />
Cardinality can be given for the
presence of the element in the data:
<xsd:element name="element_name"
minOccurs="cardinality" maxOccurs="cardinality" />
For example:
<xsd:element name="date_of_visit" type="xsd:date"
minOccurs="0" maxOccurs="unbounded" />
Both minOccurs and maxOccurs have
a default of 1.
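The declarations above can be put together in one small schema. This is only a sketch: the visits and patient element names are made up for illustration. Note that minOccurs and maxOccurs are allowed only on element declarations nested inside a content model, not on top-level declarations that are direct children of xsd:schema.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical wrapper element; top-level, so no minOccurs/maxOccurs here. -->
  <xsd:element name="visits">
    <xsd:complexType>
      <xsd:sequence>
        <!-- Neither attribute given: both default to 1, so exactly one patient. -->
        <xsd:element name="patient" type="xsd:string" />
        <!-- Zero or more dates may follow. -->
        <xsd:element name="date_of_visit" type="xsd:date"
                     minOccurs="0" maxOccurs="unbounded" />
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
<!-- Example instance:
     <visits>
       <patient>J. Doe</patient>
       <date_of_visit>2001-03-14</date_of_visit>
       <date_of_visit>2001-04-02</date_of_visit>
     </visits>
-->
```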
Simple Data Type Definitions
Schema have simple data types and complex data types. There are three categories of simple types: primitive, derived, and user-defined.
The primitive data types are: string, boolean, decimal (arbitrary-precision numbers with support for at least 18 decimal digits), float, double, duration, dateTime, time, date, gYear, gMonth, gDay, gYearMonth, gMonthDay, hexBinary, base64Binary, anyURI, QName, and NOTATION.
The built-in derived types are: normalizedString, token, language, ID, IDREF, IDREFS, ENTITIES, NMTOKEN, NMTOKENS, Name, NCName, ENTITY, integer, positiveInteger, negativeInteger, nonNegativeInteger, nonPositiveInteger, long, int, short, byte, unsignedLong, unsignedInt, unsignedShort, and unsignedByte.
User-defined simple data types can
be created from existing simple data types. For example:
<xsd:simpleType name="myInteger">
<xsd:restriction base="xsd:integer">
<xsd:minInclusive value="10000"/>
<xsd:maxInclusive value="99999"/>
</xsd:restriction>
</xsd:simpleType>
The creation of user-defined simple
types, lists, and unions is described more fully later.
Complex Data Type Definitions
Complex data types allow elements to have content models with multiple component elements. They support reuse and inheritance.
One of the operations for creating
a complex data type is the Schema sequence. For example:
<xsd:complexType name="Address">
<xsd:sequence>
<xsd:element name="Street1" />
<xsd:element name="Street2" />
<xsd:element name="City" />
<xsd:element name="State" />
<xsd:element name="Zip" />
</xsd:sequence>
</xsd:complexType>
<xsd:element name="mailingAddress" type="Address" />
<xsd:element name="billingAddress" type="Address" />
Complex types that are needed in
only one element can be declared anonymously (without an explicit name):
<xsd:element name="Name">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="FirstName" />
<xsd:element name="MiddleName"
minOccurs="0" maxOccurs="unbounded" />
<xsd:element name="LastName" />
</xsd:sequence>
</xsd:complexType>
</xsd:element>
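As an illustration (the values are made up), an instance like the following would be accepted by the Name declaration above, since MiddleName may be omitted or repeated while FirstName and LastName must each appear exactly once:

```xml
<Name>
  <FirstName>Grace</FirstName>
  <!-- MiddleName: minOccurs="0" maxOccurs="unbounded" -->
  <MiddleName>Brewster</MiddleName>
  <MiddleName>Murray</MiddleName>
  <LastName>Hopper</LastName>
</Name>
```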
Attribute Declarations
The syntax for declaring attributes
is:
<xsd:attribute name="attribute_name" />
The declaration can specify various
properties of the attribute. For example:
<xsd:attribute name="Paid" type="xsd:boolean" use="required" />
Of course, attributes are used
to describe elements:
<xsd:element name="Invoice">
<xsd:complexType>
<!-- a default value requires use="optional" (the default for use) -->
<xsd:attribute name="Paid" type="xsd:boolean"
default="false" />
</xsd:complexType>
</xsd:element>
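The interaction between use and default can be sketched as follows. The Number attribute is a hypothetical addition for contrast. In XML Schema, a default value may only be given when use is optional, and the lexical values of xsd:boolean are true, false, 1, and 0 (lower case).

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Invoice">
    <xsd:complexType>
      <!-- use is omitted, so it is "optional"; this is what permits a default. -->
      <xsd:attribute name="Paid" type="xsd:boolean" default="false" />
      <!-- Hypothetical attribute that must always be present (no default allowed). -->
      <xsd:attribute name="Number" type="xsd:positiveInteger" use="required" />
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
<!-- Example instances:
     <Invoice Number="17" />              (Paid is taken to be false)
     <Invoice Number="18" Paid="true" />
-->
```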
Content Models
Consider four content models already seen in DTDs: Any, Empty, Element-Only, and Mixed.
An element of any type can be declared
using the following declaration:
<xsd:element name="freeForm" type="xsd:anyType" />
An element of empty type can be
declared using the following declaration:
<xsd:element name="Price">
<xsd:complexType>
<xsd:attribute name="value" type="xsd:decimal"/>
</xsd:complexType>
</xsd:element>
Types with attributes are necessarily
complex.
Examples of elements with element content have been illustrated previously.
Specifying mixed content allows
character data between child elements. For example:
<xsd:element name="Response">
<xsd:complexType mixed="true">
<xsd:sequence>
<xsd:element name="Header" />
<xsd:element name="Query">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="CID" type="xsd:integer"/>
<xsd:element name="OID" type="xsd:integer"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
This declaration allows character data to appear directly inside Response: before, between,
and after its Header and Query child elements.
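A possible instance of Response is sketched below, assuming CID and OID hold whole numbers (the text and values are made up). The mixed attribute applies only to the type it is set on, so character data is allowed among the children of Response but not directly inside Query, whose own type is element-only.

```xml
<Response>
  Thank you for your request.
  <Header>Order status</Header>
  The details follow.
  <Query>
    <CID>1001</CID>
    <OID>42</OID>
  </Query>
  End of response.
</Response>
```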
Complex types can be declared to extend
previously defined types with additional elements or attributes. The base
must be a named type, so the following example assumes that the content of
Price has also been defined as a named complex type called Price.
For example:
<xsd:element name="internationalPrice">
<xsd:complexType>
<xsd:complexContent>
<xsd:extension base="Price">
<xsd:attribute name="currency" type="xsd:string"/>
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>
</xsd:element>
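A self-contained sketch of derivation by extension follows. It assumes a named complex type called Price (a name chosen here for illustration; type names and element names live in separate symbol spaces, so it can coexist with an element of the same name). Derivation by extension uses xsd:extension, and the derived type inherits the value attribute from its base.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Named base type: empty content with one attribute. -->
  <xsd:complexType name="Price">
    <xsd:attribute name="value" type="xsd:decimal" />
  </xsd:complexType>

  <!-- Anonymous derived type: inherits value and adds currency. -->
  <xsd:element name="internationalPrice">
    <xsd:complexType>
      <xsd:complexContent>
        <xsd:extension base="Price">
          <xsd:attribute name="currency" type="xsd:string" />
        </xsd:extension>
      </xsd:complexContent>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
<!-- Example instance: <internationalPrice value="423.46" currency="EUR" /> -->
```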
Secondary Components
There are four secondary components
of XML Schema:
Model Group Definitions
Attribute Groups
Notation Declarations
Identity Constraints
Model Group Definitions
Model Group Definitions are named groups of components (including element declarations, wildcards, and other model groups) that can be used in the content models of complex types. In this, they are similar to Parameter Entities in DTDs.
A named group is defined with the group element, which must
appear at the top level, as a child of the schema element. Its content
is a single sequence, choice, or all element that lists the members of the group.
For example:
<xsd:group name="Services">
<xsd:choice>
<xsd:element name="Room" />
<xsd:element name="Board" />
<xsd:element name="Package" />
</xsd:choice>
</xsd:group>
<xsd:group name="Payment">
<xsd:all>
<xsd:element name="Name" />
<xsd:element name="Account" />
</xsd:all>
</xsd:group>
<xsd:element name="Record">
<xsd:complexType>
<xsd:sequence>
<xsd:group ref="Services" />
<xsd:group ref="Payment" />
</xsd:sequence>
</xsd:complexType>
</xsd:element>
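The choice and all elements have further rules that are not described here. For instance, XML Schema 1.0 permits an all group only as the sole child at the top of a content model, so a strict 1.0 processor is expected to reject a reference to the Payment group from inside a sequence. The sequence element can also be used inside a group definition, and the choice and all elements can also be used outside of group definitions.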
Attribute Groups
Attributes can be grouped much as
elements are grouped. For example:
<xsd:attributeGroup name="Options">
<xsd:attribute name="ExtColor" use="required"
type="Colorname"/>
<xsd:attribute name="IntColor" use="required"
type="Colorname"/>
</xsd:attributeGroup>
<xsd:element name="Robot">
<xsd:complexType>
<xsd:attribute name="RIN" type="xsd:ID"/>
<xsd:attributeGroup ref="Options"/>
</xsd:complexType>
</xsd:element>
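The Colorname type is not defined in these notes. A complete sketch, assuming Colorname is a simple enumeration (the color values are made up), might look like this:

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical definition; the notes leave Colorname undefined. -->
  <xsd:simpleType name="Colorname">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="black" />
      <xsd:enumeration value="silver" />
      <xsd:enumeration value="white" />
    </xsd:restriction>
  </xsd:simpleType>

  <xsd:attributeGroup name="Options">
    <xsd:attribute name="ExtColor" type="Colorname" use="required" />
    <xsd:attribute name="IntColor" type="Colorname" use="required" />
  </xsd:attributeGroup>

  <xsd:element name="Robot">
    <xsd:complexType>
      <xsd:attribute name="RIN" type="xsd:ID" />
      <!-- Pulls in both ExtColor and IntColor. -->
      <xsd:attributeGroup ref="Options" />
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
<!-- Example instance: <Robot RIN="r42" ExtColor="silver" IntColor="black" /> -->
```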
Notation Declarations
As with DTDs, notation declarations
associate a name with an application, e.g., an application for displaying
an image. For example:
<xsd:notation
name="jpg" public="image/jpeg" system="viewer.exe" />
<xsd:notation
name="gif" public="image/gif" system="viewer.exe" />
and
<xsd:element
name="image">
<xsd:complexType>
<xsd:simpleContent>
<xsd:extension base="xsd:hexBinary">
<xsd:attribute name="src" type="xsd:anyURI" />
<xsd:attribute name="format">
<xsd:simpleType>
<xsd:restriction base="xsd:NOTATION">
<xsd:enumeration value="jpg"/>
<xsd:enumeration value="gif"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:attribute>
</xsd:extension>
</xsd:simpleContent>
</xsd:complexType>
</xsd:element>
allows:
<image src="http://cse.unl.edu/ex.gif"
format="gif" />
Identity Constraints
Schema support ID, IDREF, and IDREFS as simple types for attributes. They also provide additional elements for identity constraints. The unique element is used to ensure unique values for elements and attributes within a defined scope. Two other powerful elements - key and keyref - can be used to establish relationships between elements, attributes, values, or even element/attribute combinations.
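The following is a minimal sketch of unique, key, and keyref. The Orders, Customer, and Order names and their attributes are hypothetical. Each constraint is declared inside the element that defines its scope, after that element's type; the selector picks the nodes being constrained and the field picks the value to compare.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Orders">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Customer" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:attribute name="CID" type="xsd:integer" use="required" />
          </xsd:complexType>
        </xsd:element>
        <xsd:element name="Order" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:attribute name="OID" type="xsd:integer" use="required" />
            <xsd:attribute name="customer" type="xsd:integer" use="required" />
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>

    <!-- key: every Customer must have a CID, and no two may share one. -->
    <xsd:key name="customerKey">
      <xsd:selector xpath="Customer" />
      <xsd:field xpath="@CID" />
    </xsd:key>

    <!-- unique: OID values may not repeat among the Order elements. -->
    <xsd:unique name="orderUnique">
      <xsd:selector xpath="Order" />
      <xsd:field xpath="@OID" />
    </xsd:unique>

    <!-- keyref: each Order's customer value must match some customerKey value. -->
    <xsd:keyref name="orderCustomer" refer="customerKey">
      <xsd:selector xpath="Order" />
      <xsd:field xpath="@customer" />
    </xsd:keyref>
  </xsd:element>
</xsd:schema>
```

Unlike ID and IDREF, these constraints are scoped to the Orders element rather than the whole document, and the compared values can be of any simple type.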
Helper Components
Helper Components are parts of other
components. Annotations are helper components that document the schema
or pass hints to the application. For example:
<xsd:complexType
name="Employee">
<xsd:annotation>
<xsd:documentation>
The Personnel Handbook (rev. 2.0, 04/1999) defines an employee
...
</xsd:documentation>
</xsd:annotation>
...
</xsd:complexType>
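Hints for applications go in the appinfo element, a sibling of documentation inside annotation. A small sketch follows; the Salary element, the source URI, and the displayHint element are all made up, since the content of appinfo is left to the application.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Salary" type="xsd:decimal">
    <!-- An annotation, when present, is the first child of the component it describes. -->
    <xsd:annotation>
      <!-- For human readers. -->
      <xsd:documentation xml:lang="en">
        Annual salary of the employee.
      </xsd:documentation>
      <!-- For software; the schema processor does not interpret this content. -->
      <xsd:appinfo source="http://example.org/hints">
        <displayHint>currency</displayHint>
      </xsd:appinfo>
    </xsd:annotation>
  </xsd:element>
</xsd:schema>
```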
Creating Data Types
A user-defined simple data type is derived from a built-in primitive data type or a built-in derived data type. The second edition of the textbook has a useful chart of the built-in primitive and derived simple data types.
Data types are composed of three parts: a value space, a lexical space, and facets. Value spaces have facets related to value, including equality, order, bounds, and cardinality. Lexical spaces have facets related to text, including length, pattern, and whitespace normalization. Some facets relate to both value and lexical spaces.
The constraining facets, which a derived type can restrict, are: length, minLength, maxLength, pattern, enumeration, minInclusive, maxInclusive, minExclusive, maxExclusive, whiteSpace (rules for whitespace normalization), totalDigits, and fractionDigits.
New data types can be derived using
the restriction element to constrain facets of an existing type as seen
previously. Here is another example:
<xsd:simpleType
name="USState">
<xsd:restriction base="xsd:string">
<xsd:enumeration value="AK"/>
<xsd:enumeration value="AL"/>
<xsd:enumeration value="AR"/>
<!-- and so on ... -->
</xsd:restriction>
</xsd:simpleType>
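A few of the other facets are sketched below. The SKU pattern follows the W3C primer; the Amount and ShortName types are hypothetical names used only for illustration.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- pattern: three digits, a hyphen, two upper-case letters, e.g. 926-AA -->
  <xsd:simpleType name="SKU">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\d{3}-[A-Z]{2}" />
    </xsd:restriction>
  </xsd:simpleType>

  <!-- totalDigits and fractionDigits: at most 7 digits, at most 2 after the point -->
  <xsd:simpleType name="Amount">
    <xsd:restriction base="xsd:decimal">
      <xsd:totalDigits value="7" />
      <xsd:fractionDigits value="2" />
    </xsd:restriction>
  </xsd:simpleType>

  <!-- minLength and maxLength: between 1 and 20 characters -->
  <xsd:simpleType name="ShortName">
    <xsd:restriction base="xsd:string">
      <xsd:minLength value="1" />
      <xsd:maxLength value="20" />
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
```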
New data types can be defined using
the list and union elements. For example:
<xsd:simpleType
name="USStateList">
<xsd:list itemType="USState" />
</xsd:simpleType>
<xsd:element
name="Region" type="USStateList" />
would allow:
<Region>IA
KS NE</Region>
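A union type accepts any value that is valid for at least one of its member types. The sketch below combines the USState and myInteger types from these notes; the ZipOrState and Location names are hypothetical.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:simpleType name="USState">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="AK" />
      <xsd:enumeration value="AL" />
      <xsd:enumeration value="AR" />
      <!-- and so on ... -->
    </xsd:restriction>
  </xsd:simpleType>

  <xsd:simpleType name="myInteger">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="10000" />
      <xsd:maxInclusive value="99999" />
    </xsd:restriction>
  </xsd:simpleType>

  <!-- memberTypes is a whitespace-separated list of simple type names. -->
  <xsd:simpleType name="ZipOrState">
    <xsd:union memberTypes="USState myInteger" />
  </xsd:simpleType>

  <xsd:element name="Location" type="ZipOrState" />
</xsd:schema>
<!-- Example instances:
     <Location>AK</Location>
     <Location>68588</Location>
-->
```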
Extensive Example
There are many syntax errors in
the example in the second edition of the textbook. For example, the
closing tag of the xsd:schema element lacks the '/'. This example
is from Professional Java XML, K. Ahmed (Wrox, 2001). The
XML document in the file Schema/contract.xml
references the Schema file Schema/contract.xsd.
The SAX parser will parse and validate this document.
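An instance document usually tells the parser where to find its schema with an attribute from the XML Schema instance namespace. The sketch below is an assumption about what such a reference looks like, not a copy of contract.xml: the root element name is hypothetical, and the attribute shown applies when the schema has no target namespace (xsi:schemaLocation is used otherwise).

```xml
<?xml version="1.0"?>
<!-- Hypothetical root element; the actual contract.xml may differ. -->
<contract xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:noNamespaceSchemaLocation="contract.xsd">
  <!-- document content goes here -->
</contract>
```

The location is only a hint to the processor, and validation generally has to be switched on in the parser as well.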