The eXtensible Markup Language (XML)

Document Type Definitions

Document Type Definitions (DTDs) are specifications of rules about
how XML data is structured.  DTDs are being developed for many
application domains and facilitate the creation and sharing of
XML data.  DTDs also facilitate data validation, i.e., ensuring
that the data follows rules appropriate for the domain.

A DTD is identified with the DOCTYPE declaration, which follows the
XML declaration and precedes document elements.  The DTDs can be
given in external files and/or internally to the document.

DTDs can be given internally using the following syntax:

    <!DOCTYPE document_element [ internal_subset_declarations ] >

For example:

    <!DOCTYPE message [
        <!ELEMENT message (source, (error | warning)+)>
        <!ELEMENT source (#PCDATA)>
        <!ELEMENT error (#PCDATA)>
        <!ATTLIST error level (low | high) "low">
        <!ELEMENT warning (#PCDATA)>
    ]>

DTDs also can be given in external files.  This is appropriate if
the DTD is shared by various documents.  The SYSTEM notation
gives the DTD location explicitly.  For example,

    <!DOCTYPE message SYSTEM "file:message.dtd">

The location can be a URL, URI, or URN.

The PUBLIC notation gives a public identifier, a name for a DTD that
is presumed to be generally available in the operating environment.
In XML (unlike SGML), the public identifier must be followed by a
system identifier, a URL or URI that the application uses if it can't
find the DTD from the public identifier.  For example,

    <!DOCTYPE message PUBLIC "CSE/XML/message"
        "http://cse.unl.edu/XML/message.dtd">

The SYSTEM and PUBLIC identifiers can also be followed by an
additional internal DTD subset.

DTD Declarations

The basic syntax of a DTD declaration is similar to XML notation:

    <!keyword parameter_list>

There are four basic keywords:

        ELEMENT - XML elements, the nouns.
        ATTLIST - XML element attributes, the adjectives.
        ENTITY - Entity references, macros for content.
        NOTATION - Non-XML content, e.g. binary data.

Elements

Elements are central to data models and vocabularies.  The syntax gives
the element name and either a content category or a content model:

    <!ELEMENT name content_category >
    <!ELEMENT name (content_model) >

There are five content categories:

        EMPTY - The element has no contents.
        <!ELEMENT ping EMPTY>

        ANY - The element can have any contents.
        <!ELEMENT attachments ANY>

        #PCDATA - The element can have text-only contents.
        <!ELEMENT warning (#PCDATA)>

        element - The element can have child element(s).
        <!ELEMENT message (source, (error | warning)+)>

        mixed - The element can have mixed contents (text interleaved
        with child elements); the trailing * is required.
        <!ELEMENT source (#PCDATA | ip_address)*>

The expression operators used to specify a content model are:
 

    Type    Character  Description    Meaning
    binary  ,          comma          strict sequence
    binary  |          vertical bar   choice
    unary   ?          question mark  appears zero or once
    unary   *          asterisk       appears zero or more
    unary   +          plus           appears once or more

Note that without any unary modifier, a content item appears once.
For example, the specification

    (((a , b) | (c+, d)), e?)

would allow the strings ab, abe, cd, ccd, cccd, ..., cde, ccde, cccde, ....
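
Content models behave much like regular expressions over child element names.  As a rough sketch (not something a DTD processor exposes), the specification above can be transcribed into a Python regular expression, assuming each element name is a single letter.  The names CONTENT_MODEL and allowed are made up for this illustration.

```python
import re

# (((a , b) | (c+, d)), e?) transcribed as a regular expression:
# the comma (sequence) becomes concatenation; |, +, and ? carry over unchanged.
CONTENT_MODEL = re.compile(r"((ab)|(c+d))e?")

def allowed(children: str) -> bool:
    """Return True if the sequence of single-letter child names fits the model."""
    return CONTENT_MODEL.fullmatch(children) is not None

# Print which candidate child sequences the model accepts.
for candidate in ["ab", "abe", "cd", "ccd", "cccde", "a", "abd", "ce"]:
    print(candidate, allowed(candidate))
```

The first five candidates come from the list above; the last three are sequences the model should reject (a missing b, a stray d, and a c run without its d).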

Attributes

Attribute declarations in DTDs are made with the ATTLIST keyword.  The syntax is:

    <!ATTLIST element_name  attribute_declaration_list>

The attribute declarations have the syntax:

    attribute_name  attribute_type  [default_value(s)]

For example, the element automobile with attributes of color, make, model, and year would be declared as:

    <!ELEMENT automobile EMPTY>
    <!ATTLIST automobile
        color  CDATA  #IMPLIED
        make  CDATA  #REQUIRED
        model  CDATA  #IMPLIED
        year  CDATA  #IMPLIED  >

The attribute types are:
 

    Attribute Type     Description
    CDATA              Character data (text)
    Enumerated values  Explicit list of choices
    ID                 Unique name for element instance
    IDREF              Link to another element's ID
    IDREFS             List of IDREF attributes
    NMTOKEN            Name Token (use NameChar characters)
    NMTOKENS           List of NMTOKEN
    ENTITY             Unparsed external entity
    ENTITIES           List of ENTITY attributes
    NOTATION           Notation reference

The choices for defaults are:
 

    Attribute Default   Description
    #REQUIRED           Value must be given
    #IMPLIED            Optional value
    #FIXED fixed_value  Optional, value fixed
    default_value       Optional, default available

Here are a few brief examples; refer to the textbook for more details.

For
    <!ATTLIST  automobile  make  CDATA  #REQUIRED>
the element
    <automobile>...</automobile>
is invalid.  The make attribute (of type CDATA) must be present, for example
    <automobile  make="Ford">...</automobile>

The declaration
    <!ATTLIST  automobile  color  CDATA  #IMPLIED  "red">
is not allowed because #IMPLIED cannot be combined with a default value.  Instead, the declaration should be
    <!ATTLIST  automobile  color  CDATA "red">

If the only version of a report is 1.0, then the declaration could be
    <!ATTLIST  report  version  CDATA  #FIXED  "1.0">
In this case, a report element may either include or omit the version attribute, but if the version attribute is present then any value other than "1.0" would make the data invalid.
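
To see defaults in action, here is a minimal sketch using Python's standard xml.parsers.expat module.  It assumes a hypothetical report element that combines a #FIXED version, a defaulted color, and an #IMPLIED author (this combination is not from the textbook), and it collects the attributes the parser reports when the element is written with no attributes at all.  The function name attributes_seen is made up for this illustration.

```python
import xml.parsers.expat

# Hypothetical document: the report element carries no attributes itself.
DOC = """<!DOCTYPE report [
  <!ELEMENT report EMPTY>
  <!ATTLIST report
      version CDATA #FIXED "1.0"
      color   CDATA "red"
      author  CDATA #IMPLIED>
]>
<report/>"""

def attributes_seen(document: str) -> dict:
    """Parse the document and return the attributes reported for its element."""
    seen = {}
    parser = xml.parsers.expat.ParserCreate()
    # By default pyexpat reports attributes filled in from ATTLIST
    # declarations along with those written in the document.
    parser.StartElementHandler = lambda name, attrs: seen.update(attrs)
    parser.Parse(document, True)
    return seen

print(attributes_seen(DOC))
```

The #FIXED and defaulted attributes are supplied by the parser, while the #IMPLIED one is simply absent.  Note that expat is a non-validating parser, so it would not reject a report whose version differs from "1.0"; that check needs a validating parser.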

Enumerated attribute values must be legal XML name tokens (composed only of NameChar characters).  For example:
    <!ATTLIST  automobile  make
        (Ford | Lincoln | Mercury)
        #REQUIRED>

The ID, IDREF, and IDREFS attribute types are very useful.  For example:
    <!ELEMENT  Teacher  EMPTY>
    <!ATTLIST  Teacher  SSN  ID  #REQUIRED>
    <!ELEMENT  Student  EMPTY>
    <!ATTLIST  Student SID ID #REQUIRED>
    <!ELEMENT  Class EMPTY>
    <!ATTLIST  Class
        Instructor  IDREF  #REQUIRED
        Roster  IDREFS #REQUIRED>
would allow:
    <Teacher  SSN="T123-45-6789" />
    <Student  SID="S111-22-3333" />
    <Student  SID="S444-55-6666" />
    <Class  Instructor="T123-45-6789"
        Roster="S111-22-3333 S444-55-6666" />
ID values must be legal XML names, so they can't begin with a digit; hence the letter prefixes.  This notation supports one-to-one and one-to-many mappings.  The application must separate the individual references from the IDREFS value.
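
As a sketch of that last step, the following Python fragment splits an IDREFS value on whitespace and looks up each reference.  It assumes a hypothetical School wrapper element (an XML document needs a single root) and letter-prefixed ID values; because the standard-library parser used here does not validate, the table ID_ATTRIBUTES tells the application which attributes act as IDs.  All of these names are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical data; School exists only to give the document one root element.
DATA = """<School>
  <Teacher SSN="T123-45-6789" />
  <Student SID="S111-22-3333" />
  <Student SID="S444-55-6666" />
  <Class Instructor="T123-45-6789" Roster="S111-22-3333 S444-55-6666" />
</School>"""

# Without a validating parser, the application must know which attributes are IDs.
ID_ATTRIBUTES = {"Teacher": "SSN", "Student": "SID"}

def resolve_class(xml_text):
    """Return the instructor element and the list of student elements for the class."""
    root = ET.fromstring(xml_text)
    # Index every element that carries an ID attribute by its ID value.
    by_id = {el.get(ID_ATTRIBUTES[el.tag]): el
             for el in root if el.tag in ID_ATTRIBUTES}
    cls = root.find("Class")
    instructor = by_id[cls.get("Instructor")]                   # IDREF: one reference
    roster = [by_id[ref] for ref in cls.get("Roster").split()]  # IDREFS: split on whitespace
    return instructor, roster

instructor, roster = resolve_class(DATA)
print(instructor.get("SSN"), [student.get("SID") for student in roster])
```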

Entities

Entities are useful for replaceable (or modular) items.

General entities are the simplest type of XML entity.  The syntax is:
    <!ENTITY  name  "replacement_text">
For example:
    <!ELEMENT  signature  (#PCDATA)>
    <!ENTITY  name.full  "Stephen E. Reichenbach">
    <!ENTITY  title  "Associate Professor">
would allow:
    <signature>
    &name.full;,   &title;
    </signature>
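
A quick way to watch the replacement happen is to parse such a document with Python's standard xml.etree.ElementTree module, which expands general entities declared in the internal subset.  This is only a sketch: the declarations are placed in an internal subset, and the signature is written on one line to keep the resulting text simple.

```python
import xml.etree.ElementTree as ET

# The entity declarations from the example, given as an internal DTD subset.
DOC = (
    '<!DOCTYPE signature [\n'
    '  <!ELEMENT signature (#PCDATA)>\n'
    '  <!ENTITY name.full "Stephen E. Reichenbach">\n'
    '  <!ENTITY title "Associate Professor">\n'
    ']>\n'
    '<signature>&name.full;, &title;</signature>'
)

signature = ET.fromstring(DOC)
# The application sees the replacement text, not the entity references.
print(signature.text)
```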

Parameter entities are used exclusively in DTDs (and must be parsed entities).  The syntax is:
    <!ENTITY  %  name  "replacement_text">
For example:
    <!ENTITY  %  address_block  "address1  CDATA  #REQUIRED  address2  CDATA  #IMPLIED  city  CDATA  #REQUIRED  state  CDATA  #REQUIRED  zipcode  CDATA  #IMPLIED">
would allow:
    <!ATTLIST  customer
        %address_block;
        customer_id  ID  #REQUIRED>
    <!ATTLIST  vendor
        %address_block;
        categories IDREFS #IMPLIED>

Both general and parameter entities can have replacement text that resides in an external file.  For example, a document with several parts could be assembled in this way:
    <!ENTITY  chapter1  SYSTEM  "chapter1.xml">
    <!ENTITY  chapter2  SYSTEM  "chapter2.xml">
and
    <chapter>&chapter1;</chapter>
    <chapter>&chapter2;</chapter>

There are convenient public entity reference sets, for example for Greek characters.

Notations

Notations allow non-XML data.   For example,
    <!NOTATION  gif  SYSTEM  "viewer.exe">
    <!NOTATION  jpg  SYSTEM  "viewer.exe">
    <!ELEMENT  Image  EMPTY>
    <!ATTLIST  Image
        type NOTATION  (gif | jpg)  "gif"
        src  CDATA  #REQUIRED >
allows
    <Image  type="jpg"  src="portrait.jpg" />

Entities also can be used with unparsed (non-XML) data.  For example:
    <!NOTATION  png  SYSTEM  "viewer.exe">
    <!ENTITY  logo  SYSTEM  "logo.png"  NDATA  png>
When the entity is referenced (by name, in an attribute of type ENTITY), the parser will supply the name, location, and notation of the png file to the application.
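
The sketch below shows the kind of information a parser hands to the application for an unparsed entity, using the declaration handlers of Python's standard xml.parsers.expat module.  It assumes a variant of the example in which the logo entity uses a png notation inside a hypothetical doc document; the function name unparsed_entities and the dictionary keys are made up for this illustration.

```python
import xml.parsers.expat

# Hypothetical document declaring one notation and one unparsed entity.
DOC = """<!DOCTYPE doc [
  <!NOTATION png SYSTEM "viewer.exe">
  <!ENTITY logo SYSTEM "logo.png" NDATA png>
]>
<doc/>"""

def unparsed_entities(document):
    """Collect what the parser reports about each unparsed (NDATA) entity."""
    notations = {}   # notation name -> system identifier of its handler
    entities = {}    # entity name -> (location, notation name)

    def on_notation(name, base, system_id, public_id):
        notations[name] = system_id

    def on_entity(name, is_parameter, value, base, system_id, public_id, notation):
        if notation is not None:   # only NDATA entities carry a notation name
            entities[name] = (system_id, notation)

    parser = xml.parsers.expat.ParserCreate()
    parser.NotationDeclHandler = on_notation
    parser.EntityDeclHandler = on_entity
    parser.Parse(document, True)

    # Join each entity with the handler declared for its notation.
    return {
        name: {"location": loc, "notation": nota, "handler": notations.get(nota)}
        for name, (loc, nota) in entities.items()
    }

print(unparsed_entities(DOC))
```

The parser never opens logo.png itself; it only passes the declared name, location, and notation along, and the application decides what to do with them.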

Conditional Sections

Sections of a DTD can be conditionally or selectively included or excluded.   The syntax is:
    <![INCLUDE[ included_section ]]>
    <![IGNORE[ ignored_section ]]>
Conditional statements can be used with an ENTITY declaration.  For example:
    <!ENTITY  %  debug  "INCLUDE">
    <!ENTITY  %  nodebug  "IGNORE">
allows:
    <![%debug;[
        <!ELEMENT  message  (information, comment)>
    ]]>
    <![%nodebug;[
        <!ELEMENT  message  (information)>
    ]]>

Issues

DTDs use their own syntax, which is different from the syntax used for XML data.

DTDs are not extensible.  XML parsers can't build DTDs dynamically.

DTDs can't be segmented and have difficulties with Namespaces.

Entities provide only limited extensibility.

DTDs have limited support for data typing.

DTDs don't support inheritance.

DTDs don't support the Document Object Model (DOM).

Textbook Example

The textbook provides an example DTD and XML.