Document Type Definitions
Document Type Definitions (DTDs)
are specifications of rules about
how XML data is structured.
DTDs are being developed for many
application domains and facilitate
the creation and sharing of
XML data. DTDs also facilitate
data validation, i.e., ensuring
that the data follows rules appropriate
for the domain.
A DTD is identified with the DOCTYPE
declaration, which follows the
XML declaration and precedes document
elements. DTD declarations can be
given in an external file, internally
within the document, or both.
DTDs can be given internally using the following syntax:
<!DOCTYPE document_element [ internal_subset_declarations ] >
For example:
<!DOCTYPE
message [
<!ELEMENT message (source, (error | warning)+)>
<!ELEMENT source (#PCDATA)>
<!ELEMENT error (#PCDATA)>
<!ATTLIST error level (low | high) "low">
<!ELEMENT warning (#PCDATA)>
]>
DTDs also can be given in external
files. This is appropriate if
the DTD is shared by various documents.
The SYSTEM notation
gives the DTD location explicitly.
For example,
<!DOCTYPE message SYSTEM "file:message.dtd">
The location can be a URL, URI, or URN.
The PUBLIC notation gives a public
identifier, a name for a DTD that is
presumed to be generally available in the
operating environment. In XML, the public
identifier must be followed by a system
identifier (a URL or URI) that the
application can use if it can't find
the DTD from the public identifier.
For example,
<!DOCTYPE message PUBLIC "CSE/XML/message"
"http://cse.unl.edu/XML/message.dtd">
The SYSTEM and PUBLIC identifiers
can also be followed by an
additional internal DTD subset.
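The three DOCTYPE forms can be inspected programmatically. The following is a minimal sketch, assuming Python's standard xml.parsers.expat module; the function name describe_doctype and the three sample documents are illustrative assumptions. Expat does not validate and, by default, does not fetch the external DTD; it only reports what the declaration says.

```python
from xml.parsers import expat

def describe_doctype(document):
    """Return (root name, system id, public id, has internal subset)."""
    found = []

    def on_doctype(name, system_id, public_id, has_internal_subset):
        found.append((name, system_id, public_id, bool(has_internal_subset)))

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(document, True)
    return found[0]

# Illustrative documents, one per DOCTYPE form described above.
INTERNAL = '<!DOCTYPE message [ <!ELEMENT message (#PCDATA)> ]><message/>'
SYSTEM = '<!DOCTYPE message SYSTEM "file:message.dtd"><message/>'
PUBLIC = ('<!DOCTYPE message PUBLIC "CSE/XML/message" '
          '"http://cse.unl.edu/XML/message.dtd"><message/>')

for sample in (INTERNAL, SYSTEM, PUBLIC):
    print(describe_doctype(sample))
```

Only the internal form reports an internal subset; the SYSTEM and PUBLIC forms report the identifiers that an application would use to locate the DTD.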
DTD Declarations
The basic syntax of a DTD declaration is similar to XML notation:
<!keyword parameter_list>
There are four basic keywords:
ELEMENT - XML elements, the nouns.
ATTLIST - XML element attributes, the adjectives.
ENTITY - Named replacement text, macros for content.
NOTATION - Non-XML content, e.g. binary data.
Elements
Elements are central to data models
and vocabularies. The syntax gives
the element name and either a content
category or a content model:
<!ELEMENT name content_category>
<!ELEMENT name (content_model)>
There are five content categories:
EMPTY - The element has no contents.
<!ELEMENT ping EMPTY>
ANY - The element can have any contents.
<!ELEMENT attachments ANY>
#PCDATA - The element can have text-only contents.
<!ELEMENT warning (#PCDATA)>
element - The element can have child element(s).
<!ELEMENT message (source, (error | warning)+)>
mixed - The element can have mixed contents (text and child elements, in any order).
<!ELEMENT source (#PCDATA | ip_address)*>
The expression operators used to
specify a content model are:
Type | Character | Description | Meaning |
binary | , | comma | strict sequence |
binary | | | vertical bar | choice |
unary | ? | question mark | appears zero or one time |
unary | * | asterisk | appears zero or more times |
unary | + | plus | appears one or more times |
Note that without any unary modifier,
a content item appears once.
For example, the specification
(((a , b) | (c+, d)), e?)
would allow the strings ab, abe, cd, ccd, cccd, ..., cde, ccde, cccde, ....
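A content model behaves like a regular expression over child element names. The sketch below, in Python, translates a model into a regular expression and tests child sequences against it. The function names content_model_to_regex and children_match are hypothetical, and the translation covers only element content (names, parentheses, and the five operators), not #PCDATA or mixed content.

```python
import re

def content_model_to_regex(model):
    """Translate a DTD element-content model into a compiled regex.

    Each element name becomes a group matching that name followed by a
    space, so the child sequence c, c, d is matched as the string "c c d ".
    """
    parts = []
    for token in re.findall(r"[A-Za-z_][\w.\-]*|[(),|?*+]", model):
        if token == ",":
            continue                      # a sequence is plain concatenation
        if token == "(":
            parts.append("(?:")
        elif token in ")|?*+":
            parts.append(token)           # same meaning in regex syntax
        else:
            parts.append("(?:" + re.escape(token) + " )")
    return re.compile("".join(parts))

def children_match(model, children):
    """Return True if the list of child names satisfies the model."""
    text = "".join(name + " " for name in children)
    return content_model_to_regex(model).fullmatch(text) is not None

MODEL = "(((a , b) | (c+, d)), e?)"
for children in (["a", "b"], ["c", "c", "d", "e"], ["a", "b", "c"], ["e"]):
    print(children, children_match(MODEL, children))
```

The first two sequences satisfy the model; the last two do not, because c cannot follow b and e cannot appear alone.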
Attributes
Attribute declarations in DTDs are made with the ATTLIST keyword. The syntax is:
<!ATTLIST element_name attribute_declaration_list>
The attribute declarations have the syntax:
attribute_name attribute_type [default_value(s)]
For example, the element automobile with attributes of color, make, model, and year would be declared as:
<!ELEMENT automobile EMPTY>
<!ATTLIST automobile
color CDATA #IMPLIED
make CDATA #REQUIRED
model CDATA #IMPLIED
year CDATA #IMPLIED>
The attribute types are:
Attribute Type | Description |
CDATA | Character data (text) |
Enumerated values | Explicit list of choices |
ID | Unique name for element instance |
IDREF | Reference to another element's ID |
IDREFS | Space-separated list of IDREF values |
NMTOKEN | Name token (uses only NameChar characters) |
NMTOKENS | Space-separated list of NMTOKEN values |
ENTITY | Name of an unparsed external entity |
ENTITIES | Space-separated list of ENTITY values |
NOTATION | Notation reference |
The choices for defaults are:
Attribute Default | Description |
#REQUIRED | Value must be given |
#IMPLIED | Optional value |
#FIXED fixed_value | Optional, value fixed |
default_value | Optional, default available |
Here are a few brief examples; refer to the textbook for more details.
For
<!ATTLIST
automobile make CDATA #REQUIRED>
the element
<automobile>...</automobile>
is invalid. The make attribute
(of type CDATA) must be present, for example
<automobile
make="Ford">...</automobile>
The declaration
<!ATTLIST
automobile color CDATA #IMPLIED "red">
is not allowed because there is
no default with #IMPLIED. Instead, the declaration should be
<!ATTLIST
automobile color CDATA "red">
If the only version of a report
is 1.0, then the declaration could be
<!ATTLIST
report version CDATA #FIXED "1.0">
In this case, the element
may include or omit the version attribute, but if the version
attribute is present then any value other than "1.0" would make the data
invalid.
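Declared defaults reach the application even when an attribute is omitted from the data. The following is a minimal sketch, assuming Python's standard xml.parsers.expat module; the garage root element, the sample document, and the function name collect_attributes are assumptions made for illustration. Expat reads the ATTLIST declarations but does not validate, so it would not report a missing #REQUIRED attribute.

```python
from xml.parsers import expat

# Illustrative document: color has a default, version is #FIXED.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE garage [
  <!ELEMENT garage (automobile | report)*>
  <!ELEMENT automobile EMPTY>
  <!ATTLIST automobile
      make  CDATA #REQUIRED
      color CDATA "red">
  <!ELEMENT report EMPTY>
  <!ATTLIST report version CDATA #FIXED "1.0">
]>
<garage>
  <automobile make="Ford"/>
  <automobile make="Lincoln" color="blue"/>
  <report/>
</garage>"""

def collect_attributes(document):
    """Return (element name, attributes) pairs as the parser reports them."""
    seen = []

    def on_start(name, attributes):
        seen.append((name, dict(attributes)))

    parser = expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.Parse(document, True)
    return seen

for name, attributes in collect_attributes(DOCUMENT):
    print(name, attributes)
```

The first automobile is reported with the default color, the second keeps its explicit color, and the report element is reported with the fixed version value.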
Enumerated attribute values must
be legal XML name tokens (composed only of NameChar characters).
For example:
<!ATTLIST
automobile make
(Ford | Lincoln | Mercury)
#REQUIRED>
The ID, IDREF, and IDREFS attribute
types let elements refer to one another. For example:
<!ELEMENT
Teacher EMPTY>
<!ATTLIST
Teacher SSN ID #REQUIRED>
<!ELEMENT
Student EMPTY>
<!ATTLIST
Student SID ID #REQUIRED>
<!ELEMENT
Class EMPTY>
<!ATTLIST
Class
Instructor IDREF #REQUIRED
Roster IDREFS #REQUIRED>
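would allow the following. ID values must be legal XML
names, so each value here begins with a letter rather than a digit:
<Teacher SSN="T-123-45-6789" />
<Student SID="S-111-22-3333" />
<Student SID="S-444-55-6666" />
<Class
Instructor="T-123-45-6789"
Roster="S-111-22-3333 S-444-55-6666" />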
This notation supports one-to-one
and one-to-many mappings. The application must separate the individual
references in an IDREFS value, which are delimited by white space.
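Resolving the references is left to the application. The sketch below shows one way to do it in Python with xml.etree.ElementTree. The School root element, the short ID values, the ID_ATTRIBUTES table, and the function name are all hypothetical; ElementTree does not read attribute types from the DTD, so the table states which attribute plays the ID role for each element.

```python
import xml.etree.ElementTree as ET

# Hypothetical data following the Teacher/Student/Class declarations.
DOCUMENT = """<School>
  <Teacher SSN="T1"/>
  <Student SID="S1"/>
  <Student SID="S2"/>
  <Class Instructor="T1" Roster="S1 S2"/>
</School>"""

# Which attribute is of type ID on each element (taken from the DTD by hand).
ID_ATTRIBUTES = {"Teacher": "SSN", "Student": "SID"}

def resolve_class_references(document):
    """Return (instructor id, [student ids]) for every Class element."""
    root = ET.fromstring(document)

    # Index every element by its ID value; IDs must be unique.
    by_id = {}
    for element in root.iter():
        id_attribute = ID_ATTRIBUTES.get(element.tag)
        if id_attribute is not None:
            value = element.get(id_attribute)
            if value in by_id:
                raise ValueError("duplicate ID: " + value)
            by_id[value] = element

    resolved = []
    for cls in root.iter("Class"):
        instructor = by_id[cls.get("Instructor")]                    # IDREF
        roster = [by_id[ref] for ref in cls.get("Roster").split()]   # IDREFS
        resolved.append((instructor.get("SSN"),
                         [student.get("SID") for student in roster]))
    return resolved

print(resolve_class_references(DOCUMENT))
```

A reference to an ID that does not exist raises KeyError here, which mirrors the validity rule that every IDREF must match some ID in the document.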
Entities
Entities are useful for replaceable (or modular) items.
General entities are the
simplest type of XML entity. The syntax is:
<!ENTITY
name "replacement_text">
For example:
<!ELEMENT
signature (#PCDATA)>
<!ENTITY
name.full "Stephen E. Reichenbach">
<!ENTITY
title "Associate Professor">
would allow:
<signature>
&name.full;,
&title;
</signature>
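The expansion of general entities can be observed with any XML parser that reads the internal subset. A minimal sketch, assuming Python's standard xml.etree.ElementTree module; the document is the signature example placed on one line so that the resulting text is easy to compare.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<!DOCTYPE signature [
  <!ELEMENT signature (#PCDATA)>
  <!ENTITY name.full "Stephen E. Reichenbach">
  <!ENTITY title "Associate Professor">
]>
<signature>&name.full;, &title;</signature>"""

# The parser replaces each entity reference with its replacement text.
signature = ET.fromstring(DOCUMENT)
print(signature.text)
```

The application sees only the expanded text; the entity references themselves do not appear in the parsed element.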
Parameter entities are used
exclusively in DTDs (and must be parsed entities). The syntax is:
<!ENTITY
% name "replacement_text">
For example:
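<!-- The type and default of state are assumed to match city. -->
<!ENTITY % address_block "address1 CDATA #REQUIRED
address2 CDATA #IMPLIED city CDATA #REQUIRED
state CDATA #REQUIRED zipcode CDATA #IMPLIED">
would allow the following in an external DTD (within the internal subset,
parameter-entity references may appear only between declarations, not inside them):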
<!ATTLIST
customer
%address_block;
customer_id ID #REQUIRED>
<!ATTLIST
vendor
%address_block;
categories IDREFS #IMPLIED>
Both general and parameter entities
can have replacement text that resides in an external file. For example,
a document with several parts could be assembled in this way:
<!ENTITY
chapter1 SYSTEM "chapter1.xml">
<!ENTITY
chapter2 SYSTEM "chapter2.xml">
and
<chapter>&chapter1;</chapter>
<chapter>&chapter2;</chapter>
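Whether external entities are fetched depends on the parser and the application. The following is a sketch of one approach, assuming Python's standard xml.parsers.expat module, where the application supplies the entity content through a handler. The FILES dictionary stands in for the external files, and the book root element, the chapter contents, and the function name assemble are hypothetical.

```python
from xml.parsers import expat

# Hypothetical stand-ins for the external files; a real application would
# read them from disk or over the network.
FILES = {
    "chapter1.xml": "<title>Introduction</title>",
    "chapter2.xml": "<title>Syntax</title>",
}

DOCUMENT = """<!DOCTYPE book [
  <!ENTITY chapter1 SYSTEM "chapter1.xml">
  <!ENTITY chapter2 SYSTEM "chapter2.xml">
]>
<book>
  <chapter>&chapter1;</chapter>
  <chapter>&chapter2;</chapter>
</book>"""

def assemble(document, files):
    """Parse document, resolving external entities from files.

    Returns the element names in the order the parser reports them.
    """
    names = []
    parser = expat.ParserCreate()

    def on_start(name, attributes):
        names.append(name)

    def on_external_entity(context, base, system_id, public_id):
        # The child parser inherits the parent's handlers, so elements
        # inside the entity are appended to the same list.
        child = parser.ExternalEntityParserCreate(context)
        child.Parse(files[system_id], True)
        return 1  # nonzero tells Expat the entity was handled

    parser.StartElementHandler = on_start
    parser.ExternalEntityRefHandler = on_external_entity
    parser.Parse(document, True)
    return names

print(assemble(DOCUMENT, FILES))
```

The title elements come from the external entities but are reported in document order, as if their text had been written inline.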
There are convenient public entity reference sets, for example for Greek characters.
Notations
Notations identify the format of non-XML data.
For example,
<!NOTATION gif SYSTEM "viewer.exe">
<!NOTATION jpg SYSTEM "viewer.exe">
<!ELEMENT
Image EMPTY>
<!ATTLIST
Image
type NOTATION (gif | jpg) "gif"
src CDATA #REQUIRED >
allows
<Image
type="jpg" src="portrait.jpg" />
Entities also can be used with unparsed
(non-XML) data. For example:
<!NOTATION png SYSTEM "viewer.exe">
<!ENTITY logo SYSTEM "logo.png" NDATA png>
When the entity is referenced from an attribute of type ENTITY,
the parser will supply the name, location, and type of the png file to
the application.
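What the parser hands to the application can be seen by registering handlers for the declarations. A minimal sketch, assuming Python's standard xml.parsers.expat module; the page element, its image attribute, and the function name read_unparsed_entities are hypothetical, and the example declares a png notation for a png file.

```python
from xml.parsers import expat

DOCUMENT = """<!DOCTYPE page [
  <!NOTATION png SYSTEM "viewer.exe">
  <!ENTITY logo SYSTEM "logo.png" NDATA png>
  <!ELEMENT page EMPTY>
  <!ATTLIST page image ENTITY #REQUIRED>
]>
<page image="logo"/>"""

def read_unparsed_entities(document):
    """Return (notations, entities) as declared in the DTD.

    notations maps a notation name to its system identifier;
    entities maps an entity name to (location, notation name).
    """
    notations, entities = {}, {}

    def on_notation(name, base, system_id, public_id):
        notations[name] = system_id

    def on_unparsed_entity(name, base, system_id, public_id, notation_name):
        entities[name] = (system_id, notation_name)

    parser = expat.ParserCreate()
    parser.NotationDeclHandler = on_notation
    parser.UnparsedEntityDeclHandler = on_unparsed_entity
    parser.Parse(document, True)
    return notations, entities

notations, entities = read_unparsed_entities(DOCUMENT)
location, notation_name = entities["logo"]
print(location, notation_name, notations[notation_name])
```

The parser never reads logo.png itself; the application decides what to do with the location and the notation's helper program.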
Conditional Sections
Sections of an external DTD can be conditionally
or selectively included or excluded (conditional sections are not
allowed in the internal subset). The syntax is:
<![INCLUDE[ included_section ]]>
<![IGNORE[ ignored_section ]]>
Conditional sections can be controlled
with parameter entities. For example:
<!ENTITY
% debug "INCLUDE">
<!ENTITY
% nodebug "IGNORE">
allows:
<![%debug;[
<!ELEMENT message (information, comment)>
]]>
<![%nodebug;[
<!ELEMENT message (information)>
]]>
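The effect of the two parameter entities can be illustrated with a toy preprocessor. This Python sketch is not how an XML parser is implemented, and it does not handle nested conditional sections; the function name resolve_conditional_sections and the parameters dictionary are hypothetical. It only shows which declaration survives for a given setting of debug and nodebug.

```python
import re

# Matches <![ keyword [ body ]]> where keyword is INCLUDE, IGNORE,
# or a parameter-entity reference such as %debug;
CONDITIONAL = re.compile(
    r"<!\[\s*(%\w+;|INCLUDE|IGNORE)\s*\[(.*?)\]\]>", re.DOTALL)

def resolve_conditional_sections(dtd_text, parameters):
    """Keep the body of INCLUDE sections and drop IGNORE sections."""
    def replace(match):
        keyword = match.group(1)
        if keyword.startswith("%"):
            keyword = parameters[keyword[1:-1]]   # "%debug;" -> "debug"
        return match.group(2).strip() if keyword == "INCLUDE" else ""
    return CONDITIONAL.sub(replace, dtd_text).strip()

DTD_TEXT = """<![%debug;[
<!ELEMENT message (information, comment)>
]]>
<![%nodebug;[
<!ELEMENT message (information)>
]]>"""

print(resolve_conditional_sections(
    DTD_TEXT, {"debug": "INCLUDE", "nodebug": "IGNORE"}))
print(resolve_conditional_sections(
    DTD_TEXT, {"debug": "IGNORE", "nodebug": "INCLUDE"}))
```

Swapping the two entity values switches the whole DTD between the debug and non-debug declarations of message without editing the sections themselves.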
Issues
DTDs use their own syntax, which is different from the syntax used for XML data.
DTDs are not extensible. XML parsers can't build DTDs dynamically.
DTDs can't be segmented and have difficulties with Namespaces.
Entities provide only limited extensibility.
DTDs have limited support for data typing.
DTDs don't support inheritance.
DTDs don't support the Document Object Model (DOM).
Textbook Example
The textbook provides an example
DTD and XML.