The eXtensible Markup Language (XML)

Document Object Model (DOM)

The W3C defines the DOM this way:

The Document Object Model is a platform- and language-neutral interface that will allow programs and scripts to dynamically access and update the content, structure and style of documents. The document can be further processed and the results of that processing can be incorporated back into the presented page.

Basically, the DOM is a tree-based application programmers' interface (API) for documents, including XML, HTML, and other hierarchical documents.

There are a number of beneficial aspects of DOM. It ensures proper grammar and well-formedness of the document. It abstracts content in a logical and consistent structure that is not bound to any grammar (e.g., XML). It simplifies internal document manipulation as compared to serialized representations. Its structure is convenient for both hierarchical and relational data models.

The power and flexibility of the DOM come at a price. A DOM parser loads the entire document into memory and parses it into a document tree. This can require four to five times as much memory as the document itself.

Early document models (so-called DOM Level 0) for HTML were implemented in IE3.0 and Netscape, but the object model was not standardized and the implementations were incompatible. Work on XHTML is intended to guide the evolution of HTML into compatibility with XML.

The W3C DOM activity statement describes the formal DOM process and the status of the three levels. DOM Level 1 (Recommendation October 1998, 2nd edition September 2000) was developed to support both XML 1.0 and HTML. The DOM Level 2 Core Specification (Recommendation October 2000) and related recommendations add support for namespaces and CSS, along with Views, Events, Style, and Traversal and Range. The DOM Level 3 Core Specification (Working Draft, June 2001) finishes support for namespaces (with XML Infoset and XML Base) and adds support for events, abstract schema (including DTDs), IO, and XPath.

Microsoft's XML parser (MSXML 3.0, Nov 2000) supports DOM Level 1 and some DOM Level 2 features (as well as SAX2). These will be enhanced in MSXML 4.0, now in preliminary release. There is support for JavaScript, VBScript, Perl, VB, Java, C++, and other languages. The Apache XML work has produced an open-source Xerces parser that supports Level 2 features and is installed on cree.unl.edu. The Java API for XML Processing (JAXP) provides access to DOM. It is based on Apache Crimson, but it can use pluggable conformant implementations such as Xerces.
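To see which implementation JAXP has plugged in on a particular machine, a small sketch like the following can help. The class name WhichParser is made up for illustration, and the names it prints depend on the JDK and classpath in use, so no output is shown.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

// Hypothetical helper, not from the textbook
class WhichParser
{
   // Returns the class names of the factory and builder that JAXP selected
   static String[] describe() throws ParserConfigurationException
   {
      DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
      DocumentBuilder db = dbf.newDocumentBuilder();
      return new String[] { dbf.getClass().getName(), db.getClass().getName() };
   }

   public static void main( String[] args ) throws ParserConfigurationException
   {
      String[] names = describe();
      System.out.println("Factory: " + names[0]);
      System.out.println("Builder: " + names[1]);
   }
}
```

Setting the javax.xml.parsers.DocumentBuilderFactory system property to another factory class name is the standard JAXP way to switch implementations without changing the code.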

DOM Details

DOM is class-oriented and hierarchical. Both textbooks graphically illustrate the class hierarchy of DOM and provide a reference for the classes and their properties and methods. The main base class is Node. The Element, Attr, Text, CDATASection, Entity, EntityReference, ProcessingInstruction, Comment, Document, DocumentType, DocumentFragment, and Notation objects are all derived from Node. Through our study of XML, the general nature of these classes should already be familiar.
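Because every object in the tree is a Node, one recursive routine can visit all of them. Here is a minimal sketch, assuming the JAXP parser that ships with the JDK; the class name NodeKinds and the sample XML string are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical example class, not from the textbook
class NodeKinds
{
   // Depth-first walk that records "nodeType:nodeName" for each node
   static void walk( Node node, List<String> out )
   {
      out.add(node.getNodeType() + ":" + node.getNodeName());
      for( Node child = node.getFirstChild(); child != null; child = child.getNextSibling() )
      {
         walk(child, out);
      }
   }

   // Parses the XML string and describes every node in document order
   static List<String> describe( String xml ) throws Exception
   {
      DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
      Document doc = db.parse(new InputSource(new StringReader(xml)));
      List<String> out = new ArrayList<>();
      walk(doc, out);
      return out;
   }

   public static void main( String[] args ) throws Exception
   {
      // Prints 9:#document, 1:a, 8:#comment, 1:b, 3:#text (one per line)
      for( String line : describe("<a><!--note--><b>text</b></a>") )
      {
         System.out.println(line);
      }
   }
}
```

The numbers are the Node type constants (9 is DOCUMENT_NODE, 1 is ELEMENT_NODE, 8 is COMMENT_NODE, 3 is TEXT_NODE). Attributes are not children of their element, so this walk does not visit them.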

Also, there are many good websites for DOM including www.w3schools.com.

Reading XML into DOM with JAXP

Here is an example of reading an XML file into DOM with JAXP from Professional Java XML:

// Import the W3C DOM classes
import org.w3c.dom.*;

// We are going to use JAXP's classes for DOM I/O
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.stream.*;
import javax.xml.transform.dom.*;

// Import other Java classes
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.net.URL;

public class ReadXML
{
   public static void main( String[] args )
   {
      Document doc = null;
      boolean validation = true;

      if( args.length == 0 )
      {
         String usage = "Usage:\tReadXML filename.xml [true|false]\n";
         usage += "\tThe second parameter switches validation, default is true (on)";

         System.out.println( usage );
         return;
      }

      String source = args[0];
      if( args.length == 2 )
      {
         // If anything's there override the default
         validation = Boolean.valueOf(args[1]).booleanValue();
      }

      try
      {
         System.out.println( "Reading from "+source+" with"+
            (validation?"":"out")+" validation." );

         // This is JAXP's way to parse an XML source into a Document
         // The lines within the {} braces are NOT DOM level 1
         {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setValidating(validation);
            DocumentBuilder db = dbf.newDocumentBuilder();
            doc = db.parse(source);
         }

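         // DOM Level 1: the root element of the parsed document
         Node root = (Node)doc.getDocumentElement();

         System.out.println("Done, here's the same XML re-output: -\n");

         // The listing is cut off at this point in these notes. The rest is
         // an assumed completion, modeled on the CreatePresident example below.
         {
            TransformerFactory tFactory = TransformerFactory.newInstance();
            Transformer transformer = tFactory.newTransformer();
            transformer.transform(new DOMSource(doc), new StreamResult(System.out));
         }
      }
      catch( ParserConfigurationException pcEx )
      {
         System.out.println("ParserConfigurationException: "+pcEx.getMessage());
      }
      catch( org.xml.sax.SAXException saxEx )
      {
         System.out.println("SAXException: "+saxEx.getMessage());
      }
      catch( java.io.IOException ioEx )
      {
         System.out.println("IOException: "+ioEx.getMessage());
      }
      catch( TransformerException tEx )
      {
         System.out.println("TransformerException: "+tEx.getMessage());
      }
   }
}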

Building a DOM Tree

Here is an example from Professional Java XML that generates a DOM tree and then creates an XML document from it.

// Import the W3C DOM classes
import org.w3c.dom.*;

// We are going to use JAXP's classes for DOM I/O
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.stream.*;
import javax.xml.transform.dom.*;

// Import other Java classes
import java.io.PrintWriter;
import java.io.IOException;

public class CreatePresident
{
   public static void main( String[] args )
   {
      Document doc;

      Element president;
      Element person;
      Element firstName;
      Element surname;

      try
      {
         // This is JAXP's way to create a new empty Document
         // The lines within the {} braces are NOT DOM level 1
         {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();
            doc = db.newDocument();
         }

         // We'll start by creating a "<Person>"
         person = doc.createElement("Person");

         // Now create the "<FirstName>" element
         firstName = doc.createElement("FirstName");

         // Create a Text node "George" and add it to the "FirstName" tag
         firstName.appendChild( doc.createTextNode("George") );

         // Add the "<FirstName>" tag to "<Person>"
         person.appendChild(firstName);

         // Same as above
         surname = doc.createElement("Surname");
         surname.appendChild( doc.createTextNode("Bush") );
         person.appendChild(surname);

         president = doc.createElement("President");

         // Set the "Country" attribute in "<President>"
         president.setAttribute("Country","US");
         president.appendChild( person );

         // Add everything to the XmlDocument (doc)
         doc.appendChild( president );

         // This is JAXP's way to output a DOM to an OutputStream
         // The lines within the {} braces are NOT DOM level 1
         {
            TransformerFactory tFactory = TransformerFactory.newInstance();
            Transformer transformer = tFactory.newTransformer();
            transformer.transform(new DOMSource(doc), new StreamResult(System.out));
         }
      }
      catch( ParserConfigurationException pcEx )
      {
         System.out.println("ParserConfigurationException: "+pcEx.getMessage());
         pcEx.printStackTrace();
      }
      catch( TransformerConfigurationException tcEx )
      {
         System.out.println("TransformerConfigurationException: "+tcEx.getMessage());
         tcEx.printStackTrace();
      }
      catch( TransformerException tEx )
      {
         System.out.println("TransformerException: "+tEx.getMessage());
         tEx.printStackTrace();
      }
   }
}

Note that there is no whitespace in the output of this program, because the tree was built without any whitespace text nodes. When a document is read from a file, the DOM parser can be set to ignore element content whitespace.
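The parser setting can be tried out with a small sketch like the one below. It assumes the JAXP parser bundled with the JDK; the class name WhitespaceSketch and the inline DTD are made up for illustration. The parser needs a DTD to know which whitespace is element content whitespace, so validation is switched on together with the ignore setting.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical example class, not from the textbook
class WhitespaceSketch
{
   // Illustrative document with an internal DTD and indented markup
   static final String XML =
      "<!DOCTYPE President [\n" +
      "  <!ELEMENT President (Person)>\n" +
      "  <!ELEMENT Person (#PCDATA)>\n" +
      "]>\n" +
      "<President>\n" +
      "  <Person>George</Person>\n" +
      "</President>\n";

   // Counts the child nodes of the root element
   static int rootChildren( boolean ignoreWhitespace ) throws Exception
   {
      DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
      // Validation is enabled only together with the ignore setting,
      // so the parser consults the DTD's content model
      dbf.setValidating(ignoreWhitespace);
      dbf.setIgnoringElementContentWhitespace(ignoreWhitespace);
      DocumentBuilder db = dbf.newDocumentBuilder();
      Document doc = db.parse(new InputSource(new StringReader(XML)));
      return doc.getDocumentElement().getChildNodes().getLength();
   }

   public static void main( String[] args ) throws Exception
   {
      System.out.println("Keeping whitespace:  " + rootChildren(false) + " child nodes");
      System.out.println("Ignoring whitespace: " + rootChildren(true) + " child nodes");
   }
}
```

With whitespace kept, the line breaks and indentation around the Person element show up as extra Text nodes under President. With the setting on, only the Person element should remain.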

Here is another example from the second edition of the textbook, BasicDOMExample1.java, and another example from Professional Java XML using Java Swing GUI API, XML2JTree.java.

Both the first and second editions of the book have examples using scripting. The first edition example uses a combination of VBScript and JScript. The second edition example, example.html, uses JScript. It is a very nice little program that illustrates the use of HTML, XML, JavaScript, DOM, XSLT, and XPath.

DOM Specification Changes

DOM Level 2 and Level 3 have some changes to better support namespaces, to facilitate document manipulation, to support different views of data, to better handle events, to deal with stylesheets, and to support traversal and ranges. We don't have time in this class to look at all of them. To deal with the evolution and adoption of such features, DOM provides a hasFeature() method (on the DOMImplementation interface) to indicate if a feature is available.
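Here is a minimal sketch of a feature check through JAXP. The class name FeatureCheck is made up for illustration; the feature names are the ones defined by the DOM Level 2 specifications. What is reported depends on the parser in use, so no output is shown.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMImplementation;

// Hypothetical example class, not from the textbook
class FeatureCheck
{
   // Asks the DOM implementation behind JAXP whether it supports a feature
   static boolean supports( String feature, String version ) throws Exception
   {
      DOMImplementation impl =
         DocumentBuilderFactory.newInstance().newDocumentBuilder().getDOMImplementation();
      return impl.hasFeature(feature, version);
   }

   public static void main( String[] args ) throws Exception
   {
      String[] features = { "Core", "XML", "Events", "MutationEvents", "Traversal", "Range" };
      for( String feature : features )
      {
         System.out.println(feature + " 2.0: " + supports(feature, "2.0"));
      }
   }
}
```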

DOM Events

In DOM 2, the support for events is particularly noteworthy. Many of the basic events are supported, including:

Mouse Events
- click
- mousedown
- mouseup
- mouseover
- mousemove
- mouseout
All mouse events bubble and all are cancellable except mousemove.

UI Events
- DOMFocusIn
- DOMFocusOut
- DOMActivate
All UI events bubble; only DOMActivate is cancellable.

Mutation Events
- deal with changes to the document such as insertion, etc.
Most mutation events bubble; none are cancellable.

The methods include addEventListener(), removeEventListener(), createEvent(), initEvent(), dispatchEvent(), and stopPropagation().
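The sketch below shows these methods working together on a small tree: one listener is registered on two elements, and an event dispatched on the inner element bubbles up to the outer one. It assumes the DOM implementation behind JAXP supports the optional Events module, which is checked at run time. The class name EventSketch and the event type "inaugurated" are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.events.DocumentEvent;
import org.w3c.dom.events.Event;
import org.w3c.dom.events.EventListener;
import org.w3c.dom.events.EventTarget;

// Hypothetical example class, not from the textbook
class EventSketch
{
   // Dispatches a custom event on <Person> and logs where it is seen
   static List<String> run() throws Exception
   {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
      Element president = doc.createElement("President");
      Element person = doc.createElement("Person");
      president.appendChild(person);
      doc.appendChild(president);

      List<String> log = new ArrayList<>();
      // The Events module is optional, so check before casting
      if( !(doc instanceof DocumentEvent) || !(person instanceof EventTarget) )
      {
         log.add("DOM Events not supported by this implementation");
         return log;
      }

      EventListener listener = evt ->
      {
         Element current = (Element)evt.getCurrentTarget();
         log.add(evt.getType() + " seen at " + current.getTagName());
      };

      // useCapture = false: listen at the target and while bubbling
      ((EventTarget)person).addEventListener("inaugurated", listener, false);
      ((EventTarget)president).addEventListener("inaugurated", listener, false);

      Event evt = ((DocumentEvent)doc).createEvent("Events");
      evt.initEvent("inaugurated", true, false);   // bubbles, not cancellable
      ((EventTarget)person).dispatchEvent(evt);
      return log;
   }

   public static void main( String[] args ) throws Exception
   {
      for( String line : run() )
      {
         System.out.println(line);
      }
   }
}
```

Calling evt.stopPropagation() inside the listener would keep the event from reaching the President element.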

DOM 3 is adding keyboard events and event targets for multiple views.

DOM Traversal

The NodeIterator interface takes a subset of an XML document and sequences the nodes. The interface supports easy movement forward or backward in the sequence of document nodes with the nextNode() and previousNode() methods.

The TreeWalker interface also produces a sequence and allows nextNode() and previousNode() navigation. In addition, it allows tree-based movements with parentNode(), firstChild(), lastChild(), previousSibling(), and nextSibling().
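Here is a minimal sketch of both interfaces on the President tree from the earlier example. It assumes the Document returned by JAXP also implements the optional DocumentTraversal interface, as the parser bundled with the JDK does. The class name TraversalSketch is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;
import org.w3c.dom.traversal.TreeWalker;

// Hypothetical example class, not from the textbook
class TraversalSketch
{
   // Builds <President><Person><FirstName>George</FirstName><Surname>Bush</Surname></Person></President>
   static Document build() throws Exception
   {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
      Element president = doc.createElement("President");
      Element person = doc.createElement("Person");
      Element firstName = doc.createElement("FirstName");
      Element surname = doc.createElement("Surname");
      firstName.appendChild(doc.createTextNode("George"));
      surname.appendChild(doc.createTextNode("Bush"));
      person.appendChild(firstName);
      person.appendChild(surname);
      president.appendChild(person);
      doc.appendChild(president);
      return doc;
   }

   // NodeIterator: a flat, document-order sequence of every node
   static List<String> iterate( Document doc )
   {
      NodeIterator it = ((DocumentTraversal)doc).createNodeIterator(
         doc.getDocumentElement(), NodeFilter.SHOW_ALL, null, true);
      List<String> names = new ArrayList<>();
      for( Node n = it.nextNode(); n != null; n = it.nextNode() )
      {
         names.add(n.getNodeName());
      }
      it.detach();
      return names;
   }

   // TreeWalker: the same nodes, but with tree-style moves
   static List<String> walk( Document doc )
   {
      TreeWalker tw = ((DocumentTraversal)doc).createTreeWalker(
         doc.getDocumentElement(), NodeFilter.SHOW_ALL, null, true);
      List<String> names = new ArrayList<>();
      names.add(tw.getCurrentNode().getNodeName());   // President
      names.add(tw.firstChild().getNodeName());       // Person
      names.add(tw.lastChild().getNodeName());        // Surname
      names.add(tw.previousSibling().getNodeName());  // FirstName
      names.add(tw.parentNode().getNodeName());       // Person
      return names;
   }

   public static void main( String[] args ) throws Exception
   {
      Document doc = build();
      System.out.println("NodeIterator: " + iterate(doc));
      System.out.println("TreeWalker:   " + walk(doc));
   }
}
```

The iterator visits President, Person, FirstName, #text, Surname, #text in that order. Each TreeWalker move returns the new current node, or null when there is nowhere to go.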

Both NodeIterator and TreeWalker provide means to specify the types of nodes (elements, attributes, text, etc.) that are seen in the sequence. Also, there is a NodeFilter interface with an acceptNode() method that allows user filtering.
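The two kinds of filtering can be combined, as in the sketch below: the SHOW_ELEMENT flag limits the sequence to element nodes, and a user-written NodeFilter then skips one element by name. It assumes the Document returned by JAXP implements DocumentTraversal; the class name FilterSketch and the filter itself are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;
import org.xml.sax.InputSource;

// Hypothetical example class, not from the textbook
class FilterSketch
{
   // User filter: skip FirstName elements, accept everything else
   static final NodeFilter NO_FIRST_NAME = new NodeFilter()
   {
      public short acceptNode( Node n )
      {
         return "FirstName".equals(n.getNodeName())
            ? NodeFilter.FILTER_SKIP
            : NodeFilter.FILTER_ACCEPT;
      }
   };

   // Returns the names of the elements that pass both filters
   static List<String> elementNames( String xml ) throws Exception
   {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
         .parse(new InputSource(new StringReader(xml)));
      NodeIterator it = ((DocumentTraversal)doc).createNodeIterator(
         doc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, NO_FIRST_NAME, true);
      List<String> names = new ArrayList<>();
      for( Node n = it.nextNode(); n != null; n = it.nextNode() )
      {
         names.add(n.getNodeName());
      }
      it.detach();
      return names;
   }

   public static void main( String[] args ) throws Exception
   {
      String xml = "<President Country=\"US\"><Person>"
         + "<FirstName>George</FirstName><Surname>Bush</Surname>"
         + "</Person></President>";
      // Prints [President, Person, Surname]
      System.out.println(elementNames(xml));
   }
}
```

The text nodes never reach acceptNode() because the SHOW_ELEMENT test is applied first.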

There is a simple NodeIterator and TreeWalker example (in Java) in the textbook.