Internet Systems and Programming - XHTML

XHTML

The eXtensible HyperText Markup Language (XHTML) has been developed to replace the HyperText Markup Language (HTML), which has been the principal document format of the WWW.

The W3C oversees the development of XHTML. XHTML 1.0 (January 2000; 2nd ed., August 2002) is a reformulation of HTML 4.01 (December 1999) as an XML application. Because XHTML is specified in XML, it is easier to process and maintain.

Some of the main differences between XHTML and HTML are:

XHTML documents must be well-formed, i.e., have proper nesting with no overlapping tags. For example, the expression "<b>Bold<i>Bold-Italic</b>Italic</i>" is not well-formed.
XHTML markup is case-sensitive; element and attribute names must be in lower case.
XHTML non-empty elements must have end tags. In HTML, the paragraph tag "<p>" was often used without an end tag.
XHTML empty elements must also be closed, either with an end tag or with the combined start-and-end form, e.g., "<br />".
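
The rules above can be tried out with any XML parser. The following is a minimal sketch in Python, assuming the standard library module xml.etree.ElementTree as the parser; the helper name is_well_formed and the sample strings (other than the overlapping example from the list) are made up for illustration. It checks well-formedness only, not validity against an XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML, i.e. it is well-formed."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# The overlapping example from the list above, wrapped in a paragraph.
overlapping = "<p><b>Bold<i>Bold-Italic</b>Italic</i></p>"
# The same text with properly nested tags.
nested = "<p><b>Bold<i>Bold-Italic</i></b><i>Italic</i></p>"
# HTML-style paragraphs without end tags.
unclosed = "<div><p>First paragraph<p>Second paragraph</div>"
# Start and end tags that differ only in case do not match.
upper_end = "<p>Mixed case</P>"
# An empty element written in the combined form.
empty_ok = "<p>Line one<br />Line two</p>"

for sample in (overlapping, nested, unclosed, upper_end, empty_ok):
    print(is_well_formed(sample), sample)
```

Only the properly nested sample and the sample using "<br />" are accepted; the parser rejects the other three.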

Older browsers have problems with some XHTML constructs, so authors should be aware of compatibility issues during the transition. XHTML 1.0 reformulates HTML 4.01 in XML using three alternative document type definitions (DTDs): XHTML-1.0-Strict, XHTML-1.0-Transitional, and XHTML-1.0-Frameset.

The W3C runs an XHTML validation service that checks whether documents are valid according to the XHTML specification.

The W3C is working on Modularization of XHTML, beginning with XHTML 1.1 - Module-based XHTML (May 2001). XHTML 2 is in its sixth Working Draft (July 2004). Some of the issues for future XHTML evolution are:

Modularity - Subsetting XHTML
Extensibility - Adding to XHTML
Document Profiles - Customizing for document sets

XHTML 2.0 is being developed to have greater support for rich, portable web-based applications. XHTML 2.0 is based on HTML 4, XHTML 1.0, and XHTML 1.1, but is not intended to be backward compatible.

Markup Languages

The purpose of markup is to communicate metadata for a document, i.e., data about the data in the document.

Markup languages typically use tags to delimit and describe pieces of a document.

The Generalized Markup Language (GML) was developed at IBM in 1969. GML was a self-referential meta-language for arbitrary data, i.e., it could describe languages, grammars, and vocabularies for markup.

The Standard Generalized Markup Language (SGML) developed from GML and was adopted as a standard (ISO 8879) in 1986. Here is a simple example of an SGML document that first describes the structure of the document and then specifies the content.

      <!DOCTYPE email [
      <!ELEMENT email O O ((to & from & date & subject?), text) >
      <!ELEMENT text - O (para+) >
      <!ELEMENT para O O (#PCDATA) >
      <!ELEMENT (to, from, date, subject) - O (#PCDATA) >
      ]>
      <date>10/12/99
      <to>you@yours.com
      <from>me@mine.com
      <text>I just mailed to say...

HTML is a simple SGML-based language widely used for creating documents for the WWW. HTML is not a meta-language: it cannot be used to describe other markup languages. This page is a simple example of an HTML document, providing metadata and structured information. XHTML is the evolution of HTML.

Simple Example

Here is a simple XHTML example:

<?xml version = "1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<html xmlns = "http://www.w3.org/1999/xhtml">
   <head>
      <title>Internet and WWW How to Program - Welcome</title>
   </head>
   <body>
      <p>Welcome to XHTML!</p>
   </body>
</html>

Note that the document is actually an XML document that conforms to the XHTML Document Type Definition (DTD). We will look at XML later in the course.

The XHTML document has an html element (which declares the XHTML namespace) with two child elements: a head and a body. The head has a single child element, the title. The body has a single child element, a paragraph.
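
Because the example is an XML document, a general-purpose XML parser can read it. The sketch below assumes Python's standard xml.etree.ElementTree module; the variable names and the "x" namespace prefix are arbitrary choices for illustration. It shows that the namespace declared on the html element becomes part of every element name, so lookups must be namespace-qualified.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

document = """<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
   <head>
      <title>Internet and WWW How to Program - Welcome</title>
   </head>
   <body>
      <p>Welcome to XHTML!</p>
   </body>
</html>"""

root = ET.fromstring(document)

# ElementTree writes qualified names as "{namespace}localname".
print(root.tag)

# Strip the namespace part to list the children of <html>.
children = [child.tag.split("}", 1)[-1] for child in root]
print(children)

# The prefix "x" is local to this lookup; only the namespace URI matters.
ns = {"x": XHTML_NS}
title = root.find("x:head/x:title", ns)
paragraph = root.find("x:body/x:p", ns)
print(title.text)
print(paragraph.text)
```

An unqualified lookup such as root.find("head") finds nothing here, because the element's full name includes the XHTML namespace.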

Tag Structure

Elements have start tags of the form "<tagname>" and end tags of the form "</tagname>". The contents of the element are between the start and end tags. Empty elements (with no contents) may use the alternate form "<tagname/>", usually written with a space before the slash ("<tagname />") for compatibility with older browsers.

Elements may have attributes with names and values given in the start tag. The form of the start tag is: <tagname attribute1name="attribute1value" attribute2name="attribute2value">
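
As a small illustration of start tags, attributes, and the empty-element form, here is a sketch that assumes Python's standard xml.etree.ElementTree module as the parser. The fragment, including its id, class, and href values, is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: a paragraph with two attributes, a link, and a line break.
fragment = (
    '<p id="intro" class="note">'
    'See the <a href="http://www.w3.org/">W3C</a><br />'
    '</p>'
)

para = ET.fromstring(fragment)

# Attributes given in the start tag are exposed as a name/value mapping.
print(para.tag, para.attrib)

link = para.find("a")
print(link.get("href"), link.text)

# The combined form and an explicit start/end pair describe the same empty element.
short_form = ET.fromstring("<br />")
long_form = ET.fromstring("<br></br>")
print(ET.tostring(short_form) == ET.tostring(long_form))
```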

Comments are enclosed between "<!--" and "-->". The comment text may not contain "--" or end with "-".

There are many XHTML tags, and many useful online sources of information about them. One useful reference is http://www.w3schools.com/xhtml/xhtml_reference.asp.

Document Structure

An XHTML document must have a <!DOCTYPE> declaration and an <html> element. The <html> element contains a <head> and a <body>.

Head Elements

The <head> element has meta-data about the entire document. It can contain child elements, including:

title - document title
base - the base URL for the page
meta - generic element for metadata, used by search engines to index documents

Other elements may also appear in the head for scripting, styling, linking, etc.

Body Elements

The body contains the document with markup to provide information about parts of the document.

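Some body attributes are listed below. They are deprecated in HTML 4.01 in favor of style sheets and are not available in XHTML 1.0 Strict: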

background - URI of background image
bgcolor - background color (RGB)
text - text color
link, vlink, alink - colors for hyperlinks, visited links, and activated links

Some elements for defining blocks of text in the body are:

p - paragraph
div - page section

Text Formatting Elements

Some elements for formatting sections of text are:

br - line break
hr - horizontal rule
blockquote - indented block of quoted text
img - for including images with src attribute
map - for creating image maps
pre - preformatted text, with whitespace preserved

Lists

XHTML supports unordered (bulleted) lists, ordered (numbered) lists, and definition lists. The list elements, together with the section heading elements, are:

h1, h2, h3, h4, h5, h6 - section heading text
ul, ol, li - unordered list, ordered list, list item
dl, dt, dd - definition list, term, definition

Lists can be nested.
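
To make the nesting concrete, here is a sketch of an ordered list nested inside an item of an unordered list, followed by a definition list. It assumes Python's standard xml.etree.ElementTree module to inspect the markup; the list contents are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical content, chosen only to illustrate nesting.
lists = """<div>
  <ul>
    <li>Markup languages
      <ol>
        <li>SGML</li>
        <li>HTML</li>
        <li>XHTML</li>
      </ol>
    </li>
    <li>Style sheets</li>
  </ul>
  <dl>
    <dt>XHTML</dt>
    <dd>eXtensible HyperText Markup Language</dd>
  </dl>
</div>"""

root = ET.fromstring(lists)
outer = root.find("ul")

# Items that belong directly to the outer list.
top_items = outer.findall("li")
# Items of the ordered list nested inside the first outer item.
nested_items = outer.findall("li/ol/li")
# Every list item at any depth.
all_items = list(root.iter("li"))

print(len(top_items), len(nested_items), len(all_items))
print([item.text for item in nested_items])

# Pair each definition term with its definition.
terms = {dt.text: dd.text for dt, dd in zip(root.iter("dt"), root.iter("dd"))}
print(terms)
```

Note that the nested ol sits inside the li it belongs to, not between two li elements; placing it between them would not be valid XHTML.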

Font Elements

Some elements for specifying font or special text are:

tt - fixed width font
i - italics
b - bold
em - emphasis (often italics)
strong - strong (often bold)
cite - citation (usually italics)
code - program code (usually fixed width)
sub - subscript
sup - superscript
small, big - font size

Tables

XHTML supports tables with the table element. Table sub-elements include:

th - table header cell
tr - table row
td - table data cell

There are various table attributes for setting borders, spacing, alignment, etc. There are other elements for setting captions, headers and footers, grouping columns, etc.
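
A small table using the elements above might look like the following sketch. It assumes Python's standard xml.etree.ElementTree module to read the rows back; the table content reuses dates mentioned earlier on this page, and the variable names are arbitrary.

```python
import xml.etree.ElementTree as ET

table = """<table border="1">
  <caption>Markup languages</caption>
  <tr><th>Language</th><th>Year</th></tr>
  <tr><td>GML</td><td>1969</td></tr>
  <tr><td>SGML</td><td>1986</td></tr>
  <tr><td>XHTML 1.0</td><td>2000</td></tr>
</table>"""

root = ET.fromstring(table)

# Each tr is a row; its th or td children are the cells of that row.
rows = [[cell.text for cell in row] for row in root.findall("tr")]
header, data = rows[0], rows[1:]

print(root.find("caption").text)
print(header)
for record in data:
    print(dict(zip(header, record)))
```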

Framesets

Framesets allow the division of a page into separate regions. For example, a common use of frames is to create a menu column on the left side of a page, a top frame for general presentation, and a bottom frame for content. This approach is used in the UNL CSE WWW pages.

Forms

The XHTML <form> element groups the elements that accept user input. Some related elements are:

input - form control
select - option selector
textarea - multi-line text field
label - form label
button - push button
optgroup - option group
option - selectable choice

We will use forms frequently, so study this material closely.
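
The form elements listed above can be combined as in the following sketch. It assumes Python's standard xml.etree.ElementTree module to inspect the markup; the action value, the control names, and the option values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical form; the action path and the control names are made up.
form = """<form action="formresponder.cgi" method="post">
  <p>
    <label for="name">Name</label>
    <input type="text" name="name" id="name" />
    <select name="level">
      <optgroup label="Undergraduate">
        <option value="1">First year</option>
        <option value="2" selected="selected">Second year</option>
      </optgroup>
    </select>
    <textarea name="comments" rows="4" cols="40"></textarea>
    <button type="submit">Send</button>
  </p>
</form>"""

root = ET.fromstring(form)

# Controls whose values are submitted under their name attribute.
controls = [el for el in root.iter() if el.tag in ("input", "select", "textarea")]
names = [el.get("name") for el in controls]
print(root.get("method"), root.get("action"), names)

# XHTML does not allow minimized attributes, so the preselected
# option is written selected="selected" rather than just "selected".
preselected = [o.get("value") for o in root.iter("option") if o.get("selected")]
print(preselected)
```

The controls are wrapped in a p element because the Strict DTD does not allow input and similar controls directly inside form.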

Hyperlinks and Anchors

Of course, support for links between documents is a central feature of XHTML. The anchor tag <a> sets the origin of the link and the href attribute defines the target of the link.
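
As an illustration, the sketch below collects the targets of the links in a small body fragment. It assumes Python's standard xml.etree.ElementTree module; the fragment is made up, and the "#forms" link and its named anchor are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment with two external links, one internal link,
# and one named anchor that serves as the internal link's target.
body = """<body>
  <p>See the <a href="http://www.w3.org/">W3C</a> site and the
  <a href="http://www.w3schools.com/xhtml/xhtml_reference.asp">reference</a>,
  or jump to the <a href="#forms">forms section</a>.</p>
  <h2><a name="forms" id="forms">Forms</a></h2>
</body>"""

root = ET.fromstring(body)
anchors = list(root.iter("a"))

# Anchors with an href attribute are link origins; href gives the target.
links = [(a.get("href"), a.text) for a in anchors if a.get("href") is not None]
# Anchors without href only mark a position that other links can point to.
targets = [a.get("id") for a in anchors if a.get("href") is None]

for href, text in links:
    print(text, "->", href)
print(targets)
```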

Styling

The <style> element allows inclusion of styling specifications. We will look at CSS in particular.

Scripting

The <script> element allows inclusion of program scripts to be executed in the browser or user-agent environment. We will look at JavaScript in particular.

Examples with Forms

Here is an example from the textbook with a simple form. Here is the same example using the formresponder.cgi.

Here is an example from Deitel et al. with a more complex form.

Here is a third example from Deitel et al. illustrating forms.

Here is an example from Deitel et al. using framesets.

Document Tree

One great advantage of the well-formedness of XHTML (that is not necessarily true of HTML) is that the document can be structured in a tree. Because all elements of an XHTML document are completely contained in another element (except the root <html> element), each element can be structured as the child node of the element that contains it.

For example, the elements of the XHTML example:

<html xmlns = "http://www.w3.org/1999/xhtml">
   <head>
      <title>Internet and WWW How to Program - Welcome</title>
   </head>
   <body>
      <p>Welcome to XHTML!</p>
   </body>
</html>

can be structured in the following tree:

    html
        - head
            - title
        - body
            - p

This tree has only element nodes, but the document tree can be extended to have different types of children, including attributes, text content, and other node types in addition to element nodes. We will learn more about this approach in the Document Object Model (DOM).
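
The tree above can be produced mechanically. The following sketch assumes Python's standard xml.etree.ElementTree module, which is a simplified stand-in and not the W3C DOM; the function names local_name and outline and the "#text" label are made up for illustration. It extends the element-only tree with the text content of each element.

```python
import xml.etree.ElementTree as ET

document = """<html xmlns="http://www.w3.org/1999/xhtml">
   <head>
      <title>Internet and WWW How to Program - Welcome</title>
   </head>
   <body>
      <p>Welcome to XHTML!</p>
   </body>
</html>"""


def local_name(tag):
    """Drop the "{namespace}" part that ElementTree puts before the name."""
    return tag.rsplit("}", 1)[-1]


def outline(element, depth=0):
    """Return the tree rooted at element as a list of indented lines."""
    marker = "- " if depth else ""
    lines = ["    " * depth + marker + local_name(element.tag)]
    text = (element.text or "").strip()
    if text:
        # Show non-blank text content as an extra child node.
        lines.append("    " * (depth + 1) + '- #text "' + text + '"')
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(document)
tree_lines = outline(root)
print("\n".join(tree_lines))
```

The element lines match the tree shown above; the two extra lines are the text nodes under title and p.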