The eXtensible HyperText Markup Language (XHTML) has been developed to replace the HyperText Markup Language (HTML), which has been the principal document format of the WWW.
The W3C is overseeing development of XHTML. XHTML 1.0 (January 2000; 2nd ed., August 2002) is a reformulation of HTML 4.01 (December 24, 1999). XHTML is defined as an application of XML, which makes documents easier to process and maintain with standard XML tools.
Some of the main differences between XHTML and HTML are:
XHTML documents must be well-formed, i.e., have proper nesting with no overlapping tags. For example, the expression "<b>Bold<i>Bold-Italic</b>Italic</i>" is not well-formed.
XHTML markup is case-sensitive; element and attribute names must be in lower case.
XHTML non-empty elements must have end tags. In HTML, the paragraph tag "<p>" was often used without an end tag.
XHTML empty elements must also be closed, either with an end tag or with the combined empty-element form, e.g., "<br />".
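The differences listed above can be checked mechanically, because an XHTML document is parsed with an ordinary XML parser. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample fragments other than the overlapping example are illustrative, not taken from the XHTML specification. Note that it tests well-formedness only, not validity against an XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as well-formed XML."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# Overlapping tags from the example above.
overlapping = "<p><b>Bold<i>Bold-Italic</b>Italic</i></p>"
# A properly nested rewrite with the same visual intent.
nested = "<p><b>Bold<i>Bold-Italic</i></b><i>Italic</i></p>"
# HTML habits: a paragraph and a line break that are never closed.
unclosed = "<div><p>First paragraph<br></div>"
# The same content with an end tag and the empty-element form.
closed = "<div><p>First paragraph<br /></p></div>"

for name, fragment in [("overlapping", overlapping), ("nested", nested),
                       ("unclosed", unclosed), ("closed", closed)]:
    print(name, is_well_formed(fragment))
```

Only the nested and closed fragments are accepted by the parser; the other two are rejected before any XHTML-specific rule is consulted.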
Older browsers have problems with some XHTML syntax, so authors should be aware of transition compatibility issues. XHTML reformulates HTML 4.01 in XML using three alternative document type definitions: XHTML-1.0-Strict, XHTML-1.0-Transitional, and XHTML-1.0-Frameset.
The W3C runs an XHTML validation service that checks whether documents are valid according to the XHTML specification.
The W3C is working on Modularization of XHTML, beginning with XHTML 1.1 - Module-based XHTML (May 2001). XHTML 2 is in its sixth Working Draft (July 2004). Some of the issues for future XHTML evolution are:
Modularity - Subsetting XHTML
Extensibility - Adding to XHTML
Document Profiles - Customizing for document sets
XHTML 2.0 is being developed to have greater support for rich, portable web-based applications. XHTML 2.0 is based on HTML 4, XHTML 1.0, and XHTML 1.1, but is not intended to be backward compatible.
The purpose of markup is to communicate metadata for a document, i.e., data about the data in the document.
Markup languages typically use tags to delimit and describe pieces of a document.
The Generalized Markup Language (GML) was developed at IBM in 1969. GML was a self-referential meta-language for arbitrary data, i.e., it could describe languages, grammars, and vocabularies for markup.
The Standard Generalized Markup Language (SGML) developed from GML and was adopted as a standard (ISO 8879) in 1986. Here is a simple example of an SGML document that first describes the structure of the document and then specifies the content.
<!DOCTYPE email [
<!ELEMENT email O O ((to & from & date & subject?), text) >
<!ELEMENT text - O (para+) >
<!ELEMENT para O O (#PCDATA) >
<!ELEMENT (to, from, date, subject) - O (#PCDATA) >
]>
<date>10/12/99
<to>you@yours.com
<from>me@mine.com
<text>I just mailed to say...
HTML is a simple SGML-based language widely used for creating documents for the WWW. HTML is not a meta-language in that it cannot be used to describe languages for markup. This page is a simple example of an HTML document, providing meta-data and structured information. XHTML is the evolution of HTML.
Here is a simple XHTML example:
<?xml version = "1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!-- (C) Copyright 2002 by Deitel & Associates, Inc. and Prentice Hall. -->
<html xmlns = "http://www.w3.org/1999/xhtml">
<head>
<title>Internet and WWW How to Program - Welcome</title>
</head>
<body>
<p>Welcome to XHTML!</p>
</body>
</html>
Note that the document is actually an XML document that conforms to the XHTML Document Type Definition (DTD). We will look at XML later in the course.
The document's root is an html element (in the XHTML namespace) with two child elements: a head and a body. The head has a single child element, the title. The body has a single child element, a paragraph.
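To see the effect of the xmlns attribute, the example can be loaded with an XML parser. The following is a rough sketch, assuming Python's standard xml.etree.ElementTree module, which reports each element name qualified by its namespace in braces; the helper local_name is a hypothetical convenience, and the DOCTYPE and comment lines of the example are left out for brevity.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

document = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Internet and WWW How to Program - Welcome</title>
  </head>
  <body>
    <p>Welcome to XHTML!</p>
  </body>
</html>"""

root = ET.fromstring(document)


def local_name(element):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return element.tag.split("}", 1)[-1]


# The root's full name includes the XHTML namespace.
print(root.tag)
# The two children of html, without the namespace prefix.
child_names = [local_name(child) for child in root]
print(child_names)
# Searching by path requires namespace-qualified names at every step.
title = root.find(f"{{{XHTML_NS}}}head/{{{XHTML_NS}}}title")
print(title.text)
```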
Elements have start tags of the form "<tagname>" and end tags of the form "</tagname>". The contents of the element are between the start and end tags. Empty elements (with no contents) may use the alternate form "<tagname/>".
Elements may have attributes with names and values given in the start tag. The form of the start tag is: <tagname attribute1name="attribute1value" attribute2name="attribute2value">
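As an illustration of the start-tag form, the sketch below parses a single element and reads back its attribute names and values. It assumes Python's standard xml.etree.ElementTree module; the img element and its attribute values are placeholders chosen for the example. It also shows that, since XHTML is XML, an attribute value without quotes is rejected by the parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical image element; the file name and text are placeholders.
img = ET.fromstring('<img src="logo.png" alt="Course logo" width="120" />')

print(img.tag)
for name, value in img.attrib.items():
    print(name, "=", value)

# Attribute values must be quoted; an unquoted value is a parse error.
try:
    ET.fromstring("<img src=logo.png />")
    unquoted_ok = True
except ET.ParseError:
    unquoted_ok = False
print("unquoted value accepted:", unquoted_ok)
```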
Comments are enclosed between "<!--" and "-->" and may not contain "--" or end with "-".
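The comment rules can be tried out with a short sketch, assuming Python's standard xml.dom.minidom module (which is built on the expat parser). The helper comment_parses is a hypothetical name; it wraps a candidate comment body in the comment delimiters and reports whether the result parses.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# A comment becomes its own node in the parsed document.
doc = minidom.parseString("<p><!-- a plain comment -->Welcome</p>")
comment = doc.documentElement.firstChild
print(comment.nodeType == comment.COMMENT_NODE)
print(repr(comment.data))


def comment_parses(body: str) -> bool:
    """Wrap body in comment delimiters and report whether it parses."""
    try:
        minidom.parseString("<p><!--" + body + "-->text</p>")
        return True
    except ExpatError:
        return False


print(comment_parses(" fine "))
print(comment_parses(" two -- hyphens "))
print(comment_parses(" trailing hyphen -"))
```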
There are many XHTML tags, and many useful online sources of information about them. One useful reference is http://www.w3schools.com/xhtml/xhtml_reference.asp.
An XHTML document must have a <!DOCTYPE> declaration and an <html> element. The <html> element contains a head and a body.
The <head> element has meta-data about the entire document. It can contain child elements, including:
title - document title
base - the base URL for the page
meta - generic element for metadata, used by search engines to index documents
Other elements may also appear in the head for scripting, styling, linking, etc.
The body contains the document with markup to provide information about parts of the document.
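Some body attributes (presentational attributes that are deprecated: allowed in XHTML 1.0 Transitional but not in Strict) are: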
background - URI of background image
bgcolor - background color (RGB)
text - text color
link, vlink, alink - colors for hyperlinks, visited links, and activated links
Some elements for defining blocks of text in the body are:
p - paragraph
div - page section
Some elements for formatting sections of text are:
br - line break
hr - horizontal rule
blockquote - indented block of quoted text
img - for including images with src attribute
map - for creating image maps
pre - preformatted text, leave whitespace
XHTML supports section headers as well as unordered (bulleted) lists, ordered (numbered) lists, and definition lists.
h1, h2, h3, h4, h5, h6 - section header text
ul, ol, li - unordered list, ordered list, list item
dl, dt, dd - definition list, term, definition
Lists can be nested.
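The following sketch shows one way the header and list elements fit together, including a nested list. The markup content is made up for illustration, and Python's standard xml.etree.ElementTree module is used only to confirm that the fragment is well-formed and to read the items back. Note that the nested ol sits inside an li, not directly inside the ul.

```python
import xml.etree.ElementTree as ET

# Hypothetical content: an ordered list nested inside an unordered list item.
fragment = """<div>
  <h2>Markup languages</h2>
  <ul>
    <li>SGML</li>
    <li>HTML
      <ol>
        <li>HTML 4.01</li>
        <li>XHTML 1.0</li>
      </ol>
    </li>
  </ul>
  <dl>
    <dt>DTD</dt>
    <dd>Document Type Definition</dd>
  </dl>
</div>"""

root = ET.fromstring(fragment)

# Items of the outer list (leading text of each li only).
top_level = [li.text.strip() for li in root.findall("ul/li")]
# Items of the list nested inside the second li.
nested_items = [li.text for li in root.findall("ul/li/ol/li")]
# Term and definition pairs from the definition list.
definitions = list(zip((dt.text for dt in root.findall("dl/dt")),
                       (dd.text for dd in root.findall("dl/dd"))))

print(top_level)
print(nested_items)
print(definitions)
```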
Some elements for specifying font or special text are:
tt - fixed width font
i - italics
b - bold
em - emphasis (often italics)
strong - strong (often bold)
cite - citation (usually italics)
code - program code (usually fixed width)
sub - subscript
sup - superscript
small, big - font size
XHTML supports tables with the table element. Table sub-elements include:
th - table header cell
tr - table row
td - table data cell
There are various table attributes for setting borders, spacing, alignment, etc. There are other elements for setting captions, headers and footers, grouping columns, etc.
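A small table using these elements might look like the fragment in the sketch below. The table contents simply restate the element list above, and Python's standard xml.etree.ElementTree module is assumed only as a convenient way to check the markup and pull the rows out.

```python
import xml.etree.ElementTree as ET

# Illustrative table: one header row (th cells) and three data rows (td cells).
table = ET.fromstring("""<table border="1">
  <caption>Table elements</caption>
  <tr><th>Element</th><th>Meaning</th></tr>
  <tr><td>th</td><td>table header cell</td></tr>
  <tr><td>tr</td><td>table row</td></tr>
  <tr><td>td</td><td>table data cell</td></tr>
</table>""")

# Each tr becomes a list of the text in its cells.
rows = [[cell.text for cell in tr] for tr in table.findall("tr")]
header, data = rows[0], rows[1:]

print(table.find("caption").text)
for row in data:
    print(dict(zip(header, row)))
```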
Frame sets allow the division of pages into separate regions. For example, a common use of frames is to create a menu column on the left side of a page, a top frame for general presentation, and a bottom frame for content. This approach is used in the UNL CSE WWW pages.
The XHTML <form> element allows for user input elements. Some related elements are:
input - form control
select - option selector
textarea - multi-line text field
label - form label
button - push button
optgroup - option group
option - selectable choice
We will use forms frequently, so study this material closely.
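As a rough sketch of how the form elements combine, the fragment below builds a small form and then lists the controls it contains. The action URL, field names, and option values are placeholders, and Python's standard xml.etree.ElementTree module is assumed only for checking and inspecting the markup. Two XHTML details are worth noticing: the controls are wrapped in a block element (p), and the selected attribute is written out in full as selected="selected" because attribute values cannot be omitted.

```python
import xml.etree.ElementTree as ET

# Hypothetical form; the action URL and field names are placeholders.
form = ET.fromstring("""<form method="post" action="/cgi-bin/example">
  <p>
    <label for="name">Name:</label>
    <input type="text" id="name" name="name" />
    <select name="level">
      <optgroup label="Undergraduate">
        <option value="jr">Junior</option>
        <option value="sr" selected="selected">Senior</option>
      </optgroup>
    </select>
    <textarea name="comments" rows="4" cols="40">None</textarea>
    <button type="submit">Send</button>
  </p>
</form>""")

# Names of the controls whose values would be submitted, in document order.
controls = [el.get("name") for el in form.iter()
            if el.tag in ("input", "select", "textarea")]
# Values of the options marked as initially selected.
selected = [o.get("value") for o in form.iter("option") if o.get("selected")]

print(form.get("method"), form.get("action"))
print(controls)
print(selected)
```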
Of course, support for links between documents is a central feature of XHTML. The anchor tag <a> sets the origin of the link and the href attribute defines the target of the link.
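A minimal sketch of anchors in use follows, assuming Python's standard xml.etree.ElementTree module to read the href targets back out of the markup. The paragraph text is made up for the example; the first URL is the W3C address of the XHTML 1.0 specification and the second is the reference mentioned earlier.

```python
import xml.etree.ElementTree as ET

fragment = ET.fromstring("""<p>
  See the <a href="http://www.w3.org/TR/xhtml1/">XHTML 1.0 specification</a>
  and the <a href="http://www.w3schools.com/xhtml/xhtml_reference.asp">tag reference</a>.
</p>""")

# Each anchor contributes (link text, link target).
links = [(a.text, a.get("href")) for a in fragment.iter("a")]
for text, target in links:
    print(text, "->", target)
```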
The <style> element allows inclusion of styling specifications. We will look at CSS in particular.
The <script> element allows inclusion of program scripts to be executed in the browser or user-agent environment. We will look at JavaScript in particular.
Here is an example from the textbook with a simple form. Here is the same example using the formresponder.cgi.
Here is an example from Deitel et al. with a more complex form.
Here is a third example from Deitel et al. illustrating forms.
Here is an example from Deitel et al. using framesets.
One great advantage of the well-formedness of XHTML (which is not necessarily true of HTML) is that the document can be structured as a tree. Because every element of an XHTML document except the root <html> element is completely contained in another element, each element can be represented as a child node of the element that contains it.
For example, the elements of the XHTML example:
<html xmlns = "http://www.w3.org/1999/xhtml">
<head>
<title>Internet and WWW How to Program - Welcome</title>
</head>
<body>
<p>Welcome to XHTML!</p>
</body>
</html>
can be structured in the following tree:
html
- head
  - title
- body
  - p
This tree has only element nodes, but the document tree can be extended to have different types of children, including attributes, text content, and other node types in addition to element nodes. We will learn more about this approach in the Document Object Model (DOM).
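The tree above, extended with text nodes, can be produced by a short recursive walk. This is a sketch under the assumption that Python's standard xml.dom.minidom module is an acceptable stand-in for a browser's DOM; the function name outline and the "#text:" label are choices made for this example.

```python
from xml.dom import minidom

document = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Internet and WWW How to Program - Welcome</title></head>"
    "<body><p>Welcome to XHTML!</p></body>"
    "</html>"
)


def outline(node, depth=0):
    """Return one indented line per element or non-blank text node."""
    lines = []
    if node.nodeType == node.ELEMENT_NODE:
        lines.append("  " * depth + node.tagName)
        for child in node.childNodes:
            lines.extend(outline(child, depth + 1))
    elif node.nodeType == node.TEXT_NODE and node.data.strip():
        lines.append("  " * depth + "#text: " + node.data.strip())
    return lines


tree_lines = outline(minidom.parseString(document).documentElement)
print("\n".join(tree_lines))
```

Each level of indentation corresponds to one level of containment, so title appears under head and the text of the paragraph appears under p.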