XPath
XPath is the XML language for addressing subsets of an XML document. This is particularly useful in transforming or presenting the document (e.g., visually, with facilities such as the eXtensible Stylesheet Language (XSL) and XSL Transformations (XSLT)).
XPath syntax resembles the path notation
used for file system names. For example, from Essential XML:
Beyond Markup, by D. Box et al. (Addison-Wesley, 2000), the
specification:
/guitars/guitar/model
would locate all model elements
that are children of guitar elements, which are themselves children
of the root guitars element.
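As a rough illustration of the path above, the following sketch uses Python's standard xml.etree.ElementTree module, which implements only a subset of XPath. The sample document and the helper name find_models are assumptions made for this example. ElementTree evaluates paths relative to an element, so the leading /guitars step is emulated by checking the root element's name and then searching from it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element names follow the example path.
GUITARS_XML = """
<guitars>
  <guitar><model>Strat</model></guitar>
  <guitar><model>Tele</model></guitar>
  <amp><model>Twin</model></amp>
</guitars>
"""

def find_models(xml_text):
    """Return the text of every element matched by /guitars/guitar/model."""
    root = ET.fromstring(xml_text)  # root is the <guitars> element
    if root.tag != "guitars":
        return []  # the first step, /guitars, would match nothing
    # Remaining steps, evaluated relative to the root element.
    return [model.text for model in root.findall("guitar/model")]

print(find_models(GUITARS_XML))  # ['Strat', 'Tele']
```

The model element under amp is not selected, because its parent is not a guitar element.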
An XPath expression (such as the example above) evaluates to a value that must be a string, boolean, number, or node-set. In XPath 1.0 a node-set is an unordered collection of nodes without duplicates. Any of these four types can be coerced to a string, boolean, or number. For example, if a node-set is coerced to a boolean, the value is true if the node-set is non-empty and false if it is empty. There are XPath functions that operate on each of these value types.
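The coercion rules can be sketched in plain Python. This is an approximation under stated assumptions, not an XPath implementation: a node-set is modelled as a Python list of ElementTree elements already in document order, numbers are Python floats, and the helper names (to_boolean, to_string, to_number, string_value) are invented for this example. Python's float() also accepts spellings such as "1e5" that XPath's number syntax does not.

```python
import math
import xml.etree.ElementTree as ET

def string_value(node):
    """String-value of an element: its descendant text, concatenated."""
    return "".join(node.itertext())

def to_boolean(value):
    if isinstance(value, bool):
        return value
    if isinstance(value, float):
        return not (value == 0 or math.isnan(value))
    return len(value) > 0  # string or node-set: true when non-empty

def to_string(value):
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, float):
        if math.isnan(value):
            return "NaN"
        if math.isinf(value):
            return "Infinity" if value > 0 else "-Infinity"
        return str(int(value)) if value.is_integer() else repr(value)
    if isinstance(value, str):
        return value
    # Node-set: string-value of the first node, or "" when empty.
    return string_value(value[0]) if value else ""

def to_number(value):
    if isinstance(value, bool):
        return 1.0 if value else 0.0
    if isinstance(value, float):
        return value
    try:
        return float(to_string(value).strip())
    except ValueError:
        return math.nan  # unparsable strings become NaN

doc = ET.fromstring("<guitars><guitar><year>1962</year></guitar></guitars>")
years = doc.findall("guitar/year")
print(to_boolean(years), to_string(years), to_number(years))
print(to_boolean([]), repr(to_string([])))
```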
The types of nodes are:
root
element
attribute
text
processing instruction
comment
namespace
The most important type of XPath
expression is the location path. Absolute location paths begin
with a forward slash '/', as in the example above, and specify navigation
beginning at the root node in the model tree. Relative location path
expressions do not begin with a forward slash and begin at the node where
the location path is applied (the context node). For example:
guitar/model
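A relative path gives different results depending on the node it is applied to. The sketch below, again using the XPath subset in Python's xml.etree.ElementTree, evaluates guitar/model from three different context nodes. The document layout (shop, used) and the helper name models_from are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document in which guitar elements appear in two places.
SHOP_XML = """
<shop>
  <guitars>
    <guitar><model>Strat</model></guitar>
  </guitars>
  <used>
    <guitar><model>Tele</model></guitar>
    <guitar><model>Jazzmaster</model></guitar>
  </used>
</shop>
"""

def models_from(context):
    """Evaluate the relative path guitar/model starting at `context`."""
    return [model.text for model in context.findall("guitar/model")]

shop = ET.fromstring(SHOP_XML)
print(models_from(shop.find("guitars")))  # ['Strat']
print(models_from(shop.find("used")))     # ['Tele', 'Jazzmaster']
print(models_from(shop))                  # []
```

The last call returns nothing because shop has no guitar children of its own.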
Location paths can be written in
verbose or abbreviated format. The abbreviated format is simpler
and saves space, but some features are accessible only using the verbose
format. The forward slashes separate location steps, read left-to-right.
Each step consists of three parts, with this syntax:
axis-identifier::node-test[predicate1]...
with two colons ("::") between
the axis-identifier and the node-test. For example:
child::guitar/child::model[attribute::name="number"]
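To make the three parts of a step concrete, here is a small sketch that splits a verbose location path into (axis, node-test, predicates) tuples. It is a deliberately naive reader, not a real XPath parser: it assumes predicates are not nested and contain no forward slashes, and the names STEP_RE, parse_step, and parse_path are invented for this example.

```python
import re

STEP_RE = re.compile(
    r"^(?:(?P<axis>[A-Za-z-]+)::)?"   # optional axis-identifier and "::"
    r"(?P<test>[^\[\]]+)"             # node-test, up to the first predicate
    r"(?P<preds>(?:\[[^\[\]]*\])*)$"  # zero or more [predicate] groups
)

def parse_step(step):
    """Split one location step into (axis, node_test, predicates)."""
    match = STEP_RE.match(step.strip())
    if match is None:
        raise ValueError("unsupported step: %r" % step)
    axis = match.group("axis") or "child"  # child is the default axis
    predicates = re.findall(r"\[([^\[\]]*)\]", match.group("preds"))
    return axis, match.group("test"), predicates

def parse_path(path):
    """Naive split on '/': assumes no '/' occurs inside a predicate."""
    return [parse_step(step) for step in path.split("/") if step]

for step in parse_path('child::guitar/child::model[attribute::name="number"]'):
    print(step)
# ('child', 'guitar', [])
# ('child', 'model', ['attribute::name="number"'])
```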
The axis-identifier specifies the direction of the step. There are 13 axes, 9 forward (child, parent, descendant, descendant-or-self, following-sibling, following, attribute, namespace, and self) and 4 reverse (ancestor, ancestor-or-self, preceding-sibling, preceding).
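The direction of an axis can be illustrated by walking a tree by hand. The sketch below covers only a handful of axes and only element nodes, using Python's xml.etree.ElementTree. ElementTree keeps no parent pointers and has no separate root node above the document element, so a parent map is built first; build_parent_map and axis_nodes are names assumed for this example.

```python
import xml.etree.ElementTree as ET

def build_parent_map(root):
    """Map each element to its parent (ElementTree stores no back-links)."""
    return {child: parent for parent in root.iter() for child in parent}

def axis_nodes(axis, node, parents):
    """Return the element nodes on `axis` from `node`, in axis order."""
    if axis == "self":
        return [node]
    if axis == "child":
        return list(node)
    if axis == "descendant":
        return list(node.iter())[1:]  # iter() yields node itself first
    if axis == "parent":
        return [parents[node]] if node in parents else []
    if axis == "ancestor":
        found = []
        while node in parents:
            node = parents[node]
            found.append(node)
        return found  # nearest first: reverse document order
    if axis in ("following-sibling", "preceding-sibling"):
        if node not in parents:
            return []
        siblings = list(parents[node])
        index = siblings.index(node)
        if axis == "following-sibling":
            return siblings[index + 1:]
        return siblings[:index][::-1]  # nearest first: reverse order
    raise ValueError("axis not covered by this sketch: " + axis)

doc = ET.fromstring(
    "<guitars><guitar><make>Fender</make><model>Strat</model>"
    "<year>1962</year></guitar></guitars>"
)
parents = build_parent_map(doc)
model = doc.find("guitar/model")
for axis in ("ancestor", "preceding-sibling", "following-sibling"):
    print(axis, [n.tag for n in axis_nodes(axis, model, parents)])
```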
The node-test refines the node-set identified by the axis-identifier. For example, the descendant axis would identify a node-set with all descendant nodes. The node-test can filter that node-set by name or by type.
Node-tests by name are the most
common. Names with namespace prefixes are resolved using the evaluation
context. For example:
child::grind:model
would identify all model child
nodes in the namespace identified by the prefix grind. There is a
wildcard character, so
child::grind:*
would identify all child elements
in the namespace identified by grind.
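The role of the evaluation context in resolving prefixes can be sketched with Python's xml.etree.ElementTree, where the prefix mapping is passed as a dictionary. The namespace URI urn:example:grind and the sample document are made up for this example. The wildcard case is emulated with a manual filter on the expanded tag name rather than relying on wildcard support in ElementTree's path language.

```python
import xml.etree.ElementTree as ET

GRIND_URI = "urn:example:grind"  # hypothetical namespace URI

# The document happens to use the prefix "g"; only the URI matters.
DOC = """
<guitar xmlns:g="urn:example:grind">
  <g:model>Strat</g:model>
  <g:year>1962</g:year>
  <model>unqualified</model>
</guitar>
"""

# Evaluation context: maps the prefix used in the path to a namespace URI.
CONTEXT = {"grind": GRIND_URI}

def grind_models(guitar):
    """Roughly child::grind:model."""
    return guitar.findall("grind:model", CONTEXT)

def grind_children(guitar):
    """Roughly child::grind:* (any child element in the grind namespace)."""
    marker = "{%s}" % GRIND_URI  # ElementTree stores tags as {uri}local
    return [child for child in guitar if child.tag.startswith(marker)]

guitar = ET.fromstring(DOC)
print([m.text for m in grind_models(guitar)])    # ['Strat']
print([c.text for c in grind_children(guitar)])  # ['Strat', '1962']
```

The unqualified model element is skipped in both cases because it is in no namespace.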
Node-tests by type can also be used.
For example:
child::text()
identifies all child text nodes
of the context node. Type tests aren't needed for attributes and
namespaces because they have separate axes.
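ElementTree does not expose text nodes as objects, but child::text() can be approximated from the text and tail attributes it does provide. This is a sketch under that assumption; child_text_nodes is a name invented here, and the sample element is hypothetical. Attributes are reached through a separate mapping, which loosely mirrors their having an axis of their own.

```python
import xml.etree.ElementTree as ET

def child_text_nodes(elem):
    """Approximate child::text(): text chunks directly inside `elem`."""
    # elem.text precedes the first child; each child's tail follows it.
    chunks = [elem.text] + [child.tail for child in elem]
    return [chunk for chunk in chunks if chunk]

note = ET.fromstring('<note lang="en">Plays <b>well</b> in tune</note>')
print(child_text_nodes(note))  # ['Plays ', ' in tune']
print(note.attrib)             # {'lang': 'en'}
```

The word "well" is not returned because it is a text child of b, not of note.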
Predicates provide additional filtering.
For example:
/descendant::guitar[child::model/child::text() = 'Strat']
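A predicate of this kind can be tried with the XPath subset in Python's xml.etree.ElementTree, which accepts only the abbreviated form. In the sketch below, .//guitar[model='Strat'] stands in for the verbose expression; ElementTree compares the child's complete text content, which matches the verbose form for simple cases like this one. The sample data, the id attribute, and the helper name strat_ids are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the id attribute exists only to label results.
GUITARS_XML = """
<guitars>
  <guitar id="g1"><model>Strat</model></guitar>
  <guitar id="g2"><model>Tele</model></guitar>
  <vintage>
    <guitar id="g3"><model>Strat</model></guitar>
  </vintage>
</guitars>
"""

def strat_ids(xml_text):
    """Ids of all descendant guitar elements whose model text is 'Strat'."""
    root = ET.fromstring(xml_text)
    return [g.get("id") for g in root.findall(".//guitar[model='Strat']")]

print(strat_ids(GUITARS_XML))  # ['g1', 'g3']
```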
The textbook lists and describes
functions for each value type:
node-set functions
count()
id()
last()
local-name()
name()
namespace-uri()
position()
string functions
concat()
contains()
normalize-space()
starts-with()
string()
string-length()
substring()
substring-before()
substring-after()
boolean functions
boolean()
true()
false()
lang()
not()
number functions
ceiling()
floor()
number()
round()
sum()
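Several of the string functions have near equivalents in Python, which can help in reading XPath expressions. The sketch below mirrors a few of them; the Python function names are invented for this example, and the helpers handle only finite numbers. The sample calls reuse the examples given in the XPath 1.0 Recommendation.

```python
import math
import re

def xpath_round(x):
    """round(): nearest integer, with ties going toward positive infinity."""
    return math.floor(x + 0.5)

def substring(s, start, length=None):
    """substring(): positions are 1-based and rounded before use."""
    first = xpath_round(start)
    last = math.inf if length is None else first + xpath_round(length)
    return "".join(ch for pos, ch in enumerate(s, 1) if first <= pos < last)

def substring_before(s, sep):
    """substring-before(): text before the first sep, or '' if absent."""
    index = s.find(sep)
    return "" if index < 0 else s[:index]

def substring_after(s, sep):
    """substring-after(): text after the first sep, or '' if absent."""
    index = s.find(sep)
    return "" if index < 0 else s[index + len(sep):]

def normalize_space(s):
    """normalize-space(): trim, then collapse runs of XML whitespace."""
    return re.sub(r"[ \t\r\n]+", " ", s.strip(" \t\r\n"))

print(substring("12345", 2, 3))             # 234
print(substring("12345", 1.5, 2.6))         # 234
print(substring_before("1999/04/01", "/"))  # 1999
print(substring_after("1999/04/01", "/"))   # 04/01
print(normalize_space("  Fender   Strat \n"))  # Fender Strat
```

contains() and starts-with() correspond to Python's in operator and str.startswith().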
XPath expressions can be abbreviated. In particular, child is the default axis, so the child axis-identifier and the double colon that follows it can be omitted.
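The effect of abbreviation can be shown with a purely textual rewrite. The sketch below drops child:: and also replaces attribute:: with @, a second abbreviation defined by XPath 1.0 that the text above does not mention. It is a naive string replacement that would also alter string literals containing those sequences, and abbreviate is a hypothetical helper name. The abbreviated result is then evaluated with Python's xml.etree.ElementTree against made-up sample data.

```python
import xml.etree.ElementTree as ET

def abbreviate(path):
    """Naively shorten child:: and attribute:: in a verbose path."""
    return path.replace("child::", "").replace("attribute::", "@")

verbose = 'child::guitar/child::model[attribute::name="number"]'
short = abbreviate(verbose)
print(short)  # guitar/model[@name="number"]

# Hypothetical data: each model element carries a name attribute.
doc = ET.fromstring(
    '<guitars><guitar>'
    '<model name="number">0042</model>'
    '<model name="series">Strat</model>'
    '</guitar></guitars>'
)
print([m.text for m in doc.findall(short)])  # ['0042']
```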
XPath 1.0 is the current recommendation
(Nov 1999). The XPath 2.0 Requirements and XPath 2.0 Data Model are
Working Drafts (Feb 2001 and June 2001, respectively). XPath 2.0
should support regular expressions using the notation defined in XML
Schema. XPath is a foundation for both XSLT (covered next) and XPointer
(covered later). Specifications are being written to access a DOM
tree using XPath.