The eXtensible Markup Language (XML)

XPath

XPath is the XML language for addressing subsets of an XML document.  This is particularly useful in transforming or presenting the document (e.g., visually) with facilities such as the eXtensible Stylesheet Language (XSL) and XSL Transformations (XSLT).

XPath syntax resembles computer system filename specifications.  For example, from Essential XML: Beyond Markup, by D. Box et al. (Addison-Wesley, 2000), the specification:
    /guitars/guitar/model
would locate all model elements that are children of guitar elements, which are themselves children of the root guitars element.

An XPath expression (such as the example above) evaluates to a value that must be a string, boolean, number, or node-set.  A node-set is an unordered collection of nodes without duplicates.  Any of these four types can be coerced to a string, boolean, or number.  For example, if a node-set is coerced to a boolean, the value is true if the node-set is non-empty and false if it is empty.  There are XPath functions that operate on each of these value types.
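
The boolean coercion rule can be sketched in Python.  This is only an illustration under stated assumptions: a node-set is modeled as a Python list, an XPath number as a float or int, and the name xpath_boolean is hypothetical rather than part of any library.

```python
import math

def xpath_boolean(value):
    """Coerce a modeled XPath value to a boolean (XPath 1.0 rules)."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        # A number is true unless it is zero or NaN.
        return not (value == 0 or math.isnan(value))
    if isinstance(value, str):
        # A string is true if its length is non-zero.
        return len(value) > 0
    # Anything else is treated as a node-set (modeled here as a list):
    # true if non-empty, false if empty.
    return len(value) > 0

print(xpath_boolean([]))             # False (empty node-set)
print(xpath_boolean(["a node"]))     # True  (non-empty node-set)
print(xpath_boolean("false"))        # True  (any non-empty string)
print(xpath_boolean(float("nan")))   # False
```

Note that the string "false" coerces to true, because only the length of the string matters.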

The types of nodes are:
    root
    element
    attribute
    text
    processing instruction
    comment
    namespace

The most important type of XPath expression is the location path.  Absolute location paths begin with a forward slash '/', as in the example above, and specify navigation beginning at the root node of the document tree.  Relative location paths do not begin with a forward slash; navigation begins at the node to which the location path is applied (the context node).  For example:
    guitar/model
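
As a rough illustration, Python's standard xml.etree.ElementTree module evaluates a limited subset of XPath relative to a chosen element.  It has no separate root node above the document element, so the absolute path /guitars/guitar/model corresponds here to the relative path guitar/model applied to the guitars element.  The sample document is hypothetical, not taken from the textbook.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
doc = ET.fromstring(
    "<guitars>"
    "<guitar><model>Strat</model></guitar>"
    "<guitar><model>Tele</model></guitar>"
    "</guitars>"
)

# Context node: the guitars document element.
models = doc.findall("guitar/model")
print([m.text for m in models])    # ['Strat', 'Tele']

# Context node: the second guitar element only.
second_guitar = doc.findall("guitar")[1]
second_models = second_guitar.findall("model")
print([m.text for m in second_models])    # ['Tele']
```

The same relative path selects different nodes depending on the context node it is applied to.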

Location paths can be written in verbose or abbreviated format.  The abbreviated format is simpler and saves space, but some features are accessible only using the verbose format.  The forward slashes separate location steps, read left-to-right.  Each step consists of three parts, with this syntax:
    axis-identifier::node-test[predicate1]...
with two colons ("::") between the axis-identifier and the node-test.  For example:
    child::guitar/child::model[attribute::name="number"]
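
The three parts of a step can be pulled apart with a small sketch.  The function parse_step is hypothetical and deliberately naive: it handles a single step, treats a missing axis as child (the abbreviated form), and does not cope with brackets inside string literals or with other abbreviations such as @.

```python
def parse_step(step):
    """Split one location step into (axis, node_test, predicates)."""
    # Everything from the first "[" onward is the predicate list.
    bracket = step.find("[")
    head = step if bracket == -1 else step[:bracket]
    tail = "" if bracket == -1 else step[bracket:]

    axis, separator, node_test = head.partition("::")
    if not separator:
        # No "::" present: abbreviated form, the axis defaults to child.
        axis, node_test = "child", head

    # Collect the text of each top-level [...] predicate.
    predicates = []
    depth = 0
    start = 0
    for index, char in enumerate(tail):
        if char == "[":
            if depth == 0:
                start = index + 1
            depth += 1
        elif char == "]":
            depth -= 1
            if depth == 0:
                predicates.append(tail[start:index])
    return axis, node_test, predicates

print(parse_step('child::model[attribute::name="number"]'))
# ('child', 'model', ['attribute::name="number"'])
print(parse_step("model"))
# ('child', 'model', [])
```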

The axis-identifier specifies the direction of the step.  There are 13 axes: 9 forward (child, parent, descendant, descendant-or-self, following-sibling, following, attribute, namespace, and self) and 4 reverse (ancestor, ancestor-or-self, preceding-sibling, and preceding).
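
To make the difference between a forward and a reverse axis concrete, the sketch below computes one of each over an ElementTree tree.  It is an approximation under stated assumptions: only element nodes are considered, the sample document and the helper names ancestor_axis and descendant_axis are hypothetical, and a parent map is built by hand because ElementTree elements do not record their parents.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<guitars>"
    "<guitar><model>Strat</model></guitar>"
    "<guitar><model>Tele</model></guitar>"
    "</guitars>"
)

# Map every element to its parent element.
parent_of = {child: parent for parent in doc.iter() for child in parent}

def ancestor_axis(node):
    """Reverse axis: ancestors listed nearest first."""
    result = []
    while node in parent_of:
        node = parent_of[node]
        result.append(node)
    return result

def descendant_axis(node):
    """Forward axis: descendants in document order, excluding node itself."""
    return list(node.iter())[1:]

first_model = doc[0][0]
print([n.tag for n in ancestor_axis(first_model)])   # ['guitar', 'guitars']
print([n.tag for n in descendant_axis(doc)])
# ['guitar', 'model', 'guitar', 'model']
```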

The node-test refines the node-set identified by the axis-identifier.  For example, the descendant axis would identify a node-set with all descendant nodes.  The node-test can filter that node-set by name or by type.

Node-tests by name are the most common.  Names with namespace prefixes are resolved using the evaluation context.  For example:
    child::grind:model
identifies all model child elements in the namespace bound to the prefix grind.  There is also a wildcard character, so
    child::grind:*
identifies all child elements in the namespace bound to the prefix grind.
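
A comparable lookup can be sketched with ElementTree, which resolves prefixes through a mapping passed by the caller (playing the role of the evaluation context).  The namespace URI and the sample document are hypothetical.  The wildcard case is done here by filtering on the expanded element name, which ElementTree stores as {uri}local-name.

```python
import xml.etree.ElementTree as ET

GRIND_URI = "http://example.com/grind"   # hypothetical namespace URI

guitar = ET.fromstring(
    '<guitar xmlns:grind="http://example.com/grind">'
    "<grind:model>Strat</grind:model>"
    "<grind:color>red</grind:color>"
    "<model>unqualified</model>"
    "</guitar>"
)

# Rough counterpart of child::grind:model
prefixes = {"grind": GRIND_URI}
named = guitar.findall("grind:model", prefixes)
print([e.text for e in named])    # ['Strat']

# Rough counterpart of child::grind:*
in_namespace = [c for c in guitar if c.tag.startswith("{" + GRIND_URI + "}")]
print([c.tag.split("}")[1] for c in in_namespace])    # ['model', 'color']
```

The unqualified model element is skipped in both cases because it is not in the grind namespace.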

Node-tests by type can also be used.  For example:
    child::text()
identifies all child text nodes of the context node.  There are no type node-tests for attributes and namespaces because they have their own axes.
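
ElementTree has no text nodes as such, but the effect of child::text() can be approximated.  In that library, the text before an element's first child is stored in its text attribute, and the text following each child is stored in that child's tail attribute.  The helper name child_text_nodes and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

def child_text_nodes(element):
    """Approximate child::text(): text directly inside element, in document order."""
    texts = [element.text] + [child.tail for child in element]
    # Drop missing (None) and empty pieces.
    return [t for t in texts if t]

para = ET.fromstring("<p>one<b>two</b>three</p>")
print(child_text_nodes(para))    # ['one', 'three']
```

The string "two" is not returned because it is a child of the b element, not of p.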

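Predicates provide additional filtering.  For example, the following selects every guitar element in the document that has a model child with a text node equal to 'Strat':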
    /descendant::guitar[child::model/child::text() = 'Strat']
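
The XPath subset in Python's ElementTree offers a close counterpart to this kind of predicate: .//guitar reaches guitar elements at any depth below the context element, and [model='Strat'] keeps those with a model child whose text is Strat.  The sample document and its id attributes are hypothetical.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<guitars>"
    "<guitar id='g1'><model>Strat</model></guitar>"
    "<guitar id='g2'><model>Tele</model></guitar>"
    "<case><guitar id='g3'><model>Strat</model></guitar></case>"
    "</guitars>"
)

# Descendant search with a predicate on the text of the model child.
matches = doc.findall(".//guitar[model='Strat']")
print([g.get("id") for g in matches])    # ['g1', 'g3']
```

The guitar nested inside case is found as well, since the search covers all descendants rather than only children.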

The textbook lists and describes functions for each value type:
    node-set functions
        count()
        id()
        last()
        local-name()
        name()
        namespace-uri()
        position()
    string functions
        concat()
        contains()
        normalize-space()
        starts-with()
        string()
        string-length()
        substring()
        substring-before()
        substring-after()
    boolean functions
        boolean()
        true()
        false()
        lang()
        not()
    number functions
        ceiling()
        floor()
        number()
        round()
        sum()
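
The behavior of several of the string functions can be approximated in Python.  The helper names below are hypothetical and operate on Python strings only; the real XPath functions also coerce their arguments to strings first.

```python
import re

def contains(s, sub):
    return sub in s

def starts_with(s, prefix):
    return s.startswith(prefix)

def string_length(s):
    return len(s)

def substring_before(s, sep):
    # Empty string if sep does not occur in s.
    index = s.find(sep)
    return s[:index] if index != -1 else ""

def substring_after(s, sep):
    # Empty string if sep does not occur in s.
    index = s.find(sep)
    return s[index + len(sep):] if index != -1 else ""

def normalize_space(s):
    # Collapse runs of XML whitespace to one space and trim both ends.
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")

print(substring_before("1999/04/01", "/"))     # 1999
print(substring_after("1999/04/01", "/"))      # 04/01
print(normalize_space("  Fender \n Strat "))   # Fender Strat
```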

XPath expressions can be abbreviated.  In particular, the child axis-identifier and the double colon that follows it can be omitted, so child::guitar/child::model can be written as guitar/model.

XPath 1.0 is the current recommendation (Nov 1999).  The XPath 2.0 Requirements and XPath 2.0 Data Model are Working Drafts (Feb 2001 and June 2001, respectively).  XPath 2.0 should support regular expressions using the notation defined in XML Schema.  XPath is a foundation for both XSLT (covered next) and XPointer (covered later).  Specifications are also being written for accessing a DOM tree using XPath.