Web crawler: XPath Learning (1)

Last Update:2016-04-02 Source: Internet

Author: User

Tags xslt xpath contains xquery

(1) Introduction:

XPath is a language for finding information in an XML document, and XPath can be used to traverse elements and attributes in an XML document.

XPath is a major element of the XSLT standard, and both XQuery and XPointer are built on XPath expressions.

Therefore, the understanding of XPath is the foundation of many advanced XML applications.

XPath stands for XML Path Language, a language for addressing parts of an XML document (XML itself being a subset of the Standard Generalized Markup Language, SGML). XPath models an XML document as a tree and provides the ability to locate nodes in that tree. XPath was originally intended as a common syntax shared by XPointer and XSL, but developers quickly adopted it as a small query language.

What is XPath?

* XPath uses path expressions to navigate in XML documents

* XPath contains a library of standard functions

* XPath is a major element in XSLT

* XPath is a W3C standard

XPath path expressions

XPath uses path expressions to select nodes or node-sets in an XML document. These path expressions look very much like the paths used in an ordinary computer file system.
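The file-system analogy can be tried out with a minimal sketch in Python. It assumes the standard-library `xml.etree.ElementTree` module, which supports only a limited subset of XPath; the sample XML and the variable names are illustrative and not part of the original article.

```python
import xml.etree.ElementTree as ET

# Illustrative sample document (an assumption, modelled on the bookstore example).
XML = """<bookstore>
  <book>
    <title lang="en">Harry Potter</title>
    <price>29.99</price>
  </book>
</bookstore>"""

root = ET.fromstring(XML)  # root is the <bookstore> element

# "book/title" reads like a relative file path: step into book, then into title.
titles = [t.text for t in root.findall("book/title")]
print(titles)
```

Just as a relative file path is resolved from the current directory, this path is resolved from the element on which `findall` is called.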

XPath Standard Functions

XPath 2.0 includes more than 100 built-in functions. There are functions for string values, numeric values, date and time comparison, node and QName manipulation, sequence manipulation, Boolean values, and more.

XPath is used in XSLT

XPath is a major element of the XSLT standard. Without knowledge of XPath, you will not be able to create XSLT documents.

Both XQuery and XPointer are built on top of an XPath expression. XQuery 1.0 and XPath 2.0 share the same data model and support the same functions and operators.

XPath is a W3C standard

XPath became a W3C Recommendation on November 16, 1999.

XPath was designed to be used by XSLT, XPointer, and other XML parsing software.

Two versions are covered here: XPath 1.0 and XPath 2.0. XPath 1.0 was standardized in 1999, and XPath 2.0 in 2007.

(2) XPath nodes

In XPath, there are seven kinds of nodes: element, attribute, text, namespace, processing-instruction, comment, and document nodes (also called root nodes).

Nodes

XML documents are treated as trees of nodes. The root of the tree is called the document node or root node.

Take a look at the following XML document:

<?xml version="1.0" encoding="ISO-8859-1"?>

<bookstore>

<book>
  <title lang="en">Harry Potter</title>
  <author>J K. Rowling</author>
  <year>2005</year>
  <price>29.99</price>
</book>

</bookstore>

Examples of nodes in the XML document above:

<bookstore> (root element node)
<author>J K. Rowling</author> (element node)
lang="en" (attribute node)
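A rough way to inspect these nodes from Python is sketched below, assuming the standard-library `xml.etree.ElementTree` module. Note that ElementTree does not model attributes and text as separate node objects; it exposes them as properties of element objects, so this is only an approximation of the XPath data model. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

XML = """<bookstore>
  <book>
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>"""

root = ET.fromstring(XML)          # the root element, <bookstore>
author = root.find("book/author")  # an element node
title = root.find("book/title")

root_tag = root.tag                # name of the root element
author_text = author.text          # text content of the author element
lang = title.get("lang")           # value of the lang attribute

print(root_tag, author_text, lang)
```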

Atomic values

Atomic values are nodes with no children or parent.

Examples of atomic values:

J K. Rowling
"en"

Items

Items are atomic values or nodes.

Node relationships

* Parent

Each element and attribute has a parent.

In the following example, the book element is the parent of the title, author, year, and price elements:

<book>
  <title>Harry Potter</title>
  <author>J K. Rowling</author>
  <year>2005</year>
  <price>29.99</price>
</book>

* Children

An element node can have zero, one, or more children.

In the example above, the title, author, year, and price elements are the children of the book element.

* Siblings

Nodes that have the same parent.

In the example above, the title, author, year, and price elements are all siblings.

* Ancestors

A node's parent, the parent's parent, and so on.

In the following example, the ancestor of the title element is the book element and the bookstore element:

<bookstore>

<book>
  <title>Harry Potter</title>
  <author>J K. Rowling</author>
  <year>2005</year>
  <price>29.99</price>
</book>

</bookstore>

* Descendants

A node's children, the children's children, and so on.

In the example above, the descendants of bookstore are the book, title, author, year, and price elements.
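The relationships described in this section can be explored with a short sketch, assuming Python's standard-library `xml.etree.ElementTree`. ElementTree elements do not keep a pointer to their parent, so the sketch builds a child-to-parent map first; `parent_of` and `ancestors` are hypothetical helper names introduced only for illustration.

```python
import xml.etree.ElementTree as ET

XML = """<bookstore>
  <book>
    <title>Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>"""

root = ET.fromstring(XML)
book = root.find("book")
title = book.find("title")

# Children: iterating an element yields its direct children.
children = [child.tag for child in book]

# Parent: map every child element to its parent element.
parent_of = {child: parent for parent in root.iter() for child in parent}

# Siblings: the other children of the same parent.
siblings = [el.tag for el in parent_of[title] if el is not title]

def ancestors(el):
    """Walk up the parent map and collect tag names, nearest ancestor first."""
    result = []
    while el in parent_of:
        el = parent_of[el]
        result.append(el.tag)
    return result

# Descendants: every element below the root, in document order.
descendants = [el.tag for el in root.iter() if el is not root]

print(children, siblings, ancestors(title), descendants)
```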

(3) XPath syntax

XPath uses path expressions to select nodes or node-sets in an XML document. A node is selected by following a path or steps.

The XML example document

We will use this XML document in the examples below:

<?xml version="1.0" encoding="ISO-8859-1"?>

<bookstore>

<book>
  <title lang="eng">Harry Potter</title>
  <price>29.99</price>
</book>

<book>
  <title lang="eng">Learning XML</title>
  <price>39.95</price>
</book>

</bookstore>

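Selecting nodes

XPath uses path expressions to select nodes in an XML document. A node is selected by following a path or steps.

The most useful path expressions are:

nodename   Selects all child elements of the current node named nodename
/          Selects from the root node
//         Selects matching nodes anywhere below the current node (anywhere in the document when the path starts with //)
.          Selects the current node
..         Selects the parent of the current node
@          Selects attributes

Examples

Some path expressions and their results against the document above:

bookstore         Selects all child elements of the current node named bookstore
/bookstore        Selects the root element bookstore
bookstore/book    Selects all book elements that are children of bookstore
//book            Selects all book elements, wherever they are in the document
bookstore//book   Selects all book elements that are descendants of bookstore
//@lang           Selects all attributes named lang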
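The selection syntax in this section can be tried out with the sketch below. It assumes Python's standard-library `xml.etree.ElementTree`, which implements only a subset of XPath: paths are evaluated relative to the element you call them on, absolute paths such as `/bookstore` are not accepted on an element, and attribute steps such as `//@lang` are not supported, so the attribute values are read with `get` instead. Variable names are illustrative.

```python
import xml.etree.ElementTree as ET

XML = """<bookstore>
  <book>
    <title lang="eng">Harry Potter</title>
    <price>29.99</price>
  </book>
  <book>
    <title lang="eng">Learning XML</title>
    <price>39.95</price>
  </book>
</bookstore>"""

root = ET.fromstring(XML)  # the current node is <bookstore>

# nodename: child elements of the current node with that name.
books = root.findall("book")

# //: all title elements anywhere below the current node.
all_titles = [t.text for t in root.findall(".//title")]

# .: the current node itself.
current = root.find(".")

# ..: the parent of the selected title element.
parent = root.find("book/title/..")

# @: ElementTree cannot return attribute nodes, so select the elements
# that carry a lang attribute and read the value from each.
langs = [t.get("lang") for t in root.findall(".//title[@lang]")]

print(len(books), all_titles, current.tag, parent.tag, langs)
```

A full XPath engine would be needed for absolute paths and attribute steps; the subset used here covers child, descendant, self, and parent steps.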

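Predicates

Predicates are used to find a specific node or a node that contains a specific value.

Predicates are always written in square brackets.

Examples

Some path expressions with predicates and their results:

/bookstore/book[1]                  Selects the first book element that is a child of bookstore
/bookstore/book[last()]             Selects the last book element that is a child of bookstore
/bookstore/book[last()-1]           Selects the second-to-last book element that is a child of bookstore
/bookstore/book[position()<3]       Selects the first two book elements that are children of bookstore
//title[@lang]                      Selects all title elements that have an attribute named lang
//title[@lang='eng']                Selects all title elements whose lang attribute has the value eng
/bookstore/book[price>35.00]        Selects all book elements of bookstore whose price element has a value greater than 35.00
/bookstore/book[price>35.00]/title  Selects the title elements of those book elements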
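A few predicates can be exercised with the following sketch, again assuming Python's standard-library `xml.etree.ElementTree`. Its XPath subset handles positional predicates, attribute tests, and exact child-text matches, but not numeric comparisons such as `price>35.00`, so that filter is done in plain Python here. Variable names are illustrative.

```python
import xml.etree.ElementTree as ET

XML = """<bookstore>
  <book>
    <title lang="eng">Harry Potter</title>
    <price>29.99</price>
  </book>
  <book>
    <title lang="eng">Learning XML</title>
    <price>39.95</price>
  </book>
</bookstore>"""

root = ET.fromstring(XML)

# Positional predicates: the first and the last book child.
first = root.find("book[1]/title").text
last = root.find("book[last()]/title").text

# Attribute predicate: titles whose lang attribute equals "eng".
eng_titles = [t.text for t in root.findall(".//title[@lang='eng']")]

# Child-text predicate: books whose price text is exactly "39.95".
exact = [b.find("title").text for b in root.findall("book[price='39.95']")]

# Numeric comparison is outside ElementTree's subset, so filter in Python.
expensive = [
    b.find("title").text
    for b in root.findall("book")
    if float(b.find("price").text) > 35.00
]

print(first, last, eng_titles, exact, expensive)
```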

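Selecting unknown nodes

XPath wildcards can be used to select unknown XML nodes:

*        Matches any element node
@*       Matches any attribute node
node()   Matches any node of any kind

Examples

Some path expressions with wildcards and their results:

/bookstore/*   Selects all child elements of the bookstore element
//*            Selects all elements in the document
//title[@*]    Selects all title elements that have at least one attribute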
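Wildcard selection can be sketched as follows with Python's standard-library `xml.etree.ElementTree`. The `*` wildcard is supported directly; an attribute wildcard predicate is assumed to be unavailable in this subset, so elements with at least one attribute are found by checking each element's `attrib` dictionary instead. Variable names are illustrative.

```python
import xml.etree.ElementTree as ET

XML = """<bookstore>
  <book>
    <title lang="eng">Harry Potter</title>
    <price>29.99</price>
  </book>
  <book>
    <title lang="eng">Learning XML</title>
    <price>39.95</price>
  </book>
</bookstore>"""

root = ET.fromstring(XML)

# "*" matches any child element of the current node.
child_tags = [el.tag for el in root.findall("*")]

# ".//*" matches every element below the current node, in document order.
all_tags = [el.tag for el in root.findall(".//*")]

# Emulating an attribute wildcard: keep elements with at least one attribute.
with_attrs = [el.tag for el in root.iter() if el.attrib]

print(child_tags, all_tags, with_attrs)
```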

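Selecting several paths

By using the | operator in a path expression, you can select several paths.

Examples:

//book/title | //book/price       Selects all title and price elements of all book elements
//title | //price                 Selects all title and price elements in the document
/bookstore/book/title | //price   Selects all title elements of the book elements of bookstore, and all price elements in the document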
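Python's standard-library `xml.etree.ElementTree` does not implement the `|` operator, so the sketch below only emulates a union: it runs each path separately, merges the matches, and returns them in document order without duplicates. The helper name `union` is hypothetical and introduced purely for illustration.

```python
import xml.etree.ElementTree as ET

XML = """<bookstore>
  <book>
    <title lang="eng">Harry Potter</title>
    <price>29.99</price>
  </book>
  <book>
    <title lang="eng">Learning XML</title>
    <price>39.95</price>
  </book>
</bookstore>"""

root = ET.fromstring(XML)

def union(node, *paths):
    """Emulate XPath's | operator: merge matches, keep document order."""
    matched = {el for path in paths for el in node.findall(path)}
    # node.iter() walks the tree in document order, which fixes the ordering.
    return [el for el in node.iter() if el in matched]

# Rough equivalent of //title | //price
result = [el.text for el in union(root, ".//title", ".//price")]
print(result)
```

With a full XPath engine the same selection would be written as a single expression using `|`.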
