The language used by XPath consists of two parts: the expression language and the XPath path search expression.
1. The expression language consists of operators (+, -, *, /, or, and, not), numeric values, strings, and functions. It follows conventional expression syntax.
2. The XPath path search expression is used to locate XML nodes in an XML document. For example, /books/book selects all book child nodes under the books node. For more information, see the description of XPath at www.w3c.org.
To implement a complete XPath parser, the basic task is to implement both expression language parsing and XPath path search. Because only the path search part is implemented so far, this article only briefly introduces the implementation of path search.
An XPath path expression consists of three main components:
separator: "//", "/", "[", "]", ":", ...
node name: for example, in the path /books/book, both books and book are node names.
attribute name: for example, in /books/book[@title="123*"], title is an attribute name.
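To make the three components concrete, here is a minimal hand-written scanner in Python that splits a path expression into separators, node names, and attribute names. It is only a sketch under stated assumptions: the token kind labels, the function name tokenize, and the treatment of "=" and quoted strings as tokens are choices made for this illustration, not part of the implementation described in this article. Wildcards and functions are not handled.

```python
from typing import List, Tuple

# Token kinds are hypothetical labels chosen for this sketch.
SEPARATOR, NODE_NAME, ATTR_NAME, LITERAL = "separator", "node", "attribute", "literal"

NAME_EXTRA = "_-."  # characters allowed in a name besides letters and digits


def scan_name(path: str, start: int) -> int:
    """Return the index just past the name that starts at `start`."""
    j = start
    while j < len(path) and (path[j].isalnum() or path[j] in NAME_EXTRA):
        j += 1
    return j


def tokenize(path: str) -> List[Tuple[str, str]]:
    """Split an XPath path expression into (kind, text) tokens."""
    tokens: List[Tuple[str, str]] = []
    i = 0
    while i < len(path):
        ch = path[i]
        if ch.isspace():
            i += 1                                  # whitespace is ignored
        elif path.startswith("//", i):
            tokens.append((SEPARATOR, "//"))        # check "//" before "/"
            i += 2
        elif ch in "/[]=:":
            tokens.append((SEPARATOR, ch))          # "=" as a separator is an assumption
            i += 1
        elif ch == "@":
            j = scan_name(path, i + 1)              # "@" introduces an attribute name
            if j == i + 1:
                raise ValueError(f"expected attribute name at position {i + 1}")
            tokens.append((ATTR_NAME, path[i + 1:j]))
            i = j
        elif ch in "\"'":
            j = path.find(ch, i + 1)                # quoted string up to the matching quote
            if j == -1:
                raise ValueError(f"unterminated string at position {i}")
            tokens.append((LITERAL, path[i + 1:j]))
            i = j + 1
        elif ch.isalpha() or ch == "_":
            j = scan_name(path, i)                  # str.isalpha/isalnum accept Unicode letters
            tokens.append((NODE_NAME, path[i:j]))
            i = j
        else:
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    return tokens


for kind, text in tokenize('/books/book[@title="123*"]'):
    print(kind, text)
```

Because Python strings are Unicode, the scanner accepts non-ASCII node names without extra work.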
XPath path expressions have a simple, regular grammar, so lexical and syntax analysis can be implemented in three ways: a hand-written state machine, regular expression matching, or Lex/YACC. A state machine is convenient to implement and maintain; its disadvantage is that the code is not very easy to extend. Regular expression matching is easy to implement and easy to extend, but its performance is poor. Lex/YACC is also easy to implement and extend, but it does not support Unicode, and because XPath is an integral part of XML, Unicode support is a basic requirement. Because the XPath syntax is already quite complete, so that extensibility matters less, I chose the state machine method to implement lexical and syntax analysis. The simple architecture is as follows:
xpathtoken extracts one token at a time from the input XPath expression; a token may be a separator, a node name, or an attribute name.
xpathparser uses xpathtoken to read each token, assembles the tokens into a meaningful XPath syntax structure, and checks the syntax at the same time.
xpathdocument searches an XML document for the corresponding nodes according to the parsed XPath syntax.
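The role of xpathparser can be sketched as a small state machine that consumes tokens and builds a list of location steps, rejecting malformed input as it goes. This is an illustration only: the Step class, the (kind, text) token format, the state names, and the restriction to paths that begin with "/" or "//" are assumptions of this sketch, not the author's actual design.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical token format: (kind, text), where kind is one of
# "separator", "node", "attribute", "literal".
Token = Tuple[str, str]


@dataclass
class Step:
    axis: str                    # "child" for "/", "descendant" for "//"
    name: str                    # node name to match
    attr: Optional[str] = None   # attribute tested by a [@attr="value"] predicate
    value: Optional[str] = None  # value the attribute is compared with


def parse(tokens: List[Token]) -> List[Step]:
    """Assemble tokens into steps with a two-state machine, checking syntax."""
    steps: List[Step] = []
    state = "expect_separator"
    axis = ""
    pos = 0
    while pos < len(tokens):
        kind, text = tokens[pos]
        if state == "expect_separator":
            if (kind, text) == ("separator", "/"):
                axis, state = "child", "expect_name"
            elif (kind, text) == ("separator", "//"):
                axis, state = "descendant", "expect_name"
            elif (kind, text) == ("separator", "[") and steps:
                # A predicate must be exactly: @attr = "value" ]
                pred = tokens[pos + 1:pos + 5]
                kinds = [k for k, _ in pred]
                if (kinds != ["attribute", "separator", "literal", "separator"]
                        or pred[1][1] != "=" or pred[3][1] != "]"):
                    raise ValueError("malformed predicate")
                steps[-1].attr, steps[-1].value = pred[0][1], pred[2][1]
                pos += 4         # skip the four predicate tokens just consumed
            else:
                raise ValueError(f"unexpected token {text!r}")
        else:                    # state == "expect_name"
            if kind != "node":
                raise ValueError(f"expected a node name, got {text!r}")
            steps.append(Step(axis, text))
            state = "expect_separator"
        pos += 1
    if state == "expect_name":
        raise ValueError("path ends with a separator")
    return steps


# Tokens for //book/title, written out by hand for the example.
print(parse([("separator", "//"), ("node", "book"),
             ("separator", "/"), ("node", "title")]))
```

Keeping the syntax check inside the same loop that builds the steps mirrors the description above, where assembling and checking happen in a single pass.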
The implementation of xpathdocument is the difficult part of XPath path search. Because xpathdocument needs to traverse XML nodes according to the XPath expression, it is recommended that the XML parser fully implement the tree-traversal functions of XML DOM Level 2. For example:
A simple XML document structure is as follows:
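<books>
  <book>
    <title>book1</title>
    <author>author1</author>
  </book>
  <book>
    <title>book2</title>
    <author>author2</author>
  </book>
</books>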
Use the following XPath expression: //book/title. The goal is to retrieve the titles of all books.
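As a reference point for what this expression should return, the same query can be run with Python's standard xml.etree.ElementTree module, which supports a limited XPath subset. The sample document in the code is an assumption made for this sketch (a books root containing book elements with title and author children); it is not necessarily the exact document used by the author.

```python
import xml.etree.ElementTree as ET

# Sample document assumed for illustration.
XML = """<books>
  <book><title>book1</title><author>author1</author></book>
  <book><title>book2</title><author>author2</author></book>
</books>"""

root = ET.fromstring(XML)

# ElementTree paths are evaluated relative to an element,
# so //book/title is written as .//book/title here.
titles = [t.text for t in root.findall(".//book/title")]
print(titles)  # ['book1', 'book2']
```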
To parse the XPath expression and perform the search, the XPath parser goes through the following steps:
1. Create an xpathparser object and pass it the XPath expression //book/title.
2. Create an xmldocument object and load the XML document.
3. Traverse xmldocument according to xpathparser. First, xpathparser finds "//book", which means to find all nodes named "book". You can use getelementbyname of xmldocument to retrieve all book nodes. xpathparser then finds "/title", which means to retrieve all the title nodes in the current result set. You need to traverse the current result set and query the title nodes under each node in it.
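The steps above amount to narrowing a result set one step at a time. The following sketch shows that loop, using Python's xml.etree.ElementTree in place of the xmldocument class described here, with Element.iter standing in for getelementbyname. The search function, the (axis, name) step format, and the sample document are all assumptions of this illustration.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Sample document assumed for illustration.
XML = """<books>
  <book><title>book1</title><author>author1</author></book>
  <book><title>book2</title><author>author2</author></book>
</books>"""


def search(root: ET.Element, steps: List[Tuple[str, str]]) -> List[ET.Element]:
    """Evaluate (axis, name) steps, narrowing the result set at each step."""
    # A virtual parent stands in for the document node, so that the first
    # step ("/books" or "//book") can match the root element itself.
    doc = ET.Element("#document")
    doc.append(root)
    current = [doc]
    for axis, name in steps:
        found: List[ET.Element] = []
        seen = set()
        for node in current:
            if axis == "descendant":
                # "//name": every element named `name` below this node.
                # iter() also yields the node itself, so exclude it.
                candidates = [e for e in node.iter(name) if e is not node]
            else:
                # "/name": direct children named `name`.
                candidates = [e for e in node if e.tag == name]
            for e in candidates:
                if id(e) not in seen:   # nested matches could repeat an element
                    seen.add(id(e))
                    found.append(e)
        current = found
    return current


root = ET.fromstring(XML)
# Steps for //book/title, written out by hand for the example.
result = search(root, [("descendant", "book"), ("child", "title")])
print([e.text for e in result])  # ['book1', 'book2']
```

The duplicate check matters for "//" steps: when one node in the current result set contains another, the same descendant would otherwise be collected twice.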
The description above records the details of implementing an XPath path parser. On the whole, the path parser is relatively easy to implement, while the XPath expression language is much more complicated, and I am still working on it.