XPath is a language for finding information in XML documents. This article describes how to parse XML in Java using XPath and dom4j. It starts with a short overview of the four classic methods for parsing XML files.
1. Four methods for parsing XML files
There are four classic methods for parsing XML files. Two of them are basic: SAX and DOM. SAX parses the document as a stream of events, while DOM parses it into a document tree. JDOM emerged on top of these to reduce the amount of code that DOM and SAX require. Its advantage is that it applies the 20-80 principle (the Pareto rule), which greatly reduces the amount of code. JDOM is generally sufficient for simple tasks such as parsing and creating documents. Under the hood, however, JDOM still relies on SAX (most commonly), DOM, and Xalan. DOM4J is a very good Java XML API with excellent performance, powerful functionality, and great ease of use, and it is open source. More and more Java software uses DOM4J to read and write XML; notably, Sun's JAXM also uses DOM4J. Detailed usage of the four methods can be found by searching Baidu.
2. Brief introduction to XPath
XPath is a language for finding information in XML documents. It is used to navigate through the elements and attributes of an XML document. XPath is a major element of the W3C XSLT standard, and XQuery and XPointer are also built on XPath expressions, so understanding XPath is the basis of many advanced XML applications. XPath plays a role similar to SQL for databases or jQuery for web pages: it lets developers easily pick out what they need from a document. DOM4J also supports XPath.
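To make the navigation of elements and attributes concrete, here is a minimal sketch. It uses the JDK's built-in javax.xml.xpath API so that it runs without extra JARs. The class name XPathBasics and the sample library document are made up for illustration and are not part of the article.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

/** Evaluates XPath expressions against a small, made-up XML document. */
final class XPathBasics {

    // Hypothetical sample document, not taken from the article.
    static final String XML =
        "<library>"
        + "<book id=\"b1\"><title>First</title></book>"
        + "<book id=\"b2\"><title>Second</title></book>"
        + "</library>";

    /** Returns the string value of the first node matched by the expression. */
    static String eval(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(eval("/library/book[2]/title"));   // element selected by position
        System.out.println(eval("//book[@id='b1']/title"));   // element filtered by attribute
        System.out.println(eval("/library/book[1]/@id"));     // attribute value
    }
}
```

The same expressions should also work with DOM4J's selectNodes(), since they are plain XPath 1.0.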
3. Using XPath in DOM4J
To parse XML documents with XPath in DOM4J, first reference two JAR packages in the project:
dom4j-1.6.1.jar: the DOM4J package, http://sourceforge.net/projects/dom4j/;
jaxen-xx.xx.jar: this package is often forgotten, which causes an exception (java.lang.NoClassDefFoundError: org/jaxen/JaxenException), http://www.jaxen.org/releases.html.
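Because the missing Jaxen JAR only shows up at run time, a quick classpath check can help when diagnosing it. The following is a small sketch. The helper name isOnClasspath is an assumption made for this example, and the two class names it probes are org.dom4j.Document and the org.jaxen.JaxenException class named in the error above.

```java
/** Reports whether the classes DOM4J needs for XPath can be loaded. */
final class ClasspathCheck {

    /** Returns true if the named class can be loaded, without initializing it. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("dom4j present: " + isOnClasspath("org.dom4j.Document"));
        System.out.println("Jaxen present: " + isOnClasspath("org.jaxen.JaxenException"));
    }
}
```

If the second line prints false, XPath calls in DOM4J are likely to fail with the NoClassDefFoundError shown above.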
3.1 Namespace interference
When processing an XML file that was converted from an Excel file or another format, an XPath query often returns no result. This is usually caused by a namespace. For example, in a workbook exported from Excel, a simple search with XPath = "//Workbook/Worksheet/Table/Row[1]/Cell[1]/Data[1]" usually returns nothing. The cause is the default namespace declared on the root element (xmlns="urn:schemas-microsoft-com:office:spreadsheet").
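The effect can be reproduced with a few lines of code. The sketch below uses the JDK's namespace-aware DOM parser and javax.xml.xpath instead of DOM4J so that it runs without extra JARs. The trimmed workbook is hypothetical, and only the namespace URI is taken from the article.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Shows that unprefixed XPath names do not match elements in a default namespace. */
final class NamespaceInterference {

    // Hypothetical, heavily trimmed workbook; only the namespace URI comes from the article.
    static final String XML =
        "<Workbook xmlns=\"urn:schemas-microsoft-com:office:spreadsheet\">"
        + "<Worksheet><Table><Row><Cell><Data>Sunny</Data></Cell></Row></Table></Worksheet>"
        + "</Workbook>";

    /** Parses the sample with namespace awareness and counts the matching nodes. */
    static int countMatches(String expression) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Unprefixed names only match elements in no namespace: prints 0.
        System.out.println(countMatches("//Workbook/Worksheet/Table/Row[1]/Cell[1]/Data[1]"));
        // Matching on the local name ignores the namespace: prints 1.
        System.out.println(countMatches("//*[local-name()='Data']"));
    }
}
```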
3.2 XPath parsing of XML files with namespaces
Method 1 (read1() function): use local-name() and namespace-uri() in the XPath expression to specify the node name and namespace to match. Writing XPath expressions this way is cumbersome.
Method 2 (read2() function): set the namespace on the XPath object with the setNamespaceURIs() function.
Method 3 (read3() function): set the namespace on the DocumentFactory with the setXPathNamespaceURIs() function. The XPath expressions for methods 2 and 3 are relatively simple to write.
Method 4 (read4() function): the same as method 3, but with a different XPath expression (see the program). It mainly checks whether differences between XPath expressions, chiefly how completely the path is spelled out, affect retrieval efficiency.
(The four methods above use DOM4J and XPath to parse the XML file.)
Method 5 (read5() function): parse the XML file using DOM and XPath, mainly to compare performance.
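The three underlying ideas (matching on local names, mapping a prefix to the namespace, and parsing without namespace awareness) can be compared side by side in a self-contained sketch. It uses only the JDK: a javax.xml.namespace.NamespaceContext stands in for DOM4J's setNamespaceURIs() and setXPathNamespaceURIs(), which is an analogy rather than the same API. The two-row sample workbook and the class name NamespaceStrategies are made up for illustration.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Three ways to query a document that declares a default namespace. */
final class NamespaceStrategies {

    static final String NS = "urn:schemas-microsoft-com:office:spreadsheet";

    // Hypothetical sample data; only the namespace URI comes from the article.
    static final String XML = "<Workbook xmlns=\"" + NS + "\"><Worksheet><Table>"
        + "<Row><Cell><Data>A1</Data></Cell><Cell><Data>B1</Data></Cell></Row>"
        + "<Row><Cell><Data>A2</Data></Cell><Cell><Data>B2</Data></Cell></Row>"
        + "</Table></Worksheet></Workbook>";

    /** Maps the (arbitrary) prefix "Workbook" to the spreadsheet namespace. */
    static final NamespaceContext CONTEXT = new NamespaceContext() {
        @Override public String getNamespaceURI(String prefix) {
            return "Workbook".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
        }
        @Override public String getPrefix(String uri) {
            return NS.equals(uri) ? "Workbook" : null;
        }
        @Override public Iterator<String> getPrefixes(String uri) {
            List<String> prefixes = NS.equals(uri)
                    ? Collections.singletonList("Workbook")
                    : Collections.<String>emptyList();
            return prefixes.iterator();
        }
    };

    /** Parses the sample and returns the string value of the first matching node. */
    static String evaluate(String expression, boolean namespaceAware, NamespaceContext context) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(namespaceAware);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            if (context != null) {
                xpath.setNamespaceContext(context);
            }
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Idea of method 1: match on local-name() and namespace-uri(). */
    static String byLocalName() {
        return evaluate("//*[local-name()='Row' and namespace-uri()='" + NS + "'][2]"
                + "/*[local-name()='Cell'][2]/*[local-name()='Data'][1]", true, null);
    }

    /** Idea of methods 2 to 4: bind a prefix to the namespace and use it in the path. */
    static String byPrefix() {
        return evaluate("//Workbook:Row[2]/Workbook:Cell[2]/Workbook:Data[1]", true, CONTEXT);
    }

    /** Idea of method 5: parse without namespace awareness and use plain names. */
    static String withoutNamespaces() {
        return evaluate("//Workbook/Worksheet/Table/Row[2]/Cell[2]/Data[1]", false, null);
    }

    public static void main(String[] args) {
        System.out.println(byLocalName());
        System.out.println(byPrefix());
        System.out.println(withoutNamespaces());
    }
}
```

All three calls select the Data element in the second cell of the second row. The prefix-based expression is the shortest to write once the mapping is in place, which matches the article's remark about methods 2 and 3.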
Nothing illustrates this better than code, so here is the complete program:
package XPath;

import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.XPath;
import org.dom4j.io.SAXReader;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/**
 * Parses XML with XPath, using DOM4J (read1 to read4) and DOM (read5).
 */
public class TestDom4jXpath {

    public static void main(String[] args) {
        read1();
        read2();
        read3();
        read4(); // same approach as read3(), but with a different XPath expression
        read5();
    }

    public static void read1() {
        /* Use local-name() and namespace-uri() in XPath */
        try {
            long startTime = System.currentTimeMillis();
            SAXReader reader = new SAXReader();
            // Classpath resources use '/' as the separator.
            InputStream in = TestDom4jXpath.class.getClassLoader().getResourceAsStream("XPath/XXX.xml");
            Document doc = reader.read(in);
            // Fully spelled-out alternative:
            // String xpath = "//*[local-name()='Workbook' and namespace-uri()='urn:schemas-microsoft-com:office:spreadsheet']"
            //         + "/*[local-name()='Worksheet']"
            //         + "/*[local-name()='Table']"
            //         + "/*[local-name()='Row'][4]"
            //         + "/*[local-name()='Cell'][3]"
            //         + "/*[local-name()='Data'][1]";
            String xpath = "//*[local-name()='Row'][4]/*[local-name()='Cell'][3]/*[local-name()='Data'][1]";
            System.err.println("===== use local-name() and namespace-uri() in XPath =====");
            System.err.println("XPath: " + xpath);
            List<?> list = doc.selectNodes(xpath);
            for (Object o : list) {
                Element e = (Element) o;
                String show = e.getStringValue();
                System.out.println("show = " + show);
            }
            long endTime = System.currentTimeMillis();
            System.out.println("Program running time: " + (endTime - startTime) + " ms");
        } catch (DocumentException e) {
            e.printStackTrace();
        }
    }

    public static void read2() {
        /* Set the namespace on the XPath object (setNamespaceURIs) */
        try {
            long startTime = System.currentTimeMillis();
            Map<String, String> map = new HashMap<String, String>();
            map.put("Workbook", "urn:schemas-microsoft-com:office:spreadsheet");
            SAXReader reader = new SAXReader();
            InputStream in = TestDom4jXpath.class.getClassLoader().getResourceAsStream("XPath/XXX.xml");
            Document doc = reader.read(in);
            String xpath = "//Workbook:Row[4]/Workbook:Cell[3]/Workbook:Data[1]";
            System.err.println("===== use setNamespaceURIs() to set the XPath namespace =====");
            System.err.println("XPath: " + xpath);
            XPath x = doc.createXPath(xpath);
            x.setNamespaceURIs(map);
            List<?> list = x.selectNodes(doc);
            for (Object o : list) {
                Element e = (Element) o;
                String show = e.getStringValue();
                System.out.println("show = " + show);
            }
            long endTime = System.currentTimeMillis();
            System.out.println("Program running time: " + (endTime - startTime) + " ms");
        } catch (DocumentException e) {
            e.printStackTrace();
        }
    }

    public static void read3() {
        /* Set the namespace on the DocumentFactory (setXPathNamespaceURIs) */
        try {
            long startTime = System.currentTimeMillis();
            Map<String, String> map = new HashMap<String, String>();
            map.put("Workbook", "urn:schemas-microsoft-com:office:spreadsheet");
            SAXReader reader = new SAXReader();
            InputStream in = TestDom4jXpath.class.getClassLoader().getResourceAsStream("XPath/XXX.xml");
            reader.getDocumentFactory().setXPathNamespaceURIs(map);
            Document doc = reader.read(in);
            String xpath = "//Workbook:Row[4]/Workbook:Cell[3]/Workbook:Data[1]";
            System.err.println("===== use setXPathNamespaceURIs() to set the DocumentFactory namespace =====");
            System.err.println("XPath: " + xpath);
            List<?> list = doc.selectNodes(xpath);
            for (Object o : list) {
                Element e = (Element) o;
                String show = e.getStringValue();
                System.out.println("show = " + show);
            }
            long endTime = System.currentTimeMillis();
            System.out.println("Program running time: " + (endTime - startTime) + " ms");
        } catch (DocumentException e) {
            e.printStackTrace();
        }
    }

    public static void read4() {
        /* Same as read3(), but with a different XPath expression */
        try {
            long startTime = System.currentTimeMillis();
            Map<String, String> map = new HashMap<String, String>();
            map.put("Workbook", "urn:schemas-microsoft-com:office:spreadsheet");
            SAXReader reader = new SAXReader();
            InputStream in = TestDom4jXpath.class.getClassLoader().getResourceAsStream("XPath/XXX.xml");
            reader.getDocumentFactory().setXPathNamespaceURIs(map);
            Document doc = reader.read(in);
            String xpath = "//Workbook:Worksheet/Workbook:Table/Workbook:Row[4]/Workbook:Cell[3]/Workbook:Data[1]";
            System.err.println("===== use setXPathNamespaceURIs() to set the DocumentFactory namespace =====");
            System.err.println("XPath: " + xpath);
            List<?> list = doc.selectNodes(xpath);
            for (Object o : list) {
                Element e = (Element) o;
                String show = e.getStringValue();
                System.out.println("show = " + show);
            }
            long endTime = System.currentTimeMillis();
            System.out.println("Program running time: " + (endTime - startTime) + " ms");
        } catch (DocumentException e) {
            e.printStackTrace();
        }
    }

    public static void read5() {
        /* DOM and XPath */
        try {
            long startTime = System.currentTimeMillis();
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            // Without namespace awareness, unprefixed names in the XPath match the elements.
            dbf.setNamespaceAware(false);
            DocumentBuilder builder = dbf.newDocumentBuilder();
            InputStream in = TestDom4jXpath.class.getClassLoader().getResourceAsStream("XPath/XXX.xml");
            org.w3c.dom.Document doc = builder.parse(in);
            XPathFactory factory = XPathFactory.newInstance();
            javax.xml.xpath.XPath x = factory.newXPath();
            // Select the first Data element in the third cell of the fourth row.
            String xpath = "//Workbook/Worksheet/Table/Row[4]/Cell[3]/Data[1]";
            System.err.println("===== DOM XPath =====");
            System.err.println("XPath: " + xpath);
            XPathExpression expr = x.compile(xpath);
            // NODESET (not NODE) is the return type that can be cast to NodeList.
            NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
            // The original listing is cut off inside this loop; the rest of the method is a
            // reconstruction that follows the pattern of read1() to read4().
            for (int i = 0; i < nodes.getLength(); i++) {
                System.out.println("show = " + nodes.item(i).getTextContent());
            }
            long endTime = System.currentTimeMillis();
            System.out.println("Program running time: " + (endTime - startTime) + " ms");
        } catch (ParserConfigurationException | SAXException | IOException | XPathExpressionException e) {
            e.printStackTrace();
        }
    }
}
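A single System.currentTimeMillis() measurement, as used in the listing, also includes file loading and JVM warm-up, so differences between XPath expressions are hard to see. The following is a rough sketch of a more repeatable comparison, assuming a JDK-only setup: it compiles the expression once, runs a warm-up pass, and averages System.nanoTime() over several runs. The class name XPathTiming, the tiny sample document, and the run count are all made up for illustration, and this is not a rigorous benchmark.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Rough timing of a short XPath expression versus a fully spelled-out one. */
final class XPathTiming {

    // Hypothetical sample without namespaces, to keep the focus on timing.
    static final String XML = "<Workbook><Worksheet><Table>"
        + "<Row><Cell><Data>A1</Data></Cell></Row>"
        + "</Table></Worksheet></Workbook>";

    /** Returns the average nanoseconds per evaluation over the given number of runs. */
    static long averageNanos(String expression, int runs) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPathExpression compiled = XPathFactory.newInstance().newXPath().compile(expression);
            for (int i = 0; i < runs; i++) {
                compiled.evaluate(doc); // warm-up pass, not timed
            }
            long start = System.nanoTime();
            for (int i = 0; i < runs; i++) {
                compiled.evaluate(doc);
            }
            return (System.nanoTime() - start) / runs;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int runs = 1000; // arbitrary choice for this sketch
        System.out.println("short path: "
                + averageNanos("//Row[1]/Cell[1]/Data[1]", runs) + " ns");
        System.out.println("full path:  "
                + averageNanos("/Workbook/Worksheet/Table/Row[1]/Cell[1]/Data[1]", runs) + " ns");
    }
}
```

The numbers depend heavily on the machine, the JVM, and the size of the document, so they are only meaningful relative to each other on the same setup.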