Parsing XML in Java with XPath and DOM4J

Last Update:2017-05-14 Source: Internet

Author: User

XPath is a language for finding information in XML documents. This article describes how to parse XML in Java using XPath together with DOM4J, after a brief look at the four classic ways of parsing XML files.

1. Four methods for parsing XML files

There are four classic ways to parse XML files. The two basic ones are SAX and DOM: SAX parses the document as a stream of events, while DOM parses it into a document tree. JDOM emerged on top of these to reduce the amount of code that DOM and SAX require. Following the 20-80 (Pareto) principle, it greatly reduces the code needed for the common cases and generally covers simple tasks such as parsing and creating documents. Under the hood, however, JDOM still relies on SAX (most commonly), DOM, or Xalan. DOM4J is an open source Java XML API with good performance, rich functionality, and an easy-to-use interface. More and more Java software uses DOM4J to read and write XML; notably, Sun's JAXM uses it as well. Detailed usage of the four methods is easy to find with a web search.
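The difference between the two basic models can be sketched with the parsers that ship with the JDK. This is a minimal illustration, not code from the original article: the class name, the method names, and the sample XML are hypothetical. SAX hands the program one event per element as it reads, while DOM builds the whole tree first and lets the program query it afterwards.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxVsDomSketch {
    // Hypothetical sample data, not taken from the article.
    public static final String XML =
        "<Table><Row><Cell>a</Cell><Cell>b</Cell></Row><Row><Cell>c</Cell></Row></Table>";

    /** SAX: the parser pushes events; we count the start-element events named "Cell". */
    public static int countCellsWithSax(String xml) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if ("Cell".equals(qName)) {
                    count[0]++;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return count[0];
    }

    /** DOM: the whole document is loaded into a tree, which is then queried. */
    public static int countCellsWithDom(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getElementsByTagName("Cell").getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("SAX count: " + countCellsWithSax(XML));
        System.out.println("DOM count: " + countCellsWithDom(XML));
    }
}
```

Both methods report the same count; the SAX version never holds the full document in memory, which is the usual reason to prefer it for very large files.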

2. Brief introduction to XPath

XPath is a language for finding information in XML documents. It navigates through the elements and attributes of a document. XPath is a major element of the W3C XSLT standard, and XQuery and XPointer are both built on XPath expressions, so understanding XPath is the basis for many advanced XML applications. In spirit, XPath resembles SQL for databases or jQuery selectors for web pages: it lets developers pick out exactly the parts of a document they need. DOM4J supports XPath as well.
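A few typical expressions give a feel for the language. The sketch below uses the javax.xml.xpath API bundled with the JDK so that it runs without extra JARs; the class name, method names, and the small library document are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathBasicsSketch {
    // Hypothetical sample document, not taken from the article.
    public static final String XML =
        "<library>"
      + "<book id=\"b1\"><title>Alpha</title><price>10</price></book>"
      + "<book id=\"b2\"><title>Beta</title><price>25</price></book>"
      + "</library>";

    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    /** Absolute path with a positional predicate: title of the first book. */
    public static String firstTitle(Document doc) throws Exception {
        XPath xp = XPathFactory.newInstance().newXPath();
        return xp.evaluate("/library/book[1]/title", doc);
    }

    /** Value predicate plus attribute step: id of the first book costing more than 20. */
    public static String expensiveBookId(Document doc) throws Exception {
        XPath xp = XPathFactory.newInstance().newXPath();
        return xp.evaluate("//book[price > 20]/@id", doc);
    }

    /** XPath function: number of book elements anywhere in the document. */
    public static int countBooks(Document doc) throws Exception {
        XPath xp = XPathFactory.newInstance().newXPath();
        Double n = (Double) xp.evaluate("count(//book)", doc, XPathConstants.NUMBER);
        return n.intValue();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        System.out.println(firstTitle(doc));      // Alpha
        System.out.println(expensiveBookId(doc)); // b2
        System.out.println(countBooks(doc));      // 2
    }
}
```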

3. Using XPath in DOM4J

To parse XML documents with XPath in DOM4J, first add two JAR packages to the project:

dom4j-1.6.1.jar: the DOM4J package, http://sourceforge.net/projects/dom4j/;

jaxen-xx.xx.jar: often forgotten, which causes the exception java.lang.NoClassDefFoundError: org/jaxen/JaxenException, http://www.jaxen.org/releases.html.

3.1 Namespace interference

When processing an XML file that was converted from an Excel file or another format, XPath queries often return no result. This is usually caused by a namespace. Take an XML file exported from Excel as an example: a simple search with XPath = "//Workbook/Worksheet/Table/Row[1]/Cell[1]/Data[1]" usually returns nothing. The cause is the default namespace declaration xmlns="urn:schemas-microsoft-com:office:spreadsheet": every element belongs to that namespace, while the unprefixed names in the XPath expression match only elements that are in no namespace.
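The effect is easy to reproduce. The following sketch uses the JDK's namespace-aware DOM parser and javax.xml.xpath rather than DOM4J, so it runs without extra JARs; the class name and the tiny workbook fragment are invented for the demonstration. The same expression is evaluated against two documents that differ only in the xmlns declaration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NamespaceInterferenceSketch {
    // The unprefixed expression quoted in the text.
    public static final String XPATH = "//Workbook/Worksheet/Table/Row[1]/Cell[1]/Data[1]";

    // Hypothetical, heavily shortened workbook content.
    static final String BODY =
        "<Worksheet><Table><Row><Cell><Data>hello</Data></Cell></Row></Table></Worksheet>";

    public static final String WITH_NS =
        "<Workbook xmlns=\"urn:schemas-microsoft-com:office:spreadsheet\">" + BODY + "</Workbook>";

    public static final String WITHOUT_NS = "<Workbook>" + BODY + "</Workbook>";

    /** Parses the XML namespace-aware and returns how many nodes XPATH selects. */
    public static int countMatches(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
            .evaluate(XPATH, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("with xmlns:    " + countMatches(WITH_NS));    // 0
        System.out.println("without xmlns: " + countMatches(WITHOUT_NS)); // 1
    }
}
```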

3.2 XPath parsing of XML files with namespaces

Method 1 (the read1() function): use local-name() and namespace-uri() in the XPath syntax to specify the node name and namespace you want. The resulting XPath expressions are tedious to write.

Method 2 (the read2() function): set the namespace on the XPath object with the setNamespaceURIs() function.

Method 3 (the read3() function): set the namespace on the DocumentFactory with the setXPathNamespaceURIs() function. With methods 2 and 3 the XPath expressions are relatively simple to write.

Method 4 (the read4() function): the same as method 3, but with a different XPath expression (see the program). Its purpose is to check whether the completeness of the XPath expression affects retrieval efficiency.

(The four methods above all parse the XML file with DOM4J and XPath.)

Method 5 (the read5() function): parse the XML file with DOM and XPath, mainly to compare performance.
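The two ideas behind these methods, matching by local-name() and binding a prefix to the namespace URI, can be tried out in isolation. The sketch below is an assumption-laden stand-in: it uses the JDK's javax.xml.xpath API instead of DOM4J so that it runs without extra JARs, a NamespaceContext plays the role that setNamespaceURIs() and setXPathNamespaceURIs() play in DOM4J, and the class name, the prefix "ss", and the two-row sample workbook are all hypothetical.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NamespaceXPathSketch {
    public static final String NS = "urn:schemas-microsoft-com:office:spreadsheet";

    // Hypothetical two-by-two sheet, not taken from the article.
    public static final String XML =
        "<Workbook xmlns=\"" + NS + "\"><Worksheet><Table>"
      + "<Row><Cell><Data>A1</Data></Cell><Cell><Data>B1</Data></Cell></Row>"
      + "<Row><Cell><Data>A2</Data></Cell><Cell><Data>B2</Data></Cell></Row>"
      + "</Table></Worksheet></Workbook>";

    public static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    /** Idea of method 1: ignore prefixes and compare local names only. */
    public static String viaLocalName(Document doc) throws Exception {
        String xpath = "//*[local-name()='Row'][2]"
                     + "/*[local-name()='Cell'][2]"
                     + "/*[local-name()='Data'][1]";
        return XPathFactory.newInstance().newXPath().evaluate(xpath, doc);
    }

    /** Idea of methods 2 to 4: bind a prefix to the namespace URI and use it in the path. */
    public static String viaPrefix(Document doc) throws Exception {
        XPath xp = XPathFactory.newInstance().newXPath();
        xp.setNamespaceContext(new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                return "ss".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
            }

            @Override
            public String getPrefix(String uri) {
                return NS.equals(uri) ? "ss" : null;
            }

            @Override
            public Iterator<String> getPrefixes(String uri) {
                return NS.equals(uri)
                    ? Collections.singletonList("ss").iterator()
                    : Collections.<String>emptyIterator();
            }
        });
        return xp.evaluate("//ss:Row[2]/ss:Cell[2]/ss:Data[1]", doc);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        System.out.println(viaLocalName(doc)); // B2
        System.out.println(viaPrefix(doc));    // B2
    }
}
```

The prefix in the expression does not have to match anything in the file; only the URI it is bound to matters, which is why the article's program can use "Workbook" as its prefix.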

Nothing illustrates the problem better than code:

package XPath;

import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.XPath;
import org.dom4j.io.SAXReader;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/**
 * Parses an XML file with DOM4J or DOM, combined with XPath.
 */
public class TestDom4jXpath {

    // Classpath location of the input file; XXX is the placeholder used in the article.
    private static final String RESOURCE = "XPath/XXX.xml";
    private static final String NS = "urn:schemas-microsoft-com:office:spreadsheet";

    public static void main(String[] args) {
        read1();
        read2();
        read3();
        read4(); // same approach as read3(), but with a different XPath expression
        read5();
    }

    /** Method 1: use local-name() and namespace-uri() in the XPath expression. */
    public static void read1() {
        try {
            long startTime = System.currentTimeMillis();
            SAXReader reader = new SAXReader();
            InputStream in = TestDom4jXpath.class.getClassLoader().getResourceAsStream(RESOURCE);
            Document doc = reader.read(in);
            /*
             * Fully qualified alternative:
             * String xpath = "//*[local-name()='Workbook' and namespace-uri()='urn:schemas-microsoft-com:office:spreadsheet']"
             *     + "/*[local-name()='Worksheet']"
             *     + "/*[local-name()='Table']"
             *     + "/*[local-name()='Row'][4]"
             *     + "/*[local-name()='Cell'][3]"
             *     + "/*[local-name()='Data'][1]";
             */
            String xpath = "//*[local-name()='Row'][4]/*[local-name()='Cell'][3]/*[local-name()='Data'][1]";
            System.err.println("===== use local-name() and namespace-uri() in XPath =====");
            System.err.println("XPath: " + xpath);
            List<?> list = doc.selectNodes(xpath);
            for (Object o : list) {
                Element e = (Element) o;
                System.out.println("show = " + e.getStringValue());
            }
            long endTime = System.currentTimeMillis();
            System.out.println("Program running time: " + (endTime - startTime) + " ms");
        } catch (DocumentException e) {
            e.printStackTrace();
        }
    }

    /** Method 2: set the namespace on the XPath object (setNamespaceURIs). */
    public static void read2() {
        try {
            long startTime = System.currentTimeMillis();
            Map<String, String> map = new HashMap<>();
            map.put("Workbook", NS); // prefix -> namespace URI
            SAXReader reader = new SAXReader();
            InputStream in = TestDom4jXpath.class.getClassLoader().getResourceAsStream(RESOURCE);
            Document doc = reader.read(in);
            String xpath = "//Workbook:Row[4]/Workbook:Cell[3]/Workbook:Data[1]";
            System.err.println("===== use setNamespaceURIs() to set the XPath namespace =====");
            System.err.println("XPath: " + xpath);
            XPath x = doc.createXPath(xpath);
            x.setNamespaceURIs(map);
            List<?> list = x.selectNodes(doc);
            for (Object o : list) {
                Element e = (Element) o;
                System.out.println("show = " + e.getStringValue());
            }
            long endTime = System.currentTimeMillis();
            System.out.println("Program running time: " + (endTime - startTime) + " ms");
        } catch (DocumentException e) {
            e.printStackTrace();
        }
    }

    /** Method 3: set the namespace on the DocumentFactory (setXPathNamespaceURIs). */
    public static void read3() {
        try {
            long startTime = System.currentTimeMillis();
            Map<String, String> map = new HashMap<>();
            map.put("Workbook", NS);
            SAXReader reader = new SAXReader();
            InputStream in = TestDom4jXpath.class.getClassLoader().getResourceAsStream(RESOURCE);
            reader.getDocumentFactory().setXPathNamespaceURIs(map);
            Document doc = reader.read(in);
            String xpath = "//Workbook:Row[4]/Workbook:Cell[3]/Workbook:Data[1]";
            System.err.println("===== use setXPathNamespaceURIs() to set the DocumentFactory namespace =====");
            System.err.println("XPath: " + xpath);
            List<?> list = doc.selectNodes(xpath);
            for (Object o : list) {
                Element e = (Element) o;
                System.out.println("show = " + e.getStringValue());
            }
            long endTime = System.currentTimeMillis();
            System.out.println("Program running time: " + (endTime - startTime) + " ms");
        } catch (DocumentException e) {
            e.printStackTrace();
        }
    }

    /** Method 4: same as read3(), but with a more complete XPath expression. */
    public static void read4() {
        try {
            long startTime = System.currentTimeMillis();
            Map<String, String> map = new HashMap<>();
            map.put("Workbook", NS);
            SAXReader reader = new SAXReader();
            InputStream in = TestDom4jXpath.class.getClassLoader().getResourceAsStream(RESOURCE);
            reader.getDocumentFactory().setXPathNamespaceURIs(map);
            Document doc = reader.read(in);
            String xpath = "//Workbook:Worksheet/Workbook:Table/Workbook:Row[4]/Workbook:Cell[3]/Workbook:Data[1]";
            System.err.println("===== use setXPathNamespaceURIs() to set the DocumentFactory namespace =====");
            System.err.println("XPath: " + xpath);
            List<?> list = doc.selectNodes(xpath);
            for (Object o : list) {
                Element e = (Element) o;
                System.out.println("show = " + e.getStringValue());
            }
            long endTime = System.currentTimeMillis();
            System.out.println("Program running time: " + (endTime - startTime) + " ms");
        } catch (DocumentException e) {
            e.printStackTrace();
        }
    }

    /** Method 5: DOM and XPath from the JDK, for a performance comparison. */
    public static void read5() {
        try {
            long startTime = System.currentTimeMillis();
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(false); // parse without namespace support, so unprefixed names are used
            DocumentBuilder builder = dbf.newDocumentBuilder();
            InputStream in = TestDom4jXpath.class.getClassLoader().getResourceAsStream(RESOURCE);
            org.w3c.dom.Document doc = builder.parse(in);
            XPathFactory factory = XPathFactory.newInstance();
            javax.xml.xpath.XPath x = factory.newXPath();
            String xpath = "//Workbook/Worksheet/Table/Row[4]/Cell[3]/Data[1]";
            System.err.println("===== DOM XPath =====");
            System.err.println("XPath: " + xpath);
            XPathExpression expr = x.compile(xpath);
            // NODESET rather than NODE, because the result is cast to NodeList.
            NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
            // The source listing breaks off inside this loop header. The rest of the
            // method is a reconstruction that follows the pattern of read1() to read4()
            // and the exception types imported above.
            for (int i = 0; i < nodes.getLength(); i++) {
                System.out.println("show = " + nodes.item(i).getTextContent());
            }
            long endTime = System.currentTimeMillis();
            System.out.println("Program running time: " + (endTime - startTime) + " ms");
        } catch (ParserConfigurationException | SAXException | IOException | XPathExpressionException e) {
            e.printStackTrace();
        }
    }
}


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
