Parsing XML in Java with XPath and DOM4J

Source: Internet
Author: User
XPath is a language for locating information in XML documents. This article describes how to parse XML with XPath and DOM4J in Java, starting with a brief look at the four classic methods for parsing XML files.

There are four classic methods for parsing XML files. Two of them are basic: SAX and DOM. SAX parses the document as a stream of events, while DOM parses it into a document tree. JDOM emerged on top of these to reduce the amount of code that DOM and SAX require. Its advantage is that it follows the 20-80 principle (the Pareto rule), which greatly reduces the amount of code. In general, JDOM is sufficient when only simple functions such as parsing and creation are needed. Under the hood, however, JDOM still relies on SAX (most commonly), DOM, and Xalan. DOM4J is a very good Java XML API with excellent performance, powerful functionality, and great ease of use. It is also open source software. More and more Java software now uses DOM4J to read and write XML, and it is particularly worth mentioning that Sun's JAXM also uses DOM4J. Detailed usage of each of the four methods is easy to find online (for example, via Baidu).
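To make the difference between event-based and tree-based parsing concrete, here is a minimal sketch that uses only the JDK's built-in SAX and DOM parsers. The sample document and the class and method names (SaxVsDomDemo, elementNamesWithSax, bookTitlesWithDom) are assumptions made up for this illustration and are not part of the original article.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

final class SaxVsDomDemo {
    // Hypothetical sample document, used only for this illustration.
    static final String XML = "<books><book>Java</book><book>XML</book></books>";

    private static ByteArrayInputStream input() {
        return new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8));
    }

    // SAX: the parser pushes events to a handler; no tree is kept in memory.
    static List<String> elementNamesWithSax() throws Exception {
        final List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                names.add(qName); // record each start tag as it streams past
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(input(), handler);
        return names;
    }

    // DOM: the whole document is loaded into a tree that can be queried afterwards.
    static List<String> bookTitlesWithDom() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(input());
        NodeList books = doc.getElementsByTagName("book");
        List<String> titles = new ArrayList<>();
        for (int i = 0; i < books.getLength(); i++) {
            titles.add(books.item(i).getTextContent());
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(elementNamesWithSax());
        System.out.println(bookTitlesWithDom());
    }
}
```

The SAX handler sees the elements only once, in document order, whereas the DOM tree can be revisited as often as needed at the cost of holding the whole document in memory.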

2. Brief introduction to XPath

XPath is a language for locating information in XML documents. It is used to navigate and traverse the elements and attributes of an XML document. XPath is a major element of the W3C XSLT standard, and both XQuery and XPointer are built on XPath expressions, so understanding XPath is the basis of many advanced XML applications. XPath plays a role similar to SQL for databases or to jQuery selectors: it lets developers easily pick out exactly what they need from a document. DOM4J also supports XPath.
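As a small illustration of what an XPath expression looks like in practice, the sketch below evaluates a few expressions with the JDK's javax.xml.xpath API rather than DOM4J. The sample document and the names XPathBasicsDemo, evaluateString, and countNodes are hypothetical and chosen only for this example.

```java
import java.io.StringReader;

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathBasicsDemo {
    // Hypothetical sample document, used only for this illustration.
    static final String XML =
            "<library>"
          + "<book id=\"1\"><title>Java</title></book>"
          + "<book id=\"2\"><title>XML</title></book>"
          + "</library>";

    // Evaluates the expression and returns the string value of the first match.
    static String evaluateString(String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, new InputSource(new StringReader(XML)));
    }

    // Evaluates the expression as a node set and returns how many nodes matched.
    static int countNodes(String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(evaluateString("/library/book[2]/title"));  // positional step
        System.out.println(evaluateString("//book[@id='1']/title"));   // attribute predicate
        System.out.println(countNodes("//book"));                      // all book elements
    }
}
```

Note that XPath positions are 1-based, so book[2] refers to the second book element.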

3. Using XPath in DOM4J

To parse XML documents with XPath in DOM4J, you must first reference two JAR packages in the project:

dom4j-1.6.1.jar: the DOM4J package, http://sourceforge.net/projects/dom4j/;

jaxen-xx.xx.jar: this package is often forgotten, which causes an exception (java.lang.NoClassDefFoundError: org/jaxen/JaxenException), http://www.jaxen.org/releases.html.
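One way to catch a missing Jaxen JAR early, instead of hitting the NoClassDefFoundError on the first XPath call, is to probe the classpath at startup. The following is only a sketch: the class name JaxenClasspathCheck and the helper isOnClasspath are made up for this example, and the only library-specific detail it relies on is the class name org.jaxen.JaxenException taken from the error message above.

```java
final class JaxenClasspathCheck {
    // Returns true if the named class can be found by this class's class loader.
    static boolean isOnClasspath(String className) {
        try {
            // 'false' avoids running static initializers; we only care about presence.
            Class.forName(className, false, JaxenClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (!isOnClasspath("org.jaxen.JaxenException")) {
            System.err.println("Jaxen is missing from the classpath; DOM4J XPath calls will fail.");
        }
    }
}
```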

3.1 Namespace interference

When processing an XML file that was converted from an Excel file or another format, XPath queries often return no result. This is usually caused by a namespace. Take the following XML file as an example: a simple search with the XPath expression "//Workbook/Worksheet/Table/Row[1]/Cell[1]/Data[1]" usually returns nothing, because of the default namespace declaration (xmlns="urn:schemas-microsoft-com:office:spreadsheet").

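    <Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet">
      <Worksheet>
        <Table>
          <Row>
            <Cell>
              <Data>Code knock</Data>
            </Cell>
          </Row>
          <Row>
            <Cell>
              <Data>Sunny</Data>
            </Cell>
          </Row>
        </Table>
      </Worksheet>
    </Workbook>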
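The effect can be reproduced with a minimal sketch. It uses the JDK's namespace-aware DOM parser and javax.xml.xpath rather than DOM4J, so that it runs without extra JARs, and it assumes a cut-down sample document built from the element names in the XPath expression above. The class and method names (NamespaceInterferenceDemo, countMatches) are hypothetical.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class NamespaceInterferenceDemo {
    // Assumed minimal sample: every element inherits the default namespace.
    static final String XML =
            "<Workbook xmlns=\"urn:schemas-microsoft-com:office:spreadsheet\">"
          + "<Worksheet><Table>"
          + "<Row><Cell><Data>Code knock</Data></Cell></Row>"
          + "<Row><Cell><Data>Sunny</Data></Cell></Row>"
          + "</Table></Worksheet></Workbook>";

    // Parses the sample with namespace awareness and counts the nodes an expression selects.
    static int countMatches(String expression) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        // Unprefixed names only match elements in no namespace, so nothing is found.
        System.out.println(countMatches("//Workbook/Worksheet/Table/Row[1]/Cell[1]/Data[1]"));
        // Matching on the local name ignores the namespace and finds the cell.
        System.out.println(countMatches(
                "//*[local-name()='Row'][1]/*[local-name()='Cell'][1]/*[local-name()='Data'][1]"));
    }
}
```

In XPath 1.0 an unprefixed element name refers to an element in no namespace, which is why the first expression selects nothing even though the element names look identical.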

3.2 Parsing XML files with namespaces using XPath

Method 1 (read1() function): use local-name() and namespace-uri() in the XPath syntax to specify the node name and namespace you want to match. The resulting XPath expressions are cumbersome to write.

Method 2 (read2() function): set the namespace on the XPath object with the setNamespaceURIs() function.

Method 3 (read3() function): set the namespace on the DocumentFactory with the setXPathNamespaceURIs() function. With methods 2 and 3, the XPath expressions are relatively simple to write.

Method 4 (read4() function): the same approach as method 3, but with a different XPath expression (see the program). Its main purpose is to check whether the completeness of the XPath expression, that is, how much of the path is spelled out, affects retrieval efficiency.

(The four methods above all use DOM4J and XPath to parse XML files.)

Method 5 (read5() function): parse the XML file using DOM and XPath, mainly to compare performance.
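The prefix-to-URI mapping used by methods 2 and 3 has a counterpart in the JDK's own XPath API, which may be useful when combining DOM and XPath on a namespace-aware document. The sketch below registers the same prefix ("Workbook") through a javax.xml.namespace.NamespaceContext. It is not the article's method 5, which turns namespace awareness off instead. The sample document and the names NamespaceContextDemo and select are assumptions for this example.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;

import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class NamespaceContextDemo {
    static final String NS = "urn:schemas-microsoft-com:office:spreadsheet";
    static final String PREFIX = "Workbook"; // same prefix the article maps to the URI

    // Assumed minimal sample document in the spreadsheet namespace.
    static final String XML =
            "<Workbook xmlns=\"" + NS + "\"><Worksheet><Table>"
          + "<Row><Cell><Data>Code knock</Data></Cell></Row>"
          + "<Row><Cell><Data>Sunny</Data></Cell></Row>"
          + "</Table></Worksheet></Workbook>";

    // Evaluates a prefixed XPath expression and returns the string value of the first match.
    static String select(String expression) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                return PREFIX.equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
            }

            @Override
            public String getPrefix(String namespaceURI) {
                return NS.equals(namespaceURI) ? PREFIX : null;
            }

            @Override
            public Iterator<String> getPrefixes(String namespaceURI) {
                return NS.equals(namespaceURI)
                        ? Collections.singletonList(PREFIX).iterator()
                        : Collections.<String>emptyIterator();
            }
        });
        return xpath.evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(select("//Workbook:Row[2]/Workbook:Cell[1]/Workbook:Data[1]"));
    }
}
```

The prefix is only a label inside the XPath expression; what matters is that it resolves to the same namespace URI that the document declares.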

Nothing illustrates the problem better than code, so here it is:

    package XPath;

    import java.io.IOException;
    import java.io.InputStream;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.parsers.ParserConfigurationException;
    import javax.xml.xpath.XPathConstants;
    import javax.xml.xpath.XPathExpression;
    import javax.xml.xpath.XPathExpressionException;
    import javax.xml.xpath.XPathFactory;

    import org.dom4j.Document;
    import org.dom4j.DocumentException;
    import org.dom4j.Element;
    import org.dom4j.XPath;
    import org.dom4j.io.SAXReader;
    import org.w3c.dom.NodeList;
    import org.xml.sax.SAXException;

    /**
     * Parses XML with XPath, using DOM4J (read1 to read4) and DOM (read5).
     */
    public class TestDom4jXpath {

        public static void main(String[] args) {
            read1();
            read2();
            read3();
            read4(); // same approach as read3(), but with a different XPath expression
            read5();
        }

        public static void read1() {
            /* Use local-name() and namespace-uri() in XPath */
            try {
                long startTime = System.currentTimeMillis();
                SAXReader reader = new SAXReader();
                InputStream in = TestDom4jXpath.class.getClassLoader().getResourceAsStream("XPath/XXX.xml");
                Document doc = reader.read(in);
                // Fully qualified alternative:
                // String xpath = "//*[local-name()='Workbook' and namespace-uri()='urn:schemas-microsoft-com:office:spreadsheet']"
                //         + "/*[local-name()='Worksheet']"
                //         + "/*[local-name()='Table']"
                //         + "/*[local-name()='Row'][4]"
                //         + "/*[local-name()='Cell'][3]"
                //         + "/*[local-name()='Data'][1]";
                String xpath = "//*[local-name()='Row'][4]/*[local-name()='Cell'][3]/*[local-name()='Data'][1]";
                System.err.println("===== use local-name() and namespace-uri() in XPath =====");
                System.err.println("XPath: " + xpath);
                List<?> list = doc.selectNodes(xpath);
                for (Object o : list) {
                    Element e = (Element) o;
                    String show = e.getStringValue();
                    System.out.println("show = " + show);
                }
                long endTime = System.currentTimeMillis();
                System.out.println("Program running time: " + (endTime - startTime) + "ms");
            } catch (DocumentException e) {
                e.printStackTrace();
            }
        }

        public static void read2() {
            /* Set the namespace on the XPath object (setNamespaceURIs) */
            try {
                long startTime = System.currentTimeMillis();
                Map<String, String> map = new HashMap<String, String>();
                map.put("Workbook", "urn:schemas-microsoft-com:office:spreadsheet");
                SAXReader reader = new SAXReader();
                InputStream in = TestDom4jXpath.class.getClassLoader().getResourceAsStream("XPath/XXX.xml");
                Document doc = reader.read(in);
                String xpath = "//Workbook:Row[4]/Workbook:Cell[3]/Workbook:Data[1]";
                System.err.println("===== use setNamespaceURIs() to set the XPath namespace =====");
                System.err.println("XPath: " + xpath);
                XPath x = doc.createXPath(xpath);
                x.setNamespaceURIs(map);
                List<?> list = x.selectNodes(doc);
                for (Object o : list) {
                    Element e = (Element) o;
                    String show = e.getStringValue();
                    System.out.println("show = " + show);
                }
                long endTime = System.currentTimeMillis();
                System.out.println("Program running time: " + (endTime - startTime) + "ms");
            } catch (DocumentException e) {
                e.printStackTrace();
            }
        }

        public static void read3() {
            /* Set the namespace on the DocumentFactory (setXPathNamespaceURIs) */
            try {
                long startTime = System.currentTimeMillis();
                Map<String, String> map = new HashMap<String, String>();
                map.put("Workbook", "urn:schemas-microsoft-com:office:spreadsheet");
                SAXReader reader = new SAXReader();
                InputStream in = TestDom4jXpath.class.getClassLoader().getResourceAsStream("XPath/XXX.xml");
                reader.getDocumentFactory().setXPathNamespaceURIs(map);
                Document doc = reader.read(in);
                String xpath = "//Workbook:Row[4]/Workbook:Cell[3]/Workbook:Data[1]";
                System.err.println("===== use setXPathNamespaceURIs() to set the DocumentFactory namespace =====");
                System.err.println("XPath: " + xpath);
                List<?> list = doc.selectNodes(xpath);
                for (Object o : list) {
                    Element e = (Element) o;
                    String show = e.getStringValue();
                    System.out.println("show = " + show);
                }
                long endTime = System.currentTimeMillis();
                System.out.println("Program running time: " + (endTime - startTime) + "ms");
            } catch (DocumentException e) {
                e.printStackTrace();
            }
        }

        public static void read4() {
            /* Same as read3(), but with a more complete XPath expression */
            try {
                long startTime = System.currentTimeMillis();
                Map<String, String> map = new HashMap<String, String>();
                map.put("Workbook", "urn:schemas-microsoft-com:office:spreadsheet");
                SAXReader reader = new SAXReader();
                InputStream in = TestDom4jXpath.class.getClassLoader().getResourceAsStream("XPath/XXX.xml");
                reader.getDocumentFactory().setXPathNamespaceURIs(map);
                Document doc = reader.read(in);
                String xpath = "//Workbook:Worksheet/Workbook:Table/Workbook:Row[4]/Workbook:Cell[3]/Workbook:Data[1]";
                System.err.println("===== use setXPathNamespaceURIs() to set the DocumentFactory namespace =====");
                System.err.println("XPath: " + xpath);
                List<?> list = doc.selectNodes(xpath);
                for (Object o : list) {
                    Element e = (Element) o;
                    String show = e.getStringValue();
                    System.out.println("show = " + show);
                }
                long endTime = System.currentTimeMillis();
                System.out.println("Program running time: " + (endTime - startTime) + "ms");
            } catch (DocumentException e) {
                e.printStackTrace();
            }
        }

        public static void read5() {
            /* DOM and XPath */
            try {
                long startTime = System.currentTimeMillis();
                DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
                // With namespace awareness off, unprefixed element names in the XPath match.
                dbf.setNamespaceAware(false);
                DocumentBuilder builder = dbf.newDocumentBuilder();
                InputStream in = TestDom4jXpath.class.getClassLoader().getResourceAsStream("XPath/XXX.xml");
                org.w3c.dom.Document doc = builder.parse(in);
                XPathFactory factory = XPathFactory.newInstance();
                javax.xml.xpath.XPath x = factory.newXPath();
                String xpath = "//Workbook/Worksheet/Table/Row[4]/Cell[3]/Data[1]";
                System.err.println("===== DOM XPath =====");
                System.err.println("XPath: " + xpath);
                XPathExpression expr = x.compile(xpath);
                // NODESET is required when the result is cast to NodeList.
                NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
                for (int i = 0; i < nodes.getLength(); i++) {
                    System.out.println("show = " + nodes.item(i).getTextContent());
                }
                long endTime = System.currentTimeMillis();
                System.out.println("Program running time: " + (endTime - startTime) + "ms");
            } catch (ParserConfigurationException | SAXException | IOException | XPathExpressionException e) {
                e.printStackTrace();
            }
        }
    }
