Parsing XML in Java with XPath and dom4j

Source: Internet
Author: User
Tags xpath

1 Four ways to parse XML files

There are four classic ways to parse XML files in Java. Two of them are the basic approaches: SAX and DOM. SAX parses the document as a stream of events, while DOM parses it into a tree structure held in memory. JDOM was built on top of these to reduce the amount of code that DOM and SAX require; it follows the 80/20 rule (the Pareto principle) and greatly cuts down the code needed for common tasks. JDOM is generally used for simple requirements such as parsing and creating documents, but underneath it still relies on SAX (most commonly), DOM, or Xalan documents. The fourth is dom4j, an open-source Java XML API with good performance, powerful features, and great ease of use. More and more Java software uses dom4j to read and write XML; notably, even Sun's JAXM uses dom4j. Detailed introductions to all four approaches are easy to find with a web search.
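To make the difference between the two basic approaches concrete, here is a minimal sketch that counts elements once with SAX and once with DOM. It uses only the JDK's built-in parsers; the class name, helper methods, and sample document are hypothetical and exist only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

final class SaxVsDomSketch {
    // Hypothetical sample document, used only for this illustration.
    static final String XML = "<books><book>SAX</book><book>DOM</book></books>";

    /** SAX: the parser pushes events to a handler; no tree is kept in memory. */
    static int countWithSax(String xml, String elementName) {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals(elementName)) {
                    count[0]++;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return count[0];
    }

    /** DOM: the whole document is loaded into a tree that can then be queried. */
    static int countWithDom(String xml, String elementName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getElementsByTagName(elementName).getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("SAX count: " + countWithSax(XML, "book"));
        System.out.println("DOM count: " + countWithDom(XML, "book"));
    }
}
```

The SAX version never holds the document in memory, which suits large files; the DOM version keeps the full tree, which is what makes random access and XPath queries possible.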

2 A brief introduction to XPath

XPath is a language for finding information in an XML document. It is used to navigate through, and iterate over, the elements and attributes of a document. XPath is a major element of the W3C XSLT standard, and XQuery and XPointer are both built on XPath expressions, so understanding XPath is the foundation for many advanced XML applications. In spirit XPath resembles SQL for databases or jQuery selectors: it makes it easier for developers to pick out what they need from a document. dom4j also supports XPath.
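As a small illustration of navigating elements and attributes, the sketch below evaluates two XPath expressions with the JDK's javax.xml.xpath API (not dom4j). The class name, the evaluate helper, and the sample document are assumptions made up for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class XPathBasicsSketch {
    // Hypothetical sample document, used only for this illustration.
    static final String XML =
            "<library>"
          + "<book id=\"b1\"><title>First</title></book>"
          + "<book id=\"b2\"><title>Second</title></book>"
          + "</library>";

    /** Parses the XML and returns the string value of the XPath expression. */
    static String evaluate(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Element navigation: the title of the second book.
        System.out.println(evaluate(XML, "/library/book[2]/title"));
        // Attribute navigation with a predicate: the id of the book titled "First".
        System.out.println(evaluate(XML, "//book[title='First']/@id"));
    }
}
```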

3 Using XPath with dom4j

To parse an XML document with XPath in dom4j, you first need to add two JAR files to your project:

dom4j-1.6.1.jar: the dom4j package itself; download from http://sourceforge.net/projects/dom4j/;

jaxen-xx.xx.jar: if this JAR is missing, an exception is usually thrown (java.lang.NoClassDefFoundError: org/jaxen/JaxenException); download from http://www.jaxen.org/releases.html.
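One way to catch a missing Jaxen JAR early, instead of waiting for the NoClassDefFoundError at the first XPath call, is to probe the classpath at startup. The following is only a sketch: the JaxenCheck class and its isOnClasspath helper are hypothetical, and the only name taken from the text above is org.jaxen.JaxenException.

```java
final class JaxenCheck {
    /** Returns true if the named class can be located without initializing it. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, JaxenCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the exception message quoted above.
        if (isOnClasspath("org.jaxen.JaxenException")) {
            System.out.println("Jaxen found: XPath support should be available.");
        } else {
            System.err.println("Jaxen not found: add the jaxen JAR before using XPath.");
        }
    }
}
```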

3.1 Interference from namespaces

When you work with an XML file that was converted from an Excel file or another format, XPath queries often return nothing. This is usually caused by namespaces. For example, a simple query against the following XML file with xpath="//Workbook/Worksheet/Table/Row[1]/Cell[1]/Data[1]" typically returns no results. The cause is the default namespace declared on the root element (xmlns="urn:schemas-microsoft-com:office:spreadsheet"): in XPath 1.0, an unprefixed name matches only elements that are in no namespace.

The code is as follows:

<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" xmlns:html="http://www.w3.org/TR/REC-html40">
  <Worksheet ss:Name="Sheet1">
    <Table ss:ExpandedColumnCount="81" ss:ExpandedRowCount="687" x:FullColumns="1" x:FullRows="1" ss:DefaultColumnWidth="52.5" ss:DefaultRowHeight="15.5625">
      <Row ss:AutoFitHeight="0">
        <Cell>
          <Data ss:Type="String">Knock Code Mouse</Data>
        </Cell>
      </Row>
      <Row ss:AutoFitHeight="0">
        <Cell>
          <Data ss:Type="String">Sunny</Data>
        </Cell>
      </Row>
    </Table>
  </Worksheet>
</Workbook>
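The effect can be reproduced without dom4j. The sketch below uses the JDK's namespace-aware DOM parser and javax.xml.xpath on a trimmed-down copy of the document; the class name and the countMatches helper are hypothetical. The unprefixed path selects nothing, even though the elements are clearly present.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class DefaultNamespacePitfall {
    // Trimmed-down version of the spreadsheet document, for illustration only.
    static final String XML =
            "<Workbook xmlns=\"urn:schemas-microsoft-com:office:spreadsheet\">"
          + "<Worksheet><Table><Row><Cell><Data>Sunny</Data></Cell></Row></Table></Worksheet>"
          + "</Workbook>";

    /** Counts the nodes an XPath expression selects in a namespace-aware DOM. */
    static int countMatches(String xml, String expression) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // keep the namespace information
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Unprefixed names only match elements in no namespace, so nothing is selected.
        System.out.println(countMatches(XML, "//Workbook/Worksheet/Table/Row[1]/Cell[1]/Data[1]"));
        // The wildcard ignores names, which shows that the elements do exist.
        System.out.println(countMatches(XML, "//*"));
    }
}
```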

3.2 XPath parsing of XML files with namespaces

The first method (the read1() function) uses the local-name() and namespace-uri() functions built into XPath to specify the node name and namespace you want. XPath expressions written this way are rather cumbersome.

The second method (the read2() function) sets the namespace on the XPath object with the setNamespaceURIs() function.

The third method (the read3() function) sets the namespace on the DocumentFactory with the setXPathNamespaceURIs() function. The XPath expressions for the second and third methods are comparatively simple to write.

The fourth method (the read4() function) is the same as the third, but with a different XPath expression (shown in the program). Its purpose is to check whether the form of the XPath expression, mainly how complete the path is, affects lookup efficiency.

(All four of the methods above parse the XML file with dom4j combined with XPath.)

The fifth method (the read5() function) parses the XML file with DOM combined with XPath, mainly to check for performance differences.
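The idea behind the first two methods can also be sketched with the JDK alone, which may help when dom4j is not on hand. This is not the dom4j API: byLocalName mirrors the local-name() approach, and byPrefix binds a prefix through javax.xml.namespace.NamespaceContext, which plays a role comparable to setNamespaceURIs(). The class name, the helper names, and the wb prefix are assumptions chosen for this example.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class NamespaceFixesSketch {
    static final String NS = "urn:schemas-microsoft-com:office:spreadsheet";
    // Trimmed-down version of the spreadsheet document, for illustration only.
    static final String XML = "<Workbook xmlns=\"" + NS + "\"><Worksheet><Table><Row><Cell>"
            + "<Data>Sunny</Data></Cell></Row></Table></Worksheet></Workbook>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Approach 1: match on local-name(), so no prefix binding is needed. */
    static String byLocalName(Document doc) {
        String expr = "//*[local-name()='Row'][1]/*[local-name()='Cell'][1]/*[local-name()='Data'][1]";
        return evaluate(XPathFactory.newInstance().newXPath(), expr, doc);
    }

    /** Approach 2: bind the (arbitrary) prefix "wb" to the namespace URI. */
    static String byPrefix(Document doc) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                return "wb".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
            }

            @Override
            public String getPrefix(String uri) {
                return NS.equals(uri) ? "wb" : null;
            }

            @Override
            public Iterator<String> getPrefixes(String uri) {
                String prefix = getPrefix(uri);
                return prefix == null
                        ? Collections.<String>emptyIterator()
                        : Collections.singleton(prefix).iterator();
            }
        });
        return evaluate(xpath, "//wb:Row[1]/wb:Cell[1]/wb:Data[1]", doc);
    }

    private static String evaluate(XPath xpath, String expr, Document doc) {
        try {
            return xpath.evaluate(expr, doc);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        System.out.println("local-name(): " + byLocalName(doc));
        System.out.println("prefix:       " + byPrefix(doc));
    }
}
```

The prefix used in the expression does not have to match anything in the document; only the URI it is bound to matters.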

Nothing explains the problem better than code, so here it is.

The code is as follows:

package xpath;

import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.XPath;
import org.dom4j.io.SAXReader;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/**
 * dom4j, DOM, XML, XPath
 */
public class TestDom4jXpath {
    public static void main(String[] args) {
        read1();
        read2();
        read3();
        read4(); // same as read3(), but with a different XPath expression
        read5();
    }

    public static void read1() {
        /*
         * Use local-name() and namespace-uri() in XPath
         */
        try {
            long startTime = System.currentTimeMillis();
            SAXReader reader = new SAXReader();
            // Classpath resource names use '/' as the separator
            InputStream in = TestDom4jXpath.class.getClassLoader().getResourceAsStream("xpath/xxx.xml");
            Document doc = reader.read(in);
            // Longer alternative that spells out the full path and the namespace:
            // String xpath = "//*[local-name()='Workbook' and namespace-uri()='urn:schemas-microsoft-com:office:spreadsheet']"
            //         + "/*[local-name()='Worksheet']"
            //         + "/*[local-name()='Table']"
            //         + "/*[local-name()='Row'][4]"
            //         + "/*[local-name()='Cell'][3]"
            //         + "/*[local-name()='Data'][1]";
            String xpath = "//*[local-name()='Row'][4]/*[local-name()='Cell'][3]/*[local-name()='Data'][1]";
            System.err.println("=====use local-name() and namespace-uri() in XPath====");
            System.err.println("XPath: " + xpath);
            @SuppressWarnings("unchecked")
            List<Element> list = doc.selectNodes(xpath);
            for (Object o : list) {
                Element e = (Element) o;
                String show = e.getStringValue();
                System.out.println("show = " + show);
                long endTime = System.currentTimeMillis();
                System.out.println("Program running time: " + (endTime - startTime) + " ms");
            }
        } catch (DocumentException e) {
            e.printStackTrace();
        }
    }

    public static void read2() {
        /*
         * Set the XPath namespace (setNamespaceURIs)
         */
        try {
            long startTime = System.currentTimeMillis();
            // Map the prefix used in the XPath expression to the namespace URI
            Map<String, String> map = new HashMap<String, String>();
            map.put("Workbook", "urn:schemas-microsoft-com:office:spreadsheet");
            SAXReader reader = new SAXReader();
            // Classpath resource names use '/' as the separator
            InputStream in = TestDom4jXpath.class.getClassLoader().getResourceAsStream("xpath/xxx.xml");
            Document doc = reader.read(in);
            String xpath = "//Workbook:Row[4]/Workbook:Cell[3]/Workbook:Data[1]";
            System.err.println("=====use setNamespaceURIs() to set XPath namespace====");
            System.err.println("XPath: " + xpath);
            XPath x = doc.createXPath(xpath);
            x.setNamespaceURIs(map);
            @SuppressWarnings("unchecked")
            List<Element> list = x.selectNodes(doc);
            for (Object o : list) {
                Element e = (Element) o;
                String show = e.getStringValue();
                System.out.println("show = " + show);
                long endTime = System.currentTimeMillis();
                System.out.println("Program running time: " + (endTime - startTime) + " ms");
            }
        } catch (DocumentException e) {
            e.printStackTrace();
        }
    }

    public static void read3() {
        /*
         * Set the DocumentFactory namespace (setXPathNamespaceURIs)
         */
        try {
            long startTime = System.currentTimeMillis();
            // Map the prefix used in the XPath expression to the namespace URI
            Map<String, String> map = new HashMap<String, String>();
            map.put("Workbook", "urn:schemas-microsoft-com:office:spreadsheet");
            SAXReader reader = new SAXReader();
            // Classpath resource names use '/' as the separator
            InputStream in = TestDom4jXpath.class.getClassLoader().getResourceAsStream("xpath/xxx.xml");
            // Register the mapping on the factory before reading the document
            reader.getDocumentFactory().setXPathNamespaceURIs(map);
            Document doc = reader.read(in);
            String xpath = "//Workbook:Row[4]/Workbook:Cell[3]/Workbook:Data[1]";
            System.err.println("=====use setXPathNamespaceURIs() to set DocumentFactory namespace====");
            System.err.println("XPath: " + xpath);
            @SuppressWarnings("unchecked")
            List<Element> list = doc.selectNodes(xpath);
            for (Object o : list) {
                Element e = (Element) o;
                String show = e.getStringValue();
                System.out.println("show = " + show);
                long endTime = System.currentTimeMillis();
                System.out.println("Program running time: " + (endTime - startTime) + " ms");
            }
        } catch (DocumentException e) {
            e.printStackTrace();
        }
    }

    public static void read4() {
        /*
         * Same as the read3() method, but the XPath expression is different
         */
        try {
            long startTime = System.currentTimeMillis();
            // Map the prefix used in the XPath expression to the namespace URI
            Map<String, String> map = new HashMap<String, String>();
            map.put("Workbook", "urn:schemas-microsoft-com:office:spreadsheet");
            SAXReader reader = new SAXReader();
            // Classpath resource names use '/' as the separator
            InputStream in = TestDom4jXpath.class.getClassLoader().getResourceAsStream("xpath/xxx.xml");
            reader.getDocumentFactory().setXPathNamespaceURIs(map);
            Document doc = reader.read(in);
            // More complete path than in read3(): starts from Worksheet instead of Row
            String xpath = "//Workbook:Worksheet/Workbook:Table/Workbook:Row[4]/Workbook:Cell[3]/Workbook:Data[1]";
            System.err.println("=====use setXPathNamespaceURIs() to set DocumentFactory namespace====");
            System.err.println("XPath: " + xpath);
            @SuppressWarnings("unchecked")
            List<Element> list = doc.selectNodes(xpath);
            for (Object o : list) {
                Element e = (Element) o;
                String show = e.getStringValue();
                System.out.println("show = " + show);
                long endTime = System.currentTimeMillis();
                System.out.println("Program running time: " + (endTime - startTime) + " ms");
            }
        } catch (DocumentException e) {
            e.printStackTrace();
        }
    }

    public static void read5() {
        /*
         * DOM and XPath
         */
        try {
            long startTime = System.currentTimeMillis();
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            // Namespace processing is switched off, so unprefixed names are used in the XPath
            dbf.setNamespaceAware(false);
            DocumentBuilder builder = dbf.newDocumentBuilder();
            // Classpath resource names use '/' as the separator
            InputStream in = TestDom4jXpath.class.getClassLoader().getResourceAsStream("xpath/xxx.xml");
            org.w3c.dom.Document doc = builder.parse(in);
            XPathFactory factory = XPathFactory.newInstance();
            javax.xml.xpath.XPath x = factory.newXPath();
            // Select the first Data element in the third Cell of the fourth Row
            String xpath = "//Workbook/Worksheet/Table/Row[4]/Cell[3]/Data[1]";
            System.err.println("=====DOM XPath====");
            System.err.println("XPath: " + xpath);
            XPathExpression expr = x.compile(xpath);
            // NODESET (not NODE) is required when the result is cast to NodeList
            NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
            for (int i = 0; i < nodes.getLength(); i++) {
                // getTextContent() returns the element's text; getNodeValue() is null for elements
                System.out.println("show = " + nodes.item(i).getTextContent());
                long endTime = System.currentTimeMillis();
                System.out.println("Program running time: " + (endTime - startTime) + " ms");
            }
        } catch (XPathExpressionException e) {
            e.printStackTrace();
        } catch (ParserConfigurationException e) {
            e.printStackTrace();
        } catch (SAXException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
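The question behind read3() versus read4(), whether a more complete path changes lookup time, can be explored in isolation with a generated document. The sketch below uses the JDK's XPath API without namespaces rather than dom4j, and the class, the buildXml helper, and the generated data are all hypothetical. Single-run timings like these are noisy (JIT warm-up, caching), so treat the printed numbers as a rough indication at best.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class XPathTimingSketch {
    /** Builds a hypothetical workbook with the given number of rows, three cells each. */
    static String buildXml(int rows) {
        StringBuilder sb = new StringBuilder("<Workbook><Worksheet><Table>");
        for (int r = 1; r <= rows; r++) {
            sb.append("<Row>");
            for (int c = 1; c <= 3; c++) {
                sb.append("<Cell><Data>r").append(r).append('c').append(c).append("</Data></Cell>");
            }
            sb.append("</Row>");
        }
        return sb.append("</Table></Worksheet></Workbook>").toString();
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns the string value selected by the expression. */
    static String evaluate(Document doc, String expression) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(buildXml(1000));
        String[] expressions = {
            "//Row[4]/Cell[3]/Data[1]",                        // short form, searches all descendants
            "/Workbook/Worksheet/Table/Row[4]/Cell[3]/Data[1]" // complete path from the root
        };
        for (String expression : expressions) {
            long start = System.nanoTime();
            String value = evaluate(doc, expression);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println(expression + " -> " + value + " (" + micros + " us)");
        }
    }
}
```

Both expressions select the same node; only the amount of the tree the engine has to consider differs. For a trustworthy comparison, repeat each evaluation many times and discard the first runs.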
