Parsing and generating XML documents using DOM4J in Java

Source: Internet
Author: User

I. Preface

DOM4J is an excellent open-source Java API for reading and writing XML documents. It performs well, is powerful, and is very convenient to use. XML is also widely used as a data exchange format, for example when passing parameters to a web service or synchronizing data, so knowing how to parse XML with DOM4J is well worth the effort.

II. Prerequisites

dom4j.jar

Download address: http://sourceforge.net/projects/dom4j/

III. Using dom4j in practice

1. Parsing XML documents

Approach:

<1> pass the path of the XML file to SAXReader, which reads it and returns a Document object;

<2> then work with the Document object to obtain the nodes beneath it and their child nodes;

The specific code is as follows:

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Iterator;
import java.util.List;

import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
import org.dom4j.io.SAXReader;

/**
 * Use DOM4J to parse XML documents
 * @author Administrator
 */
public class Dom4jParseXmlDemo {

    public void parseXml01() {
        try {
            // Convert the XML under src to an input stream, located relative to the class's compiled files
            InputStream inputStream = this.getClass().getResourceAsStream("/module01.xml");
            // The stream can also be opened from an absolute path:
            // InputStream inputStream = new FileInputStream(new File("D:/project/dynamicweb/src/resource/module01.xml"));

            // Create a SAXReader, which is dedicated to reading XML
            SAXReader saxReader = new SAXReader();

            // SAXReader's overloaded read method accepts either an InputStream or a File
            Document document = saxReader.read(inputStream);
            // Document document = saxReader.read(new File("D:/project/dynamicweb/src/resource/module01.xml")); // must specify the absolute path of the file

            // Alternatively, DocumentHelper can parse XML held in a String
            // Document document = DocumentHelper.parseText("<?xml version=\"1.0\" encoding=\"UTF-8\"?><modules id=\"123\"><module> This is the text information for the module label </module></modules>");

            // Get the root node object
            Element rootElement = document.getRootElement();
            System.out.println("Root node name: " + rootElement.getName()); // name of the node
            System.out.println("Number of attributes on the root node: " + rootElement.attributeCount()); // number of attributes
            System.out.println("Value of the root node's id attribute: " + rootElement.attributeValue("id")); // value of the id attribute
            // getText() returns only the text directly inside this element, not the text of its child elements.
            // The tabs and line breaks used for layout between tags are also text, so line breaks show up here.
            System.out.println("Text within the root node: " + rootElement.getText());
            // getTextTrim() trims leading and trailing whitespace and collapses internal runs of whitespace
            System.out.println("Text within the root node (1): " + rootElement.getTextTrim());
            // getStringValue() recursively returns the text of the current node and all of its descendants
            System.out.println("Text content of the root node: " + rootElement.getStringValue());

            // Get a child node
            Element element = rootElement.element("module");
            if (element != null) {
                // Child nodes and the root node are both Element objects, so they are handled the same way
                System.out.println("Text of child node: " + element.getText());
            }
            // Real-world XML can be complex and inconsistent. If a node does not exist, element() returns null
            // and using the result throws java.lang.NullPointerException, so check for null first.

            rootElement.setName("root"); // the node name can be modified
            System.out.println("Name of the root node after modification: " + rootElement.getName());
            rootElement.setText("text"); // the text inside the tag can be modified as well
            System.out.println("Text of the root node after modification: " + rootElement.getText());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        Dom4jParseXmlDemo demo = new Dom4jParseXmlDemo();
        demo.parseXml01();
    }
}

The XML file used above, module01.xml, is placed under src and reads as follows:

<?xml version="1.0" encoding="UTF-8"?>
<modules id="123">
  <module> This is the text information for the module label </module>
</modules>

Next, run the main method of the class and check the console output.

From this we can see that:

<1> DOM4J can read XML files in several ways;

<2> retrieving the text and tag name of an Element object is very simple;

<3> modifying the text and tag name of an element is also convenient, but the change is not written back to the XML file on disk.
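
Point <3> can be illustrated with a minimal sketch. It renames the root element in memory and then serializes the document explicitly with XMLWriter; until that write happens, nothing outside the Document object changes. The class name Dom4jWriteBackSketch and the helper renameRootAndSerialize are hypothetical names chosen for this illustration, and a StringWriter stands in for the FileWriter you would use to persist the result to disk.

```java
import java.io.IOException;
import java.io.StringWriter;

import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
import org.dom4j.io.XMLWriter;

public class Dom4jWriteBackSketch {

    // Hypothetical helper: renames the root element and returns the serialized XML.
    static String renameRootAndSerialize(String xml, String newName)
            throws DocumentException, IOException {
        Document document = DocumentHelper.parseText(xml);
        Element root = document.getRootElement();
        root.setName(newName); // changes the in-memory tree only

        // Serialize explicitly; replace the StringWriter with a FileWriter to write to disk.
        StringWriter out = new StringWriter();
        XMLWriter writer = new XMLWriter(out);
        writer.write(document);
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<modules id=\"123\"><module>text</module></modules>";
        System.out.println(renameRootAndSerialize(xml, "root"));
    }
}
```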

So far we have only fetched the root element of the XML. Next, we loop over its child elements with an Iterator.

The specific code is as follows:

public void parseXml02() {
    try {
        // Convert the XML under src to an input stream
        InputStream inputStream = this.getClass().getResourceAsStream("/module02.xml");
        // Create a SAXReader, which is dedicated to reading XML
        SAXReader saxReader = new SAXReader();
        // SAXReader's overloaded read method accepts either an InputStream or a File
        Document document = saxReader.read(inputStream);

        Element rootElement = document.getRootElement();
        // rootElement.element("name") gets a single child element
        // rootElement.elements("module") gets all module children of the root node as a List
        // rootElement.elements("module").iterator() returns an Iterator over that List
        Iterator<Element> modulesIterator = rootElement.elements("module").iterator();
        while (modulesIterator.hasNext()) {
            Element moduleElement = modulesIterator.next();
            Element nameElement = moduleElement.element("name");
            System.out.println(nameElement.getName() + ":" + nameElement.getText());
            Element valueElement = moduleElement.element("value");
            System.out.println(valueElement.getName() + ":" + valueElement.getText());
            Element descriptElement = moduleElement.element("descript");
            System.out.println(descriptElement.getName() + ":" + descriptElement.getText());
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}

The XML file used above, module02.xml, is placed under src and reads as follows:

<?xml version="1.0" encoding="UTF-8"?>
<modules id="123">
  <module>
    <name>oa</name>
    <value>System basic configuration</value>
    <descript>Basic configuration of the system root directory</descript>
  </module>
</modules>

Next, run the main method of the class and check the console output.

From this we can see that:

<1> iterating over XML child elements with DOM4J is efficient and convenient;
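
As a hedged variation on the loop above, the Element interface also offers elementIterator(name), which skips the intermediate List, and elementTextTrim(name), which returns the child's text directly and yields null when the child is absent. The sketch below assumes those two calls; the class name Dom4jIterateSketch and the helper listModules are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;

public class Dom4jIterateSketch {

    // Hypothetical helper: collects "name:value" for every <module> child of the root.
    static List<String> listModules(String xml) throws DocumentException {
        Document document = DocumentHelper.parseText(xml);
        Element root = document.getRootElement();
        List<String> result = new ArrayList<String>();
        for (Iterator<?> it = root.elementIterator("module"); it.hasNext();) {
            Element module = (Element) it.next();
            // elementTextTrim returns null when the named child does not exist,
            // so no NullPointerException is thrown for modules without these children
            String name = module.elementTextTrim("name");
            String value = module.elementTextTrim("value");
            result.add(name + ":" + value);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<modules><module><name>oa </name><value>v1</value></module>"
                + "<module>plain text</module></modules>";
        System.out.println(listModules(xml));
    }
}
```

Because a missing child shows up as null instead of an exception, the caller can decide how to treat irregular entries.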

However, the above only iterates over child elements that all have the same simple shape. If the XML structure is more complex, as in module03.xml below, more care is needed:

<?xml version="1.0" encoding="UTF-8"?>
<modules id="123">
  <module> This is the text information for the module label </module>
  <module id="">
    <name>oa</name>
    <value>System basic configuration</value>
    <descript>Basic configuration of the system root directory</descript>
    <module> This is the text information for the child module label </module>
  </module>
  <module>
    <name>Management configuration</name>
    <value>none</value>
    <descript>Management configuration instructions</descript>
    <module id="">
      <name>System management</name>
      <value>0</value>
      <descript>Config</descript>
      <module id="">
        <name>Department number</name>
        <value>20394</value>
        <descript>Number</descript>
      </module>
    </module>
  </module>
</modules>

Because the module elements do not all share the same structure, iterating over them directly as before throws an error:

java.lang.NullPointerException

So care is needed here: you cannot assume that every element in the iteration has the same children. The specific implementation code is as follows:

public void parseXml03() {
    try {
        // Convert the XML under src to an input stream
        InputStream inputStream = this.getClass().getResourceAsStream("/module03.xml");
        // Create a SAXReader, which is dedicated to reading XML
        SAXReader saxReader = new SAXReader();
        // SAXReader's overloaded read method accepts either an InputStream or a File
        Document document = saxReader.read(inputStream);
        Element rootElement = document.getRootElement();
        // elements() returns an empty List when nothing matches, so this check is only defensive
        if (rootElement.elements("module") != null) {
            // The first module tag holds only text and has no child elements. Calling element("name")
            // on it returns null, and using that result throws java.lang.NullPointerException,
            // so text-only modules and structured modules are handled separately.
            List<Element> elementList = rootElement.elements("module");
            for (Element element : elementList) {
                if (!element.getTextTrim().equals("")) {
                    System.out.println("[1] " + element.getTextTrim());
                } else {
                    Element nameElement = element.element("name");
                    System.out.println("  [2] " + nameElement.getName() + ":" + nameElement.getText());
                    Element valueElement = element.element("value");
                    System.out.println("  [2] " + valueElement.getName() + ":" + valueElement.getText());
                    Element descriptElement = element.element("descript");
                    System.out.println("  [2] " + descriptElement.getName() + ":" + descriptElement.getText());

                    List<Element> subElementList = element.elements("module");
                    for (Element subElement : subElementList) {
                        if (!subElement.getTextTrim().equals("")) {
                            System.out.println("    [3] " + subElement.getTextTrim());
                        } else {
                            Element subNameElement = subElement.element("name");
                            System.out.println("    [3] " + subNameElement.getName() + ":" + subNameElement.getText());
                            Element subValueElement = subElement.element("value");
                            System.out.println("    [3] " + subValueElement.getName() + ":" + subValueElement.getText());
                            Element subDescriptElement = subElement.element("descript");
                            System.out.println("    [3] " + subDescriptElement.getName() + ":" + subDescriptElement.getText());
                        }
                    }
                }
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}

Next, run the main method of the class and check the console output.

This solves the null reference problem when iterating over the document.

The code can also be refactored: the logic that loops over child elements is repeated at each level, so recursion can remove the duplication, at some cost to readability.
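
A minimal sketch of that recursive refactoring might look like the following. It is not the author's code: the class name Dom4jRecursiveSketch and the methods describe and walk are hypothetical, and for brevity it prints only the name child of each structured module. The same walk method handles every nesting level, so deeper levels such as the third-level module in module03.xml need no extra loop.

```java
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;

public class Dom4jRecursiveSketch {

    // Hypothetical entry point: returns one line per <module>, indented by depth.
    static String describe(String xml) throws DocumentException {
        Document document = DocumentHelper.parseText(xml);
        StringBuilder out = new StringBuilder();
        walk(document.getRootElement(), 0, out);
        return out.toString();
    }

    // Visits the <module> children of parent, then recurses into structured modules.
    static void walk(Element parent, int depth, StringBuilder out) {
        for (Object child : parent.elements("module")) {
            Element module = (Element) child;
            for (int i = 0; i < depth; i++) {
                out.append("  ");
            }
            Element name = module.element("name");
            if (name == null) {
                // Text-only module: there are no children to read
                out.append("text=").append(module.getTextTrim()).append('\n');
            } else {
                out.append("name=").append(name.getTextTrim()).append('\n');
                walk(module, depth + 1, out);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<modules><module> plain </module>"
                + "<module><name>a</name><module><name>b</name></module></module></modules>";
        System.out.print(describe(xml));
    }
}
```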

Sometimes you need all the text in an XML document, or the XML you receive from others is not well formed, for example a tag name whose case differs between the start tag and the end tag. XML is case-sensitive, so the start and end tags must match exactly. To work around such input, you can simply convert all tag names to upper case. The specific code is as follows:

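// Requires java.util.regex.Matcher and java.util.regex.Pattern
public static void main(String[] args) {
    String str = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><modules id=\"123\"><module> This is the text information of the module tag<name>oa</name><value>system basic configuration</value><descript>basic configuration of the system root directory</descript></module></modules>";
    // Replace every tag with "_" so that only the text content remains
    System.out.println(str.replaceAll("<[^<]*>", "_"));
    Pattern pattern = Pattern.compile("<[^<]*>");
    Matcher matcher = pattern.matcher(str);
    while (matcher.find()) {
        String tag = matcher.group(0);
        if (tag.startsWith("<?")) {
            continue; // leave the XML declaration untouched
        }
        // replace() treats the tag as literal text, whereas replaceAll() would interpret it as a regex.
        // Note that the whole tag is upper-cased, including any attributes inside it.
        str = str.replace(tag, tag.toUpperCase());
    }
    System.out.println(str);
}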

Running it prints the text with every tag replaced by an underscore, followed by the XML with upper-case tags.
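
If only the tag names should change and attributes should stay as they are, one possible variation uses Matcher.appendReplacement from the JDK. This is a sketch under stated assumptions, not part of the original article: the class name TagUpperCaseSketch and the method upperCaseTagNames are hypothetical, and the pattern assumes the input contains no CDATA sections or unescaped "<" characters in its text.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagUpperCaseSketch {

    // Matches "<" or "</" followed by an element name; "<?xml" and "<!--" do not match.
    private static final Pattern TAG_NAME = Pattern.compile("(</?)([A-Za-z_][\\w.-]*)");

    // Hypothetical helper: upper-cases element names, leaving attributes and text untouched.
    static String upperCaseTagNames(String xml) {
        Matcher matcher = TAG_NAME.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String replacement = matcher.group(1) + matcher.group(2).toUpperCase(Locale.ROOT);
            // quoteReplacement guards against "$" or "\" being read as group references
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<modules id=\"a\"><Name>oa</name></modules>";
        System.out.println(upperCaseTagNames(xml));
    }
}
```

Building the result in a single pass also avoids repeatedly rescanning the whole string for each tag.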

2. Generating XML documents

DOM4J can parse XML, and it can of course generate XML as well, which is even simpler.

Approach:

<1> DocumentHelper provides a method for creating a Document object;

<2> work with this Document object to add nodes, along with their text, names, and attribute values;

<3> then write the finished Document object to disk using XMLWriter;

The specific code is as follows:

import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;

import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
import org.dom4j.io.XMLWriter;

/**
 * Generate XML documents using DOM4J
 * @author Administrator
 */
public class Dom4jBuildXmlDemo {

    public void build01() {
        try {
            // DocumentHelper provides a method for creating a Document object
            Document document = DocumentHelper.createDocument();
            // Add node information
            Element rootElement = document.addElement("modules");
            // You can continue to add child nodes, or you can specify text content
            rootElement.setText("This is the text information for the module tag");
            Element element = rootElement.addElement("module");

            Element nameElement = element.addElement("name");
            Element valueElement = element.addElement("value");
            Element descriptionElement = element.addElement("description");

            nameElement.setText("name");
            nameElement.addAttribute("language", "Java"); // add an attribute to the node
            valueElement.setText("value");
            valueElement.addAttribute("language", "C#");
            descriptionElement.setText("description");
            descriptionElement.addAttribute("language", "SQL Server");

            // Convert the Document object directly to a String and print it
            System.out.println(document.asXML());

            Writer fileWriter = new FileWriter("C:\\module.xml");
            // DOM4J provides XMLWriter specifically for writing documents out
            XMLWriter xmlWriter = new XMLWriter(fileWriter);
            xmlWriter.write(document);
            xmlWriter.flush();
            xmlWriter.close();
            System.out.println("XML document added successfully!");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        Dom4jBuildXmlDemo demo = new Dom4jBuildXmlDemo();
        demo.build01();
    }
}

Run the code and check the console output.
Then look on the C drive to confirm that the file was created; the content of the XML file is the same as the console output.

Note that the XML was generated without specifying an encoding, yet the output shows UTF-8, which indicates that UTF-8 is the default. To specify a different one, you can call document.setXMLEncoding("GBK") before writing to disk.
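
Another way to control the encoding, offered here as a sketch and not as the article's method, is to pass an OutputFormat to XMLWriter and write to an OutputStream, so that the encoding named in the XML declaration and the bytes actually written come from the same setting. The class name Dom4jEncodingSketch and the helper serialize are hypothetical; createPrettyPrint() is used only to make the output easier to read.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.io.OutputFormat;
import org.dom4j.io.XMLWriter;

public class Dom4jEncodingSketch {

    // Hypothetical helper: serializes the document to bytes in the requested encoding.
    static byte[] serialize(Document document, String encoding) throws IOException {
        OutputFormat format = OutputFormat.createPrettyPrint();
        format.setEncoding(encoding); // used for the declaration and for the byte conversion

        // Swap in a FileOutputStream to write the bytes to disk.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        XMLWriter writer = new XMLWriter(bytes, format);
        writer.write(document);
        writer.close();
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        Document document = DocumentHelper.createDocument();
        document.addElement("modules").addElement("module").setText("text");
        System.out.println(new String(serialize(document, "GBK"), "GBK"));
    }
}
```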

That is all for this article. I hope it helps with your learning.
