Common methods for updating XML documents in Java programming

This article briefly discusses four common methods for updating XML documents in Java and analyzes the pros and cons of each. It also discusses how to control the format of the XML output produced by a Java program.



JAXP is the acronym for the Java API for XML Processing, a programming interface written in Java for processing XML documents. JAXP supports standards such as DOM, SAX, and XSLT. To make JAXP more flexible, its designers gave it a pluggability layer. With this layer, JAXP can work with any XML parser that implements the DOM API or the SAX API (such as Apache Xerces) and with any XSLT processor that implements the XSLT standard (such as Apache Xalan). The advantage of the pluggability layer is that we only need to learn the programming interfaces that JAXP defines, without studying the specific XML parser or XSLT processor in depth. For example, suppose a Java program uses JAXP to invoke the XML parser Apache Crimson to process XML documents. If we want to switch to another XML parser (such as Apache Xerces) to improve performance, the program code may not need any change at all: all we need to do is add the jar file containing Apache Xerces to the CLASSPATH environment variable and remove the jar file containing Apache Crimson from it.
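As a minimal sketch of what the pluggability layer means in practice, the following program asks JAXP for its DOM, SAX, and XSLT factories and prints the concrete classes it received. The class name JaxpProviders and its method names are illustrative only, and the class names printed depend on the JDK and on the jars found on the classpath.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;

// Illustrative helper (not part of JAXP): reports which implementations
// the JAXP pluggability layer selected at run time.
final class JaxpProviders {

    // Class name of the DOM parser factory chosen by JAXP.
    static String domFactory() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    // Class name of the SAX parser factory chosen by JAXP.
    static String saxFactory() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    // Class name of the XSLT processor factory chosen by JAXP.
    static String xsltFactory() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("DOM:  " + domFactory());
        System.out.println("SAX:  " + saxFactory());
        System.out.println("XSLT: " + xsltFactory());
    }
}
```

The application code only ever names the JAXP factory classes. Putting a different parser on the classpath, or setting the system property javax.xml.parsers.DocumentBuilderFactory, is expected to change what is printed without any change to the program.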



JAXP is now widely used and can be regarded as the standard API for processing XML documents in Java. Beginners learning JAXP often ask: the program I wrote updates the DOM tree, but when the program exits the original XML document is unchanged. How do I keep the original XML document in sync with the DOM tree? At first glance JAXP seems to offer no corresponding interface, method, or class, and this puzzles many beginners. The purpose of this article is to answer that question by briefly introducing several common methods for synchronizing the original XML document with the DOM tree. To narrow the scope of the discussion, the only XML parsers considered here are Apache Crimson and Apache Xerces, and the only XSLT processor is Apache Xalan.



Method One: Read and write XML documents directly



This is perhaps the crudest and most primitive approach. After the program obtains the DOM tree and updates it with the methods of the DOM Node interface, the next step is to update the original XML document. We can traverse the whole DOM tree, either recursively or with a TreeWalker, and write each node to the previously opened XML file as we go. Once the traversal is complete, the DOM tree and the original XML document are in sync. In practice this method is rarely used, but it remains an option if you are implementing your own XML parser.
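The recursive traversal can be sketched roughly as follows. This is a simplified illustration rather than a complete serializer: the class ManualDomWriter and its methods are hypothetical names, only document, element, attribute, and text nodes are handled, and the output goes to any java.io.Writer (a FileWriter on the original file would play that role in a real program).

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch of Method One: walk the DOM tree and write it out by hand.
final class ManualDomWriter {

    // Escapes characters that must not appear literally in text or attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Writes every child of the given node, in document order.
    static void writeChildren(Node parent, Writer out) throws IOException {
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            write(c, out);
        }
    }

    // Recursively writes one node and all of its descendants.
    static void write(Node node, Writer out) throws IOException {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE:
                out.write("<?xml version=\"1.0\"?>\n");
                writeChildren(node, out);
                break;
            case Node.ELEMENT_NODE: {
                out.write("<" + node.getNodeName());
                NamedNodeMap attrs = node.getAttributes();
                for (int i = 0; i < attrs.getLength(); i++) {
                    Node a = attrs.item(i);
                    out.write(" " + a.getNodeName() + "=\"" + escape(a.getNodeValue()) + "\"");
                }
                if (node.hasChildNodes()) {
                    out.write(">");
                    writeChildren(node, out);
                    out.write("</" + node.getNodeName() + ">");
                } else {
                    out.write("/>");
                }
                break;
            }
            case Node.TEXT_NODE:
                out.write(escape(node.getNodeValue()));
                break;
            default:
                // Comments, CDATA sections, processing instructions, and the
                // DOCTYPE are deliberately skipped in this sketch.
                break;
        }
    }

    // Parses the given XML text and writes the resulting DOM tree back out.
    static String roundTrip(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            StringWriter out = new StringWriter();
            write(doc, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("<users><user id=\"1\">A &amp; B</user><empty/></users>"));
    }
}
```

Even this small sketch has to deal with escaping and with node types it does not support, which gives some idea of why the approach is considered cumbersome and error-prone.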



Method Two: Use XmlDocument class



Use the XmlDocument class? There is no such class in JAXP! Is the author mistaken? No: we really do mean the XmlDocument class, or more precisely, its write() method.



As mentioned above, JAXP can be used with a variety of XML parsers; this time we choose Apache Crimson. XmlDocument (org.apache.crimson.tree.XmlDocument) is a class that belongs to Apache Crimson and is not part of standard JAXP, which is why the JAXP documentation never mentions it. So how do we use the XmlDocument class to update an XML document? The class provides the following three write() methods (as of the latest version of Crimson, Apache Crimson 1.1.3):







public void write(OutputStream out) throws IOException

public void write(Writer out) throws IOException

public void write(Writer out, String encoding) throws IOException







The main purpose of these three write() methods is to output the contents of the DOM tree to a particular output medium, such as a file output stream or the application console. How are they used? Consider the following Java snippet:





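// Requires java.io.FileOutputStream, javax.xml.parsers.*, org.w3c.dom.*
// and org.apache.crimson.tree.XmlDocument.
String name = "fancy";
DocumentBuilder parser;
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
try
{
    parser = factory.newDocumentBuilder();
    Document doc = parser.parse("user.xml");
    // Create a new element and append it to the end of the document element.
    Element newlink = doc.createElement(name);
    doc.getDocumentElement().appendChild(newlink);
    // write() is not defined by the JAXP Document interface, so cast to Crimson's XmlDocument.
    ((XmlDocument) doc).write(new FileOutputStream("xuser1.xml"));
}
catch (Exception e)
{
    // to log it
}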







The code above first creates a Document object doc to obtain the complete DOM tree, then uses the appendChild() method of the Node interface to append a new node (fancy) to the end of the DOM tree. Finally it calls the write(OutputStream out) method of the XmlDocument class to output the contents of the DOM tree to xuser1.xml. (The output could equally be written to user.xml to update the original XML document; a separate file is used here to make comparison easier.) Note that write() cannot be called directly on the Document object doc, because the JAXP Document interface does not define any write() method; doc must first be cast to XmlDocument. The write(OutputStream out) method used here outputs the DOM tree in the default UTF-8 encoding. If the DOM tree contains Chinese characters, the output may be garbled, which is the so-called "Chinese character problem". The solution is to use the write(Writer out, String encoding) method and specify the output encoding explicitly: for example, with the second parameter set to "GB2312" the problem disappears and the Chinese characters are displayed correctly.



For a complete example, see the files AddRecord.java (see annex) and user.xml (see annex). The operating environment for this example is Windows XP Professional with JDK 1.3.1. To compile and run AddRecord.java, download Apache Crimson from http://xml.apache.org/dist/crimson/ and add the resulting crimson.jar file to the CLASSPATH environment variable.



Note:



The predecessor of Apache Crimson is Sun's Project X parser. Project X later evolved into Apache Crimson, and to this day much of the Crimson code is ported directly from it. The XmlDocument class used above is one example: it was com.sun.xml.XmlDocument in Project X and became org.apache.crimson.tree.XmlDocument in Apache Crimson. The vast majority of the code is identical; the differences are probably limited to the package and import statements and the license at the top of each file. Early versions of JAXP were bundled with the Project X parser, so some older programs use the com.sun.xml package, which is why they may fail to compile today. Later versions of JAXP, such as JAXP 1.1, were bundled with Apache Crimson, so with JAXP 1.1 you can compile the example above (AddRecord.java) without downloading Apache Crimson separately. The latest JAXP 1.2 EA (Early Access) switches to the better Apache Xalan and Apache Xerces as its XSLT processor and XML parser respectively and does not directly support Apache Crimson. So if your development environment uses JAXP 1.2 EA or the Java XML Pack (which contains JAXP 1.2 EA), you cannot compile the example above (AddRecord.java) directly; you need to download and install Apache Crimson as well.



Method Three: Use the TransformerFactory and Transformer classes



The standard way to update the original XML document in JAXP is to invoke the XSLT engine, that is, to use the TransformerFactory and Transformer classes. Consider the following Java snippet:





// First, create a DOMSource object. Its constructor takes a Document object;
// doc represents the modified DOM tree.
DOMSource doms = new DOMSource(doc);

// Create a File object that represents the output medium for the data in the
// DOM tree; here it is an XML file.
File f = new File("Xmloutput.xml");

// Create a StreamResult object. Its constructor can take a File object.
StreamResult sr = new StreamResult(f);

// Now call the XSLT engine in JAXP to output the data in the DOM tree to the
// XML file. The input of the XSLT engine is the DOMSource object and the
// output is the StreamResult object.
try
{
    // Create a TransformerFactory object first, then the Transformer object.
    // The Transformer class acts as the XSLT engine. Normally we use it to
    // process XSL files, but here we use it to output an XML document.
    TransformerFactory tf = TransformerFactory.newInstance();
    Transformer t = tf.newTransformer();

    // The key step: call the transform() method of the Transformer object
    // (the XSLT engine). The first parameter is the DOMSource object and the
    // second parameter is the StreamResult object.
    t.transform(doms, sr);
}
catch (TransformerConfigurationException tce)
{
    System.out.println("Transformer Configuration exception\n-----");
    tce.printStackTrace();
}
catch (TransformerException te)
{
    System.out.println("Transformer exception\n---------");
    te.printStackTrace();
}





In practice, we can use the traditional DOM API to obtain a DOM tree from an XML document, perform whatever operations on the DOM tree the requirements call for, and end up with the final Document object. We then create a DOMSource object from this Document object, and the rest is a matter of reusing the code above. After the program runs, Xmloutput.xml contains the result. (Of course, you can pass a different argument to the StreamResult constructor to select another output medium; the output does not have to be an XML file.)
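Putting the pieces together, the whole workflow of parsing, modifying the DOM tree, and writing it back through the Transformer might look roughly like the sketch below. The class name AppendAndSerialize and the sample XML are made up for illustration, and the output goes to a StringWriter rather than a file so that the example is self-contained.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical end-to-end sketch of Method Three.
final class AppendAndSerialize {

    // Parses xml, appends an empty element with the given name to the
    // document element, and returns the serialized result.
    static String appendElement(String xml, String name) {
        try {
            // Obtain the DOM tree with the ordinary DOM API.
            DocumentBuilder parser = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = parser.parse(new InputSource(new StringReader(xml)));

            // Modify the DOM tree.
            Element newLink = doc.createElement(name);
            doc.getDocumentElement().appendChild(newLink);

            // A Transformer created without a stylesheet copies the source to the result.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(appendElement("<users><user>Tom</user></users>", "fancy"));
    }
}
```

To update a file on disk instead, the StringWriter can be replaced by a java.io.File passed to the StreamResult constructor, as in the snippet discussed above.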



The biggest advantage of this approach is that you can control how the contents of the DOM tree are written to the output medium. The TransformerFactory and Transformer classes alone do not provide this control, however; you also need the help of the OutputKeys class. For a complete example, see the files AddRecord2.java (see annex) and user.xml (see annex). The operating environment for this example is Windows XP Professional with JDK 1.3.1. To compile and run AddRecord2.java, download and install JAXP 1.1 or the Java XML Pack (which already contains JAXP) from http://java.sun.com.



The OutputKeys class



The javax.xml.transform.OutputKeys class is used together with the java.util.Properties class to control the format in which the JAXP XSLT engine (the Transformer class) outputs an XML document. Consider the following code fragment:





// Create a TransformerFactory object first, then the Transformer object.
TransformerFactory tf = TransformerFactory.newInstance();
Transformer t = tf.newTransformer();

// Get the output properties of the Transformer object, that is, the default
// output properties of the XSLT engine, as a java.util.Properties object.
Properties properties = t.getOutputProperties();

// Set a new output property: the output character encoding is GB2312, which
// supports Chinese characters. If the XML document output by the XSLT engine
// contains Chinese characters, they are displayed correctly, without the
// so-called "Chinese character problem".
// Note the string constant OutputKeys.ENCODING of the OutputKeys class.
properties.setProperty(OutputKeys.ENCODING, "GB2312");

// Update the output properties of the XSLT engine.
t.setOutputProperties(properties);

// Invoke the XSLT engine, which outputs the contents of the DOM tree to the
// output medium according to the output properties set above.
t.transform(DOMSource_Object, StreamResult_Object);









From the code above it is easy to see that by setting the output properties of the XSLT engine (the Transformer class) you can control the output format of the contents of the DOM tree, which is helpful for customizing the output. So which output properties can be set on the JAXP XSLT engine (the Transformer class)? The javax.xml.transform.OutputKeys class defines a number of string constants, each naming an output property that can be set freely. The commonly used output properties are as follows:



public static final java.lang.String METHOD

The output method. It can be set to "xml", "html", "text", and so on.

public static final java.lang.String VERSION

The version number of the output specification. If METHOD is set to "xml", the value should be "1.0"; if METHOD is set to "html", the value should be "4.0"; if METHOD is set to "text", this output property is ignored.

public static final java.lang.String ENCODING

The encoding used for the output, such as "GB2312" or "UTF-8". Setting it to "GB2312" solves the so-called "Chinese character problem".

public static final java.lang.String OMIT_XML_DECLARATION

Whether the XML declaration is omitted from the output XML document. The XML declaration is a line such as:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

The possible values are "yes" and "no".

public static final java.lang.String INDENT

Whether the XSLT engine may add extra whitespace (indentation) when outputting an XML document. The possible values are "yes" and "no".

public static final java.lang.String MEDIA_TYPE

The MIME type of the output document.
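Each of these constants is simply a string holding the name of the corresponding output property. The short sketch below prints the value of each constant listed here; the class name OutputKeyNames is illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.transform.OutputKeys;

// Illustrative helper: maps each OutputKeys constant to its string value.
final class OutputKeyNames {

    static Map<String, String> names() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("METHOD", OutputKeys.METHOD);                             // "method"
        m.put("VERSION", OutputKeys.VERSION);                           // "version"
        m.put("ENCODING", OutputKeys.ENCODING);                         // "encoding"
        m.put("OMIT_XML_DECLARATION", OutputKeys.OMIT_XML_DECLARATION); // "omit-xml-declaration"
        m.put("INDENT", OutputKeys.INDENT);                             // "indent"
        m.put("MEDIA_TYPE", OutputKeys.MEDIA_TYPE);                     // "media-type"
        return m;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : names().entrySet()) {
            System.out.println("OutputKeys." + e.getKey() + " = \"" + e.getValue() + "\"");
        }
    }
}
```

Using the constants rather than the literal strings lets the compiler catch misspelled property names.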



So how do you set the output properties of the XSLT engine? To summarize:

First, get the collection of default output properties of the XSLT engine (the Transformer class). This requires the getOutputProperties() method of the Transformer class, which returns a java.util.Properties object.

Properties properties = transformer.getOutputProperties();

Then set the new output properties, for example:

properties.setProperty(OutputKeys.ENCODING, "GB2312");

properties.setProperty(OutputKeys.METHOD, "html");

properties.setProperty(OutputKeys.VERSION, "4.0");

...

Finally, update the collection of output properties of the XSLT engine (the Transformer class). This requires the setOutputProperties() method of the Transformer class, whose parameter is a java.util.Properties object.
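The three steps can be combined into a small runnable sketch such as the one below. The class OutputKeysDemo and its serialize() method are hypothetical names. The example uses ISO-8859-1 and UTF-8 rather than GB2312 only because every Java platform is required to support those two charsets; the mechanism is the same for any encoding the platform supports.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.util.Properties;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical sketch: get the output properties, change them, set them back.
final class OutputKeysDemo {

    // Serializes xml with the given encoding, optionally omitting the XML declaration.
    static String serialize(String xml, String encoding, boolean omitDeclaration) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Transformer t = TransformerFactory.newInstance().newTransformer();

            // Step 1: get the current (default) output properties.
            Properties properties = t.getOutputProperties();

            // Step 2: set the new output properties.
            properties.setProperty(OutputKeys.ENCODING, encoding);
            properties.setProperty(OutputKeys.OMIT_XML_DECLARATION, omitDeclaration ? "yes" : "no");

            // Step 3: hand the updated properties back to the XSLT engine.
            t.setOutputProperties(properties);

            // Write bytes in the chosen encoding, then decode them with the same encoding.
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            t.transform(new DOMSource(doc), new StreamResult(bytes));
            return bytes.toString(encoding);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize("<a><b/></a>", "ISO-8859-1", false));
        System.out.println(serialize("<a><b/></a>", "UTF-8", true));
    }
}
```

With the declaration kept, the chosen encoding appears in the XML declaration of the output; with OMIT_XML_DECLARATION set to "yes", only the document content is written.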



We have written a new program, AddRecord3.java, that uses the OutputKeys class to control the output properties of the XSLT engine. Its structure is roughly the same as that of the previous program, but its output is slightly different. For the complete code, see the files AddRecord3.java (see annex) and user.xml (see annex). The operating environment for this example is Windows XP Professional with JDK 1.3.1. To compile and run AddRecord3.java, download and install JAXP 1.1 or the Java XML Pack (which contains JAXP) from http://java.sun.com.



Method Four: Use the Xalan XML Serializer



Method Four is actually a variant of method three that requires the support of Apache Xalan and Apache Xerces to run. The example code looks like this:





// First, create a DOMSource object. Its constructor takes a Document object;
// doc represents the modified DOM tree.
DOMSource domSource = new DOMSource(doc);

// Create a DOMResult object that temporarily holds the output of the XSLT engine.
DOMResult domResult = new DOMResult();

// Now call the XSLT engine in JAXP. The input of the XSLT engine is the
// DOMSource object and the output is the DOMResult object.
try
{
    // Create a TransformerFactory object first, then the Transformer object.
    // The Transformer class acts as the XSLT engine. Normally we use it to
    // process XSL files, but here we use it to output an XML document.
    TransformerFactory tf = TransformerFactory.newInstance();
    Transformer t = tf.newTransformer();

    // Set the output properties of the XSLT engine (essential; otherwise the
    // "Chinese character problem" appears).
    Properties properties = t.getOutputProperties();
    properties.setProperty(OutputKeys.ENCODING, "GB2312");
    t.setOutputProperties(properties);

    // The key step: call the transform() method of the Transformer object
    // (the XSLT engine). The first parameter is the DOMSource object and the
    // second parameter is the DOMResult object.
    t.transform(domSource, domResult);

    // Create the default Xalan XML serializer, which writes the contents held
    // temporarily in the DOMResult object (domResult) to the output medium
    // as an output stream.
    Serializer serializer = SerializerFactory.getSerializer
        (OutputProperties.getDefaultMethodProperties("xml"));

    // Setting the output properties of the Xalan XML serializer is also
    // essential; otherwise the "Chinese character problem" may appear.
    Properties prop = serializer.getOutputFormat();
    prop.setProperty("encoding", "GB2312");
    serializer.setOutputFormat(prop);

    // Create a File object that represents the output medium for the data in
    // the DOM tree; here it is an XML file.
    File f = new File("xuser3.xml");

    // Create the file output stream object fos; note the constructor argument.
    FileOutputStream fos = new FileOutputStream(f);

    // Set the output stream of the Xalan XML serializer.
    serializer.setOutputStream(fos);

    // Serialize the result.
    serializer.asDOMSerializer().serialize(domResult.getNode());
}
catch (Exception tce)
{
    tce.printStackTrace();
}







This method is not very common and seems somewhat superfluous, so we will not discuss it further. For a complete example, see the files AddRecord4.java (see annex) and user.xml (see annex). The operating environment for this example is Windows XP Professional with JDK 1.3.1. To compile and run AddRecord4.java, download and install Apache Xalan and Apache Xerces from http://xml.apache.org/dist/, or download and install the Java XML Pack from http://java.sun.com/xml/download.html, since the latest Java XML Pack (Winter 01 release) includes Apache Xalan and Apache Xerces.



Conclusion:



This article has briefly discussed four ways to update XML documents in Java. The first is to read and write the XML file directly; it is cumbersome and error-prone and is rarely used unless you need to develop your own XML parser. The second is to use the XmlDocument class of Apache Crimson. It is extremely simple to use, and if you have chosen Apache Crimson as your XML parser you may as well use it. However, it appears to be inefficient (a consequence of the inefficient Apache Crimson), and newer versions of JAXP, the Java XML Pack, and JWSDP do not directly support Apache Crimson, so the method is not portable. The third is to use the JAXP XSLT engine (the Transformer class) to output the XML document. This is arguably the standard method; it is flexible, and in particular it makes controlling the output format easy, so it is the approach we recommend. The fourth is a variant of the third that uses the Xalan XML serializer. Introducing serialization has advantages when modifying or outputting large numbers of documents, but the properties of the XSLT engine and the output properties of the XML serializer must both be set, which is tedious, and the dependence on Apache Xalan and Apache Xerces makes it somewhat less portable.



Besides the four methods discussed above, other APIs (such as JDOM, Castor, XML4J, and Oracle XML Parser V2) offer many more ways to update XML documents; they are not discussed here.







