This article briefly discusses four common methods for updating XML documents in Java and analyzes the advantages and disadvantages of each. It also discusses how to control the format of XML documents output by Java programs.
JAXP is short for Java API for XML Processing. JAXP supports DOM, SAX, XSLT, and other standards. To make JAXP more flexible, its designers gave it a Pluggability Layer. Through this layer, JAXP can work with any XML parser that implements the DOM and SAX APIs (such as Apache Xerces) and with any XSLT processor that implements the XSLT standard (such as Apache Xalan). The advantage of the Pluggability Layer is that we only need to know the JAXP programming interfaces, not the details of the specific XML parser and XSLT processor behind them. For example, suppose a Java program uses JAXP with the Apache Crimson parser to process XML documents. If we later want to switch to another parser (such as Apache Xerces) to improve performance, the program code may not need to change at all: we only add the jar file containing Apache Xerces to the CLASSPATH environment variable and remove the jar file containing Apache Crimson from it.
JAXP is now widely used and can fairly be called the standard API for processing XML documents in Java. Beginners learning JAXP often ask the following question: "My program updates the DOM tree, but after the program exits the original XML document is unchanged. How can I keep the original XML document in sync with the DOM tree?" At first glance, JAXP does not seem to provide a corresponding interface, method, or class, which confuses many beginners. This article addresses that question and briefly introduces several common methods for updating the original XML document together with the DOM tree. To narrow the scope of the discussion, the only XML parsers covered in this article are Apache Crimson and Apache Xerces, and the only XSLT processor is Apache Xalan.
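The effect of the Pluggability Layer can be observed with a small sketch like the one below. It only asks the JAXP factories which implementation classes were selected at run time; the class name PluggabilityDemo and its helper methods are made up for this illustration and are not part of the original example programs.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;

// Hypothetical demo class: reports which implementations JAXP plugged in.
class PluggabilityDemo {
    // Returns the class name of the DOM parser factory that JAXP selected.
    static String parserFactoryName() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    // Returns the class name of the XSLT factory that JAXP selected.
    static String transformerFactoryName() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // The program names only JAXP interfaces; the printed class names
        // depend on what is available on the classpath.
        System.out.println("DOM parser factory: " + parserFactoryName());
        System.out.println("XSLT factory:       " + transformerFactoryName());
    }
}
```

The printed names change when a different parser or XSLT processor is placed on the classpath, while the program itself stays the same. The selection can also be forced with a system property, for example -Djavax.xml.parsers.DocumentBuilderFactory=org.apache.xerces.jaxp.DocumentBuilderFactoryImpl, assuming the Xerces jar is on the classpath.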
Method 1: Directly read and write XML documents
This is perhaps the crudest and most primitive method. After the program obtains the DOM tree, it updates the tree by using the methods of the Node interface of the DOM model. The next step is to update the original XML document. We can traverse the entire DOM tree recursively or with the TreeWalker class and, during the traversal, write every node and element of the tree in sequence to the original XML document, which has been opened beforehand. Once the traversal is complete, the DOM tree and the original XML document are in sync. In practice this method is rarely used, but it may still be useful if you want to write your own XML parser.
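The recursive traversal described above might look like the following sketch. It is deliberately incomplete: it handles only the document, element, attribute, and text nodes, skips comments, CDATA sections, processing instructions, and the DOCTYPE, and collects the markup in a StringBuilder instead of writing to a file. The class DomWalker and its methods are hypothetical names chosen for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: writes a DOM tree back out by walking it recursively.
class DomWalker {
    // Escapes characters that must not appear literally in text or attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Visits every child of node in document order.
    static void writeChildren(Node node, StringBuilder out) {
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            write(c, out);
        }
    }

    // Recursively appends the markup for node and its descendants to out.
    static void write(Node node, StringBuilder out) {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE:
                out.append("<?xml version=\"1.0\"?>");
                writeChildren(node, out);
                break;
            case Node.ELEMENT_NODE:
                out.append('<').append(node.getNodeName());
                NamedNodeMap attrs = node.getAttributes();
                for (int i = 0; i < attrs.getLength(); i++) {
                    Node a = attrs.item(i);
                    out.append(' ').append(a.getNodeName()).append("=\"")
                       .append(escape(a.getNodeValue())).append('"');
                }
                if (!node.hasChildNodes()) {
                    out.append("/>");
                    break;
                }
                out.append('>');
                writeChildren(node, out);
                out.append("</").append(node.getNodeName()).append('>');
                break;
            case Node.TEXT_NODE:
                out.append(escape(node.getNodeValue()));
                break;
            default:
                // Other node types are ignored in this sketch.
                break;
        }
    }

    // Parses xml into a DOM tree and writes the tree back out as a string.
    static String roundTrip(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            write(doc, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not process the document", e);
        }
    }
}
```

Even this small version shows why the method is cumbersome and error-prone: every node type, every escaping rule, and the output encoding have to be handled by hand.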
Method 2: Use the XmlDocument class
Use the XmlDocument class? JAXP clearly has no such class! Has the author made a mistake? No. We really do use the XmlDocument class, or more precisely, its write() method.
As mentioned above, JAXP can be used together with a variety of XML parsers. The XML parser selected this time is Apache Crimson. XmlDocument (org.apache.crimson.tree.XmlDocument) is a class of Apache Crimson and is not part of standard JAXP, which is why it cannot be found in the JAXP documentation. So how can the XmlDocument class be used to update XML documents? The XmlDocument class provides the following three write() methods (according to the latest version of Crimson, Apache Crimson 1.1.3):
    public void write(OutputStream out) throws IOException
    public void write(Writer out) throws IOException
    public void write(Writer out, String encoding) throws IOException
These three write() methods output the content of the DOM tree to a specific output medium, such as a file output stream or the application console. How are they used? See the following Java code snippet:
    String name = "fancy";
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    try {
        DocumentBuilder parser = factory.newDocumentBuilder();
        Document doc = parser.parse("user.xml");
        Element newlink = doc.createElement(name);
        doc.getDocumentElement().appendChild(newlink);
        // Document has no write() method, so cast to Crimson's XmlDocument.
        OutputStream out = new FileOutputStream(new File("xuser1.xml"));
        try {
            ((XmlDocument) doc).write(out);
        } finally {
            out.close();
        }
    } catch (Exception e) {
        // TODO: log the exception
    }
In the code above, a Document object doc is first created to obtain the complete DOM tree. The appendChild() method of the Node interface then appends a new node (fancy) to the end of the DOM tree, and finally the write(OutputStream out) method of the XmlDocument class outputs the content of the DOM tree to xuser1.xml. (It could also be output to user.xml, which would update the original XML document; this example writes to xuser1.xml so that the two files are easy to compare.) Note that you cannot call write() directly on the Document object doc, because the JAXP Document interface does not define any write() method; you must cast doc from Document to XmlDocument before calling write(). The code above uses write(OutputStream out), which outputs the content of the DOM tree in the default UTF-8 encoding. If the DOM tree contains Chinese characters, the output may appear garbled, the so-called "Chinese character problem". The solution is to use write(Writer out, String encoding) and specify the output encoding explicitly. For example, if the second parameter is set to "GB2312", the problem goes away and Chinese characters are displayed normally.
For a complete example, see the following files: AddRecord.java (see attachment) and user.xml (see attachment). The running environment of this example is Windows XP Professional and JDK 1.3.1. To compile and run AddRecord.java, go to http://xml.apache.org/dist/crimson/ to download Apache Crimson and add the resulting crimson.jar file to the CLASSPATH environment variable.
Note:
Apache Crimson was formerly known as Sun Project X Parser, which later evolved into Apache Crimson. Much of the Apache Crimson code was carried over directly from X Parser. For example, the XmlDocument class used above was com.sun.xml.tree.XmlDocument in X Parser and became org.apache.crimson.tree.XmlDocument in Apache Crimson. Most of the code is the same; the differences are probably limited to the package statement, the import statements, and the license at the beginning of the file. Early versions of JAXP were bundled with X Parser, so some old programs use the com.sun.xml packages, which is most likely why they may fail if you recompile them now. Later versions of JAXP, such as JAXP 1.1, were bundled with Apache Crimson. If you use JAXP 1.1, you can compile and run the example above (AddRecord.java) without downloading Apache Crimson separately. The latest JAXP 1.2 EA (Early Access) changes course: it uses the better-performing Apache Xalan and Apache Xerces as its XSLT processor and XML parser, and does not directly support Apache Crimson. Therefore, if your development environment uses JAXP 1.2 EA or the Java XML Pack (which contains JAXP 1.2 EA), you cannot compile and run the example above (AddRecord.java) directly; you need to download and install Apache Crimson first.
Method 3: Use the TransformerFactory and Transformer classes
The standard method provided in JAXP to update the original XML document is to call the XSLT engine, that is, to use the TransformerFactory and Transformer classes. See the following Java code snippet:
    // Create a DOMSource object. The constructor parameter can be a Document object.
    // doc is the modified DOM tree.
    DOMSource doms = new DOMSource(doc);

    // Create a File object that represents the output medium for the data
    // in the DOM tree. Here it is an XML file.
    File f = new File("XMLOutput.xml");

    // Create a StreamResult object. Its constructor accepts a File object.
    StreamResult sr = new StreamResult(f);

    // Call the XSLT engine in JAXP to output the data in the DOM tree to the XML file.
    // The input of the XSLT engine is a DOMSource object and the output is a StreamResult object.
    try {
        // Create a TransformerFactory object and then a Transformer object.
        // The Transformer class is equivalent to an XSLT engine. It is usually
        // used to process XSL files, but here we use it to output XML documents.
        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer t = tf.newTransformer();

        // The key step: call the transform() method of the Transformer object (XSLT engine).
        // The first parameter is the DOMSource object, and the second is the StreamResult object.
        t.transform(doms, sr);
    } catch (TransformerConfigurationException tce) {
        System.out.println("Transformer Configuration Exception -----");
        tce.printStackTrace();
    } catch (TransformerException te) {
        System.out.println("Transformer Exception ---------");
        te.printStackTrace();
    }
In practical applications, we can use the traditional DOM API to obtain the DOM tree from the XML document, perform whatever operations on the tree the application requires, and end up with the final Document object. We then create a DOMSource object from this Document object; the rest is a copy of the code above. After the program has run, XMLOutput.xml contains the result (of course, you can change the parameter of the StreamResult constructor freely and specify a different output medium; it does not have to be an XML file).
The biggest advantage of this method is that you can control the output format of the DOM tree content as you like. The TransformerFactory and Transformer classes cannot do this alone, however; they also rely on the OutputKeys class. For a complete example, see the following files: AddRecord2.java (see attachment) and user.xml (see attachment). The running environment of this example is Windows XP Professional and JDK 1.3.1. To compile and run AddRecord2.java, you need to go to http://java.sun.com to download and install JAXP 1.1 or the Java XML Pack (the Java XML Pack already contains JAXP).
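Putting the steps together, a minimal end-to-end sketch might look like this. To stay self-contained it parses the XML from a string and returns the result as a string rather than reading user.xml and writing XMLOutput.xml; the class AppendAndSave and the method appendElement are hypothetical names used only for this illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class: parse, modify the DOM tree, output via the XSLT engine.
class AppendAndSave {
    // Parses xml, appends an empty element called name to the root element,
    // and returns the serialized document.
    static String appendElement(String xml, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            doc.getDocumentElement().appendChild(doc.createElement(name));

            // A Transformer created without a stylesheet copies its input
            // to its output unchanged (an identity transformation).
            Transformer t = TransformerFactory.newInstance().newTransformer();
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not update the document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(appendElement("<users><user>alice</user></users>", "fancy"));
    }
}
```

To update a file instead, the StringWriter can be replaced with a File, as in new StreamResult(new File("XMLOutput.xml")).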
OutputKeys class
The javax.xml.transform.OutputKeys class can be used together with the java.util.Properties class to control the format of the XML documents output by the JAXP XSLT engine (the Transformer class). See the following code snippet:
    // Create a TransformerFactory object and then a Transformer object.
    TransformerFactory tf = TransformerFactory.newInstance();
    Transformer t = tf.newTransformer();

    // Obtain the output properties of the Transformer object, that is, the default
    // output properties of the XSLT engine. The result is a java.util.Properties object.
    Properties properties = t.getOutputProperties();

    // Set a new output property: the output encoding is GB2312, which supports Chinese
    // characters. If the XML document output by the XSLT engine contains Chinese
    // characters, they are displayed properly, without the so-called "Chinese character problem".
    // Note the string constant OutputKeys.ENCODING of the OutputKeys class.
    properties.setProperty(OutputKeys.ENCODING, "GB2312");

    // Update the output properties of the XSLT engine.
    t.setOutputProperties(properties);

    // Call the XSLT engine to output the content of the DOM tree to the output medium
    // according to the settings in the output properties.
    t.transform(DOMSource_Object, StreamResult_Object);
As the code above shows, setting the output properties of the XSLT engine (the Transformer class) lets us control the output format of the content in the DOM tree, which is very helpful for customizing the output. Which output properties can be set for the JAXP XSLT engine (the Transformer class)? The javax.xml.transform.OutputKeys class defines a number of string constants, each of which is an output property that can be set freely. The common ones are as follows:
    public static final java.lang.String METHOD

The output method. It can be set to "xml", "html", or "text".

    public static final java.lang.String VERSION

The version number of the output standard. If METHOD is "xml", the value should be "1.0"; if METHOD is "html", the value should be "4.0"; if METHOD is "text", this output property is ignored.

    public static final java.lang.String ENCODING

Sets the encoding used for the output, such as "GB2312" or "UTF-8". Setting it to "GB2312" can solve the so-called "Chinese character problem".

    public static final java.lang.String OMIT_XML_DECLARATION

Sets whether the XML declaration is omitted from the output XML document, that is, a line such as:

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>

The allowed values are "yes" and "no".

    public static final java.lang.String INDENT

INDENT sets whether the XSLT engine may add extra whitespace when outputting XML documents. The allowed values are "yes" and "no".

    public static final java.lang.String MEDIA_TYPE

MEDIA_TYPE specifies the MIME type of the output document.
How do we set the output properties of the XSLT engine? To summarize:
The first step is to obtain the set of default output properties of the XSLT engine (the Transformer class) with the getOutputProperties() method of the Transformer class. The return value is a java.util.Properties object.

    Properties properties = transformer.getOutputProperties();

The second step is to set the new output properties, for example:

    properties.setProperty(OutputKeys.ENCODING, "GB2312");
    properties.setProperty(OutputKeys.METHOD, "html");
    properties.setProperty(OutputKeys.VERSION, "4.0");
    ...

The last step is to update the set of output properties of the XSLT engine (the Transformer class) with the setOutputProperties() method of the Transformer class. Its parameter is a java.util.Properties object.
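The following sketch shows several of these properties in action. It is not the AddRecord3.java program from the attachment: the class FormattedOutput and its serialize method are hypothetical, and instead of the getOutputProperties()/setOutputProperties() round trip it uses Transformer.setOutputProperty(), which sets one property at a time. The output is collected in memory so that the effect can be inspected directly.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class: controls the output format through OutputKeys.
class FormattedOutput {
    // Serializes xml with the given encoding, with indentation switched on,
    // and with or without the XML declaration.
    static String serialize(String xml, String encoding, boolean omitDeclaration) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.METHOD, "xml");
            t.setOutputProperty(OutputKeys.ENCODING, encoding);
            t.setOutputProperty(OutputKeys.INDENT, "yes");
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION,
                    omitDeclaration ? "yes" : "no");

            // Write to a byte stream so that the ENCODING property decides
            // how characters are turned into bytes.
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            t.transform(new DOMSource(doc), new StreamResult(bytes));
            return new String(bytes.toByteArray(), encoding);
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize the document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize("<users><user>alice</user></users>", "UTF-8", false));
    }
}
```

Passing "GB2312" as the encoding should behave as described above, provided the Java runtime in use supports that charset. The exact indentation produced by INDENT depends on the XSLT processor.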
We have written a new program that applies the OutputKeys class to control the output properties of the XSLT engine. Its structure is roughly the same as that of the previous program (AddRecord2.java), but the output is slightly different. For the complete code, see the following files: AddRecord3.java (see attachment) and user.xml (see attachment). The running environment of this example is Windows XP Professional and JDK 1.3.1. To compile and run AddRecord3.java, you need to go to http://java.sun.com to download and install JAXP 1.1 or the Java XML Pack (the Java XML Pack contains JAXP).
Method 4: Use Xalan XML Serializer
Method 4 is actually a variant of method 3. It can run only with the support of Apache Xalan and Apache Xerces. The sample code is as follows:
    // Create a DOMSource object. The constructor parameter can be a Document object.
    // doc is the modified DOM tree.
    DOMSource domSource = new DOMSource(doc);

    // Create a DOMResult object to temporarily hold the output of the XSLT engine.
    DOMResult domResult = new DOMResult();

    // Call the XSLT engine in JAXP to process the data in the DOM tree.
    // The input of the XSLT engine is a DOMSource object and the output is a DOMResult object.
    try {
        // Create a TransformerFactory object and then a Transformer object.
        // The Transformer class is equivalent to an XSLT engine. It is usually
        // used to process XSL files, but here we use it to output XML documents.
        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer t = tf.newTransformer();

        // Set the output properties of the XSLT engine (required; otherwise
        // the "Chinese character problem" occurs).
        Properties properties = t.getOutputProperties();
        properties.setProperty(OutputKeys.ENCODING, "GB2312");
        t.setOutputProperties(properties);

        // The key step: call the transform() method of the Transformer object (XSLT engine).
        // The first parameter is the DOMSource object, and the second is the DOMResult object.
        t.transform(domSource, domResult);

        // Create the default Xalan XML Serializer. It is used to output the content
        // temporarily held in the DOMResult object (domResult) to the output medium
        // as an output stream.
        Serializer serializer = SerializerFactory.getSerializer(
                OutputProperties.getDefaultMethodProperties("xml"));

        // Set the output properties of the Xalan XML Serializer. This step is required;
        // otherwise the so-called "Chinese character problem" occurs.
        Properties prop = serializer.getOutputFormat();
        prop.setProperty("encoding", "GB2312");
        serializer.setOutputFormat(prop);

        // Create a File object that represents the output medium for the data
        // in the DOM tree. Here it is an XML file.
        File f = new File("xuser3.xml");

        // Create the file output stream object fos. Note the constructor parameter.
        FileOutputStream fos = new FileOutputStream(f);

        // Set the output stream of the Xalan XML Serializer.
        serializer.setOutputStream(fos);

        // Serialize the result, then close the stream.
        serializer.asDOMSerializer().serialize(domResult.getNode());
        fos.close();
    } catch (Exception tce) {
        tce.printStackTrace();
    }
This method is not very common and seems somewhat redundant, so we will not discuss it further. For a complete example, see the following files: AddRecord4.java (see attachment) and user.xml (see attachment). The running environment of this example is Windows XP Professional and JDK 1.3.1. To compile and run AddRecord4.java, go to http://xml.apache.org/dist/ to download and install Apache Xalan and Apache Xerces.
Alternatively, go to http://java.sun.com/xml/download.html to download and install the Java XML Pack, because the latest Java XML Pack (Winter 01) includes the Apache Xalan and Apache Xerces technologies.
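The Xalan classes used above are not available without the Apache Xalan jar, so the code cannot be tried on a plain Java installation. For readers who only want to experiment with the serializer idea, the sketch below substitutes a different technique: the DOM Level 3 Load and Save interfaces (org.w3c.dom.ls), assuming a Java runtime that ships them. It is not the Xalan XML Serializer API, and the class LsSerializeDemo and its serialize method are hypothetical names for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

// Hypothetical demo class: serializes a DOM tree with a serializer object
// instead of the XSLT engine (DOM Level 3 Load and Save, not Xalan).
class LsSerializeDemo {
    // Parses xml, appends a <fancy/> element to the root element,
    // and serializes the tree in the given encoding.
    static String serialize(String xml, String encoding) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            doc.getDocumentElement().appendChild(doc.createElement("fancy"));

            // Ask the DOM implementation for its Load and Save support.
            DOMImplementationLS ls = (DOMImplementationLS)
                    doc.getImplementation().getFeature("LS", "3.0");
            LSSerializer serializer = ls.createLSSerializer();

            // The LSOutput object carries both the target stream and the encoding.
            LSOutput output = ls.createLSOutput();
            output.setEncoding(encoding);
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            output.setByteStream(bytes);

            serializer.write(doc, output);
            return new String(bytes.toByteArray(), encoding);
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize the document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize("<users><user>alice</user></users>", "UTF-8"));
    }
}
```

As with the Xalan serializer, the encoding is set on the serializer side rather than on a Transformer, and the byte stream could just as well be a FileOutputStream.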
Conclusion:
This article has briefly discussed four methods for updating XML documents in Java. The first method is to read and write XML files directly. It is cumbersome and error-prone and is rarely used unless you need to develop your own XML parser. The second method is to use the XmlDocument class of Apache Crimson. It is extremely simple and easy to use, and if Apache Crimson is your XML parser you may as well use it. However, it seems to be less efficient (because Apache Crimson itself is inefficient), and later versions of JAXP, the Java XML Pack, and JWSDP do not directly support Apache Crimson, so the method is not universal. The third method is to use the JAXP XSLT engine (the Transformer class) to output XML documents. This is arguably the standard method; it is flexible, especially because the output format can be controlled freely, and it is the method we recommend. The fourth method is a variant of the third. It uses the Xalan XML Serializer to introduce a serialization step, which is an advantage when modifying or outputting a large number of documents. Unfortunately, it is tedious to set both the output properties of the XSLT engine and those of the XML Serializer, and the method depends on the Apache Xalan and Apache Xerces technologies, so it is somewhat less general.
In addition to the four methods discussed above, there are many other ways to update XML documents using other APIs (such as JDOM, Castor, XML4J, and Oracle XML Parser V2); we will not discuss them here.