Introduction to four common methods for updating XML in Java

Source: Internet
Author: User

This article briefly discusses four common methods for updating XML documents in Java programs and analyzes the advantages and disadvantages of each. It also discusses how to control the format of the XML documents that a Java program outputs.

JAXP stands for Java API for XML Processing. JAXP supports DOM, SAX, XSLT, and other standards. To make JAXP more flexible, its designers gave it a Pluggability Layer. With the Pluggability Layer, JAXP can work with any XML parser that implements the DOM and SAX APIs (such as Apache Xerces) and with any XSLT processor that implements the XSLT standard (such as Apache Xalan). The advantage of the Pluggability Layer is that we only need to be familiar with the JAXP programming interfaces; we do not need a deep understanding of the specific XML parser or XSLT processor in use. For example, suppose a Java program uses JAXP with the Apache Crimson parser to process XML documents. If we want to switch to another parser, such as Apache Xerces, to improve performance, the program code may not need to change at all: we only add the jar file containing Apache Xerces to the CLASSPATH environment variable and remove the jar file containing Apache Crimson from it.
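As a rough illustration of the Pluggability Layer, the following sketch asks JAXP which parser and XSLT processor it selected at run time. The class name PluggabilityDemo is made up for this example, and the names printed depend entirely on the JDK and the jars on the CLASSPATH.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.TransformerFactory;

// Hypothetical demo class; not part of the article's attachments.
class PluggabilityDemo {

    /** Returns the class name of the XML parser implementation JAXP selected. */
    static String parserImplementation() throws ParserConfigurationException {
        // newInstance() chooses the implementation at run time, for example
        // from the javax.xml.parsers.DocumentBuilderFactory system property or
        // a provider found on the classpath, falling back to the platform default.
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .getClass()
                .getName();
    }

    /** Returns the class name of the XSLT processor factory JAXP selected. */
    static String xsltImplementation() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) throws ParserConfigurationException {
        System.out.println("XML parser:     " + parserImplementation());
        System.out.println("XSLT processor: " + xsltImplementation());
    }
}
```

The program itself never names Crimson, Xerces, or Xalan, which is why swapping jars does not require a code change.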

JAXP is now widely used and can fairly be called the standard API for processing XML documents in Java. Beginners learning JAXP often ask the following question: my program updates the DOM Tree, but when it exits the original XML document has not changed; it is still the old one. How can the original XML document be kept in sync with the DOM Tree? At first glance JAXP does not seem to provide an interface, method, or class for this, which confuses many beginners. This article addresses that problem and briefly introduces several common methods for synchronizing the original XML document with the DOM Tree. To narrow the scope of the discussion, the only XML parsers covered here are Apache Crimson and Apache Xerces, and the only XSLT processor is Apache Xalan.
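The beginner's problem can be reproduced with a few lines of standard JAXP. This is only a sketch: the class name InMemoryOnlyDemo and the temporary file are assumptions made for the example. It parses a file, changes the DOM Tree, and then reads the file back to show that it is unchanged.

```java
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class; not part of the article's attachments.
class InMemoryOnlyDemo {

    /** Parses xml from a temporary file, edits the DOM Tree, and returns the file content afterwards. */
    static String fileContentAfterDomEdit(String xml) throws Exception {
        File file = File.createTempFile("user", ".xml");
        try {
            Files.write(file.toPath(), xml.getBytes(StandardCharsets.UTF_8));

            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(file);

            // This changes only the tree held in memory.
            doc.getDocumentElement().appendChild(doc.createElement("fancy"));

            // Nothing has written the tree back, so the file still has its old content.
            return new String(Files.readAllBytes(file.toPath()), StandardCharsets.UTF_8);
        } finally {
            file.delete();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fileContentAfterDomEdit("<users><user/></users>"));
    }
}
```

The DOM Tree is a copy of the document in memory; an explicit output step, such as one of the four methods below, is needed to write it back.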

Method 1: directly read and write XML documents

This is perhaps the crudest and most primitive method. After the program obtains the DOM Tree, it updates the tree using the methods of the DOM Node interface. The next step is to update the original XML document. We can traverse the entire DOM Tree with a recursive method or with the TreeWalker class, writing every node of the tree in sequence to the original XML document, which has been opened for writing beforehand. Once the traversal is complete, the DOM Tree and the original XML document are in sync. In practice this method is rarely used, but it may still be useful if you want to write your own XML parser.
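The recursive traversal described above might look like the following minimal sketch. The class name RecursiveDomWriter is hypothetical. It handles only document, element, attribute, and text nodes, and it writes to a StringBuilder rather than to the original file; a real implementation would also need comments, CDATA sections, processing instructions, the DOCTYPE, and namespaces.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class; a simplified hand-written serializer.
class RecursiveDomWriter {

    /** Writes node and, recursively, all of its descendants to out. */
    static void write(Node node, StringBuilder out) {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE:
                out.append("<?xml version=\"1.0\"?>");
                writeChildren(node, out);
                break;
            case Node.ELEMENT_NODE:
                out.append('<').append(node.getNodeName());
                NamedNodeMap attributes = node.getAttributes();
                for (int i = 0; i < attributes.getLength(); i++) {
                    Node attribute = attributes.item(i);
                    out.append(' ').append(attribute.getNodeName())
                       .append("=\"").append(escape(attribute.getNodeValue())).append('"');
                }
                if (node.hasChildNodes()) {
                    out.append('>');
                    writeChildren(node, out);
                    out.append("</").append(node.getNodeName()).append('>');
                } else {
                    out.append("/>");
                }
                break;
            case Node.TEXT_NODE:
                out.append(escape(node.getNodeValue()));
                break;
            default:
                // Other node types are skipped in this sketch.
                break;
        }
    }

    private static void writeChildren(Node parent, StringBuilder out) {
        for (Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
            write(child, out);
        }
    }

    private static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;")
                   .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Parses xml, appends an empty element to the root, and serializes the tree by hand. */
    static String appendAndWrite(String xml, String elementName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        doc.getDocumentElement().appendChild(doc.createElement(elementName));
        StringBuilder out = new StringBuilder();
        write(doc, out);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(appendAndWrite("<users><user id=\"1\">Tom</user></users>", "fancy"));
    }
}
```

Even this small version has to deal with escaping and empty elements by hand, which illustrates why the method is error-prone.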

Method 2: Use the XmlDocument class

Use the XmlDocument class? JAXP clearly has no such class. Has the author made a mistake? No. We do use the XmlDocument class, and more precisely its write() method.

As mentioned above, JAXP can be used with a variety of XML parsers. The parser chosen this time is Apache Crimson. XmlDocument (org.apache.crimson.tree.XmlDocument) is an Apache Crimson class and is not part of standard JAXP, which is why it cannot be found in the JAXP documentation. So how can the XmlDocument class be used to update XML documents? The XmlDocument class provides the following three write() methods (according to the latest version of Crimson, Apache Crimson 1.1.3):

  public void write(OutputStream out) throws IOException
  public void write(Writer out) throws IOException
  public void write(Writer out, String encoding) throws IOException

These three write() methods output the content of the DOM Tree to a specific output medium, such as a file output stream or the application console. So how are they used? See the following Java code snippet:

        String name = "fancy";
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        try {
            DocumentBuilder parser = factory.newDocumentBuilder();
            // Obtain the complete DOM Tree.
            Document doc = parser.parse("user.xml");
            // Append a new node to the end of the DOM Tree.
            Element newlink = doc.createElement(name);
            doc.getDocumentElement().appendChild(newlink);
            // Document defines no write() method, so cast to Crimson's XmlDocument.
            ((XmlDocument) doc).write(new FileOutputStream(new File("xuser.xml")));
        } catch (Exception e) {
            e.printStackTrace();
        }

In the code above, a Document object doc is first created to hold the complete DOM Tree. The appendChild() method of the Node interface then appends a new node (fancy) to the end of the DOM Tree. Finally, the write(OutputStream out) method of the XmlDocument class outputs the content of the DOM Tree to xuser.xml. (It could equally be output to user.xml, which would update the original XML document; this example writes to xuser.xml to make comparison easier.) Note that write() cannot be called directly on the Document object doc, because the JAXP Document interface does not define a write() method; doc must first be cast from Document to XmlDocument. The write(OutputStream out) method used above outputs the content of the DOM Tree in the default UTF-8 encoding. If the DOM Tree contains Chinese characters, the output may be garbled, which is the so-called Chinese character problem. The solution is to use the write(Writer out, String encoding) method and specify the output encoding explicitly. For example, with the second parameter set to GB2312, the Chinese character problem does not occur and the output displays Chinese characters correctly.

For a complete example, see the following files: AddRecord.java (see attachment) and user.xml (see attachment). The running environment of this example is Windows XP Professional and JDK 1.3.1. To compile and run AddRecord.java, go to http://xml.apache.org/dist/crimson/ to download Apache Crimson, and add the crimson.jar file to the CLASSPATH environment variable.

Note:

Apache Crimson was formerly the Sun Project X Parser, which at some point became Apache Crimson. Much of the Apache Crimson code was carried over directly from X Parser. For example, the XmlDocument class used above was com.sun.xml.XmlDocument in X Parser and became org.apache.crimson.tree.XmlDocument in Apache Crimson. Most of the code is identical; the differences are probably limited to the package statement, the import statements, and a license at the top of each file. Early versions of JAXP were bundled with X Parser, so some old programs use the com.sun.xml packages, which is most likely why they may fail to compile today. Later versions of JAXP, such as JAXP 1.1, were bundled with Apache Crimson. If you use JAXP 1.1, you do not need to download Apache Crimson separately to compile and run the example above (AddRecord.java). The latest JAXP 1.2 EA (Early Access) changes course and uses the better-performing Apache Xalan and Apache Xerces as its XSLT processor and XML parser respectively; it does not directly support Apache Crimson. Therefore, if your development environment uses JAXP 1.2 EA or the Java XML Pack (which contains JAXP 1.2 EA), you cannot compile and run the example above (AddRecord.java) directly; you need to download and install Apache Crimson.

Method 3: Use the TransformerFactory and Transformer classes

The standard method provided in JAXP to update the original XML document is to call the XSLT engine, that is, to use the TransformerFactory and Transformer classes. See the following Java code snippet:

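        // doc is the Document object that holds the modified DOM Tree.
        DOMSource doms = new DOMSource(doc);
        // The output medium for the DOM Tree content is an XML file.
        File f = new File("XMLOutput.xml");
        StreamResult sr = new StreamResult(f);
        try {
            TransformerFactory tf = TransformerFactory.newInstance();
            Transformer t = tf.newTransformer();
            // The input of the XSLT engine is the DOMSource, the output is the StreamResult.
            t.transform(doms, sr);
        } catch (TransformerException te) {
            te.printStackTrace();
        }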

In practical applications, we can use the traditional DOM API to obtain the DOM Tree from the XML document, perform whatever operations on the DOM Tree the requirements call for, and so arrive at the final Document object. We then create a DOMSource object from this Document object. The rest is simply a copy of the code above. After the program runs, XMLOutput.xml contains the result you need. (You can, of course, change the argument of the StreamResult constructor to specify a different output medium; it does not have to be an XML file.)
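Put together, the whole cycle of Method 3 can be sketched with standard JAXP classes only. The class name TransformerUpdateDemo, the method appendAndSave, and the temporary files are assumptions made for this example. Unlike the snippet above, this version passes an explicit output stream to StreamResult so that the file is closed deterministically.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// Hypothetical demo class; not the article's AddRecord2.java.
class TransformerUpdateDemo {

    /** Parses in, appends an empty element to the root, and writes the tree to out. */
    static void appendAndSave(File in, File out, String elementName) throws Exception {
        // 1. Obtain the DOM Tree with the ordinary DOM API.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);

        // 2. Modify the DOM Tree.
        doc.getDocumentElement().appendChild(doc.createElement(elementName));

        // 3. A Transformer created without a stylesheet performs an identity
        //    transformation, copying the DOMSource to the result unchanged.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        try (OutputStream stream = new FileOutputStream(out)) {
            transformer.transform(new DOMSource(doc), new StreamResult(stream));
        }
    }

    public static void main(String[] args) throws Exception {
        File in = File.createTempFile("user", ".xml");
        File out = File.createTempFile("XMLOutput", ".xml");
        Files.write(in.toPath(), "<users><user/></users>".getBytes(StandardCharsets.UTF_8));
        appendAndSave(in, out, "fancy");
        System.out.println(new String(Files.readAllBytes(out.toPath()), StandardCharsets.UTF_8));
    }
}
```

Passing the same file as both in and out would update the original document in place, because the input is fully parsed before the output stream is opened.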

The biggest advantage of this method is that you can control, as you like, the format in which the content of the DOM Tree is written to the output medium. The TransformerFactory and Transformer classes cannot do this alone, however; they need the help of the OutputKeys class. For a complete example, see the following files: AddRecord2.java (see attachment) and user.xml (see attachment). The running environment of this example is Windows XP Professional and JDK 1.3.1. To compile and run AddRecord2.java, go to http://java.sun.com/ to download and install JAXP 1.1 or the Java XML Pack (the Java XML Pack already contains JAXP).

OutputKeys class

The javax.xml.transform.OutputKeys class can be used together with the java.util.Properties class to control the format of the XML documents output by the JAXP XSLT engine (the Transformer class). See the following code snippet:

        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer t = tf.newTransformer();
        Properties properties = t.getOutputProperties();
        properties.setProperty(OutputKeys.ENCODING, "GB2312");
        t.setOutputProperties(properties);
        t.transform(DOMSource_Object, StreamResult_Object);

As the code above shows, setting the output properties of the XSLT engine (the Transformer class) lets us control the output format of the content of the DOM Tree, which is very helpful for customizing the output. So which output properties can be set for the JAXP XSLT engine (the Transformer class)? The javax.xml.transform.OutputKeys class defines a number of string constants, each of which is an output property that can be set. The common ones are as follows:

public static final java.lang.String METHOD

Sets the output method. It can be set to xml, html, or text.

public static final java.lang.String VERSION

The version number the output conforms to. If METHOD is set to xml, VERSION should be set to 1.0; if METHOD is set to html, VERSION should be set to 4.0; if METHOD is set to text, this output property is ignored.

public static final java.lang.String ENCODING

Sets the encoding used for output, such as GB2312 or UTF-8. Setting it to GB2312 solves the so-called Chinese character problem.

public static final java.lang.String OMIT_XML_DECLARATION

Sets whether to omit the XML declaration when writing the XML document, that is, a line such as:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

The possible values are yes and no.

public static final java.lang.String INDENT

INDENT sets whether the XSLT engine may add extra whitespace when outputting the XML document. The possible values are yes and no.

public static final java.lang.String MEDIA_TYPE

MEDIA_TYPE specifies the MIME type of the output document.
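To see what individual output properties do, the following sketch serializes the same small document with a single property changed each time. The class name OutputKeysDemo and the helper render are invented for this example. For brevity it uses the setOutputProperty() method of the Transformer class, which sets one property at a time, instead of a Properties object; the exact whitespace of the output depends on the XSLT processor in use.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the article's attachments.
class OutputKeysDemo {

    /** Serializes xml with one output property set to the given value. */
    static String render(String xml, String key, String value) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(key, value);
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<users><user>Tom</user></users>";
        // With the declaration omitted, only the element content is written.
        System.out.println(render(xml, OutputKeys.OMIT_XML_DECLARATION, "yes"));
        // The text method drops all markup and keeps only character data.
        System.out.println(render(xml, OutputKeys.METHOD, "text"));
        // Indentation adds line breaks and, depending on the processor, spaces.
        System.out.println(render(xml, OutputKeys.INDENT, "yes"));
    }
}
```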

So how do we set the output properties of the XSLT engine? The steps are summarized below.

First, obtain the set of default output properties of the XSLT engine (the Transformer class) by calling the getOutputProperties() method of the Transformer class. The return value is a java.util.Properties object.

Properties properties = transformer.getOutputProperties();

Then set new output properties, for example:

properties.setProperty(OutputKeys.ENCODING, "GB2312");

properties.setProperty(OutputKeys.METHOD, "html");

properties.setProperty(OutputKeys.VERSION, "4.0");

...

Finally, update the output properties of the XSLT engine (the Transformer class) by calling the setOutputProperties() method of the Transformer class. Its parameter is a java.util.Properties object.
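The three steps can be combined into one small, self-contained sketch. The class name OutputPropertiesSteps and the method serialize are assumptions for this example. Whether a given encoding name such as GB2312 is available depends on the Java runtime, so the encoding is passed in as a parameter.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.util.Properties;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; not the article's AddRecord3.java.
class OutputPropertiesSteps {

    /** Serializes xml in the given encoding and returns the bytes decoded with that same encoding. */
    static String serialize(String xml, String encoding) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Transformer transformer = TransformerFactory.newInstance().newTransformer();

        // Step 1: obtain the default output properties.
        Properties properties = transformer.getOutputProperties();

        // Step 2: set new output properties.
        properties.setProperty(OutputKeys.ENCODING, encoding);
        properties.setProperty(OutputKeys.INDENT, "yes");

        // Step 3: hand the updated properties back to the Transformer.
        transformer.setOutputProperties(properties);

        // Write to a byte stream so that the requested encoding is really applied.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        transformer.transform(new DOMSource(doc), new StreamResult(bytes));
        return bytes.toString(encoding);
    }

    public static void main(String[] args) throws Exception {
        // \u4e2d\u6587 is a two-character Chinese string, written as escapes
        // so that the source file itself stays ASCII.
        System.out.println(serialize("<users><user>\u4e2d\u6587</user></users>", "GB2312"));
    }
}
```

Note that the encoding only takes effect when the result is a byte stream or a file; with a Writer, the Writer's own character encoding determines the bytes that are produced.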

We have written a new program that applies the OutputKeys class to control the output properties of the XSLT engine. Its structure is roughly the same as that of the previous program (AddRecord2.java), but the output is slightly different. For the complete code, see the following files: AddRecord3.java (see attachment) and user.xml (see attachment). The running environment of this example is Windows XP Professional and JDK 1.3.1. To compile and run AddRecord3.java, go to http://java.sun.com/ to download and install JAXP 1.1 or the Java XML Pack (the Java XML Pack contains JAXP).

Method 4: Use Xalan XML Serializer

Method 4 is actually a variant of method 3. It can run only with the support of Apache Xalan and Apache Xerces. The sample code is as follows:

        DOMSource domSource = ...
        DOMResult domResult = ...
        TransformerFactory tf = ...
        Properties properties = ...
        Serializer serializer = ...
        Properties prop = ...
        File f = ...
        FileOutputStream fos = ...

This method is not very common and seems somewhat redundant, so we will not discuss it in detail. For a complete example, see the following files: AddRecord4.java (see attachment) and user.xml (see attachment). The running environment of this example is Windows XP Professional and JDK 1.3.1. To compile and run AddRecord4.java, go to http://xml.apache.org/dist/ to download and install Apache Xalan and Apache Xerces.

Alternatively, go to http://java.sun.com/xml/download.html to download and install the Java XML Pack, because the latest Java XML Pack (Winter 01) includes the Apache Xalan and Apache Xerces technologies.
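The first half of Method 4, transforming a DOMSource into a DOMResult, uses only standard JAXP classes and can be sketched on its own. The class name DomResultDemo is hypothetical, and the Xalan-specific serializer step is deliberately left out because it depends on Xalan classes that are not part of standard JAXP.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; covers only the JAXP part of Method 4.
class DomResultDemo {

    /** Copies the parsed tree into a DOMResult and returns the root element name of the copy. */
    static String copiedRootName(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        DOMSource domSource = new DOMSource(doc);
        // An empty DOMResult lets the Transformer create the result document itself.
        DOMResult domResult = new DOMResult();

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.transform(domSource, domResult);

        // The result node is a new Document that a serializer could then write out.
        Document copy = (Document) domResult.getNode();
        return copy.getDocumentElement().getNodeName();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(copiedRootName("<users><user/></users>"));
    }
}
```

In Method 4, the node held by the DOMResult is what gets handed to the Xalan XML Serializer for output.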

Conclusion:

This article has briefly discussed four methods for updating XML documents in Java programs. The first method is to read and write the XML file directly. It is cumbersome and error-prone, and is rarely used unless you need to develop your own XML parser. The second method is to use the XmlDocument class of Apache Crimson. It is extremely simple and easy to use, and worth considering if Apache Crimson is your XML parser. However, it seems to be less efficient (because Apache Crimson itself is inefficient), and later versions of JAXP, the Java XML Pack, and JWSDP do not directly support Apache Crimson, so the method is not universal. The third method is to use the JAXP XSLT engine (the Transformer class) to output XML documents. This is arguably the standard method; it is flexible, especially in that the output format can be controlled freely, and it is the one we recommend. The fourth method is a variant of the third. It uses the Xalan XML Serializer to serialize the output, which is an advantage when modifying or outputting a large number of documents. Unfortunately, the properties of the XSLT engine and the output properties of the XML Serializer both have to be set, which is troublesome, and the method depends on the Apache Xalan and Apache Xerces technologies, so it is somewhat less general.

In addition to the four methods discussed above, there are many other ways to update XML documents using other APIs (such as JDOM, Castor, XML4J, and Oracle XML Parser V2), which we will not discuss here.
