An encoding problem in dom4j, from the "Chinese problem is not negotiable" series

Source: Internet
Author: User
This article is about a Chinese-character problem that appears when saving a document with dom4j. Like the earlier "80 ago" article, it has nothing to do with the Spring project, so "Spring fans" are kindly asked not to be bothered by it. Criticism and comments on any shortcomings in the text are welcome.
dom4j is a good open source Java project for XML parsing. It supports DOM, SAX and JAXP, and provides strong support for the XPath query language. For this reason, many open source projects of the EasyJF team, such as EasyJWeb and EasyDBO, use dom4j for their XML file operations.

1. Load a DOM into memory from an XML file
FileInputStream in = new FileInputStream(new File(fileName));
SAXReader reader = new SAXReader();
Document doc = reader.read(in);

2. Write the data in the DOM to an XML file
With dom4j, it is easy to write the data in a DOM to a file, using the following method:
public void write(Writer writer) throws IOException;
Therefore, if we want to write a document to the file C:/test.xml, we can simply use the following code:
java.io.Writer wr = new java.io.FileWriter(fileName);
doc.write(wr);
wr.close(); // Note: close() must be called for the data to actually be written
  
This is also the very simple approach that dom4j recommends. However, when the DOM contains Chinese character data, the XML document written this way cannot be opened normally. Reading it back prompts an error similar to the following:
org.dom4j.DocumentException: Invalid byte 1 of 1-byte UTF-8 sequence (0xb2) Nested exception: Invalid byte 1 of 1-byte UTF-8 sequence (0xb2)
    at org.dom4j.io.SAXReader.read(SAXReader.java:484)
    at org.dom4j.io.SAXReader.read(SAXReader.java:343)
If we look at the encoding of the generated XML file, the content is declared as UTF-8, but the file itself is actually in ANSI format, as the figure in the original article showed.
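The 0xb2 byte in the error message is what one would expect when a parser is told to read UTF-8 but finds bytes from another encoding. The following is a minimal JDK-only sketch of that check, not code from the original article. It assumes, purely for illustration, that the text was 测试 ("test"), whose GBK bytes are B2 E2 CA D4. The class name Utf8Check and the method isValidUtf8 are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of dom4j.
final class Utf8Check {

    // Returns true only if the bytes are well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed sample text: the two characters U+6D4B U+8BD5, as a GBK writer stores them.
        byte[] gbkBytes = {(byte) 0xB2, (byte) 0xE2, (byte) 0xCA, (byte) 0xD4};
        // The same two characters encoded as UTF-8 (escapes avoid source-encoding issues).
        byte[] utf8Bytes = "\u6d4b\u8bd5".getBytes(StandardCharsets.UTF_8);

        System.out.println(isValidUtf8(gbkBytes));  // false: 0xB2 cannot start a UTF-8 sequence
        System.out.println(isValidUtf8(utf8Bytes)); // true
    }
}
```

A strict decoder rejects the first byte immediately, which matches the "invalid byte 1 of 1-byte UTF-8 sequence" wording of the parser error.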
Cause analysis:
FileWriter always writes with the platform's default encoding, which is the ANSI code page on Windows (for example GBK on Simplified Chinese Windows), while the content produced by the write method in dom4j is declared as UTF-8. The bytes in the file therefore do not match the declared encoding, and an XML file that contains Chinese characters cannot be read back properly.
Workaround:
Instead of a plain FileWriter, use a Writer that lets you specify the output encoding. In the JDK's java.io package, OutputStreamWriter can specify the output encoding.
The correct code is as follows:
java.io.OutputStream out = new java.io.FileOutputStream(fileName);
java.io.Writer wr = new java.io.OutputStreamWriter(out, "UTF-8");
doc.write(wr);
wr.close();
out.close();
To simplify, this can also be written as:
java.io.Writer wr = new java.io.OutputStreamWriter(new java.io.FileOutputStream(fileName), "UTF-8");
doc.write(wr);
wr.close();
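The same idea can be wrapped in a small helper so that the writer is always closed, even when writing fails. This is only a sketch under stated assumptions: the class Utf8XmlFiles, the interface WriterAction and the method save are hypothetical names introduced here, and the example uses only the JDK so that it runs without dom4j on the classpath.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration; not part of dom4j.
final class Utf8XmlFiles {

    // Anything that can write itself to a Writer and may throw IOException.
    interface WriterAction {
        void writeTo(Writer writer) throws IOException;
    }

    // Opens the target with an explicit UTF-8 encoder and closes it in all cases.
    static void save(Path target, WriterAction action) throws IOException {
        try (Writer wr = new OutputStreamWriter(
                Files.newOutputStream(target), StandardCharsets.UTF_8)) {
            action.writeTo(wr);
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("test", ".xml");
        // Sample content with two Chinese characters, written as Unicode escapes.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root>\u6d4b\u8bd5</root>";

        save(target, wr -> wr.write(xml));

        // Reading the bytes back as UTF-8 returns the original text.
        String readBack = new String(Files.readAllBytes(target), StandardCharsets.UTF_8);
        System.out.println(readBack.equals(xml)); // true
        Files.delete(target);
    }
}
```

With dom4j available, a call such as Utf8XmlFiles.save(Paths.get(fileName), doc::write) should fit the same shape, since the write method shown earlier takes a Writer and throws IOException.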
Summary:
Most of the best foundational open source projects are developed outside China, so they are rarely tested on Chinese-language platforms and their test data rarely contains Chinese text. As a result, even if we follow the official documentation and user guides of these projects, we can still run into many unexpected errors. This was my original motivation for taking part in forming the EasyJF open source team, advocating domestic open source, and developing basic open source frameworks such as EasyJWeb and EasyDBO.
Of course, the Chinese problem raised here is one that leaves no room for negotiation: it has to be handled correctly through some less common processing. It therefore also belongs to the "Chinese problem is not negotiable" series.
(Note: The author of this article is Big Gorge of the EasyJF open source team. When reprinting, please keep this author statement.)

