DOM4J cannot save the XML file as UTF-8: Invalid byte 2 of 2-byte UTF-8 sequence

Source: Internet
Author: User
Tags xml parser
I recently started learning dom4j and found an introductory article online that got me going quickly. But I ran into a problem: I could not save an XML file as UTF-8. Reading the saved file back raised the error "Invalid byte 2 of 2-byte UTF-8 sequence." On inspection, the file generated by dom4j showed garbled Chinese in any editor that handles XML encodings correctly, while Notepad displayed the Chinese correctly. This gave me quite a headache. When I instead generated the XML with GBK or GB2312 as the encoding, it parsed normally. So I suspected that dom4j did not handle UTF-8 properly and started reading the dom4j source code. In the end, the problem turned out to be in my own program.
The code for creating a new XML document in the dom4j examples and in the popular dom4j tutorials online looks something like this:
import java.io.FileWriter;

import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
import org.dom4j.io.XMLWriter;

public void createXml(String fileName) {
    Document doc = DocumentHelper.createDocument();
    Element root = doc.addElement("book");
    root.addAttribute("name", "my book");

    Element childTmp = root.addElement("price");
    childTmp.setText("21.22");

    Element writer = root.addElement("author");
    writer.setText("Dick");
    writer.addAttribute("ID", "001");

    try {
        // Problem: FileWriter encodes with the platform default charset, not necessarily UTF-8.
        XMLWriter xmlWriter = new XMLWriter(new FileWriter(fileName));
        xmlWriter.write(doc);
        xmlWriter.close();
    } catch (Exception e) {
        System.out.println(e);
    }
}
The code above hands XMLWriter a FileWriter for the file output, and that is why the file is not encoded correctly. A FileWriter created this way always encodes with the platform default charset, and once dom4j is given a ready-made Writer it only passes characters along; it has no way to control how they become bytes. The file is therefore saved in the system default encoding, which for Java on Chinese-language Windows is GBK. In other words, although the XML declaration says the document is UTF-8, the file is actually saved as GBK. This is why XML files generated with GBK or GB2312 as the declared encoding parse correctly, while files declared as UTF-8 cannot be read by an XML parser.
Now that we know the cause, let's find a solution. First, let's look at how dom4j handles encoding:
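To make the mismatch concrete, here is a minimal sketch that uses only the JDK (no dom4j). The class name `EncodingMismatchDemo` and its helper methods are my own illustrative names, and it assumes the JDK in use ships the GBK charset. It encodes one Chinese character both ways and checks whether the resulting bytes are legal UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {

    /** Returns true if the bytes decode cleanly as UTF-8 (strict decoding). */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            // A fresh decoder reports malformed input instead of silently replacing it.
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Formats bytes as space-separated upper-case hex. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u4E2D"; // sample character, chosen for illustration
        byte[] gbk = text.getBytes(Charset.forName("GBK")); // assumes GBK is available
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);

        System.out.println("GBK bytes:   " + hex(gbk));   // D6 D0
        System.out.println("UTF-8 bytes: " + hex(utf8));  // E4 B8 AD
        System.out.println("GBK bytes valid as UTF-8?   " + isValidUtf8(gbk));
        System.out.println("UTF-8 bytes valid as UTF-8? " + isValidUtf8(utf8));
    }
}
```

The GBK bytes begin with 0xD6, which a UTF-8 decoder treats as the start of a 2-byte sequence, but the following 0xD0 is not a valid continuation byte. That is the kind of input that leads a parser to complain about byte 2 of a 2-byte UTF-8 sequence.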
public XMLWriter(OutputStream out) throws UnsupportedEncodingException {
    System.out.println("in OutputStream");
    this.format = DEFAULT_FORMAT;
    this.writer = createWriter(out, format.getEncoding());
    this.autoFlush = true;
    namespaceStack.push(Namespace.NO_NAMESPACE);
}

public XMLWriter(OutputStream out, OutputFormat format) throws UnsupportedEncodingException {
    System.out.println("in OutputStream,OutputFormat");
    this.format = format;
    this.writer = createWriter(out, format.getEncoding());
    this.autoFlush = true;
    namespaceStack.push(Namespace.NO_NAMESPACE);
}

/**
 * Get an OutputStreamWriter, use preferred encoding.
 */
protected Writer createWriter(OutputStream outStream, String encoding) throws UnsupportedEncodingException {
    return new BufferedWriter(
        new OutputStreamWriter(outStream, encoding)
    );
}
From the code above we can see that dom4j does nothing complicated with encoding; it relies entirely on Java's own facilities. So when generating an XML file with dom4j, we should not hand XMLWriter a Writer object directly (unless that Writer was itself created with the right encoding); we should construct it from an OutputStream subclass object instead. That is, in our code above, we should use a FileOutputStream rather than a FileWriter, so the code becomes:
import java.io.FileOutputStream;

import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
import org.dom4j.io.XMLWriter;

public void createXml(String fileName) {
    Document doc = DocumentHelper.createDocument();
    Element root = doc.addElement("book");
    root.addAttribute("name", "my book");

    Element childTmp = root.addElement("price");
    childTmp.setText("21.22");

    Element writer = root.addElement("author");
    writer.setText("Dick");
    writer.addAttribute("ID", "001");

    try {
        // Note the change here: FileOutputStream instead of FileWriter,
        // so XMLWriter creates its own writer with the format's encoding.
        XMLWriter xmlWriter = new XMLWriter(new FileOutputStream(fileName));
        xmlWriter.write(doc);
        xmlWriter.close();
    } catch (Exception e) {
        System.out.println(e);
    }
}
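The same effect can be reproduced without dom4j, which may help when checking your own environment. The sketch below is only an illustration under my own assumptions: the class `XmlRoundTripDemo`, its methods, and the sample element values are hypothetical, and it again assumes the GBK charset is available. It writes a small XML document whose declaration says UTF-8 through an OutputStreamWriter with a chosen charset, then reads it back with the JDK's built-in parser.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class XmlRoundTripDemo {

    /** Writes XML that declares UTF-8, but encodes the bytes with the given charset. */
    static byte[] writeXml(String actualCharset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, Charset.forName(actualCharset))) {
            w.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
            // Sample data with Chinese text, chosen for illustration.
            w.write("<book name=\"\u4E2D\u6587\"><author>\u5F20\u4E09</author></book>");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Parses the bytes and returns the author text, or null if parsing fails. */
    static String readAuthor(byte[] xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml));
            return doc.getElementsByTagName("author").item(0).getTextContent();
        } catch (Exception e) {
            System.out.println("parse failed: " + e.getMessage());
            return null;
        }
    }

    public static void main(String[] args) {
        // Bytes match the declaration: the document parses.
        System.out.println("UTF-8 bytes -> " + readAuthor(writeXml("UTF-8")));
        // Bytes are GBK but the declaration says UTF-8: the parser rejects them.
        System.out.println("GBK bytes   -> " + readAuthor(writeXml("GBK")));
    }
}
```

Writing with "GBK" here plays the role of FileWriter on a system whose default charset is GBK, and writing with "UTF-8" plays the role of the writer that XMLWriter builds for itself when it is given an OutputStream.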

That settles this dom4j encoding problem. I hope this article is useful to others.


Original: http://hi.baidu.com/hxzon/item/b71490893fba92c799255f45
