Invalid characters in XML

Source: Internet
Author: User

When JDOM is used to read data from a database and generate an XML file, parsing that file afterwards keeps failing with an invalid character error for 0x0. After some searching, the cause and solution turned out to be as follows:

 

Cause:

The characters that need to be filtered in XML fall into two classes. The first class consists of characters that are not allowed in XML at all, because they lie outside the range the XML specification defines. The second class consists of characters that XML itself uses as markup; if the content contains them, they must be replaced with escape sequences.

First class: the W3C XML specification states which characters may appear in an XML document. The allowed characters are:

#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

Any character outside this range can therefore be filtered out. The ranges to be filtered are:

\x00-\x08   \x0b-\x0c   \x0e-\x1f

Second class: there are five such characters in total:

Character            Entity     Character encoding
ampersand (&)        &amp;      &#38;
single quote (')     &apos;     &#39;
double quote (")     &quot;     &#34;
greater than (>)     &gt;       &#62;
less than (<)        &lt;       &#60;

Replacing these five characters is enough to solve the problem for the second class.

Filtering with regular expressions, using str.replaceAll(regex, ""):

str.replaceAll("[<>&'\"\\x00-\\x08\\x0b-\\x0c\\x0e-\\x1f]", "")   this expression did not pass.
Another expression, [^(?:[\u4e00-\u9fa5]*\w*\s*)+$, also failed.
str.replaceAll("[\\x00-\\x08\\x0b-\\x0c\\x0e-\\x1f]", "")   test successful.
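The two steps described above can be sketched in Java roughly as follows. This is a minimal illustration and not the original author's code: the class name XmlTextCleaner and the method names stripInvalid and escape are hypothetical. It removes only the control-character ranges listed above (it does not handle other disallowed code points such as #xFFFE, #xFFFF, or unpaired surrogates), and it assumes the text is escaped by hand; an XML library that escapes text on output would make the second step unnecessary.

```java
import java.util.regex.Pattern;

public class XmlTextCleaner {
    // Control characters that XML 1.0 does not allow: everything below 0x20
    // except tab (0x09), line feed (0x0A) and carriage return (0x0D).
    private static final Pattern INVALID_CONTROL =
            Pattern.compile("[\\x00-\\x08\\x0b\\x0c\\x0e-\\x1f]");

    // First class: drop characters that may not appear in XML at all.
    public static String stripInvalid(String s) {
        return INVALID_CONTROL.matcher(s).replaceAll("");
    }

    // Second class: replace the five markup characters with their entities.
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample input containing a NUL, a vertical tab and markup characters.
        String raw = "a\u0000b <c> & \"d\"\u000B";
        System.out.println(escape(stripInvalid(raw)));
    }
}
```

Stripping first and escaping second keeps the two concerns separate: the forbidden control characters are deleted outright, while the five markup characters are preserved in escaped form rather than removed.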
