When JDOM is used to read data from a database, write it out as an XML file, and then parse that file, the parser keeps failing on invalid characters such as 0x0. After some searching, I found the cause and the solution, described below.
Cause:
The characters that have to be filtered out of XML content fall into two types. The first type consists of characters that are not allowed in XML at all, because they lie outside the character range that XML defines. The second type consists of characters that XML uses for its own syntax; if the content contains them, they must be replaced with other characters.

The first type: the W3C XML specification lists which characters may appear in an XML document. The allowed characters are "#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]", so anything outside this range can be filtered out. The range of characters to filter is:
\\x00-\\x08
\\x0b-\\x0c
\\x0e-\\x1f

The second type: there are five such characters in total:
Character                   HTML entity   Character code
and (&)                     &amp;         &#38;
single quotation mark (')   &apos;        &#39;
double quotation mark (")   &quot;        &#34;
greater than (>)            &gt;          &#62;
less than (<)               &lt;          &#60;

Solution:
Only these characters need to be replaced to solve the problem, and this can be done with a regular expression through str.replaceAll(regex, "").
A first attempt, str.replaceAll("[<>&'\"\\x00-\\x08\\x0b-\\x0c\\x0e-\\x1f]", ""), did not work.
Another expression, ^(?:[\u4e00-\u9fa5]*\w*\s*)+$, also failed.
str.replaceAll("[\\x00-\\x08\\x0b-\\x0c\\x0e-\\x1f]", "") tested successfully.
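The two steps described above can be sketched in Java as follows. This is a minimal illustration, not the original author's code: the class name XmlSanitizer and the method names stripInvalidXmlChars and escapeXml are hypothetical. The first method applies the control-character expression that tested successfully. The second replaces the five reserved characters with their entities rather than deleting them, which is an assumption about the intended behavior. The sketch covers only the control-character range discussed here; the XML 1.0 rules also exclude U+FFFE, U+FFFF, and unpaired surrogates, which are not handled.

```java
import java.util.regex.Pattern;

final class XmlSanitizer {

    // Control characters that XML 1.0 does not allow: everything below 0x20
    // except tab (0x09), line feed (0x0A) and carriage return (0x0D).
    private static final Pattern INVALID_XML_CHARS =
            Pattern.compile("[\\x00-\\x08\\x0b\\x0c\\x0e-\\x1f]");

    private XmlSanitizer() {
        // static helpers only
    }

    // Removes the disallowed control characters (first type of character).
    static String stripInvalidXmlChars(String input) {
        if (input == null) {
            return null;
        }
        return INVALID_XML_CHARS.matcher(input).replaceAll("");
    }

    // Replaces the five reserved characters (second type) with their entities.
    static String escapeXml(String input) {
        if (input == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '\'': out.append("&apos;"); break;
                case '"':  out.append("&quot;"); break;
                case '>':  out.append("&gt;");   break;
                case '<':  out.append("&lt;");   break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }
}
```

A value read from the database would typically go through stripInvalidXmlChars before it is handed to JDOM. escapeXml is only needed when the XML text is assembled by hand, and it should be applied once, since running it twice would escape the ampersands of the entities again.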