When JDOM is used to generate XML files, characters such as 0x0 and 0x8 are always reported as invalid. After some searching, the cause and solution turned out to be as follows:
Cause:
The characters that must be filtered out of XML content fall into two types. The first type is characters that are not allowed to appear in XML at all, because they fall outside the character range that XML defines. The second type is characters that XML itself uses as markup; if the content contains them, they must be replaced with other characters.
The first type of characters:
For the first type, the W3C XML specification states which characters may appear in an XML document. The allowed characters are "#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]", so any character outside this range can be filtered out. The characters to be filtered are \x00-\x08, \x0b-\x0c, and \x0e-\x1f.
The second type of characters:
For the second type, there are five characters in total:

Character                    HTML entity    Character code
ampersand (&)                &amp;          &#38;
single quotation mark (')    &apos;         &#39;
double quotation mark (")    &quot;         &#34;
greater-than sign (>)        &gt;           &#62;
less-than sign (<)           &lt;           &#60;

The solution is to find these characters with a regular expression and replace them.
Regular Expression: [<>&'"\x00-\x08\x0b-\x0c\x0e-\x1f]
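
As an illustration, the two steps described above could be sketched in Java roughly as follows. This is a minimal sketch, not the original author's code: the class name XmlTextCleaner and the methods stripInvalid and clean are hypothetical names chosen for this example, and only java.util.regex from the standard library is used. It assumes that invalid control characters should simply be dropped and that the five markup characters should be replaced by their named entities.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for this note; not part of JDOM.
final class XmlTextCleaner {
    // First type only: control characters that XML 1.0 does not allow.
    private static final Pattern INVALID =
            Pattern.compile("[\\x00-\\x08\\x0b-\\x0c\\x0e-\\x1f]");

    // The combined expression from the text: markup characters plus
    // the invalid control characters.
    private static final Pattern SPECIAL =
            Pattern.compile("[<>&'\"\\x00-\\x08\\x0b-\\x0c\\x0e-\\x1f]");

    /** Removes the invalid control characters and leaves everything else alone. */
    static String stripInvalid(String text) {
        return INVALID.matcher(text).replaceAll("");
    }

    /** Drops invalid control characters and escapes the five markup characters. */
    static String clean(String text) {
        Matcher m = SPECIAL.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = entityFor(m.group().charAt(0));
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Assumption: control characters map to the empty string (they are dropped).
    private static String entityFor(char c) {
        switch (c) {
            case '&':  return "&amp;";
            case '<':  return "&lt;";
            case '>':  return "&gt;";
            case '\'': return "&apos;";
            case '"':  return "&quot;";
            default:   return "";
        }
    }

    public static void main(String[] args) {
        System.out.println(clean("a\u0000b < c & \"d\"\u0008"));
    }
}
```

One design point to keep in mind: JDOM's outputter normally escapes markup characters such as & and < on its own when it serializes text, so if the string is passed to JDOM afterwards, stripInvalid alone is probably sufficient, and calling clean first would likely produce double-escaped output. The clean method is more suitable when the XML is assembled by hand as a string.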