XML read exception: Invalid byte 1 of 1-byte UTF-8 sequence
Put simply, this error usually occurs when you parse XML that someone else produced: the file claims to be UTF-8 in its XML declaration, but it was not actually saved in the UTF-8 character encoding.
On the Chinese version of Windows, Java uses GBK as its default encoding (JDK 18 and later default to UTF-8 instead). So even though the XML declaration says the file is UTF-8, a file written without an explicit encoding is actually saved as GBK. That is why XML files declared as GBK or GB2312 are parsed correctly, while files declared as UTF-8 cannot be parsed: the XML parser tries to decode GBK bytes as UTF-8.
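The mismatch described above can be reproduced without any files. The following is a minimal sketch, not the original program: it uses the JDK's built-in DocumentBuilder instead of dom4j so that it runs without extra dependencies, it assumes the JDK ships the GBK charset, and the class and method names (EncodingMismatchDemo, encode, parses) are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Hypothetical demo class; not part of the original code.
class EncodingMismatchDemo {

    // Builds a tiny document whose declaration names `declared`,
    // then encodes the text with `actual`, which may be a different charset.
    static byte[] encode(String declared, String actual) {
        String xml = "<?xml version=\"1.0\" encoding=\"" + declared + "\"?>"
                + "<name>\u4e2d\u6587</name>";
        return xml.getBytes(Charset.forName(actual));
    }

    // Returns true if the JDK parser accepts the bytes,
    // false if it reports a parse or decoding error.
    static boolean parses(byte[] bytes) {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("declared UTF-8, bytes GBK:   " + parses(encode("UTF-8", "GBK")));
        System.out.println("declared GBK,   bytes GBK:   " + parses(encode("GBK", "GBK")));
        System.out.println("declared UTF-8, bytes UTF-8: " + parses(encode("UTF-8", "UTF-8")));
    }
}
```

Only the first case, where the declaration and the real bytes disagree, should be rejected. The parser may also print its own fatal-error line to standard error for that case.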
Encoding exception encountered during XML parsing:
org.dom4j.DocumentException: Invalid byte 1 of 1-byte UTF-8 sequence. Nested exception: Invalid byte 1 of 1-byte UTF-8 sequence.
    at org.dom4j.io.SAXReader.read(SAXReader.java:484)
    at org.dom4j.io.SAXReader.read(SAXReader.java:321)
    at com.dataoperate.PaseXml.pXml(PaseXml.java:28)
    at com.dataoperate.JdbcOp.insertDb(JdbcOp.java:30)
    at com.dataoperate.JdbcOp.main(JdbcOp.java:89)
Nested exception: com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence.
    at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(UTF8Reader.java:684)
    at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.java:554)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1742)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.peekChar(XMLEntityScanner.java:487)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2687)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:140)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:808)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:119)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
    at org.dom4j.io.SAXReader.read(SAXReader.java:465)
    at org.dom4j.io.SAXReader.read(SAXReader.java:321)
    at com.dataoperate.PaseXml.pXml(PaseXml.java:28)
    at com.dataoperate.JdbcOp.insertDb(JdbcOp.java:30)
    at com.dataoperate.JdbcOp.main(JdbcOp.java:89)
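Before changing anything, it can help to confirm that the file really is not UTF-8. The sketch below is one possible check using only the JDK's strict CharsetDecoder; the class name Utf8Check and the method isValidUtf8 are hypothetical. Note that passing the check does not prove the file was written as UTF-8, since pure ASCII content is valid in both encodings.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper class; not part of the original code.
class Utf8Check {

    // Returns true only if every byte sequence in `bytes` is well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            // The decoder hit a byte sequence that UTF-8 does not allow.
            return false;
        }
    }

    // Usage: java Utf8Check.java path/to/file.xml
    public static void main(String[] args) throws Exception {
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(isValidUtf8(bytes) ? "valid UTF-8" : "not valid UTF-8");
    }
}
```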
Solution:
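1. The simplest fix is to change <?xml version="1.0" encoding="UTF-8"?> to <?xml version="1.0" encoding="GBK"?>, so that the declaration matches the encoding the file was actually saved in.
2. Alternatively, open the XML file in an editor and save it again with the character set set to UTF-8.
3. Write the XML out again from code with an explicit encoding.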
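// Needs java.io.File, java.io.FileOutputStream, org.dom4j.io.SAXReader,
// org.dom4j.io.OutputFormat and org.dom4j.io.XMLWriter
SAXReader reader = new SAXReader();
org.dom4j.Document document = reader.read(new File("d:\\ha.xml"));
OutputFormat of = new OutputFormat();
of.setEncoding("UTF-8"); // change the encoding method
// Write through an OutputStream so that XMLWriter encodes the bytes as UTF-8;
// a FileWriter would use the platform default encoding (GBK here) instead.
XMLWriter writer = new XMLWriter(new FileOutputStream("D:\\dom4j.xml"), of);
writer.write(document);
writer.close();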
4. Read the file through java.io with the character encoding it was really saved in, and pass the resulting Reader to dom4j instead of the file itself.
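SAXReader reader = new SAXReader();
FileInputStream in = new FileInputStream(new File(fileName));
// Decode the bytes as GBK ourselves; the parser then receives characters
// and no longer relies on the encoding named in the XML declaration.
Reader read = new InputStreamReader(in, "gbk");
Document document = reader.read(read);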
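Solutions 2 and 4 can also be sketched with the JDK alone. This is an illustration under stated assumptions rather than the original code: it swaps dom4j for the built-in DocumentBuilder so it runs without dependencies, it assumes the data was really written as GBK and that the JDK ships that charset, and the names ExplicitCharsetParse, rootText and toUtf8 are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the original code.
class ExplicitCharsetParse {

    // Solution 4: decode the bytes with their real charset and hand the parser
    // a character stream, so the declared encoding is not used for decoding.
    static String rootText(byte[] bytes, String actualCharset) {
        try (Reader in = new InputStreamReader(new ByteArrayInputStream(bytes), actualCharset)) {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(in));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Solution 2: re-encode the bytes as UTF-8 so they match a UTF-8 declaration.
    static byte[] toUtf8(byte[] bytes, String actualCharset) {
        return new String(bytes, Charset.forName(actualCharset))
                .getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Declared as UTF-8 but encoded as GBK, like the problem file.
        byte[] gbk = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><name>\u4e2d\u6587</name>"
                .getBytes(Charset.forName("GBK"));
        System.out.println(rootText(gbk, "GBK"));
        System.out.println(toUtf8(gbk, "GBK").length + " bytes after re-encoding as UTF-8");
    }
}
```

Reading through a Reader leaves the file on disk unchanged, which suits files you do not control. Re-encoding fixes the file once, so that later readers need no special handling.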