When using JDOM to parse an XML file, you may encounter the following error:
java.net.SocketException: Unexpected end of file from server
    at sun.net.www.http.HttpClient.parseHTTPHeader(Unknown Source)
    at sun.net.www.http.HttpClient.parseHTTP(Unknown Source)
    at sun.net.www.http.HttpClient.parseHTTPHeader(Unknown Source)
    at sun.net.www.http.HttpClient.parseHTTP(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
    at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
    at org.apache.xerces.impl.XMLEntityManager.startEntity(Unknown Source)
    at org.apache.xerces.impl.XMLEntityManager.startDTDEntity(Unknown Source)
    at org.apache.xerces.impl.XMLDTDScannerImpl.setInputSource(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentScannerImpl$DTDDispatcher.dispatch(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
    at org.jdom.input.SAXBuilder.build(SAXBuilder.java:453)
    at org.jdom.input.SAXBuilder.build(SAXBuilder.java:891)
    at cn.edu.ruc.web.wrappers.JDomWrapper.getRootElement(JDomWrapper.java:36)
    at cn.edu.ruc.web.wrappers.JDomWrapper.getText(JDomWrapper.java:75)
    at cn.edu.ruc.web.WebsiteXMLBuild.main(WebsiteXMLBuild.java:131)
At first I assumed the problem was the file name: perhaps JDOM mistook the local file for a remote one, tried to download it from the Internet, failed, and therefore threw this connection exception. After some experimenting, however, the exception was thrown no matter how I renamed the file.
So I checked the contents of the XML file and found, at the very top, the DOCTYPE declaration most commonly used in HTML files, which references a DTD by URL:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
Then it dawned on me: while parsing, JDOM (through the underlying Xerces parser, as the stack trace shows) follows this URL to fetch the loose.dtd file, and the failed download is what causes the error.
Workaround: remove the DOCTYPE declaration above from the XML file, and JDOM can then parse the file normally.
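If editing the XML file is not an option, another possible approach is to stop the parser from opening the DTD URL at all. The sketch below is not from the original setup: it uses only the JDK's built-in javax.xml parser rather than JDOM, and the names OfflineDtdExample, EmptyEntityResolver, and parseWithoutNetwork are made up for illustration. It installs a SAX EntityResolver that answers every external entity request with empty content, so no HTTP connection is attempted. JDOM's SAXBuilder also has a setEntityResolver method that accepts the same kind of resolver, so the idea should carry over, but treat that as an assumption to verify against your JDOM version.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

class OfflineDtdExample {

    /** Hypothetical resolver: records each requested system ID and answers with empty content. */
    static final class EmptyEntityResolver implements EntityResolver {
        final List<String> requested = new ArrayList<String>();

        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            requested.add(systemId);
            // Returning an empty source (instead of null) keeps the parser
            // from opening the URL named in the DOCTYPE declaration.
            return new InputSource(new StringReader(""));
        }
    }

    /** Parses the given XML text with the supplied resolver installed. */
    static Document parseWithoutNetwork(String xml, EntityResolver resolver) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setEntityResolver(resolver);
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Same DOCTYPE as in the problem file, followed by a tiny document body.
        String xml = "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\" "
                + "\"http://www.w3.org/TR/html4/loose.dtd\">"
                + "<HTML><BODY>hello</BODY></HTML>";

        EmptyEntityResolver resolver = new EmptyEntityResolver();
        Document doc = parseWithoutNetwork(xml, resolver);

        System.out.println("Root element: " + doc.getDocumentElement().getNodeName());
        System.out.println("Entities the parser asked for: " + resolver.requested);
    }
}
```

The list of requested system IDs also shows which URLs the parser would otherwise have tried to download. Note that with an empty DTD, any entities the real DTD would have defined (for example &nbsp;) are no longer declared, so a document that uses them will fail to parse.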