Overview
When a browser opens a web page, its first task is to determine the page's character encoding so that it can parse the page correctly; the text editors we use every day likewise have to determine a document's encoding when opening it. The technique involved is encoding detection. Below we introduce a useful Java library for it, cpdetector.
It can be downloaded from http://sourceforge.net/projects/cpdetector/.
Example
Without further ado, here is the example code.
package com.coder4j.main.cpdetector;

import info.monitorenter.cpdetector.io.ASCIIDetector;
import info.monitorenter.cpdetector.io.ByteOrderMarkDetector;
import info.monitorenter.cpdetector.io.CodepageDetectorProxy;
import info.monitorenter.cpdetector.io.JChardetFacade;
import info.monitorenter.cpdetector.io.ParsingDetector;
import info.monitorenter.cpdetector.io.UnicodeDetector;

import java.net.MalformedURLException;
import java.net.URL;

/**
 * Requires the following jars:<br>
 * cpdetector_1.0.10.jar, antlr-2.7.4.jar, chardet-1.0.jar
 *
 * @author chinaxiang
 * @date 2015-10-11
 */
public class UseCpdetector {

    /**
     * Get the encoding of the content behind a URL.
     *
     * @param url the URL to inspect
     * @return the name of the detected charset, or null if none was detected
     */
    public static String getUrlEncode(URL url) {
        /*
         * detector is a proxy that hands the detection task to concrete
         * detector implementations. cpdetector ships several commonly used
         * implementations, which are registered through the add method, such as
         * ParsingDetector, JChardetFacade, ASCIIDetector and UnicodeDetector.
         * The proxy returns the detected charset on a "first detector to return
         * a non-null result wins" basis. Three third-party jars are used:
         * antlr.jar, chardet.jar and cpdetector.jar. cpdetector's detection is
         * based on statistical principles and is not guaranteed to be correct.
         */
        CodepageDetectorProxy detector = CodepageDetectorProxy.getInstance();
        /*
         * ParsingDetector can check the encoding of HTML, XML and similar files
         * or character streams. The constructor argument indicates whether
         * details of the detection process are printed; false means they are not.
         */
        detector.add(new ParsingDetector(false));
        detector.add(new ByteOrderMarkDetector());
        /*
         * JChardetFacade wraps jchardet, provided by the Mozilla organization,
         * which can detect the encoding of most files. This detector alone
         * therefore generally meets the needs of most projects; if you are
         * still not at ease, you can add a few more detectors, such as the
         * ASCIIDetector and UnicodeDetector below.
         *
         * It relies on antlr.jar and chardet.jar.
         */
        detector.add(JChardetFacade.getInstance());
        // ASCIIDetector detects ASCII
        detector.add(ASCIIDetector.getInstance());
        // UnicodeDetector detects the Unicode family of encodings
        detector.add(UnicodeDetector.getInstance());
        java.nio.charset.Charset charset = null;
        try {
            charset = detector.detectCodepage(url);
        } catch (Exception ex) {
            ex.printStackTrace();
        }
        if (charset != null) {
            return charset.name();
        }
        return null;
    }

    public static void main(String[] args) {
        try {
            URL url = new URL("http://www.baidu.com");
            String encode = getUrlEncode(url);
            System.out.println(encode); // UTF-8
        } catch (MalformedURLException e) {
            e.printStackTrace();
        }
    }
}
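The "first detector to return a non-null result wins" rule described in the code comments can be illustrated without the library. The following is only a minimal sketch using the JDK alone: the class FirstMatchDetectorSketch, its Detector interface and the two toy detectors are hypothetical names made up for this illustration and are not cpdetector's actual implementation.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class FirstMatchDetectorSketch {

    /** Hypothetical contract: return a charset, or null if undecided. */
    public interface Detector {
        Charset detect(byte[] data);
    }

    /**
     * Toy detector that recognises UTF-8 and UTF-16 byte order marks.
     * Simplification: FF FE is treated as UTF-16LE although it can also
     * start a UTF-32LE byte order mark.
     */
    public static final Detector BOM = data -> {
        if (data.length >= 3 && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB && (data[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        if (data.length >= 2 && (data[0] & 0xFF) == 0xFE && (data[1] & 0xFF) == 0xFF) {
            return StandardCharsets.UTF_16BE;
        }
        if (data.length >= 2 && (data[0] & 0xFF) == 0xFF && (data[1] & 0xFF) == 0xFE) {
            return StandardCharsets.UTF_16LE;
        }
        return null;
    };

    /** Toy detector that accepts the data only if every byte is 7-bit. */
    public static final Detector ASCII = data -> {
        for (byte b : data) {
            if (b < 0) {
                return null; // high bit set, so not plain ASCII
            }
        }
        return StandardCharsets.US_ASCII;
    };

    private final List<Detector> detectors = new ArrayList<>();

    public void add(Detector detector) {
        detectors.add(detector);
    }

    /** Asks each detector in registration order; the first non-null answer wins. */
    public Charset detect(byte[] data) {
        for (Detector detector : detectors) {
            Charset result = detector.detect(data);
            if (result != null) {
                return result;
            }
        }
        return null; // no detector could decide
    }

    public static void main(String[] args) {
        FirstMatchDetectorSketch chain = new FirstMatchDetectorSketch();
        chain.add(BOM);
        chain.add(ASCII);
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        System.out.println(chain.detect(withBom));
        System.out.println(chain.detect("hello".getBytes(StandardCharsets.US_ASCII)));
    }
}
```

Because the first non-null answer wins, the order in which detectors are added matters: a cheap, exact check such as the byte order mark test is placed before broader, less certain ones.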
A file path can also be converted to a URL, so the same approach should work for determining the encoding of a local file.
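As a rough sketch of that conversion, using only the JDK: the class name FileToUrl, the helper toUrl and the path data/sample.txt are hypothetical and chosen purely for illustration. The resulting URL could then be passed to the URL-based detection method from the example above.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;

public class FileToUrl {

    /** Converts a local file path to a file: URL (the file need not exist). */
    public static URL toUrl(String path) {
        try {
            // Going through toURI() escapes characters such as spaces;
            // the deprecated File.toURL() does not.
            return new File(path).toURI().toURL();
        } catch (MalformedURLException e) {
            // Not expected for file: URIs; rethrown unchecked for brevity.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "data/sample.txt" is a hypothetical path used only for illustration.
        URL url = toUrl("data/sample.txt");
        System.out.println(url);
    }
}
```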