Detecting a file's encoding in Java
I recently worked on a document translation project in which the encoding of the incoming documents was unknown, which was a real headache. After trying many approaches, I found that the jchardet library solves this problem easily. I am writing this note as a reminder for myself and as a help to anyone else who needs it.
package com.uujava.mbfy.test;

import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;

import org.mozilla.intl.chardet.nsDetector;
import org.mozilla.intl.chardet.nsICharsetDetectionObserver;

/*
 * Maven dependency:
 *
 * <!-- for file encoding check -->
 * <dependency>
 *   <groupId>net.sourceforge.jchardet</groupId>
 *   <artifactId>jchardet</artifactId>
 *   <version>1.0</version>
 * </dependency>
 */

/**
 * Uses jchardet to get the character set of a file. jchardet is a Java port
 * of Mozilla's automatic character set detection algorithm. Its home page is
 * http://jchardet.sourceforge.net/
 */
public class FileCharsetDetector {

    private boolean found = false;

    /**
     * If a character set detection algorithm matches completely, this field
     * holds the name of that character set. Otherwise (for example, for a
     * binary file) it keeps its default value, null.
     */
    private String encoding = null;

    public static void main(String[] argv) throws Exception {
        System.out.println("File encoding: "
                + new FileCharsetDetector().guestFileEncoding(
                        "/home/k/documents/test/azmind_7_xh/azmind_7_xh/routing Management.txt"));
    }

    /**
     * Takes a File object and checks the file's encoding.
     *
     * @param file the File instance
     * @return the file encoding, or null if none was found
     * @throws FileNotFoundException
     * @throws IOException
     */
    public String guestFileEncoding(File file)
            throws FileNotFoundException, IOException {
        return geestFileEncoding(file, new nsDetector());
    }

    /**
     * Gets the encoding of a file.
     *
     * @param file the File instance
     * @param languageHint language hint code, e.g. 1: Japanese; 2: Chinese;
     *            3: Simplified Chinese; 4: Traditional Chinese; 5: Korean;
     *            6: don't know (default)
     * @return the file encoding in a form such as UTF-8, GBK or GB2312,
     *         or null if none was found
     * @throws FileNotFoundException
     * @throws IOException
     */
    public String guestFileEncoding(File file, int languageHint)
            throws FileNotFoundException, IOException {
        return geestFileEncoding(file, new nsDetector(languageHint));
    }

    /**
     * Gets the encoding of a file.
     *
     * @param path the file path
     * @return the file encoding in a form such as UTF-8, GBK or GB2312,
     *         or null if none was found
     * @throws FileNotFoundException
     * @throws IOException
     */
    public String guestFileEncoding(String path)
            throws FileNotFoundException, IOException {
        return guestFileEncoding(new File(path));
    }

    /**
     * Gets the encoding of a file.
     *
     * @param path the file path
     * @param languageHint language hint code, e.g. 1: Japanese; 2: Chinese;
     *            3: Simplified Chinese; 4: Traditional Chinese; 5: Korean;
     *            6: don't know (default)
     * @return the file encoding, or null if none was found
     * @throws FileNotFoundException
     * @throws IOException
     */
    public String guestFileEncoding(String path, int languageHint)
            throws FileNotFoundException, IOException {
        return guestFileEncoding(new File(path), languageHint);
    }

    /**
     * Gets the encoding of a file using the given detector.
     *
     * @param file the File instance
     * @param det the detector to run over the file's bytes
     * @return the file encoding, or null if none was found
     * @throws FileNotFoundException
     * @throws IOException
     */
    private String geestFileEncoding(File file, nsDetector det)
            throws FileNotFoundException, IOException {
        // Reset state so that one instance can be reused for several files.
        found = false;
        encoding = null;

        // Set an observer. Notify() will be called if a matching charset
        // is found.
        det.Init(new nsICharsetDetectionObserver() {
            public void Notify(String charset) {
                found = true;
                encoding = charset;
            }
        });

        boolean isAscii = true;
        // try-with-resources closes the stream even if reading fails.
        try (BufferedInputStream imp =
                new BufferedInputStream(new FileInputStream(file))) {
            byte[] buf = new byte[1024];
            int len;
            boolean done = false;

            while ((len = imp.read(buf, 0, buf.length)) != -1) {
                // Check whether the stream is only ASCII so far.
                if (isAscii) {
                    isAscii = det.isAscii(buf, len);
                }
                // Call DoIt() if the data is non-ASCII and detection is not done yet.
                if (!isAscii && !done) {
                    done = det.DoIt(buf, len, false);
                }
            }
        }
        det.DataEnd();

        if (isAscii) {
            encoding = "ASCII";
            found = true;
        }

        if (!found) {
            String[] prob = det.getProbableCharsets();
            if (prob.length > 0) {
                // Nothing was confirmed, so take the first probable encoding.
                encoding = prob[0];
            } else {
                return null;
            }
        }
        return encoding;
    }
}
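Once the detector has returned an encoding name, the usual next step in a translation workflow is to read the file with that encoding. The following is only a minimal sketch of that step, not part of the original post: the class name DetectedEncodingReader and its methods are hypothetical, it uses only the standard library, and it assumes UTF-8 is an acceptable fallback when the detector returns null or a name this JVM does not support.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: consumes the encoding name produced by a detector.
class DetectedEncodingReader {

    /**
     * Maps a detected encoding name to a Charset. Returns the fallback when
     * the name is null (nothing detected) or not usable on this JVM.
     */
    static Charset toCharset(String detectedName, Charset fallback) {
        if (detectedName == null) {
            return fallback;
        }
        try {
            return Charset.forName(detectedName);
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            return fallback;
        }
    }

    /** Reads the whole file as text, assuming UTF-8 when detection failed. */
    static String readText(Path path, String detectedName) throws IOException {
        Charset charset = toCharset(detectedName, StandardCharsets.UTF_8);
        return new String(Files.readAllBytes(path), charset);
    }

    /** Re-encodes the source file as UTF-8 and writes it to the target path. */
    static void convertToUtf8(Path source, String detectedName, Path target)
            throws IOException {
        String text = readText(source, detectedName);
        Files.write(target, text.getBytes(StandardCharsets.UTF_8));
    }
}
```

With the detector class above, the name returned by guestFileEncoding(path) could be passed directly as detectedName. The fallback matters because a detected name is only a guess and may also name a charset that the running JVM does not ship.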
Source: http://www.cnblogs.com/mxcy/p/4008342.html