How to use Java code to get the encoding of a file, file stream, or string

Source: Internet
Author: User

Today I used online resources to work out how to detect the encoding of a file, a file stream, or a string in Java. Here is the code:

package com.ghj.packageoftool;

import info.monitorenter.cpdetector.io.ASCIIDetector;
import info.monitorenter.cpdetector.io.ByteOrderMarkDetector;
import info.monitorenter.cpdetector.io.CodepageDetectorProxy;
import info.monitorenter.cpdetector.io.JChardetFacade;
import info.monitorenter.cpdetector.io.ParsingDetector;
import info.monitorenter.cpdetector.io.UnicodeDetector;

import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.Charset;

/**
 * File tool class.
 *
 * cpdetector is a detector proxy: it hands the detection task to instances of
 * concrete detector implementations. It ships with several commonly used
 * detectors (ParsingDetector, ByteOrderMarkDetector, JChardetFacade,
 * ASCIIDetector, UnicodeDetector), which are registered through add().
 * The proxy returns the result of the first detector that returns a non-null
 * charset. Detection is based on statistics and is not guaranteed to be correct.
 *
 * @author Gao Yingjie
 */
public class FileTool {

    /**
     * Gets the encoding of a local file.
     *
     * @param localFile the file whose encoding is to be determined
     * @return the charset name, or null if detection fails
     */
    public static String getLocalFileEncode(File localFile) {
        CodepageDetectorProxy codepageDetector = CodepageDetectorProxy.getInstance();
        // ParsingDetector examines HTML, XML, and similar files or character streams.
        // The constructor argument controls whether details of the detection
        // process are printed; false means they are not.
        codepageDetector.add(new ParsingDetector(false));
        // JChardetFacade wraps jchardet from the Mozilla organization, which can
        // detect the encoding of most files. This detector alone is usually enough
        // for most projects; if in doubt, add more detectors such as the
        // ASCIIDetector and UnicodeDetector below.
        codepageDetector.add(JChardetFacade.getInstance());
        codepageDetector.add(new ByteOrderMarkDetector());
        // ASCIIDetector determines ASCII encoding.
        codepageDetector.add(ASCIIDetector.getInstance());
        // UnicodeDetector determines encodings of the Unicode family.
        codepageDetector.add(UnicodeDetector.getInstance());
        Charset charset = null;
        try {
            charset = codepageDetector.detectCodepage(localFile.toURI().toURL());
            if (charset != null) {
                return charset.name();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return null;
    }

    /**
     * Gets the encoding of a remote file.
     *
     * @param url the URL of the remote file
     * @return the charset name, or null if detection fails
     */
    public static String getUrlFileEncode(URL url) {
        // Same detector chain as getLocalFileEncode, without ByteOrderMarkDetector.
        CodepageDetectorProxy codepageDetector = CodepageDetectorProxy.getInstance();
        codepageDetector.add(new ParsingDetector(false));
        codepageDetector.add(JChardetFacade.getInstance());
        codepageDetector.add(ASCIIDetector.getInstance());
        codepageDetector.add(UnicodeDetector.getInstance());
        Charset charset = null;
        try {
            charset = codepageDetector.detectCodepage(url);
            if (charset != null) {
                return charset.name();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return null;
    }

    /**
     * Gets the encoding of a file stream.
     *
     * @param inputStream the file stream
     * @return the charset name, or null if detection fails
     */
    public static String getInputStreamEncode(InputStream inputStream) {
        // Same detector chain as getLocalFileEncode, without ByteOrderMarkDetector.
        CodepageDetectorProxy codepageDetector = CodepageDetectorProxy.getInstance();
        codepageDetector.add(new ParsingDetector(false));
        codepageDetector.add(JChardetFacade.getInstance());
        codepageDetector.add(ASCIIDetector.getInstance());
        codepageDetector.add(UnicodeDetector.getInstance());
        Charset charset = null;
        try {
            // The second argument is the number of bytes to take into account.
            charset = codepageDetector.detectCodepage(inputStream, 0);
            if (charset != null) {
                return charset.name();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return null;
    }

    /**
     * Gets the encoding of a string.
     *
     * @param stringValue the string whose encoding is to be determined
     * @return the charset name, or null if detection fails
     */
    public static String getStringEncode(String stringValue) {
        // Same detector chain as getLocalFileEncode, without ByteOrderMarkDetector.
        CodepageDetectorProxy codepageDetector = CodepageDetectorProxy.getInstance();
        codepageDetector.add(new ParsingDetector(false));
        codepageDetector.add(JChardetFacade.getInstance());
        codepageDetector.add(ASCIIDetector.getInstance());
        codepageDetector.add(UnicodeDetector.getInstance());
        Charset charset = null;
        try {
            // getBytes() encodes the string with the platform default charset.
            InputStream inputStream = new ByteArrayInputStream(stringValue.getBytes());
            // The second argument is the number of bytes to take into account.
            charset = codepageDetector.detectCodepage(inputStream, 3);
            if (charset != null) {
                return charset.name();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return null;
    }
}
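
The comments in the code above describe the detection strategy as "the first detector that returns a non-null result wins". The idea can be illustrated with the standard library alone. The following is only a minimal sketch under stated assumptions: the class DetectorChainSketch and its methods are hypothetical names invented for this illustration, they are not part of cpdetector, and the three simple checks (byte order mark, pure ASCII, strict UTF-8 decoding) are far less capable than the statistical detection that jchardet performs. It assumes Java 9 or later because of List.of.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.function.Function;

/** Hypothetical illustration of a "first non-null result wins" chain; not cpdetector code. */
final class DetectorChainSketch {

    /** Recognises a UTF-8 or UTF-16 byte order mark (UTF-32 is ignored in this sketch). */
    static Charset detectBom(byte[] data) {
        if (data.length >= 3 && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB && (data[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        if (data.length >= 2 && (data[0] & 0xFF) == 0xFE && (data[1] & 0xFF) == 0xFF) {
            return StandardCharsets.UTF_16BE;
        }
        if (data.length >= 2 && (data[0] & 0xFF) == 0xFF && (data[1] & 0xFF) == 0xFE) {
            return StandardCharsets.UTF_16LE;
        }
        return null; // no byte order mark found
    }

    /** Returns US-ASCII when no byte has its high bit set, otherwise null. */
    static Charset detectAscii(byte[] data) {
        for (byte b : data) {
            if (b < 0) {
                return null; // byte value is 0x80 or above
            }
        }
        return StandardCharsets.US_ASCII;
    }

    /** Returns UTF-8 when the bytes decode without error as UTF-8, otherwise null. */
    static Charset detectStrictUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return StandardCharsets.UTF_8;
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    /** Detectors are tried in this order. */
    static final List<Function<byte[], Charset>> DETECTORS = List.of(
            DetectorChainSketch::detectBom,
            DetectorChainSketch::detectAscii,
            DetectorChainSketch::detectStrictUtf8);

    /** Returns the name of the first non-null detection result, or null if none matches. */
    static String detect(byte[] data) {
        for (Function<byte[], Charset> detector : DETECTORS) {
            Charset charset = detector.apply(data);
            if (charset != null) {
                return charset.name();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(detect("hello".getBytes(StandardCharsets.US_ASCII)));  // US-ASCII
        System.out.println(detect("h\u00e9llo".getBytes(StandardCharsets.UTF_8))); // UTF-8
        System.out.println(detect(new byte[] {(byte) 0xE9, 0x41}));               // null
    }
}
```

The order of the detectors matters: a cheap and unambiguous check such as the byte order mark comes first, and broader checks follow. The sketch works on a byte array rather than a String because an encoding is a property of bytes; a Java String has to be converted to bytes with some charset before anything can be detected.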

Because the code above depends on several JAR packages, it is easiest to download the demo project, which was developed with MyEclipse.
