How to use Java code to detect the character encoding of a file, file stream, or string

Source: Internet
Author: User

Today I learned, from resources found online, how to use Java code to detect the character encoding of files, file streams, and strings. Here is the code:

package com.ghj.packageoftool;

import info.monitorenter.cpdetector.io.ASCIIDetector;
import info.monitorenter.cpdetector.io.ByteOrderMarkDetector;
import info.monitorenter.cpdetector.io.CodepageDetectorProxy;
import info.monitorenter.cpdetector.io.JChardetFacade;
import info.monitorenter.cpdetector.io.ParsingDetector;
import info.monitorenter.cpdetector.io.UnicodeDetector;

import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.Charset;

/**
 * File tool class.
 *
 * @author Gao Huanjie
 */
public class FileTool {

    /**
     * Gets the encoding of a local file.
     *
     * @param localFile the file whose encoding is to be determined
     * @return the name of the detected charset, or null if detection fails
     * @author Gao Huanjie
     */
    public static String getLocalFileEncode(File localFile) {
        /*
         * cpDetector is a detector proxy: it hands the detection task to
         * instances of concrete detector classes. It ships with several common
         * detectors, such as ParsingDetector, ByteOrderMarkDetector,
         * JChardetFacade, ASCIIDetector, and UnicodeDetector, and instances of
         * them are registered through the add method.
         * The proxy returns the result of whichever detector is first to return
         * a non-null charset. Detection is based on statistics, so it cannot be
         * guaranteed to be correct.
         */
        CodepageDetectorProxy codepageDetector = CodepageDetectorProxy.getInstance();

        // ParsingDetector checks the encoding declared in HTML, XML, and similar
        // files or streams. The constructor argument controls whether details of
        // the detection process are printed; false means they are not.
        codepageDetector.add(new ParsingDetector(false));
        // JChardetFacade wraps JChardet, provided by Mozilla, which can determine
        // the encoding of most files, so this detector alone meets the needs of
        // most projects. If that is not reassuring enough, add more detectors,
        // such as ASCIIDetector and UnicodeDetector.
        codepageDetector.add(JChardetFacade.getInstance());
        codepageDetector.add(new ByteOrderMarkDetector());
        // ASCIIDetector determines ASCII encoding.
        codepageDetector.add(ASCIIDetector.getInstance());
        // UnicodeDetector determines encodings of the Unicode family.
        codepageDetector.add(UnicodeDetector.getInstance());

        Charset charset = null;
        try {
            charset = codepageDetector.detectCodepage(localFile.toURI().toURL());
            if (charset != null) {
                return charset.name();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return null;
    }

    /**
     * Gets the encoding of a remote file identified by a URL.
     *
     * @param url the URL of the remote file
     * @return the name of the detected charset, or null if detection fails
     * @author Gao Huanjie
     */
    public static String getURLFileEncode(URL url) {
        // See getLocalFileEncode for notes on cpDetector and each detector.
        CodepageDetectorProxy codepageDetector = CodepageDetectorProxy.getInstance();
        codepageDetector.add(new ParsingDetector(false));
        codepageDetector.add(JChardetFacade.getInstance());
        codepageDetector.add(ASCIIDetector.getInstance());
        codepageDetector.add(UnicodeDetector.getInstance());

        Charset charset = null;
        try {
            charset = codepageDetector.detectCodepage(url);
            if (charset != null) {
                return charset.name();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return null;
    }

    /**
     * Gets the encoding of a file stream.
     *
     * @param inputStream the file stream
     * @return the name of the detected charset, or null if detection fails
     * @author Gao Huanjie
     */
    public static String getInputStreamEncode(InputStream inputStream) {
        // See getLocalFileEncode for notes on cpDetector and each detector.
        CodepageDetectorProxy codepageDetector = CodepageDetectorProxy.getInstance();
        codepageDetector.add(new ParsingDetector(false));
        codepageDetector.add(JChardetFacade.getInstance());
        codepageDetector.add(ASCIIDetector.getInstance());
        codepageDetector.add(UnicodeDetector.getInstance());

        Charset charset = null;
        try {
            // The second argument is the number of bytes to take into account;
            // increase it if detection proves unreliable.
            charset = codepageDetector.detectCodepage(inputStream, 0);
            if (charset != null) {
                return charset.name();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return null;
    }

    /**
     * Gets the encoding of a string.
     *
     * @param stringValue the string whose encoding is to be determined
     * @return the name of the detected charset, or null if detection fails
     * @author Gao Huanjie
     */
    public static String getStringEncode(String stringValue) {
        // See getLocalFileEncode for notes on cpDetector and each detector.
        CodepageDetectorProxy codepageDetector = CodepageDetectorProxy.getInstance();
        codepageDetector.add(new ParsingDetector(false));
        codepageDetector.add(JChardetFacade.getInstance());
        codepageDetector.add(ASCIIDetector.getInstance());
        codepageDetector.add(UnicodeDetector.getInstance());

        Charset charset = null;
        try {
            // getBytes() encodes with the platform default charset, so the
            // result reflects that charset rather than a property of the string.
            InputStream inputStream = new ByteArrayInputStream(stringValue.getBytes());
            // The second argument is the number of bytes to take into account;
            // increase it if detection proves unreliable.
            charset = codepageDetector.detectCodepage(inputStream, 3);
            if (charset != null) {
                return charset.name();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return null;
    }
}
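
The "first non-null result wins" rule described in the comments above can be illustrated without cpdetector. The following is only a minimal sketch using the JDK alone; it is not cpdetector's implementation, and the class name DetectorChainSketch and the methods byBom, byAscii, and detect are hypothetical names chosen for illustration. It handles only UTF-8 and UTF-16 byte order marks and plain ASCII.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

public class DetectorChainSketch {

    /** Hypothetical detector: recognizes UTF-8 and UTF-16 byte order marks (UTF-32 is ignored for brevity). */
    static Optional<Charset> byBom(byte[] b) {
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF) {
            return Optional.of(StandardCharsets.UTF_8);
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF) {
            return Optional.of(StandardCharsets.UTF_16BE);
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE) {
            return Optional.of(StandardCharsets.UTF_16LE);
        }
        return Optional.empty();
    }

    /** Hypothetical detector: reports US-ASCII when no byte has its high bit set. */
    static Optional<Charset> byAscii(byte[] b) {
        for (byte x : b) {
            if (x < 0) {
                return Optional.empty();
            }
        }
        return Optional.of(StandardCharsets.US_ASCII);
    }

    /** Asks each detector in order and returns the first non-empty answer. */
    static Optional<Charset> detect(byte[] data, List<Function<byte[], Optional<Charset>>> chain) {
        for (Function<byte[], Optional<Charset>> detector : chain) {
            Optional<Charset> result = detector.apply(data);
            if (result.isPresent()) {
                return result;
            }
        }
        return Optional.empty();
    }

    /** Convenience overload with a fixed chain: BOM check first, then ASCII. */
    static Optional<Charset> detect(byte[] data) {
        List<Function<byte[], Optional<Charset>>> chain =
                List.of(DetectorChainSketch::byBom, DetectorChainSketch::byAscii);
        return detect(data, chain);
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        byte[] ascii = {'h', 'e', 'l', 'l', 'o'};
        byte[] other = {(byte) 0xE9};

        System.out.println(detect(withBom).map(Charset::name).orElse("unknown")); // UTF-8
        System.out.println(detect(ascii).map(Charset::name).orElse("unknown"));   // US-ASCII
        System.out.println(detect(other).map(Charset::name).orElse("unknown"));   // unknown
    }
}
```

The order of the chain matters: a file that starts with a byte order mark never reaches the ASCII check, which mirrors why the order of the add calls matters in FileTool.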

Because the code above depends on several JAR packages, it is easiest to download the demo project directly. The demo was developed with MyEclipse.
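
Where adding those JARs is not practical, a rough check is still possible with the JDK alone: test whether the bytes decode without errors in a candidate charset. The sketch below is not part of the original FileTool or of cpdetector, and the names StrictDecodeCheck and decodesCleanly are hypothetical. It validates a guess rather than detecting an encoding.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictDecodeCheck {

    /**
     * Returns true if the bytes decode in the candidate charset without any
     * malformed or unmappable sequence. A strict decoder reports such input
     * as an exception instead of silently substituting a replacement character.
     */
    static boolean decodesCleanly(byte[] data, Charset candidate) {
        try {
            candidate.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] eAcuteUtf8 = {(byte) 0xC3, (byte) 0xA9}; // U+00E9 encoded as UTF-8
        byte[] eAcuteLatin1 = {(byte) 0xE9};            // U+00E9 encoded as ISO-8859-1

        System.out.println(decodesCleanly(eAcuteUtf8, StandardCharsets.UTF_8));        // true
        System.out.println(decodesCleanly(eAcuteLatin1, StandardCharsets.UTF_8));      // false
        System.out.println(decodesCleanly(eAcuteLatin1, StandardCharsets.ISO_8859_1)); // true
    }
}
```

A clean decode is necessary but not sufficient: every byte sequence decodes cleanly as ISO-8859-1, so this check can rule a charset out but cannot confirm it. That limitation is the reason statistical detectors such as the ones used above exist.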
