NChardet: a .NET library for automatic character encoding detection

Source: Internet
Author: User

What is NChardet

NChardet is a .NET implementation of chardet, Mozilla's library for automatic character encoding detection. It is ported from jchardet, the Java version of chardet, and detects the encoding of a given character stream.

How NChardet works

NChardet guesses the encoding by examining the input characters one by one. Because it is guessing, the encoding cannot always be identified with certainty. When the input does not determine a single encoding, NChardet returns a set of probable encodings instead.
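The idea of narrowing a list of candidates until one or several remain can be illustrated with a minimal sketch. This is not NChardet's actual algorithm, and the names CandidateElimination and Surviving are made up for illustration. The sketch only uses the strict decoders in System.Text to discard encodings that cannot decode the input, and returns whatever is left.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of NChardet: keeps every candidate
// encoding that can decode the bytes without error.
public static class CandidateElimination
{
    // A small candidate list that every .NET runtime supports out of the box.
    private static readonly string[] Candidates = { "us-ascii", "utf-8", "utf-16" };

    public static List<string> Surviving(byte[] data)
    {
        var survivors = new List<string>();
        foreach (string name in Candidates)
        {
            // Exception fallbacks make the decoder throw on invalid input
            // instead of silently substituting a replacement character.
            Encoding strict = Encoding.GetEncoding(
                name,
                EncoderFallback.ExceptionFallback,
                DecoderFallback.ExceptionFallback);
            try
            {
                strict.GetString(data);
                survivors.Add(name);
            }
            catch (DecoderFallbackException)
            {
                // The bytes are not valid in this encoding, so drop it.
            }
        }
        return survivors;
    }
}
```

For the UTF-8 bytes of "héllo" (68 C3 A9 6C 6C 6F), ASCII is ruled out by the byte C3, but more than one candidate can still survive, which is why a detector sometimes has to report a set of probable encodings rather than a single answer.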

How to use NChardet

To detect an encoding with NChardet, follow these steps:

1. Construct an instance of the Detector class, passing a language hint.
2. Call the Detector's Init method, passing an object that implements the ICharsetDetectionObserver interface.
3. Feed in the character stream to be examined.
4. Call the Detector's DataEnd method.
5. Read the result, or the set of probable results.

A language hint is an integer. The available hints are:

1. Japanese
2. Chinese
3. Simplified Chinese
4. Traditional Chinese
5. Korean
6. Don't know (default)
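Because the hint is a bare integer, call sites can be easier to read if the values are given names. The following is a minimal sketch of that idea. The LanguageHint enum and the LanguageHints.ToInt helper are hypothetical and not part of NChardet; they only mirror the numbers in the list above. Falling back to 6 for unknown values is a choice made for this sketch.

```csharp
using System;

// Hypothetical enum, not part of NChardet: names for the integer
// language hints, so call sites avoid magic numbers.
public enum LanguageHint
{
    Japanese = 1,
    Chinese = 2,
    SimplifiedChinese = 3,
    TraditionalChinese = 4,
    Korean = 5,
    DontKnow = 6
}

public static class LanguageHints
{
    // Converts a hint to the plain integer a detector constructor would take.
    // Values outside the defined range fall back to DontKnow (an assumption
    // of this sketch, not documented library behavior).
    public static int ToInt(LanguageHint hint)
    {
        return Enum.IsDefined(typeof(LanguageHint), hint)
            ? (int)hint
            : (int)LanguageHint.DontKnow;
    }
}
```

With this in place, a call site could pass LanguageHints.ToInt(LanguageHint.Chinese) where the integer 2 would otherwise appear.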


The ICharsetDetectionObserver interface has a single method, Notify. When the NChardet engine believes it has detected the correct encoding, it calls Notify. The user program receives the result by providing its own implementation of this method.

Code example:


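    using System;
    using System.IO;
    using System.Net;
    using NChardet;

    // Implement the ICharsetDetectionObserver interface
    public class MyCharsetDetectionObserver :
        NChardet.ICharsetDetectionObserver
    {
        // Stays null until the engine reports a charset
        public string Charset = null;

        public void Notify(string charset)
        {
            // Store the reported name (parameter, not the field itself)
            Charset = charset;
        }
    }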

    int lang = 2; // language hint: 2 = Chinese
    // Instantiate the Detector with the given language hint
    Detector det = new Detector(lang);
    // Initialize it with the observer
    MyCharsetDetectionObserver cdo = new MyCharsetDetectionObserver();
    det.Init(cdo);

    // Input character stream
    Uri url = new Uri("http://cn.yahoo.com");
    HttpWebRequest request =
        (HttpWebRequest)WebRequest.Create(url);
    HttpWebResponse response =
        (HttpWebResponse)request.GetResponse();
    Stream stream = response.GetResponseStream();

    byte[] buf = new byte[1024];
    int len;
    bool done = false;
    bool isAscii = true;
    bool found = false;

    while ((len = stream.Read(buf, 0, buf.Length)) != 0) {
        // Check whether the data seen so far is pure ASCII
        if (isAscii)
            isAscii = det.isAscii(buf, len);

        // If it is not ASCII and the encoding is not yet determined, keep probing
        if (!isAscii && !done)
            done = det.DoIt(buf, len, false);
    }
    stream.Close();
    stream.Dispose();
    // Call the DataEnd method.
    // If the engine believes it has detected the correct encoding,
    // it calls the observer's Notify method at this point.
    det.DataEnd();

    if (isAscii) {
        Console.WriteLine("CHARSET = ASCII");
        found = true;
    }
    else if (cdo.Charset != null)
    {
        Console.WriteLine("CHARSET = {0}", cdo.Charset);
        found = true;
    }

    // No single answer: list the probable charsets instead
    if (!found) {
        string[] prob = det.getProbableCharsets();
        for (int i = 0; i < prob.Length; i++) {
            Console.WriteLine("Probable Charset = " + prob[i]);
        }
    }
    Console.ReadLine();
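Once a charset name has been reported, a typical next step is to decode the downloaded bytes with it. The following is a minimal sketch of that step using only System.Text. The CharsetDecoding class and its Decode method are hypothetical and not part of NChardet, and falling back to UTF-8 for unknown names is an assumption made for this sketch.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of NChardet.
public static class CharsetDecoding
{
    // Decodes the bytes with the named charset. Falls back to UTF-8 when the
    // runtime does not recognize the name (or the name is null).
    public static string Decode(byte[] data, string charsetName)
    {
        Encoding encoding;
        try
        {
            encoding = Encoding.GetEncoding(charsetName);
        }
        catch (ArgumentException)
        {
            // Unknown, unsupported, or null charset name.
            encoding = Encoding.UTF8;
        }
        return encoding.GetString(data);
    }
}
```

Note that on .NET Core and later, legacy code pages such as GB2312 or Big5 are only available after registering the provider from the System.Text.Encoding.CodePages package. Without it, this sketch would silently fall back to UTF-8 for those names.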


