What is NChardet
NChardet is a .NET implementation of Chardet, Mozilla's automatic character-encoding detection library. It was ported from jchardet, the Java version of chardet, and detects the encoding of a given byte stream.
How NChardet works
NChardet guesses the encoding by examining the input bytes one at a time. Because it is guessing, the input is not always enough to identify a single encoding. When the correct encoding cannot be determined, NChardet returns a set of possible encodings instead.
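The idea of narrowing the input down to a set of possible encodings can be illustrated with a small sketch. This is not NChardet's code: it is a simplified, hypothetical filter (the class `CandidateSketch` and its three candidates US-ASCII, UTF-8 and ISO-8859-1 are assumptions for illustration) that drops an encoding as soon as the bytes cannot be valid in it and returns whatever survives. The real engine, as inherited from jchardet, uses per-charset verifiers and covers far more encodings, including the multibyte Asian ones.

```csharp
using System.Collections.Generic;

// Simplified sketch of elimination-based guessing (not NChardet's code).
public static class CandidateSketch
{
    // Returns the encodings that remain plausible for the given bytes.
    public static List<string> PossibleEncodings(byte[] data)
    {
        var candidates = new List<string>();
        if (IsPureAscii(data)) candidates.Add("US-ASCII");
        if (IsValidUtf8(data)) candidates.Add("UTF-8");
        // ISO-8859-1 assigns a character to every byte value,
        // so no input can rule it out.
        candidates.Add("ISO-8859-1");
        return candidates;
    }

    static bool IsPureAscii(byte[] data)
    {
        foreach (byte b in data)
            if (b >= 0x80) return false;
        return true;
    }

    // Minimal UTF-8 structure check: each lead byte must be followed by the
    // right number of continuation bytes (10xxxxxx). Overlong forms and
    // surrogate code points are not rejected in this sketch.
    static bool IsValidUtf8(byte[] data)
    {
        int i = 0;
        while (i < data.Length)
        {
            byte b = data[i];
            int extra;
            if (b < 0x80) extra = 0;
            else if ((b & 0xE0) == 0xC0) extra = 1;
            else if ((b & 0xF0) == 0xE0) extra = 2;
            else if ((b & 0xF8) == 0xF0) extra = 3;
            else return false; // stray continuation byte or invalid lead byte
            if (i + extra >= data.Length) return false; // truncated sequence
            for (int k = 1; k <= extra; k++)
                if ((data[i + k] & 0xC0) != 0x80) return false;
            i += extra + 1;
        }
        return true;
    }
}
```

For example, the UTF-8 bytes of "中" (E4 B8 AD) leave UTF-8 and ISO-8859-1 as candidates, while the GB2312 bytes of the same character (D6 D0) leave only ISO-8859-1, which is why a practical detector also needs verifiers for the Chinese, Japanese and Korean encodings.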
How to use NChardet
Detecting an encoding with NChardet takes the following steps:
1. Construct an instance of the Detector class, passing a language hint.
2. Call the detector's Init method, passing an object that implements the ICharsetDetectionObserver interface.
3. Feed the bytes to be examined to the detector.
4. Call the detector's DataEnd method.
5. Read the result, or the set of possible results.
The language hint is an integer. The available values are:
1. Japanese
2. Chinese
3. Simplified Chinese
4. Traditional Chinese
5. Korean
6. Don't know (default)
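Passing a bare integer such as 2 is easy to misread. One option is to give the values above names. The enum `LanguageHint` and the helper `LanguageHints.FromInt` below are hypothetical conveniences, not part of NChardet; the detector itself still receives a plain int.

```csharp
// Hypothetical named constants for the integer language hints listed above.
// The Detector constructor takes a plain int, so cast with (int)hint.
public enum LanguageHint
{
    Japanese = 1,
    Chinese = 2,
    SimplifiedChinese = 3,
    TraditionalChinese = 4,
    Korean = 5,
    DontKnow = 6 // the default
}

public static class LanguageHints
{
    // Maps a raw value (for example from a config file) to a hint,
    // falling back to DontKnow for anything outside 1..5.
    public static LanguageHint FromInt(int value)
    {
        return value >= 1 && value <= 5 ? (LanguageHint)value : LanguageHint.DontKnow;
    }
}
```

With this in place, the detector could be created as `new Detector((int)LanguageHint.Chinese)`.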
The ICharsetDetectionObserver interface has a single method, Notify. When the NChardet engine believes it has detected the correct encoding, it calls Notify with the charset name, so the user program receives the result by implementing Notify in its own observer class.
Code example:
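The callback contract can be tried without the library. In this sketch `ICharsetObserver` is a local stand-in with the same shape as NChardet's ICharsetDetectionObserver (one `Notify(string)` method), and `FakeEngine` is a hypothetical engine that calls Notify only when it is confident. It shows why the observer's stored charset can remain null, in which case a program has to fall back to the list of probable charsets.

```csharp
// Local stand-in for NChardet.ICharsetDetectionObserver (same shape:
// a single Notify(string) method), so the sketch compiles on its own.
public interface ICharsetObserver
{
    void Notify(string charset);
}

// Records whatever charset the engine reports; stays null otherwise.
public class RecordingObserver : ICharsetObserver
{
    public string Charset { get; private set; }

    public void Notify(string charset)
    {
        Charset = charset; // assign the parameter, not the property to itself
    }
}

// Hypothetical engine: calls Notify only once it is confident of a guess.
public class FakeEngine
{
    private ICharsetObserver observer;

    public void Init(ICharsetObserver o)
    {
        observer = o;
    }

    public void Finish(string guess, bool confident)
    {
        if (confident)
            observer?.Notify(guess);
    }
}
```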
using System;
using System.IO;
using System.Net;
using NChardet;

// Implement the ICharsetDetectionObserver interface
public class MyCharsetDetectionObserver : NChardet.ICharsetDetectionObserver
{
    public string Charset = null;

    // Called by the engine when it believes it has found the encoding
    public void Notify(string charset)
    {
        Charset = charset;
    }
}

public class Program
{
    public static void Main()
    {
        int lang = 2; // language hint: 2 = Chinese
        // Instantiate the detector with the specified language hint
        Detector det = new Detector(lang);
        // Initialization: register the observer
        MyCharsetDetectionObserver cdo = new MyCharsetDetectionObserver();
        det.Init(cdo);

        // Input byte stream
        Uri url = new Uri("http://cn.yahoo.com");
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        bool done = false;
        bool isAscii = true;
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (Stream stream = response.GetResponseStream())
        {
            byte[] buf = new byte[1024];
            int len;
            while ((len = stream.Read(buf, 0, buf.Length)) != 0)
            {
                // Check whether the data is ASCII-encoded
                if (isAscii)
                    isAscii = det.isAscii(buf, len);
                // If it is not ASCII and the encoding is not yet determined, keep probing
                if (!isAscii && !done)
                    done = det.DoIt(buf, len, false);
            }
        }

        // Call DataEnd. If the engine believes it has detected the correct
        // encoding, it invokes the observer's Notify method at this point.
        det.DataEnd();

        bool found = false;
        if (isAscii)
        {
            Console.WriteLine("CHARSET = ASCII");
            found = true;
        }
        else if (cdo.Charset != null)
        {
            Console.WriteLine("CHARSET = {0}", cdo.Charset);
            found = true;
        }

        if (!found)
        {
            // No definite answer: list the possible encodings
            string[] prob = det.getProbableCharsets();
            for (int i = 0; i < prob.Length; i++)
            {
                Console.WriteLine("Probable charset = " + prob[i]);
            }
        }
        Console.ReadLine();
    }
}
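The chunked read loop in the example can be exercised without a network connection. The sketch below is an assumption-laden stand-in: it replaces the HTTP response with any `Stream` (a `MemoryStream` in testing), and `ChunkScan.IsAscii` is a hypothetical helper that applies the high-bit test one would expect from the detector's isAscii, not the library method itself. It also shows why the length returned by `Read` is passed along: after a short final read, the tail of the buffer still holds bytes from the previous chunk.

```csharp
using System.IO;
using System.Text;

public static class ChunkScan
{
    // True if none of the first len bytes has the high bit (0x80) set.
    // Bytes beyond len are ignored because they may be stale data
    // left over from an earlier, longer read.
    public static bool IsAscii(byte[] buf, int len)
    {
        for (int i = 0; i < len; i++)
            if ((buf[i] & 0x80) != 0) return false;
        return true;
    }

    // Reads the stream in fixed-size chunks, as the example does, and
    // reports whether the whole stream was pure ASCII.
    public static bool StreamIsAscii(Stream stream, int chunkSize = 1024)
    {
        byte[] buf = new byte[chunkSize];
        int len;
        bool isAscii = true;
        while ((len = stream.Read(buf, 0, buf.Length)) != 0)
        {
            if (isAscii)
                isAscii = IsAscii(buf, len);
            // In the full example, non-ASCII chunks would go to det.DoIt here.
        }
        return isAscii;
    }
}
```

Once any chunk contains a byte of 0x80 or above, the flag stays false for the rest of the stream, matching the logic of the example's loop.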