For work, I often need to determine the encoding of a string or byte stream to avoid garbled characters. I found the following libraries online:
1. C#
https://code.google.com/p/ude/ (detection library)
Ude is a C# port of the Mozilla Universal Charset Detector.
The original source code is available:
http://mxr.mozilla.org/mozilla/source/extensions/universalchardet/src/
http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html
http://mxr.mozilla.org/mozilla-central/source/extensions/universalchardet/doc/UniversalCharsetDetection.doc
2. Java
http://code.google.com/p/juniversalchardet/
3. Python
http://chardet.feedparser.org/
4. C++
ICU, IBM's open source library: http://site.icu-project.org/Conversion
Linux
Enca: http://freecode.com/projects/enca (detection and conversion library)
C++ version of the Mozilla code:
http://code.google.com/p/uchardet/ (detection library)
References
http://blog.csdn.net/xian0617/article/details/6706107
https://www.byvoid.com/blog/tag/mozilla
http://www.linuxidc.com/Linux/2011-05/35769.htm
http://blog.csdn.net/wangyonggang/article/details/927
-------------------
Import the library:
using Ude;
Then feed a stream or a byte array to the detector. Call DataEnd to notify the detector that you want the result back:
ICharsetDetector cdet = new CharsetDetector();
byte[] buff = new byte[1024];
int read;
// Read in chunks until the stream ends or the detector has already decided.
while ((read = stream.Read(buff, 0, buff.Length)) > 0 && !cdet.IsDone()) {
    cdet.Feed(buff, 0, read);
}
cdet.DataEnd();
Console.WriteLine("Charset: {0}, confidence: {1}", cdet.Charset, cdet.Confidence);
Alternatively, you can feed a stream to the detector:
using (FileStream fs = File.OpenRead(filename)) {
    ICharsetDetector cdet = new CharsetDetector();
    cdet.Feed(fs);
    cdet.DataEnd();
    Console.WriteLine("Charset: {0}, confidence: {1}", cdet.Charset, cdet.Confidence);
}
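Once the detector has reported a charset name and a confidence, the usual next step is to decode the bytes with it. The following is only a minimal sketch of that step, not part of Ude: the class DetectedText, the method Decode, the 0.5 confidence threshold, and the UTF-8 fallback are all assumptions made for illustration. The helper takes the name and confidence as plain values, so it depends only on System.Text.

```csharp
using System;
using System.Text;

public static class DetectedText
{
    // Hypothetical helper (not part of Ude): decodes bytes with the charset
    // name a detector reported. Falls back to UTF-8 (an assumed default) when
    // the name is missing, the confidence is below the threshold, or the
    // runtime does not recognize the encoding name.
    public static string Decode(byte[] data, string charsetName,
                                float confidence, float minConfidence = 0.5f)
    {
        Encoding encoding = Encoding.UTF8;
        if (!string.IsNullOrEmpty(charsetName) && confidence >= minConfidence)
        {
            try
            {
                encoding = Encoding.GetEncoding(charsetName);
            }
            catch (ArgumentException)
            {
                // Unknown or unsupported name on this runtime: keep the fallback.
                // On .NET Core, legacy code pages may need
                // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) first.
            }
        }
        return encoding.GetString(data);
    }
}
```

With the detector from the examples above, a call would look roughly like DetectedText.Decode(File.ReadAllBytes(filename), cdet.Charset, cdet.Confidence). As far as I can tell, Charset is left null when detection fails, which is why the helper treats a missing name as a reason to fall back rather than as an error.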