chardet is a very good character-encoding detection module.
chardet is a third-party Python library, so it has to be downloaded and installed before use (for example with pip install chardet). Once installed, it lives under \Lib\site-packages in the Python installation root directory.
    import chardet
    import urllib.request

    # Any data source can be used here, depending on your needs
    test_data = urllib.request.urlopen('http://www.baidu.com/').read()
    print(chardet.detect(test_data))

Run result:

    {'confidence': 0.99, 'encoding': 'GB2312'}
The result means that chardet is 99% confident that the data is GB2312-encoded.
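Once the encoding has been detected, the usual next step is to decode the bytes with it. The following is only a minimal sketch: the helper name detect_and_decode and the UTF-8 fallback are assumptions made for illustration, not part of chardet. Only chardet.detect and the 'encoding' and 'confidence' keys of its result come from the library.

```python
import chardet


def detect_and_decode(raw, fallback="utf-8"):
    """Hypothetical helper: guess the encoding of raw bytes and decode them."""
    result = chardet.detect(raw)
    # chardet reports None as the encoding when it cannot make a guess,
    # so fall back to an assumed default in that case.
    encoding = result["encoding"] or fallback
    try:
        text = raw.decode(encoding, errors="replace")
    except LookupError:
        # The detected name is not a codec known to Python.
        encoding = fallback
        text = raw.decode(fallback, errors="replace")
    return text, encoding, result["confidence"]
```

For example, text, encoding, confidence = detect_and_decode(test_data) would give the decoded page together with the guess that was used. Because a detection is only a guess, it is worth checking the confidence value before trusting the decoded text.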
More advanced usage:
    import urllib.request
    from chardet.universaldetector import UniversalDetector

    usock = urllib.request.urlopen('http://www.baidu.com/')
    # Create a detector object
    detector = UniversalDetector()
    for line in usock.readlines():
        # Feed the data piece by piece until the detector reaches its threshold
        detector.feed(line)
        if detector.done:
            break
    # Close the detector object
    detector.close()
    usock.close()
    # Print the detection result
    print(detector.result)

Run result:

    {'confidence': 0.99, 'encoding': 'GB2312'}
Use case: when you need to detect the encoding of a large file, this advanced method lets you read only part of the file. Detection stops as soon as the encoding has been identified, which makes it faster.
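The same idea can be applied to a local file by reading it in fixed-size binary chunks instead of lines. This is a sketch under stated assumptions: the function name detect_stream_encoding and the 4096-byte chunk size are illustrative choices. Only UniversalDetector with its feed, done, close and result members comes from chardet.

```python
import codecs
import io

from chardet.universaldetector import UniversalDetector


def detect_stream_encoding(stream, chunk_size=4096):
    """Hypothetical helper: detect the encoding of a binary stream incrementally."""
    detector = UniversalDetector()
    # Read fixed-size chunks until the stream is exhausted (read() returns b"").
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        detector.feed(chunk)
        # Stop early once the detector is confident enough.
        if detector.done:
            break
    # close() finalizes the guess from whatever data was fed so far.
    detector.close()
    return detector.result


if __name__ == "__main__":
    # In-memory stand-in for a large file; a real file would be opened with open(path, "rb").
    sample = io.BytesIO(codecs.BOM_UTF8 + "编码检测".encode("utf-8") * 1000)
    print(detect_stream_encoding(sample))
```

With a real file, the call would look like with open(path, "rb") as f: result = detect_stream_encoding(f). The file must be opened in binary mode, because chardet works on raw bytes rather than decoded text.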