When working with strings, you often run into text whose encoding is unknown, and without knowing the source encoding you cannot reliably convert the text to the encoding you need. With so many encodings in use, is there an effective way to detect which one a piece of text uses? chardet is a very good encoding detection module.

chardet is a third-party Python library that has to be downloaded and installed separately. The download addresses are:

1. Recommended address: http://download.csdn.net/download/aqwd2008/4256178
2. Official address: http://pypi.python.org/pypi/chardet

If you install from source, you may be told that the setuptools module is missing. A more convenient installation method is used here: whichever package you downloaded, unzip it and copy the resulting "chardet" folder into "Python installation root\lib\site-packages". Make sure Python can import from this location. If it cannot, add the location to the module search path through an environment variable (for example PYTHONPATH).

Once the chardet module is installed, a short sample shows how to use it.
    # Python 2 syntax (print statement, urllib.urlopen)
    import chardet
    import urllib

    # Choose different test data as needed
    TestData = urllib.urlopen('http://www.baidu.com/').read()
    print chardet.detect(TestData)

Run result: {'confidence': 0.99, 'encoding': 'GB2312'}
The result indicates that chardet is 99% confident that the data is GB2312-encoded. A somewhat more advanced usage is also available.
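Once the encoding has been detected, the original problem of converting the text to the required encoding becomes straightforward. The following is a minimal Python 3 sketch, not part of the original article: the helper name `convert_to_utf8` and the `min_confidence` cutoff are assumptions made for illustration, and the `detection` argument is simply a dictionary shaped like the result shown above.

```python
def convert_to_utf8(data, detection, min_confidence=0.5):
    """Re-encode `data` (bytes) as UTF-8 using a chardet-style result dict.

    `detection` looks like {'confidence': 0.99, 'encoding': 'GB2312'}.
    `min_confidence` is a hypothetical cutoff chosen for this sketch.
    """
    encoding = detection.get('encoding')
    confidence = detection.get('confidence') or 0.0
    if encoding is None or confidence < min_confidence:
        raise ValueError('encoding could not be determined reliably')
    # Decode with the detected encoding, then encode to the target encoding.
    text = data.decode(encoding)
    return text.encode('utf-8')


# Two Chinese characters encoded as GB2312, standing in for downloaded data.
raw = '\u4e2d\u6587'.encode('gb2312')
# A dictionary shaped like the output of chardet.detect().
detection = {'confidence': 0.99, 'encoding': 'GB2312'}
utf8_bytes = convert_to_utf8(raw, detection)
print(utf8_bytes)
```

In real use the `detection` dictionary would come from `chardet.detect(raw)` rather than being written by hand. The more advanced, incremental usage mentioned above is shown next.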
    import urllib
    from chardet.universaldetector import UniversalDetector

    usock = urllib.urlopen('http://www.baidu.com/')
    # Create a detector object
    detector = UniversalDetector()
    for line in usock.readlines():
        # Feed the data piece by piece until the detector reaches its threshold
        detector.feed(line)
        if detector.done:
            break
    # Close the detector object
    detector.close()
    usock.close()
    # Print the detection result
    print detector.result

Run result: {'confidence': 0.99, 'encoding': 'GB2312'}
Application background: when you need to detect the encoding of a large file, this advanced method lets you read only part of the file and stop as soon as the encoding has been identified, which speeds up detection.
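The large-file case described above can be sketched in Python 3 roughly as follows. This is an illustrative sketch rather than the author's code: the function name `detect_file_encoding`, the `chunk_size` default, and the optional `detector` parameter are assumptions. It relies only on the `feed()`, `done`, `close()` and `result` members of `UniversalDetector` that the example above already uses.

```python
def detect_file_encoding(path, chunk_size=4096, detector=None):
    """Detect a file's encoding by feeding it to a detector chunk by chunk.

    Reading stops as soon as the detector reports that it is done, so a
    large file does not have to be read in full. `chunk_size` and the
    optional `detector` argument are assumptions made for this sketch.
    """
    if detector is None:
        # Default to chardet's incremental detector (third-party package).
        from chardet.universaldetector import UniversalDetector
        detector = UniversalDetector()

    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break  # end of file reached
            detector.feed(chunk)
            if detector.done:
                break  # detector is confident enough, stop reading
    # close() finalizes the result, including for short inputs.
    detector.close()
    return detector.result
```

With chardet installed, calling `detect_file_encoding('some_large_file.txt')` (a hypothetical path) returns a dictionary of the same shape as the results shown earlier. Passing a detector explicitly is only a convenience for reuse or testing.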
Python module Chardet Download and introduction