Python module chardet: download and introduction

Source: Internet
Author: User

When working with strings, you often run into text whose encoding is unknown, and without knowing the encoding you cannot convert the string into the encoding you need. With input arriving in many different encodings, is there an effective way to identify them? chardet is a very good encoding-detection module. It is a third-party Python library, so it has to be downloaded and installed separately. Download addresses:

1. Recommended address: http://download.csdn.net/download/aqwd2008/4256178
2. Official address: http://pypi.python.org/pypi/chardet

If you install from source, you may get an error saying the setuptools module is missing. So here we use a more convenient installation method: whichever package you downloaded, unzip it, then copy the resulting "chardet" folder into "Python installation root\lib\site-packages". Make sure Python can import modules from this location; if it cannot, add that directory to the PYTHONPATH environment variable.

With the chardet module installed, let's look at some sample code.
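To check whether the copied folder is actually visible to Python, a small sketch using the standard-library function importlib.util.find_spec can help; it locates a module without importing it. The helper name module_location is hypothetical, and the printed messages are only suggestions.

```python
import importlib.util
from typing import Optional


def module_location(name: str) -> Optional[str]:
    """Return the file a top-level module would be imported from, or None if Python cannot find it."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # not found on sys.path
    # For a package such as chardet, origin points at its __init__.py
    return spec.origin


if __name__ == "__main__":
    location = module_location("chardet")
    if location is None:
        print("chardet was not found; check site-packages or PYTHONPATH")
    else:
        print("chardet will be imported from", location)
```

If the message says chardet was not found, the folder was most likely copied to a directory that is not on sys.path.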
    import chardet
    import urllib.request

    # Select different data as needed
    test_data = urllib.request.urlopen('http://www.baidu.com/').read()
    print(chardet.detect(test_data))

Run result (at the time of writing):

    {'confidence': 0.99, 'encoding': 'GB2312'}
The result indicates that chardet is 99% confident (a confidence of 0.99) that the data is GB2312-encoded. Next comes a more advanced usage: incremental detection with UniversalDetector.
    import urllib.request
    from chardet.universaldetector import UniversalDetector

    usock = urllib.request.urlopen('http://www.baidu.com/')
    # Create a detector object
    detector = UniversalDetector()
    # Iterating over the response reads it lazily, line by line,
    # so the loop can stop before the whole page has been read
    for line in usock:
        detector.feed(line)
        # Stop as soon as the detector has reached its confidence threshold
        if detector.done:
            break
    # Close the detector object to finalize the result
    detector.close()
    usock.close()
    # Print the detection result
    print(detector.result)

Run result (at the time of writing):

    {'confidence': 0.99, 'encoding': 'GB2312'}
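The same feed/done/close loop also works on a local file. The sketch below assumes a hypothetical helper, detect_stream_encoding, that reads any binary stream (a file opened with "rb", or the urlopen response above) in fixed-size chunks rather than lines. The chunk_size of 4096 bytes is an arbitrary choice, and the optional detector_factory parameter is an assumption added only so the loop can be exercised without chardet installed; by default it uses chardet's UniversalDetector.

```python
from typing import Any, BinaryIO, Callable, Optional


def detect_stream_encoding(stream: BinaryIO, chunk_size: int = 4096,
                           detector_factory: Optional[Callable[[], Any]] = None) -> dict:
    """Feed a binary stream to an incremental detector, stopping once it is confident."""
    if detector_factory is None:
        # Default: chardet's incremental detector (chardet must be installed)
        from chardet.universaldetector import UniversalDetector
        detector_factory = UniversalDetector
    detector = detector_factory()
    # iter() with a b"" sentinel yields fixed-size chunks until end of stream
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        detector.feed(chunk)
        if detector.done:
            break  # confident enough; the rest of the stream is never read
    detector.close()  # finalize the guess and populate detector.result
    return detector.result
```

To use it, open the file in binary mode and pass the file object, for example inside a `with open(path, "rb") as f:` block. Reading fixed-size chunks keeps memory bounded even for files that contain no newlines, where a single "line" could be the entire file.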

Use case: when you need to detect the encoding of a large file, this advanced method lets you read only part of the data and stop as soon as the encoding has been identified, which speeds up detection.
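Whichever method you use, chardet only reports an encoding name; converting the data, which was the original goal, still needs a decode step. A minimal sketch, assuming a hypothetical helper decode_bytes, a hypothetical confidence threshold of 0.5, and a hypothetical fallback list: it trusts chardet's guess when the confidence is high enough, and otherwise tries UTF-8, then GB18030 (a superset of GB2312), then Latin-1, which accepts any byte sequence.

```python
from typing import Tuple

try:
    import chardet  # third-party module installed as described above
except ImportError:
    chardet = None  # no detector available; only the fallback list is used

# Hypothetical fallback order; latin-1 decodes any bytes, so it is the last resort
FALLBACK_ENCODINGS = ("utf-8", "gb18030", "latin-1")


def decode_bytes(data: bytes, min_confidence: float = 0.5) -> Tuple[str, str]:
    """Decode raw bytes to text, returning (text, encoding_used)."""
    if chardet is not None:
        guess = chardet.detect(data)
        encoding = guess.get("encoding")
        # Only trust the guess when chardet is reasonably confident
        if encoding and (guess.get("confidence") or 0.0) >= min_confidence:
            try:
                return data.decode(encoding), encoding
            except (UnicodeDecodeError, LookupError):
                pass  # wrong or unknown guess: fall through to the fallbacks
    for encoding in FALLBACK_ENCODINGS:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate encoding could decode the data")
```

For example, `text, used = decode_bytes(test_data)` would turn the page downloaded in the first example into a str. The fallback matters in practice: if chardet reports GB2312 but the text contains characters outside GB2312, decoding raises UnicodeDecodeError, and GB18030 can usually decode the data instead.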
