Using chardet to detect character encoding in Python
This article explains how to use chardet to detect character encoding in Python. It covers what chardet does, how to install it, and how to use it.
chardet is a Python module for detecting the encoding of strings and files.
1. Download and install chardet
Download: http://pypi.python.org/pypi/chardet
After downloading chardet, unpack the archive and put the chardet folder in your application directory; that application can then use import chardet. Alternatively, copy the chardet folder into the Python system directory so that every Python program can use import chardet, or install it by running the following command in the unpacked directory:

    python setup.py install
2. Examples
chardet.detect() returns a dictionary in which confidence is the confidence of the detection (a value between 0 and 1) and encoding is the name of the detected encoding.
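As a rough illustration of how the returned dictionary might be used, the sketch below decodes a byte string with the detected encoding and falls back to a default when the detector has no guess or is not confident enough. The helper name decode_bytes, the 0.5 threshold, and the UTF-8 fallback are assumptions made for this example, not part of chardet; the detect parameter exists only so that a stand-in detector can be passed in.

```python
def decode_bytes(raw, detect=None, min_confidence=0.5, fallback="utf-8"):
    """Decode raw bytes using the result of a chardet-style detector.

    decode_bytes, min_confidence and fallback are hypothetical names
    chosen for this sketch; only chardet.detect() comes from chardet.
    """
    if detect is None:
        import chardet  # imported lazily so a stand-in detector can be used
        detect = chardet.detect

    result = detect(raw)
    encoding = result.get("encoding")
    confidence = result.get("confidence") or 0.0

    # The encoding value can be None when the detector has no guess;
    # a low confidence is treated the same way here.
    if encoding is None or confidence < min_confidence:
        encoding = fallback

    # errors="replace" avoids an exception if the guess was wrong.
    return raw.decode(encoding, errors="replace"), encoding
```

Calling decode_bytes(rawdata) returns the decoded text together with the encoding that was actually used, which makes it easy to log how often the fallback was needed.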
(1) Detecting the encoding of a web page:
    >>> import urllib
    >>> rawdata = urllib.urlopen('http://www.google.cn/').read()
    >>> import chardet
    >>> chardet.detect(rawdata)
    {'confidence': 0.98999999999999999, 'encoding': 'gb2312'}
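The session above uses urllib.urlopen, which exists only in Python 2. A minimal Python 3 sketch of the same idea might look like the following; the function name detect_url_encoding, the 10-second timeout, and the fetch and detect parameters are assumptions introduced for this example so that the download and the detector can be swapped out.

```python
from urllib.request import urlopen


def detect_url_encoding(url, fetch=None, detect=None, timeout=10):
    """Download a page and return the detector's result dictionary.

    detect_url_encoding, fetch, detect and timeout are hypothetical
    names for this sketch; by default it uses urlopen and chardet.detect.
    """
    if fetch is None:
        def fetch(target):
            # In Python 3, urlopen lives in urllib.request and read()
            # returns bytes, which is what the detector expects.
            with urlopen(target, timeout=timeout) as response:
                return response.read()

    if detect is None:
        import chardet  # imported lazily so a stand-in detector can be used
        detect = chardet.detect

    return detect(fetch(url))
```

With chardet installed and network access available, detect_url_encoding('http://www.google.cn/') would return a dictionary of the same kind as in the session above; the exact keys and values depend on the chardet version and on the page content at the time.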
(2) Detecting the encoding of a file:
    import chardet

    # A raw string keeps the backslash in the Windows path literal;
    # without it, '\111' would be read as an octal escape.
    with open(r'C:\111.txt', 'rb') as tt:
        # read(5) also works here, but readlines() fails because it
        # returns a list and detect() expects a single byte string.
        ff = tt.readline()
    enc = chardet.detect(ff)
    print(enc['encoding'])
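Reading only the first line can give the detector very little to work with. For larger files, chardet also provides an incremental UniversalDetector class that is fed data piece by piece until it is confident. The sketch below shows one way this might be wired up; the function name detect_file_encoding and the 4096-byte chunk size are assumptions for this example, and the detector parameter exists only so that a stand-in object can be supplied.

```python
def detect_file_encoding(path, detector=None, chunk_size=4096):
    """Feed a file to an incremental detector and return its result.

    detect_file_encoding and chunk_size are hypothetical names for this
    sketch; UniversalDetector is chardet's incremental detector class.
    """
    if detector is None:
        from chardet.universaldetector import UniversalDetector
        detector = UniversalDetector()

    with open(path, "rb") as f:
        # Read fixed-size chunks until the file ends (empty bytes).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            detector.feed(chunk)
            # Stop early once the detector is confident enough.
            if detector.done:
                break

    # close() finalizes the detection and fills in detector.result.
    detector.close()
    return detector.result
```

The returned dictionary has the same shape as the one from chardet.detect(), so enc['encoding'] can be read from it in the same way.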
I hope this article will help you with Python programming.