Python Character Encoding Detection Methods
This article describes how to determine the encoding of strings in Python. The details are as follows:
Method 1:
isinstance(s, str) checks whether s is an ordinary (byte) string
isinstance(s, unicode) checks whether s is a unicode string (the unicode type exists only in Python 2)
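Or, to convert a byte string to unicode only when it is not unicode already:
if not isinstance(s, unicode):
    s = unicode(s, "utf-8")  # decode the byte string, assuming it is UTF-8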
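The checks above rely on the Python 2 unicode type, which does not exist in Python 3. As a minimal sketch that should run on both versions, the same idea can be wrapped in a small helper. The names text_type, binary_type and to_text are hypothetical and not part of the original method, and UTF-8 is only an assumed default encoding.

```python
import sys

# On Python 2 the text type is `unicode` and raw bytes are `str`;
# on Python 3 the text type is `str` and raw bytes are `bytes`.
if sys.version_info[0] >= 3:
    text_type = str
    binary_type = bytes
else:
    text_type = unicode  # noqa: F821 (this name only exists on Python 2)
    binary_type = str


def to_text(value, encoding="utf-8"):
    """Return value as Unicode text, decoding it only if it is a byte string."""
    if isinstance(value, text_type):
        return value  # already text, nothing to do
    if isinstance(value, binary_type):
        return value.decode(encoding)  # raises UnicodeDecodeError on bad input
    raise TypeError("expected bytes or text, got %r" % type(value))
```

Note that this only tells you whether a value is already decoded. It does not tell you which encoding a byte string uses; if the assumed encoding is wrong, the decode step fails or produces garbage.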
Method 2:
Detecting character encoding with chardet
Chardet makes it easy to detect the encoding of strings and files. This is especially useful for Chinese web pages, some of which use GBK/GB2312 while others use UTF-8. If you need to crawl pages, knowing the page encoding is essential. An HTML page usually declares its charset in a meta tag, but the declaration is sometimes wrong, and that is where chardet helps.
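To illustrate why a declared charset cannot be trusted blindly, here is a minimal sketch that extracts the declaration from raw page bytes and checks whether the bytes actually decode under it. The function names and the simplified regular expression are assumptions made for illustration; real pages may need a proper HTML parser.

```python
import re

# Simplified pattern: matches charset=... in a <meta> tag or Content-Type value.
_CHARSET_RE = re.compile(br'charset\s*=\s*["\']?\s*([A-Za-z0-9_.:-]+)', re.IGNORECASE)


def declared_charset(html_bytes):
    """Return the charset name declared in the first 2048 bytes, or None."""
    match = _CHARSET_RE.search(html_bytes[:2048])
    return match.group(1).decode("ascii").lower() if match else None


def declaration_is_plausible(html_bytes):
    """True if the page declares a charset that decodes the bytes without error."""
    name = declared_charset(html_bytes)
    if name is None:
        return False
    try:
        html_bytes.decode(name)
    except (LookupError, UnicodeDecodeError):
        # Unknown codec name, or bytes that are invalid in the declared encoding.
        return False
    return True
```

A successful decode is necessary but not sufficient: single-byte encodings such as ISO-8859-1 accept any byte sequence, so a page can pass this check and still be mislabeled. Statistical detection is meant to cover that gap.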
Chardet example
>>> import urllib
>>> rawdata = urllib.urlopen('http://www.google.cn/').read()
>>> import chardet
>>> chardet.detect(rawdata)
{'confidence': 0.98999999999999999, 'encoding': 'gb2312'}
chardet.detect can be called directly on the given data. It returns a dictionary with two entries: the confidence of the detection and the detected encoding.
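Once the detection result is available, a common next step is to decode the raw bytes with it. The following is a minimal sketch of that step, not part of chardet itself: the function name decode_with_detection, the 0.5 confidence threshold and the UTF-8 fallback are all assumptions. The detector is passed in as a parameter so the logic can be exercised without a network connection or even without chardet installed.

```python
try:
    import chardet  # third-party package; assumed to be installed
except ImportError:
    chardet = None


def decode_with_detection(raw, detect=None, min_confidence=0.5, fallback="utf-8"):
    """Decode raw bytes using a detector's guess; return (text, encoding_used).

    detect is any callable that returns a dict with 'encoding' and
    'confidence' keys, as chardet.detect does.
    """
    if detect is None:
        if chardet is None:
            raise RuntimeError("chardet is not installed; pass a detect callable")
        detect = chardet.detect
    guess = detect(raw)
    encoding = guess.get("encoding")
    confidence = guess.get("confidence") or 0.0
    # chardet reports encoding None when it cannot decide; use the fallback
    # then, and also when the guess is below the (assumed) threshold.
    if not encoding or confidence < min_confidence:
        encoding = fallback
    # "replace" substitutes undecodable bytes instead of raising an error.
    return raw.decode(encoding, "replace"), encoding
```

With chardet installed, decode_with_detection(rawdata) uses chardet.detect by default. Because the result is a statistical guess, short inputs in particular can be misdetected, so the threshold and fallback should be tuned to the data at hand.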
Chardet Installation
After downloading chardet, unpack the archive and place the chardet folder in your application's directory. You can then start using it with import chardet.
You can also run the setup.py install script to copy chardet into Python's system directory, so that all of your Python programs can use import chardet:
python setup.py install
Reference
Chardet Official Website: http://chardet.feedparser.org/
Chardet download page: http://chardet.feedparser.org/download/