Methods for Determining Character Encoding in Python



This article describes how to determine the encoding of a string in Python, shared here for reference. The details are as follows:

Method 1:

In Python 2, isinstance(s, str) checks whether s is an ordinary (byte) string,
and isinstance(s, unicode) checks whether s is a unicode string.

Or

    if type(s).__name__ != "unicode":
        s = unicode(s, "utf-8")  # decode the byte string as UTF-8
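The checks above are Python 2 code: there, str is a byte string and unicode is text. In Python 3, str holds Unicode text, bytes holds raw bytes, and the unicode built-in no longer exists. A minimal Python 3 sketch of the same idea follows; the helper names describe and to_text are hypothetical, and UTF-8 as the default encoding is an assumption carried over from the example above.

```python
def describe(s):
    """Classify a value the way Method 1 does, using Python 3 types."""
    if isinstance(s, str):
        return "text"   # Python 3 str: already-decoded Unicode text
    if isinstance(s, (bytes, bytearray)):
        return "bytes"  # raw bytes: still need decoding
    return "other"


def to_text(s, encoding="utf-8"):
    """Return s as str, decoding byte strings with `encoding`.

    Decoding is strict, so UnicodeDecodeError is raised if the bytes
    are not valid in the given encoding.
    """
    if isinstance(s, str):
        return s  # already text, nothing to do
    if isinstance(s, (bytes, bytearray)):
        return s.decode(encoding)
    raise TypeError("expected str or bytes, got %s" % type(s).__name__)
```

For example, to_text(raw, "gbk") plays the role that unicode(raw, "gbk") played in Python 2.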

Method 2:

Detecting character encoding with chardet

chardet makes it easy to detect the encoding of strings and files. This matters especially for Chinese web pages: some use GBK/GB2312 and others use UTF-8, so if you need to crawl pages, knowing each page's encoding is essential. HTML pages declare a charset in a meta tag, but that declaration is sometimes wrong, and in that case chardet helps a great deal.
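To see whether a page's declared charset can be trusted, one rough check is to pull the charset out of the meta tag and try a strict decode with it. The sketch below uses only the standard library; the regular expression is a simplification that handles the two common meta forms rather than a full HTML parser, and the function names are hypothetical.

```python
import re

# Matches both <meta charset="..."> and
# <meta http-equiv="Content-Type" content="text/html; charset=...">
_META_CHARSET = re.compile(
    rb'<meta[^>]*?charset\s*=\s*["\']?\s*([A-Za-z0-9_.:-]+)',
    re.IGNORECASE,
)


def declared_charset(raw):
    """Return the charset named in the first matching <meta> tag, lower-cased, or None."""
    m = _META_CHARSET.search(raw)
    return m.group(1).decode("ascii").lower() if m else None


def charset_is_plausible(raw, charset):
    """Return True if raw decodes without error (strict mode) using charset."""
    try:
        raw.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # LookupError: Python does not know the charset name at all
        return False
    return True
```

A clean decode does not prove the declaration is right (every byte sequence decodes without error as latin-1, for instance); it only catches obvious mismatches, which is where a statistical detector such as chardet comes in.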

chardet example

    >>> import urllib
    >>> rawdata = urllib.urlopen('http://www.google.cn/').read()
    >>> import chardet
    >>> chardet.detect(rawdata)
    {'confidence': 0.98999999999999999, 'encoding': 'gb2312'}

chardet.detect() can be called directly on the raw data to detect its encoding. It returns a dictionary with two entries: 'confidence', how reliable the detection is, and 'encoding', the detected encoding.
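Under Python 3 the page would be fetched with urllib.request.urlopen(url).read(), which returns bytes that can be passed straight to chardet.detect(). Because the result carries a confidence score, and the encoding may be None when nothing can be detected, the caller usually needs a fallback. The sketch below consumes a result dict of that shape; the function name decode_detected, the UTF-8 fallback, and the 0.5 threshold are illustrative assumptions, not part of chardet.

```python
def decode_detected(raw, result, fallback="utf-8", min_confidence=0.5):
    """Decode raw bytes using a chardet-style result dict.

    `result` has the shape returned by chardet.detect(), for example
    {'encoding': 'gb2312', 'confidence': 0.99}. Returns a tuple
    (text, encoding_used).
    """
    encoding = result.get("encoding")
    confidence = result.get("confidence") or 0.0
    if encoding and confidence >= min_confidence:
        try:
            return raw.decode(encoding), encoding
        except (LookupError, UnicodeDecodeError):
            pass  # detector guessed wrong; use the fallback below
    # Low confidence or failed decode: undecodable bytes become U+FFFD
    return raw.decode(fallback, errors="replace"), fallback
```

With chardet installed, a typical call would be text, used = decode_detected(rawdata, chardet.detect(rawdata)).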

Installing chardet

After downloading chardet, extract the archive and put the chardet folder in your application's directory. You can then start using it with import chardet.

Alternatively, run the package's setup.py installer to copy chardet into the Python system directory, so that every Python program on the machine can use import chardet:

    python setup.py install
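Whichever installation route is used (current chardet releases can also be installed from PyPI with pip install chardet), a quick way to confirm that import chardet will succeed, and which copy it will load, is importlib.util.find_spec. A minimal sketch; module_origin is a hypothetical helper name.

```python
import importlib.util


def module_origin(name):
    """Return the file that `import name` would load, or None if not importable.

    find_spec searches sys.path in order, and the running script's
    directory is normally first, so a chardet folder placed next to the
    application is found before a copy installed with setup.py.
    """
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None
```

For instance, module_origin("chardet") gives the path of the chardet package's __init__.py that would be imported, or None if chardet is not installed.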

Chardet Official Website: http://chardet.feedparser.org/
Chardet download page: http://chardet.feedparser.org/download/
