Python basics "day03": Character encoding and transcoding

Last Update:2017-07-13 Source: Internet

Author: User


I. Overview

Character encoding in Python is a topic full of painful lessons, and it will definitely cause headaches in future Python development. So it is worth explaining clearly.

II. Introduction to encodings

1. Notes:

In Python 2 the default encoding is ASCII, whereas in Python 3 strings are Unicode by default.
Unicode has several encoding forms: UTF-32 (4 bytes per character), UTF-16 (2 or 4 bytes), and UTF-8 (1-4 bytes). UTF-16 is common for in-memory strings (for example in Windows and Java), but files are usually stored as UTF-8 because UTF-8 saves space.
In Python 3, encode converts a string (str) into the bytes type, and decode converts the bytes type into a string.
In UTF-16, a common Chinese character takes 2 bytes, and so does an English character; in ASCII, 1 English character = 1 byte. Remember: ASCII cannot represent Chinese characters.
UTF-8 is a variable-length encoding of Unicode that is optimized for space: all English characters are still stored in ASCII format (1 byte), and common Chinese characters take 3 bytes each.
Unicode contains the characters of all countries' encodings, so a conversion between two different character encodings has to go through Unicode.
The default source file encoding in Python 3 is UTF-8.
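The byte counts in the notes above can be checked directly. The following is a minimal Python 3 sketch; the sample characters and the variable names are chosen here only for illustration, and the "-le" codec variants are used so that no byte order mark is counted.

```python
# Compare how many bytes one English and one Chinese character take
# in each Unicode encoding form. The "-le" variants omit the byte order mark.
samples = {"english": "a", "chinese": "你"}
encodings = ["utf-8", "utf-16-le", "utf-32-le"]

byte_lengths = {
    name: {enc: len(ch.encode(enc)) for enc in encodings}
    for name, ch in samples.items()
}

for name, lengths in byte_lengths.items():
    print(name, lengths)
```

The English letter takes 1 byte in UTF-8 while the Chinese character takes 3; in UTF-16 both take 2 bytes, and in UTF-32 both take 4.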

2. The encoding and transcoding process in Python 2

Note: Because Unicode is the intermediate encoding, converting from one character encoding to another always means first decoding into Unicode and then encoding into the target character encoding.
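The decode-then-encode rule can be wrapped in a small helper. This is only a sketch written for Python 3; the function name transcode and its parameters are hypothetical and do not come from the original article.

```python
def transcode(data: bytes, source_encoding: str, target_encoding: str) -> bytes:
    """Convert bytes from one encoding to another via a Unicode string."""
    text = data.decode(source_encoding)   # bytes -> Unicode (str)
    return text.encode(target_encoding)   # Unicode (str) -> bytes


gbk_bytes = "你好".encode("gbk")
utf8_bytes = transcode(gbk_bytes, "gbk", "utf-8")
print(utf8_bytes.decode("utf-8"))
```

There is no direct GBK-to-UTF-8 step anywhere: the Unicode string in the middle is what makes the conversion possible.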

3. Character encoding conversion in Python 2; the code is as follows:

    #!/usr/bin/env python
    # -*- coding:utf-8 -*-
    # __auther__ == zhangqigao

    s = "其高最帅"
    # Decode the UTF-8 bytes into unicode
    s_to_unicode = s.decode("utf-8")
    print("--------s_to_unicode-----")
    print(s_to_unicode)
    # Then encode the unicode into GBK
    s_to_gbk = s_to_unicode.encode("gbk")
    print("-----s_to_gbk------")
    print(s_to_gbk)
    # Decode GBK into unicode, then encode into UTF-8
    gbk_to_utf8 = s_to_gbk.decode("gbk").encode("utf-8")
    print("------gbk_to_utf8-----")
    print(gbk_to_utf8)

    # Output
    # --------s_to_unicode-----
    # 其高最帅
    # -----s_to_gbk------
    # ??????
    # ------gbk_to_utf8-----
    # 其高最帅

Note: The scenario above applies to strings that are not yet Unicode. But what if the string is already Unicode?

4. The string is already Unicode; the code is as follows:

    #!/usr/bin/env python
    # -*- coding:utf-8 -*-
    # __auther__ == zhangqigao

    # The u prefix means the string is unicode
    s = u'你好'
    # Already unicode, so encode directly into GBK
    s_to_gbk = s.encode("gbk")
    print("----s_to_gbk----")
    print(s_to_gbk)
    # Decode back into unicode, then encode into UTF-8
    gbk_to_utf8 = s_to_gbk.decode("gbk").encode("utf-8")
    print("-----gbk_to_utf8---")
    print(gbk_to_utf8)

    # Output
    # ----s_to_gbk----
    # ???
    # -----gbk_to_utf8---
    # 你好

Note: In Python 2, the encoding declaration at the beginning of the file tells the interpreter that the source file is encoded in UTF-8. UTF-8 includes Chinese characters, so Chinese text in the file can be read and printed. If you do not declare an encoding, Python 2 assumes ASCII for the source file, and a Chinese character in the source raises an error, because ASCII cannot represent Chinese characters.
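The limitation behind this note, that ASCII has no room for Chinese characters, can be seen without Python 2. The sketch below is Python 3 and shows only the encoding failure itself, not the Python 2 source-file error; the variable names are illustrative.

```python
text = "你好"

try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError as exc:
    ascii_failed = True
    print("ASCII cannot encode this text:", exc.reason)

# An encoding that covers Chinese characters works.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)
```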

5. Character encoding conversion in Python 3

As mentioned in the notes, strings in Python 3 are Unicode by default, so converting a string to a character encoding needs no decode step first; you can encode directly. The code is as follows:

    #!/usr/bin/env python
    # __auther__ == zhangqigao
    # No encoding declaration is needed, although declaring one causes no error

    s = '你好'
    # The string s is already Unicode, so encode directly without decode
    s_to_gbk = s.encode("gbk")
    print("----s_to_gbk----")
    print(s_to_gbk)
    # Same as before: decode GBK into Unicode first, then encode into UTF-8
    gbk_to_utf8 = s_to_gbk.decode("gbk").encode("utf-8")
    print("-----gbk_to_utf8---")
    print(gbk_to_utf8)
    # Decode into a Unicode string
    utf8_decode = gbk_to_utf8.decode("utf-8")
    print("-------utf8_decode----")
    print(utf8_decode)

    # Output
    # ----s_to_gbk----
    # b'\xc4\xe3\xba\xc3'
    # -----gbk_to_utf8---
    # b'\xe4\xbd\xa0\xe5\xa5\xbd'
    # -------utf8_decode----
    # 你好

Note: In Python 3, encode converts a string into the bytes type and decode converts the bytes type back into a string, so it is easy to see that encode produces bytes data. It is also important to note that declaring a character encoding at the beginning of a Python 3 file only states the encoding of the file itself; the strings in the file are still Unicode.
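One way to see this is to compile a small piece of source that is stored as GBK bytes and declares GBK at the top. This is a sketch under the assumption that the built-in compile() honours the encoding declaration when given bytes; the names source_bytes and namespace are made up for the demonstration.

```python
# Source code of a tiny module, stored as GBK bytes with a matching declaration.
source_bytes = '# -*- coding: gbk -*-\ns = "你好"\n'.encode("gbk")

namespace = {}
exec(compile(source_bytes, "<gbk_demo>", "exec"), namespace)

s = namespace["s"]
print(type(s), s)
```

Even though the file content is GBK, the string s inside it is an ordinary Unicode str of two characters, not four GBK bytes.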

Summary:

Unicode can represent the characters of all character encodings.
In Python 2, conversions between character encodings go through Unicode. You can print the Unicode string, or print the string in the character encoding declared at the beginning of the file, because Python 2 draws no clear distinction between characters and bytes. That is why the results look so mixed up.
In Python 3, only Unicode strings are treated as characters. Encoding a string produces a bytes object in the target encoding, that is, binary data; to be read as characters again it must be decoded back to Unicode.
In Python 3, if the file encoding is already declared at the beginning of the file, why are the strings in the file still Unicode? Because in Python 3 data in any specific encoding is binary, of the bytes type, and is not treated as characters; only Unicode is. Points 3 and 4 follow from the clear distinction between characters and bytes in Python 3.
If this is still unclear, another author's article elaborates on how Python 2 and Python 3 distinguish characters from bytes.
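The strict separation between characters and bytes in Python 3 can be observed with a short experiment. This is a minimal sketch with illustrative variable names: mixing str and bytes fails, and the bytes have to be decoded first.

```python
text = "你好"
data = text.encode("utf-8")   # bytes, not characters

try:
    text + data               # mixing str and bytes is not allowed
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Decoding the bytes back to Unicode makes the concatenation valid.
joined = text + data.decode("utf-8")
print(mixed_ok, joined)
```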


