Python Basics 1: Python Basics

Last Update:2017-05-14 Source: Internet

Author: User


Start with the history. Fair warning: this article is long-winded, like the proverbial old lady's foot-binding cloth, smelly and long.

Encoding history:

1. A computer can only process numbers, so text must be converted to numbers before
it can be processed. 8 bits = 1 byte, so the largest number a single byte can represent is 255.

2. Americans invented the computer. For English, one byte is enough to represent every character.
ASCII (one byte per character) is the standard American encoding.

3. When Chinese users came to computers, they needed to represent Chinese characters, so they created
the GB2312 encoding, which uses two bytes per Chinese character. Other countries likewise
created encodings for their own languages. With no common standard, text written in one
encoding and read in another shows up as garbled characters.
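
A minimal sketch of how garbled characters arise, written in Python 3. The sample text and the choice of Latin-1 as the "wrong" encoding are illustrative assumptions, not taken from the article.

```python
# Text written by one program using GB2312.
original = "中文"
data = original.encode("gb2312")   # two bytes per Chinese character

# Another program reads the same bytes but assumes a different encoding.
# Latin-1 maps every byte to some character, so this never raises;
# it silently produces nonsense instead.
garbled = data.decode("latin-1")

# Reading the bytes with the encoding they were written in recovers the text.
recovered = data.decode("gb2312")

print(len(data))     # 4
print(garbled)       # four unrelated Latin characters
print(recovered)     # 中文
```

The bytes themselves never change; only the assumption about what they mean does.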

4. To unify these standards, Unicode emerged: one character set that covers all languages.
Comparing Unicode with ASCII:
1) The letter A: ASCII decimal 65, binary 0100 0001.
The Chinese character 中 cannot be encoded in ASCII; its Unicode code point is 20013, binary 01001110 00101101.
2) To give every character the same length for the computer, A is padded with leading zeros: 00000000 01000001.
This is the unified standard.
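
The numbers above can be checked in Python 3. This is a small sketch; the variable names are illustrative, and the 16-bit padding mirrors the two-byte layout described above.

```python
# ord() gives the Unicode code point of a character.
print(ord("A"))     # 65
print(ord("中"))    # 20013

# The same numbers in binary, padded to 16 bits (two bytes).
a_bits = format(ord("A"), "016b")
zhong_bits = format(ord("中"), "016b")
print(a_bits)       # 0000000001000001
print(zhong_bits)   # 0100111000101101

# ASCII has no slot for 中, so encoding it as ASCII fails.
try:
    "中".encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)     # False
```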
5. With a unified standard the garbled-character problem is solved, but Unicode values are long, while most computer text is English.
If the content is all English, a two-byte Unicode encoding doubles the storage and transmission cost compared with ASCII.
How can this problem be solved?

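6. The answer is a variable-length encoding of Unicode, and so UTF-8 appeared.
In UTF-8 an English letter takes 1 byte and a Chinese character usually takes 3 bytes; rarer characters take 4 bytes (the original design allowed up to 6, but the current standard stops at 4).
This saves both storage and transmission space for mostly-English text.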
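
The byte counts can be measured directly in Python 3. This is a sketch with illustrative sample characters; `utf-16-be` stands in here for the fixed two-byte layout discussed earlier, which is an assumption made for the comparison.

```python
samples = [
    ("English letter", "A"),
    ("common Chinese character", "中"),
    ("rare CJK character U+20000", "\U00020000"),
]

utf8_lengths = {}
for label, ch in samples:
    utf8_lengths[label] = len(ch.encode("utf-8"))
    print(label, utf8_lengths[label])

# Two bytes per character versus UTF-8 for plain English text.
english = "hello world"
fixed_width = len(english.encode("utf-16-be"))   # 2 bytes per character here
variable_width = len(english.encode("utf-8"))    # 1 byte per character here
print(fixed_width, variable_width)               # 22 11
```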

7. A new question arises: in memory, programs work with Unicode,
so how is text converted to and from UTF-8?

When text is loaded into memory to be processed, it is handled as Unicode.
When it needs to be transmitted over the network or stored in a file, it is encoded as UTF-8 to save space.
So the two are converted back and forth.
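
A rough Python 3 sketch of that round trip: a Unicode string in memory is encoded to UTF-8 bytes on the way out to a file and decoded on the way back in. The file name `demo.txt` and the sample text are hypothetical.

```python
import os
import tempfile

text = "Python 中文"            # a Unicode string in memory

# Going out to disk: encode to UTF-8 bytes.
data = text.encode("utf-8")

path = os.path.join(tempfile.mkdtemp(), "demo.txt")   # hypothetical file name
with open(path, "wb") as f:
    f.write(data)

# Coming back into memory: decode the UTF-8 bytes to a Unicode string.
with open(path, "rb") as f:
    loaded = f.read().decode("utf-8")

print(len(text), len(data))     # 9 13
print(loaded == text)           # True
```

The file is opened in binary mode on purpose, so that the encode and decode steps stay visible instead of being done by `open` behind the scenes.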

Encoding in Python 2 and Python 3 on Windows/Linux

Python 2:

In Windows:

1. First, check the default encoding Python reports on Windows.
import sys
sys.getdefaultencoding()
# Out: "UTF-8" (note: an unmodified Python 2 returns "ascii" here)
2. When all strings are in English:
s1 = "abc" --> type(s1): str
s2 = u"abc" --> type(s2): unicode
The u"" prefix means the string is stored as a unicode object.
s1.encode("utf8") succeeds
s2.encode("utf8") succeeds

3. When Chinese characters appear:
s1 = "你好" --> a str holding GB2312-encoded bytes on Windows
s2 = u"你好"
s1.encode("utf8") raises an error
s2.encode("utf8") succeeds

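Cause of the error:
In memory, text is handled as Unicode.
s1 is not a unicode object: it is a str holding GB2312 bytes (storing everything as Unicode would waste space).
encode converts a unicode object into the encoding named in its argument, so Python 2 first tries to decode s1 to unicode with its default codec, and that fails on the Chinese bytes.
s2 is already a unicode object, so it does not raise an error.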
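
The failing step can be imitated in Python 3, where byte strings and Unicode strings are separate types. This is a sketch under an assumption: the helper `py2_style_encode` is hypothetical and only mimics what Python 2 does when encode is called on a byte string (decode with the default ascii codec first, then encode).

```python
def py2_style_encode(raw, target="utf8", default="ascii"):
    """Imitate Python 2's str.encode: implicit decode, then encode."""
    return raw.decode(default).encode(target)

english = b"abc"
chinese = "你好".encode("gb2312")    # bytes as a GB2312 source file would hold them

english_result = py2_style_encode(english)
print(english_result)                # b'abc'

try:
    py2_style_encode(chinese)
    failed = False
except UnicodeDecodeError as exc:
    failed = True
    print(type(exc).__name__)        # UnicodeDecodeError
```

Pure-English bytes survive because every one of them is valid ASCII; the Chinese bytes fail at the hidden decode step, before any UTF-8 encoding happens.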

Solution:
First decode the GB2312 bytes into a unicode object,
then encode that object as UTF-8:
s1.decode("gb2312").encode("utf8") succeeds, because Chinese text on Windows is encoded as "gb2312"
decode("xx") converts a byte string encoded as "xx" into
a unicode object.
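
The same decode-then-encode chain, sketched in Python 3 with explicit bytes. The helper name `transcode` and the sample text are illustrative assumptions.

```python
def transcode(raw, source, target="utf8"):
    """Decode bytes from `source` into Unicode, then encode as `target`."""
    return raw.decode(source).encode(target)

gb_bytes = "你好".encode("gb2312")            # 2 bytes per character
utf8_bytes = transcode(gb_bytes, "gb2312")    # 3 bytes per character

print(len(gb_bytes), len(utf8_bytes))         # 4 6
print(utf8_bytes.decode("utf8"))              # 你好
```

Unicode sits in the middle as the common format, so converting between any two encodings takes only the two calls shown.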

In Linux:

1. First, check the default encoding Python reports on Linux.
import sys
sys.getdefaultencoding()
# Out: "ascii"
2. When all strings are in English:
s1 = "abc" --> type(s1): str
s2 = u"abc" --> type(s2): unicode
The u"" prefix means the string is stored as a unicode object.
s1.encode("utf8") succeeds
s2.encode("utf8") succeeds
3. When Chinese characters appear:
s1 = "你好" --> a str holding UTF-8 bytes. Why is it not ascii on Linux? Because ascii cannot represent Chinese characters,
so the text must already have been stored as UTF-8.
s2 = u"你好"
s1.encode("utf8") raises an error
s2.encode("utf8") succeeds
Solution:
First decode the UTF-8 bytes into a unicode object,
then encode that object as UTF-8:
s1.decode("utf8").encode("utf8") succeeds, because Chinese text on Linux is encoded as "UTF-8"
The result is the same bytes as s1: it has simply been converted back to UTF-8.
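
A short Python 3 check of that last point, using explicit bytes to stand in for the Python 2 str. The sample text is an assumption.

```python
utf8_bytes = "你好".encode("utf8")    # what a UTF-8 source file holds

as_unicode = utf8_bytes.decode("utf8")
round_trip = as_unicode.encode("utf8")

print(len(as_unicode))                # 2 characters
print(round_trip == utf8_bytes)       # True
```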

Python 3:

In Python 3, every str is a Unicode string, so it can be encoded to "UTF-8" directly with encode.

In Windows:
1. When all strings are in English:
s1 = "abc" --> type(s1): str
s2 = u"abc" --> type(s2): str
The u"" prefix is still accepted, but the result is an ordinary str.
s1.encode("utf8") succeeds
s2.encode("utf8") succeeds
2. When Chinese characters appear:
s1 = "你好" --> a Unicode str on Windows
s2 = u"你好" ---> the u"" prefix is unnecessary; without it, Python 3 still treats the string as Unicode
s1.encode("utf8") succeeds
s2.encode("utf8") succeeds
In Linux: same as in Windows
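
A small Python 3 sketch of these points; the sample strings are illustrative. It should behave the same on Windows and Linux.

```python
import sys

s1 = "你好"
s2 = u"你好"      # the u prefix is accepted but changes nothing

print(type(s1).__name__, type(s2).__name__)   # str str
print(s1 == s2)                               # True

encoded = s1.encode("utf8")
print(type(encoded).__name__, len(encoded))   # bytes 6

# In Python 3 this is "utf-8" on every platform.
default_codec = sys.getdefaultencoding()
print(default_codec)
```

The result of encode is a bytes object, which is the Python 3 counterpart of the Python 2 str used in the earlier sections.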

Conclusion: about # -*- coding: utf-8 -*-

The biggest difference between Python 2 and Python 3:
Python 2: when a file contains Chinese characters, this line must be added at the top, and Chinese string literals should be written with u"".
Purpose:
It tells Python that the file is encoded in UTF-8, so Python reads the file according to that encoding
and converts to Unicode internally.
Why it is not needed in Python 3:
Python 3 reads all source files as UTF-8 by default and treats their strings as Unicode.
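
The standard library can show which encoding Python would pick for a source file. A minimal Python 3 sketch: `tokenize.detect_encoding` is a real standard-library function, while the helper name `source_encoding` and the two sample sources are hypothetical.

```python
import io
import tokenize

def source_encoding(source_bytes):
    """Return the encoding Python would use to read this source file."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding

with_declaration = b"# -*- coding: gb2312 -*-\nname = 'x'\n"
without_declaration = b"name = 'x'\n"

print(source_encoding(with_declaration))      # gb2312
print(source_encoding(without_declaration))   # utf-8
```

With no declaration, Python 3 falls back to UTF-8, which is why the comment line can usually be left out of UTF-8 source files.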

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
