Differences between Python 2 and Python 3 in string encoding handling

Source: Internet
Author: User
0x00 Character encoding

Computers were, after all, invented in Western countries, and at first nobody expected them to spread around the world. Characters were represented with only 7 bits of a single byte (ASCII), which is clearly not enough for the large number of scripts in use today. Several generations of encoding schemes followed, with different countries and regions each devising their own, and this left behind many historical problems that persist to this day. For an explanation of the encoding principles, see the article "Python encoded in previous life".

0x01 Python strings

Python has two different string types, one for storing text and one for storing bytes. Text is stored internally as Unicode, while a byte string holds the original byte sequence and is displayed as raw bytes or ASCII.

What is encoding (encode)?

Going by the literal meaning and past experience, "encoding this text as UTF-8" sounds like it should take byte data and turn it into correctly displayed text. Most people think so, but the truth is the opposite.

Encoding means converting Unicode characters into a byte sequence according to an encoding rule, such as UTF-8.
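As a minimal sketch of this definition, assuming Python 3 (where a plain string literal is already Unicode text), the same two characters can be encoded under two different rules. The variable names are only illustrative.

```python
# A Python 3 str holds Unicode code points, not bytes.
text = "中文"

# Encoding turns the text into a byte sequence under a chosen rule.
utf8_bytes = text.encode("utf-8")
gbk_bytes = text.encode("gbk")

# The same text yields different bytes under different rules.
print(utf8_bytes)  # b'\xe4\xb8\xad\xe6\x96\x87'
print(gbk_bytes)   # b'\xd6\xd0\xce\xc4'
print(len(text), len(utf8_bytes), len(gbk_bytes))  # 2 6 4
```

Two characters become six bytes in UTF-8 and four bytes in GBK, which is why the encoding rule always has to be known before the bytes can be read back.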

At this point someone will ask: when I print the result, why do I see garbled text or Chinese characters rather than a byte sequence? This is because the print statement shows the friendly form produced by str(), and the bytes are implicitly decoded for display so that humans see character data. To see the actual hexadecimal bytes behind it, call the magic method __repr__() (for example, through repr()).
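The difference between the friendly form and the `__repr__()` form can be sketched as follows. Note the assumption: this runs on Python 3, where bytes are never decoded implicitly, so the decoding step is written out explicitly here. The variable names are hypothetical.

```python
encoded = "中文".encode("utf-8")

# Friendly form: the decoded text, which is what a console would display.
friendly_view = str(encoded.decode("utf-8"))

# Raw form: repr() calls __repr__() and exposes the escaped bytes.
raw_view = repr(encoded)

# bytes.hex() gives the same bytes as a plain hexadecimal string.
hex_view = encoded.hex()

print(raw_view)  # b'\xe4\xb8\xad\xe6\x96\x87'
print(hex_view)  # e4b8ade69687
```

Whether `friendly_view` prints as readable Chinese still depends on the console's own encoding.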

What is decoding (decode)?

Decoding is the reverse: interpreting a sequence of bytes as Unicode according to an encoding rule (such as UTF-8).
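A short sketch of decoding, again assuming Python 3. It also tries to decode GBK bytes with the UTF-8 rule, to show that the bytes alone do not say which rule produced them.

```python
utf8_bytes = b"\xe4\xb8\xad\xe6\x96\x87"

# Decoding with the rule that produced the bytes recovers the text.
text = utf8_bytes.decode("utf-8")
print(len(text))  # 2

# Decoding with a different rule fails or produces garbled text.
gbk_bytes = text.encode("gbk")
try:
    gbk_bytes.decode("utf-8")
    wrong_rule_error = None
except UnicodeDecodeError as exc:
    wrong_rule_error = exc

print(type(wrong_rule_error).__name__)  # UnicodeDecodeError
```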

A question may come up here: encoding and decoding both deal in hexadecimal values, so how do Chinese characters end up on the screen?
That depends on your environment. After reading the article recommended above, you will understand that Unicode is only a standard and that a specific encoding is an implementation of it. Having the correct Unicode is like having the correct English original: to read it in Chinese, one more conversion is needed, and your environment performs that conversion for you. For example, when you open a document and find it garbled, in most cases the text editor decoded it with the wrong rule, and changing the decoding rule fixes it.

0x02 The difference between Python 2 and Python 3

Python 3: everything is fine.

In Python 3, the text string type (stored as Unicode) is named str, and the byte string type is named bytes. In general, instantiating a string results in a str object.

This is what people mean when they say that strings in Python 3 are Unicode by default.
If you want bytes, prefix the literal with b, or call encode.

So, naturally, the str object has an encode method, and the bytes object has a decode method.
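The Python 3 rules above can be checked with a few lines. This is a minimal sketch with illustrative variable names.

```python
text = "中文"   # a plain literal is a str (Unicode text)
data = b"abc"   # the b prefix gives bytes (ASCII characters only in the literal)

print(type(text).__name__)  # str
print(type(data).__name__)  # bytes

# str goes to bytes with encode; bytes goes to str with decode.
encoded = text.encode("utf-8")
decoded = encoded.decode("utf-8")

# In Python 3 each type only has the method that makes sense for it.
print(hasattr(text, "decode"))  # False
print(hasattr(data, "encode"))  # False
```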

Python 2: quite a mess, and even misleading.

What Python 3 calls str is called unicode in Python 2, which seems reasonable enough. But what Python 3 calls bytes is called str in Python 2. Yes, the ordinary str, the default one...

If you want a text string, you need to prefix the literal with u, or call decode.

It gets funnier: in Python 2, the str (bytes) object unexpectedly has an encode method... Do not expect it to be of any special use. With non-ASCII data it mainly produces errors, so never use it...
Similarly, the unicode (text) object also has a decode method, which is just as error-prone.
Let's try this:

Did you notice the error message? We asked to decode, and the rule is GBK, yet the message says the text cannot be encoded as ASCII. Why is that?

That is Python 2's implicit encoding at work. Decoding a unicode object is equivalent to the following code:

b.encode('ascii').decode('gbk')
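The implicit step can be emulated in Python 3 by writing it out explicitly. This is only a sketch: `py2_style_unicode_decode` is a hypothetical helper written for illustration, not a real Python 2 or Python 3 API, and it assumes the implicit encoding is ASCII as described above.

```python
def py2_style_unicode_decode(text, encoding, default_encoding="ascii"):
    """Hypothetical emulation of calling decode on Python 2 unicode text."""
    # Implicit step: the text is first encoded with the default encoding.
    raw = text.encode(default_encoding)
    # Only then is the requested decoding applied.
    return raw.decode(encoding)


# Pure ASCII text survives the implicit step, which hides the problem.
ascii_result = py2_style_unicode_decode("abc", "gbk")

# Non-ASCII text fails in the encode step, before GBK is ever used.
try:
    py2_style_unicode_decode("中文", "gbk")
    error = None
except UnicodeEncodeError as exc:
    error = exc

print(type(error).__name__)  # UnicodeEncodeError
print(error.encoding)        # ascii
```

The exception names the ASCII codec even though the caller asked for GBK, which matches the confusing error message discussed above.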

That is why many people say that Python 2's encoding handling is terrible.

0x03 Summary

If you are using Python 2.x, make a habit of adding the u prefix to strings and standardize on UTF-8. If the Windows console or PyCharm console still shows garbled text, the cause is most likely that the console uses a different encoding, and changing it to match fixes the problem.
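When the console is the suspect, one way to see which encodings are in play is the following sketch, assuming Python 3. `describe_console_encoding` is a hypothetical helper name; `sys.stdout.encoding` and `locale.getpreferredencoding` are standard library features.

```python
import locale
import sys


def describe_console_encoding():
    """Report the encodings Python will use for console output."""
    # sys.stdout may be replaced by an object without an encoding attribute.
    stdout_encoding = getattr(sys.stdout, "encoding", None) or "unknown"
    # Passing False avoids changing the process locale as a side effect.
    preferred = locale.getpreferredencoding(False)
    return "stdout={}, locale={}".format(stdout_encoding, preferred)


report = describe_console_encoding()
print(report)
```

If the reported encoding differs from the encoding of the bytes being printed, garbled output is the expected result.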

Reference book "Python Advanced Programming"
