str and bytes in Python 3

Last Update:2018-05-13 Source: Internet

Author: User

Unlike Python 2.x, Python 3.x strictly distinguishes between two types, str and bytes. Text is always Unicode and is represented by the str type; binary data is represented by the bytes type.

Python 3.x never mixes str and bytes implicitly. As a result, you cannot concatenate a string with a bytes object, search for a string inside a bytes object (or vice versa), or pass a string to a function that expects bytes (or vice versa).
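As a minimal sketch of this rule (the variable names are illustrative and not from the original article), each mixed operation below raises TypeError until one side is converted explicitly:

```python
text = "test str"      # str: Unicode text
data = b"test str"     # bytes: raw binary data

errors = []
for label, operation in [
    ("concatenate", lambda: text + data),   # str + bytes
    ("search", lambda: text in data),       # str inside bytes
]:
    try:
        operation()
    except TypeError as exc:
        errors.append((label, type(exc).__name__))

# Converting explicitly makes both operands the same type.
joined = text.encode("utf-8") + data
found = text.encode("utf-8") in data

print(errors)
print(joined, found)
```

The conversion always has to be spelled out, which is what keeps text and binary data from being confused silently.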

For example, socket.send() in Python 3.x raises an error if it is passed a string that has not been encoded:

    >>> client.send("test str")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: a bytes-like object is required, not 'str'
    >>> client.send(b"test str")  # pass the argument as bytes
    8                             # returns the number of bytes sent
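The same behavior can be reproduced without a real server. The sketch below assumes a local pair of connected sockets created with socket.socketpair(); the name server and the 1024-byte receive size are choices made for this example only:

```python
import socket

# socketpair() returns two already-connected sockets, so no server is needed.
client, server = socket.socketpair()
try:
    try:
        client.send("test str")             # a str is rejected
        str_error = None
    except TypeError as exc:
        str_error = exc

    # Encoding the text first turns it into bytes that send() accepts.
    sent = client.send("test str".encode("utf-8"))
    received = server.recv(1024).decode("utf-8")
finally:
    client.close()
    server.close()

print(type(str_error).__name__, sent, received)
```

On the receiving side the bytes are decoded back into a str, so text crosses the network only in encoded form.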

A brief history of character encoding

In the early days of computing, English-speaking countries, led by the United States, dominated the computer industry, and the 26 letters of the English alphabet were enough to form every English word, sentence and article. The earliest character encoding standard was therefore ASCII, which stores each character in one byte (8 bits) and covers everything needed to write English.

What is encoding? Encoding is the representation of a character in binary. Everything, whether English, Chinese or symbols, is ultimately stored on disk as a sequence such as 01010101. Inside a computer, reading and storing data always comes down to a stream of 0 and 1 bits. The problem is that humans cannot read these bit streams, so how can they be made readable? That is the job of a character encoding: it acts as a translator inside the computer, quietly converting bit streams into text that people can understand directly. Ordinary users do not need to know how this works, but programmers must understand it clearly.

Take ASCII as an example. It uses 1 byte (8 bits) per character, so characters map one-to-one onto bytes. For example, 01000001 represents the capital letter A, and the ASCII code of A is often written as the decimal number 65. Eight bits can represent up to 2^8 = 256 distinct values, but standard ASCII uses only 7 bits: the code values range from 0 to 127, and the highest bit is always 0.
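The mapping between A, 65 and 01000001 can be checked directly in Python. This is a small illustrative sketch; the variable names are not from the original article:

```python
code = ord("A")                  # code value of the character: 65
bits = format(code, "08b")       # the same value as 8 binary digits
char = chr(0b01000001)           # back from the bit pattern to the character
encoded = "A".encode("ascii")    # a single byte holding the value 65

# Characters outside the 0-127 range cannot be encoded as ASCII.
try:
    "\u00e9".encode("ascii")     # é, code value 233
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

print(code, bits, char, encoded, ascii_failed)
```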

Later, as computers spread around the world, the scripts of China, Japan, Korea and other countries also had to be represented, and a single-byte encoding such as ASCII was no longer nearly enough. Standards bodies therefore developed Unicode (the "universal code"), which assigns a unique number to every character regardless of language. The early fixed-width encodings of Unicode stored every character in two bytes, so an English letter took 2 bytes just like a Chinese character. This met everyone's needs, but the two-byte form is not byte-compatible with ASCII and takes more disk space and memory. Because most of the text handled by computers consisted of English letters that fit in 1 byte, using 2 bytes for each of them wasted space.
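To illustrate the space cost of a two-byte encoding, the sketch below uses Python's utf-16-be codec as a stand-in for such a scheme. That codec is an assumption made for this example; the original article does not name a specific two-byte encoding:

```python
letter = "A"
hanzi = "\u4e2d"                          # the Chinese character 中

letter_wide = letter.encode("utf-16-be")  # two bytes for an English letter
hanzi_wide = hanzi.encode("utf-16-be")    # two bytes for a Chinese character
letter_ascii = letter.encode("ascii")     # one byte in ASCII

print(letter_wide, hanzi_wide, letter_ascii)

# The two-byte form of "A" is not the same byte sequence as its ASCII form,
# and plain English text doubles in size.
size_ascii = len("hello".encode("ascii"))
size_wide = len("hello".encode("utf-16-be"))
print(size_ascii, size_wide)
```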

So the UTF-8 encoding came into being. It is a variable-length encoding: English letters take 1 byte, most Chinese characters take 3 bytes, and so on. It is therefore compatible with ASCII and can decode older ASCII documents. UTF-8 was soon widely adopted.
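The variable length of UTF-8 and its ASCII compatibility can be seen with a few sample characters. The samples below are chosen for illustration only:

```python
samples = {
    "A": "A",                  # English letter
    "e-acute": "\u00e9",       # é
    "zhong": "\u4e2d",         # 中
    "emoji": "\U0001F600",     # a character outside the basic plane
}

# Number of bytes each character needs in UTF-8.
sizes = {name: len(ch.encode("utf-8")) for name, ch in samples.items()}
print(sizes)

# Bytes written as ASCII decode unchanged as UTF-8.
ascii_bytes = "abc".encode("ascii")
same_text = ascii_bytes.decode("utf-8")
print(same_text, ascii_bytes == "abc".encode("utf-8"))
```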

Along the way, encodings designed specifically for Chinese were also created, for example GB2312, GBK and Big5. They are used mainly in Chinese-speaking regions and are not widely supported elsewhere. In GBK, a Chinese character takes 2 bytes.
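Because the same character has different byte sequences in GBK and UTF-8, bytes must be decoded with the encoding that produced them. A short sketch, using the character 中 as an arbitrary sample:

```python
ch = "\u4e2d"                      # the Chinese character 中

gbk_bytes = ch.encode("gbk")       # 2 bytes in GBK
utf8_bytes = ch.encode("utf-8")    # 3 bytes in UTF-8
print(gbk_bytes, utf8_bytes)

# Decoding with the matching codec restores the character.
restored = gbk_bytes.decode("gbk")

# Decoding GBK bytes as UTF-8 fails, because the byte sequence is not valid UTF-8.
try:
    gbk_bytes.decode("utf-8")
    wrong_codec_failed = False
except UnicodeDecodeError:
    wrong_codec_failed = True

print(restored == ch, wrong_codec_failed)
```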

Conversion between bytes and str

    test0 = 'abc'
    test1 = b'abc'
    print(type(test0), test0)    # <class 'str'> abc
    print(type(test1), test1)    # <class 'bytes'> b'abc'

    test2 = bytes(test0, 'utf-8')
    test3 = str(test1, 'utf-8')
    test4 = str(test1)           # no encoding given: returns the repr, not the text
    print(type(test2), test2)    # <class 'bytes'> b'abc'
    print(type(test3), test3)    # <class 'str'> abc
    print(type(test4), test4)    # <class 'str'> b'abc'

    test5 = test0.encode()       # optional argument: the encoding, default utf-8
    test6 = test1.decode()
    print(type(test5), test5)    # <class 'bytes'> b'abc'
    print(type(test6), test6)    # <class 'str'> abc
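In practice, code that may receive either type often normalizes its input at the boundary. The helpers below are a hypothetical sketch of that pattern; to_bytes and to_str are names invented for this example and are not part of the standard library or the original article:

```python
def to_bytes(value, encoding="utf-8"):
    """Return value as bytes, encoding it first if it is a str (hypothetical helper)."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        return value.encode(encoding)
    raise TypeError("expected str or bytes, got " + type(value).__name__)


def to_str(value, encoding="utf-8"):
    """Return value as str, decoding it first if it is bytes (hypothetical helper)."""
    if isinstance(value, str):
        return value
    if isinstance(value, bytes):
        return value.decode(encoding)
    raise TypeError("expected str or bytes, got " + type(value).__name__)


as_bytes = to_bytes("abc")
as_text = to_str(b"abc")
repr_text = str(b"abc")      # without an encoding, str() gives the repr
print(as_bytes, as_text, repr_text)
```

Decoding explicitly avoids the pitfall shown in the last line, where str() on a bytes object without an encoding yields the text b'abc' rather than abc.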

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
