str and bytes in Python 3

Source: Internet
Author: User

Unlike Python 2.x, Python 3.x strictly distinguishes between two types, str and bytes. Text is Unicode and is represented by the str type; binary data is represented by the bytes type.

Python 3.x never mixes str and bytes implicitly. You therefore cannot concatenate a string with a bytes object, search for a string inside a bytes object (or vice versa), or pass a string to a function that expects bytes (or vice versa).
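As a minimal sketch of the rule above, the snippet below tries each mixed operation and records what happens. The helper name try_mix is made up for this illustration; everything else is plain built-in behavior.

```python
text = "abc"
data = b"abc"

def try_mix(operation):
    """Run a zero-argument callable; return the exception type name, or None if it succeeded."""
    try:
        operation()
    except TypeError as exc:
        return type(exc).__name__
    return None

concat_result = try_mix(lambda: text + data)    # str + bytes
search_result = try_mix(lambda: text in data)   # look for a str inside bytes
reverse_search = try_mix(lambda: data in text)  # look for bytes inside a str

print(concat_result, search_result, reverse_search)

# An explicit conversion makes the operation legal.
joined = text + data.decode("utf-8")
print(joined)
```

All three mixed operations fail with TypeError, and the concatenation only works once the bytes object has been decoded explicitly.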

For example, the socket.send() function in Python 3.x raises an error if it is passed a string that has not been encoded to bytes:

>>> client.send("test str")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: a bytes-like object is required, not 'str'
>>> client.send(b"test str")  # pass the argument as bytes
8                             # returns the number of bytes sent
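The session above assumes that client is an already connected socket. To try the same behavior without a server, one option is socket.socketpair(), which returns two connected sockets; this is only a stand-in for whatever connection the original example used.

```python
import socket

# socketpair() gives two sockets that are already connected to each other.
client, server = socket.socketpair()
error_name = None
try:
    try:
        client.send("test str")                     # str is rejected
    except TypeError as exc:
        error_name = type(exc).__name__

    sent = client.send("test str".encode("utf-8"))  # bytes are accepted
    received = server.recv(1024)                    # the other end receives bytes
finally:
    client.close()
    server.close()

print(error_name, sent, received)
```

Note that the receiving side also gets bytes, so it has to call decode() before treating the payload as text.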

History of character encoding

In the early days of computing, English-speaking countries, led by the United States, dominated the computer industry, and the 26 letters of the English alphabet are enough to form every English word, sentence, and article. The earliest widely used character encoding was therefore ASCII, which stores each character in a single byte (8 bits) and covers everything needed for English text.

What is encoding? Encoding is the representation of a character in binary. Everything, whether English, Chinese, or symbols, is ultimately stored on disk as a sequence like 01010101. Inside a computer, reading and storing data boils down to a bit stream of 0s and 1s. The problem is that humans cannot read such a bit stream, so how do we make it readable? This is the job of a character encoding: it acts as a translator that turns bit streams into text people can understand directly. The average user does not need to know how this process works, but programmers must understand it clearly.

Take ASCII as an example. It specifies that each character is encoded in 1 byte (8 bits), one character per byte. For example, 01000001 is the capital letter A, and we often write its ASCII code as the decimal number 65. Eight bits can represent up to 2^8 = 256 distinct values, but standard ASCII uses only 7 bits: its code values range from 0 to 127, and the highest bit is always 0.
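The numbers in this paragraph can be checked from Python with the built-ins ord(), chr(), and format(). This is a small sketch; the variable names are chosen only for illustration.

```python
letter = "A"
code = ord(letter)              # numeric code of the character
bits = format(code, "08b")      # the same number written as 8 binary digits
print(code, bits)               # 65 01000001

print(chr(0b01000001))          # A

print(2 ** 8)                   # 256 values fit in 8 bits
print(2 ** 7)                   # 128 values fit in 7 bits (codes 0-127)

ascii_bytes = "Hello".encode("ascii")
print(list(ascii_bytes))        # [72, 101, 108, 108, 111]

# Every ASCII byte is below 128, so its highest bit is 0.
high_bit_clear = all(b < 128 for b in ascii_bytes)
print(high_bit_clear)           # True
```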

Later, as computers spread around the world, the scripts of China, Japan, Korea, and many other countries also had to be represented, and a single-byte encoding like ASCII was far from enough. Standards bodies therefore developed Unicode (the "universal code"), which assigns a unique number to every character regardless of language. Its early fixed-width encoding stored every character in two bytes, English letters included. This met everyone's needs, but the resulting byte sequences are not compatible with ASCII, and they take up more storage and memory. Most characters handled by computers are English letters that would fit in 1 byte, so spending 2 bytes on each wastes space.
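To see the space cost and the ASCII incompatibility of a two-byte-per-character scheme, the sketch below uses Python's utf-16-be codec as a stand-in. That choice is an assumption made for illustration, since the text does not name a specific two-byte encoding.

```python
text = "ABC"

ascii_bytes = text.encode("ascii")
wide_bytes = text.encode("utf-16-be")   # two bytes per character here

print(len(ascii_bytes), ascii_bytes)    # 3 b'ABC'
print(len(wide_bytes), wide_bytes)      # 6 b'\x00A\x00B\x00C'

# The byte sequences differ, so an ASCII-only reader cannot use the wide form as-is.
print(wide_bytes == ascii_bytes)        # False

# A Chinese character also takes two bytes in this encoding.
han = "中".encode("utf-16-be")
print(len(han), han.hex())              # 2 4e2d
```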

This is how UTF-8 came about. It is a variable-length encoding: English letters take 1 byte, most Chinese characters take 3 bytes, and so on. It is therefore compatible with ASCII and can decode older ASCII documents. UTF-8 was soon widely adopted.
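The byte counts mentioned here are easy to confirm. The sketch below encodes a few sample characters with UTF-8; the emoji is an extra sample not mentioned in the text, added only to show that some characters need 4 bytes.

```python
samples = ["A", "中", "😀"]
sizes = {}
for ch in samples:
    encoded = ch.encode("utf-8")
    sizes[ch] = len(encoded)
    # Print the code point rather than the character to keep the output ASCII-only.
    print(f"U+{ord(ch):04X}", len(encoded), encoded.hex())

# ASCII compatibility: pure ASCII text yields identical bytes under both encodings,
# and bytes written as ASCII can be decoded as UTF-8.
ascii_text = "Hello"
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("ascii")
decoded_old = b"Hello".decode("utf-8")
print(same_bytes, decoded_old)
```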

Regional encodings were also created along the way, for example GB2312 and GBK in mainland China and Big5 for Traditional Chinese. They are used mainly within their home regions and are rarely the default elsewhere. In GBK, a Chinese character takes 2 bytes.
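A short sketch comparing GBK with UTF-8 for one Chinese character, using Python's built-in gbk codec. It also shows why the encoding used to write bytes has to be known when reading them back.

```python
han = "中"

gbk_bytes = han.encode("gbk")
utf8_bytes = han.encode("utf-8")
print(len(gbk_bytes), gbk_bytes.hex())    # 2 d6d0
print(len(utf8_bytes), utf8_bytes.hex())  # 3 e4b8ad

# Decoding with the matching codec restores the text.
round_trip_ok = gbk_bytes.decode("gbk") == han
print(round_trip_ok)                      # True

# Decoding GBK bytes as UTF-8 fails, because they are not valid UTF-8.
try:
    gbk_bytes.decode("utf-8")
    wrong_codec_error = None
except UnicodeDecodeError as exc:
    wrong_codec_error = type(exc).__name__
print(wrong_codec_error)                  # UnicodeDecodeError
```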

Conversion between bytes and str

test0 = 'abc'
test1 = b'abc'
print(type(test0), test0)  # -> <class 'str'> abc
print(type(test1), test1)  # -> <class 'bytes'> b'abc'

test2 = bytes(test0, 'utf-8')
test3 = str(test1, 'utf-8')
test4 = str(test1)         # no encoding given: returns the repr, does not decode
print(type(test2), test2)  # -> <class 'bytes'> b'abc'
print(type(test3), test3)  # -> <class 'str'> abc
print(type(test4), test4)  # -> <class 'str'> b'abc'

test5 = test0.encode()     # an encoding name may be passed; the default is utf-8
test6 = test1.decode()
print(type(test5), test5)  # -> <class 'bytes'> b'abc'
print(type(test6), test6)  # -> <class 'str'> abc
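The calls above rely on the default UTF-8 encoding. As a supplementary sketch, the snippet below passes an explicit encoding name and uses the errors parameter of encode() and decode() to control what happens when a character or byte cannot be converted. The sample text is made up for illustration.

```python
text = "中文 abc"

# Round trip with an explicit, matching encoding.
data = text.encode("gbk")
round_trip_ok = data.decode("gbk") == text
print(round_trip_ok)

# By default (errors="strict"), unencodable characters raise an exception.
try:
    text.encode("ascii")
    strict_error = None
except UnicodeEncodeError as exc:
    strict_error = type(exc).__name__
print(strict_error)

# errors="replace" substitutes "?" when encoding; errors="ignore" drops the character.
replaced = text.encode("ascii", errors="replace")
ignored = text.encode("ascii", errors="ignore")
print(replaced, ignored)

# When decoding, errors="replace" inserts U+FFFD for each undecodable byte.
lossy = data.decode("utf-8", errors="replace")
print(ascii(lossy))
```

The replace and ignore handlers lose information, so they are best kept for display or logging rather than for data that must survive a round trip.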
