Learn a little Python every day: bytes

Source: Internet
Author: User

A bytes object in Python is written in the form b'xxx'. Each x can be written either as an ASCII character or as an escape of the form \xNN, where NN is a two-digit hexadecimal value from 00 to FF, giving 256 possible byte values.
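As a small illustration of the two notations (the variable names are only for this sketch), the same bytes can be spelled with ASCII characters or with \xNN escapes, and a single byte is limited to the 256 values mentioned above:

```python
# Two spellings of the same bytes object: ASCII characters vs. \xNN escapes.
plain = b"abc"
escaped = b"\x61\x62\x63"   # 0x61, 0x62, 0x63 are the ASCII codes of a, b, c
same = plain == escaped

# A byte holds one of 256 values (0x00 through 0xFF).
all_values = bytes(range(256))
count = len(all_values)

# Values outside that range are rejected with a ValueError.
try:
    bytes([256])
    out_of_range_ok = True
except ValueError:
    out_of_range_ok = False

print(same, count, out_of_range_ok)
```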

Basic operations

The following session shows the basic operations on bytes; as you can see, they closely resemble string operations:

    In [40]: b = b"abcd\x64"
    In [41]: b
    Out[41]: b'abcdd'
    In [42]: type(b)
    Out[42]: bytes
    In [43]: len(b)
    Out[43]: 5
    In [44]: b[4]
    Out[44]: 100    # 100 written in hexadecimal is \x64
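A few more string-like operations, as a rough sketch assuming Python 3 (the variable names are illustrative). Note that indexing yields an integer while slicing yields bytes, and that bytes and str cannot be mixed:

```python
data = b"abcd\x64"

first = data[0]          # indexing gives an int (the byte value)
head = data[:2]          # slicing gives a new bytes object
upper = data.upper()     # many str-like methods are available on bytes
pos = data.find(b"cd")   # searching takes a bytes argument
joined = data + b"!"     # concatenation needs bytes on both sides

# Mixing bytes and str raises a TypeError in Python 3.
try:
    data + "!"
    mixed_ok = True
except TypeError:
    mixed_ok = False
```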

A byte inside a bytes object cannot be modified directly. To change it, convert the bytes object to a bytearray first and modify that:

    In [46]: barr = bytearray(b)
    In [47]: type(barr)
    Out[47]: bytearray
    In [48]: barr[0] = 110
    In [49]: barr
    Out[49]: bytearray(b'nbcdd')

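To make the contrast explicit, here is a minimal sketch of what happens when the assignment is attempted on the bytes object itself, followed by the bytearray route and a conversion back to bytes (the names assigned and result are only for illustration):

```python
b = b"abcd\x64"

# Direct item assignment on bytes fails because bytes is immutable.
try:
    b[0] = 110
    assigned = True
except TypeError:
    assigned = False

# A bytearray is a mutable copy; 110 is the ASCII code of 'n'.
barr = bytearray(b)
barr[0] = 110
barr.append(33)          # a bytearray can also grow in place (33 is '!')
result = bytes(barr)     # convert back to an immutable bytes object
```

The original object b is left unchanged, since bytearray(b) copies the data.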
Relationship between bytes and characters

As noted above, bytes behave much like strings, and the two can in fact be converted into each other. A bytes object corresponds to a string under a particular encoding. A string is converted to bytes with the encode() method, and bytes are converted back to a string with the decode() method:

    In [50]: s = "人生苦短，我用Python"    # "Life is short, I use Python"
    In [51]: b = s.encode('utf-8')
    In [52]: b
    Out[52]: b'\xe4\xba\xba\xe7\x94\x9f\xe8\x8b\xa6\xe7\x9f\xad\xef\xbc\x8c\xe6\x88\x91\xe7\x94\xa8Python'
    In [53]: c = s.encode('gb18030')
    In [54]: c
    Out[54]: b'\xc8\xcb\xc9\xfa\xbf\xe0\xb6\xcc\xa3\xac\xce\xd2\xd3\xc3Python'
    In [55]: b.decode('utf-8')
    Out[55]: '人生苦短，我用Python'
    In [56]: c.decode('gb18030')
    Out[56]: '人生苦短，我用Python'
    In [57]: c.decode('utf-8')
    Traceback (most recent call last):
        exec(code_obj, self.user_global_ns, self.user_ns)
      File "<ipython-input-57-8b50aa70bce9>", line 1, in <module>
        c.decode('utf-8')
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc8 in position 0: invalid continuation byte
    In [58]: b.decode('gb18030')
    Out[58]: '浜虹敓鑻︾煭锛屾垜鐢≒ython'

As the session shows, the same characters map to different bytes under different encodings. If bytes are decoded with an encoding other than the one used to encode them, the result is garbled text, or the conversion fails altogether. Each encoding defines its own set of valid byte sequences: in the example above, \xc8 is the lead byte of a two-byte sequence in UTF-8, but the byte that follows it, \xcb, is not a valid continuation byte, so decoding fails at position 0.
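The behaviour described here can be reproduced in a short script. This is only a sketch: it assumes the sample text is the phrase 人生苦短，我用Python ("Life is short, I use Python"), which matches the byte values shown above, and the variable names are illustrative. It also shows the standard errors="replace" handler, which trades the exception for replacement characters:

```python
text = "人生苦短，我用Python"   # assumed sample text

utf8_bytes = text.encode("utf-8")
gb_bytes = text.encode("gb18030")

# The same text has different byte lengths under different encodings.
lengths = (len(utf8_bytes), len(gb_bytes))

# Decoding with the wrong codec can fail outright...
try:
    gb_bytes.decode("utf-8")
    failure = None
except UnicodeDecodeError as exc:
    # Record where decoding stopped and which byte caused it.
    failure = (exc.start, gb_bytes[exc.start])

# ...or be made lossy on purpose: undecodable bytes become U+FFFD.
lossy = gb_bytes.decode("utf-8", errors="replace")

print(lengths, failure)
```

Whether a lossy decode is acceptable depends on the application; for data that must round-trip exactly, letting the exception surface is usually the safer choice.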

Application

As a simple example, consider crawling a web page: here, the results page that Baidu returns for a search on Python. The URL shows that the encoding is utf-8. If the returned data is not decoded, it is just one very long byte string; once it is decoded correctly, it displays as a normal HTML page.

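    import urllib.request   # assumed: the original import line was lost

    url = "http://www.baidu.com/s?ie=utf-8&wd=python"
    page = urllib.request.urlopen(url)
    mybytes = page.read()             # raw response body, as bytes
    encoding = "utf-8"
    print(mybytes.decode(encoding))   # decode the bytes into readable HTML text
    page.close()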
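Hard-coding the encoding works when it is known in advance, as it is from the URL here. As a hedged alternative, the charset can often be read from the response's Content-Type header instead. The sketch below is not part of the original example: the helper names charset_from_content_type and decode_body are hypothetical, and the header parsing relies on the standard library's email.message.Message.

```python
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset named in a Content-Type header value, or `default`."""
    if not content_type:
        return default
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the lowercased charset parameter, or None.
    return msg.get_content_charset() or default


def decode_body(raw, content_type=None, default="utf-8"):
    """Decode response bytes, replacing undecodable bytes instead of raising."""
    charset = charset_from_content_type(content_type, default)
    try:
        return raw.decode(charset, errors="replace")
    except LookupError:
        # The header named a codec Python does not know: use the default.
        return raw.decode(default, errors="replace")


print(decode_body("人生".encode("gb18030"), "text/html; charset=GB18030"))
```

With urllib.request, the header value would typically come from page.headers.get("Content-Type"), and the raw bytes from page.read().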
