Learn a little Python every day: bytes
In Python, a bytes object is written as a literal of the form b'xxx'. Each x can be an ASCII character or an escape of the form \xnn, where nn is a two-digit hexadecimal value from 00 to FF, giving 256 possible byte values.
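A minimal sketch of the two notations (the variable names are only illustrative): the same four bytes written with ASCII characters and with \xnn escapes compare equal, and every byte value lies between 0x00 and 0xFF.

```python
# The same four bytes written two ways: printable ASCII characters, or \xnn escapes.
ascii_form = b"abcd"
hex_form = b"\x61\x62\x63\x64"   # 0x61 is 'a', 0x62 is 'b', and so on
print(ascii_form == hex_form)    # True

# Each element of a bytes object is an integer from 0x00 to 0xff (0 to 255).
all_values = bytes(range(256))
print(len(all_values))           # 256
print(all_values[0x64])          # 100, the byte that \x64 denotes

# bytes() refuses values outside that range.
try:
    bytes([256])
except ValueError as exc:
    print("ValueError:", exc)
```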
Basic operations
The basic operations on bytes are shown below; as you can see, they are very similar to those on strings:
In [40]: b = b"abcd\x64"
In [41]: b
Out[41]: b'abcdd'
In [42]: type(b)
Out[42]: bytes
In [43]: len(b)
Out[43]: 5
In [44]: b[4]
Out[44]: 100    # 100 in hexadecimal is \x64
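The resemblance to strings goes further than the session shows. This sketch, reusing the same value b"abcd\x64", illustrates a few str-like methods that bytes also provides, plus one important difference: indexing gives an int, while slicing gives bytes.

```python
b = b"abcd\x64"

# Indexing returns an int (the byte value), while slicing returns bytes.
print(b[4])        # 100
print(b[1:3])      # b'bc'

# Many str methods have bytes counterparts that take and return bytes.
print(b.upper())               # b'ABCDD'
print(b.startswith(b"ab"))     # True
print(b.replace(b"d", b"x"))   # b'abcxx'
print(b"a,b,c".split(b","))    # [b'a', b'b', b'c']

# Iterating yields ints, not one-byte bytes objects.
print(list(b))     # [97, 98, 99, 100, 100]
```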
A bytes object is immutable, so you cannot change one of its bytes directly. To modify it, first convert it to a bytearray and then change that:
In [46]: barr = bytearray(b)
In [47]: type(barr)
Out[47]: bytearray
In [48]: barr[0] = 110
In [49]: barr
Out[49]: bytearray(b'nbcdd')
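For contrast, this sketch shows what happens if you try to assign into a bytes object directly, and the main rules for bytearray: items must be ints in range(256), and bytes() turns the result back into an immutable object.

```python
b = b"abcdd"

# bytes is immutable: item assignment raises TypeError.
try:
    b[0] = 110
except TypeError as exc:
    print("TypeError:", exc)  # 'bytes' object does not support item assignment

# bytearray is the mutable counterpart; it accepts ints in range(256).
barr = bytearray(b)
barr[0] = 110            # 110 == ord("n")
barr.append(0x21)        # 0x21 == ord("!")
print(barr)              # bytearray(b'nbcdd!')

# Values outside 0..255 are rejected.
try:
    barr[0] = 256
except ValueError as exc:
    print("ValueError:", exc)  # byte must be in range(0, 256)

# Convert back to an immutable bytes object when done.
frozen = bytes(barr)
print(frozen)            # b'nbcdd!'
```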
Byte-to-character relationship
As mentioned above, bytes are very similar to strings, and the two can be converted into each other. Characters are mapped to bytes according to some encoding: the str.encode() method converts a string to bytes, and the bytes.decode() method converts bytes back to a string:
In [50]: s = "人生苦短，我用Python"    # "Life is short, I use Python"
In [51]: b = s.encode('utf-8')
In [52]: b
Out[52]: b'\xe4\xba\xba\xe7\x94\x9f\xe8\x8b\xa6\xe7\x9f\xad\xef\xbc\x8c\xe6\x88\x91\xe7\x94\xa8Python'
In [53]: c = s.encode('GB18030')
In [54]: c
Out[54]: b'\xc8\xcb\xc9\xfa\xbf\xe0\xb6\xcc\xa3\xac\xce\xd2\xd3\xc3Python'
In [55]: b.decode('utf-8')
Out[55]: '人生苦短，我用Python'
In [56]: c.decode('GB18030')
Out[56]: '人生苦短，我用Python'
In [57]: c.decode('utf-8')
Traceback (most recent call last):
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-57-8b50aa70bce9>", line 1, in <module>
    c.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc8 in position 0: invalid continuation byte
In [58]: b.decode('GB18030')
Out[58]: '浜虹敓鑻︾煭锛屾垜鐢≒ython'
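The session above can be condensed into a self-contained script. It shows that the same text occupies a different number of bytes in each encoding, that only the matching codec round-trips, and how the errors parameter of bytes.decode() (here "replace", one of Python's standard error handlers) avoids the exception at the cost of replacement characters.

```python
s = "人生苦短，我用Python"  # "Life is short, I use Python"

utf8_bytes = s.encode("utf-8")
gb_bytes = s.encode("gb18030")

# Round trips with the matching codec recover the original string.
assert utf8_bytes.decode("utf-8") == s
assert gb_bytes.decode("gb18030") == s

# Each of the Chinese characters (and the full-width comma) takes
# 3 bytes in UTF-8 but 2 bytes in GB18030; "Python" is 1 byte per letter in both.
print(len(utf8_bytes), len(gb_bytes))  # 27 20

# Decoding with the wrong codec raises UnicodeDecodeError by default.
try:
    gb_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc.start, hex(gb_bytes[exc.start]))  # 0 0xc8

# With errors="replace", undecodable bytes become U+FFFD instead of raising.
print(gb_bytes.decode("utf-8", errors="replace"))
```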
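As you can see, characters and bytes are converted according to a specific encoding. If bytes are decoded with a different encoding from the one used to encode them, the result is garbled text, or the conversion fails outright, as in the examples above. Encodings differ both in how many bytes they use per character and in which byte sequences are valid. In UTF-8, 0xc8 starts a two-byte sequence whose second byte must be a continuation byte in the range 0x80 to 0xBF; in the GB18030 data it is followed by 0xcb instead, so decoding fails at position 0.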
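To see why 0xc8 triggers the error, here is a small sketch that classifies a byte by the role it can play in UTF-8, following the byte ranges defined in RFC 3629. The helper utf8_role is hypothetical, written only for this illustration, and it is simplified: it ignores the extra restrictions on the second byte after 0xE0, 0xED, 0xF0 and 0xF4.

```python
def utf8_role(byte: int) -> str:
    """Classify a single byte by the role it can play in UTF-8 (simplified)."""
    if byte <= 0x7F:
        return "ascii"          # 0xxxxxxx: a complete 1-byte character
    if byte <= 0xBF:
        return "continuation"   # 10xxxxxx: may only follow a lead byte
    if 0xC2 <= byte <= 0xDF:
        return "lead-2"         # 110xxxxx: starts a 2-byte sequence
    if 0xE0 <= byte <= 0xEF:
        return "lead-3"         # 1110xxxx: starts a 3-byte sequence
    if 0xF0 <= byte <= 0xF4:
        return "lead-4"         # 11110xxx: starts a 4-byte sequence
    return "invalid"            # 0xC0, 0xC1 and 0xF5-0xFF never appear in UTF-8


# GB18030 bytes for "人": 0xc8 announces a 2-byte UTF-8 sequence,
# but 0xcb is another lead byte, not a continuation byte.
print([utf8_role(x) for x in "人".encode("gb18030")])  # ['lead-2', 'lead-2']

# UTF-8 bytes for "人": one lead byte followed by two continuation bytes.
print([utf8_role(x) for x in "人".encode("utf-8")])    # ['lead-3', 'continuation', 'continuation']
```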
Application
As a simple example, suppose we want to fetch a web page: the results page Baidu returns when you search for "python". The URL contains ie=utf-8, which tells us the page is encoded in UTF-8. If the response is not decoded, what we get back is a very long byte string; after decoding it correctly, it displays as a normal HTML page.
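import urllib.request

url = "http://www.baidu.com/s?ie=utf-8&wd=python"
page = urllib.request.urlopen(url)   # send the request
mybytes = page.read()                 # the raw response body, as bytes
encoding = "utf-8"                    # ie=utf-8 in the URL says the page is UTF-8
print(mybytes.decode(encoding))       # decode the bytes into an HTML string
page.close()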
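The example hard-codes utf-8 because the URL says ie=utf-8. A slightly more general sketch, assuming the server declares its encoding in the Content-Type header (not every server does), reads the charset from the response headers via get_content_charset() and falls back to UTF-8 otherwise. The names decode_body and fetch_text are hypothetical helpers, not part of any library.

```python
import urllib.request
from typing import Optional


def decode_body(raw: bytes, charset: Optional[str], default: str = "utf-8") -> str:
    """Decode an HTTP body with the declared charset, falling back to a default."""
    return raw.decode(charset or default)


def fetch_text(url: str) -> str:
    # The with-statement closes the response even if reading raises.
    with urllib.request.urlopen(url) as page:
        raw = page.read()
        # Charset from the Content-Type header, or None if the server omits it.
        charset = page.headers.get_content_charset()
    return decode_body(raw, charset)
```

Calling fetch_text("http://www.baidu.com/s?ie=utf-8&wd=python") would then return the page as a str, network permitting. Keeping the decoding step in its own function makes it easy to try with bytes you already have.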