Learn a little Python every day: bytes
In Python, a bytes object is written in the form b'xxx', where each x is either an ASCII character or a hex escape of the form \xnn. Here nn is a two-digit hexadecimal number from 00 to FF, so there are 256 possible byte values.
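As a minimal illustration of these rules (the sample values are arbitrary), the snippet below mixes plain ASCII characters with \xnn escapes and shows that every element of a bytes object is an integer from 0 to 255.

```python
# A bytes literal can mix ASCII characters with \xnn hex escapes.
data = b"AB\x43\x7a"
print(data)         # b'ABCz' (printable bytes are displayed as characters)
print(list(data))   # [65, 66, 67, 122] (each element is an int)

# All 256 byte values, 0x00 through 0xFF, are available.
all_bytes = bytes(range(256))
print(len(all_bytes))   # 256
print(all_bytes[0xFF])  # 255

# Non-ASCII characters are not allowed inside a bytes literal:
# b"é" is a SyntaxError, so such bytes must be written as escapes.
```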
Basic operations
The following session shows some basic operations on bytes; as you can see, they are very similar to the operations on strings:
In [40]: b = b"abcd\x64"
In [41]: b
Out[41]: b'abcdd'
In [42]: type(b)
Out[42]: bytes
In [43]: len(b)
Out[43]: 5
In [44]: b[4]
Out[44]: 100  # 100 in hexadecimal is \x64
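Many other familiar str operations carry over as well. The sketch below, reusing the b from the session, is only a sample of them; note that a single index gives an int while a slice gives bytes, and that bytes and str cannot be concatenated without encoding the str first.

```python
b = b"abcd\x64"

print(b[1:3])          # b'bc'    (a slice is still bytes)
print(b[4])            # 100      (a single index is an int)
print(b + b"ef")       # b'abcddef'
print(b.upper())       # b'ABCDD'
print(b.find(b"cd"))   # 2
print(b"d" in b)       # True

# bytes and str do not mix; the str has to be encoded first.
try:
    b + "ef"
except TypeError as exc:
    print("TypeError:", exc)
print(b + "ef".encode("ascii"))  # b'abcddef'
```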
A bytes object is immutable, so you cannot change one of its bytes directly. To modify it, first convert it to a bytearray and then change that:
In [46]: barr = bytearray(b)
In [47]: type(barr)
Out[47]: bytearray
In [48]: barr[0] = 110
In [49]: barr
Out[49]: bytearray(b'nbcdd')
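The difference in mutability can be checked directly. In this minimal sketch, assignment on the bytes object raises TypeError, the bytearray copy accepts changes (including append(), which bytes lacks), and bytes() turns the result back into an immutable object without affecting the original.

```python
b = b"abcd\x64"

# bytes is immutable: item assignment raises TypeError.
try:
    b[0] = 110
except TypeError as exc:
    print("TypeError:", exc)

barr = bytearray(b)   # a mutable copy
barr[0] = 110         # 110 == ord("n")
barr.append(0x21)     # bytearray can also grow; 0x21 is "!"
print(barr)           # bytearray(b'nbcdd!')

result = bytes(barr)  # back to an immutable bytes object
print(result)         # b'nbcdd!'
print(b)              # b'abcdd' (the original is unchanged)
```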
Relationship between bytes and characters
As mentioned above, bytes are very similar to strings, and in fact the two can be converted into each other: given an encoding, a sequence of bytes corresponds to a sequence of characters.
A string is converted to bytes with its encode() method, passing the name of the encoding; bytes are converted back to a string with the decode() method:
In [50]: s = "人生苦短，我用Python"  # "Life is short, I use Python"
In [51]: b = s.encode('utf-8')
In [52]: b
Out[52]: b'\xe4\xba\xba\xe7\x94\x9f\xe8\x8b\xa6\xe7\x9f\xad\xef\xbc\x8c\xe6\x88\x91\xe7\x94\xa8Python'
In [53]: c = s.encode('GB18030')
In [54]: c
Out[54]: b'\xc8\xcb\xc9\xfa\xbf\xe0\xb6\xcc\xa3\xac\xce\xd2\xd3\xc3Python'
In [55]: b.decode('utf-8')
Out[55]: '人生苦短，我用Python'
In [56]: c.decode('GB18030')
Out[56]: '人生苦短，我用Python'
In [57]: c.decode('utf-8')
Traceback (most recent call last):
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-57-8b50aa70bce9>", line 1, in <module>
    c.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc8 in position 0: invalid continuation byte
In [58]: b.decode('GB18030')
Out[58]: '浜虹敓鑻︾煭锛屾垜鐢≒ython'
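To make the comparison concrete, the sketch below encodes the same string with several encodings and checks that decoding with the matching encoding always gives back the original text. The choice of encodings, including UTF-16-LE which the session does not use, is only for illustration.

```python
s = "人生苦短，我用Python"  # "Life is short, I use Python"

sizes = {}
for enc in ("utf-8", "gb18030", "utf-16-le"):
    data = s.encode(enc)
    # Decoding with the same encoding restores the original string exactly.
    assert data.decode(enc) == s
    sizes[enc] = len(data)
    print(f"{enc:<10}{len(data):>3} bytes, starts with {data[:4]!r}")
```

For this string the sizes come out as 27 bytes in UTF-8 (three per Chinese character), 20 in GB18030 (two per Chinese character) and 26 in UTF-16-LE, while the ASCII letters of "Python" take one byte each in the first two.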
As you can see, the same string produces completely different bytes under different encodings. If the encoding used for decoding differs from the one used for encoding, the result may be garbled text, or the conversion may fail altogether.
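When the encoding of some bytes is uncertain, decode() accepts an errors argument. The sketch below reuses the GB18030 bytes from the session and contrasts the default errors="strict", which raises UnicodeDecodeError as in In [57], with errors="replace", which substitutes the replacement character U+FFFD for undecodable bytes. Note that "replace" hides the problem rather than fixing it: the Chinese text is still lost.

```python
c = "人生苦短，我用Python".encode("gb18030")

# Default errors="strict": invalid input raises UnicodeDecodeError.
try:
    c.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc)  # 'utf-8' codec can't decode byte 0xc8 in position 0: invalid continuation byte

# errors="replace": undecodable bytes become U+FFFD instead of raising.
lossy = c.decode("utf-8", errors="replace")
print(ascii(lossy))              # ascii() escapes the non-ASCII replacement characters
print(lossy.endswith("Python"))  # True: the ASCII part survives intact
```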
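The failure happens because each encoding has its own rules about which byte sequences are valid. In UTF-8, a byte such as 0xc8 (binary 11001000, of the form 110xxxxx) starts a two-byte sequence and must be followed by a continuation byte of the form 10xxxxxx. In the GB18030 bytes above, 0xc8 is followed by 0xcb, which is not a continuation byte, so the UTF-8 decoder reports an invalid continuation byte at position 0. Decoding the UTF-8 bytes as GB18030 does not raise an error, but GB18030 reads the same bytes under its own rules and produces garbled characters.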
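The lead-byte rule can be sketched in a few lines of Python. The helper names utf8_sequence_length and is_continuation are hypothetical, introduced only for this illustration, and the check is simplified: it ignores further UTF-8 restrictions such as overlong encodings and surrogate ranges.

```python
def utf8_sequence_length(lead: int) -> int:
    """Return how many bytes a UTF-8 sequence starting with `lead` should span.

    Returns 0 if `lead` cannot start a sequence. Simplified: overlong
    encodings and other range restrictions are not checked.
    """
    if lead < 0x80:           # 0xxxxxxx: ASCII, one byte
        return 1
    if lead >> 5 == 0b110:    # 110xxxxx: two-byte sequence
        return 2
    if lead >> 4 == 0b1110:   # 1110xxxx: three-byte sequence
        return 3
    if lead >> 3 == 0b11110:  # 11110xxx: four-byte sequence
        return 4
    return 0                  # 10xxxxxx (continuation) or 11111xxx: not a lead byte


def is_continuation(byte: int) -> bool:
    """True if `byte` has the form 10xxxxxx."""
    return byte >> 6 == 0b10


c = "人生苦短，我用Python".encode("gb18030")
print(hex(c[0]), utf8_sequence_length(c[0]))  # 0xc8 2: needs one continuation byte
print(hex(c[1]), is_continuation(c[1]))       # 0xcb False: so UTF-8 decoding fails
```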
Application
Here is the simplest example: crawling the content of a web page. Suppose we fetch the page Baidu returns for a search on "python"; Baidu serves its pages in UTF-8. Without decoding, the result is just a very long byte string; once it is decoded with the correct encoding, it becomes a normal, readable HTML page.
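import urllib.request

url = "http://www.baidu.com/s?ie=utf-8&wd=python"
page = urllib.request.urlopen(url)  # fetch the search results page
mybytes = page.read()               # the raw response body, as bytes
encoding = "utf-8"                  # Baidu serves its pages in UTF-8
print(mybytes.decode(encoding))     # decode the bytes into readable HTML
page.close()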
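Hard-coding "utf-8" works for this page, but a crawler that visits many sites usually has to discover each page's encoding. One common approach, sketched below under the assumption that the server declares a charset in its Content-Type header, is to read that header: with urllib, page.headers.get_content_charset() returns the declared charset in lower case, or None. The helper decode_body is a hypothetical name; to stay runnable without a network connection, it parses a Content-Type string with email.message.Message, the class that urllib's header object is built on, and falls back to UTF-8 with replacement characters when the declared charset is missing, unknown, or wrong.

```python
from email.message import Message


def decode_body(body: bytes, content_type: str, default: str = "utf-8") -> str:
    """Decode an HTTP body using the charset declared in its Content-Type header.

    Hypothetical helper: falls back to `default` (with U+FFFD replacement
    characters) if no charset is declared, the name is unknown, or the
    bytes do not match it.
    """
    headers = Message()
    headers["Content-Type"] = content_type
    charset = headers.get_content_charset() or default  # lower-cased, or None
    try:
        return body.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # LookupError: unknown codec name; UnicodeDecodeError: wrong charset.
        return body.decode(default, errors="replace")


html = "<title>人生苦短</title>".encode("gb18030")
text = decode_body(html, "text/html; charset=GB18030")
print(text == "<title>人生苦短</title>")        # True
print(decode_body(b"<p>hi</p>", "text/html"))  # <p>hi</p> (no charset: utf-8 used)
```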