Bytes are something you deal with every day in Python. This article walks through Python's bytes type; interested readers can use it as a reference.
A byte string in Python is written as b'xxx'. Each x is either a printable ASCII character or an escape of the form \xNN, where NN is a two-digit hexadecimal value from 00 to FF, giving 256 possible byte values.
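The two spellings described above can be compared directly. The following is a minimal sketch; the variable names are chosen for illustration only.

```python
# Two spellings of the same five bytes: printable ASCII or \xNN escapes.
plain = b"abcdd"
escaped = b"\x61\x62\x63\x64\x64"
same = plain == escaped

# Every byte is an integer from 0 to 255, so there are exactly 256 values.
all_bytes = bytes(range(256))
first, last = all_bytes[0], all_bytes[-1]

# A value outside that range is rejected when building a bytes object.
try:
    bytes([256])
    out_of_range_accepted = True
except ValueError:
    out_of_range_accepted = False
```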
1. Basic operations
The basic operations on bytes are listed below. As you can see, they are very similar to the string operations:
In [40]: b = b"abcd\x64"
In [41]: b
Out[41]: b'abcdd'
In [42]: type(b)
Out[42]: bytes
In [43]: len(b)
Out[43]: 5
In [44]: b[4]
Out[44]: 100  # decimal 100 is hexadecimal \x64
A bytes object is immutable, so you cannot modify one of its bytes in place. Convert it to a bytearray first and modify that:
In [46]: barr = bytearray(b)
In [47]: type(barr)
Out[47]: bytearray
In [48]: barr[0] = 110
In [49]: barr
Out[49]: bytearray(b'nbcdd')
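The difference between the two types can be seen by attempting the assignment on both. This is a small sketch that reuses the value of b from the session above; the slice assignment and the conversion back to bytes are additions for illustration.

```python
b = b"abcd\x64"

# Indexing yields an int, while slicing yields a bytes object.
last_as_int = b[4]
last_as_bytes = b[4:5]

# bytes is immutable: item assignment raises TypeError.
try:
    b[0] = 110
    assignment_failed = False
except TypeError:
    assignment_failed = True

# bytearray is the mutable counterpart.
barr = bytearray(b)
barr[0] = 110         # 110 == ord("n")
barr[1:3] = b"XY"     # slices can be replaced as well
frozen = bytes(barr)  # convert back once the edits are done
```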
2. The relationship between bytes and characters
As mentioned above, bytes are very similar to strings, and the two can be converted into each other through an encoding. A string is converted to bytes with the encode() method, and bytes are converted back to a string with the decode() method:
In [50]: s = "人生苦短，我用Python"  # "Life is short, I use Python"
In [51]: b = s.encode('utf-8')
In [52]: b
Out[52]: b'\xe4\xba\xba\xe7\x94\x9f\xe8\x8b\xa6\xe7\x9f\xad\xef\xbc\x8c\xe6\x88\x91\xe7\x94\xa8Python'
In [53]: c = s.encode('gb18030')
In [54]: c
Out[54]: b'\xc8\xcb\xc9\xfa\xbf\xe0\xb6\xcc\xa3\xac\xce\xd2\xd3\xc3Python'
In [55]: b.decode('utf-8')
Out[55]: '人生苦短，我用Python'
In [56]: c.decode('gb18030')
Out[56]: '人生苦短，我用Python'
In [57]: c.decode('utf-8')
Traceback (most recent call last):
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-57-8b50aa70bce9>", line 1, in <module>
    c.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc8 in position 0: invalid continuation byte
In [58]: b.decode('gb18030')
Out[58]: '浜虹敓鑻︾煭锛屾垜鐢≒ython'
As the session shows, the same text maps to different bytes under different encodings. If bytes are decoded with an encoding other than the one used to encode them, the result is either garbled text (input 58) or an outright failure (input 57). The failure occurs because UTF-8 accepts only certain byte sequences: 0xc8 starts a two-byte sequence and must be followed by a continuation byte in the range 0x80-0xBF, but the next byte here is 0xcb.
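The mismatch can also be handled in code rather than at the prompt. The sketch below assumes the sample string is the Chinese sentence whose UTF-8 bytes appear in the session; the errors="replace" call is an addition that shows one way to avoid the exception, at the cost of losing the undecodable bytes.

```python
s = "人生苦短，我用Python"
utf8_bytes = s.encode("utf-8")
gb_bytes = s.encode("gb18030")

# Decoding with the same encoding that produced the bytes restores the text.
round_trip_ok = (
    utf8_bytes.decode("utf-8") == s and gb_bytes.decode("gb18030") == s
)

# The default error handler is "strict": a mismatch raises UnicodeDecodeError.
try:
    gb_bytes.decode("utf-8")
    strict_failed = False
    bad_byte = None
except UnicodeDecodeError as exc:
    strict_failed = True
    bad_byte = exc.object[exc.start]  # the byte the decoder stopped on

# errors="replace" substitutes U+FFFD for undecodable bytes instead of raising.
lenient = gb_bytes.decode("utf-8", errors="replace")
```

The lenient result is still garbled for the Chinese part, so it is useful for diagnostics or logging rather than as a substitute for knowing the right encoding.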
3. An application
As a simple example, suppose I want to fetch the content of a web page, here the result page that Baidu returns for a search on "python". Baidu uses UTF-8, so without decoding, the response is just one very long byte string. Once it is decoded correctly, it displays as a normal HTML page.
import urllib.request

url = "http://www.baidu.com/s?ie=utf-8&wd=python"
page = urllib.request.urlopen(url)
mybytes = page.read()             # the raw response body, as bytes
encoding = "utf-8"
print(mybytes.decode(encoding))   # decode the bytes into readable HTML
page.close()
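In practice the encoding of a page is not always known in advance. The helper below is a hypothetical sketch, not part of the original example: decode_body and its parameters are names invented here. It tries a declared charset first, then a fallback, and only then decodes with replacement characters. It works on any bytes, so it can be tried without a network connection.

```python
def decode_body(raw, declared=None, fallback="utf-8"):
    """Decode raw bytes with the declared charset if it works, else the fallback.

    Hypothetical helper for illustration; the names are assumptions.
    """
    for name in (declared, fallback):
        if not name:
            continue
        try:
            return raw.decode(name)
        except (LookupError, UnicodeDecodeError):
            # LookupError: unknown codec name; UnicodeDecodeError: wrong codec.
            continue
    # Last resort: keep what can be decoded and mark the rest with U+FFFD.
    return raw.decode(fallback, errors="replace")


sample = "<title>百度</title>"
from_gb = decode_body(sample.encode("gb18030"), declared="gb18030")
from_unknown = decode_body(sample.encode("utf-8"), declared="no-such-codec")
garbled = decode_body(sample.encode("gb18030"))
```

With a real response, the declared charset could be taken from page.headers.get_content_charset(), which returns None when the server does not state one.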