Learn the bytes of Python every day.

Source: Internet
Author: User

In Python, a bytes literal is written in the form b'xxx'. Each x can be given either as a character or as an escape of the form \xnn, where nn is a two-digit hexadecimal value from 00 to FF, so a byte can take 256 possible values.
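
As a quick check of the literal syntax described above, the following minimal sketch shows that a character and its \xnn escape denote the same byte, and that a byte covers 256 values. The variable names are illustrative only.

```python
# A character and its hexadecimal escape are the same byte.
letter = b"A"
escaped = b"\x41"            # 0x41 is the ASCII code of "A"
print(letter == escaped)     # True

# A single byte can hold any value from 0x00 to 0xFF.
all_bytes = bytes(range(256))
print(len(all_bytes))                # 256
print(all_bytes[0], all_bytes[-1])   # 0 255
```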

Basic operations

The basic operations on bytes are shown below. As you can see, bytes behave much like strings:

In [40]: b = b"abcd\x64"
In [41]: b
Out[41]: b'abcdd'
In [42]: type(b)
Out[42]: bytes
In [43]: len(b)
Out[43]: 5
In [44]: b[4]
Out[44]: 100    # 100 is \x64 in hexadecimal
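
One difference from strings is worth spelling out: indexing a bytes object returns an integer, while slicing returns bytes. The sketch below illustrates this, along with a couple of string-like methods, using the same value as the session above.

```python
b = b"abcd\x64"

first = b[0]         # indexing yields an int
head = b[:2]         # slicing yields a bytes object
print(first)         # 97
print(head)          # b'ab'
print(hex(b[4]))     # 0x64

# Many str-style operations also exist on bytes.
print(b.upper())     # b'ABCDD'
print(b"cd" in b)    # True
```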

A bytes object is immutable, so you cannot change one of its bytes in place. To modify a byte, convert the object to a bytearray first and then change it:

In [46]: barr = bytearray(b)
In [47]: type(barr)
Out[47]: bytearray
In [48]: barr[0] = 110
In [49]: barr
Out[49]: bytearray(b'nbcdd')
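
To make the immutability concrete, here is a minimal sketch showing the error raised by direct assignment and a round trip through bytearray. The appended byte and the name result are illustrative additions, not part of the session above.

```python
b = b"abcdd"

# bytes is immutable: item assignment raises TypeError.
try:
    b[0] = 110
except TypeError as exc:
    print("cannot modify bytes:", exc)

# bytearray is the mutable counterpart.
barr = bytearray(b)
barr[0] = ord("n")       # ord("n") == 110
barr.append(0x21)        # append "!" (0x21)
result = bytes(barr)     # convert back to an immutable bytes object
print(result)            # b'nbcdd!'
```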

Byte-to-character relationship

As mentioned above, bytes are very similar to characters, and in fact the two can be converted into each other. Bytes are turned into the corresponding characters according to a particular encoding.

Characters can be converted to bytes with the encode() method, passing in the encoding. Bytes can be converted back to characters with the decode() method:
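
The direction of each method is easy to mix up, so here is a minimal round-trip sketch: str.encode() produces bytes and bytes.decode() produces a str. The sample text is an arbitrary illustration.

```python
text = "héllo"
data = text.encode("utf-8")        # str -> bytes
print(data)                        # b'h\xc3\xa9llo'
print(len(text), len(data))        # 5 6 (é takes two bytes in UTF-8)

restored = data.decode("utf-8")    # bytes -> str
print(restored == text)            # True
```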

In [50]: s = "人生苦短,我用Python"
In [51]: b = s.encode('utf-8')
In [52]: b
Out[52]: b'\xe4\xba\xba\xe7\x94\x9f\xe8\x8b\xa6\xe7\x9f\xad\xef\xbc\x8c\xe6\x88\x91\xe7\x94\xa8Python'
In [53]: c = s.encode('GB18030')
In [54]: c
Out[54]: b'\xc8\xcb\xc9\xfa\xbf\xe0\xb6\xcc\xa3\xac\xce\xd2\xd3\xc3Python'
In [55]: b.decode('utf-8')
Out[55]: '人生苦短,我用Python'
In [56]: c.decode('GB18030')
Out[56]: '人生苦短,我用Python'
In [57]: c.decode('utf-8')
Traceback (most recent call last):
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-57-8b50aa70bce9>", line 1, in <module>
    c.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc8 in position 0: invalid continuation byte
In [58]: b.decode('GB18030')
Out[58]: '浜虹敓鑻︾煭锛屾垜鐢≒ython'

As we can see, the same characters produce completely different bytes under different encodings. If text is encoded with one encoding and decoded with another, the result may be garbled, or the conversion may fail outright.
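
Both outcomes can be handled in code. The sketch below catches the failure and, as one possible approach that goes slightly beyond the session above, uses the errors argument of decode() to substitute a replacement character instead of raising. The variable names are illustrative.

```python
text = "人生苦短"
gb_bytes = text.encode("gb18030")

# Strict decoding (the default) raises on the first invalid sequence.
try:
    gb_bytes.decode("utf-8")
    failure = None
except UnicodeDecodeError as exc:
    failure = exc.reason
print("strict decode failed:", failure)

# errors="replace" substitutes U+FFFD for undecodable bytes instead of raising.
lenient = gb_bytes.decode("utf-8", errors="replace")
print(lenient)

# Decoding with the matching encoding recovers the original text.
print(gb_bytes.decode("gb18030") == text)   # True
```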

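This happens because each encoding defines its own set of valid byte sequences. In the example above, \xc8 is read by UTF-8 as the first byte of a two-byte sequence, but the byte that follows it, \xcb, is not a valid continuation byte, so decoding fails.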
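
The "invalid continuation byte" error can be checked by hand. In UTF-8, every byte after the first in a multi-byte sequence must have the bit pattern 10xxxxxx. The following sketch inspects the first two GB18030 bytes from the session; the helper is_continuation is a hypothetical name introduced here for illustration.

```python
def is_continuation(byte: int) -> bool:
    """Return True if byte has the form 10xxxxxx (a UTF-8 continuation byte)."""
    return (byte & 0b1100_0000) == 0b1000_0000

gb_bytes = b"\xc8\xcb"               # first character of the GB18030 output
lead, following = gb_bytes
print(format(lead, "08b"))           # 11001000: looks like a 2-byte lead in UTF-8
print(format(following, "08b"))      # 11001011: not of the form 10xxxxxx
print(is_continuation(following))    # False

# For comparison, the same character encoded as UTF-8.
utf8_bytes = "人".encode("utf-8")     # b'\xe4\xba\xba'
print([is_continuation(x) for x in utf8_bytes])   # [False, True, True]
```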

Application

Here is the simplest example. Suppose I want to crawl the content of a web page, in this case the page Baidu returns for a search on Python. Baidu uses the UTF-8 encoding. If the result is not decoded, what comes back is one very long byte string. Once it is decoded correctly, a normal HTML page is displayed.

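import urllib.request

# Baidu search results page for "python"
url = "http://www.baidu.com/s?ie=utf-8&wd=python"
page = urllib.request.urlopen(url)
mybytes = page.read()        # raw response body, as bytes
encoding = "utf-8"
print(mybytes.decode(encoding))
page.close()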
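
When the encoding of a page is not known in advance, one possible approach is to try a short list of candidate encodings in order. The sketch below works on bytes that are already in memory, so it needs no network access. The function decode_page and its default candidate list are assumptions made for illustration, not part of the original example.

```python
def decode_page(raw: bytes, encodings=("utf-8", "gb18030")) -> str:
    """Try each candidate encoding in order and return the first successful decode."""
    for name in encodings:
        try:
            return raw.decode(name)
        except UnicodeDecodeError:
            continue
    # Last resort: decode with the first candidate and mark undecodable bytes.
    return raw.decode(encodings[0], errors="replace")

# Simulated response body encoded as GB18030 rather than UTF-8.
sample = "<title>Python 搜索</title>".encode("gb18030")
print(decode_page(sample))
```

Trying UTF-8 first is a deliberate choice in this sketch: it is strict enough that non-UTF-8 data usually fails to decode, whereas a more permissive encoding tried first could silently produce garbled text.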
