Python Daily Essentials: bytes

Source: Internet
Author: User
A bytes literal in Python is written in the form b'xxx'. Each x can be written either as an ASCII character or as a hexadecimal escape \xNN, where NN ranges from 00 to FF (hex), giving 256 possible byte values.
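As a minimal sketch of the literal forms described above (the variable names are illustrative, not from the original article), the following shows that a character, its \xNN escape, and its integer value all describe the same byte:

```python
# Three spellings of the same single byte (decimal 100, hex 0x64).
literal_char = b"d"
literal_hex = b"\x64"
from_int = bytes([100])

same = literal_char == literal_hex == from_int

# A bytes object holding every possible byte value, 0x00 through 0xFF.
all_bytes = bytes(range(256))

print(same)                 # True
print(len(all_bytes))       # 256
print(all_bytes[:4])        # b'\x00\x01\x02\x03'
print(all_bytes[-2:])       # b'\xfe\xff'
```

Bytes that correspond to printable ASCII are displayed as characters; everything else is displayed as a \xNN escape.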

First, basic operations

The following session shows the basic operations on bytes; as you can see, they are very similar to those on strings:

In [40]: b = b"abcd\x64"
In [41]: b
Out[41]: b'abcdd'
In [42]: type(b)
Out[42]: bytes
In [43]: len(b)
Out[43]: 5
In [44]: b[4]
Out[44]: 100  # 100 is the decimal value of the hex escape \x64
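A short sketch of a few more operations that mirror their string counterparts, along with one notable difference: indexing a bytes object returns an integer, while slicing returns bytes. The variable names are illustrative and do not come from the original session.

```python
b = b"abcd\x64"

# Indexing yields an int (0-255); slicing yields a bytes object.
last_as_int = b[4]          # 100
last_as_bytes = b[4:5]      # b'd'

# String-like methods work on bytes, but their arguments must be bytes too.
upper = b.upper()           # b'ABCDD'
position = b.find(b"cd")    # 2
parts = b.split(b"c")       # [b'ab', b'dd']

# Iterating gives the integer value of each byte.
values = list(b)            # [97, 98, 99, 100, 100]

# hex() and fromhex() convert between bytes and a hex string.
hex_text = b.hex()          # '6162636464'
round_trip = bytes.fromhex(hex_text)

print(last_as_int, last_as_bytes, upper, position, parts)
print(values, hex_text, round_trip == b)
```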

A bytes object is immutable, so you cannot modify one of its bytes in place. To change a byte, first convert the object to a bytearray and then modify that:

In [46]: barr = bytearray(b)
In [47]: type(barr)
Out[47]: bytearray
In [48]: barr[0] = 110
In [49]: barr
Out[49]: bytearray(b'nbcdd')
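The following sketch illustrates the point above: assigning to an index of a bytes object raises TypeError, while the same assignment on a bytearray succeeds. The append call and the final conversion back to bytes are additions for illustration, not part of the original session.

```python
b = b"abcd\x64"

# bytes is immutable: item assignment raises TypeError.
error_message = ""
try:
    b[0] = 110
except TypeError as exc:
    error_message = str(exc)

# bytearray is the mutable counterpart; it copies the data from b.
barr = bytearray(b)
barr[0] = 110        # 110 is the ASCII code of 'n'
barr.append(0x21)    # 0x21 is the ASCII code of '!'

# Convert back to an immutable bytes object when editing is done.
result = bytes(barr)

print(error_message)
print(result)        # b'nbcdd!'
print(b)             # b'abcdd' (the original is unchanged)
```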

Second, the relationship between bytes and characters

As mentioned above, bytes are very similar to strings, and in fact the two can be converted into each other. A given sequence of bytes corresponds to particular characters only under a particular encoding. A string is converted to bytes with its encode() method, and bytes are converted back to a string with the decode() method:

In [50]: s = "人生苦短，我用Python"  # "Life is short, I use Python"
In [51]: b = s.encode('utf-8')
In [52]: b
Out[52]: b'\xe4\xba\xba\xe7\x94\x9f\xe8\x8b\xa6\xe7\x9f\xad\xef\xbc\x8c\xe6\x88\x91\xe7\x94\xa8Python'
In [53]: c = s.encode('gb18030')
In [54]: c
Out[54]: b'\xc8\xcb\xc9\xfa\xbf\xe0\xb6\xcc\xa3\xac\xce\xd2\xd3\xc3Python'
In [55]: b.decode('utf-8')
Out[55]: '人生苦短，我用Python'
In [56]: c.decode('gb18030')
Out[56]: '人生苦短，我用Python'
In [57]: c.decode('utf-8')
Traceback (most recent call last):
  ...
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc8 in position 0: invalid continuation byte
In [58]: b.decode('gb18030')
Out[58]: '浜虹敓鑻︾煭锛屾垜鐢≒ython'

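As we can see, the same characters map to different bytes under different encodings. If the bytes are decoded with a different encoding from the one used to encode them, the result is garbled text, or the conversion fails outright. This is because each encoding has its own rules for which byte sequences are valid. In the example above, \xc8 is the lead byte of a two-byte sequence in UTF-8 and must be followed by a continuation byte in the range \x80-\xbf, but the next byte is \xcb, so decoding stops with "invalid continuation byte".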
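To make the failure concrete, here is a hedged sketch that inspects the bit patterns of the first two GB18030 bytes and shows why a UTF-8 decoder rejects them. It assumes the sample text is the Chinese phrase for "Life is short, I use Python"; the helper variable names and the use of errors="replace" are additions for illustration.

```python
# Assumed sample text: the Chinese phrase "Life is short, I use Python".
text = "人生苦短，我用Python"
gb_bytes = text.encode("gb18030")

first, second = gb_bytes[0], gb_bytes[1]
print(f"{first:#04x} = {first:08b}")     # 0xc8 = 11001000
print(f"{second:#04x} = {second:08b}")   # 0xcb = 11001011

# In UTF-8, 110xxxxx starts a two-byte sequence,
# and every following byte must look like 10xxxxxx.
is_two_byte_lead = (first & 0b11100000) == 0b11000000
is_continuation = (second & 0b11000000) == 0b10000000
print(is_two_byte_lead, is_continuation)  # True False

# Strict decoding (the default) raises UnicodeDecodeError.
reason = None
try:
    gb_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    reason = exc.reason
print(reason)

# errors="replace" substitutes U+FFFD for undecodable bytes instead of raising.
lenient = gb_bytes.decode("utf-8", errors="replace")
print(lenient.endswith("Python"))         # True
```

The lenient form avoids the exception but still loses the original characters, so it is a diagnostic aid rather than a fix; the real remedy is to decode with the encoding that produced the bytes.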

Third, the application

As a simple example, suppose I want to fetch the content of a web page, in this case the result page that Baidu returns for a search on "python". Baidu uses the UTF-8 encoding; if the returned data is not decoded, it is just one very long byte string. Once it is decoded correctly, it becomes a normal, readable HTML page.

import urllib.request

url = "http://www.baidu.com/s?ie=utf-8&wd=python"
page = urllib.request.urlopen(url)
mybytes = page.read()        # raw response body as bytes
encoding = "utf-8"
print(mybytes.decode(encoding))
page.close()
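The example above hard-codes the encoding. As a rough sketch of a more defensive variant, the hypothetical helper decode_body below (not part of urllib or of the original article) takes the raw bytes plus an optional declared charset and falls back to a lenient UTF-8 decode when the declared charset is missing, unknown, or wrong. It runs without network access, on sample bytes.

```python
def decode_body(raw, declared_charset=None, fallback="utf-8"):
    """Decode raw bytes, preferring the declared charset (hypothetical helper)."""
    encoding = declared_charset or fallback
    try:
        return raw.decode(encoding)
    except (UnicodeDecodeError, LookupError):
        # LookupError covers unknown codec names;
        # UnicodeDecodeError covers bytes that do not match the charset.
        return raw.decode(fallback, errors="replace")


# Sample bytes standing in for a response body.
sample = "<title>Python 搜索</title>".encode("utf-8")

html = decode_body(sample, "utf-8")
no_charset = decode_body(sample)
bad_charset = decode_body(b"plain ascii", "no-such-codec")

print(html)
print(no_charset == html)
print(bad_charset)
```

With a real response object from urllib.request.urlopen, the declared charset can be read with page.headers.get_content_charset(), which returns None when the Content-Type header does not specify one.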

That is all for this article; I hope it helps you in learning Python programming.
