The bytes/str dichotomy in Python 3

Source: Internet
Author: User

Arguably the most significant new feature of Python 3 is a much cleaner separation between text and binary data. Text is always Unicode and is represented by the str type, and binary data is represented by the bytes type. What makes the separation particularly clean is that str and bytes can't be mixed in Python 3 in any implicit way. You can't concatenate them, look for one inside another, or generally pass one to a function that expects the other. This is a good thing.
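To make the "no implicit mixing" rule concrete, here is a minimal sketch. The helper name attempt and the sample values are chosen purely for illustration and are not part of the original article; the exact wording of the error messages can vary between Python versions.

```python
text = "€20"                    # str: a sequence of Unicode characters
data = text.encode("utf-8")     # bytes: the UTF-8 representation of that text


def attempt(operation):
    """Call operation() and describe whether Python raised TypeError."""
    try:
        operation()
    except TypeError as exc:
        return "TypeError: " + str(exc)
    return "no error"


# Implicit mixing is rejected in both cases.
concat_result = attempt(lambda: text + data)    # str + bytes
search_result = attempt(lambda: data in text)   # bytes inside str
print(concat_result)
print(search_result)

# Crossing the boundary explicitly works fine.
joined = text + data.decode("utf-8")
print(joined)
```

Both mixed operations should report a TypeError, while the last line succeeds because the bytes are decoded to a string before concatenation.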

However, boundaries between strings and bytes are inevitable, and this is where the following relationship is always important to keep in mind:

Strings can be encoded to bytes, and bytes can be decoded back to strings.

>>> '€20'.encode('utf-8')
b'\xe2\x82\xac20'
>>> b'\xe2\x82\xac20'.decode('utf-8')
'€20'

Think of it this way: a string is an abstract representation of text. A string consists of characters, which are also abstract entities not tied to any particular binary representation. When manipulating strings, we're living in blissful ignorance. We can split and slice them, concatenate and search inside them. We don't care how they are represented internally and how many bytes it takes to hold each character in them. We only start caring about this when encoding strings into bytes (for example, in order to send them over a communication channel), or decoding strings from bytes (for the other direction).
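The difference between counting characters and counting bytes can be sketched as follows. The choice of UTF-16 and UTF-32 as extra encodings is an assumption made for comparison only; the article itself uses UTF-8 here.

```python
text = "€20"

# On the str side we work with characters, whatever their byte size.
char_count = len(text)          # 3 characters
first_char = text[0]            # the euro sign, as a one-character str

# On the bytes side the length depends on the encoding chosen.
utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")   # "-le" variant avoids a byte-order mark
utf32 = text.encode("utf-32-le")

byte_counts = {
    "utf-8": len(utf8),
    "utf-16-le": len(utf16),
    "utf-32-le": len(utf32),
}
print(char_count, first_char, byte_counts)

# Indexing bytes yields an integer byte value, not a character.
first_byte = utf8[0]
print(hex(first_byte))
```

The string always has three characters, but the same text occupies a different number of bytes under each encoding, which is exactly the detail we can ignore until we encode.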

The argument given to encode and decode is the encoding (or codec). The encoding is a way to represent abstract characters in binary data. There are many possible encodings. UTF-8, shown above, is one. Here's another:

>>> '€20'.encode('iso-8859-15')
b'\xa420'
>>> b'\xa420'.decode('iso-8859-15')
'€20'

The encoding is a crucial part of this translation process. Without the encoding, the bytes object b'\xa420' is just a bunch of bits. The encoding gives it meaning. Using a different encoding, this bunch of bits can have a different meaning:

>>> b'\xa420'.decode('windows-1255')
'₪20'

That's 80% of the money lost due to using the wrong encoding, so be careful ;-)
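The same experiment can be extended to a few more codecs. In this sketch, the helper decode_with and the addition of latin-1 and utf-8 to the list are illustrative assumptions, not part of the original article.

```python
raw = b"\xa420"   # the same three bytes used in the examples above


def decode_with(codec_name):
    """Decode raw with the given codec; return None if the bytes are invalid."""
    try:
        return raw.decode(codec_name)
    except UnicodeDecodeError:
        return None


codecs_to_try = ("iso-8859-15", "windows-1255", "latin-1", "utf-8")
results = {name: decode_with(name) for name in codecs_to_try}

for name, value in results.items():
    print(f"{name:>12}: {value!r}")
```

Three of the codecs decode the bytes without complaint, each producing a different currency symbol, while UTF-8 rejects them outright. A wrong encoding does not always fail loudly, so the encoding has to travel with the data or be agreed on in advance.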
