[Repost] The encode and decode methods of Python str and unicode objects

Source: Internet
Author: User

The encode and decode methods of Python str and unicode objects
A str object in Python 2 is really an "8-bit string", that is, a byte string, essentially similar to byte[] in Java.
A unicode object in Python 2 is the counterpart of Java's String object, or essentially of Java's char[].
For example:

Python code
s = "hello"
u = u"hello"

1. s.decode and u.encode are the most commonly used methods.
Simply put, Python represents character strings internally as Unicode (the internal representation differs slightly from real Unicode, but that is almost transparent to us, so it can be ignored), and str objects are used when interacting with people (input and output).
s.decode --------> decodes s into a unicode object. The argument names the encoding that s is currently in. This is the same as unicode(s, encodingname).
u.encode --------> encodes a unicode object into a str object. The argument names the encoding to produce.
Remember: decode goes from the named encoding to Unicode;
encode goes from Unicode to the named encoding.
Starting from a str, only the decode method and the unicode constructor yield a unicode object.
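
A minimal sketch of the two directions described above. It is written for Python 3, which is an assumption on my part since the article targets Python 2: in Python 3, bytes plays the role of the Python 2 str and str plays the role of the Python 2 unicode. The variable names are illustrative only.

```python
# Python 3 sketch: bytes ~ Python 2 str, str ~ Python 2 unicode.
raw = b"hello"                 # a byte string (what Python 2 calls str)
text = raw.decode("ascii")     # bytes -> text; the argument is the encoding the bytes are in
back = text.encode("utf-8")    # text -> bytes; the argument is the encoding to produce

# The constructor form mirrors Python 2's unicode(s, encodingname).
same_text = str(raw, "ascii")

print(type(text).__name__, type(back).__name__)
print(text == same_text, back == raw)
```

For pure ASCII content the round trip gives back the original bytes, because ASCII characters have the same byte values in UTF-8.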
The most common use of the steps above arises when a Python source file is saved in cp936 and declares this with one of
# coding=cp936 or # -*- coding: cp936 -*- or # coding: cp936 (if no declaration is written, Python 2 assumes ASCII)
The str literals in such a source file are then cp936-encoded, and we often need to pass such a string to a place that expects a different encoding (for example, XML wants UTF-8 and Excel wants UTF-16).
This is usually written as follows:
strobj.decode("cp936").encode("utf-16")
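
The decode-then-encode chain can be sketched as a small helper. This is a Python 3 illustration under the assumption that bytes stands in for the Python 2 str; the function name transcode and the sample text are hypothetical, not from the original article.

```python
def transcode(data: bytes, source_encoding: str, target_encoding: str) -> bytes:
    """Decode bytes from one encoding, then re-encode the text in another."""
    return data.decode(source_encoding).encode(target_encoding)


# Stand-in for a str literal read from a cp936-encoded source file.
cp936_bytes = "哈哈".encode("cp936")

utf16_bytes = transcode(cp936_bytes, "cp936", "utf-16")
utf8_bytes = transcode(cp936_bytes, "cp936", "utf-8")

print(repr(cp936_bytes))
print(repr(utf8_bytes))
print(utf16_bytes.decode("utf-16"))
```

Going through text in the middle is the important part: there is no direct byte-to-byte conversion, so the source encoding must be known.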

You typically encode a unicode string whenever you need to use it for I/O, for instance to transfer it over the network or save it to a disk file.
Converting a string of bytes to a unicode string is known as decoding. Use unicode('...', encoding) or '...'.decode(encoding).
You typically decode a string of bytes whenever you receive string data from the network or from a disk file.
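
A small Python 3 sketch of that rule at a file boundary: encode just before writing, decode just after reading. The temporary file, the sample text, and the choice of UTF-8 are assumptions made for illustration.

```python
import os
import tempfile

text = "héllo, 世界"

fd, path = tempfile.mkstemp()
try:
    # Encode on the way out: a file opened in binary mode carries bytes.
    with os.fdopen(fd, "wb") as f:
        f.write(text.encode("utf-8"))

    # Decode on the way in, using the same encoding that was written.
    with open(path, "rb") as f:
        received = f.read().decode("utf-8")
finally:
    os.remove(path)

print(received == text)
```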
2.
Point 1 has already been written about at length, since it is the most common usage and needs little explanation. I want to focus on point 2.
The encode method of unicode objects and the decode method of str objects would seem to be enough. The odd thing is that unicode also has a decode method and str also has an encode method. What are those two for?
Use 1
A str is already encoded, so calling encode on it is counterintuitive (and doing so is usually a mistake).
Here is the explanation:
str.encode(e) is the same as unicode(str).encode(e).
This is useful since code that expects Unicode strings should also work when it is passed
ASCII-encoded 8-bit strings (from Guido van Rossum)
What the father of Python means here is roughly this: encode is meant to be called on unicode objects, but if it is accidentally called on a str object, and that str happens to be ASCII-encoded (the ASCII range is identical in Unicode), the call should still succeed. That is one use of str.encode (I consider it basically useless).
Similarly, calling decode on a unicode object made up only of ASCII characters gives back the same value, because ASCII characters are encoded identically in nearly every encoding. The operation therefore changes nothing:
u"ABC".decode("gb2312") and u"ABC" are equal.
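
The reason ASCII-only data survives these odd calls can be checked directly. The sketch below is Python 3, where bytes has no encode method and str has no decode method, so it only demonstrates the underlying fact: ASCII bytes decode to the same text under any ASCII-compatible encoding, while non-ASCII bytes break an ASCII step. The list of encodings is my own selection.

```python
ascii_bytes = b"ABC"

# ASCII bytes decode to the same text under any ASCII-compatible encoding.
compatible = ["ascii", "gb2312", "gbk", "utf-8", "latin-1"]
results = {name: ascii_bytes.decode(name) for name in compatible}
print(all(text == "ABC" for text in results.values()))

# Bytes outside the ASCII range are where an implicit ASCII step fails.
gbk_bytes = "哈".encode("gbk")
try:
    gbk_bytes.decode("ascii")
    ascii_decodable = True
except UnicodeDecodeError:
    ascii_decodable = False
print(ascii_decodable)
```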

Use 2
Codecs that are not character encodings. They are defined only within Python and mean nothing outside it (this is from the official Python documentation),
and they do not represent any human language.
For example:

Python code
'\n'.encode('hex') == '0a'
u'\n'.encode('hex') == '0a'
'0a'.decode('hex') == '\n'
u'0a'.decode('hex') == '\n'

As you can see, the hex codec converts between the characters themselves (which must of course be ASCII) and their hexadecimal representation.
There are many other interesting codecs as well: base64, commonly seen in e-mail encoding; gzip, which presumably means compression (my guess); and rot13, a rotation by 13 places (search for it if it is unfamiliar).
For more information, see the "Standard Encodings" section of http://docs.python.org/library/codecs.html. The first table there lists the character encodings, and the second table
lists the non-character codecs discussed here. The official explanation of these special codecs is:
For the codecs listed below, the result in the "encoding" direction is always a byte string.
The result of the "decoding" direction is listed as operand type in the table.
In other words, the result of encode is always a byte str, and the result type of decode is given in the operand type column of the table.
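
In Python 3 the str.encode('hex') spelling from Python 2 is gone, but the same non-text codecs can still be reached through the codecs module. The following is a short sketch under that assumption; the sample inputs are illustrative.

```python
import codecs

# Bytes-to-bytes codecs: input and output are both byte strings.
hex_encoded = codecs.encode(b"\n", "hex")
hex_decoded = codecs.decode(b"0a", "hex")
b64_encoded = codecs.encode(b"hello", "base64")
b64_decoded = codecs.decode(b64_encoded, "base64")

# rot13 is a text-to-text codec: it maps str to str.
rot13_text = codecs.encode("hello", "rot13")

print(hex_encoded, hex_decoded)
print(b64_encoded, b64_decoded)
print(rot13_text)
```

None of these involve a character set: hex and base64 re-express bytes as other bytes, and rot13 shuffles letters within text.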

References
Converting between Unicode and plain strings
http://wiki.woodpecker.org.cn/moin/PyCkBk-3-18
What's the difference between encode/decode? (Python 2.x)
http://stackoverflow.com/questions/447107/whats-the-difference-between-encode-decode-python-2-x
http://docs.python.org/library/codecs.html

The purpose of the coding declaration
See http://www.python.org/dev/peps/pep-0263/
It declares that non-ASCII characters will appear in the source file.
In more advanced IDEs, the IDE also saves your file in the encoding you declared.
It also determines, confusingly, the encoding used to decode the '哈' in a literal such as u'哈' in the source code into Unicode.
(Java needs no such declaration because Java defaults to the local encoding, whereas Python defaults to ASCII, which makes Python more error-prone.
In addition, the Java compiler accepts an encoding parameter, -encoding, to specify the source encoding.)
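
The effect of the declaration on decoding can be sketched without touching the file system by compiling source bytes in memory. This is Python 3, and the in-memory source and the names used are assumptions made for illustration; the declaration follows the PEP 263 form mentioned above.

```python
# Source text for a tiny module that declares its own encoding.
source_text = '# -*- coding: gbk -*-\ns = "哈哈"\n'

# What an editor would write to disk when saving the file as GBK.
source_bytes = source_text.encode("gbk")

# compile() accepts bytes and reads the coding declaration to decode them.
namespace = {}
exec(compile(source_bytes, "<demo>", "exec"), namespace)
decoded_literal = namespace["s"]

print(decoded_literal == "哈哈")
```

The declaration and the bytes on disk agree here, which is why the literal comes back intact.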

The encoding in which the file is saved determines the encoding of the str literals in the source file. For example:

Python code
s = '哈哈'
print repr(s)

a. If the file is saved as UTF-8, the value of s is '\xe5\x93\x88\xe5\x93\x88' (the UTF-8 encoding of 哈哈)
b. If the file is saved as GBK, the value of s is '\xb9\xfe\xb9\xfe' (the GBK encoding of 哈哈)
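
The two byte sequences can be reproduced in Python 3 by encoding the text explicitly, which stands in for what the editor does when it saves the file. The sketch also shows what happens when the bytes are decoded with the wrong encoding; errors="replace" is used there only so the example cannot raise.

```python
text = "哈哈"

utf8_bytes = text.encode("utf-8")   # bytes on disk if the file is saved as UTF-8
gbk_bytes = text.encode("gbk")      # bytes on disk if the file is saved as GBK

print(repr(utf8_bytes))
print(repr(gbk_bytes))

# Decoding with the wrong encoding does not recover the original text.
misread = utf8_bytes.decode("gbk", errors="replace")
print(misread == text)
```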

My understanding: once a file is saved, nothing in it records which encoding was used. Without a declaration, the interpreter can only guess, as a clever (or stupid) editor would. An explicit declaration is more reliable,
as long as the declaration and the actual file encoding agree.

In fact, the decode and encode concepts are similar in many other languages and tools, such as the encoding conversions of String in Java and the native2ascii tool in the JDK.
JavaScript seems to have something similar too, but I cannot remember it clearly.

This article is from a CSDN blog. When reposting, please credit the source: http://blog.csdn.net/suofiya2008/archive/2011/05/12/6415162.aspx
