Encode and decode methods for Python's str and unicode objects

Source: Internet
Author: User

The str object in Python 2 is actually an "8-bit string", that is, a byte string, essentially similar to byte[] in Java.
The unicode object in Python 2 is roughly the equivalent of a String object in Java, or essentially a Java char[].
For example:

    s = "Hello"
    u = u"Hello"


1. The s.decode method and the u.encode method are the most commonly used.
In short, Python represents strings internally as Unicode (in fact, Python's internal representation differs somewhat from real Unicode, but this is almost transparent to us and can be ignored), and str objects are used when exchanging data with the outside world.
s.decode --> decodes s to unicode; the parameter specifies the original encoding of s. This is the same as unicode(s, encodingname).
u.encode --> encodes the unicode object into a str object; the parameter specifies the encoding to use.
Mnemonic: decode goes to Unicode from the encoding named by the parameter;
encode goes to the encoding named by the parameter from Unicode.
Apart from u'...' literals, only the decode method and the unicode constructor can produce unicode objects.
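The two directions can be sketched as follows. This is a minimal illustration, not code from the original article; it uses b"..." and u"..." literals so that it runs on both Python 2 and Python 3 (where Python 2's str corresponds to bytes and Python 2's unicode corresponds to str). The variable names are illustrative only.

```python
# b"..." is a byte string (Python 2 str); u"..." is a Unicode string.
s = b"Hello"
u = u"Hello"

# decode: bytes -> Unicode. The argument names the encoding the bytes are in.
decoded = s.decode("ascii")

# encode: Unicode -> bytes. The argument names the encoding to produce.
encoded = u.encode("utf-8")

print(repr(decoded))
print(repr(encoded))
print(decoded == u, encoded == s)
```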
The most common scenario is this: we declare the encoding cp936 in the Python source file, written as
# coding=cp936, # -*- coding: cp936 -*-, or # coding:cp936 (if no declaration is written, the default is ASCII).
The str objects in the source file are then cp936-encoded, and we want to pass such a string to a place that needs a different encoding (such as the utf-8 required by XML or the utf-16 required by Excel).
It's usually written like this:
strobj.decode("cp936").encode("utf-16")
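A small sketch of that transcoding step, runnable on Python 2 and Python 3. Since the snippet cannot rely on being saved as cp936, the cp936 bytes are produced from an escaped Unicode literal; cp936_bytes is a stand-in for a str literal that would come from a cp936 source file, and all names are illustrative.

```python
# Two Chinese characters (U+54C8 twice), written as escapes so the
# snippet does not depend on how this file itself is saved.
text = u"\u54c8\u54c8"

# Stand-in for a byte string literal read from a cp936-encoded source file.
cp936_bytes = text.encode("cp936")

# Transcode: decode from the known source encoding, then encode to the target.
utf16_bytes = cp936_bytes.decode("cp936").encode("utf-16")
utf8_bytes = cp936_bytes.decode("cp936").encode("utf-8")

print(repr(cp936_bytes))
print(repr(utf8_bytes))
# The utf-16 codec prepends a byte order mark, so the result is
# 2 bytes of BOM plus 2 bytes per character here.
print(len(utf16_bytes))
```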

You typically encode a Unicode string whenever you need to use it for IO, for instance to transfer it over the network or save it to a disk file.
To convert a string of bytes to a Unicode string is known as decoding. Use unicode('...', encoding) or '...'.decode(encoding).
You typically decode a string of bytes whenever you receive string data from the network or from a disk file.
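As a hedged illustration of "encode on the way out, decode on the way in", the sketch below writes a Unicode string to a temporary file as UTF-8 bytes and reads it back. The file name demo.txt and the sample text are made up for the example; it runs on Python 2 and Python 3.

```python
import os
import tempfile

text = u"caf\u00e9"  # sample text containing one non-ASCII character

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name

# Encode before writing: the file on disk holds bytes, not Unicode.
with open(path, "wb") as f:
    f.write(text.encode("utf-8"))

# Read the raw bytes back and decode them with the same encoding.
with open(path, "rb") as f:
    raw = f.read()
restored = raw.decode("utf-8")

# Clean up the temporary file and directory.
os.remove(path)
os.rmdir(tmp_dir)

print(repr(raw))
print(restored == text)
```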
2.
The first point took a lot of space because it is the most commonly used, and it needs little further explanation. What I really want to talk about is this second point.
It would seem that the encode method of the unicode object and the decode method of str are sufficient. Oddly, unicode also has decode, and str also has encode. What are these two for?
Use 1
A str is itself already encoded, so it is hard to think of any use for encoding it again (doing so is usually a mistake).
Here is the explanation:
str.encode(e) is the same as unicode(str).encode(e).
This is useful since code that expects Unicode strings should also work if it is passed
ASCII-encoded 8-bit strings (from Guido van Rossum)
The father of Python probably meant that the encode method was originally intended for unicode objects, but if it is accidentally called on a str object that happens to be ASCII-encoded (ASCII text is the same in Unicode), the call should still succeed. This is one use of the str.encode method (I think it is basically useless).
Similarly, calling decode on a unicode object that contains only ASCII characters works the same way: since ASCII is unchanged in almost every encoding, the operation amounts to doing nothing.
u"abc".decode("gb2312") and u"abc" are equal.
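The claim that ASCII text is unchanged in almost every encoding can be checked with a short sketch. The list of encodings is just a sample chosen for illustration; utf-16 is included as a counterexample for the "almost". Note that the implicit conversions described above exist only in Python 2: in Python 3, bytes has no encode method and str has no decode method. The snippet itself runs on both.

```python
# A sample of ASCII-compatible encodings (chosen for illustration).
encodings = ["ascii", "utf-8", "gb2312", "cp936", "latin-1"]

# Encode the same ASCII-only text with each of them.
results = dict((enc, u"abc".encode(enc)) for enc in encodings)
all_same = all(value == b"abc" for value in results.values())

# Counterexample: UTF-16 uses two bytes per character, even for ASCII.
utf16_le = u"abc".encode("utf-16-le")

print(all_same)
print(repr(utf16_le))
```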

Use 2
Non-character-encoding codecs: these are defined only in Python and have no meaning outside Python (this comes from the official Python documentation).
They are not human languages either, heh.
For example:

    '\n'.encode('hex') == '0a'
    u'\n'.encode('hex') == '0a'
    '0a'.decode('hex') == '\n'
    u'0a'.decode('hex') == '\n'


As you can see, the codec named hex converts between a character representation (which must be ASCII, of course) and its hexadecimal representation.
There are many other fun ones, for example: base64, which in plain terms is a message encoding that keeps out honest people but not determined ones; zlib (alias zip), which presumably does compression (this is my guess); and rot13, which rotates letters by 13 places (google it if you don't know it).
The official documentation has a detailed table for these in the Standard Encodings section of http://docs.python.org/library/codecs.html. The first table lists the character-based encodings, and the second table lists the non-character codecs discussed here. About these special codecs, the documentation says:
For the codecs listed below, the result in the "encoding" direction is always a byte string.
The result of the "decoding" direction is listed as operand type in the table.
That is, the result of encode is always a byte str, and the result type of decode is given in the operand type column of the table.
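The str.encode('hex') spelling shown earlier is specific to Python 2. A sketch that behaves the same on Python 2 and Python 3.4+ goes through the codecs module functions instead; this is an alternative spelling swapped in for portability, not the form the article uses. The sample inputs are illustrative.

```python
import codecs

# hex: bytes <-> their hexadecimal representation.
hex_text = codecs.encode(b"\n", "hex")
hex_back = codecs.decode(b"0a", "hex")

# base64: bytes <-> base64 text (the codec appends a trailing newline).
b64_text = codecs.encode(b"hello", "base64")
b64_back = codecs.decode(b64_text, "base64")

# rot13: rotates each ASCII letter by 13 places; applying it twice
# returns the original text.
rot_text = codecs.encode(u"Hello", "rot13")
rot_back = codecs.encode(rot_text, "rot13")

print(repr(hex_text), repr(hex_back))
print(repr(b64_text.strip()))
print(rot_text, rot_back)
```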


References
Converting between Unicode and plain strings
http://wiki.woodpecker.org.cn/moin/PyCkBk-3-18
What's the difference between encode/decode? (Python 2.x)
http://stackoverflow.com/questions/447107/whats-the-difference-between-encode-decode-python-2-x
http://docs.python.org/library/codecs.html


The role of the encoding declaration
Please refer to http://www.python.org/dev/peps/pep-0263/
It declares that non-ASCII characters will appear in the source file.
In advanced IDEs, the IDE saves your file in the encoding you declare.
It also determines the encoding used to decode '哈' into Unicode for statements like u'哈' in the source code, which is a common point of confusion.
(Java needs no such declaration because Java defaults to the local encoding, whereas Python 2 defaults to ASCII, which makes Python more error-prone.
Also, the Java compiler accepts an -encoding parameter that specifies the source encoding.)

The encoding of the file determines the encoding of the str literals declared in the source file, for example:

    s = '哈哈'
    print repr(s)

A. If the file is saved as utf-8, the value of s is '\xe5\x93\x88\xe5\x93\x88' (the utf-8 encoding of 哈哈)
B. If the file is saved as GBK, the value of s is '\xb9\xfe\xb9\xfe' (the GBK encoding of 哈哈)
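The two byte sequences in A and B can be reproduced without depending on how the snippet's own file is saved. The sketch below, which runs on Python 2 and Python 3, writes the character 哈 as the escape \u54c8 and encodes it explicitly; the variable names are illustrative.

```python
# The character U+54C8 twice, written as escapes so the result does not
# depend on the encoding of this source file.
text = u"\u54c8\u54c8"

# What a byte string literal would contain in a utf-8 file (case A).
utf8_bytes = text.encode("utf-8")

# What a byte string literal would contain in a GBK file (case B).
gbk_bytes = text.encode("gbk")

print(repr(utf8_bytes))
print(repr(gbk_bytes))
```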

My understanding: once a file is saved, nothing in it records which encoding was used, so the editor and the compiler, clever or not, can only guess. An explicit declaration is more reliable.
Making the declaration and the file's actual encoding agree is never wrong.
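What can go wrong when the assumed encoding and the actual encoding disagree can be sketched as follows. This is only an illustration of the mismatch at the bytes level, not a reproduction of what the Python compiler does with a wrongly declared source file; the names are illustrative, and the snippet runs on Python 2 and Python 3.

```python
# Bytes as they would be stored in a GBK-encoded file (U+54C8 twice).
gbk_bytes = u"\u54c8\u54c8".encode("gbk")

# Decoding with the wrong encoding fails here, because these bytes are
# not valid UTF-8.
try:
    gbk_bytes.decode("utf-8")
    mismatch_detected = False
except UnicodeDecodeError:
    mismatch_detected = True

# Decoding with the matching encoding recovers the original text.
matched = gbk_bytes.decode("gbk")

print(mismatch_detected)
print(matched == u"\u54c8\u54c8")
```

A mismatch does not always raise an error; some byte sequences happen to be valid in the wrong encoding and silently decode to the wrong characters, which is one more reason to keep the two consistent.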

In fact, many other languages and applications have concepts similar to decode and encode, such as the encoding conversions involving String in Java and the native2ascii tool in the JDK.
JavaScript seems to have something similar too, but I'm not sure.
