In Python 2, encoding and decoding are conversions between unicode and str. Encoding turns unicode into str; decoding, conversely, turns str into unicode. The rest of the problem is deciding when to encode and when to decode. The "encoding declaration" at the top of a file is the "# -*- coding: ... -*-" line. Python 2 treats script files as ASCII by default, so when a file contains characters outside the ASCII range, an encoding declaration is needed to tell the interpreter the file's real encoding. The default encoding (what this post calls sys.defaultencoding, readable with sys.getdefaultencoding()) is used whenever a decode happens without an explicitly named codec. For example, consider the following code:
```python
#! /usr/bin/env python
# -*- coding: utf-8 -*-
s = "中文"  # note: s is a str (UTF-8 bytes here), not unicode
s.encode('GB18030')
```
This code re-encodes s as GB18030, which is a unicode -> str conversion. But s is already a str, so Python first decodes s to unicode automatically and only then encodes it as GB18030. Because this decode happens implicitly and we never said which codec to use, Python falls back on the default encoding. In many setups that default is ASCII, and if s contains non-ASCII bytes the decode fails. In my case the default encoding was ASCII, while s, like the file itself, was UTF-8, so I got this error:
```
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)
```
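For comparison, the same failure can be reproduced in Python 3, where bytes plays the role of Python 2's str and str plays the role of unicode. Python 3 no longer decodes behind your back, so this sketch makes the hidden step explicit. The helper name implicit_decode is hypothetical; it simply decodes with an assumed ASCII default, which is what Python 2 did before s.encode('GB18030').

```python
# Python 3: bytes corresponds to Python 2's str, str to Python 2's unicode.
utf8_bytes = "中文".encode("utf-8")  # b'\xe4\xb8\xad\xe6\x96\x87'


def implicit_decode(data: bytes, default_encoding: str = "ascii") -> str:
    """Hypothetical helper: the decode Python 2 ran silently before encoding."""
    return data.decode(default_encoding)


error = None
try:
    implicit_decode(utf8_bytes)
except UnicodeDecodeError as exc:
    error = exc  # keep a reference; exc itself is cleared after the block
    print(error)
# 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)
```

The first byte of the UTF-8 encoding of 中 is 0xe4, which is why the error points at position 0.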
There are two ways to fix this error.
The first is to state the encoding of s explicitly by decoding it yourself:
```python
#! /usr/bin/env python
# -*- coding: utf-8 -*-
s = '中文'
s.decode('utf-8').encode('gb18030')  # decode explicitly as UTF-8, then encode as GB18030
```
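A minimal Python 3 sketch of this first fix, assuming the input arrives as UTF-8 bytes: name the source codec when decoding, then encode to the target. These are the same two steps Python 2 performs, only with nothing left implicit.

```python
# Assumption: the incoming data is UTF-8 encoded bytes.
utf8_bytes = "中文".encode("utf-8")

# Step 1: bytes -> str, naming the codec the bytes were actually written in.
text = utf8_bytes.decode("utf-8")
# Step 2: str -> bytes in the target codec.
gb_bytes = text.encode("gb18030")

print(gb_bytes)  # b'\xd6\xd0\xce\xc4'
```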
The second is to change the default encoding to match the file's encoding (UTF-8) with sys.setdefaultencoding:
```python
#! /usr/bin/env python
# -*- coding: utf-8 -*-
import sys
# At startup Python 2 (the author was on 2.5) removes sys.setdefaultencoding,
# so sys has to be reloaded to get the function back.
reload(sys)
sys.setdefaultencoding('utf-8')

s = "中文"
s.encode('GB18030')
```
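If you are moving this kind of code to Python 3, note that the second fix has no counterpart there. The default encoding is fixed at UTF-8, sys.setdefaultencoding is gone, and bytes has no encode method, so the implicit decode that caused the error above cannot happen. A short check, run under Python 3:

```python
import sys

# Python 3: the default encoding is fixed at UTF-8 and there is no setter.
print(sys.getdefaultencoding())            # utf-8
print(hasattr(sys, "setdefaultencoding"))  # False

# bytes (the old str) has no encode(), so the implicit decode step is gone;
# decode() must be called explicitly, and with no argument it uses UTF-8.
data = "中文".encode("utf-8")
print(hasattr(data, "encode"))             # False
print(data.decode().encode("gb18030"))     # b'\xd6\xd0\xce\xc4'
```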
After reading all this, I changed the failing line in my own script to:
```python
print "<P>ADDR:", form["addr"].value.decode('gb2312').encode('utf-8')
```
and it worked.
To sum up why the code has to be written this way:
1. An encoding conversion is needed when the data you receive is not in the encoding declared by your current script.
2. The conversion first decodes the data to unicode using the data's own encoding, then encodes that unicode as UTF-8.
3. As for why my browser sent GB2312-encoded data back to the server, that is probably related to the client's system encoding.
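Points 1 and 2 can be folded into one small function. This is a Python 3 sketch with a hypothetical name, to_utf8; it assumes the caller already knows the encoding of the incoming bytes (for example GB2312 for my form field), which is exactly the information point 3 says can vary with the client.

```python
import codecs


def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Hypothetical helper: re-encode raw bytes from source_encoding to UTF-8.

    Point 1: convert only when the data's encoding differs from the target.
    Point 2: decode to Unicode with the data's own codec, then encode as UTF-8.
    """
    if codecs.lookup(source_encoding).name == "utf-8":
        return raw  # already in the target encoding; skip the round trip
    return raw.decode(source_encoding).encode("utf-8")


# Stand-in for bytes a browser might send as GB2312.
form_value = "中文".encode("gb2312")
print(to_utf8(form_value, "gb2312"))  # b'\xe4\xb8\xad\xe6\x96\x87'
```

Comparing normalized codec names with codecs.lookup means spellings such as "utf8" and "UTF-8" are treated alike; the early return does not validate the bytes, it only avoids a pointless decode and re-encode.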