While learning Python, the second problem I ran into was garbled Chinese text. I have only just got past it, so here I will share my experience, which may also serve as a guide for newcomers.
Throughout this article I will focus on one idea: text always comes from somewhere and goes somewhere. Where does the data come from, and where does it go?

1. Chinese in the Windows CMD terminal
C:\Documents and Settings\admin>python
Python 2.7.7 (default, Jun  1 2014, 14:17:13) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> s = '我是中文'
>>> ss = u'我真的是中文'
>>> s
'\xce\xd2\xca\xc7\xd6\xd0\xce\xc4'
>>> ss
u'\u6211\u771f\u7684\u662f\u4e2d\u6587'
>>> print s
我是中文
>>> print ss
我真的是中文
>>>
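The two reprs above can be checked outside the terminal. This is a small sketch that assumes the cmd window uses code page 936 (GBK), the default on Simplified Chinese Windows; the session itself does not tell us that yet. It decodes the bytes that repr(s) printed and compares the u escapes with the characters that were typed:

```python
# coding: utf-8
# Sketch: check the two reprs from the cmd session above.
# Assumption: the console used code page 936 (GBK) for the typed input.

# repr(s): the raw bytes the terminal handed to Python
s_bytes = b'\xce\xd2\xca\xc7\xd6\xd0\xce\xc4'
# repr(ss): a unicode string written as escapes
ss_text = u'\u6211\u771f\u7684\u662f\u4e2d\u6587'

# If GBK is the right guess, decoding the bytes gives back the typed text.
s_text = s_bytes.decode('gbk')

print(s_text == u'我是中文')      # True
print(ss_text == u'我真的是中文')  # True
```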
In this case neither the input nor the output is garbled, whether or not the string has the u prefix. Let's analyze it:
1) Where does the input come from? The terminal.
2) What is the input encoding? For s we don't know; ss is Unicode.
3) What is the output encoding? We don't know.

2. Executing a .py file in Windows cmd

Let's take a look at the code in test.py:
#coding:utf-8
s = 'abc我是中文字符串'
ss = u'我也是中文字符串'
print s
print repr(s)
print ss
print repr(ss)
The file is saved as UTF-8 without BOM (we'll discuss file encodings later). Let's run it in the cmd terminal:
D:\code>python test.py
abc鎴戞槸涓枃瀛楃涓
'abc\xe6\x88\x91\xe6\x98\xaf\xe4\xb8\xad\xe6\x96\x87\xe5\xad\x97\xe7\xac\xa6\xe4\xb8\xb2'
我也是中文字符串
u'\u6211\u4e5f\u662f\u4e2d\u6587\u5b57\u7b26\u4e32'
D:\code>
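The garbled line can be reproduced without a console. The sketch below assumes the cmd window interprets output bytes as GBK (code page 936). It encodes the text as UTF-8, which is what the plain literal s holds in a UTF-8 source file (the bytes match repr(s) above), and then reads those same bytes back as GBK. Byte pairs that are not valid GBK are replaced here, so the exact junk may differ from what the console displays.

```python
# coding: utf-8
# Sketch: why `print s` came out garbled in cmd.
# Assumption: the cmd console interprets output bytes as GBK (code page 936).

text = u'abc我是中文字符串'

# test.py is saved as UTF-8, so the plain literal s holds these bytes.
s = text.encode('utf-8')
print(repr(s))

# The console reads the same bytes as GBK. 'replace' stands in for whatever
# the console does with byte pairs that are not valid GBK.
seen_by_console = s.decode('gbk', 'replace')

print(seen_by_console == text)  # False: the text is garbled
```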
Good grief, how can it be garbled? How can there be garbled text?! I'm going crazy. Stop, don't panic; let's analyze it step by step:
1) Where does the input come from? From the file, obviously.
2) What is the input encoding? It seems one is UTF-8 and the other is Unicode.
3) What is the output encoding? Not sure... isn't it UTF-8?
Did you spot the problem?
UTF-8 ------> output encoding ------> garbled
Unicode ------> output encoding ------> not garbled
So if we convert the text to Unicode before printing it, shouldn't the garbling go away? Let's give it a try.
#coding:utf-8
s = 'abc我是中文字符串'
ss = u'我也是中文字符串'
print s
print repr(s)
# decode the other (byte) string into unicode
uu = s.decode('utf-8')
print uu
print repr(uu)
print ss
print repr(ss)
Here are the results:
D:\code>python test.py
abc鎴戞槸涓枃瀛楃涓
'abc\xe6\x88\x91\xe6\x98\xaf\xe4\xb8\xad\xe6\x96\x87\xe5\xad\x97\xe7\xac\xa6\xe4\xb8\xb2'
abc我是中文字符串
u'abc\u6211\u662f\u4e2d\u6587\u5b57\u7b26\u4e32'
我也是中文字符串
u'\u6211\u4e5f\u662f\u4e2d\u6587\u5b57\u7b26\u4e32'
D:\code>
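When print is given Unicode, Python 2 encodes it for the terminal itself, which is why the decoded string displays correctly. Here is a sketch of that same two-step conversion done by hand, again assuming the console encoding is GBK: first decode from the known input encoding, then encode for where the text is going.

```python
# coding: utf-8
# Sketch: decode to Unicode first, then encode for the output device.
# Assumption: the console encoding is GBK (code page 936).

s = u'abc我是中文字符串'.encode('utf-8')   # bytes from the UTF-8 source file

uu = s.decode('utf-8')        # step 1: we know the input encoding, decode it
out = uu.encode('gbk')        # step 2: encode for where the text is going

# The console reads `out` as GBK and gets the original text back.
print(out.decode('gbk') == uu)  # True
```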
Sure enough, no garbled text. It looks as if we have finally solved the problem, but that is not enough, because other problems may still be waiting.

3. Interacting with users in Windows cmd

Our code has to cope with many environments and may run into all kinds of problems. For example, the same code may be run in cmd, in IDLE, or on Linux, and we want it to behave as we intend everywhere, above all without garbled text. Suppose a script asks the user for input and we run it in cmd. We need to know exactly which encoding that input is in: only when we know the input encoding can we decode it to Unicode and keep the output free of garbled characters.
So in cmd, what is the input encoding?

Before answering, let's review decode and encode:
1) decode: when you know the encoding of a byte string, decode converts it to Unicode. For example, s.decode('utf-8') returns a Unicode string.
2) encode: given a Unicode string, encode converts it to some other encoding. For example, u.encode('utf-8') returns a UTF-8 byte string.
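A short sketch of the two directions. It runs unchanged on Python 2.7 and 3; `type(u'')` stands for `unicode` on Python 2 and `str` on Python 3. It also shows that decode only works with the right encoding.

```python
# coding: utf-8
# Sketch: decode turns encoded bytes into Unicode; encode goes the other way.
text_type = type(u'')   # unicode on Python 2, str on Python 3

u = u'我是中文'

encoded = u.encode('utf-8')          # Unicode -> UTF-8 bytes
decoded = encoded.decode('utf-8')    # UTF-8 bytes -> Unicode

print(isinstance(encoded, bytes))      # True
print(isinstance(decoded, text_type))  # True
print(decoded == u)                    # True

# Decoding needs the *right* encoding: these GBK bytes are not valid UTF-8.
gbk_bytes = u.encode('gbk')
try:
    gbk_bytes.decode('utf-8')
except UnicodeDecodeError:
    print('GBK bytes cannot be decoded as UTF-8')
```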
I only need to mention one thing, and you'll get it:
sys.stdin.encoding
and, correspondingly,
sys.stdout.encoding
Again, let's look at the code:
#coding:utf-8
import sys
s = raw_input()
print s
print repr(s)
u = s.decode(sys.stdin.encoding)
print u
print repr(u)
o = u.encode(sys.stdout.encoding)
print o
print repr(o)
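One caveat: on Python 2, `sys.stdin.encoding` and `sys.stdout.encoding` can be `None` when input or output is redirected to a file or pipe instead of a console, and `s.decode(None)` would then fail. Below is a sketch of the same decode/encode pipeline with a fallback. The helper name `recode` and the UTF-8 fallback are assumptions for illustration, not part of the original script.

```python
# coding: utf-8


def recode(raw, in_enc, out_enc, fallback='utf-8'):
    """Hypothetical helper: decode raw input bytes, re-encode for output.

    in_enc/out_enc would normally be sys.stdin.encoding and
    sys.stdout.encoding; either may be None when redirected, so an
    assumed fallback encoding is used instead.
    """
    text = raw.decode(in_enc or fallback)
    return text, text.encode(out_enc or fallback)


if __name__ == '__main__':
    # Windows cmd (assumed code page 936): GBK in, GBK out.
    text, out = recode(b'\xce\xd2\xca\xc7\xd6\xd0\xce\xc4', 'cp936', 'cp936')
    print(out == b'\xce\xd2\xca\xc7\xd6\xd0\xce\xc4')  # True

    # Redirected streams: encodings are None, so the fallback is used.
    text, out = recode(u'我是中文'.encode('utf-8'), None, None)
    print(text == u'我是中文')  # True
```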
Running it in cmd:
D:\code>python test.py
我是中文
我是中文
'\xce\xd2\xca\xc7\xd6\xd0\xce\xc4'
我是中文
u'\u6211\u662f\u4e2d\u6587'
我是中文
'\xce\xd2\xca\xc7\xd6\xd0\xce\xc4'
D:\code>
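The repr lines above contain the same bytes as the first interactive session, which points to GBK as this cmd window's input and output encoding. On Chinese Windows `sys.stdin.encoding` typically reports the code page name `cp936` (an assumption about that particular setup), and Python treats `cp936` as an alias of its `gbk` codec, as this quick check shows:

```python
# coding: utf-8
import codecs

# Python resolves the Windows code page name to its gbk codec.
print(codecs.lookup('cp936').name)  # gbk

# The bytes printed by repr(s) and repr(o) in the cmd run above.
cmd_bytes = b'\xce\xd2\xca\xc7\xd6\xd0\xce\xc4'
print(cmd_bytes.decode('cp936') == u'我是中文')  # True
```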
Running it in IDLE:
Python 2.7.7 (default, Jun  1 2014, 14:17:13) [MSC v.1500 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> ================================ RESTART ================================
>>>
我是中文
我是中文
'\xce\xd2\xca\xc7\xd6\xd0\xce\xc4'
我是中文
u'\u6211\u662f\u4e2d\u6587'
我是中文
'\xce\xd2\xca\xc7\xd6\xd0\xce\xc4'
>>>
Running it on Linux:
~/Desktop# python test.py
我是中文
我是中文
'\xe6\x88\x91\xe6\x98\xaf\xe4\xb8\xad\xe6\x96\x87'
我是中文
u'\u6211\u662f\u4e2d\u6587'
我是中文
'\xe6\x88\x91\xe6\x98\xaf\xe4\xb8\xad\xe6\x96\x87'
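Comparing the runs: repr(s) differs between Windows (GBK bytes) and Linux (UTF-8 bytes, assuming that terminal uses a UTF-8 locale), yet decoding each with its own input encoding yields the same Unicode text. That is why the script asks `sys.stdin.encoding` rather than hard-coding one codec. A sketch using the bytes printed above:

```python
# coding: utf-8
# Bytes printed by repr(s) in the runs above.
windows_bytes = b'\xce\xd2\xca\xc7\xd6\xd0\xce\xc4'                # cmd / IDLE
linux_bytes = b'\xe6\x88\x91\xe6\x98\xaf\xe4\xb8\xad\xe6\x96\x87'  # Linux terminal

# Decoding each with its own input encoding yields the same Unicode text.
from_windows = windows_bytes.decode('gbk')
from_linux = linux_bytes.decode('utf-8')
print(from_windows == from_linux)  # True

# Hard-coding the Windows codec garbles the Linux input.
wrong = linux_bytes.decode('gbk', 'replace')
print(wrong == from_linux)  # False
```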
Summary: know where your text comes from and where it is going, and you can get it there without garbling.
Python Chinese Encoding (Part 1)