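# -*- coding: utf-8 -*-
import sys, os

txta = open('A.txt', 'r')
text = u''
for line in txta:
    # Decode each line from UTF-8 bytes to Unicode
    text += line.strip().decode('utf-8')
txta.close()

# Walk the text one character at a time, encoding for output
for word in text:
    print word.encode('utf-8')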
If you print the file contents directly, character by character, the output is garbled. Decode first, then encode.
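Why decoding matters when iterating can be seen by comparing lengths. This is a minimal sketch, assuming the data is UTF-8. The sample characters and the variable names are illustrative only and are not from the original post. It is written to run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
# Illustrative sample: the two characters U+4E2D U+6587, as UTF-8 bytes.
# This stands in for what reading the file as raw bytes would return.
raw = u'\u4e2d\u6587'.encode('utf-8')

# Decoding turns the byte string into Unicode text.
text = raw.decode('utf-8')

byte_count = len(raw)    # counts bytes: three per character for this sample
char_count = len(text)   # counts characters

# Iterating over the decoded text yields whole characters, and each one
# can be encoded again for output.
encoded_chars = [ch.encode('utf-8') for ch in text]
```

Iterating over the undecoded bytes would yield pieces of a character, and that is where the garbled output comes from.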
Reference URL: http://blog.csdn.net/devil_2009/article/details/39526713
The first thing to understand is that in Python 2 a plain string literal is a byte string of type str, and the default codec is ASCII, so any implicit conversion fails on characters outside the ASCII range. Python's internal text representation is Unicode. A u prefix declares a Unicode string directly, for example u'Hello', which is of type unicode. If the string being processed contains non-ASCII characters, it must be converted to Unicode first, otherwise an error occurs. The relevant methods are:

decode() converts a string in some other encoding to Unicode. For example, str1.decode('gb2312') converts the GB2312-encoded string str1 to Unicode.
encode() converts a Unicode string to a string in another encoding. For example, str2.encode('gb2312') converts the Unicode string str2 to GB2312.
unicode() does the same job as decode(). For example, unicode(str3, 'gb2312') converts the GB2312-encoded string str3 to Unicode.

When transcoding, first find out which encoding the string is in, then decode it to Unicode, and only then encode it to the target encoding. Calling decode() on a string that is already Unicode can raise an error, so when the type is unknown, check whether it is already Unicode with isinstance(s, unicode).

This applies not only to Chinese but to any string containing non-ASCII characters. The steps are:

1. Determine the encoding of the source string, say UTF-8.
2. Convert it to Unicode with unicode() or decode(), for example str1.decode('utf8') or unicode(str1, 'utf8').
3. Encode the processed string into the target format with encode().
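The three steps above, together with the isinstance check, can be sketched as follows. This is only an illustration, not the original author's code. The helper names to_text and transcode and the sample bytes are hypothetical. The sketch is written to run on both Python 2.7 and Python 3, so it uses type(u'') rather than the Python 2 name unicode.

```python
# -*- coding: utf-8 -*-
# The Unicode text type: `unicode` on Python 2, `str` on Python 3.
text_type = type(u'')


def to_text(value, encoding):
    """Hypothetical helper: return value as Unicode, decoding only bytes."""
    if isinstance(value, text_type):
        return value  # already Unicode, so it must not be decoded again
    return value.decode(encoding)


def transcode(data, source_encoding, target_encoding):
    """Hypothetical helper: source encoding -> Unicode -> target encoding."""
    return to_text(data, source_encoding).encode(target_encoding)


# Illustrative input: the characters U+4E2D U+6587 stored as GB2312 bytes.
gb_bytes = b'\xd6\xd0\xce\xc4'
utf8_bytes = transcode(gb_bytes, 'gb2312', 'utf-8')
```

Routing every conversion through Unicode means the source and target encodings never have to know about each other. The type check keeps a string that is already Unicode from being decoded a second time.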
Working with Chinese characters in a Python string