In Python 2, print automatically encodes Unicode text for output, but the write method of a file object does not. So a string that print outputs correctly will not necessarily be written correctly to a file with write.
The encoding that print converts to depends on the environment. On Windows XP, data is converted to GBK. On Linux, it is determined by the locale environment variables, which you can inspect with the locale command. Mine, for example, are:
[zhaowei@papaya zhaowei]$ locale
LANG=zh_CN
LC_CTYPE="zh_CN"
LC_NUMERIC="zh_CN"
LC_TIME="zh_CN"
LC_COLLATE="zh_CN"
LC_MONETARY="zh_CN"
LC_MESSAGES="zh_CN"
LC_PAPER="zh_CN"
LC_NAME="zh_CN"
LC_ADDRESS="zh_CN"
LC_TELEPHONE="zh_CN"
LC_MEASUREMENT="zh_CN"
LC_IDENTIFICATION="zh_CN"
LC_ALL=
With these settings, the encoding is taken to be GB2312. In Python, the locale module can report the encoding of the current environment:
import locale
print locale.getdefaultlocale()
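To see where the GB2312 guess comes from, here is a minimal Python 3 sketch (the code above is Python 2). It assumes that Python's own `locale.normalize` alias table, which expands a bare name such as `zh_CN` to `zh_CN.gb2312`, is a fair stand-in for the C library's rules; a particular system may be configured differently. `encoding_for_locale` is a hypothetical helper, not part of the standard library.

```python
import codecs
import locale


def encoding_for_locale(name):
    """Guess the encoding implied by a locale name such as the LANG value.

    Hypothetical helper: it relies on Python's locale alias table, which
    may not match every system's C library configuration.
    """
    full = locale.normalize(name)  # expand aliases, e.g. "zh_CN"
    if "." not in full:
        return None  # the normalized name carries no encoding part
    enc = full.split(".", 1)[1].split("@", 1)[0]
    return codecs.lookup(enc).name  # canonical Python codec name


for lang in ("zh_CN", "zh_CN.GB18030", "zh_CN.UTF-8"):
    print(lang, "->", encoding_for_locale(lang))

# Encoding Python 3 uses by default for text I/O in the current environment
print("current:", locale.getpreferredencoding(False))
```

A bare `zh_CN` resolves to GB2312, while naming GB18030 or UTF-8 explicitly in `LANG` selects that encoding instead.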
When printing, print encodes Unicode strings with this encoding. Look at the following example: 镕 (U+9555, as in the name 朱镕基) is a well-known character that GB2312 does not contain, so converting it to GB2312 raises an error.
# -*- coding: gb18030 -*-
import locale
import sys, encodings, encodings.aliases
# Now a is Unicode: 镕 (U+9555), a character outside GB2312
a = u'\u9555'
print a.encode("gb2312")  # raises UnicodeEncodeError
The code above raises an exception for exactly this reason. However, print a on its own works, provided your environment encoding is GBK, GB18030 or UTF-8; if your environment encoding is GB2312, that print fails too! So when processing text data from elsewhere, it is best to avoid GB2312: for Chinese data, use GB18030 or UTF-8.
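The dependence on the environment's encoding can be reproduced without changing any settings. In this Python 3 sketch, `print` writes to in-memory text streams that stand in for terminals whose locale selects a given encoding. `try_print` is a hypothetical helper, and U+9555 (镕) is assumed as the test character because GBK and GB18030 contain it while GB2312 does not.

```python
import io

TEXT = "\u9555"  # 镕: in GBK and GB18030, but not in GB2312


def try_print(text, encoding):
    """Print `text` to a stream that encodes with `encoding`.

    The in-memory stream stands in for a terminal whose locale selects
    `encoding`. Returns the bytes print produced, or None if print raised
    UnicodeEncodeError.
    """
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    try:
        print(text, file=stream, end="")
        stream.flush()
    except UnicodeEncodeError:
        return None
    data = raw.getvalue()
    stream.detach()  # stop the wrapper from closing `raw` when collected
    return data


for enc in ("gb2312", "gbk", "gb18030", "utf-8"):
    result = try_print(TEXT, enc)
    print(enc, "->", "UnicodeEncodeError" if result is None else result)
```

Only the GB2312 stream fails; the others receive the character's bytes in their own encoding, which is why GB18030 or UTF-8 is the safer choice.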
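Writing Unicode data with a file object's write method also leads to errors: in Python 2, write converts Unicode with the default ASCII codec, so any non-ASCII character raises UnicodeEncodeError. An explicit encoding conversion is required.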
# -*- coding: gb18030 -*-
import locale
import sys, encodings, encodings.aliases
# Now a is Unicode: 镕 (U+9555)
a = u'\u9555'
f = open("aaa.txt", "w")
# f.write(a) would raise UnicodeEncodeError, so encode explicitly first
f.write(a.encode("gb18030"))
f.close()
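The same rule can be sketched for Python 3, assuming the goal is a GB18030 file as in the example above. Passing `encoding=` to `open` makes the conversion explicit instead of relying on the locale default, and writing bytes in binary mode is the manual equivalent of the Python 2 version. The file name `aaa.txt` comes from the original example; the temporary directory is only there to keep the sketch self-contained.

```python
import os
import tempfile

a = "\u9555"  # 镕, a character that GB2312 cannot represent

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "aaa.txt")

    # Text mode: name the encoding instead of relying on the locale default.
    with open(path, "w", encoding="gb18030") as f:
        f.write(a)

    # Binary mode: encode by hand, as in the Python 2 version.
    with open(path, "ab") as f:
        f.write(a.encode("gb18030"))

    with open(path, "rb") as f:
        data = f.read()

# Decoding with the same encoding recovers both copies of the text.
print(data.decode("gb18030") == a + a)
```

Both writes produce identical bytes, so the file decodes cleanly as long as the reader uses the same encoding.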