When you use Python to write a file, or to save a network stream to a local file, you will very likely run into this error: UnicodeEncodeError: 'gbk' codec can't encode character '\xa0' in position ... There are many articles on the web about how to solve it, and they all revolve around encode and decode. But is that the real cause of the problem? No. Many times we have used decode and encode and tried every encoding we could think of (utf8, utf-8, gbk, gb2312 and so on), yet at run time the same error still appears: UnicodeEncodeError: 'gbk' codec can't encode character '\xa0' in position XXX. It is enough to drive you mad.
Encoding is a serious headache when writing Python scripts on Windows.
When we write a network data stream to a file, several encodings are involved:
1: The encoding declared by # encoding: xxx (the first line of the Python file). This refers to the encoding of the Python script file itself and is irrelevant to this problem, as long as xxx matches the encoding the file is actually saved in. For example, the "Format" menu in Notepad++ lets you choose among various encodings; the encoding selected there must match the xxx in the encoding line, otherwise an error is reported.
2: The encoding of the network data stream. For example, when fetching a web page, the encoding of the network data stream is the encoding of the page. You need to use decode to convert the stream into a Unicode string.
3: The encoding of the target file. To write the network data stream to a new file, we need to specify the encoding of that new file. The code that writes the file looks like f.write(txt), where txt is a string that has already been decoded with decode. Here comes the key point: the encoding of the target file is the real culprit behind the error in the title. Suppose we open the file like this:
f = open("Out.html", "w")
On Windows (a Simplified Chinese system, at least) the default encoding of the new file is GBK, so when we write to it the Python interpreter tries to encode our network data stream txt as GBK. But txt is a Unicode string that has already been decoded, and it can contain characters such as '\xa0' that GBK cannot represent, so the encoding step fails with the error above. The solution is to change the encoding of the target file:
f = open("Out.html", "w", encoding="utf-8")
With this change, the problem no longer occurs.
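To make the whole chain concrete, here is a minimal sketch of the three steps: decode the incoming bytes, see why GBK fails, and write with an explicit encoding. The sample bytes, the variable names, and the temporary output path are assumptions made up for illustration; a real script would get raw from its network library and know the page's actual encoding.

```python
import locale
import os
import tempfile

# Hypothetical stand-in for bytes read from the network: a UTF-8 page
# fragment that contains a non-breaking space (U+00A0).
raw = "price:\xa0100".encode("utf-8")

# Decode the stream using the page's encoding to get a Unicode string.
txt = raw.decode("utf-8")

# The encoding open() falls back to when none is given. On a Simplified
# Chinese Windows system this is typically cp936 (GBK); elsewhere it differs.
default_encoding = locale.getpreferredencoding(False)

# Simulate what happens when the target file is GBK: '\xa0' has no GBK
# mapping, so encoding the already-decoded string raises UnicodeEncodeError.
try:
    txt.encode("gbk")
    gbk_failed = False
except UnicodeEncodeError:
    gbk_failed = True

# The fix from the article: name the target file's encoding explicitly.
path = os.path.join(tempfile.mkdtemp(), "out.html")
with open(path, "w", encoding="utf-8") as f:
    f.write(txt)

# Read the file back with the same encoding to confirm nothing was lost.
with open(path, encoding="utf-8") as f:
    round_trip = f.read()
```

Printing default_encoding on your own machine is a quick way to check whether open() without an encoding argument would use GBK there.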