This article shows how to read and write TXT files in Python under different encodings. It gives code for each case, for readers who need a reference.
The code is as follows:
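import os
import codecs
# List every entry in the current working directory
filenames = os.listdir(os.getcwd())
out = open("Name.txt", "w")
for filename in filenames:
    # Names arrive as GB2312 byte strings; convert them to UTF-8 bytes
    out.write(filename.decode("gb2312").encode("utf-8"))
out.close()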
This writes the names of the files and directories in the current working directory to Name.txt and saves them in UTF-8.
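For comparison, here is a minimal sketch of the same idea in Python 3, which the article itself does not cover. It assumes that os.listdir() returns text (str) and that the encoding argument of open() decides which bytes reach the disk. The helper name write_listing and the demo file names are hypothetical.

```python
import os
import tempfile


def write_listing(directory, out_path):
    """Write each entry name in `directory` to `out_path`, one per line, as UTF-8."""
    names = sorted(os.listdir(directory))
    # In Python 3 os.listdir() returns str, so no decode step is needed;
    # the encoding argument controls the bytes written to the file.
    with open(out_path, "w", encoding="utf-8") as out:
        for name in names:
            out.write(name + "\n")
    return names


# Demo on a throwaway directory with two hypothetical files.
demo_dir = tempfile.mkdtemp()
for fname in ("a.txt", "b.txt"):
    open(os.path.join(demo_dir, fname), "w").close()

listing_path = os.path.join(tempfile.mkdtemp(), "Name.txt")
written = write_listing(demo_dir, listing_path)
```

Writing one name per line is a choice made for this sketch; the Python 2 code above writes the names back to back.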
To save the file in ANSI encoding instead, write the file name without converting it:
The code is as follows:
out.write(filename)
Opening a file and writing to it
This uses the codecs module, which I am not yet familiar with. I am recording the approach here and will study the module's functions and usage when I have time.
The code is as follows:
import codecs
# codecs.open returns a file object that encodes unicode text on write
f = codecs.open("Lol.txt", "w", "utf-8")
f.write(u"i")
f.close()
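A small round trip makes the effect of codecs.open easier to see. This is a sketch under assumptions: it runs on Python 3, writes to a temporary directory rather than the current one, and uses sample text chosen for illustration.

```python
import codecs
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "Lol.txt")

# codecs.open() returns a wrapper that encodes text on write ...
with codecs.open(path, "w", "utf-8") as f:
    f.write(u"abc\u4e2d\u6587")  # "abc" followed by the two characters of "中文"

# ... and decodes it again on read.
with codecs.open(path, "r", "utf-8") as f:
    text = f.read()

# On disk the file holds UTF-8 bytes: 3 ASCII bytes plus 3 bytes per Chinese character.
with open(path, "rb") as f:
    raw = f.read()
```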
Read ANSI-encoded text files and Utf-8 encoded files
Reading ANSI encoded files
Create a file test.txt, file format with ANSI, content:
The code is as follows:
ABC Chinese
Using Python to read
The code is as follows:
# CODING=GBK
Print open ("Test.txt"). Read ()
Result: ABC Chinese
Read UTF-8 encoded files (no BOM)
Change the file format to UTF-8:
The code is as follows:
Result: ABC Juan PO
Obviously, the content needs to be decoded first:
The code is as follows:
#-*-Coding:utf-8-*-
Import Codecs
Print open ("Test.txt"). Read (). Decode ("Utf-8")
Result: ABC Chinese
Reading UTF-8-encoded files (with BOM)
Some software, when saving a file as UTF-8, inserts three invisible bytes (0xEF 0xBB 0xBF, the BOM) at the start of the file. Some software lets you choose whether a BOM is written. If a BOM is present, you need to strip these bytes when reading. The codecs module in Python defines a constant for them, codecs.BOM_UTF8:
The code is as follows:
#-*-Coding:utf-8-*-
Import Codecs
data = open ("Test.txt"). Read ()
If data[:3] = = codecs. Bom_utf8:
data = Data[3:]
Print Data.decode ("Utf-8")
Result: ABC Chinese
Look at the following example:
The code is as follows:
# -*- coding: utf-8 -*-
data = open("Name_utf8.txt").read()
u = data.decode("utf-8")
print u[1:]
When you open a UTF-8 file and read it, you get a UTF-8 byte string, and decoding it produces a unicode object. However, the three extra BOM bytes are decoded into a single Unicode character (U+FEFF), which cannot be printed. For normal display, use u[1:] to skip that first character.
Note: before outputting a unicode string that contains Chinese, first call its encode method to convert it to the target encoding.
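The following Python 3 sketch shows the single character that the BOM turns into, and compares the u[1:] approach with the standard "utf-8-sig" codec, which removes a BOM only when one is present. The sample bytes are made up for illustration.

```python
import codecs

data = codecs.BOM_UTF8 + "abc".encode("utf-8")

decoded = data.decode("utf-8")      # the BOM survives as one character, U+FEFF
sliced = decoded[1:]                # the u[1:] approach described above
via_sig = data.decode("utf-8-sig")  # this codec drops a leading BOM if there is one

# Slicing blindly would lose a real character when the file has no BOM;
# "utf-8-sig" leaves such input untouched.
no_bom = "abc".encode("utf-8").decode("utf-8-sig")
```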
Setting Python's default encoding
The code is as follows:
import sys
# reload restores setdefaultencoding, which site.py removes at startup
reload(sys)
sys.setdefaultencoding("utf-8")
print sys.getdefaultencoding()
Today I ran into a Python encoding problem. The error message was as follows:
The code is as follows:
Traceback (most recent call last):
  File "ntpath.pyc", line 108, in join
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa1 in position 36: ordinal not in range(128)
Clearly the codec in use is ASCII, which cannot decode the byte 0xa1 (decimal 161, outside the ASCII range of 0 to 127). In the Python console I confirmed that the default encoding is indeed ASCII, as reported by sys.getdefaultencoding().
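The error is easy to reproduce in isolation. The sketch below assumes Python 3, where the ASCII decode has to be requested explicitly, and uses a made-up byte string. It also reads the interpreter's default encoding for comparison.

```python
import sys

raw = b"abc\xa1"  # hypothetical data containing the byte 0xa1

try:
    raw.decode("ascii")
    error_message = None
except UnicodeDecodeError as exc:
    # The message names the codec, the offending byte and its position.
    error_message = str(exc)

# On Python 2 this reported 'ascii' by default; the value depends on the interpreter.
default_encoding = sys.getdefaultencoding()
```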
In Python 2.6, sys.setdefaultencoding() cannot be called directly to change the default encoding. Python runs site.py at startup, and once that file has set the default encoding it deletes the setdefaultencoding method from sys, so the method can no longer be called. After sys has been imported, you can reload the sys module and then call sys.setdefaultencoding('utf8'):
The code is as follows:
import sys
reload(sys)
sys.setdefaultencoding("utf-8")
print sys.getdefaultencoding()
This really works. According to Limodou, site.py is a script that is loaded by default when the Python interpreter starts. If you start the interpreter with python -S, site.py is not loaded automatically.
It's kind of long-winded.
==================================
How do you permanently set the default encoding to UTF-8? There are two ways:
==================================
The first method (not recommended): edit site.py and modify the setencoding() function to force UTF-8.
The second method (recommended): add a file named sitecustomize.py, preferably in the site-packages directory.
sitecustomize.py is executed from within site.py. Because sys.setdefaultencoding() is deleted only at the end of site.py, sitecustomize.py can still call sys.setdefaultencoding().
The code is as follows:
import sys
sys.setdefaultencoding('utf-8')
Since sitecustomize.py is loaded automatically, you can also use it to set up other things besides the encoding.
Encoding of strings
The code is as follows:
s1 = '中文'
A string literal entered directly like the one above is a byte string in the encoding of the source file. To get a Unicode string instead, there are three ways:
The code is as follows:
u1 = u'中文'                  # unicode literal
u2 = unicode('中文', 'gbk')   # built-in unicode(); assumes the source file is saved as GBK
u3 = s1.decode('gbk')         # decode the byte string s1 defined above
unicode is a built-in function. Its second parameter gives the encoding of the source string.
decode is a method of every string. It converts the string to a unicode object, and its parameter gives the encoding of the source string.
encode is also a string method. It converts a string into the encoding named by its parameter.
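For readers on Python 3, the same three ideas map onto the str and bytes types. The sketch below is an illustration under that assumption, not part of the original Python 2 code: text is str, encode produces bytes, and decode or str(bytes, encoding) turns bytes back into text.

```python
text = "中文"  # in Python 3 every string literal is already Unicode

# encode: text -> bytes in the named encoding
gbk_bytes = text.encode("gbk")
utf8_bytes = text.encode("utf-8")

# decode: bytes -> text; the argument names the encoding of the source bytes
from_gbk = gbk_bytes.decode("gbk")

# str(bytes, encoding) plays the role of Python 2's unicode(string, encoding)
from_str = str(gbk_bytes, "gbk")

# Converting between two encodings always goes through text
transcoded = gbk_bytes.decode("gbk").encode("utf-8")
```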