Reading and Writing txt Files with Different Encodings in Python
This article shows how to read and write txt files in different encodings in Python, with code for each case. The examples use Python 2 syntax.
The code is as follows:
import os
import codecs
filenames = os.listdir(os.getcwd())
out = open("name.txt", "w")
for filename in filenames:
    # Decode the GB2312 file name to unicode, then encode it as UTF-8
    out.write(filename.decode("gb2312").encode("utf-8"))
out.close()
This writes the names of the files in the current directory to name.txt, encoded as UTF-8.
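For readers on Python 3, where str.decode() no longer exists, the same task can be sketched as follows. This is a minimal sketch rather than the original author's code: the helper name write_listing, the temporary directory, and the sample file names are illustrative assumptions, and names are written one per line for readability.

```python
import os
import tempfile

def write_listing(directory, out_path):
    """Write the names found in `directory` to `out_path`, one per line, as UTF-8."""
    names = sorted(os.listdir(directory))
    # In Python 3, os.listdir() returns str objects, so no manual decode is needed;
    # the encoding argument makes open() encode the text as UTF-8 on write.
    with open(out_path, "w", encoding="utf-8") as out:
        for name in names:
            out.write(name + "\n")
    return names

# Demonstration in a throwaway directory (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    for sample in ("a.txt", "b.txt"):
        open(os.path.join(tmp, sample), "w").close()
    out_file = os.path.join(tmp, "name.txt")
    listed = write_listing(tmp, out_file)
    with open(out_file, encoding="utf-8") as f:
        written = f.read().splitlines()
```

The directory is listed before name.txt is created, so the output file does not appear in its own listing.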
To save the file in ANSI encoding instead, write the file name without transcoding it:
The code is as follows:
    out.write(filename)
Open a file and write to it
This uses the codecs module. I am not yet familiar with this module, so I am recording the method here as a starting point for learning its functions and usage.
The code is as follows:
import codecs
f = codecs.open("lol.txt", "w", "utf-8")
f.write(u"I")
f.close()
Read ANSI-encoded text files and UTF-8-encoded files
Read ANSI encoded files
Create a file named test.txt in ANSI format with the following content:
The Code is as follows:
Abc Chinese
Read data using python
The Code is as follows:
# Coding = gbk
Print open ("Test.txt"). read ()
Result: abc (Chinese)
Read UTF-8 encoded files (without BOM)
The file format into UTF-8:
The Code is as follows:
Result: abc Juan
Obviously, decoding is required here:
The Code is as follows:
#-*-Coding: UTF-8 -*-
Import codecs
Print open ("Test.txt"). read (). decode ("UTF-8 ")
Result: abc (Chinese)
Read UTF-8 encoded files (with BOM)
Some software inserts three invisible characters (0xEF 0xBB 0xBF, or BOM) at the beginning of the file by default when saving a UTF-8-encoded file ). Some software controls whether to insert BOM. If you need to remove these characters when reading a BOM, The codecs module in python defines the constant:
The Code is as follows:
#-*-Coding: UTF-8 -*-
Import codecs
Data = open ("Test.txt"). read ()
If data [: 3] = codecs. BOM_UTF8:
Data = data [3:]
Print data. decode ("UTF-8 ")
Result: abc (Chinese)
Now look at the following example:
The code is as follows:
# -*- coding: utf-8 -*-
data = open("name_utf8.txt").read()
u = data.decode("utf-8")
print u[1:]
This opens a UTF-8 file, reads the UTF-8 byte string, and decodes it into a unicode object. The three BOM bytes decode into a single unicode character (U+FEFF), which cannot be printed. For normal display, u[1:] is used to drop that first character.
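That the three BOM bytes become exactly one character after decoding can be checked directly. A short Python 3 sketch with illustrative sample data:

```python
import codecs

# The BOM is three bytes long, but decodes to one code point.
bom_char = codecs.BOM_UTF8.decode("utf-8")

decoded = (codecs.BOM_UTF8 + b"abc").decode("utf-8")

# Slicing off the first character, as in the example above.
sliced = decoded[1:]

# A more defensive variant: only remove the character if it is the BOM.
stripped = decoded.lstrip("\ufeff")
```

Note that u[1:] removes the first character unconditionally, so it would also drop a real character from a file saved without a BOM; the lstrip variant does not have that problem.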
Note: before outputting a unicode string that contains Chinese, first call encode to convert it to the target encoding.
Set the Python default encoding
The code is as follows:
import sys
reload(sys)
sys.setdefaultencoding("utf-8")
print sys.getdefaultencoding()
Today I encountered a Python encoding problem. The error message is as follows:
The code is as follows:
Traceback (most recent call last):
  File "ntpath.pyc", line 108, in join
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa1 in position 36: ordinal not in range(128)
Obviously, the current encoding is ascii, and the byte 0xa1 cannot be decoded (decimal 161, outside the ASCII range 0 to 127). Checking in the Python console with sys.getdefaultencoding() confirms that the default encoding is ascii.
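The failure itself can be reproduced in isolation by decoding the offending byte with the ascii codec. This is a minimal Python 3 sketch with an explicit decode call; in the traceback above the decoding was implicit, which Python 2 does when a byte string is combined with a unicode string.

```python
try:
    # 0xa1 is 161, which is outside the ASCII range 0..127.
    b"\xa1".decode("ascii")
except UnicodeDecodeError as exc:
    message = str(exc)
    # exc.object holds the input bytes, exc.start the failing position.
    bad_byte = exc.object[exc.start]
```

The exception object exposes the failing position and input, which helps locate the offending byte in longer strings.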
The sys.setdefaultencoding() function cannot be called directly to change the default encoding, because Python runs site.py at startup, and after that file sets the default encoding it deletes the setdefaultencoding method from sys, so it can no longer be called. Once sys has been imported, however, you can reload the sys module and then call sys.setdefaultencoding('utf8'):
The code is as follows:
import sys
reload(sys)  # restores sys.setdefaultencoding, which site.py deleted
sys.setdefaultencoding("utf-8")
print sys.getdefaultencoding()
This works. According to limodou, site.py is a script that is loaded by default when the Python interpreter starts. If Python is started with python -S, site.py is not loaded automatically.
The above is pretty cool.
========================================
How can I set the default encoding to UTF-8 permanently? There are two methods:
========================================
Method 1 (not recommended): Edit site.py, modify the setencoding() function, and set the encoding to UTF-8.
Method 2 (recommended): Add a sitecustomize.py file. The recommended location is the site-packages directory.
sitecustomize.py is imported and executed by site.py. Because sys.setdefaultencoding() is deleted only at the end of site.py, sitecustomize.py can still call sys.setdefaultencoding().
The code is as follows:
import sys
sys.setdefaultencoding('utf-8')
Since sitecustomize.py is loaded automatically, you can also use it to set other things besides the encoding.
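To find the site-packages directory where a sitecustomize.py would go, one option is the standard sysconfig module. This is only a sketch for locating the path; the variable names are illustrative, and sys.setdefaultencoding() itself exists only in Python 2.

```python
import os
import sysconfig

# "purelib" is the install location for pure-Python packages,
# which is normally the site-packages directory.
site_packages = sysconfig.get_paths()["purelib"]

# Path where a sitecustomize.py file could be placed.
candidate = os.path.join(site_packages, "sitecustomize.py")
```

The reported path depends on the interpreter in use, so inside a virtual environment it points at that environment's own site-packages.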
String Encoding
The code is as follows:
s1 = '中文'
A string literal entered directly as above is a byte string in the encoding of the source file. To get a unicode string instead, there are three methods:
The code is as follows:
s1 = u'中文'                 # 1. unicode literal
s2 = unicode('中文', 'gbk')  # 2. unicode() built-in function
s3 = '中文'.decode('gbk')    # 3. decode() on a byte string
unicode is a built-in function. Its second parameter is the encoding of the source string.
decode is a method available on any string that converts the string to unicode. Its parameter is the encoding of the source string.
encode is also a method available on any string. It converts the string to the encoding given by its parameter.
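In Python 3 the same two operations map onto the str and bytes types: encode turns text into bytes and decode turns bytes back into text. A minimal sketch with an illustrative sample string:

```python
text = "中文"

# encode: text -> bytes in a chosen encoding
gbk_bytes = text.encode("gbk")
utf8_bytes = text.encode("utf-8")

# decode: bytes -> text, naming the encoding the bytes were written in
from_gbk = gbk_bytes.decode("gbk")

# Converting between encodings always goes through text in the middle.
transcoded = gbk_bytes.decode("gbk").encode("utf-8")
```

The same text yields different byte sequences in each encoding, which is why the encoding passed to decode has to match the one used when the bytes were produced.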