1. File processing
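f = open(file="file01.txt", mode="r", encoding="utf-8")  # Python 3 source code defaults to UTF-8, but open() without encoding= uses the locale's preferred encoding, so pass it explicitly
data = f.read()
print(data)
print(type(data))  # <class 'str'>
f.close()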
If an error like the following is raised:
# UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcc in position 0: invalid continuation byte
it means the encoding passed to open() does not match the encoding the file was saved in.
As a rule, a file should be read with the same encoding it was stored in: a file saved as gb2312 should be read as gb2312, and a file saved as UTF-8 should be read as UTF-8.
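The mismatch is easy to reproduce. The following is a minimal sketch, not part of the original example: it writes a small GBK file into a temporary directory (the file name demo_gbk.txt is hypothetical), then reads it back with the wrong and with the right encoding.

```python
import os
import tempfile

# Hypothetical demo file, created in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "demo_gbk.txt")

# Store the text as GBK.
with open(path, mode="w", encoding="gbk") as f:
    f.write("将进酒")

# Reading with a different encoding than the one used for writing fails.
mismatch_error = None
try:
    with open(path, mode="r", encoding="utf-8") as f:
        f.read()
except UnicodeDecodeError as exc:
    mismatch_error = exc

# Reading with the same encoding works.
with open(path, mode="r", encoding="gbk") as f:
    text = f.read()

print(type(mismatch_error).__name__)
print(text)
```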
2. File processing: binary mode
Files are stored on the computer as binary data, so we can ignore the encoding and read the contents of the file directly as bytes.
f = open(file="file01.txt", mode="rb")
data = f.read()
print(data)
f.close()
With mode="rb" the file is read in binary mode, and the result is also binary (a bytes object, not a str).
Binary mode is what you use for videos, images and similar content, and for data sent over a network.
# Output:
# b'\xef\xbb\xbf\xe7\x94\xb0\xe7\xbb\xb4\xe9\x80\x9a\t12001\t10\r\n\xe5\xbc\xa0\xe5\xae\xb6\xe9\x93\xad\t12002\t11\r\n\xe8\x88\x92\xe5\xa8\x85\t12003\t12\r\n\xe5\xad\x99\xe7\x8e\x89\xe5\x80\xa9\t12004\t13\r\n\xe5\xbc\xa0\xe8\xb6\x85\t12005\t14\r\n\xe7\x8e\x8b\xe4\xba\xac\t12006\t15\r\n\xe5\xbb\x96\xe6\x9e\x97\xe8\x8b\xb1\t12007\t16\r\n\xe5\xbe\x90\xe6\x99\x93\xe8\x8e\x89\t12008\t17\r\n\xe9\x87\x91\xe5\x98\x89\xe7\xa5\xba\t12009\t18\r\n\xe5\x8f\x8a\xe6\xa0\xbc\t12010\t19\r\n\xe4\xba\x8e\xe5\x87\xaf\xe9\x98\xb3\t12011\t20\r\n\xe6\x9d\x8e\xe4\xbf\x8a\xe7\xba\xa2\t12012\t21\r\n\xe5\x88\x98\xe5\x86\xac\t12013\t22\r\n'
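To make the difference between bytes and characters concrete, here is a small illustrative sketch. It assumes a hypothetical one-record file (demo_bin.txt) written as UTF-8 into a temporary directory, and compares the number of bytes read in binary mode with the number of characters in the text.

```python
import os
import tempfile

# Hypothetical demo file with a single tab-separated record.
path = os.path.join(tempfile.mkdtemp(), "demo_bin.txt")
record = "田维通\t12001\t10"

with open(path, mode="w", encoding="utf-8") as f:
    f.write(record)

# Binary mode takes no encoding argument and returns bytes.
with open(path, mode="rb") as f:
    raw = f.read()

print(type(raw))    # <class 'bytes'>
print(len(raw))     # number of bytes on disk
print(len(record))  # number of characters in the text
```

Each of the three Chinese characters takes 3 bytes in UTF-8, which is why the byte count is larger than the character count.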
3. File processing: a tool for detecting the encoding automatically
import chardet
f = open('file01.txt', mode="rb")
data = f.read()
f.close()
print(chardet.detect(data))
# Output:
# {'encoding': 'UTF-8-SIG', 'confidence': 1.0, 'language': ''}
# confidence is the estimated probability (here 1.0) that the encoding is UTF-8-SIG
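Once we know the encoding of the target file, we can decode the bytes, for example with data.decode("utf-8"), and print the text we need. For a file detected as UTF-8-SIG, decoding with "utf-8-sig" also strips the byte order mark (BOM) at the start of the file.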
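The effect of the BOM on decoding can be sketched with the standard library alone (chardet is a third-party package and is not used here). The byte string below is constructed by hand for illustration; it is not read from file01.txt.

```python
import codecs

# Hand-built bytes: a UTF-8 byte order mark followed by UTF-8 text.
raw = codecs.BOM_UTF8 + "田维通".encode("utf-8")

# Plain "utf-8" keeps the BOM as an invisible first character.
with_bom = raw.decode("utf-8")

# "utf-8-sig" removes the BOM while decoding.
without_bom = raw.decode("utf-8-sig")

print(repr(with_bom[0]))               # '\ufeff'
print(len(with_bom), len(without_bom))
```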
4. Reading a file line by line in a loop
f = open('file01.txt', 'r', encoding='utf-8')
for line in f:
    print(line, end="")  # end="" sets what print() appends; here it suppresses print's default "\n", because each line already ends with one
f.close()
Output:
# 田维通 12001 10
# 张家铭 12002 11
# 舒娅 12003 12
# 孙玉倩 12004 13
# 张超 12005 14
# 王京 12006 15
# 廖林英 12007 16
# 徐晓莉 12008 17
# ...
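As an alternative to calling f.close() by hand, the same loop can be written with a with statement, which closes the file automatically. This is a sketch using a hypothetical two-line sample file (demo_lines.txt) in a temporary directory, not the original file01.txt.

```python
import os
import tempfile

# Hypothetical sample data with two tab-separated records.
path = os.path.join(tempfile.mkdtemp(), "demo_lines.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("田维通\t12001\t10\n张家铭\t12002\t11\n")

names = []
with open(path, "r", encoding="utf-8") as f:
    for line in f:
        # Strip the trailing newline, then split the record on tabs.
        fields = line.rstrip("\n").split("\t")
        names.append(fields[0])

# The file is already closed once the with block has ended.
print(f.closed)
print(names)
```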
5. Writing files
# Create a file in GBK encoding and write "将进酒" to it
f = open(file='file02.txt', mode='w', encoding='gbk')
f.write('将进酒')
f.close()
If you then open the file in mode 'w' and write again:
f = open(file='file02.txt', mode='w', encoding='gbk')
f.write('杯莫停')
f.close()
the original content of file02.txt is overwritten: mode 'w' empties the file when it is opened, so it now contains only "杯莫停".
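The overwrite can be checked by reading the file back. A minimal sketch, assuming a hypothetical file (demo_write.txt) in a temporary directory rather than the original file02.txt:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo_write.txt")

# First write.
with open(path, mode="w", encoding="gbk") as f:
    f.write("将进酒")

# Opening with mode "w" again truncates the file before writing.
with open(path, mode="w", encoding="gbk") as f:
    f.write("杯莫停")

with open(path, mode="r", encoding="gbk") as f:
    content = f.read()

print(content)
```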
6. Writing files: append
In append mode, new content is written after the existing content.
f = open('file02.txt', 'ab')  # mode 'ab' or 'a' means append
f.write('\n人生得意须尽欢'.encode('gbk'))  # binary mode needs bytes, so encode the text first
f.close()
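The text-mode variant, mode 'a' with an explicit encoding, avoids the manual encode() call. A short sketch with a hypothetical file (demo_append.txt) in a temporary directory:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo_append.txt")

with open(path, mode="w", encoding="gbk") as f:
    f.write("将进酒")

# Text-mode append: the encoding is handled by open().
with open(path, mode="a", encoding="gbk") as f:
    f.write("\n人生得意须尽欢")

with open(path, mode="r", encoding="gbk") as f:
    lines = f.read().split("\n")

print(lines)
```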
7. File processing: mixed reading and writing
f = open('file02.txt', 'r+', encoding='gbk')
data = f.read()
print("content:", data)
f.write("\n锄禾日当午")
f.write("\n汗滴禾下土")
f.write("\n离离原上草")
f.write("\n一岁一枯荣")
f.close()
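Result: the existing content is printed, and because read() leaves the cursor at the end of the file, the four new lines are added after it.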
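The read-then-write behaviour of mode 'r+' can be verified as follows. This is an illustrative sketch with a hypothetical file (demo_rplus.txt) in a temporary directory.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo_rplus.txt")

with open(path, mode="w", encoding="gbk") as f:
    f.write("杯莫停")

with open(path, mode="r+", encoding="gbk") as f:
    before = f.read()        # reading moves the cursor to the end
    f.write("\n锄禾日当午")   # so this write lands after the old content

with open(path, mode="r", encoding="gbk") as f:
    after = f.read()

print(before)
print(after)
```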
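8. Other file operations
(1) flush()
f = open('f_flush.txt', 'w', encoding='utf-8')
f.write('奇门遁甲')
# Until f.close() is called, the written content sits in an in-memory buffer, and the txt file may still be empty at this point.
# The buffer is normally flushed automatically when it fills up; f.flush() forces the buffered content out to the file right away.
f.flush()
f.close()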
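One way to observe the buffer is to compare the file size on disk before and after flush(). This is a sketch under the assumption of default buffering; with a short write like this the size before flushing is typically 0, but that depends on the buffering in effect. The file name demo_flush.txt is hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo_flush.txt")

f = open(path, "w", encoding="utf-8")
f.write("奇门遁甲")

# The text is probably still in Python's buffer at this point.
size_before = os.path.getsize(path)

# Push the buffered text out to the file.
f.flush()
size_after = os.path.getsize(path)

f.close()
print(size_before, size_after)
```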
(2) tell() and seek()
# File content: hello world!
>>> f = open('file03.txt', 'r', encoding='gbk')
>>> f.tell()  # returns the current position of the file cursor
0
>>> f.seek(1)  # move the file cursor to position 1
1
>>> f.read()
'ello world!'
Note: tell() and seek() count positions in bytes, not characters. The number of bytes per character depends on the encoding: in GBK a Chinese character takes 2 bytes, in UTF-8 it usually takes 3 bytes.
Take Chinese text as an example:
# File content: 技高一筹
>>> f = open("file03.txt", 'r', encoding='gbk')
>>> f.read()
'技高一筹'
>>> f.tell()
8
>>> f.seek(0)  # move the file cursor back to the start
0
>>> f.seek(4)  # move the cursor to 4; in GBK a Chinese character takes 2 bytes, so the cursor is now between "技高" and "一筹"
4
>>> f.read()  # so the result is the last two characters
'一筹'
>>> # -----------------------------------------------------------------------------
>>> f.seek(1)  # move the cursor to 1, the middle of a character; reading now fails, because only part of the bytes of "技" is available
1
>>> f.tell()
1
>>> f.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'gbk' codec can't decode byte 0xef in position 6: incomplete multibyte sequence
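The byte counts in the note can be checked directly with str.encode(). The sketch below also repeats the seek(4) step on a hypothetical GBK file (demo_seek.txt) in a temporary directory.

```python
import os
import tempfile

word = "技高一筹"

# Same four characters, different byte lengths per encoding.
gbk_len = len(word.encode("gbk"))
utf8_len = len(word.encode("utf-8"))
print(gbk_len, utf8_len)

path = os.path.join(tempfile.mkdtemp(), "demo_seek.txt")
with open(path, "w", encoding="gbk") as f:
    f.write(word)

with open(path, "r", encoding="gbk") as f:
    f.seek(4)        # skip 4 bytes, which is 2 GBK characters
    tail = f.read()

print(tail)
```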
(3) seekable() checks whether the file cursor can be moved with seek(); readable() checks whether the file can be read; writable() checks whether the file can be written.
(4) truncate() truncates the file to a specified length
# File content: 技高一筹
>>> f = open("file03.txt", 'r+', encoding='gbk')
>>> f.seek(2)
2
>>> f.truncate()
2
>>> # Only the single character "技" remains in the file
truncate(4) specifies a length: the file is cut to that many bytes, counted from the beginning of the file. Without a length, everything from the current cursor position to the end of the file is removed.
# File content: 技高一筹
>>> f = open("file03.txt", 'r+', encoding='gbk')
>>> f.tell()
0
>>> f.truncate(4)
4
>>> # Only the two characters "技高" remain in the file
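The capability checks from (3) and the truncation from (4) can be combined in one short script. This is an illustrative sketch with a hypothetical file (demo_truncate.txt) in a temporary directory.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo_truncate.txt")
with open(path, "w", encoding="gbk") as f:
    f.write("技高一筹")

with open(path, "r+", encoding="gbk") as f:
    # A file opened with "r+" supports all three operations.
    checks = (f.readable(), f.writable(), f.seekable())
    # Keep the first 4 bytes, which is 2 GBK characters.
    new_size = f.truncate(4)

with open(path, "r", encoding="gbk") as f:
    remaining = f.read()

print(checks, new_size, remaining)
```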
9. Modifying the contents of a file
import os
f = open('file05.txt', 'r+', encoding='gbk')  # open file05.txt
f_new = open('file05_new.txt', 'w', encoding='gbk')  # create a new file
old_str = 'John Doe'
new_str = 'Li Yunlong'
for line in f:  # read line by line
    if old_str in line:
        line = line.replace(old_str, new_str)  # modify the content
    f_new.write(line)  # write each line to the newly created file
f.close()
f_new.close()
# Replace the original file with the new one, which is how the content gets modified.
# On Windows os.replace() works here, but os.rename() does not: it raises an error saying file05.txt already exists.
os.replace('file05_new.txt', 'file05.txt')
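The same approach can be written with with statements so that both files are closed even if an error occurs partway through. This is a sketch with hypothetical sample data in a temporary directory; the second record ("Jane Roe") is made up for illustration.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "file05.txt")
tmp = os.path.join(workdir, "file05_new.txt")

# Hypothetical sample content.
with open(src, "w", encoding="gbk") as f:
    f.write("John Doe\t12001\nJane Roe\t12002\n")

old_str = "John Doe"
new_str = "Li Yunlong"

# Copy line by line into the new file, replacing as we go.
with open(src, "r", encoding="gbk") as f_old, \
        open(tmp, "w", encoding="gbk") as f_new:
    for line in f_old:
        f_new.write(line.replace(old_str, new_str))

# Swap the new file into place; this overwrites an existing target.
os.replace(tmp, src)

with open(src, "r", encoding="gbk") as f:
    result = f.read()

print(result)
```

Writing to a separate file and swapping it in at the end means the original stays intact until the new version is complete.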