Python file processing

Source: Internet
Author: User

File Operation Introduction

Python works with files through the built-in open() function. When open() is called, the application issues an open(...) system call to the operating system, which performs the actual file operation. All of the examples below were run in a Python 3 environment.

Read file

Simple read

Example of reading a file:

f = open(file='/Users/luyi/tmp/abc', mode='r', encoding='utf-8')
data = f.read()    # read all the data in the file
print(data)        # print the file contents
f.close()          # close the file opened at the operating-system level

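Parameters of the open() function:

file: the path of the file.
mode: the mode in which the file is opened. 'r' means read-only; there are many other modes, which are introduced in turn below.
encoding: the codec used to decode the data read from the file. After decoding, the data is held in memory as Unicode. Note: if encoding is not specified, the default is the operating system's default encoding, typically gbk on Simplified Chinese Windows and utf-8 on Linux. A file should be read with the same encoding it was saved with.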
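To illustrate the last point, here is a minimal sketch of what happens when a file is read with a different encoding than it was saved with. The scratch path created with tempfile is an assumption for the sake of a runnable example; it is not one of the article's own files.

```python
import os
import tempfile

# Hypothetical scratch file (not one of the article's paths).
path = os.path.join(tempfile.mkdtemp(), "gbk.txt")

# Save the text encoded as gbk.
with open(path, mode="w", encoding="gbk") as f:
    f.write("你好")

# Reading it back as utf-8 fails, because the stored bytes are not valid utf-8.
try:
    with open(path, mode="r", encoding="utf-8") as f:
        f.read()
    mismatch_error = None
except UnicodeDecodeError as exc:
    mismatch_error = exc

# Reading it back with the encoding it was saved with works.
with open(path, mode="r", encoding="gbk") as f:
    text = f.read()

print(type(mismatch_error).__name__, text)
```

A mismatch does not always raise an error; some wrong codecs decode the bytes into garbled characters instead, which is harder to notice.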

Points to note when reading a file:
1) After reading, close the file that was opened at the operating-system level (f.close())
2) Release the application-level variable (del f)
TIP: Once the variable f is no longer referenced, Python's garbage collection reclaims it automatically, so del f rarely needs to be done by hand. f.close() should still be called explicitly; otherwise the file can keep holding system resources for an unpredictable time, possibly until the program ends.

There is also a way to avoid closing the system-level resource by hand, the with statement:

# general form (arguments omitted)
with open() as f:
    pass

# the earlier example, adjusted accordingly
with open(file='/Users/luyi/tmp/abc', mode='r', encoding='utf-8') as f:
    data = f.read()
    print(data)

Tip: when the code inside the with block finishes, the file opened at the system level is closed automatically.
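The automatic close can be checked through the file object's closed attribute. This is a small sketch, assuming a scratch file created with tempfile rather than the article's /Users/luyi/tmp/abc.

```python
import os
import tempfile

# Hypothetical scratch file so the example runs anywhere.
path = os.path.join(tempfile.mkdtemp(), "abc.txt")
with open(path, mode="w", encoding="utf-8") as f:
    f.write("hello\n")

with open(path, mode="r", encoding="utf-8") as f:
    data = f.read()
    closed_inside = f.closed    # the file is still open inside the block

closed_after = f.closed         # leaving the block has closed the file
print(closed_inside, closed_after)
```

The file is also closed when the block is left because of an exception, which is the main reason to prefer with over a manual f.close().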

with can also open more than one file at a time. The following example copies a file line by line:

with open(file='/Users/luyi/tmp/abc.txt', mode='r', encoding='utf-8') as read_file, \
     open(file='/Users/luyi/tmp/cde.txt', mode='w', encoding='utf-8') as write_file:
    for line in read_file:
        write_file.write(line)

If you do not know which encoding a file was saved with, you can use the chardet module to detect it:

import chardet

with open('/Users/luyi/tmp/abc.txt', mode='rb') as f:
    result = chardet.detect(f.read())
print(result)

Output:

{'encoding': 'utf-8', 'confidence': 0.87625, 'language': ''}

In the example above, the file is opened in 'rb' mode, which is read-only and returns the file contents as bytes. Every file is stored on the storage device as bytes. A text file can be opened either in byte mode or in text mode (the default). Non-text files, such as pictures (jpg, png, ...) and audio or video files (mp3, mp4, avi, ...), can only be opened in byte mode, because their bytes are not encoded text.

When a text file is opened in byte mode, the encoding parameter cannot be passed to open(). To get Unicode text, first read the data into memory and then decode it manually:

# Passing encoding together with a binary mode raises an error immediately
with open(file='/Users/luyi/tmp/abc.txt', mode='rb', encoding='utf-8') as f:
    print(f.read())
# ValueError: binary mode doesn't take an encoding argument

# Correct: read the bytes, then decode them manually
with open(file='/Users/luyi/tmp/abc.txt', mode='rb') as f:
    data = f.read()
    print(data)                    # the raw bytes, printed as-is
    print(data.decode('utf-8'))    # the text after decoding

Output:

b'\xe4\xbd\xa0\xe5\xa5\xbd\xef\xbc\x8ckitty\n'
你好,kitty

Looping files

f = open("abc.txt", 'r', encoding="utf-8")
data = f.read()

Reading a file this way loads all of its data into memory at once. If the file is large, this puts heavy pressure on memory, so it is not a sensible way to read it. In that case, read the data line by line instead.

# readline() reads a single line of text: the cursor moves from the start of the file to the end of the first line (it stops at \r or \n), and the next call reads the second line
with open(file='/Users/luyi/tmp/def.txt', mode='r', encoding='utf-8') as f:
    data = f.readline()    # read one line only
    print(data)

# Traverse the file with a while loop
with open(file='/Users/luyi/tmp/def.txt', mode='r', encoding='utf-8') as f:
    line = f.readline()
    while line:                # readline() returns '' at the end of the file, which ends the loop
        print(line, end='')    # line already ends with '\n', so end='' suppresses print()'s own newline
        line = f.readline()

# Iterating over f in a for loop also reads the file line by line, much like readline().
# Internally this calls next() on the file object, and it is simpler and clearer than the readline() loop.
with open(file='/Users/luyi/tmp/def.txt', mode='r', encoding='utf-8') as f:
    for line in f:
        print(line, end='')

TIP: While a text-mode file is being iterated with next() (which is what a for loop does), tell() cannot be called to get the file cursor position.
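A short sketch of this restriction follows. The scratch file is an assumption for illustration; the behaviour shown is that of text-mode files in Python 3, where tell() raises OSError during iteration but works after readline().

```python
import os
import tempfile

# Hypothetical scratch file; newline="\n" keeps the byte count the same on every platform.
path = os.path.join(tempfile.mkdtemp(), "def.txt")
with open(path, mode="w", encoding="utf-8", newline="\n") as f:
    f.write("line 1\nline 2\n")

# tell() inside a for loop over the file is refused.
with open(path, mode="r", encoding="utf-8") as f:
    for line in f:
        try:
            f.tell()
            tell_error = None
        except OSError as exc:
            tell_error = exc
        break

# With readline() the cursor position is available after each line.
with open(path, mode="r", encoding="utf-8") as f:
    f.readline()
    position = f.tell()

print(tell_error, position)
```

If the position is needed while looping, the while loop with readline() shown earlier is the form to use.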

Similar to readline() is readlines(), which reads the whole file at once and returns it as a list in which each line of the file is one element:

with open(file='/Users/luyi/tmp/def.txt', mode='r', encoding='utf-8') as f:
    line = f.readlines()
    print(type(line))
    print(line)

Output:

<class 'list'>
['1.aaaaaaaaa\n', '2.bbbbbbbbb\n', '3.ccccccccc\n', '4.ddddddddd\n', '5.eeeeeeeee\n', '6.fffffffff\n']

Write a file

Writing a file also uses open(), with mode set to 'w' (write only), 'wb' (binary write), or 'w+' (read and write).

Write directly
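f = open(file='/Users/luyi/tmp/abc.txt', mode='w', encoding='utf-8')
f.write('你好~~,hello')
f.close()    # again, remember to close the system-level resource

Parameters of open():

file: the path of the file. If the file does not exist in that directory, it is created; if it exists, it is overwritten; if the directory does not exist, an error is raised.
mode='w': write-only mode.
encoding: the codec used when writing to the file. The Unicode data in memory is encoded with this codec and then stored (this is the encoding step, the opposite of what happens on read).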

You can also use with open() instead:

with open(file='/Users/luyi/tmp/abc.txt', mode='w', encoding='utf-8') as f:
    f.write('你好!!,hello')

Binary Write

The binary write mode is 'wb'. When mode='wb' is specified, the data passed to write() must be of type bytes, and any newline character (\n) has to be added manually.

with open(file='/Users/luyi/tmp/abc.txt', mode='wb') as f:
    f.write('你好~~,hello'.encode('utf-8'))

TIP: Because the mode is 'wb', write() needs bytes data. Text placed directly in single or double quotes is of type str; calling encode() on a str returns bytes. Alternatively, a bytes literal can be written with the b'' prefix:

with open(file='/Users/luyi/tmp/abc.txt', mode='wb') as f:
    f.write(b'abc')    # a bytes literal may only contain ASCII characters

Several lines can be written in one call with the writelines() function:

with open(file='/Users/luyi/tmp/abc.txt', mode='w', encoding='utf-8') as f:
    f.writelines(['111111\n', '222222\n', '333333\n'])    # the newline characters have to be added yourself

# 'wb' mode
with open(file='/Users/luyi/tmp/abc.txt', mode='wb') as f:
    f.writelines(['你好\n'.encode('utf-8'), 'kitty\n'.encode('utf-8')])

Tip: f.flush() immediately flushes the buffered file contents from memory to disk.
Note: When a file is opened in 'w', 'wb', or 'w+' mode ('w+' is rarely used), its contents are emptied as soon as it is opened.
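The note above can be observed directly: the file is already empty right after open() returns, before anything is written. A minimal sketch, assuming a scratch file created with tempfile:

```python
import os
import tempfile

# Hypothetical scratch file with some existing content.
path = os.path.join(tempfile.mkdtemp(), "abc.txt")
with open(path, mode="w", encoding="utf-8") as f:
    f.write("old content\n")
size_before = os.path.getsize(path)

# Opening in 'w' mode empties the file immediately, even if nothing is written.
f = open(path, mode="w", encoding="utf-8")
size_after_open = os.path.getsize(path)
f.close()

print(size_before, size_after_open)
```

To keep the existing content, open the file in 'a' or 'r+' mode instead.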

Read/write mode

There is also a read/write mode, in which the opened file can be both read and written: mode='r+'.

with open(file='/Users/luyi/tmp/abc.txt', mode='r+', encoding='utf-8') as f:
    print(f.read())
    f.write('你好~~,hello')
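In 'r+' mode, where the written data lands depends on the cursor position. The sketch below (scratch file and contents are assumptions for illustration) contrasts writing straight after opening with writing after a read().

```python
import os
import tempfile

# Hypothetical scratch file.
path = os.path.join(tempfile.mkdtemp(), "abc.txt")
with open(path, mode="w", encoding="utf-8") as f:
    f.write("123456789")

# Writing without reading first: the cursor is at position 0,
# so the new data overwrites the beginning of the file.
with open(path, mode="r+", encoding="utf-8") as f:
    f.write("abc")
with open(path, mode="r", encoding="utf-8") as f:
    overwritten = f.read()

# Reading everything first moves the cursor to the end,
# so the new data is added after the existing content.
with open(path, mode="r+", encoding="utf-8") as f:
    f.read()
    f.write("XYZ")
with open(path, mode="r", encoding="utf-8") as f:
    appended = f.read()

print(overwritten, appended)
```

Unlike 'w+', 'r+' never empties the file, but it does require the file to exist already.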
Append file

When a file is opened in mode 'a' or 'ab', data can only be appended to it, that is, added after the end of the existing content.

with open(file='/Users/luyi/tmp/abc.txt', mode='a', encoding='utf-8') as f:
    f.write('aaaaaaa\n')
    f.write('vvvvvvv\n')

File contents:

你好
kitty
aaaaaaa
vvvvvvv

Description of various modes of file operation

Modes for opening files

Mode    Description
'r'     Read mode
'w'     Write mode
'a'     Append mode
'b'     Binary mode
't'     Text mode (default)
'+'     Read and write the same file
'x'     Write-only mode with exclusive creation: the file is created if it does not exist, and an error is raised if it already exists
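The 'x' row of the table can be sketched as follows. The scratch path is hypothetical; FileExistsError is the exception Python 3 raises when the file is already there.

```python
import os
import tempfile

# Hypothetical path that does not exist yet.
path = os.path.join(tempfile.mkdtemp(), "new.txt")

# First open: the file does not exist, so it is created.
with open(path, mode="x", encoding="utf-8") as f:
    f.write("first\n")

# Second open: the file now exists, so 'x' refuses to touch it.
try:
    with open(path, mode="x", encoding="utf-8") as f:
        f.write("second\n")
    exists_error = None
except FileExistsError as exc:
    exists_error = exc

with open(path, mode="r", encoding="utf-8") as f:
    content = f.read()

print(type(exists_error).__name__, repr(content))
```

This makes 'x' a safer choice than 'w' when existing data must never be overwritten by accident.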

Besides the modes above, they can also be combined.
The commonly used combinations are:
'rb', 'wb', 'ab'    # non-text files can only be opened using 'b' mode
'r+', 'w+', 'a+'    # all three allow both reading and writing. Note that 'w+' empties the file when it is opened, so 'w+' is rarely used. With 'a+' the cursor jumps straight to the end of the file on open, so the existing content cannot be read unless the cursor is first moved back, for example with f.seek(0).
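The behaviour of 'a+' can be sketched like this (the scratch file is an assumption for illustration): reading straight after opening returns nothing, reading after seek(0) returns the existing content, and writes still go to the end of the file.

```python
import os
import tempfile

# Hypothetical scratch file with existing content.
path = os.path.join(tempfile.mkdtemp(), "abc.txt")
with open(path, mode="w", encoding="utf-8") as f:
    f.write("old\n")

with open(path, mode="a+", encoding="utf-8") as f:
    at_end = f.read()        # the cursor starts at the end, so nothing is read
    f.seek(0)                # move the cursor back to the start
    from_start = f.read()    # now the existing content is readable
    f.write("new\n")         # append mode always writes at the end

with open(path, mode="r", encoding="utf-8") as f:
    final = f.read()

print(repr(at_end), repr(from_start), repr(final))
```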

Additional Actions for files

Other commonly used file methods:

writable()    Return whether the file can be written to
readable()    Return whether the file can be read
fileno()      Return the file descriptor, the index of the open file in the kernel; useful for I/O multiplexing
seek()        Move the cursor to the specified position; note that the unit is bytes
seekable()    Return whether the file supports seek()
tell()        Return the current position of the file cursor
truncate()    Truncate the file to the specified length in bytes (by default, the current cursor position); the file must be opened in a writable mode that does not already empty it, such as 'r+'
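Since truncate() is the method most easily misjudged, here is a minimal sketch of it working in Python 3. The scratch file and its contents are assumptions. One possible reason truncate() can seem to do nothing is testing it on a file opened with 'w', which is already empty by the time truncate() is called.

```python
import os
import tempfile

# Hypothetical scratch file; newline="\n" keeps the byte count the same on every platform.
path = os.path.join(tempfile.mkdtemp(), "abc.txt")
with open(path, mode="w", encoding="utf-8", newline="\n") as f:
    f.write("123456789\nabcdefghi\n")

# Open without emptying the file, then cut it down to its first 5 bytes.
with open(path, mode="r+", encoding="utf-8") as f:
    new_size = f.truncate(5)    # truncate() returns the new file size

size_on_disk = os.path.getsize(path)
with open(path, mode="r", encoding="utf-8") as f:
    remaining = f.read()

print(new_size, size_on_disk, remaining)
```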

The cursor moves continuously while a file is being operated on. For example, after readline() reads a line, the cursor is at the end of that line, and after write() the cursor is at the end of the data just written (so during writing the cursor appears to stay at the end).

The unit in which the cursor moves depends on the mode the file was opened in. If the file is opened in text mode, read(5) moves the cursor forward by 5 characters; if it is opened in binary mode ('b'), read(5) moves it forward by 5 bytes. seek() positions are measured in bytes whatever the mode is (in text mode, only 0 or a value returned by tell() is guaranteed to be a valid position).
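The difference between characters and bytes shows up as soon as the text contains non-ASCII characters. In this sketch (scratch file and contents are assumptions), each Chinese character takes 3 bytes in utf-8.

```python
import os
import tempfile

# Hypothetical scratch file: two Chinese characters (3 bytes each) followed by ASCII text.
path = os.path.join(tempfile.mkdtemp(), "abc.txt")
with open(path, mode="w", encoding="utf-8") as f:
    f.write("你好kitty")

# Text mode: read(3) returns 3 characters.
with open(path, mode="r", encoding="utf-8") as f:
    text_chars = f.read(3)

# Binary mode: read(3) returns 3 bytes, which is only the first character.
with open(path, mode="rb") as f:
    raw_bytes = f.read(3)

# seek(3) counts bytes, so it skips exactly one 3-byte character.
with open(path, mode="r", encoding="utf-8") as f:
    f.seek(3)
    after_seek = f.read(1)

print(text_chars, raw_bytes, after_seek)
```

Seeking to a byte offset that falls in the middle of a multi-byte character would make the following read fail to decode, which is why text-mode positions are best taken from tell().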

For example, suppose the file contains the following text:

123456789
abcdefghi

Code:

with open(file='/Users/luyi/tmp/abc.txt', mode='r+', encoding='utf-8') as f:
    f.seek(5)
    print(f.readline())
    print(f.tell())    # print the current cursor position

Output:

6789

10
