Comparing four methods for reading large files in Python
Preface
Python supports several ways to read a file, but they behave very differently when the file is large. This article compares four of them.
Scenario
Read a 2.9 GB file line by line.
Method
For each line read, split the string once on "|".
Methods 1 to 3 open the file with a with ... as statement. In every example, file is a variable holding the path of the file to read.
The with statement is intended for accessing resources. It guarantees that the necessary cleanup is performed to release a resource whether or not an exception occurs during use: for example, a file is closed automatically after use, and a thread lock is acquired and released automatically.
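Below is a minimal sketch of that cleanup guarantee, assuming Python 3. The helper names (write_sample, closed_after_error, first_line_manual) are hypothetical. It shows that a file opened in a with block is closed even when the body raises, alongside an approximate try/finally equivalent.

```python
import os
import tempfile


def write_sample(text):
    """Hypothetical helper: write text to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as out:
        out.write(text)
    return path


def closed_after_error(path):
    """Raise inside a with block and report whether the file was closed anyway."""
    handle = None
    try:
        with open(path, "r") as fh:
            handle = fh
            raise ValueError("simulated failure while processing a line")
    except ValueError:
        pass  # the exception still propagates out of the with block
    return handle.closed


def first_line_manual(path):
    """Roughly what the with statement does, written out with try/finally."""
    fh = open(path, "r")
    try:
        return fh.readline()
    finally:
        fh.close()  # runs whether readline() succeeds or raises


path = write_sample("a|b|c\nd|e|f\n")
print(closed_after_error(path))       # True
print(repr(first_line_manual(path)))  # 'a|b|c\n'
os.remove(path)
```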
Method 1: The most common way to read files
with open(file, 'r') as fh:
    # readlines() loads every line into one list before the loop starts
    for line in fh.readlines():
        line.split("|")
Result: 15.4346568584 seconds.
The system monitor showed memory usage jumping from 4.8 GB to 8.4 GB, because fh.readlines() reads the entire file into memory as a list of lines before the loop starts. This method is only suitable for small files.
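The memory difference can be reproduced on a small scale with the standard tracemalloc module. This is a sketch under stated assumptions: Python 3, a synthetic pipe-delimited file in place of the original 2.9 GB one, and hypothetical helper names (write_rows, read_with_readlines, read_with_iteration, peak_bytes). tracemalloc only counts memory allocated through Python's own allocators, so its figures will not match a system monitor, but the readlines() version should show a much higher peak.

```python
import os
import tempfile
import tracemalloc


def write_rows(n_lines):
    """Hypothetical helper: write n_lines pipe-delimited rows to a temp file."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as out:
        for i in range(n_lines):
            out.write(f"{i}|field_a|field_b|field_c\n")
    return path


def read_with_readlines(path):
    # Method 1: readlines() builds a list of every line before the loop starts.
    with open(path, "r") as fh:
        for line in fh.readlines():
            line.split("|")


def read_with_iteration(path):
    # Method 3: the file object hands out one line at a time.
    with open(path, "r") as fh:
        for line in fh:
            line.split("|")


def peak_bytes(func, path):
    """Run func(path) under tracemalloc and return the peak traced allocation in bytes."""
    tracemalloc.start()
    try:
        func(path)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()  # also clears the collected traces
    return peak


path = write_rows(200_000)
print("readlines peak bytes:", peak_bytes(read_with_readlines, path))
print("iteration peak bytes:", peak_bytes(read_with_iteration, path))
os.remove(path)
```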
Method 2
with open(file, 'r') as fh:
    line = fh.readline()
    while line:
        line.split("|")
        line = fh.readline()  # advance to the next line, or the loop never ends
Result: 22.3531990051 seconds.
Memory usage stays almost unchanged because only one line is held in memory at a time, but the run takes noticeably longer than method 1, so this approach is inefficient when further data processing is needed.
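Why does the while line: loop stop at the end of the file rather than at an empty line? readline() returns "" only at end of file, while a blank line comes back as "\n", which is truthy. The sketch below (Python 3, hypothetical helper names) counts pipe-separated fields with the method 2 loop and with the built-in two-argument form iter(fh.readline, ""), an alternative spelling of the same loop that is not part of the original comparison. Stripping the trailing newline before splitting is an addition for the count.

```python
import os
import tempfile


def write_sample(text):
    """Hypothetical helper: write text to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as out:
        out.write(text)
    return path


def count_fields_readline(path):
    """Method 2 pattern: call readline() until it returns '' at end of file."""
    total = 0
    with open(path, "r") as fh:
        line = fh.readline()
        while line:  # a blank line is "\n" (truthy); only EOF returns ""
            total += len(line.rstrip("\n").split("|"))
            line = fh.readline()  # advance, or the loop never ends
    return total


def count_fields_sentinel(path):
    """Same loop using iter(callable, sentinel), which stops when readline() returns ''."""
    total = 0
    with open(path, "r") as fh:
        for line in iter(fh.readline, ""):
            total += len(line.rstrip("\n").split("|"))
    return total


path = write_sample("a|b|c\n\nd|e\n")  # the middle line is blank
print(count_fields_readline(path))  # 6
print(count_fields_sentinel(path))  # 6
os.remove(path)
```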
Method 3
with open(file) as fh:
    # the file object yields one line at a time
    for line in fh:
        line.split("|")
Result: 13.9956979752 seconds.
Memory usage stays almost unchanged, and this is the fastest of the four methods: much faster than method 2 and slightly faster than method 1.
The loop for line in fh treats the file object fh as an iterable. Iteration reads through an internal buffer and yields one line at a time, so memory use stays small regardless of file size. This is the idiomatic, Pythonic way to read a file line by line.
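Because a file object is its own iterator, the per-line work can also be wrapped in a generator so that nothing is read until it is needed. The sketch below assumes Python 3. split_rows and write_sample are hypothetical names, and stripping the trailing newline before splitting is an addition so the last field does not end in "\n".

```python
import os
import tempfile


def write_sample(text):
    """Hypothetical helper: write text to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as out:
        out.write(text)
    return path


def split_rows(path, sep="|"):
    """Hypothetical helper: lazily yield each line of path split on sep.

    Only the current line (plus the file's internal read buffer) is held
    in memory, because the file object hands out lines one at a time.
    """
    with open(path, "r") as fh:
        for line in fh:
            yield line.rstrip("\n").split(sep)


path = write_sample("1|alice|30\n2|bob|25\n")

with open(path, "r") as fh:
    print(iter(fh) is fh)  # True: a file object is its own iterator
    print(next(fh))        # reads just the first line

rows = split_rows(path)    # nothing has been read yet
print(next(rows))          # ['1', 'alice', '30']
print(list(rows))          # [['2', 'bob', '25']]
os.remove(path)
```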
Method 4: fileinput Module
import fileinput

for line in fileinput.input(file):
    line.split("|")
Result: 26.1103110313 seconds.
Memory usage grows by 200-300 MB, and this is the slowest of the four methods.
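fileinput is mainly meant for reading one or more inputs (files, or standard input) as a single stream while tracking line numbers and the current file name. Its FileInput class is implemented in Python and does this bookkeeping on every line, which plausibly accounts for part of the extra cost, though that was not measured here. The sketch below (Python 3, hypothetical helper names write_sample and rows_with_position) reads two files as one stream and records the cumulative and per-file line numbers.

```python
import fileinput
import os
import tempfile


def write_sample(text):
    """Hypothetical helper: write text to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as out:
        out.write(text)
    return path


def rows_with_position(paths):
    """Read files back to back through fileinput, noting where each line came from."""
    result = []
    # Used as a context manager, the FileInput object is closed on exit.
    with fileinput.input(files=paths) as stream:
        for line in stream:
            result.append((
                stream.lineno(),      # cumulative line number across all files
                stream.filelineno(),  # line number within the current file
                line.rstrip("\n").split("|"),
            ))
    return result


first = write_sample("a|b\nc|d\n")
second = write_sample("e|f\n")
for row in rows_with_position([first, second]):
    print(row)
# (1, 1, ['a', 'b'])
# (2, 2, ['c', 'd'])
# (3, 1, ['e', 'f'])
os.remove(first)
os.remove(second)
```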
Summary
The results above are for reference only. Method 3, iterating directly over the file object, is generally recognized as the best way to read large files. However, actual performance depends on the machine and on how complex the per-line processing is.
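Since results depend on the machine, a rough way to check the ranking yourself is the benchmark sketch below. It is not the original author's script: it assumes Python 3, generates a small synthetic file (write_rows and time_methods are hypothetical names), times each method with time.perf_counter, and keeps the best of a few runs. It measures time only, not memory.

```python
import fileinput
import os
import tempfile
import time


def method1(path):
    with open(path, "r") as fh:
        for line in fh.readlines():  # whole file loaded into a list
            line.split("|")


def method2(path):
    with open(path, "r") as fh:
        line = fh.readline()
        while line:
            line.split("|")
            line = fh.readline()


def method3(path):
    with open(path, "r") as fh:
        for line in fh:  # buffered, one line at a time
            line.split("|")


def method4(path):
    with fileinput.input(files=path) as stream:
        for line in stream:
            line.split("|")


def write_rows(n_lines):
    """Hypothetical helper: write n_lines pipe-delimited rows to a temp file."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as out:
        for i in range(n_lines):
            out.write(f"{i}|some|pipe|delimited|fields\n")
    return path


def time_methods(path, repeats=3):
    """Return the best wall-clock time in seconds for each method on path."""
    results = {}
    for func in (method1, method2, method3, method4):
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            func(path)
            best = min(best, time.perf_counter() - start)
        results[func.__name__] = best
    return results


path = write_rows(200_000)
for name, seconds in time_methods(path).items():
    print(f"{name}: {seconds:.3f} s")
os.remove(path)
```

Because the synthetic file is small and is usually held in the operating system's cache after the first run, the absolute numbers will be far below those reported above, so any ordering it shows is indicative only.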