Python's API for reading and writing files is simple, but it is also easy to trip up if you are not careful. Below I record one such pitfall I ran into and summarize a few lessons, in the hope that it helps you avoid some potential traps when using Python.
1. read() and readlines():
Tutorials on reading and writing files in Python often introduce the pair of functions read() and readlines(), so code like the following is very common:
```python
with open(file_path, 'rb') as f:
    sha1Obj.update(f.read())
```
Or
```python
with open(file_path, 'rb') as f:
    for line in f.readlines():
        print(line)
```
This pair of methods raises no exception when reading small files, but once the file is large it is easy to hit a MemoryError, i.e. the process runs out of memory.
#### Why MemoryError?
Let's take a look at these two methods first:
With the default parameter size=-1, read() reads until EOF, so when the file is larger than the available memory a memory overflow error naturally occurs.
Similarly, readlines() builds a list rather than an iterator, so the entire content is held in memory at once and the same memory overflow occurs.
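A minimal sketch that makes this visible (the file name example.txt is just a placeholder for any file you have on hand):

```python
# Both calls materialize the whole file in memory at once.
with open('example.txt', 'rb') as f:
    data = f.read()        # size=-1 by default: one bytes object holding the entire content
    print(type(data), len(data))

with open('example.txt', 'rb') as f:
    lines = f.readlines()  # a list of all lines, also fully loaded into memory
    print(type(lines), len(lines))
```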
2. Correct usage:
In a production system, writing the code above is dangerous, and this particular pitfall is easy to overlook. The correct usage, however, is very simple: just follow the API description of each function and code accordingly.
For binary files, the pattern below is recommended. You can choose how many bytes to read per chunk; a larger buffer means fewer read calls and usually faster reads, at the cost of more memory per chunk.
```python
with open(file_path, 'rb') as f:
    while True:
        buf = f.read(1024)
        if buf:
            sha1Obj.update(buf)
        else:
            break
```
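If you prefer, an equivalent and slightly more compact idiom (not from the original snippet, just a common alternative) uses the two-argument form of iter() with an empty bytes sentinel:

```python
with open(file_path, 'rb') as f:
    # iter(callable, sentinel) keeps calling f.read(1024) until it returns b''
    for buf in iter(lambda: f.read(1024), b''):
        sha1Obj.update(buf)
```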
For text files, you can either use the readline() method or iterate over the file object directly (Python provides this as syntactic sugar; the underlying logic of the two is the same, but iterating the file is clearly more pythonic). Both read one line at a time, so they are comparatively slow: in a quick test on a roughly 3 GB file, they were about 20% slower than the buffered read above.
```python
with open(file_path, 'rb') as f:
    while True:
        line = f.readline()
        if line:
            print(line)
        else:
            break
```

or, iterating the file directly:

```python
with open(file_path, 'rb') as f:
    for line in f:
        print(line)
```
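If you want to reproduce the comparison yourself, a rough timing sketch along these lines works (passing the file path via sys.argv[1] is my assumption; absolute numbers will vary with hardware and file contents):

```python
import sys
import time

file_path = sys.argv[1]  # path to a large text file, passed on the command line

def time_it(label, fn):
    start = time.perf_counter()
    fn()
    print(label, round(time.perf_counter() - start, 2), 'seconds')

def with_readline():
    with open(file_path, 'rb') as f:
        while True:
            line = f.readline()
            if not line:
                break

def with_iteration():
    with open(file_path, 'rb') as f:
        for line in f:
            pass

time_it('readline loop:', with_readline)
time_it('direct iteration:', with_iteration)
```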
3. Memory detection tools:
To understand the memory footprint of Python code, you need to monitor it while it runs. Here are two small tools I recommend for measuring the memory usage of Python code.
### memory_profiler
First, install memory_profiler with pip:
pip install memory_profiler
memory_profiler works through a Python decorator, so we need to add the @profile decorator to the function we want to test:
```python
from hashlib import sha1
import sys

@profile
def my_func():
    sha1Obj = sha1()
    with open(sys.argv[1], 'rb') as f:
        while True:
            buf = f.read(10 * 1024 * 1024)
            if buf:
                sha1Obj.update(buf)
            else:
                break
    print(sha1Obj.hexdigest())

if __name__ == '__main__':
    my_func()
```
Then add `-m memory_profiler` when you run the code.
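For example, assuming the script above is saved as example.py and takes the path of a large file as its first argument:

python -m memory_profiler example.py /path/to/large_file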
This prints the memory usage of the decorated function line by line, so you can see the footprint of each step of the code.
### guppy
In the same vein, first install guppy with pip:
pip install guppy
You can then use guppy in your code to print, for each Python type (list, tuple, dict, and so on), how many objects have been created and how much memory they are consuming:
```python
from guppy import hpy
import sys

def my_func():
    mem = hpy()
    with open(sys.argv[1], 'rb') as f:
        while True:
            buf = f.read(10 * 1024 * 1024)
            if buf:
                print(mem.heap())
            else:
                break
```
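To try it out, save the snippet as a script, add a call to my_func() at the bottom, and pass the path of a large file as the first argument (the script name guppy_demo.py is just an assumed example):

python guppy_demo.py /path/to/large_file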
Running this prints the corresponding memory consumption data for each type.
Both guppy and memory_profiler can be used to monitor the memory footprint of Python code as it runs.
4. Summary:
Python is a concise language, but precisely because of that conciseness, many details deserve careful scrutiny and thought. I hope that in your daily work and study you also take the time to reflect on these details and avoid stepping into unnecessary pitfalls.