Writing a Python iterator by hand is powerful, but it is often inconvenient. A generator is Python's mechanism for computing values lazily, one at a time, as the loop asks for them.
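To make the contrast concrete, here is a minimal sketch of the same countdown written twice: once as a hand-written iterator class and once as a generator function. The names CountdownIterator and countdown are made up for illustration.

```python
class CountdownIterator:
    """Hand-written iterator: needs __iter__, __next__ and explicit state."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


def countdown(start):
    """Generator: same behavior; Python keeps the state between yields."""
    while start > 0:
        yield start
        start -= 1


iterator_result = list(CountdownIterator(3))
generator_result = list(countdown(3))
print(iterator_result)   # [3, 2, 1]
print(generator_result)  # [3, 2, 1]
```

Both produce the same values, but the generator version needs no class and no manual StopIteration.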
First, the scenario.
We have a file of about 500 GB that consists of a single physical line. The logical records inside it are separated by a custom separator instead of newlines. We need to read the file record by record
and then write each record to the database.
Some readers might suggest using open to get the lines and then iterating with a for loop.
Like this:
with open("file") as f:
    for i in f.readlines():
        print(i)
Because the file has only one line, this reads all of the data into memory at once. Nobody can afford 500 GB of memory,
so this approach will not work.
Note, however, that the records are divided by a known separator. That is our way in.
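First, a word about the file.read() method:
1. read() does not have to read everything at once. You can pass an integer argument, which for a file opened in text mode is the maximum number of characters to read.
2. Successive calls continue where the previous call stopped, because the file object keeps track of its current position.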
With this, our problem can be solved.
For example:
file_path = "C:/Users/PycharmProjects/test1/test.txt"
with open(file_path, "r") as f:
    a = f.read(20)  # first 20 characters
    b = f.read(20)  # next 20 characters
    print(a, b)
Print results:
Ten ,wang
i lov e you, tu ran hao
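The same behavior can be checked without a file on disk. The sketch below uses io.StringIO as an in-memory stand-in for a text file (the sample string is made up), and it also shows what read() returns once the data runs out.

```python
import io

# Hypothetical in-memory stand-in for a text file.
f = io.StringIO("abcdefghij")

first = f.read(4)    # reads up to 4 characters
second = f.read(4)   # continues where the previous call stopped
third = f.read(4)    # only 2 characters are left
at_end = f.read(4)   # at end of file, read() returns an empty string

print(repr(first), repr(second), repr(third), repr(at_end))
# 'abcd' 'efgh' 'ij' ''
```

The empty string at end of file is what a chunked reader can use as its stop signal.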
With this function we can read a large file piece by piece. Here is an example I wrote:
file_path = "C:/Users//PycharmProjects/test1/test.txt"

def Myread(f, newline):
    buf = ""  # holds data that has been read but not yet yielded
    while True:
        while newline in buf:  # the buffer contains at least one complete record
            pos = buf.index(newline)  # index of the first separator
            yield buf[:pos]  # yield the record before the separator
            buf = buf[pos + len(newline):]  # drop the yielded record and its separator
        chunk = f.read(200)  # read up to 200 characters at a time
        if not chunk:  # an empty string means end of file
            yield buf  # yield whatever follows the last separator
            break
        buf = buf + chunk  # append the new chunk to the leftover data

with open(file_path, "r") as f:
    for i in Myread(f, newline="{|}"):
        print(i)
Here is how it works. This is a classic pattern for the problem.
The inner while loop operates on the buffer, which holds whatever followed the last separator in the previous read plus the newly read chunk.
It keeps yielding records until the buffer contains no more separators.
The purpose of the if statement is to handle the end of the file:
when read() returns nothing, there is no more content, so the function yields whatever follows the last separator and ends the loop.
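To see why the leftover text has to be carried from one read to the next, here is a small self-contained sketch of the same technique with a tiny chunk size, so that a separator is split across two reads. The function name split_records, the chunk_size parameter, and the sample data are assumptions for illustration, and io.StringIO stands in for the real file.

```python
import io


def split_records(f, sep, chunk_size):
    """Yield sep-delimited records from f, reading chunk_size characters at a time."""
    buf = ""
    while True:
        # Emit every complete record currently in the buffer.
        while sep in buf:
            pos = buf.index(sep)
            yield buf[:pos]
            buf = buf[pos + len(sep):]
        chunk = f.read(chunk_size)
        if not chunk:
            # End of file: whatever is left is the final record.
            yield buf
            break
        buf += chunk


# With chunk_size=3 the reads are "alp", "ha{", "|}b", ... so the first
# separator arrives in two pieces and is only recognized after both are buffered.
data = io.StringIO("alpha{|}beta{|}gamma")
records = list(split_records(data, "{|}", chunk_size=3))
print(records)  # ['alpha', 'beta', 'gamma']
```

Note that if the data ends with a separator, this approach yields one final empty string, which a caller may want to skip.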
The final for loop iterates over the generator, and each record it produces can be inserted directly into the database.
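As a rough sketch of that last step, the records can be inserted in batches rather than one at a time. The example uses the standard-library sqlite3 module with an in-memory database. The table name records, the column payload, the helper insert_in_batches, and the batch size are all hypothetical, and a small generator expression stands in for the file-reading generator.

```python
import sqlite3
from itertools import islice


def insert_in_batches(conn, records, batch_size=1000):
    """Insert an iterable of strings into the (hypothetical) records table."""
    iterator = iter(records)
    total = 0
    while True:
        # Pull at most batch_size records so memory use stays bounded.
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        conn.executemany(
            "INSERT INTO records (payload) VALUES (?)",
            ((item,) for item in batch),
        )
        conn.commit()
        total += len(batch)
    return total


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (payload TEXT)")

# Any iterable works here, including a generator that reads a file in chunks.
sample_records = (f"record-{n}" for n in range(5))
inserted = insert_in_batches(conn, sample_records, batch_size=2)

row_count = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
print(inserted, row_count)  # 5 5
```

Because only one batch is held in memory at a time, the memory cost does not grow with the size of the input.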
The takeaway: a file that is too large to fit in memory should not be loaded all at once. Read it in chunks to keep memory usage low.