Python: Iterating over large (GB-level) files with read(), readline(), readlines(), and the with open() syntax sugar


When I recently processed a text document (about 2 GB in size), I ran into MemoryError exceptions and very slow file reads. I found two ways to read large files faster, and this article describes both of them.

Preliminaries

When we talk about "text processing", we usually mean the content we are dealing with. Python makes it easy to read the contents of a text file into a string variable that can then be manipulated. A file object provides three read methods: .read(), .readline(), and .readlines(). Each can take an optional argument that limits how much data is read at a time, but they are usually called without one. .read() reads the entire file at once and is typically used to put the file's contents into a single string variable. It produces the most direct string representation of the file's content, but it is unnecessary for sequential, line-oriented processing and impossible if the file is larger than the available memory. The following is an example of the read() method:

f = None  # so the finally block works even if open() fails
try:
    f = open('/path/to/file', 'r')
    print(f.read())
finally:
    if f:
        f.close()
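The paragraph above also mentions .readline(); a minimal sketch of line-by-line reading with it might look like this (the process() helper is a hypothetical placeholder, as in the other examples):

# Read one line per call until EOF, so only a single line is held
# in memory at a time.
def process(line):
    # hypothetical placeholder for real per-line work
    print(line.rstrip('\n'))

with open('/path/to/file', 'r') as f:
    line = f.readline()
    while line:            # readline() returns '' at end of file
        process(line)
        line = f.readline()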

If the file is small, read() reads it all in one go and is the most convenient; if you cannot determine the file size, calling read(size) repeatedly is safer; if it is a configuration file, calling readlines() is the most convenient:

for line in f.readlines():
    process(line)  # <do something with line>
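For the read(size) case mentioned above, a minimal sketch could read fixed-size pieces until the end of the file (the 4 KB size and the handle_text() helper are illustrative assumptions):

# Read a file in fixed-size pieces with read(size); read() returns ''
# once the end of the file is reached.
def handle_text(text):
    # hypothetical placeholder for real processing
    pass

with open('/path/to/file', 'r') as f:
    while True:
        piece = f.read(4096)   # 4 KB per call; adjust as needed
        if not piece:
            break
        handle_text(piece)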

Read in Chunks

The easiest idea for processing a large file is to split it into a number of smaller pieces and process them one at a time, releasing the memory used by each piece when it is done. iter and yield are used here:

def read_in_chunks(file_path, chunk_size=1024 * 1024):
    """
    Lazy function (generator) to read a file piece by piece.
    Default chunk size: 1 MB; you can set your own chunk size.
    """
    with open(file_path) as file_object:
        while True:
            chunk_data = file_object.read(chunk_size)
            if not chunk_data:
                break
            yield chunk_data


if __name__ == "__main__":
    file_path = './path/filename'
    for chunk in read_in_chunks(file_path):
        process(chunk)  # <do something with chunk>
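The text mentions iter as well as yield; an equivalent chunked reader can be written with the two-argument form of iter() and functools.partial, which keeps calling read() until it returns the empty-string sentinel. This is a sketch under the same assumptions as the generator above (a hypothetical process() consumer and 1 MB chunks):

from functools import partial

def process(chunk):
    # hypothetical placeholder for real per-chunk work
    pass

chunk_size = 1024 * 1024  # 1 MB, same default as read_in_chunks()

with open('./path/filename') as file_object:
    # iter(callable, sentinel) calls read(chunk_size) repeatedly and
    # stops as soon as it returns '' (end of file).
    for chunk in iter(partial(file_object.read, chunk_size), ''):
        process(chunk)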

Using with open()

The with statement handles opening and closing the file, including when an exception is raised inside the block. In for line in f, the file object f is treated as an iterator that automatically uses buffered I/O and memory management, so you don't have to worry about large files.

#If the "based" with
open (...) as F: For line in
F:
process (line) # <do something with LINE&G T
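As a concrete (assumed) usage of this pattern, the sketch below counts lines and characters of a large file without ever holding more than one line in memory; the file name is illustrative:

# Count lines and characters of a large text file line by line;
# the file object buffers I/O, so memory use stays roughly constant.
line_count = 0
char_count = 0

with open('./path/filename', 'r') as f:
    for line in f:
        line_count += 1
        char_count += len(line)

print(line_count, char_count)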

Conclusion

When reading large files with Python, let the system handle things in the simplest way: hand the iteration over to the interpreter and just take care of your own processing logic.
