Python: Reading a Large TXT File

Source: Internet
Author: User
Calling read() directly on a large file object loads the whole file at once, so memory consumption grows with the file size and can become unpredictable. A better approach is to read the contents repeatedly into a fixed-size buffer, for example with a generator that uses yield.
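A minimal sketch of the fixed-size buffer approach, assuming the file is opened in text mode. The helper name read_in_chunks and the 64 KiB default chunk size are illustrative choices, not from the original. Each call to read(chunk_size) returns at most chunk_size characters, and an empty string signals end of file.

```python
import io


def read_in_chunks(file_obj, chunk_size=64 * 1024):
    """Yield successive pieces of at most chunk_size characters.

    Hypothetical helper: only one chunk is held in memory at a time,
    so memory use stays bounded regardless of the file size.
    """
    while True:
        chunk = file_obj.read(chunk_size)
        if not chunk:  # read() returns '' at end of file
            return
        yield chunk


# Demonstration with an in-memory file; with a real file, pass the
# object returned by open(path) instead.
sample = io.StringIO("abcdefghij")
print(list(read_in_chunks(sample, chunk_size=4)))  # ['abcd', 'efgh', 'ij']
```

Fixed-size chunks do not respect line boundaries, so a chunk may end in the middle of a line; for line-oriented processing, the line-by-line generator in this article is more convenient.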

When I used Python to read a TXT file of more than 2 GB, I naively called readlines() directly, and as soon as the program ran, memory was exhausted.

Fortunately, a colleague suggested using yield instead, and in testing it handled the file without any trouble. The reason is that readlines() reads the entire text into memory at once as a list, whereas a function that uses yield is a generator and produces one line at a time.

The code is as follows:

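def open_txt(file_name):
    # 'r' opens the file read-only; write access ('r+') is not needed here
    with open(file_name, 'r') as f:
        while True:
            line = f.readline()   # read a single line, so memory use stays small
            if not line:          # readline() returns '' at end of file
                return
            yield line.strip()    # hand back one stripped line at a time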

Usage example:

for text in open_txt('aa.txt'):
    print(text)
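Besides calling readline() in a loop as open_txt does, a file object can be iterated directly with for, which also reads one line per iteration instead of building a list the way readlines() does. A minimal sketch that counts lines this way; count_lines and the temporary file are illustrative names, not from the original:

```python
import os
import tempfile


def count_lines(path):
    """Count lines in a text file without loading it all into memory.

    Iterating over the file object yields one line at a time, so only
    the current line is held in memory (unlike f.readlines(), which
    builds a list of every line).
    """
    count = 0
    with open(path) as f:
        for _ in f:
            count += 1
    return count


# Demonstration on a small temporary file.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("first\nsecond\nthird\n")
    tmp_path = tmp.name

print(count_lines(tmp_path))  # 3
os.remove(tmp_path)
```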

Example two:

The target TXT file is about 6 GB. I wanted to extract the first 1000 records and save them to a new TXT file for further work. I am not sure this step is necessary, but it is sensible to test on a small amount of data first. Following a post titled "I want to save a list to a TXT document, how do I save it?", I wrote a simple program.
====================================================

import datetime


def main():
    start = datetime.datetime.now()
    print("start--%s" % start)

    # Copy the first 1000 lines of the large file into a small sample file.
    # readline() reads one line at a time, so memory use stays low.
    with open('train.txt') as file_handle, open('s_train.txt', 'w') as file2:
        i = 0
        while i < 1000:
            a = file_handle.readline()
            if not a:  # stop early if the file has fewer than 1000 lines
                break
            file2.write(a)
            i += 1

    print("done--%s" % (datetime.datetime.now() - start))


if __name__ == '__main__':
    main()

====================================================
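The same copy can also be written with itertools.islice, which takes a fixed number of lines from the lazily read file and then stops, so the rest of a multi-gigabyte file is not scanned. A minimal sketch, reusing the train.txt/s_train.txt names from the program above; the helper name copy_head is hypothetical, and the demonstration writes a small temporary file instead of the real data:

```python
import itertools
import os
import tempfile


def copy_head(src_path, dst_path, n_lines=1000):
    """Copy the first n_lines lines of src_path into dst_path.

    The source file is iterated lazily and islice stops after n_lines
    items, so reading ends shortly after the lines that are needed.
    """
    with open(src_path) as src, open(dst_path, 'w') as dst:
        dst.writelines(itertools.islice(src, n_lines))


# Demonstration on a small temporary file instead of the real train.txt.
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, 'train.txt')
dst = os.path.join(tmp_dir, 's_train.txt')
with open(src, 'w') as f:
    f.writelines("line %d\n" % i for i in range(5000))

copy_head(src, dst, n_lines=1000)
with open(dst) as f:
    print(sum(1 for _ in f))  # 1000
```

Compared with the counter-based loop, islice removes the manual index bookkeeping and handles a source file shorter than n_lines without a special case.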
For saving a list to a file, many people recommend the pickle library; I have looked at the official documentation and plan to study it properly later.
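Since the referenced post was about saving a list, here is a minimal sketch of doing that with pickle. Note that pickle writes a binary format rather than human-readable text, so the file is opened in 'wb' and 'rb' mode; the file name records.pkl is illustrative. Only load pickle files you trust, because unpickling data can execute arbitrary code.

```python
import os
import pickle
import tempfile

records = ["line one", "line two", "line three"]

# Hypothetical output path inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'records.pkl')

# pickle.dump serializes the list into the binary file.
with open(path, 'wb') as f:
    pickle.dump(records, f)

# pickle.load reads it back as an equal Python list.
with open(path, 'rb') as f:
    loaded = pickle.load(f)

print(loaded == records)  # True
```

If the result must be a plain TXT file that other tools can read, writing one item per line (for example, f.write(item + '\n') in a loop) is the simpler choice.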
