Compare the four methods for reading files using Python,

Source: Internet
Author: User

Compare the four methods for reading files using Python,

Preface

Everyone knows that Python supports a variety of file reading methods, but different reading methods have different effects when you need to read a large file. Next let's take a look at the detailed introduction.

Scenario

Read a large 2.9 GB file row by row

  • CPU i7 6820HQ
  • RAM 32G

Method

Split the string once for reading each row

The following methods use... As method to open the file.

The with statement is applicable to resource access. Make sure that the necessary "clean" operation is performed to release resources no matter whether exceptions occur during use, for example, the file is automatically closed after use, and the lock in the thread is automatically obtained and released.

Method 1: The most common way to read files

with open(file, 'r') as fh: for line in fh.readlines(): line.split("|")

Running result: it takes 15.4346568584 seconds

The system monitor shows that the memory suddenly exceeded from 4.8 GB to 8.4 GB, and fh. readlines () stores all the data read in the memory. This method is suitable for small files.

Method 2

with open(file, 'r') as fh: line = fh.readline() while line: line.split("|")

Running result: it takes 22.3531990051 seconds

The memory is almost unchanged because only one row of data is accessed in the memory, but the time is significantly longer than the previous time, which is inefficient for further data processing.

Method 3

with open(file) as fh: for line in fh: line.split("|")

Running result: it takes 13.9956979752 seconds

The memory is almost unchanged, and the speed is faster than method 2.

For line in fh treats file object fh as iteratable, and it automatically uses buffered IO and memory management, so you don't have to worry about large files. This is very pythonic!

Method 4: fileinput Module

for line in fileinput.input(file): line.split("|")

Running result: it takes 26.1103110313 seconds

The memory is increased by 200-300 MB, and the speed is the slowest.

Summary

The above method is for reference only. It is recognized that the method for reading large files is the best of three. However, the specific situation depends on the performance of the machine and the complexity of data processing.

Well, the above is all the content of this article. I hope the content of this article will help you in your study or work. If you have any questions, you can leave a message, thank you for your support.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.