On the performance of reading files in C++ with fread and mmap

Reading files can become a speed bottleneck in large-scale data processing. Whether your CPU has 4 or 8 cores and runs at 2 GHz or 3 GHz, the I/O speed of the hard disk is always limited. In a recent job of mine, processing an 11 GB text file took 34.8 seconds, of which 30.2 seconds were spent on I/O, about 87% of the total time.
Hard disk I/O has an upper limit, but do the functions C++ provides actually let us reach it? To find out, I read the 11 GB text file with fread. On Linux, iostat showed a disk read speed of about 380 MB/s. I then measured the read speed of the same file with the dd command and got about 460 MB/s. So single-threaded fread does not reach the disk's read limit. My first guess was that fread has some fixed overhead per disk access, so I tried reading with multiple threads to see whether that would approach sequential-I/O speed and improve throughput. The results showed that the multithreaded version also read at only about 380 MB/s.
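The fread measurement described above can be sketched as follows. This is a minimal illustration, not the author's benchmark code: the function names `read_all_fread` and `fread_throughput_mb_s` are hypothetical, the buffer size is a free parameter, and "MB/s" here means 10^6 bytes per second.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper: reads the whole file sequentially with std::fread,
// buf_size bytes at a time. Returns the number of bytes read, or -1 if the
// file cannot be opened or a read error occurs.
long long read_all_fread(const char* path, std::size_t buf_size) {
    assert(path != nullptr && buf_size > 0);
    std::FILE* f = std::fopen(path, "rb");
    if (f == nullptr) return -1;

    std::vector<char> buf(buf_size);
    long long total = 0;
    std::size_t n = 0;
    // fread returns 0 once end of file is reached or an error occurs.
    while ((n = std::fread(buf.data(), 1, buf.size(), f)) > 0) {
        total += static_cast<long long>(n);
    }
    const bool failed = std::ferror(f) != 0;
    std::fclose(f);
    return failed ? -1 : total;
}

// Hypothetical helper: times one full read of the file and returns the
// throughput in MB/s (1 MB = 1,000,000 bytes). Returns 0.0 if nothing was read.
double fread_throughput_mb_s(const char* path, std::size_t buf_size) {
    const auto start = std::chrono::steady_clock::now();
    const long long bytes = read_all_fread(path, buf_size);
    const auto stop = std::chrono::steady_clock::now();
    const double secs = std::chrono::duration<double>(stop - start).count();
    if (bytes <= 0 || secs <= 0.0) return 0.0;
    return static_cast<double>(bytes) / 1e6 / secs;
}
```

For example, `fread_throughput_mb_s("big.txt", 1 << 20)` (the path is a placeholder) reads the file in 1 MiB chunks. On Linux, repeated runs are usually served from the page cache rather than the disk, so a comparison with iostat or dd figures like the ones above is only meaningful on a cold cache; one common way to get one is `sync; echo 3 > /proc/sys/vm/drop_caches` as root.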
Why is fread slower? I read up on how fread/fwrite access the disk. The caller specifies how much data to read, the kernel reads it into its own buffer, and the data is then copied from that kernel buffer into user space; writing goes through the same copy in the opposite direction. Every I/O therefore passes through the kernel buffer, and this extra copying limits speed. One solution is mmap. mmap maps part of a file directly into the process's address space, so reading and writing that region accesses the kernel's buffer pages directly. This avoids copying data back and forth between the kernel and user space, and is usually faster.
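A minimal POSIX sketch of the mmap approach follows, assuming Linux or another POSIX system. The hypothetical function `count_byte_mmap` maps the whole file read-only and scans it in place; counting occurrences of one byte (for example '\n' to count lines) is only a stand-in workload chosen for illustration, not something from the original.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>     // open
#include <sys/mman.h>  // mmap, munmap, posix_madvise
#include <sys/stat.h>  // fstat
#include <unistd.h>    // close

// Hypothetical helper: maps the whole file read-only and counts how many bytes
// equal `target` by scanning the mapped pages in place, with no read() into a
// separate user-space buffer. Returns -1 on error.
long long count_byte_mmap(const char* path, char target) {
    assert(path != nullptr);
    const int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    const std::size_t len = static_cast<std::size_t>(st.st_size);
    if (len == 0) {  // mmap rejects a zero length, and there is nothing to scan
        close(fd);
        return 0;
    }

    void* addr = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (addr == MAP_FAILED) return -1;

    // Only a hint: tells the kernel we will scan front to back, so it can read ahead.
    posix_madvise(addr, len, POSIX_MADV_SEQUENTIAL);

    const char* data = static_cast<const char*>(addr);
    long long count = 0;
    for (std::size_t i = 0; i < len; ++i) {
        if (data[i] == target) ++count;
    }
    munmap(addr, len);
    return count;
}
```

Mapping a file as large as 11 GB in one piece assumes a 64-bit process. The kernel still has to bring the pages in from disk on first access; what mmap saves is the extra copy into a user buffer.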
To back this up with data, here are results that another developer posted online; I hope they are useful.

Times in seconds:

Method                 Linux gcc   Windows MinGW   Windows VC2008
scanf                  2.010       3.704           3.425
cin                    6.380       64.003          19.208
cin, sync cancelled    2.050       6.004           19.616
fread                  0.290       0.241           0.304
read                   0.290       0.398           not supported
mmap                   0.250       not supported   not supported
Pascal read            2.160       4.668           n/a
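The cin row with synchronization cancelled refers to calling `std::ios::sync_with_stdio(false)` before any I/O. The original does not say what input was read, so the sketch below assumes whitespace-separated integers; `fast_cin_setup` and `sum_ints` are hypothetical names.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>  // <cassert> and <sstream> are only for trying the functions out, e.g. with std::istringstream

// Hypothetical helper: disables synchronization between the C++ standard
// streams and C stdio. Call it once, before any input or output. Afterwards
// std::cin may buffer independently of stdio, so do not mix std::cin with
// scanf/getchar on stdin.
void fast_cin_setup() {
    std::ios::sync_with_stdio(false);
}

// Hypothetical helper: reads whitespace-separated integers from `in` until
// EOF (or the first token that is not an integer) and returns their sum.
long long sum_ints(std::istream& in) {
    long long x = 0;
    long long total = 0;
    while (in >> x) {
        total += x;
    }
    return total;
}
```

Typical use is to call `fast_cin_setup()` at the top of `main` and then `sum_ints(std::cin)`.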

Author: jiang1st2010