[Reprint] Exploring the fastest way to read a file in C++

Source: Internet
Author: User
Tags: fread

Original address: https://www.byvoid.com/blog/fast-readfile/

In programming contests, when the input is large, reading the file often becomes the bottleneck of the program's running time, so a faster way to read is needed. Almost every C++ learner has probably stumbled over the extremely slow speed of cin and vowed never to read data with cin again. Others claim that scanf in C/C++ cannot match the speed of Pascal's read statement, leaving C++ contestants to look on anxiously. Is C++ really inferior to Pascal? The answer is self-evident. A more advanced approach is to read all the data in one go and then convert the string. This method is rumored to work very well, but I had never actually tried it, so today I simply tested every way of reading data I could think of. The results are striking.

The most common input task in contests is reading a large number of integers, so I wrote a program that generated 10 million random numbers into data.txt, 55 MB in total. Then I wrote a skeleton program to measure the run time; the code is as follows:

    #include <cstdio>
    #include <ctime>

    int main() {
        clock_t start = clock();
        // DO SOMETHING
        printf("%.3f\n", double(clock() - start) / CLOCKS_PER_SEC);
    }
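For readers who want to reproduce the setup, here is a minimal sketch of a data generator and a wall-clock timer. The author's generator is not shown in the article, so the names generate_data and time_seconds, the seed, and the value range 0..99999 are all illustrative assumptions, and the resulting file size will not match the 55 MB quoted above exactly. std::chrono::steady_clock is used because the C standard defines clock() as processor time, which may not include time the program spends waiting on the disk.

```cpp
#include <chrono>
#include <cstdio>
#include <random>

// Writes `count` random non-negative integers to `path`, separated by single
// spaces. The value range 0..99999 is an assumption, not taken from the article.
// Returns true if the file was written and closed successfully.
bool generate_data(const char *path, int count, unsigned seed = 12345) {
    std::FILE *out = std::fopen(path, "w");
    if (!out) return false;
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> dist(0, 99999);
    for (int i = 0; i < count; ++i) {
        if (i > 0) std::fputc(' ', out);
        std::fprintf(out, "%d", dist(rng));
    }
    return std::fclose(out) == 0;
}

// Runs `work` once and returns the elapsed wall-clock time in seconds.
template <typename F>
double time_seconds(F work) {
    auto start = std::chrono::steady_clock::now();
    work();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

A reading function can then be timed by wrapping the call in a lambda and passing it to time_seconds.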

The simplest method is to call scanf in a loop; the code is as follows:

    const int MAXN = 10000000;
    int numbers[MAXN];

    void scanf_read() {
        freopen("data.txt", "r", stdin);
        for (int i = 0; i < MAXN; i++)
            scanf("%d", &numbers[i]);
    }

But how efficient is it? On my Linux machine the test took 2.01 seconds. Next is cin; the code is as follows:

    const int MAXN = 10000000;
    int numbers[MAXN];

    void cin_read() {
        freopen("data.txt", "r", stdin);
        for (int i = 0; i < MAXN; i++)
            std::cin >> numbers[i];
    }

To my surprise, cin took only 6.38 seconds, faster than I had expected. There is a reason cin is slow: by default, cin is kept synchronized with stdin, which means the two can be mixed freely without worrying about the file position getting out of step. The same holds for cout and stdout: mixing them does not scramble the output order. This compatibility feature costs cin a lot of extra overhead. How can it be disabled? A single statement, std::ios::sync_with_stdio(false), turns off the synchronization between cin and stdin. The program is as follows:

    const int MAXN = 10000000;
    int numbers[MAXN];

    void cin_read_nosync() {
        freopen("data.txt", "r", stdin);
        std::ios::sync_with_stdio(false);
        for (int i = 0; i < MAXN; i++)
            std::cin >> numbers[i];
    }

How efficient is it once synchronization is turned off? The measured run time dropped to 2.05 seconds, about the same as scanf! Knowing this, you can use cin and cout with confidence.

Next, let's test reading the entire file at once. First we need a function that parses the string into the array; the code is as follows:

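    const int MAXS = 60 * 1024 * 1024;  // buffer size; must exceed the 55 MB input
    char buf[MAXS];

    // Parses non-negative integers separated by single spaces into numbers[].
    void analyse(char *buf, int len = MAXS) {
        int i;
        numbers[i = 0] = 0;
        // Check the bound before dereferencing so we never read past the buffer.
        for (char *p = buf; p - buf < len && *p; p++)
            if (*p == ' ')
                numbers[++i] = 0;
            else
                numbers[i] = numbers[i] * 10 + *p - '0';
    }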
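The parsing function above assumes non-negative numbers separated by exactly one space. If the input may contain newlines, repeated separators, or negative values, a slightly more tolerant variant might look like the sketch below. The name parse_ints and the use of std::vector are assumptions made for illustration; this version was not part of the article's benchmarks and does no overflow checking.

```cpp
#include <cstddef>
#include <vector>

// Parses every decimal integer found in buf[0..len) and returns them in order.
// Any run of non-digit characters acts as a separator; a '-' immediately
// before the first digit makes the number negative.
std::vector<int> parse_ints(const char *buf, std::size_t len) {
    std::vector<int> out;
    std::size_t i = 0;
    while (i < len) {
        // Skip ahead to the next digit.
        while (i < len && (buf[i] < '0' || buf[i] > '9')) ++i;
        if (i == len) break;
        bool negative = (i > 0 && buf[i - 1] == '-');
        int value = 0;
        while (i < len && buf[i] >= '0' && buf[i] <= '9') {
            value = value * 10 + (buf[i] - '0');
            ++i;
        }
        out.push_back(negative ? -value : value);
    }
    return out;
}
```

The extra branches cost some speed, so for contest input with a known fixed format the simpler loop may still be preferable.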

The most common way to read an entire file into a string is fread, as in the following code:

    const int MAXN = 10000000;
    const int MAXS = 60 * 1024 * 1024;  // buffer size; must exceed the 55 MB input
    int numbers[MAXN];
    char buf[MAXS];

    void fread_analyse() {
        freopen("data.txt", "rb", stdin);
        int len = fread(buf, 1, MAXS - 1, stdin);  // leave room for the terminator
        buf[len] = '\0';
        analyse(buf, len);
    }

The above code is surprisingly efficient: in my test it read the 10 million numbers in only 0.29 seconds, roughly a sevenfold speedup over scanf! Master this method and you will be hard to beat. However, I remembered that fread is a wrapper around read; would calling read directly be even faster? The code is as follows:

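    // Requires <fcntl.h> and <unistd.h>.
    const int MAXN = 10000000;
    const int MAXS = 60 * 1024 * 1024;  // buffer size; must exceed the 55 MB input
    int numbers[MAXN];
    char buf[MAXS];

    void read_analyse() {
        int fd = open("data.txt", O_RDONLY);
        int len = read(fd, buf, MAXS - 1);  // leave room for the terminator
        buf[len] = '\0';
        analyse(buf, len);
    }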

The measured run time was still 0.29 seconds, so read has no particular advantage. Is that the end of the story? No. On Linux I can call the lower-level function mmap, which maps the file directly into the process's memory so the data does not have to be copied into a separate buffer. What happens if we use mmap directly? The code is as follows:

    // Requires <fcntl.h>, <sys/mman.h> and <unistd.h>.
    const int MAXN = 10000000;
    const int MAXS = 60 * 1024 * 1024;
    int numbers[MAXN];
    char buf[MAXS];

    void mmap_analyse() {
        int fd = open("data.txt", O_RDONLY);
        int len = lseek(fd, 0, SEEK_END);  // file size in bytes
        char *mbuf = (char *) mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
        analyse(mbuf, len);
    }
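The mmap listing above never checks for errors, unmaps the file, or closes the descriptor. That is harmless in a short benchmark but worth handling in longer-running programs. The following is a hedged sketch of the same call sequence with cleanup, for POSIX systems only. The function name sum_bytes_mmap is hypothetical, and summing the bytes merely stands in for real parsing; this variant was not timed.

```cpp
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Maps `path` read-only, sums its bytes as a stand-in for real parsing,
// then unmaps the region. Returns -1 on any failure or for an empty file.
long long sum_bytes_mmap(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }
    std::size_t len = static_cast<std::size_t>(st.st_size);
    void *mem = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (mem == MAP_FAILED) return -1;
    const unsigned char *p = static_cast<const unsigned char *>(mem);
    long long sum = 0;
    for (std::size_t i = 0; i < len; ++i) sum += p[i];
    munmap(mem, len);
    return sum;
}
```

Using fstat instead of lseek to obtain the size is a design choice in this sketch; either approach works for regular files.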

After testing, the run time fell to 0.25 seconds, a further improvement of about 14%. At this point I have no better way to make file reading any faster. Returning to the earlier question, how fast is Pascal? The result was a real surprise: the program took a full 2.16 seconds. The program is as follows:

    const
        MAXN = 10000000;
    var
        numbers: array[0..MAXN] of longint;
        i: longint;
    begin
        assign(input, 'data.txt');
        reset(input);
        for i := 0 to MAXN - 1 do  { read exactly MAXN values }
            read(numbers[i]);
    end.

To be thorough, I also ran the tests on Windows. The results are shown in the following table:

Method / time (seconds)    Linux GCC    Windows MinGW    Windows VC2008
scanf                      2.010        3.704            3.425
cin                        6.380        64.003           19.208
cin (sync disabled)        2.050        6.004            19.616
fread                      0.290        0.241            0.304
read                       0.290        0.398            Not supported
mmap                       0.250        Not supported    Not supported
Pascal read                2.160        4.668

Several observations follow from the table above.

    1. Programs generally run faster on Linux than on Windows.
    2. On Windows, programs compiled with VC generally run faster than those compiled with MinGW (Minimalist GNU for Windows).
    3. VC is insensitive to whether cin synchronization is disabled; the run time is about the same either way. MinGW, by contrast, is very sensitive, with a difference of more than 10 times (64.003 vs. 6.004 seconds).
    4. read is a Linux system call; MinGW probably provides some kind of emulation, and there read is slower than fread.
    5. The running speed of the Pascal program is really nothing to boast about.

I hope this article is useful to you, and you are welcome to discuss it with me.

Original article by BYVoid. Please credit the source when reprinting.
