Linux Programming Practice: File I/O Buffer Testing and a Simple cat Implementation

Source: Internet
Author: User

Simple cat command implementation

cat is easy to implement. The following code uses only the basic open, read, printf, and close functions to provide the core behavior of the cat command:

#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

#define READSIZE 4096   /* bytes requested per read() call */

int main(int ac, char *av[]) {
    int rfd = -1, ret = -1;   /* file descriptor, program return value */
    ssize_t rlen = -1;        /* number of bytes returned by read() */
    char rbuf[READSIZE];      /* read buffer */

    if (ac == 2) {
        if ((rfd = open(av[1], O_RDONLY)) == -1) {
            perror("ccat");
            return -1;
        }
        /* read() does not NUL-terminate the buffer, so print exactly rlen bytes
         * (output for a chunk still stops early at an embedded NUL byte) */
        while ((rlen = read(rfd, rbuf, READSIZE)) > 0)
            printf("%.*s", (int)rlen, rbuf);
        if (rlen == -1)
            perror("ccat");
        ret = close(rfd);
        if (ret == -1)
            perror("ccat");
    }
    return ret;
}

Note the definition of READSIZE: the 4096-byte buffer follows the book's discussion of improving file I/O efficiency by using a buffer.

Two kinds of overhead in file operations

File operations carry two kinds of overhead: the amount of data transferred and the mode switches. The buffer size, that is, the amount of data moved per call, directly determines how many mode switches occur.
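To make the relationship concrete: the number of read() calls that return data is the file size divided by the buffer size, rounded up, and each call costs one switch into the kernel and one back. A minimal sketch; data_reads and mode_switches are hypothetical helper names, and the count assumes the usual read loop that issues one extra read() returning 0 at end of file:

```c
#include <assert.h>

/* Number of read() calls that return data when a file of fsize bytes
 * is read with a buffer of bufsize bytes: ceil(fsize / bufsize). */
long data_reads(long fsize, long bufsize)
{
    assert(bufsize > 0);
    return fsize / bufsize + (fsize % bufsize > 0 ? 1 : 0);
}

/* Each read() enters the kernel and returns to user mode (two switches).
 * Assumes the loop makes one final read() that returns 0 at EOF. */
long mode_switches(long fsize, long bufsize)
{
    return 2 * (data_reads(fsize, bufsize) + 1);
}
```

For the 5292841-byte file used in the experiment later on, data_reads(5292841, 256) is 20676 and data_reads(5292841, 4096) is 1293, matching the "blocks num" values in the results.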

What is a mode switch? Here is an explanation.

In the figure, the areas labeled read and write are "user space", and the gray area at the bottom of the large box is "kernel space". Any access to a hardware device, including a disk, must go through kernel space. Code running in user mode can access only user space; to reach kernel space, the CPU must switch from user mode to supervisor mode. The book also offers a vivid analogy for this process:

Kent must switch from "user mode" to "supervisor mode" to become Superman. When the job is done, he switches back to being a reporter to earn a living (saving the Earth is, after all, unpaid work). If there are too many tasks, having to find a phone booth to change in every time becomes inefficient.
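The analogy maps directly onto code: every write() call is one trip into the kernel, so collecting bytes in a user-space buffer and handing them over in batches means fewer trips. A minimal sketch, assuming a hypothetical helper buffered_write that writes len bytes to fd in chunks of at most bufsize bytes (capped at 65536 here, an arbitrary choice) and returns how many write() calls it made:

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Writes len bytes from data to fd, staging them in a user-space buffer
 * of bufsize bytes. Returns the number of write() calls issued, i.e. the
 * number of user->kernel mode switches, or -1 on error. */
long buffered_write(int fd, const char *data, size_t len, size_t bufsize)
{
    char buf[65536];
    size_t used = 0;
    long calls = 0;

    assert(bufsize > 0 && bufsize <= sizeof buf);
    for (size_t i = 0; i < len; i++) {
        buf[used++] = data[i];              /* stays in user space: no switch */
        if (used == bufsize || i == len - 1) {
            /* buffer full or input exhausted: one trip into the kernel;
             * a short write is treated as an error for simplicity */
            if (write(fd, buf, used) != (ssize_t)used)
                return -1;
            calls++;
            used = 0;
        }
    }
    return calls;
}
```

Writing 100 bytes with bufsize 1 costs 100 write() calls; with bufsize 10 it costs 10, and with bufsize 4096 a single call.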

Buffer sizing follows the same principle as the memory that sits between the CPU and the hard disk: too much is wasteful, and too little is inefficient. Is there a just-right amount? Call it the critical point. Does the Linux kernel control the size of the buffer used for file I/O?

The book's test settles on 4096. Its method is to read a 5 MB file and write the content to another file. The results are shown in the table below; above a buffer size of 4096 bytes, the time no longer changes. Is this critical value real?

Buffer size (bytes)    Execution time (s)
1                      50.29
4                      12.81
16                     3.28
32                     0.96
......
4096                   0.18
8192                   0.18
16384                  0.18
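One possible source of the 4096 figure is the st_blksize field that stat() fills in, which reports the preferred block size for efficient file system I/O. It is a hint rather than a limit (read() accepts any size), and on many Linux filesystems it is 4096 bytes, though that is not guaranteed. A minimal sketch that queries it; preferred_bufsize is a hypothetical helper name, and the 4096 fallback is an assumption:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/* Returns the preferred I/O block size for path, as reported by stat()
 * in st_blksize. This is only a hint: read() accepts any size.
 * Falls back to 4096 (an assumed default) if stat() fails. */
long preferred_bufsize(const char *path)
{
    struct stat st;

    assert(path != NULL);
    if (stat(path, &st) == -1 || st.st_blksize <= 0)
        return 4096;
    return (long)st.st_blksize;
}
```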

Modify cat

The book's example is a copy operation. The kernel reads data from the disk and transfers it to user space; the program then writes the data back to kernel space, and the kernel writes it to disk. Because the data both comes from and returns to the disk, each read/write cycle passes between user mode and supervisor mode twice.
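The book's copy test can be reproduced with a loop like the following. This is a sketch, not the book's code: copy_file is a hypothetical name, the buffer size is chosen at run time, and the buffer is heap-allocated so large sizes do not overflow the stack. Each iteration costs one read() and one write():

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Copies src to dst through a bufsize-byte buffer and returns the number of
 * bytes copied, or -1 on error. Each loop iteration issues one read() and
 * one write(), so halving bufsize roughly doubles the system calls. */
long copy_file(const char *src, const char *dst, size_t bufsize)
{
    long total = 0;
    ssize_t n = 0;
    int in, out;
    char *buf = malloc(bufsize > 0 ? bufsize : 1);

    if (buf == NULL)
        return -1;
    in = open(src, O_RDONLY);
    out = (in == -1) ? -1 : open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (in == -1 || out == -1) {
        total = -1;
    } else {
        while ((n = read(in, buf, bufsize)) > 0) {
            /* a short write is treated as an error for simplicity */
            if (write(out, buf, (size_t)n) != n) {
                n = -1;
                break;
            }
            total += n;
        }
        if (n == -1)
            total = -1;
    }
    if (in != -1)
        close(in);
    if (out != -1)
        close(out);
    free(buf);
    return total;
}
```

Timing copy_file on the same file with buffer sizes from 1 to 16384 would mirror the book's table.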

Rather than take the book's numbers on faith, we modified the ccat.c code above to test how the buffer size affects the cost of mode switching. The difference is that the data comes from the disk but goes to the display. Two mode switches are still performed per cycle.

The idea is simple: record one timestamp at open and another at close; the difference is the time taken by the whole process (open, read, print, close). The book's results call for at least millisecond precision, so we use gettimeofday(), which reports time in microseconds (10^-6 s). The code is as follows:

#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/time.h>

#define READSIZE 16384  /* number of bytes requested per read() call */

long filesize(const char *filename);  /* file size in bytes, 0 on error */

int main(int ac, char *av[]) {
    int rfd = -1, ret = -1;     /* file descriptor, program return value */
    ssize_t rlen = -1;          /* bytes returned by read() */
    long nfblocks;              /* number of file blocks (reads that return data) */
    char rbuf[READSIZE];        /* read buffer */
    struct timeval times[2];    /* timestamps taken after open and after close */
    long fsize;                 /* file size in bytes */
    double sub;                 /* elapsed time in seconds */

    if (ac == 2) {
        if ((fsize = filesize(av[1])) == 0) {
            printf("File size unknown\n");
            return ret;
        }
        /* Number of file blocks (read count): ceil(fsize / READSIZE) */
        nfblocks = fsize / READSIZE;
        if (fsize % READSIZE > 0)
            nfblocks++;
        if ((rfd = open(av[1], O_RDONLY)) == -1) {
            perror("ccat");
            return -1;
        }
        gettimeofday(&times[0], NULL);          /* open time */
        while ((rlen = read(rfd, rbuf, READSIZE)) > 0)
            printf("%.*s", (int)rlen, rbuf);    /* print exactly rlen bytes */
        if (rlen == -1)
            perror("ccat");
        fflush(stdout);   /* push buffered output out inside the timed region */
        ret = close(rfd);
        gettimeofday(&times[1], NULL);          /* close time */
        if (ret == -1)
            perror("ccat");
        /* Elapsed time = seconds difference + microseconds difference / 10^6 */
        sub = (times[1].tv_sec - times[0].tv_sec)
            + (times[1].tv_usec - times[0].tv_usec) / 1000000.0;
        printf(">>>file size:%ld,read size:%d,blocks num:%ld\n", fsize, READSIZE, nfblocks);
        printf(">>>open at:%ld(s):%ld(us)\n", (long)times[0].tv_sec, (long)times[0].tv_usec);
        printf(">>>close at:%ld(s):%ld(us)\n", (long)times[1].tv_sec, (long)times[1].tv_usec);
        printf(">>>sub=%f(s)\n", sub);
    }
    return ret;
}

long filesize(const char *filename)
{
    struct stat pstat;
    if (stat(filename, &pstat) < 0)
        return 0;
    return (long)pstat.st_size;
}
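gettimeofday() returns wall-clock time, which can jump if the system clock is adjusted during a run. A sketch of the same elapsed-time calculation using clock_gettime() with CLOCK_MONOTONIC, which is not affected by such adjustments; elapsed_seconds and now_monotonic are hypothetical helper names:

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Elapsed seconds between two timespec values: whole seconds plus
 * nanoseconds / 10^9 (the nanosecond difference may be negative). */
double elapsed_seconds(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Reads the monotonic clock, which is suited to measuring intervals. */
struct timespec now_monotonic(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

    assert(rc == 0);
    (void)rc;   /* avoid an unused-variable warning when NDEBUG is set */
    return ts;
}
```

Calling now_monotonic() after open() and again after close(), then passing both results to elapsed_seconds(), would replace the two gettimeofday() calls and the sub calculation.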

The test file is 5292841 bytes. The results follow: "read size" is the buffer size, "blocks num" is the number of data blocks (reads), and "sub" is the time taken by the whole process, which is the value of interest.

When readsize = 256:

>>>file size:5292841,read size:256,blocks num:20676
>>>open at:1320222334(s):183692(us)
>>>close at:1320222347(s):258245(us)
>>>sub=13.074553(s)

When readsize = 512:

>>>file size:5292841,read size:512,blocks num:10338
>>>open at:1320222371(s):143244(us)
>>>close at:1320222383(s):451112(us)
>>>sub=12.307868(s)

When readsize = 1024:

>>>file size:5292841,read size:1024,blocks num:5169
>>>open at:1320222411(s):239208(us)
>>>close at:1320222422(s):288572(us)
>>>sub=11.049364(s)

When readsize = 2048:

>>>file size:5292841,read size:2048,blocks num:2585
>>>open at:1320222448(s):259733(us)
>>>close at:1320222459(s):147523(us)
>>>sub=10.887790(s)

When readsize = 4096:

>>>file size:5292841,read size:4096,blocks num:1293
>>>open at:1320222490(s):639213(us)
>>>close at:1320222501(s):419869(us)
>>>sub=10.780656(s)

When readsize = 8192:

>>>file size:5292841,read size:8192,blocks num:647
>>>open at:1320222535(s):491225(us)
>>>close at:1320222545(s):707764(us)
>>>sub=10.216539(s)

When readsize = 16384:

>>>file size:5292841,read size:16384,blocks num:324
>>>open at:1320222638(s):119740(us)
>>>close at:1320222648(s):66890(us)
>>>sub=9.947150(s)

Result Analysis

The elapsed time decreases steadily as the buffer grows, so 4096 does not appear to be a limit beyond which nothing can be gained. Setting memory cost aside, enlarging the buffer still improves access time. Note that these results were recorded right after the machine booted. In a further experiment, I opened some other software, bringing CPU usage to about 40%, and ran the test again: sub was almost twice the earlier value. Because fluctuating CPU usage directly affects the result, a stable measurement is hard to obtain. Evidently, the Linux kernel does not fix the buffer size.

The book's results are puzzling: the time stops changing once the buffer reaches 4096 bytes (it stays at 0.18 s). Results that stable imply stable CPU usage, and under those conditions the times should still keep improving, as mine do.
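The times measured above include printing 5 MB to a terminal, so display output may account for much of each run. One way to check is to time the read loop alone and discard the data. A sketch under that assumption; time_read_loop is a hypothetical helper that reuses the gettimeofday() arithmetic from the modified ccat and returns -1.0 on error:

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/time.h>
#include <unistd.h>

/* Reads the whole file at path with a bufsize-byte buffer, discarding the
 * data, and returns the elapsed time in seconds (or -1.0 on error).
 * Nothing is printed, so only open/read/close are measured. */
double time_read_loop(const char *path, size_t bufsize)
{
    struct timeval t0, t1;
    ssize_t n;
    int fd;
    char *buf = malloc(bufsize > 0 ? bufsize : 1);

    if (buf == NULL)
        return -1.0;
    gettimeofday(&t0, NULL);
    fd = open(path, O_RDONLY);
    if (fd == -1) {
        free(buf);
        return -1.0;
    }
    while ((n = read(fd, buf, bufsize)) > 0)
        ;   /* data is discarded: no system calls for output */
    close(fd);
    gettimeofday(&t1, NULL);
    free(buf);
    if (n == -1)
        return -1.0;
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1000000.0;
}
```

Repeated runs on the same file will mostly be served from the kernel's page cache rather than the disk, so the result reflects system-call overhead more than disk speed.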

In summary, no critical value exists. When CPU and memory are plentiful, the more data exchanged per call, the faster file I/O becomes, and there is no bottleneck value: the larger the buffer, the more efficient file reads and writes are, but the greater the memory demand.
