How can I improve the speed of reading and writing files?

Source: Internet
Author: User
Tags: readfile

Excerpted from a forum post; the original URL is missing.

 

When extracting game resources, my program uses ifstream and ofstream in C++. But reading and writing the files is very slow, and the hard drive churns like crazy. Does anyone know how to speed it up? Thank you!

Yangdi
Memory mapping may be better.

 

Asakura zookeeper
How about trying the Win32 API directly?
CreateFileA, ReadFile, WriteFile

Remember that when calling CreateFileA, the dwFlagsAndAttributes parameter should at least include FILE_FLAG_RANDOM_ACCESS or FILE_FLAG_SEQUENTIAL_SCAN (whichever matches your access pattern); it is much faster that way.
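A minimal sketch of that approach, assuming a plain sequential copy (the file names and the 64 KB buffer size are my own illustration, not from the thread):

    #include <windows.h>

    int main()
    {
        HANDLE in  = CreateFileA("in.bin", GENERIC_READ, FILE_SHARE_READ, NULL,
                                 OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, NULL);
        HANDLE out = CreateFileA("out.bin", GENERIC_WRITE, 0, NULL,
                                 CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
        if (in == INVALID_HANDLE_VALUE || out == INVALID_HANDLE_VALUE)
            return 1;

        // Read and write in large chunks: far fewer system calls than per-byte I/O
        static char buf[1 << 16];
        DWORD got = 0, put = 0;
        while (ReadFile(in, buf, sizeof(buf), &got, NULL) && got > 0)
            WriteFile(out, buf, got, &put, NULL);

        CloseHandle(in);
        CloseHandle(out);
    }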

 

Redna Xela
The hard disk thrashing is normal; speed is the real problem. When reading and writing a large amount of data the disk is bound to work hard, but the actual number of read/write operations need not be large. Simply put, if you can get more work done per read/write, the disk load drops and the program speeds up: writing 4 KB in one call is much faster than writing those 4 KB one byte at a time. The trade-off is that reducing the number of disk reads and writes means spending more memory on buffering.
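As a toy illustration of that point (file names are placeholders; ofstream's internal buffer already softens the gap here, so with raw OS calls the difference is far larger):

    #include <fstream>
    #include <vector>

    int main()
    {
        std::vector<char> data(4096, 'x');

        // 4096 one-byte writes
        std::ofstream slow("slow.bin", std::ios::binary);
        for (char c : data)
            slow.put(c);

        // one 4 KB write
        std::ofstream fast("fast.bin", std::ios::binary);
        fast.write(data.data(), static_cast<std::streamsize>(data.size()));
    }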

Rumor has it the stream implementation in the C++ standard library is... well, a bottleneck under high-intensity read/write loads. ACM problems that involve heavy file I/O always warn you not to use istream/ostream/iostream/ifstream/ofstream/fstream from the C++ standard library.
Quote:

In general the stream classes should be pretty efficient, but performance can be improved further in applications in which I/O is performance critical.

The low efficiency is not a defect in the standard library... it was designed with good performance in mind. In ordinary use, however, you pay for many features you may never touch, such as some of the formatting operations, and that slows things down. Stream buffering can also cause efficiency problems, and each I/O operation creates quite a few objects... that sort of thing. To "fix" these problems you can write a basic_streambuf<> with just the specific features you need, or use the stream buffer directly, which is a little faster:

Write directly like this:

    #include <iostream>

    int main()
    {
        // Copy all of standard input to standard output
        std::cout << std::cin.rdbuf();
    }

And read directly like this:

    #include <iostream>

    int main()
    {
        // Copy all of standard input to standard output
        std::cin >> std::cout.rdbuf();
    }

cin and cout are the istream and ostream instances connected to stdin and stdout; the same technique works with ifstream and ofstream instances.
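A minimal sketch of the same trick applied to files (the file names are placeholders):

    #include <fstream>

    int main()
    {
        std::ifstream in("input.bin", std::ios::binary);
        std::ofstream out("output.bin", std::ios::binary);

        // Copy the whole file through the stream buffers,
        // bypassing the formatted I/O layer
        out << in.rdbuf();
    }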

But... maybe the more fundamental problem is the C++ standard library streams themselves. On Windows you can use the Win32 API directly and manage the buffering yourself. Memory mapping is also a good solution; the Boost library supports it as well (mapped_file).

 

Joyful
Quote:

Reference: Yangdi, September 15:
Memory mapping may be better.

Memory mapping? How do I do that?

 

Yangdi
Quote:

Reference:

Memory mapping? How do I do that?

Use CreateFileMapping().
For details, refer to MSDN.
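A minimal sketch of reading a file through a mapping, assuming a non-empty file whose name is a placeholder:

    #include <windows.h>
    #include <cstdio>

    int main()
    {
        HANDLE file = CreateFileA("input.bin", GENERIC_READ, FILE_SHARE_READ,
                                  NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
        if (file == INVALID_HANDLE_VALUE)
            return 1;

        // Mapping a zero-length file fails, hence the non-empty assumption
        HANDLE mapping = CreateFileMapping(file, NULL, PAGE_READONLY, 0, 0, NULL);
        if (mapping != NULL) {
            const unsigned char* data = static_cast<const unsigned char*>(
                MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0));
            if (data != NULL) {
                DWORD size = GetFileSize(file, NULL);
                // The file contents are now ordinary readable memory
                std::printf("mapped %lu bytes, first byte = %u\n",
                            (unsigned long)size, data[0]);
                UnmapViewOfFile(data);
            }
            CloseHandle(mapping);
        }
        CloseHandle(file);
    }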

 

Asakura zookeeper
CreateFileMapping() does not significantly improve efficiency in a single thread, because the actual speed of reading and writing the data to disk is almost unchanged.

Where this API shines is multi-threaded, asynchronous operations on a smallish file. For example, when rearranging the data inside a file, the efficiency gain is obvious.

Beyond that, it makes operating on a file as convenient as operating on memory, without immediately changing the actual content of the physical file. For example, you can run complicated encryption and decryption algorithms over a file, because writes through the handle this function produces are not flushed to the file right away.

Quote:

Reference: Redna Xela, 3rd floor, September 15:
The hard disk thrashing is normal; speed is the real problem. When reading and writing a large amount of data the disk is bound to work hard, but the actual number of read/write operations need not be large. Simply put, if you can get more work done per read/write, the disk load drops and the program speeds up: writing 4 KB in one call is much faster than writing those 4 KB one byte at a time. The trade-off is that reducing the number of disk reads and writes means spending more memory on buffering.
.......

RednaXela, a small quibble with that.
In terms of hard drive life, the fewer writes the better, but that is not true of efficiency.
You'll find that writing a large file in one go is not as fast as writing it in a few chunks.
The optimal size of each write depends on the size of the hard disk cache.
If you are writing a general-purpose program that will run on other people's machines, you have to account for a cache of unknown size.
Nowadays hard disks usually have about 4 MB of cache; considering that other programs also use it (QQ, for instance, is a notorious disk-cache hog), it is better to set the data block size to about 1 MB.
In addition, immediately after each write call you should wait for the disk to finish, so that the program continues only once the hard disk has completed the write; this matters a lot for efficiency.

You'll then find that writing a large file this way is faster than writing it all in one go, and it won't make the system freeze. (Of course, if some algorithm of yours is hogging the CPU, all bets are off... OTL)
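A sketch of that scheme as I read it, writing in 1 MB blocks and forcing each one out before continuing (the file name, block size, and total size are illustrative; FlushFileBuffers is the Win32 call that waits for buffered data to reach the disk):

    #include <windows.h>

    int main()
    {
        HANDLE out = CreateFileA("out.bin", GENERIC_WRITE, 0, NULL,
                                 CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
        if (out == INVALID_HANDLE_VALUE)
            return 1;

        static char block[1 << 20];          // 1 MB block, as suggested above
        DWORD written = 0;
        for (int i = 0; i < 64; ++i) {       // 64 MB of example data
            WriteFile(out, block, sizeof(block), &written, NULL);
            FlushFileBuffers(out);           // continue only after the block is on disk
        }
        CloseHandle(out);
    }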

 

Joyful
Would C's fscanf and fprintf do better?

 

Redna Xela
Quote:

Reference:
You'll find that writing a large file in one go is not as fast as writing it in a few chunks.
The optimal size of each write depends on the size of the hard disk cache.
If you are writing a general-purpose program that will run on other people's machines, you have to account for a cache of unknown size.
Nowadays hard disks usually have about 4 MB of cache; considering that other programs also use it (QQ, for instance, is a notorious disk-cache hog), it is better to set the data block size to about 1 MB.

Well, I learned something ~ I usually use a buffer of a few KB and the speed has been acceptable (relatively speaking = _ =), so I never thought carefully about the relationship...

Quote:

Reference:
Would C's fscanf and fprintf do better?

C's FILE* family should be at least somewhat faster (or so it's said) than the streams in the current C++ standard library. The FILE* family: fopen/fread/fgetc/fgets/fscanf/fseek/fwrite/fputc/fputs/fprintf/fclose. Under the hood there is a runtime-managed buffer; its size... I don't know. At the very least it doesn't create as many object instances as the streams do, so efficiency should be comparatively higher. With the scanf family of functions, watch out for buffer overflow issues...
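For comparison, a minimal FILE*-based copy (the file names and chunk size are placeholders):

    #include <cstdio>

    int main()
    {
        std::FILE* in  = std::fopen("in.bin", "rb");
        std::FILE* out = std::fopen("out.bin", "wb");
        if (!in || !out)
            return 1;

        // fread/fwrite buffer internally, so 64 KB chunks keep syscalls rare
        static char buf[1 << 16];
        std::size_t got;
        while ((got = std::fread(buf, 1, sizeof(buf), in)) > 0)
            std::fwrite(buf, 1, got, out);

        std::fclose(in);
        std::fclose(out);
    }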

 

Yangdi
FILE maintains its own buffering mechanism.
FILE uses a default size for its I/O buffer (4 KB), or you can use setbuf/setvbuf to set the buffer size.

Say you fread 1 byte: this triggers a 4 KB ReadFile, and fread then copies the requested data into your buffer. From then on, as long as accesses stay within that boundary, reads are served from the I/O buffer. fwrite works the same way, only calling WriteFile once the boundary of the I/O buffer is crossed. You can call fflush to actively force a WriteFile, or let fclose passively flush and call WriteFile (fclose blocks while doing so).
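A small sketch of adjusting that buffer with setvbuf (the 1 MB size and file name are illustrative; setvbuf must be called after fopen and before any other operation on the stream):

    #include <cstdio>

    int main()
    {
        std::FILE* f = std::fopen("data.bin", "rb");
        if (!f)
            return 1;

        // Replace the small default buffer with a 1 MB fully buffered one;
        // the buffer must outlive the stream, hence static storage
        static char iobuf[1 << 20];
        std::setvbuf(f, iobuf, _IOFBF, sizeof(iobuf));

        char byte;
        std::fread(&byte, 1, 1, f);   // pulls a whole chunk into iobuf behind the scenes
        std::fclose(f);
    }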

Besides, the hard disk cache is managed and used by the hard disk controller; just as with the processor cache, you cannot write to the disk directly. Data goes into the cache first and then onto the platters, and you have no control over that step, i.e. the order in which the hard drive commits data to the platters.
In fact, the hard disk is a random-access device, so it doesn't matter which request is written first. Therefore, after the application layer's I/O accesses are converted into low-level I/O requests, the kernel layer optimizes by sorting those requests.

Suppose 10 requests are currently queued on the kernel's I/O queue. The kernel computes the physical location of each request in advance and sorts them so that, as the head sweeps through one revolution, as many of the 10 requests as possible complete within that revolution. Imagine the best case: all 10 requests sit on one track, the head makes one revolution, and all 10 complete. The worst case takes 10 revolutions, because only one head can operate at a time and the 10 requests may unfortunately sit on 10 different tracks (in which case no amount of kernel sorting helps).

Therefore, it is best to keep your I/O operations within a contiguous region of the disk as much as possible and avoid physically crossing tracks. To do that, though, you would need the hard disk's exact parameters and precise calculations.

Under high-intensity I/O the advantage of the cache is cancelled out, because the hard disk's write speed cannot keep up with the processor's requests. The larger the cache, the longer it can buffer. When the cache fills up, the hardware drops its ready signal and the kernel can only suspend the I/O queue; the drive can accept no more writes. Meanwhile the upper layers keep issuing requests to the kernel layer, which either keeps appending them to the I/O request queue or blocks the process that initiated the I/O, until the cache drains and the hardware raises the ready signal again, letting the driver pull requests off the kernel's I/O request queue and refill the cache... In other words, the cache only helps at the start, before it fills. That benefit is most pronounced for small I/O requests, since nothing blocks or gets suspended until the cache fills up.

Actually, what can be done about all this at the software level is very limited, and typing it all out is exhausting. Why do I do it... orz

 

Ravenex
Let me pick your brains: setting aside reads and writes to the hard disk and considering only operations in memory, for a data block of the same size, what is the speed difference between traversing it in small units, say 1 byte at a time, versus larger units, say 16 bytes at a time? Assume the block's size is an exact multiple of these units, so it can be traversed as (1) unsigned char, (2) unsigned int, or (3) unsigned long. If the machine's general-purpose registers are 32-bit, then (2) is surely faster than (1), because it takes fewer iterations while the machine code is about the same length. But what is the difference between (2) and (3)?

I also want to ask: do external hard-drive enclosures often have their own cache on the enclosure? How big is it, and does it affect the drive's read/write performance?

 

Yangdi
First of all, unsigned long is not a 64-bit type in VC (on GCC it may be, depending on the target). If you do use a 64-bit data type on a 32-bit machine, the result is that the compiler performs two accesses for each 64-bit datum.

As for the cache question, I explained it above: under high-pressure I/O the cache size makes little visible difference, but for light and medium volumes of access, the larger the cache, the fewer chances the hard disk driver has to block on the disk, and the faster things go.

At the software level, every access to the disk ultimately becomes the hard disk driver accessing the hard disk device.
If the disk, being a slow device, had no cache, the driver could only issue the next access after the previous one finished, and waiting for each I/O operation to complete is time-consuming. So the cache was introduced: the driver writes into the hard disk cache each time, and because the cache has some capacity, the driver does not have to wait for the disk to finish its I/O before writing more. On the other side, the disk pulls data out of the cache and performs the I/O operations one by one, so the cache behaves like a FIFO: the larger its capacity, the more the driver can write before it risks being blocked. But this is only a stopgap. If no further requests arrive before the cache fills, an 8 MB cache can absorb more data than a 2 MB one and is naturally faster; if your I/O request volume is very large, though, either size will quickly fill up, and then there is no difference between the two. Hence the cache size only has a pronounced effect on performance under light and medium loads.

 

Yangdi
In addition, actual performance depends on many other factors. For example, read/write functions like fread and fwrite implement their own I/O buffering at the user level: they may actually read a whole block at a time (a different "block" from the file system's), and when you write 1 byte, the data only goes into the in-memory I/O buffer rather than actually being written out.
Next comes the file system, whose basic unit is not the byte but the block; the block size differs from one file system to another.
For example, even if you use ReadFile to read just 1 byte, the file system may read a whole block, and the file system has a cache of its own.
Then there is the general I/O cache provided by the kernel, which is independent of any particular file system: whenever I/O happens, the data is cached in a page-based cache the kernel maintains itself. And finally there is the hardware-level cache. How each operating system implements the file-system cache and the kernel's general cache may differ.

 

Ravenex
So on a 32-bit machine, using a 64-bit data type is no faster than using a 32-bit data type for a fixed loop workload, right? There's still a lot I don't understand; I need more practice.

In most game resource packs, a single file won't even reach 4 MB, right? So it makes sense to study how to use the cache well; otherwise data structures like the B-tree would not be designed around hard disk read/write characteristics.

 

Yangdi
Quote:

Reference:
So on a 32-bit machine, using a 64-bit data type is no faster than using a 32-bit data type for a fixed loop workload, right? There's still a lot I don't understand; I need more practice.

In most game resource packs, a single file won't even reach 4 MB, right? So it makes sense to study how to use the cache well; otherwise data structures like the B-tree would not be designed around hard disk read/write characteristics.

Actually no, it will be faster, because you are effectively packing two 4-byte accesses into one loop iteration. This optimization is called loop unrolling. It is usually not done by the compiler; it is done by hand. Measurements show that unrolling to 16 to 32 bytes per iteration (that is, within one loop body, processing 4 bytes at a time, several times over) improves performance, because it makes full use of the instruction pipeline.
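A hypothetical sketch of that kind of hand-unrolled traversal (the function and its names are mine; it assumes len is a multiple of 16):

    #include <cstdint>
    #include <cstring>

    // Sum a buffer 16 bytes per iteration: four 4-byte loads into
    // independent accumulators, which helps keep the pipeline full.
    std::uint32_t sum_words_unrolled(const unsigned char* buf, std::size_t len)
    {
        std::uint32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
        for (std::size_t i = 0; i < len; i += 16) {
            std::uint32_t w[4];
            std::memcpy(w, buf + i, 16);   // word-sized loads without aliasing trouble
            s0 += w[0]; s1 += w[1]; s2 += w[2]; s3 += w[3];
        }
        return s0 + s1 + s2 + s3;
    }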

I think studying the hard disk's own cache is of little practical value, because data written by your program passes through several layers before reaching the physical disk, and you can't treat the hard disk as an object for your exclusive use. So the optimization worth researching is not the hard disk cache, unless you cut across all the software layers and deal with the driver directly (databases usually do this).

 

Ravenex
The buffer your own program manages has a size you know exactly, so that is worth studying.
As for the hard disk's own cache, DMA, and the rest: there is a cache at every layer, and none of it can be managed from up here. For someone like me who only writes applications, there's no need to worry about the underlying implementation.
