Multithreaded file I/O

Tackling the file I/O bottleneck
Stefan is founder of www.RED-SOFT-ADAIR.com, a company that rescues software development projects in technical crisis. He can be contacted at StefanWoe@gmail.com.

Over the past year or so, we've heard a lot about multithreading, deadlocks, cache invalidation, and superlinear scalability. I, for one, have introduced multithreading for the CPU-intensive parts of my applications. During this time, however, I've come to realize that the biggest bottleneck often involves loading data from files. Consequently, I built a small test suite that performs the following actions:

  • Read a single file sequentially
  • Read multiple files sequentially
  • Read from a single file at random positions
  • Read from multiple files at random positions
  • Write a single file sequentially
  • Write multiple files sequentially
  • Write to a single file at random positions
  • Write to multiple files at random positions

I repeated each of these tests with 1, 2, 4, 8, 16, and 32 threads. From each test, I recorded the total throughput in MB/s. The source code consists of a single C++ source file and is available here.
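
A minimal sketch of one such measurement, reading a file sequentially in 512-byte blocks and computing MB/s, might look like this (simplified for illustration; the actual test suite handles many more cases):

    // Sketch: measure sequential-read throughput of one file in MB/s.
    #include <chrono>
    #include <cstddef>
    #include <cstdio>

    double MeasureSequentialReadMBs(const char* path)
    {
        std::FILE* f = std::fopen(path, "rb");
        if (!f) return 0.0;

        char block[512];
        std::size_t total = 0, n = 0;
        auto start = std::chrono::steady_clock::now();
        while ((n = std::fread(block, 1, sizeof(block), f)) > 0)
            total += n;                       // count bytes actually read
        auto stop = std::chrono::steady_clock::now();
        std::fclose(f);

        double seconds = std::chrono::duration<double>(stop - start).count();
        return seconds > 0.0 ? (total / (1024.0 * 1024.0)) / seconds : 0.0;
    }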

I've run all of these tests on:

  • A dual quad-core Xeon with 10 GB RAM and an internal RAID 5 system consisting of four 10k SAS disks, running Windows XP x64
  • A dual Xeon with 4 GB RAM and 10k U360 SCSI disks, running Windows XP
  • A Core Duo laptop with 1.5 GB RAM and a 5400 RPM SATA disk, running Windows XP

For single-file I/O, I used files 200 MB in size; all threads in a run accessed the same single file. For multiple-file I/O, each file was 20 MB, and all threads used separate files. All accesses were made in blocks of 512 bytes.

I repeated the entire test suite three times. The values I present here are the average of the three runs; the standard deviation in most cases did not exceed 10-20%. All tests were also run three times with reboots after every run, so that no file was accessed from the cache. These runs were slower, of course, but the influence of the number of threads was similar to the results shown here (with the exception of randomly reading multiple files; see below). To suppress caching effects within a test, I used a different set of files for each subtest with 1, 2, ..., 32 threads.

Theoretical Aspects

Before discussing the results, it is worth considering a few fundamental I/O issues.

Read caching. All modern operating systems use all available memory to cache the most recently used files -- or more specifically, the most recently read or written blocks. The RAID machine with 10 GB RAM has a clear advantage here, as it is able to cache all the files used (some 6 GB in total). The other machines cannot reuse any cached data from one of the three runs to the next, because the file used first must be dropped from the cache once more file space has been read and written than there is available RAM.

Write-back caches. Operating systems, and more specifically RAID systems, offer write caches that buffer written data before it is actually stored on the hard disks. The write command returns even though the data is only in the cache. In my first tests, I encountered a slowdown on the RAID system in the middle of the write tests; the reason was that the write cache was full, so the tests were influenced by the tests that ran before them. I therefore introduced a pause of a few seconds after each subtest before starting with a different number of threads. Read and write caches explain some of the huge throughput of the RAID system used.
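
As an aside, when written data must actually reach the disk rather than linger in the operating system's write-back cache, it can be flushed explicitly per file. A minimal Win32 sketch (the test suite itself relies on pauses instead, and a RAID controller's own cache may still hold the data):

    // Sketch: force buffered writes out of the OS write-back cache (Win32).
    #include <windows.h>

    bool WriteAndFlush(HANDLE file, const void* data, DWORD size)
    {
        DWORD written = 0;
        if (!WriteFile(file, data, size, &written, NULL) || written != size)
            return false;
        // Ask the OS to push its write cache for this file to the drive.
        return FlushFileBuffers(file) != 0;
    }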

Native command queuing. Most current drives (even my three-year-old laptop's SATA drive) support native command queuing (NCQ). This technology enables asynchronous execution of read and write commands to a hard disk: the commands complete in the order that best fits the drive's rotational position. For instance, if four read requests (A, B, C, D) are sent to the disk, they might return in the order D, C, B, A, because the drive's heads happened to be close to the position of the data requested by D when the commands were issued; the nearest position after that was C's, and so on.

Now consider a single-threaded application waiting to receive first A, then B, then C, and so on, compared to an application with four threads requesting A, B, C, and D simultaneously. While the multithreaded application has received A, B, C, and D all together, the single-threaded one might have received only A after the same runtime. (For more information, see "Serial ATA: Native Command Queuing.") While there are more theoretical aspects (operating system, drivers, disk fragmentation, sector and block size, to mention a few), going deeper is beyond the scope of this article.

Sequential read

For sequential access, I read files from the first to the last byte. With multiple files, each thread read one separate 20 MB file. For single-file access, a 200 MB file was divided into slices of equal length, and each thread read one of these slices.
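
A minimal sketch of this slicing scheme, assuming C++11 threads and ignoring any remainder bytes, might look like this:

    // Sketch: N threads each read one equal slice of a single file.
    #include <cstdio>
    #include <thread>
    #include <vector>

    void ReadSlice(const char* path, long offset, long length)
    {
        std::FILE* f = std::fopen(path, "rb");  // each thread has its own handle
        if (!f) return;
        std::fseek(f, offset, SEEK_SET);
        char block[512];
        for (long done = 0; done < length; )
        {
            std::size_t want = (length - done) < 512 ? (length - done) : 512;
            std::size_t n = std::fread(block, 1, want, f);
            if (n == 0) break;
            done += (long)n;
        }
        std::fclose(f);
    }

    void ReadFileInSlices(const char* path, long fileSize, int threadCount)
    {
        long slice = fileSize / threadCount;    // remainder ignored for brevity
        std::vector<std::thread> threads;
        for (int i = 0; i < threadCount; ++i)
            threads.emplace_back(ReadSlice, path, i * slice, slice);
        for (auto& t : threads)
            t.join();
    }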

Figure 1(a): Read multiple files sequentially.

Figure 1(b): Read a single file sequentially.

In short, the results show that:

  • On single disks, performance decreases significantly with additional threads
  • Only on the RAID system does performance increase significantly, with up to four threads

This result was surprising, because I recently reduced the load time for large datasets in a large application by more than 35%, even on the laptop, by introducing multithreading. The reason is obvious: the application did not only read the file; it also processed the data it read, storing it into arrays, lists, and so on. Using multiple threads, both cores were utilized at up to 100%. In this case, the additional threads did not improve the performance of the file access itself, but overall performance increased, even though the main task of the process was reading files.

Random read

For random read access, I positioned the file pointer at a random position somewhere within the file, then read a block of 512 bytes. I did this 10,000 times for each file in the case of multiple files. In the case of a single file, the 10,000 accesses were divided among all threads.
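
A sketch of one thread's random-read loop might look like this (the choice of random-number generator is illustrative):

    // Sketch: perform `count` reads of 512 bytes at random file positions.
    #include <cstdio>
    #include <cstdlib>

    void RandomReads(const char* path, long fileSize, int count)
    {
        std::FILE* f = std::fopen(path, "rb");
        if (!f) return;
        char block[512];
        for (int i = 0; i < count; ++i)
        {
            // Pick a position that leaves room for a full 512-byte block.
            long pos = (long)((double)std::rand() / RAND_MAX * (fileSize - 512));
            std::fseek(f, pos, SEEK_SET);
            (void)std::fread(block, 1, sizeof(block), f);
        }
        std::fclose(f);
    }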

Figure 2(a): Read multiple files randomly.

Figure 2(b): Read a single file randomly.

Reading multiple files randomly was the only case where the behavior differed after a reboot: when reading the files for the first time, all machines showed increased performance with more threads -- even the laptop performed best with 32 threads. The reason is that even the laptop's hard drive supports native command queuing; this is a perfect example of that technology at work.

When reading a single file, two threads perform a little better than one; on the RAID, four threads are better still, but more threads decrease performance on all systems. These results did not differ strongly after a reboot.

Sequential write

For sequential access, files were written from the first to the last byte. With multiple files, each thread wrote one separate 20 MB file. For a single file, a 200 MB file was divided into slices of equal length, and each thread wrote one of these slices. All files existed at full length before the test started, but their entire content was overwritten.
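
The write counterpart of the earlier read slice is nearly identical as a sketch; the file is opened in update mode ("rb+") because it already exists at full length:

    // Sketch: one thread overwrites its slice of a preallocated file.
    #include <cstdio>

    void WriteSlice(const char* path, long offset, long length)
    {
        std::FILE* f = std::fopen(path, "rb+"); // file already exists at full size
        if (!f) return;
        std::fseek(f, offset, SEEK_SET);
        char block[512] = {0};                  // dummy payload
        for (long done = 0; done + 512 <= length; done += 512)
            std::fwrite(block, 1, sizeof(block), f);
        std::fclose(f);
    }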

Figure 3(a): Write multiple files sequentially.

Figure 3(b): Write a single file sequentially.

The results for multiple files are similar to those for sequential read: performance generally decreases with multiple threads on single disks, but it increases on the RAID system -- when writing sequentially, with up to 8 threads. When writing a single file, the results are surprising: up to 8 threads do not affect performance on single disks, and they increase performance on the RAID system. More than 8 threads always decreases performance.

Random write

For random write access, I positioned the file pointer at a random position within the file, then wrote a block of 512 bytes. I did this 10,000 times for each file in the case of multiple files. In the case of a single file, the 10,000 accesses were divided among all threads.

Figure 4(a): Write multiple files randomly.

Figure 4(b): Write a single file randomly.

The results of random write show that:

  • With multiple files, performance generally increases with more threads. For single disks, saturation seems to be reached with 2-4 threads; for the RAID system, with about 8 threads.
  • When writing a single file, two threads perform better than one. On the RAID, four threads are best, but more threads decrease performance on all systems; the drop, however, is less drastic on the RAID system.

What this all means

Overall, the results show that multithreaded file I/O can significantly improve or significantly degrade performance. Keep in mind that an application typically does not only read data; it also processes what it reads in a more or less CPU-intensive way, which leads to different results for every application, and even for different tasks within one application. The same may or may not hold for writing data. Furthermore, there are very different ways in which, and times at which, files are read or written, as well as different hardware and software environments that an application will meet. There is no general advice software developers can follow. For example, in one application I measured clearly that using multiple threads to sequentially read a file increased performance significantly in the 64-bit version, but in the 32-bit version more threads decreased performance on the same machine, with the same operating system (Windows XP x64) and the same source code. In another case, where an application opened and appended to thousands of files, the best solution was to create 8 threads that did nothing but close files (on an average dual-core machine).

The bottom line is:

  • Make multithreading configurable! The number of threads used in a program should always be adjustable, from 0 (no additional threads at all) up to an arbitrary number. This not only allows customization for optimal performance, but it also proves to be a good debugging tool and sometimes a lifesaver when unknown race conditions occur on client systems. I remember more than one situation where customers were able to overcome fatal bugs by switching off multithreading. This, of course, does not only apply to multithreaded file I/O.

Consider the following pseudo code:

    int CMyThreadManger::AddThread(CThreadObj theTask)
    {
        if (mUsedThreadCount >= gConfiguration.MaxThreadCount())
            return theTask.Execute();   // execute the task in the main thread
        // add the task to the thread pool and start the thread
        ...
    }

Such a mechanism is not very complicated (though a little more work will probably be needed than shown here), but it can be very effective. It may also be used with prebuilt threading libraries such as OpenMP or Intel's Threading Building Blocks. Considering the measurements shown here, it is a good idea to include more than one configurable thread count (for example, one for file I/O and one for core CPU tasks). The default might be 0 for file I/O and <number of cores found> for CPU tasks. But it should be possible to switch all multithreading off. A more sophisticated approach might even include some code that tests multithreaded performance and sets the number of threads automatically, maybe even individually for different tasks.
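
A minimal sketch of such a configuration, with the defaults suggested above (the names are illustrative), might be:

    // Sketch: separately configurable thread counts for file I/O and CPU work.
    #include <thread>

    struct ThreadConfig
    {
        int fileIoThreads;   // default 0: do file I/O in the calling thread
        int cpuThreads;      // default: number of cores found

        ThreadConfig()
          : fileIoThreads(0),
            cpuThreads((int)std::thread::hardware_concurrency())
        {
        }
    };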

Conclusion

So far, multithreaded file I/O is an under-researched field. Although it is simple to measure, there is not much common knowledge about it. The measurements I present here show that multithreading can improve the performance of file access directly, as well as indirectly by utilizing the available cores to process the data read. In general, however, there are no rules of thumb. In this article, I have tried to bring a few "hard" numbers into this area, although I think more measurements and theoretical analysis are needed to enable applications that perform better on I/O.
