[Reprint] Reading and Writing Windows files

Source: Internet
Author: User
Original article address: Workshop.

I would like to share a little of what I have learned.

The bottleneck in reading and writing files on Windows lies in the inherent characteristics of the hard disk: the speed of the disk itself and its serial mechanical operation. All software can do is improve its implementation to approach the disk's maximum read/write speed. When we copy and paste files, we rely on Windows' own implementation. A major factor in its speed is that it goes through the Windows file cache: when you copy a large file, Windows loads a large part of it into the system cache, depending on the file's size. You can watch the system cache soar while the machine slows down considerably. The overall copy speed is about 10 MB/s, whereas an IDE 7200 RPM hard disk generally reads and writes at about 30 MB/s, so much of the disk's speed is wasted.

Reading and writing several files in parallel is slower than handling them one after another, because the disk works serially and time is lost swinging the head between positions. The cache hit rate also drops sharply. We should therefore bypass the Windows cache, avoid reading and writing several parts of files at the same time, and read and write contiguous blocks whenever possible.

In general, Windows I/O handles are operated through the file APIs CreateFile, ReadFile, WriteFile, and so on. These APIs work not only on file handles but on all I/O device handles, such as sockets, serial ports, and pipes. Their parameters select how the I/O is performed. For example, the prototype of CreateFile is:

HANDLE CreateFile(
 LPCTSTR lpFileName,                          // Pointer to the file name
 DWORD dwDesiredAccess,                       // Access mode (read/write)
 DWORD dwShareMode,                           // Sharing mode
 LPSECURITY_ATTRIBUTES lpSecurityAttributes,  // Pointer to the security attributes
 DWORD dwCreationDisposition,                 // How to create or open the file
 DWORD dwFlagsAndAttributes,                  // File flags and attributes
 HANDLE hTemplateFile                         // Template file whose attributes are copied
);

The parameter that matters most for read/write speed is dwFlagsAndAttributes. Its values, as documented on MSDN:

 

Attributes:
This parameter accepts any combination of the following attributes, except that all other file attributes override FILE_ATTRIBUTE_NORMAL.

FILE_ATTRIBUTE_ARCHIVE: The file should be archived. Applications use this attribute to mark files for backup or removal.

FILE_ATTRIBUTE_HIDDEN: The file is hidden. It is not included in an ordinary directory listing.

FILE_ATTRIBUTE_NORMAL: The file has no other attributes set.

FILE_ATTRIBUTE_OFFLINE: The file data is not available immediately. It indicates that the data has been physically moved to offline storage.

FILE_ATTRIBUTE_READONLY: The file is read-only. Applications can read the file but cannot write to it or delete it.

FILE_ATTRIBUTE_SYSTEM: The file is part of, or is used exclusively by, the operating system.

FILE_ATTRIBUTE_TEMPORARY: The file is used for temporary storage. The file system tries to keep all of its data in memory for fast access. An application should delete a temporary file as soon as it no longer needs it.
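The rule that the other attributes override FILE_ATTRIBUTE_NORMAL can be modeled as a small bit-mask function. This is a sketch, not part of the Windows API: `effective_attributes` and `LISTED_ATTRIBUTES` are hypothetical names, the constant values are copied from the Windows SDK header winnt.h so the example compiles without <windows.h>, and only the seven attributes listed here are covered.

```c
#include <assert.h>
#include <stdint.h>

/* Attribute values as defined in winnt.h, repeated so this sketch
   builds without <windows.h>. */
#define FILE_ATTRIBUTE_READONLY  0x00000001u
#define FILE_ATTRIBUTE_HIDDEN    0x00000002u
#define FILE_ATTRIBUTE_SYSTEM    0x00000004u
#define FILE_ATTRIBUTE_ARCHIVE   0x00000020u
#define FILE_ATTRIBUTE_NORMAL    0x00000080u
#define FILE_ATTRIBUTE_TEMPORARY 0x00000100u
#define FILE_ATTRIBUTE_OFFLINE   0x00001000u

/* Hypothetical mask of the attributes listed above; FILE_FLAG_* values
   live in other (high) bits and are not touched. */
#define LISTED_ATTRIBUTES (FILE_ATTRIBUTE_READONLY | FILE_ATTRIBUTE_HIDDEN | \
    FILE_ATTRIBUTE_SYSTEM | FILE_ATTRIBUTE_ARCHIVE | FILE_ATTRIBUTE_NORMAL | \
    FILE_ATTRIBUTE_TEMPORARY | FILE_ATTRIBUTE_OFFLINE)

/* Models "all other attributes override FILE_ATTRIBUTE_NORMAL": if NORMAL
   is combined with any other listed attribute, NORMAL is dropped. */
uint32_t effective_attributes(uint32_t flags_and_attributes) {
    uint32_t attrs = flags_and_attributes & LISTED_ATTRIBUTES;
    if ((attrs & FILE_ATTRIBUTE_NORMAL) && attrs != FILE_ATTRIBUTE_NORMAL)
        flags_and_attributes &= ~FILE_ATTRIBUTE_NORMAL;
    return flags_and_attributes;
}
```

For example, FILE_ATTRIBUTE_NORMAL | FILE_ATTRIBUTE_HIDDEN reduces to FILE_ATTRIBUTE_HIDDEN, while FILE_ATTRIBUTE_NORMAL combined only with a flag bit is left unchanged.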

 

Flags:

Any combination of the following flags is acceptable.

FILE_FLAG_WRITE_THROUGH

Instructs the system to write through any intermediate cache and go directly to disk.

FILE_FLAG_OVERLAPPED

Instructs the system to initialize the object so that operations that take a significant amount of time to process return ERROR_IO_PENDING. When the operation finishes, the specified event is set to the signaled state. When you specify FILE_FLAG_OVERLAPPED, the file read and write functions must be given a pointer to an OVERLAPPED structure. This flag also allows more than one operation to be performed on the handle at the same time, for example a read and a write.

FILE_FLAG_NO_BUFFERING

Instructs the system to open the file with no intermediate buffering or caching. When combined with FILE_FLAG_OVERLAPPED, this flag gives maximum asynchronous performance, because the I/O does not rely on the synchronous operations of the memory manager. However, some I/O operations will take longer, because the data is not held in the cache.

An application that opens a file with FILE_FLAG_NO_BUFFERING must meet the following requirements:
  

1. File access must begin at byte offsets that are integer multiples of the volume's sector size.
2. File access must be for numbers of bytes that are integer multiples of the sector size. For example, if the sector size is 512 bytes, an application can request reads and writes of 512, 1024, or 2048 bytes, but not of 335, 981, or 7171 bytes.
3. Buffer addresses for read and write operations must be sector-aligned, that is, aligned on memory addresses that are integer multiples of the sector size. One way to sector-align buffers is to allocate them with the VirtualAlloc function, which returns memory aligned on integer multiples of the system's memory page size. Because both the page size and the sector size are powers of 2, this memory is also aligned on integer multiples of the sector size. An application can call GetDiskFreeSpace to determine a volume's sector size.
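The three requirements can be checked before each unbuffered request. The sketch below is portable C with hypothetical helper names (`round_up_to_sector`, `unbuffered_request_ok`, `alloc_sector_aligned`); it assumes a power-of-two sector size passed in by the caller rather than queried with GetDiskFreeSpace, and it uses C11 `aligned_alloc` as a stand-in for VirtualAlloc so it builds outside Windows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Round n up to the next multiple of `sector` (a power of two such as 512). */
uint64_t round_up_to_sector(uint64_t n, uint64_t sector) {
    assert(sector != 0 && (sector & (sector - 1)) == 0);
    return (n + sector - 1) & ~(sector - 1);
}

/* Rules 1-3: the file offset, the byte count and the buffer address must
   all be integer multiples of the sector size. */
bool unbuffered_request_ok(uint64_t offset, uint64_t length,
                           const void *buffer, uint64_t sector) {
    assert(sector != 0 && (sector & (sector - 1)) == 0);
    return offset % sector == 0 &&
           length % sector == 0 &&
           (uintptr_t)buffer % sector == 0;
}

/* Allocate at least `length` bytes aligned to `alignment` (a power of two).
   aligned_alloc stands in for VirtualAlloc here; the size is rounded up
   because aligned_alloc expects a multiple of the alignment. Release the
   buffer with free(). */
void *alloc_sector_aligned(size_t length, size_t alignment) {
    return aligned_alloc(alignment, (size_t)round_up_to_sector(length, alignment));
}
```

With a 512-byte sector, `round_up_to_sector(981, 512)` gives 1024, the smallest legal transfer that covers 981 bytes.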

 

FILE_FLAG_RANDOM_ACCESS
Indicates that the file is accessed randomly. The system can use this as a hint to optimize file caching.

FILE_FLAG_SEQUENTIAL_SCAN
Indicates that the file is accessed sequentially from beginning to end. The system can use this as a hint to optimize file caching. If the application moves the file pointer for random access, optimum caching may not occur; however, correct operation is still guaranteed. Specifying this flag can improve performance for applications that read large files sequentially. The gain can be even more noticeable for applications that read large files mostly sequentially but occasionally skip over small ranges of bytes.

FILE_FLAG_DELETE_ON_CLOSE

Indicates that the system deletes the file immediately after all of its handles have been closed, not just the handle for which FILE_FLAG_DELETE_ON_CLOSE was specified. Subsequent requests to open the file fail unless they specify the FILE_SHARE_DELETE share mode.

FILE_FLAG_BACKUP_SEMANTICS

Windows NT: Indicates that the file is being opened or created for a backup or restore operation. The calling process can override file security checks only if it holds the relevant privileges, SE_BACKUP_NAME and SE_RESTORE_NAME. You can also use this flag to obtain a handle to a directory, which can be passed to some Win32 functions in place of a file handle.

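FILE_FLAG_POSIX_SEMANTICS

Indicates that the file is accessed according to POSIX rules, which includes allowing multiple files whose names differ only in case, on file systems that support such naming. Files created with this flag may not be accessible to applications written for MS-DOS or 16-bit Windows.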

FILE_FLAG_OPEN_REPARSE_POINT

Specifying this flag inhibits the reparse behavior of NTFS reparse points. This flag cannot be used with CREATE_ALWAYS.

FILE_FLAG_OPEN_NO_RECALL

Indicates that the file data is requested but should remain in remote storage; it should not be transported back to local storage. This flag is intended for use by remote storage systems or the Hierarchical Storage Management system.

 

   As you can see, many flags and attributes are available, but the two with the greatest effect on speed are FILE_FLAG_NO_BUFFERING and FILE_FLAG_OVERLAPPED.

 

   FILE_FLAG_NO_BUFFERING means that file operations bypass the Windows cache. FILE_FLAG_OVERLAPPED means that file operations are performed asynchronously: a read or write function can return before the I/O has completed. This requires the overlapped I/O mechanism, in which the program has to handle each I/O state separately, mainly with GetOverlappedResult and WaitForMultipleObjects.
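With an overlapped handle, ReadFile and WriteFile start at the position given in the OVERLAPPED structure, split into its Offset (low 32 bits) and OffsetHigh (high 32 bits) members. The sketch below shows only that split; `overlapped_position`, `make_position`, and `position_to_offset` are hypothetical names, with the struct standing in for those two members so the code compiles without <windows.h>.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the Offset/OffsetHigh members of the Win32
   OVERLAPPED structure. */
typedef struct {
    uint32_t Offset;     /* low 32 bits of the starting byte position */
    uint32_t OffsetHigh; /* high 32 bits of the starting byte position */
} overlapped_position;

/* Split a 64-bit byte offset into the two 32-bit members. */
overlapped_position make_position(uint64_t byte_offset) {
    overlapped_position pos;
    pos.Offset = (uint32_t)(byte_offset & 0xFFFFFFFFu);
    pos.OffsetHigh = (uint32_t)(byte_offset >> 32);
    return pos;
}

/* Reassemble the 64-bit offset, e.g. to identify which request completed. */
uint64_t position_to_offset(overlapped_position pos) {
    return ((uint64_t)pos.OffsetHigh << 32) | pos.Offset;
}
```

In real code these values go into an OVERLAPPED passed to ReadFile or WriteFile. If the call returns FALSE and GetLastError() reports ERROR_IO_PENDING, the request is still in progress, and GetOverlappedResult with bWait set to TRUE blocks until it completes.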

   With FILE_FLAG_NO_BUFFERING alone, copying a MB file takes about 22 seconds, close to 20 MB/s. However, FILE_FLAG_NO_BUFFERING restricts the file offset, the buffer, and the transfer size: they must be aligned to the sector size (see the requirements listed above). Otherwise the read or write fails. This does add a fair amount of memory-allocation work, but the speed increases significantly.

   When I used FILE_FLAG_OVERLAPPED to split a file into several parts and read and write them simultaneously, the copy was slow. As noted at the beginning, this is a limitation of the hard disk itself. However, the source code of FastCopy (a free file-copy tool) also opens multiple files for simultaneous reading and writing, and it is not slow. The reason still needs to be investigated.

   Everything above concerns local disk operations with no network limits. When copying files from a server, the network becomes the biggest bottleneck. My idea for that case is this: the server's disk should read much faster than a desktop machine's, so the file can be read in several segments at once to use the full network bandwidth, while the data is written to the local file serially and contiguously. That way the network is fully used and the local disk avoids the slowdown caused by scattered writes. Of course, the actual results still need to be tested at the company.
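The segmented-read idea can be planned with a little arithmetic. This is a sketch under stated assumptions: `file_segment` and `plan_segments` are hypothetical names, each segment is meant to be fetched by a separate reader, and segment boundaries are kept on sector multiples so the local writer could still meet the unbuffered alignment rules. Only the last segment may end at an unaligned file size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t offset; /* first byte of the segment (a sector multiple) */
    uint64_t length; /* bytes in the segment */
} file_segment;

/* Split a file of file_size bytes into at most `count` contiguous segments
   whose start offsets are multiples of `sector` (a power of two).
   `out` must have room for `count` entries. Returns the number written. */
size_t plan_segments(uint64_t file_size, size_t count, uint64_t sector,
                     file_segment *out) {
    assert(count > 0 && sector != 0 && (sector & (sector - 1)) == 0);
    uint64_t sectors = (file_size + sector - 1) / sector; /* total, rounded up */
    uint64_t per = (sectors + count - 1) / count;          /* sectors per segment */
    uint64_t step = per * sector;                          /* bytes per segment */
    size_t n = 0;
    for (uint64_t start = 0; start < file_size; start += step) {
        uint64_t end = start + step;
        if (end > file_size)
            end = file_size; /* last segment stops at end of file */
        out[n].offset = start;
        out[n].length = end - start;
        n++;
    }
    return n;
}
```

For a 10,000-byte file split four ways on 512-byte sectors, this gives three 2,560-byte segments and a final 2,320-byte one; the writer can then append them to the local file in order.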
