Abstract: This article presents a convenient and practical way to read and store very large files using memory-mapped files, and walks through the implementation with program code.
Introduction
File operations are among the most basic functions of an application. Both the Win32 API and MFC provide functions and classes for file processing; commonly used ones include CreateFile(), WriteFile() and ReadFile() from the Win32 API and the CFile class provided by MFC. In general these are sufficient for most scenarios. In some specialized application fields, however, the amount of data to be stored may reach dozens of GB, hundreds of GB, or even several TB, and handling files of that size with the ordinary file processing methods is clearly not feasible. At present, operations on such large files are generally handled with memory-mapped files. This article discusses this core Windows programming technique.
Memory-Mapped Files
A memory-mapped file is similar to virtual memory. It lets you reserve a region of the address space and commit physical storage to that region; the difference is that the physical storage backing a memory-mapped file comes from a file that already exists on disk rather than from the system page file. Before you can operate on the file you must map it, after which it behaves as if the entire file had been loaded from disk into memory. When you process a file on disk through a memory-mapped file you therefore do not have to perform explicit file I/O operations, which means that you do not have to request and allocate a buffer for the file yourself. All file caching is managed directly by the system. Because the steps of loading file data into memory, writing data back from memory to the file, and releasing memory blocks are eliminated, memory-mapped files play an important role when processing large volumes of data. In addition, real projects often need to share data among multiple processes. When the amount of data is small, many flexible approaches are available; when the shared data is large, a memory-mapped file is needed. In fact, memory-mapped files are the most effective way to share data between multiple processes on the same machine.
A memory-mapped file is not a simple file I/O operation; it actually relies on a core Windows programming technology, memory management. To understand memory-mapped files in depth, you therefore need a clear understanding of the memory management mechanism of the Windows operating system. Memory management is a complex topic that is beyond the scope of this article, so it is not covered in detail here; interested readers can refer to other books on the subject. The following describes how to use memory-mapped files:
First, call CreateFile() to create or open a file kernel object, which identifies the file on disk that will be used as the memory-mapped file. CreateFile() tells the operating system where the file's backing storage is located, but it specifies only the path of the file and not the length of the mapping. To specify the amount of physical storage the file-mapping object requires, call CreateFileMapping() to create a file-mapping kernel object, which tells the system the size of the file and how it will be accessed. After the file-mapping object has been created, a region of address space must be reserved for the file data, and the file data must be committed as the physical storage mapped to that region. MapViewOfFile() does this: under the system's management it maps all or part of the file-mapping object into the address space of the process. From this point on, working with the memory-mapped file is essentially the same as working with file data that has been loaded into memory in the usual way. When you have finished with the memory-mapped file, you need to clean up and release the resources that were used. This part is relatively simple: call UnmapViewOfFile() to unmap the file data from the address space of the process, and call CloseHandle() to close the file-mapping object and the file object created earlier.
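The sequence described above can be sketched as follows. This is a minimal illustration rather than the article's own code: the function name fillFileThroughMapping, its parameters, and the choice to fill the file with a single byte value are assumptions made for the example, and the ANSI variants of the API (CreateFileA, CreateFileMappingA) are used so that a plain char path works. The #ifdef guard only keeps the listing compilable on non-Windows toolchains.

```cpp
#ifdef _WIN32
#include <windows.h>
#include <cassert>
#include <cstring>

// Hypothetical helper: creates `path` with `size` bytes (size > 0) and fills
// it with `value` through a memory-mapped view. Returns true on success.
bool fillFileThroughMapping(const char* path, DWORD size, unsigned char value)
{
    assert(path != NULL);

    // 1. Create (or overwrite) the file kernel object.
    HANDLE hFile = CreateFileA(path, GENERIC_READ | GENERIC_WRITE,
                               FILE_SHARE_READ, NULL, CREATE_ALWAYS,
                               FILE_ATTRIBUTE_NORMAL, NULL);
    if (hFile == INVALID_HANDLE_VALUE)
        return false;

    // 2. Create the file-mapping kernel object; this also grows the file
    //    on disk to `size` bytes.
    HANDLE hMapping = CreateFileMappingA(hFile, NULL, PAGE_READWRITE,
                                         0, size, NULL);
    // The mapping object keeps the file open, so the file handle can be closed.
    CloseHandle(hFile);
    if (hMapping == NULL)   // failure is reported as NULL here
        return false;

    // 3. Map the whole object into the address space of the process.
    void* pView = MapViewOfFile(hMapping, FILE_MAP_ALL_ACCESS, 0, 0, size);
    bool ok = (pView != NULL);
    if (ok)
    {
        // 4. The view is now used like ordinary memory.
        std::memset(pView, value, size);
        // 5. Release the view.
        UnmapViewOfFile(pView);
    }

    // 6. Close the file-mapping object.
    CloseHandle(hMapping);
    return ok;
}
#endif
```

Note that CreateFile() signals failure with INVALID_HANDLE_VALUE while CreateFileMapping() and MapViewOfFile() signal failure with NULL, so the three checks differ.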
Memory-Mapped File Functions
The APIs used with memory-mapped files are mainly the functions mentioned above. Each is described below:
HANDLE CreateFile(LPCTSTR lpFileName,
    DWORD dwDesiredAccess,
    DWORD dwShareMode,
    LPSECURITY_ATTRIBUTES lpSecurityAttributes,
    DWORD dwCreationDisposition,
    DWORD dwFlagsAndAttributes,
    HANDLE hTemplateFile);
CreateFile() is often used to create and open files in ordinary file operations as well. When working with memory-mapped files, this function creates or opens a file kernel object and returns its handle. When calling it, set the dwDesiredAccess and dwShareMode parameters according to whether the data needs to be read or written and whether the file is to be shared; incorrect parameter settings will cause later operations to fail.
HANDLE CreateFileMapping(HANDLE hFile,
    LPSECURITY_ATTRIBUTES lpFileMappingAttributes,
    DWORD flProtect,
    DWORD dwMaximumSizeHigh,
    DWORD dwMaximumSizeLow,
    LPCTSTR lpName);
The CreateFileMapping() function creates a file-mapping kernel object. The hFile parameter specifies the handle of the file to be mapped into the process address space (this handle is the return value of CreateFile()). Because the physical storage of a memory-mapped file is a file on disk rather than storage allocated from the system page file, the system does not reserve a region of address space for it on its own initiative, and the file's storage is not automatically mapped to such a region. So that the system can determine the protection attributes of the pages, you must set the flProtect parameter. The protection attributes PAGE_READONLY, PAGE_READWRITE and PAGE_WRITECOPY indicate that, once the file-mapping object is mapped, the file data can be read, read and written, or read and written copy-on-write, respectively. With PAGE_READONLY, CreateFile() must have been called with GENERIC_READ; with PAGE_READWRITE, CreateFile() must have been called with GENERIC_READ | GENERIC_WRITE; with PAGE_WRITECOPY, CreateFile() must have been called with GENERIC_READ or with GENERIC_READ | GENERIC_WRITE. The DWORD parameters dwMaximumSizeHigh and dwMaximumSizeLow are also very important: together they specify the maximum size of the file in bytes. Because the two parameters form a 64-bit value, the maximum supported file length is 16 EB, which meets the needs of almost any large data file.
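Because the size is passed as two 32-bit halves, a 64-bit byte count has to be split before the call. The following portable sketch shows only that arithmetic; SizeParts, splitSize and joinSize are hypothetical names introduced for illustration and are not part of the Win32 API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: the two 32-bit halves that CreateFileMapping()
// expects for dwMaximumSizeHigh and dwMaximumSizeLow.
struct SizeParts
{
    std::uint32_t high;  // upper 32 bits, for dwMaximumSizeHigh
    std::uint32_t low;   // lower 32 bits, for dwMaximumSizeLow
};

// Recombine the two halves into the original 64-bit byte count.
inline std::uint64_t joinSize(SizeParts parts)
{
    return (static_cast<std::uint64_t>(parts.high) << 32) | parts.low;
}

// Split a 64-bit byte count into its high and low 32-bit halves.
inline SizeParts splitSize(std::uint64_t bytes)
{
    SizeParts parts;
    parts.high = static_cast<std::uint32_t>(bytes >> 32);
    parts.low  = static_cast<std::uint32_t>(bytes & 0xFFFFFFFFu);
    assert(joinSize(parts) == bytes);  // round-trip sanity check
    return parts;
}
```

For a 30 GB file (30 x 2^30 bytes, that is 0x780000000) the split gives a high part of 7 and a low part of 0x80000000. Passing only the low part with a high part of 0 would create a 2 GB mapping instead.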
LPVOID MapViewOfFile(HANDLE hFileMappingObject,
    DWORD dwDesiredAccess,
    DWORD dwFileOffsetHigh,
    DWORD dwFileOffsetLow,
    SIZE_T dwNumberOfBytesToMap);
The MapViewOfFile() function maps file data into the address space of the process. The hFileMappingObject parameter is the handle of the file-mapping object returned by CreateFileMapping(). The dwDesiredAccess parameter again specifies how the file data will be accessed, and it must be compatible with the protection attribute set in CreateFileMapping(). Although specifying the protection twice may seem redundant, it gives the application finer control over how the data is protected. MapViewOfFile() can map all of the file or only part of it. When mapping, you must specify the offset into the data file and the length to be mapped. The file offset is a 64-bit value made up of the DWORD parameters dwFileOffsetHigh and dwFileOffsetLow, and it must be an integer multiple of the operating system's allocation granularity. On Windows the allocation granularity is normally 64 KB, but you can also obtain the value for the current system dynamically with the following code:
SYSTEM_INFO sinf;
GetSystemInfo(&sinf);
DWORD dwAllocationGranularity = sinf.dwAllocationGranularity;
The dwNumberOfBytesToMap parameter specifies how many bytes of the data file to map. Note that on Windows 9x, if MapViewOfFile() cannot find a region large enough to hold the entire file-mapping object, it returns NULL. On Windows 2000, however, MapViewOfFile() only needs to find a region large enough for the requested view, regardless of the size of the entire file-mapping object.
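Since the view offset must be a multiple of the allocation granularity, an arbitrary file position usually has to be rounded down, and the difference skipped inside the view. The sketch below shows that calculation in portable C++; AlignedView and alignView are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type describing where a view must start so that it
// covers `length` bytes beginning at an arbitrary file offset.
struct AlignedView
{
    std::uint64_t viewBase;    // aligned file offset to pass to MapViewOfFile()
    std::uint64_t delta;       // bytes to skip inside the view to reach the target
    std::uint64_t viewLength;  // bytes to map so that the wanted range is covered
};

// Round `offset` down to a multiple of `granularity` and enlarge the mapped
// length by the amount that was rounded away.
inline AlignedView alignView(std::uint64_t offset,
                             std::uint64_t length,
                             std::uint64_t granularity)
{
    assert(granularity != 0);
    AlignedView view;
    view.viewBase   = offset - (offset % granularity);
    view.delta      = offset - view.viewBase;
    view.viewLength = view.delta + length;
    return view;
}
```

With a granularity of 65536, a request for 4096 bytes at file offset 100000 yields a view base of 65536 and a delta of 34464, so the data of interest starts at the returned view pointer plus 34464.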
After you have finished processing the file mapped into the process address space, release the mapped file data with the UnmapViewOfFile() function. Its prototype is as follows:
BOOL UnmapViewOfFile(LPCVOID lpBaseAddress);
The only parameter, lpBaseAddress, specifies the base address of the mapped region and must be the value returned by MapViewOfFile(). Every call to MapViewOfFile() must be paired with a corresponding call to UnmapViewOfFile(); otherwise the reserved region is not released until the process ends. In addition, the file kernel object and the file-mapping kernel object created earlier by CreateFile() and CreateFileMapping() must be released with CloseHandle() before the process ends; otherwise resources may leak.
In addition to these required API functions, other auxiliary functions can be chosen as needed when working with memory-mapped files. For example, to improve speed the system caches the file's data pages and does not update the file on disk immediately while the mapped view is being modified. To deal with this, you can call FlushViewOfFile(), which forces the system to write some or all of the modified data back to the file on disk, so that all updates are saved to disk in a timely manner.
Example of Processing Large Files with Memory-Mapped Files
The following example shows how memory-mapped files are used in practice. The program receives data from a port and stores it on disk in real time. Because the amount of data is large (dozens of GB), a memory-mapped file is used. The code below shows the main part of the worker thread function MainProc, which runs once the thread is started. When data arrives on the port, the event hEvent[0] is signaled, WaitForMultipleObjects() returns, and the received data is saved to disk. When reception is to stop, the event hEvent[1] is signaled, and the handler for this event is responsible for releasing resources and closing the file. The implementation of the thread function is as follows:
......
// Create the file kernel object; the handle is stored in hFile
HANDLE hFile = CreateFile("recv1.zip",
    GENERIC_WRITE | GENERIC_READ,
    FILE_SHARE_READ,
    NULL,
    CREATE_ALWAYS,
    FILE_FLAG_SEQUENTIAL_SCAN,
    NULL);
// Create the file-mapping kernel object and save the handle in hFileMapping
HANDLE hFileMapping = CreateFileMapping(hFile, NULL, PAGE_READWRITE,
    0, 0x4000000, NULL);
// Release the file kernel object
CloseHandle(hFile);
// Set the size, offset, and related parameters
__int64 qwFileSize = 0x4000000;
__int64 qwFileOffset = 0;
// File offset at which the current view starts; always a multiple of the allocation granularity
__int64 qwViewBase = 0;
__int64 T = 600 * sinf.dwAllocationGranularity;
DWORD dwBytesInBlock = 1000 * sinf.dwAllocationGranularity;
// Map the file data into the address space of the process
PBYTE pbFile = (PBYTE)MapViewOfFile(hFileMapping,
    FILE_MAP_ALL_ACCESS,
    (DWORD)(qwViewBase >> 32), (DWORD)(qwViewBase & 0xFFFFFFFF), dwBytesInBlock);
while (bLoop)
{
    // Wait for event hEvent[0] or event hEvent[1]
    DWORD ret = WaitForMultipleObjects(2, hEvent, FALSE, INFINITE);
    ret -= WAIT_OBJECT_0;
    switch (ret)
    {
    // Data-received event
    case 0:
        // Receive data from the port and save it to the memory-mapped file.
        // pbFile points at file offset qwViewBase, so the write position is relative to it.
        nReadLen = syio_read(port[1], pbFile + (qwFileOffset - qwViewBase), queuelen);
        qwFileOffset += nReadLen;
        // Once 60% of the view has been written, create a new view to prevent the data from overflowing it
        if (qwFileOffset > T)
        {
            T = qwFileOffset + 600 * sinf.dwAllocationGranularity;
            UnmapViewOfFile(pbFile);
            // The view offset must be a multiple of the allocation granularity
            qwViewBase = qwFileOffset - qwFileOffset % sinf.dwAllocationGranularity;
            // A view must not extend past the end of the file-mapping object
            __int64 qwRemain = qwFileSize - qwViewBase;
            DWORD dwViewLen = qwRemain < dwBytesInBlock ? (DWORD)qwRemain : dwBytesInBlock;
            pbFile = (PBYTE)MapViewOfFile(hFileMapping,
                FILE_MAP_ALL_ACCESS,
                (DWORD)(qwViewBase >> 32), (DWORD)(qwViewBase & 0xFFFFFFFF), dwViewLen);
        }
        break;
    // Termination event
    case 1:
        bLoop = FALSE;
        // Unmap the file data from the address space of the process
        UnmapViewOfFile(pbFile);
        // Close the file-mapping object
        CloseHandle(hFileMapping);
        break;
    }
}
...
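The bookkeeping behind the remapping step can be isolated from the Win32 calls and checked on its own. The sketch below is an illustration, not the article's code: the SlidingView type and its members are hypothetical names, the 60% threshold is measured from the start of the current view, and the new view base is rounded down to the allocation granularity because MapViewOfFile() requires an aligned offset.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bookkeeping for a sliding view; it performs no Win32 calls.
struct SlidingView
{
    std::uint64_t granularity;  // allocation granularity in bytes
    std::uint64_t viewLength;   // bytes mapped per view
    std::uint64_t remapAfter;   // remap once this many bytes of a view are used
    std::uint64_t viewBase;     // file offset where the current view starts
    std::uint64_t fileOffset;   // next file offset to be written
    unsigned      remapCount;   // number of times the view was moved

    SlidingView(std::uint64_t gran,
                std::uint64_t viewGranules,
                std::uint64_t remapGranules)
        : granularity(gran),
          viewLength(viewGranules * gran),
          remapAfter(remapGranules * gran),
          viewBase(0),
          fileOffset(0),
          remapCount(0)
    {
        assert(gran != 0 && remapGranules < viewGranules);
    }

    // Offset inside the current view at which the next byte is written.
    std::uint64_t offsetInView() const { return fileOffset - viewBase; }

    // Record that `bytes` were written into the view. Returns true when the
    // caller should unmap the old view and map a new one at viewBase.
    bool advance(std::uint64_t bytes)
    {
        assert(offsetInView() + bytes <= viewLength);  // the write must fit
        fileOffset += bytes;
        if (offsetInView() <= remapAfter)
            return false;
        // Move the view: its base must be a multiple of the granularity.
        viewBase = fileOffset - (fileOffset % granularity);
        ++remapCount;
        return true;
    }
};
```

With a 64 KB granularity, SlidingView(65536, 1000, 600) mirrors the 1000-granule view and 600-granule threshold used in the listing. Whenever advance() returns true, the caller would call UnmapViewOfFile(), map a new view at viewBase, and write subsequent data at the new view pointer plus offsetInView().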
During handling of the termination event, simply calling UnmapViewOfFile() and CloseHandle() is not enough, because the actual size of the data is not recorded in the file: if the memory-mapped file was opened with a size of 30 GB but only 14 GB of data was received, the file stored on disk is still 30 GB long after the program above has run. In other words, once processing is complete the file must be restored to its actual size by going through a memory-mapped file again. The following is the main code that does this:
// Create another file kernel object
hFile2 = CreateFile("recv.zip",
    GENERIC_WRITE | GENERIC_READ,
    FILE_SHARE_READ,
    NULL,
    CREATE_ALWAYS,
    FILE_FLAG_SEQUENTIAL_SCAN,
    NULL);
// Create another file-mapping kernel object whose size is the actual data length
hFileMapping2 = CreateFileMapping(hFile2,
    NULL,
    PAGE_READWRITE,
    (DWORD)(qwFileOffset >> 32),
    (DWORD)(qwFileOffset & 0xFFFFFFFF),
    NULL);
// Close the file kernel object
CloseHandle(hFile2);
// Map the file data into the address space of the process
pbFile2 = (PBYTE)MapViewOfFile(hFileMapping2,
    FILE_MAP_ALL_ACCESS,
    0, 0, (SIZE_T)qwFileOffset);
// Copy the data from the original memory-mapped file to this one.
// A single copy like this is only valid while pbFile still maps the file from
// offset 0 and all of the data fits in one view; otherwise copy view by view.
memcpy(pbFile2, pbFile, (size_t)qwFileOffset);
// Unmap the file data from the address space of the process
UnmapViewOfFile(pbFile);
UnmapViewOfFile(pbFile2);
// Close the file-mapping objects
CloseHandle(hFileMapping);
CloseHandle(hFileMapping2);
// Delete the temporary file
DeleteFile("recv1.zip");
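As a point of comparison, and not the method used in this article, the Win32 API can also shrink a file in place once all views have been unmapped and the file-mapping object has been closed, which avoids copying the data into a second file. The sketch below is a hedged illustration: truncateToActualSize is a hypothetical helper name, the ANSI function variants are used for simplicity, and SetFilePointerEx() is not available on Windows 9x, where SetFilePointer() with separate high and low parts would be needed instead. The #ifdef guard only keeps the listing compilable on non-Windows toolchains.

```cpp
#ifdef _WIN32
#include <windows.h>
#include <cassert>

// Hypothetical helper: shrink `path` to `actualSize` bytes. Every view must be
// unmapped and every file-mapping handle closed before this is called,
// otherwise SetEndOfFile() fails.
bool truncateToActualSize(const char* path, LONGLONG actualSize)
{
    assert(path != NULL && actualSize >= 0);

    // Reopen the file for writing; exclusive access while it is resized.
    HANDLE hFile = CreateFileA(path, GENERIC_WRITE, 0, NULL,
                               OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hFile == INVALID_HANDLE_VALUE)
        return false;

    // Move the file pointer to the end of the valid data...
    LARGE_INTEGER pos;
    pos.QuadPart = actualSize;
    BOOL ok = SetFilePointerEx(hFile, pos, NULL, FILE_BEGIN);

    // ...and make that position the new end of the file.
    if (ok)
        ok = SetEndOfFile(hFile);

    CloseHandle(hFile);
    return ok != FALSE;
}
#endif
```

In the example above this would be called with "recv1.zip" and qwFileOffset after the cleanup in the termination handler, in which case the second file and the memcpy() would not be needed.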
Conclusion
In actual tests, memory-mapped files performed well when processing large volumes of data and showed clear advantages over file processing with the CFile class and functions such as ReadFile() and WriteFile(). The code described in this article was compiled with Microsoft Visual C++ 6.0 under Windows 98.