---- Note: the author's original program had some problems; the program in this article has been corrected ----
Using memory-mapped files in VC to process large files
Abstract: This article uses memory-mapped files to access very large files. It also introduces the concept of memory-mapped files and the general programming procedure in detail.
Keywords: memory-mapped file; large file processing; allocation granularity
Introduction
File operations are among the most basic functions of an application. Both the Win32 API and MFC provide functions and classes for file processing. Commonly used ones include the Win32 API functions CreateFile(), WriteFile(), and ReadFile(), and the CFile class provided by MFC. These generally meet the needs of most scenarios. In some specialized application fields, however, the data to be stored may amount to dozens of GB, hundreds of GB, or even several TB, and the ordinary file processing methods are clearly not feasible. Operations on files this large are currently handled with memory-mapped files. This article discusses this core Windows programming technique.
Memory-mapped file overview
Memory-mapped files are also one of the Windows memory management mechanisms. They provide a unified memory management feature that lets an application access a file on disk through memory pointers, just as if the file had been loaded into memory. With file mapping, you can map all or part of a disk file into a region of the process's virtual address space and access the mapped file directly, without performing file I/O operations or buffering the file content yourself. Memory-mapped files are well suited to managing very large files.
When memory-mapped files are used for I/O, the system transfers data in units of pages. All memory pages are managed by the virtual memory manager, which decides when a page is paged out to disk, which pages should be released to free space for other processes, and how many pages each process may have beyond the physical memory actually allocated to it. Because the virtual memory manager handles all disk I/O in a uniform way (reading and writing memory in units of pages), it is optimized to perform these operations quickly.
All I/O on a memory-mapped file takes place in memory and uses ordinary memory addresses. The periodic paging to and from disk is carried out by the operating system in the background and is completely transparent to the application. This property makes memory-mapped files highly efficient for disk operations on large files.
Note that during the system's normal paging operations a memory-mapped file is not static; it is updated regularly. If a page the system needs is currently occupied by a memory-mapped file, the system releases that page. If the page's data has not yet been saved, the system automatically writes it to disk before releasing the page.
For Windows operating systems, which use paged virtual memory management, memory-mapped files are an extension of the existing memory management components. An application consists of executable code pages and data pages, which the operating system can swap into or out of memory as needed. If a page in memory is no longer needed, the operating system takes it away from its original user and releases it for other processes to use. The page is read again from the executable file on disk only when it is needed again. Similarly, when a process is initialized and started, memory pages are used to store the application's static and dynamic data. Once these pages are committed, they are backed by the system page file, in much the same way that the executable file backs the code pages. Figure 1 shows how code pages and data pages are backed by disk storage:
Figure 1: Backing of process code pages and data pages on disk storage
Clearly, handling code pages and data pages in the same way improves the execution efficiency of a program, and memory-mapped files make this possible.
Management of large files
You do not have to unmap all views of a memory-mapped file before closing the file-mapping object. Before the object is released, all dirty pages are automatically written to disk. Closing the file-mapping object with CloseHandle() releases only that object; if the memory-mapped file represents a disk file, you must also close the file itself with the standard file I/O function. Memory-mapped files show a clear advantage when processing large files: they consume very few physical resources and have little impact on the system. The general programming procedure for memory-mapped files is as follows:
Figure 2: General procedure for using memory-mapped files
In some specialized industries, a 32-bit process often has to handle massive files of several GB or even dozens of GB. The virtual address space of a 32-bit process is only 2^32 bytes = 4 GB, so the whole file obviously cannot be mapped at once. In this case, the only option is to map the parts of the large file, one after another, into a small region of the process's address space. This requires the following changes to the general procedure above:
1) Map a view starting at the beginning of the file.
2) Access the view.
3) Unmap the view.
4) Map a new view starting at a larger offset in the file.
5) Repeat steps 2 through 4 until all the data in the file has been accessed.
The following code, written according to this description, processes files larger than 4 GB:
    // Select a file
    CFileDialog fileDlg(TRUE, "*.txt", "*.txt", NULL, "Text files (*.txt)|*.txt||", this);
    fileDlg.m_ofn.Flags |= OFN_FILEMUSTEXIST;
    fileDlg.m_ofn.lpstrTitle = "Read data through memory-mapped files";
    if (fileDlg.DoModal() == IDOK)
    {
        // Create the file object
        HANDLE hFile = CreateFile(fileDlg.GetPathName(), GENERIC_READ | GENERIC_WRITE,
            0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
        if (hFile == INVALID_HANDLE_VALUE)
        {
            TRACE("Failed to create file object, error code: %d\r\n", GetLastError());
            return;
        }
        // Create the file-mapping object (size 0, 0 means the current file size)
        HANDLE hFileMap = CreateFileMapping(hFile, NULL, PAGE_READWRITE, 0, 0, NULL);
        if (hFileMap == NULL)
        {
            TRACE("Failed to create file-mapping object, error code: %d\r\n", GetLastError());
            CloseHandle(hFile);   // do not leak the file handle on failure
            return;
        }
        // Obtain the system allocation granularity
        SYSTEM_INFO sysInfo;
        GetSystemInfo(&sysInfo);
        DWORD dwGran = sysInfo.dwAllocationGranularity;
        // Obtain the file size (combine the high and low 32 bits)
        DWORD dwFileSizeHigh;
        __int64 qwFileSize = GetFileSize(hFile, &dwFileSizeHigh);
        qwFileSize |= (((__int64)dwFileSizeHigh) << 32);
        // Close the file object; the mapping object keeps the file open
        CloseHandle(hFile);
        // Offset of the current view within the file
        __int64 qwFileOffset = 0;
        // Block size (the offset stays a multiple of the allocation granularity)
        DWORD dwBlockBytes = dwGran;
        while (qwFileSize > 0)
        {
            // The last block may be shorter than the allocation granularity
            if (qwFileSize < dwGran)
                dwBlockBytes = (DWORD)qwFileSize;
            // Map a view
            LPBYTE lpbMapAddress = (LPBYTE)MapViewOfFile(hFileMap, FILE_MAP_ALL_ACCESS,
                (DWORD)(qwFileOffset >> 32), (DWORD)(qwFileOffset & 0xFFFFFFFF), dwBlockBytes);
            if (lpbMapAddress == NULL)
            {
                TRACE("Failed to map view of file, error code: %d\r\n", GetLastError());
                CloseHandle(hFileMap);   // do not leak the mapping handle on failure
                return;
            }
            // Access the mapped view
            for (DWORD i = 0; i < dwBlockBytes; i++)
            {
                BYTE temp = *(lpbMapAddress + i);
            }
            // Unmap the view
            UnmapViewOfFile(lpbMapAddress);
            // Advance to the next block
            qwFileOffset += dwBlockBytes;
            qwFileSize -= dwBlockBytes;
        }
        // Close the file-mapping object handle
        CloseHandle(hFileMap);
        AfxMessageBox("The file was accessed successfully");
    }
In this example, GetFileSize() returns the high 32 bits and the low 32 bits of the 64-bit length of the file being processed. In the mapping loop, the size of each mapped block is set to the allocation granularity (a multiple of it, such as 1000 times the allocation granularity, would also work). If the remaining length of the file is smaller than the block size, the block size is set to the actual remaining length. During processing, mapping, accessing, and unmapping form a loop. After all the file blocks have been processed, the file-mapping object is closed to clean up. Memory file mapping relies on functions such as CreateFileMapping() and MapViewOfFile().
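The block arithmetic behind that loop can be sketched in portable C++ without any Win32 calls. This is only an illustration: the helper names BlockCount and BlockLength are hypothetical and do not appear in the program above, and the block size is assumed to be non-zero.

```cpp
#include <cstdint>

// Hypothetical helper: number of views needed to cover fileSize bytes
// when each view is at most blockSize bytes long (blockSize must be > 0).
constexpr std::uint64_t BlockCount(std::uint64_t fileSize, std::uint32_t blockSize) {
    return (fileSize + blockSize - 1) / blockSize;
}

// Hypothetical helper: length of the view that starts at `offset`.
// Every view is blockSize bytes long except possibly the last one,
// which covers only the bytes that remain in the file.
constexpr std::uint32_t BlockLength(std::uint64_t fileSize, std::uint64_t offset,
                                    std::uint32_t blockSize) {
    return (fileSize - offset < blockSize)
               ? static_cast<std::uint32_t>(fileSize - offset)
               : blockSize;
}
```

For instance, with a block size of 65536 bytes, a file of 65537 bytes needs two views, and the second view is a single byte long.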
The following describes these key functions:
1) CreateFile(): CreateFile() is a widely used function. Its use here is nothing special, but note the following points. First, the access mode parameter dwDesiredAccess sets the type of access to the file kernel object. It can be set to GENERIC_READ, GENERIC_WRITE, GENERIC_READ | GENERIC_WRITE, or 0 (device query access only). For a memory-mapped file, the file must be opened with read access, so only GENERIC_READ or GENERIC_READ | GENERIC_WRITE can be used. Second, the share mode parameter dwShareMode defines how the file kernel object is shared. It can be set to FILE_SHARE_READ, FILE_SHARE_WRITE, or 0, and the flags can be combined. When it is 0, the object cannot be shared; FILE_SHARE_READ and FILE_SHARE_WRITE allow the object to be shared only with others that request read access or write access, respectively.
Because memory-mapped files can be used to share data among multiple processes, consider the effect of the dwShareMode setting on the results when writing such an application.
2) CreateFileMapping(): This function creates a file-mapping kernel object and tells the system how much physical storage the file-mapping object requires. Creating a file-mapping object has almost no impact on system resources and does not affect the process's virtual address space. Apart from the internal resources needed to represent the object, virtual memory is usually not allocated for it. However, if the file-mapping object is used as shared memory, the system reserves enough space in the system page file for it when the object is created.
The first parameter, hFile, is the handle of the file to be mapped into the process's address space. Creating a memory-mapped file is like reserving a region of address space and committing physical storage to it, except that the physical storage comes from a file on disk rather than from the system page file. The second parameter is a pointer to the SECURITY_ATTRIBUTES structure for the file-mapping kernel object, which determines whether child processes can inherit the returned handle. NULL is usually passed, which applies the default security attributes and prevents the returned handle from being inherited.
The next parameter sets the protection attribute that the file's views will have once the file is mapped. The possible values are PAGE_READONLY, PAGE_READWRITE, and PAGE_WRITECOPY. When a file-mapping object is created, the system does not reserve a region of address space for it, nor does it map the file's storage to such a region. However, when the system later maps the storage into the process's address space, it must know exactly which protection attribute to give the physical storage pages. The protection attribute must match the access mode specified when the file was opened with CreateFile(); otherwise CreateFileMapping() fails. PAGE_READWRITE is used here. In addition to the three page protection attributes above, there are four section protection attributes that can be combined with them:
Section protection attribute | Description
SEC_COMMIT | Allocates physical storage, in memory or in the page file on disk, for all pages in the section
SEC_IMAGE | Tells the system that the mapped file is a portable executable (PE) file image
SEC_NOCACHE | Tells the system not to cache any pages of the memory-mapped file; mostly used by hardware device driver developers
SEC_RESERVE | Reserves all pages in the section without allocating physical storage
The next two parameters specify the high 32 bits and the low 32 bits of the maximum size, in bytes, of the file-mapping object to be created; in effect they set the maximum size of the file (files of up to 16 EB can be handled). These two parameters ensure that the file-mapping object can obtain enough physical storage. If the specified size is smaller than the actual size of the file, the system maps only the specified number of bytes from the file. If both parameters are 0, the file-mapping object is created with the current size of the file. In both of these cases the file size does not change. If the specified size is larger than the actual file size, the system enlarges the file before CreateFileMapping() returns. Note that the size of a file-mapping object is static and cannot be changed once the object has been created. If the file-mapping object is made too small, the file cannot be accessed in full.
As mentioned earlier in this section, creating a file-mapping object costs almost no system resources. Therefore, following the principle of "better too much than too little", the size of the file-mapping object should generally be set to the size of the file. The last parameter gives the mapping object a name. An existing file-mapping object can be opened only if it has been named. The only requirement on the name string is that it must not already be used by another object.
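As a small illustration of the high/low convention, the following portable C++ sketch splits a 64-bit size into the two 32-bit values that CreateFileMapping() expects (its dwMaximumSizeHigh and dwMaximumSizeLow parameters) and joins the two halves reported by GetFileSize() back into one value. The helper names JoinParts, HighPart, and LowPart are hypothetical and chosen only for this example.

```cpp
#include <cstdint>

// Hypothetical helper: combine a high and a low 32-bit half into one
// 64-bit value, as done with the two results of GetFileSize().
constexpr std::uint64_t JoinParts(std::uint32_t high, std::uint32_t low) {
    return (static_cast<std::uint64_t>(high) << 32) | low;
}

// Hypothetical helper: upper 32 bits of a 64-bit size or offset.
constexpr std::uint32_t HighPart(std::uint64_t value) {
    return static_cast<std::uint32_t>(value >> 32);
}

// Hypothetical helper: lower 32 bits of a 64-bit size or offset.
constexpr std::uint32_t LowPart(std::uint64_t value) {
    return static_cast<std::uint32_t>(value & 0xFFFFFFFFu);
}
```

A size of exactly 4 GB, for example, has a high part of 1 and a low part of 0. The cast to a 64-bit type before shifting matters: shifting a 32-bit value left by 32 bits would not produce the intended result.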
3) MapViewOfFile(): Once a file-mapping object has been created and a valid handle obtained, the handle can be used to map a view of the file into the process's virtual address space. As long as the file-mapping object exists, views can be mapped and unmapped at will. When a view is mapped, the system must reserve a region of address space for the file's data and commit the file's data as the physical storage mapped to that region. A contiguous range of the process's address space that is large enough (usually large enough to cover the whole view) is assigned to the view. Physical memory pages, however, are allocated only as they are actually needed. The physical page that corresponds to a page of the view is allocated when a page fault occurs, which happens automatically the first time any address on that page is read or written. MapViewOfFile() is the function that maps a view of the memory-mapped file.
The first parameter of the function is the file-mapping handle returned by CreateFileMapping(). The second parameter specifies the access type of the view; possible values include FILE_MAP_WRITE, FILE_MAP_READ, FILE_MAP_ALL_ACCESS, and FILE_MAP_COPY. The right choice depends on the protection allowed by the file-mapping object. Given the settings in the code above, FILE_MAP_ALL_ACCESS is used here. This mechanism lets the creator of the object control how it can be mapped. The next two parameters specify the high 32 bits and the low 32 bits of the 64-bit file offset, that is, the distance from the beginning of the memory-mapped file to the start of the view. The offset must be a multiple of the system allocation granularity, which is why the code above obtains the granularity with GetSystemInfo(). The last parameter specifies the size of the view. If it is 0, the view extends from the specified offset to the end of the file-mapping object. If MapViewOfFile() succeeds, it returns a pointer to the starting address of the view in the process's address space; if it fails, it returns NULL. A process can create multiple views of the same file-mapping object. These views can coexist and overlap, and their size can differ from that of the file-mapping object, but a view cannot be larger than the file-mapping object.
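Because a view must start at an offset that is a multiple of the allocation granularity, reaching an arbitrary byte of the file takes one extra step: round the wanted offset down to the granularity, map the view there, and skip the remainder inside the view. The following portable C++ sketch shows only that arithmetic; the ViewPlan structure and the PlanView function are hypothetical names used for illustration, and the granularity is assumed to be non-zero (65536 is a typical value, but the actual value should come from GetSystemInfo()).

```cpp
#include <cstdint>

// Hypothetical structure describing where to map a view and where the
// wanted byte lies inside it.
struct ViewPlan {
    std::uint64_t viewOffset;  // aligned offset to pass to MapViewOfFile()
    std::uint32_t delta;       // bytes to skip from the returned base address
};

// Hypothetical helper: round `wanted` down to a multiple of `granularity`
// and keep the remainder as the offset inside the view.
constexpr ViewPlan PlanView(std::uint64_t wanted, std::uint32_t granularity) {
    return ViewPlan{ wanted - wanted % granularity,
                     static_cast<std::uint32_t>(wanted % granularity) };
}
```

With this plan, the view has to be at least delta bytes longer than the data to be read, since the wanted data starts delta bytes after the base address of the view.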
4) UnmapViewOfFile(): When the file data mapped into a region of the process's address space is no longer needed, call UnmapViewOfFile() to release it. The function is very simple: its only parameter is the base address of the view in the process, that is, the pointer returned by the MapViewOfFile() call that created the view. After calling MapViewOfFile(), make sure that UnmapViewOfFile() is called before the process exits. Otherwise the reserved region is not released until the process terminates. Every call to MapViewOfFile() reserves a new region in the process's address space; regions reserved by earlier calls are not released.
A special case is unmapping two identical views of the same memory-mapped file. As mentioned above, a memory-mapped file can have several views, and they can overlap, so this situation is legal. It might seem that two identical views would share the same base address in the address space of a single process and so could not be told apart. However, the base address returned by MapViewOfFile() is only the starting address of that particular view in the process's address space. Mapping two identical views of the same memory-mapped file therefore produces two views of the same part of the file with different base addresses, and each can be removed from the process's address space with UnmapViewOfFile() in the usual way.
5) CloseHandle(): As with most Win32 objects, CloseHandle() is used to close kernel objects that have been opened. If you forget to close an object, resources leak for as long as the program keeps running. When the program exits, the operating system automatically closes any objects the process opened but did not close. While the process is running, however, unclosed handles accumulate. It is therefore good practice to close an object with CloseHandle() as soon as it is no longer needed.
Summary
This article has described in detail how memory-mapped files are applied to large file processing. Actual testing shows that memory-mapped files perform well when processing large volumes of data and have a clear advantage over file processing with the CFile class and functions such as ReadFile() and WriteFile(). The program code in this article was compiled with Microsoft Visual C++ 6.0 under Windows 2000 Professional.