Using memory-mapped files to process large files in VC++

Source: Internet
Author: User
Tags: file handling, function prototype, ReadFile

Abstract: This article presents a convenient and practical way to read, store, and otherwise process very large files, and walks through the implementation with accompanying program code.

Introduction

File operations are among the most basic features of an application, and both the Win32 API and MFC provide functions and classes for file processing. Commonly used ones include the Win32 API functions CreateFile(), WriteFile(), and ReadFile(), and the CFile class provided by MFC. In general these meet the requirements of most situations, but some specialized applications need to store dozens of GB, hundreds of GB, or even several TB of data, and the usual file processing methods clearly do not work at that scale. Such large files are currently usually handled with memory-mapped files, and this article discusses that core Windows programming technique.

Memory-Mapped files

Memory-mapped files are somewhat similar to virtual memory: a memory-mapped file reserves a region of address space and commits physical storage to that region. The difference is that the physical storage of a memory-mapped file comes from an existing file on disk rather than from the system's page file, and the file must be mapped before it can be manipulated. Once mapped, the file behaves as though it had been loaded from disk into memory in its entirety. When a memory-mapped file is used to process a file stored on disk, the application no longer performs I/O operations on the file, so it no longer has to request and allocate buffers for file processing; all file caching is managed directly by the system. Because the steps of loading file data into memory, writing data back from memory to the file, and freeing memory blocks are eliminated, memory-mapped files play a very important role in handling files with large data volumes. In addition, real-world systems often need to share data between multiple processes. If the data volume is small, there are many flexible ways to do so; if the shared data is huge, a memory-mapped file is needed. In fact, a memory-mapped file is the most efficient way to share data among multiple processes on the same machine.

A memory-mapped file is not a simple file I/O operation; it relies on a core part of Windows, namely memory management. A deeper understanding of memory-mapped files therefore requires a clear understanding of the memory management mechanism of the Windows operating system. That topic is complex and beyond the scope of this article, so it is not covered here; interested readers can refer to other books on the subject. The general way to use a memory-mapped file is as follows.

The first step is to create or open a file kernel object with the CreateFile() function; this object identifies the disk file that will be used as the memory-mapped file. Calling CreateFile() tells the operating system where the file image is located, but it only specifies the path of the image file, not the length of the image. To specify how much physical storage the file-mapping object requires, the CreateFileMapping() function must also be called to create a file-mapping kernel object, which tells the system the size of the file and how the file will be accessed. After the file-mapping object has been created, a region of address space must be reserved for the file data, and the file data must be committed as the physical storage mapped to that region. The MapViewOfFile() function does this: under the system's management, it maps all or part of the file-mapping object into the address space of the process. From this point on, using a memory-mapped file is essentially the same as processing file data that has been loaded into memory in the usual way. When the memory-mapped file is no longer needed, a series of operations must be performed to clean up and release the resources that were used. This part is relatively simple: UnmapViewOfFile() removes the file data from the address space of the process, and CloseHandle() closes the file-mapping object and the file object created earlier.

Memory-mapped file-related functions

The API functions used with a memory-mapped file are mostly the ones mentioned above. Each is introduced below:

HANDLE CreateFile(LPCTSTR lpFileName,
    DWORD dwDesiredAccess,
    DWORD dwShareMode,
    LPSECURITY_ATTRIBUTES lpSecurityAttributes,
    DWORD dwCreationDisposition,
    DWORD dwFlagsAndAttributes,
    HANDLE hTemplateFile);


The CreateFile() function is also commonly used in ordinary file operations to create and open files. When working with a memory-mapped file, it creates or opens a file kernel object and returns its handle. When calling the function, set the dwDesiredAccess and dwShareMode parameters according to whether the data needs to be read and written and whether the file needs to be shared; incorrect parameter settings cause the corresponding later operations to fail.

HANDLE CreateFileMapping(HANDLE hFile,
    LPSECURITY_ATTRIBUTES lpFileMappingAttributes,
    DWORD flProtect,
    DWORD dwMaximumSizeHigh,
    DWORD dwMaximumSizeLow,
    LPCTSTR lpName);


The CreateFileMapping() function creates a file-mapping kernel object. The hFile parameter specifies the handle of the file to be mapped into the process address space (the handle is the return value of CreateFile()). Because the physical storage of a memory-mapped file is actually a file stored on disk, not memory allocated from the system's page file, the system will not automatically map the file's storage into a region of address space. So that the system can determine which protection attributes to apply to the pages, they must be set through the flProtect parameter. The protection attributes PAGE_READONLY, PAGE_READWRITE, and PAGE_WRITECOPY indicate, respectively, that once the file-mapping object is mapped the file data can be read, can be read and written, or can be read and written copy-on-write. With PAGE_READONLY, CreateFile() must have been called with GENERIC_READ. PAGE_READWRITE requires CreateFile() to have been called with GENERIC_READ | GENERIC_WRITE. PAGE_WRITECOPY requires GENERIC_READ, optionally combined with GENERIC_WRITE. The DWORD parameters dwMaximumSizeHigh and dwMaximumSizeLow are also important: they specify the maximum size of the file in bytes. Because the two parameters together form a 64-bit value, the maximum supported file length is 16 EB, which satisfies the requirements of almost any large-volume file processing task.
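
As an illustration of how a 64-bit size is divided between the two DWORD parameters, here is a minimal sketch. It deliberately avoids <windows.h> so that it compiles anywhere; the names Dword, SizeParts, splitSize, and joinSize are hypothetical helpers introduced only for this example and are not part of the Win32 API.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the Win32 DWORD type (assumption: 32-bit unsigned).
using Dword = std::uint32_t;

struct SizeParts {
    Dword high;  // value that would be passed as dwMaximumSizeHigh
    Dword low;   // value that would be passed as dwMaximumSizeLow
};

// Recombine the two halves into one 64-bit byte count.
inline std::uint64_t joinSize(SizeParts p) {
    return (static_cast<std::uint64_t>(p.high) << 32) | p.low;
}

// Split a 64-bit byte count into its upper and lower 32-bit halves.
inline SizeParts splitSize(std::uint64_t size) {
    SizeParts parts{static_cast<Dword>(size >> 32),
                    static_cast<Dword>(size & 0xFFFFFFFFu)};
    assert(joinSize(parts) == size);  // the split must be lossless
    return parts;
}
```

For a 30 GB mapping (30 * 2^30 bytes), splitSize yields high = 7 and low = 0x80000000; for the 0x4000000-byte mapping used later in this article, high is 0 and low is 0x4000000.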

LPVOID MapViewOfFile(HANDLE hFileMappingObject,
    DWORD dwDesiredAccess,
    DWORD dwFileOffsetHigh,
    DWORD dwFileOffsetLow,
    DWORD dwNumberOfBytesToMap);


The MapViewOfFile() function maps the file data into the address space of the process. The hFileMappingObject parameter is the handle of the file-mapping object returned by CreateFileMapping(). The dwDesiredAccess parameter again specifies how the file data will be accessed, and it must be compatible with the protection attribute set through CreateFileMapping(). Setting the protection a second time may seem redundant, but it gives the application finer control over how the data is protected. MapViewOfFile() can map all or part of the file; when mapping, the caller specifies the offset into the file and the length to be mapped. The offset is a 64-bit value formed from the DWORD parameters dwFileOffsetHigh and dwFileOffsetLow, and it must be an integer multiple of the operating system's allocation granularity. On Windows the allocation granularity is 64 KB in practice, but the following code can be used to obtain the value for the current system at run time instead of hard-coding it:

SYSTEM_INFO sinf;
GetSystemInfo(&sinf);
DWORD dwAllocationGranularity = sinf.dwAllocationGranularity;


The dwNumberOfBytesToMap parameter specifies how many bytes of the file to map. One point needs particular attention: on Windows 9x, if MapViewOfFile() cannot find a region large enough to hold the entire file-mapping object, it returns NULL. On Windows 2000, MapViewOfFile() only needs to find a region large enough for the requested view, regardless of the size of the whole file-mapping object.
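
Because the offset must be a multiple of the allocation granularity, a caller that wants data starting at an arbitrary position has to map from the aligned offset just before it and then skip forward inside the view. The following sketch shows that arithmetic only; ViewRequest and alignView are hypothetical names, and the granularity is passed in as a plain number (on Windows it would come from GetSystemInfo()).

```cpp
#include <cassert>
#include <cstdint>

struct ViewRequest {
    std::uint64_t mapOffset;  // aligned file offset to pass to MapViewOfFile()
    std::uint64_t delta;      // bytes to skip from the view base to reach the wanted byte
    std::uint64_t mapLength;  // bytes to map so that the wanted range is fully covered
};

// Compute an aligned mapping request covering [wanted, wanted + length).
inline ViewRequest alignView(std::uint64_t wanted,
                             std::uint64_t length,
                             std::uint64_t granularity) {
    assert(granularity != 0);
    // Round the wanted offset down to a multiple of the granularity.
    std::uint64_t mapOffset = wanted - wanted % granularity;
    std::uint64_t delta = wanted - mapOffset;
    // The view must be longer by delta so the tail of the range is still inside it.
    return ViewRequest{mapOffset, delta, delta + length};
}
```

With a 64 KB granularity, a request for 4096 bytes at file offset 100000 maps from offset 65536, and the wanted data starts 34464 bytes into the view. The high and low halves of mapOffset would then be passed as dwFileOffsetHigh and dwFileOffsetLow.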

After the file mapped into the process address space has been processed, the mapped file data must be released with the UnmapViewOfFile() function, whose prototype is declared as follows:

BOOL UnmapViewOfFile(LPCVOID lpBaseAddress);


The only parameter, lpBaseAddress, specifies the base address of the region to be released, and it must be the value returned by MapViewOfFile(). Every call to MapViewOfFile() must have a matching call to UnmapViewOfFile(); otherwise the reserved region is not released until the process terminates. In addition, the file kernel object and the file-mapping kernel object created earlier with CreateFile() and CreateFileMapping() must be released with CloseHandle() before the process terminates, or resources will leak.

In addition to these required API functions, other auxiliary functions can be used with memory-mapped files as needed. For example, to improve speed the system caches the data pages of a memory-mapped file and does not update the disk image of the file immediately while the mapped view is being processed. To address this, consider using the FlushViewOfFile() function, which forces the system to write some or all of the modified data back to the disk image, so that updates reach the disk promptly.

Example: using memory-mapped files to handle a large file

The following concrete example shows how a memory-mapped file is used. The example receives data from a port and stores it on disk in real time; because the data volume is large (dozens of GB), a memory-mapped file is used. Below is part of the main code of the worker thread MainProc, which starts when the program starts running. When data arrives at the port, the event hEvent[0] is signaled; once WaitForMultipleObjects() detects that event, the received data is saved to disk. When reception is terminated, the event hEvent[1] is signaled, and its handler releases the resources and closes the file. The implementation of the thread function is as follows:

......
// Create the file kernel object; its handle is saved in hFile
HANDLE hFile = CreateFile("Recv1.zip",
    GENERIC_WRITE | GENERIC_READ,
    FILE_SHARE_READ,
    NULL,
    CREATE_ALWAYS,
    FILE_FLAG_SEQUENTIAL_SCAN,
    NULL);

// Create the file-mapping kernel object; its handle is saved in hFileMapping
HANDLE hFileMapping = CreateFileMapping(hFile, NULL, PAGE_READWRITE,
    0, 0x4000000, NULL);
// Release the file kernel object (the mapping object keeps the file open)
CloseHandle(hFile);

// Set the size, offsets, and other parameters.
// NOTE: the two multipliers are missing from the source text. 600 and 1000 are
// assumed values, chosen only to match the "60% full" rule in the loop below.
__int64 qwFileSize = 0x4000000;
__int64 qwFileOffset = 0;    // total number of bytes received so far
__int64 qwViewOffset = 0;    // file offset at which the current view starts
__int64 T = 600 * (__int64)sinf.dwAllocationGranularity;
DWORD dwBytesInBlock = 1000 * sinf.dwAllocationGranularity;

// Map the file data into the address space of the process
PBYTE pbFile = (PBYTE)MapViewOfFile(hFileMapping,
    FILE_MAP_ALL_ACCESS,
    (DWORD)(qwViewOffset >> 32), (DWORD)(qwViewOffset & 0xFFFFFFFF), dwBytesInBlock);
while (bLoop)
{
    // Wait for event hEvent[0] or event hEvent[1]
    DWORD ret = WaitForMultipleObjects(2, hEvent, FALSE, INFINITE);
    ret -= WAIT_OBJECT_0;
    switch (ret)
    {
    // Data-received event signaled
    case 0:
        // Receive data from the port and store it in the memory-mapped file.
        // The write position is relative to the start of the current view.
        nReadLen = syio_read(port[1], pbFile + (size_t)(qwFileOffset - qwViewOffset), QueueLen);
        qwFileOffset += nReadLen;

        // When the view is 60% full, open a new view further along to prevent overflow
        if (qwFileOffset > T)
        {
            // A view must start on a multiple of the allocation granularity
            qwViewOffset = qwFileOffset - qwFileOffset % sinf.dwAllocationGranularity;
            T = qwViewOffset + 600 * (__int64)sinf.dwAllocationGranularity;
            // A view must not extend past the end of the file-mapping object
            if (qwFileSize - qwViewOffset < dwBytesInBlock)
                dwBytesInBlock = (DWORD)(qwFileSize - qwViewOffset);
            UnmapViewOfFile(pbFile);
            pbFile = (PBYTE)MapViewOfFile(hFileMapping,
                FILE_MAP_ALL_ACCESS,
                (DWORD)(qwViewOffset >> 32), (DWORD)(qwViewOffset & 0xFFFFFFFF), dwBytesInBlock);
        }
        break;

    // Termination event signaled
    case 1:
        bLoop = FALSE;

        // Unmap the file data from the address space of the process
        UnmapViewOfFile(pbFile);

        // Close the file-mapping object
        CloseHandle(hFileMapping);
        break;
    }
}
...
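
The bookkeeping behind moving a view forward can be tried out without any Win32 calls. The sketch below only models the offsets: where the current view starts, where a given file offset falls inside it, and when the view should be moved. SlidingView and its members are hypothetical names, and the block counts passed to it are assumptions chosen for illustration, not values taken from the article.

```cpp
#include <cassert>
#include <cstdint>

// Models the position of one mapped view that slides forward through a file.
class SlidingView {
public:
    SlidingView(std::uint64_t granularity, std::uint64_t viewBlocks,
                std::uint64_t remapBlocks)
        : granularity_(granularity), viewBlocks_(viewBlocks),
          remapBlocks_(remapBlocks), viewOffset_(0) {
        assert(granularity != 0 && remapBlocks <= viewBlocks);
    }

    // File offset at which the current view starts (always aligned).
    std::uint64_t viewOffset() const { return viewOffset_; }

    // Index inside the view of the byte stored at fileOffset.
    std::uint64_t positionInView(std::uint64_t fileOffset) const {
        assert(fileOffset >= viewOffset_);
        return fileOffset - viewOffset_;
    }

    // True if a write of len bytes at fileOffset stays inside the view.
    bool fits(std::uint64_t fileOffset, std::uint64_t len) const {
        return positionInView(fileOffset) + len <= viewBlocks_ * granularity_;
    }

    // Call after each write. Once the fill threshold is passed, the view base
    // moves to the aligned offset at or below fileOffset and true is returned;
    // the caller would then unmap the old view and map a new one there.
    bool advance(std::uint64_t fileOffset) {
        if (positionInView(fileOffset) <= remapBlocks_ * granularity_) {
            return false;
        }
        viewOffset_ = fileOffset - fileOffset % granularity_;
        return true;
    }

private:
    std::uint64_t granularity_;
    std::uint64_t viewBlocks_;   // view length in granularity units
    std::uint64_t remapBlocks_;  // fill threshold in granularity units
    std::uint64_t viewOffset_;
};
```

The point of keeping viewOffset separate from the running file offset is that the pointer returned by MapViewOfFile() refers to the start of the view, so a write must be addressed by positionInView(), not by the absolute file offset.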


If the termination handler only calls UnmapViewOfFile() and CloseHandle(), the actual size of the file is not recorded correctly: if the memory-mapped file was opened at 30 GB but only 14 GB of data were received, the saved file is still 30 GB long after the program above has run. Once processing is complete, therefore, the file must be restored to its actual size by way of a second memory-mapped file. The main code that does this is shown below:

// Create another file kernel object
hFile2 = CreateFile("Recv.zip",
    GENERIC_WRITE | GENERIC_READ,
    FILE_SHARE_READ,
    NULL,
    CREATE_ALWAYS,
    FILE_FLAG_SEQUENTIAL_SCAN,
    NULL);

// Create another file-mapping kernel object sized to the actual data length
hFileMapping2 = CreateFileMapping(hFile2,
    NULL,
    PAGE_READWRITE,
    (DWORD)(qwFileOffset >> 32),
    (DWORD)(qwFileOffset & 0xFFFFFFFF),
    NULL);

// Close the file kernel object
CloseHandle(hFile2);

// Map the new file into the address space of the process
pbFile2 = (PBYTE)MapViewOfFile(hFileMapping2,
    FILE_MAP_ALL_ACCESS,
    0, 0, (DWORD)qwFileOffset);

// Remap the original file from offset 0 so that the view covers all received data.
// A single view like this only works while the data fits in the free address
// space of the process; larger files have to be copied block by block.
UnmapViewOfFile(pbFile);
pbFile = (PBYTE)MapViewOfFile(hFileMapping,
    FILE_MAP_ALL_ACCESS,
    0, 0, (DWORD)qwFileOffset);

// Copy the data from the original memory-mapped file to the new one
memcpy(pbFile2, pbFile, (size_t)qwFileOffset);

// Unmap the file data from the address space of the process
UnmapViewOfFile(pbFile);
UnmapViewOfFile(pbFile2);

// Close the file-mapping objects
CloseHandle(hFileMapping);
CloseHandle(hFileMapping2);

// Delete the temporary file
DeleteFile("Recv1.zip");
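
When the received data is larger than the address space a process can spare, a copy like the one above cannot be done through one view per file and has to proceed block by block. The sketch below only plans the blocks, that is, the offset and length of each view to map in turn; it makes no Win32 calls. Block and planBlocks are hypothetical names, and choosing the block size as a multiple of the allocation granularity is an assumption that keeps every offset valid for MapViewOfFile().

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Block {
    std::uint64_t offset;  // file offset at which this block's view starts
    std::uint64_t length;  // number of bytes to map and copy for this block
};

// Divide total bytes into consecutive blocks of at most blockSize bytes.
// If blockSize is a multiple of the allocation granularity, so is every offset.
inline std::vector<Block> planBlocks(std::uint64_t total, std::uint64_t blockSize) {
    assert(blockSize != 0);
    std::vector<Block> blocks;
    for (std::uint64_t off = 0; off < total; off += blockSize) {
        std::uint64_t remaining = total - off;
        // The last block is shorter when total is not a multiple of blockSize.
        blocks.push_back(Block{off, remaining < blockSize ? remaining : blockSize});
    }
    return blocks;
}
```

For each planned block, the copy would map a view of that offset and length in both files, memcpy between the two views, and unmap both before moving on, so that only one block of address space per file is in use at a time.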


Conclusion

In practice, memory-mapped files perform well when working with large data volumes and have clear advantages over the usual file handling based on the CFile class and functions such as ReadFile() and WriteFile(). The code described in this article was compiled with Microsoft Visual C++ 6.0 under Windows 98.
