VC++: Using Memory-Mapped Files to Process Large Files

Source: Internet
Author: User
Tags: file handling, file size, Win32

Abstract: This article presents a convenient, practical method for reading and storing very large files and walks through the implementation with the relevant program code.

Introduction

File operations are among the most basic functions of an application. Both the Win32 API and MFC provide functions and classes for file handling; the most commonly used are the Win32 API functions CreateFile(), WriteFile(), and ReadFile(), and the MFC CFile class. In general these meet the requirements of most situations, but some specialized applications need to store tens or hundreds of gigabytes, or even several terabytes, of data, and for files that large the usual file-handling methods are clearly impractical. Such large files are currently handled with memory-mapped files, the Windows core programming technique discussed below.

Memory-mapped files

Memory-mapped files are similar to virtual memory: a region of the address space is reserved and physical storage is committed to it. The difference is that the physical storage for a memory-mapped file comes from a file that already exists on disk rather than from the system paging file. A file must be mapped before you operate on it, and once mapped it can be accessed as if the entire file had been loaded from disk into memory. Consequently, when you use a memory-mapped file to work with data stored on disk, you no longer perform explicit I/O on the file, and you no longer need to allocate and manage buffers while processing it. Loading file data into memory, writing modified data back to the file, and releasing memory blocks are all handled directly by the system, which is why memory-mapped files are so effective for processing large volumes of data. In addition, real-world systems often need to share data between multiple processes. When the amount of shared data is small, many approaches work; when it is large, memory-mapped files are the tool to use. In fact, memory-mapped files are the most efficient way for multiple processes on the same machine to share data.

Memory-mapped files are not simple file I/O operations; they rely on a core Windows programming technique, memory management. A deeper understanding of memory-mapped files therefore requires a clear understanding of how the Windows operating system manages memory. That topic is complex and beyond the scope of this article, so interested readers should consult other books on the subject. The general procedure for using a memory-mapped file is as follows:

First, create or open a file kernel object with CreateFile(); this identifies the disk file that will serve as the memory-mapped file. CreateFile() only tells the operating system where the physical storage for the mapping lives: it specifies the path of the file, but not how much of it is to be mapped. To specify how much physical storage the mapping needs, create a file-mapping kernel object with CreateFileMapping(), which tells the system the size of the mapping and how the file will be accessed. Once the file-mapping object exists, you must reserve a region of the process's address space for the file data and commit the file data as the physical storage mapped to that region. MapViewOfFile() does this: the system maps all or part of the file-mapping object into the process's address space. From this point on, working with the memory-mapped file is essentially the same as working with file data that has been loaded into memory. When you are finished with the memory-mapped file, a few calls release the resources. This part is simple: UnmapViewOfFile() removes the view of the file data from the process's address space, and CloseHandle() closes the file-mapping object and the file object created earlier.
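The sequence above can be sketched in C++ as follows. This is a minimal illustration, not code from the original article: it maps an entire file read-only and counts its newline characters. The function names count_newlines and count_newlines_in_file are hypothetical. The sketch calls the wide-character CreateFileW() and CreateFileMappingW() explicitly instead of the TCHAR macros, and it maps the whole file in a single view, so it assumes the file fits in the process address space. The Windows-specific part is guarded by _WIN32 so that the portable helper also compiles on other platforms.

```cpp
#include <cassert>
#include <cstddef>
#ifdef _WIN32
#include <windows.h>
#endif

// Portable part: once a view is mapped, the file contents are ordinary
// bytes in memory and can be processed with normal pointer code.
std::size_t count_newlines(const char* data, std::size_t len) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < len; ++i) {
        if (data[i] == '\n') ++count;
    }
    return count;
}

#ifdef _WIN32
// Hypothetical helper: maps the whole file read-only and counts '\n' bytes.
// Returns -1 on failure. Mapping the entire file in one view assumes it
// fits in the process address space.
long long count_newlines_in_file(const wchar_t* path) {
    // Step 1: open the file kernel object for reading.
    HANDLE hFile = CreateFileW(path, GENERIC_READ, FILE_SHARE_READ, nullptr,
                               OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (hFile == INVALID_HANDLE_VALUE) return -1;

    LARGE_INTEGER size{};
    if (!GetFileSizeEx(hFile, &size)) { CloseHandle(hFile); return -1; }
    if (size.QuadPart == 0) { CloseHandle(hFile); return 0; }  // empty files cannot be mapped

    // Step 2: create the file-mapping object; 0, 0 means "current file size".
    HANDLE hMap = CreateFileMappingW(hFile, nullptr, PAGE_READONLY, 0, 0, nullptr);
    if (hMap == nullptr) { CloseHandle(hFile); return -1; }  // NULL on failure

    // Step 3: map a view of the whole mapping (0 bytes = up to the end).
    const char* view = static_cast<const char*>(
        MapViewOfFile(hMap, FILE_MAP_READ, 0, 0, 0));

    long long result = -1;
    if (view != nullptr) {
        result = static_cast<long long>(
            count_newlines(view, static_cast<std::size_t>(size.QuadPart)));
        // Step 4: remove the view from the address space.
        UnmapViewOfFile(view);
    }
    // Step 5: close the file-mapping object and the file object.
    CloseHandle(hMap);
    CloseHandle(hFile);
    return result;
}
#endif
```

On Windows, a call such as count_newlines_in_file(L"C:\\data\\big.log") (a hypothetical path) returns the count, or -1 on failure. The empty-file check is there because CreateFileMapping() fails when asked to map a zero-length file.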

Functions related to memory-mapped files

Working with memory-mapped files uses essentially the API functions mentioned above, which are described below:

HANDLE CreateFile(LPCTSTR lpFileName,
DWORD dwDesiredAccess,
DWORD dwShareMode,
LPSECURITY_ATTRIBUTES lpSecurityAttributes,
DWORD dwCreationDisposition,
DWORD dwFlagsAndAttributes,
HANDLE hTemplateFile);

CreateFile() is the function normally used to create and open files in ordinary file operations as well. When working with memory-mapped files, it creates or opens a file kernel object and returns its handle. Set dwDesiredAccess and dwShareMode according to whether the data will be read or written and whether the file will be shared; incorrect settings cause the corresponding operations to fail.

HANDLE CreateFileMapping(HANDLE hFile,
LPSECURITY_ATTRIBUTES lpFileMappingAttributes,
DWORD flProtect,
DWORD dwMaximumSizeHigh,
DWORD dwMaximumSizeLow,
LPCTSTR lpName);

CreateFileMapping() creates a file-mapping kernel object. The hFile parameter specifies the handle of the file to be mapped into the process address space (the handle returned by CreateFile()). Because the physical storage of a memory-mapped file is the file on disk rather than memory allocated from the system paging file, creating the mapping object neither reserves address space nor maps the file's storage into a region; that happens later, when a view is mapped. So that the system knows what protection to apply to the pages, set the flProtect parameter: PAGE_READONLY, PAGE_READWRITE, and PAGE_WRITECOPY allow read-only access, read/write access, and copy-on-write access (writes go to a private copy of the page and are not written back to the file), respectively. PAGE_READONLY requires that CreateFile() was called with GENERIC_READ, and PAGE_READWRITE requires GENERIC_READ | GENERIC_WRITE. PAGE_WRITECOPY requires GENERIC_READ; GENERIC_WRITE is optional. The DWORD parameters dwMaximumSizeHigh and dwMaximumSizeLow are also important: together they specify the maximum size, in bytes, of the file-mapping object (passing 0 for both uses the current size of the file). Because the two 32-bit halves form a 64-bit value, the maximum supported size is 2^64 bytes, or 16 EB, which is enough for almost any large-data application.
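The pairing rule between the access rights requested from CreateFile() and the flProtect value passed to CreateFileMapping() can be written as a small check: PAGE_READONLY and PAGE_WRITECOPY need GENERIC_READ, and PAGE_READWRITE needs both GENERIC_READ and GENERIC_WRITE. This is a hypothetical, portable sketch: the enums FileAccess and Protection and the function protection_is_compatible are illustrative stand-ins for the Win32 constants, so that the rule compiles and can be tested without <windows.h>. Real code would pass GENERIC_READ, PAGE_READWRITE, and so on directly and rely on CreateFileMapping() to report a mismatch.

```cpp
#include <cassert>

// Hypothetical stand-ins for CreateFile's dwDesiredAccess bits:
//   kRead ~ GENERIC_READ, kWrite ~ GENERIC_WRITE
enum FileAccess : unsigned { kRead = 1u, kWrite = 2u };

// Hypothetical stand-ins for CreateFileMapping's flProtect values:
//   ReadOnly ~ PAGE_READONLY, ReadWrite ~ PAGE_READWRITE,
//   WriteCopy ~ PAGE_WRITECOPY
enum class Protection { ReadOnly, ReadWrite, WriteCopy };

// Returns true if a file opened with `access` can back a mapping
// created with protection `p`.
bool protection_is_compatible(unsigned access, Protection p) {
    const bool can_read = (access & kRead) != 0;
    const bool can_write = (access & kWrite) != 0;
    switch (p) {
        case Protection::ReadOnly:
        case Protection::WriteCopy:  // copy-on-write never writes back to the file
            return can_read;
        case Protection::ReadWrite:
            return can_read && can_write;
    }
    return false;
}
```

For example, a file opened only for reading can back a read-only or copy-on-write mapping, but requesting PAGE_READWRITE for it fails.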

LPVOID MapViewOfFile(HANDLE hFileMappingObject,
DWORD dwDesiredAccess,
DWORD dwFileOffsetHigh,
DWORD dwFileOffsetLow,
SIZE_T dwNumberOfBytesToMap);
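For files larger than the address space, a common approach is to map one window of the file at a time. Two details of MapViewOfFile() matter here: dwFileOffsetHigh and dwFileOffsetLow carry the 64-bit starting offset as two 32-bit halves (as dwMaximumSizeHigh and dwMaximumSizeLow do for CreateFileMapping()), and the offset must be a multiple of the system allocation granularity, which GetSystemInfo() reports in SYSTEM_INFO::dwAllocationGranularity (commonly 64 KB). The sketch below computes both portably; split_u64, view_for, and the structs are hypothetical helpers, and the granularity and maximum view size are passed in as assumptions rather than queried.

```cpp
#include <cassert>
#include <cstdint>

// The two 32-bit halves expected for a 64-bit size or offset.
struct SplitU64 { std::uint32_t high, low; };

SplitU64 split_u64(std::uint64_t v) {
    return { static_cast<std::uint32_t>(v >> 32),
             static_cast<std::uint32_t>(v & 0xFFFFFFFFu) };
}

// One window of a large file: its starting offset (a multiple of the
// allocation granularity) and how many bytes it covers.
struct View { std::uint64_t offset; std::uint64_t length; };

// Hypothetical helper: the view covering byte `pos` of a file of
// `file_size` bytes. Preconditions: pos < file_size, and max_view is a
// nonzero multiple of granularity, so the returned view contains pos.
View view_for(std::uint64_t pos, std::uint64_t file_size,
              std::uint64_t granularity, std::uint64_t max_view) {
    const std::uint64_t offset = pos - pos % granularity;  // round down
    std::uint64_t length = file_size - offset;
    if (length > max_view) length = max_view;              // cap the window
    return { offset, length };
}
```

To walk a whole file, start at pos = 0 and, after processing each view, continue from offset + length; because max_view is a multiple of the granularity, every new offset stays aligned. The view's offset would be passed to MapViewOfFile() as split_u64(offset).high and .low, its length as dwNumberOfBytesToMap, and the byte at pos is then found at the returned address plus (pos - offset).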
