In scientific research, implementing a big data algorithm often means calling file I/O functions. When the data volume is large, most of the running time, apart from what the algorithm and data structures themselves consume, goes to file I/O. The usual approach is to read the contents of a disk file into memory, modify them, and write them back to disk. Reading a disk file is a system call that first copies the file's contents from disk into a buffer in kernel space and then copies the data into user space, so the data is copied twice. Writing back also requires two copies. The whole process therefore involves at least four copies of the data, and once the file is even moderately large the I/O overhead becomes significant. It is worth taking measures to reduce this cost. A memory-mapped file is a mechanism provided by the operating system that maps a data file into the address space of a process, which is convenient when manipulating large amounts of data.
How It Works
A memory-mapped file is a memory management mechanism of the operating system itself. The idea is to reserve a region of address space and commit physical storage to that region. Unlike ordinary virtual memory, the physical storage here comes from a file that is already on disk rather than from the system's paging file. Once the file is mapped, it can be accessed as if the whole file were loaded into memory. On Windows, Microsoft provides this mechanism through a set of API functions. It is more than a simple I/O operation, since it draws on a core part of the Windows operating system: memory management.
Memory-mapped files serve several purposes. The system uses them to load and execute .exe and DLL files, which greatly saves paging file space and shortens the time an application needs to start running. You can use them to access data files on disk without performing file I/O operations and without buffering the file's contents yourself. In addition, memory-mapped files let you share data between multiple processes running on the same machine. Windows also provides other ways to pass data between processes, but those methods are themselves implemented with memory-mapped files, which makes the memory-mapped file the most efficient way for multiple processes on a single machine to communicate.
Usage Steps
The steps for using a memory-mapped file are as follows:
(1) Create or open a file kernel object that identifies the file on disk that you want to use as a memory-mapped file.
(2) Create a file-mapping kernel object that tells the system the size of the file and how the file will be accessed.
(3) Have the system map all or part of the file-mapping object into the address space of the process.
When you are finished using the memory-mapped file, clean up with these steps:
(1) Tell the system to unmap the file-mapping kernel object from the process's address space.
(2) Close the file-mapping kernel object.
(3) Close the file kernel object.
Step 1: Create or open a file kernel object by calling the CreateFile function:
HANDLE CreateFile(
    LPCTSTR lpFileName,                          // pointer to the file name
    DWORD dwDesiredAccess,                       // access mode (read/write)
    DWORD dwShareMode,                           // share mode
    LPSECURITY_ATTRIBUTES lpSecurityAttributes,  // pointer to security attributes
    DWORD dwCreationDisposition,                 // how to create or open the file
    DWORD dwFlagsAndAttributes,                  // file attributes
    HANDLE hTemplateFile                         // handle to a template file whose attributes are copied
);
Calling CreateFile tells the operating system where the physical storage for the file mapping is located: the path name you pass identifies the exact location, on disk (or on the network or a CD-ROM), of the physical storage that backs the mapping. You must then tell the operating system how much physical storage the file-mapping object requires, which is done by calling the following function.
Step 2: Create a file-mapping kernel object by calling the CreateFileMapping function:
HANDLE CreateFileMapping(
    HANDLE hFile,                        // handle to the physical file
    LPSECURITY_ATTRIBUTES lpAttributes,  // security settings
    DWORD flProtect,                     // protection settings
    DWORD dwMaximumSizeHigh,             // high 32 bits of the file size
    DWORD dwMaximumSizeLow,              // low 32 bits of the file size
    LPCTSTR pszName                      // name of the shared memory (mapping object)
);
The first parameter, hFile, identifies the handle of the file to be mapped into the process's address space. It is the handle returned by the earlier call to CreateFile.
After the file-mapping object has been created, the system still has to reserve a region of address space for the file's data and commit the file's data as the physical storage mapped to that region. The following function does this.
Step 3: Map the file's data into the address space of the process by calling the MapViewOfFile function:
LPVOID MapViewOfFile(
    HANDLE hFileMappingObject,   // handle to the file-mapping object
    DWORD dwDesiredAccess,       // access mode
    DWORD dwFileOffsetHigh,      // high 32 bits of the file offset
    DWORD dwFileOffsetLow,       // low 32 bits of the file offset
    SIZE_T dwNumberOfBytesToMap  // size of the view to map
);
The first parameter, hFileMappingObject, identifies the handle of the file-mapping object, as returned by an earlier call to CreateFileMapping or OpenFileMapping.
When you map a file into the address space of a process, you do not have to map the entire file at once. You can map only a small part of the file, and the portion of the file that is mapped into the process's address space is called a view. When you map a view, you must specify two things. First, you must tell the system which byte in the data file should be mapped as the first byte of the view, using the dwFileOffsetHigh and dwFileOffsetLow parameters together. Second, you must tell the system how many bytes of the data file to map into the address space, using the dwNumberOfBytesToMap parameter.
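The two requirements above can be sketched in portable C++. This is only an illustration and does not call the Windows API: ViewRequest and makeViewRequest are hypothetical names, and the 64 KB granularity in the usage note is an assumed value (on a real system it would come from GetSystemInfo). The sketch also accounts for a constraint the text above does not spell out: the offset passed to MapViewOfFile must be a multiple of the system allocation granularity, so the offset is rounded down and the view is lengthened to compensate.

```cpp
#include <cstdint>

// Hypothetical helper type: the numeric arguments a MapViewOfFile call needs,
// plus the distance from the view base to the byte the caller actually wanted.
struct ViewRequest {
    std::uint32_t offsetHigh;   // high 32 bits of the aligned file offset
    std::uint32_t offsetLow;    // low 32 bits of the aligned file offset
    std::uint64_t bytesToMap;   // view size, including the alignment padding
    std::uint64_t delta;        // wanted byte = view base address + delta
};

// Rounds `offset` down to a multiple of `granularity` (assumed non-zero) and
// splits the aligned offset into two 32-bit halves.
inline ViewRequest makeViewRequest(std::uint64_t offset, std::uint64_t length,
                                   std::uint64_t granularity) {
    const std::uint64_t aligned = offset - (offset % granularity);
    const std::uint64_t delta = offset - aligned;
    ViewRequest r;
    // Shift the 64-bit value first, then narrow it to 32 bits.
    r.offsetHigh = static_cast<std::uint32_t>(aligned >> 32);
    r.offsetLow = static_cast<std::uint32_t>(aligned & 0xFFFFFFFFu);
    r.bytesToMap = length + delta;
    r.delta = delta;
    return r;
}
```

For example, makeViewRequest(0x100012345, 4096, 65536) starts the view at file offset 0x100010000 (high half 1, low half 0x10000), and the requested byte sits 0x2345 bytes past the view's base address.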
Step 4: Unmap the file's data from the process's address space by calling the UnmapViewOfFile function:
BOOL UnmapViewOfFile(LPCVOID lpBaseAddress);
lpBaseAddress specifies the base address of the mapped region and must be the same value that MapViewOfFile() returned. If this function is not called, the reserved region is not released until the process terminates. Every call to MapViewOfFile reserves a new region in the process's address space, and regions reserved by earlier calls are not released automatically.
Steps 5 and 6: Close the file-mapping object and the file object. To prevent resource leaks, the file kernel object and the file-mapping kernel object created by CreateFile() and CreateFileMapping() must be released with CloseHandle() before the process terminates.
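One common C++ way to make sure a cleanup call is not forgotten is a small scope guard. The sketch below is a generic illustration rather than part of the Windows API: ScopedHandle is a hypothetical class, and it is shown with a plain integer and a counting lambda so that it stays portable.

```cpp
#include <utility>

// Hypothetical generic guard: owns a handle-like value and calls `closer` on it
// exactly once, either in reset() or when the guard goes out of scope.
template <typename Handle, typename Closer>
class ScopedHandle {
public:
    ScopedHandle(Handle handle, Closer closer)
        : handle_(handle), closer_(std::move(closer)), owns_(true) {}

    // Copying would lead to the same handle being closed twice.
    ScopedHandle(const ScopedHandle&) = delete;
    ScopedHandle& operator=(const ScopedHandle&) = delete;

    ~ScopedHandle() {
        if (owns_) closer_(handle_);
    }

    Handle get() const { return handle_; }

    // Closes the handle early; the destructor then does nothing.
    void reset() {
        if (owns_) {
            closer_(handle_);
            owns_ = false;
        }
    }

private:
    Handle handle_;
    Closer closer_;
    bool owns_;
};
```

On Windows, such a guard could presumably be instantiated with HANDLE and a closer that calls CloseHandle, or with the view's base address and a closer that calls UnmapViewOfFile. The guard does not check for INVALID_HANDLE_VALUE or NULL, so the caller would need to test for failure before constructing it.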
In actual code, the calling sequence looks roughly like this:
HANDLE hFile = CreateFile(...);
HANDLE hFileMapping = CreateFileMapping(hFile, ...);
CloseHandle(hFile);
PVOID pvFile = MapViewOfFile(hFileMapping, ...);
CloseHandle(hFileMapping);
// Use the memory-mapped file
...
UnmapViewOfFile(pvFile);
Closing the handles this early is safe because the system keeps the file object and the file-mapping object alive for as long as a view is mapped. However, if you intend to create more file-mapping objects from the same file, or to map multiple views of the same file-mapping object, you cannot call CloseHandle this early, because you will need the handles later for further calls to CreateFileMapping and MapViewOfFile, respectively.
Memory mapping can also handle very large files. If the file is larger than the available address space, for example a 16 TB file, it cannot be mapped in full, so you must map a view that contains only a small portion of the file's data. One approach is to map a view of the beginning of the file first. When you have finished accessing that view, unmap it and then map a new view that starts further into the file. Repeat this until the entire file has been accessed.
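The view-by-view traversal described above comes down to walking the file in fixed-size windows. The following portable sketch computes only the sequence of windows and does not map anything itself; Window and planWindows are hypothetical names, and the view size is assumed to be a non-zero multiple of the system allocation granularity so that each window offset would be acceptable to MapViewOfFile.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical description of one view: where it starts in the file and how
// many bytes it covers.
struct Window {
    std::uint64_t offset;
    std::uint64_t length;
};

// Splits a file of `fileSize` bytes into consecutive windows of at most
// `viewSize` bytes. The last window is shortened to end at the end of the file.
inline std::vector<Window> planWindows(std::uint64_t fileSize, std::uint64_t viewSize) {
    std::vector<Window> windows;
    for (std::uint64_t offset = 0; offset < fileSize; offset += viewSize) {
        const std::uint64_t length = std::min(viewSize, fileSize - offset);
        windows.push_back({offset, length});
    }
    return windows;
}
```

In a real program each window would be mapped, processed, and unmapped in turn, so only one view occupies address space at a time. Materializing the whole plan as a vector is a simplification for illustration; a loop that computes each window on the fly avoids storing the list.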
In addition, memory-mapped files are coherent. The system allows the same data of a file to be mapped into multiple views, for example by mapping the first 10 KB of a file into one view and then mapping the first 4 KB of the same file into another view. As long as the views are mapped from the same file-mapping object, the system ensures that the data seen through them is consistent. For example, if an application changes the contents of the file in one view, the data in the other views changes accordingly.
Example
This example is a variant of one in the book "Windows core Programming".
Given an 8 GB binary file and a 32-bit address space, count the number of zero bytes in the file.
#include <Windows.h>
#include <iostream>
#include <stdlib.h>
#include <time.h>
using namespace std;

__int64 count()
{
    // Get the system allocation granularity; view offsets must be multiples of it
    SYSTEM_INFO sysInfo;
    GetSystemInfo(&sysInfo);

    // Step 1: open the file kernel object
    HANDLE hFile = CreateFile(TEXT("D:\\data.dat"), GENERIC_READ | GENERIC_WRITE,
        FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, NULL);
    if (hFile == INVALID_HANDLE_VALUE)
    {
        cout << "Failed to create file object, error code: " << GetLastError() << endl;
        return 0;
    }

    // Step 2: create the file-mapping kernel object (size 0,0 means the whole file)
    HANDLE hFileMapping = CreateFileMapping(hFile, NULL, PAGE_READONLY, 0, 0, NULL);
    if (hFileMapping == NULL)
    {
        cout << "Failed to create file mapping object, error code: " << GetLastError() << endl;
        CloseHandle(hFile);
        return 0;
    }

    // Combine the high and low 32 bits into the 64-bit file size
    DWORD dwFileSizeHigh;
    __int64 qwFileSize = GetFileSize(hFile, &dwFileSizeHigh);
    qwFileSize += (((__int64)dwFileSizeHigh) << 32);
    CloseHandle(hFile);

    __int64 qwFileOffset = 0, qwNumOf0s = 0;
    while (qwFileSize > 0)
    {
        // Number of bytes to map into this view
        DWORD dwBytesInBlock = sysInfo.dwAllocationGranularity;
        if (qwFileSize < sysInfo.dwAllocationGranularity)
            dwBytesInBlock = (DWORD)qwFileSize;

        // Step 3: map one view; shift the 64-bit offset before narrowing it
        PBYTE pbFile = (PBYTE)MapViewOfFile(hFileMapping, FILE_MAP_READ,
            (DWORD)(qwFileOffset >> 32),         // starting byte in file, high 32 bits
            (DWORD)(qwFileOffset & 0xFFFFFFFF),  // starting byte in file, low 32 bits
            dwBytesInBlock);                     // number of bytes to map
        if (pbFile == NULL)
        {
            cout << "Failed to map view, error code: " << GetLastError() << endl;
            CloseHandle(hFileMapping);
            return 0;
        }

        // Count the zero bytes in this view
        for (DWORD dwByte = 0; dwByte < dwBytesInBlock; dwByte++)
        {
            if (pbFile[dwByte] == 0)
                qwNumOf0s++;
        }

        // Step 4: unmap the view and move on to the next block
        UnmapViewOfFile(pbFile);
        qwFileOffset += dwBytesInBlock;
        qwFileSize -= dwBytesInBlock;
    }

    CloseHandle(hFileMapping);
    return qwNumOf0s;
}

int main()
{
    clock_t start, end;
    start = clock();
    __int64 num = count();
    end = clock();
    cout << "8GB binary file statistics total time: " << (end - start) / CLOCKS_PER_SEC << "s" << endl;
    cout << "0 Number: " << num << endl;
    system("pause");
    return 0;
}
Summary
Windows provides a number of mechanisms that let applications share data and information quickly and easily, such as RPC, COM, OLE, DDE, window messages, the clipboard, mailslots, pipes, sockets, and so on. In Windows, the lowest-level mechanism for sharing data on a single machine is the memory-mapped file. Memory-mapped files are not limited to C++: Java's NIO package also offers related classes, such as MappedByteBuffer.