Using Memory-Mapped Files in VC++ to Process Large Files

Source: Internet
Author: User
Abstract: This article presents a convenient and practical solution for reading and storing large files, and walks through the implementation with the relevant program code.

Introduction

File operations are among the most basic functions of an application. Both the Win32 API and MFC provide functions and classes for file processing; commonly used ones include the Win32 functions CreateFile(), WriteFile(), and ReadFile(), and the CFile class provided by MFC. These meet the requirements of most scenarios. In some specialized application areas, however, the data to be stored may run to tens or hundreds of gigabytes, or even several terabytes, and the ordinary file-processing methods are no longer practical. Such large files are generally handled with memory-mapped files. This article discusses this core Windows programming technique.

Memory-Mapped Files

A memory-mapped file is similar to virtual memory: it lets you reserve a region of address space and commit physical storage to that region. The difference is that the physical storage comes from a file that already exists on disk rather than from the system paging file. Before working with the file you must map it, after which it behaves as if the entire file had been loaded from disk into memory. When a file on disk is processed through a memory-mapped file, the application no longer needs to perform explicit I/O operations on it, which means it does not have to request and allocate a buffer for the file. All caching is managed directly by the system. Because the steps of loading file data into memory, writing data back from memory to the file, and releasing memory blocks are eliminated, memory-mapped files play an important role when processing large volumes of data.

In addition, systems in real projects often need to share data among multiple processes. When the amount of data is small, many flexible approaches are available; when the shared data is large, a memory-mapped file is needed. In fact, memory-mapped files are the most effective way to share data among multiple processes on the same machine.

Memory-mapped files are not simple file I/O; they rely on a core Windows programming technique, memory management. A deeper understanding of memory-mapped files therefore requires a clear understanding of the memory-management mechanism of the Windows operating system. That topic is complex and beyond the scope of this article; interested readers can consult other books on the subject. The following describes how to use a memory-mapped file:

First, call CreateFile() to create or open a file kernel object, which identifies the file on disk to be used as the memory-mapped file. CreateFile() tells the operating system where the file's physical storage is located, but it specifies only the path of the file, not the length of the mapping. To specify how much physical storage the file-mapping object requires, call CreateFileMapping() to create a file-mapping kernel object, which tells the system the size of the file and how it will be accessed. After the file-mapping object is created, a region of address space must be reserved for the file data, and the file data must be committed as the physical storage mapped to that region. MapViewOfFile() does this: under system management, it maps all or part of the file-mapping object into the process's address space. From that point on, the memory-mapped file is used and processed in essentially the same way as file data loaded into memory in the ordinary way. When you have finished with the memory-mapped file, a series of cleanup steps releases the resources that were used. This part is relatively simple: UnmapViewOfFile() unmaps the file data from the process's address space, and CloseHandle() closes the file-mapping object and file object created earlier.

Memory-Mapped File Functions

The API functions used with memory-mapped files are mainly those mentioned above. Each is described below:

HANDLE CreateFile(LPCTSTR lpFileName,
    DWORD dwDesiredAccess,
    DWORD dwShareMode,
    LPSECURITY_ATTRIBUTES lpSecurityAttributes,
    DWORD dwCreationDisposition,
    DWORD dwFlagsAndAttributes,
    HANDLE hTemplateFile);

CreateFile() is frequently used to create and open files even in ordinary file operations. When working with memory-mapped files, it creates or opens a file kernel object and returns its handle. When calling it, set the dwDesiredAccess and dwShareMode parameters according to whether the data will be read or written and whether the file will be shared; incorrect settings can cause later operations to fail.

HANDLE CreateFileMapping(HANDLE hFile,
    LPSECURITY_ATTRIBUTES lpFileMappingAttributes,
    DWORD flProtect,
    DWORD dwMaximumSizeHigh,
    DWORD dwMaximumSizeLow,
    LPCTSTR lpName);

CreateFileMapping() creates a file-mapping kernel object. The hFile parameter specifies the handle of the file to be mapped into the process's address space (the handle returned by CreateFile()). Because the physical storage of a memory-mapped file is a file on disk rather than memory allocated from the system paging file, creating the file-mapping object does not cause the system to reserve a region of address space or to map the file's storage to such a region. So that the system can determine the protection attributes of the pages, set the flProtect parameter. The protection attributes PAGE_READONLY, PAGE_READWRITE, and PAGE_WRITECOPY indicate that, once the file-mapping object is mapped, the file data can be read, read and written, or read and written copy-on-write, respectively. With PAGE_READONLY, CreateFile() must have been called with GENERIC_READ; PAGE_READWRITE requires GENERIC_READ | GENERIC_WRITE; PAGE_WRITECOPY requires either GENERIC_READ or GENERIC_READ | GENERIC_WRITE. The DWORD parameters dwMaximumSizeHigh and dwMaximumSizeLow are also important: they specify the maximum size of the file in bytes. Because the two parameters together form a 64-bit value, the maximum supported file length is 16 EB, which meets the needs of almost any large-data application.
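The way a 64-bit size is divided between dwMaximumSizeHigh and dwMaximumSizeLow can be sketched in portable C++ as follows. This is only an illustration: the SizeParts struct and the SplitSize and JoinSize helpers are hypothetical names that do not appear in the Win32 API, and std::uint32_t stands in for DWORD so that the sketch compiles without windows.h. The 30 GB figure is just a sample value.

```cpp
#include <cstdint>

// Hypothetical helper type: the two 32-bit halves that would be passed as
// dwMaximumSizeHigh and dwMaximumSizeLow.
struct SizeParts {
    std::uint32_t high;  // upper 32 bits of the size
    std::uint32_t low;   // lower 32 bits of the size
};

// Split a 64-bit byte count into its high and low 32-bit halves.
constexpr SizeParts SplitSize(std::uint64_t size) {
    return SizeParts{static_cast<std::uint32_t>(size >> 32),
                     static_cast<std::uint32_t>(size & 0xFFFFFFFFu)};
}

// Recombine the two halves into the original 64-bit byte count.
constexpr std::uint64_t JoinSize(SizeParts parts) {
    return (static_cast<std::uint64_t>(parts.high) << 32) | parts.low;
}

// 30 GB does not fit in 32 bits, so the high half is non-zero.
constexpr std::uint64_t kThirtyGB = 30ull * 1024 * 1024 * 1024;
static_assert(SplitSize(kThirtyGB).high == 7, "30 GB = 7 * 4 GB + 2 GB");
static_assert(SplitSize(kThirtyGB).low == 0x80000000u, "the remaining 2 GB");
static_assert(JoinSize(SplitSize(kThirtyGB)) == kThirtyGB, "round trip");
```

Passing 0 for the high half, as is common for small files, silently limits the mapping to sizes below 4 GB, which is why both halves matter for large files.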

LPVOID MapViewOfFile (HANDLE hFileMappingObject,
DWORD dwDesiredAccess,
DWORD dwFileOffsetHigh,
DWORD dwFileOffsetLow,
DWORD dwNumberOfBytesToMap );

MapViewOfFile() maps file data into the process's address space. The hFileMappingObject parameter is the file-mapping object handle returned by CreateFileMapping(). The dwDesiredAccess parameter again specifies how the file data will be accessed and must be compatible with the protection attribute set in CreateFileMapping(). Although setting the protection twice may seem redundant, it gives the application finer control over how the data is protected. MapViewOfFile() can map all or part of the file. When mapping, you specify the offset into the data file and the length to map. The offset is a 64-bit value formed from the DWORD parameters dwFileOffsetHigh and dwFileOffsetLow, and it must be an integer multiple of the system's allocation granularity. On Windows the allocation granularity has so far been 64 KB. You can also obtain the allocation granularity of the current system at run time with the following code:

SYSTEM_INFO sinf;
GetSystemInfo(&sinf);
DWORD dwAllocationGranularity = sinf.dwAllocationGranularity;

The dwNumberOfBytesToMap parameter specifies how many bytes of the data file to map. Note that on Windows 9x, if MapViewOfFile() cannot find a region large enough to hold the entire file-mapping object, it returns NULL. On Windows 2000, MapViewOfFile() only needs to find a region large enough for the requested view, regardless of the size of the entire file-mapping object.
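Because a view must start on an allocation-granularity boundary, mapping an arbitrary file position means rounding the offset down and remembering how far into the view the wanted byte lies. The following is a minimal sketch of that arithmetic in portable C++. The ViewPlan struct and PlanView function are hypothetical names used only for illustration, and the granularity is passed in as a parameter (on Windows it would be the value obtained from GetSystemInfo()).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type describing one view to map.
struct ViewPlan {
    std::uint64_t viewOffset;  // aligned file offset for the view
    std::uint64_t delta;       // position of the wanted byte inside the view
    std::uint64_t viewLength;  // number of bytes the view must cover
};

// Plan a view that exposes `length` bytes starting at file position `pos`.
// `granularity` is the allocation granularity and must be non-zero.
inline ViewPlan PlanView(std::uint64_t pos, std::uint64_t length,
                         std::uint64_t granularity) {
    assert(granularity != 0);
    const std::uint64_t delta = pos % granularity;  // bytes past the boundary
    return ViewPlan{pos - delta, delta, delta + length};
}
```

With a 64 KB granularity, a request for 100 bytes at position 70000 yields a view starting at offset 65536, with the wanted data 4464 bytes into the view and 4564 bytes mapped in total. The aligned offset would then be split into dwFileOffsetHigh and dwFileOffsetLow, and the data accessed at the returned base address plus delta.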

When you have finished processing the file mapped into the process's address space, release the view with UnmapViewOfFile(). Its prototype is:

BOOL UnmapViewOfFile (LPCVOID lpBaseAddress );

Its only parameter, lpBaseAddress, specifies the base address of the region to release and must be the value returned by MapViewOfFile(). Every call to MapViewOfFile() needs a matching call to UnmapViewOfFile(); otherwise the reserved region is not released until the process ends. In addition, the file kernel object and file-mapping kernel object created by CreateFile() and CreateFileMapping() should be released with CloseHandle() before the process ends; otherwise resources may leak.

Besides these required API functions, other auxiliary functions may be needed depending on the situation. For example, to improve speed the system caches the file's data pages and does not update the file on disk immediately while the view is being modified. When this matters, call FlushViewOfFile(), which forces the system to write some or all of the modified data back to the file on disk so that updates are saved promptly.

Example: Processing a Large File with a Memory-Mapped File

The following example shows how memory-mapped files are used in practice. The program receives data from a port and stores it on disk in real time. Because the amount of data is large (tens of gigabytes), a memory-mapped file is used. The code below is the main part of MainProc, the worker thread's procedure, which starts running when the thread is started. When data arrives at the port, event hEvent[0] is signaled; once WaitForMultipleObjects() returns for that event, the received data is saved to disk. When reception is to stop, event hEvent[1] is signaled, and its handler releases resources and closes the file. The thread procedure is implemented as follows:

......
// Create the file kernel object; its handle is stored in hFile
HANDLE hFile = CreateFile("Recv1.zip",
    GENERIC_WRITE | GENERIC_READ,
    FILE_SHARE_READ,
    NULL,
    CREATE_ALWAYS,
    FILE_FLAG_SEQUENTIAL_SCAN,
    NULL);

// Create the file-mapping kernel object; its handle is stored in hFileMapping.
// Every view mapped below must lie within the size given here.
HANDLE hFileMapping = CreateFileMapping(hFile, NULL, PAGE_READWRITE,
    0, 0x4000000, NULL);
// Release the file kernel object (the mapping object keeps the file open)
CloseHandle(hFile);

// Set parameters such as size and offset
__int64 qwFileSize = 0x4000000;
__int64 qwFileOffset = 0;   // total number of bytes written so far
__int64 qwViewBase = 0;     // file offset at which the current view starts
__int64 T = 600 * sinf.dwAllocationGranularity;
DWORD dwBytesInBlock = 1000 * sinf.dwAllocationGranularity;

// Map the file data into the address space of the process
PBYTE pbFile = (PBYTE)MapViewOfFile(hFileMapping,
    FILE_MAP_ALL_ACCESS,
    (DWORD)(qwViewBase >> 32), (DWORD)(qwViewBase & 0xFFFFFFFF), dwBytesInBlock);
while (bLoop)
{
    // Wait for event hEvent[0] or event hEvent[1]
    DWORD ret = WaitForMultipleObjects(2, hEvent, FALSE, INFINITE);
    ret -= WAIT_OBJECT_0;
    switch (ret)
    {
    // Data-arrival event
    case 0:
        // Receive data from the port and store it in the mapped view.
        // pbFile points at qwViewBase, so write at the offset within the view.
        nReadLen = syio_Read(port[1], pbFile + (qwFileOffset - qwViewBase), QueueLen);
        qwFileOffset += nReadLen;

        // Once 60% of the view is filled, map a new view to prevent overflow
        if (qwFileOffset > T)
        {
            // A view offset must be a multiple of the allocation granularity
            qwViewBase = qwFileOffset - qwFileOffset % sinf.dwAllocationGranularity;
            T = qwViewBase + 600 * sinf.dwAllocationGranularity;
            UnmapViewOfFile(pbFile);
            pbFile = (PBYTE)MapViewOfFile(hFileMapping,
                FILE_MAP_ALL_ACCESS,
                (DWORD)(qwViewBase >> 32), (DWORD)(qwViewBase & 0xFFFFFFFF), dwBytesInBlock);
        }
        break;

    // Termination event
    case 1:
        bLoop = FALSE;

        // Unmap the file data from the address space of the process
        UnmapViewOfFile(pbFile);

        // Close the file-mapping object
        CloseHandle(hFileMapping);
        break;
    }
}
...
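The bookkeeping behind the sliding view can be isolated from the Win32 calls. The sketch below is a portable C++ illustration under stated assumptions: the ViewWindow class and its members are hypothetical names, the 60% threshold mirrors the 600-of-1000 granularity units used in the listing, and the actual mapping and unmapping are left to the caller.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bookkeeping for a view that slides forward through a file.
class ViewWindow {
public:
    ViewWindow(std::uint64_t granularity, std::uint64_t unitsPerView)
        : granularity_(granularity),
          viewBytes_(granularity * unitsPerView) {
        assert(granularity != 0 && unitsPerView != 0);
    }

    // Offset inside the current view at which the next byte is written.
    std::uint64_t WriteDelta() const { return written_ - base_; }

    // File offset of the current view; always a multiple of the granularity.
    std::uint64_t Base() const { return base_; }

    // Record `n` received bytes. Returns true when the caller should unmap
    // the old view and map a new one starting at Base().
    bool Advance(std::uint64_t n) {
        written_ += n;
        if (written_ - base_ <= viewBytes_ * 6 / 10) {
            return false;  // still at or below 60% of the view
        }
        base_ = written_ - written_ % granularity_;  // round down to boundary
        return true;
    }

private:
    std::uint64_t granularity_;
    std::uint64_t viewBytes_;
    std::uint64_t written_ = 0;
    std::uint64_t base_ = 0;
};
```

A caller would write each incoming block at the view's base address plus WriteDelta(), call Advance() with the number of bytes received, and remap at Base() whenever it returns true. The sketch assumes that no single block is larger than the space left in the view.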

If the termination handler simply calls UnmapViewOfFile() and CloseHandle(), the file does not end up with the correct size. For example, if the memory-mapped file was created with a size of 30 GB and only 14 GB of data was received, the stored file is still 30 GB long after the program above runs. After processing, the file must therefore be brought back to its actual size, again by means of a memory-mapped file. The main code for doing so is as follows:

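// Create another file kernel object
hFile2 = CreateFile("Recv.zip",
    GENERIC_WRITE | GENERIC_READ,
    FILE_SHARE_READ,
    NULL,
    CREATE_ALWAYS,
    FILE_FLAG_SEQUENTIAL_SCAN,
    NULL);

// Create another file-mapping kernel object sized to the actual data length
// (both halves of the 64-bit length are passed)
hFileMapping2 = CreateFileMapping(hFile2,
    NULL,
    PAGE_READWRITE,
    (DWORD)(qwFileOffset >> 32),
    (DWORD)(qwFileOffset & 0xFFFFFFFF),
    NULL);

// Close the file kernel object
CloseHandle(hFile2);

// Release the view used for receiving; it may not start at offset 0
UnmapViewOfFile(pbFile);

// Copy the data from the original memory-mapped file to the new one, one
// view at a time, because a file of many GB cannot be mapped in one piece.
// qwPos stays a multiple of the allocation granularity.
for (__int64 qwPos = 0; qwPos < qwFileOffset; qwPos += dwBytesInBlock)
{
    DWORD dwLen = (qwFileOffset - qwPos < dwBytesInBlock)
        ? (DWORD)(qwFileOffset - qwPos) : dwBytesInBlock;

    // Map the same range of both files into the address space of the process
    pbFile = (PBYTE)MapViewOfFile(hFileMapping, FILE_MAP_READ,
        (DWORD)(qwPos >> 32), (DWORD)(qwPos & 0xFFFFFFFF), dwLen);
    pbFile2 = (PBYTE)MapViewOfFile(hFileMapping2, FILE_MAP_ALL_ACCESS,
        (DWORD)(qwPos >> 32), (DWORD)(qwPos & 0xFFFFFFFF), dwLen);

    memcpy(pbFile2, pbFile, dwLen);

    // Unmap the file data from the address space of the process
    UnmapViewOfFile(pbFile);
    UnmapViewOfFile(pbFile2);
}

// Close the file-mapping objects
CloseHandle(hFileMapping);
CloseHandle(hFileMapping2);

// Delete the temporary file
DeleteFile("Recv1.zip");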

Conclusion

Testing shows that memory-mapped files perform well when processing large volumes of data and have a clear advantage over conventional file-processing methods such as the CFile class and the ReadFile() and WriteFile() functions. The code described in this article was compiled with Microsoft Visual C++ 6.0 under Windows 98.
