Using Memory-Mapped Files in C++ to Process Large Files

Source: Internet
Author: User

Introduction

File operations are among the most basic functions of an application. Both the Win32 API and MFC provide functions and classes for file processing; commonly used ones include the Win32 functions CreateFile(), WriteFile(), and ReadFile(), and the CFile class provided by MFC. These functions meet the needs of most scenarios. In some specialized fields, however, the data to be stored can reach dozens of GB, hundreds of GB, or even several TB, and the ordinary file-processing approach is clearly not feasible at that scale. Operations on such large files are generally handled with memory-mapped files. This article discusses this core Windows programming technique.

Memory-mapped files

A memory-mapped file is similar to virtual memory. It lets you reserve a region of address space and commit physical storage to that region; the difference is that the physical storage for a memory-mapped file comes from a file that already exists on disk rather than from the system page file. Before operating on the file you must first map it, after which it is as though the entire file had been loaded from disk into memory. Consequently, when you use a memory-mapped file to process a file stored on disk, you no longer perform explicit I/O operations on the file, which means you do not have to allocate a buffer for it yourself. All file caching is managed directly by the system. Because the steps of loading file data into memory, writing data back from memory to the file, and releasing memory blocks are eliminated, memory-mapped files play an important role when processing large volumes of data.

In addition, real-world systems often need to share data among multiple processes. When the amount of data is small, many flexible approaches are available; when the amount of shared data is large, a memory-mapped file is needed.
In fact, a memory-mapped file is the most efficient way for multiple processes on the same machine to share data. A memory-mapped file is not simple file I/O; it builds on a core Windows programming technology, memory management. A deeper understanding of memory-mapped files therefore requires a clear understanding of the memory management mechanism of the Windows operating system. That topic is complex and beyond the scope of this article, so it is not covered in detail here; interested readers can consult other books on the subject.

The following describes how to use a memory-mapped file.

First, call CreateFile() to create or open a file kernel object, which identifies the file on disk that will be used as the memory-mapped file. CreateFile() tells the operating system where the file's physical storage resides, but it specifies only the path of the file, not the length of the mapping. To specify how much physical storage the file-mapping object requires, call CreateFileMapping() to create a file-mapping kernel object, which tells the system the size of the file and how it will be accessed. After creating the file-mapping object, you must reserve a region of address space for the file's data and commit the file's data as the physical storage mapped to that region. MapViewOfFile() does this: under system management, it maps all or part of the file-mapping object into the process's address space. From this point on, using the memory-mapped file is essentially the same as working with file data that has been loaded into memory in the usual way.

When you have finished with the memory-mapped file, a few cleanup steps release the resources it used. This part is relatively simple.
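
The sequence described above (CreateFile(), CreateFileMapping(), MapViewOfFile(), then cleanup) can be sketched as follows. This is a minimal illustration, not code from the original article. The function names CountByte and CountByteInFile are made up for the example, the file is mapped read-only in a single view (which assumes it fits in the process's address space), and the Win32 part is wrapped in #ifdef _WIN32 so that the plain helper can be compiled anywhere.

```cpp
#include <cassert>
#include <cstddef>

// Pure helper (hypothetical name): counts how often `value` occurs in a block
// of memory. It works on any buffer, including a mapped view of a file.
std::size_t CountByte(const unsigned char* data, std::size_t size, unsigned char value)
{
    assert(data != nullptr || size == 0);
    std::size_t count = 0;
    for (std::size_t i = 0; i < size; ++i)
        if (data[i] == value) ++count;
    return count;
}

#ifdef _WIN32
#include <windows.h>

// Hypothetical example: maps an existing file read-only and counts the bytes
// equal to `value`. Returns false if any step fails or the file is empty.
bool CountByteInFile(const char* path, unsigned char value, std::size_t* result)
{
    // Step 1: open the file kernel object.
    HANDLE hFile = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                               OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hFile == INVALID_HANDLE_VALUE) return false;

    LARGE_INTEGER size;
    if (!GetFileSizeEx(hFile, &size) || size.QuadPart == 0) {
        CloseHandle(hFile);
        return false;
    }

    // Step 2: create the file-mapping kernel object (0, 0 = current file size).
    HANDLE hMapping = CreateFileMappingA(hFile, NULL, PAGE_READONLY, 0, 0, NULL);
    CloseHandle(hFile);  // the mapping object keeps the file open
    if (hMapping == NULL) return false;

    // Step 3: map a view of the whole file (length 0 = to the end of the mapping).
    const unsigned char* view = static_cast<const unsigned char*>(
        MapViewOfFile(hMapping, FILE_MAP_READ, 0, 0, 0));
    if (view == NULL) {
        CloseHandle(hMapping);
        return false;
    }

    // Step 4: use the view like ordinary memory.
    *result = CountByte(view, static_cast<std::size_t>(size.QuadPart), value);

    // Step 5: clean up in reverse order.
    UnmapViewOfFile(view);
    CloseHandle(hMapping);
    return true;
}
#endif
```

Note that the two creation functions report failure differently: CreateFile() returns INVALID_HANDLE_VALUE, while CreateFileMapping() and MapViewOfFile() return NULL.
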
Call UnmapViewOfFile() to unmap the file's data from the process's address space, and CloseHandle() to close the file-mapping object and the file object created earlier.

Functions related to memory-mapped files

The API functions used with memory-mapped files are mainly those mentioned above. Each is described below.

    HANDLE CreateFile(
        LPCTSTR lpFileName,
        DWORD dwDesiredAccess,
        DWORD dwShareMode,
        LPSECURITY_ATTRIBUTES lpSecurityAttributes,
        DWORD dwCreationDisposition,
        DWORD dwFlagsAndAttributes,
        HANDLE hTemplateFile);

CreateFile() is often used to create and open files even in ordinary file operations. When working with memory-mapped files, it creates or opens a file kernel object and returns its handle. When calling it, set the dwDesiredAccess and dwShareMode parameters according to whether the data will be read or written and whether the file will be shared. If these parameters are set incorrectly, later operations will fail.

    HANDLE CreateFileMapping(
        HANDLE hFile,
        LPSECURITY_ATTRIBUTES lpFileMappingAttributes,
        DWORD flProtect,
        DWORD dwMaximumSizeHigh,
        DWORD dwMaximumSizeLow,
        LPCTSTR lpName);

CreateFileMapping() creates a file-mapping kernel object. The hFile parameter specifies the handle of the file to be mapped into the process's address space (the handle returned by CreateFile()). Because the physical storage of a memory-mapped file is a file on disk rather than memory allocated from the system page file, the system does not reserve a region of address space for it on its own initiative, nor does it automatically map the file's storage to such a region. So that the system can determine the protection attributes of the pages, set the flProtect parameter. The protection attributes PAGE_READONLY, PAGE_READWRITE, and PAGE_WRITECOPY indicate, respectively, that once the file-mapping object is mapped, the file's data can be read, read and written, or read and written with copy-on-write semantics.
With PAGE_READONLY, CreateFile() must have been called with GENERIC_READ; PAGE_READWRITE requires that CreateFile() was called with GENERIC_READ | GENERIC_WRITE; and for PAGE_WRITECOPY, CreateFile() must have been called with either GENERIC_READ or GENERIC_READ | GENERIC_WRITE. The DWORD parameters dwMaximumSizeHigh and dwMaximumSizeLow are also important: together they specify the maximum size of the file in bytes. Because the two parameters combine into a 64-bit value, the maximum supported file length is 16 EB, which meets almost every requirement for processing large data files.

    LPVOID MapViewOfFile(
        HANDLE hFileMappingObject,
        DWORD dwDesiredAccess,
        DWORD dwFileOffsetHigh,
        DWORD dwFileOffsetLow,
        DWORD dwNumberOfBytesToMap);

MapViewOfFile() maps file data into the process's address space. The hFileMappingObject parameter is the handle of the file-mapping object returned by CreateFileMapping(). The dwDesiredAccess parameter once again specifies how the file's data will be accessed, and it must be compatible with the protection attribute passed to CreateFileMapping(). Although specifying the protection twice may seem redundant, it gives the application finer control over data protection. MapViewOfFile() can map all or part of the file. When mapping, you specify the offset into the file and the length to map. The offset is a 64-bit value formed from the DWORD parameters dwFileOffsetHigh and dwFileOffsetLow, and it must be an integer multiple of the system's allocation granularity. On current Windows systems the allocation granularity is 64 KB, but instead of hard-coding that value you can obtain it at run time with the following code:

    SYSTEM_INFO sinf;
    GetSystemInfo(&sinf);
    DWORD dwAllocationGranularity = sinf.dwAllocationGranularity;

The dwNumberOfBytesToMap parameter specifies how many bytes of the file to map.
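
Because the view offset must be a multiple of the allocation granularity and is passed as two DWORD halves, it can help to compute both in one place. The sketch below is an illustration only: ViewParams and MakeViewParams are hypothetical names, and the granularity is passed in as a parameter (on Windows it would be sinf.dwAllocationGranularity).

```cpp
#include <cassert>
#include <cstdint>

// Values for one MapViewOfFile() call that must cover a given file offset.
struct ViewParams {
    std::uint32_t offsetHigh;  // value for dwFileOffsetHigh
    std::uint32_t offsetLow;   // value for dwFileOffsetLow
    std::uint32_t delta;       // bytes from the start of the view to the offset
};

// Rounds `offset` down to a multiple of `granularity` and splits the result
// into the two 32-bit halves that MapViewOfFile() expects.
ViewParams MakeViewParams(std::uint64_t offset, std::uint32_t granularity)
{
    assert(granularity != 0);
    const std::uint64_t base = offset - (offset % granularity);
    ViewParams p;
    p.offsetHigh = static_cast<std::uint32_t>(base >> 32);
    p.offsetLow  = static_cast<std::uint32_t>(base & 0xFFFFFFFFu);
    p.delta      = static_cast<std::uint32_t>(offset - base);
    return p;
}
```

To reach a particular byte, map at (offsetHigh, offsetLow) with a length of at least delta plus the number of bytes needed, then access the data at the returned base address plus delta.
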
Note that on Windows 9x, MapViewOfFile() returns NULL if it cannot find a region large enough to hold the entire file-mapping object. On Windows 2000, however, MapViewOfFile() needs to find only a region large enough for the requested view, regardless of the size of the entire file-mapping object.

When you have finished with the file data mapped into the process's address space, release it with UnmapViewOfFile(), whose prototype is:

    BOOL UnmapViewOfFile(LPCVOID lpBaseAddress);

Its only parameter, lpBaseAddress, specifies the base address of the region to release and must be the value returned by MapViewOfFile(). Every call to MapViewOfFile() needs a matching call to UnmapViewOfFile(); otherwise the reserved region is not released until the process ends. In addition, the file kernel object and file-mapping kernel object created by CreateFile() and CreateFileMapping() should be released with CloseHandle() before the process ends; otherwise resources may leak.

Besides these essential API functions, other auxiliary functions can be used as needed. For example, to improve speed the system caches the file's data pages and does not immediately update the file on disk while you work with the mapped view. To force an update, call FlushViewOfFile(), which makes the system write some or all of the modified data back to the file on disk so that updates are saved promptly.

An example of processing a large file with a memory-mapped file

The following concrete example shows how to use a memory-mapped file. The program receives data from a port and stores it on disk in real time.
Because the amount of data is large (dozens of GB), a memory-mapped file is used. The main code of the worker thread function MainProc, which runs when the thread starts, is listed below. When data arrives at the port, event hEvent[0] is signaled; WaitForMultipleObjects() returns, and the received data is saved to disk. When reception is to stop, event hEvent[1] is signaled, and its handler releases resources and closes the file. The thread function is implemented as follows:

    ......
    // Create the file kernel object; its handle is stored in hFile
    HANDLE hFile = CreateFile("Recv1.zip", GENERIC_WRITE | GENERIC_READ, FILE_SHARE_READ,
        NULL, CREATE_ALWAYS, FILE_FLAG_SEQUENTIAL_SCAN, NULL);
    // Create the file-mapping kernel object; its handle is stored in hFileMapping.
    // 0x4000000 bytes is 64 MB; the mapping must be at least as large as the
    // data expected, and this code assumes it is.
    HANDLE hFileMapping = CreateFileMapping(hFile, NULL, PAGE_READWRITE, 0, 0x4000000, NULL);
    // Release the file kernel object (the mapping object keeps the file open)
    CloseHandle(hFile);
    // Set the size, offsets, and other parameters
    __int64 qwFileSize = 0x4000000;
    __int64 qwFileOffset = 0;   // total number of bytes written so far
    __int64 qwViewBase = 0;     // file offset at which the current view starts
    __int64 T = 600 * sinf.dwAllocationGranularity;
    DWORD dwBytesInBlock = 1000 * sinf.dwAllocationGranularity;
    // Map file data into the address space of the process
    PBYTE pbFile = (PBYTE)MapViewOfFile(hFileMapping, FILE_MAP_ALL_ACCESS,
        (DWORD)(qwViewBase >> 32), (DWORD)(qwViewBase & 0xFFFFFFFF), dwBytesInBlock);
    while (bLoop)
    {
        // Wait for event hEvent[0] or event hEvent[1]
        DWORD ret = WaitForMultipleObjects(2, hEvent, FALSE, INFINITE);
        ret -= WAIT_OBJECT_0;
        switch (ret)
        {
        // The data-received event was signaled
        case 0:
            // Receive data from the port and save it to the memory-mapped file;
            // the write position is relative to the start of the current view
            nReadLen = syio_Read(port[1], pbFile + (DWORD)(qwFileOffset - qwViewBase), QueueLen);
            qwFileOffset += nReadLen;
            // When the view is 60% full, create a new view to prevent overflow
            if (qwFileOffset > T)
            {
                T = qwFileOffset + 600 * sinf.dwAllocationGranularity;
                UnmapViewOfFile(pbFile);
                // A view must start at a multiple of the allocation granularity
                qwViewBase = qwFileOffset - qwFileOffset % sinf.dwAllocationGranularity;
                // A view must not extend past the end of the file-mapping object
                DWORD dwBytes = dwBytesInBlock;
                if (qwFileSize - qwViewBase < dwBytes)
                    dwBytes = (DWORD)(qwFileSize - qwViewBase);
                pbFile = (PBYTE)MapViewOfFile(hFileMapping, FILE_MAP_ALL_ACCESS,
                    (DWORD)(qwViewBase >> 32), (DWORD)(qwViewBase & 0xFFFFFFFF), dwBytes);
            }
            break;
        // The termination event was signaled
        case 1:
            bLoop = FALSE;
            // Unmap the file data from the address space of the process
            UnmapViewOfFile(pbFile);
            // Close the file-mapping object
            CloseHandle(hFileMapping);
            break;
        }
    }
    ...

In the termination handler, if only UnmapViewOfFile() and CloseHandle() are executed, the file does not end up with its actual size. For example, if the memory-mapped file was created with a size of 30 GB and only 14 GB of data was received, the stored file is still 30 GB long after the program above finishes. In other words, after processing, the file must be restored to its actual size by way of another memory-mapped file.
The following is the main code that meets this requirement. It belongs in the termination handler, in place of the plain cleanup, while the original file-mapping object is still open:

    // Map the original file again from offset 0 so that all received data is
    // visible (this assumes the data fits in a single view)
    UnmapViewOfFile(pbFile);
    pbFile = (PBYTE)MapViewOfFile(hFileMapping, FILE_MAP_READ, 0, 0, (DWORD)qwFileOffset);
    // Create another file kernel object
    hFile2 = CreateFile("Recv.zip", GENERIC_WRITE | GENERIC_READ, FILE_SHARE_READ,
        NULL, CREATE_ALWAYS, FILE_FLAG_SEQUENTIAL_SCAN, NULL);
    // Create another file-mapping kernel object sized to the actual data length
    hFileMapping2 = CreateFileMapping(hFile2, NULL, PAGE_READWRITE,
        (DWORD)(qwFileOffset >> 32), (DWORD)(qwFileOffset & 0xFFFFFFFF), NULL);
    // Close the file kernel object
    CloseHandle(hFile2);
    // Map the file data into the address space of the process
    pbFile2 = (PBYTE)MapViewOfFile(hFileMapping2, FILE_MAP_ALL_ACCESS, 0, 0, (DWORD)qwFileOffset);
    // Copy the data from the original memory-mapped file to this one
    memcpy(pbFile2, pbFile, (size_t)qwFileOffset);
    // Unmap the file data from the address space of the process
    UnmapViewOfFile(pbFile);
    UnmapViewOfFile(pbFile2);
    // Close the file-mapping objects
    CloseHandle(hFileMapping);
    CloseHandle(hFileMapping2);
    // Delete the temporary file
    DeleteFile("Recv1.zip");

Conclusion

Memory-mapped files perform well when processing large data files, and they have a clear advantage over file-processing methods based on CFile, ReadFile(), WriteFile(), and similar functions. The code described in this article was compiled with Microsoft Visual C++ 6.0 under Windows 98.
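
Returning to the copy step in the example: it maps each file in a single view and calls memcpy() once, which only works when the data fits in the process's address space. For larger files the copy could be done one view at a time. The sketch below shows one possible way, under stated assumptions: NextChunk and CopyMapping are hypothetical names, maxChunk is assumed to be a multiple of the allocation granularity, and the Win32 part is wrapped in #ifdef _WIN32 so that the plain helper can be compiled anywhere.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Size of the next piece to copy when `done` of `total` bytes are finished
// and at most `maxChunk` bytes are mapped at a time.
std::uint32_t NextChunk(std::uint64_t total, std::uint64_t done, std::uint32_t maxChunk)
{
    assert(done <= total && maxChunk != 0);
    const std::uint64_t left = total - done;
    return left < maxChunk ? static_cast<std::uint32_t>(left) : maxChunk;
}

#ifdef _WIN32
#include <windows.h>

// Hypothetical helper: copies the first `total` bytes from one file-mapping
// object to another, one view at a time. `maxChunk` must be a multiple of the
// allocation granularity so that every view offset is valid.
bool CopyMapping(HANDLE hSrc, HANDLE hDst, std::uint64_t total, std::uint32_t maxChunk)
{
    for (std::uint64_t done = 0; done < total; ) {
        const std::uint32_t len = NextChunk(total, done, maxChunk);
        const DWORD hi = static_cast<DWORD>(done >> 32);
        const DWORD lo = static_cast<DWORD>(done & 0xFFFFFFFFu);

        // Map the same window of both files.
        void* src = MapViewOfFile(hSrc, FILE_MAP_READ, hi, lo, len);
        void* dst = MapViewOfFile(hDst, FILE_MAP_WRITE, hi, lo, len);
        const bool ok = (src != NULL && dst != NULL);
        if (ok) std::memcpy(dst, src, len);

        // Release both views before moving on to the next window.
        if (src != NULL) UnmapViewOfFile(src);
        if (dst != NULL) UnmapViewOfFile(dst);
        if (!ok) return false;
        done += len;
    }
    return true;
}
#endif
```

A different route, not used in this article, is to close every view and the file-mapping object and then shorten the original file directly with SetFilePointerEx() and SetEndOfFile(), which avoids the second file altogether.
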
