High-speed massive data collection and storage technology based on the memory mapping principle (reposted)

Source: Internet
Author: User

Cutedeer (with my own code added)

Memory-mapped files are a file data access mechanism provided by the Windows operating system. With a memory-mapped file, the system reserves a region of the process's 2 GB user address space for the file and maps the file into that region. Once the file is mapped, the operating system handles paging, buffering, and caching; you do not need to call API functions to allocate and free memory blocks or to perform file input/output, and you do not need to supply any caching algorithm.
For large data files, memory-mapped files can supply enough address space to satisfy the request. This feature is closely tied to the operating system's memory management. Under Win32, each process has its own address space. Two processes may hold pointers with the same value, but to keep processes independent, one process cannot access another's private data, which improves the robustness of the system. Each Win32 process also has a 4 GB address space, but this is virtual address space rather than physical memory. The operating system commits physical storage only when it is needed, and that storage may be RAM or virtual memory backed by the hard disk, depending on the situation. In short, memory-mapped files are implemented entirely through the operating system's memory management.
In Windows, the address space of Win32 processes is divided as follows:
Memory mapping is used in three situations: mapping executable files, which is done mainly by the operating system itself; mapping data files; and mapping the page (swap) file. At run time, the system first maps part of the data file into the virtual address space (on Windows 9x the mapping region is 0x80000000 to 0xBFFFFFFF) without committing RAM. The first instruction that accesses this memory raises a page fault. The system catches the fault, allocates a page of RAM, and maps it to the address where the fault occurred. It then reads the corresponding file data into that page and resumes the instruction that faulted. This is why the application never has to call an I/O function itself, and it is the basic working mechanism of memory-mapped files.
File operations are among the most basic functions of an application. Both the Win32 API and MFC provide functions and classes for file processing; commonly used ones include the Win32 API functions CreateFile(), WriteFile(), and ReadFile(), and the CFile class provided by MFC. These meet the needs of most scenarios. In some specialized fields, however, the data to be stored may run to tens of GB, hundreds of GB, or even several TB, and ordinary file processing methods are clearly not practical. Such large files are currently handled with memory-mapped files in most cases. This article discusses this core Windows programming technique.
Memory-mapped files
A memory-mapped file is similar to virtual memory. It lets you reserve a region of address space and commit physical storage to that region; the difference is that the physical storage comes from a file that already exists on disk rather than from the system page file. Before operating on the file you must map it, which makes it look as if the entire file had been loaded from disk into memory. When files on disk are processed this way, explicit file I/O operations are no longer needed, so you do not have to request and allocate a buffer for the file. All file caching is managed by the system. Because the steps of loading file data into memory, writing data back from memory to the file, and freeing memory blocks are eliminated, memory-mapped files play an important role in processing large volumes of data. In addition, real projects often need to share data among multiple processes. When the amount of data is small, many approaches will work; when the amount of shared data is large, a memory-mapped file is needed. In fact, memory-mapped files are the most efficient way to share data among multiple processes on the same machine.
Memory-mapped files are not simple file I/O; they rely on a core Windows programming technique, memory management. A deeper understanding of memory-mapped files therefore requires a clear understanding of the memory management mechanism of the Windows operating system. That topic is complex and beyond the scope of this article, so it is not covered in detail here; interested readers can consult other books on the subject. The following describes how to use a memory-mapped file:
First, call CreateFile() to create or open a file kernel object, which identifies the file on disk that will be used as the memory-mapped file. CreateFile() tells the operating system where the file's physical storage is located, but it specifies only the file's path, not its length. To specify how much physical storage the file mapping object needs, call CreateFileMapping() to create a file mapping kernel object, which tells the system the size of the file and how it will be accessed. After creating the file mapping object, you must reserve a region of address space for the file's data and commit the file's data as the physical storage mapped to that region. MapViewOfFile() does this: under system management it maps all or part of the file mapping object into the process's address space. From this point on, the memory-mapped file is used and processed in much the same way as file data loaded into memory in the usual way. When you have finished with the memory-mapped file, you must clean up and release the resources you used. This part is relatively simple: call UnmapViewOfFile() to unmap the file data from the process's address space, and call CloseHandle() to close the file mapping object and the file object created earlier.
Memory-mapped file functions
The main API functions used with memory-mapped files are the ones mentioned above. Each is described below:
HANDLE CreateFile(LPCTSTR lpFileName,
DWORD dwDesiredAccess,
DWORD dwShareMode,
LPSECURITY_ATTRIBUTES lpSecurityAttributes,
DWORD dwCreationDisposition,
DWORD dwFlagsAndAttributes,
HANDLE hTemplateFile);
CreateFile() is commonly used to create and open files in ordinary file operations as well. For memory-mapped files, it creates or opens a file kernel object and returns its handle. When calling it, set dwDesiredAccess and dwShareMode according to whether the data needs to be read or written and whether the file needs to be shared; incorrect settings will cause later operations to fail.
HANDLE CreateFileMapping(HANDLE hFile,
LPSECURITY_ATTRIBUTES lpFileMappingAttributes,
DWORD flProtect,
DWORD dwMaximumSizeHigh,
DWORD dwMaximumSizeLow,
LPCTSTR lpName);
CreateFileMapping() creates a file mapping kernel object. The hFile parameter specifies the handle of the file to be mapped into the process's address space (the handle returned by CreateFile()). Because the physical storage of a memory-mapped file is a file on disk rather than memory allocated from the system page file, the system does not reserve an address space region for it at this point, and the file's storage is not automatically mapped to such a region. So that the system can determine the protection attributes of the pages, you must set the flProtect parameter. The protection attributes PAGE_READONLY, PAGE_READWRITE, and PAGE_WRITECOPY indicate, respectively, that the mapped file data can be read, can be read and written, or is copy-on-write. With PAGE_READONLY, CreateFile() must have been called with GENERIC_READ; PAGE_READWRITE requires GENERIC_READ | GENERIC_WRITE; for PAGE_WRITECOPY, CreateFile() must have been called with either GENERIC_READ or GENERIC_READ | GENERIC_WRITE. The DWORD parameters dwMaximumSizeHigh and dwMaximumSizeLow are also important: together they specify the maximum size of the file in bytes. Because the two parameters form a 64-bit value, the maximum supported file length is 16 EB, which is enough for almost any large data file.
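The split of a 64-bit size into the high and low 32-bit parameters can be sketched as follows. This is a minimal, portable illustration rather than code from the original article: Dword, SizeParts, splitSize, and joinSize are hypothetical names, and Dword merely stands in for the Win32 DWORD type so the sketch compiles without windows.h.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the Win32 DWORD type (assumption: 32-bit unsigned).
using Dword = std::uint32_t;

// The two halves of a 64-bit size, in the order CreateFileMapping() takes them.
struct SizeParts {
    Dword high;  // would be passed as dwMaximumSizeHigh
    Dword low;   // would be passed as dwMaximumSizeLow
};

// Rebuild a 64-bit value from its halves, for example the two parts
// reported by GetFileSize().
inline std::uint64_t joinSize(Dword high, Dword low) {
    return (static_cast<std::uint64_t>(high) << 32) | low;
}

// Split a 64-bit byte count into its upper and lower 32 bits.
inline SizeParts splitSize(std::uint64_t bytes) {
    const SizeParts parts{
        static_cast<Dword>(bytes >> 32),
        static_cast<Dword>(bytes & 0xFFFFFFFFu)
    };
    // Debug-build sanity check: the halves must round-trip to the input.
    assert(joinSize(parts.high, parts.low) == bytes);
    return parts;
}
```

For a 5 GB mapping (5 * 1024 * 1024 * 1024 bytes), splitSize returns high = 1 and low = 0x40000000. Doing the arithmetic in 64 bits first avoids the silent overflow that occurs when a size in megabytes is multiplied out in a 32-bit DWORD.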
LPVOID MapViewOfFile(HANDLE hFileMappingObject,
DWORD dwDesiredAccess,
DWORD dwFileOffsetHigh,
DWORD dwFileOffsetLow,
SIZE_T dwNumberOfBytesToMap);
MapViewOfFile() maps file data into the process's address space. The hFileMappingObject parameter is the file mapping object handle returned by CreateFileMapping(). The dwDesiredAccess parameter specifies the access to the file data once more, and it must be compatible with the protection attribute passed to CreateFileMapping(). Specifying protection twice may seem redundant, but it gives the application finer control over data protection. MapViewOfFile() can map all of a file or only part of it. When mapping, you specify the offset into the data file and the length to map. The offset is a 64-bit value formed from the DWORD parameters dwFileOffsetHigh and dwFileOffsetLow, and it must be an integer multiple of the operating system's allocation granularity. On Windows the allocation granularity is 64 KB. You can also obtain the allocation granularity of the current system at run time with the following code:
SYSTEM_INFO sinf;
GetSystemInfo(&sinf);
DWORD dwAllocationGranularity = sinf.dwAllocationGranularity;
The dwNumberOfBytesToMap parameter specifies how many bytes of the data file to map. Note that on Windows 9x, MapViewOfFile() returns NULL if it cannot find a region large enough to hold the entire file mapping object. On Windows 2000, MapViewOfFile() only needs to find a region large enough for the requested view, regardless of the size of the entire file mapping object.
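Because the view offset must be a multiple of the allocation granularity, data that starts at an arbitrary file position is usually reached by rounding the offset down and skipping the extra bytes inside the view. The following is a minimal sketch of that arithmetic under stated assumptions: AlignedView and alignView are hypothetical names, and the granularity is passed in by the caller (on Windows it would come from GetSystemInfo()).

```cpp
#include <cassert>
#include <cstdint>

// A view request adjusted so that its file offset is a legal view offset.
struct AlignedView {
    std::uint64_t mapOffset;  // offset to map at (split into high/low DWORDs)
    std::uint64_t mapLength;  // number of bytes to map
    std::uint64_t delta;      // where the wanted data starts inside the view
};

// Round wantedOffset down to a multiple of granularity and extend the length
// so the view still covers [wantedOffset, wantedOffset + wantedLength).
inline AlignedView alignView(std::uint64_t wantedOffset,
                             std::uint64_t wantedLength,
                             std::uint64_t granularity) {
    assert(granularity != 0);
    const std::uint64_t delta = wantedOffset % granularity;
    return AlignedView{wantedOffset - delta, wantedLength + delta, delta};
}
```

With a 64 KB granularity, alignView(100000, 4096, 65536) gives mapOffset = 65536, mapLength = 38560, and delta = 34464. The caller would map at mapOffset and read the wanted bytes starting delta bytes past the returned base address.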
After you have finished processing the file mapped into the process's address space, release the mapped file data with UnmapViewOfFile(). Its prototype is as follows:
BOOL UnmapViewOfFile(LPCVOID lpBaseAddress);
The single parameter lpBaseAddress specifies the base address of the mapped region and must be the value returned by MapViewOfFile(). Every call to MapViewOfFile() needs a matching call to UnmapViewOfFile(); otherwise the reserved region is not released until the process ends. In addition, the file kernel object and the file mapping kernel object created by CreateFile() and CreateFileMapping() must be released with CloseHandle() before the process ends; otherwise resources will leak.
Beyond these required API functions, you may need other helper functions when using memory-mapped files. For example, to improve speed the system caches the file's data pages and does not update the file's disk image immediately as you work on the mapped view. If this is a problem, call FlushViewOfFile(), which forces the system to write some or all of the modified data back to the disk image so that updates reach the disk promptly.
BOOL FlushViewOfFile(LPCVOID lpBaseAddress, SIZE_T dwNumberOfBytesToFlush);
This function takes an address inside the mapped view returned by MapViewOfFile() and the number of bytes to write to disk. If FlushViewOfFile() is called when no data has been changed, it simply returns without writing anything to disk.

The following is an example of my own program code:

1. Create a mapping file

// Create a memory-mapped file of the specified size
m_hFile = CreateFile(m_FileName, GENERIC_READ | GENERIC_WRITE, 0, NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
if (m_hFile == INVALID_HANDLE_VALUE)
{
    AfxMessageBox(_T("An error occurred while creating the memory-mapped file!"));
    return;
}
// Create a file mapping object
// dwFileSizeMB stands for the "file size (MB)" placeholder in the original listing
m_hMap = CreateFileMapping(m_hFile, NULL, PAGE_READWRITE, 0, dwFileSizeMB * 1024 * 1024, NULL);
if (m_hMap == NULL)
{
    AfxMessageBox(_T("An error occurred while creating the file mapping object!"));
    CloseHandle(m_hFile);
    return;
}
// Obtain the system allocation granularity first
SYSTEM_INFO sysInfo;
GetSystemInfo(&sysInfo);
dwGran = sysInfo.dwAllocationGranularity;
// Obtain the high and low 32 bits of the file length and combine them into a 64-bit value
DWORD dwFileSizeHigh;
qwFileSize = GetFileSize(m_hFile, &dwFileSizeHigh);
qwFileSize |= (((__int64)dwFileSizeHigh) << 32);
// Close the file handle only after its size has been read; the mapping object keeps the file open
CloseHandle(m_hFile);
qwFileOffset = 0;
qwFileAlarm = 600 * dwGran;
dwBytesInBlock = 1000 * dwGran;

DWORD dwBlockBytes = dwFileSizeMB * 1024 * 1024; // 1000 * dwGran;
// Map the view
lpbMapAddress = (PDWORD)MapViewOfFile(m_hMap, FILE_MAP_ALL_ACCESS, (DWORD)(qwFileOffset >> 32), (DWORD)(qwFileOffset & 0xFFFFFFFF), dwBlockBytes);
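The listing above maps the whole file as a single view. When a file is too large for one view, a common alternative is to map it one block at a time, with a block size that is a multiple of the allocation granularity so that every block offset is a legal view offset. The sketch below only plans the blocks; it is an illustration under that assumption and not part of the original program. Block and planBlocks are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One view to map: a file offset and a length in bytes.
struct Block {
    std::uint64_t offset;
    std::uint64_t length;
};

// Cut a file of fileSize bytes into consecutive blocks of at most blockBytes.
// If blockBytes is a multiple of the allocation granularity, every offset
// produced here is too, because offsets advance in whole blocks from zero.
inline std::vector<Block> planBlocks(std::uint64_t fileSize,
                                     std::uint64_t blockBytes) {
    assert(blockBytes != 0);
    std::vector<Block> blocks;
    std::uint64_t offset = 0;
    while (offset < fileSize) {
        const std::uint64_t remaining = fileSize - offset;
        const std::uint64_t length =
            remaining < blockBytes ? remaining : blockBytes;
        blocks.push_back(Block{offset, length});
        offset += length;  // never exceeds fileSize, so it cannot overflow
    }
    return blocks;
}
```

Each planned block would then be mapped, processed, and unmapped in turn, so the address space in use at any moment stays at one block no matter how large the file is.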

2. Map data

In the collection thread, use the offset to write each sample into the mapped memory:

*(lpbMapAddress + qwFileOffset) = data;
// Advance the offset to the next DWORD in the mapped view
qwFileOffset++;
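The write above does not check whether the offset is still inside the mapped view, and writing past the end of a view is an access violation. A minimal sketch of a bounds-checked writer follows, assuming 32-bit samples as in the listing. SampleWriter is a hypothetical helper, not part of the original program, and it works on any caller-supplied buffer, so a plain array can stand in for the mapped view.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Appends 32-bit samples to a mapped view and refuses to write past its end.
class SampleWriter {
public:
    // base would be the pointer returned by MapViewOfFile(); viewBytes is
    // the mapped length in bytes. Both are supplied by the caller.
    SampleWriter(std::uint32_t* base, std::size_t viewBytes)
        : base_(base),
          capacity_(viewBytes / sizeof(std::uint32_t)),
          count_(0) {
        assert(base != nullptr);
    }

    // Store one sample; returns false and writes nothing when the view is full.
    bool append(std::uint32_t sample) {
        if (count_ >= capacity_) {
            return false;
        }
        base_[count_++] = sample;
        return true;
    }

    // Number of samples written so far (plays the role of qwFileOffset).
    std::size_t count() const { return count_; }

    // Number of bytes written so far, i.e. the length to save later.
    std::size_t bytesUsed() const { return count_ * sizeof(std::uint32_t); }

private:
    std::uint32_t* base_;
    std::size_t capacity_;
    std::size_t count_;
};
```

When append() returns false, the collection thread can stop, or unmap the current view and map the next block before continuing.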

3. Save data

When collection ends, exit the thread and save the data:

// m_FileNameSave stands for the save-file path, written as "m_filename (save file)" in the original listing
m_hFileSave = CreateFile(m_FileNameSave, GENERIC_READ | GENERIC_WRITE, 0, NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);

// Create another file mapping kernel object sized to the actual data length
m_hMapSave = CreateFileMapping(m_hFileSave, NULL, PAGE_READWRITE, 0, (DWORD)(qwFileOffset & 0xFFFFFFFF) * sizeof(DWORD), NULL);
// Close the file kernel object
CloseHandle(m_hFileSave);
// Map the file data into the address space of the process
lpbMapAddressSave = (PDWORD)MapViewOfFile(m_hMapSave, FILE_MAP_ALL_ACCESS, 0, 0, (SIZE_T)(qwFileOffset * sizeof(DWORD)));
// Copy the data from the original memory-mapped file to this one
memcpy(lpbMapAddressSave, lpbMapAddress, (size_t)(qwFileOffset * sizeof(DWORD)));
// Unmap the file data from the address space of the process
UnmapViewOfFile(lpbMapAddress);
UnmapViewOfFile(lpbMapAddressSave);
// Close the file mapping objects

CloseHandle(m_hMap);
CloseHandle(m_hMapSave);
// Delete the temporary file
DeleteFile(m_FileName);

4. Notes

The data type used when saving must match both the data type written to the mapped memory and the data type of the final stored file; otherwise a large amount of data will be lost.

__int64 qwFileSize; // size of the memory-mapped file
__int64 qwFileOffset; // storage offset of the data in the memory-mapped file
-The end
