Reposted from: http://community.csdn.net/Expert/topic/5631/5631339.xml?temp=.5729029
[Ref]
Windows provides a wealth of ways to read and write files, such as:
1. FILE *fp, fstream...; (C/C++)
2. CFile, CStdioFile...; (MFC)
3. CreateFile, ReadFile...; (API)
...
These are sufficient for processing ordinary files (text or non-text). However, for large files, such as
files of tens of MB, hundreds of MB, or even several GB, these ordinary means are clearly inadequate.
Reading such a file in and writing it back out demands heavy CPU usage, a lot of memory, and frequent I/O operations. This is obviously
unbearable.
To address the memory consumption, CPU usage, and I/O bottleneck, Windows core programming provides the memory-mapped file technique
(Mapping File).
I will not say much about the principles of the Mapping File. I only want to start from the application layer
and consider how to use this technique in everyday projects.
For example:
A project may frequently use a large number of constants. Defining them as macros and writing them in the source files is
obviously undesirable. They are usually written to a file, each constant is assigned an ID, and the constants are looked up by that ID.
When the file is relatively small, the common practice is to read it into memory in advance; after all, reading from memory is faster than reading from a file (I/O being the bottleneck).
A better practice is to read the contents into an STL map.
For example, an index file:
SEU07201213=a leaf in Wang Yang
JIANGSHENG=Jiang Sheng
SEU07201214=CSDN
............
Open the file and split each line at the = sign. Parsing options include CString operations, strtok, strstr, boost regular expression matching, and so on, but I prefer
sscanf(szIndex, "%[^=]=%[^=]", sName, sValue);
sscanf(szIndex, "%[^=]=%s", sName, sValue);
fscanf(stream, "%[^=]=%[^=]", sName, sValue);
and so on.
Then define a map:
map<string, string> m_Map;
m_Map[sName] = sValue;
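The sscanf-plus-map approach described above can be sketched roughly as follows. This is only an illustration, not the original author's code: the helper name ParseIndexLine and the buffer sizes (64 and 256 bytes) are assumptions, and the second scanset stops at the end of the line rather than at the next = sign so that the trailing line break is not stored in the value.

```cpp
#include <cassert>
#include <cstdio>
#include <map>
#include <string>

// Parse one "name=value" line and store it in the map.
// Returns false if the line does not contain both a name and a value.
// ParseIndexLine and the buffer sizes are assumptions made for this sketch.
bool ParseIndexLine(const char *szIndex, std::map<std::string, std::string> &m)
{
    assert(szIndex != nullptr);
    char sName[64];
    char sValue[256];
    // %63[^=]      reads up to 63 characters that are not '='
    // %255[^\r\n]  reads up to 255 characters up to the end of the line
    if (std::sscanf(szIndex, "%63[^=]=%255[^\r\n]", sName, sValue) != 2)
        return false;
    m[sName] = sValue;
    return true;
}
```

Calling ParseIndexLine once per line read from the index file fills the map; the width limits in the format string keep an overlong line from overflowing the two buffers.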
However, this approach breaks down when the file is large. I ran a test using the method above on a 15 MB text file with 250,000 lines; it consumed a lot of memory,
and processing was also very slow, not even counting writing back to the file.
This is where the Mapping File comes in. For large files the map container is abandoned (because the container takes up a lot of memory),
and character pointers are used directly instead, with no other encapsulation. See the example:
#pragma warning(disable: 4786)
#include <windows.h>
#include <stdio.h>
#include <iostream>
#include <string>
using namespace std;
// Note: strstr is applied to TCHAR strings, so this listing assumes an ANSI (non-UNICODE) build.
string GetValue(const TCHAR *, const TCHAR *); // obtain the value based on the name
int main(int argc, char *argv[])
{
    // Create the file object (C:\test.tsr)
    HANDLE hFile = CreateFile("C:\\test.tsr", GENERIC_READ | GENERIC_WRITE,
        0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hFile == INVALID_HANDLE_VALUE)
    {
        printf("Failed to create file object, error code: %lu", GetLastError());
        return 1;
    }
    // Create the file mapping object
    HANDLE hFileMap = CreateFileMapping(hFile, NULL, PAGE_READWRITE, 0, 0, NULL);
    if (hFileMap == NULL)
    {
        printf("Failed to create file mapping object, error code: %lu", GetLastError());
        CloseHandle(hFile);
        return 1;
    }
    // Obtain the system allocation granularity
    SYSTEM_INFO SysInfo;
    GetSystemInfo(&SysInfo);
    DWORD dwGran = SysInfo.dwAllocationGranularity;
    // Obtain the file size (combine the high and low 32-bit halves)
    DWORD dwFileSizeHigh;
    __int64 qwFileSize = GetFileSize(hFile, &dwFileSizeHigh);
    qwFileSize |= (((__int64)dwFileSizeHigh) << 32);
    // Close the file object; the mapping object keeps the file open
    CloseHandle(hFile);
    // Offset address
    __int64 qwFileOffset = 0;
    // Block size
    DWORD dwBlockBytes = 1000 * dwGran;
    if (qwFileSize < 1000 * dwGran)
        dwBlockBytes = (DWORD)qwFileSize;
    if (qwFileOffset >= 0)
    {
        // Map the view
        TCHAR *lpbMapAddress = (TCHAR *)MapViewOfFile(hFileMap, FILE_MAP_ALL_ACCESS,
            0, 0,
            dwBlockBytes);
        if (lpbMapAddress == NULL)
        {
            printf("Failed to map view of file, error code: %lu", GetLastError());
            CloseHandle(hFileMap);
            return 1;
        }
        // ----------------------- Start of data access -------------------------
        cout << GetValue(lpbMapAddress, "SEU07201213") << endl;
        getchar();
        // ----------------------- End of data access -------------------------
        // Unmap the view
        UnmapViewOfFile(lpbMapAddress);
    }
    // Close the file mapping object handle
    CloseHandle(hFileMap);
    return 0;
}
string GetValue(const TCHAR *lpbMapAddress, const TCHAR *sName)
{
    string sValue; // the value after "="
    TCHAR *p1 = NULL, *p2 = NULL; // character pointers
    // The view is mapped writable, so casting away const here is safe
    if ((p1 = const_cast<TCHAR *>(strstr(lpbMapAddress, sName))) != NULL) // find the position where sName appears
    {
        if ((p2 = strstr(p1, "\r\n")) != NULL) // find the position where "\r\n" (line break) appears
            *p2 = '\0'; // temporarily terminate the line
        sValue = p1 + strlen(sName) + strlen("="); // move the pointer past sName and "="
        if (p2 != NULL)
            *p2 = '\r'; // restore *p2; otherwise the original file structure would be changed
    }
    return sValue;
}
...
The above implements a simple lookup of a value by its index name. In testing, a lookup in a file with the same number of lines takes less than 1 second, and
does not take up this process's memory.
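The listing computes dwGran and dwBlockBytes but maps only the first block, starting at offset 0. To reach data further into a larger file, the offset passed to MapViewOfFile has to be a multiple of the allocation granularity. The following is a minimal sketch of that arithmetic only, with no Win32 calls; ViewRange, ComputeViewRange, and their parameters are hypothetical names introduced for illustration and are not part of the original code.

```cpp
#include <cassert>
#include <cstdint>

// Describes one view to request from MapViewOfFile (hypothetical helper type).
struct ViewRange
{
    std::uint64_t mapOffset; // file offset of the view, a multiple of the granularity
    std::uint32_t mapBytes;  // number of bytes to map
    std::uint32_t delta;     // position of the wanted byte inside the view
};

// Compute the view that contains wantedOffset.
// granularity corresponds to dwGran and maxBlockBytes to 1000 * dwGran in the listing.
ViewRange ComputeViewRange(std::uint64_t fileSize, std::uint64_t wantedOffset,
                           std::uint32_t granularity, std::uint32_t maxBlockBytes)
{
    assert(granularity > 0 && maxBlockBytes >= granularity);
    assert(wantedOffset < fileSize);
    ViewRange r;
    // Round the wanted offset down to the nearest multiple of the granularity.
    r.mapOffset = wantedOffset - (wantedOffset % granularity);
    // Map at most one block, and never past the end of the file.
    const std::uint64_t remaining = fileSize - r.mapOffset;
    r.mapBytes = remaining < maxBlockBytes
                     ? static_cast<std::uint32_t>(remaining)
                     : maxBlockBytes;
    r.delta = static_cast<std::uint32_t>(wantedOffset - r.mapOffset);
    return r;
}
```

With such a result, the high and low halves of mapOffset (mapOffset >> 32 and mapOffset & 0xFFFFFFFF) would go into the two offset parameters of MapViewOfFile, mapBytes into the size parameter, and the wanted byte would be found at the returned address plus delta.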
Changes made to the view through lpbMapAddress in GetValue do not need to be explicitly written back to the file, which greatly improves the efficiency of reading and writing the file.
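GetValue relies on strstr, which expects a '\0'-terminated string, and it briefly writes into the view. As an alternative sketch (not the original author's method), the lookup can be bounded by an explicit byte count and leave the view untouched. GetValueBounded, pData, and cbData are hypothetical names; the sketch also assumes that every entry starts at the beginning of a line, so a name that merely appears inside another entry's value is not matched.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Find "sName=value" at the start of a line inside a buffer of exactly
// cbData bytes. The buffer is never modified and needs no terminating '\0'.
// Returns an empty string if the name is not found.
std::string GetValueBounded(const char *pData, std::size_t cbData, const char *sName)
{
    assert(pData != nullptr && sName != nullptr);
    const std::string key = std::string(sName) + "=";
    const char *pEnd = pData + cbData;
    const char *p = pData;
    while (p < pEnd)
    {
        // The current line runs up to the next '\n' or the end of the data.
        const char *pLineEnd = std::find(p, pEnd, '\n');
        const std::size_t len = static_cast<std::size_t>(pLineEnd - p);
        if (len >= key.size() && std::memcmp(p, key.data(), key.size()) == 0)
        {
            const char *pValue = p + key.size();
            const char *pValueEnd = pLineEnd;
            // Drop the '\r' of a "\r\n" line break, if present.
            if (pValueEnd > pValue && pValueEnd[-1] == '\r')
                --pValueEnd;
            return std::string(pValue, pValueEnd);
        }
        if (pLineEnd == pEnd)
            break;
        p = pLineEnd + 1; // continue with the next line
    }
    return std::string();
}
```

In the listing, this would be called with the mapped address and dwBlockBytes as the byte count, and the view could then be mapped with read-only access.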
This article is reprinted from the network base camp: http://www.pushad.com/Info/13520.Html
This article is from the CSDN blog; please credit the source when reposting: http://blog.csdn.net/zdl1016/archive/2007/07/03/1676403.aspx