How to read large files in C: memory-mapped files


Windows provides a wealth of ways to read and write files, such as:
1. FILE *fp, fstream, ... (C/C++)
2. CFile, CStdioFile, ... (MFC)
3. CreateFile, ReadFile, ... (API)
...
These are sufficient for ordinary files (text or binary). However, large files of tens or hundreds of MB, or even several GB, are clearly more than these ordinary methods can cope with.
Reading such a file and writing it back requires frequent CPU, memory, and I/O work, which quickly becomes unbearable.
To address memory consumption, CPU usage, and I/O bottlenecks, Windows core programming offers memory-mapped file technology (file mapping).
I will not go into the principles of file mapping here. Instead, I want to offer a few ideas from the application layer
and consider how to use this technique in everyday projects.
For example:
A project may use a large number of constants. Defining them all as macros in the source files is clearly undesirable; they are usually kept in a separate file, each with an identifier by which it is looked up.
When the file is relatively small, the common practice is to preload it into memory, since reading from memory is faster than reading from the file (I/O is the bottleneck).
A good approach is to read the entries into an STL map.
For example, an index file:
SEU07201213=a withered grass
FANG=FANG
SEU07201214=CSDN
............
Open the file and split each line at the = sign. For parsing you can use CString operations, strtok, strstr, Boost regular expression matching, and so on, but I prefer sscanf/fscanf:
sscanf(szIndex, "%[^=]=%[^\r\n]", sName, sValue);
sscanf(szIndex, "%[^=]=%s", sName, sValue);
fscanf(stream, " %[^=]=%[^\r\n]", sName, sValue);

and so on.
Then define a map and store each pair in it:
map<string, string> m_Map;
m_Map[sName] = sValue;
However, this approach breaks down for large files. In my test, processing a 15 MB text file of 250,000 lines this way took up a lot of memory and was very slow, and that did not even include writing back to the file.
This is where memory-mapped files come in. When processing large files, drop the map (the container itself takes up a lot of memory)
and work directly with a character pointer into the mapped view, with no further encapsulation. See the example:
 
 
#pragma warning(disable: 4786)  // VC6: silence "identifier truncated" warnings from long STL names
#include <windows.h>
#include <stdio.h>
#include <string.h>
#include <algorithm>
#include <iostream>
#include <string>

using namespace std;

// Obtain the value for a name from the character range [lpbBegin, lpbEnd)
string GetValue(const char *lpbBegin, const char *lpbEnd, const char *sName);

int main(int argc, char *argv[])
{
    // Open the file (C:\test.tsr)
    HANDLE hFile = CreateFileA("C:\\test.tsr", GENERIC_READ | GENERIC_WRITE, 0, NULL,
                               OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hFile == INVALID_HANDLE_VALUE)
    {
        printf("failed to open file, error code: %lu\n", GetLastError());
        return 1;
    }
    // Create the file mapping object (maximum size 0, 0 = the whole file)
    HANDLE hFileMap = CreateFileMapping(hFile, NULL, PAGE_READWRITE, 0, 0, NULL);
    if (hFileMap == NULL)
    {
        printf("failed to create file mapping object, error code: %lu\n", GetLastError());
        CloseHandle(hFile);
        return 1;
    }
    // Obtain the system allocation granularity (view offsets must be a multiple of it)
    SYSTEM_INFO SysInfo;
    GetSystemInfo(&SysInfo);
    DWORD dwGran = SysInfo.dwAllocationGranularity;
    // Obtain the 64-bit file size from its low and high DWORDs
    DWORD dwFileSizeHigh;
    __int64 qwFileSize = GetFileSize(hFile, &dwFileSizeHigh);
    qwFileSize |= ((__int64)dwFileSizeHigh << 32);
    // Close the file handle; the mapping object keeps its own reference to the file
    CloseHandle(hFile);
    // Offset of the view within the file
    __int64 qwFileOffset = 0;
    // Block size: at most 1000 allocation-granularity units per view
    DWORD dwBlockBytes = 1000 * dwGran;
    if (qwFileSize < 1000 * dwGran)
        dwBlockBytes = (DWORD)qwFileSize;
    if (qwFileOffset >= 0)
    {
        // Map a view of the file
        char *lpbMapAddress = (char *)MapViewOfFile(hFileMap, FILE_MAP_ALL_ACCESS,
                                                    (DWORD)(qwFileOffset >> 32),
                                                    (DWORD)(qwFileOffset & 0xFFFFFFFF),
                                                    dwBlockBytes);
        if (lpbMapAddress == NULL)
        {
            printf("failed to map view of file, error code: %lu\n", GetLastError());
            CloseHandle(hFileMap);
            return 1;
        }

        // ----------------------- Start of data access -------------------------
        cout << GetValue(lpbMapAddress, lpbMapAddress + dwBlockBytes, "SEU07201213") << endl;
        getchar();
        // ----------------------- End of data access -------------------------

        // Unmap the view
        UnmapViewOfFile(lpbMapAddress);
    }
    // Close the file mapping object handle
    CloseHandle(hFileMap);
    return 0;
}

string GetValue(const char *lpbBegin, const char *lpbEnd, const char *sName)
{
    string sValue;  // the value after "="
    const char *sNameEnd = sName + strlen(sName);
    // Find where sName appears; search only inside the view, because the view
    // is not '\0'-terminated and strstr could read past its end
    const char *p1 = search(lpbBegin, lpbEnd, sName, sNameEnd);
    if (p1 != lpbEnd)
    {
        const char *pValue = p1 + (sNameEnd - sName);  // just after sName
        if (pValue != lpbEnd && *pValue == '=')
        {
            ++pValue;  // skip "="
            // Find where "\r\n" (line break) appears, or stop at the end of the view
            const char *crlf = "\r\n";
            const char *p2 = search(pValue, lpbEnd, crlf, crlf + 2);
            // Copy the value out; the view (and so the file) is never modified
            sValue.assign(pValue, p2);
        }
    }
    return sValue;
}
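The program above maps only the first block of the file (at most 1000 times the allocation granularity). To walk a larger file, the view offset passed to MapViewOfFile must be a multiple of the system allocation granularity. As a portable sketch of just that arithmetic, assuming the same 1000-unit block size, the hypothetical helpers CombineSize and PlanBlocks below merge the two halves of the file size and list the (offset, size) pairs one would map in turn; the Windows calls themselves are left out so the logic can be checked anywhere.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Combine the low and high 32-bit halves of a file size into one 64-bit value.
std::uint64_t CombineSize(std::uint32_t low, std::uint32_t high)
{
    return (static_cast<std::uint64_t>(high) << 32) | low;
}

struct Block
{
    std::uint64_t offset;  // always a multiple of the allocation granularity
    std::uint32_t bytes;   // number of bytes to map at this offset
};

// Hypothetical helper: cut a file of fileSize bytes into views of at most
// unitsPerBlock * granularity bytes, each starting on a granularity boundary.
std::vector<Block> PlanBlocks(std::uint64_t fileSize, std::uint32_t granularity,
                              std::uint32_t unitsPerBlock)
{
    assert(granularity != 0 && unitsPerBlock != 0);
    const std::uint64_t blockBytes = static_cast<std::uint64_t>(granularity) * unitsPerBlock;
    assert(blockBytes <= 0xFFFFFFFFu);  // a single view size must fit in a DWORD
    std::vector<Block> blocks;
    for (std::uint64_t offset = 0; offset < fileSize; offset += blockBytes)
    {
        const std::uint64_t remaining = fileSize - offset;
        Block b;
        b.offset = offset;
        b.bytes = static_cast<std::uint32_t>(remaining < blockBytes ? remaining : blockBytes);
        blocks.push_back(b);
    }
    return blocks;
}
```

Each Block would then be mapped with MapViewOfFile using (DWORD)(offset >> 32) and (DWORD)offset as the offset halves, searched, and unmapped before moving on. An entry that straddles two blocks would be missed by this simple split; overlapping consecutive views by one maximum line length is one way to handle that.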
 
The above implements a simple lookup of a value by its index name. In my test, matching on a file of 25 million lines took less than 1 second, and the file contents were not copied into this process's own memory.
Because the data is accessed directly through the mapped view at lpbMapAddress, any change made through that pointer is written to the file by the system with no explicit write-back step, which greatly improves read/write efficiency.
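The lookup in GetValue matches sName anywhere in the text, so a name that is a prefix of a longer name (SEU0720121 inside SEU07201213, say) or that appears inside a value could also match. A stricter variant, sketched here under the assumption that every entry sits on its own line as name=value, only accepts the name at the start of a line followed directly by '='. It works on a read-only character range, so a mapped view would never be written to. FindValue is a hypothetical name.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical stricter lookup over a read-only range [begin, end), e.g. a mapped view.
// The name only matches at the start of a line and must be followed directly by '='.
// Accepts "\r\n" or "\n" line endings. Returns true and fills value when found.
bool FindValue(const char *begin, const char *end, const char *name, std::string &value)
{
    assert(begin <= end && name != NULL);
    const std::size_t nameLen = std::strlen(name);
    const char *line = begin;
    while (line < end)
    {
        // Locate the end of the current line, or the end of the range
        const char *eol = static_cast<const char *>(
            std::memchr(line, '\n', static_cast<std::size_t>(end - line)));
        const char *lineEnd = eol ? eol : end;
        // Drop the '\r' of a "\r\n" ending
        const char *contentEnd = (lineEnd > line && lineEnd[-1] == '\r') ? lineEnd - 1 : lineEnd;
        if (static_cast<std::size_t>(contentEnd - line) > nameLen &&
            std::memcmp(line, name, nameLen) == 0 && line[nameLen] == '=')
        {
            value.assign(line + nameLen + 1, contentEnd);
            return true;
        }
        if (!eol)
            break;
        line = eol + 1;
    }
    return false;
}
```

Walking line by line costs slightly more than a single substring search, but it never reads past end and cannot be fooled by partial matches.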
This article is from the blog "Love forever (I) the grass to be dry"
