Research on IE Cache Management with VC++


1. Introduction
Little has been published about IE cache management. There are some articles on the Internet, and tools are available for download (for searching or clearing the cache, for example), but none of them is comprehensive or in-depth.
IE cache management relies mainly on several index.dat files and the WinINet library. MSDN's coverage of this part of the WinINet library is brief and includes no sample code.
Most of the material on the Internet covers WinINet's HTTP protocol interface and rarely touches on cache handling. The cache interface definitions are also somewhat obscure, which makes the subject harder to approach.
Drawing on my own project experience, this article describes the IE cache mechanism comprehensively and in depth, with particular attention to the cache management interface in the WinINet library.
Once the WinINet cache management interface is understood, writing cache monitoring, cache search, cache cleaning, and similar tools becomes straightforward.
 
2. Definition of Terms
None.

3. IE Cache Files
3.1. IE Cache Categories
The IE cache is divided into three categories: cookies, temporary files (including unexpired resources and offline files), and history records.
Each of the three categories has an index file named index.dat.
The storage paths are as follows (using my computer as an example):
Cache content index: C:/Documents and Settings/liyafeng/Local Settings/Temporary Internet Files/Content.IE5/index.dat
Cookie index: C:/Documents and Settings/liyafeng/Cookies/index.dat
History index: C:/Documents and Settings/liyafeng/Local Settings/History.IE5/index.dat
These three index.dat files are stored in an undocumented binary format: Microsoft has not published the format and has shown no intention of doing so. This approach has drawn a lot of criticism, but Microsoft has not changed course.
Clearing the cache through Internet Options in IE does not delete the index.dat files, whereas clearing temporary files through disk cleanup deletes all three.
The only supported way to access these index files is through the WinINet library interface, which is the focus of this article.
Note: The cached temporary files are handled specially by the system and can be reached only by entering the full directory path. They remain invisible even when hidden files and system files are shown.

3.2. IE Cache Management Process
The process is as follows:
1. Start IE.
2. If it does not exist, create the cache content index: C:/Documents and Settings/liyafeng/Local Settings/Temporary Internet Files/Content.IE5/index.dat.
3. If it does not exist, create the cookie index: C:/Documents and Settings/liyafeng/Cookies/index.dat.
4. If it does not exist, create the history index: C:/Documents and Settings/liyafeng/Local Settings/History.IE5/index.dat.
5. Call InternetOpen to initialize the application's use of WinINet.
6. Set the callback function through InternetSetStatusCallback.
7. List cookies.
8. List temporary files.
9. List history records.
10. Enumerate the access list of the current user.
11. Check whether the currently accessed url is cached.
12. If it is cached, the corresponding information will be read from the cache, including the last modification time, last access time, and expiration time.
13. Send an http request.
14. The HTTP server checks whether the cached copy of the resource is still valid. If it is, the server returns "304 Not Modified".
15. Extract the content from the local temporary file, using the cached index information.
16. If the URL was not cached in step 11, download the resource directly from the HTTP server.
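Steps 11 to 16 amount to a small decision procedure. The following is a minimal, portable sketch of that logic and not IE's actual code: `CacheEntry`, `Source`, and `ChooseSource` are hypothetical names introduced for illustration, and the only HTTP detail assumed is that status 304 means "Not Modified".

```cpp
#include <string>

// Hypothetical stand-in for the information read from the cache index (step 12).
struct CacheEntry {
    std::string localFile;  // path of the cached temporary file
};

enum class Source { LocalCache, Network };

// Decide where the response body comes from.
// `cached` is nullptr when the URL has no cache entry (step 16).
// `httpStatus` is the status returned for the request sent in step 13.
Source ChooseSource(const CacheEntry* cached, int httpStatus) {
    if (cached == nullptr) {
        return Source::Network;     // nothing cached: download the resource
    }
    if (httpStatus == 304) {
        return Source::LocalCache;  // server confirmed the copy is still valid
    }
    return Source::Network;         // cached copy is stale: use the new response
}
```

A caller would pass the entry found in step 11, if any, together with the status code of the server's reply.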
 
4. Cache Management Interface of the WinINet Library
4.1. Introduction
The WinINet library has simple but flexible caching support built in. Any data retrieved from the network is cached on the hard disk and served from there on subsequent requests. Applications can control caching for each request. For HTTP requests to a server, most of the received headers are cached as well. When an HTTP request is satisfied from the cache, the cached headers are also returned to the caller. This makes downloads transparent to the caller, whether the data comes from the cache or from the network.
The interface covers enumerating, creating, querying, and deleting cache entries, as well as operations on cache groups.
 
4.2. Enumerate the Cache
FindFirstUrlCacheEntry: starts enumerating the cache.
FindFirstUrlCacheEntryEx: starts a filtered enumeration of the cache.
FindNextUrlCacheEntry: returns the next entry in the cache.
FindNextUrlCacheEntryEx: returns the next cache entry in a filtered enumeration.
FindCloseUrlCache: closes the specified enumeration handle.
The FindFirstUrlCacheEntry and FindNextUrlCacheEntry functions enumerate the information stored in the cache. FindFirstUrlCacheEntry takes a search pattern, a buffer, and the buffer size; it creates an enumeration handle and returns the first cache entry. FindNextUrlCacheEntry takes the handle created by FindFirstUrlCacheEntry, a buffer, and the buffer size, and returns the next cache entry.
Both functions store an INTERNET_CACHE_ENTRY_INFO struct in the buffer. The size of the struct differs from entry to entry. If the supplied buffer is too small, the call fails and GetLastError returns ERROR_INSUFFICIENT_BUFFER. In that case the buffer size parameter holds the size required for the cache entry; allocate a buffer of that size and call the function again.
The INTERNET_CACHE_ENTRY_INFO struct contains the struct size, cached URL, local file name, cache entry type, use count, hit rate, size, last modification time, expiration time, last access time, last synchronization time, header information, header information size, and file extension.
FindFirstUrlCacheEntry requires a search pattern plus the buffer (and its size) that receives the INTERNET_CACHE_ENTRY_INFO struct. Currently only the default search pattern, which returns all cache entries, is implemented.
When the enumeration is complete, call FindCloseUrlCache to close the enumeration handle.
Note: When FindNextUrlCacheEntry returns FALSE, there are two possibilities. Either the buffer supplied for the INTERNET_CACHE_ENTRY_INFO entry is too small (GetLastError returns ERROR_INSUFFICIENT_BUFFER), or the enumeration has ended (GetLastError returns ERROR_NO_MORE_ITEMS).
Handling of insufficient buffers and of embedded pointers is shown in the sample code in the section on querying cache entry information.
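The two reasons for a FALSE return can be told apart by error code. The sketch below shows the resulting loop in portable C++ so that it can be read and tried without Windows: `NextFn` is a hypothetical stand-in for a call such as FindNextUrlCacheEntry followed by GetLastError, and `EnumerateAll` is an illustrative name. Only the two numeric error values are taken from the Win32 headers.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Win32 error codes with their documented numeric values.
const unsigned long kErrorInsufficientBuffer = 122;  // ERROR_INSUFFICIENT_BUFFER
const unsigned long kErrorNoMoreItems = 259;         // ERROR_NO_MORE_ITEMS

// Hypothetical stand-in for "fetch the next entry". Returns 0 on success.
// On kErrorInsufficientBuffer it writes the required size to *size.
using NextFn = std::function<unsigned long(char* buf, unsigned long* size)>;

// Visit every entry, growing the buffer on demand. Returns the number of
// entries visited, or -1 if an unexpected error stopped the enumeration.
int EnumerateAll(const NextFn& next) {
    std::vector<char> buf(64);  // deliberately small initial buffer
    int count = 0;
    for (;;) {
        unsigned long size = static_cast<unsigned long>(buf.size());
        const unsigned long err = next(buf.data(), &size);
        if (err == 0) {
            ++count;            // entry is in buf; process it here
            continue;
        }
        if (err == kErrorNoMoreItems) {
            return count;       // normal end of the enumeration
        }
        if (err == kErrorInsufficientBuffer && size > buf.size()) {
            buf.resize(size);   // retry the same entry with a larger buffer
            continue;
        }
        return -1;              // any other failure
    }
}
```

With the real API, the callable would wrap FindNextUrlCacheEntry and return 0 on TRUE or the value of GetLastError on FALSE; the first entry comes from FindFirstUrlCacheEntry and is handled the same way.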
 
4.3. Query Cache Entry Information
GetUrlCacheEntryInfo: retrieves information about a cache entry.
GetUrlCacheEntryInfoEx: searches for the specified URL after translating any cached redirections that HttpSendRequest would apply in offline mode.
The GetUrlCacheEntryInfo function retrieves the INTERNET_CACHE_ENTRY_INFO struct for the specified URL. The struct contains the struct size, cached URL, local file name, cache entry type, use count, hit rate, size, last modification time, expiration time, last access time, last synchronization time, header information, header information size, and file extension.
GetUrlCacheEntryInfo accepts a URL, a buffer that receives the INTERNET_CACHE_ENTRY_INFO struct, and the buffer size. If the URL is found, its information is copied into the buffer. Otherwise the call fails and GetLastError returns ERROR_FILE_NOT_FOUND. If the buffer is too small to hold the cache entry information, the call fails and GetLastError returns ERROR_INSUFFICIENT_BUFFER; the buffer size parameter then holds the required size.
GetUrlCacheEntryInfo does not parse the URL. A URL that contains an anchor (#) is therefore not found, even if the requested resource is in the cache. For example, passing http://example.com/example.htm#sample returns ERROR_FILE_NOT_FOUND even when http://example.com/example.htm is cached.
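Because the lookup does no URL parsing, a caller that may receive URLs with anchors can remove the fragment before querying. The helper below is a minimal sketch; `StripFragment` is a hypothetical name, and it assumes that the entry was stored under the URL without its fragment, as in the example above.

```cpp
#include <string>

// Return `url` without its fragment (the "#..." suffix), if it has one.
std::string StripFragment(const std::string& url) {
    const std::string::size_type pos = url.find('#');
    return pos == std::string::npos ? url : url.substr(0, pos);
}
```

The result can then be passed to the cache lookup in place of the original URL.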
typedef struct _INTERNET_CACHE_ENTRY_INFOA {
    DWORD dwStructSize;         // version of cache system.
    LPSTR lpszSourceUrlName;    // embedded pointer to the URL name string.
    LPSTR lpszLocalFileName;    // embedded pointer to the local file name.
    DWORD CacheEntryType;       // cache type bit mask.
    DWORD dwUseCount;           // current users count of the cache entry.
    DWORD dwHitRate;            // num of times the cache entry was retrieved.
    DWORD dwSizeLow;            // low DWORD of the file size.
    DWORD dwSizeHigh;           // high DWORD of the file size.
    FILETIME LastModifiedTime;  // last modified time of the file in GMT format.
    FILETIME ExpireTime;        // expire time of the file in GMT format.
    FILETIME LastAccessTime;    // last accessed time in GMT format.
    FILETIME LastSyncTime;      // last time the URL was synchronized with the source.
    LPSTR lpHeaderInfo;         // embedded pointer to the header info.
    DWORD dwHeaderInfoSize;     // size of the above header.
    LPSTR lpszFileExtension;    // file extension used to retrieve the URL data as a file.
    union {                     // exemption delta from last access time.
        DWORD dwReserved;
        DWORD dwExemptDelta;
    };
} INTERNET_CACHE_ENTRY_INFOA, *LPINTERNET_CACHE_ENTRY_INFOA;
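The file size and the FILETIME fields are each split into two 32-bit halves, so they usually need to be joined before use. The sketch below does this in portable C++: `FileTimeParts` is a hypothetical mirror of the Win32 FILETIME layout, and `Join64` and `FileTimeToUnixSeconds` are illustrative helper names. It relies on FILETIME counting 100-nanosecond intervals since 1601-01-01 UTC.

```cpp
#include <cstdint>

// Portable mirror of the Win32 FILETIME layout (two 32-bit halves).
struct FileTimeParts {
    std::uint32_t low;   // corresponds to dwLowDateTime
    std::uint32_t high;  // corresponds to dwHighDateTime
};

// Join a low/high DWORD pair (for example dwSizeLow and dwSizeHigh)
// into a single 64-bit value.
std::uint64_t Join64(std::uint32_t low, std::uint32_t high) {
    return (static_cast<std::uint64_t>(high) << 32) | low;
}

// Convert a FILETIME value to whole seconds since 1970-01-01 UTC.
std::int64_t FileTimeToUnixSeconds(FileTimeParts ft) {
    const std::int64_t kTicksPerSecond = 10000000;       // 100 ns ticks
    const std::int64_t kEpochDiffSeconds = 11644473600;  // 1601 to 1970
    const std::int64_t ticks =
        static_cast<std::int64_t>(Join64(ft.low, ft.high));
    return ticks / kTicksPerSecond - kEpochDiffSeconds;
}
```

For a cache entry, `Join64(dwSizeLow, dwSizeHigh)` would give the file size in bytes, and the four time fields can each be passed through the conversion.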
Note: the "embedded pointer" is a relatively obscure point. The data it points to is stored in the same contiguous block of memory, immediately after the fixed-size struct, so the pointer refers back into the caller's own buffer as if the data were embedded in the structure. Hence the name.
If that is not clear from the description, the sample code makes it concrete.
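Sample code (requires <windows.h> and <wininet.h>, linked with wininet.lib; lpszUrl and PR_DEBUG are defined elsewhere):
// Start with room for the fixed-size struct only; the first call then reports the size actually needed.
DWORD cbCacheEntryInfo = sizeof(INTERNET_CACHE_ENTRY_INFOA);
LPINTERNET_CACHE_ENTRY_INFOA lpCacheEntryInfo = (LPINTERNET_CACHE_ENTRY_INFOA) new char[cbCacheEntryInfo];
BOOL fOk = GetUrlCacheEntryInfoA(lpszUrl, lpCacheEntryInfo, &cbCacheEntryInfo);
if (!fOk && GetLastError() == ERROR_INSUFFICIENT_BUFFER)
{
    // cbCacheEntryInfo now holds the required size: the struct plus its embedded strings.
    delete[] (char*) lpCacheEntryInfo;
    // Allocate raw bytes and cast, so that the embedded pointers can point into the same block.
    lpCacheEntryInfo = (LPINTERNET_CACHE_ENTRY_INFOA) new char[cbCacheEntryInfo];
    fOk = GetUrlCacheEntryInfoA(lpszUrl, lpCacheEntryInfo, &cbCacheEntryInfo);
    PR_DEBUG("do again: fOk: %d, cbCacheEntryInfo: %lu", fOk, cbCacheEntryInfo);
}
// ... use lpCacheEntryInfo while the buffer is alive ...
delete[] (char*) lpCacheEntryInfo;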
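The embedded-pointer layout can also be reproduced in portable C++, which may make it easier to see why one oversized buffer is needed. This is an illustrative sketch, not WinINet code: `MiniEntry` is a hypothetical miniature of INTERNET_CACHE_ENTRY_INFOA with a single embedded pointer, and `PackEntry` is an invented helper that plays the role of the library filling the caller's buffer.

```cpp
#include <cstddef>
#include <cstring>
#include <new>
#include <vector>

// Hypothetical miniature of the real struct, with one embedded pointer.
struct MiniEntry {
    unsigned long structSize;
    char* url;  // embedded pointer: refers into the same buffer
};

// Fill `buf` with a MiniEntry followed directly by the URL text.
MiniEntry* PackEntry(std::vector<char>& buf, const char* url) {
    const std::size_t len = std::strlen(url) + 1;  // include the terminator
    buf.assign(sizeof(MiniEntry) + len, '\0');     // struct plus trailing data
    MiniEntry* entry = new (buf.data()) MiniEntry; // struct sits at the front
    entry->structSize = static_cast<unsigned long>(sizeof(MiniEntry));
    entry->url = buf.data() + sizeof(MiniEntry);   // just past the struct
    std::memcpy(entry->url, url, len);
    return entry;
}
```

Two consequences follow from this layout: the required size varies with the length of the strings, and a copy of the struct alone is only valid for as long as the original buffer is kept alive.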
 
4.4. Create cache entries
CreateUrlCacheEntry: allocates the requested cache storage and creates a local file name for saving the cache entry that corresponds to the source name.
CommitUrlCacheEntry: stores the data in the specified file in the cache and associates it with the specified URL.
Cache entries are created with CreateUrlCacheEntry and CommitUrlCacheEntry.
CreateUrlCacheEntry accepts the URL, the expected file size, and the file extension, and creates a local file name for saving the corresponding cache entry. Use this file name to write the data to a local file, and call CommitUrlCacheEntry once the data has been written.
CommitUrlCacheEntry accepts the URL, local file name, expiration time, last modification time, cache entry type, header information and its size, and the file extension. It stores the file data in the cache and associates it with the given URL.
Note: CreateUrlCacheEntry only creates the cache entry in memory. You must call CommitUrlCacheEntry to write the entry to the index.dat file. In addition, CreateUrlCacheEntry creates an empty temporary file, and CommitUrlCacheEntry verifies the relationship between the file name and the URL. This is how IE prevents users from moving temporary files elsewhere.

4.5. Delete cache entries
DeleteUrlCacheEntry: removes the file associated with the source name from the cache, if the file exists.
DeleteUrlCacheEntry deletes the cached file associated with the specified URL. If no cached file is found, the call fails and GetLastError returns ERROR_FILE_NOT_FOUND. If the cached file is currently locked or in use, the call fails and GetLastError returns ERROR_ACCESS_DENIED; the file is deleted once it is unlocked.
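A cache-cleaning tool usually wants to treat these outcomes differently, since a missing entry is harmless and a locked one is only deferred. The following portable sketch classifies the result; `DeleteOutcome` and `ClassifyDelete` are hypothetical names, and only the two numeric error values come from the Win32 headers.

```cpp
// Win32 error codes with their documented numeric values.
const unsigned long kErrorFileNotFound = 2;  // ERROR_FILE_NOT_FOUND
const unsigned long kErrorAccessDenied = 5;  // ERROR_ACCESS_DENIED

enum class DeleteOutcome { Deleted, NotCached, InUse, OtherError };

// `succeeded` is the BOOL result of the delete call; `lastError` is the
// value GetLastError reported after a failed call.
DeleteOutcome ClassifyDelete(bool succeeded, unsigned long lastError) {
    if (succeeded) return DeleteOutcome::Deleted;
    switch (lastError) {
        case kErrorFileNotFound: return DeleteOutcome::NotCached;  // nothing to delete
        case kErrorAccessDenied: return DeleteOutcome::InUse;      // locked or in use
        default:                 return DeleteOutcome::OtherError;
    }
}
```

With the real API, the arguments would come from DeleteUrlCacheEntry and, on failure, from GetLastError.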
 
4.6. Obtain cached files
RetrieveUrlCacheEntryFile: retrieves a cache entry from the cache in the form of a file.
UnlockUrlCacheEntryFile: unlocks a cache entry that was locked when RetrieveUrlCacheEntryFile retrieved it for use.
Applications that need the file name of a resource can use the RetrieveUrlCacheEntryFile and UnlockUrlCacheEntryFile functions.
Programs that do not need the file name should use RetrieveUrlCacheEntryStream, ReadUrlCacheEntryStream, and UnlockUrlCacheEntryStream to obtain the cached data.
RetrieveUrlCacheEntryFile accepts a URL and a buffer (with its size) that receives the INTERNET_CACHE_ENTRY_INFO struct. It retrieves the cached file and locks it for the caller.
When you have finished with the cached file, call UnlockUrlCacheEntryFile to unlock it.
Note: although you can read a cache entry's file directly with ReadFile, the file is not locked in that case and may be modified or deleted while it is being read. Using RetrieveUrlCacheEntryFile is therefore recommended.
Usage: call RetrieveUrlCacheEntryFile to obtain the entry information and lock the file. The entry information contains the local file name of the cached file. Call ReadFile to read the file content, then call UnlockUrlCacheEntryFile to unlock the file.
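Since every successful retrieve must be paired with an unlock, including on early returns and exceptions, C++ code can tie the unlock to scope exit. The class below is a minimal, portable sketch of that idea; `ScopedUnlock` is a hypothetical name and the cleanup action is supplied by the caller.

```cpp
#include <functional>
#include <utility>

// Runs the supplied cleanup action when the guard goes out of scope.
class ScopedUnlock {
public:
    explicit ScopedUnlock(std::function<void()> unlock)
        : unlock_(std::move(unlock)) {}
    ~ScopedUnlock() {
        if (unlock_) unlock_();  // always release the lock on scope exit
    }
    // Non-copyable, so the cleanup action runs exactly once.
    ScopedUnlock(const ScopedUnlock&) = delete;
    ScopedUnlock& operator=(const ScopedUnlock&) = delete;

private:
    std::function<void()> unlock_;
};
```

On Windows, the action passed in would typically call UnlockUrlCacheEntryFile with the same URL that was given to RetrieveUrlCacheEntryFile, and the guard would be created right after the retrieve call succeeds.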
 
4.7. Get the cache stream
RetrieveUrlCacheEntryStream: provides the most efficient and implementation-independent way to access cached data.
ReadUrlCacheEntryStream: reads cached data from a stream opened by RetrieveUrlCacheEntryStream.
RetrieveUrlCacheEntryStream, ReadUrlCacheEntryStream, and UnlockUrlCacheEntryStream are used together to obtain cached resources.
RetrieveUrlCacheEntryStream accepts a URL, a buffer (with its size) that receives the INTERNET_CACHE_ENTRY_INFO struct, and a Boolean value indicating whether random reads are allowed. If the cached file is found, the function creates a handle for it. The function does not parse the URL. A URL that contains an anchor (#) is therefore not found, even if the resource is in the cache. For example, passing http://example.com/example.htm#sample returns ERROR_FILE_NOT_FOUND even when http://example.com/example.htm is cached.
ReadUrlCacheEntryStream requires the handle created by RetrieveUrlCacheEntryStream, the file offset at which to start reading, a buffer, and the buffer size. If the buffer is too small to hold the available data, the call fails, GetLastError returns ERROR_INSUFFICIENT_BUFFER, and the buffer size parameter is set to the size required.
When you have finished with the cached data, call UnlockUrlCacheEntryStream to close the handle created by RetrieveUrlCacheEntryStream.
Note: stream operations are mainly intended for applications that do not care about the local file name. All stream read and close operations must use the handle returned by RetrieveUrlCacheEntryStream.
 
4.8. Cache Group Operations
CreateUrlCacheGroup: generates a cache group ID.
SetUrlCacheEntryGroup: adds entries to or removes entries from a cache group.
DeleteUrlCacheGroup: releases the GROUPID and any associated state in the cache index file.
To create a cache group, call CreateUrlCacheGroup to generate a GROUPID. To add an entry to the group, call SetUrlCacheEntryGroup with the URL of the cache entry and the INTERNET_CACHE_GROUP_ADD flag. To remove an entry from the group, pass the URL of the entry with the INTERNET_CACHE_GROUP_REMOVE flag.
The FindFirstUrlCacheEntryEx and FindNextUrlCacheEntryEx functions can enumerate the entries in a specified cache group. When the enumeration is complete, call FindCloseUrlCache to close the enumeration handle.
