I. Basic Knowledge of Map
A map, also known as a dictionary, is a collection of elements, each consisting of a key and a value.
In general, a map uses a given key to quickly retrieve the corresponding element from the collection. When a program must search a large amount of data and search performance matters, a map is therefore an ideal container. In MFC, for example, maps are used to implement handle maps and some other internal data structures. MFC also provides public map classes, with which an MFC programmer can easily and efficiently implement custom mappings to suit a program's needs.
Generally, when a map object is deleted, or when its elements are removed, both the keys and the element values are deleted completely.
From a data structure perspective, the typical map operations are:
1. Insert an element with a given key into the map.
2. Search the map for the element with a given key.
3. Delete the element with a given key from the map.
4. Enumerate (traverse) all elements in the map.
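The four operations can be sketched in portable C++. The sketch below uses std::unordered_map purely as a stand-in for an MFC map class, since both are hash-based key/value containers; the PhoneBook alias, the function name, and the sample data are hypothetical.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Portable stand-in for an MFC map. Names and data are illustrative only.
using PhoneBook = std::unordered_map<std::string, int>;

// Runs the four typical operations and returns the keys that remain.
std::vector<std::string> demoMapOperations(PhoneBook& book)
{
    // 1. Insert elements with given keys.
    book["alice"] = 1001;
    book["bob"] = 1002;
    book["carol"] = 1003;

    // 2. Search for an element by key.
    auto found = book.find("bob");
    bool hasBob = (found != book.end());

    // 3. Delete an element by key.
    if (hasBob)
        book.erase("bob");

    // 4. Enumerate all remaining elements (the order is unspecified).
    std::vector<std::string> keys;
    for (const auto& entry : book)
        keys.push_back(entry.first);
    return keys;
}
```

After the call, the map holds two entries and the key "bob" is gone.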
The various map implementations in MFC provide member functions for these operations. For convenience, the discussion below uses CMap as the representative.
Once a key-value pair has been inserted into the map, it can be accessed by its key. This makes it efficient to search for, add, or delete elements, or to traverse all elements in the map.
Besides access by key, CMap and the other MFC map classes offer a second type, POSITION, as an auxiliary way to access elements: a POSITION can be used to "remember" an element or to enumerate the map. You might expect a traversal by POSITION to visit the elements in key order. This is not the case: the order in which the elements are retrieved is indeterminate.
MFC provides the template-based CMap class, which can be used with specific data types such as user-defined classes or structs. In addition, MFC provides non-template classes for predefined data types:
Class name            Key type        Value type
CMapWordToPtr         WORD            void pointer
CMapPtrToWord         void pointer    WORD
CMapPtrToPtr          void pointer    void pointer
CMapWordToOb          WORD            object (CObject pointer)
CMapStringToOb        string          object (CObject pointer)
CMapStringToPtr       string          void pointer
CMapStringToString    string          string
II. Working Principle of Map
The greatest advantage of a map is fast search. The key to optimal performance is minimizing the number of element checks (comparisons) needed per search. Sequential search performs worst: finding an element in a map of N elements may require, in the worst case, N separate comparisons.
Binary search performs better, but it requires the sequence being searched to be sorted, which undoubtedly reduces the flexibility of map operations. The ideal algorithm would need no search comparisons at all, regardless of how many elements there are or how they are ordered: a simple calculation would point directly at the target element. This sounds a bit mysterious, but such an algorithm is indeed possible, and a map comes close to it.
In CMap and the related MFC map classes, as long as the map is set up correctly, Lookup can usually find any element in a single step and rarely needs more than two or three.
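To make the comparison concrete, the following sketch counts how many elements each strategy checks before it finds a target. The function names and the test data are hypothetical, and the counts describe this simple implementation only.

```cpp
#include <cstddef>
#include <vector>

// Counts the elements checked by a sequential search for `target`.
std::size_t linearChecks(const std::vector<int>& data, int target)
{
    std::size_t checks = 0;
    for (int value : data) {
        ++checks;
        if (value == target)
            break;
    }
    return checks;
}

// Counts the elements checked by a binary search; `data` must be sorted
// in ascending order.
std::size_t binaryChecks(const std::vector<int>& data, int target)
{
    std::size_t checks = 0;
    std::size_t low = 0, high = data.size();
    while (low < high) {
        std::size_t mid = low + (high - low) / 2;
        ++checks;
        if (data[mid] == target)
            break;
        if (data[mid] < target)
            low = mid + 1;
        else
            high = mid;
    }
    return checks;
}

// Builds the sorted sequence 0, 1, ..., n-1 as sample data.
std::vector<int> makeSorted(int n)
{
    std::vector<int> data;
    for (int i = 0; i < n; ++i)
        data.push_back(i);
    return data;
}
```

For 1000 sorted elements, finding the last one takes 1000 checks sequentially but at most 10 with binary search. A hash-based map aims to need only one or two.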
How is this efficient search implemented?
This document uses the MFC CMap template class as an example. When a map is created (usually just before the first element is inserted), memory is allocated for a hash table, which is an array of pointers to CAssoc structures. MFC uses the CAssoc structure to describe the pairing of a key and an element value.
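The CAssoc structure is declared as follows (shown here for string keys and values):
struct CAssoc
{
    CAssoc* pNext;      // next structure in the same linked list
    UINT nHashValue;    // hash value calculated from the key
    CString key;
    CString value;
};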
Whenever a key-value pair is added to the map, a new CAssoc structure is created and a hash value is calculated from the actual value of the key. A pointer to the CAssoc structure is then stored in the hash table at index i, calculated as follows:
i = nHashValue % nHashTableSize
In this formula, nHashValue is the hash value calculated from the key, and nHashTableSize is the number of slots in the hash table (17 by default).
If slot i of the hash table already contains a CAssoc pointer, MFC builds a singly linked list of CAssoc structures: the address of the first CAssoc structure in the list is stored in the hash table, the address of the second is stored in the pNext field of the first, and so on. As an example, picture a hash table holding 10 elements in total, five of which are stored alone in their slots while the other five are stored in two linked lists of length two and three.
When a map's Lookup() function is called, MFC calculates the hash value from the key passed in, converts it to an index with the formula above, and retrieves the CAssoc pointer from the corresponding slot of the hash table.
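The 10-element layout just described can be reproduced with a short sketch. The hash values below are hypothetical, chosen by hand so that five of them land in slots of their own and the other five collide into one chain of two and one chain of three in a 17-slot table.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the chain length of every slot after placing each hash value
// at index (hash % tableSize).
std::vector<std::size_t> chainLengths(const std::vector<unsigned>& hashes,
                                      std::size_t tableSize)
{
    assert(tableSize > 0);
    std::vector<std::size_t> lengths(tableSize, 0);
    for (unsigned hash : hashes)
        ++lengths[hash % tableSize];
    return lengths;
}

// Hypothetical hash values, not produced by any real hash function.
std::vector<unsigned> sampleHashes()
{
    return {0, 1, 2, 3, 4,  // five values, each alone in its slot
            5, 22,          // 22 % 17 == 5: a chain of length two
            6, 23, 40};     // 23 % 17 == 6 and 40 % 17 == 6: a chain of three
}
```

Calling chainLengths(sampleHashes(), 17) gives length 1 for slots 0 to 4, length 2 for slot 5, length 3 for slot 6, and 0 for every other slot.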
Ideally, that slot holds a single CAssoc structure rather than a linked list of them. If so, the element is located in one step and read directly. If the slot instead holds the head of a CAssoc linked list, MFC compares the key in each CAssoc structure in the list in turn until it finds a match. As noted earlier, however, as long as the map is set up correctly, a list generally holds no more than three elements, so a search usually finishes within three comparisons.
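The insertion and lookup steps can be sketched as a very small chained hash table. This is not MFC's code: TinyMap, its members, and the choice of using the integer key itself as the hash value are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <forward_list>
#include <string>
#include <vector>

// Minimal chained hash table from int keys to strings (illustrative only).
class TinyMap
{
    struct Assoc { int key; std::string value; };
    using Chain = std::forward_list<Assoc>;  // singly linked list per slot

    std::vector<Chain> slots_;

    // Hypothetical hash: the key itself, reduced to a slot index.
    std::size_t indexOf(int key) const
    {
        return static_cast<unsigned>(key) % slots_.size();
    }

public:
    explicit TinyMap(std::size_t tableSize = 17) : slots_(tableSize)
    {
        assert(tableSize > 0);
    }

    // Inserts the pair, or replaces the value if the key already exists.
    void setAt(int key, const std::string& value)
    {
        Chain& chain = slots_[indexOf(key)];
        for (Assoc& assoc : chain) {
            if (assoc.key == key) {
                assoc.value = value;
                return;
            }
        }
        chain.push_front(Assoc{key, value});
    }

    // Walks the chain in the key's slot; returns true and fills `value`
    // when a matching key is found.
    bool lookup(int key, std::string& value) const
    {
        for (const Assoc& assoc : slots_[indexOf(key)]) {
            if (assoc.key == key) {
                value = assoc.value;
                return true;
            }
        }
        return false;
    }
};
```

With a 17-slot table, keys 1 and 18 share slot 1, so looking up either one walks a chain of two. A key that was never inserted, such as 35, walks the same chain and reports failure.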
III. Optimize Search Efficiency
In an MFC map, search performance depends mainly on two factors:
1. The size of the hash table
2. A good hash algorithm that generates hash values that are as unique as possible
The size of the hash table is very important for search performance. For example, if a map contains 1000 elements but its hash table provides only 17 slots for CAssoc pointers, then even with a perfectly even distribution each linked list in the hash table will contain 58 or 59 CAssoc structures. Naturally, search performance suffers badly in this case.
The hash algorithm is the other important factor. If it generates only a small number of distinct hash values (and therefore only a small number of distinct hash table indexes), search performance also degrades.
The most effective way to optimize search performance is to make the hash table large enough to reduce the chance of collisions, where different keys map to the same index. Microsoft recommends setting the hash table size to 110% to 120% of the number of elements in the map, which balances memory consumption against search efficiency.
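The figure of 58 or 59 follows from dividing 1000 by 17 (1000 = 17 × 58 + 14). The sketch below checks it by spreading the keys 0 to 999 over the slots as evenly as possible; the function name is hypothetical, and a real hash function would usually distribute keys less evenly than this.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Places keys 0..count-1 at index (key % tableSize) and returns the
// shortest and longest resulting chain lengths.
std::pair<std::size_t, std::size_t> chainLengthRange(std::size_t count,
                                                     std::size_t tableSize)
{
    assert(tableSize > 0);
    std::vector<std::size_t> lengths(tableSize, 0);
    for (std::size_t key = 0; key < count; ++key)
        ++lengths[key % tableSize];
    auto range = std::minmax_element(lengths.begin(), lengths.end());
    return {*range.first, *range.second};
}
```

For 1000 keys in 17 slots the range is 58 to 59. With 1201 slots no chain is longer than one element.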
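The effect of a weak hash algorithm can be seen by counting how many distinct slots a set of keys occupies. In this sketch, lengthHash is a deliberately poor, hypothetical hash, while times33Hash follows the multiply-by-33 style of MFC's string hash. The sample keys "key0" to "key99" are also made up.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Deliberately poor hash (hypothetical): only the string length matters.
unsigned lengthHash(const std::string& key)
{
    return static_cast<unsigned>(key.size());
}

// Multiply-by-33 hash: (hash << 5) + hash is hash * 33.
unsigned times33Hash(const std::string& key)
{
    unsigned hash = 0;
    for (unsigned char c : key)
        hash = (hash << 5) + hash + c;
    return hash;
}

// Counts how many distinct slots the keys occupy in a table of
// `tableSize` slots under the given hash function.
template <typename HashFn>
std::size_t distinctSlots(const std::vector<std::string>& keys,
                          HashFn hashFn, std::size_t tableSize)
{
    std::set<std::size_t> used;
    for (const std::string& key : keys)
        used.insert(hashFn(key) % tableSize);
    return used.size();
}

// Builds the sample keys "key0" ... "key99".
std::vector<std::string> sampleKeys()
{
    std::vector<std::string> keys;
    for (int i = 0; i < 100; ++i)
        keys.push_back("key" + std::to_string(i));
    return keys;
}
```

Because the sample keys have only two different lengths, lengthHash uses just two of the 17 slots, so every lookup has to walk a long chain. times33Hash spreads the same keys over many more slots.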
In MFC, the hash table size is specified by calling the InitHashTable() function:
map.InitHashTable(1200);
Here the map needs to store 1000 elements, so following Microsoft's recommendation the hash table is sized at 120% of that number, that is, 1200.
Statistically, an odd hash table size also helps reduce collisions. The call that initializes the hash table for 1000 elements can therefore be written as:
map.InitHashTable(1201);
Note that InitHashTable() should be called before the map contains any elements. Changing the hash table size of a map that already contains one or more elements triggers an assertion failure.
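The sizing rule can be wrapped in a small helper. The sketch below is hypothetical and not part of MFC: it takes 120% of the expected element count and rounds up to the next prime, which is always odd for sizes above 2.

```cpp
#include <cstddef>

// Trial-division primality test; adequate for table-sized numbers.
bool isPrime(std::size_t n)
{
    if (n < 2)
        return false;
    for (std::size_t d = 2; d * d <= n; ++d)
        if (n % d == 0)
            return false;
    return true;
}

// Hypothetical helper: about 120% of the expected element count,
// rounded up to the next prime number.
std::size_t suggestTableSize(std::size_t expectedCount)
{
    std::size_t size = expectedCount + expectedCount / 5;  // about 120%
    if (size < 2)
        size = 2;
    while (!isPrime(size))
        ++size;
    return size;
}
```

For 1000 expected elements the helper returns 1201, the same value passed to InitHashTable() above. For 200 elements it returns 241.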
The hash algorithm used by MFC is adequate in most cases, but you can replace it with your own if you need or want to. To calculate the hash value of a key, MFC calls the global template function HashKey(). For most data types, HashKey() is implemented as follows:
template<class ARG_KEY>
AFX_INLINE UINT AFXAPI HashKey(ARG_KEY key)
{
    // Default algorithm for general use.
    return ((UINT)(void*)(DWORD)key) >> 4;
}
For strings, however, the implementation is as follows:
UINT AFXAPI HashKey(LPCWSTR key) // Unicode string
{
    UINT nHash = 0;
    while (*key)
        nHash = (nHash << 5) + nHash + *key++;
    return nHash;
}
UINT AFXAPI HashKey(LPCSTR key) // ANSI string
{
    UINT nHash = 0;
    while (*key)
        nHash = (nHash << 5) + nHash + *key++;
    return nHash;
}
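The string hash can be tried outside MFC with a portable sketch. The lower-case names hashKey and slotIndex are hypothetical stand-ins, and the sketch treats each character as unsigned, which matches the MFC version for plain ASCII text but is an assumption for other characters. Since (hash << 5) + hash equals hash * 33, each step multiplies the running hash by 33 and adds the next character.

```cpp
#include <cassert>
#include <cstddef>

// Portable sketch of the multiply-by-33 string hash.
unsigned hashKey(const char* key)
{
    assert(key != nullptr);
    unsigned hash = 0;
    while (*key)
        hash = (hash << 5) + hash + static_cast<unsigned char>(*key++);
    return hash;
}

// Converts a key to its slot index: hash % tableSize.
std::size_t slotIndex(const char* key, std::size_t tableSize)
{
    assert(tableSize > 0);
    return hashKey(key) % tableSize;
}
```

As a worked example, "ab" hashes to 97 * 33 + 98 = 3299, and 3299 % 17 = 1, so in a default-sized table of 17 slots the key "ab" lands in slot 1.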
To implement a custom hash algorithm for a specific data type, use the string versions of HashKey() above as a reference and write a similar HashKey() function for that type.
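As a rough illustration of a type-specific hash, the sketch below applies the same multiply-by-33 idea to a hypothetical GridCell key and plugs it into std::unordered_map, which stands in here for an MFC map. GridCell, hashKey, GridCellHash, and CellNames are assumed names. In an MFC program the equivalent logic would go into a HashKey() function for the key type.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical key type used only for illustration.
struct GridCell
{
    int row;
    int column;
};

// Keys must be comparable so that colliding keys can be told apart.
bool operator==(const GridCell& a, const GridCell& b)
{
    return a.row == b.row && a.column == b.column;
}

// Custom hash: folds each field in with the multiply-by-33 step.
unsigned hashKey(const GridCell& cell)
{
    unsigned hash = 0;
    hash = (hash << 5) + hash + static_cast<unsigned>(cell.row);
    hash = (hash << 5) + hash + static_cast<unsigned>(cell.column);
    return hash;
}

// Adapter so the standard container can call the custom hash.
struct GridCellHash
{
    std::size_t operator()(const GridCell& cell) const
    {
        return hashKey(cell);
    }
};

using CellNames = std::unordered_map<GridCell, std::string, GridCellHash>;
```

With this hash, the cell (1, 2) hashes to 1 * 33 + 2 = 35 and (2, 1) to 2 * 33 + 1 = 67, so swapping the fields produces a different hash value.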
IV. Use the CMap Class in MFC
The preceding sections have already given an overview of the CMap class, so it is not repeated here. The basic member functions of CMap are listed below, followed by a brief program snippet that demonstrates how to use the class.
Constructor:
CMap              Constructs a collection class that maps keys to values.
Operations:
Lookup            Looks up the element value for a given key.
SetAt             Inserts an element into the map; replaces the existing element if a matching key is found.
operator[]        Inserts an element into the map; an operator substitute for SetAt.
RemoveKey         Removes the element identified by a key.
RemoveAll         Removes all elements from the map.
GetStartPosition  Returns the position of the first element.
GetNextAssoc      Gets the next element during iteration.
GetHashTableSize  Returns the size of the hash table (its number of slots).
InitHashTable     Initializes the hash table and specifies its size.
Status:
GetCount          Returns the number of elements in the map.
IsEmpty           Tests whether the map is empty (contains no elements).
The application example is as follows:
// Map with int keys and CPoint values.
CMap<int, int, CPoint, CPoint> myMap;
// Initialize the hash table and specify its size (an odd number).
myMap.InitHashTable(257);
// Add elements to myMap.
for (int i = 0; i < 200; i++)
    myMap.SetAt(i, CPoint(i, i));
// Delete the elements whose keys are even.
POSITION pos = myMap.GetStartPosition();
int nKey;
CPoint pt;
while (pos != NULL)
{
    myMap.GetNextAssoc(pos, nKey, pt);
    if ((nKey % 2) == 0)
        myMap.RemoveKey(nKey);
}
#ifdef _DEBUG
afxDump.SetDepth(1);
afxDump << "myMap: " << &myMap << "\n";
#endif
The snippet above illustrates the common usage pattern of the CMap class:
1. First, use the CMap template class to define an instance, the myMap object.
2. Next, initialize the size of myMap's hash table. Estimate the capacity myMap is likely to need, then choose an odd number (or, better still, a prime number) as the hash table size.
3. Then add elements to myMap.
4. Use myMap to map, search, and traverse the data.
5. Call myMap.RemoveAll() to remove all elements and release the memory occupied by myMap.
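For readers without an MFC toolchain, the same sequence of steps can be approximated in portable C++. The sketch below uses std::unordered_map as a stand-in for CMap and std::pair<int, int> as a stand-in for CPoint. PointMap and buildOddKeyMap are hypothetical names, and the standard container is not claimed to behave identically to CMap.

```cpp
#include <unordered_map>
#include <utility>

// Stand-in types: int keys mapped to (x, y) pairs instead of CPoint.
using PointMap = std::unordered_map<int, std::pair<int, int>>;

// Fills a map with keys 0..199, then removes every even key while
// traversing it.
PointMap buildOddKeyMap()
{
    // Request at least 257 buckets up front, mirroring InitHashTable(257).
    PointMap points(257);

    // Add the elements.
    for (int i = 0; i < 200; ++i)
        points[i] = std::make_pair(i, i);

    // erase() returns the iterator to the next element, so the traversal
    // stays valid while elements are removed.
    for (auto it = points.begin(); it != points.end(); ) {
        if (it->first % 2 == 0)
            it = points.erase(it);
        else
            ++it;
    }
    return points;
}
```

The resulting map holds the 100 odd keys. Calling clear() on it afterwards plays the role of RemoveAll().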
CMap incorporates the IMPLEMENT_SERIAL macro to support serialization and dumping of its elements. When dumping the individual elements of a CMap, the depth of the dump context must be set to 1 or greater.