This article pays off a debt left by my earlier post, "Bullet HashMap: a memory-compact hash table" (hereinafter "the btHashMap post").
The btHashMap post was a theoretical analysis of the memory layouts of Bullet's hash map (btHashMap) and the C++ standard library's hash map (std::unordered_map). Like the dictionaries in most language environments, both are designed as dynamically growing containers. In that post I asserted that btHashMap performs better than std::unordered_map when the size is small, but I did not say below what size btHashMap is the faster one, or which environmental parameters that threshold may depend on. This article answers both questions with experiments (test code) and data (test results).
The Bullet source code recently moved to GitHub: https://github.com/bulletphysics/bullet3
All test code for this article:
GitHub: https://github.com/xusiwei/hashmapbenchmark
(Mirror: https://code.csdn.net/xusiwei1236/bthashmapbenchmark)
Warm-up: putting btHashMap to use
btHashMap is part of the Bullet project, so the first question is how to use it on its own.
btHashMap is declared and defined in src/LinearMath/btHashMap.h. Reading the source shows that it also depends on btAlignedObjectArray.h, btAlignedAllocator.h, btAlignedAllocator.cpp, and btScalar.h. With these files in place, btHashMap works properly. Let's start with a simple example to see how it is used.
The source also shows that btHashMap calls the key's getHash() method to obtain the hash value, and that btHashMap.h defines several ready-made key classes, such as btHashInt and btHashString.
With these basics, writing a demo is easy:
warmUp.cpp:
// btHashMap warm-up example by Xu, http://blog.csdn.net/xusiwei1236
#include "btHashMap.h"
#include <stdio.h>
int main()
{
    btHashMap<btHashInt, btHashInt> btmap;
    int k = 1234, v = 5678;
    btmap.insert(btHashInt(k), btHashInt(v));
    btHashInt* pVal = btmap.find(btHashInt(k));
    if (pVal == NULL) {
        printf("key: %d not found in btmap\n", k);
    }
    else {
        printf("found key: %d, value: %d in btmap\n", k, v);
    }
    return 0;
}
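The demo above uses btHashInt as the key. Since btHashMap obtains the hash value through the key's getHash() method, a user-defined key would have to follow the same protocol. The following is a minimal sketch of such a key, assuming that an equals() method is needed as well (btHashInt and btHashString both provide one). PointKey, slotFor, and the multiplier constant are hypothetical names and values chosen for illustration; the block deliberately does not include btHashMap.h so that it compiles on its own.

```cpp
#include <cassert>

// Hypothetical key type for illustration: a 2D grid coordinate.
// It mirrors the protocol of btHashInt: getHash() plus equals().
class PointKey
{
    int m_x;
    int m_y;
public:
    PointKey(int x, int y) : m_x(x), m_y(y) {}

    // Two keys are equal only if both coordinates match.
    bool equals(const PointKey& other) const
    {
        return m_x == other.m_x && m_y == other.m_y;
    }

    // Combine both coordinates into one unsigned hash value.
    // The multiplier is an arbitrary odd constant, not taken from Bullet.
    unsigned int getHash() const
    {
        unsigned int h = static_cast<unsigned int>(m_x);
        h = h * 2654435761u + static_cast<unsigned int>(m_y);
        return h;
    }
};

// Generic way to turn a hash into a slot index: hash modulo capacity.
// This is an illustration only, not btHashMap's own indexing code.
inline int slotFor(const PointKey& key, int capacity)
{
    assert(capacity > 0);
    return static_cast<int>(key.getHash() % static_cast<unsigned int>(capacity));
}
```

Whether a key class needs further members depends on the btHashMap version in use, so the header should be checked before relying on this shape.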
Now there are 6 files in the current directory:
xusiwei1236@blog.csdn.net:~/data/test/bthashmap$ ls
btAlignedAllocator.cpp  btAlignedAllocator.h  btAlignedObjectArray.h  btHashMap.h  btScalar.h  warmUp.cpp
Try to compile warmUp.cpp:
xusiwei1236@blog.csdn.net:~/data/test/bthashmap$ g++ warmUp.cpp
/tmp/ccz7i4vi.o: In function `btAlignedAllocator<int, 16u>::deallocate(int*)':
warmUp.cpp:(.text._ZN18btAlignedAllocatorIiLj16EE10deallocateEPi[btAlignedAllocator<int, 16u>::deallocate(int*)]+0x18): undefined reference to `btAlignedFreeInternal(void*)'
/tmp/ccz7i4vi.o: In function `btAlignedAllocator<btHashInt, 16u>::deallocate(btHashInt*)':
warmUp.cpp:(.text._ZN18btAlignedAllocatorI9btHashIntLj16EE10deallocateEPS0_[btAlignedAllocator<btHashInt, 16u>::deallocate(btHashInt*)]+0x18): undefined reference to `btAlignedFreeInternal(void*)'
/tmp/ccz7i4vi.o: In function `btAlignedAllocator<btHashInt, 16u>::allocate(int, btHashInt const**)':
warmUp.cpp:(.text._ZN18btAlignedAllocatorI9btHashIntLj16EE8allocateEiPPKS0_[btAlignedAllocator<btHashInt, 16u>::allocate(int, btHashInt const**)]+0x25): undefined reference to `btAlignedAllocInternal(unsigned long, int)'
/tmp/ccz7i4vi.o: In function `btAlignedAllocator<int, 16u>::allocate(int, int const**)':
warmUp.cpp:(.text._ZN18btAlignedAllocatorIiLj16EE8allocateEiPPKi[btAlignedAllocator<int, 16u>::allocate(int, int const**)]+0x25): undefined reference to `btAlignedAllocInternal(unsigned long, int)'
collect2: ld returned 1 exit status
A link error occurred: the functions that btAlignedAllocator's member functions call (btAlignedAllocInternal and btAlignedFreeInternal) were not found. They are implemented in btAlignedAllocator.cpp, so that file has to be compiled separately and then linked in:
xusiwei1236@blog.csdn.net:~/data/test/bthashmap$ g++ -c btAlignedAllocator.cpp
xusiwei1236@blog.csdn.net:~/data/test/bthashmap$ ls btAlignedAllocator.*
btAlignedAllocator.cpp  btAlignedAllocator.h  btAlignedAllocator.o
xusiwei1236@blog.csdn.net:~/data/test/bthashmap$ g++ -c warmUp.cpp
xusiwei1236@blog.csdn.net:~/data/test/bthashmap$ ls warmUp.*
warmUp.cpp  warmUp.o
xusiwei1236@blog.csdn.net:~/data/test/bthashmap$ g++ warmUp.o btAlignedAllocator.o
xusiwei1236@blog.csdn.net:~/data/test/bthashmap$ ls
a.out
a.out has been generated. Run it:
xusiwei1236@blog.csdn.net:~/data/test/bthashmap$ ./a.out
found key: 1234, value: 5678 in btmap
About timing
When measuring a program's time performance, you have to decide whether to use real time (wall-clock time) or the CPU time of the process. The difference is that real time covers not only the CPU time used by the test process but also the time the process spends sleeping, waiting to be scheduled, and so on.
Test scenarios taken from the end user's point of view call for real time, but measuring the details of an algorithm usually calls for the process's CPU time, which better reflects the strengths and weaknesses of the algorithm itself.
Getting the CPU time of a process
The article "C++ tip: How to measure CPU time for benchmarking" (hereinafter HMCT) describes in detail how to obtain a process's CPU time on common operating systems and implements a cross-platform CPU time function, getCPUTime. It says:
A process's CPU time accumulates as the process runs and consumes CPU cycles. During I/O operations, thread locks, and other operations that cause the process to pause, CPU time accumulation also pauses until the process can again make headway.
In other words, a process's CPU time grows as the process runs and consumes CPU cycles (PS: strictly speaking, counting cycles fits fixed-frequency CPUs, while today's mainstream CPUs can change frequency). While I/O operations, thread locks (suspension), or other operations keep the process paused, CPU time stops accumulating until the process runs again.
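The difference between real time and CPU time can be made visible with a deliberate pause. The following is a minimal sketch, assuming a C++11 compiler and a platform where std::clock() reports CPU time (on some platforms it reports wall-clock time instead, in which case the gap disappears). SleepTimes and measureSleep are hypothetical names introduced for this illustration.

```cpp
#include <chrono>
#include <ctime>
#include <thread>

// Times measured around one sleep, in seconds.
struct SleepTimes
{
    double wallSeconds; // real (wall-clock) time
    double cpuSeconds;  // CPU time as reported by std::clock()
};

// Sleep for the given number of milliseconds and read both clocks around it.
inline SleepTimes measureSleep(int milliseconds)
{
    const std::chrono::steady_clock::time_point wallStart = std::chrono::steady_clock::now();
    const std::clock_t cpuStart = std::clock();

    // The process is paused here, so it consumes almost no CPU cycles.
    std::this_thread::sleep_for(std::chrono::milliseconds(milliseconds));

    const std::clock_t cpuEnd = std::clock();
    const std::chrono::steady_clock::time_point wallEnd = std::chrono::steady_clock::now();

    SleepTimes result;
    result.wallSeconds = std::chrono::duration<double>(wallEnd - wallStart).count();
    result.cpuSeconds = static_cast<double>(cpuEnd - cpuStart) / CLOCKS_PER_SEC;
    return result;
}
```

Calling measureSleep(100) should report a wall time of about 0.1 s and a much smaller CPU time, which is why a benchmark of an algorithm itself is better served by CPU time.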
Here is a quick summary of two timing functions available on POSIX platforms, clock and clock_gettime.
clock is defined in the ISO C89 standard and declared in the C standard library's <time.h> (<ctime> in the C++ standard library):
clock_t clock(void);
It is available on all mainstream platforms, but the meaning of the return value and the actual type of clock_t may differ slightly between platforms. In most cases it returns the number of ticks elapsed between program start and the call to clock(), and the constant CLOCKS_PER_SEC defines the number of ticks per second. HMCT also points out that on Windows clock() returns wall-clock time since the process started, not the process's CPU time.
clock_gettime is defined by POSIX and is usually declared in time.h:
int clock_gettime(clockid_t __clock_id, struct timespec *__tp);
Its obvious advantage over clock is higher time precision. The function and struct timespec are available on every POSIX-compliant OS, but the supported clock ID arguments differ between systems; on Linux, for example, passing CLOCK_PROCESS_CPUTIME_ID as the clock ID returns the CPU time of the process.
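The clock_gettime call described above can be wrapped as follows. This is a minimal sketch that assumes a POSIX system on which CLOCK_PROCESS_CPUTIME_ID is supported (Linux, for example); it is not HMCT's getCPUTime(), and processCpuSeconds and processCpuClockResolution are hypothetical helper names.

```cpp
#include <time.h>

// CPU time consumed by the current process, in seconds,
// or -1.0 if the clock is not available.
inline double processCpuSeconds()
{
    struct timespec ts;
    if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts) != 0) {
        return -1.0;
    }
    return static_cast<double>(ts.tv_sec) + static_cast<double>(ts.tv_nsec) / 1e9;
}

// Resolution the system reports for that clock, in seconds,
// or -1.0 on failure.
inline double processCpuClockResolution()
{
    struct timespec res;
    if (clock_getres(CLOCK_PROCESS_CPUTIME_ID, &res) != 0) {
        return -1.0;
    }
    return static_cast<double>(res.tv_sec) + static_cast<double>(res.tv_nsec) / 1e9;
}
```

The resolution reported by clock_getres is what the system advertises; the granularity actually observed in measurements may be coarser, so it is worth checking both on the target machine.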
As for CLOCKS_PER_SEC, the number of ticks per second used by clock(): POSIX requires it to be 1,000,000, and glibc uses that value too, so the theoretical resolution should be 1 µs. In my own measurements, however, the timing resolution was only 10 ms (Ubuntu 12.04, kernel version 3.11.0-26, GCC version 4.6.3).
HMCT also provides getCPUTime(), an implementation that works across operating systems. Its interface is declared as follows:
double getCPUTime();
The implementation is not reproduced here (it can be found in the original article).
Timer
Based on HMCT's getCPUTime(), here is a Timer class for timing, modeled on boost::timer:
// Modified from boost::timer, by Xu, http://blog.csdn.net/xusiwei1236
class Timer
{
public:
    Timer() { _start_time = getCPUTime(); }
    void restart() { _start_time = getCPUTime(); }
    double elapsed() const // return elapsed time in seconds
    { return getCPUTime() - _start_time; }
private:
    double _start_time;
}; // Timer
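To try the Timer class without HMCT's source at hand, getCPUTime() can be replaced by a stand-in. The sketch below assumes that std::clock() is an acceptable substitute for demonstration purposes; the real getCPUTime() selects a platform-specific API instead. timeSum is a hypothetical workload added only to show the usage pattern.

```cpp
#include <ctime>

// Stand-in for HMCT's getCPUTime(): CPU time of this process in seconds.
// Assumption: std::clock() is good enough for a demonstration.
inline double getCPUTime()
{
    return static_cast<double>(std::clock()) / CLOCKS_PER_SEC;
}

class Timer
{
public:
    Timer() : _start_time(getCPUTime()) {}
    void restart() { _start_time = getCPUTime(); }
    double elapsed() const { return getCPUTime() - _start_time; } // seconds
private:
    double _start_time;
};

// Hypothetical workload: sum the integers below n and return the CPU
// seconds the loop took. The sum is stored through sumOut so that the
// caller can check the result.
inline double timeSum(long long n, long long* sumOut)
{
    Timer t; // starts timing on construction
    long long sum = 0;
    for (long long i = 0; i < n; ++i) {
        sum += i;
    }
    *sumOut = sum;
    return t.elapsed();
}
```

The pattern is the same one used by the benchmarks in this article: construct a Timer right before the measured code and call elapsed() right after it.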
Several application scenarios
The following builds a few concrete test scenarios and refines them step by step, even though they are made-up ones :-)
Benchmark 1: word counting
Word counting: this test comes from Programming Pearls. It reads a text file and counts how often each word appears in it.
It is modified slightly here: the counting itself (insert and find operations) is kept, and printing the whole map (an iteration operation) is dropped. The core, in Python-style pseudocode:
for word in text:
    if dict.find(word):
        dict[word] += 1
    else:
        dict.insert(word, 1)
btHashMap and std::unordered_map differ slightly in the parameters and return values of find and insert.
This is the std::unordered_map version:
// C++11 auto keyword; here it stands for std::unordered_map<std::string, int>::iterator
auto pos = dict.find(word);
if (pos != dict.end()) { // found
    pos->second++;
}
else { // not found
    dict.insert(std::make_pair(word, 1));
}
This is the btHashMap version:
btHashString key(word); // btHashMap does not support std::string.
btHashInt* val = btdict.find(key);
if (val != NULL) {
    val->setUid1(val->getUid1() + 1);
}
else {
    btdict.insert(key, btHashInt(1));
}
The full btHashMap test:
void btBench(const char* text, int length)
{
    btHashMap<btHashString, btHashInt> btdict;
    int count = 0;
    int cursor = 0;
    char word[256];
    Timer t;
    do {
        cursor += took(&text[cursor], word, NULL); // take the next word.
        if (!word[0]) break; // no more words.
        count++;
        btHashString key(word);
        btHashInt* val = btdict.find(key); // lookup
        if (val != NULL) { // found
            val->setUid1(val->getUid1() + 1);
        }
        else { // not found
            btdict.insert(key, btHashInt(1));
        }
    } while (cursor < length);
    double timeUsed = t.elapsed();
    printf("%9s: time used: %.3f, word tooks: %d\n", __func__, timeUsed, count);
}
(The std::unordered_map version is similar and is omitted.)
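For readers who want to see the omitted std::unordered_map counterpart, the following is a minimal sketch of the counting logic only. It is not the author's benchmark.cpp: it assumes whitespace-separated words and tokenizes with std::istringstream instead of the took() helper, and it leaves out the timing. countWords is a hypothetical name.

```cpp
#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>

// Count how often each whitespace-separated word occurs in text.
inline std::unordered_map<std::string, int> countWords(const std::string& text)
{
    std::unordered_map<std::string, int> dict;
    std::istringstream in(text);
    std::string word;
    while (in >> word) {                          // take the next word
        auto pos = dict.find(word);               // lookup
        if (pos != dict.end()) {                  // found: bump the counter
            pos->second++;
        } else {                                  // not found: first occurrence
            dict.insert(std::make_pair(word, 1));
        }
    }
    return dict;
}
```

For the input "to be or not to be" the resulting map holds four entries, with "to" and "be" counted twice each.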
The test program takes a text file name on the command line, reads the whole file into memory, and then extracts the words one by one and counts them; see benchmark.cpp for the code.
With the Oxford Dictionary as input, one set of measurements was:
 stdBench: time used: 0.196, word tooks: 695882
  btBench: time used: 0.061, word tooks: 695882
More than 690,000 words were counted, and btHashMap was significantly faster than std::unordered_map.
Because std::map has the same interface as std::unordered_map, it can be swapped in directly. The results with std::map:
 stdBench: time used: 0.687, word tooks: 695882
  btBench: time used: 0.061, word tooks: 695882
std::map is slower than std::unordered_map because std::map is implemented as a red-black tree, whose find/insert cost is O(log N), while std::unordered_map is a hash table, whose find/insert cost is O(1) on average.
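The interchangeability of the two standard containers can be illustrated with a single function template. This is a sketch under the assumption that the counting code only uses find, end, and insert; countInto, TreeDict, and HashDict are hypothetical names, and the input is a word list rather than the raw text used in the benchmark.

```cpp
#include <map>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Count words with any map type that offers find(), end(), and insert().
template <typename Map>
Map countInto(const std::vector<std::string>& words)
{
    Map dict;
    for (const std::string& word : words) {
        auto pos = dict.find(word);
        if (pos != dict.end()) {
            pos->second++;
        } else {
            dict.insert(std::make_pair(word, 1));
        }
    }
    return dict;
}

typedef std::map<std::string, int> TreeDict;           // ordered, tree-based
typedef std::unordered_map<std::string, int> HashDict; // unordered, hash-based
```

Both countInto<TreeDict>(words) and countInto<HashDict>(words) produce the same counts; only the cost per operation and the iteration order differ.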
Note that in this example the two versions hash the same word with different algorithms: btHashMap uses btHashString::getHash, while std::unordered_map uses std::hash<std::string>.
Benchmark 2: random number statistics
In benchmark 1 the two hash maps use different string hash algorithms, which may affect the measurements slightly (in practice the effect is very small). Here the key is simply changed from a string to an int, so that the measured performance does not depend on the string hash algorithm.
That raises the question of where the data comes from. It could be read from the command line, but that feels too crude.
Instead, random numbers are used: starting from one seed, the same random sequence can be generated twice (rand guarantees that the same seed produces the same sequence).
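The reproducibility that the benchmark relies on can be checked directly. The following is a small sketch; randomSequence is a hypothetical helper, and the only assumption is the standard guarantee that calling srand with the same seed makes rand repeat the same sequence within one program.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Seed the generator and collect the next n values returned by rand().
inline std::vector<int> randomSequence(unsigned int seed, std::size_t n)
{
    std::vector<int> values;
    values.reserve(n);
    std::srand(seed); // same seed, same sequence
    for (std::size_t i = 0; i < n; ++i) {
        values.push_back(std::rand());
    }
    return values;
}
```

Two calls with the same seed return equal vectors, so both hash maps can be fed exactly the same keys without storing them.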
The btHashMap version (the std::unordered_map version is similar and not shown):
double btBench(int seed, long tests)
{
    btHashMap<btHashInt, btHashInt> dict;
    srand(seed); // set up the random seed.
    Timer t;
    for (long i = 0; i < tests; ++i) {
        int r = rand(); // generate a random int.
        // look up in the hash map.
        btHashInt* val = dict.find(btHashInt(r));
        if (val != NULL) { // found, update directly.
            val->setUid1(val->getUid1() + 1);
        }
        else { // not found, insert <key, 1>
            dict.insert(btHashInt(r), btHashInt(1));
        }
    }
    return t.elapsed();
}
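A std::unordered_map counterpart of the function above could look like the following sketch. It is not taken from benchmarkII.cpp: it assumes std::clock() as the timer instead of the Timer class so that it is self-contained, and it additionally reports the number of distinct keys. BenchResult and this version of stdBench are hypothetical.

```cpp
#include <cstddef>
#include <cstdlib>
#include <ctime>
#include <unordered_map>
#include <utility>

// Result of one run: CPU seconds spent and number of distinct keys seen.
struct BenchResult
{
    double seconds;
    std::size_t distinctKeys;
};

inline BenchResult stdBench(int seed, long tests)
{
    std::unordered_map<int, int> dict;
    std::srand(static_cast<unsigned int>(seed)); // set up the random seed
    const std::clock_t start = std::clock();
    for (long i = 0; i < tests; ++i) {
        const int r = std::rand();               // generate a random int
        auto pos = dict.find(r);                 // look up in the hash map
        if (pos != dict.end()) {
            pos->second++;                       // found, update directly
        } else {
            dict.insert(std::make_pair(r, 1));   // not found, insert <key, 1>
        }
    }
    BenchResult result;
    result.seconds = static_cast<double>(std::clock() - start) / CLOCKS_PER_SEC;
    result.distinctKeys = dict.size();
    return result;
}
```

Reporting the distinct key count is a design choice for this sketch: it shows how large the map actually became, which matters when relating the timings to the container size.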
The test program takes the number of tests as a command-line argument and prints the number of tests and the times; see benchmarkII.cpp for the code.
Running the test program repeatedly with increasing test counts, using the following command, gives these results:
$ for ((i = 2048; i <= 2**27; i *= 2)); do ./b2 $i; done
     2048    0.001    0.000
     4096    0.004    0.001
     8192    0.005    0.002
    16384    0.006    0.004
    32768    0.022    0.007
    65536    0.023    0.014
   131072    0.054    0.029
   262144    0.126    0.065
   524288    0.289    0.175
  1048576    0.509    0.434
  2097152    1.056    1.038
  4194304    2.182    2.314
  8388608    4.549    4.935
 16777216    9.307   10.411
 33554432   20.002   21.396
 67108864   41.641   47.422
134217728   90.582  104.393
Reading each row as test count, std::unordered_map time, and btHashMap time (in seconds), this data shows that for i <= 2097152 (2^21) btHashMap performs better than std::unordered_map. The following analyzes why btHashMap falls behind std::unordered_map once the size reaches a certain magnitude.
Rehash cost analysis of btHashMap and std::unordered_map
First of all, not every hash table needs rehashing; only dynamically growing ones do. The hash maps of most language environments can grow dynamically, so they need rehashing.
When the memory a hash map currently holds is not enough for new elements, it has to request more memory while preserving the existing logical relationships. This is the hash table's rehash.
The test results above show that btHashMap performs worse than std::unordered_map at larger sizes. The reason is the difference in their memory layouts, which leads to different rehash costs when the size is large.
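Rehashing can be observed from the outside for std::unordered_map by watching bucket_count() while inserting. The following is a minimal sketch; countRehashes is a hypothetical helper, and it only covers the standard container, not btHashMap, whose growth is not exposed in the same way.

```cpp
#include <cstddef>
#include <unordered_map>
#include <utility>

// Insert keys 0..n-1 and count how many times the bucket array was rebuilt.
// If reserveFirst is true, room for n elements is requested up front.
inline int countRehashes(int n, bool reserveFirst)
{
    std::unordered_map<int, int> dict;
    if (reserveFirst) {
        dict.reserve(static_cast<std::size_t>(n));
    }
    int rehashes = 0;
    std::size_t buckets = dict.bucket_count();
    for (int i = 0; i < n; ++i) {
        dict.insert(std::make_pair(i, 1));
        if (dict.bucket_count() != buckets) { // bucket array changed: a rehash happened
            buckets = dict.bucket_count();
            ++rehashes;
        }
    }
    return rehashes;
}
```

Without reserve, inserting many elements triggers a series of rehashes as the table grows; with reserve, the same insertions should trigger none, which isolates the rehash cost from the cost of the insertions themselves.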
Recall that std::unordered_map's memory layout is the "textbook" one: a table called buckets, each slot of which hangs a list of all the elements whose hash value equals the slot's index.
Benchmark 3: separate find/insert tests
In benchmark 2, the find and insert operations are interleaved unpredictably during the test. So here is a simple test that measures find and insert performance separately:
btHashMap<btHashInt, btHashInt> btdict;
double btInsertBench(int seed, long tests)
{
    srand(seed);
    Timer t;
    for (long i = 0; i < tests; ++i) {
        btdict.insert(btHashInt(rand()), btHashInt(1));
    }
    return t.elapsed();
}
double btFindBench(int seed, long tests)
{
    srand(seed);
    Timer t;
    for (long i = 0; i < tests; ++i) {
        btdict.find(btHashInt(rand()));
    }
    return t.elapsed();
}
During the test, btInsertBench first fills btdict with data, and then btFindBench measures find performance.
Unit of measurement
The previous tests used total time as the measure of "speed". Here a more intuitive rate unit is used: OPS (operations per second):
double stdInsertTime = stdInsertBench(seed, tests);
double stdFindTime = stdFindBench(seed, tests);
double btInsertTime = btInsertBench(seed, tests);
double btFindTime = btFindBench(seed, tests);
printf("%11ld\t% 5.3f\t% 5.3f\t% 5.3f\t% 5.3f\n", tests, stdInsertTime, stdFindTime, btInsertTime, btFindTime); // total times
printf("%11ld\t%11zu\t%11d\t%.0f\t%.0f\t%.0f\t%.0f\n",
       tests, stddict.size(), btdict.size(), tests / stdInsertTime, tests / stdFindTime, tests / btInsertTime, tests / btFindTime); // in OPS
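The conversion to OPS is a plain division, but very short runs can report a time of 0.000 (as the smallest run in benchmark 2 did), which would make the division meaningless. The following sketch guards against that case; opsPerSecond is a hypothetical helper, and returning 0.0 for a non-positive time is an arbitrary convention chosen for this illustration.

```cpp
// Convert a test count and a measured time into operations per second.
// Returns 0.0 when the time is not positive, for example when a run is
// too short for the timer to register, to avoid dividing by zero.
inline double opsPerSecond(long tests, double seconds)
{
    if (seconds <= 0.0) {
        return 0.0;
    }
    return static_cast<double>(tests) / seconds;
}
```

Applied to the benchmark 1 figures, 695882 words in 0.061 s is roughly 11.4 million words per second for btHashMap, against roughly 3.55 million per second (695882 / 0.196) for std::unordered_map.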
Reference
C++ tip: How to measure CPU time for benchmarking
When reprinting, please credit the source (http://blog.csdn.net/xusiwei1236) and link to the original. Comments and email exchanges are welcome.