C++_benchmark_vector_list_deque

Source: Internet
Author: User
Tags: benchmark, intel core i7

Title: c++_benchmark_vector_list_deque
Date: 2015-08-01 22:32:39

Translators: Titer1 + Zhangyu
Source: www.drysaltery.com + CSDN blog sync
License: Creative Commons BY-NC-ND 3.0 (free to reprint, non-commercial, no derivatives); please credit the author and source when reproducing.
Translated from: the C++ Benchmark series

C++ benchmark: vector vs list vs deque

Last week, I benchmarked std::vector and std::list under different workloads. That article received a lot of comments and suggestions for improvement, and this one expands on it further.

In this article, we will compare the performance of std::vector, std::list, and std::deque with several data types under several different workloads. Throughout, list, vector, and deque refer to the corresponding containers of the standard library.

Conventional wisdom says a linked list should be used for random insertions and deletions, since those operations are O(1) for a list but O(n) for a vector or a deque. Looking only at complexity, linear search should scale equivalently for all three structures, at O(n). When a random insert or remove is performed in a vector or a deque, all subsequent elements have to be moved, which means elements are copied. This is why the size of the data type is an important factor when comparing these structures: it drives the cost of copying elements.

In practice, however, memory caches make a huge difference. A vector's data is entirely contiguous, whereas a list allocates separate memory for each element. We will see how that works out in practice. The deque is a structure that promises the advantages of both without their drawbacks, and we will see how it behaves too. Complexity analysis does not take the memory hierarchy into account; I believe that in practice the memory hierarchy matters just as much as complexity.

Keep in mind that all of the tests here compare only vector, list, and deque, even if some other data structure would perform better under a given load.

In what follows, n refers to the number of elements in the collection.

All tests were run on an Intel Core i7 Q 820 @ 1.73GHz. The code was compiled with GCC in a 64-bit environment with -O2 and -march=native. The code uses the C++11 standard.

For each curve, the vertical axis is the time needed to perform the operation, so smaller is better. The horizontal axis is always the number of elements in the collection. For some graphs a logarithmic scale is clearer; a button below each graph switches its vertical axis to a logarithmic scale.

The data types of different sizes hold an array of longs; the size of that array is varied to change the size of the type. The non-trivial data type is made up of two longs and has a deliberately expensive assignment operator and copy constructor that just do some math (completely meaningless, but costly). One might object that a realistic type would not have a copy constructor and assignment operator like this, but the only thing that matters for this benchmark is that these operations are expensive.

Fill operation

The first test fills the data structure by adding elements to the end of the container with push_back. Two vectors are used here: vector_pre is a standard vector that calls vector::reserve up front, so that it performs only a single allocation for the whole run.

First, let's look at the result of filling with a very small data type (8 bytes):

The pre-allocated vector fill is the fastest; there is only a small gap between vector and deque, and the list is about 3 times slower than the other three.

If we consider a larger data type (4KB):

This time vector and list behave similarly, deque is faster than both, and the pre-allocated vector is clearly the winner. The difference between deque and vector is probably caused by my system: it could not allocate that much contiguous memory at once on my machine. (Translator's note: this will be explained along with the code.)

Finally, we will try a non-trivial type:

No data structure stands out much here, with vector_pre being the fastest.
For the push_back operation, a pre-allocated vector is a good choice if you know the final size beforehand; the other data structures show little difference.
I prefer the pre-allocated vector, and if anyone can explain these small gaps, please let me know — I am very interested.

Linear Lookup

The next operation tested is lookup. The container is filled with random numbers between 0 and n, then std::find is used to do a simple linear search for every number from 0 to n. In theory, going by complexity alone, all three data structures should behave the same.

8 bytes


Clearly, the list performs badly at searching; its time overhead grows much faster than vector's or deque's.


The reason is cache lines. When a piece of data is accessed, it is fetched from main memory into the cache — and not just that data, but a whole cache line. Since a vector's data is contiguous, accessing one element automatically brings its neighbours into the cache. Because main memory is an order of magnitude slower to access than the cache, this makes a huge difference. In the case of a linked list, the processor spends all its time waiting for data to be fetched from main memory into the cache, and each fetch also drags in a cache line's worth of mostly useless data.

The deque is a bit slower than the vector, which makes sense: its segmented storage leads to more cache misses.

Let's try a larger data type below.

The list is much slower than the other types, but interestingly, the gap between deque and vector is getting smaller. Next, let's try a 4KB data type.

4096 bytes


The list still behaves poorly, but the gap with the others is getting smaller. The interesting point is that the deque is faster than the vector here. I'm not sure why; it may be specific to this particular size. One thing is certain: the larger the data type, the more cache misses the processor takes, because fewer elements fit in each cache line.

For lookup, the list is significantly slower, while deque and vector have similar performance; deque even looks faster than vector for large-size find operations.

Random find (+ linear search)

[Figures for several element sizes (8 bytes up to 4096 bytes, plus a non-trivial 16-byte type); images missing from this copy]

Random deletion

[Figures for 8-byte, 128-byte, 4096-byte, and non-trivial 16-byte elements; images missing from this copy]

Front insertion (push_front), 8 bytes

[Figure; image missing from this copy]

Sort

[Figures for 8-byte, 128-byte, 1024-byte, and non-trivial 16-byte elements; there is no 4096-byte test here; images missing from this copy]

Destroy operation

[Figures for several element sizes (8 bytes up to 4096 bytes); images missing from this copy]

...

This time, we can see that deque is three times slower than vector, and list is three orders of magnitude slower than vector! However, the performance differences shrink as the data type grows.

When testing with a non-trivial data type:

The performance of list and deque is not much different, and the vector is still about twice as fast as either.

Although the vector is always faster than list and deque here, it matters little, because destruction is cheap — it completes in microseconds. That affects only programs that are extremely time-sensitive, and most programs are not. Moreover, each data structure is destroyed only once, so destruction is not a very important operation.

Numeric operation (number crunching)

Finally, we will test number crunching. Here, random elements are inserted into a sorted container, which means the insertion position has to be found by searching before each insert. The elements tested here are 8 bytes.

Even with only 100,000 elements, list is already an order of magnitude slower than the other two data structures. The curve shows that the larger the data set, the worse the list behaves. The list's spatial locality is too poor, so it is not suitable for number-crunching operations.

Conclusion

Finally, from the preceding results, we can draw the following conclusions:

    • std::list has very poor spatial locality, which makes traversing a list very slow.
    • std::vector and std::deque always perform better than std::list for small data types.
    • std::list handles large elements well.
    • std::deque is better than std::vector for random insertion, especially when there are many push_front operations.
    • std::deque and std::vector do not handle complex data types well, because the cost of copying and assigning them is too high.

From this we can draw the ideal scenarios for using these data structures:

    • Number crunching: use std::vector or std::deque
    • Linear search: use std::vector or std::deque
    • Random insert/remove:
      • Small data types: use std::vector
      • Large data types: use std::list, as long as there are not many search operations.
    • Complex data types (objects): use std::list, as long as there is no need for a lot of searching. However, if the collection has to be traversed many times, the list becomes very slow.
    • Front insertion: use std::deque or std::list

I confess that before writing this article I did not know std::deque well. It is a very good data structure: whether elements are inserted at the front, the back, or the middle, std::deque performs well, and it has good spatial locality. So even if it is sometimes slower than vector, in general we may prefer deque over vector, especially when working with medium-sized data types.

If you have time, the best way to decide in practice is always to benchmark each version, and even to try other data structures. Two operations with the same big-O complexity can have completely different performance in practice.

I hope this article has been of some help to you. If you have any comments or suggestions, please do not hesitate to post a comment.

Source code: https://github.com/wichtounet/articles/blob/master/src/vector_list/bench.cpp
