The pitfall in the list container's size() method: O(n) complexity, designed for the sake of splice()?

Last Update:2018-12-05 Source: Internet

Author: User

Recently I was working on a project with demanding performance requirements. At peak, a server has to process 20 thousand UDP packets per second, with 40 elements in each packet. The server needs a linked list: the algorithm appends each element to the end of the list (only a pointer to the element object, so no object copying is involved) and later, at another point in time, takes these elements out of the list again. A single thread does all of this.

Since the logic is so simple, I naturally chose the C++ standard STL container list (GNU on Linux, SGI implementation). For something this simple, one insertion at the tail and one removal at the head per element, just use the STL list container. I did not expect this to be the start of the pain. I expected that processing 800 thousand elements per second (20 thousand packets times 40 elements) would be easy, yet the server could not even handle one thousand packets per second. The processing thread hogged the CPU, a large number of packets could not be processed in time, and timeouts occurred. Because the algorithm itself is complex, it took a long time to locate the problem, and in the end I came to suspect that the list container had a serious performance problem.

So I wrote a simple linked list of my own, and performance improved greatly once it replaced the STL container. To confirm this, I wrote a small program that roughly imitates the scenario in my algorithm. It works as follows:

Every 3 seconds, insert n elements (pointers) into the linked list, then take the n elements out of the list and free them. Measure the elapsed time t. If t is less than 3 seconds, sleep for (3 - t) seconds and print the sleep time.

On my test machine the results were dramatically different. With 20 thousand elements every 3 seconds, the stress program using the STL list drove CPU usage to 70%, while with the hand-written list the load was barely visible. The hand-written list reached 80% CPU usage only at 20 million elements every 3 seconds. That is a thousand-fold gap! There is no object copying here; all I insert into the list is pointers!

(The test program follows. It only compares the performance of the two lists, so the machine parameters are not important. Please pay attention to line 71 of the original listing: the while (testlist.size() > 0) loop condition in testdelete1.)

    #include <cstdlib>      // atoi
    #include <iostream>
    #include <list>
    #include <sys/time.h>   // gettimeofday
    #include <unistd.h>     // usleep
    using namespace std;

    // The object under test. Each list element is a pointer to an A.
    class A {};

    // Number of elements inserted at one end / removed from the other
    // end of the list every 3 seconds.
    int testpressurenum = 40000;

    // The STL list under test.
    list<A*> testlist;

    // The hand-written list under test.
    struct selflistelement {
        A* p;
        selflistelement* prev;
        selflistelement* next;
    };
    selflistelement* mylisthead;
    selflistelement* mylisttail;
    int mylistsize;

    // Add an element at the head of the hand-written list.
    bool add(A* packet) {
        selflistelement* ele = new selflistelement;
        ele->p = packet;
        mylistsize++;
        if (mylisthead == NULL) {
            mylisthead = mylisttail = ele;
            ele->prev = NULL;
            ele->next = NULL;
            return true;
        }
        ele->next = mylisthead;
        mylisthead->prev = ele;
        ele->prev = NULL;
        mylisthead = ele;
        return true;
    }

    // Take an element from the tail of the hand-written list.
    selflistelement* get() {
        if (mylisttail == NULL) return NULL;
        mylistsize--;
        selflistelement* p = mylisttail;
        if (mylisttail->prev == NULL) {
            mylisthead = mylisttail = NULL;
        } else {
            mylisttail = mylisttail->prev;
            mylisttail->next = NULL;
        }
        return p;
    }

    // Take elements out of the STL list and delete them.
    void testdelete1() {
        // This line has a serious performance problem: in this implementation
        // size() is not constant time but O(n). This is the pit we fell into.
        while (testlist.size() > 0) {
            A* p = testlist.back();
            testlist.pop_back();
            delete p;
            p = NULL;
        }
    }

    // Take elements out of the hand-written list and delete them.
    void testdelete2() {
        do {
            selflistelement* packet = mylisttail;
            if (packet == NULL) break;
            packet = get();
            delete packet->p;
            delete packet;
            packet = NULL;
        } while (true);
    }

    // Add elements to the STL list.
    void testadd1() {
        for (int i = 0; i < testpressurenum; i++) {
            A* p = new A();
            testlist.push_front(p);
        }
    }

    // Add elements to the hand-written list.
    void testadd2() {
        for (int i = 0; i < testpressurenum; i++) {
            A* p = new A();
            add(p);
        }
    }

    void printusage(int argc, char** argv) {
        cout << "usage: " << argv[0] << " [1|2] [oneroundpressuenum]" << endl
             << "1 means STL, 2 means simple list\n"
             << "oneroundpressuenum means in 3 seconds how many elements add/del in list" << endl;
    }

    int main(int argc, char** argv) {
        // Two parameters are accepted to make testing convenient.
        if (argc < 2) {
            printusage(argc, argv);
            return -1;
        }
        int type = atoi(argv[1]);
        if (type != 1 && type != 2) {
            printusage(argc, argv);
            return -2;
        }
        // The original tested argc >= 2, which reads argv[2] even when it is absent.
        if (argc >= 3) testpressurenum = atoi(argv[2]);

        cout << "every 3 seconds add/del element number is " << testpressurenum << endl;

        struct timeval time1, time2;
        gettimeofday(&time1, NULL);

        while (true) {
            gettimeofday(&time1, NULL);
            if (type == 1) {
                testadd1();
                cout << "STL list has " << testlist.size() << " elements" << endl;
            } else {
                testadd2();
                cout << "my list has " << mylistsize << " elements" << endl;
            }

            // One round every 3 seconds.
            gettimeofday(&time2, NULL);
            unsigned long interval = 1000000 * (time2.tv_sec - time1.tv_sec)
                                     + time2.tv_usec - time1.tv_usec;
            if (interval < 3000000) {
                cout << "after add sleep " << 3000000 - interval << " usec" << endl;
                usleep(3000000 - interval);
            } else {
                cout << "add cost time too much " << interval << endl;
            }

            gettimeofday(&time1, NULL);
            if (type == 1) {
                testdelete1();
                cout << "STL list has " << testlist.size() << " elements" << endl;
            } else {
                testdelete2();
                cout << "my list has " << mylistsize << " elements" << endl;
            }

            // One round every 3 seconds.
            gettimeofday(&time2, NULL);
            interval = 1000000 * (time2.tv_sec - time1.tv_sec)
                       + time2.tv_usec - time1.tv_usec;
            if (interval < 3000000) {
                cout << "after delete sleep " << 3000000 - interval << " usec" << endl;
                usleep(3000000 - interval);
            } else {
                cout << "delete cost time too much " << interval << endl;
            }
        }
        return 0;
    }

A thousand-fold performance gap is outrageous. Why does the STL perform so badly? I had never used STL containers in performance-critical scenarios and was not familiar with them. After this blog post was published, Chen Shuo kindly pointed out that the problem is the size() call on line 71 of the original listing, the loop condition in testdelete1! After size() is replaced with empty(), the performance of list improves greatly. There is still a gap compared with the hand-written list above: the hand-written list is about 70% faster than the STL list!
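The fix can be sketched as follows. This is a minimal illustration, not the original program: the helper name drain_with_empty is hypothetical, and it assumes the list owns the objects its pointers refer to.

```cpp
#include <cstddef>
#include <list>

struct A {};

// Hypothetical helper: deletes every element of the list and returns how
// many objects were freed. The loop condition uses empty(), which only
// checks whether the list has any node at all, so its cost does not grow
// with the length of the list.
std::size_t drain_with_empty(std::list<A*>& l) {
    std::size_t freed = 0;
    while (!l.empty()) {
        A* p = l.back();
        l.pop_back();
        delete p;
        ++freed;
    }
    return freed;
}
```

Only the loop condition differs from testdelete1; the body of the loop is unchanged.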

I was curious what the size() method actually does. Let's look at its implementation (the STL code is SGI 3.3):

    size_type size() const {
      size_type __result = 0;
      distance(begin(), end(), __result);
      return __result;
    }

Unlike my hand-written list, the size() method does not keep the length of the list in a member variable. Instead, it calls the distance function to compute the length. So what does distance do?

    template <class _InputIterator, class _Distance>
    inline void distance(_InputIterator __first,
                         _InputIterator __last, _Distance& __n)
    {
      __STL_REQUIRES(_InputIterator, _InputIterator);
      __distance(__first, __last, __n, iterator_category(__first));
    }

It forwards to yet another layer, __distance. Let's see what that does:

    template <class _InputIterator>
    inline typename iterator_traits<_InputIterator>::difference_type
    __distance(_InputIterator __first, _InputIterator __last, input_iterator_tag)
    {
      typename iterator_traits<_InputIterator>::difference_type __n = 0;
      while (__first != __last) {
        ++__first; ++__n;
      }
      return __n;
    }

It turns out to be a traversal! Why must every element of the list be visited just to obtain its length, instead of keeping the length in a variable? Why is SGI's design of list so different? (Microsoft's STL implementation does not have this efficiency problem with size().)
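To get a feel for what this traversal costs inside the test program's delete loop, here is a rough cost model. It is only a sketch under a stated assumption: each size() call visits every node currently in the list, as the __distance loop does. The function name traversal_steps_to_drain is hypothetical.

```cpp
#include <cstdint>

// Hypothetical cost model: counts the node visits made by the loop
// conditions of a drain loop of the form
//     while (list.size() > 0) { list.pop_back(); }
// when size() walks the whole list on every call.
std::uint64_t traversal_steps_to_drain(std::uint64_t n) {
    std::uint64_t steps = 0;
    for (std::uint64_t remaining = n; remaining > 0; --remaining) {
        steps += remaining;  // one size() call with `remaining` nodes left
    }
    return steps;            // equals n * (n + 1) / 2
}
```

For the 20 thousand elements used in the test, this comes to 200,010,000 node visits per round, whereas a list that stores its length would need only 20,001 constant-time checks.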

Look at the author's explanation: http://home.roadrunner.com/~hinnant/On_list_size.html

The starting point: for the sake of the implementation of

    splice(iterator position, list& x, iterator first, iterator last);

the size() method was designed to be O(n).

The splice method moves a range of elements from list A directly into list B. If size() were designed to be O(1), then splice would have to traverse the range between first and last to count it, so that the length stored in list A could be reduced by the number of elements moved and the length stored in list B increased by the same amount. So the author decided to make size() O(n), so that splice does not have to traverse anything when it executes!
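The trade-off described above can be made concrete with a small wrapper that caches its length. This is only an illustrative sketch, not how any real STL implementation is written: the class name CountedList and its members are hypothetical, it wraps std::list, and it assumes splice is only called with a different list as the source.

```cpp
#include <cstddef>
#include <iterator>
#include <list>

// Hypothetical wrapper, for illustration only: a list that caches its length.
template <class T>
class CountedList {
public:
    typedef typename std::list<T>::iterator iterator;

    CountedList() : count_(0) {}

    void push_back(const T& v) { items_.push_back(v); ++count_; }
    iterator begin() { return items_.begin(); }
    iterator end() { return items_.end(); }

    // O(1): just reads the cached counter.
    std::size_t size() const { return count_; }

    // Moves [first, last) out of `other` and inserts it before `pos`.
    // Assumes `other` is a different object than *this.
    void splice(iterator pos, CountedList& other, iterator first, iterator last) {
        // The price of the cached counter: the range has to be walked to
        // learn how many elements change owner, so this splice is O(n).
        std::size_t moved = static_cast<std::size_t>(std::distance(first, last));
        items_.splice(pos, other.items_, first, last);
        count_ += moved;
        other.count_ -= moved;
    }

private:
    std::list<T> items_;
    std::size_t count_;
};
```

With this design size() is constant time, but every range splice pays for a std::distance call over the moved elements, which is exactly the cost the SGI design avoids.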

Let's look at the implementation of splice:

    void splice(iterator __position, list&, iterator __first, iterator __last) {
      if (__first != __last)
        this->transfer(__position, __first, __last);
    }

Let's take a look at what transfer does:

    void transfer(iterator __position, iterator __first, iterator __last) {
      if (__position != __last) {
        // Remove [first, last) from its old position.
        __last._M_node->_M_prev->_M_next     = __position._M_node;
        __first._M_node->_M_prev->_M_next    = __last._M_node;
        __position._M_node->_M_prev->_M_next = __first._M_node;

        // Splice [first, last) into its new position.
        _List_node_base* __tmp      = __position._M_node->_M_prev;
        __position._M_node->_M_prev = __last._M_node->_M_prev;
        __last._M_node->_M_prev     = __first._M_node->_M_prev;
        __first._M_node->_M_prev    = __tmp;
      }
    }

The author did indeed arrange for splice to relink only a few pointers instead of traversing the range, and sacrificed the efficiency of size() to do so!

How should this design be judged? The author's intention is good, but most programmers will subconsciously assume that size() is constant time, and size() is also one of the most commonly used methods! The author has dug a pit for us!

The lesson of this example is that we must be careful when using third-party software, especially when the requirements on it are demanding. We need a sufficient understanding of every method we use: it is not enough to know what a method can do, we must also know how it does it. Otherwise, it is better to stick with the tools we already know well.
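The same pointer-relinking behaviour can be observed through the public std::list interface. The following is a small sketch; the helper name move_all_but_first is hypothetical, and it assumes the two lists are different objects.

```cpp
#include <list>

// Hypothetical helper: moves every element of `from` except the first one
// to the end of `to`. The nodes are relinked, not copied, so pointers and
// references to the moved elements stay valid.
// Assumes `from` and `to` are different lists.
void move_all_but_first(std::list<int>& from, std::list<int>& to) {
    if (from.empty()) return;
    std::list<int>::iterator first = from.begin();
    ++first;  // skip the first element
    to.splice(to.end(), from, first, from.end());
}
```

Because no element is copied, the address of a moved element is the same before and after the call; only its neighbours change.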

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
