C/C++ string processing (5): std::deque and std::TextPool

Last Update:2018-12-04 Source: Internet

Author: User


Xu Shiwei
2008-4-4

Introduction

std::TextPool is implemented on top of std::deque. So although this article discusses std::deque, all of its conclusions apply equally to std::TextPool.

Implementation Overview

As the name suggests, a deque is a "double-ended queue": inserting or removing elements at either end is fast. To achieve this, std::deque uses a piecewise-contiguous data structure that sits between an array and a linked list, as shown below:

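template <class _E>
class deque
{
    enum { BlockSize = 512 };          // elements per block

    typedef _E Block[BlockSize];       // one fixed-size contiguous block

    std::vector<Block*> m_storage;     // table of pointers to the blocks
    iterator m_first, m_last;          // begin() and end() of the container
};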

Here, iterator is the deque::iterator class. Its implementation is not discussed here; the point to note is that m_first and m_last are the container's begin() and end(). They are needed because the first and last blocks of m_storage may not be full, so something has to mark where the data starts and ends. If we only wanted a one-way queue that grows at the back, we could drop the m_first member, because the first block is always full unless it is also the last block.
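To make the layout concrete, here is a minimal sketch of the one-way variant described above (growth at the back only, so no m_first is needed). The names ChunkedSeq, block_count, and m_size are hypothetical and introduced only for illustration. This is not how any real std::deque is implemented; it assumes C++14 and a default-constructible element type.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical back-growing chunked sequence, for illustration only.
template <class T, std::size_t BlockSize = 512>
class ChunkedSeq {
public:
    void push_back(const T& value) {
        if (m_size % BlockSize == 0) {
            // The last block is full (or no block exists yet): allocate one more.
            // Only the small pointer table may be reallocated here, never the elements.
            m_storage.push_back(std::make_unique<T[]>(BlockSize));
        }
        m_storage[m_size / BlockSize][m_size % BlockSize] = value;
        ++m_size;
    }

    T& operator[](std::size_t i) {
        assert(i < m_size);
        return m_storage[i / BlockSize][i % BlockSize];
    }

    std::size_t size() const { return m_size; }
    std::size_t block_count() const { return m_storage.size(); }

private:
    std::vector<std::unique_ptr<T[]>> m_storage;  // table of block pointers
    std::size_t m_size = 0;                       // plays the role of m_last
};
```

With a block size of 4, pushing 10 elements into a ChunkedSeq<int, 4> allocates 3 blocks, and the address of the first element stays the same as more elements are appended.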

Why adopt this piecewise-contiguous data structure? The answer is performance. C++ programmers rarely use deque, but judging by the performance characteristics of the containers, it deserves better. Among the STL value-based containers (list/slist, vector, deque, and basic_string), deque has the best overall performance.

Let us analyze this in detail.

Time Performance Analysis

push_back/push_front

For deque, these two operations perform identically. vector does not support push_front (it is not provided because it would perform poorly). We therefore compare the push_back performance of each container, as follows:

Vector

The performance of vector::push_back depends on the implementation of vector, mainly on its memory growth model. The typical approach is the N * 2 model: once the allocated memory is full, the vector allocates a new buffer of size N * 2, where N is the space the vector currently occupies. Under this model, the average cost of moving elements per push_back is (1/N) * N = 1, a constant (a reallocation happens on average once every N insertions, and each reallocation moves N elements), so push_back is amortized O(1). The time performance is good, but a great deal of space is wasted. If the growth model is instead N + delta (where delta is a constant increment), the average moving cost per push_back is (1/delta) * N = O(N). Space is saved, but time performance is poor. The Erlang language introduces an interesting growth model based on the Fibonacci sequence. It wastes less space than the N * 2 model and balances space and time performance.
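The difference between the two growth models can be checked with a small simulation. The sketch below only counts element moves; it does not allocate anything. The names count_moves, grow_double, and grow_by_16 are hypothetical, and the increment of 16 is an arbitrary choice for delta.

```cpp
#include <cassert>
#include <cstddef>

// Counts how many element moves n push_back calls cause under a growth policy.
// `grow` maps the old capacity to the new capacity.
template <class Grow>
std::size_t count_moves(std::size_t n, Grow grow) {
    std::size_t capacity = 0, size = 0, moves = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (size == capacity) {
            capacity = grow(capacity);
            assert(capacity > size);  // the policy must actually grow
            moves += size;            // every existing element moves to the new buffer
        }
        ++size;
    }
    return moves;
}

// N * 2 model (starting from capacity 1).
inline std::size_t grow_double(std::size_t cap) { return cap == 0 ? 1 : cap * 2; }

// N + delta model with delta = 16.
inline std::size_t grow_by_16(std::size_t cap) { return cap + 16; }
```

Under these assumptions, count_moves(1024, grow_double) gives 1023 moves in total, about one per element, while count_moves(1024, grow_by_16) gives 32256.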

List

list::push_back is O(1). Its main cost is allocating the new node. With GC Allocator, list::push_back is very fast.

Deque

deque::push_back is close to O(1). It is not strictly O(1) because when m_storage is full, it runs into the same memory migration problem as vector. If vector<Block*> uses the N * 2 growth model, deque::push_back is clearly amortized O(1). If it uses the N + delta model, the average moving cost per push_back is (1/(BlockSize * delta)) * (N/BlockSize) = O(N): the block table is reallocated once every BlockSize * delta insertions, and each reallocation moves N/BlockSize pointers. Although this is still O(N), N/delta and N/(delta * BlockSize * BlockSize) are very different in practice. Because m_storage.size() is usually small, deque::push_back performs well even with massive amounts of data.
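A rough way to see how much the block table helps under the N + delta model is to count pointer moves. The function table_moves below is a hypothetical helper, not part of any library; setting block_size to 1 models a plain vector that grows by delta elements at a time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Counts the moves caused by reallocating a table that grows by `delta`
// slots at a time, when the table must index n elements split into
// blocks of `block_size` elements each.
std::uint64_t table_moves(std::uint64_t n, std::uint64_t block_size,
                          std::uint64_t delta) {
    assert(block_size > 0 && delta > 0);
    const std::uint64_t blocks_needed = (n + block_size - 1) / block_size;
    std::uint64_t capacity = 0, moves = 0;
    for (std::uint64_t blocks = 0; blocks < blocks_needed; ++blocks) {
        if (blocks == capacity) {
            moves += blocks;    // existing entries move to the new table
            capacity += delta;  // N + delta growth
        }
    }
    return moves;
}
```

With these assumptions, table_moves(8192, 512, 4) is 24 pointer moves, whereas the same policy applied per element, table_moves(8192, 1, 4), is 8384512 element moves.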

operator[]

operator[] retrieves an element by index. For list this is clearly O(N), which is very slow. For vector and deque it is O(1). Let us imagine how deque::operator[] might be implemented:

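template <class _E>
_E& deque<_E>::operator[](size_t i)
{
    // m_storage holds Block*, so dereference the block before indexing into it.
    return (*m_storage[i / BlockSize])[i % BlockSize];
}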

As we can see, deque::operator[] costs only one more memory access than vector::operator[].
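The snippet above assumes that element 0 sits at the start of the first block. As noted earlier, the first block may be only partly filled once push_front is in play, so a real implementation has to add an offset before splitting the index. A minimal sketch of that mapping, using the hypothetical names BlockPos, locate, and first_offset:

```cpp
#include <cassert>
#include <cstddef>

// Position of an element inside the block table.
struct BlockPos {
    std::size_t block;   // index into the table of blocks
    std::size_t offset;  // slot inside that block
};

// Maps logical index i to a block and slot, given the slot where the
// first element lives inside block 0 (first_offset < block_size).
inline BlockPos locate(std::size_t i, std::size_t first_offset,
                       std::size_t block_size) {
    assert(block_size > 0 && first_offset < block_size);
    const std::size_t pos = first_offset + i;
    return BlockPos{pos / block_size, pos % block_size};
}
```

For example, with a block size of 512 and the first element stored in slot 510, locate(2, 510, 512) lands on block 1, slot 0. The cost is still one division, one remainder, and two memory accesses.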

Space Performance Analysis

push_back

Vector

Unfortunately, if vector uses the N * 2 memory growth model (the usual case), its space usage is 2 * N in the worst case and N in the best case (all allocated memory is in use). On average, the space usage is 1.5 * N. In other words, the wasted memory amounts to nearly half the size of the data.
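The slack can be observed directly through capacity(). The helper below, vector_capacity_ratio, is a hypothetical name. Note that the growth factor is implementation-defined (some standard libraries grow by 1.5 rather than 2), so the exact ratio depends on the library and on n; the only guarantee is that it is at least 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns capacity() / size() after n push_back calls.
// A ratio of 1.0 means no slack; under the N * 2 model it can approach 2.0.
double vector_capacity_ratio(std::size_t n) {
    assert(n > 0);  // avoid dividing by a zero size
    std::vector<int> v;
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
    }
    return static_cast<double>(v.capacity()) / static_cast<double>(v.size());
}
```

Calling it with a size just above a power of two, such as vector_capacity_ratio(1025), tends to show the worst case on libraries that double their capacity.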

List

In general, list wastes less space than vector. Its space usage is (1 + sizeof(pointer) * 2 / sizeof(_E)) * N. However, if the list stores pointers (i.e. _E = pointer), the space usage is 3 * N, which is even more wasteful than vector.

Deque

In the worst case, deque's space usage is N + sizeof(pointer) * 2 * N / (BlockSize * sizeof(_E)) (assuming vector<Block*> also uses the N * 2 growth model; for the average case, replace the factor 2 with 1.5). If the stored elements are pointers (i.e. _E = pointer) and BlockSize is 512, the space usage is N + N/256. In other words, even in the worst case only N/256 of memory is wasted.
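The per-element overheads in the list and deque formulas above can be evaluated mechanically. The two function names below are hypothetical, and the sketch follows the article's simplified model: it ignores allocator bookkeeping and partially filled blocks.

```cpp
#include <cassert>
#include <cstddef>

// Extra space per element for a doubly linked list, as a fraction of
// sizeof(T): two link pointers per node.
template <class T>
double list_overhead_per_element() {
    return static_cast<double>(2 * sizeof(void*)) /
           static_cast<double>(sizeof(T));
}

// Worst-case extra space per element for a deque, as a fraction of
// sizeof(T): one block pointer per block, with the pointer table at most
// twice as large as needed (N * 2 growth model).
template <class T>
double deque_overhead_per_element(std::size_t block_size) {
    assert(block_size > 0);
    return static_cast<double>(2 * sizeof(void*)) /
           static_cast<double>(block_size * sizeof(T));
}
```

For pointer-sized elements, list_overhead_per_element<void*>() is 2.0 (total 3 * N), and deque_overhead_per_element<void*>(512) is 1/256 (total N + N/256).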

Other Features of Deque

Element addresses remain unchanged

Because deque never migrates element data, it has an interesting property: as long as only push_back/push_front are performed, with no insert/erase, the addresses of its elements remain unchanged.

Note that vector does not have this property. The following code is invalid:

std::vector<int> vec;
...
int& elem = vec[i];
vec.push_back(100);
elem = 99; // error: can't access elem since vec was changed!

Because push_back is called after elem is obtained, the element address (&elem) may have been invalidated by memory migration. However, if we change the container to std::deque<int>, this code has no problem at all.

std::deque<int> dq;
...
int& elem = dq[i];
dq.push_back(100);
elem = 99; // ok!
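The same property can be checked at run time. The function below is a hypothetical helper written for this illustration. It relies on the standard's guarantee that inserting at either end of a std::deque does not invalidate references or pointers to existing elements.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Takes the address of dq[i], performs `extra` push_back and push_front
// calls, writes through the old pointer, and reports whether the element
// is still at the same address.
bool address_survives_end_pushes(std::deque<int>& dq, std::size_t i,
                                 std::size_t extra) {
    assert(i < dq.size());
    int* before = &dq[i];
    for (std::size_t k = 0; k < extra; ++k) {
        dq.push_back(0);
        dq.push_front(0);
    }
    *before = 99;  // still valid: only the ends of the deque changed
    // Each push_front shifts the element's index by one.
    return before == &dq[i + extra] && dq[i + extra] == 99;
}
```

For a deque of 10 elements, address_survives_end_pushes(dq, 5, 10000) returns true even though the container has grown by 20000 elements.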

Also note that stable element addresses do not imply stable iterators. deque does not support the following code:

std::deque<int> dq;
...
std::deque<int>::iterator it = dq.begin() + i;
dq.push_back(100);
*it = 99; // error: can't access iterator since deque was changed!
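One way to stay within the rules is to remember the index rather than the iterator and to obtain a fresh iterator after the push_back. The sketch below uses a hypothetical function name, set_after_push, and is one possible workaround among several (keeping a pointer or reference to the element would also work, as shown earlier).

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Appends `pushed`, then writes `value` at index i through an iterator
// obtained after the push_back, so no invalidated iterator is ever used.
void set_after_push(std::deque<int>& dq, std::size_t i, int pushed, int value) {
    dq.push_back(pushed);
    assert(i < dq.size());
    std::deque<int>::iterator it = dq.begin() + static_cast<std::ptrdiff_t>(i);
    *it = value;
}
```

Starting from a deque holding 1, 2, 3, the call set_after_push(dq, 1, 100, 99) leaves it holding 1, 99, 3, 100.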

Conclusion

Having compared the time and space performance of vector, list, and deque, we recommend using the deque container wherever possible. In particular, deque is the most suitable candidate for handling massive amounts of data.


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
