C/C++ string processing (5): std::deque and std::textpool
Xu Shiwei
2008-4-4
Introduction
std::textpool is implemented on top of std::deque. So although this article discusses std::deque, all of its conclusions apply equally to std::textpool.
Implementation Overview
As the name suggests, a deque is a "double-ended queue". This means that inserting and deleting data at both the front and the back of the queue is fast. To achieve this, std::deque is built on a piecewise-continuous data structure that sits between an array and a linked list, as shown below:
#include <vector>

template <class _E>
class deque
{
    enum { BlockSize = 512 };        // number of elements per block
    typedef _E Block[BlockSize];     // one contiguous block of elements
    std::vector<Block*> m_storage;   // table of pointers to the blocks
    iterator m_first, m_last;        // deque's own iterator class (not shown)
};
Here, iterator is the deque::iterator class. Its implementation is not discussed here, but note that m_first and m_last are the container's begin() and end(). They are needed because the first and last blocks of m_storage may not be full, so a marker is required to indicate the boundaries. If we only wanted to implement a one-way queue, we could remove the m_first member (because the first block is always full unless it is also the last block).
Why adopt this piecewise-continuous data structure? The answer is performance. C++ programmers rarely use deque, but its performance characteristics do not justify that neglect. Among the STL value-based containers (list/slist, vector, deque, and basic_string), deque has the best overall performance.
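The one-way queue mentioned above can be sketched as follows. This is a minimal illustration, not the actual std::deque implementation. The name BlockQueue and its members are hypothetical. It assumes the element type is default-constructible and copy-assignable, and it uses an element count in place of m_last.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical push_back-only container built from fixed-size blocks.
template <class T, std::size_t BlockSize = 512>
class BlockQueue
{
public:
    void push_back(const T& value)
    {
        // A new block is needed only when every existing block is full.
        if (m_size % BlockSize == 0)
            m_storage.push_back(std::make_unique<T[]>(BlockSize));
        m_storage[m_size / BlockSize][m_size % BlockSize] = value;
        ++m_size;
    }

    T& operator[](std::size_t i)
    {
        assert(i < m_size);
        return m_storage[i / BlockSize][i % BlockSize];
    }

    std::size_t size() const { return m_size; }
    std::size_t block_count() const { return m_storage.size(); }

private:
    std::vector<std::unique_ptr<T[]>> m_storage; // table of block pointers
    std::size_t m_size = 0;                      // marks the end boundary
};
```

Only the table of block pointers is ever reallocated. The blocks themselves stay where they were allocated.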
Next we will analyze it carefully.
Time performance analysis
push_back/push_front
For deque these two operations are equivalent. vector does not support push_front (it is not provided because it would perform poorly). Let us compare the push_back performance of each container:
Vector
The performance of vector::push_back depends on the implementation of vector, mainly on its memory growth model. The typical practice is the N * 2 model: once the allocated memory is full, the vector allocates a block of size N * 2, where N is the space the vector currently occupies. In this case the amortized cost of moving elements per push_back is (1/N) * N = 1, a constant (1/N is the average number of reallocations per push_back, and N is the amount of data moved by each reallocation), so the complexity of push_back is O(1). This approach gives good time performance but wastes a great deal of space. If instead the growth model is N + delta (where delta is a constant increment), the cost of moving elements per push_back is (1/delta) * N = O(N). Space is saved, but time performance is poor. The Erlang language introduces an interesting growth model based on the Fibonacci sequence. It wastes less space than the N * 2 model and balances space and time performance.
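The difference between the two growth models can be checked with a small simulation. The sketch below does not touch std::vector. It only counts how many element copies reallocation would cause under an assumed growth policy. The function names and the starting capacity of 1 are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Counts element copies caused by reallocation during n push_back calls.
// grow(capacity) returns the next capacity; it is a hypothetical policy hook.
template <class Grow>
std::size_t count_moves(std::size_t n, Grow grow)
{
    std::size_t capacity = 1, size = 0, moves = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (size == capacity) {
            moves += size;              // every existing element is relocated
            capacity = grow(capacity);
        }
        ++size;
    }
    return moves;
}

// N * 2 model: total copies stay below 2 * n, so push_back is amortized O(1).
std::size_t moves_doubling(std::size_t n)
{
    return count_moves(n, [](std::size_t c) { return c * 2; });
}

// N + delta model: total copies grow roughly as n * n / (2 * delta).
std::size_t moves_additive(std::size_t n, std::size_t delta)
{
    return count_moves(n, [delta](std::size_t c) { return c + delta; });
}
```

For 1024 insertions, the doubling model copies 1023 elements in total, while the additive model with delta = 16 copies more than thirty times as many.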
List
The complexity of list::push_back is O(1). The main time overhead is allocating the new node. With a GC allocator, list::push_back is very fast.
Deque
The performance of deque::push_back is close to O(1). It is not strictly O(1) because when m_storage is full, it runs into the same memory migration problem as vector. If vector<Block*> uses the 2 * N growth model, deque::push_back is clearly O(1). If the N + delta model is used, the cost of moving data per push_back is (1/(BlockSize * delta)) * (N/BlockSize) = O(N). Although this is also O(N), N/delta and N/(delta * BlockSize * BlockSize) differ enormously. Because m_storage.size() is usually very small, deque::push_back still performs well even with massive amounts of data.
operator[]
operator[] retrieves an element by subscript. For list the complexity is obviously O(N), which is very slow. For vector and deque it is O(1). Consider how deque::operator[] might be implemented:
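To get a feeling for how much smaller the migration cost of the block-pointer table is, the following sketch counts relocations under the N + delta model. It is a simulation under stated assumptions, not a measurement of any real library: the function name is hypothetical, the table starts with a capacity of delta, and a block size of 1 is used to model a plain vector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Counts how many table entries are copied while n elements are appended,
// when the table grows by a constant delta each time it fills up.
// block_size = 1 models a vector; block_size = 512 models the pointer table
// of a deque whose blocks hold 512 elements each.
std::uint64_t relocations_additive(std::uint64_t n,
                                   std::uint64_t block_size,
                                   std::uint64_t delta)
{
    std::uint64_t entries = (n + block_size - 1) / block_size; // slots needed
    std::uint64_t capacity = delta, moves = 0;
    for (std::uint64_t used = 0; used < entries; ++used) {
        if (used == capacity) {
            moves += used;      // the whole table is copied to a new place
            capacity += delta;
        }
    }
    return moves;
}
```

With n = 1048576 and delta = 16, the deque-like table copies 130048 pointers in total, while the vector-like model copies several orders of magnitude more elements.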
template <class _E>
_E& deque<_E>::operator[](size_t i)
{
    // Simplified: assumes the first block is filled from slot 0.
    return (*m_storage[i / BlockSize])[i % BlockSize];
}
As we can see, deque needs only one more memory access than vector.
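When push_front is supported, the first block may be only partly filled, so the subscript has to be shifted by the position of the first element before it is split into a block number and a slot number. The sketch below shows that index arithmetic only. The names locate and first_offset are hypothetical, and real implementations may organize this differently.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Maps subscript i to (block number, slot inside the block).
// first_offset is the slot of element 0 inside the first block.
std::pair<std::size_t, std::size_t>
locate(std::size_t i, std::size_t first_offset, std::size_t block_size = 512)
{
    assert(first_offset < block_size);
    std::size_t pos = first_offset + i;   // position counted from block 0, slot 0
    return { pos / block_size, pos % block_size };
}
```

The cost is still one division, one remainder, and two memory accesses, regardless of the number of elements.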
Space performance analysis
push_back
Vector
Unfortunately, if vector uses the N * 2 memory growth model (the usual case), its space usage is 2 * N in the worst case and N in the best case (all allocated memory is in use). On average the space usage is 1.5 * N. In other words, the wasted memory is about half the size of the data itself.
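The unused part of a vector can be observed directly through size() and capacity(). The helper below is a hypothetical sketch. The growth factor is implementation-defined, so the result differs between standard libraries and no particular value should be expected.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fraction of the allocated slots that are unused after n push_back calls.
double unused_fraction(std::size_t n)
{
    std::vector<int> v;
    for (std::size_t i = 0; i < n; ++i)
        v.push_back(static_cast<int>(i));
    if (v.capacity() == 0)
        return 0.0;                       // nothing allocated, nothing wasted
    return 1.0 - static_cast<double>(v.size())
               / static_cast<double>(v.capacity());
}
```

Calling unused_fraction with a size just above a reallocation point gives the worst case for the library in use, and a size just below it gives the best case.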
List
For large elements, list wastes much less space than vector. Its space usage is (1 + sizeof(pointer) * 2 / sizeof(_E)) * N. However, if the list stores pointers (i.e. _E = pointer), the space usage is 3 * N, which wastes even more space than vector.
Deque
In the worst case, the space usage of deque is N + sizeof(pointer) * 2 * N / (BlockSize * sizeof(_E)) (assuming that vector<Block*> also uses the 2 * N growth model; in the average case the factor 2 becomes 1.5). If the stored elements are pointers (i.e. _E = pointer) and BlockSize is 512, the space usage is N + N/256. In other words, in the worst case only N/256 of memory is wasted.
Other features of deque
Element addresses remain unchanged
Because deque never migrates its elements, it has an interesting property: as long as only push_back/push_front are performed, with no insert/erase, the addresses of the elements in a deque do not change.
Note that vector does not have this property. The following code is invalid:
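The three space formulas can be put side by side in bytes. The functions below only evaluate the formulas from this section. They are hypothetical helpers, they ignore allocator overhead, and for deque they ignore the partly filled first and last blocks.

```cpp
#include <cassert>
#include <cstddef>

// Worst case for vector under the N * 2 growth model.
double vector_worst_bytes(double n, double elem_size)
{
    return 2.0 * n * elem_size;
}

// Each list node carries two pointers next to the element.
double list_bytes(double n, double elem_size)
{
    return n * (elem_size + 2.0 * sizeof(void*));
}

// n elements plus a block-pointer table that is at most half empty.
double deque_worst_bytes(double n, double elem_size, double block_size = 512.0)
{
    return n * elem_size + 2.0 * sizeof(void*) * n / block_size;
}
```

For pointer-sized elements this gives 2 * N for vector, 3 * N for list, and N + N/256 for deque, matching the figures above.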
std::vector<int> vec;
...
int& elem = vec[i];
vec.push_back(100);
elem = 99; // error: elem may dangle because push_back may have reallocated vec
Because push_back is called after elem is obtained, the element address (&elem) may have become invalid due to memory migration. However, if we change the container to std::deque<int>, this code has no problem:
std::deque<int> dq;
...
int& elem = dq[i];
dq.push_back(100);
elem = 99; // ok!
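This property can be checked with a short test. The helper name below is hypothetical. It relies on the rule that inserting at either end of a std::deque does not invalidate references or pointers to existing elements.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Returns true if the address of the element originally at index i is
// unchanged after `extra` calls each to push_back and push_front.
bool address_survives_pushes(std::size_t i, std::size_t extra)
{
    std::deque<int> dq(i + 1, 0);
    int* before = &dq[i];
    for (std::size_t k = 0; k < extra; ++k) {
        dq.push_back(static_cast<int>(k));
        dq.push_front(static_cast<int>(k));
    }
    // Each push_front shifted the subscript of the original element by one.
    return before == &dq[i + extra];
}
```

Note that the subscript of the element changes after push_front even though its address does not.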
Also note that stable element addresses do not imply stable iterators. deque does not support the following code:
std::deque<int> dq;
...
std::deque<int>::iterator it = dq.begin() + i;
dq.push_back(100);
*it = 99; // error: push_back invalidated the iterator
Conclusion
Comparing the time and space performance of vector, list, and deque, we recommend using deque wherever possible. In particular, deque is the most suitable choice for handling massive amounts of data.
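One way to avoid the invalid iterator is to remember the subscript and obtain a fresh iterator after the push. The function below is a hypothetical sketch of that pattern, not the only possible fix. Keeping a reference or pointer to the element, as in the previous example, works as well.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Appends `pushed`, then writes `value` to the element at subscript i.
void push_then_write(std::deque<int>& dq, std::size_t i, int pushed, int value)
{
    assert(i < dq.size());
    dq.push_back(pushed);   // invalidates every iterator obtained earlier
    // An iterator created after the push is valid.
    std::deque<int>::iterator it =
        dq.begin() + static_cast<std::deque<int>::difference_type>(i);
    *it = value;
}
```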
References
- std::vector
- std::list
- std::textpool