std::deque and std::textpool for string processing


Introduction

std::textpool is implemented on top of std::deque, so although this article discusses std::deque, all of its conclusions apply equally to std::textpool.

Implementation overview

As the name suggests, deque is a "two-way queue" (double-ended queue): inserting and removing data at either the front or the back performs well. To achieve this, std::deque is built on a segmented-contiguous data structure, something between an array and a linked list, which can be sketched as follows:

template <class _E>
class deque
{
  enum { BlockSize = 512 };

  typedef _E Block[BlockSize];    // a block of BlockSize contiguous elements

  std::vector<Block*> m_storage;  // table of pointers to the blocks
  iterator m_first, m_last;       // positions of begin() and end()
};

Here, iterator is the deque::iterator class. We will not discuss its implementation in detail; the hint is that m_first and m_last correspond to the container's begin() and end(). They are needed because the first and last blocks in m_storage may be only partially filled, so something has to mark where the valid data begins and ends. If we only wanted a one-way queue, the m_first member could be dropped, because the first block can never be partially filled unless it is also the last block.
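As a rough sketch of the state such an iterator needs (the member names below are assumptions for illustration, not the actual library layout), it only has to know which block it is in and the offset inside that block:

#include <cstddef>

// Illustrative only: a real deque::iterator also needs a way back to the
// block table, but the essential state is a block pointer plus an offset.
template <class E, std::size_t BlockSize = 512>
struct deque_iterator_sketch
{
    E*          m_block;   // start of the block the iterator is currently in
    std::size_t m_offset;  // position inside that block, 0 <= m_offset < BlockSize
};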

Why use this segmented-contiguous data structure? The answer is performance. deque is rarely used by C++ programmers, but judging by its performance characteristics it does not deserve that neglect: among the value-based containers in the STL (list/slist, vector, deque, basic_string, and so on), deque arguably has the best overall performance.

Let's analyze it carefully.

Time performance analysis

push_back / push_front

For deque there is no difference between these two operations. vector, on the other hand, does not provide push_front at all, because its performance would be too poor to offer. So let us compare the push_back performance of each container:

Vector

Vector: the performance of push_back depends on the vector's memory growth model. The typical model today is 2*N: when the allocated memory is full, the vector allocates 2*N bytes, where N is the space it currently occupies. In that case the average element-move cost per push_back is (1/N) * N = 1, a constant (1/N is the fraction of pushes that trigger a reallocation, and N is the number of elements moved each time), so push_back is amortized O(1). The time performance is good, but the space can be hugely wasted. If instead the growth model is N+delta (where delta is a constant increment), the average move cost per push_back is (1/delta) * N = O(N): the space is economical, but the time performance is terrible. The Erlang language uses an interesting growth model based on the Fibonacci sequence, which balances space and time better than the 2*N model.
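The difference between the two growth models is easy to see with a small, self-contained simulation (my own illustration, not code from the article): count how many element copies each policy triggers while appending elements one at a time.

#include <cstddef>
#include <iostream>

// Counts the element copies a reallocation policy causes while appending
// `total` elements one at a time; grow() maps old capacity to new capacity.
template <class GrowFn>
unsigned long long copies_for(std::size_t total, GrowFn grow)
{
    std::size_t capacity = 1, size = 0;
    unsigned long long copies = 0;
    for (std::size_t i = 0; i < total; ++i) {
        if (size == capacity) {
            copies += size;             // every existing element is moved
            capacity = grow(capacity);
        }
        ++size;
    }
    return copies;
}

int main()
{
    const std::size_t n = 1000000;
    std::cout << "2*N model:     " << copies_for(n, [](std::size_t c) { return c * 2; })  << " copies\n";
    std::cout << "N+delta model: " << copies_for(n, [](std::size_t c) { return c + 64; }) << " copies\n";
}

With the 2*N model the total number of copies stays proportional to n (amortized O(1) per push_back); with N+delta it grows roughly as n*n/(2*delta).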

List

List: push_back is O(1). The main cost is allocating the new node. With a GC allocator, list::push_back is very fast.

Deque

Deque: push_back is close to O(1). The reason it is not exactly O(1) is that when m_storage fills up, it suffers the same reallocation-and-move problem as vector. If vector<Block*> uses the 2*N growth model, deque::push_back is clearly amortized O(1). If it uses the N+delta model, the element-move cost is (1/(BlockSize*delta)) * (N/BlockSize) = O(N). Both are O(N), but one is N/delta while the other is N/(delta*BlockSize*BlockSize), a very large difference. Since m_storage.size() is usually small, deque::push_back behaves well even for massive amounts of data.
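To make the point concrete, here is a minimal back-only queue in the same spirit (a sketch under the layout shown earlier, not the real std::deque or std::textpool code): pushing an element never moves existing elements; only the small table of block pointers may be reallocated.

#include <cstddef>
#include <vector>

// Illustrative sketch of a segmented, push_back-only container.
template <class E, std::size_t BlockSize = 512>
class simple_back_queue
{
    std::vector<E*> m_storage;   // table of pointers to fixed-size blocks
    std::size_t     m_size = 0;  // total number of stored elements

public:
    void push_back(const E& value)
    {
        if (m_size % BlockSize == 0)                // last block full (or none yet)
            m_storage.push_back(new E[BlockSize]);  // only block pointers may move
        m_storage[m_size / BlockSize][m_size % BlockSize] = value;
        ++m_size;
    }

    ~simple_back_queue()
    {
        for (E* block : m_storage) delete[] block;
    }
};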

operator[]

operator[] accesses data by subscript. For list this is obviously O(N), which is very slow; for vector and deque it is O(1). Let us imagine how deque::operator[] might be implemented:

_E& deque::operator[](int i)
{
  // locate the block, then the element inside that block
  return (*m_storage[i / BlockSize])[i % BlockSize];
}

As you can see, deque only has one more memory access than vector.
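For comparison (an illustration I am adding, with a raw buffer parameter standing in for the vector's internal pointer), a vector subscript is a single indexed load from one contiguous buffer:

#include <cstddef>

// Essentially all vector::operator[] has to do: base pointer plus offset.
template <class E>
E& subscript_like_vector(E* m_data, std::size_t i)
{
    return m_data[i];   // one memory access
}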

Spatial performance analysis

push_back

Vector

Unfortunately, if the vector uses the 2*N growth model (as is usual), the worst-case space usage is 2*N and the best case is N (all allocated memory in use); on average it is 1.5*N. In other words, a large fraction of the allocated memory, up to half, is typically wasted.

List

list wastes at least as much space as vector. Its space usage is (1 + sizeof(pointer)*2/sizeof(_E)) * N. If the stored elements are themselves pointers (i.e. _E = pointer), the usage becomes 3*N, which is more wasteful than vector.

Deque

deque's worst-case space usage is N + sizeof(pointer)*2*N/(BlockSize*sizeof(_E)). (This assumes vector<Block*> also uses the 2*N growth model; for the average case, replace the factor 2 with 1.5.) If the stored elements are pointers (i.e. _E = pointer) and BlockSize is 512, this comes to N + N/256. In other words, even in the worst case only N/256 of the memory is wasted.
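Plugging concrete numbers into the three formulas above (my own quick check, assuming 64-bit pointers, _E = pointer, BlockSize = 512, and the worst-case 2*N table growth):

#include <cstdio>

int main()
{
    const double N = 1e6;            // number of stored elements
    const double ptr = 8, E = 8;     // assumed sizes: 64-bit pointers, _E = pointer
    const double BlockSize = 512;

    const double vector_worst = 2 * N;                              // 2*N growth model
    const double list_cost    = (1 + 2 * ptr / E) * N;              // two links per node
    const double deque_worst  = N + 2 * ptr * N / (BlockSize * E);  // over-allocated block table

    std::printf("worst-case storage, in elements: vector %.0f, list %.0f, deque %.0f\n",
                vector_worst, list_cost, deque_worst);
}

For a million stored pointers this prints roughly 2,000,000 for vector, 3,000,000 for list, and 1,003,906 for deque, matching the N + N/256 figure above.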
