STL efficient programming (5): use range member functions instead of single-element member functions whenever possible.

Source: Internet
Author: User

 

Given two vectors, v1 and v2, what is the simplest way to make the contents of v1 the same as the second half of v2? Do not worry about what "half" means when v2 has an odd number of elements.

 

Time is up! If your answer is

v1.assign(v2.begin() + v2.size() /2, v2.end()); 

or something very similar, you get the gold medal. If your answer involves more than one function call but no loop of any kind, you are close to the right answer, but there is no gold medal. If your answer involves a loop, you need to spend some time improving it. If your answer involves multiple loops, well, we can only say that you really need this book. By the way, if your response to this question included "hey, is that true?", pay close attention, because you are about to learn something really useful.

 

This quiz has two purposes. First, it gives me a chance to remind you that the assign member function exists. Too many programmers overlook this very convenient function. It is available for all standard sequence containers (vector, string, deque, and list). Whenever you have to completely replace the contents of a container, you should think of assignment. If you are just copying one container to another container of the same type, operator= is the assignment function of choice. But as this example shows, when you want to give a container a completely new set of values, assign can do the job where operator= cannot.
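As a minimal sketch of the quiz answer, the following function replaces whatever a vector holds with the second half of another vector. The helper name secondHalf and the placeholder contents of v1 are assumptions made for this illustration only.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: returns a vector whose contents were replaced,
// via range assign, with the second half of v2.
std::vector<int> secondHalf(const std::vector<int>& v2) {
    std::vector<int> v1(3, -1);  // placeholder contents, discarded by assign
    // One call, no loop: everything v1 held is replaced by [middle, end).
    v1.assign(v2.begin() + v2.size() / 2, v2.end());
    return v1;
}
```

Note that operator= could not express this, because it always copies the whole source container.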

 

The second purpose of this quiz is to demonstrate why range member functions are preferable to their single-element counterparts. A range member function is a member function that, like the STL algorithms, takes two iterator parameters to specify a range of elements over which some operation is performed. Without a range member function to solve the problem at the beginning of this clause, you would have to write an explicit loop, probably something like this:

vector<Widget> v1, v2; // assume that v1 and v2 are vectors of Widgets
v1.clear();
for (vector<Widget>::const_iterator ci = v2.begin() + v2.size() / 2; ci != v2.end(); ++ci)
  v1.push_back(*ci);

Clause 43 examines in detail why you should try to avoid writing explicit loops, but you do not have to read that clause to see that writing this code takes much more work than writing the call to assign. As we will see shortly, the loop also carries an efficiency penalty, but we will deal with that later.

One way to avoid the loop is to follow the advice of Clause 43 and use an algorithm instead:

v1.clear();

copy(v2.begin() + v2.size() / 2, v2.end(), back_inserter(v1));


Writing this is still more work than writing the call to assign. In addition, although no loop appears in this code, there is certainly one inside copy, so the efficiency penalty remains. Almost every use of copy whose destination range is specified with an insert iterator (that is, through inserter, back_inserter, or front_inserter) can be, and should be, replaced with a call to a range member function. Here, for example, the call to copy can be replaced with the range version of insert:

v1.insert(v1.end(), v2.begin() + v2.size() / 2, v2.end());

This takes slightly less typing than the call to copy, but it also says more directly what is happening: data is being inserted into v1. The call to copy expresses that too, but less directly, and it puts the emphasis in the wrong place. The interesting aspect of what is happening is not that elements are being copied, but that new data is being added to v1. The insert member function makes that clear. The use of copy obscures it. There is nothing interesting about the fact that things are being copied, because the STL is built on the assumption that things will be copied. Copying is so fundamental to the STL that it is the topic of Clause 3 in this book!

Too many STL programmers overuse copy, so the advice bears repeating: almost all uses of copy where the destination range is specified with an insert iterator should be replaced with calls to range member functions.
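To make the comparison concrete, here is a small sketch that appends the second half of one vector to another in the three ways discussed: an explicit loop, copy with back_inserter, and the range form of insert. The function names and the IntVec typedef are assumptions chosen for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

typedef std::vector<int> IntVec;  // illustrative alias, not from the text

// Explicit loop: one push_back call per element.
IntVec appendWithLoop(IntVec v1, const IntVec& v2) {
    for (IntVec::const_iterator ci = v2.begin() + v2.size() / 2;
         ci != v2.end(); ++ci) {
        v1.push_back(*ci);
    }
    return v1;
}

// Algorithm plus insert iterator: the loop is hidden inside copy.
IntVec appendWithCopy(IntVec v1, const IntVec& v2) {
    std::copy(v2.begin() + v2.size() / 2, v2.end(), std::back_inserter(v1));
    return v1;
}

// Range member function: says directly that data is inserted into v1.
IntVec appendWithRangeInsert(IntVec v1, const IntVec& v2) {
    v1.insert(v1.end(), v2.begin() + v2.size() / 2, v2.end());
    return v1;
}
```

All three produce the same contents; they differ in how much code is written and how directly the intent is stated.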

Returning to our assign example, we have already seen two reasons to prefer range member functions to their single-element counterparts:

  • It is generally less work to write code that uses range member functions.
  • Range member functions make the code clearer and more straightforward.

In short, code that uses range member functions is easier to write and easier to understand. What is not to like?

Alas, some people will dismiss this argument as a matter of programming style, and developers enjoy arguing about style almost as much as they enjoy arguing about which editor is the One True Editor. (As if there were any doubt. It is Emacs.) It would be helpful to have a more universally accepted criterion for establishing that range member functions are superior to their single-element siblings. For the standard sequence containers, we have one: efficiency. When dealing with standard sequence containers, applying single-element member functions makes more demands on memory allocators, copies objects more often, and/or performs redundant operations compared with range member functions that achieve the same end.

For example, suppose you want to copy an array of ints to the front of a vector. (The data might be in an array rather than a vector in the first place because it came from a legacy C API. For more on mixing STL containers and C APIs, see Clause 16.) Using the range insert function of vector, this is truly trivial:

int data[numValues]; // assume numValues is defined elsewhere
vector<int> v;
...
v.insert(v.begin(), data, data + numValues); // insert the ints in data at the front of v

Inserting the data with repeated single-element calls to insert in an explicit loop would look more or less like this:

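vector<int>::iterator insertLoc(v.begin());
for (int i = 0; i < numValues; ++i) {
  insertLoc = v.insert(insertLoc, data[i]); // insert returns an iterator to the new element
  ++insertLoc; // step past the new element so the next one goes after it
}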

Note that we must be careful to save the return value of insert for the next loop iteration. If we did not update insertLoc after each insertion, we would have two problems. First, every loop iteration after the first would yield undefined behavior, because each call to insert invalidates insertLoc. Second, even if insertLoc remained valid, we would always insert at the front of the vector (that is, at v.begin()), and the ints would end up copied into v in reverse order.
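The ordering pitfall can be seen in a short sketch. It compares the range form of insert, a loop that tracks the insertion point, and a loop that always inserts at v.begin(). The function names are hypothetical; the only library fact relied on is that the single-element insert returns an iterator to the newly inserted element.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Range form: one call, order of data preserved.
std::vector<int> frontInsertRange(std::vector<int> v, const int* data, std::size_t n) {
    v.insert(v.begin(), data, data + n);
    return v;
}

// Single-element loop that tracks the insertion point: order preserved.
std::vector<int> frontInsertTracked(std::vector<int> v, const int* data, std::size_t n) {
    std::vector<int>::iterator insertLoc = v.begin();
    for (std::size_t i = 0; i < n; ++i) {
        insertLoc = v.insert(insertLoc, data[i]);  // refresh the invalidated iterator
        ++insertLoc;                               // next element goes after this one
    }
    return v;
}

// Single-element loop that always inserts at the front: data ends up reversed.
std::vector<int> frontInsertNaive(std::vector<int> v, const int* data, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        v.insert(v.begin(), data[i]);
    }
    return v;
}
```

The naive loop is well defined because it asks for a fresh v.begin() on every pass, but it silently reverses the inserted data.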

If we use copy to replace the loop in accordance with the guidelines of Clause 43, we will get something like this:

copy(data, data + numValues, inserter(v, v.begin()));

Once the copy template has been instantiated, the code based on copy is almost identical to the code that uses the explicit loop. For the purposes of efficiency analysis, then, we will focus on the explicit loop, keeping in mind that the analysis is equally valid for the code that uses copy. Looking at the explicit loop simply makes it easier to see where the efficiency hits come from. Yes, that is "hits", plural, because the code that uses the single-element version of insert levies up to three different performance taxes on you. If you use the range version of insert, you pay none of them.

The first tax is unnecessary function calls. Inserting numValues elements into v one at a time naturally costs you numValues calls to insert. With the range form of insert, you pay for only one call, a saving of numValues-1 calls. Of course, inlining may spare you this tax, but then again, it may not. Only one thing is certain: with the range form of insert, you definitely do not pay it.

Inlining will not save you from the second tax, which is the cost of inefficiently moving the existing elements of v to their final positions after the insertion. Each time insert is called to add a new element to v, every element above the insertion point must be moved up one position to make room for the new element. So the element at position p must be moved up to position p+1, and so on. In our example, numValues elements are inserted at the front of v. That means each element that was in v before the insertions must be moved up a total of numValues positions. But each call to insert moves an element up only one position, so each element is moved a total of numValues times. If v has n elements before the insertions, a total of n*numValues moves take place. In this example, v holds ints, so each move will probably boil down to a call to memmove. If v held a user-defined type such as Widget, however, each move would call that type's assignment operator or copy constructor. (Most moves would call the assignment operator, but each time the last element of the vector is moved, that move is done by calling the element's copy constructor.) In the general case, then, inserting numValues new objects one at a time at the front of a vector<Widget> holding n elements costs n*numValues function calls: (n-1)*numValues calls to the Widget assignment operator and numValues calls to the Widget copy constructor. Even if these calls are inlined, you are still doing the work of moving the elements of v numValues times.

In contrast, the Standard requires range insert functions to move existing container elements directly into their final positions, that is, at a cost of one move per element. The total cost is n moves: numValues of them are made with the copy constructor of the type held in the container, and the rest with that type's assignment operator. Compared with the single-element insertion strategy, the range insert performs n*(numValues-1) fewer moves. Think about that for a minute. It means that if numValues is 100, the range form of insert does 99% fewer moves than code that repeatedly calls the single-element form of insert!
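The difference in element movement can be observed with an instrumented type. The sketch below is an illustration under stated assumptions: Counted is a hypothetical type that counts its copy constructions and copy assignments, and capacity is reserved up front so that reallocation does not distort the count. The exact numbers depend on the library implementation, so only the comparison between the two strategies is meaningful.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical instrumented type: counts copy constructions and assignments.
struct Counted {
    static long ops;
    int value;
    Counted(int v = 0) : value(v) {}
    Counted(const Counted& rhs) : value(rhs.value) { ++ops; }
    Counted& operator=(const Counted& rhs) { value = rhs.value; ++ops; return *this; }
};
long Counted::ops = 0;

// Copy operations performed when inserting `extra` at the front of an
// n-element vector one element at a time.
long opsSingleElement(std::size_t n, const std::vector<Counted>& extra) {
    std::vector<Counted> v(n);
    v.reserve(n + extra.size());  // rule out reallocation; measure shifting only
    Counted::ops = 0;
    std::vector<Counted>::iterator loc = v.begin();
    for (std::size_t i = 0; i < extra.size(); ++i) {
        loc = v.insert(loc, extra[i]);
        ++loc;
    }
    return Counted::ops;
}

// Copy operations performed by a single range insert of the same data.
long opsRange(std::size_t n, const std::vector<Counted>& extra) {
    std::vector<Counted> v(n);
    v.reserve(n + extra.size());
    Counted::ops = 0;
    v.insert(v.begin(), extra.begin(), extra.end());
    return Counted::ops;
}
```

With n existing elements and numValues new ones, the single-element loop shifts every existing element on each call, while the range form can shift each of them once.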

Before I turn to the third efficiency cost of single-element member functions compared with their range siblings, I have a small correction to make. What I wrote in the previous paragraph is the truth and nothing but the truth, but it is not the whole truth. A range insert function can move an element into its final position in a single move only if it can determine the distance between the two iterators without losing its place. This is almost always possible, because all forward iterators offer this capability, and forward iterators are nearly everywhere. All iterators for the standard containers offer forward iterator functionality. So do the iterators for the nonstandard hashed containers (see Clause 25). Pointers used as iterators into arrays offer it too. In fact, the only standard iterators that do not offer forward iterator capabilities are input and output iterators. So everything I wrote above is true except when the iterators passed to the range form of insert are input iterators (for example, istream_iterators, see Clause 6). In that case only, range insert must move elements into their final positions one place at a time, and its advantage disappears. (For output iterators this issue does not arise, because output iterators cannot be used to specify a range for insert.)

The final performance tax, paid by those foolish enough to use repeated single-element insertions instead of a single range insertion, has to do with memory allocation, though it has a nasty copying side to it as well. As Clause 14 explains, when you try to insert an element into a vector whose memory is full, the vector allocates new memory with more capacity, copies its elements from the old memory to the new memory, destroys the elements in the old memory, and deallocates the old memory. Then it adds the element being inserted. Clause 14 also explains that most vector implementations double their capacity each time they run out of memory, so inserting numValues new elements can result in up to about log2(numValues) new allocations. Clause 14 notes that implementations with this behavior exist, so inserting 1000 elements one at a time can result in 10 new allocations (including the copying of elements that goes with each one). In contrast (and, by now, predictably), a range insertion can work out how much new memory it needs before it starts inserting anything (assuming it is given forward iterators), so it need not reallocate the vector's underlying memory more than once. As you can imagine, the savings can be considerable.
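A rough way to observe the allocation difference is to watch capacity() change. The sketch below treats every change in capacity as one reallocation, which is an assumption of this illustration rather than something the Standard spells out, and the function names are hypothetical. The actual counts depend on the growth policy of the library in use.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of capacity changes while appending src one element at a time.
std::size_t reallocationsOneAtATime(const std::vector<int>& src) {
    std::vector<int> v;
    std::size_t count = 0;
    for (std::size_t i = 0; i < src.size(); ++i) {
        const std::size_t before = v.capacity();
        v.insert(v.end(), src[i]);
        if (v.capacity() != before) ++count;  // capacity grew: treat as a reallocation
    }
    return count;
}

// Number of capacity changes (0 or 1) when appending src with one range insert.
std::size_t reallocationsRange(const std::vector<int>& src) {
    std::vector<int> v;
    const std::size_t before = v.capacity();
    v.insert(v.end(), src.begin(), src.end());  // size is known up front
    return v.capacity() != before ? 1 : 0;
}
```

For a source of 1000 elements, an implementation that doubles its capacity would be expected to report around ten capacity changes for the loop and a single one for the range call.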

The analysis I have just performed is for vector, but the same reasoning applies to string. For deque, the reasoning is similar, but deques manage their memory differently from vectors and strings, so the argument about repeated memory allocation does not apply. The argument about moving elements an unnecessarily large number of times generally does apply, however (though the details differ), as does the observation about the number of function calls.

 

For the standard sequence containers, then, there is more at stake than programming style when choosing between single-element insertion and range insertion. For the associative containers, the efficiency case is harder to make, though the overhead of extra function calls for repeated single-element inserts still applies. In addition, certain special kinds of range insertion may open up optimization possibilities in associative containers too, but as far as I know, such optimizations currently exist only in theory. By the time you read this, of course, theory may have become practice, so range insertion into associative containers may indeed be more efficient than repeated single-element insertion. It will certainly never be less efficient, so you have nothing to lose by choosing it.

Even without the efficiency argument, the fact remains that range member functions require less typing as you write the code, and they yield code that is easier to understand, which improves the long-term maintainability of your software. Those two characteristics alone should convince you to prefer range member functions. The efficiency advantage is really just a bonus.

Having gone on this long about the wonders of range member functions, it seems only right that I summarize them for you. Knowing which member functions support ranges makes it much easier to recognize opportunities to use them. In the signatures below, the parameter type iterator literally means the iterator type of the container, that is, container::iterator. The parameter type InputIterator, on the other hand, means that any input iterator is acceptable.

  • Range construction. All standard containers offer a constructor of this form:
container::container(InputIterator begin, // beginning of range
                     InputIterator end);  // end of range

If the iterators passed to this constructor are istream_iterators or istreambuf_iterators (see Clause 29), you may run into C++'s most astonishing parse, which causes your compiler to interpret the construct as a function declaration rather than as the definition of a new container object. Clause 6 tells you everything you need to know about that parse, including how to defeat it.
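As an illustration of range construction from stream iterators, the following sketch builds a vector from whitespace-separated integers. The helper name readInts is hypothetical. Declaring the two iterators as named objects before passing them to the constructor is one common way to keep the compiler from reading the line as a function declaration.

```cpp
#include <cassert>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: parses whitespace-separated ints into a vector
// using the range constructor.
std::vector<int> readInts(const std::string& text) {
    std::istringstream in(text);
    // Named iterator objects avoid the declaration-versus-definition ambiguity
    // that arises when unnamed temporaries are written inline.
    std::istream_iterator<int> dataBegin(in);
    std::istream_iterator<int> dataEnd;
    std::vector<int> result(dataBegin, dataEnd);  // range construction
    return result;
}
```

Because istream_iterators are input iterators, the constructor cannot know the number of elements in advance here, so this form is about clarity more than speed.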

  • Range insertion. All standard sequence containers offer this form of insert:
void container::insert(iterator position,   // where to insert the range
                       InputIterator begin, // beginning of range to insert
                       InputIterator end);  // end of range to insert

Associative containers use their comparison function to determine where elements go, so they offer a signature that omits the position parameter:

void container::insert(InputIterator begin, InputIterator end);
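For instance, a set can absorb a whole range in one call with no position argument. This is a minimal sketch, and the helper name toSet is an assumption made for the example.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical helper: collects the distinct values of a vector in a set.
std::set<int> toSet(const std::vector<int>& values) {
    std::set<int> s;
    // No position parameter: the set's comparison function decides placement.
    s.insert(values.begin(), values.end());
    return s;
}
```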

When looking for ways to replace single-element inserts with the range version, do not forget that some single-element variants disguise themselves under different function names. For example, push_front and push_back both insert single elements into a container, even though they are not called insert. If you see a loop that calls push_front or push_back, or an algorithm such as copy that is passed front_inserter or back_inserter as a parameter, you have found a place where the range form of insert is likely to be the better strategy.
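One detail worth keeping in mind when making this replacement is ordering. The sketch below, with hypothetical function names, prepends a vector's elements to a list in two ways. Copying through front_inserter calls push_front once per element and therefore reverses the inserted data, while a range insert at begin() keeps the source order.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

// copy + front_inserter: repeated push_front, so src arrives reversed.
std::list<int> prependWithFrontInserter(std::list<int> dest, const std::vector<int>& src) {
    std::copy(src.begin(), src.end(), std::front_inserter(dest));
    return dest;
}

// Range insert at the front: one call, src order preserved.
std::list<int> prependWithRangeInsert(std::list<int> dest, const std::vector<int>& src) {
    dest.insert(dest.begin(), src.begin(), src.end());
    return dest;
}
```

If the reversed order was actually intended, passing reverse iterators (src.rbegin(), src.rend()) to the range insert would reproduce it.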

  • Range erasure. Every standard container offers a range form of erase, but the return type differs between sequence and associative containers. Sequence containers provide this:
iterator container::erase(iterator begin, iterator end);

Associative containers provide this:

void container::erase(iterator begin, iterator end);

Why the difference? The claim is that having the associative container version of erase return an iterator (to the element following the erased one) would incur an unacceptable performance penalty. I am one of many who find this explanation unconvincing, but the Standard says what it says, and what it says is that the sequence and associative container versions of erase have different return types.

Most of this clause's efficiency analysis for insert has an analogue for erase. Repeated calls to single-element erase still involve more function calls than a single call to range erase. With single-element erase, element values must still be shifted one position at a time toward their final destination, whereas range erase can move them into their final positions in a single move.
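As a small sketch of the same contrast for erase, the two hypothetical functions below remove the first n elements of a vector. Both produce the same result, but the loop shifts the surviving elements once per call, while the range form shifts them once in total.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Range erase: one call, survivors move to their final positions once.
std::vector<int> dropFrontRange(std::vector<int> v, std::size_t n) {
    assert(n <= v.size());  // precondition of this illustration
    v.erase(v.begin(), v.begin() + n);
    return v;
}

// Single-element erase in a loop: survivors are shifted down n times.
std::vector<int> dropFrontLoop(std::vector<int> v, std::size_t n) {
    assert(n <= v.size());
    for (std::size_t i = 0; i < n; ++i) {
        v.erase(v.begin());
    }
    return v;
}
```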

One argument about insert for vector and string that does not carry over to erase concerns repeated allocations. (For erase, of course, it would concern repeated deallocations.) That is because the memory used by vectors and strings grows automatically to accommodate new elements, but it does not shrink automatically when the number of elements is reduced. (Clause 17 describes how you can reduce the unnecessary memory held by a vector or string.)

One particularly important use of range erase is the erase-remove idiom. You can read about it in Clause 32.
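For readers who have not met the idiom yet, here is a minimal sketch of it. The helper name withoutValue is an assumption for this example. std::remove compacts the elements to keep toward the front and returns the new logical end, and the range form of erase then trims everything from that point on.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: returns v with every occurrence of `unwanted` removed.
std::vector<int> withoutValue(std::vector<int> v, int unwanted) {
    // remove() shuffles the kept elements forward; range erase drops the tail.
    v.erase(std::remove(v.begin(), v.end(), unwanted), v.end());
    return v;
}
```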

  • Range assignment. As I noted at the beginning of this clause, all standard sequence containers offer a range form of assign:
void container::assign(InputIterator begin, InputIterator end); 

So there you have it: three solid arguments for preferring range member functions to their single-element counterparts. Range member functions are easier to write, they express your intent more clearly, and they offer higher performance. That is a combination that is tough to beat.

 
