Effective STL Item 45

Source: Internet
Author: User
Differences between STL search algorithms [1]

So you want to look for something, and you have a container or a range delimited by iterators, and what you are looking for is somewhere inside. How do you conduct the search? The arrows in your quiver are count, count_if, find, find_if, binary_search, lower_bound, upper_bound, and equal_range. How do you choose among them?

Simple. You reach for whatever is fast and simple. The faster and simpler, the better.

For the moment, I will assume that you have a pair of iterators specifying the range to search. Later, I will consider the case where you have a container instead of a range.

Which search strategy to select depends on whether your iterators define a sorted range. If they do, you can use binary_search, lower_bound, upper_bound, and equal_range for fast (typically logarithmic-time) searches. If the iterators do not delimit a sorted range, you are limited to the linear-time algorithms count, count_if, find, and find_if. In what follows, I will ignore the _if variants of count and find, just as I will ignore whether binary_search, lower_bound, upper_bound, and equal_range are passed a predicate. Whether you rely on the default search predicate or specify your own, the considerations for choosing a search algorithm are the same.

If you have an unsorted range, your choices are count and find. They answer slightly different questions, so it is worth looking at them closely. count answers the question, "Is the value there, and if so, how many copies are there?" find answers the question, "Is it there, and if so, where is it?"

Suppose you want to know whether there is a specific Widget value w in a list. Using count, the code looks like this:

list<Widget> lw;                        // list of Widgets
Widget w;                               // a particular Widget value
...
if (count(lw.begin(), lw.end(), w)) {
    ...                                 // w is in lw
} else {
    ...                                 // it is not
}

This demonstrates a common idiom: using count as an existence test. count returns zero or a positive number, so non-zero is treated as true and zero as false. It would arguably make the intent more obvious to write

if (count(lw.begin(), lw.end(), w) != 0) ...

and some programmers do write it that way, but relying on the implicit conversion, as in the original example, is more common.

Compared with that original code, using find is slightly more complicated, because you must test find's return value against the list's end iterator:

if (find(lw.begin(), lw.end(), w) != lw.end()) {
    ...                                 // found it
} else {
    ...                                 // not found
}

For an existence check, the count idiom is slightly easier to code. It is also less efficient when the search succeeds, because find stops as soon as it finds a match, while count must continue to the end of the range looking for additional matches. For most programmers, find's efficiency advantage is enough to justify the slight increase in complexity.
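The two existence tests can be put side by side as small helper functions. This is only a sketch: it uses int in place of Widget, and the function names containsByCount and containsByFind are made up for illustration.

```cpp
#include <algorithm>
#include <list>

// Existence test using count: always walks the whole range.
bool containsByCount(const std::list<int>& values, int target) {
    return std::count(values.begin(), values.end(), target) != 0;
}

// Existence test using find: stops at the first match.
bool containsByFind(const std::list<int>& values, int target) {
    return std::find(values.begin(), values.end(), target) != values.end();
}
```

Both functions give the same answer; they differ only in how much of the range they visit when the value is present.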

Often, knowing whether a value is in a range is not enough. Instead, you want to get at the first object in the range with that value. For example, you may want to print the object, insert something in front of it, or erase it. When you need to know not just whether a value exists but also which object (or objects) has that value, you need find:

list<Widget>::iterator i = find(lw.begin(), lw.end(), w);
if (i != lw.end()) {
    ...                                 // found it; i points to the first one
} else {
    ...                                 // not found
}
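As one illustration of using the iterator that find returns, the sketch below inserts a value in front of the first match. It assumes a list of int rather than Widget, and insertBeforeFirst is a hypothetical helper name.

```cpp
#include <algorithm>
#include <list>

// Inserts `marker` immediately before the first element equal to `target`.
// Returns true if `target` was found, false if the list was left unchanged.
bool insertBeforeFirst(std::list<int>& values, int target, int marker) {
    std::list<int>::iterator pos = std::find(values.begin(), values.end(), target);
    if (pos == values.end()) {
        return false;               // not found
    }
    values.insert(pos, marker);     // pos still points to the found element
    return true;
}
```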

For sorted ranges, you have other choices, and you will definitely want to use them. count and find run in linear time, but the search algorithms for sorted ranges (binary_search, lower_bound, upper_bound, and equal_range) run in logarithmic time.

Moving from unsorted ranges to sorted ranges brings another shift: from using equality to decide whether two values are the same to using equivalence [2]. That is because count and find search using equality, while binary_search, lower_bound, upper_bound, and equal_range use equivalence.

To test whether a value exists in a sorted range, use binary_search. Unlike bsearch in the standard C library (and hence also in the standard C++ library), binary_search returns only a bool: whether the value was found. binary_search answers the question, "Is it there?" and its answer is either yes or no. If you need more information than that, you need a different algorithm.

Here is an example of applying binary_search to a sorted vector:

vector<Widget> vw;                      // create a vector, put data into it,
...
sort(vw.begin(), vw.end());             // and sort the data
Widget w;                               // the value to search for
...
if (binary_search(vw.begin(), vw.end(), w)) {
    ...                                 // w is in vw
} else {
    ...                                 // it is not
}
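When the range is sorted with something other than the default ordering, the same comparison has to be handed to binary_search. A minimal sketch, assuming int values sorted in descending order; containsSortedDescending is a hypothetical name:

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Sorts a copy of `values` in descending order, then tests for `target`.
// The same comparison (std::greater) is given to both sort and binary_search.
bool containsSortedDescending(std::vector<int> values, int target) {
    std::sort(values.begin(), values.end(), std::greater<int>());
    return std::binary_search(values.begin(), values.end(), target,
                              std::greater<int>());
}
```

The function takes its argument by value so that sorting does not disturb the caller's vector; real code would normally sort once and search many times.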

If you have a sorted range and your question is, "Is it there, and if so, where is it?" you want equal_range, but you may think you want lower_bound. I will discuss equal_range shortly, but first, let us see how lower_bound is used to locate a value in a range.

When you ask lower_bound to find a value, it returns an iterator pointing either to the first copy of that value (if it is found) or to the position where the value could be inserted (if it is not). lower_bound thus answers the question, "Is it there? If so, where is the first copy, and if not, where would it go?" As with find, you must test the result of lower_bound to see whether it points to the value you are looking for. Unlike find, you cannot just compare lower_bound's return value with the end iterator. Instead, you must check whether the object lower_bound identifies has the value you want.

Many programmers use lower_bound as follows:

vector<Widget>::iterator i = lower_bound(vw.begin(), vw.end(), w);
if (i != vw.end() && *i == w) {         // make sure i points to an object, and
                                        // make sure the object has the correct
                                        // value. This is a bug!
    ...                                 // found the value; i points to the
                                        // first object with that value
} else {
    ...                                 // not found
}

This works most of the time, but it is not really correct. Look again at the test that decides whether the desired value was found:

if (i != vw.end() && *i == w) ...

This is an equality test, but lower_bound searched using equivalence. Most of the time, tests for equivalence and tests for equality yield the same results, but it is not hard to come up with cases where they differ, as note [2] demonstrates. In such cases, the code above is wrong.

To do it properly, you must check whether the object pointed to by the iterator that lower_bound returned is equivalent to the value you are looking for. You can do that by hand, but it takes care, because you must be sure to use the same comparison function that lower_bound used. In general, that could be an arbitrary function (or function object). If you pass a comparison function to lower_bound, you must use the same comparison function in your hand-written equivalence test. That means that if you change the comparison function you pass to lower_bound, you have to change your equivalence test to match. Keeping the comparison functions in sync is not rocket science, but it is one more thing to remember, and I suspect you already have plenty to remember.
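The difference between the two checks can be seen with a case-insensitive ordering. The following is a sketch under assumptions: CaseInsensitiveLess and both function names are hypothetical, and the vector is assumed to be already sorted with that comparator.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Case-insensitive "less than" for strings (a hypothetical comparator).
struct CaseInsensitiveLess {
    bool operator()(const std::string& lhs, const std::string& rhs) const {
        return std::lexicographical_compare(
            lhs.begin(), lhs.end(), rhs.begin(), rhs.end(),
            [](unsigned char a, unsigned char b) {
                return std::tolower(a) < std::tolower(b);
            });
    }
};

// Buggy check: searches by equivalence, then tests by equality.
bool foundByEquality(const std::vector<std::string>& sorted,
                     const std::string& key) {
    auto i = std::lower_bound(sorted.begin(), sorted.end(), key,
                              CaseInsensitiveLess());
    return i != sorted.end() && *i == key;
}

// Hand-written equivalence check: lower_bound guarantees *i is not less than
// key, so the two are equivalent exactly when key is not less than *i either.
bool foundByEquivalence(const std::vector<std::string>& sorted,
                        const std::string& key) {
    CaseInsensitiveLess less;
    auto i = std::lower_bound(sorted.begin(), sorted.end(), key, less);
    return i != sorted.end() && !less(key, *i);
}
```

Note that foundByEquivalence has to name the comparator twice, once for the search and once for the test, which is exactly the synchronization burden described above.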

There is an easier way: use equal_range. equal_range returns a pair of iterators. The first is equal to the iterator lower_bound would return, and the second is equal to the one upper_bound would return (that is, the one-past-the-end iterator for the range of values equivalent to the one searched for). equal_range, then, returns a pair of iterators that delimit the range of values equivalent to the one you searched for. A well-named algorithm, no? (equivalent_range would be better, of course, but equal_range is still pretty good.)

There are two important observations about equal_range's return value. First, if the two iterators are the same, the range of objects is empty; the value was not found. That observation is how equal_range answers the question, "Is it there?" You use it like this:

vector<Widget> vw;
...
sort(vw.begin(), vw.end());
typedef vector<Widget>::iterator VWIter;        // convenience typedefs
typedef pair<VWIter, VWIter> VWIterPair;
VWIterPair p = equal_range(vw.begin(), vw.end(), w);
if (p.first != p.second) {              // if equal_range did not return
                                        // an empty range...
    ...                                 // found it; p.first points to the
                                        // first one and p.second points to
                                        // one past the last
} else {
    ...                                 // not found; both p.first and p.second
                                        // point to the insertion location for
                                        // the value searched for
}

This code uses only equivalence, so it is always correct.

The second thing to note is that equal_range returns two iterators, and the distance between them is the number of objects in the range, that is, the number of objects equivalent to the value searched for. As a result, equal_range not only does the job of find for sorted ranges, it also does the job of count. For example, to locate the Widgets in vw that are equivalent to w and print out how many there are, you could do this:

VWIterPair p = equal_range(vw.begin(), vw.end(), w);
cout << "There are " << distance(p.first, p.second)
     << " elements in vw equivalent to w.";
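The existence test and the count can both be read off one equal_range call. A minimal sketch, assuming a sorted vector of int; countInSorted is a hypothetical helper name:

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Returns how many elements of the sorted vector are equivalent to `target`;
// a result of zero means "not found".
std::size_t countInSorted(const std::vector<int>& sorted, int target) {
    typedef std::vector<int>::const_iterator Iter;
    std::pair<Iter, Iter> range =
        std::equal_range(sorted.begin(), sorted.end(), target);
    return static_cast<std::size_t>(std::distance(range.first, range.second));
}
```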

So far, the discussion has assumed that we want to search for a value in a range, but sometimes we are more interested in finding a location in a range. For example, suppose we have a Timestamp class and a vector of Timestamps that is sorted so that older timestamps come first:

class Timestamp { ... };
bool operator<(const Timestamp& lhs,    // returns whether lhs precedes
               const Timestamp& rhs);   // rhs in time
vector<Timestamp> vt;                   // create the vector, fill it with
...                                     // data, and sort it so that older
sort(vt.begin(), vt.end());             // times precede newer ones

Now suppose we have a special timestamp, ageLimit, and we want to remove from vt all the timestamps that are older than ageLimit. In this case, we do not want to search vt for a Timestamp equivalent to ageLimit, because there might not be any element equivalent to that exact value.

Instead, we need to find a location in vt: the first element that is no older than ageLimit. This is easy, because lower_bound gives us exactly that answer:

Timestamp ageLimit;
...
vt.erase(vt.begin(), lower_bound(vt.begin(),    // eliminate from vt all
                                 vt.end(),      // objects that precede
                                 ageLimit));    // ageLimit's value

If our requirements change slightly so that we want to eliminate all the timestamps that are at least as old as ageLimit, we need to find the first timestamp that is younger than ageLimit. That is a job tailor-made for upper_bound:

vt.erase(vt.begin(), upper_bound(vt.begin(),    // eliminate from vt all
                                 vt.end(),      // objects that precede or are
                                 ageLimit));    // equivalent to ageLimit's value
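The difference between the two erasures is easiest to see on concrete data. The sketch below stands in plain int values for Timestamp (smaller means older); both function names are hypothetical.

```cpp
#include <algorithm>
#include <vector>

// Removes every timestamp strictly older than ageLimit.
void eraseOlderThan(std::vector<int>& sortedTimes, int ageLimit) {
    sortedTimes.erase(sortedTimes.begin(),
                      std::lower_bound(sortedTimes.begin(), sortedTimes.end(),
                                       ageLimit));
}

// Removes every timestamp at least as old as ageLimit.
void eraseAtLeastAsOldAs(std::vector<int>& sortedTimes, int ageLimit) {
    sortedTimes.erase(sortedTimes.begin(),
                      std::upper_bound(sortedTimes.begin(), sortedTimes.end(),
                                       ageLimit));
}
```

For the sorted input 10, 20, 20, 30 and an ageLimit of 20, the first function leaves 20, 20, 30 and the second leaves only 30. When no element is equivalent to ageLimit, the two behave identically.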

upper_bound is also useful if you want to insert objects into a sorted range so that objects with equivalent values stay in the order in which they were inserted. For example, you might have a sorted list of Person objects, where the objects are sorted by name:

class Person {
public:
    ...
    const string& name() const;
    ...
};
struct PersonNameLess:
    public binary_function<Person, Person, bool> {  // binary_function was removed
                                                    // in C++17; the base class
                                                    // can be dropped there
    bool operator()(const Person& lhs, const Person& rhs) const
    {
        return lhs.name() < rhs.name();
    }
};
list<Person> lp;
...
lp.sort(PersonNameLess());              // sort lp using PersonNameLess

We can use upper_bound to find the insertion position that keeps the list in the desired order (by name, with equivalent names kept in the order in which they were inserted):

Person newPerson;
...
lp.insert(upper_bound(lp.begin(),           // insert newPerson after
                      lp.end(),             // the last object in lp
                      newPerson,            // that precedes or is
                      PersonNameLess()),    // equivalent to newPerson
          newPerson);

This works fine and is quite convenient, but it is important not to be misled into thinking that this use of upper_bound lets us find the insertion location in a list in logarithmic time. It does not. Because we are working with a list, whose iterators can only be advanced one step at a time, the search takes linear time, although it performs only a logarithmic number of comparisons.
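A self-contained sketch of the insertion, assuming a simplified Person with public members instead of the accessor-based class above. The id field is a hypothetical addition that exists only to make insertion order visible.

```cpp
#include <algorithm>
#include <list>
#include <string>

struct Person {            // hypothetical stand-in for the article's Person class
    std::string name;
    int id;                // records insertion order, for illustration only
};

struct PersonNameLess {
    bool operator()(const Person& lhs, const Person& rhs) const {
        return lhs.name < rhs.name;
    }
};

// Inserts newPerson after the last element that precedes or is equivalent
// to it, so people with the same name stay in insertion order.
void insertSorted(std::list<Person>& people, const Person& newPerson) {
    people.insert(std::upper_bound(people.begin(), people.end(),
                                   newPerson, PersonNameLess()),
                  newPerson);
}
```

Using lower_bound here instead would also keep the list sorted by name, but a newly inserted person would land in front of existing people with the same name.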

Up to this point, I have considered only the case where you have a pair of iterators defining the range to search. Often you have a container instead of a range. In that case, you must distinguish between the sequence containers and the associative containers. For the standard sequence containers (vector, string, deque, and list), you should follow the advice I have given in this Item, using the containers' begin and end iterators to delimit the range.

The situation is different for the standard associative containers (set, multiset, map, and multimap), because they offer member functions for searching that are generally better choices than the STL algorithms [3]. Fortunately, the member functions usually have the same names as the corresponding algorithms, so where the discussion above recommends the algorithms count, find, equal_range, lower_bound, or upper_bound, you simply use the member function of the same name when searching an associative container.

binary_search calls for a different strategy, because there is no member function counterpart to this algorithm. To test for the existence of a value in a set or map, use count in its idiomatic role as a membership test:

set<Widget> s;                          // create a set, put data into it
...
Widget w;                               // w still holds the value to search for
...
if (s.count(w)) {
    ...                                 // a value equivalent to w exists
} else {
    ...                                 // no such value exists
}

To test for the existence of a value in a multiset or multimap, find is generally better than count, because find can stop once it has found a single object with the desired value, while count, in the worst case, must examine every object in the container.

However, count's role for counting things in associative containers is secure. In particular, it is a better choice than calling equal_range and then applying distance to the resulting iterators. First, it is clearer: count means "count." Second, it is simpler; there is no need to create a pair of iterators and pass its components (first and second) to distance. Third, it is probably a little faster.
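The member-function advice for associative containers might be summarized as three small helpers. This is a sketch with int elements; the function names are made up for illustration.

```cpp
#include <cstddef>
#include <set>

// Membership test on a set: member count returns 0 or 1.
bool inSet(const std::set<int>& s, int value) {
    return s.count(value) != 0;
}

// Membership test on a multiset: member find only needs one match.
bool inMultiset(const std::multiset<int>& ms, int value) {
    return ms.find(value) != ms.end();
}

// Counting on a multiset: member count, rather than equal_range plus distance.
std::size_t copiesInMultiset(const std::multiset<int>& ms, int value) {
    return ms.count(value);
}
```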

Given everything we have considered in this Item, where do we stand? The following table says it all.

What you want to know | Algorithm to use: unsorted range | Algorithm to use: sorted range | Member function to use: set or map | Member function to use: multiset or multimap
----------------------+----------------------------------+--------------------------------+------------------------------------+---------------------------------------------
Does the desired value exist? | find | binary_search | count | find
Does the desired value exist? If so, where is the first object with that value? | find | equal_range | find | find or lower_bound (see text)
Where is the first object with a value not preceding the desired value? | find_if | lower_bound | lower_bound | lower_bound
Where is the first object with a value succeeding the desired value? | find_if | upper_bound | upper_bound | upper_bound
How many objects have the desired value? | count | equal_range (then distance) | count | count
Where are all the objects with the desired value? | find (iteratively) | equal_range | equal_range | equal_range

In the column of the table that summarizes sorted ranges, the frequency with which equal_range appears may be surprising. That frequency comes from the importance of testing for equivalence when searching. With lower_bound and upper_bound, it is too easy to fall back on equality tests, but with equal_range, testing only for equivalence is the natural thing to do. In the second row for sorted ranges, equal_range beats find for an additional reason: equal_range runs in logarithmic time, while find takes linear time.

For multiset and multimap, the table lists both find and lower_bound as candidates when you are looking for the first object with a particular value. find is the usual choice for this job, and you may have noticed that find alone is listed in the column for set and map. For the multi containers, however, find is not guaranteed to identify the first element with a given value when more than one is present; it is only guaranteed to identify one of them. If you really need the first object with a given value, use lower_bound, and then perform by hand the second half of the equivalence test described earlier. (You could avoid the manual equivalence test by using equal_range, but calling equal_range is more expensive than calling lower_bound.)
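For a multimap, the lower_bound approach with the hand-written second half of the equivalence test might look like the sketch below. The Scores typedef and the firstWithKey name are assumptions made for the example; key_comp is the container's own comparison, so the search and the test cannot get out of sync.

```cpp
#include <map>
#include <string>

typedef std::multimap<std::string, int> Scores;   // hypothetical example type

// Returns an iterator to the first element whose key is equivalent to `key`,
// or scores.end() if there is none.
Scores::const_iterator firstWithKey(const Scores& scores,
                                    const std::string& key) {
    Scores::const_iterator i = scores.lower_bound(key);
    // lower_bound guarantees that i->first does not precede key; the keys are
    // equivalent exactly when key does not precede i->first either.
    if (i != scores.end() && !scores.key_comp()(key, i->first)) {
        return i;
    }
    return scores.end();
}
```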

Choosing among count, find, binary_search, lower_bound, upper_bound, and equal_range is easy. Pick the algorithm or member function that gives you the behavior and performance you need and that requires the least work at the call site. Follow that advice (or consult the table), and you should not get confused.

[1] Scott Meyers, Effective STL: 50 Specific Ways to Improve Your Use of the Standard Template Library, Addison-Wesley, 2001, ISBN 0-201-74962-9. This article is based on Item 45 of Effective STL.

[2] Two objects are equivalent if neither precedes the other in some sort order of interest. Equivalent values are often equal, but not always. For example, the strings "STL" and "stl" are equivalent in a case-insensitive sort, but they are certainly not equal. For details on the distinction between equivalence and equality, see Item 19.

[3] The reasons are given in Item 44.
