Binary Search in STL

Source: Internet
Author: User

Section I
Telling the search algorithms apart: count, find, binary_search, lower_bound, upper_bound, and equal_range
This section summarizes Item 45 of Effective STL. It describes the similarities and differences among the search algorithms and when to use each.

The algorithms in question are count, find, binary_search, lower_bound, upper_bound, and equal_range. The predicate versions, such as count_if and find_if, or binary_search called with a comparison function, are used in much the same way and do not affect the choice.
Note that these algorithms are meant for sequence containers and arrays. Associative containers provide member functions with the same names, except for binary_search, which has no member counterpart.

First, whether the range is sorted is the crucial factor when you select a search algorithm.
Based on that, the algorithms fall into two groups:
A. count, find
B. binary_search, lower_bound, upper_bound, equal_range
Group A does not require a sorted range; group B does.
When the range is sorted, prefer group B, because its algorithms run in logarithmic time, while those in group A run in linear time.

In addition, groups A and B rely on different comparison rules. Group A uses equality (the searched type must define operator==), and group B uses equivalence (the searched type must define operator<, which must return false when the two values are equal).

Differences within group A
count: counts the objects in the range that equal the value.
find: returns the position of the first object that equals the value.
When the search succeeds, find returns immediately, while count keeps going until it has examined the entire range. In that case, find is more efficient.
Therefore, do not use count unless you actually need the number of matching objects.

Differences within group B, using the sorted range {1, 3, 4, 5, 6} as an example
binary_search: determines whether the object exists.
lower_bound: returns the position of the first element >= the target. lower_bound(2) points to 3, and lower_bound(3) points to 3.
If the target exists, this is the position of the target; if it does not, it is the position of the next larger element.
upper_bound: returns the position of the first element > the target. upper_bound(2) points to 3, and upper_bound(3) points to 4.
Whether or not the target exists, this is the position immediately after where the target is or would be.
equal_range: returns the pair made up of the lower_bound and upper_bound results, that is, the range of all elements equivalent to the target.
There are two points to note about equal_range:
1. If the two returned iterators are the same, the range is empty and the value was not found.
2. The distance between the two returned iterators equals the number of matching objects. For a sorted range, equal_range therefore does the job of both count and find.
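
The group B results for {1, 3, 4, 5, 6} can be checked with a short sketch. The helper names (lower_bound_value, upper_bound_value, match_count) and the -1 "end of range" marker are assumptions made for this illustration, not part of the STL.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// The sorted example range from the text.
static const std::vector<int> kValues = {1, 3, 4, 5, 6};

// Value at the position lower_bound returns (first element >= target),
// or -1 when lower_bound returns the end iterator.
int lower_bound_value(int target) {
    std::vector<int>::const_iterator it =
        std::lower_bound(kValues.begin(), kValues.end(), target);
    return it == kValues.end() ? -1 : *it;
}

// Value at the position upper_bound returns (first element > target),
// or -1 when upper_bound returns the end iterator.
int upper_bound_value(int target) {
    std::vector<int>::const_iterator it =
        std::upper_bound(kValues.begin(), kValues.end(), target);
    return it == kValues.end() ? -1 : *it;
}

// Number of elements equivalent to target: the width of equal_range's result.
int match_count(int target) {
    auto range = std::equal_range(kValues.begin(), kValues.end(), target);
    return static_cast<int>(std::distance(range.first, range.second));
}
```

With these helpers, lower_bound_value(2) and lower_bound_value(3) both yield 3, upper_bound_value(3) yields 4, and match_count(2) is 0 because the range returned for a missing value is empty.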

Section II Binary search in the STL

If a C++ STL container holds a sorted sequence, the STL provides four functions for searching it, all based on binary search.
Assuming there may be several elements with the same value:
lower_bound returns the position of the first element that is not less than the value.
upper_bound returns the position just past the last element equal to the value, that is, the position of the first element greater than it.
equal_range returns the first and one-past-the-last positions of all elements equal to the value, which are exactly the lower_bound and upper_bound results.
binary_search returns whether the value is present.
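
When duplicates are present, it helps to see the positions as indices. The following is a minimal sketch; the Bounds struct and the locate function are names invented for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical result type: index of the first match, index one past the
// last match, and whether the value is present at all.
struct Bounds {
    std::size_t lower;
    std::size_t upper;
    bool found;
};

Bounds locate(const std::vector<int>& sorted, int value) {
    auto range = std::equal_range(sorted.begin(), sorted.end(), value);
    Bounds b;
    b.lower = static_cast<std::size_t>(range.first - sorted.begin());
    b.upper = static_cast<std::size_t>(range.second - sorted.begin());
    b.found = std::binary_search(sorted.begin(), sorted.end(), value);
    return b;
}
```

For the sorted vector {1, 2, 2, 2, 3} and the value 2, lower is 1 and upper is 4, so upper refers to the element after the last 2 rather than to the last 2 itself.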

Section III Effective STL, Item 45

Item 45: Distinguish among count, find, binary_search, lower_bound, upper_bound, and equal_range.

So you want to look for something, and you have a container or a range delimited by iterators where you think it is located. How do you conduct the search? Your quiver holds these arrows: count, count_if, find, find_if, binary_search, lower_bound, upper_bound, and equal_range. How do you choose among them?

Easy. You reach for something that is fast and simple. The faster and simpler, the better.

For the time being, I'll assume you have a pair of iterators specifying the range to search. Later, I'll consider the case where you have a container instead of a range.

To select a search strategy, you must first determine whether your iterators define a sorted range. If they do, you can use binary_search, lower_bound, upper_bound, and equal_range to speed up the search (usually to logarithmic time; see Item 34). If the iterators do not define a sorted range, you are limited to the linear-time algorithms count, count_if, find, and find_if. In what follows, I'll ignore the _if variants of count and find, just as I'll ignore the variants of binary_search, lower_bound, upper_bound, and equal_range that take a predicate. Whether you rely on the default search predicate or specify your own, the considerations for choosing a search algorithm are the same.

If you have an unsorted range, your choices are count or find. They answer slightly different questions, so it is worth looking at them closely. count answers the question, "Is the value there, and if so, how many copies are there?" find answers the question, "Is it there, and if so, where is it?"

Suppose all you want to know is whether some particular Widget value w is in a list. Using count, the code looks like this:

list<Widget> lw;                          // list of Widgets
Widget w;                                 // the particular Widget value
...
if (count(lw.begin(), lw.end(), w)) {
    ...                                   // w is in lw
} else {
    ...                                   // it is not
}
This demonstrates a common idiom: using count as an existence test. count returns either zero or a positive number, so nonzero values are converted to true and zero to false. It would arguably be clearer to be more explicit about what we are doing:
if (count(lw.begin(), lw.end(), w) != 0) ...

Some programmers do write it that way, but relying on the implicit conversion, as in the original example, is quite common.

Compared with that original code, using find is slightly more complicated, because you must test find's return value against the list's end iterator:
if (find(lw.begin(), lw.end(), w) != lw.end()) {
    ...                                   // found it
} else {
    ...                                   // didn't find it
}

If all you want is an existence check, the count idiom is slightly simpler to code. However, it is less efficient when the search succeeds, because find stops as soon as it finds a match, while count must continue to the end of the range looking for further matches. For most programmers, find's edge in efficiency is enough to justify the slight increase in complexity.
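
The difference in work can be made visible by counting equality comparisons. The sketch below assumes a hypothetical Probe type whose operator== increments a counter; the type and both helper functions exist only for this illustration.

```cpp
#include <algorithm>
#include <list>

// Hypothetical value type that records how often operator== is called.
struct Probe {
    int value;
    static int comparisons;
};
int Probe::comparisons = 0;

bool operator==(const Probe& a, const Probe& b) {
    ++Probe::comparisons;
    return a.value == b.value;
}

// Equality comparisons made by the find-based existence check.
int comparisons_with_find(const std::list<Probe>& items, const Probe& target) {
    Probe::comparisons = 0;
    bool present = std::find(items.begin(), items.end(), target) != items.end();
    (void)present;  // only the comparison count matters here
    return Probe::comparisons;
}

// Equality comparisons made by the count-based existence check.
int comparisons_with_count(const std::list<Probe>& items, const Probe& target) {
    Probe::comparisons = 0;
    bool present = std::count(items.begin(), items.end(), target) != 0;
    (void)present;  // only the comparison count matters here
    return Probe::comparisons;
}
```

When the target is the first of 100 elements, the find version stops after one comparison, while the count version still compares against all 100.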

Often, knowing whether a value is in a range is not enough. Instead, you want to get hold of the first object in the range that equals the value. For example, you might want to print the object, insert something in front of it, or erase it (for guidance on erasing during iteration, see Item 9). When you need to know not just whether a value exists but also which object (or objects) has that value, you need find:
list<Widget>::iterator i = find(lw.begin(), lw.end(), w);
if (i != lw.end()) {
    ...                                   // found it; i points to the first one
} else {
    ...                                   // didn't find it
}
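
As one possible use of the iterator that find returns, the sketch below inserts a new element in front of the first match. The function name insert_before_first is an assumption for this example, and int stands in for Widget.

```cpp
#include <algorithm>
#include <list>

// Inserts `extra` in front of the first element equal to `target`.
// Returns false and leaves the list untouched when `target` is absent.
bool insert_before_first(std::list<int>& items, int target, int extra) {
    std::list<int>::iterator pos = std::find(items.begin(), items.end(), target);
    if (pos == items.end()) {
        return false;              // not found: nothing to insert in front of
    }
    items.insert(pos, extra);      // list::insert places `extra` before `pos`
    return true;
}
```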

For sorted ranges, you have other choices, and you will definitely want to use them. count and find run in linear time, but the search algorithms for sorted ranges (binary_search, lower_bound, upper_bound, and equal_range) run in logarithmic time.

The shift from unsorted to sorted ranges brings another shift: from using equality to decide whether two values are the same to using equivalence. Item 19 covers the difference between equality and equivalence in detail, so I won't repeat it here. I'll simply note that count and find search using equality, while binary_search, lower_bound, upper_bound, and equal_range use equivalence.

To test whether a value exists in a sorted range, use binary_search. Unlike bsearch in the standard C library (and hence also in the standard C++ library), binary_search returns only a bool: whether the value was found. binary_search answers the question, "Is it there?", and its answer is either yes or no. If you need more information than that, you need a different algorithm.

Here is an example of binary_search applied to a sorted vector (you can read about the virtues of sorted vectors in Item 23):

vector<Widget> vw;                        // create a vector, put
...                                       // data into it,
sort(vw.begin(), vw.end());               // and sort the data
Widget w;                                 // the value to search for
...
if (binary_search(vw.begin(), vw.end(), w)) {
    ...                                   // w is in vw
} else {
    ...                                   // it is not
}
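
The predicate versions follow the same pattern, with one extra obligation: the comparison passed to binary_search must be the one the range was sorted with, because the algorithm's precondition is that the range is ordered with respect to that comparison. A minimal sketch, assuming a descending sort with std::greater and a helper name (contains_descending) invented for this example:

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Sorts a copy of `values` in descending order, then searches it with the
// SAME comparator that was used for sorting.
bool contains_descending(std::vector<int> values, int target) {
    std::sort(values.begin(), values.end(), std::greater<int>());
    return std::binary_search(values.begin(), values.end(), target,
                              std::greater<int>());
}
```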
If you have a sorted range and your question is, "Is it there, and if so, where is it?", you may think you need lower_bound. I'll discuss equal_range shortly, but first let's look at using lower_bound to locate a value in a range.

When you ask lower_bound to look for a value, it returns an iterator pointing either to the first copy of that value (if found) or to the position where the value could be inserted (if not found). lower_bound thus answers the question, "Is it there? If so, where is the first copy, and if not, where would it go?" As with find, you must test the result of lower_bound to see whether it points to the value you are looking for. Unlike with find, you cannot just compare the returned iterator with the end iterator. Instead, you must check whether the object lower_bound identifies has the value you want.

Many programmers use lower_bound like this:

vector<Widget>::iterator i = lower_bound(vw.begin(), vw.end(), w);
if (i != vw.end() && *i == w) {           // make sure i points to an object,
                                          // and make sure the object has the
                                          // right value; this is a bug!
    ...                                   // found the value; i points to the
                                          // first object equal to it
} else {
    ...                                   // not found
}
This works most of the time, but it is not really correct. Look again at the test that checks whether the desired value was found:
if (i != vw.end() && *i == w) ...

This is an equality test, but lower_bound searched using equivalence. Most of the time, tests for equivalence and equality give the same result, but, as Item 19 shows, it is not hard to come up with cases where they differ. In those cases, the code above is wrong.
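
One such case is a range of strings sorted without regard to case. The sketch below is an illustration under that assumption; CaseInsensitiveLess and the two helper functions are hypothetical names, and binary_search stands in as the check that uses equivalence throughout.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical comparator: orders strings while ignoring case, so "Banana"
// and "banana" are equivalent even though they are not equal.
struct CaseInsensitiveLess {
    bool operator()(const std::string& a, const std::string& b) const {
        return std::lexicographical_compare(
            a.begin(), a.end(), b.begin(), b.end(),
            [](unsigned char x, unsigned char y) {
                return std::tolower(x) < std::tolower(y);
            });
    }
};

// The buggy pattern: search by equivalence, then test by equality.
bool found_by_equality(const std::vector<std::string>& sorted,
                       const std::string& w) {
    auto i = std::lower_bound(sorted.begin(), sorted.end(), w,
                              CaseInsensitiveLess());
    return i != sorted.end() && *i == w;
}

// A check that relies on equivalence only.
bool found_by_equivalence(const std::vector<std::string>& sorted,
                          const std::string& w) {
    return std::binary_search(sorted.begin(), sorted.end(), w,
                              CaseInsensitiveLess());
}
```

For the range {"apple", "Banana", "cherry"} and the search value "banana", lower_bound lands on "Banana", the equality test rejects it, and the equivalence-based check accepts it.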

To do it properly, you must check whether the object pointed to by the iterator lower_bound returns is equivalent to the value you are looking for. You could do that by hand (Item 19 shows how, and gives an example of when it is worthwhile), but it can get tricky, because you must be sure to use the same comparison function that lower_bound used. In general, that can be an arbitrary function (or function object). If you pass a comparison function to lower_bound, you must make sure your hand-written equivalence test uses that same function. That means that if you change the comparison function you pass to lower_bound, you must also change your equivalence test. Keeping the comparison functions in sync is not rocket science, but it is one more thing to remember, and I suspect you already have plenty to remember.
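
One way to keep the two in sync is to route both the search and the test through a single comparator parameter. The template below is a sketch of that idea; find_equivalent is a name made up for this example, not a standard algorithm.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Returns an iterator to the first element equivalent to `value` under
// `comp`, or `last` when there is none. The same `comp` drives both the
// search and the equivalence test, so the two cannot drift apart.
template <typename Iter, typename T, typename Compare>
Iter find_equivalent(Iter first, Iter last, const T& value, Compare comp) {
    Iter i = std::lower_bound(first, last, value, comp);
    // lower_bound already guarantees !comp(*i, value); equivalence holds
    // when comp(value, *i) is false as well.
    if (i != last && !comp(value, *i)) {
        return i;
    }
    return last;
}
```

For example, on a vector sorted in descending order, calling it with std::greater<int>() finds the first 7 in {9, 7, 7, 3} at index 1.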

There is an easier way: use equal_range. equal_range returns a pair of iterators. The first equals the iterator lower_bound would return, and the second equals the iterator upper_bound would return (that is, the one-past-the-end iterator for the range of values equivalent to the one searched for). equal_range thus returns a pair of iterators delimiting the range of values equivalent to the one you searched for. A well-named algorithm, no? (Of course, equivalent_range might be better, but equal_range is still pretty good.)

There are two important things to note about equal_range's return value. First, if the two iterators are the same, the range of objects is empty, which means the value was not found. That observation lets equal_range answer the question "Is it there?" You use it like this:

vector<Widget> vw;
...
sort(vw.begin(), vw.end());
typedef vector<Widget>::iterator VWIter;  // convenience typedefs
typedef pair<VWIter, VWIter> VWIterPair;
VWIterPair p = equal_range(vw.begin(), vw.end(), w);
if (p.first != p.second) {                // if equal_range didn't return
                                          // an empty range...
    ...                                   // found it; p.first points to the
                                          // first one and p.second points
                                          // to one past the last
} else {
    ...                                   // not found; p.first and p.second
                                          // both point to the insertion
}                                         // position for the value searched for

This code uses only equivalence, so it is always correct.

The second thing to note is that equal_range returns two iterators, and the distance between them equals the number of objects in the range, that is, the number of objects equivalent to the value searched for. As a result, equal_range not only does the job of find for sorted ranges, it also replaces count. For example, to locate the Widgets in vw that are equivalent to w and print how many there are, you could do this:
VWIterPair p = equal_range(vw.begin(), vw.end(), w);
cout << "There are " << distance(p.first, p.second)
     << " elements in vw equivalent to w.";

So far, the discussion has assumed that we want to search for a value in a range, but sometimes we are more interested in finding a position in a range. For example, suppose we have a Timestamp class and a vector of Timestamps that is sorted so that older timestamps come before newer ones:
class Timestamp {...};
bool operator<(const Timestamp& lhs,      // returns whether lhs precedes
               const Timestamp& rhs);     // rhs in time
vector<Timestamp> vt;                     // create the vector, fill it with
...                                       // data, and sort it so that older
sort(vt.begin(), vt.end());               // times precede newer ones

Now suppose we have a special timestamp, ageLimit, and we want to remove from vt all the timestamps that are older than ageLimit. In this case, we do not want to search vt for a Timestamp equivalent to ageLimit, because there might be no element with that exact value. Instead, we need to find a position in vt: the first element that is no older than ageLimit. This is as easy as can be, because lower_bound gives us precisely that:
Timestamp ageLimit;
...
vt.erase(vt.begin(), lower_bound(vt.begin(),      // remove from vt all
                                 vt.end(),        // objects that precede
                                 ageLimit));      // ageLimit's value

If our requirements change slightly, so that we want to remove all the timestamps that are at least as old as ageLimit, we need to find the first timestamp that is younger than ageLimit. That is a job tailor-made for upper_bound:
vt.erase(vt.begin(), upper_bound(vt.begin(),      // remove from vt all objects
                                 vt.end(),        // that precede ageLimit's
                                 ageLimit));      // value or are equivalent to it
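
The two erase calls differ only in what happens to elements equivalent to the limit. The sketch below shows this with plain long values standing in for the Timestamp class; the function names are assumptions for this example.

```cpp
#include <algorithm>
#include <vector>

// Removes every timestamp strictly older than age_limit.
// Timestamps equal to age_limit are kept.
void erase_older_than(std::vector<long>& sorted_times, long age_limit) {
    sorted_times.erase(
        sorted_times.begin(),
        std::lower_bound(sorted_times.begin(), sorted_times.end(), age_limit));
}

// Removes every timestamp at least as old as age_limit.
// Timestamps equal to age_limit are removed as well.
void erase_up_to(std::vector<long>& sorted_times, long age_limit) {
    sorted_times.erase(
        sorted_times.begin(),
        std::upper_bound(sorted_times.begin(), sorted_times.end(), age_limit));
}
```

Starting from {10, 20, 20, 30} with a limit of 20, the lower_bound version leaves {20, 20, 30} and the upper_bound version leaves {30}.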

upper_bound is also useful when you want to insert things into a sorted range so that objects with equivalent values are stored in the order in which they were inserted. For example, you might have a sorted list of Person objects, ordered by name:
class Person {
public:
    ...
    const string& name() const;
    ...
};

struct PersonNameLess:
    public binary_function<Person, Person, bool> {   // see Item 40; note that
                                                     // binary_function was
                                                     // removed in C++17
    bool operator()(const Person& lhs, const Person& rhs) const
    {
        return lhs.name() < rhs.name();
    }
};

list<Person> lp;
...
lp.sort(PersonNameLess());                // sort lp using PersonNameLess

To keep the list sorted the way we want (by name, with equivalent names stored in the order in which they were inserted), we can use upper_bound to specify the insertion position:
Person newPerson;
...
lp.insert(upper_bound(lp.begin(),         // insert newPerson after
                      lp.end(),           // the last object in lp
                      newPerson,          // that precedes or is
                      PersonNameLess()),  // equivalent to
          newPerson);                     // newPerson
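
A compact, self-contained variant of this insertion is sketched below. The Person struct here is a simplified stand-in with public members, and the id field is an assumption added only so that the insertion order of equivalent names can be observed.

```cpp
#include <algorithm>
#include <list>
#include <string>

// Simplified stand-in for the Person class above; `id` is a hypothetical
// field that records the order in which objects were inserted.
struct Person {
    std::string name;
    int id;
};

struct PersonNameLess {
    bool operator()(const Person& lhs, const Person& rhs) const {
        return lhs.name < rhs.name;
    }
};

// Inserts `p` after the last element whose name precedes or is equivalent
// to p.name, so equivalent names stay in insertion order.
void insert_sorted(std::list<Person>& people, const Person& p) {
    people.insert(
        std::upper_bound(people.begin(), people.end(), p, PersonNameLess()),
        p);
}
```

Inserting "Bob" (id 1), "Ann" (id 2), and "Bob" (id 3) in that order gives Ann, then the first Bob, then the second Bob.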

This works fine and is quite convenient, but do not be misled into thinking that this use of upper_bound finds the insertion position in a list in logarithmic time. It does not. Item 34 explains that because we are working with a list, the search takes linear time, although it performs only a logarithmic number of comparisons.
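
The comparison count can be observed directly. The sketch below assumes a hypothetical CountingLess comparator that increments a counter on each call. It measures comparisons only; the linear cost of stepping through the list's iterators is not captured.

```cpp
#include <algorithm>
#include <list>

// Hypothetical comparator that counts how many times it is invoked.
struct CountingLess {
    int* calls;
    bool operator()(int a, int b) const {
        ++*calls;
        return a < b;
    }
};

// Returns the number of comparisons upper_bound performs on a sorted list.
int comparisons_for_upper_bound(const std::list<int>& sorted, int value) {
    int calls = 0;
    CountingLess comp = {&calls};
    std::list<int>::const_iterator pos =
        std::upper_bound(sorted.begin(), sorted.end(), value, comp);
    (void)pos;  // only the comparison count is of interest here
    return calls;
}
```

On a list of 1024 sorted integers, the count stays close to log2(1024) = 10, far below the number of elements.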

Up to this point, I have considered only the case where you have a pair of iterators defining the range to search. Often you have a container instead of a range. In that case, you must distinguish between sequence and associative containers. For the standard sequence containers (vector, string, deque, and list), follow the advice given so far in this Item, using the containers' begin and end iterators to delimit the range.

The situation is different for the standard associative containers (set, multiset, map, and multimap), because they offer member functions for searching, and these are usually better choices than the STL algorithms. Item 44 explains in detail why they are better; in short, they are faster and behave more naturally. Fortunately, the member functions usually have the same names as the corresponding algorithms, so where the preceding discussion recommends the algorithms count, find, equal_range, lower_bound, or upper_bound, you simply use the same-named member function when searching an associative container.

binary_search calls for a different strategy, because this algorithm has no member function counterpart. To test whether a value exists in a set or map, use the count member function in its idiomatic role as a membership test:
set<Widget> s;                            // create a set, put data into it
...
Widget w;                                 // w still holds the value to search for
...
if (s.count(w)) {
    ...                                   // a value equivalent to w exists
} else {
    ...                                   // no such value exists
}
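
The same-named member functions can be sketched on a set of ints as follows. The helper names contains and first_not_less, and the fallback parameter, are assumptions for this illustration.

```cpp
#include <set>

// Membership test via the count member; for a set the result is 0 or 1.
bool contains(const std::set<int>& s, int value) {
    return s.count(value) != 0;
}

// First element not less than `value`, found with the member lower_bound
// rather than the std::lower_bound algorithm.
// Returns `fallback` when every element is smaller than `value`.
int first_not_less(const std::set<int>& s, int value, int fallback) {
    std::set<int>::const_iterator it = s.lower_bound(value);
    return it == s.end() ? fallback : *it;
}
```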

To test whether a value exists in a multiset or multimap, find is generally better than count, because find can stop as soon as it finds a single object with the desired value, while count, in the worst case, must examine every object in the container. (This is not a concern for set and map, because set does not allow duplicate values and map does not allow duplicate keys.)

However, count is a reliable way to count things in associative containers. In particular, it is a better choice than calling equal_range and applying distance to the resulting iterators. First, it is clearer: count means "count." Second, it is easier: there is no need to create a pair of iterators and pass its components to distance. Third, it is probably a little faster.
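
The two ways of counting in a multiset look like this side by side. Both function names are made up for this sketch, and both return the same number; the difference is in how much code is needed to get it.

```cpp
#include <cstddef>
#include <iterator>
#include <set>

// Counting with the count member: one call, and the intent is obvious.
std::size_t count_with_member(const std::multiset<int>& ms, int value) {
    return ms.count(value);
}

// Counting with the equal_range member plus distance: same result, but it
// needs a pair of iterators and a second call.
std::size_t count_with_equal_range(const std::multiset<int>& ms, int value) {
    auto p = ms.equal_range(value);
    return static_cast<std::size_t>(std::distance(p.first, p.second));
}
```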

Given everything we have considered in this Item, where do we stand? The following table says it all.

 

In the table's summary of how to work with sorted ranges, the frequency with which equal_range appears may be surprising. That frequency arises from the importance of testing for equivalence when searching. With lower_bound and upper_bound, it is too easy to fall back on equality tests, but with equal_range, testing only for equivalence is the natural thing to do. In the second row for sorted ranges, equal_range beats find for an additional reason: equal_range runs in logarithmic time, while find takes linear time.

For multiset and multimap, the table lists both find and lower_bound as candidates in the row for locating the first object with a particular value. find is the usual choice for this job, and you may have noticed that it is the only one listed in the set and map column. For the multi containers, however, if more than one copy of the value is present, find is not guaranteed to identify the first element with that value; it just identifies one of them. If you really need the first one, use lower_bound, and then perform by hand the second half of the equivalence test described in Item 19 to confirm that you have found the value you were looking for. (You could avoid the manual equivalence test by using equal_range, but calling equal_range is more expensive than calling lower_bound.)
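
A sketch of that lower_bound approach on a multiset follows. The Entry and KeyLess types and the function first_seq_with_key are hypothetical; the seq field exists only to tell equivalent elements apart.

```cpp
#include <set>

// Hypothetical element type: ordered by `key` only, so entries with the
// same key are equivalent even when `seq` differs.
struct Entry {
    int key;
    int seq;
};

struct KeyLess {
    bool operator()(const Entry& a, const Entry& b) const {
        return a.key < b.key;
    }
};

typedef std::multiset<Entry, KeyLess> EntrySet;

// Returns the `seq` of the first element equivalent to `probe`,
// or -1 when no such element exists.
int first_seq_with_key(const EntrySet& entries, const Entry& probe) {
    EntrySet::const_iterator it = entries.lower_bound(probe);
    // The member lower_bound guarantees that *it does not precede probe.
    // The second half of the equivalence test, using the container's own
    // comparator, confirms that probe does not precede *it either.
    if (it != entries.end() && !entries.key_comp()(probe, *it)) {
        return it->seq;
    }
    return -1;
}
```

Taking the comparator from key_comp() means the manual test always uses the same ordering as the container.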

Choosing among count, find, binary_search, lower_bound, upper_bound, and equal_range is easy. Pick the algorithm or member function that gives you the behavior and performance you need and that requires the least work when you call it. Follow that advice (or consult the table) and you should never get confused.

 
