Searching in the STL

Finding a particular entry in a collection is an important task, and the standard C++ library provides many different search techniques.
In the C++ standard library, the usual way to specify a collection is with a pair of iterators that denote a range. The range is written [first, last), where *first is the first element in the range and last points one past the last element. This article considers a general problem: given a range and a criterion, find an iterator pointing to the first element that satisfies the criterion. Because ranges are represented asymmetrically, we avoid special cases: if the search fails we can return last, and an empty range can be written [i, i).

Linear search and its variants

The simplest search is linear search or, as Knuth calls it, sequential search: look at each element in turn and check whether it is the one we are searching for. If there are n elements in the range, the worst case requires n comparisons.
The standard library provides several versions of linear search. The two most important are find(), which takes a range and a value x and looks for an element equal to x, and find_if(), which takes a range and a predicate p and looks for an element that satisfies p. Other versions are find_first_of(), which takes two ranges, [first1, last1) and [first2, last2), and looks in [first1, last1) for the first element equal to any element of [first2, last2); and adjacent_find(), which takes a single range and looks for the first element that is equal to its successor.
For example, if v is a vector of int, you can find the first 0 like this:
vector<int>::iterator i = find(v.begin(), v.end(), 0);

To find the first nonzero value:
vector<int>::iterator i = find_if(v.begin(), v.end(), not1(bind2nd(equal_to<int>(), 0)));

To find the first small prime:
int A[4] = { 2, 3, 5, 7 };
vector<int>::iterator i = find_first_of(v.begin(), v.end(), A + 0, A + 4);

To find the first duplicate pair:
vector<int>::iterator i = adjacent_find(v.begin(), v.end());

There is no separate version for searching a range backward, because you don't need one: a simple iterator adaptor gives the same effect. For example, to find the last 0 in v, you can write:
vector<int>::reverse_iterator i = find(v.rbegin(), v.rend(), 0);
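The calls above are fragments. A minimal self-contained sketch wraps each one in a small function (the helper names are hypothetical). It also replaces not1(bind2nd(...)) with a lambda, assuming a C++11 or later compiler, since std::bind2nd was removed in C++17 and std::not1 in C++20. The reverse search converts its result back to a forward iterator with base():

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical helper names; each wraps one of the calls shown in the text.
using Vec = std::vector<int>;
using CIt = Vec::const_iterator;

// First element equal to 0.
inline CIt first_zero(const Vec& v) { return std::find(v.begin(), v.end(), 0); }

// First nonzero element; a lambda stands in for not1(bind2nd(equal_to<int>(), 0)).
inline CIt first_nonzero(const Vec& v) {
    return std::find_if(v.begin(), v.end(), [](int x) { return x != 0; });
}

// First element that is one of the small primes 2, 3, 5, 7.
inline CIt first_small_prime(const Vec& v) {
    static const int A[4] = {2, 3, 5, 7};
    return std::find_first_of(v.begin(), v.end(), A + 0, A + 4);
}

// First element that is equal to its successor.
inline CIt first_duplicate_pair(const Vec& v) {
    return std::adjacent_find(v.begin(), v.end());
}

// Last 0, found by an ordinary find over reverse iterators. A match at *r
// is the element just before r.base() in forward order.
inline CIt last_zero(const Vec& v) {
    Vec::const_reverse_iterator r = std::find(v.rbegin(), v.rend(), 0);
    return r == v.rend() ? v.end() : std::prev(r.base());
}
```

Each function returns v.end() when nothing is found, following the [first, last) convention described at the start of the article.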

Linear search is a simple algorithm, and it may seem there is nothing more to say about it. Many books (including mine) show a simple implementation of std::find:
template <class InIter, class T>
InIter find(InIter first, InIter last, const T& val)
{
    while (first != last && !(*first == val))
        ++first;
    return first;
}

This is a faithful implementation of linear search, and it satisfies the requirements of the C++ standard; the name of the first template parameter, InIter, indicates that the argument need only be a very weak input iterator [Note 1]. It is so simple that you might wonder whether calling it is any easier than writing the loop yourself. Even so, there is an annoying problem: this implementation is not as efficient as it could be. The loop condition is complicated, requiring two tests for each element examined. Conditional branches are expensive, and complicated loop conditions are not optimized as well as simple ones.
One answer to this problem [Note 2], used by some standard library implementations, is to unroll the loop so that it examines four elements at a time. This is a complicated solution, because find() must then handle the leftover elements (the length of a range is not always a multiple of 4), and find() must also dispatch on the iterator category: unrolling works only for ranges denoted by random access iterators. In general, you still need the old implementation. Nevertheless, unrolling is effective: it reduces the number of tests per element from 2 to 1.25. It is a technique that standard library implementers can use without changing any interfaces.
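A minimal sketch of the unrolling idea, assuming random access iterators. The name unrolled_find is hypothetical, not a standard library function, and real library implementations differ in detail. The main loop performs one trip-count test per four element comparisons, and the 0 to 3 leftover elements fall back to the ordinary two-test loop:

```cpp
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical name: a 4-way unrolled linear search, valid only for
// random access iterators (it computes last - first).
template <class RandIter, class T>
RandIter unrolled_find(RandIter first, RandIter last, const T& val)
{
    typename std::iterator_traits<RandIter>::difference_type trips = (last - first) / 4;

    // Main loop: one loop-condition test per four element comparisons,
    // about 1.25 tests per element instead of 2.
    for (; trips > 0; --trips) {
        if (*first == val) return first;
        ++first;
        if (*first == val) return first;
        ++first;
        if (*first == val) return first;
        ++first;
        if (*first == val) return first;
        ++first;
    }

    // Leftover elements: the ordinary loop with two tests per element.
    while (first != last && !(*first == val))
        ++first;
    return first;
}
```

For example, with v = {3, 1, 4, 1, 5, 9, 2, 6, 5, 3}, unrolled_find(v.begin(), v.end(), 9) returns v.begin() + 5, exactly as std::find would.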
You will find a different answer in a typical algorithms textbook. The reason we need two tests per element is that, if we reach the end of the range without finding the element, we must report failure. But what if we happen to know that the element we are looking for is always present, so that the search can never fail? In that case the test for the end of the range is redundant; there would be no reason for the search algorithm to keep track of the end of the range in the first place. Instead of std::find(), we can implement linear search like this:
template <class InIter, class T>
InIter unguarded_find(InIter first, const T& val)
{
    // No test against the end of the range: val must be present.
    while (!(*first == val))
        ++first;
    return first;
}

Knuth's version of linear search [Note 3] is closer to unguarded_find() than to std::find(). Note that unguarded_find() is not part of the C++ standard. It is more dangerous and less general than find(): you can use it only when you are sure that some element is equal to val, which usually means that you put that element there yourself, as a sentinel to terminate the search. Using a sentinel is not always possible. (What if you are searching a read-only range?) But when it is possible, unguarded_find() is faster and simpler than anything in the standard library.
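A sketch of the sentinel technique for a mutable std::vector, assuming we may temporarily modify it (sentinel_find is a hypothetical helper). It appends val so that the unguarded loop must stop, searches, then removes the sentinel again; a match at the sentinel's position means val was not originally present:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// unguarded_find as in the text: no end-of-range test, so some element
// must be equal to val.
template <class InIter, class T>
InIter unguarded_find(InIter first, const T& val)
{
    while (!(*first == val))
        ++first;
    return first;
}

// Hypothetical helper: returns the index of the first element equal to val,
// or the original v.size() if there is none. The vector is restored before
// returning.
template <class T>
std::size_t sentinel_find(std::vector<T>& v, const T& val)
{
    v.push_back(val);  // the sentinel; may reallocate, so take begin() afterwards
    std::size_t pos = static_cast<std::size_t>(unguarded_find(v.begin(), val) - v.begin());
    v.pop_back();      // remove the sentinel
    return pos;
}
```

Because push_back may reallocate, any iterators the caller held into v are invalidated, and the technique cannot be used at all on a const or read-only range, which is exactly the limitation noted above.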
Binary Search

Linear search is very simple, and it is the best method for short ranges. As ranges get longer, however, it stops being a reasonable solution. I was reminded of this recently while working with XSLT. My XSLT script contained a line similar to this:
<xsl:value-of select="/defs/data[@key=$r]"/>
The XSLT engine I was using must have implemented this with a linear search. I was searching within a list, and I performed this search once for each entry in the list. My script was O(n²), and it took several minutes to run.
If you are searching a completely general range, you can't do better than linear search: you must examine every element, or you might miss the one you are looking for. But if you require the range to be organized in some way, you can do better.
For example, you can require that the range be sorted. Given a sorted range, you could use an improved version of linear search (once you reach an element greater than the one you are looking for, you know the search has failed without continuing to the end of the range), but a better approach is binary search. By looking at the element in the middle of the range, you can tell whether the element you are searching for is in the first half or the second half. By repeating this subdivision, you can find the element without examining every element. Linear search requires O(n) comparisons, while binary search requires only O(log n).
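To make the halving step concrete, here is a hand-written sketch for sorted random access ranges (halving_search is a hypothetical name; in practice you would call the standard functions described next). It is written to the same contract as std::lower_bound, returning the first position whose element is not less than val. Each iteration discards about half of the remaining range, so it performs O(log n) comparisons:

```cpp
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical name: binary search over a range sorted by operator<.
// Returns the first position whose element is not less than val
// (last if every element is less than val).
template <class RandIter, class T>
RandIter halving_search(RandIter first, RandIter last, const T& val)
{
    typename std::iterator_traits<RandIter>::difference_type len = last - first;
    while (len > 0) {
        auto half = len / 2;
        RandIter middle = first + half;
        if (*middle < val) {      // answer lies strictly after middle
            first = middle + 1;
            len = len - half - 1;
        } else {                  // answer is middle or lies before it
            len = half;
        }
    }
    return first;
}
```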
The standard library contains four different versions of binary search: lower_bound(), upper_bound(), equal_range(), and binary_search(). They all have the same form: they take a range, the value you are trying to find, and an optional comparison function. The range must be sorted according to that comparison function; if no comparison function is supplied, it must be sorted according to operator<. For example, if you are searching an int array sorted in ascending order, you can use the default behavior. If you are searching an int array sorted in descending order, you can pass std::greater<int> as the comparison function.
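A short sketch of the descending case (find_in_descending is a hypothetical helper). Because the range is sorted with std::greater<int>, the same comparison is passed to std::lower_bound, which then returns the first element that is not greater than x:

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical helper: v must be sorted in descending order, i.e. by
// std::greater<int>; the search must use that same ordering.
inline std::vector<int>::const_iterator
find_in_descending(const std::vector<int>& v, int x)
{
    return std::lower_bound(v.begin(), v.end(), x, std::greater<int>());
}
```

Passing the default operator< here instead would violate the precondition that the range be sorted by the comparison function, and the result would be meaningless.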
Of the four binary search functions, the least useful is the one with the most obvious name: binary_search(). It returns a simple yes or no: true if the element is present in the range, false otherwise. But that piece of information alone is not very useful, and I have never found a use for binary_search(). If the element you are searching for is present, you probably want to know where it is; if it is not present, you probably want to know where it would be.
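As an illustration of why the position is more useful than a yes/no answer, this sketch (insert_unique_sorted is a hypothetical helper) uses a single lower_bound() call both to test membership and to find the insertion point that keeps the vector sorted:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: insert x into the sorted vector v unless an
// equivalent element is already present. Returns true if x was inserted.
inline bool insert_unique_sorted(std::vector<int>& v, int x)
{
    std::vector<int>::iterator it = std::lower_bound(v.begin(), v.end(), x);
    if (it != v.end() && !(x < *it))
        return false;      // *it is equivalent to x: already present
    v.insert(it, x);       // it is the position that keeps v sorted
    return true;
}
```

binary_search() could answer only the first half of this question; the insertion would still need a second search.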
You can ask several different questions about an element's location, which is why there are several versions of binary search. The differences matter when the range contains several copies of the same element. For example, suppose you have an int array and you use lower_bound() and upper_bound() to look for the same value:
int A[10] = { 1, 2, 3, 5, 5, 5, 5, 7, 8, 9 };
int* first = std::lower_bound(A + 0, A + 10, 5);
int* last = std::upper_bound(A + 0, A + 10, 5);

The names first and last hint at the difference: lower_bound() returns the first occurrence of the value you are looking for (in this example, &A[3]), and upper_bound() returns an iterator one past the last occurrence (in this example, &A[7]).
If the value you are searching for is not present, you get the position where it would be. As before, we can write:
int* first = std::lower_bound(A + 0, A + 10, 6);
int* last = std::upper_bound(A + 0, A + 10, 6);

Both first and last will be equal to &A[7], because that is the only position where 6 could be inserted without violating the ordering.
In practice, you don't often see a call to lower_bound() immediately followed by a call to upper_bound(). If you need both pieces of information, that is what the last binary search algorithm is for: equal_range() returns a pair whose first element is the value lower_bound() would return and whose second element is the value upper_bound() would return.
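The example array can be checked with equal_range() in a small sketch (value_block is a hypothetical helper). It returns the same pair of pointers that the separate lower_bound() and upper_bound() calls produce, and the distance between them is the number of copies of the value:

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

// Hypothetical helper: the block of elements equivalent to x in a sorted
// array of 10 ints, as the pair (lower_bound, upper_bound).
inline std::pair<const int*, const int*> value_block(const int (&a)[10], int x)
{
    return std::equal_range(a + 0, a + 10, x);
}
```

For the array above, searching for 5 gives the pair (&A[3], &A[7]), four copies; searching for 6 gives (&A[7], &A[7]), an empty range located at the insertion point.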
Up to this point I have deliberately been a little loose: I said that lower_bound() and upper_bound() find a value, but that is not quite an accurate description of what they do. If you write
iterator i = std::lower_bound(first, last, x);

and the search succeeds, are *i and x equal? Not necessarily! lower_bound() and upper_bound() do not test for equality (WQ's note: logical equality, tested with operator==()). They use the comparison function you supply: operator<() or some other function that behaves like a "less than" comparison. This means the search succeeds when *i is not less than x and x is not less than *i (WQ's note: that is, equivalence rather than equality).
This distinction may look like hair-splitting, but it is not. If you have complex records with many fields, you may use one field as the sort key (a person's surname, say). Two records may have the same key, and then neither is less than the other, even if all their other fields are different.
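A small sketch of equivalence without equality, assuming a hypothetical Person record sorted by surname only (Person, by_surname, and lower_bound_surname are illustrative names). lower_bound() reports a record whose key matches the probe even though its other field differs:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record type: only the surname is used as the sort key.
struct Person {
    std::string surname;
    std::string given;
};

// "Less than" on the key only. Two Persons with the same surname are
// equivalent under this ordering even if their given names differ.
inline bool by_surname(const Person& a, const Person& b)
{
    return a.surname < b.surname;
}

// Binary search on a vector sorted by by_surname.
inline std::vector<Person>::const_iterator
lower_bound_surname(const std::vector<Person>& v, const Person& probe)
{
    return std::lower_bound(v.begin(), v.end(), probe, by_surname);
}
```

Searching for {"Smith", "Dave"} in {{"Jones", "Ann"}, {"Smith", "Bob"}, {"Smith", "Carol"}} lands on {"Smith", "Bob"}: neither record is less than the other, but they are not equal.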
Once you start thinking about records and keys, another question about binary search arises naturally: can you use binary search to look up a record by its key? To be more concrete, suppose we define a struct X:
struct X {
    int id;
    ... // other fields
};

Suppose we have a vector<X> sorted by the elements' id fields. How do you use binary search to find the X with a given id (say, 148)?
One way is to create a dummy X object with the specified id and use it in the binary search:
X dummy;
dummy.id = 148;
vector<X>::iterator i = lower_bound(v.begin(), v.end(), dummy, x_compare);
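A self-contained version of the dummy-object approach, assuming a hypothetical extra field weight that stands in for "other fields", and a hypothetical wrapper find_by_id that also checks whether the located element really has the requested id (lower_bound() alone only returns the position where it would be):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// The record type from the text; weight is a hypothetical stand-in for
// the "other fields".
struct X {
    int id;
    double weight;
};

// The ordering used to sort the vector: by id only.
inline bool x_compare(const X& a, const X& b) { return a.id < b.id; }

// Portable approach: build a dummy X that carries only the key.
// Returns v.end() if no element has the given id.
inline std::vector<X>::const_iterator find_by_id(const std::vector<X>& v, int id)
{
    X dummy = X();  // value-initialize the fields we do not care about
    dummy.id = id;
    std::vector<X>::const_iterator i = std::lower_bound(v.begin(), v.end(), dummy, x_compare);
    return (i != v.end() && !(id < i->id)) ? i : v.end();
}
```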

At present this is the most reliable method, and if you care about maximum portability it is the one you should use. On the other hand, it is not very elegant: you must create a complete X object even though all you need is one of its fields. The other fields have to be initialized to default or arbitrary values, and that initialization may be inconvenient, expensive, or sometimes even impossible.
Is it possible to pass the id directly to lower_bound(), perhaps by passing in a heterogeneous comparison function that takes an X and an id? There is no simple answer to this question. The C++ standard does not fully clarify whether such heterogeneous comparison functions are allowed; in my opinion, the most natural reading of the standard is that they are not. In today's practice, heterogeneous comparison functions work with some implementations but not with others. The C++ Standardization Committee considers this a defect, and a future version of the standard will settle whether heterogeneous comparison functions are allowed [Note 4].
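For comparison, a sketch of the heterogeneous approach under the rules of C++11 and later, where lower_bound() is specified in terms of comp(element, value) and upper_bound() in terms of comp(value, element). IdLess and lower_bound_by_id are hypothetical names, and the code assumes a C++11 compiler:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct X {
    int id;
    double weight;  // hypothetical stand-in for "other fields"
};

// Heterogeneous comparison between a record and a bare key.
struct IdLess {
    // lower_bound calls comp(element, value).
    bool operator()(const X& x, int id) const { return x.id < id; }
    // upper_bound calls comp(value, element).
    bool operator()(int id, const X& x) const { return id < x.id; }
};

// No dummy X is needed: the key itself is the value searched for.
inline std::vector<X>::const_iterator lower_bound_by_id(const std::vector<X>& v, int id)
{
    return std::lower_bound(v.begin(), v.end(), id, IdLess());
}
```

Providing both overloads lets the same function object serve lower_bound(), upper_bound(), and equal_range().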
Conclusion

The C++ standard library also provides other kinds of search algorithms. find() and lower_bound() search for a single element, but the standard library also provides search(), which looks for an entire subrange. For example, you can search for a word in a string s:
std::string the = "the";
std::string::iterator i = std::search(s.begin(), s.end(), the.begin(), the.end());

The return value, i, points to the first occurrence of "the" in s or, as usual, is s.end() if "the" does not occur. There is also a variant that searches from the end:
std::find_end(s.begin(), s.end(), the.begin(), the.end());

It returns an iterator pointing to the beginning of the last occurrence of "the" rather than the first. (If you find it strange that the backward variant of search() is called find_end() rather than search_end(), you are not alone.)
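A runnable sketch of the two calls. first_occurrence and last_occurrence are hypothetical helpers that convert the returned iterator to an offset, using std::string::npos to mean "not found":

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper: offset of the first occurrence of w in s, or npos.
inline std::string::size_type first_occurrence(const std::string& s, const std::string& w)
{
    std::string::const_iterator i = std::search(s.begin(), s.end(), w.begin(), w.end());
    return i == s.end() ? std::string::npos : static_cast<std::string::size_type>(i - s.begin());
}

// Hypothetical helper: offset of the last occurrence of w in s, or npos.
inline std::string::size_type last_occurrence(const std::string& s, const std::string& w)
{
    std::string::const_iterator i = std::find_end(s.begin(), s.end(), w.begin(), w.end());
    return i == s.end() ? std::string::npos : static_cast<std::string::size_type>(i - s.begin());
}
```

For s = "the cat and the hat", these return 0 and 12 for the word "the".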
Searching can also be built into data structures. Most obviously, the standard library's associative containers, set, multiset, map, and multimap, are specifically designed for searching by key [Note 5]. The library's string class also provides many member functions for searching: find(), rfind(), find_first_of(), find_last_of(), find_first_not_of(), and find_last_not_of(). I recommend that you avoid them. I find these special member functions hard to remember, because they come in so many forms and their interfaces differ from the rest of the library; in any case, they provide nothing that you cannot get from find(), find_if(), and search().
However, if you still think something important is missing, you are right! I didn't mention hash tables, because there are no hash tables in the standard library. I mentioned subrange matching with search(), but that is only a special case of pattern matching; the standard library has no regular expression search or anything like it.
The C++ Standardization Committee has just begun to consider extending the standard library, and hash tables and regular expressions are high-priority candidates for a future version of the standard. If you think something is missing from the standard library and you want to submit a proposal, now is the time to start preparing.
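To illustrate the two claims in that paragraph, a sketch with hypothetical helper names: has_key() uses the associative container's own find() member, which runs in logarithmic time, and first_of() computes the same result as the string member find_first_of() using only the generic std::find_first_of() algorithm:

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper: key lookup built into the data structure.
inline bool has_key(const std::map<int, std::string>& m, int k)
{
    return m.find(k) != m.end();
}

// Hypothetical helper: s.find_first_of(chars) expressed with the generic
// algorithm; returns npos when no character of chars occurs in s.
inline std::string::size_type first_of(const std::string& s, const std::string& chars)
{
    std::string::const_iterator i =
        std::find_first_of(s.begin(), s.end(), chars.begin(), chars.end());
    return i == s.end() ? std::string::npos : static_cast<std::string::size_type>(i - s.begin());
}
```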

 
