Introduction to algorithms 7 (medians and order statistics: selection algorithms)

Last Update: 2015-01-01 Source: Internet

Author: User


In practice we often run into questions like these: what is the largest element in a collection? The smallest? The second smallest? And so on. This article describes how to answer such questions efficiently, in (expected) linear time.

First, a few definitions:

1. Order statistics:

In a set of n elements, the i-th order statistic is the i-th smallest element of the set. The minimum is the 1st order statistic (i = 1), and the maximum is the nth order statistic (i = n).

2. Median:

A median is the "midpoint element" of its set. When n is odd, the median is unique and is located at i = (n + 1)/2; when n is even, there are two medians, located at i = n/2 and i = n/2 + 1 (the lower and upper medians).
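As a concrete check of these definitions, here is a minimal C++ sketch that finds the i-th order statistic simply by sorting, which costs O(n lg n), and computes the 1-based positions of the lower and upper medians. The function names are hypothetical and not part of the original article; the sort-based version is only a reference baseline for the faster methods that follow.

```cpp
#include <algorithm>
#include <vector>

// Reference definition (hypothetical helper): the i-th order statistic
// (1-based) is the element at index i - 1 after sorting. O(n lg n).
// Takes v by value so the caller's data is not reordered.
int orderStatisticBySorting(std::vector<int> v, int i) {
    std::sort(v.begin(), v.end());
    return v[i - 1];
}

// 1-based position of the lower median: (n + 1)/2 for odd n, n/2 for even n.
int lowerMedianIndex(int n) { return (n + 1) / 2; }

// 1-based position of the upper median: (n + 1)/2 for odd n, n/2 + 1 for even n.
int upperMedianIndex(int n) { return n / 2 + 1; }
```

For example, with n = 6 the two medians sit at positions 3 and 4, while with n = 5 both functions return 3.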

I. Maximum and minimum values

How many comparisons are needed to find the minimum of a set of n elements? Intuitively, n - 1 comparisons are necessary, and they are also enough: traverse the set once, always keeping the smaller element seen so far, and at the end of the traversal we hold the minimum. The maximum can be found the same way.
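A minimal sketch of this single pass, assuming a non-empty std::vector<int>. The name findMin and its comparison counter are illustrative only, not part of the original code; the counter confirms that exactly n - 1 comparisons are made.

```cpp
#include <cstddef>
#include <vector>

// Single-pass minimum (hypothetical helper): keep the smaller element seen
// so far. Sets `comparisons` to the number of element comparisons made,
// which is always n - 1. Assumes v is non-empty.
int findMin(const std::vector<int>& v, std::size_t& comparisons) {
    comparisons = 0;
    int smallest = v[0];
    for (std::size_t k = 1; k < v.size(); ++k) {
        ++comparisons;
        if (v[k] < smallest) smallest = v[k];
    }
    return smallest;
}
```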

If we need both the maximum and the minimum, we could of course make two traversals with n - 1 comparisons each, 2(n - 1) comparisons in total. But this is not optimal, because not every element needs to be compared with both the current maximum and the current minimum.
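The naive approach can be sketched as follows, again with a hypothetical comparison counter and assuming a non-empty input. Every element after the first is compared with both the running minimum and the running maximum, which gives 2(n - 1) comparisons; for a 10-element input that is 18.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Naive simultaneous min/max (hypothetical helper): each element after the
// first is compared with both the running min and the running max,
// i.e. 2(n - 1) comparisons. Assumes v is non-empty. Returns {min, max}.
std::pair<int, int> naiveMinMax(const std::vector<int>& v, std::size_t& comparisons) {
    comparisons = 0;
    int lo = v[0], hi = v[0];
    for (std::size_t k = 1; k < v.size(); ++k) {
        ++comparisons;
        if (v[k] < lo) lo = v[k];
        ++comparisons;
        if (v[k] > hi) hi = v[k];
    }
    return {lo, hi};
}
```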

Take a set with an even number of elements as an example. First compare the first two elements: the larger becomes the initial max and the smaller the initial min. Then process the remaining elements in pairs: compare the two elements of a pair with each other, then compare the larger one with max (replacing max if it is larger) and the smaller one with min (replacing min if it is smaller). Repeat until the whole set has been traversed. Each pair costs 3 comparisons, so the total is 1 + 3(n/2 - 1) = 3n/2 - 2. For a set with an odd number of elements, skip the first comparison and set both max and min to the first element; the remaining elements are handled in pairs exactly as before, for a total of 3(n - 1)/2 comparisons. Either way, at most 3⌊n/2⌋ comparisons are needed (⌊n/2⌋ is the largest integer not greater than n/2), so the running time of the algorithm is O(n).
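The comparison count can be checked with a short standalone sketch. pairwiseMinMax is a hypothetical name and not the article's getMinMax; it assumes a non-empty input and counts only comparisons between elements. For n = 10 it makes 1 + 4 × 3 = 13 = 3n/2 - 2 comparisons, and for n = 9 it makes 4 × 3 = 12 = 3(n - 1)/2.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Pairwise min/max (hypothetical helper): compare the two elements of a pair
// with each other (1 comparison), then the larger with max and the smaller
// with min (2 more). Assumes v is non-empty. Returns {min, max}.
std::pair<int, int> pairwiseMinMax(const std::vector<int>& v, std::size_t& comparisons) {
    comparisons = 0;
    const std::size_t n = v.size();
    int lo, hi;
    std::size_t k;
    if (n % 2 == 0) {                 // even n: initialize from the first pair
        ++comparisons;
        if (v[0] < v[1]) { lo = v[0]; hi = v[1]; }
        else             { lo = v[1]; hi = v[0]; }
        k = 2;
    } else {                          // odd n: initialize from the first element
        lo = hi = v[0];
        k = 1;
    }
    for (; k + 1 < n; k += 2) {       // remaining elements, two at a time
        int smaller = v[k], larger = v[k + 1];
        ++comparisons;
        if (smaller > larger) std::swap(smaller, larger);
        ++comparisons;
        if (smaller < lo) lo = smaller;
        ++comparisons;
        if (larger > hi) hi = larger;
    }
    return {lo, hi};
}
```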

The implementation code is also relatively simple:

#include <iostream>

typedef int T;
using namespace std;

/** struct holding the result: the maximum and minimum values */
struct result {
public:
    T max;
    T min;
    result() : max(0), min(0) {}
};

result* getMinMax(T a[], int len);

int main() {
    T a[9] = {5, 8, 0, -89, 9, 22, -1, -31, 98};
    result* r1 = getMinMax(a, 9);
    cout << "maximum value: " << r1->max << ", minimum value: " << r1->min << endl;
    T b[10] = {5, 8, 0, -89, 9, 22, -1, -31, 98, 2222};
    result* r2 = getMinMax(b, 10);
    cout << "maximum value: " << r2->max << ", minimum value: " << r2->min << endl;
    delete r1;
    delete r2;
    return 0;
}

/** find the maximum and minimum with at most 3*floor(len/2) comparisons;
    returns nullptr for an empty array (the caller owns the result) */
result* getMinMax(T a[], int len) {
    if (len == 0) {
        return nullptr;
    }
    result* re = new result();
    int i = 0;
    if (len % 2 == 0) {
        // even number of elements: initialize max and min from the first pair (1 comparison)
        if (a[0] > a[1]) { re->max = a[0]; re->min = a[1]; }
        else { re->max = a[1]; re->min = a[0]; }
        i = 2;
    } else {
        // odd number of elements: initialize max and min from the first element
        re->max = a[0];
        re->min = a[0];
        i = 1;
    }
    while (i < len) {
        // compare the pair with each other, then the larger with max
        // and the smaller with min: 3 comparisons per pair
        T larger, smaller;
        if (a[i] > a[i + 1]) { larger = a[i]; smaller = a[i + 1]; }
        else { larger = a[i + 1]; smaller = a[i]; }
        if (larger > re->max) re->max = larger;
        if (smaller < re->min) re->min = smaller;
        i += 2;
    }
    return re;
}

II. Finding the i-th smallest element of a set, i.e. the i-th order statistic

This looks much more complicated than finding the maximum and minimum, yet the expected running time is the same: O(n). For background on quicksort, see: Introduction to algorithms (quick sorting)

The method used here is similar to quicksort. In quicksort, once partitioning has placed the pivot at position mid, we recursively sort the subarrays on both sides of mid. Here we only need to locate the i-th order statistic: it is either the element at mid, or it lies in the subarray to the left of mid, or in the subarray to the right of mid, so only one side needs to be examined. As a result, where quicksort's expected running time is O(n lg n), the expected running time here is O(n).

The approach in detail:

1. Partition as in quicksort, with one improvement: the pivot (key) is no longer taken from a fixed position but chosen at random from all elements of the current range, and then swapped with the last element. This helps avoid the extremely unbalanced 0 : n-1 split at every step: because the pivot is random, no fixed input can force the worst case every time.

2. After the random partition, check whether the pivot's position mid is exactly the position of the i-th order statistic. If it is, return the element at mid. If mid comes before the position of the i-th order statistic, recursively repeat the procedure on the part after mid only; since the k elements up to and including mid all come before it in sorted order, we now look for the (i - k)-th smallest element there. Otherwise, if mid comes after it, recursively repeat the procedure on the part before mid, still looking for the i-th smallest element.
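The two steps above can be sketched together. Because only one side is ever kept, the recursion is a tail call and can be written as a loop; this sketch does that, and also swaps the article's rand() for std::mt19937 with std::uniform_int_distribution from <random>. The function name is hypothetical and this is not the article's code. Note how the target rank i shrinks by k when the search moves to the right of the pivot.

```cpp
#include <random>
#include <utility>
#include <vector>

// Iterative randomized select (hypothetical helper): returns the i-th
// smallest element (1-based) of a. Takes a by value because it reorders
// its copy. Assumes 1 <= i <= a.size().
int randomizedSelectIterative(std::vector<int> a, int i, std::mt19937& rng) {
    int lo = 0, hi = static_cast<int>(a.size()) - 1;
    while (true) {
        if (lo == hi) return a[lo];
        // Step 1: pick a random pivot, move it to the end, Lomuto partition.
        std::uniform_int_distribution<int> pick(lo, hi);
        std::swap(a[pick(rng)], a[hi]);
        int store = lo;
        for (int j = lo; j < hi; ++j)
            if (a[j] < a[hi]) std::swap(a[store++], a[j]);
        std::swap(a[store], a[hi]);       // pivot now at index store (mid)
        // Step 2: compare the pivot's rank with i and keep only one side.
        int k = store - lo + 1;           // pivot's rank within a[lo..hi]
        if (i == k) return a[store];
        if (i < k) {
            hi = store - 1;               // answer lies left of the pivot
        } else {
            lo = store + 1;               // answer lies right of the pivot;
            i -= k;                       // skip the k elements ranked before it
        }
    }
}
```

For example, on the array {1, 999, -1025, 654, 185, 5, -9, 21, 8, 11} used in the article's code, asking for i = 3 returns 1, whatever pivots the generator happens to pick.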

The code is as follows:

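/** Goal: find the i-th smallest element of an array in expected linear time, O(n) */
#include <iostream>
#include <cstdlib>
#include <ctime>

typedef int T;
using namespace std;

T randomizedSelect(T a[], int start, int end, int i);
int randomizedPartition(T a[], int start, int end);
int partitionArray(T a[], int start, int end);
void swap(T* a, T* b);
void printArray(T* a, int len);
void randomizedQuickSort(T a[], int start, int end);

int main() {
    srand(static_cast<unsigned>(time(NULL)));   // seed the random pivot choice
    T a[10] = {1, 999, -1025, 654, 185, 5, -9, 21, 8, 11};
    cout << "array before sorting: ";
    printArray(a, 10);
    int pos = 3;
    cout << "the " << pos << "-th smallest element: " << randomizedSelect(a, 0, 9, pos) << endl;
    randomizedQuickSort(a, 0, 9);
    cout << "sorted array: ";
    printArray(a, 10);
    return 0;
}

/** return the i-th order statistic (the i-th smallest element) of a[start..end] */
T randomizedSelect(T a[], int start, int end, int i) {
    if (start == end) {
        return a[start];
    }
    int q = randomizedPartition(a, start, end);
    int keyPos = q - start + 1;   // rank of the pivot within a[start..end]
    if (keyPos == i) {
        return a[q];
    } else if (keyPos > i) {
        return randomizedSelect(a, start, q - 1, i);
    } else {
        // the first keyPos elements rank before the answer,
        // so look for the (i - keyPos)-th smallest on the right
        return randomizedSelect(a, q + 1, end, i - keyPos);
    }
}

/** choose a random pivot, swap it with the last element, then partition */
int randomizedPartition(T a[], int start, int end) {
    int r = start + rand() % (end - start + 1);
    swap(a + r, a + end);
    return partitionArray(a, start, end);
}

/** partition a[start..end] around key = a[end]; return the pivot's final index */
int partitionArray(T a[], int start, int end) {
    T key = a[end];
    int i = start - 1;
    int j = start;
    while (j < end) {
        if (a[j] >= key) {
            j++;
            continue;
        } else {
            i++;
            swap(a + i, a + j);
            j++;
        }
    }
    i++;
    swap(a + i, a + j);   // j == end here: move the pivot into place
    return i;
}

/** randomized quicksort of a[start..end] */
void randomizedQuickSort(T a[], int start, int end) {
    if (start < end) {
        int q = randomizedPartition(a, start, end);
        randomizedQuickSort(a, start, q - 1);
        randomizedQuickSort(a, q + 1, end);
    }
}

/** swap two elements */
void swap(T* a, T* b) {
    T tmp = *a;
    *a = *b;
    *b = tmp;
}

/** print array */
void printArray(T* a, int len) {
    for (int i = 0; i < len; i++) {
        cout << a[i] << ' ';
    }
    cout << endl;
}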

Why is the expected running time of the above algorithm O(n) when the elements are distinct? Introduction to Algorithms, third edition, pp. 121-122 gives a detailed mathematical proof, which I will not repeat here. Intuitively: quicksort has expected running time O(n lg n) because after each partition it must recurse into both sides. The selection algorithm recurses into only one side, so it does less work than quicksort; when the elements are distinct, its expected running time is O(n). (The worst case is still O(n^2), if the random pivots happen to be chosen badly every time, but randomization makes this very unlikely.)
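To make the one-side intuition a little more concrete, here is a back-of-the-envelope bound. It is a simplification, not the textbook's proof: assume that partitioning m elements costs at most cm for some constant c, and that every partition leaves at most 3/4 of the current range (a random pivot only achieves this some of the time, which is why the real analysis works with expectations). The total work is then bounded by a geometric series:

```latex
T(n) \le cn + c\cdot\tfrac{3}{4}n + c\left(\tfrac{3}{4}\right)^{2} n + \cdots
     = cn \sum_{k=0}^{\infty} \left(\tfrac{3}{4}\right)^{k}
     = 4cn = O(n)
```

Quicksort, under the same assumption, still has to partition both sides, so each level of its recursion costs about cn in total over O(lg n) levels, giving O(n lg n).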

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
