The Art of Programming for Programmers: Finding the K-th Smallest (Largest) Element in a Given Subscript Range of an Array

Last Update: 2018-12-07 Source: Internet

Author: User


Chapter 3 (Continued, Part 3): Finding the K-th Smallest (Largest) Element in a Given Subscript Range of an Array

Author: July, shangshanruoshui, and the Programming Art Room.
Source: http://blog.csdn.net/v_JULY_v

Prelude

The original Fantasy series has been renamed The Art of Programming for Programmers series, and the original Fantasy writing group has been renamed the Programming Art Room. The Programming Art Room is dedicated to three tasks: 1. for a given problem, continually looking for more efficient algorithms and programs; 2. solving practical problems, such as Chapter 10, how to sort disk files; 3. researching and implementing classical algorithms. The overall emphasis is programming: how to program efficiently to solve practical problems. You are welcome to join us.

OK. The previous chapter, Chapter 10, covered how to sort a disk file containing 10^7 records. This chapter takes up the following topic. In general, finding the k-th smallest (largest) element in a given range is a classic data-structure problem in ACM-style competitive programming. The usual tools are the partition tree and the merge tree, both typically built on top of a segment tree.

We will set those structures aside for now. The subtlety of the partition tree lies in its segment-tree structure, which is hard to grasp without an ACM background. Instead, we present a clever solution that is time-efficient and costs little extra space: the companion array.

You may have read Chapter 6 of The Art of Programming for Programmers, which finds the amicable numbers below 5 million with a companion array (using the array subscripts as the companion). If so, you already have some feel for this method.

Section 1: Finding the K-th Smallest (Largest) Element in a Given Range

How do we find the k-th smallest number within a given range of a given array? For the general methods of finding the k smallest numbers, see The Art of Programming for Programmers, Chapter 3: Finding the Smallest k Numbers. In brief:

1. Sort, then read off. Quicksort sorts the n numbers in ascending order in O(n log n) average time. Then traverse and output the first k elements of the sorted sequence. The total time complexity is O(n log n + k) = O(n log n).

2. Partial selection with an array. Traverse the n numbers, storing the first k of them in an array of size k. Find the largest of these k numbers, Kmax, in O(k) time (finding the extreme of k elements by a selection or exchange pass takes O(k)). Then traverse the remaining n − k numbers and compare each number x with Kmax:
   - If x < Kmax, x replaces Kmax, and the largest of the k elements is found again in O(k) (thanks to jiyeyuran for the correction).
   - If x ≥ Kmax, the array is not updated.
   Each update costs O(k) and each non-update O(1), so the total time complexity is n · O(k) = O(n·k).

3. Maintain a max-heap of k elements. The principle is the same as in approach 2.
   - Store the first k numbers in a max-heap of capacity k, assuming for now that they are the k smallest. Building the heap takes O(k). The elements satisfy k1 < k2 < ... < Kmax, where Kmax is the largest element in the heap and sits at its top.
   - Continue traversing the sequence, comparing each element x with the heap top. If x < Kmax, x replaces the top and the heap is restored in O(log k). Otherwise the heap is left unchanged.
   The total time is O(k + (n − k)·log k) = O(n log k). The gain comes from heap operations costing O(log k), instead of the O(k) a plain array needs in approach 2, which gives O(n·k).

4. Partition, following the solution in The Beauty of Programming, similar to the partition step of quicksort:
   - Store the n numbers in an array S and pick a random element X.
   - Partition S into two parts, Sa and Sb, with Sa ≤ X ≤ Sb.
   - If k < |Sa|, return the k smallest elements of Sa. Otherwise, return all of Sa plus the (k − |Sa|) smallest elements of Sb.
   Recursing this way keeps breaking the problem into smaller ones. The average time complexity is O(n). The Beauty of Programming states O(n log k), which is incorrect; it should be O(n), and it is hereby revised. For the detailed proof, see Chapter 3: Finding the Smallest k Numbers (updated 10 times).

Next we present the companion array solution. First, define a struct with two fields. One holds the array element (data). The other holds its original subscript (num), which records where each number sat in the original array.

Take the following test data as an example. The given subscript range is 2 to 5, which contains the numbers 5, 2, 6, and 3. The first row is the element value and the second row its original subscript. Note: here array subscripts start from 1.

A[i].data  1  5  2  6  3  7  4
A[i].num   1  2  3  4  5  6  7

The question is this: given a subscript range, for example subscripts 2 to 5 of the original sequence (that is, subscripts 2, 3, 4, and 5), find the 3rd smallest number in that range. Equivalently, find the 3rd smallest among the 2nd through 5th numbers of the original sequence, 5 2 6 3. The answer is obviously 5.

Next, sort the original array. The result is as follows (note that each number keeps its original subscript):

A[i].data  1  2  3  4  5  6  7
A[i].num   1  3  5  7  2  4  6

Because the data are now sorted in ascending order, a single scan is enough.
1. Walk from the smallest element to the largest, starting with k = 3.
2. Whenever A[i].num falls inside the original range 2 to 5 (that is, A[i].num is 2, 3, 4, or 5), decrement k.
3. When k reaches 0, the current element is the k-th (here, 3rd) smallest.

The trace below shows the value of k after each element is examined:

A[i].data  1  2  3  4  5  6  7
A[i].num   1  3  5  7  2  4  6
k          3  2  1  1  0

Therefore, the 3rd smallest number within subscripts 2 to 5 is 5.

Program structure and explanation: after sorting, the sequence is in ascending order. During the scan, k is decremented each time a number whose original subscript lies in the given range is encountered. When k reaches 0, we have found the k-th smallest number, and it is guaranteed to be one of the numbers in the original range.
The companion array, that is, the original subscripts, lets the computer remember where each number came from. During the later scan we can therefore tell whether a number in the sorted sequence belongs to the given range. If it does, k is decremented; otherwise k stays unchanged.

Section 2: Implementing the Companion Array Solution

The companion array method above is clever and simple, and it is easy to understand and implement. The key point is that the question asks for the k-th smallest (largest) element in a given subscript range. After an O(n log n) sort, the desired number can therefore always be found in O(n) time. The source code is as follows:

// Copyright @ & July
// The total time complexity is O(n*logn + n) = O(n*logn).
// July, updated 2011.05.28, early morning.
#include <cstdio>
#include <algorithm>
using namespace std;

struct node {
    int num, data;  // num: original subscript; data: element value
    bool operator<(const node &p) const { return data < p.data; }
};
node p[100001];

int main() {
    int n = 7;
    int i, a, b, c;  // c: the k of "k-th smallest", counted down during the scan
    for (i = 1; i <= n; i++) {
        scanf("%d", &p[i].data);
        p[i].num = i;  // remember the original subscript
    }
    sort(p + 1, p + 1 + n);  // library sort, complexity n*logn
    scanf("%d %d %d", &a, &b, &c);  // assumes 1 <= c <= b - a + 1
    for (i = 1; i <= n; i++) {  // scan once, complexity n
        if (p[i].num >= a && p[i].num <= b) c--;
        if (c == 0) break;
    }
    printf("%d\n", p[i].data);
    return 0;
}

Program test: the first input line, 1 5 2 6 3 7 4, is the given array. On the second line, 2 5 is the subscript range 2 to 5, and 3 asks for the 3rd smallest number in that range. The program prints 5, the 3rd smallest number found.

The original code written by water follows. My modification above hard-codes n and handles a single query, so that the O(n) scan stands out.

// Copyright @
#include <cstdio>
#include <algorithm>
using namespace std;

struct node {
    int num, data;  // num: original subscript; data: element value
    bool operator<(const node &p) const { return data < p.data; }
};
node p[100001];

int main() {
    int n, m, i, j, a, b, c;  // n: array size, m: number of queries, c: k
    while (scanf("%d %d", &n, &m) != EOF) {
        for (i = 1; i <= n; i++) {
            scanf("%d", &p[i].data);
            p[i].num = i;
        }
        sort(p + 1, p + 1 + n);  // sort once: O(n*logn)
        for (j = 1; j <= m; j++) {  // each query: one O(n) scan
            scanf("%d %d %d", &a, &b, &c);
            for (i = 1; i <= n; i++) {
                if (p[i].num >= a && p[i].num <= b) c--;
                if (c == 0) break;
            }
            printf("%d\n", p[i].data);
        }
    }
    return 0;
}

Section 3: Sorting the Numbers in the Given Subscript Range Directly

You may have overlooked an important fact. The task is only to find the k-th smallest number within the given subscript range. One drawback of the program above is that it first sorts the entire array and only then uses the companion array to find the k-th smallest. In fact, we do not need to sort the entire array. Sorting just the numbers inside the given subscript range is enough.

Yes, it is that simple. Instead of sorting the whole array, sort only the numbers within the given subscripts. The time complexity then drops to O(L log L), where L = b − a + 1 is the length of the given subscript range and L ≤ n, the length of the whole array. The code is as follows.

// copyright @ wolf
// Sort the values in the given range directly, without a companion array.
#include <cstdio>
#include <algorithm>
using namespace std;

struct node {
    int data;
    bool operator<(const node &p) const { return data < p.data; }
};
node p[100001];

int main() {
    int n = 7;
    int i, a, b, c;  // c: the k of "k-th smallest"
    for (i = 1; i <= n; i++) {
        scanf("%d", &p[i].data);
    }
    scanf("%d %d %d", &a, &b, &c);  // a, b: subscript range of the original array
    sort(p + a, p + b + 1);  // sort only the given range: (b-a+1)*log(b-a+1)
    printf("the number is %d\n", p[a - 1 + c].data);  // c-th element of the sorted range
    return 0;
}

Program test: we use the same test case as in Section 2. The first input line, 1 5 2 6 3 7 4, is the given array. On the second line, 2 5 is the subscript range 2 to 5, and 3 asks for the 3rd smallest number among a[2] to a[5]. The program prints "the number is 5", the 3rd smallest number found.

At first glance, sorting the numbers in the given range directly looks more efficient than the companion array solution of Section 2. Is the companion array then unnecessary? Not so.

@water raises this objection: if I sort the data between subscripts 2 and 5 in place, the original data are destroyed, so how can a second query be answered? The value now at position 2 is no longer the value originally at position 2. In other words, sorting in place and then locating by subscript works only once.

OK. For more on this, see the "Competition" and "Classic dialogue" parts of Section 4.

Section 4: Advantages of the Companion Array

Competition

@Yu Xiang: The companion array is indeed a novel approach. Its precondition is a sort, though, so the total complexity is still O(n log n + n) = O(n log n). Questions of this kind, asking for the k-th smallest number, often come with two restrictions:
1. The data may be too large to fit in memory.
2. The complexity requirement may be strict, so sorting is not allowed.
Even sorting only the given range, as in Section 3, costs O(L log L), where L is the length of the range. To find the k-th smallest (largest) element among the numbers in a given subscript range, a heap is the better choice. Building on approach 3 above, check each element's subscript before it enters the heap, and skip the element if the subscript lies outside the given range. The cost stays low, O(n log k), and no sorting is needed. The quick select algorithm also works in O(n) average time, though it is less commonly used. For details, see Chapter 3 continued: in-depth analysis and implementation of the quick select algorithm.
@water: The companion array trades a preprocessing cost for cheaper searches. Sorting the numbers in the given subscripts directly is acceptable for small data volumes. For large volumes, however, such as 10 GB of data from which we take a 1 GB segment each time, the companion array shows its advantage, although the preprocessing overhead is indeed significant. The essence of the companion array is to serve repeated accesses to the same data over a stable period. Put plainly, it pays off when we keep looking for the k-th smallest number in different subscript ranges of one and the same array. For a related problem, see http://poj.org/problem?id=2104.
@July: No need to look at me; I basically agree with the views above. Yu Xiang considers the companion array undesirable because he did not take water's point into account: searching for the k-th smallest number in different subscript ranges of the same array many times, or continually. That is exactly where the companion array shows its advantage. OK, read on to the classic dialogue below; I believe you will find the answer you want there.

Classic dialogue

Find the k-th smallest in a[0] to a[n−1], then in a[1] to a[n], and so on, returning each answer in turn. Once you query several times, the advantage becomes obvious. Compare the costs: the companion array solution costs n log n + m·n, where m is the number of different ranges queried, while direct sorting costs m·(L log L), where L is the length of the given subscript range. In the worst case, the companion array costs n log n + m(n − 1), while direct sorting costs m(n − 1)·log(n − 1). Approximating both worst cases with ranges of length n gives

direct sorting time − companion time = m·n·log n − n·log n − m·n = (m − 1)·n·log n − m·n.

This difference is positive whenever m ≥ 2 and n ≥ 8 (base-2 logarithm). For a single query (m = 1) it equals −n, so direct sorting is slightly cheaper. Conclusion: when the k-th smallest number must be found repeatedly over different subscript ranges of a large data set, the companion array beats sorting the given range directly each time.
Exactly. Suppose, for example, that I give you yet another subscript range and ask for the k-th smallest number; you cannot sort every time. In the companion array scheme, the companion array records the original subscript of each number. The second time we look for the k-th smallest number in a different range, we only need to scan once more, still in O(n). For m different ranges the query cost is therefore m·n. Adding the n log n of the initial sort gives a total time complexity of O(n log n + m·n), where m is the number of given ranges. Sorting each given range directly costs L1 log L1 + L2 log L2 + ... + Lm log Lm. For m ≥ 2 and ranges comparable in length to n, comparing the two shows at a glance the great advantage of the companion array.
Here is a practical example. Suppose our web pages receive more than a million clicks every day from n common sources. Every day we must analyze the click sources for each time period, for the week, and even for the whole month. The database holds a huge amount of data, so copying it is expensive and so is sorting it in memory. Producing such statistics is painful enough; sorting the data every time would be hopeless.

Revisiting the Original Example

After all this talk, the idea may still not be clear, so let's return to the example from Section 1: find the 3rd smallest number in subscript range 2 to 5. We have two options:
1. the companion array described in Sections 1 and 2;
2. sorting the numbers in subscripts 2 to 5 directly.
Below we review only the companion array solution.

Companion array

A[i].data  1  5  2  6  3  7  4
A[i].num   1  2  3  4  5  6  7

After the first sorting:

A[i].data  1  2  3  4  5  6  7
A[i].num   1  3  5  7  2  4  6

Scan using the companion array (range 2 to 5, k = 3):

A[i].data  1  2  3  4  5  6  7
A[i].num   1  3  5  7  2  4  6
k          3  2  1  1  0

Now suppose the question asks for the 3rd smallest number within subscripts 3 to 6 of the original array (the answer is obviously 6):

A[i].data  1  5  2  6  3  7  4
A[i].num   1  2  3  4  5  6  7

Should we sort directly? Some readers may still choose to sort the numbers within subscripts 3 to 6 directly. But notice what happens each time you sort the numbers of a different range. You destroy the original data, and if the ranges overlap, you can no longer locate the original data by their original subscripts. On top of that, every sort costs O(L log L), up to n log n per query. As described in the classic dialogue above, the overhead adds up to L1 log L1 + L2 log L2 + ... + Lm log Lm, which becomes very large.
What if we use the companion array instead? As the trace below shows, when k reaches 0 we again find 6, the 3rd smallest number. After the one-time initial sort, each different subscript range needs only one O(n) scan. The total complexity is O(n log n + m·n), where m is the number of given ranges.
From the classic dialogue above, let m be the number of different ranges and n the array size. Then direct sorting time − companion time = m·n·log n − n·log n − m·n = (m − 1)·n·log n − m·n. This is positive for every m ≥ 2 once n ≥ 8 (base-2 logarithm). I trust you now see the point.

Companion array

After the first sorting:

A[i].data  1  2  3  4  5  6  7
A[i].num   1  3  5  7  2  4  6

A second O(n) scan (range 3 to 6, k = 3):

A[i].data  1  2  3  4  5  6  7
A[i].num   1  3  5  7  2  4  6
k          3  2  1  1  1  0

(Some readers previously missed the point of the companion array because they only imagined searching the array once, not a second time or many times.)

Programming monologue

Given 40 minutes, you can think for 10, write code for 30, and end up wasting time on debugging. Or you can think for half an hour, thoroughly understand the nature of the problem and the structure of the program, then write the code in ten minutes and feel it flow like clouds and water.

This chapter is complete.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
