Preface:
Before looking at how MapReduce, Hive, Redis, Storm, Spark, and other tools solve the grouped top-N problem, let's start with plain Java implementations of top N. They lay the foundation for the later articles, which is why I am organizing this into a series.
Depending on the characteristics of the data, the top-N problem can be solved in four ways: the merge method, the quicksort partition method (quickselect), a min-heap or max-heap, and a fixed-capacity PriorityQueue.
Merge method:
Data description: This method suits finding the top K across several arrays that are already sorted.
Implementation description: As in a merge, keep a temporary array that records the current scan position (subscript) in each input array, plus a result array for the top K. In each round, compare the elements at the recorded positions of all the arrays, store the largest in the result array, and advance the recorded position of the array it came from by one so that the next round starts from its next element. Time complexity is O(k*m) (m: the number of arrays).
Specific implementation:
package fjdm;

import java.util.ArrayList;
import java.util.List;

/**
 * Given m arrays, each sorted in descending order, find the k largest values overall.
 * a[4,3,2,1], b[6,5,3,1] -> result[6,5,4]
 * @author Zhang Enbei
 * @date 2016-11-25 10:57:03 a.m.
 */
public class TopKByMerge {
    public static int[] getTopK(List<List<Integer>> input, int k) {
        int[] index = new int[input.size()]; // current scan position in each array
        int[] result = new int[k];
        for (int i = 0; i < k; i++) {
            int max = Integer.MIN_VALUE;
            int maxIndex = 0;
            // find the largest value among the current heads of all arrays
            for (int j = 0; j < input.size(); j++) {
                if (index[j] < input.get(j).size()) {
                    if (max < input.get(j).get(index[j])) {
                        max = input.get(j).get(index[j]);
                        maxIndex = j;
                    }
                }
            }
            if (max == Integer.MIN_VALUE) { // every array is exhausted
                return result;
            }
            result[i] = max;
            index[maxIndex] += 1; // advance the array the maximum came from
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> a = new ArrayList<Integer>();
        a.add(4);
        a.add(3);
        a.add(2);
        a.add(1);
        List<Integer> b = new ArrayList<Integer>();
        b.add(6);
        b.add(5);
        b.add(3);
        b.add(1);
        List<List<Integer>> ab = new ArrayList<List<Integer>>();
        ab.add(a);
        ab.add(b);
        int[] r = getTopK(ab, 3);
        for (int i = 0; i < r.length; i++) {
            System.out.println(r[i]);
        }
    }
}
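The linear scan above costs O(k*m) because every round inspects the head of all m arrays. As a variant that is not part of the original code, the current heads can be kept in a java.util.PriorityQueue, so each round costs O(log m) instead. The sketch below assumes the same descending-sorted input; the class name TopKByHeapMerge and its getTopK helper are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

class TopKByHeapMerge {
    // Hypothetical helper, not from the original article.
    // Assumes every inner list is sorted in descending order.
    static int[] getTopK(List<List<Integer>> input, int k) {
        // Each entry is {value, listIndex, positionInList}; the largest value is polled first.
        PriorityQueue<int[]> heap =
                new PriorityQueue<>((x, y) -> Integer.compare(y[0], x[0]));
        for (int j = 0; j < input.size(); j++) {
            if (!input.get(j).isEmpty()) {
                heap.add(new int[] {input.get(j).get(0), j, 0});
            }
        }
        int[] result = new int[k];
        int count = 0;
        while (count < k && !heap.isEmpty()) {
            int[] top = heap.poll();
            result[count++] = top[0];
            // Push the next element of the list the maximum came from.
            int next = top[2] + 1;
            List<Integer> source = input.get(top[1]);
            if (next < source.size()) {
                heap.add(new int[] {source.get(next), top[1], next});
            }
        }
        // Unlike the listing above, return only the values found when fewer than k exist.
        return Arrays.copyOf(result, count);
    }

    public static void main(String[] args) {
        List<List<Integer>> ab = new ArrayList<>();
        ab.add(Arrays.asList(4, 3, 2, 1));
        ab.add(Arrays.asList(6, 5, 3, 1));
        System.out.println(Arrays.toString(getTopK(ab, 3))); // prints [6, 5, 4]
    }
}
```

For two arrays the difference is negligible; the heap only pays off when m is large.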
Quicksort partition method (quickselect):
Data description: Suitable for a single unordered array; it uses the partition step of quicksort to find the top K.
Implementation steps: Following the quicksort rule, choose one number as the pivot (the code uses the last number) and partition the data around it, recording the subscript where the pivot finally lands. Then compare that subscript with k-1 (subscripts start from 0); if they are not equal, continue partitioning on the side that contains position k-1:
If the subscript is less than k-1, continue partitioning the right side (partion+1, end);
If the subscript is greater than k-1, continue partitioning the left side (first, partion-1);
Stop when the pivot lands at k-1: the first k numbers are then the top K, although they are not in sorted order among themselves.
Specific implementation:
package fjdm;

/**
 * Use the quicksort partition step to find the K smallest numbers.
 * @author Zhang Enbei
 * @date 2016-11-25 11:59:45 a.m.
 */
public class TopK {
    // Partition a[first..end] around the last element and return the pivot's final subscript.
    int partion(int a[], int first, int end) {
        int i = first;
        int main = a[end];
        for (int j = first; j < end; j++) {
            if (a[j] < main) {
                int temp = a[j];
                a[j] = a[i];
                a[i] = temp;
                i++;
            }
        }
        a[end] = a[i];
        a[i] = main;
        return i;
    }

    // Rearrange a so that a[0..k-1] holds the k smallest numbers (in no particular order).
    void getTopKMinBySort(int a[], int first, int end, int k) {
        if (first < end) {
            int partionIndex = partion(a, first, end);
            if (partionIndex == k - 1) return;
            else if (partionIndex > k - 1) getTopKMinBySort(a, first, partionIndex - 1, k);
            else getTopKMinBySort(a, partionIndex + 1, end, k);
        }
    }

    public static void main(String[] args) {
        int a[] = {2, 20, 3, 7, 9, 1, 17, 18, 0, 4};
        int k = 6;
        new TopK().getTopKMinBySort(a, 0, a.length - 1, k);
        for (int i = 0; i < k; i++) {
            System.out.print(a[i] + " ");
        }
    }
}
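The class above collects the k smallest numbers. For the k largest, the same partition idea works with the comparison flipped so that larger values move to the left. The following is a minimal sketch under that assumption, written iteratively instead of recursively; TopKLargestBySelect and its methods are hypothetical names, not from the original code.

```java
import java.util.Arrays;

class TopKLargestBySelect {
    // Partition a[first..end] around the last element so that larger values end up on the left.
    static int partitionDescending(int[] a, int first, int end) {
        int pivot = a[end];
        int i = first;
        for (int j = first; j < end; j++) {
            if (a[j] > pivot) {
                int temp = a[j];
                a[j] = a[i];
                a[i] = temp;
                i++;
            }
        }
        a[end] = a[i];
        a[i] = pivot;
        return i;
    }

    // Moves the k largest values of a into a[0..k-1], in no particular order.
    static void selectLargest(int[] a, int k) {
        int first = 0;
        int end = a.length - 1;
        while (first < end) {
            int p = partitionDescending(a, first, end);
            if (p == k - 1) {
                return;
            } else if (p > k - 1) {
                end = p - 1;
            } else {
                first = p + 1;
            }
        }
    }

    // Returns the k largest values in ascending order without modifying the input.
    static int[] topK(int[] input, int k) {
        if (k < 0 || k > input.length) {
            throw new IllegalArgumentException("k out of range");
        }
        int[] a = input.clone();
        selectLargest(a, k);
        int[] top = Arrays.copyOf(a, k);
        Arrays.sort(top); // selection leaves the k values unordered, so sort them here
        return top;
    }

    public static void main(String[] args) {
        int[] a = {2, 20, 3, 7, 9, 1, 17, 18, 0, 4};
        System.out.println(Arrays.toString(topK(a, 3))); // prints [17, 18, 20]
    }
}
```

Sorting only the selected k values adds O(k log k), which is small when k is much smaller than the array.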
Min-heap or max-heap method:
Data description: Use a min-heap to find the largest K, and a max-heap to find the smallest K.
Implementation steps: Build a min-heap from the first K elements of the data, then scan the remaining N-K elements:
If an element is greater than the root of the min-heap, overwrite the root with it and sift it down to restore the min-heap.
If an element is less than or equal to the root of the min-heap, the heap does not change.
Finding the smallest K mirrors this with a max-heap. Time complexity is O(N log K) (N: the length of the data). Only K elements are held at a time, so this method suits finding the top K of data too large to fit in memory.
Specific implementation:
package fjdm;

/**
 * Find the K largest values with a min-heap (a heap is especially suitable when the
 * data is too large to fit in memory).
 * @author Zhang Enbei
 * @date 2016-11-25 12:15:36 p.m.
 */
public class TopKByHeap {
    /**
     * Build a min-heap from the first k elements.
     *
     * @param a
     * @param k
     * @return
     */
    int[] createHeap(int a[], int k) {
        int[] result = new int[k];
        for (int i = 0; i < k; i++) {
            result[i] = a[i];
        }
        for (int i = 1; i < k; i++) {
            int child = i;
            int parent = (i - 1) / 2;
            int temp = a[i];
            // sift the new element up while its parent is larger
            while (parent >= 0 && child != 0 && result[parent] > temp) {
                result[child] = result[parent];
                child = parent;
                parent = (parent - 1) / 2;
            }
            result[child] = temp;
        }
        return result;
    }

    // Replace the root with value, then sift it down to restore the min-heap.
    void insert(int a[], int value) {
        a[0] = value;
        int parent = 0;
        while (parent < a.length) {
            int lchild = 2 * parent + 1;
            int rchild = 2 * parent + 2;
            int minIndex = parent;
            if (lchild < a.length && a[parent] > a[lchild]) {
                minIndex = lchild;
            }
            if (rchild < a.length && a[minIndex] > a[rchild]) {
                minIndex = rchild;
            }
            if (minIndex == parent) {
                break;
            } else {
                int temp = a[parent];
                a[parent] = a[minIndex];
                a[minIndex] = temp;
                parent = minIndex;
            }
        }
    }

    int[] getTopKByHeap(int input[], int k) {
        int heap[] = this.createHeap(input, k);
        for (int i = k; i < input.length; i++) {
            if (input[i] > heap[0]) {
                this.insert(heap, input[i]);
            }
        }
        return heap;
    }

    public static void main(String[] args) {
        int a[] = {4, 3, 5, 1, 2, 8, 9, 10};
        int result[] = new TopKByHeap().getTopKByHeap(a, 3);
        for (int temp : result) {
            System.out.println(temp);
        }
    }
}
PriorityQueue (priority queue) method:
Data description: PriorityQueue is a class added in JDK 1.5: an unbounded priority queue based on a priority heap. Unlike a first-in-first-out queue, each removal takes the element at the head, which is the least element under the queue's ordering. If no Comparator is provided, elements are ordered by their natural ordering, so the smallest number is at the head and strings are in dictionary order (see Comparable); alternatively the order can be given by a Comparator, depending on which constructor is used. A priority queue does not allow null elements. A priority queue that relies on natural ordering also does not allow inserting objects that are not comparable (doing so can cause a ClassCastException).
Implementation steps: Build a fixed-capacity priority queue on top of PriorityQueue that simulates a max-heap. PriorityQueue itself is backed by an array and has no capacity limit, so the wrapper specifies the maximum length and the comparison rule. Data is then pushed in one element at a time; once the queue is full, a new element smaller than the head replaces the head (the largest value kept), so the queue always holds the N smallest values. Finally, note that iterating over a PriorityQueue does not visit elements in sorted order; to read them in order, sort a copy or call the built-in poll() method repeatedly to take the head each time.
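The ordering rules described above can be checked with a few lines. This is a minimal sketch using only java.util classes; PriorityQueueOrderDemo and its drain helper are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

class PriorityQueueOrderDemo {
    // Hypothetical helper: empties the queue with poll() and returns the order seen.
    static List<Integer> drain(PriorityQueue<Integer> queue) {
        List<Integer> out = new ArrayList<>();
        while (!queue.isEmpty()) {
            out.add(queue.poll());
        }
        return out;
    }

    public static void main(String[] args) {
        // Natural ordering: the smallest element sits at the head.
        PriorityQueue<Integer> minFirst = new PriorityQueue<>();
        minFirst.addAll(Arrays.asList(5, 1, 4, 2, 3));
        System.out.println(drain(minFirst)); // prints [1, 2, 3, 4, 5]

        // With a reversed Comparator the largest element sits at the head.
        PriorityQueue<Integer> maxFirst = new PriorityQueue<>(Collections.reverseOrder());
        maxFirst.addAll(Arrays.asList(5, 1, 4, 2, 3));
        System.out.println(drain(maxFirst)); // prints [5, 4, 3, 2, 1]

        // null elements are rejected with a NullPointerException.
        try {
            minFirst.add(null);
        } catch (NullPointerException e) {
            System.out.println("null rejected");
        }
    }
}
```

Only poll() is guaranteed to return elements in priority order; the iterator may visit them in any order.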
Specific implementation:
package fjdm;

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Random;

/**
 * Fixed-capacity priority queue that simulates a max-heap; used to find the N smallest values.
 * @author Zhang Enbei
 * @date 2016-11-25 02:29:31 p.m.
 */
public class FixSizedPriorityQueue<E extends Comparable<E>> {
    private PriorityQueue<E> queue;
    private int maxSize; // maximum capacity of the heap

    public FixSizedPriorityQueue(int maxSize) {
        if (maxSize <= 0) throw new IllegalArgumentException();
        this.maxSize = maxSize;
        this.queue = new PriorityQueue<E>(maxSize, new Comparator<E>() {
            public int compare(E o1, E o2) {
                // For a max-heap compare o2 with o1; for a min-heap compare o1 with o2
                // and also change the e.compareTo(peek) rule in add()
                return (o2.compareTo(o1));
            }
        });
    }

    public void add(E e) {
        if (queue.size() < maxSize) { // capacity not reached yet: add directly
            queue.add(e);
        } else { // queue is full
            E peek = queue.peek();
            if (e.compareTo(peek) < 0) { // compare the new element with the heap top, keep the smaller
                queue.poll();
                queue.add(e);
            }
        }
    }

    public List<E> sortedList() {
        List<E> list = new ArrayList<E>(queue);
        Collections.sort(list); // PriorityQueue traversal is unordered, so sort the copy
        return list;
    }

    public static void main(String[] args) {
        final FixSizedPriorityQueue<Integer> pq = new FixSizedPriorityQueue<Integer>(10);
        Random random = new Random();
        int rNum = 0;
        System.out.println("100 random numbers between 0~999:-----------------------------------");
        for (int i = 1; i <= 100; i++) {
            rNum = random.nextInt(1000);
            System.out.println(rNum);
            pq.add(rNum);
        }
        System.out.println("Traversal of the PriorityQueue itself is unordered:-----------------------------------");
        Iterable<Integer> iter = new Iterable<Integer>() {
            public Iterator<Integer> iterator() {
                return pq.queue.iterator();
            }
        };
        for (Integer item : iter) {
            System.out.print(item + ",");
        }
        System.out.println();
        System.out.println("Traversal after PriorityQueue sort:-----------------------------------");
        /*
        for (Integer item : pq.sortedList()) {
            System.out.println(item);
        }
        */
        // or use the built-in poll() method directly, taking the head (the heap-top maximum) each time
        while (!pq.queue.isEmpty()) {
            System.out.print(pq.queue.poll() + ",");
        }
    }
}
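The class above keeps the N smallest values by simulating a max-heap, and its comparator comment notes that the opposite ordering gives a min-heap. As a rough sketch of that opposite case, the N largest values can be kept in a PriorityQueue with its default natural ordering, replacing the head whenever a larger value arrives. TopNLargest and its largest helper are hypothetical names, not from the original code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

class TopNLargest {
    // Hypothetical helper: returns the n largest values of data in descending order.
    static List<Integer> largest(int[] data, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        // Natural ordering makes this a min-heap: the head is the smallest value kept.
        PriorityQueue<Integer> heap = new PriorityQueue<>(n);
        for (int value : data) {
            if (heap.size() < n) {
                heap.add(value);
            } else if (value > heap.peek()) {
                // The new value beats the smallest kept value, so swap it in.
                heap.poll();
                heap.add(value);
            }
        }
        List<Integer> result = new ArrayList<>(heap);
        Collections.sort(result, Collections.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        int[] a = {4, 3, 5, 1, 2, 8, 9, 10};
        System.out.println(largest(a, 3)); // prints [10, 9, 8]
    }
}
```

The argument passed to the PriorityQueue constructor is only an initial capacity, so the size check in the loop is what keeps the queue at n elements.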