There are 10 million SMS messages saved in a text file, one per line, with duplicates. How do you find the 10 most frequently repeated messages?
You can use a hash table, building it while scanning, and divide the 10 million messages into groups if necessary. In the first scan, take the first byte, the last byte, and two arbitrary bytes from the middle of each message as its hash code and insert it into the hash table, recording the message's address, length, and repeat count. For 10 million messages, these records fit in memory. Messages with the same hash code and the same length are suspected duplicates and are compared in full. An identical message is inserted into the hash table only once; each further occurrence just increments its repeat count by 1. After one scan, all repeat counts are recorded, and a second pass processes the hash table: linear-time selection finds the top 10 in O(n). If the data was split into groups, the same message must never appear in the top 10 of two different groups. This can be guaranteed by the hash function, or by assigning messages to groups directly by the range of their hash value.
Linear time Selection:
Http://www.ezloo.com/2008/04/randomizedselect.html
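A minimal sketch of the counting idea, not the exact scheme described above: it swaps the hand-built hash code for Java's HashMap and swaps linear-time selection for a min-heap of size k, and it assumes all distinct messages fit in memory. The names TopRepeats and topK are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

class TopRepeats {
    // Returns up to k distinct lines with their counts, most frequent first.
    static List<Map.Entry<String, Integer>> topK(Iterable<String> lines, int k) {
        // Pass 1: count the occurrences of each distinct line.
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            counts.merge(line, 1, Integer::sum);
        }
        // Pass 2: keep the k largest counts in a min-heap ordered by count.
        PriorityQueue<Map.Entry<String, Integer>> heap =
                new PriorityQueue<>((x, y) -> Integer.compare(x.getValue(), y.getValue()));
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            heap.offer(e);
            if (heap.size() > k) {
                heap.poll(); // drop the entry with the smallest count
            }
        }
        // Order the survivors from most to least frequent.
        List<Map.Entry<String, Integer>> result = new ArrayList<>(heap);
        result.sort((x, y) -> Integer.compare(y.getValue(), x.getValue()));
        return result;
    }
}
```

Calling TopRepeats.topK(lines, 10) on the lines of the file gives the ten most repeated messages. The heap step costs O(n log k) over n distinct messages, which for k = 10 is close to the O(n) of linear-time selection.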
What is a trie tree?
A trie, also called a word search tree or prefix tree, is a tree structure and a variant of the hash tree. It is typically used to count, sort, and store large numbers of strings (though not only strings), so search engines often use it for word-frequency statistics over text.
Http://www.cnblogs.com/huangxincheng/archive/2012/11/25/2788268.html
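A minimal sketch of a trie used for word-frequency counting, assuming child links are kept in a HashMap per node. The class and method names (WordTrie, insert, frequency) are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class WordTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        int count; // how many inserted words end at this node
    }

    private final Node root = new Node();

    // Inserts a word, walking or creating one node per character.
    void insert(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            node = node.children.computeIfAbsent(word.charAt(i), c -> new Node());
        }
        node.count++;
    }

    // Returns how many times exactly this word was inserted (0 if never).
    int frequency(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            node = node.children.get(word.charAt(i));
            if (node == null) {
                return 0;
            }
        }
        return node.count;
    }
}
```

Words that share a prefix share the nodes for that prefix, so both insertion and lookup take time proportional to the length of the word, not to the number of words stored.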
Comparison of the sorting algorithms?
[Image: comparison table of the sorting algorithms, http://img.ph.126.net/tdDjoEILparMm3Wl3_MWHw==/632192797708930620.png]
Stability of the sorting algorithm?
Suppose the sequence of records to be sorted contains several records with the same key. If sorting leaves the relative order of those records unchanged, that is, if Ri = Rj and Ri comes before Rj in the original sequence, and Ri still comes before Rj in the sorted sequence, the sorting algorithm is said to be stable. Otherwise it is unstable.
Heap sort, quicksort, Shell sort, and straight selection sort are not stable sorting algorithms, while radix sort, bubble sort, straight insertion sort, binary insertion sort, and merge sort are stable.
Shell sort: http://www.cnblogs.com/kkun/archive/2011/11/23/2260288.html
Merge sort: http://blog.csdn.net/morewindows/article/details/6678165
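A small sketch of what stability means in practice. It relies on Collections.sort, which the Java documentation guarantees to be stable. The class name StabilityDemo and the record layout {key, originalPosition} are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class StabilityDemo {
    // Each record is {key, originalPosition}; only the key takes part in the comparison.
    static List<int[]> sortByKey(List<int[]> records) {
        List<int[]> copy = new ArrayList<>(records);
        Collections.sort(copy, Comparator.comparingInt((int[] r) -> r[0]));
        return copy;
    }
}
```

For the input {2,0}, {1,1}, {2,2}, {1,3}, a stable sort returns {1,1}, {1,3}, {2,0}, {2,2}: records with equal keys keep their original order. An unstable algorithm would be free to return {1,3} before {1,1}.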
There are 20 arrays, each holding 500 elements sorted in descending order, and each element is a 32-bit number. Find the 500 largest of these 10,000 numbers.
Take the first number from each of the 20 arrays, record which array each number came from, and build a max-heap of 20 elements.
The top of the heap is now the largest number. Remove it, insert the next number from the array it came from, and take the maximum again. Repeat this 500 times.
Time complexity: about 500 × log2(20), since each of the 500 extractions costs O(log2 20) on a 20-element heap.
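The steps above can be sketched with java.util.PriorityQueue. This is one possible way to write it, assuming every input array is already sorted in descending order. The names TopOfSortedArrays and largest are made up for illustration.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

class TopOfSortedArrays {
    // Returns the k largest values from arrays that are each sorted in descending order.
    static int[] largest(int[][] arrays, int k) {
        // A heap entry is {arrayIndex, position}; entries are ordered by the value
        // they point at, largest first, so the heap behaves as a max-heap.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
                (x, y) -> Integer.compare(arrays[y[0]][y[1]], arrays[x[0]][x[1]]));
        for (int i = 0; i < arrays.length; i++) {
            if (arrays[i].length > 0) {
                heap.offer(new int[] {i, 0}); // first (largest) element of each array
            }
        }
        int[] result = new int[k];
        int n = 0;
        while (n < k && !heap.isEmpty()) {
            int[] top = heap.poll();
            result[n++] = arrays[top[0]][top[1]];
            // Refill from the same source array, if it has anything left.
            if (top[1] + 1 < arrays[top[0]].length) {
                heap.offer(new int[] {top[0], top[1] + 1});
            }
        }
        return Arrays.copyOf(result, n);
    }
}
```

The heap never holds more than one entry per array, so with 20 arrays each poll and offer costs O(log 20) no matter how long the arrays are.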
How does the Euclidean algorithm work? What is its time complexity?
Let the two numbers be a and b (a > b). The greatest common divisor (a, b) is found as follows: divide a by b to get a ÷ b = q1 ... r1 (0 ≤ r1 < b). If r1 = 0, then (a, b) = b. If r1 ≠ 0, divide b by r1 to get b ÷ r1 = q2 ... r2 (0 ≤ r2 < r1). If r2 = 0, then (a, b) = r1. If r2 ≠ 0, continue by dividing r1 by r2, and so on until the division is exact. The last nonzero divisor is (a, b).
Time complexity: O(log n), where n is the smaller of the two numbers.
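A short sketch of the repeated-division procedure, assuming both inputs are positive. The class name Euclid is made up for illustration. For 48 and 18 the divisions are 48 ÷ 18 = 2 ... 12, 18 ÷ 12 = 1 ... 6, 12 ÷ 6 = 2 ... 0, so the last nonzero divisor, 6, is the result.

```java
class Euclid {
    // Greatest common divisor of two positive numbers by repeated division.
    static long gcd(long a, long b) {
        while (b != 0) {
            long r = a % b; // remainder of the current division
            a = b;          // the old divisor becomes the new dividend
            b = r;          // the remainder becomes the new divisor
        }
        return a; // the last nonzero divisor
    }
}
```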
String s = "a" + "B" + "C" + "D" + "E"; how many objects are created?
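One. Every operand is a string constant, so the compiler folds the whole concatenation into a single string literal at compile time, and only that literal is stored in the constant pool.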
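A small sketch that checks this with reference comparison. It relies on two rules of the Java Language Specification: constant string expressions are folded and interned, while concatenation involving a non-constant variable creates a new String at run time. The class and method names are made up for illustration.

```java
class ConstantFolding {
    // All operands are literals, so the compiler folds them into the literal "aBCDE".
    static boolean foldedAtCompileTime() {
        String s = "a" + "B" + "C" + "D" + "E";
        return s == "aBCDE"; // same interned object on both sides
    }

    // 'first' is not declared final, so the concatenation happens at run time.
    static boolean builtAtRuntime() {
        String first = "a";
        String t = first + "BCDE"; // creates a new String object
        return t == "aBCDE";
    }
}
```

foldedAtCompileTime() returns true and builtAtRuntime() returns false. The == operator is used here only to compare object identity; use equals to compare string contents.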
What does the following program output?
String str = "ABCDEFGH";
System.out.println(str.substring(3, 5));
Answer: "DE". substring(3, 5) returns the characters at indices 3 and 4; the end index is exclusive.
What's the difference between StringBuffer and StringBuilder?
StringBuilder, introduced in Java 5, is not thread-safe, but it performs better than StringBuffer in a single thread. StringBuffer's methods are synchronized, which makes it thread-safe but slower.
A Java char holds one 16-bit UTF-16 code unit, so it can be assigned any character in Unicode's Basic Multilingual Plane: Chinese characters, English letters, and special symbols are all allowed.
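A small sketch of the 16-bit limit, using Unicode escapes so the source stays ASCII. The class name CharDemo and its helper methods are made up for illustration. A Chinese character such as U+4E2D fits in one char, while a character outside the Basic Multilingual Plane, such as U+1F600, needs two.

```java
class CharDemo {
    // The Chinese character U+4E2D fits in a single char.
    static char han() {
        return '\u4E2D';
    }

    // Number of 16-bit char values used to store the string.
    static int utf16Units(String s) {
        return s.length();
    }

    // Number of Unicode code points (characters as a reader would count them).
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }
}
```

For the string "\uD83D\uDE00", which is the surrogate pair encoding U+1F600, utf16Units returns 2 and codePoints returns 1.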
Java Programmer Interview Book (Data structure section)