http://stackoverflow.com/questions/185697/the-most-efficient-way-to-find-top-k-frequent-words-in-a-big-word-sequence
http://www.geeksforgeeks.org/find-the-k-most-frequent-words-from-a-file/
http://cs.stackexchange.com/questions/26427/word-frequency-with-ordering-in-on-complexity
The idea is roughly as follows:
(1) Use a hash table to count word occurrences, then find the top K. The top-K step can use a size-K heap (O(N log K)), quickselect, or bucket sort (similar to the earlier real-time sorting implementation in FBT);
(2) Use a trie to count the words, then traverse the trie and apply the heap idea to find the top K;
(3) Use buckets, which works especially well when the maximum occurrence count is known (similar to the earlier FBT real-time sorting implementation), then collect the top K from the largest count downward;
(4) Use MapReduce.
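Idea (1) can be sketched as follows; this is a minimal illustration (the function name and sample input are my own), using Python's `Counter` as the hash table and `heapq.nlargest`, which keeps a size-K heap while scanning all entries:

```python
from collections import Counter
import heapq

def top_k_frequent(words, k):
    """Count word occurrences with a hash table, then select the
    top K via a size-K min-heap: O(N) counting + O(N log K) selection."""
    counts = Counter(words)  # hash-table word counts
    # heapq.nlargest scans all (word, count) pairs, keeping only K in the heap
    return heapq.nlargest(k, counts.items(), key=lambda kv: kv[1])

words = "a b a c b a d c a b".split()
print(top_k_frequent(words, 2))  # [('a', 4), ('b', 3)]
```

`Counter.most_common(k)` does essentially the same thing internally, so in practice the whole approach is a one-liner.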
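Idea (2) might look like the sketch below (names and the `"#count"` sentinel key are my own assumptions): counts are stored at word-end nodes of a trie, and a size-K min-heap is maintained during the traversal:

```python
import heapq

def top_k_with_trie(words, k):
    """Count words in a dict-based trie (one node per prefix, a counter at
    each word end), then traverse it keeping the top K counts in a heap."""
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        # "#count" is a sentinel key marking a word end; assumes no word
        # actually contains the character '#'
        node["#count"] = node.get("#count", 0) + 1

    heap = []  # min-heap of (count, word), capped at size k

    def walk(node, prefix):
        for key, child in node.items():
            if key == "#count":
                heapq.heappush(heap, (child, prefix))
                if len(heap) > k:
                    heapq.heappop(heap)  # evict the smallest count
            else:
                walk(child, prefix + key)

    walk(root, "")
    return sorted(heap, reverse=True)  # largest count first

words = "a b a c b a d c a b".split()
print(top_k_with_trie(words, 2))  # [(4, 'a'), (3, 'b')]
```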
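Idea (3) could be sketched like this (again with assumed names): since a word can occur at most N times, buckets are indexed by occurrence count and walked from the highest count downward:

```python
from collections import Counter

def top_k_by_bucket(words, k):
    """Bucket approach: bucket index = occurrence count (at most N),
    then walk buckets from the highest count down to collect the top K."""
    counts = Counter(words)
    buckets = [[] for _ in range(len(words) + 1)]  # count -> words with that count
    for word, c in counts.items():
        buckets[c].append(word)
    result = []
    for c in range(len(buckets) - 1, 0, -1):  # from largest count down
        for word in buckets[c]:
            result.append((word, c))
            if len(result) == k:
                return result
    return result

words = "a b a c b a d c a b".split()
print(top_k_by_bucket(words, 2))  # [('a', 4), ('b', 3)]
```

This avoids the log factor entirely (O(N) overall), at the cost of allocating N+1 buckets; knowing a tighter bound on the maximum count shrinks that allocation.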
The faceted statistics in Lucene are essentially the same idea as word count.