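I found the solution to this problem in an article by v_july_v and implemented it in C++. The resulting C++ code is very concise.
It mainly uses two containers: hash_map, to count the words, and the standard library's priority_queue, to keep the most frequent ones. Note that hash_map is a pre-standard extension (shipped with SGI STL, libstdc++ and MSVC), not part of the C++ standard library; std::unordered_map is its standard C++11 counterpart.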
The algorithm works as follows:
- Read the file from start to end, one word at a time.
- Insert each word into the hash_map and increment its occurrence count.
- Traverse the hash_map and push each word, paired with its occurrence count, into a priority queue whose top is the element with the smallest count (a min-heap).
- Whenever the priority queue holds more than K elements, pop the element with the smallest count, so the queue never holds more than K elements.
- After the hash_map has been traversed, the queue holds the K words with the most occurrences.
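The last three steps of the list (traverse the counts, push into a min-heap, evict when it exceeds K) can be sketched as a standalone function. This is a minimal illustration, not the author's code: the names `top_k_from_counts` and `Entry` are hypothetical, it assumes the counts are already in a `std::unordered_map` (the standard replacement for hash_map), and it needs C++17 for the structured binding.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Entry = std::pair<int, std::string>;  // hypothetical alias: (count, word)

// Keep only the k largest counts using a min-heap of at most k elements.
// Returns the entries in descending order of count.
std::vector<Entry> top_k_from_counts(
    const std::unordered_map<std::string, int>& counts, std::size_t k) {
    // std::greater turns std::priority_queue (a max-heap by default)
    // into a min-heap, so top() is the entry with the smallest count.
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (const auto& [word, count] : counts) {
        heap.emplace(count, word);
        if (heap.size() > k) {
            heap.pop();  // evict the current smallest, keeping at most k
        }
    }
    // Popping a min-heap yields ascending order; fill the vector from the
    // back so the returned result is descending.
    std::vector<Entry> result(heap.size());
    for (auto it = result.rbegin(); it != result.rend(); ++it) {
        *it = heap.top();
        heap.pop();
    }
    return result;
}
```

For n distinct words this takes O(n log K) time and keeps at most K + 1 entries in the heap, instead of sorting all n counts. Because `std::pair` compares the word when counts are equal, the lexicographically smaller of two equally frequent words is evicted first.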
The specific implementation and results are as follows:
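// Print the 10 most frequent words in the file, least frequent first.
#include <chrono>
#include <fstream>
#include <functional>
#include <iostream>
#include <queue>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>
using namespace std;

void top_k_words() {
    auto start = chrono::steady_clock::now();  // wall-clock timer in place of the original `timer t`
    ifstream fin("Modern c.txt");
    if (!fin) {
        cout << "Cannot open file" << endl;
        return;
    }
    // std::unordered_map is the standard (C++11) counterpart of hash_map.
    unordered_map<string, int> countwords;
    string s;
    while (fin >> s) {  // stops only when a read fails, so the last word is counted
        ++countwords[s];
    }
    cout << "Number of distinct words: " << countwords.size() << endl;
    // greater<> makes this a min-heap: top() is the pair with the smallest count.
    priority_queue<pair<int, string>, vector<pair<int, string>>,
                   greater<pair<int, string>>> countmax;
    for (const auto& entry : countwords) {
        countmax.push(make_pair(entry.second, entry.first));
        if (countmax.size() > 10) {
            countmax.pop();  // drop the smallest so at most 10 remain
        }
    }
    // Popping the min-heap prints the top 10 in ascending order of count.
    while (!countmax.empty()) {
        cout << countmax.top().second << " " << countmax.top().first << endl;
        countmax.pop();
    }
    chrono::duration<double> elapsed = chrono::steady_clock::now() - start;
    cout << "time elapsed " << elapsed.count() << " s" << endl;
}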
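The counting step has one subtle pitfall. A loop that reads with `fin >> s;` and then breaks on `fin.eof()` drops the last word of a file that does not end with whitespace: extracting that word hits end-of-file and sets eofbit even though the read succeeded. A minimal sketch that avoids this by testing the extraction itself, and that reads from any `std::istream` so it can be exercised without a file (the name `count_words` is hypothetical):

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>

// Count whitespace-separated words from any input stream.
// The loop ends only when `in >> word` fails, so a final word that is
// followed directly by end-of-file is still counted.
// Usage: std::istringstream in("a b a"); auto counts = count_words(in);
std::unordered_map<std::string, int> count_words(std::istream& in) {
    std::unordered_map<std::string, int> counts;
    std::string word;
    while (in >> word) {
        ++counts[word];  // operator[] value-initializes new entries to 0
    }
    return counts;
}
```

Passing an `std::ifstream` works the same way, since it derives from `std::istream`.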