Question requirements:
Design a data structure and write an algorithm that counts how many times each word appears in an article, then outputs each word and its count in the order in which the words first appear in the article.
Personal thoughts:
The idea is similar to a key tree (trie), which stores the words together with the number of times each one appears. A path running down from the root spells out a word, and each node stores a single character of that word.
The tree is represented with the child-sibling method, and the left child of a character node records the number of times the word ending at that character appears (see the tree structure chart). Every character node therefore has a left child: whenever a new character node is created, its left child is created along with it.
In this way we obtain both the words and their counts. The problem also requires the words to be output in order of appearance, so a list or vector stores one record per word. Each record holds the word and a pointer to the tree node that holds the word's count. After the whole article has been read, the records in the list or vector are output in order.
Other ideas:
(1) The counting can be implemented with map from the STL.
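As a rough sketch of idea (1), the counts can live in a std::map while a separate vector remembers the order of first appearance. The function name count_with_map and the rule that a word is a maximal run of letters (compared case-sensitively) are assumptions made for this illustration, not part of the original solution.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Count words with std::map while remembering first-appearance order.
// Assumption: a "word" is a maximal run of letters, compared case-sensitively.
std::vector<std::pair<std::string, unsigned> > count_with_map(const std::string &text)
{
    std::map<std::string, unsigned> counts;   // word -> number of occurrences
    std::vector<std::string> order;           // words in order of first appearance
    std::string word;

    // Iterate one position past the end so that the final word is flushed too.
    for (std::size_t i = 0; i <= text.size(); ++i) {
        if (i < text.size() && std::isalpha(static_cast<unsigned char>(text[i]))) {
            word += text[i];
        } else if (!word.empty()) {
            // operator[] value-initializes a new entry to 0, so a result of 1
            // after the increment means the word has just been seen for the first time.
            if (++counts[word] == 1)
                order.push_back(word);
            word.clear();
        }
    }

    std::vector<std::pair<std::string, unsigned> > result;
    for (std::size_t i = 0; i < order.size(); ++i)
        result.push_back(std::make_pair(order[i], counts[order[i]]));
    return result;
}
```

The map alone would yield the words in alphabetical order, which is why the extra vector is needed to satisfy the output-order requirement.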
(2) The words can be hashed and stored in an array. When hashes collide, separate chaining is used: a linked list hanging off the array element stores the colliding words and their counts. A pointer to each word's record, taken when the word first appears, is stored in a vector or list so that the order of appearance is preserved.
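Idea (2) might look roughly like the following. The names HashEntry and WordHashTable, the bucket count of 101, and the multiply-by-31 hash are all arbitrary choices for this sketch. New entries are pushed at the head of a bucket's chain for simplicity.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One entry in a bucket's chain: a word, its count, and the next entry.
struct HashEntry {
    std::string word;
    unsigned count;
    HashEntry *next;
};

class WordHashTable {
public:
    explicit WordHashTable(std::size_t bucket_count = 101)
        : buckets_(bucket_count, static_cast<HashEntry *>(0)) {}

    ~WordHashTable() {
        // Every entry is referenced exactly once from order_.
        for (std::size_t i = 0; i < order_.size(); ++i)
            delete order_[i];
    }

    // Increment the count of `word`, creating its entry on first sight.
    void add(const std::string &word) {
        std::size_t idx = hash(word) % buckets_.size();
        for (HashEntry *e = buckets_[idx]; e != 0; e = e->next) {
            if (e->word == word) {
                ++e->count;
                return;
            }
        }
        // Not in the chain yet: link a new entry at the head of the bucket's list.
        HashEntry *e = new HashEntry;
        e->word = word;
        e->count = 1;
        e->next = buckets_[idx];
        buckets_[idx] = e;
        order_.push_back(e);   // remember the order of first appearance
    }

    // Entries in order of first appearance.
    const std::vector<HashEntry *> &in_order() const { return order_; }

private:
    // Copying is disabled because the table owns raw pointers.
    WordHashTable(const WordHashTable &);
    WordHashTable &operator=(const WordHashTable &);

    // Simple multiplicative string hash (chosen only for illustration).
    static std::size_t hash(const std::string &s) {
        std::size_t h = 0;
        for (std::size_t i = 0; i < s.size(); ++i)
            h = h * 31 + static_cast<unsigned char>(s[i]);
        return h;
    }

    std::vector<HashEntry *> buckets_;
    std::vector<HashEntry *> order_;
};
```

Constructing the table with a single bucket forces every word into the same chain, which is a quick way to exercise the collision path.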
Implementation:
The tree structure and word records are defined as follows:
// The tree is represented with the child-sibling method
typedef struct keynode
{
    char data;
    unsigned int count;
    struct keynode *son, *next;
} *keytree;

// Word record
// Stores the word and a pointer to the node that holds the word's number of occurrences
struct tword
{
    string word;
    struct keynode *pnode;
};
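The functions further down call createkeynode and use a constant end_word, but their definitions are not shown in this post. The sketch below is one plausible way to write them; the value '\0' for end_word and the helper names createkeytree and destroykeytree are assumptions made here, not taken from the attached source.

```cpp
#include <cstddef>

typedef struct keynode
{
    char data;
    unsigned int count;
    struct keynode *son, *next;
} *keytree;

// Assumed marker for the "empty character" node that holds a count.
const char end_word = '\0';

// Allocate a node holding character c, with a zero count and no links.
keytree createkeynode(char c)
{
    keytree node = new keynode;
    node->data = c;
    node->count = 0;
    node->son = NULL;
    node->next = NULL;
    return node;
}

// Hypothetical helper: create the root together with its empty-character left child.
keytree createkeytree()
{
    keytree root = createkeynode(end_word);
    root->son = createkeynode(end_word);
    return root;
}

// Hypothetical helper: release a whole tree (children, then siblings, then the node).
void destroykeytree(keytree node)
{
    if (node == NULL)
        return;
    destroykeytree(node->son);
    destroykeytree(node->next);
    delete node;
}
```

Creating the root's left child up front matches the assert on ptree->son at the top of the search and insert functions.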
Tree Structure:
[Figure: the tree formed after the words "ah" and "she" have been processed]
The following uses the word "ah" as an example to describe how the tree is constructed:
(1) Initialize the tree: generate a root node and its left child, which holds an empty character.
(2) For the word "ah", first look for the character 'a' in the tree. If it is not found, insert a new node under the root and assign it 'a'. At the same time, create its left child with an empty character and set that child's count to 0 (this count is the number of times the word "a" appears).
(3) Handle 'h' in the same way as in (2): generate a new node and insert it under the 'a' node created in (2).
(4) Once the whole word has been read, add 1 to the count stored in the left child of 'h', which records that the word "ah" has appeared once, and place a pointer to that left child together with the word in the vector.
(5) Generate the word "she" and its count in the same way.
If the same word appears again, only the count in the left child of its final character needs to be incremented by 1.
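The steps above can be condensed into one function for experimenting with the structure. This is a simplified sketch, not the code from the attachment: the name addword is made up here, it takes a std::string instead of a pointer range, and it merges searching and inserting into a single pass. Cleanup of the nodes is omitted for brevity.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct keynode
{
    char data;
    unsigned int count;
    keynode *son, *next;
};

struct tword
{
    std::string word;
    keynode *pnode;   // the node that holds this word's count
};

keynode *createkeynode(char c)
{
    keynode *n = new keynode;
    n->data = c;
    n->count = 0;
    n->son = NULL;
    n->next = NULL;
    return n;
}

// Hypothetical helper following steps (2) to (4) for one word.
// Returns the node that holds the word's count.
keynode *addword(keynode *root, const std::string &word, std::vector<tword> &records)
{
    keynode *level = root->son;   // empty-character node heading the current sibling list
    for (std::size_t i = 0; i < word.size(); ++i) {
        keynode *prev = level;
        keynode *cur = level->next;
        while (cur != NULL && cur->data != word[i]) {   // look for the character among the siblings
            prev = cur;
            cur = cur->next;
        }
        if (cur == NULL) {                        // step (2): not found, append a new node
            cur = createkeynode(word[i]);
            cur->son = createkeynode('\0');       // its empty-character left child, count 0
            prev->next = cur;
        }
        level = cur->son;                         // step (3): continue one level down
    }
    if (++level->count == 1) {                    // step (4): first occurrence, record it
        tword rec;
        rec.word = word;
        rec.pnode = level;
        records.push_back(rec);
    }
    return level;
}
```

After adding "ah", "she" and "ah" again to a freshly initialized tree, the vector holds two records in order of appearance, and the count node reached through the record for "ah" holds 2.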
Functions used to search for and insert words:
// Check whether the string [pstart, pend] exists in the given key tree.
// If it does, return true; if this is the first time the string is queried,
// isfirstfind is set to true and p returns a pointer to the end leaf node of the string.
// If it does not, return false.
bool findstring(const keytree ptree, char *pstart, const char *pend, bool &isfirstfind, keytree &p)
{
    assert(ptree != NULL && ptree->son != NULL);
    assert(pstart != NULL);
    assert(pend != NULL);

    char *pcur = pstart;
    keytree pcurtree = ptree->son;
    keytree plast = NULL;   // last matched character node

    while (pcurtree)
    {
        keytree pnext = pcurtree->next;
        // Search the siblings for the matching character until it is found or the siblings run out
        while (pnext && pnext->data != *pcur)
        {
            pnext = pnext->next;
        }
        // Not found
        if (pnext == NULL)
            return false;

        // Found: go down to the next level
        plast = pnext;
        pcurtree = pnext->son;
        pcur++;
        // The whole string has been matched: the search succeeded
        if (pcur > pend)
            break;
    }

    keytree pson = pcurtree;
    // First access and the count node does not exist: create it and link it
    // to the last matched character node so that it is not lost
    if (pson == NULL)
    {
        pson = createkeynode(end_word);
        pson->count = 1;
        plast->son = pson;
        isfirstfind = true;
        p = pson;
    }
    else
    {
        // Number of occurrences + 1
        pson->count++;
        if (pson->count == 1)
        {
            isfirstfind = true;
            p = pson;
        }
    }
    return true;
}
// Insert the string into the key tree:
// starting from the first unmatched character, create new tree nodes to store the string's characters.
// p returns a pointer to the end leaf node of the string.
bool insert(const keytree ptree, char *pstr, const char *pend, keytree &p)
{
    assert(ptree != NULL && ptree->son != NULL);
    assert(pstr != NULL);
    assert(pend != NULL);

    char *pcur = pstr;
    keytree pcurtree = ptree->son;

    while (pcur <= pend)
    {
        keytree pnext = pcurtree->next;
        // Search the siblings for the matching character until it is found or all siblings have been checked
        while (pnext && pnext->data != *pcur)
        {
            pcurtree = pnext;
            pnext = pnext->next;
        }
        // Not found: create the node
        if (pnext == NULL)
        {
            pnext = createkeynode(*pcur);
            pnext->son = createkeynode(end_word);
            pcurtree->next = pnext;
        }
        // The current character is matched; move on to the next character
        pcurtree = pnext->son;
        pcur++;
    }

    // Number of occurrences + 1
    pcurtree->count++;
    p = pcurtree;
    return true;
}
Output of the string:
Attached source code file: Key tree statistics.zip