Performance analysis of STL map and the dictionary tree in keyword statistics (algorithms and data structures)

Source: Internet
Author: User

When reprinting, please cite the source: http://blog.csdn.net/mxway/article/details/21321541

Search engines commonly need to count how often each keyword appears. This article compares using the C++ STL map and using a dictionary tree (trie) for such statistics, in terms of running speed, memory use, and applicable scenarios. First, 1 million random strings of length 3 to 6 are generated. To simplify the problem, the strings consist only of lowercase letters. Another 100,000 random strings of length 3 to 8 are generated to test the query efficiency of the map and the dictionary tree.
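The article does not show how the test data was produced. The following is a minimal sketch of one way to generate it, assuming a fixed random seed and one word per line; the helper names `makeRandomWords` and `writeWords` are hypothetical and do not come from the original programs.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <random>
#include <string>
#include <vector>

// Hypothetical helper: returns `count` random lowercase strings whose
// lengths are uniformly distributed in [minLen, maxLen].
std::vector<std::string> makeRandomWords(std::size_t count, int minLen,
                                         int maxLen, unsigned seed) {
    assert(0 < minLen && minLen <= maxLen);
    std::mt19937 rng(seed);  // fixed seed gives a reproducible data set
    std::uniform_int_distribution<int> lenDist(minLen, maxLen);
    std::uniform_int_distribution<int> letterDist(0, 25);

    std::vector<std::string> words;
    words.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        std::string w(static_cast<std::size_t>(lenDist(rng)), 'a');
        for (char& c : w) {
            c = static_cast<char>('a' + letterDist(rng));
        }
        words.push_back(w);
    }
    return words;
}

// Hypothetical helper: writes one word per line.
// Returns false if the file cannot be opened or written.
bool writeWords(const std::string& path,
                const std::vector<std::string>& words) {
    std::ofstream out(path);
    if (!out) {
        return false;
    }
    for (const std::string& w : words) {
        out << w << '\n';
    }
    return static_cast<bool>(out);
}
```

Under these assumptions, `writeWords("Data.dat", makeRandomWords(1000000, 3, 6, 1))` and `writeWords("Search.dat", makeRandomWords(100000, 3, 8, 2))` would produce input files of the shape described above.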

Here is the C++ code for the map and dictionary tree implementations:

STL map implementation of the statistics:

#include <iostream>
#include <ctime>
#include <fstream>
#include <string>
#include <map>

using namespace std;

// clock() returns ticks, so convert the difference to milliseconds.
double elapsedMs(clock_t start, clock_t end) {
    return 1000.0 * (end - start) / CLOCKS_PER_SEC;
}

int main() {
    clock_t start, end;
    map<string, int> dict;
    string word;
    ifstream in("Data.dat");
    start = clock();
    while (in >> word) {
        dict[word]++;  // operator[] creates a missing entry with count 0
    }
    in.close();
    end = clock();
    cout << "STL map statistics take: " << elapsedMs(start, end) << " milliseconds" << endl;

    map<string, int>::iterator itr = dict.begin();
    start = clock();
    ofstream out("OUT.txt");
    while (itr != dict.end()) {
        out << itr->first << " " << itr->second << endl;
        itr++;
    }
    end = clock();
    cout << "STL map output to file takes: " << elapsedMs(start, end) << " milliseconds" << endl;
    out.close();

    start = clock();
    int sum1 = 0, sum2 = 0;
    ifstream searchin("Search.dat");
    while (searchin >> word) {
        if (dict.find(word) != dict.end()) {  // find() does not insert missing keys
            sum1++;
        } else {
            sum2++;
        }
    }
    end = clock();
    cout << "Words found: " << sum1 << " --> " << "Words not found: " << sum2 << endl;
    cout << "Query takes: " << elapsedMs(start, end) << " milliseconds" << endl;
    return 0;
}

Dictionary Tree Implementation Code:

#include <iostream>
#include <cstring>
#include <fstream>
#include <ctime>

using namespace std;

char str[20];  // buffer used when outputting the words in the dictionary tree

struct Node {
    int cnt;  // how many times the word ending at this node occurred
    struct Node *child[26];
    Node() {
        for (int i = 0; i < 26; i++) {
            child[i] = NULL;
        }
        cnt = 0;
    }
};

// clock() returns ticks, so convert the difference to milliseconds.
double elapsedMs(clock_t start, clock_t end) {
    return 1000.0 * (end - start) / CLOCKS_PER_SEC;
}

// Insert a string into the dictionary tree.
void insert(Node *root, const char word[]) {
    Node *p = root;
    int len = strlen(word);
    for (int i = 0; i < len; i++) {
        int index = word[i] - 'a';  // simple hash; only lowercase letters are handled
        if (p->child[index] == NULL) {
            p->child[index] = new Node();
        }
        p = p->child[index];
    }
    p->cnt++;  // one more occurrence of this word
}

// Output one string and its count to the file.
void outToFile(const char *word, int cnt) {
    ofstream out("OUT.txt", ios::app);  // reopened in append mode on every call
    out << word << " " << cnt << endl;
    out.close();
}

// Output every word in the dictionary tree with its number of occurrences.
void outputWord(Node *p, int length) {
    if (p->cnt != 0) {  // found a complete word
        str[length] = '\0';
        outToFile(str, p->cnt);
    }
    for (int i = 0; i < 26; i++) {
        if (p->child[i] != NULL) {
            str[length] = i + 'a';  // restore the character from its index
            outputWord(p->child[i], length + 1);
        }
    }
}

// Query whether a word is in the dictionary tree.
int searchWord(Node *p, const char word[]) {
    int len = strlen(word);
    for (int i = 0; i < len; i++) {
        int index = word[i] - 'a';
        if (p->child[index] == NULL) {  // not found
            return 0;
        }
        p = p->child[index];
    }
    return p->cnt > 0 ? 1 : 0;  // a mere prefix does not count as a word
}

// Destroy the dictionary tree rooted at p, including p itself.
void destroyTrieTree(Node *p) {
    for (int i = 0; i < 26; i++) {
        if (p->child[i] != NULL) {
            destroyTrieTree(p->child[i]);
        }
    }
    delete p;
}

int main() {
    Node *root = new Node();
    char word[20];  // input words are at most 8 characters long
    clock_t start, end;
    start = clock();
    ifstream in("Data.dat");
    while (in >> word) {
        insert(root, word);
    }
    end = clock();
    cout << "Dictionary tree statistics take: " << elapsedMs(start, end) << " milliseconds" << endl;
    start = clock();
    outputWord(root, 0);
    end = clock();
    cout << "Output to file takes: " << elapsedMs(start, end) << " milliseconds" << endl;
    in.close();

    int sum1 = 0, sum2 = 0;
    start = clock();
    ifstream searchin("Search.dat");
    while (searchin >> word) {
        if (searchWord(root, word)) {
            sum1++;
        } else {
            sum2++;
        }
    }
    searchin.close();
    end = clock();
    cout << "Words found: " << sum1 << " --> " << "Words not found: " << sum2 << endl;
    cout << "Query takes: " << elapsedMs(start, end) << " milliseconds" << endl;
    destroyTrieTree(root);  // frees the root and every descendant
    return 0;
}

The following are the results of running the two programs as release builds:

[Screenshots of the program output are not available.]



First, running time. The results above show that the dictionary tree is clearly faster than the map for both counting and querying. Let n be the length of a string and m the total number of keywords. A map is backed by a red-black tree, which is essentially a balanced binary search tree, so inserting a string takes about log(m) key comparisons to find its position, and in the worst case each of those comparisons examines all n characters. Inserting a string into the map therefore has a time complexity of O(n*log(m)). In the dictionary tree, as the program above shows, inserting a string depends only on the length of the string and not on the number of keywords, so its time complexity is O(n). The dictionary tree, however, takes far longer when all the keywords and their counts are written to an external file. This is attributed to the recursive traversal of the dictionary tree, which spends time building and destroying stack frames; note also that the program above reopens OUT.txt in append mode for every word it writes, which probably adds considerable overhead.
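The log(m) factor can be observed directly by counting key comparisons. The following is a minimal sketch, assuming a hypothetical counting comparator `CountingLess` and synthetic keys of the form "key0", "key1", and so on; it is an instrumentation aid, not part of the benchmark programs.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical comparator that counts how many string comparisons
// the map performs. Every copy shares the same counter.
struct CountingLess {
    std::size_t* calls;
    bool operator()(const std::string& a, const std::string& b) const {
        ++*calls;
        return a < b;  // each call may inspect up to n characters
    }
};

// Builds a map with m distinct keys, then returns the number of
// string comparisons that a single lookup performs.
std::size_t comparisonsForOneLookup(std::size_t m) {
    std::size_t calls = 0;
    std::map<std::string, int, CountingLess> dict(CountingLess{&calls});
    for (std::size_t i = 0; i < m; ++i) {
        dict["key" + std::to_string(i)] = 1;
    }
    calls = 0;  // ignore the comparisons made while inserting
    dict.find("key0");
    return calls;
}
```

Calling `comparisonsForOneLookup` with growing values of m should show the count rising roughly with log(m) rather than with m. A dictionary tree lookup, by contrast, follows exactly one child pointer per character regardless of m.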

Second, memory usage.

Take inserting the single-character string "a" as an example. In the dictionary tree, the useful data (the count) occupies only one field, but the node also needs 26 pointer fields for its children. In the map, whose underlying structure is a red-black tree, the data occupies one field and the node needs only two more pointer fields, for its left and right children (typical implementations also keep a parent pointer and a color flag). So in terms of space, the map uses less memory.
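The per-node difference can be made concrete with `sizeof`. The following is a rough sketch: `TrieNode` mirrors the fields of the `Node` struct in the dictionary tree program, while `RbNodeSketch` is a hypothetical, simplified stand-in for a red-black tree node and is not the layout of any particular `std::map` implementation (key storage is left out).

```cpp
#include <cstddef>

// Same fields as the Node struct in the dictionary tree program.
struct TrieNode {
    int cnt;
    TrieNode* child[26];
};

// Hypothetical red-black tree node, for illustration only.
// A real std::map node also stores the key itself.
struct RbNodeSketch {
    bool red;
    RbNodeSketch* parent;
    RbNodeSketch* left;
    RbNodeSketch* right;
    int cnt;
};

// Bytes needed for the path of one isolated word of the given length:
// the root plus one node per character, with no prefix sharing.
constexpr std::size_t triePathBytes(std::size_t wordLength) {
    return (wordLength + 1) * sizeof(TrieNode);
}

static_assert(sizeof(TrieNode) >= 26 * sizeof(TrieNode*),
              "a trie node carries 26 child pointers");
```

On a platform with 8-byte pointers, `sizeof(TrieNode)` is typically 216 bytes. Words that share prefixes also share nodes, so the total footprint of a dictionary tree depends on the data and not only on the node size.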

Third, applicable scenarios.

(1) Comparison of the dictionary tree and the map:
1. The dictionary tree inserts and queries a string faster than the map.
2. The map uses less memory than the dictionary tree.
3. The map is much faster than the dictionary tree when the data has to be written to an external file.

(2) Applications of the dictionary tree:

A dictionary tree is preferable to a map in systems that do not need to write the tree's data to an external file, are not short of memory, and have strict response-time requirements. For example, on the 12306 ticket-booking page, typing BJ into the departure box prompts "Beijing" and other suggestions.

A map is preferable to a dictionary tree in systems with limited memory and modest response-time requirements, and in systems that need to write the data held in memory to external files.
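The input-prompt scenario can be sketched with a dictionary tree that lists every stored word beginning with a given prefix. This is a minimal illustration under stated assumptions: it handles lowercase letters only, the names `PrefixNode`, `insertWord`, `collect`, and `complete` are hypothetical, and it does plain prefix completion rather than the abbreviation lookup (BJ to Beijing) that the booking site performs.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Trie node; unique_ptr frees the children automatically.
struct PrefixNode {
    int cnt = 0;  // number of times the word ending here was inserted
    std::unique_ptr<PrefixNode> child[26];
};

// Inserts a lowercase word into the trie.
void insertWord(PrefixNode& root, const std::string& word) {
    PrefixNode* p = &root;
    for (char c : word) {
        assert(c >= 'a' && c <= 'z');
        std::unique_ptr<PrefixNode>& next = p->child[c - 'a'];
        if (!next) {
            next.reset(new PrefixNode());
        }
        p = next.get();
    }
    ++p->cnt;
}

// Depth-first walk that appends every complete word below `node`.
// Children are visited in alphabetical order.
void collect(const PrefixNode& node, std::string& current,
             std::vector<std::string>& out) {
    if (node.cnt > 0) {
        out.push_back(current);
    }
    for (int i = 0; i < 26; ++i) {
        if (node.child[i]) {
            current.push_back(static_cast<char>('a' + i));
            collect(*node.child[i], current, out);
            current.pop_back();
        }
    }
}

// Returns all stored words that start with `prefix`, in alphabetical order.
std::vector<std::string> complete(const PrefixNode& root,
                                  const std::string& prefix) {
    const PrefixNode* p = &root;
    for (char c : prefix) {
        if (c < 'a' || c > 'z') {
            return {};
        }
        p = p->child[c - 'a'].get();
        if (p == nullptr) {
            return {};  // no stored word has this prefix
        }
    }
    std::vector<std::string> out;
    std::string current = prefix;
    collect(*p, current, out);
    return out;
}
```

Reaching the prefix node costs one step per typed character, independent of how many words are stored, which is why the structure suits interactive prompts.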
