std::map: A summary of usage

Source: Internet
Author: User

To complete the assignments for my Web Search course, I spent two days struggling to implement hierarchical agglomerative clustering (HAC) and a clustering algorithm based on Affinity message passing. The first step in implementing either algorithm is to compute the document vectors. Specifically, each index term in the text collection forms one dimension of the vector space, so M index terms yield M-dimensional feature vectors. I use std::map frequently to build these feature vectors, because I need to look up the probability of each index term in a document. Here are some lessons from that experience:
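The construction described above can be sketched roughly as follows. This is only an illustration, not the course code: the function names countTerms and toFeatureVector, the whitespace tokenization, and the use of raw counts as weights are all assumptions made for the example.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Count how often each whitespace-separated term occurs in one document.
// (Hypothetical helper; real tokenization would be more careful.)
std::map<std::string, int> countTerms(const std::string& text)
{
    std::map<std::string, int> freq;
    std::istringstream in(text);
    std::string term;
    while (in >> term)
        ++freq[term];   // operator[] creates the entry with value 0 on first use
    return freq;
}

// Turn the counts into an M-dimensional vector, one dimension per index term.
// The vocabulary map fixes the order of the dimensions, because std::map
// iterates in key order.
std::vector<float> toFeatureVector(const std::map<std::string, int>& vocabulary,
                                   const std::map<std::string, int>& docFreq)
{
    std::vector<float> dv;
    dv.reserve(vocabulary.size());
    for (std::map<std::string, int>::const_iterator it = vocabulary.begin();
         it != vocabulary.end(); ++it)
    {
        std::map<std::string, int>::const_iterator hit = docFreq.find(it->first);
        dv.push_back(hit == docFreq.end() ? 0.0f
                                          : static_cast<float>(hit->second));
    }
    return dv;
}
```

Here the vocabulary would be built from the whole text collection and docFreq from a single document, so every document ends up with a vector of the same length M.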

1. operator[]. This operator is very useful: it not only returns a reference to the value stored under a key, it can also insert. First, a basic usage:
  

using namespace std;
...
map<string,int> elem;
....
//insert operation
...
//get inserted value
string keyword;
int freq = elem[keyword];

This retrieves the value stored under the key. But what happens if the keyword does not exist in the map? This is where the insert behavior of [] comes in: if the key is not in the map, operator[] inserts a new pair and calls the default constructor of the mapped type. Let's verify this with code!

struct numidf
{
    int num;
    bool showup;
    numidf()
    {
        num = 0;
        showup = false;
        cout << "set to 0 and false" << endl;
    }
};
...
map<string, numidf> m_idf;
// insert elements
...
// query elements
string newkeyword; // a word that m_idf does not contain
if (!m_idf[newkeyword].showup)
{
    cout << "construct a new one" << endl;
}

The output of the above code is:

set to 0 and false
construct a new one

That is, when a new key is passed to [], the map automatically adds a new pair. Its key is the given newkeyword, and its mapped value is a default-constructed instance. This is very convenient. I used to call find first and insert manually if the key was not found, which was more complicated.
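One side effect is worth keeping in mind: because operator[] inserts, a lookup written with [] can silently grow the map. The sketch below contrasts it with find, which never inserts. The helper names lookupWithBracket and lookupWithFind are made up for this illustration.

```cpp
#include <map>
#include <string>

// Reads through operator[]: inserts (key, 0) when the key is missing.
int lookupWithBracket(std::map<std::string, int>& m, const std::string& key)
{
    return m[key];
}

// Reads through find: leaves the map untouched and returns a fallback
// value when the key is missing.
int lookupWithFind(const std::map<std::string, int>& m, const std::string& key,
                   int fallback)
{
    std::map<std::string, int>::const_iterator it = m.find(key);
    return it == m.end() ? fallback : it->second;
}
```

Calling lookupWithBracket with a missing key leaves a new entry with value 0 in the map, while lookupWithFind does not change its size. operator[] also cannot be called on a const map, which is another reason to prefer find for pure lookups.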

2. Using map iterators
To be honest, I rarely use iterators, so I made several basic mistakes. This article is also a reminder to myself. The mistake was this: I wanted to implement something like the following code.

vector<int> a;
for (size_t i = 0; i + 1 < a.size(); i++)   // i + 1 avoids underflow of size()-1 when a is empty
{
    for (size_t j = i + 1; j < a.size(); j++)
    {
        // some operation on i and j
    }
}

I wanted to do the same thing with iterators, which led to the following tragic scene:

map<string,int>::iterator iterI;
....
// This is wrong!
int i = 5;
iterI = iterI + i;
// This is wrong!

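I assumed that adding an integer to the iterator would move it forward. It does not compile, and it produces a lot of errors! Map iterators are bidirectional, not random-access, so they do not support operator+. So I used the following method: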

 1 map<string,int> m_Tree;
 2 map<string,int>::iterator iterI = m_Tree.begin();
 3 map<string,int>::iterator iterJ;
 4 size_t i = 0;
 5 for( ; i + 1 < m_Tree.size(); ++iterI, i++)
 6 {
 7     //iterJ = m_Tree.begin();
 8     //advance(iterJ, i+1);
 9     iterJ = iterI;
10     iterJ++;
11     for(; iterJ != m_Tree.end(); iterJ++)
12     {
13         float s = S((iterI->dvmap),(iterJ->dvmap));
14         if(s > mostSim)
15         {//this is the pair
16             mostSim = s;
17             sp.s1 = iterI;
18             sp.s2 = iterJ;
19         }
20     }
21 }

I want iterJ to start at the element after the one iterI points to, so I used the method on lines 9 and 10. Lines 7 and 8 would also work, but they are less efficient than lines 9 and 10, because advance has to walk forward from begin() on every pass. If you know a better way to do this, please let me know!
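One possible alternative, assuming C++11 or later is available: std::next(it) returns a copy of a bidirectional iterator advanced by one position, and driving the outer loop by the iterator alone removes the separate counter. The sketch below is not the original program. The similarity function and the name mostSimilarPair are hypothetical stand-ins for S and the surrounding code, and the map simply holds int values.

```cpp
#include <iterator>
#include <map>
#include <string>
#include <utility>

// Hypothetical similarity measure, used only so the example is complete:
// the closer two counts are, the higher the score.
float similarity(int a, int b)
{
    int diff = a > b ? a - b : b - a;
    return 1.0f / (1.0f + static_cast<float>(diff));
}

// Visit every pair (I, J) with J after I and return the keys of the most
// similar pair. Returns a pair of empty strings when the map holds fewer
// than two elements.
std::pair<std::string, std::string>
mostSimilarPair(const std::map<std::string, int>& tree)
{
    typedef std::map<std::string, int>::const_iterator Iter;
    std::pair<std::string, std::string> best;
    float mostSim = -1.0f;
    for (Iter iterI = tree.begin(); iterI != tree.end(); ++iterI)
    {
        // std::next returns an advanced copy; iterI itself is unchanged.
        for (Iter iterJ = std::next(iterI); iterJ != tree.end(); ++iterJ)
        {
            float s = similarity(iterI->second, iterJ->second);
            if (s > mostSim)
            {
                mostSim = s;
                best = std::make_pair(iterI->first, iterJ->first);
            }
        }
    }
    return best;
}
```

Because the outer loop stops at end() and the inner loop starts one past iterI, an empty or single-element map needs no special handling. To skip several positions at once, std::advance(it, n) or std::next(it, n) can be used, at a cost linear in n for map iterators.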

3. For performance, do not let the standard library copy memory. Pass a pointer!
Computing document similarity uses high-dimensional document vectors, and I store each of these large vectors in a std::vector. I noticed two performance points:
1) Use reserve to allocate enough memory up front, in preparation for push_back.
2) Be careful with push_back. If a very large vector<float> is declared in a function body and then pushed into a private member of the class with push_back, a large amount of memory has to be copied.
Based on these two points, I used the following method:

1 vector<float> dv;
2 pair<map<string,vector<float> >::iterator,bool> pr;
3 pr = m_TF_IDF.insert(pair<string,vector<float> >(filename, dv));
4 vector<float>& rkdv = pr.first->second;
5 rkdv.reserve(m_IDF.size());

m_TF_IDF is a private member of the class. I insert an empty vector into it and then take a reference to that vector, as shown on line 4. New data can then be added with push_back through the reference to the large vector, which avoids copying the memory.
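The same idea can be wrapped in a small function. The following is a sketch under assumptions: the names VectorMap and addDocumentVector are invented for the example, and the vector is filled with placeholder zeros where the real program would push the computed weights.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

typedef std::map<std::string, std::vector<float> > VectorMap;

// Insert an empty vector under `filename`, then fill it in place through a
// reference, so the filled vector is never copied into the map.
// If `filename` is already present, insert leaves the existing entry alone
// and this function appends to it.
std::vector<float>& addDocumentVector(VectorMap& table,
                                      const std::string& filename,
                                      std::size_t dimensions)
{
    std::pair<VectorMap::iterator, bool> pr =
        table.insert(std::make_pair(filename, std::vector<float>()));
    std::vector<float>& dv = pr.first->second;
    dv.reserve(dv.size() + dimensions);   // allocate once, before push_back
    for (std::size_t k = 0; k < dimensions; ++k)
        dv.push_back(0.0f);               // placeholder weights for the sketch
    return dv;
}
```

The returned reference stays valid as long as the entry is not erased from the map. With C++11 or later, another option is to build the vector locally and hand it over with std::move, which also avoids copying the elements.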
Using pointers is a fast and efficient way to avoid copying memory. In my program, I do not know in advance where the huge vector<float> described above will be needed. So that any code that needs a vector<float> can use it, I pass a pointer to the vector<float>. I defined the following struct:

struct dvPair
{
string names;
map<string,vector<float>*> dvmap;
};

The map stores a pointer to each vector<float>, not the vector<float> itself!
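A minimal sketch of how such a struct might be filled, assuming the vectors themselves live in a separate map that owns them. The function name makeView and the owner map are hypothetical and not part of the original program.

```cpp
#include <map>
#include <string>
#include <vector>

struct dvPair
{
    std::string names;
    std::map<std::string, std::vector<float>*> dvmap;
};

// Build a dvPair whose entries point at vectors owned by `owner`.
// Nothing is copied. The caller must keep `owner` alive, and must not erase
// the referenced entries, for as long as the dvPair is used.
dvPair makeView(std::map<std::string, std::vector<float> >& owner,
                const std::string& name)
{
    dvPair view;
    view.names = name;
    for (std::map<std::string, std::vector<float> >::iterator it = owner.begin();
         it != owner.end(); ++it)
    {
        view.dvmap[it->first] = &it->second;   // store the address only
    }
    return view;
}
```

Pointers to the elements of a std::map remain valid when other elements are inserted or erased, so the view stays usable while the owner grows. A pointer dangles only if its own entry is erased or the owner is destroyed, which is the price of skipping the copy.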

That's all. No more.

 

 

 
