Open-Source Bag-of-Words Model DBoW3: Principles & Source Code

Source: Internet
Author: User
Tags: idf

As the saying goes, predecessors plant the trees and those who come later enjoy the shade.

The source code is on GitHub and ships with a CMakeLists.txt, so it can be compiled directly.

The PaoPao Robot (泡泡机器人) community has a very detailed analysis that combines the bag-of-words model with a discussion of loop detection; together with Gao Xiang's loop-closure detection application, the pieces can basically be strung together.

The formulation of TF-IDF is not unique; the definition used here is:

TF (term frequency) is the frequency of a word in an image: the number of occurrences of the word in the image divided by the total number of words in the image.

IDF (inverse document frequency) measures how common a word is across the whole training corpus: IDF = log(N / n_i), where N is the total number of images in the training corpus and n_i is the number of training images that contain the word.

Since n_i <= N, we have IDF >= 0; when n_i = N, IDF reaches its minimum of 0 (if a word appears in every training image, it is too common to be discriminative and can be ignored).

Once the visual vocabulary has been generated, IDF is fixed; the weights used when querying the database are computed as TF * IDF.

My personal understanding: in DBoW3, the weight attached to a word (WordId) is the IDF, while the weight stored for a retrieval entry (EntryId) is TF * IDF.

Demo usage: ./demo_general orb image1 image2 image3 image4

Run from the build directory, this extracts features from the four input images and generates a voc vocabulary; it then adds the four images as retrieval entries and generates a db retrieval database. The db file also contains the vocabulary, so a db is more than a voc.

The vocabulary file looks like this:

```yaml
vocabulary:
  k: 9             # number of cluster centers per level
  L: 3             # number of levels
  scoringType: 0
  weightingType: 0
  nodes:
    - { nodeId:19, parentId:10, weight:2.8768207245178085e-01,
        descriptor:"dbw3 0 32 124 169 185 96 221 205 85 157 235 189 172 8 159 181 72 50 137 222 236 88 26 107 250 49 251 221 127 106 198" }
    - { nodeId:20, parentId:10, weight:1.3862943611198906e+00,
        descriptor:"dbw3 0 32 120 164 24 104 249 61 80 95 115 27 172 24 31 147 64 248 145 152 76 56 26 111 82 23 219 223 226 202" }
    - { nodeId:21, parentId:10, weight:0.,
        descriptor:"dbw3 0 32 124 165 248 102 209 109 83 25 99 173 173 8 115 241 72 50 16 222 68 56 26 114 248 2 255 205 37 121 226 234 198" }
    - { nodeId:22, parentId:10, weight:0.,
        descriptor:"dbw3 0 32 36 161 252 81 224 233 97 139 235 53 108 40 151 199 204 26 3 158 110 24 18 248 234 65 20 5 117 194 198 162" }
    - { nodeId:23, parentId:10, weight:1.3862943611198906e+00,
        descriptor:"dbw3 0 32 76 140 56 218 221 253 113 218 95 53 44 24 24 211 106 116 81 154 18 24 18 26 122 245 95 109 146 137 212 106" }
  words:
    - { wordId:0, nodeId:19 }
    - { wordId:1, nodeId:20 }
    - { wordId:2, nodeId:21 }
    - { wordId:3, nodeId:22 }
    - { wordId:4, nodeId:23 }
    - { wordId:5, nodeId:24 }
```

This describes the tree produced by clustering. Every point on the tree is a node, and only the leaf nodes are words. For each incoming image, features are extracted, descriptor distances are computed to descend the tree, and each feature eventually falls to a leaf (a word); the words of all the image's features together form the image's word vector.

The retrieval database (inverted index) looks like this:

```yaml
database:
  nEntries: 4
  usingDI: 0
  diLevels: 0
  invertedIndex:
    -                 # (wordId:0)
      - { imageId:1, weight:1.6807896319101980e-03 }
      - { imageId:2, weight:3.2497152852064880e-03 }
      - { imageId:3, weight:3.6665308718065778e-03 }
    -                 # (wordId:1)
      - { imageId:1, weight:4.0497295661974788e-03 }
    - []              # (wordId:2)
    ...
      - { imageId:2, weight:3.9149658655580431e-03 }
      - { imageId:3, weight:4.4171079458813099e-03 }
      - { imageId:1, weight:2.0248647830987394e-03 }
      - { imageId:3, weight:4.4171079458813099e-03 }
```

Notes on the retrieval entries:

According to the vocabulary dump, WordId 2 corresponds to NodeId 21, and NodeId 21 has a weight of 0. In other words, word 2 is too common: it appeared in every one of the 4 images used to build the vocabulary (compare common Chinese words such as 的, 在, 和), so it is not discriminative and has no corresponding entries in the inverted index. This is reasonable.

The source code does not simply add one vote to each entry sharing a word with the query; instead it accumulates a score for every EntryId that shares a word, then returns the top n. Scores can be computed with L1, L2, KL divergence, and other methods.

queryL1 uses std::map; not having used C++ for a long time, I found the map usage worth noting:

```cpp
void Database::queryL1(const BowVector &vec,
  QueryResults &ret, int max_results, int max_id) const
{
  BowVector::const_iterator vit;

  std::map<EntryId, double> pairs;
  std::map<EntryId, double>::iterator pit;

  for(vit = vec.begin(); vit != vec.end(); ++vit)
  {
    const WordId word_id = vit->first;
    const WordValue& qvalue = vit->second;

    const IFRow& row = m_ifile[word_id];

    // IFRows are sorted in ascending entry_id order
    for(auto rit = row.begin(); rit != row.end(); ++rit)
    {
      const EntryId entry_id = rit->entry_id;
      const WordValue& dvalue = rit->word_weight;

      if((int)entry_id < max_id || max_id == -1)
      {
        double value = fabs(qvalue - dvalue) - fabs(qvalue) - fabs(dvalue);

        pit = pairs.lower_bound(entry_id);
        if(pit != pairs.end() && !(pairs.key_comp()(entry_id, pit->first)))
        {
          pit->second += value; // entry_id already present: accumulate
        }
        else
        {
          // entry_id not present: insert it, using pit as a hint
          pairs.insert(pit,
            std::map<EntryId, double>::value_type(entry_id, value));
        }
      }
    } // for each inverted row
  } // for each query word

  // move to vector
  ret.reserve(pairs.size());
  for(pit = pairs.begin(); pit != pairs.end(); ++pit)
  {
    ret.push_back(Result(pit->first, pit->second));
  }

  // resulting "scores" are now in [-2 best .. 0 worst]
  // sort vector in ascending order of score
  std::sort(ret.begin(), ret.end());
  // (ret is inverted now --the lower the better--)

  // cut vector
  if(max_results > 0 && (int)ret.size() > max_results)
    ret.resize(max_results);

  // complete and scale score to [0 worst .. 1 best]
  // ||v - w||_{L1} = 2 + Sum(|v_i - w_i| - |v_i| - |w_i|)
  //   for all i | v_i != 0 and w_i != 0
  // scaled_||v - w||_{L1} = 1 - 0.5 * ||v - w||_{L1}
  QueryResults::iterator qit;
  for(qit = ret.begin(); qit != ret.end(); qit++)
    qit->score = -qit->score / 2.0;
}
```

