This blog has previously published four articles on SIFT. For details, see IX. Image feature extraction and matching: the SIFT algorithm; IX (continued): Compiling and implementing the SIFT algorithm; and the two-part IX (continued again): Implementing the SIFT algorithm step by step in C, parts 1 and 2.
Those four articles covered the principles of the SIFT algorithm, its C implementation, and its use for image matching in detail. Now we want to apply the SIFT algorithm to a higher-level task. Object recognition, that is, discovering which object categories appear in an image, is one of the most basic and important tasks in computer vision.
In addition, the original Classic Algorithm Research series may be renamed something like A Plain Talk on the Gems of Classic Algorithms. The renaming reflects three points: 1. the series does not cover every aspect of every algorithm, hence "gems"; 2. it highlights the style of this blog's content, easy to understand, concise and straightforward, hence "plain talk"; 3. it focuses on the study and implementation of classic algorithms and their practical application.
Okay, enough preamble. The previous article, VI (continued): From the KMP algorithm step by step to the BM algorithm, walked through the BM algorithm starting from KMP. Next we introduce the bag-of-words model, an application of the SIFT algorithm to object recognition.
- Overview of the Bag-of-Words Model
The bag-of-words (BoW) model is a common document representation method in the information retrieval field. In information retrieval, the BoW model treats a document as a collection of words, ignoring word order, grammar, and syntax: each word's appearance in the document is independent of the others, that is, any word at any position is selected independently, uninfluenced by the meaning of the document. For example, consider two documents:
1: Bob likes to play basketball, Jim likes too.
2: Bob also likes to play football games.
Construct a dictionary based on the two text documents:
Dictionary = {1: "Bob", 2: "likes", 3: "to", 4: "play", 5: "basketball", 6: "also", 7: "football", 8: "games", 9: "Jim", 10: "too"}.
This dictionary contains 10 different words in total. Using the dictionary's index numbers, each of the two documents above can be expressed as a 10-dimensional vector, where each entry is a non-negative integer giving the number of times the corresponding word appears in the document:
1: [1, 2, 1, 1, 1, 0, 0, 0, 1, 1]
2: [1, 1, 1, 1, 0, 1, 1, 1, 0, 0]
Each element in the vector is the number of occurrences in the document of the corresponding dictionary word (in the following sections this is called the word histogram). Note, however, that in constructing the document vectors we lose the order of the words in the original sentences; this is one of the shortcomings of the bag-of-words model, but in many applications it matters little.
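To make this concrete, here is a minimal, self-contained C++ sketch, separate from the project code developed later in this article, that computes these two 10-dimensional vectors from the example sentences:

```cpp
// Minimal bag-of-words sketch for the two example sentences above.
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Count how many times each dictionary word occurs in a document.
std::vector<int> BagOfWords(const std::vector<std::string> &dictionary,
                            const std::string &document)
{
    std::map<std::string, int> index; // map each dictionary word to its index
    for (size_t i = 0; i < dictionary.size(); ++i)
        index[dictionary[i]] = (int)i;

    std::vector<int> histogram(dictionary.size(), 0);
    std::istringstream iss(document);
    std::string word;
    while (iss >> word)
    {
        // Strip trailing punctuation so "basketball," matches "basketball".
        while (!word.empty() &&
               (word[word.size() - 1] == ',' || word[word.size() - 1] == '.'))
            word.erase(word.size() - 1);
        std::map<std::string, int>::const_iterator it = index.find(word);
        if (it != index.end())
            ++histogram[it->second];
    }
    return histogram;
}

int main()
{
    const char *words[] = {"Bob", "likes", "to", "play", "basketball",
                           "also", "football", "games", "Jim", "too"};
    std::vector<std::string> dictionary(words, words + 10);

    std::vector<int> v1 = BagOfWords(dictionary, "Bob likes to play basketball, Jim likes too.");
    std::vector<int> v2 = BagOfWords(dictionary, "Bob also likes to play football games.");

    for (size_t i = 0; i < v1.size(); ++i) std::cout << v1[i] << " "; // 1 2 1 1 1 0 0 0 1 1
    std::cout << std::endl;
    for (size_t i = 0; i < v2.size(); ++i) std::cout << v2[i] << " "; // 1 1 1 1 0 1 1 1 0 0
    std::cout << std::endl;
    return 0;
}
```

Running it prints the two 10-dimensional vectors shown above.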
- Application of the Bag-of-Words Model
Applicable scenarios of the bag-of-words model
Now imagine a huge document collection D containing M documents in total. After all the words in these documents are extracted, they together form a dictionary containing N words. Using the bag-of-words model, each document can then be represented as an N-dimensional vector, and computers are very good at processing numerical vectors. In this way, a computer can be used to classify a large number of documents.
Now apply the bag-of-words model to image representation. To represent an image, we can regard it as a document, that is, as a collection of several "visual words"; as with text, there is no order among the visual words.
Figure 1. Applying the bag-of-words model to image representation
Because words are not ready-made in images the way they are in text documents, we first need to extract independent visual words from the image. This usually requires three steps: (1) feature detection; (2) feature representation; (3) codebook (word table) generation. See Figure 2:
Figure 2. Extracting independent visual words from images
Through observation we find that, although different instances of the same category differ, we can still find areas they have in common. For example, although different people's faces vary considerably, their eyes, mouths, noses, and other small parts do not differ much. We can extract these parts common to different instances as visual words with which to recognize that class of objects.
The SIFT algorithm is the most widely used algorithm for extracting locally invariant features from images. We can therefore use SIFT to extract invariant feature points from images as visual words, construct the word table, and use the words in the word table to represent an image.
Three steps for applying the bag-of-words model
Next, we use images from three target categories, face, bicycle, and guitar, to demonstrate how the bag-of-words model represents an image as a numerical vector.
The first step of the bag-of-words model is to use the SIFT algorithm to extract visual words from each class of images and gather all the visual words together, as shown in Figure 3:
Figure 3. Extracting visual vocabularies from each class of images
The second step is to use the K-means algorithm to construct the word table. K-means is a clustering method based on similarity measures between samples: given a parameter K, it partitions N objects into K clusters so that similarity within a cluster is high while similarity between clusters is low. Based on the distances between the visual-word vectors extracted by SIFT, K-means can merge words with similar meanings into single entries that serve as the base words of the word table. Suppose we set K to 4; the construction of the word table is shown in Figure 4:
Figure 4. Using the K-means algorithm to construct the word table
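As a sketch of this step, the clustering can be done with OpenCV's cvKMeans2, the same function the project code calls later. Here the SIFT descriptors are assumed to be already packed, one per row, into a CV_32FC1 matrix (128 is FEATURE_MAX_D, the descriptor length in Rob Hess's library):

```cpp
// Sketch: cluster SIFT descriptors into K visual words with cvKMeans2.
#include <cv.h>

CvMat *BuildCodebook(float *pszDescriptors, int nDescriptorNumber, int K)
{
    CvMat szSamples;
    cvInitMatHeader(&szSamples, nDescriptorNumber, 128, CV_32FC1, pszDescriptors);

    CvMat *pszLabels = cvCreateMat(nDescriptorNumber, 1, CV_32SC1); // cluster index of each descriptor
    CvMat *pszCenters = cvCreateMat(K, 128, CV_32FC1);              // the K cluster centers = visual words

    cvKMeans2(&szSamples, K, pszLabels,
              cvTermCriteria(CV_TERMCRIT_EPS + CV_TERMCRIT_ITER, 10, 1.0),
              1, (CvRNG *)0, 0, pszCenters);

    cvReleaseMat(&pszLabels);
    return pszCenters; // caller releases with cvReleaseMat
}
```

With K = 4, the returned centers correspond to the four merged visual words of the toy example.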
The third step is to represent images with the words in the word table. Using the SIFT algorithm, many feature points can be extracted from each image, and each feature point can be replaced by its nearest word in the word table. By counting the number of times each word of the word table appears in the image, the image can be expressed as a K = 4 dimensional numerical vector. See Figure 5:
Figure 5. Histogram representation of each image
As Figure 5 shows, we extract different visual words from the face, bicycle, and guitar targets. In the constructed vocabulary, visual words with similar meanings are merged into the same entry; after merging, the vocabulary contains only four visual words, indexed 1, 2, 3, and 4, which we can see correspond to bicycle, face, guitar, and face respectively. The histogram representation of each image is obtained by counting how many times these words appear in each target class (we allow for some error, which is also the case in practice):
Face: [3, 30, 3, 20]
Bicycle: [20, 3, 3, 2]
Guitar: [8, 12, 32, 7]
In fact, the process is very simple: from the face, bicycle, and guitar "documents" we extract the parts they share (merging visual words with similar meanings into the same entry) to construct a dictionary of four visual words, Dictionary = {1: "bicycle", 2: "face", 3: "guitar", 4: "face"}. The face, bicycle, and guitar documents can then each be represented as a four-dimensional vector, and the corresponding histograms are drawn from the number of occurrences of each dictionary entry in the three documents.
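A minimal sketch of this assignment step, assuming a codebook matrix with one visual word per row: each descriptor votes for its nearest code word under L2 distance.

```cpp
// Sketch: add one descriptor's vote to the K-dimensional histogram.
#include <cv.h>
#include <float.h>

void AccumulateHistogram(const CvMat *pCodebook,   // K x 128, CV_32FC1
                         const float *pDescriptor, // one 128-D SIFT descriptor
                         int *pszHistogram)        // length-K counter array
{
    CvMat iDesc = cvMat(1, pCodebook->cols, CV_32FC1, (void *)pDescriptor);
    double dbMin = DBL_MAX;
    int nBest = 0;
    for (int k = 0; k < pCodebook->rows; ++k)
    {
        CvMat iCode;
        cvGetRow(pCodebook, &iCode, k);
        double dbDist = cvNorm(&iDesc, &iCode, CV_L2); // L2 distance to code word k
        if (dbDist < dbMin)
        {
            dbMin = dbDist;
            nBest = k;
        }
    }
    ++pszHistogram[nBest]; // one vote for the nearest visual word
}
```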
Note that the process above is only a very simple example with three target classes. In practice, to achieve good results, the number of words K in the word table is usually very large, and the more target classes there are, the larger K must be. K is generally on the order of hundreds to thousands; K = 4 here is just for convenience of illustration.
Next, we will summarize how to use the bag-of-words model to represent an image as a numerical vector:
- Step 1: Use the SIFT algorithm to extract visual-word vectors from images of each class; these vectors represent locally invariant feature points in the images;
- Step 2: Pool all the feature-point vectors and use the K-means algorithm to merge visual words with similar meanings, constructing a word table containing K words;
- Step 3: Count the number of times each word of the word table appears in an image, thereby representing the image as a K-dimensional numerical vector.
Next, we follow these steps to implement the above process in C++, step by step.
- Step-by-Step C++ Implementation: Representing an Image with the Bag-of-Words Model
Before coding, we need to set up the development environment.
1. Build a Development Environment
The development platform is Windows XP SP3 + VS2010 (for Windows XP SP3 + VC6.0, please refer to the earlier article IX (continued): Compiling and implementing the SIFT algorithm).
1. Download the SIFT library package sift-latest_win.zip from Rob Hess's personal homepage;
2. Because sift-latest_win.zip requires OpenCV version 2.0 or later, also download the latest OpenCV-2.2.0-win32-vs2010.exe and run the installer to install OpenCV to a local path; for example, I installed it in the root directory of drive D.
3. Launch VS2010 and create an empty console application named BoW.
4. Configure the OpenCV environment. In VS2010, choose the BoW Properties item under the Project menu to open the BoW Property Pages dialog. Three items need to be configured: under the VC++ Directories options, configure Include Directories and Library Directories; under the Input options of the Linker tab, configure Additional Dependencies.
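For reference, assuming OpenCV 2.2 was installed to D:\OpenCV2.2 (the install path and exact library names below depend on your setup and are given only as an illustration), the three settings would look roughly like this:

```
Include Directories:     D:\OpenCV2.2\include;D:\OpenCV2.2\include\opencv
Library Directories:     D:\OpenCV2.2\lib
Additional Dependencies: opencv_core220.lib;opencv_imgproc220.lib;opencv_highgui220.lib
```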
With that, the development environment is fully set up and configured.
2. Create a C++ class CSiftDiscriptor
For ease of use, we encapsulate the SIFT library in a C++ class, CSiftDiscriptor, which computes and stores the set of feature-point vectors of a specified image. The class is declared in the file siftdiscriptor.h, whose content is as follows:
```cpp
#ifndef _SIFT_DISCRIPTOR_H_
#define _SIFT_DISCRIPTOR_H_

#include <string>
#include <highgui.h>
#include <cv.h>

extern "C"
{
#include "../sift/sift.h"
#include "../sift/imgfeatures.h"
#include "../sift/utils.h"
};

class CSiftDiscriptor
{
public:
    int GetInterestPointNumber() { return m_nInterestPointNumber; }
    struct feature *GetFeatureArray() { return m_pFeatureArray; }

public:
    void SetImgName(const std::string &strImgName) { m_strInputImgName = strImgName; }
    int CalculateSift();

public:
    CSiftDiscriptor(const std::string &strImgName);
    CSiftDiscriptor() { m_nInterestPointNumber = 0; m_pFeatureArray = NULL; }
    ~CSiftDiscriptor();

private:
    std::string m_strInputImgName;
    int m_nInterestPointNumber;
    struct feature *m_pFeatureArray;
};
#endif
```
The member functions are implemented in siftdiscriptor.cpp. CalculateSift extracts and computes the feature points; internally it proceeds as follows:
1) call the OpenCV function cvLoadImage to load the input image;
2) to give all input images a uniform size, CalculateSift then resizes the image by calling cvResize;
3) if the input image is a color image, it must first be converted to a grayscale image by calling cvCvtColor;
4) call the SIFT library function sift_features to obtain the set of feature-point vectors of the input image and their number.
# Include "siftdiscriptor. H" int csiftdiscriptor: calculatesift () {iplimage * pinputimg = cvloadimage (m_strinputimgname.c_str (); If (! Pinputimg) {return-1;} int nimgwidth = 320; // standard image size for training double dbscalefactor = pinputimg-> width/300.0; // scale factor iplimage * ptmpimg = cvcreateimage (cvsize (pinputimg-> width/dbscalefactor, pinputimg-> height/dbscalefactor), pinputimg-> depth, pinputimg-> nchannels ); cvresize (pinputimg, ptmpimg); // scale cvreleaseimage (& pinputimg); If (ptmpimg-> nchannels! = 1) // non-grayscale image {iplimage * pgrayimg = cvcreateimage (cvsize (ptmpimg-> width, ptmpimg-> height), ptmpimg-> depth, 1); cvcvtcolor (ptmpimg, pgrayimg, duration); m_ninterestpointnumber = duration (pgrayimg, & m_pfeaturearray); cvreleaseimage (& pgrayimg);} else {m_ninterestpointnumber = duration (ptmpimg, & duration );} cvreleaseimage (& ptmpimg); Return m_ninterestpointnumber;} csiftdiscriptor: csiftdistor Or (const STD: string & strimgname) {m_strinputimgname = strimgname; m_ninterestpointnumber = 0; m_pfeaturearray = NULL; calculatesift ();} csiftdiscriptor ::~ Csiftdiscriptor () {If (m_pfeaturearray) {free (m_pfeaturearray );}}
3. Create a C++ class CImgSet to manage the experiment image set
The bag-of-words model requires extracting visual words from images of multiple target classes. Images of different target classes are stored in different subfolders, so for convenient handling we design a dedicated class, CImgSet, to manage the image set. It is declared in the file imgset.h:
```cpp
#ifndef _IMG_SET_H_
#define _IMG_SET_H_

#include <vector>
#include <string>
#pragma comment(lib, "shlwapi.lib")

class CImgSet
{
public:
    CImgSet(const std::string &strImgDirName)
        : m_strImgDirName(strImgDirName + "\\"), m_nImgNumber(0) {}
    int GetTotalImageNumber() { return m_nImgNumber; }
    std::string GetImgName(int nIndex) { return m_szImgs.at(nIndex); }
    int LoadImgsFromDir() { return LoadImgsFromDir(""); }

private:
    int LoadImgsFromDir(const std::string &strSubDirName);

private:
    typedef std::vector<std::string> IMG_SET;
    IMG_SET m_szImgs;
    int m_nImgNumber;
    const std::string m_strImgDirName;
};
#endif
```

The member functions are implemented in imgset.cpp:

```cpp
#include "imgset.h"
#include <windows.h>
#include <shlwapi.h>

/** strSubDirName: name of the subfolder currently being scanned */
int CImgSet::LoadImgsFromDir(const std::string &strSubDirName)
{
    WIN32_FIND_DATAA stFD = {0};
    std::string strDirName;
    if ("" == strSubDirName)
    {
        strDirName = m_strImgDirName;
    }
    else
    {
        strDirName = strSubDirName;
    }
    std::string strFindName = strDirName + "*"; // strDirName already ends with '\'
    HANDLE hFile = FindFirstFileA(strFindName.c_str(), &stFD);
    // FindFirstFileA returns the "." entry first, so move straight to the next one.
    BOOL bExist = FindNextFileA(hFile, &stFD);
    for (; bExist; )
    {
        std::string strTmpName = strDirName + stFD.cFileName;
        if (strDirName + "." == strTmpName || strDirName + ".." == strTmpName)
        {
            bExist = FindNextFileA(hFile, &stFD);
            continue;
        }
        if (PathIsDirectoryA(strTmpName.c_str())) // recurse into subfolders
        {
            strTmpName += "\\";
            LoadImgsFromDir(strTmpName);
            bExist = FindNextFileA(hFile, &stFD);
            continue;
        }
        m_szImgs.push_back(strDirName + stFD.cFileName);
        bExist = FindNextFileA(hFile, &stFD);
    }
    FindClose(hFile); // added here to avoid a handle leak
    m_nImgNumber = (int)m_szImgs.size();
    return m_nImgNumber;
}
```
LoadImgsFromDir recursively collects the names of all experiment images from the image folder, including its subfolders, by repeatedly calling the Windows API functions FindFirstFile and FindNextFile.
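A quick usage sketch (the directory path is invented for illustration):

```cpp
// Hypothetical usage: enumerate every image under a training directory.
CImgSet iImgSet("D:\\images"); // directory layout is an assumption
int nTotal = iImgSet.LoadImgsFromDir();
for (int i = 0; i < nTotal; ++i)
{
    std::string strName = iImgSet.GetImgName(i); // full path of the i-th image
}
```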
4. Create a C++ class CHistogram to generate image histograms
The class is declared in imghistogram.h:

```cpp
// imghistogram.h
#ifndef _IMG_HISTOGRAM_H_
#define _IMG_HISTOGRAM_H_

#include <string>
#include <limits>
#include "siftdiscriptor.h"
#include "imgset.h"

const int cnClusterNumber = 1500;
const double cdMaxDistance = std::numeric_limits<double>::max();

class CHistogram
{
public:
    void SetTrainingImgSetName(const std::string &strTrainingImgSet)
    {
        m_strTrainingImgSetName = strTrainingImgSet;
    }
    int FormHistogram();
    CvMat CalculateImgHistogram(const std::string strImgName, int pszImgHistogram[]);
    CvMat *GetObservedData();
    CvMat *GetCodeBook() { return m_pCodeBook; }
    void SetCodeBook(CvMat *pCodeBook) { m_pCodeBook = pCodeBook; m_bSet = true; }

public:
    CHistogram() : m_bSet(false), m_pCodeBook(0), m_pObservedData(0),
                   m_pszHistogram(0), m_nImgNumber(0) {}
    ~CHistogram()
    {
        if (m_pszHistogram)
        {
            delete [] m_pszHistogram;
            m_pszHistogram = 0;
        }
        if (m_pObservedData)
        {
            cvReleaseMat(&m_pObservedData);
            m_pObservedData = 0;
        }
        if (m_pCodeBook && !m_bSet) // only release the codebook if we created it ourselves
        {
            cvReleaseMat(&m_pCodeBook);
            m_pCodeBook = 0;
        }
    }

private:
    bool m_bSet;
    CvMat *m_pCodeBook;
    CvMat *m_pObservedData;
    std::string m_strTrainingImgSetName;
    int (*m_pszHistogram)[cnClusterNumber]; // one histogram row per training image
    int m_nImgNumber;
};
#endif
```

The member functions are implemented in imghistogram.cpp. FormHistogram runs the full pipeline over the training set: it computes SIFT descriptors for every training image, clusters them with cvKMeans2 to build the codebook, and then builds each image's histogram from the cluster labels. CalculateImgHistogram maps a single image onto an existing codebook, and GetObservedData returns the transposed histogram matrix:

```cpp
#include <windows.h>
#include "imghistogram.h"

int CHistogram::FormHistogram()
{
    int nRet = 0;
    CImgSet iImgSet(m_strTrainingImgSetName);
    nRet = iImgSet.LoadImgsFromDir();
    const int cnTrainingImgNumber = iImgSet.GetTotalImageNumber();
    m_nImgNumber = cnTrainingImgNumber;

    // Compute the SIFT descriptors of every training image.
    CSiftDiscriptor *pDiscriptor = new CSiftDiscriptor[cnTrainingImgNumber];
    int nIPNumber(0);
    for (int i = 0; i < cnTrainingImgNumber; ++i)
    {
        pDiscriptor[i].SetImgName(iImgSet.GetImgName(i));
        pDiscriptor[i].CalculateSift();
        nIPNumber += pDiscriptor[i].GetInterestPointNumber();
    }

    // Pack all descriptors into one array, one interest point (IP) per row.
    // float is used so the matrix header below matches CV_32FC1.
    float (*pszDiscriptor)[FEATURE_MAX_D] = new float[nIPNumber][FEATURE_MAX_D];
    ZeroMemory(pszDiscriptor, sizeof(float) * nIPNumber * FEATURE_MAX_D);
    int nIndex = 0;
    for (int i = 0; i < cnTrainingImgNumber; ++i) // traverse all images
    {
        struct feature *pFeatureArray = pDiscriptor[i].GetFeatureArray();
        int nFeatureNumber = pDiscriptor[i].GetInterestPointNumber();
        for (int j = 0; j < nFeatureNumber; ++j) // traverse all IPs in one image
        {
            for (int k = 0; k < FEATURE_MAX_D; ++k) // copy one IP descriptor
            {
                pszDiscriptor[nIndex][k] = (float)pFeatureArray[j].descr[k];
            }
            ++nIndex;
        }
    }

    // Run k-means on all descriptors: find cnClusterNumber cluster centers
    // (the codebook) and the cluster label of every descriptor.
    CvMat *pszLabels = cvCreateMat(nIPNumber, 1, CV_32SC1);
    if (!m_pCodeBook) // construct the codebook
    {
        CvMat szSamples;
        CvMat *pszClusterCenters = cvCreateMat(cnClusterNumber, FEATURE_MAX_D, CV_32FC1);
        cvInitMatHeader(&szSamples, nIPNumber, FEATURE_MAX_D, CV_32FC1, pszDiscriptor);
        cvKMeans2(&szSamples, cnClusterNumber, pszLabels,
                  cvTermCriteria(CV_TERMCRIT_EPS + CV_TERMCRIT_ITER, 10, 1.0),
                  1, (CvRNG *)0, 0, pszClusterCenters);
        m_pCodeBook = pszClusterCenters;
    }

    // Compute the histogram of each image. The labels returned by cvKMeans2
    // already give the nearest code word of each descriptor, so no extra
    // nearest-neighbour search is needed here.
    m_pszHistogram = new int[cnTrainingImgNumber][cnClusterNumber];
    ZeroMemory(m_pszHistogram, sizeof(int) * cnTrainingImgNumber * cnClusterNumber);
    nIndex = 0;
    for (int i = 0; i < cnTrainingImgNumber; ++i)
    {
        int nFeatureNumber = pDiscriptor[i].GetInterestPointNumber();
        for (int j = 0; j < nFeatureNumber; ++j)
        {
            // Codebook index of the j-th IP of the i-th image.
            int nCodeBookIndex = pszLabels->data.i[nIndex++];
            ++m_pszHistogram[i][nCodeBookIndex]; // 0 <= nCodeBookIndex < cnClusterNumber
        }
    }

    // Clean up and return.
    cvReleaseMat(&pszLabels);
    delete [] pszDiscriptor;
    delete [] pDiscriptor;
    return nRet;
}

CvMat CHistogram::CalculateImgHistogram(const std::string strImgName, int pszImgHistogram[])
{
    if ("" == strImgName || !m_pCodeBook || !pszImgHistogram)
    {
        return CvMat();
    }

    CSiftDiscriptor iImgDisp;
    iImgDisp.SetImgName(strImgName);
    iImgDisp.CalculateSift();
    struct feature *pImgFeature = iImgDisp.GetFeatureArray();
    const int cnIPNumber = iImgDisp.GetInterestPointNumber();

    for (int i = 0; i < cnIPNumber; ++i)
    {
        // feature.descr holds doubles; copy into a float buffer so the
        // matrix type matches the CV_32FC1 codebook.
        float szDescr[FEATURE_MAX_D];
        for (int k = 0; k < FEATURE_MAX_D; ++k)
        {
            szDescr[k] = (float)pImgFeature[i].descr[k];
        }
        CvMat iIP = cvMat(1, FEATURE_MAX_D, CV_32FC1, szDescr);

        // Find the code word with the smallest distance to this IP.
        double dbMinDistance = cdMaxDistance;
        int nCodeBookIndex = 0;
        for (int j = 0; j < cnClusterNumber; ++j)
        {
            CvMat iCode;
            cvGetRow(m_pCodeBook, &iCode, j);
            double dbDistance = cvNorm(&iIP, &iCode, CV_L2); // distance between IP i and code word j
            if (dbDistance < dbMinDistance)
            {
                dbMinDistance = dbDistance;
                nCodeBookIndex = j;
            }
        }
        ++pszImgHistogram[nCodeBookIndex];
    }

    CvMat iImgHistogram = cvMat(cnClusterNumber, 1, CV_32SC1, pszImgHistogram);
    return iImgHistogram;
}

CvMat *CHistogram::GetObservedData()
{
    CvMat iHistogram;
    cvInitMatHeader(&iHistogram, m_nImgNumber, cnClusterNumber, CV_32SC1, m_pszHistogram);
    // Assign to the member (the original declared a local of the same name
    // here, which shadowed the member and leaked the matrix).
    m_pObservedData = cvCreateMat(iHistogram.cols, iHistogram.rows, CV_32SC1);
    cvTranspose(&iHistogram, m_pObservedData);
    return m_pObservedData;
}
```
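Finally, a hedged sketch of how the three classes might be driven from main; the paths are invented for illustration, and the histogram size comes from the cnClusterNumber constant in imghistogram.h:

```cpp
#include "imghistogram.h"

int main()
{
    CHistogram iHistogram;
    iHistogram.SetTrainingImgSetName("D:\\images"); // assumed training-set root
    iHistogram.FormHistogram();                     // steps 1-3: SIFT -> k-means codebook -> histograms

    CvMat *pCodeBook = iHistogram.GetCodeBook();     // cnClusterNumber x 128 visual-word table
    CvMat *pObserved = iHistogram.GetObservedData(); // histograms, one column per training image

    // Represent a new image with the learned codebook.
    int szHistogram[cnClusterNumber] = {0};
    iHistogram.CalculateImgHistogram("D:\\images\\face\\test.jpg", szHistogram); // assumed path
    return 0;
}
```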
This article is complete.
All rights reserved; infringement will be prosecuted. Use for any commercial purpose is strictly prohibited. Please indicate the source when reprinting.
Bag of words