Analysis of KD tree +BBF algorithm for "feature matching" sift principle

Source: Internet
Author: User
Tags diff

Reprint Please specify Source: http://blog.csdn.net/luoshixian099/article/details/47606159

The following previous article has introduced sift principle and C source code analysis , a series of feature points are obtained, and each feature point corresponds to a 128-dimensional vector. If there are now two images that have been extracted to feature points, the only thing to do now is to match similar feature points.

There are two basic ways of similarity query: 1. Scope query: That is, to point to the query point and the query threshold, from the dataset to find out all the points with the query point distance is less than the threshold value.

2.K neighbor query: To point query points and positive integer k, from the data set to find and query points of the recent K data, when K=1, is the nearest neighbor query.

Feature matching operators can be divided into two categories: 1. Exhaustive method: The point in the data set and the query point to calculate the distance, assuming that figure 1 extracted to N1 feature points, figure 2 extracted to N2 feature points, using the poor lifting method to do n1xn2 operation. Such a method is obviously inefficient.

2. Create a Data index: analyze the data. Divides the search data space. If there is overlap, divided into kd tree and R-Tree. KD trees are a kind of non-overlapping space division.


a three-dimensional K-D tree. The First Division (red) divides the root node (white) into two nodes. Then they are divided again (green) into two sub-nodes respectively. Each of the last four child nodes is divided (blue) into two child nodes. Because there is no further division. The last eight nodes are called leaf nodes.



the construction of KD tree: the KD tree is a two-pronged tree, divides the data space space, each node corresponds to a spatial scope.

As you can see. The dividing way of three-dimensional space. First, determine the Dimension ki with the largest corresponding variance on the data set, and find the median kv (and as the root node) of the data set on the Ki dimension, that is, the first step divides the space into two parts, and the part of the Ki dimension less than kv is called the left Dial hand node. Greater than KV for the other part of the corresponding right child node, and then the same method, the left Dial hand node and the right child nodes continue to build the binary tree, the remaining data set is empty.

For example: There are 5 data, each data is 5 dimensions. Set up kd tree,a<7,5,7,3,8>; b<3,4,1,2,7>; c<5,2,6,6,9>;D <9,3,2,4,1>,E<2,1,5,1,4>. First, the variance on the 5 dimensions is calculated as 6.56;2;5.36;2.96;8.56, and the maximum difference is visible above the 5th dimension. On the 5th Dimension, continue to find the median value of 7, that is, B, the 5th dimension is less than 7 as the left subtree data (a,c), greater than 7 as the right subtree (d,e), and then continue on the A,c, two points on the calculation of the maximum variance dimension, continue to divide.

The same is true of d,e. For example with. Ki represents the dimension, and KV represents the value on that dimension.


kd Tree query: Start from the root node and search along the binary tree until the leaf node is reached. At this point the leaf node is not necessarily a recent point. But it must be a point near the leaf junction. So there must be a backtracking process that goes back to the root node.

For example: Query and m<5,4. The 1,3,6> Point's near neighbor, and the query path is b. A. C, after calculating the distance of MC, reverse up, query A and a right sub-tree, again back to B and B left subtree, finally get the nearest distance, MB point recently.

If the data set is a dimension d, it is generally required that the size of the data n need to meet the n>>2^d conditions, the ability to achieve efficient search. In general, the standard KD tree when the number of dimensions of the data set is not more than 20, but like Sift feature description of the narrative sub 128, surf descriptive narrative is 64-dimensional, so the existing KD tree to improve.

BBF: The process of backtracking is determined entirely by the path of the query, regardless of the nature of the data on the query path. The BBF (best-bin-first) query mechanism ensures that the space of the nearest neighbor is prioritized , that is, BBF maintains a priority queue, each time a query is made to the left subtree or right subtree, and at the same time the distance difference between the median value of the query point in the dimension is saved in the priority queue. At the same time, there was a child. The node address is also in the queue. The backtracking process is traced from the priority queue to the order of the (difference) from small to large.

As in the example above, first save B in the priority queue. Then starting from the priority queue to fetch data, take out B, found to go to the left child a node to continue the query, at this time, to the right child node D is saved in the priority queue, together with the distance attribute ki=5. Kv=7. So d=7-6=1, the précis-writers in the priority queue is D (1); Similarly, suppose a has a right child. Also to be deposited in the priority queue, plus the attribute ki=2,kv=5,d=5-4=1, (the sample is not very appropriate, O (╯-╰) o), the process of backtracking is to follow the priority queue distance from one after the other, until the priority queue is empty, or timeout, stop. BBF set the timeout mechanism. In order to meet the retrieval speed in high-dimensional data, it is necessary to exchange precision for time. Get high-speed queries. This is true. The nearest neighbor that the BBF mechanism finds is approximate, not recent, but just as close to the recent point. The time-out mechanism, in the implementation of the algorithm, limits the number of times to extract data from the priority queue.



The following is parsed from the algorithm:

To build a kd tree:

struct kd_node* kdtree_build (struct feature* features, int n)//features features you. N is the number {  struct kd_node* kd_root;  if (! features  | |  N <= 0)    {      fprintf (stderr, "Warning:kdtree_build (): No features,%s, line%d\n",       __file__, __line__); 
   return NULL;    }  Kd_root = Kd_node_init (features, n);   establishes the root node. Each time a node is established to deposit a feature point  expand_kd_node_subtree (kd_root);//Expand the KD tree return kd_root with the root node  ;}
static struct kd_node* kd_node_init (struct feature* features, int n) {  struct kd_node* kd_node;  Kd_node = malloc (sizeof (struct kd_node));  memset (kd_node, 0, sizeof (struct kd_node));  Kd_node->ki =-1;               The attribute ki is initialized to 1  kd_node->features = features;//points to the feature point  kd_node->n = n;     Node attribute n Saves the total number of nodes on the tree rooted in Kd_node  return kd_node;}
Extended kd Tree: Divides all the data into the node data of the left and right sub-tree with the most generous dimension of the current node being the corresponding median value. and recursively, until the creation of the leaf node is returned.

static void Expand_kd_node_subtree (struct kd_node* kd_node)  //recursive establishment of KD tree {/  * Base case:leaf node */  if (kd_no De->n = = 1  | |  Kd_node->n = = 0)//assuming that there is one node left, it becomes the leaf node    {      kd_node->leaf = 1;      return;    }  Assign_part_key (Kd_node);   Calculate the corresponding dimensionality of the most generous difference, Ki and kv  partition_features (Kd_node);//The data size of the Ki dimension into the left subtree data and the right subtree  if (kd_node->kd_left)   //Continue to build left subtree    expand_kd_node_subtree (kd_node->kd_left);  if (kd_node->kd_right)//Continue building right subtree    expand_kd_node_subtree (kd_node->kd_right);}
calculate the most generous difference corresponding to the Dimension ki, with the median kv, taking the median, using the worst case is also a linear time selection algorithm, my blog has written before, here no longer analysis < median sort > Click to open link
The static void Assign_part_key (struct kd_node* kd_node)//calculates the most generous difference of the node data corresponding to the Dimension ki, and the median kv{struct feature* features;  Double kv, x, mean, var, var_max = 0;  double* tmp;  int d, N, I, j, ki = 0;  features = kd_node->features;  n = kd_node->n;  D = FEATURES[0].D; /* Partition key index is this along which descriptors has most variance */for (j = 0; J < D; j + +)//calculate D-dimensional data on    , the variance on all dimensions.      {mean = var = 0;      for (i = 0; i < n; i++) mean + = Features[i].descr[j];      Mean/= N;  for (i = 0; i < n; i++) {x = features[i].descr[j]-mean;      var + = x * x;}                          var/= n;  Calculates the variance if (var > var_max) of the data of the J dimension {ki = j;    Var_max = var;}     }/* Partition key value is median of descriptor values at Ki */tmp = CALLOC (n, sizeof (double));  for (i = 0; i < n; i++)//obtain data on the first Ki dimension of all data tmp[i] = Features[i].descr[ki];   KV = Median_select (tmp, N); Find the middle value on the First Ki dimension.  Here is the choice algorithm for the worst case Execution time O (n) free (TMP);      Kd_node->ki = Ki; Dimension kd_node->kv = kv; Middle Value}
Press the Ki dimension on the KV value. Sort the feature points less than or equal to KV as the left subtree data, greater than KV as the right subtree data

static void partition_features (struct kd_node* kd_node) {  struct feature* features, TMP;  Double kv;  int n, ki, p, I, j =-1;  features = kd_node->features;  n = kd_node->n;  Ki = kd_node->ki;  KV = kd_node->kv; for (i = 0; i < n; i++)//Sort the feature points by the size of the Ki dimension if (Features[i].descr[ki] <= kv) {tmp = Features[++j];featu      RES[J] = features[i];features[i] = tmp;if (Features[j].descr[ki] = = kv) p = j;  } TMP = Features[p];  FEATURES[P] = Features[j];  FEATURES[J] = tmp; /* If all records fall in same side of partition, make Node A leaf * * if (j = = n-1)//said      Only one node left, marked as leaf node {kd_node->leaf = 1;    Return } Kd_node->kd_left = Kd_node_init (features, j + 1);//Create a left subtree with j+1 nodes kd_node->kd_right = Kd_node_init (features + (j + 1), (N-j-1));//Create a right subtree with n-j-1 nodes in it 

☆ KD Tree has been created to complete. What you need to do now is query. The K feature points in the near vicinity of the feature point, the root node is inserted into the priority queue, then the element is taken from the priority queue and traversed to the leaf node at the same time as the path process. There is also a node that is not queried to increase the priority queue (by the value of Ki dimension and the absolute value of the difference between KV), and then again from the priority queue to take the node, again traverse to the leaf node, so repeat ... Until a timeout limit is encountered, or the total node is traversed.

/*kd_root to create a good kd tree. Feat to query the feature point K is to find the number of neighbors, sift in the selection of 2nbrs storage query to the K nearest neighbor data Max_nn_chkes to the maximum number of extraction queue, that is, the time-out limit to successfully return the number of neighbor data found, otherwise return -1*/int KDTREE_BBF  _KNN (struct kd_node* kd_root, struct feature* feat, int k, struct feature*** nbrs, int max_nn_chks) {struct kd_node*  EXPL;  struct min_pq* min_pq;  struct feature* tree_feat, * * _NBRS;  struct bbf_data* bbf_data;  int i, t = 0, n = 0;  if (! NBRs | | !  Feat | | !      Kd_root) {fprintf (stderr, "Warning:null pointer error,%s, line%d\n", __file__, __line__);    return-1;  } _nbrs = Calloc (k, sizeof (struct feature*));                     MIN_PQ = Minpq_init ();  Create a minimum priority queue Minpq_insert (MIN_PQ, kd_root, 0); In the priority queue insert the first root element while (Min_pq->n > 0 && T < MAX_NN_CHKS)//Assuming the queue is not empty and within the timeout number {expl = (stru CT kd_node*) minpq_extract_min (MIN_PQ);//Remove an element in the priority queue if (! Expl) {fprintf (stderr, "WARNING:PQ unexpectedly empt  Y,%s line%d\n ", __file__, __line__);      goto fail;} EXPL = explore_to_lEAF (Expl, feat, MIN_PQ);//Find the feature point in the KD Tree leaf node position, the process is not queried to increase the priority queue if (! Expl) {fprintf (stderr, "WARNING:PQ unexpectedly  Empty,%s line%d\n ", __file__, __line__);      goto fail;}  for (i = 0; i < expl->n; i++)//traverse the subtree with expl as root all nodes {//printf ("%x", expl->features[i].feature_data);  Tree_feat = &expl->features[i];  Bbf_data = malloc (sizeof (struct bbf_data)); if (! Bbf_data) {fprintf (stderr, "warning:unable to allocate Memory," "%s line%d\n", __file__, __line_      _ );    Goto fail; }//bbf_data->old_data This data is not intended for real use.         Since the feature point attribute is not used to feature_data this own definition type Bbf_data->old_data = tree_feat->feature_data;                       printf ("%x", bbf_data->old_data);    Bbf_data->d = DESCR_DIST_SQ (feat, tree_feat);  The Euclidean distance tree_feat->feature_data = bbf_data for calculating two characteristic points; n + = Insert_into_nbr_array (tree_feat, _nbrs, N, K); Find the feature points of K nearest neighbor, and deposit in the array _nbrs.    The distance from small to large;} t++;  } minpq_release (&AMP;MIN_PQ);  for (i = 0; i < n; i++)  {Bbf_data = _nbrs[i]->feature_data;      _nbrs[i]->feature_data = bbf_data->old_data;    Free (bbf_data);  } *nbrs = _nbrs; return n;  Fail:minpq_release (&AMP;MIN_PQ);      for (i = 0; i < n; i++) {bbf_data = _nbrs[i]->feature_data;      _nbrs[i]->feature_data = bbf_data->old_data;    Free (bbf_data);  } free (_NBRS);  *nbrs = NULL; return-1;}
here the priority queue is created using the idea of heap sequencing, an important application of heap sequencing is the priority queue, in a heap of n elements, all priority queue operations can be completed within LGN time. There are two forms of priority queues. The maximum priority queue and the minimum priority queue, where the minimum priority queue is used. That is, the smaller the key value, the higher the priority. About the principle of heap sorting. Ability to see previous heap sorting algorithms < heap sequencing > Click the Open link.


struct min_pq* minpq_init ()  //queue initialization {  struct min_pq* min_pq;  MIN_PQ = malloc (sizeof (struct MIN_PQ));  Min_pq->pq_array = calloc (minpq_init_nallocd, sizeof (struct pq_node));//allocation queue space  MIN_PQ->NALLOCD = Minpq_ INIT_NALLOCD;  Min_pq->n = number of elements in 0;//queue  return MIN_PQ;}
int Minpq_insert (struct min_pq* min_pq, void* data, int key)//Insert element in priority queue {  int n = min_pq->n;  /* Double array allocation if necessary *  /if (Min_pq->nallocd = = N)    {      Min_pq->nallocd = array_double ( (void**) &min_pq->pq_array,      Min_pq->nallocd,      sizeof (struct pq_node));      if (! Min_pq->nallocd) {  fprintf (stderr, "warning:unable to allocate memory,%s, line%d\n",   __file__, __line __ );  return 1;}    }  Min_pq->pq_array[n].data = data;  Min_pq->pq_array[n].key = Int_max;  Decrease_pq_node_key (Min_pq->pq_array, Min_pq->n, key); Inserting elements into the priority queue, heap sorting algorithm  min_pq->n++;  return 0;}

Removes a node from the queue, traversing the nodes to the leaf node, and increasing the priority queue without querying at the same time.

The static struct kd_node* explore_to_leaf (struct kd_node* kd_node,    //starts from Kd_node and queries the struct feature* feat until the leaf node, struct min_pq* min_pq) {  struct kd_node* unexpl, * expl = Kd_node;  Double kv;  int ki;  while (expl  &&  ! expl->leaf)    {      ki = expl->ki;      KV = expl->kv;            if (ki >= feat->d) {  fprintf (stderr, "warning:comparing imcompatible descriptors,%s"    "line%d\n", __fi le__, __line__);  return NULL;}      if (Feat->descr[ki] <= kv) {  unexpl = expl->kd_right;  EXPL = Expl->kd_left;}      else{  unexpl = expl->kd_left;  EXPL = expl->kd_right;}            if (Minpq_insert (MIN_PQ, UNEXPL, ABS (Kv-feat->descr[ki)))   //non-queried node, increase priority queue by difference size {  fprintf (stderr, "Wa Rning:unable to insert in PQ,%s, line%d\n ",   __file__, __line__);  return NULL;}    }  return EXPL;}
calculates the Euclidean distance between two feature points:

Double descr_dist_sq (struct feature* f1, struct feature* f2) {  double diff, dsq = 0;  double* DESCR1, * DESCR2;  int I, D;  D = f1->d;  if (f2->d! = d)    return Dbl_max;  DESCR1 = f1->descr;  DESCR2 = f2->descr;  for (i = 0; i < D; i++)    {      diff = descr1[i]-descr2[i];      DSQ + = Diff*diff;    }  return DSQ;}
The found feature point distance is inserted into the NBRs queue as output.

NBRs is also in accordance with the order from small to large storage distance,

If the new distance to insert is D, the queue at this time the last element D

1. Assuming the queue is not full (that is, to obtain K neighbors have not found K) (1) D>=d, inserted directly behind the queue, (2) D<d, then find the location to insert. Then the elements that are larger than D are moved backwards.

2. Assuming that the queue is full: (1) D>=d, drop the current distance directly, and (2) d<d, locate the location to insert. Then the elements that are larger than D are moved one at a later level, and the last element is discarded.

static int Insert_into_nbr_array (struct feature* feat, struct feature** NBRs, int n, int k) {struct bbf_data* fdata, *  Ndata;  Double DN, DF;  int i, ret = 0;      if (n = = 0) {Nbrs[0] = feat;    return 1;                          }/* Check at end of array */Fdata = (struct bbf_data*) feat->feature_data;  Infer the position to be inserted df = fdata->d;  Ndata = (struct bbf_data*) nbrs[n-1]->feature_data;  DN = ndata->d; if (DF >= DN)//ready to insert into last {if (n = = k)//But K neighbor queue is full, discard {feat->feature_data = FD ata->old_data; Discard before you drop.  Retain the previously defined data free (fdata);      return 0;}      Nbrs[n] = feat;    return 1; }/* Find the right position in the array to be inserted into the middle of the queue, divided into the case of the queue full or dissatisfied */if (n < k)//k Neighbor queue is not full, the element is shifted backward {n      Brs[n] = nbrs[n-1];    ret = 1; } else {nbrs[n-1]->feature_data = ndata->old_data;//queue is full.    The last one to discard, restore the previous data free (ndata);  } i = N-2; while (i >= 0)//elements are translated backwards to find the appropriate position in the queue; {      Ndata = (struct bbf_data*) nbrs[i]->feature_data;      DN = ndata->d;      if (DN <= DF) break;      NBRS[I+1] = nbrs[i];    i--;  } i++;   Nbrs[i] = feat; Insert element return ret;}


At this point, the SIFT principle and feature points matching algorithm has been introduced, perhaps the article will be updated in succession Surf,brife,fast,orb a series of features matching articles. Thanks again for the csdn!



Analysis of KD tree +BBF algorithm for "feature matching" sift principle

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.