SIFT Principle (Feature Matching): Analysis of the KD-Tree + BBF Algorithm


A previous article introduced the SIFT principle along with an analysis of its C source code. The result of that pipeline is a set of feature points, each described by a 128-dimensional vector. Given two images whose feature points have already been extracted, the next step is to match similar feature points between them.

There are two basic kinds of similarity query:

1. Range query: given a query point and a threshold, find all points in the data set whose distance to the query point is less than the threshold.

2. K-nearest-neighbor query: given a query point and a positive integer K, find the K points in the data set closest to the query point; when K = 1, this is the nearest-neighbor query.

Feature-matching methods fall into two categories:

1. Exhaustive search: compute the distance from the query point to every point in the data set. If N1 feature points are extracted from image 1 and N2 from image 2, brute-force matching performs N1 x N2 distance computations, which is clearly inefficient.

2. Indexed search: analyze the data and partition the search space. Depending on whether the partitions may overlap, the index structures divide into KD trees and R trees; the KD tree is a non-overlapping partition of the space.


(Figure: a three-dimensional k-d tree. The first split (red) divides the root cell (white) into two subcells; each of those is then split (green) into two subcells, and each of the resulting four is split (blue) into two more. Since there is no further division, the final eight cells are the leaf nodes.)


KD-tree construction: a KD tree is a binary tree that partitions the data space, with each node corresponding to a spatial region. As shown in the figure, three-dimensional space is divided as follows. First determine the dimension ki with the largest variance in the data set, then find the median kv of the data on dimension ki (the point holding it becomes the root node). This first step splits the space in two: points whose value on dimension ki is less than kv form the left child's data, and points greater than kv form the right child's. The same method is then applied recursively to the left and right children until the remaining data sets are empty.

For example, build a kd tree from 5 data points, each 5-dimensional: a<7,5,7,3,8>; b<3,4,1,2,7>; c<5,2,6,6,9>; d<9,3,2,4,1>; e<2,1,5,1,4>. First compute the variance on each of the 5 dimensions: 6.56, 2, 5.36, 2.96, 8.56. The 5th dimension has the largest variance, and the median value on it is 7, i.e. point b. Points whose 5th-dimension value is less than 7 become the left-subtree data (d, e), and those greater than 7 the right-subtree data (a, c). The process then repeats on (d, e) and on (a, c), each time splitting on that subset's maximum-variance dimension. In the figure, ki denotes a dimension and kv a value on that dimension.


KD-tree query: search down the binary tree from the root until reaching a leaf. The leaf found this way is not necessarily the nearest point, but the nearest point must lie near it, so a backtracking pass toward the root is required. For example, to find the nearest neighbor of m<5,4,1,3,6> in the tree above: the search descends from b to a leaf and computes the distance to it, then unwinds, re-checking each node on the path and, whenever the splitting plane is closer than the best distance found so far, descending into that node's other subtree as well. In this case the final answer is b itself: distance mb is the smallest.

If the data has dimension d, efficient search generally requires the data-set size n to satisfy n >> 2^d. In practice a standard KD tree performs well only when the dimensionality does not exceed roughly 20, but the SIFT descriptor has 128 dimensions (SURF's has 64), so the plain KD tree has to be improved.

BBF: in a plain KD tree the backtracking order is determined entirely by the descent path, regardless of the properties of the data along it. The BBF (best-bin-first) query mechanism instead ensures that the subtrees most likely to contain the nearest neighbor are searched first. BBF maintains a priority queue: each time the search steps into the left or right subtree, the other child's address is placed in the queue, keyed by the difference between the query point's value and the split value kv on the split dimension ki; backtracking then extracts nodes from the queue in order of increasing key. In the earlier example: b is put into the priority queue first, then extracted; the search moves to one child while the other child, d, is saved in the queue with ki=5, kv=7, giving key d = 7 - 6 = 1, recorded as d(1). Likewise, if a has a bypassed child, it too is queued, with ki=2, kv=5, d = 5 - 4 = 1 (the example is admittedly not ideal, o(╯-╰)o). Backtracking follows the queue's keys from small to large until the queue is empty or a timeout is reached. BBF imposes this timeout, implemented as a limit on the number of extractions from the priority queue, in order to keep retrieval fast on high-dimensional data, trading accuracy for time. The neighbor BBF returns is therefore approximate: not guaranteed to be the nearest point, but very likely close to it.



The following walks through the C source code of the algorithm:

To build a kd tree:

/* builds a kd tree from an array of features; n is the number of features */
struct kd_node* kdtree_build( struct feature* features, int n )
{
  struct kd_node* kd_root;

  if( ! features  ||  n <= 0 )
    {
      fprintf( stderr, "Warning: kdtree_build(): no features, %s, line %d\n",
               __FILE__, __LINE__ );
      return NULL;
    }

  kd_root = kd_node_init( features, n );  /* create the root node; each node stores a run of feature points */
  expand_kd_node_subtree( kd_root );      /* recursively expand the kd tree from the root */
  return kd_root;
}
static struct kd_node* kd_node_init( struct feature* features, int n )
{
  struct kd_node* kd_node;

  kd_node = malloc( sizeof( struct kd_node ) );
  memset( kd_node, 0, sizeof( struct kd_node ) );
  kd_node->ki = -1;               /* split dimension, initialized to -1 (not yet assigned) */
  kd_node->features = features;   /* points at this node's run of feature points */
  kd_node->n = n;                 /* number of features in the subtree rooted at kd_node */
  return kd_node;
}
Expanding the KD tree: using the current node's maximum-variance dimension and the corresponding median, the data is divided between the left- and right-subtree nodes, and the recursion continues until a leaf node is created and the call returns.

static void expand_kd_node_subtree( struct kd_node* kd_node )  /* recursively builds the kd tree */
{
  /* base case: leaf node */
  if( kd_node->n == 1  ||  kd_node->n == 0 )  /* one (or zero) feature left: this node becomes a leaf */
    {
      kd_node->leaf = 1;
      return;
    }

  assign_part_key( kd_node );     /* compute the max-variance dimension ki and its median kv */
  partition_features( kd_node );  /* split the data on dimension ki into left- and right-subtree runs */

  if( kd_node->kd_left )          /* continue building the left subtree */
    expand_kd_node_subtree( kd_node->kd_left );
  if( kd_node->kd_right )         /* continue building the right subtree */
    expand_kd_node_subtree( kd_node->kd_right );
}
Computing the maximum-variance dimension ki and its median kv: the median is found with a selection algorithm whose worst case is still linear time; it was analyzed in an earlier post on median selection, so it is not repeated here.
/* computes the maximum-variance dimension ki of the node's data, and the median kv on it */
static void assign_part_key( struct kd_node* kd_node )
{
  struct feature* features;
  double kv, x, mean, var, var_max = 0;
  double* tmp;
  int d, n, i, j, ki = 0;

  features = kd_node->features;
  n = kd_node->n;
  d = features[0].d;

  /* partition key index is the dimension along which the descriptors have most variance */
  for( j = 0; j < d; j++ )        /* for each of the d dimensions, compute the variance over all data */
    {
      mean = var = 0;
      for( i = 0; i < n; i++ )
        mean += features[i].descr[j];
      mean /= n;
      for( i = 0; i < n; i++ )
        {
          x = features[i].descr[j] - mean;
          var += x * x;
        }
      var /= n;                   /* variance of the data on dimension j */

      if( var > var_max )
        {
          ki = j;
          var_max = var;
        }
    }

  /* partition key value is median of descriptor values at ki */
  tmp = calloc( n, sizeof( double ) );
  for( i = 0; i < n; i++ )        /* gather every feature's value on dimension ki */
    tmp[i] = features[i].descr[ki];
  kv = median_select( tmp, n );   /* median on dimension ki, via a worst-case O(n) selection algorithm */
  free( tmp );

  kd_node->ki = ki;               /* split dimension */
  kd_node->kv = kv;               /* median value */
}
The feature points are then partitioned by their value on dimension ki: values less than or equal to kv become the left-subtree data, values greater than kv the right-subtree data.

static void partition_features( struct kd_node* kd_node )
{
  struct feature* features, tmp;
  double kv;
  int n, ki, p, i, j = -1;

  features = kd_node->features;
  n = kd_node->n;
  ki = kd_node->ki;
  kv = kd_node->kv;

  for( i = 0; i < n; i++ )        /* partition the features around kv on dimension ki */
    if( features[i].descr[ki] <= kv )
      {
        tmp = features[++j];
        features[j] = features[i];
        features[i] = tmp;
        if( features[j].descr[ki] == kv )
          p = j;                  /* remember where the median landed */
      }
  tmp = features[p];              /* move the median to the end of the left run */
  features[p] = features[j];
  features[j] = tmp;

  /* if all records fall on same side of partition, make node a leaf */
  if( j == n - 1 )
    {
      kd_node->leaf = 1;
      return;
    }

  kd_node->kd_left = kd_node_init( features, j + 1 );                    /* left subtree: j + 1 features */
  kd_node->kd_right = kd_node_init( features + ( j + 1 ), n - j - 1 );   /* right subtree: n - j - 1 features */
}

☆ The KD tree is now built; what remains is the query: find the K feature points nearest a given feature point. First the root node is placed in the priority queue. Then, repeatedly, an element is extracted from the queue and the tree is traversed from it down to a leaf, with each bypassed sibling node added to the priority queue along the way (keyed by the absolute difference between the query's value on dimension ki and kv). Extraction and descent repeat until the timeout limit is reached or every node has been traversed.

/* kd_root: the kd tree built above; feat: the feature point to query;
   k: number of neighbors to find (SIFT uses 2); nbrs: output array of the
   k nearest neighbors; max_nn_chks: maximum number of queue extractions,
   i.e. the timeout limit.  Returns the number of neighbors found on
   success, otherwise -1. */
int kdtree_bbf_knn( struct kd_node* kd_root, struct feature* feat, int k,
                    struct feature*** nbrs, int max_nn_chks )
{
  struct kd_node* expl;
  struct min_pq* min_pq;
  struct feature* tree_feat, ** _nbrs;
  struct bbf_data* bbf_data;
  int i, t = 0, n = 0;

  if( ! nbrs  ||  ! feat  ||  ! kd_root )
    {
      fprintf( stderr, "Warning: NULL pointer error, %s, line %d\n",
               __FILE__, __LINE__ );
      return -1;
    }

  _nbrs = calloc( k, sizeof( struct feature* ) );
  min_pq = minpq_init();                 /* create the minimum priority queue */
  minpq_insert( min_pq, kd_root, 0 );    /* seed the queue with the root */

  while( min_pq->n > 0  &&  t < max_nn_chks )  /* queue non-empty and within the timeout */
    {
      expl = (struct kd_node*)minpq_extract_min( min_pq );  /* extract the highest-priority node */
      if( ! expl )
        {
          fprintf( stderr, "Warning: PQ unexpectedly empty, %s line %d\n",
                   __FILE__, __LINE__ );
          goto fail;
        }

      /* descend to the leaf containing feat, queueing each bypassed branch */
      expl = explore_to_leaf( expl, feat, min_pq );
      if( ! expl )
        {
          fprintf( stderr, "Warning: PQ unexpectedly empty, %s line %d\n",
                   __FILE__, __LINE__ );
          goto fail;
        }

      for( i = 0; i < expl->n; i++ )     /* examine every feature at this leaf */
        {
          tree_feat = &expl->features[i];
          bbf_data = malloc( sizeof( struct bbf_data ) );
          if( ! bbf_data )
            {
              fprintf( stderr, "Warning: unable to allocate memory,"
                       " %s line %d\n", __FILE__, __LINE__ );
              goto fail;
            }
          /* old_data just preserves the feature's feature_data pointer while
             it is borrowed to carry the distance */
          bbf_data->old_data = tree_feat->feature_data;
          bbf_data->d = descr_dist_sq( feat, tree_feat );  /* squared Euclidean distance between the two features */
          tree_feat->feature_data = bbf_data;
          n += insert_into_nbr_array( tree_feat, _nbrs, n, k );  /* keep the k nearest so far, sorted by distance */
        }
      t++;
    }

  minpq_release( &min_pq );
  for( i = 0; i < n; i++ )               /* restore each feature's original feature_data pointer */
    {
      bbf_data = _nbrs[i]->feature_data;
      _nbrs[i]->feature_data = bbf_data->old_data;
      free( bbf_data );
    }
  *nbrs = _nbrs;
  return n;

 fail:
  minpq_release( &min_pq );
  for( i = 0; i < n; i++ )
    {
      bbf_data = _nbrs[i]->feature_data;
      _nbrs[i]->feature_data = bbf_data->old_data;
      free( bbf_data );
    }
  free( _nbrs );
  *nbrs = NULL;
  return -1;
}
The priority queue here is built on the heap idea from heapsort; priority queues are a major application of heaps, and in a heap of n elements every priority-queue operation completes in O(lg n) time. Priority queues come in two forms, maximum and minimum; a minimum priority queue is used here, i.e. the smaller the key, the higher the priority. For the underlying heap mechanics, see the earlier post on heapsort.
struct min_pq* minpq_init()  /* initialize the queue */
{
  struct min_pq* min_pq;

  min_pq = malloc( sizeof( struct min_pq ) );
  min_pq->pq_array = calloc( MINPQ_INIT_NALLOCD, sizeof( struct pq_node ) );  /* allocate the queue storage */
  min_pq->nallocd = MINPQ_INIT_NALLOCD;
  min_pq->n = 0;                  /* number of elements in the queue */
  return min_pq;
}
int minpq_insert( struct min_pq* min_pq, void* data, int key )  /* insert an element into the priority queue */
{
  int n = min_pq->n;

  /* double array allocation if necessary */
  if( min_pq->nallocd == n )
    {
      min_pq->nallocd = array_double( (void**)&min_pq->pq_array,
                                      min_pq->nallocd,
                                      sizeof( struct pq_node ) );
      if( ! min_pq->nallocd )
        {
          fprintf( stderr, "Warning: unable to allocate memory, %s, line %d\n",
                   __FILE__, __LINE__ );
          return 1;
        }
    }

  min_pq->pq_array[n].data = data;
  min_pq->pq_array[n].key = INT_MAX;
  decrease_pq_node_key( min_pq->pq_array, min_pq->n, key );  /* sift the new node up the heap */
  min_pq->n++;
  return 0;
}

After a node is extracted from the queue, the search traverses from it down to a leaf, adding each unexplored sibling node to the priority queue along the way.

/* descends from kd_node to a leaf, choosing at each split the side that
   contains feat and queueing the other side */
static struct kd_node* explore_to_leaf( struct kd_node* kd_node,
                                        struct feature* feat,
                                        struct min_pq* min_pq )
{
  struct kd_node* unexpl, * expl = kd_node;
  double kv;
  int ki;

  while( expl  &&  ! expl->leaf )
    {
      ki = expl->ki;
      kv = expl->kv;

      if( ki >= feat->d )
        {
          fprintf( stderr, "Warning: comparing incompatible descriptors, %s"
                   " line %d\n", __FILE__, __LINE__ );
          return NULL;
        }
      if( feat->descr[ki] <= kv )
        {
          unexpl = expl->kd_right;
          expl = expl->kd_left;
        }
      else
        {
          unexpl = expl->kd_left;
          expl = expl->kd_right;
        }

      /* queue the unexplored branch, keyed by its distance |kv - feat->descr[ki]| */
      if( minpq_insert( min_pq, unexpl, ABS( kv - feat->descr[ki] ) ) )
        {
          fprintf( stderr, "Warning: unable to insert into PQ, %s, line %d\n",
                   __FILE__, __LINE__ );
          return NULL;
        }
    }
  return expl;
}
Computing the squared Euclidean distance between two feature points:

double descr_dist_sq( struct feature* f1, struct feature* f2 )
{
  double diff, dsq = 0;
  double* descr1, * descr2;
  int i, d;

  d = f1->d;
  if( f2->d != d )
    return DBL_MAX;
  descr1 = f1->descr;
  descr2 = f2->descr;

  for( i = 0; i < d; i++ )
    {
      diff = descr1[i] - descr2[i];
      dsq += diff * diff;
    }
  return dsq;
}
Each found feature point is inserted into the nbrs array, which serves as the output; nbrs keeps its entries sorted by distance from small to large.

Let df be the distance being inserted and dn the distance of the current last element in the array:

1. If the array is not yet full (fewer than K neighbors found so far): (1) if df >= dn, insert directly at the end; (2) if df < dn, find the insertion position and shift every element larger than df back one slot.

2. If the array is full: (1) if df >= dn, discard the new entry outright; (2) if df < dn, find the insertion position, shift every element larger than df back one slot, and discard the last element.

static int insert_into_nbr_array( struct feature* feat, struct feature** nbrs,
                                  int n, int k )
{
  struct bbf_data* fdata, * ndata;
  double dn, df;
  int i, ret = 0;

  if( n == 0 )
    {
      nbrs[0] = feat;
      return 1;
    }

  /* check at end of array */
  fdata = (struct bbf_data*)feat->feature_data;
  df = fdata->d;                          /* distance of the new feature */
  ndata = (struct bbf_data*)nbrs[n-1]->feature_data;
  dn = ndata->d;                          /* distance of the current last element */
  if( df >= dn )                          /* new element would go at the end */
    {
      if( n == k )                        /* but the k-neighbor array is full: discard */
        {
          feat->feature_data = fdata->old_data;  /* restore the original data before discarding */
          free( fdata );
          return 0;
        }
      nbrs[n] = feat;
      return 1;
    }

  /* find the right place in the array; the full and non-full cases differ */
  if( n < k )                             /* not full: shift the last element back */
    {
      nbrs[n] = nbrs[n-1];
      ret = 1;
    }
  else                                    /* full: the last element is discarded, its data restored */
    {
      nbrs[n-1]->feature_data = ndata->old_data;
      free( ndata );
    }
  i = n - 2;
  while( i >= 0 )                         /* shift larger elements back to open the slot */
    {
      ndata = (struct bbf_data*)nbrs[i]->feature_data;
      dn = ndata->d;
      if( dn <= df )
        break;
      nbrs[i+1] = nbrs[i];
      i--;
    }
  i++;
  nbrs[i] = feat;                         /* insert the element */
  return ret;
}


This concludes the introduction to the SIFT principle and the feature-point matching algorithm. Follow-up articles will cover the SURF, BRIEF, FAST, and ORB feature-matching methods. Thanks again to the experts on CSDN!

Reprint: please credit the source: http://blog.csdn.net/luoshixian099/article/details/47606159


Copyright notice: this is the blogger's original article; please do not reproduce it without the blogger's permission.
