K-means based indexing and asymmetric distance computation for ANN search (binary local feature): part 1

From: http://www.cvchina.info/2012/01/13/kmeans-based-indexing-and-asymmetric-distance-computation-for-ann-search-binary-local-feature-part1/#more-3232

Inspired by Hervé Jégou's papers "Hamming Embedding and Weak Geometric Consistency for Large Scale Image Search" and "Product Quantization for Nearest Neighbor Search", this post applies k-means clustering, inverted files, and asymmetric distance computation to nearest-neighbor retrieval of local features in binary form.

Main ideas:

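Use k-means to build a coarse index over the features.

Compress each feature according to the bit statistics of its cluster.

At retrieval time, compute the distance between the query feature and each indexed feature asymmetrically: the query is compared in full against the cluster center, while each indexed feature is represented only by its cluster and its compressed signature.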

Algorithm:

Training:

  1. Cluster the features to be indexed with k-means to obtain K centers. For binary features, each cluster center is updated as follows: for each bit position, count how often the bit is 1 and how often it is 0 among the features assigned to the cluster, and set the center's bit to the more frequent value.
  2. For each cluster, compute the frequency of 1s and 0s at each bit position among the features assigned to that cluster, and select the M bit positions whose frequency is closest to 50%. (The closer to 50%, the greater the entropy.)
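The center update in step 1 can be sketched as follows. This is only an illustration, not the author's code: features are stored as Python integers used as bit strings, and the names `hamming`, `majority_center`, and `binary_kmeans`, the random initialization, and the tie-breaking rule are all assumptions.

```python
import random

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two features stored as ints."""
    return bin(a ^ b).count("1")

def majority_center(features, n_bits):
    """Per-bit majority vote over the features assigned to one cluster."""
    center = 0
    for bit in range(n_bits):
        ones = sum((f >> bit) & 1 for f in features)
        # Ties resolve to 0 here; this is an arbitrary choice.
        if 2 * ones > len(features):
            center |= 1 << bit
    return center

def binary_kmeans(features, k, n_bits, n_iter=10, seed=0):
    """k-means under Hamming distance with majority-vote center updates."""
    rng = random.Random(seed)
    centers = rng.sample(features, k)  # assumed initialization
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for f in features:
            nearest = min(range(k), key=lambda c: hamming(f, centers[c]))
            clusters[nearest].append(f)
        # Keep the old center if a cluster received no features.
        centers = [majority_center(members, n_bits) if members else centers[i]
                   for i, members in enumerate(clusters)]
    return centers
```

For 32-byte ORB descriptors, `n_bits` would be 256.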

After training, we get two groups of data:

  • K feature category centers.
  • For each category center, a group of M bit-position identifiers. These identifiers form a basis for compressing the original features. (This is referred to below as the projection vector.)
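A minimal sketch of how the M bit positions might be selected and then used to compress a feature, assuming features are stored as Python integers. The function names `select_bits` and `project` are hypothetical, and ties between equally balanced bits are broken by bit position, which the original does not specify.

```python
def select_bits(features, n_bits, m):
    """Positions of the m bits whose frequency of 1s is closest to 50%."""
    n = len(features)

    def imbalance(bit):
        ones = sum((f >> bit) & 1 for f in features)
        return abs(2 * ones - n)  # 0 means exactly half the features have a 1

    # sorted() is stable, so equally balanced bits keep ascending position order.
    return sorted(sorted(range(n_bits), key=imbalance)[:m])

def project(feature, bit_positions):
    """Pack the selected bits of a feature into a len(bit_positions)-bit signature."""
    sig = 0
    for i, bit in enumerate(bit_positions):
        sig |= ((feature >> bit) & 1) << i
    return sig

cluster = [0b0011, 0b0101, 0b0001, 0b0111]
positions = select_bits(cluster, n_bits=4, m=2)  # bits 1 and 2 are 50/50
signature = project(0b0101, positions)
```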

Index:

  1. Create an inverted table.
  2. For each feature to be indexed, find its nearest category center and project the feature with that category's projection vectors to obtain an M-bit signature, denoted sig_templ. Insert this signature into the inverted table entry for that category.
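The indexing steps could look roughly like this. It is a sketch under assumptions: the inverted table is a dictionary from cluster id to a list of `(feature_id, sig_templ)` pairs, `projections[c]` is the list of bit positions selected for cluster `c`, and `build_index` is a hypothetical name.

```python
from collections import defaultdict

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

def project(feature, bit_positions):
    """Pack the selected bits of a feature into a compact signature."""
    sig = 0
    for i, bit in enumerate(bit_positions):
        sig |= ((feature >> bit) & 1) << i
    return sig

def build_index(features, centers, projections):
    """Inverted table: cluster id -> list of (feature_id, sig_templ)."""
    inverted = defaultdict(list)
    for feature_id, f in enumerate(features):
        # Nearest category center under Hamming distance.
        c = min(range(len(centers)), key=lambda i: hamming(f, centers[i]))
        sig_templ = project(f, projections[c])
        inverted[c].append((feature_id, sig_templ))
    return inverted

centers = [0b0000, 0b1111]
projections = [[0, 1], [2, 3]]
inverted = build_index([0b0001, 0b1110, 0b0111], centers, projections)
```

Only the signature and the feature id are stored per entry, so with M = 64 each 32-byte feature shrinks to 8 bytes plus its id.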

ANN search:

  1. Find the nearest category center of the query feature, denoted query_cluster, and project the query with that category's projection vectors to obtain an M-bit signature, sig_query.
  2. Compute the distance between the query feature and the category center over all bits except those selected by the category's projection vectors. Denote this dist_base.
  3. Traverse the inverted table entries of query_cluster and compute the distance between sig_query and each sig_templ in those entries, denoted dist_sig. The distance between the query feature and an indexed feature is then Dist = dist_base + dist_sig. If the minimum distance found is below a given threshold, the corresponding indexed feature is accepted as an approximate nearest neighbor.
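The three search steps can be sketched as below, assuming the same integer bit-string representation and an inverted table mapping cluster id to `(feature_id, sig_templ)` pairs. The function name `ann_search` and its return convention are hypothetical.

```python
def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

def project(feature, bit_positions):
    sig = 0
    for i, bit in enumerate(bit_positions):
        sig |= ((feature >> bit) & 1) << i
    return sig

def ann_search(query, centers, projections, inverted, n_bits, threshold):
    """Return (feature_id, Dist) of the best match, or None if none is below threshold."""
    # Step 1: nearest category center and query signature.
    query_cluster = min(range(len(centers)),
                        key=lambda c: hamming(query, centers[c]))
    bits = projections[query_cluster]
    sig_query = project(query, bits)
    # Step 2: distance to the center on the bits NOT covered by the projection.
    mask = (1 << n_bits) - 1
    for b in bits:
        mask &= ~(1 << b)
    dist_base = hamming(query & mask, centers[query_cluster] & mask)
    # Step 3: scan the cluster's inverted list; Dist = dist_base + dist_sig.
    best = None
    for feature_id, sig_templ in inverted.get(query_cluster, []):
        dist = dist_base + hamming(sig_query, sig_templ)
        if best is None or dist < best[1]:
            best = (feature_id, dist)
    if best is not None and best[1] < threshold:
        return best
    return None

centers = [0b0000, 0b1111]
projections = [[0, 1], [2, 3]]
inverted = {0: [(0, 1)], 1: [(1, 3), (2, 1)]}
match = ann_search(0b1101, centers, projections, inverted, n_bits=4, threshold=3)
```

Note that dist_base is computed once per query, so the per-entry cost is a single M-bit Hamming distance. The result is an approximation: on the unselected bits, every indexed feature is treated as if it were equal to its cluster center.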

An analysis of time complexity:

Suppose K = 40, M = 64, and the features are 32-byte ORB descriptors, with 1,000 (1K) indexed features.

Each ANN search then needs only:

40 Hamming distance computations on 32 bytes (query against the K centers) + about 25 Hamming distance computations on 8 bytes (query signature against the signatures in one cluster, since 1000 / 40 = 25).

By comparison, exhaustive search needs:

1,000 Hamming distance computations on 32 bytes.

Speed improvement: about 20x
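The 20x figure can be checked with a rough count of bytes compared per query. This assumes that cost is proportional to the number of bytes compared, that the indexed features are spread evenly across clusters, and that the single dist_base computation is negligible; the variable names are illustrative only.

```python
K = 40               # number of cluster centers
M_BITS = 64          # signature length in bits
FEATURE_BYTES = 32   # ORB descriptor size
N = 1000             # number of indexed features

coarse_bytes = K * FEATURE_BYTES             # query vs. all centers
per_cluster = N // K                         # about 25 signatures per cluster
fine_bytes = per_cluster * (M_BITS // 8)     # query signature vs. one cluster
ann_bytes = coarse_bytes + fine_bytes

exhaustive_bytes = N * FEATURE_BYTES
speedup = exhaustive_bytes / ann_bytes

print(ann_bytes, exhaustive_bytes, round(speedup, 1))
```

Under these assumptions the ratio is 32000 / 1480, roughly 21.6, which matches the stated improvement of about 20x.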

 

Example:

Experiment configuration:
  • Feature: ORB
  • NN/ANN matching threshold: 50
  • K = 40

Image matching results with exhaustive search (no RANSAC):

After RANSAC:

Image matching results with the ANN search method above (no RANSAC):

After RANSAC:

The number of matching points decreases significantly. The cause is an inherent weakness of the two-level indexing method: two features that are very close to each other can be assigned to different clusters, in which case the search never compares them. Using multiple assignment may improve this.
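One possible form of multiple assignment is to probe several nearby clusters at query time instead of only the nearest one. The sketch below is an assumption about how that could be done, not something the post implements: `multi_probe_search` and `n_probe` are hypothetical names, and the inverted table is again a dictionary from cluster id to `(feature_id, sig_templ)` pairs.

```python
def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

def project(feature, bit_positions):
    sig = 0
    for i, bit in enumerate(bit_positions):
        sig |= ((feature >> bit) & 1) << i
    return sig

def multi_probe_search(query, centers, projections, inverted, n_bits, n_probe=2):
    """Scan the n_probe nearest clusters; return the best (feature_id, Dist) or None."""
    order = sorted(range(len(centers)), key=lambda c: hamming(query, centers[c]))
    best = None
    for c in order[:n_probe]:
        bits = projections[c]
        # dist_base and sig_query must be recomputed for each probed cluster,
        # because each cluster has its own center and projection.
        mask = (1 << n_bits) - 1
        for b in bits:
            mask &= ~(1 << b)
        dist_base = hamming(query & mask, centers[c] & mask)
        sig_query = project(query, bits)
        for feature_id, sig_templ in inverted.get(c, []):
            dist = dist_base + hamming(sig_query, sig_templ)
            if best is None or dist < best[1]:
                best = (feature_id, dist)
    return best

centers = [0b0000, 0b1111]
projections = [[0, 1], [2, 3]]
inverted = {1: [(1, 3), (2, 1)]}  # cluster 0 happens to be empty
single = multi_probe_search(0b0001, centers, projections, inverted, 4, n_probe=1)
double = multi_probe_search(0b0001, centers, projections, inverted, 4, n_probe=2)
```

The trade-off is that each extra probed cluster adds roughly one more inverted-list scan to the query cost.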

Part 2 will follow once the code has been cleaned up.
