From: http://www.cvchina.info/2012/01/13/kmeans-based-indexing-and-asymmetric-distance-computation-for-ann-search-binary-local-feature-part1/#more-3232
Inspired by Herve Jegou's papers "Hamming embedding and weak geometric consistency for large-scale image search" and "Product Quantization for Nearest Neighbor Search", this post applies kmeans clustering, inverted files, and asymmetric distance computation to nearest neighbor retrieval of binary local features.
Main ideas:
Use kmeans to build a coarse index over the features.
Compress each feature based on per-cluster bit statistics.
During retrieval, compute the distance between indexed features and the query feature asymmetrically.
Algorithm:
Training:
- Run kmeans over the features to be indexed to obtain K centers. When clustering binary features, the cluster-center update works per bit: for each bit position, count the frequency of 1s over all features assigned to the cluster and take the majority value.
- For each cluster, compute the 1/0 frequency of every bit position over the features in that cluster, and select the M bit positions whose frequency is closest to 50%. (The closer to 50%, the higher the entropy.)
After training, we get two groups of data:
- K cluster centers.
- For each cluster center, a set of M bit-position identifiers. These identifiers form a basis for compressing the original feature (referred to below as the projection vector).
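The training procedure above can be sketched in numpy. This is a minimal illustration, not the post's actual code: the function name, the simple Lloyd-style loop, and the random initialization are my own assumptions.

```python
import numpy as np

def train_binary_kmeans(features, K=40, M=64, n_iter=10, seed=0):
    """Cluster packed binary features (uint8 rows) with a per-bit
    majority-vote centroid update, then pick, per cluster, the M bit
    positions whose 1-frequency is closest to 50% (highest entropy)."""
    rng = np.random.default_rng(seed)
    bits = np.unpackbits(features, axis=1)        # (N, n_bits) array of 0/1
    centers = bits[rng.choice(len(bits), K, replace=False)].copy()
    for _ in range(n_iter):
        # Assign each feature to its nearest center by Hamming distance.
        dists = (bits[:, None, :] != centers[None, :, :]).sum(axis=2)
        assign = dists.argmin(axis=1)
        for k in range(K):
            members = bits[assign == k]
            if len(members):
                # Majority vote per bit: 1-frequency >= 0.5 -> bit is 1.
                centers[k] = (members.mean(axis=0) >= 0.5).astype(np.uint8)
    # Per cluster, select the M most balanced (highest-entropy) bit positions.
    proj = np.empty((K, M), dtype=np.int64)
    for k in range(K):
        members = bits[assign == k]
        freq = members.mean(axis=0) if len(members) else np.full(bits.shape[1], 0.5)
        proj[k] = np.argsort(np.abs(freq - 0.5))[:M]
    return centers, proj
```

`centers` and `proj` correspond to the two groups of data listed above: the K cluster centers and, per center, the M bit-position identifiers (projection vector).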
Index:
- Create an inverted table
- For each feature to be indexed, find its nearest cluster center and project the feature with that cluster's projection vector to obtain an M-bit signature, denoted sig_templ. Insert this signature into the inverted table.
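The indexing step can be sketched as follows. This is an illustrative sketch under assumed shapes (packed uint8 features, `centers` as 0/1 bit rows, `proj` as per-cluster bit indices); the helper names are mine, not the post's.

```python
import numpy as np

def hamming(a, b):
    # Hamming distance between two equal-length 0/1 bit arrays.
    return int(np.count_nonzero(a != b))

def build_index(features, centers, proj):
    """Build an inverted table: for each feature, find its nearest cluster
    center, project it onto that cluster's M selected bit positions, and
    store (feature id, sig_templ) under that cluster."""
    bits = np.unpackbits(features, axis=1)
    table = {k: [] for k in range(len(centers))}
    for i, f in enumerate(bits):
        k = min(range(len(centers)), key=lambda c: hamming(f, centers[c]))
        sig_templ = f[proj[k]]                    # M-bit signature
        table[k].append((i, sig_templ))
    return table
```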
ANN search:
- Find the nearest cluster center of the query feature, query_cluster, and project the feature with that cluster's projection vector to obtain an M-bit signature, sig_query.
- Compute the distance between the query feature and the cluster center, excluding the bits covered by that cluster's projection vector; record this as dist_base.
- Traverse the inverted-table entries for query_cluster and compute the distance between sig_query and each stored sig_templ; record this as dist_sig. The distance between the query feature and an indexed feature is then dist = dist_base + dist_sig. If the minimum distance found is below a threshold, that entry is taken as the ANN.
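The search steps above can be sketched as follows. Again a hedged illustration: the function signature and the `(best_id, best_dist)` return convention are my own choices, assuming the same data layout as before.

```python
import numpy as np

def hamming(a, b):
    # Hamming distance between two equal-length 0/1 bit arrays.
    return int(np.count_nonzero(a != b))

def ann_search(query, centers, proj, table, threshold=50):
    """Asymmetric ANN search: dist = dist_base (query vs. center, excluding
    the projected bits) + dist_sig (sig_query vs. sig_templ)."""
    q = np.unpackbits(query)                      # packed uint8 -> 0/1 bits
    k = min(range(len(centers)), key=lambda c: hamming(q, centers[c]))
    sig_query = q[proj[k]]
    mask = np.ones(len(q), dtype=bool)
    mask[proj[k]] = False                         # exclude the projected bits
    dist_base = hamming(q[mask], centers[k][mask])
    best_id, best_dist = None, None
    for feat_id, sig_templ in table[k]:
        d = dist_base + hamming(sig_query, sig_templ)
        if best_dist is None or d < best_dist:
            best_id, best_dist = feat_id, d
    if best_dist is not None and best_dist < threshold:
        return best_id, best_dist
    return None, None
```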
A rough time-complexity analysis:
Suppose K = 40, M = 64, each feature is 32 bytes (ORB), and there are 1K indexed features.
Each ANN search then needs only:
- 40 Hamming distance computations on 32-byte vectors (scanning the centers), plus
- about 25 Hamming distance computations on 8-byte signatures (1000 features / 40 clusters, with M = 64 bits = 8 bytes per signature).
Compared with exhaustive search:
- 1000 Hamming distance computations on 32-byte vectors.
Speedup: roughly 20x.
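The arithmetic behind the claimed speedup, counting cost as "bytes of Hamming distance computed" per query:

```python
# K = 40 centers of 32 bytes each, ~1000/40 = 25 candidates per cluster,
# each compared via an M = 64-bit (8-byte) signature.
ann_cost = 40 * 32 + 25 * 8    # center scan + signature scan = 1480
brute_cost = 1000 * 32         # exhaustive 32-byte comparisons = 32000
speedup = brute_cost / ann_cost
print(ann_cost, brute_cost, round(speedup, 1))   # 1480 32000 21.6
```

About 21.6x under this simple model, consistent with the "roughly 20x" figure above.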
Example:
Experiment Configuration:
- Feature: ORB
- NN/ANN matching threshold: 50
- K = 40
Image matching results with exhaustive search (no RANSAC):
After RANSAC:
Image matching results with the ANN search described above (no RANSAC):
After RANSAC:
The number of matched points drops noticeably. This is an inherent drawback of the two-level indexing scheme: two features that are very close to each other can be assigned to different clusters. Multiple assignment (querying several nearby clusters) could mitigate this.
Once the code is cleaned up, part 2 will follow.