Locality-Sensitive Hashing: KSH


    • Kernel Functions
      • Process
      • Analysis
    • Supervised Information
    • Code Inner Product
    • Objective Function
    • Greedy Optimization
    • Spectral Relaxation
    • Sigmoid Smoothing
    • Final Algorithm
    • References

An earlier article on locality-sensitive hashing analyzed how LSH is applied in the retrieval process, and covered both the original hashing method and the hashing method based on the p-stable distribution.

For reprints, please credit the source: http://blog.csdn.net/stdcoutzyx/article/details/44746839

Both the original hashing method and the p-stable-distribution-based hashing method generate their hash functions randomly, so their effectiveness is limited by the random functions and tends to fluctuate. This article describes a supervised learning hashing method, which learns a different hash function for each dataset and therefore has a significant advantage over randomly generated methods. The method described here comes from the original paper [1] and is called KSH, short for Kernel-Based Supervised Hashing. Its main components are:

    • Kernel Function
    • Supervised information
    • Code Inner Product
    • Objective Function
    • Greedy optimization
    • Spectral relaxation
    • Sigmoid Smoothing

Together, these components form the complete training and usage pipeline of the KSH approach. Each is introduced below.

Kernel Functions

Process

The kernelized hashing method is inherited from [2] and proceeds as follows:
First, m points are taken from the data; these are called anchor points, and m is one of the important parameters of KSH.

For a point x, the kernel function values between x and the anchor points are computed, yielding an m-dimensional vector, as in the formula below, where the subscripts index the anchor samples. The choice of kernel function is another parameter that KSH must control.
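Reconstructing the formula from [1], with the anchors written as x_(1), ..., x_(m), the kernel vector is:

$$k(x) = \big[\kappa(x_{(1)}, x),\ \kappa(x_{(2)}, x),\ \dots,\ \kappa(x_{(m)}, x)\big]^{\top} \in \mathbb{R}^{m}$$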

Then an m-dimensional vector a is used to take the inner product with the vector above, and a bias b is subtracted; a and b are the parameters of KSH.
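Per [1], the resulting prediction function for one bit is:

$$f(x) = \sum_{j=1}^{m} \kappa(x_{(j)}, x)\, a_j - b = a^{\top} k(x) - b$$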

The formula above yields a real number, and taking its sign determines one bit of the code.

In this way, one Hamming bit is obtained for each data point. With r groups of (a, b) parameters, the data can be converted into r-bit Hamming codes.

To ensure that the learned Hamming codes preserve as much information as possible, each bit needs to be balanced over the training data:
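With n training points, this balance condition (per [1]) is:

$$\sum_{i=1}^{n} \operatorname{sgn}\big(f(x_i)\big) = 0$$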

Thus, b should equal the median of the first term of the f(x) formula, i.e. the median of a^T k(x_i) over the training samples.

Substituting b back into f(x) gives:
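Per [1], which replaces the median with the mean for computational convenience, the result is a centered kernel vector:

$$b = \frac{1}{n}\sum_{i=1}^{n} a^{\top} k(x_i), \qquad f(x) = a^{\top} \bar{k}(x), \qquad \bar{k}(x) = k(x) - \frac{1}{n}\sum_{i=1}^{n} k(x_i)$$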

Analysis

In the process above, a could be produced randomly. In KSH, however, a is learned from labeled data.
Note the role of the kernel in this process: it is the first step applied to the data, and its advantage is dimensionality reduction. For example, if the original data is 10000-dimensional but only 500 anchor points are chosen, the kernelized representation becomes 500-dimensional, greatly reducing the number of parameters in a that must be learned.
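As a concrete illustration of this encoding process, here is a minimal sketch assuming a Gaussian (RBF) kernel and an already-learned projection matrix A; the function names and the choice of kernel are my own, not from the paper:

```python
import numpy as np

def rbf_kernel(X, anchors, sigma=1.0):
    """Kernel values between each row of X and each anchor point (n x m)."""
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hash_codes(X, anchors, A, kernel_mean):
    """Map data X to r-bit codes in {-1, +1}.

    kernel_mean holds the per-anchor mean kernel value over the training set;
    subtracting it plays the role of the bias b after the substitution above.
    """
    K = rbf_kernel(X, anchors)              # (n, m) kernel features
    codes = np.sign((K - kernel_mean) @ A)  # (n, r) real values -> signs
    codes[codes == 0] = 1                   # break ties arbitrarily
    return codes
```

With 10000-dimensional inputs and m = 500 anchors, only the 500 x r matrix A has to be learned, which is exactly the dimensionality reduction described above.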

Supervised Information

Since KSH is a supervised learning algorithm, labeled information is required. In KSH, the supervision takes the form of a matrix S.

The supervision for KSH is obtained as follows: select l samples from the sample set and form an l×l matrix, where the value at (i, j) indicates whether sample i and sample j are similar. This is essentially pairwise information, i.e. whether the two samples of each pair are similar.
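When class labels are available, such a pairwise matrix could be built as in the sketch below (my own illustration, using the +1/-1 convention that the inner-product formulation in the next section expects):

```python
import numpy as np

def build_supervision(labels):
    """Build an l x l pairwise supervision matrix S from class labels.

    S[i, j] = +1 if samples i and j share a label (similar pair),
              -1 otherwise (dissimilar pair).
    """
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    return np.where(same, 1.0, -1.0)
```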

Code Inner Product

Assume the parameter A has already been learned, so Hamming codes can be computed; if two Hamming codes agree, the two samples are considered similar, otherwise not. The similarity of Hamming codes is normally computed with XOR, but during learning we need to go from Hamming-code similarity back to the parameter A, and the XOR operation is hard to differentiate. Therefore the XOR operation has to be transformed. The transformation is:

Change the 0 bits of the Hamming codes to -1; then the inner product of two Hamming codes is in one-to-one correspondence with the original Hamming distance, derived as follows:

Here, the code function converts a sample into its (1/-1) Hamming code, and the D function gives the Hamming distance between the two codes. Because the range of the inner product is [-r, r], it must be divided by r to normalize it to [-1, 1], as shown below:
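Reconstructed from the description above (and [1]):

$$\operatorname{code}_r(x_i) \cdot \operatorname{code}_r(x_j) = r - 2\,\mathcal{D}_h\big(x_i, x_j\big), \qquad \frac{1}{r}\,\operatorname{code}_r(x_i) \cdot \operatorname{code}_r(x_j) \in [-1, 1]$$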

Objective Function

The objective function is as follows:
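Per [1], it is:

$$\min_{H \in \{-1, 1\}^{l \times r}}\ \Big\| \frac{1}{r}\, H H^{\top} - S \Big\|_F^2$$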

Here H is an l×r matrix, l is the number of samples in the supervision set, r is the number of bits in the Hamming code, and S is the supervision matrix.

Expanding the objective function gives:
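Writing H = sgn(K̄ A), with K̄ and A as described just below, this is:

$$\min_{A}\ \Big\| \frac{1}{r}\, \operatorname{sgn}\big(\bar{K} A\big)\, \operatorname{sgn}\big(\bar{K} A\big)^{\top} - S \Big\|_F^2$$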

where K̄ is an l×m matrix (m being the number of anchor points) representing the sample data after kernel processing, and A is an m×r matrix, i.e. the parameter A to be learned.

Greedy Optimization

Expanding the objective function once more:
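Scaling by r and writing H H^T as a sum over the individual bits h_k = sgn(K̄ a_k), the problem is equivalent to:

$$\min_{A}\ \Big\| \sum_{k=1}^{r} \operatorname{sgn}\big(\bar{K} a_k\big)\, \operatorname{sgn}\big(\bar{K} a_k\big)^{\top} - r\, S \Big\|_F^2$$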

Solving this problem directly is very difficult, so the paper proposes a greedy algorithm to obtain a good solution. The greedy approach solves one bit at a time: first find a_1, then a_2, and so on up to a_r.

To do this, we first define the residue matrix, i.e. what is left of the target term (rS) after each already-solved bit is accounted for, as follows:
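In the notation above, after the first k-1 bits have been solved the residue is:

$$R_{k-1} = r\, S - \sum_{t=1}^{k-1} \operatorname{sgn}\big(\bar{K} a_t^{*}\big)\, \operatorname{sgn}\big(\bar{K} a_t^{*}\big)^{\top}$$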

Clearly R_0 = rS. Here a_t^* denotes a parameter vector that has already been solved.

The objective function for solving a single bit is then:
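Writing h_k = sgn(K̄ a_k), the single-bit problem and its expansion are:

$$\min_{a_k}\ \big\| h_k h_k^{\top} - R_{k-1} \big\|_F^2 \;=\; \big\| h_k h_k^{\top} \big\|_F^2 \;-\; 2\, h_k^{\top} R_{k-1}\, h_k \;+\; \big\| R_{k-1} \big\|_F^2$$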

In the expansion above, the first term always equals l² (since every entry of h_k is ±1), and R_{k-1} does not depend on a_k, so both are constants.
Therefore, the single-bit objective becomes:
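Dropping the constants leaves a maximization:

$$\max_{a_k}\ h_k^{\top} R_{k-1}\, h_k \;=\; \operatorname{sgn}\big(\bar{K} a_k\big)^{\top} R_{k-1}\, \operatorname{sgn}\big(\bar{K} a_k\big)$$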

Spectral Relaxation

To make this objective solvable, it is relaxed by dropping the sgn function, giving:
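Per [1], the relaxed problem constrains the scale of K̄ a_k:

$$\max_{a_k}\ \big(\bar{K} a_k\big)^{\top} R_{k-1}\, \big(\bar{K} a_k\big) \qquad \text{s.t.}\quad \big(\bar{K} a_k\big)^{\top}\big(\bar{K} a_k\big) = l$$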

The constraint comes from the un-relaxed problem: when the sgn function is kept, every entry of h_k is ±1, so its squared norm is l. The relaxed problem is a standard (generalized) eigenvector problem:
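In matrix form:

$$\bar{K}^{\top} R_{k-1}\, \bar{K}\, a \;=\; \lambda\, \bar{K}^{\top} \bar{K}\, a$$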

Here a_k is taken as the eigenvector corresponding to the largest eigenvalue of this problem. Once computed, it is not used as the final value but as an initialization, which the following method then refines.

Sigmoid Smoothing

The spectral relaxation above is arguably too loose, so the objective is instead smoothed with a function that stays much closer to sgn.

The φ function approximates the sgn function, matching it almost exactly on [-6, 6].
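A standard sigmoid-based choice for φ, with the corresponding smoothed single-bit objective obtained by replacing sgn with φ in the maximization above, is:

$$\varphi(z) = \frac{2}{1 + e^{-z}} - 1, \qquad \max_{a_k}\ \varphi\big(\bar{K} a_k\big)^{\top} R_{k-1}\, \varphi\big(\bar{K} a_k\big)$$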

With smoothing in place, gradient ascent on the smoothed objective can be used to solve for a_k. The gradient is as follows:
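Writing b = φ(K̄ a_k) and using φ'(z) = (1 − φ(z)²)/2, the gradient works out to (my own derivation from the smoothed objective above, so treat the constant factor as indicative):

$$\nabla_{a_k}\,\big( b^{\top} R_{k-1}\, b \big) \;=\; \bar{K}^{\top}\Big( \big(R_{k-1}\, b\big) \odot \big(\mathbf{1} - b \odot b\big) \Big)$$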

Final Algorithm

The final algorithm flow combines the steps above: sample anchor points, compute centered kernel features for the labeled samples, then learn the r projection vectors greedily, each initialized by spectral relaxation and refined by gradient ascent on the smoothed objective.
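A minimal end-to-end sketch of that training loop, assuming the centered kernel matrix K_bar and the supervision matrix S have been built as in the earlier sketches; the learning rate, iteration count, and regularization are my own illustrative choices rather than the paper's settings:

```python
import numpy as np
from scipy.linalg import eigh

def train_ksh(K_bar, S, r, n_iters=300, lr=1e-3):
    """Greedy KSH training sketch.

    K_bar : (l, m) centered kernel features of the labeled samples.
    S     : (l, l) pairwise supervision matrix in {+1, -1}.
    r     : number of hash bits.
    Returns A : (m, r) projection matrix, one column per bit.
    """
    l, m = K_bar.shape
    A = np.zeros((m, r))
    R = r * S.astype(float)                          # residue matrix, R_0 = r * S
    for k in range(r):
        # Spectral relaxation: generalized eigenproblem for the initial a_k.
        M = K_bar.T @ R @ K_bar
        C = K_bar.T @ K_bar + 1e-6 * np.eye(m)       # regularized for numerical stability
        _, V = eigh(M, C)                            # eigenvalues in ascending order
        a = V[:, -1]                                 # eigenvector of the largest eigenvalue
        a *= np.sqrt(l) / (np.linalg.norm(K_bar @ a) + 1e-12)  # enforce ||K_bar a||^2 = l

        # Sigmoid smoothing + gradient ascent on phi(K_bar a)^T R phi(K_bar a).
        for _ in range(n_iters):
            b = 2.0 / (1.0 + np.exp(-(K_bar @ a))) - 1.0   # phi, smooth stand-in for sgn
            grad = K_bar.T @ ((R @ b) * (1.0 - b * b))     # gradient derived above
            a = a + lr * grad

        A[:, k] = a
        h = np.sign(K_bar @ a)
        h[h == 0] = 1
        R = R - np.outer(h, h)                       # update the residue with the new bit
    return A
```

At query time, the learned A can be combined with the hash_codes sketch from the Analysis section to produce r-bit codes for new points.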

References

[1]. Liu W, Wang J, Ji R, et al. Supervised Hashing with Kernels // Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012: 2074-2081.

[2]. Kulis B, Grauman K. Kernelized Locality-Sensitive Hashing. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 2012, 34(6): 1092-1104.
