Kernelized Locality-Sensitive Hashing Page
Brian Kulis (1) and Kristen Grauman (2)
(1) UC Berkeley EECS and ICSI, Berkeley, CA
(2) University of Texas, Department of Computer Sciences, Austin, TX

Introduction

Fast indexing and search for large databases are critical to content-based image and video retrieval, particularly given the ever-increasing availability of visual data in a variety of interesting domains, such as scientific image data, community photo collections on the Web, news photo collections, or surveillance archives. The most basic but essential task in image search is the nearest-neighbor problem: take a query image and accurately find the examples that are most similar to it within a large database. A naive solution entails searching over all n database items and sorting them according to their similarity to the query, but this becomes prohibitively expensive when n is large or when the individual similarity function evaluations are expensive to compute. For vision applications, this complexity is amplified by the fact that the most effective representations are often high-dimensional or structured, and the best known distance functions can require considerable computation to compare even a single pair of objects.

To make large-scale search practical, vision researchers have recently explored approximate similarity search techniques, most notably locality-sensitive hashing (Indyk and Motwani 1998, Charikar 2002), where a predictable loss in accuracy is sacrificed in order to allow fast queries even for high-dimensional inputs. In spite of hashing's success for visual similarity search tasks, existing techniques have some important restrictions. Current methods generally assume that the data to be hashed comes from a multidimensional vector space, and require that the underlying embedding of the data be explicitly known and computable. For example, LSH relies on random projections of input vectors; spectral hashing (Weiss et al., NIPS 2008) assumes vectors with a known probability distribution.

This is a problematic limitation, given that many recent successful vision results employ kernel functions for which the underlying embedding is known only implicitly (i.e., only the kernel function is computable). It has thus far been impossible to apply LSH and its variants to search data with a number of powerful kernels, including many kernels designed specifically for image comparisons, as well as some basic, widely used functions like the Gaussian RBF. Further, since visual representations are often most naturally encoded with structured inputs (e.g., sets, graphs, trees), the lack of fast search methods with performance guarantees for flexible kernels is inconvenient.

In this work, we present an LSH-based technique for performing fast similarity searches over arbitrary kernel functions. The problem is as follows: given a kernel function and a database of n objects, how can we quickly find the item most similar to a query object in terms of the kernel function? Like standard LSH, our hash functions involve computing random projections; however, unlike standard LSH, these random projections are constructed using only the kernel function and a sparse set of examples from the database itself. Our main technical contribution is to formulate the random projections necessary for LSH in kernel space. Our construction relies on an appropriate use of the central limit theorem, which allows us to approximate a random vector using items from our database. The resulting scheme, which we call kernelized LSH (KLSH), generalizes LSH to scenarios in which the feature-space embeddings are either unknown or incomputable.

Method

The main idea behind our approach is to construct a random hyperplane hash function, as in standard LSH, but to perform the computations purely in kernel space. The construction is based on the central limit theorem, which lets us compute an approximate random vector using items from the database. The central limit theorem states that, under very mild conditions, the mean of a set of objects drawn from some underlying distribution will be Gaussian distributed in the limit as more objects are included in the set. Since LSH requires a random vector drawn from a particular Gaussian distribution (one with zero mean and identity covariance), we can use the central limit theorem, along with an appropriate mean shift and whitening, to form an approximate random vector with that distribution. By performing this construction appropriately, the algorithm can be applied entirely in kernel space, and can also be applied efficiently over very large data sets.
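To make this concrete, here is a minimal sketch of the hash-function construction in Python/NumPy (the released implementation referenced below is in MATLAB). The function names, the default value of t, and the simplified handling of kernel centering are illustrative assumptions, not the exact released code.

import numpy as np

def klsh_weights(K, num_bits, t=30, seed=0):
    """Sketch of the KLSH hash-weight construction.

    K        : p x p kernel matrix over a sampled subset of the database
    num_bits : number of hash bits to generate
    t        : number of sampled items averaged per hash function
    Returns a p x num_bits matrix W; the b-th bit of an item x is
    sign(sum_i W[i, b] * k(x, x_i)) over the p sampled items x_i.
    """
    rng = np.random.default_rng(seed)
    p = K.shape[0]

    # Center the kernel matrix so the implicit feature vectors have
    # (approximately) zero mean in kernel space.
    H = np.eye(p) - np.full((p, p), 1.0 / p)
    Kc = H @ K @ H

    # Whitening: K^{-1/2} via an eigendecomposition of the centered kernel.
    vals, vecs = np.linalg.eigh(Kc)
    vals = np.clip(vals, 1e-8, None)
    K_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T

    # Each hash function averages t of the p samples (approximately Gaussian
    # by the central limit theorem) and then whitens, giving an approximate
    # draw from a zero-mean, identity-covariance Gaussian in kernel space.
    W = np.zeros((p, num_bits))
    for b in range(num_bits):
        subset = rng.choice(p, size=t, replace=False)
        e_s = np.zeros(p)
        e_s[subset] = 1.0 / t
        W[:, b] = K_inv_sqrt @ e_s
    return W

def klsh_bits(K_query, W):
    """K_query: n x p kernel values between n items and the p sampled points.
    (The full construction also centers these kernel values; omitted here
    for brevity.) Returns n x num_bits binary codes."""
    return (K_query @ W > 0).astype(np.uint8)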

Once we have computed the hash functions, we use standard LSH techniques to retrieve nearest neighbors of a query in the database in sublinear time. In particular, we employ the method of Charikar for obtaining a small set of candidate approximate nearest neighbors, and these candidates are then sorted using the kernel function to yield a list of hashed nearest neighbors.
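As a rough illustration of the retrieval step (not Charikar's sublinear permutation-based search itself), the candidate set can be found by Hamming ranking of the binary codes and then re-sorted with the exact kernel. The helper kernel_to_query below is a hypothetical callable introduced only for this sketch.

import numpy as np

def klsh_search(query_bits, db_bits, kernel_to_query, num_candidates=100, k=10):
    """Simplified candidate retrieval: a brute-force Hamming scan stands in
    for the sublinear permutation-based search used in practice.

    query_bits      : (num_bits,) binary code of the query
    db_bits         : (n, num_bits) binary codes of the database
    kernel_to_query : callable mapping a database index to its kernel
                      similarity with the query (hypothetical helper)
    """
    # Rank all database codes by Hamming distance to the query code.
    hamming = np.count_nonzero(db_bits != query_bits, axis=1)
    candidates = np.argsort(hamming)[:num_candidates]

    # Re-rank the small candidate set with the exact kernel function.
    sims = np.array([kernel_to_query(i) for i in candidates])
    return candidates[np.argsort(-sims)[:k]]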

There are some limitations to the method. The random vector constructed by the KLSH routine is only approximately random; general bounds on the central limit theorem are unknown, so it is not clear how many database objects are required to obtain a sufficiently random vector for hashing. Further, we implicitly assume that the database objects selected to form the random vectors span the subspace from which the queries are drawn. That said, in practice the method is robust to the number of database objects chosen for the construction of the random vectors, and behaves comparably to standard LSH on non-kernelized data.

Experimental Results

Tiny Images. We ran KLSH over the millions of images in the Tiny Images data set. We used Gist features extracted from these images, and applied a nearest neighbor search on top of a Gaussian kernel.
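For example, a Gaussian (RBF) kernel over Gist descriptors could be plugged into the sketch above roughly as follows; the bandwidth gamma, the sample size, and the number of bits are arbitrary illustrative values, not the settings used in these experiments.

import numpy as np

def gaussian_kernel(X, Y, gamma=0.5):
    """k(x, y) = exp(-gamma * ||x - y||^2), computed for all pairs of rows."""
    sq = (np.sum(X ** 2, axis=1)[:, None]
          + np.sum(Y ** 2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-gamma * sq)

# Hypothetical usage with the earlier sketch:
#   gist    = ...                    # n x d matrix of Gist descriptors
#   sample  = gist[:300]             # p sampled items for the construction
#   W       = klsh_weights(gaussian_kernel(sample, sample), num_bits=64)
#   db_bits = klsh_bits(gaussian_kernel(gist, sample), W)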

The top left image of each set is the query. The remainder of the top row shows the top nearest neighbors using a linear scan (with the Gaussian kernel), and the second row shows the nearest neighbors using KLSH. Note that, with this data set, the hashing technique searched less than 1 percent of the database, and nearest neighbors were extracted in a small fraction of the time required by a linear scan. Typically the hashing results appear qualitatively similar to (or match exactly) the linear scan results.

The plot above shows quantitatively how the nearest neighbors extracted by KLSH compare to the linear scan nearest neighbors. It shows, for varying numbers of hashing nearest neighbors, how many linear scan nearest neighbors are required to cover the hashing nearest neighbors.

Flickr scene recognition. We performed a similar experiment with a set of Flickr images containing tourist photos of a set of landmarks. Here, we applied a chi-squared kernel on top of SIFT features for the nearest neighbor search. Note that these results did not appear in the conference paper.
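One common form of an exponentiated chi-squared kernel over histogram features (such as bag-of-SIFT-words descriptors) is sketched below; whether this exact variant matches the one used in the Flickr experiment is an assumption.

import numpy as np

def chi_squared_kernel(X, Y, gamma=1.0, eps=1e-10):
    """k(x, y) = exp(-gamma * sum_i (x_i - y_i)^2 / (x_i + y_i)),
    one common chi-squared kernel for non-negative histogram features."""
    K = np.empty((X.shape[0], Y.shape[0]))
    for i, x in enumerate(X):
        num = (x[None, :] - Y) ** 2
        den = x[None, :] + Y + eps
        K[i] = np.exp(-gamma * np.sum(num / den, axis=1))
    return K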

We can also measure how the accuracy of a k-nearest-neighbor classifier based on KLSH approaches the accuracy of a linear scan k-NN classifier on this data set. The plot above shows that, as epsilon decreases, the hashing accuracy approaches the linear scan accuracy.

Object recognition on Caltech-101. We applied our method to the Caltech-101 data set for object recognition, since several recently proposed kernel functions for images have shown very good performance on this task but have unknown or very complex feature embeddings. This data set also allowed us to test how changes in parameters affect hashing results.

The parameters p, t, and the number of hash bits affect hashing accuracy only marginally. The main parameter of interest is epsilon, a parameter from standard LSH that trades off search speed for accuracy.

Local patch indexing with the Photo Tourism data set. Finally, we applied KLSH to a data set of 100,000 image patches from the Photo Tourism data set. We compared a standard Euclidean distance function (linear scan and hashing) with a learned kernel (linear scan and hashing). The particular learned kernel we used has no simple, explicit feature embedding (see the paper for details), but its linear scan retrieval results are significantly better than the baseline Euclidean distance, providing another example where KLSH is useful for retrieval. The results indicate that the hashing schemes do not degrade retrieval performance considerably on this data.

Summary. We have shown that hashing can be performed over arbitrary kernels to yield significant speed-ups for similarity search with little loss in accuracy. In our experiments, we applied KLSH with several kernels, and over several domains:

    • Gaussian kernel (Tiny Images)

    • Chi-squared kernel (Flickr)

    • Correspondence kernel (Caltech-101)

    • Learned kernel (Photo Tourism)

Code

The code is available here. Note: the code was updated on July 5 and again in September to correct bugs in createHashTable.m. Use the most recent version.

Paper

    • Kernelized Locality-Sensitive Hashing for Scalable Image Search
      Brian Kulis & Kristen Grauman
      In Proc. 12th International Conference on Computer Vision (ICCV), 2009.
      [pdf]

       

      Also see the following related papers, which apply LSH to learned Mahalanobis metrics:

    • Fast Similarity Search for Learned Metrics
      Brian Kulis, Prateek Jain, & Kristen Grauman
      IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 2143--2157, 2009.
      [pdf]

       

    • Fast Image Search for Learned Metrics
      Prateek Jain, Brian Kulis, & Kristen Grauman
      In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008.
      [pdf]
Source: http://web.cse.ohio-state.edu/~kulis/klsh/klsh.htm
