Learning to hash and its application to big data retrieval and mining

Http://cs.nju.edu.cn/lwj/conf/CIKM14Hash.htm

Learning to hash with its application to big data retrieval and mining

 

Overview

Nearest neighbor (NN) search plays a fundamental role in machine learning and related areas, such as information retrieval and data mining. Hence, there has been increasing interest in NN search in massive (large-scale) data sets in this big data era. In many real applications, it is not necessary for an algorithm to return the exact nearest neighbors for every possible query. Hence, in recent years approximate nearest neighbor (ANN) search algorithms with improved speed and memory saving have received more and more attention from researchers.

Due to its low storage cost and fast query speed, hashing has been widely adopted for ANN search in large-scale datasets. The essential idea of hashing is to map the data points from the original feature space into binary codes in the hashcode space, with similarities between pairs of data points preserved. The advantage of the binary code representation over the original feature vector representation is twofold. Firstly, each dimension of a binary code can be stored using only 1 bit, while several bytes are typically required for one dimension of the original feature vector, leading to a dramatic reduction in storage cost. Secondly, by using the binary code representation, all the data points within a specific Hamming distance of a given query can be retrieved in constant or sub-linear time regardless of the total size of the dataset. Hence, hashing has become one of the most effective methods for big data retrieval and mining.
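The idea of mapping points to binary codes and retrieving everything within a small Hamming distance can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses random hyperplanes (a data-independent baseline, not one of the learned hash functions the tutorial covers), and the function names make_hyperplanes, encode, build_table and probe are hypothetical helpers introduced here.

```python
import random
from collections import defaultdict
from itertools import combinations


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw one random Gaussian hyperplane per output bit (assumed baseline)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def encode(vector, hyperplanes):
    """Map a real-valued vector to an integer whose bits form the binary code."""
    code = 0
    for plane in hyperplanes:
        dot = sum(w * x for w, x in zip(plane, vector))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two codes."""
    return bin(a ^ b).count("1")


def build_table(vectors, hyperplanes):
    """Bucket data point indices by their binary code."""
    table = defaultdict(list)
    for idx, vec in enumerate(vectors):
        table[encode(vec, hyperplanes)].append(idx)
    return table


def probe(table, query_code, num_bits, radius):
    """Return indices of all points whose code is within `radius` bits of the query.

    The number of buckets visited depends only on num_bits and radius,
    not on how many points are stored.
    """
    hits = []
    for r in range(radius + 1):
        for flips in combinations(range(num_bits), r):
            candidate = query_code
            for bit in flips:
                candidate ^= 1 << bit
            hits.extend(table.get(candidate, []))
    return sorted(hits)


# Illustrative usage on synthetic data (sizes chosen arbitrarily).
rng = random.Random(1)
data = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(1000)]
planes = make_hyperplanes(num_bits=12, dim=16)
table = build_table(data, planes)
neighbors = probe(table, encode(data[0], planes), num_bits=12, radius=2)
```

The probe visits a fixed number of buckets for a given code length and radius, which is where the constant or sub-linear lookup cost comes from. As a rough illustration of the storage claim, a 128-dimensional vector of 4-byte floats takes 512 bytes, whereas a 64-bit code takes 8 bytes; these sizes are example figures, not numbers from the tutorial.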

To get effective hashing codes, most methods adopt machine learning techniques for hashing function learning. Hence, learning to hash, which tries to design effective machine learning methods for hashing, has recently become a very hot research topic with wide applications in many big data areas. This tutorial will provide a systematic introduction to learning to hash, including the motivation, models, learning algorithms, and applications. Firstly, we will introduce the challenges we face when performing retrieval and mining with big data, which motivate the adoption of hashing. Secondly, we will give comprehensive coverage of the foundations and recent developments in learning to hash, including unsupervised hashing, supervised hashing, multimodal hashing, etc. Thirdly, quantization methods, which are used to turn real values into binary codes in many hashing methods, will be presented. Fourthly, a large variety of applications of hashing will also be introduced, including image retrieval, cross-modal retrieval, recommender systems, and so on.
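The quantization step mentioned above, turning real values into binary codes, can be illustrated with the simplest possible scheme: one bit per dimension, thresholded at a value learned from training data. The sketch below assumes median thresholds and made-up projection values; fit_thresholds and quantize are hypothetical names, and the tutorial's own quantization methods may differ from this baseline.

```python
from statistics import median


def fit_thresholds(projections):
    """Learn one threshold per dimension: the median of the training values."""
    dims = len(projections[0])
    return [median(row[d] for row in projections) for d in range(dims)]


def quantize(values, thresholds):
    """Turn one real-valued vector into a tuple of bits, one per dimension."""
    return tuple(1 if v > t else 0 for v, t in zip(values, thresholds))


# Made-up real-valued projections for four training points, three dimensions.
train = [
    [0.9, -1.2, 0.1],
    [0.2, 0.4, -0.3],
    [-0.5, 1.5, 0.8],
    [-1.1, -0.7, -0.6],
]
thresholds = fit_thresholds(train)
codes = [quantize(row, thresholds) for row in train]
```

Thresholding at the median splits the training points evenly on each bit, so every bit carries information. The cost of any single-threshold scheme is that two values lying just on either side of the threshold get different bits even though they are close.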

References

[1] Peichao Zhang, Wei Zhang, Wu-Jun Li, Minyi Guo. Supervised Hashing with Latent Factor Models. To appear in Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2014.

[2] Dongqing Zhang, Wu-Jun Li. Large-Scale Supervised Multimodal Hashing with Semantic Correlation Maximization. To appear in Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence (AAAI), 2014.

[3] Ling Yan, Wu-Jun Li, Gui-Rong Xue, Dingyi Han. Coupled Group Lasso for Web-Scale CTR Prediction in Display Advertising. Proceedings of the 31st International Conference on Machine Learning (ICML), 2014.

[4] Weihao Kong, Wu-Jun Li. Isotropic Hashing. Proceedings of the 26th Annual Conference on Neural Information Processing Systems (NIPS), 2012.

[5] Weihao Kong, Wu-Jun Li, Minyi Guo. Manhattan Hashing for Large-Scale Image Retrieval. Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2012.

[6] Weihao Kong, Wu-Jun Li. Double-Bit Quantization for Hashing. Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence (AAAI), 2012.

 

Slides & outline

TBD

 

Presenter
  Wu-Jun Li

Dr. Wu-Jun Li is currently an associate professor in the Department of Computer Science and Technology at Nanjing University, P. R. China. From 2010 to 2013, he was a faculty member of the Department of Computer Science and Engineering at Shanghai Jiao Tong University, P. R. China. He received his PhD degree from the Department of Computer Science and Engineering at Hong Kong University of Science and Technology in 2010. Before that, he received his M.Eng. degree and B.Sc. degree from the Department of Computer Science and Technology, Nanjing University in 2006 and 2003, respectively. His main research interests include machine learning and pattern recognition, especially statistical relational learning and big data machine learning (big learning). In these areas he has published more than 30 peer-reviewed papers, most in prestigious journals such as TKDE and top conferences such as AAAI, AISTATS, CVPR, ICML, IJCAI, NIPS, and SIGIR. He has served as a PC member of ICML'14, IJCAI'13/'11, NIPS'14, SDM'14, UAI'14, etc.
