Authors: Han Xiaoyang && Longxinchen
Date: March 2016.
Source: http://blog.csdn.net/han_xiaoyang/article/details/50856583
http://blog.csdn.net/longxinchen_ml/article/details/50903658
Disclaimer: Copyright reserved. For reprints, please contact the authors and cite the source.
1. Introduction
This system is an implementation of the CVPR 2015 paper "Deep Learning of Binary Hash Codes for Fast Image Retrieval": a large-scale content-based image retrieval system. Over a collection of 2.5 million pictures, retrieving the top 1000 most similar images for a given query takes about 1 second. The background and principles are explained below.
2. Basic issues and technologies
As is well known, a content-based image retrieval system searches an existing image collection for the pictures "closest" to a query, based on the images' content. The effectiveness of such a system (both accuracy and speed) depends directly on two things:
- the expressive power of the image features
- the approximate nearest neighbor lookup
Let us discuss these two points in the context of our simple system.
First, the expressive power of image features. This has always been one of the core difficulties of content-based image retrieval: there is a large gap between the low-level, pixel-level information the computer "sees" and the high-level semantic content a human understands when looking at an image. We therefore need features that capture the image's hierarchical information as richly as possible. As mentioned in our previous blog posts, deep learning provides a good framework for expressing exactly this kind of layered information: each intermediate layer of the network captures some dimension of the image's content, and compared with traditional features such as color histograms, SIFT, and GIST, the information expressed is richer. So here we use the features output by a deep network in place of traditional image features, hoping to characterize the image more precisely.
Next, approximate nearest neighbor search. ANN (Approximate Nearest Neighbor) has long been an active research area. With a massive sample set, traversing all samples, computing distances, and exactly finding the closest top-K is very time-consuming, especially when the sample vectors are high-dimensional. So we sometimes sacrifice a small amount of precision in order to find approximate top-K nearest neighbors in a much shorter time; that is ANN. The most common ANN algorithms include locality-sensitive hashing (LSH), best bin first, and balanced box-decomposition trees; we will use LSH-style hashing to complete this step. There are also mature, specialized ANN libraries, such as FLANN, for interested readers to explore.
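To make the LSH idea concrete (this is the classic random-hyperplane family for cosine similarity, not the network-learned codes used later in this post), here is a minimal NumPy sketch; the function name, bit count, and toy vectors are all made up for illustration:

```python
import numpy as np

def lsh_hash(vectors, n_bits=16, seed=0):
    """Hash float vectors to n_bits-bit codes with random hyperplanes:
    each bit is the sign of the projection onto one random direction."""
    rng = np.random.RandomState(seed)
    dim = vectors.shape[1]
    planes = rng.randn(dim, n_bits)                 # one hyperplane per bit
    return (vectors @ planes > 0).astype(np.uint8)  # sign -> 0/1

# Toy usage: nearby vectors tend to agree on many bits.
np.random.seed(1)
base = np.random.randn(1, 64)
near = base + 0.01 * np.random.randn(1, 64)  # small perturbation of base
far = np.random.randn(1, 64)                 # unrelated vector

codes = lsh_hash(np.vstack([base, near, far]))
same_near = (codes[0] == codes[1]).sum()  # bits shared with the near vector
same_far = (codes[0] == codes[2]).sum()   # bits shared with the far vector
print(same_near, same_far)
```

The key property is that similar inputs collide in the same hash bucket with high probability, so a search only has to examine a small candidate set.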
3. Principle of the retrieval system
The structure and key steps of the image retrieval system are as follows:
Simply put, image retrieval extracts a feature (typically a feature vector) from every picture in the database and stores it; for a query image, the same kind of feature vector is extracted, the distances between this vector and the vectors in the database are computed, and the closest feature vectors are found; their corresponding pictures are the retrieval results.
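Assuming the feature-extraction step has already been done, the baseline pipeline above reduces to a brute-force distance ranking; a minimal NumPy sketch with made-up 4-dimensional "features":

```python
import numpy as np

def linear_search(query_feat, db_feats, top_k=3):
    """Brute-force retrieval: rank every database feature by Euclidean
    distance to the query and return the top_k indices and distances."""
    dists = np.linalg.norm(db_feats - query_feat, axis=1)
    order = np.argsort(dists)
    return order[:top_k], dists[order[:top_k]]

# Toy database of 5 four-dimensional "image features".
db = np.array([[1.0, 0.0, 0.0, 0.0],
               [0.0, 1.0, 0.0, 0.0],
               [0.9, 0.1, 0.0, 0.0],
               [0.0, 0.0, 1.0, 0.0],
               [0.0, 0.0, 0.0, 1.0]])
idx, d = linear_search(np.array([1.0, 0.0, 0.0, 0.0]), db)
print(idx)  # indices of the nearest database entries, nearest first
```

This linear scan is exactly what becomes too slow at millions of images, which motivates the bucketing scheme described in the next section.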
The biggest difficulties of a content-based image retrieval system were stated in the previous section. First, the intermediate-layer features output by most neural networks are very high-dimensional: for example, in AlexNet, the network Krizhevsky et al. used in the 2012 ImageNet competition, the output of the 7th layer contains rich image information but has as many as 4096 dimensions. Computing similarities between pairs of 4096-dimensional floating-point vectors is expensive, so Babenko et al., in the paper "Neural Codes for Image Retrieval", applied PCA to compress the 4096-dimensional features before using them for content-based retrieval, with results better than most traditional image features. Second, because similarity computation on high-dimensional features takes time, linearly scanning all the feature vectors in the database is clearly inadvisable. Most ANN techniques compress the high-dimensional feature vectors into a low-dimensional space and express them as binary (0/1) codes; since the Hamming distance between two binary vectors in a low-dimensional space can be computed very quickly, this alleviates the timing problem to some extent. In this system, the hash mapping part of ANN is done inside the network itself: the framework tries to make the convolutional neural network learn a corresponding "binary retrieval code" during training. Equivalently, the whole image set is first partitioned into buckets; each retrieval examines only the query's bucket and adjacent buckets, rather than the entire collection, which improves retrieval speed.
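The speed argument above rests on Hamming distance over binary codes being extremely cheap. A minimal sketch, representing each code as a Python integer whose bits are the code:

```python
def hamming(a, b):
    """Hamming distance between two equal-length binary codes, each
    given as an int: XOR leaves a 1 exactly where the bits differ,
    then we count the 1 bits."""
    return bin(a ^ b).count("1")

# 101010 and 101110 differ in exactly one bit position.
print(hamming(0b101010, 0b101110))  # -> 1
```

On modern CPUs this is a single XOR plus a popcount instruction per machine word, which is why comparing millions of short binary codes is fast enough to serve as a coarse filter.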
The paper implements the "binary retrieval code" as follows: on top of the convolutional network structure Krizhevsky et al. used for ImageNet in 2012, a latent hidden layer (a fully connected layer) is added between the 7th layer (4096 neurons) and the output layer. Choosing the sigmoid as the activation of this hidden layer makes its outputs lie between 0 and 1, so applying a threshold (for example 0.5) turns this layer's output into a binary vector, the "binary retrieval code". Thus, during ordinary image-classification training of the convolutional network, the network also "learns" the binary string closest to each category. Put another way, the 4096-dimensional output of the 7th layer is compressed through these neurons into a low-dimensional 0/1 vector; unlike other dimensionality-reduction-plus-binarization schemes, this happens inside the neural network itself. One full forward pass per picture yields its category, the 7th layer's output (4096-d) representing the image's rich content, and the latent layer's output determining the image's bucket (the number of neurons is chosen by you, so this dimension need not be high). The structure, as illustrated in the cited paper, is as follows:
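The thresholding step takes only a few lines; the 8-neuron latent output below is a made-up example (the models discussed later use 20- or 128-neuron latent layers):

```python
import numpy as np

def to_binary_code(hidden_activations, threshold=0.5):
    """Turn the sigmoid outputs of the added latent layer into a 0/1
    retrieval code by thresholding, as described above."""
    return (np.asarray(hidden_activations) > threshold).astype(np.uint8)

# Hypothetical sigmoid outputs of an 8-neuron latent layer:
code = to_binary_code([0.91, 0.12, 0.77, 0.45, 0.03, 0.68, 0.50, 0.99])
print(code)  # -> [1 0 1 0 0 1 0 1]
```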
The upper image shows the convolutional network used in the ImageNet competition; the adjusted network adds a latent layer (say 128 neurons) between the 7th layer and the output layer. We reuse the first 7 layers of the model trained on ImageNet and fine-tune, obtaining the weights between the 7th layer, the latent layer, and the output layer. The lower image shows the actual retrieval process: for every database picture, a forward pass of the convolutional network produces the 7th layer's 4096-dimensional feature vector and the latent layer's 128-dimensional output (thresholding at 0.5 converts the latter into a binary retrieval code). For a query image, the same 4096-dimensional feature vector and 128-bit binary retrieval code are computed; the binary retrieval code locates the "buckets" of candidate pictures, distances between 4096-dimensional feature vectors are computed, and re-ranking yields the final results. The retrieval example in the figure is quite intuitive: for the query image of an eagle, the binary retrieval code is 101010; the pictures in that bucket are fetched (you can see they are basically all eagles), the distances between 4096-dimensional feature vectors are compared, and re-ordering gives the final retrieval results.
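The coarse-to-fine process just described can be sketched end to end; the 6-bit codes and 3-dimensional "features" below are made up, standing in for the real 128-bit codes and 4096-d features:

```python
import numpy as np

def hamming(a, b):
    return bin(a ^ b).count("1")

def two_stage_search(q_code, q_feat, db_codes, db_feats, radius=1, top_k=2):
    """Coarse-to-fine retrieval: keep only images whose binary code is
    within `radius` Hamming distance of the query's code (the query's
    bucket plus adjacent buckets), then re-rank that small candidate
    pool by Euclidean distance on the full feature vectors."""
    cand = [i for i, c in enumerate(db_codes) if hamming(q_code, c) <= radius]
    dists = [np.linalg.norm(db_feats[i] - q_feat) for i in cand]
    order = np.argsort(dists)
    return [cand[i] for i in order[:top_k]]

# Toy database: a 6-bit code and a 3-d feature per image.
db_codes = [0b101010, 0b101011, 0b010101, 0b101010]
db_feats = np.array([[1.0, 0.0, 0.0],
                     [0.5, 0.5, 0.0],
                     [0.0, 0.0, 1.0],
                     [0.9, 0.1, 0.0]])
result = two_stage_search(0b101010, np.array([1.0, 0.0, 0.0]),
                          db_codes, db_feats)
print(result)  # -> [0, 3]: image 2's code is too far away and is never ranked
```

Only the handful of candidates in nearby buckets ever reach the expensive full-feature comparison, which is the source of the ~1 s query times over 2.5 million images quoted earlier.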
4. Pre-trained models
Generally speaking, a neural network trained on your own image set, for the image categories of your specific scenario, will have intermediate-layer features whose expressive power is more targeted. The specific training process follows Section 3. For readers who cannot spare the time to train, or who want to build a content-based image retrieval system quickly, we also provide convolutional neural network models trained on 1 million images for everyone to use.
Here are 2 pre-trained models for extracting "image features" and "binary retrieval codes". Both models were trained on the same data set; their network structures differ slightly. To build a retrieval system over tens of thousands to hundreds of thousands of images, use the model image_retrieval_20_hash_code.caffemodel; for a retrieval system over millions of images or more, use image_retrieval_128_hash_code.caffemodel.
For the same picture, both models output a 4096-dimensional feature; but the length of the "binary retrieval code" used for bucketing is 20 for the former and 128 for the latter.
The models can be downloaded from the cloud-disk address.
Step-by-Step Environment Configuration Manual
1. About the System
These instructions are for Linux systems, preferably CentOS 7.0 or above, or Ubuntu 14.04 or above. On lower-version systems you may run into version incompatibilities with boost, OpenCV, and other libraries.
2. CentOS Configuration Method
2.1 Configure the Yum Source
Configuring an appropriate yum source is the "lazy" way to simplify many of the operations below. Without this step, many dependent libraries must be compiled manually and their paths specified when compiling Caffe, which is time-consuming and often fails.
Within China, use the Sohu or 163 mirrors:
rpm -Uvh http://mirrors.sohu.com/fedora-epel/7/x86_64/e/epel-release-7-2.noarch.rpm
If you are outside China, consult the Fedora mirror list to find a suitable yum source to add.
Then make the new source take effect:
yum repolist
2.2 Installing Dependent Libraries
The image retrieval system relies on the Caffe deep learning framework, so Caffe's dependent libraries must be installed first: for example, protobuf is needed to parse the layer definitions in Caffe's configuration files, leveldb is the database that stores picture data during training, OpenCV is the image processing library, boost is a general-purpose C++ library, and so on.
We install them with one yum command:
sudo yum install protobuf-devel leveldb-devel snappy-devel opencv-devel boost-devel hdf5-devel
2.3 Installing the Scientific Computing Library
This section is easy to understand: training and prediction involve a great deal of scientific computation, so the necessary scientific computing libraries must be installed. The Python version of Caffe depends on several Python scientific computing libraries; since pip and easy_install sometimes hit problems installing them, some libraries are installed directly with yum:
yum install openblas-devel.x86_64 gcc-c++.x86_64 numpy.x86_64 scipy.x86_64 python-matplotlib.x86_64 lapack-devel.x86_64 python-pillow.x86_64 libjpeg-turbo-devel.x86_64 freetype-devel.x86_64 libpng-devel.x86_64
2.4 Remaining Dependencies
Including lmdb and others:
sudo yum install gflags-devel glog-devel lmdb-devel
If these packages are not found in your yum source, compile them manually (root privileges required):
# glog
wget https://google-glog.googlecode.com/files/glog-0.3.3.tar.gz
tar zxvf glog-0.3.3.tar.gz
cd glog-0.3.3
./configure
make && make install
# gflags
wget https://github.com/schuhschuh/gflags/archive/master.zip
unzip master.zip
cd gflags-master
mkdir build && cd build
export CXXFLAGS="-fPIC" && cmake .. && make VERBOSE=1
make && make install
# lmdb
git clone https://github.com/LMDB/lmdb
cd lmdb/libraries/liblmdb
make && make install
2.5 Python Dependencies
When compiling pycaffe we need some additional Python dependency libraries, which we can install with pip or easy_install.
pip and easy_install themselves can be set up as follows:
wget --no-check-certificate https://bootstrap.pypa.io/ez_setup.py
python ez_setup.py --insecure
wget https://bootstrap.pypa.io/get-pip.py
python get-pip.py
The Python packages pycaffe depends on are listed in caffe/python/requirements.txt, as follows:
cython>=0.19.2
numpy>=1.7.1
scipy>=0.13.2
scikit-image>=0.9.3
matplotlib>=1.3.1
ipython>=3.0.0
h5py>=2.2.0
leveldb>=0.191
networkx>=1.8.1
nose>=1.3.0
pandas>=0.12.0
python-dateutil>=1.4,<2
protobuf>=2.5.0
python-gflags>=2.0
pyyaml>=3.10
pillow>=2.3.0
All can be installed with the following shell command:
for req in $(cat requirements.txt); do pip install $req; done
3. Ubuntu Configuration method
Basically the same as for CentOS; here is a simple list of the shell commands to execute:
sudo apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libhdf5-serial-dev protobuf-compiler
sudo apt-get install --no-install-recommends libboost-all-dev
sudo apt-get install libgflags-dev libgoogle-glog-dev liblmdb-dev
The Python dependency packages are installed the same way as above.
4. Compilation and preparation of Caffe
Make sure all of Caffe's required dependencies are in place, then execute in the Caffe directory:
cp Makefile.config.example Makefile.config
Modify the contents of Makefile.config according to your actual situation; the main changes are the following:
- If there is no GPU and you just want to experiment on the CPU, find
# CPU_ONLY := 1
and remove the leading # sign.
- If you use a GPU and have cuDNN acceleration, find
# USE_CUDNN := 1
and remove the leading # sign.
- If you use OpenBLAS, change
BLAS := atlas
to BLAS := open
and add BLAS_INCLUDE := /usr/include/openblas
(the default matrix operations library in Caffe is ATLAS, but OpenBLAS has some performance optimizations, so switching to OpenBLAS is recommended)
To be continued ...
Deep Learning and Computer Vision (11): A fast image retrieval system based on deep learning