DeepID Practice
If reposting, please credit the source: http://blog.csdn.net/stdcoutzyx/article/details/45570221
It has been a long while since my last post; I have rather neglected this blog. At present, DeepID can fairly be called the strongest face verification algorithm, and this post implements DeepID with Theano. For background on DeepID, see my earlier post on the three generations of DeepID.
Strictly speaking, the strongest method is the combination of DeepID and Joint Bayesian; this post implements only the DeepID neural network and uses it as a feature extractor for other tasks.
The code for this post lives on GitHub: deepid_faceclassify. If this post helps you, please give it a Star. Yes, I am shamelessly fishing for GitHub stars, haha.
Practice Process

Environment Configuration
This project uses the Theano library, so the Theano environment must be set up before the experiment; see the Theano documentation for instructions. The documentation is already quite thorough, so this post will not repeat it. In what follows, I assume the reader has Theano installed.
Code Overview
The code structure used in this article is as follows:
```
src/
├── conv_net
│   ├── deepid_class.py
│   ├── deepid_generate.py
│   ├── layers.py
│   ├── load_data.py
│   └── sample_optimization.py
└── data_prepare
    ├── vectorize_img.py
    ├── youtube_data_split.py
    └── youtube_img_crop.py
```
As the file names suggest, the code is divided into two modules: the data preparation module (data_prepare) and the convolutional neural network module (conv_net).
Data preparation
I believe DeepID's strength comes from two factors: the structure of the convolutional neural network and the data. Both are crucial, for DeepID as for any convolutional neural network.
Unfortunately, I asked the paper's authors for their data and was declined, so the data used in this experiment is not the data from the paper. As the description below shows, if you have other data you can easily process it with Python into input for the DeepID network in this post.
Take the YouTube Faces data as an example. Its folder structure has three levels: the first level has one folder per person, each person has several videos, and each video contributes multiple face images.
```
youtube_data/
├── people_foldera
│   ├── video_foldera
│   │   ├── img1.jpg
│   │   ├── img2.jpg
│   │   └── imgn.jpg
│   └── video_folderb
└── people_folderb
```
After getting the YouTube face data, there are two things to do:
- Preprocess the images. In the raw YouTube face images the face occupies only a small central region, so we crop away the borders to make the face a larger proportion of the image, and scale every image to 47x55.
- Split the data into a training set and a validation set. This post splits them as follows:
  - For each person, pool the images from all of their videos
  - Shuffle them randomly
  - Take the first 5 images as the validation set and images 6-25 as the training set
After the split, we have 7975 validation images and 31900 training images. From these two numbers you can work out that there are 1595 classes (people) in total.
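A minimal sketch of this split logic (split_person and its arguments are my illustrative names, not the exact code of youtube_data_split.py):

```python
import os
import random

def split_person(person_folder, label, n_valid=5, n_train=20):
    # Pool this person's images across all of their video folders.
    paths = []
    for video in os.listdir(person_folder):
        video_folder = os.path.join(person_folder, video)
        for img in os.listdir(video_folder):
            paths.append(os.path.join(video_folder, img))
    random.shuffle(paths)
    # First 5 for validation, images 6-25 for training.
    valid = [(p, label) for p in paths[:n_valid]]
    train = [(p, label) for p in paths[n_valid:n_valid + n_train]]
    return valid, train
```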
The code used for data preparation
Note: the youtube_-prefixed scripts in the data preparation module are written for the YouTube data, since other datasets may have different image properties and folder structures. If you use other data, read the youtube_img_crop.py and youtube_data_split.py code and rewrite the parts that do not fit your data. The preprocessing code is very simple; I believe that, starting from my code, you can adapt it to another dataset without many changes.
youtube_img_crop.py
is used to crop images. In the YouTube Faces data the face takes up a fairly small proportion of each image, so this script cuts off the edges and then scales the image to 47x55 (the size of DeepID's input images); a sketch of this step follows the usage notes below.
Usage: python youtube_img_crop.py aligned_db_folder new_folder
- aligned_db_folder: the original folder
- new_folder: the result folder; it mirrors the folder structure of the original, except that every image is the processed version
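A minimal sketch of this crop-and-scale step, assuming Pillow and a simple fixed central crop (the crop box youtube_img_crop.py actually uses may differ):

```python
from PIL import Image

def crop_and_resize(src_path, dst_path, keep_ratio=0.5, size=(47, 55)):
    # Keep the central keep_ratio portion of the image, where the face sits.
    img = Image.open(src_path)
    w, h = img.size
    dw = int(w * (1 - keep_ratio) / 2)
    dh = int(h * (1 - keep_ratio) / 2)
    img = img.crop((dw, dh, w - dw, h - dh))
    # Scale to the DeepID input size (width 47, height 55).
    img = img.resize(size, Image.BILINEAR)
    img.save(dst_path)
```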
youtube_data_split.py
Used to split the data into a training set and a validation set.
Usage: python youtube_data_split.py src_folder test_set_file train_set_file
- src_folder: the original folder, which should be the new_folder produced in the previous step
- test_set_file: file collecting the validation set image paths
- train_set_file: file collecting the training set image paths
test_set_file and train_set_file have the following format: each line has two comma-separated parts, the first being the image path and the second the class label of the image.
```
youtube_47_55/alan_ball/2/aligned_detect_2.405.jpg,0
youtube_47_55/alan_ball/2/aligned_detect_2.844.jpg,0
youtube_47_55/xiang_liu/5/aligned_detect_5.1352.jpg,1
youtube_47_55/xiang_liu/1/aligned_detect_1.482.jpg,1
```
vectorize_img.py
Used to vectorize the images. Each image is 47x55, so each one becomes a 47x55x3 = 7755-d vector. To avoid producing huge files, the script automatically splits the data into small files of 1000 images each, i.e. 1000 x (47x55x3) matrices; the last file may of course hold fewer than 1000. A sketch of this step follows the usage notes below.
Usage: python vectorize_img.py test_set_file train_set_file test_vector_folder train_vector_folder
- test_set_file: generated by *_data_split.py
- train_set_file: generated by *_data_split.py
- test_vector_folder: name of the folder in which to store the validation set vector files
- train_vector_folder: name of the folder in which to store the training set vector files
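A sketch of the vectorization, assuming Python 2 (as Theano-era code usually is) and that each chunk is pickled as a (matrix, labels) pair; the repo's exact on-disk format may differ:

```python
import cPickle  # Python 2; use pickle on Python 3
import numpy as np
from PIL import Image

def vectorize(set_file, out_folder, chunk_size=1000):
    rows, labels, chunk_id = [], [], 0
    for line in open(set_file):
        path, label = line.strip().split(',')
        img = np.asarray(Image.open(path), dtype='float64') / 255.0
        rows.append(img.flatten())  # 47x55x3 image -> 7755-d vector
        labels.append(int(label))
        if len(rows) == chunk_size:
            with open('%s/%d.pkl' % (out_folder, chunk_id), 'wb') as f:
                cPickle.dump((np.vstack(rows), np.asarray(labels)), f)
            rows, labels, chunk_id = [], [], chunk_id + 1
    if rows:  # the last chunk may hold fewer than 1000 images
        with open('%s/%d.pkl' % (out_folder, chunk_id), 'wb') as f:
            cPickle.dump((np.vstack(rows), np.asarray(labels)), f)
```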
Conv_net
After the long march, we can finally strike at the heart of the matter. It's DeepID time!
In the conv_net module there are five program files:
- layers.py: definitions of the various layers used by the convolutional neural network, including the logistic regression output layer, the hidden layer, the convolution layer, the max-pooling layer, and so on (a minimal sketch follows this list)
- load_data.py: loads the data for DeepID
- sample_optimization.py: some sanity-check experiments for the individual layers
- deepid_class.py: the main DeepID training program
- deepid_generate.py: extracts the hidden layer using the parameters trained by DeepID
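To give a flavour of what layers.py defines, here is a minimal convolution + max-pooling layer in Theano. It is a sketch in the spirit of the repo's code, not a copy of it: the class name, initialisation scheme, and ReLU activation are my illustrative choices, and it assumes a Theano version that provides theano.tensor.signal.pool.

```python
import numpy as np
import theano
import theano.tensor as T
from theano.tensor.nnet import conv2d
from theano.tensor.signal.pool import pool_2d

class ConvPoolLayer(object):
    """One convolution layer followed by max-pooling (illustrative sketch)."""
    def __init__(self, rng, input, filter_shape, image_shape, poolsize=(2, 2)):
        # filter_shape: (n_filters, n_input_maps, filter_h, filter_w)
        fan_in = np.prod(filter_shape[1:])
        w_bound = np.sqrt(6.0 / fan_in)
        self.W = theano.shared(
            np.asarray(rng.uniform(-w_bound, w_bound, size=filter_shape),
                       dtype=theano.config.floatX), borrow=True)
        self.b = theano.shared(
            np.zeros((filter_shape[0],), dtype=theano.config.floatX),
            borrow=True)
        conv_out = conv2d(input=input, filters=self.W,
                          filter_shape=filter_shape, input_shape=image_shape)
        pooled = pool_2d(conv_out, ws=poolsize, ignore_border=True)
        # Add the per-filter bias, broadcast over batch and spatial dims.
        self.output = T.nnet.relu(pooled + self.b.dimshuffle('x', 0, 'x', 'x'))
        self.params = [self.W, self.b]
```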
Conv_net code usage

deepid_class.py
Usage: python deepid_class.py vec_valid vec_train params_file
- vec_valid: generated by vectorize_img.py
- vec_train: generated by vectorize_img.py
- params_file: stores the parameters after each training epoch. This allows training to resume from a checkpoint, which matters because CNN training generally takes a long time and an interruption such as a power outage would otherwise lose everything. More importantly, the saved parameters are what we later use to extract features (a sketch of this checkpointing idea follows this list).
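As a hedged sketch of how such a params_file can work (the repo's actual format may differ), dumping and restoring the values of the Theano shared variables is enough:

```python
import cPickle  # Python 2, matching the Theano era; use pickle on Python 3

def save_params(params_file, params):
    # params: list of Theano shared variables, e.g. [W1, b1, W2, b2, ...]
    values = [p.get_value(borrow=True) for p in params]
    with open(params_file, 'wb') as f:
        cPickle.dump(values, f)

def load_params(params_file, params):
    # Restore values in the same order they were saved to resume training.
    with open(params_file, 'rb') as f:
        values = cPickle.load(f)
    for p, v in zip(params, values):
        p.set_value(v, borrow=True)
```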
Note: the DeepID training process has too many tunable parameters, so to keep the program easy to use I did not expose them all as command-line arguments. If you want to change the number of epochs, the learning rate, the batch size, and so on, edit the function call on the last line of the program.
deepid_generate.py
You can use the following command to extract DeepID's hidden layer, i.e. the 160-d layer (a sketch of the idea follows the usage notes below).
Usage: python deepid_generate.py dataset_folder params_file result_folder
- dataset_folder: either the training set vector folder or the validation set vector folder
- params_file: produced by deepid_class.py during training
- result_folder: the output folder; its files correspond one-to-one by name with those in dataset_folder, but the vectors in them are 160-d instead of the original 7755-d
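Conceptually, the extraction step compiles a Theano function from the network input to the 160-d hidden layer and applies it to every vector file. A minimal sketch (the variable names are illustrative, not the repo's exact API):

```python
import theano

def build_extractor(x, hidden_output):
    # x: symbolic input matrix; hidden_output: the symbolic 160-d hidden layer
    return theano.function(inputs=[x], outputs=hidden_output)

# Hypothetical usage once the net is built and params_file has been loaded:
# extract = build_extractor(x, hidden_layer.output)
# features = extract(data)  # (n_images, 7755) -> (n_images, 160)
```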
Effect Display

DeepID Effect
After running deepid_class.py, you can get output like the following. It has two parts: the first shows, for each epoch, the error of each mini-batch together with the training set and validation set errors; the second is a summary that prints epoch, train error and valid error in a uniform format.
```
epoch 15, train_score 0.000444, valid_score 0.066000
epoch 16, minibatch_index 62/63, error 0.000000
epoch 16, train_score 0.000413, valid_score 0.065733
epoch 17, minibatch_index 62/63, error 0.000000
epoch 17, train_score 0.000508, valid_score 0.065333
epoch 18, minibatch_index 62/63, error 0.000000
epoch 18, train_score 0.000413, valid_score 0.070267
epoch 19, minibatch_index 62/63, error 0.000000
epoch 19, train_score 0.000413, valid_score 0.064533

epoch  train error        valid error
0      0.974349206349     0.962933333333
1      0.890095238095     0.897466666667
2      0.70126984127      0.666666666667
3      0.392031746032     0.520133333333
4      0.187619047619     0.360666666667
5      0.20526984127      0.22
6      0.054380952381     0.171066666667
7      0.0154920634921    0.128
8      0.00650793650794   0.100133333333
9      0.00377777777778   0.0909333333333
10     0.00292063492063   0.086
11     0.0015873015873    0.0792
12     0.00133333333333   0.0754666666667
13     0.00111111111111   0.0714666666667
14     0.000761904761905  0.068
15     0.000444444444444  0.066
16     0.000412698412698  0.0657333333333
17     0.000507936507937  0.0653333333333
18     0.000412698412698  0.0702666666667
19     0.000412698412698  0.0645333333333
```
The above data, plotted as a line chart:
Vector Extraction Effect Display
After running deepid_generate.py, you can get the following output:
```
loading data of vec_test/0.pkl
building the model ...
generating ...
writing data to deepid_test/0.pkl
loading data of vec_test/3.pkl
building the model ...
generating ...
writing data to deepid_test/3.pkl
loading data of vec_test/1.pkl
building the model ...
generating ...
writing data to deepid_test/1.pkl
loading data of vec_test/7.pkl
building the model ...
generating ...
writing data to deepid_test/7.pkl
```
The program converts each file in the vectorized folder into a corresponding file of 160-d features.
After extracting the hidden layer, we can verify the features' effectiveness on a different task: face retrieval. You can use my other GitHub project for the test. Use the validation set as the query set and the training set as the gallery to be searched, and see how well retrieval performs.
For comparison, this post runs two face retrieval experiments on the YouTube Faces data:

- PCA exp: on the data generated by vectorize_img.py, use PCA to reduce the features to 160-d, then run the face retrieval experiment.
- DeepID exp: run the face retrieval experiment directly on the 160-d data generated by deepid_generate.py.

Note: in both experiments I used cosine similarity to compute distances; experiments I ran beforehand showed that cosine distance works better here than Euclidean distance. A sketch of the setup follows.
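A sketch of both retrieval setups, assuming the feature matrices are already loaded as NumPy arrays and using scikit-learn's PCA (my choice of tooling; the post does not say which PCA implementation was used):

```python
import numpy as np
from sklearn.decomposition import PCA

def cosine_rank(queries, gallery):
    # L2-normalise rows so that a dot product equals cosine similarity.
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = q.dot(g.T)                 # (n_query, n_gallery) similarity matrix
    return np.argsort(-sims, axis=1)  # gallery indices, best match first

# PCA experiment: reduce the raw 7755-d vectors to 160-d first.
# pca = PCA(n_components=160).fit(train_raw)
# ranks = cosine_rank(pca.transform(valid_raw), pca.transform(train_raw))

# DeepID experiment: use the 160-d features from deepid_generate.py directly.
# ranks = cosine_rank(valid_deepid, train_deepid)
```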
The face retrieval results are as follows:

- Precision:

| Precision | Top-1 | Top-5 | Top-10 |
| --- | --- | --- | --- |
| PCA | 95.2% | 96.75% | 97.22% |
| DeepID | 97.27% | 97.93% | 98.25% |
- Average precision (AP):

| AP | Top-1 | Top-5 | Top-10 |
| --- | --- | --- | --- |
| PCA | 95.2% | 84.19% | 70.66% |
| DeepID | 97.27% | 89.22% | 76.64% |
Precision here means: if an image of the same person as the query appears in the top-n results, the query counts as a success; otherwise it fails. AP means: within the top-n results, count the images of the same class as the query, divide that count by n to get the precision of this query, and then average over all queries. A sketch of both computations follows.
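A sketch of both metrics exactly as described, operating on the rank matrix produced by the retrieval sketch above (the label arrays are assumed to be NumPy arrays):

```python
import numpy as np

def topn_precision(ranks, query_labels, gallery_labels, n):
    # A query succeeds if any of its top-n results shares its label.
    hits = 0
    for i, order in enumerate(ranks):
        top = gallery_labels[order[:n]]
        hits += int((top == query_labels[i]).any())
    return float(hits) / len(ranks)

def topn_ap(ranks, query_labels, gallery_labels, n):
    # Per query: (same-label images in top n) / n, then average over queries.
    scores = []
    for i, order in enumerate(ranks):
        top = gallery_labels[order[:n]]
        scores.append(float((top == query_labels[i]).sum()) / n)
    return np.mean(scores)
```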
As the results show, at the same dimensionality DeepID expresses the information considerably better than PCA.
References

[1]. Sun Y, Wang X, Tang X. Deep learning face representation from predicting 10,000 classes // Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on. IEEE, 2014: 1891-1898.