Tutorial: using CNNs (convolutional neural networks) to detect facial keypoints (part 1)

This tutorial uses Lasagne, a library built on top of Theano, to build neural networks quickly. We will:
1, implement several neural network architectures
2, discuss data augmentation
3, discuss the importance of momentum in learning
4, discuss pre-training
These techniques will help improve our results.

This tutorial assumes some familiarity with neural networks and does not explain how they work. Here are some good resources:
1, an online book on deep learning
2, Alec Radford's video tutorial "Using Python's Theano library for deep learning"
3, Andrej Karpathy's collection of exciting deep learning examples

Contents:
1, environment requirements
2, data introduction
3, the first model: a traditional neural network with a single hidden layer
4, testing the network we just built
5, the second model: convolutional neural networks
6, data augmentation
7, changing the learning rate and momentum over time
8, dropout
9, training specialist networks
10, supervised pre-training

The first part: environment requirements

Note: If you only want to read the tutorial, you don't need to run the code. But if you have a CUDA-enabled GPU on hand and want to follow along by running the code, here are a few guidelines:
1, we assume your CUDA-capable GPU is already set up, and that Python 2.7.x, numpy, pandas, matplotlib, and scikit-learn are already installed.
2, install virtualenv, then create and activate a virtual environment.
3, install Lasagne directly from GitHub by running the following command, which installs Lasagne and its dependencies:

pip install -r https://raw.githubusercontent.com/dnouri/kfkd-tutorial/master/requirements.txt


Once the environment is installed, you can run the example program (MNIST recognition) from the src/lasagne/examples/ directory of the virtual environment:

cd src/lasagne/examples/
python mnist.py

It takes a while before anything is printed. Theano is a Python-based compiler that turns matrix operations into GPU code, so before training starts Lasagne calls Theano to compile the model's expressions, and this step generates and builds C code. Once training begins, output like the following appears:

Epoch 1 of 500
  training loss:            1.352731
  validation loss:          0.466565
  validation accuracy:              87.70 %
Epoch 2 of 500
  training loss:            0.591704
  validation loss:          0.326680
  validation accuracy:              90.64 %
Epoch 3 of 500
  training loss:            0.464022
  validation loss:          0.275699
  validation accuracy:              91.98 %
...


If you have a GPU and want Theano to use it, create a ~/.theanorc file in your home directory with the following configuration:

[global]
floatX = float32
device = gpu0

[nvcc]
fastmath = True
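Before starting a long run, it can be worth checking that the file parses as intended. The sketch below is only an illustration: it uses Python 3's standard configparser module on an inline copy of the text rather than on your real ~/.theanorc, and it does not involve Theano at all. The helper name parse_theanorc is made up for this example.

```python
import configparser

# Inline copy of the configuration shown above (not read from disk).
THEANORC_TEXT = """\
[global]
floatX = float32
device = gpu0

[nvcc]
fastmath = True
"""


def parse_theanorc(text):
    """Parse INI-style text into {section: {option: value}} (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names case-sensitive, e.g. floatX
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


settings = parse_theanorc(THEANORC_TEXT)
for section, options in settings.items():
    print(section, options)
```

All values come back as strings, so a typo such as a missing section header or a misspelled option name shows up immediately in the printed dictionary.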


If you run into a bug in any of the steps above, please report it here.

The second part: data introduction

For the facial keypoint detection contest, the training set consists of 96×96 grayscale images.

The 15 feature points are:
Centers of the left and right eyes, 2
Outer corners of the left and right eyes, 2
Inner corners of the left and right eyes, 2
Outer ends of the left and right eyebrows, 2
Inner ends of the left and right eyebrows, 2
Left and right corners of the mouth, 2
Centers of the upper and lower lips, 2
Nose tip, 1
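
The list above can be tallied in a few lines. This sketch simply restates the counts from the list (the group labels are informal names chosen here, not column names from the dataset) and shows why each image ends up with 30 target values: one x and one y coordinate per point.

```python
# Number of keypoints contributed by each group in the list above.
# The labels are informal descriptions, not dataset column names.
KEYPOINT_GROUPS = {
    "eye centers": 2,
    "outer eye corners": 2,
    "inner eye corners": 2,
    "outer eyebrow ends": 2,
    "inner eyebrow ends": 2,
    "mouth corners": 2,
    "lip centers (upper and lower)": 2,
    "nose tip": 1,
}

n_keypoints = sum(KEYPOINT_GROUPS.values())
n_targets = 2 * n_keypoints  # one x and one y coordinate per keypoint

print("keypoints:", n_keypoints)
print("target values per image:", n_targets)
```

The figure of 30 matches the second dimension of y.shape printed when the data is loaded.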

An interesting twist is that some feature points are labeled in about 7,000 training samples, while others are labeled in only about 2,000. Let's read the data:

# file kfkd.py
import os

import numpy as np
from pandas.io.parsers import read_csv
from sklearn.utils import shuffle


FTRAIN = '~/data/kaggle-facial-keypoint-detection/training.csv'
FTEST = '~/data/kaggle-facial-keypoint-detection/test.csv'


def load(test=False, cols=None):
    """Loads data from FTEST if *test* is True, otherwise from FTRAIN.
    Pass a list of *cols* if you're only interested in a subset of the
    target columns.
    """
    fname = FTEST if test else FTRAIN
    df = read_csv(os.path.expanduser(fname))  # load pandas dataframe

    # The Image column has pixel values separated by space; convert
    # the values to numpy arrays:
    df['Image'] = df['Image'].apply(lambda im: np.fromstring(im, sep=' '))

    if cols:  # get a subset of columns
        df = df[list(cols) + ['Image']]

    print(df.count())  # prints the number of values for each column
    df = df.dropna()  # drop all rows that have missing values in them

    X = np.vstack(df['Image'].values) / 255.  # scale pixel values to [0, 1]
    X = X.astype(np.float32)

    if not test:  # only FTRAIN has any target columns
        y = df[df.columns[:-1]].values
        y = (y - 48) / 48  # scale target coordinates to [-1, 1]
        X, y = shuffle(X, y, random_state=42)  # shuffle train data
        y = y.astype(np.float32)
    else:
        y = None

    return X, y


X, y = load()
print("X.shape == {}; X.min == {:.3f}; X.max == {:.3f}".format(
    X.shape, X.min(), X.max()))
print("y.shape == {}; y.min == {:.3f}; y.max == {:.3f}".format(
    y.shape, y.min(), y.max()))

Running the script prints the following:

$ python kfkd.py
left_eye_center_x            7034
left_eye_center_y            7034
right_eye_center_x           7032
right_eye_center_y           7032
left_eye_inner_corner_x      2266
left_eye_inner_corner_y      2266
left_eye_outer_corner_x      2263
left_eye_outer_corner_y      2263
right_eye_inner_corner_x     2264
right_eye_inner_corner_y     2264
...
mouth_right_corner_x         2267
mouth_right_corner_y         2267
mouth_center_top_lip_x       2272
mouth_center_top_lip_y       2272
mouth_center_bottom_lip_x    7014
mouth_center_bottom_lip_y    7014
Image                        7044
dtype: int64
X.shape == (2140, 9216); X.min == 0.000; X.max == 1.000
y.shape == (2140, 30); y.min == -0.920; y.max == 0.996


This output tells us that many images have incomplete labels. The right corner of the mouth (mouth_right_corner_x), for example, is labeled in only 2,267 samples. We drop every image that has fewer than 15 labeled feature points, which is what this line does:
df = df.dropna()  # drop all rows that have missing values in them
We then train our network on the remaining 2,140 images. With this training set, the number of input features (9,216, one per pixel of a 96×96 image) is larger than the number of samples (2,140), so overfitting will be a problem we have to deal with.
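
To see why so few rows survive, here is a plain-Python sketch of what df.count() and df.dropna() do in load(). It does not use pandas, the four rows and their values are made up for illustration, and the helper names count_present and drop_incomplete are hypothetical. Only the two column names are taken from the output above.

```python
import math

# Toy label rows; math.nan marks a missing coordinate. Values are invented.
rows = [
    {"left_eye_center_x": 66.0, "mouth_right_corner_x": 28.6},
    {"left_eye_center_x": 64.3, "mouth_right_corner_x": math.nan},
    {"left_eye_center_x": 65.1, "mouth_right_corner_x": math.nan},
    {"left_eye_center_x": 63.8, "mouth_right_corner_x": 30.2},
]


def count_present(rows):
    """Per-column count of non-missing values, like df.count()."""
    counts = {}
    for row in rows:
        for name, value in row.items():
            counts[name] = counts.get(name, 0) + int(not math.isnan(value))
    return counts


def drop_incomplete(rows):
    """Keep only rows without any missing value, like df.dropna()."""
    return [row for row in rows
            if not any(math.isnan(value) for value in row.values())]


counts = count_present(rows)
complete = drop_incomplete(rows)
print(counts)
print("rows kept:", len(complete))
```

A row is kept only if every column is present, so the number of surviving rows can be no larger than the count of the sparsest column. That is why roughly 7,000 images shrink to 2,140 usable samples.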

Another point worth noting is that the loading function scales the image pixel values from 0~255 to [0, 1], and also scales the target values (the coordinates of the feature points) from 0~95 to [-1, 1].
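
The two scalings, and the inverse you would need to turn a prediction back into pixel coordinates, can be sketched in plain Python. This is an illustration under stated assumptions, not the tutorial's code: load() does the same arithmetic on NumPy arrays, the function names are made up here, and the sample values are invented.

```python
def scale_pixels(pixels):
    """Map pixel intensities from [0, 255] to [0, 1]."""
    return [p / 255.0 for p in pixels]


def scale_targets(coords, half_size=48.0):
    """Map coordinates so that the image center (48) becomes 0."""
    return [(c - half_size) / half_size for c in coords]


def unscale_targets(scaled, half_size=48.0):
    """Inverse of scale_targets: back to pixel coordinates."""
    return [s * half_size + half_size for s in scaled]


pixels = [0, 51, 255]          # made-up intensities
coords = [0.0, 48.0, 96.0]     # left edge, center, right edge of a 96-pixel axis

scaled_pixels = scale_pixels(pixels)
scaled_coords = scale_targets(coords)
restored = unscale_targets(scaled_coords)

print(scaled_pixels)
print(scaled_coords)
print(restored)
```

Keeping inputs and targets in small, centered ranges like these is a common choice for neural network training, and the inverse function is what you would apply before plotting predicted keypoints on an image.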

To be continued
