Part 5: The second model: convolutional neural networks
Illustration of the convolution operation
LeNet-5-style convolutional neural networks are at the heart of the recent breakthroughs in computer vision. Convolutional layers differ from the fully connected layers we used before: they use a few tricks to avoid an excessive number of parameters while preserving the model's expressive power. These tricks are:
1. Local connectivity: each neuron is connected only to a small subset of the neurons in the previous layer.
2. Weight sharing: in convolutional layers, the weights are shared between subsets of neurons. (These subsets are called feature maps.)
3. Pooling: fixed sub-sampling of the inputs.
Illustration of locality and weight sharing
A convolutional unit is in fact connected only to a small 2D patch of the previous layer. This prior knowledge lets the network exploit the 2D structure of the input.
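To make local connectivity and weight sharing concrete, here is a small NumPy sketch (not part of the tutorial code) of a single "valid" 2D convolution: one 3x3 filter is reused at every position of a 96x96 grayscale image, and each output value depends only on a 3x3 patch.

import numpy as np

image = np.random.rand(96, 96)           # one grayscale input image
weights = np.random.randn(3, 3)          # one shared 3x3 filter

out_size = 96 - 3 + 1                    # "valid" convolution -> 94x94
feature_map = np.empty((out_size, out_size))
for i in range(out_size):
    for j in range(out_size):
        patch = image[i:i + 3, j:j + 3]              # local connectivity
        feature_map[i, j] = np.sum(patch * weights)  # same weights everywhere

print(feature_map.shape)  # (94, 94), the output size of the first conv layer below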
When using convolutional layers in Lasagne, we have to prepare the input a little differently. The input is no longer a flat 9216-dimensional vector, but a three-dimensional array of shape (c, 0, 1), where c is the color channel and 0 and 1 correspond to the x and y dimensions of the image. In our problem the shape is (1, 96, 96), because we use only grayscale, i.e. a single color channel.
A function load2d wraps the load function defined earlier and reshapes the flat vectors into this three-dimensional form:
def load2d(test=False, cols=None):
    X, y = load(test=test)
    X = X.reshape(-1, 1, 96, 96)
    return X, y
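As a quick sanity check of the new shape (the exact number of rows depends on how many complete samples survive the dropna() step in the load() function from the earlier parts):

X, y = load2d()
print(X.shape)  # e.g. (2140, 1, 96, 96)
print(y.shape)  # e.g. (2140, 30)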
We are going to build a convolutional neural network with three convolutional layers and two fully connected layers. Each convolutional layer is followed by a 2x2 max-pooling layer. The first convolutional layer has 32 filters, and each subsequent convolutional layer doubles that number. The fully connected hidden layers each contain 500 neurons.
There is still no regularization of any kind (no weight penalty, no dropout). It turns out that using filters of very small size, such as 3x3 or 2x2, already acts as a pretty good regularizer by itself.
The code is as follows:
net2 = NeuralNet(
    layers=[
        ('input', layers.InputLayer),
        ('conv1', layers.Conv2DLayer),
        ('pool1', layers.MaxPool2DLayer),
        ('conv2', layers.Conv2DLayer),
        ('pool2', layers.MaxPool2DLayer),
        ('conv3', layers.Conv2DLayer),
        ('pool3', layers.MaxPool2DLayer),
        ('hidden4', layers.DenseLayer),
        ('hidden5', layers.DenseLayer),
        ('output', layers.DenseLayer),
        ],
    input_shape=(None, 1, 96, 96),
    conv1_num_filters=32, conv1_filter_size=(3, 3), pool1_pool_size=(2, 2),
    conv2_num_filters=64, conv2_filter_size=(2, 2), pool2_pool_size=(2, 2),
    conv3_num_filters=128, conv3_filter_size=(2, 2), pool3_pool_size=(2, 2),
    hidden4_num_units=500, hidden5_num_units=500,
    output_num_units=30, output_nonlinearity=None,

    update_learning_rate=0.01,
    update_momentum=0.9,

    regression=True,
    max_epochs=1000,
    verbose=1,
    )

X, y = load2d()  # load 2-d data
net2.fit(X, y)

# Training for 1000 epochs will take a while.  We'll pickle the
# trained model so that we can load it back later:
import cPickle as pickle
with open('net2.pickle', 'wb') as f:
    pickle.dump(net2, f, -1)
Training this network consumes far more time and memory than the first one. Each epoch is about 15 times slower, and the full 1000 epochs take more than 20 minutes even on a fairly fast GPU.
But patience pays off, and the result is naturally much better. Let's take a look at the output of running the script. It first prints the shapes of the network's layers. Note that, because of the filter size we chose, the 32 filters of the first convolutional layer produce 32 feature maps of size 94x94.
InputLayer          (None, 1, 96, 96)    produces    9216 outputs
Conv2DCCLayer       (None, 32, 94, 94)   produces  282752 outputs
MaxPool2DCCLayer    (None, 32, 47, 47)   produces   70688 outputs
Conv2DCCLayer       (None, 64, 46, 46)   produces  135424 outputs
MaxPool2DCCLayer    (None, 64, 23, 23)   produces   33856 outputs
Conv2DCCLayer       (None, 128, 22, 22)  produces   61952 outputs
MaxPool2DCCLayer    (None, 128, 11, 11)  produces   15488 outputs
DenseLayer          (None, 500)          produces     500 outputs
DenseLayer          (None, 500)          produces     500 outputs
DenseLayer          (None, 30)           produces      30 outputs
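As a quick check (this snippet is not part of the tutorial code), the spatial sizes in the listing above follow directly from the filter and pooling sizes:

size = 96
for filter_size, pool_size in [(3, 2), (2, 2), (2, 2)]:
    size = size - filter_size + 1   # "valid" convolution shrinks the image
    print("conv output: %d x %d" % (size, size))
    size = size // pool_size        # 2x2 max pooling halves each dimension
    print("pool output: %d x %d" % (size, size))
# prints 94/47, 46/23 and 22/11, matching the layer listing above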
Next, just as with the first network, the output lists the training loss, validation loss, and the ratio between them for each epoch.
After 1000 epochs the result is a very nice improvement over the first network. As before, we take the square root of the validation loss and multiply by 48 to convert it back to pixel units:
>>> np.sqrt(0.001566) * 48
1.8994904579913006
Let's take the same sample from the test set and plot a comparison of the two networks' predictions:
sample1 = load(test=True)[0][6:7]
sample2 = load2d(test=True)[0][6:7]
y_pred1 = net1.predict(sample1)[0]
y_pred2 = net2.predict(sample2)[0]

fig = pyplot.figure(figsize=(6, 3))
ax = fig.add_subplot(1, 2, 1, xticks=[], yticks=[])
plot_sample(sample1[0], y_pred1, ax)
ax = fig.add_subplot(1, 2, 2, xticks=[], yticks=[])
plot_sample(sample1[0], y_pred2, ax)
pyplot.show()
Comparison of net1 (left) and net2 (right) predictions
Then we can plot the learning curves of the two networks:
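The plotting code isn't repeated here; a minimal sketch, assuming the trained net1 and net2 objects are available and that nolearn's NeuralNet keeps its per-epoch losses in train_history_ (a list of dicts with 'train_loss' and 'valid_loss' keys), could look like this:

import numpy as np
from matplotlib import pyplot

def plot_learning_curve(net, label):
    # train_history_ holds one dict per epoch
    train_loss = np.array([row["train_loss"] for row in net.train_history_])
    valid_loss = np.array([row["valid_loss"] for row in net.train_history_])
    pyplot.plot(train_loss, linewidth=2, linestyle="--", label=label + " train")
    pyplot.plot(valid_loss, linewidth=2, label=label + " valid")

plot_learning_curve(net1, "net1")
plot_learning_curve(net2, "net2")
pyplot.yscale("log")
pyplot.xlabel("epoch")
pyplot.ylabel("loss")
pyplot.legend()
pyplot.show()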
That looks great; the new curves are much smoother. But note that, compared with net1, net2's validation error levels off more quickly relative to its training error. I believe that a larger training set would improve the results further. What if we doubled the training set by flipping the training images horizontally; would the network improve?
Part 6: Data augmentation
In general, an overfitting network will train to better results when given more data. (If your network is not overfitting, you should probably make it bigger, since otherwise it lacks the capacity to describe the data.)
Data augmentation lets us increase the number of training examples through transformations such as warping or adding noise. That is obviously far cheaper than collecting more samples by hand, which makes augmentation an essential tool in the deep learning toolbox.
We briefly mentioned batch training earlier. Taking the matrix of training samples and splitting it into batches (128 examples per batch in our case) is the job of the batch iterator. While it splits the samples into batches, the batch iterator can also apply transformations to the input quickly and cheaply. So if we want to flip images horizontally, we don't need to double our already huge training set. A better approach is to flip each example with 50% probability during batch iteration. This is very convenient, and for some problems it lets us produce a nearly unlimited supply of training examples without increasing memory usage. Moreover, the transformations can be applied while the GPU is busy processing the previous batch, so they add virtually no extra cost.
Flipping an image horizontally is just a slicing operation on the array:
X, y = load2d()
X_flipped = X[:, :, :, ::-1]  # simple slice to flip all images

# plot two images:
fig = pyplot.figure(figsize=(6, 3))
ax = fig.add_subplot(1, 2, 1, xticks=[], yticks=[])
plot_sample(X[1], y[1], ax)
ax = fig.add_subplot(1, 2, 2, xticks=[], yticks=[])
plot_sample(X_flipped[1], y[1], ax)
pyplot.show()
Original image (left) and flipped image (right)
In the image on the right, note that the keypoints no longer match the face. Since we flip the image, we also have to flip the horizontal coordinates of the target positions, and in addition we have to swap some target columns, because after the flip a value such as left_eye_center_x actually corresponds to right_eye_center_x. We set up a list of tuples, flip_indices, that records which columns of the target vector need to swap places. If you remember, these are the column counts we read at the very beginning:
left_eye_center_x            7034
left_eye_center_y            7034
right_eye_center_x           7032
right_eye_center_y           7032
left_eye_inner_corner_x      2266
left_eye_inner_corner_y      2266
...
Since left_eye_center_x needs to swap places with right_eye_center_x, we record the tuple (0, 2); likewise left_eye_center_y swaps with right_eye_center_y, so we record (1, 3), and so on. In the end we get the following list of tuples:
flip_indices = [
    (0, 2), (1, 3),
    (4, 8), (5, 9), (6, 10), (7, 11),
    (12, 16), (13, 17), (14, 18), (15, 19),
    (22, 24), (23, 25),
    ]

# let's see if we got it right:
df = read_csv(os.path.expanduser(FTRAIN))
for i, j in flip_indices:
    print("# {} -> {}".format(df.columns[i], df.columns[j]))

# this prints out:
# left_eye_center_x -> right_eye_center_x
# left_eye_center_y -> right_eye_center_y
# left_eye_inner_corner_x -> right_eye_inner_corner_x
# left_eye_inner_corner_y -> right_eye_inner_corner_y
# left_eye_outer_corner_x -> right_eye_outer_corner_x
# left_eye_outer_corner_y -> right_eye_outer_corner_y
# left_eyebrow_inner_end_x -> right_eyebrow_inner_end_x
# left_eyebrow_inner_end_y -> right_eyebrow_inner_end_y
# left_eyebrow_outer_end_x -> right_eyebrow_outer_end_x
# left_eyebrow_outer_end_y -> right_eyebrow_outer_end_y
# mouth_left_corner_x -> mouth_right_corner_x
# mouth_left_corner_y -> mouth_right_corner_y
Our batch iterator implementation derives from the BatchIterator class and overrides the transform() method. Putting it all together, here is the full code:
class FlipBatchIterator(BatchIterator):
    flip_indices = [
        (0, 2), (1, 3),
        (4, 8), (5, 9), (6, 10), (7, 11),
        (12, 16), (13, 17), (14, 18), (15, 19),
        (22, 24), (23, 25),
        ]

    def transform(self, Xb, yb):
        Xb, yb = super(FlipBatchIterator, self).transform(Xb, yb)

        # Flip half of the images in this batch at random:
        bs = Xb.shape[0]
        indices = np.random.choice(bs, bs / 2, replace=False)
        Xb[indices] = Xb[indices, :, :, ::-1]

        if yb is not None:
            # Horizontal flip of all x coordinates (a simple sign change,
            # since the targets are scaled to [-1, 1] around the center):
            yb[indices, ::2] = yb[indices, ::2] * -1

            # Swap places, e.g. left_eye_center_x -> right_eye_center_x
            for a, b in self.flip_indices:
                yb[indices, a], yb[indices, b] = (
                    yb[indices, b], yb[indices, a])

        return Xb, yb
To train with this batch iterator, we pass it as the batch_iterator_train parameter to NeuralNet. Let's define net3, a network that is very similar to net2; the only difference is these lines added at the end of the network definition:
net3 = NeuralNet(
    # ...
    regression=True,
    batch_iterator_train=FlipBatchIterator(batch_size=128),
    max_epochs=3000,
    verbose=1,
    )
We are now passing in our flipping batch iterator, and we have also tripled the number of epochs. Since we haven't really changed the total size of the training set, each epoch still uses the same number of samples as before, and it turns out that with the new trick each epoch takes only a little longer than it used to. But this time the network has to learn something more general, and learning a general rule is always harder than simply overfitting the training data.
Training this network will take about an hour, so we want to make sure the resulting model gets saved once training finishes. Then you can go have a cup of tea or do some housework; doing the laundry is also a fine choice.
net3.fit(X, y)

import cPickle as pickle
with open('net3.pickle', 'wb') as f:
    pickle.dump(net3, f, -1)
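Later, the saved model can be loaded back the same way it was written (a short sketch, assuming the same pickle module as above):

import cPickle as pickle
with open('net3.pickle', 'rb') as f:
    net3 = pickle.load(f)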
$ python kfkd.py
...
 Epoch  |  Train Loss  |  Valid Loss  |  Train / Val
--------|--------------|--------------|----------------
...
    500 |   0.002238   |   0.002303   |   0.971519
...
   1000 |   0.001365   |   0.001623   |   0.841110
   1500 |   0.001067   |   0.001457   |   0.732018
   2000 |   0.000895   |   0.001369   |   0.653721
   2500 |   0.000761   |   0.001320   |   0.576831
   3000 |   0.000678   |   0.001288   |   0.526410
Let's plot the learning curves and compare them with net2's. The effect of the data augmentation shows after 3000 epochs: net3's validation loss is about 5% lower than net2's. We can see that net2 stops learning after roughly 2000 epochs and its curves are no longer smooth, whereas net3 keeps improving, albeit slowly.
A lot of effort for only a small payoff, isn't it? We'll find the answer to that in the next part.
To be continued.
Using convolutional neural nets to detect facial keypoints, tutorial (part 3): convolutional neural network training and data augmentation