Keras Series︱Multi-class image classification training and fine-tuning with bottleneck features (III)

Source: Internet
Author: User

I have to say that deep learning frameworks update too fast, especially around Keras 2.0: so fast that the Keras Chinese documentation contains many errors, and even the official documentation has old sections that were never updated. There are plenty of pitfalls for early adopters.
As of this writing, Theano, TensorFlow, and CNTK all support Keras. Although TensorFlow has a lot of momentum, I think Keras is the right path going forward.
I learned Caffe first. In use, Keras is simpler than Caffe and very handy, especially for training a model from scratch. Fine-tuning, however, ran into many problems and is harder for a novice.

Chinese Document: http://keras-cn.readthedocs.io/en/latest/
Official documents: https://keras.io/
This article is mainly based on Keras 2.0.

Training is mostly learned by practice, so a few worked cases are the best way to see how it is done.

. Keras Series:

1, Keras series︱Sequential and Model models, basic Keras structure and functions (I)
2, Keras series︱The five pre-trained models in keras.applications, VGG16 framework (Sequential, Model) walkthrough (II)
3, Keras series︱Image classification training and fine-tuning with bottleneck features (III)
4, Keras series︱Facial expression classification and recognition: OpenCV face detection + Keras emotion classification (IV)
5, Keras series︱Transfer learning: fine-tuning and prediction with InceptionV3, complete case (V)

. I. CIFAR10 small-image classification example (Sequential model)

To train a model, you first have to know what the data looks like. Let's take a look at how the classic Cifar10 is trained.
The CIFAR10 example builds the network structure with the Sequential model.

from __future__ import print_function
import keras
from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D

# batch_size, epochs, the 64-filter convolutions and Dense(512) follow the
# official Keras cifar10_cnn.py example that this listing reproduces.
batch_size = 32
num_classes = 10
epochs = 200
data_augmentation = True

# Data loading
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# One-hot label generation for multiple classes
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

# Network configuration
model = Sequential()
model.add(Conv2D(32, (3, 3), padding='same',
                 input_shape=x_train.shape[1:]))
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes))
model.add(Activation('softmax'))

# Training parameter settings
# Initiate the RMSprop optimizer
opt = keras.optimizers.rmsprop(lr=0.0001, decay=1e-6)

# Let's train the model using RMSprop
model.compile(loss='categorical_crossentropy',
              optimizer=opt,
              metrics=['accuracy'])

# Generate the training data
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

if not data_augmentation:
    print('Not using data augmentation.')
    model.fit(x_train, y_train,
              batch_size=batch_size,
              epochs=epochs,
              validation_data=(x_test, y_test),
              shuffle=True)
else:
    print('Using real-time data augmentation.')
    # This will do preprocessing and realtime data augmentation:
    datagen = ImageDataGenerator(
        featurewise_center=False,  # set input mean to 0 over the dataset
        samplewise_center=False,  # set each sample mean to 0
        featurewise_std_normalization=False,  # divide inputs by std of the dataset
        samplewise_std_normalization=False,  # divide each input by its std
        zca_whitening=False,  # apply ZCA whitening
        rotation_range=0,  # randomly rotate images in the range (degrees, 0 to 180)
        width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
        height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
        horizontal_flip=True,  # randomly flip images
        vertical_flip=False)  # randomly flip images

    # Compute quantities required for feature-wise normalization
    # (std, mean, and principal components if ZCA whitening is applied).
    datagen.fit(x_train)

    # Fit the model on the batches generated by datagen.flow().
    model.fit_generator(datagen.flow(x_train, y_train,
                                     batch_size=batch_size),
                        steps_per_epoch=x_train.shape[0] // batch_size,
                        epochs=epochs,
                        validation_data=(x_test, y_test))

Just as Caffe needs the data compiled into LMDB, Keras also needs the data in its own format. Let's look at the CIFAR10 data format:
. 1. Loading data

(x_train, y_train), (x_test, y_test) = cifar10.load_data()

This statement loads the data. As with the pre-trained models in keras.applications covered in the previous article, it may have to download from the Internet, which takes time, so you can likewise change the address so that it reads local files.
x_train has a shape such as (100, 100, 100, 3), that is, a set of 100 images of size 100*100*3; y_train has shape (100,).

. 2. Converting multi-class labels to the Keras format

Keras requires multi-class labels in a fixed (one-hot) format, so they have to be converted as follows, where num_classes is the number of classes (assume 5 here):

y_train = keras.utils.to_categorical(y_train, num_classes)

The resulting array has shape (100, 5).
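
To make the conversion concrete, here is a plain-Python sketch of what one-hot encoding produces. It is an illustration only: `one_hot` is a hypothetical helper written for this example, not the Keras implementation, and the label values are made up.

```python
def one_hot(labels, num_classes):
    """Return one row per label, with a 1.0 in the column of that label's class."""
    rows = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0
        rows.append(row)
    return rows


labels = [0, 3, 1, 4]          # integer class ids, shape (4,)
encoded = one_hot(labels, 5)   # 4 rows of 5 columns, i.e. shape (4, 5)
print(len(encoded), len(encoded[0]))
```

Each row contains exactly one 1.0, which is why a label vector of shape (100,) becomes an array of shape (100, 5) when there are 5 classes.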
. 3. Image preprocessing generator: ImageDataGenerator

datagen = ImageDataGenerator()
datagen.fit(x_train)

The generator datagen is initialized first; datagen.fit then computes the statistics needed by the data-dependent transformations.
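
The idea behind the fit step can be sketched without Keras: some transformations (for example feature-wise centering and std normalization) need statistics of the whole training set, so these are computed once before any batch is generated. The sketch below uses made-up pixel values and hypothetical helper names; it is a simplification under those assumptions, not the Keras code.

```python
import math


def fit_channel_stats(images):
    """Compute per-channel mean and std over every pixel of every image.

    images: list of images; each image is a list of pixels;
    each pixel is a list of channel values.
    """
    num_channels = len(images[0][0])
    pixels = [pixel for image in images for pixel in image]
    means = [sum(p[c] for p in pixels) / len(pixels) for c in range(num_channels)]
    stds = [
        math.sqrt(sum((p[c] - means[c]) ** 2 for p in pixels) / len(pixels))
        for c in range(num_channels)
    ]
    return means, stds


def standardize(image, means, stds, eps=1e-7):
    """Center each channel on the dataset mean and scale by the dataset std."""
    return [[(v - m) / (s + eps) for v, m, s in zip(pixel, means, stds)]
            for pixel in image]


# Two tiny "images" of two pixels each, with 3 channels (made-up values).
dataset = [
    [[0.0, 0.5, 0.25], [1.0, 0.5, 0.75]],
    [[0.0, 0.5, 0.25], [1.0, 0.5, 0.75]],
]
means, stds = fit_channel_stats(dataset)
normalized = standardize(dataset[0], means, stds)
print(means, stds)
```

Transformations that do not depend on the dataset, such as random shifts and flips, need no such statistics.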
. 4. Final training format: batches

The data is split into batches so that it can be fed to the model for training. This is much faster than Caffe's LMDB.

datagen.flow(x_train, y_train, batch_size=batch_size)

flow() takes NumPy arrays and labels as arguments and yields batches of augmented or normalized data in an infinite loop.
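
The "infinite loop" behaviour is easiest to see in a small generator. The following is a minimal sketch, assuming plain Python lists in place of NumPy arrays and leaving out augmentation entirely; the function name `flow` is only borrowed for readability and this is not the Keras implementation.

```python
import random


def flow(samples, labels, batch_size, shuffle=True, seed=None):
    """Yield (batch_samples, batch_labels) forever, reshuffling on every pass."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    while True:  # never stops; the caller decides how many batches to draw
        if shuffle:
            rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            chosen = indices[start:start + batch_size]
            yield [samples[i] for i in chosen], [labels[i] for i in chosen]


samples = list(range(10))            # stand-ins for images
labels = [s % 2 for s in samples]    # stand-ins for class ids
batches = flow(samples, labels, batch_size=4, seed=0)

# One pass over 10 samples with batch_size=4 gives batches of 4, 4 and 2.
first_pass = [next(batches) for _ in range(3)]
# The generator keeps going: this batch already belongs to the second pass.
next_batch = next(batches)
print([len(x) for x, _ in first_pass], len(next_batch[0]))
```

Because the generator never ends on its own, the training call has to be told how many batches make up one epoch.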


. II. Adapting the official example: a simple multi-class network (Sequential model)

This part is based on the official tutorial on building image classification models with small datasets.
. 1. Data source and download

The official tutorial is a cat-versus-dog binary classification; here it becomes a 5-class problem. For the sake of efficiency I took a very small dataset found online, from the blog post:
Caffe Learning Series (12): Training and testing on your own images
Data description:
There are 500 images in total, divided into five classes (buses, dinosaurs, elephants, flowers, and horses) with 100 images per class.
Download address: http://pan.baidu.com/s/1nuqlTnN
The file names begin with 3, 4, 5, 6, or 7, one prefix per class. I selected 20 images from each class for testing and kept the remaining 80 for training, giving 400 training images and 100 test images across 5 classes. The following figure:

. 2. Loading and building the model network

This section of the Keras Chinese documentation has not been updated in time, so you still need to check the original website. For example, the Chinese documentation uses Convolution2D, but the current name is Conv2D, which is a bit of a pitfall.

# Load and build the model network
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Activation, Dropout, Flatten, Dense

# The filter counts (32, 32, 64) and Dense(64) follow the official Keras blog
# example that this network adapts.
model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=(150, 150, 3)))
# 32 filters of size 3*3; input images are 150*150 with 3 channels (channels_last)
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())  # this converts our 3D feature maps to 1D feature vectors
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(5))  # matt: one output unit per class
model.add(Activation('softmax'))  # matt: multi-class

The front part of the network is the same for binary and multi-class classification; only the final fully connected layer has to change. With 5 classes you need Dense(5) with a softmax activation; for binary classification it is Dense(1) with a sigmoid activation.

The following errors also came up:

Error 1: model.add(Convolution2D(32, 3, 3, input_shape=(3, 150, 150)))
ValueError: Negative dimension size caused by subtracting 3 from 1 for 'conv2d_6/convolution' (op: 'Conv2D') with input shapes: [?,1,148,32], [3,3,32,32].

Error 2: model.add(MaxPooling2D(pool_size=(2, 2)))
ValueError: Negative dimension size caused by subtracting 2 from 1 for 'max_pooling2d_11/MaxPool' (op: 'MaxPool') with input shapes: [?,1,148,32].

Reason:
input_shape=(3, 150, 150) is the Theano ordering; TensorFlow needs (150, 150, 3).
The input_shape has to be changed. This is the "channels_last" versus "channels_first" data format issue.
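
The numbers in those error messages can be reproduced with simple shape arithmetic. The sketch below assumes "valid" 3x3 convolutions with stride 1 (output length = input length - kernel + 1); `conv_valid` and `describe` are hypothetical helpers for this illustration.

```python
def conv_valid(size, kernel):
    """Output length of a 'valid' window of the given size with stride 1."""
    return size - kernel + 1


def describe(input_shape, data_format):
    """Split a 3-tuple into (height, width, channels) for the given data format."""
    if data_format == "channels_first":
        channels, height, width = input_shape
    else:  # channels_last
        height, width, channels = input_shape
    return height, width, channels


# Theano-style shape read with the TensorFlow convention: height becomes 3.
h, w, c = describe((3, 150, 150), "channels_last")
h1, w1 = conv_valid(h, 3), conv_valid(w, 3)   # after the first 3x3 convolution
h2 = conv_valid(h1, 3)                        # a second 3x3 convolution on height 1

# The same image described in the order the backend expects.
good_h, good_w, good_c = describe((150, 150, 3), "channels_last")
print((h1, w1), h2, (conv_valid(good_h, 3), conv_valid(good_w, 3)))
```

After the first convolution the misread input is only 1 pixel high and 148 wide, matching the [?,1,148,32] in the messages, so any further 3x3 convolution or 2x2 pooling runs out of rows.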
. 3. Setting the training parameters

# Binary classification
# model.compile(loss='binary_crossentropy',
#               optimizer='rmsprop',
#               metrics=['accuracy'])

# Multi-class classification
model.compile(loss='categorical_crossentropy',  # matt: multi-class, not binary_crossentropy
              optimizer='rmsprop',
              metrics=['accuracy'])
# RMSprop optimizer: apart from the learning rate, it is recommended to leave the optimizer's other default parameters unchanged

The compile parameters for binary and multi-class classification differ, mainly in the loss function.
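
To show what the two losses compute, here is a small standard-library sketch of softmax with categorical cross-entropy next to binary cross-entropy on a single sigmoid output. The scores and targets are made up, and the functions are textbook formulas written for this example rather than the Keras implementations.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = [z - max(logits) for z in logits]   # subtract the max for numerical stability
    exps = [math.exp(z) for z in shifted]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(one_hot_target, probs, eps=1e-7):
    """Multi-class loss: minus the log-probability assigned to the true class."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot_target, probs))


def binary_crossentropy(target, prob, eps=1e-7):
    """Two-class loss for a single sigmoid output and a 0/1 target."""
    prob = min(max(prob, eps), 1 - eps)
    return -(target * math.log(prob) + (1 - target) * math.log(1 - prob))


probs = softmax([2.0, 0.5, 0.1, 0.1, 0.3])            # 5 outputs, one per class
multi_loss = categorical_crossentropy([1, 0, 0, 0, 0], probs)
binary_loss = binary_crossentropy(1, 0.9)              # one output, target in {0, 1}
print(round(sum(probs), 6), multi_loss > 0, binary_loss > 0)
```

The multi-class loss expects a one-hot target with as many entries as the network has outputs, while the binary loss expects a single 0/1 target per sample.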

. 4. Image preprocessing

Next we prepare the data, using .flow_from_directory() to generate data and labels directly from our JPG images.
Two things to note. ImageDataGenerator generates batches of image data and supports real-time data augmentation; during training it keeps generating data until the specified number of epochs is reached. flow_from_directory(directory) takes a folder path as its argument and yields batches of augmented/normalized data in an infinite loop.

from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)

test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
        '/.../train',
        target_size=(150, 150),  # all images will be resized to 150x150
        batch_size=32,
        class_mode='categorical')  # matt: multi-class

validation_generator = test_datagen.flow_from_directory(
        '/.../validation',
        target_size=(150, 150),
        batch_size=32,
        class_mode='categorical')  # matt: multi-class
# class_mode='binary'

This step is the data preparation phase and can be slow. For multi-class classification, class_mode must be set to 'categorical'. flow_from_directory computes a number of attributes up front, and the resulting generators are then passed directly to the training phase.
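
flow_from_directory takes its labels from the folder layout: each class lives in its own subfolder, and according to the Keras documentation the class names are mapped to label indices in alphanumeric order. The sketch below imitates that mapping with the standard library only; the folder names are made up and `class_indices` is a hypothetical helper, not the Keras code.

```python
import os
import tempfile


def class_indices(directory):
    """Map each class subfolder name to an integer index, in sorted name order."""
    classes = sorted(
        name for name in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, name))
    )
    return {name: index for index, name in enumerate(classes)}


# Build a throwaway layout: train/<class name>/ (image files would go inside).
root = tempfile.mkdtemp()
for name in ["horse", "bus", "flower", "dinosaur", "elephant"]:
    os.makedirs(os.path.join(root, "train", name))

mapping = class_indices(os.path.join(root, "train"))
print(mapping)
```

Keeping the same subfolder names under the training and validation directories keeps the label indices consistent between the two generators.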
. 5. Training stage

model.fit_generator(
        train_generator,
        steps_per_epoch=2000 // 32,   # Keras 2 name; Keras 1 used samples_per_epoch=2000
        epochs=50,                    # Keras 1 name: nb_epoch
        validation_data=validation_generator,
        validation_steps=800 // 32)   # Keras 1 name: nb_val_samples=800
# An epoch is considered finished once the model has seen steps_per_epoch batches,
# which corresponds to roughly samples_per_epoch samples.
model.save_weights('/.../first_try_animal5.h5')

The final results:

Epoch 48/50
62/62 [==============================] - 39s - loss: 0.0464 - acc: 0.9929 - val_loss: 0.3916 - val_acc: 0.9601
Epoch 49/50
62/62 [==============================] - 38s - loss: 0.0565 - acc: 0.9914 - val_loss: 0.6423 - val_acc: 0.9500
Epoch 50/50
62/62 [==============================] - 38s - loss: 0.0429 - acc: 0.9960 - val_loss: 0.4238 - val_acc: 0.9599
<keras.callbacks.History object at 0x7f049fc6f090>
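
The 62/62 in the progress bar counts batches, not images. A minimal sketch of the arithmetic, assuming the batch size of 32 used by the generators and integer division (`steps_for` is a hypothetical helper):

```python
def steps_for(num_samples, batch_size):
    """Number of whole batches that fit into num_samples."""
    return num_samples // batch_size


batch_size = 32
train_steps = steps_for(2000, batch_size)   # 2000 samples per epoch, as in the call above
val_steps = steps_for(800, batch_size)      # 800 validation samples
print(train_steps, val_steps)
```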

. 6. Problems that Arise

Problem 1: the loss is negative
Cause: a negative loss means the multi-class labels were not set up earlier. The data now has 5 classes, but the setup was written for 2 classes, so the loss comes out negative, as below:

Epoch 43/50
62/62 [==============================] - 39s - loss: -16.0148 - acc: 0.1921 - val_loss: -15.9440 - val_acc: 0.1998
Epoch 44/50
61/62 [============================>.] - ETA: 0s - loss: -15.8525 - acc: 0.2049Segmentation fault (core dumped)
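
One plausible mechanism for the negative values: binary cross-entropy assumes targets of 0 or 1, and feeding it integer class ids such as 2, 3 or 4 makes the (1 - target) factor negative. The sketch below uses the textbook formula with a made-up prediction; it is an illustration, not a trace of what Keras computed in the run above.

```python
import math


def binary_crossentropy(target, prob):
    """Binary cross-entropy for one sigmoid output; assumes target is 0 or 1."""
    return -(target * math.log(prob) + (1 - target) * math.log(1 - prob))


prob = 0.9                                  # a confident sigmoid output
valid = binary_crossentropy(1, prob)        # target inside {0, 1}: loss is positive
invalid = binary_crossentropy(3, prob)      # class id 3 fed as a "binary" target
print(valid, invalid)
```

With one-hot targets and categorical cross-entropy every term is minus the log of a probability, so the loss cannot drop below zero.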


. III. Fine-tuning approach one: using the bottleneck features of a pre-trained network

This section is mainly derived from: Building image classification models with small datasets
Of course, the Keras Chinese version contains many errors. It has not followed the version updates, so a lot of its content is wrong.

First, the VGG-16 network structure is as follows:

In this section, a pre-trained model is used to extract bottleneck features, which are then fed into a "small" model, namely the fully connected layers.
The implementation steps are:
1, obtain the weights of the pre-trained model;
2, run the model and extract the bottleneck features (the last activated feature maps before the fully connected layers, at the boundary between the convolutional and fully connected parts), and save them separately;
3, put Dense fully connected layers on top of the bottleneck data and fine-tune.
. 1. Importing the pre-trained weights and network framework

The Keras Chinese documentation is wrong here; refer to the original author's blog instead.

WEIGHTS_PATH = '/home/ubuntu/keras/animal5/vgg16_weights_tf_dim_ordering_tf_kernels.h5'
WEIGHTS_PATH_NO_TOP = '/home/ubuntu/keras/animal5/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5'

from keras.applications.vgg16_matt import VGG16
model = VGG16(include_top=False, weights='imagenet')

WEIGHTS_PATH_NO_TOP is the weight file with the fully connected layers removed; it can be used directly to extract bottleneck features. Thanks to the original author.
. 2. Extracting the bottleneck features of the images

Steps: load the images, load the pre-trained model weights, and get the bottleneck features.

# How to extract bottleneck features
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Activation, Dropout, Flatten, Dense

# (1) Load the images
# Initialize the image generator
from keras.preprocessing.image import ImageDataGenerator
import numpy as np
datagen = ImageDataGenerator(rescale=1./255)

# Training-set image generator
train_generator = datagen.flow_from_directory(
        '/home/ubuntu/keras/animal5/train',
        target_size=(150, 150),
        batch_size=32,
        class_mode=None,
        shuffle=False)

# Validation-set image generator (a separate name, so it does not overwrite the training generator)
validation_generator = datagen.flow_from_directory(
        '/home/ubuntu/keras/animal5/validation',
        target_size=(150, 150),
        batch_size=32,
        class_mode=None,
        shuffle=False)

# (2) Load the pre-trained model weights
model.load_weights('/.../vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5')

# (3) Get the bottleneck features
bottleneck_features_train = model.predict_generator(train_generator, 500)
# Core step: steps is the number of rounds of data the generator returns; each epoch
# contains 500 images, analogous to samples_per_epoch in model.fit
np.save(open('bottleneck_features_train.npy', 'wb'), bottleneck_features_train)

bottleneck_features_validation = model.predict_generator(validation_generator, 100)
# Analogous to nb_val_samples in model.fit (800 images per epoch there); validation set
np.save(open('bottleneck_features_validation.npy', 'wb'), bottleneck_features_validation)

Note on class_mode: this is a prediction scenario, so no labels are needed while preparing the data (class_mode=None), because the images are read in order; the train_generator used to prepare data for training, by contrast, does need labels. Note on shuffle: since this is a prediction scenario that builds a dataset, the order must not be shuffled (shuffle=False). In model.fit, however, shuffling is wanted: there, shuffle controls whether the input samples are randomly reordered before each epoch.
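
Because the images are read class folder by class folder when shuffling is off, the saved feature rows end up grouped by class, and the labels can be rebuilt from the per-class counts alone. A minimal sketch with made-up counts (`ordered_labels` is a hypothetical helper):

```python
def ordered_labels(counts_per_class):
    """Labels for unshuffled data: class 0 repeated, then class 1, and so on."""
    labels = []
    for class_id, count in enumerate(counts_per_class):
        labels.extend([class_id] * count)
    return labels


# Hypothetical counts: 3 images in class 0, 2 in class 1, 4 in class 2.
labels = ordered_labels([3, 2, 4])
print(labels)
```

The total number of labels has to equal the number of feature rows that were saved, otherwise features and labels no longer line up.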

. 3. Fine-tuning: the "small" network

Main steps: (1) import the bottleneck_features data; (2) set up the labels and convert them to the Keras default format; (3) write the "small network" structure; (4) set the parameters and train.

import keras

# (1) Import the bottleneck_features data
train_data = np.load(open('bottleneck_features_train.npy', 'rb'))
# The features were saved in order, so recreating the labels is easy
train_labels = np.array([0] * 100 + [1] * 100 + [2] * 100 + [3] * 100 + [4] * 96)  # matt: labels
validation_data = np.load(open('bottleneck_features_validation.npy', 'rb'))
# Assumes 20 validation images per class, as in the split described earlier;
# the total must match len(validation_data)
validation_labels = np.array([0] * 20 + [1] * 20 + [2] * 20 + [3] * 20 + [4] * 20)  # matt: labels

# (2) Set the labels and convert them to the Keras default format
train_labels = keras.utils.to_categorical(train_labels, 5)
validation_labels = keras.utils.to_categorical(validation_labels, 5)

# (3) Write the "small network" structure
model = Sequential()
# train_data.shape[1:]
model.add(Flatten(input_shape=(4, 4, 512)))  # 4*4*512
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
# model.add(Dense(1, activation='sigmoid'))  # binary classification
model.add(Dense(5, activation='softmax'))  # matt: multi-class
# model.add(Dense(1))
# model.add(Dense(5))
# model.add(Activation('softmax'))

# (4) Set the parameters and train
model.compile(loss='categorical_crossentropy',  # matt: multi-class, not binary_crossentropy
              optimizer='rmsprop',
              metrics=['accuracy'])

model.fit(train_data, train_labels,
          epochs=50, batch_size=16,
          validation_data=(validation_data, validation_labels))
model.save_weights('bottleneck_fc_model.h5')

Because the features are very small, the model runs quickly even on a CPU, at about 1s per epoch.

# Correct result:
Epoch 48/50
496/496 [==============================] - 0s - loss: 0.3071 - acc: 0.7762 - val_loss: 4.9337 - val_acc: 0.3229
Epoch 49/50
496/496 [==============================] - 0s - loss: 0.2881 - acc: 0.8004 - val_loss: 4.3143 - val_acc: 0.3750
Epoch 50/50
496/496 [==============================] - 0s - loss: 0.3119 - acc: 0.7984 - val_loss: 4.4788 - val_acc: 0.5625
<keras.callbacks.History object at 0x7f25d4456e10>

. 4. Problems encountered

(1) The Flatten layer: the hardest layer to deal with
When configuring the network, I found Flatten to be the layer most prone to problems. Many of them arise because data in the wrong format is fed into this layer. For example:

Statement: model.add(Flatten(input_shape=train_data.shape[1:]))
ValueError: Input 0 is incompatible with layer flatten_5: expected min_ndim=3, found ndim=2

So the shape has to be changed to (4, 4, 512); writing (512, 4, 4) is not correct either.
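
Where (4, 4, 512) comes from can be checked by hand: the convolutional part of VGG16 ends each of its 5 blocks with a 2x2 max pooling that halves the height and width (rounding down), and the last block has 512 channels. A small sketch of that arithmetic, assuming 150x150 inputs and channels_last ordering (`vgg16_bottleneck_shape` is a hypothetical helper):

```python
def vgg16_bottleneck_shape(height, width, num_pools=5, channels=512):
    """Feature-map shape after VGG16's conv blocks, each ending in a 2x2 max pool."""
    for _ in range(num_pools):
        height, width = height // 2, width // 2   # pooling halves and rounds down
    return height, width, channels


shape = vgg16_bottleneck_shape(150, 150)
flat_length = shape[0] * shape[1] * shape[2]
print(shape, flat_length)
```

The sequence is 150, 75, 37, 18, 9, 4, so each image yields a 4*4*512 feature map, which Flatten turns into a vector of 8192 values.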

(2) Label format problem
model.fit raised this error:

ValueError: Error when checking target: expected dense_2 to have shape (None, 5) but got array with shape (500, 1)

The label format was not set; multi-class problems in particular run into this. keras.utils.to_categorical() is needed:

train_labels = keras.utils.to_categorical(train_labels, 5)


. IV. Fine-tuning approach two: adjusting the weights

Neither the Keras Chinese documentation nor the original author's document gets this section right.

Take a look at the whole structure first.

Fine-tuning is divided into three steps:
- Build VGG-16 and load its weights, then add the previously defined fully connected network on top of the model and load its weights
- Freeze part of the VGG16 network parameters
- Train the model

Notes:
1, To fine-tune, all layers should start from trained weights. For example, you cannot put a randomly initialized fully connected network on top of pre-trained convolutional layers, because the large gradients produced by the random weights would destroy the pre-trained convolutional weights.
2, Fine-tune only the last convolutional block rather than the entire network, to prevent overfitting. The whole network has a large entropy capacity and therefore a strong tendency to overfit. The features learned by the lower convolutional blocks are more general and less abstract, so we keep the first few convolutional blocks (which learn general features) fixed and fine-tune only the last convolutional block (which learns more specialized features).
3, Fine-tuning should use a very low learning rate, usually with SGD rather than an adaptive learning-rate optimizer such as RMSprop. This keeps the magnitude of the updates small so as not to destroy the pre-trained features.
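
The freezing step can be sketched without Keras. Keras layers expose a `name` and a `trainable` attribute, and the VGG16 layers in keras.applications are named by block (block1_conv1 through block5_pool); the sketch below uses a hypothetical stand-in `Layer` class and a shortened layer list to show the selection logic only, not a real model.

```python
class Layer:
    """Stand-in for a Keras layer: only the two attributes used here."""

    def __init__(self, name):
        self.name = name
        self.trainable = True


# Hypothetical, shortened layer list in VGG16 naming style (blockN_...).
layers = [Layer(n) for n in [
    "block1_conv1", "block1_pool",
    "block4_conv3", "block4_pool",
    "block5_conv1", "block5_conv2", "block5_conv3", "block5_pool",
]]

# Freeze everything except the last convolutional block.
for layer in layers:
    layer.trainable = layer.name.startswith("block5")

trainable_names = [layer.name for layer in layers if layer.trainable]
print(trainable_names)
```

With a real model the same loop would run over model.layers before compiling, so that the frozen state is in effect during training.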
. 1. Step one: build VGG-16 and load the weights
. 1.1 Version in the Keras documentation

First, this is what the Keras Chinese documentation has:

From Keras
