Deep Learning: Keras Learning Notes

Source: Internet
Author: User
Tags: shuffle, theano, keras
Python vectors:

import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])
sum0 = np.sum(a, axis=0)
sum1 = np.sum(a, axis=1)

print sum0
print sum1

> Results:

[ 9 12]
[ 3  7 11]

Dropout

During the training of a deep network, dropout temporarily removes neural network units from the network with a certain probability. Dropout is a powerful weapon for preventing overfitting in CNNs. Here the output has 10 categories, so the output dimension is 10:

model.add(Dense(10, init='glorot_uniform'))
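As a minimal sketch (the hidden-layer size, the 784-dimensional input, and the 0.5 drop probability are assumptions for illustration, not values from these notes), dropout is applied by inserting a Dropout layer between other layers:

from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation

model = Sequential()
model.add(Dense(128, input_dim=784, init='glorot_uniform'))  # assumed hidden layer
model.add(Activation('tanh'))
model.add(Dropout(0.5))  # during training, randomly drop 50% of the previous layer's units
model.add(Dense(10, init='glorot_uniform'))  # 10 output categories, as above
model.add(Activation('softmax'))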

Batch

Batch gradient descent
Stochastic gradient descent
Mini-batch gradient descent
The batch_size argument sets the size of each mini-batch used for gradient descent.

Initialization method

init is the keyword argument; 'uniform' means the weights are initialized from a uniform distribution.
model.add(Dense(64, init='uniform'))

Activation function

Each neuron has an activation function: linear, sigmoid, tanh, softmax, LeakyReLU, and PReLU.

model.add(Dense(64))
model.add(Activation('tanh'))

or

model.add(Dense(64, activation='tanh'))  # here 'tanh' is a string

Model-related methods

compile(optimizer, loss, class_mode="categorical")

Parameters:
optimizer: specifies the optimizer used to train the model;
loss: the objective (loss) function;
class_mode: one of "categorical" or "binary"; used only for computing classification accuracy or when using the predict_classes method;
theano_mode: a theano.compile.mode.Mode instance controlling/specifying compilation options.
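For illustration, a hedged sketch of a compile call in the API described here (the optimizer settings are assumed values, not taken from the notes):

from keras.optimizers import SGD

sgd = SGD(lr=0.01, momentum=0.9, nesterov=True)  # assumed hyperparameters
model.compile(optimizer=sgd, loss='categorical_crossentropy', class_mode='categorical')
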
fit(X, y, batch_size=128, nb_epoch=100, verbose=1, validation_split=0., validation_data=None, shuffle=True, show_accuracy=False, callbacks=[], class_weight=None, sample_weight=None)

Trains the model for a fixed number of epochs.

Returns: a record dictionary, including the training error rate and the validation error rate for each epoch.

Parameters:
X: training data
y: labels
batch_size: size of each block used for training and gradient update.
nb_epoch: number of epochs.
verbose: how progress is displayed. 0 means nothing is displayed, 1 shows a progress bar, and 2 prints only one line of data per epoch.
callbacks: list of callback functions, which are invoked automatically during training.
validation_split: the fraction of the data used for validation.
validation_data: an (X, y) tuple used as validation data. It overrides the validation data split off by validation_split.
shuffle: boolean or str ('batch'). Whether to shuffle the samples at each epoch (see the post "Theano Learning Notes 01 -- the dimshuffle() function"). 'batch' is a special option for handling data in HDF5 files (the format Keras uses for storing weights).
show_accuracy: whether to display the classification accuracy for each epoch.
class_weight: key-value pairs of class weights. Original: dictionary mapping classes to a weight value, used for scaling the loss function (during training only). The key is a class and the value is the weight for that class; the scaling applies only during training (see the sketch after this list).
sample_weight: list or numpy array with a 1:1 mapping to the training samples, used for scaling the loss function (during training only). For temporal data you can pass one weight per sample per timestep, i.e. if your output data are shaped (nb_samples, timesteps, output_dim), your mask should be of shape (nb_samples, timesteps, 1). This allows you to mask out or reweight individual output timesteps, which is useful in sequence-to-sequence learning.
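A hedged sketch of the class_weight argument (the class labels and weights below are made-up values for illustration): giving one class a larger weight makes errors on its samples count more heavily in the loss during training.

# suppose class 1 is rare and its errors should count 5x in the loss
model.fit(X_train, Y_train, batch_size=128, nb_epoch=10,
          validation_split=0.1, shuffle=True, show_accuracy=True,
          class_weight={0: 1.0, 1: 5.0})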
 
evaluate(X, y, batch_size=128, show_accuracy=False, verbose=1, sample_weight=None)

Shows the performance of the model on validation data.

Returns: the error rate, or an (error rate, accuracy) tuple if show_accuracy=True.

Parameters: basically the same as in the fit function, except that verbose takes 1 or 0, indicating whether or not to show a progress bar.
predict(X, batch_size=128, verbose=1)

Used to predict on test data.

Returns: an array of predictions for the test data.

Parameters: the same as the parameters in the fit function.
predict_classes(X, batch_size=128, verbose=1)

Used to predict the class of the test data.

Returns: an array of predicted class labels for the test data.

Parameters: the same as the parameters in the evaluate function.
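Putting the three evaluation/prediction methods together, a hedged usage sketch (X_test and Y_test are assumed to exist) could be:

score, acc = model.evaluate(X_test, Y_test, batch_size=128, show_accuracy=True, verbose=0)  # (error rate, accuracy)
probas = model.predict(X_test, batch_size=128, verbose=0)           # per-class scores
classes = model.predict_classes(X_test, batch_size=128, verbose=0)  # predicted class indices
print score, acc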
train_on_batch(X, y, accuracy=False, class_weight=None, sample_weight=None)

Computes the loss on a block of data and performs a gradient update.

Returns: the error rate of the existing model on the data block, or an (error rate, accuracy) tuple if accuracy=True.

Parameters: the same as the parameters in the evaluate function.
test_on_batch(X, y, accuracy=False, sample_weight=None)

Performs a performance check on a block of data (no weight update).

Returns: the error rate of the existing model on the data block, or an (error rate, accuracy) tuple if accuracy=True.

Parameters: the same as the parameters in the evaluate function.
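As a sketch of how the two batch-level methods are typically combined (the batch size, the number of epochs, and the X_val/Y_val arrays are assumptions for illustration), one can write a manual training loop:

batch_size = 32
for epoch in range(10):
    for i in range(0, len(X_train), batch_size):
        loss = model.train_on_batch(X_train[i:i + batch_size],
                                    Y_train[i:i + batch_size])  # one gradient update on this block
    val_loss = model.test_on_batch(X_val, Y_val)  # evaluates a block without updating the weights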
save_weights(fname, overwrite=False)

Saves the weights of all layers to an HDF5 file.

Returns: if overwrite=False and fname already exists, an exception is thrown.

Parameters:

fname: file name
overwrite: whether to overwrite the original file if it already exists.
load_weights(fname):

Loads existing weight data from a file into the model in the program. The model in the file and the model in the program must have the same structure. The load_weights function can be called before or after compile.

Parameters: fname: file name
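A short sketch of the save/load round trip (the file name is arbitrary, and build_model is a hypothetical helper that rebuilds the same architecture):

model.save_weights('my_model_weights.h5', overwrite=True)  # writes an HDF5 file

# later: recreate a model with exactly the same layer structure, then restore the weights
model2 = build_model()  # hypothetical helper returning an identically structured model
model2.load_weights('my_model_weights.h5')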
Sequential (linear stack model) example
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.optimizers import SGD

model = Sequential()
model.add(Dense(2, init='uniform', input_dim=64))
model.add(Activation('softmax'))
model.compile(loss='mse', optimizer='sgd')

# verbose=1 or 2: output demo
model.fit(X_train, Y_train, nb_epoch=3, batch_size=16, verbose=1)
# Output:
# Train on 37800 samples, validate on 4200 samples
# Epoch 0
# 37800/37800 [==============================] - 7s - loss: 0.0385
# Epoch 1
# 37800/37800 [==============================] - 8s - loss: 0.0140
# Epoch 2
# 10960/37800 [=======>......................] - ETA: 4s - loss: 0.0109

model.fit(X_train, Y_train, nb_epoch=3, batch_size=16, verbose=2)
# Output:
# Train on 37800 samples, validate on 4200 samples
# Epoch 0
# loss: 0.0190
# Epoch 1
# loss: 0.0146
# Epoch 2
# loss: 0.0049

# show_accuracy=True demo: outputs the accuracy alongside the loss
model.fit(X_train, Y_train, nb_epoch=3, batch_size=16, verbose=2, show_accuracy=True)
# Output:
# Train on 37800 samples, validate on 4200 samples
# Epoch 0
# loss: 0.0190 - acc.: 0.8750
# Epoch 1
# loss: 0.0146 - acc.: 0.8750
# Epoch 2
# loss: 0.0049 - acc.: 1.0000

# validation_split=0.1 means 10% of the total samples are used for validation.
# In the example below the total number of samples is 42000; the validation data
# account for 10%, i.e. 4200 samples, and the remaining 37800 are used for training.
model.fit(X_train, Y_train, nb_epoch=3, batch_size=16,
          validation_split=0.1, show_accuracy=True, verbose=1)
# Output:
# Train on 37800 samples, validate on 4200 samples
# Epoch 0
# 37800/37800 [==============================] - 7s - loss: 0.0385 - acc.: 0.7258 - val. loss: 0.0160 - val. acc.: 0.9136
# Epoch 1
# 37800/37800 [==============================] - 8s - loss: 0.0140 - acc.: 0.9265 - val. loss: 0.0109 - val. acc.: 0.9383
# Epoch 2
# 10960/37800 [=======>......................] - ETA: 4s - loss: 0.0109 - acc.: 0.9420
Graph (arbitrary connection graph model): methods and attributes

An arbitrary connection graph can have any number of inputs and outputs, and each output is trained with its own cost function. The optimization of the graph model depends on the sum of all the cost functions. Looking at the code makes this very clear: it is literally a graph of connected nodes, and the directed graph can be drawn out explicitly. It is suitable for partially connected models.

model = keras.models.Graph()

Let's look at the methods of the model object.

add_input(name, input_shape, dtype='float')

Adds an input layer to the model.

Parameters:
name: the unique string identifier of the input layer
input_shape: an integer tuple representing the shape of the new layer. For example, (10,) represents a 10-dimensional vector, (None, 128) represents a variable-length sequence of 128-dimensional vectors, and (3, 32, 32) represents a 3-channel (RGB) 32*32 picture.
dtype: float or int; the type of the input data.
add_output(name, input=None, inputs=[], merge_mode='concat')

Adds an output layer connected to input or inputs.

Parameters:
name: the unique string identifier of the output layer
input: the name of the hidden layer to which the output layer is connected. Only one of input or inputs may be used.
inputs: a list of the names of multiple hidden layers to which the new layer should be connected.
merge_mode: "sum" or "concat". Valid only when inputs is specified; it merges the different inputs.
add_node(layer, name, input=None, inputs=[], merge_mode='concat')

Adds a hidden layer (a node, i.e. any layer other than the input and output layers) connected to input or inputs.

Parameters:
layer: a layer instance (described under layers)
name: the unique string identifier of the hidden layer
input: the name of the hidden layer or input layer to which this hidden layer is connected. Only one of input or inputs may be used.
inputs: a list of the names of multiple hidden layers to which this hidden layer should be connected.
merge_mode: "sum" or "concat". Valid only when inputs is specified; it merges the different inputs (see the sketch below).
compile(optimizer, loss)

Compiles the graph model for training.

Parameters:
optimizer: optimizer name or optimizer object
loss: a dictionary of key-value pairs. The key is the name of an output layer, and the value is the name of the corresponding objective function (or an objective function object) for that layer.
fit(data, batch_size=128, nb_epoch=100, verbose=1, validation_split=0., validation_data=None, shuffle=True, callbacks=[])

Trains the model for a fixed number of epochs.

Returns: a record dictionary, including the training error rate and the validation error rate for each epoch.

Parameters:

data: a dictionary. The key is the name of an input or output layer, and the value is the corresponding input or output data. See the example below.
batch_size: size of each block used for training and gradient update
nb_epoch: number of epochs
verbose: how progress is displayed. 0 means nothing is displayed, 1 shows a progress bar, and 2 prints only one line of data per epoch.
callbacks: list of callback functions
validation_split: the fraction of the data used for validation.
validation_data: an (X, y) tuple used as validation data. It overrides the data split off by validation_split.
shuffle: boolean or str ('batch'). Whether to shuffle the samples at each epoch (see the post "Theano Learning Notes 01 -- the dimshuffle() function"). 'batch' is a special option for handling data in HDF5 files (the format Keras uses for storing weights).
evaluate(data, batch_size=128, verbose=1): shows the performance of the model on validation data.

Returns: the error rate

Parameters: basically the same as in the fit function, except that verbose takes 1 or 0, indicating whether or not to show a progress bar.
predict(data, batch_size=128, verbose=1): used to predict on test data.

Returns: a dictionary of key-value pairs. The key is the name of an output layer, and the value is the prediction array of the corresponding layer.

Parameters: the same as the parameters in the fit function. Only the input layers need to be declared in data.
train_on_batch(data): computes the loss on a block of data and performs a gradient update.

Returns: the error rate of the existing model on the data block.

Parameters: the same as the parameters in the evaluate function.
test_on_batch(data): performs a performance check on a block of data.

Returns: the error rate of the existing model on the data block.

Parameters: the same as the parameters in the evaluate function.
save_weights(fname, overwrite=False): saves the weights of all layers to an HDF5 file.

Returns: if overwrite=False and fname already exists, an exception is thrown.

Parameters:

fname: file name
overwrite: whether to overwrite the original file if it already exists.
load_weights(fname): loads existing weight data from a file into the model in the program. The model in the file and the model in the program must have the same structure. The load_weights function can be called before or after compile.

Graph (arbitrary connection graph model) example
# graph model with one input and two outputs
graph = Graph()
graph.add_input(name='input', input_shape=(32,))
graph.add_node(Dense(16), name='dense1', input='input')
graph.add_node(Dense(4), name='dense2', input='input')
graph.add_node(Dense(4), name='dense3', input='dense1')
graph.add_output(name='output1', input='dense2')
graph.add_output(name='output2', input='dense3')
graph.compile('rmsprop', {'output1': 'mse', 'output2': 'mse'})
history = graph.fit({'input': X_train, 'output1': y_train, 'output2': y2_train}, nb_epoch=10)

# graph model with two inputs and one output
graph = Graph()
graph.add_input(name='input1', input_shape=(32,))
graph.add_input(name='input2', input_shape=(32,))
graph.add_node(Dense(16), name='dense1', input='input1')
graph.add_node(Dense(4), name='dense2', input='input2')
graph.add_node(Dense(4), name='dense3', input='dense1')
graph.add_output(name='output', inputs=['dense2', 'dense3'], merge_mode='sum')
graph.compile('rmsprop', {'output': 'mse'})
history = graph.fit({'input1': X_train, 'input2': X2_train, 'output': y_train}, nb_epoch=10)
predictions = graph.predict({'input1': X_test, 'input2': X2_test})  # {'output': ...}
 
How to use regularizers

A regularizer is a penalty term on the weight parameters. It is included in the cost function.
In Keras, the Dense layer, TimeDistributedDense layer, MaxoutDense layer, Convolution1D layer, and Convolution2D layer share a unified API for applying regularizers.

These layers have 3 keyword parameters:
W_regularizer: an instance of keras.regularizers.WeightRegularizer (regularizes the weights)
b_regularizer: an instance of keras.regularizers.WeightRegularizer (regularizes the bias)
activity_regularizer: an instance of keras.regularizers.ActivityRegularizer (regularizes the activation values, i.e. the output produced after multiplying the input by the weight matrix)

The same class is used for regularizing W and b because their implementations are essentially the same. In everyday use, however, b is rarely regularized; even when it is, the results improve only slightly. Therefore it is usually W that is regularized.

Use the sample code as follows:

from keras.regularizers import l2, activity_l2
model.add(Dense(64, input_dim=64, W_regularizer=l2(0.01), activity_regularizer=activity_l2(0.01)))
Constraints

In Keras, the Dense layer, TimeDistributedDense layer, MaxoutDense layer, Convolution1D layer, and Convolution2D layer share a unified API for applying constraints.

2 keyword parameters:
W_constraint: constrains the main weight matrix
b_constraint: constrains the bias

from keras.constraints import maxnorm
model.add(Dense(64, W_constraint=maxnorm(2)))  # constrain the weights to a maximum norm of 2

Available constraints:
maxnorm(m=2): maximum-norm constraint
nonneg(): non-negativity constraint (negative values are not allowed)
unitnorm(): unit-norm constraint (weights are normalized)

http://keras-cn.readthedocs.io/en/latest/getting_started/sequential_model/
http://keras-cn.readthedocs.io/en/latest/getting_started/concepts/
http://blog.csdn.net/niuwei22007/article/details/49375195
