Learning Note TF052: Convolutional Networks, Neural Network Development, AlexNet TensorFlow Implementation


A convolutional neural network (CNN) uses a weight-sharing network structure that reduces the complexity of the model and the number of weights, and it is a hot topic in speech analysis and image recognition. No manual feature extraction or data reconstruction is needed: images are fed in directly, features are extracted automatically, and the result has a high degree of invariance to translation, scaling, tilt and other image deformations. Convolution is a mathematical operation from functional analysis and integral transforms: an operator that generates a third function from two functions f and g, characterizing the area of overlap between f and a flipped, translated g. If f(x) and g(x) are two integrable functions on R1, their convolution is the new function (f*g)(t) = ∫_{-∞}^{+∞} f(τ) g(t−τ) dτ.
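As a quick illustration (a minimal sketch, not part of the original note), the discrete form of the same operation, (f*g)[n] = Σ_m f[m] g[n−m], can be computed with NumPy:

import numpy as np

f = np.array([1.0, 2.0, 3.0])
g = np.array([0.0, 1.0, 0.5])
# Full discrete convolution: (f*g)[n] = sum_m f[m] * g[n - m]
print(np.convolve(f, g))  # -> [0.   1.   2.5  4.   1.5]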

A neural network (NN) consists of an input layer, hidden layers and an output layer. In a convolutional neural network the hidden layers are divided into convolutional layers and pooling layers. The convolution kernels of a convolutional layer are translated across the original image to extract features, and each kernel produces one feature map. Pooling sparsifies the features, reducing the number of parameters to learn and the complexity of the network; the common variants are max pooling and average pooling. When a kernel slides over the image to build a feature map, the stride does not necessarily divide the image width evenly: padding the edges so the kernel can sample them is called 'SAME', while never sampling across the edge is called 'VALID'.
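A minimal sketch of the two padding modes with TensorFlow 1.x ops; the shapes in the comments assume a 28x28 single-channel input, a 5x5 kernel and stride 2, which are illustrative values only:

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 28, 28, 1])        # batch of 28x28 grayscale images
w = tf.Variable(tf.random_normal([5, 5, 1, 32]))         # 5x5 kernel, 32 output channels
# 'SAME' pads the edges, so the output size is ceil(28 / 2) = 14
conv_same = tf.nn.conv2d(x, w, strides=[1, 2, 2, 1], padding='SAME')    # -> (?, 14, 14, 32)
# 'VALID' never crosses the edge: floor((28 - 5) / 2) + 1 = 12
conv_valid = tf.nn.conv2d(x, w, strides=[1, 2, 2, 1], padding='VALID')  # -> (?, 12, 12, 32)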

Neocognitron -> LeCun -> LeNet -> AlexNet
Network deepening: VGG16 -> VGG19
Enhanced convolutional layer features: NIN -> GoogLeNet -> Inception V3 -> Inception V4
Combining both directions: ResNet
From classification tasks to detection tasks: R-CNN -> Fast R-CNN -> Faster R-CNN
Adding new function modules: FCN -> STNet -> CNN+RNN/LSTM
The starting point of convolutional neural networks was the neocognitron (Neocognitron) model, where the convolutional structure first appeared. The first convolutional neural network model dates to 1989, invented by LeCun; paper: http://yann.lecun.com/exdb/publis/pdf/lecun-98.pdf. In 1998, LeCun published LeNet, whose output layer uses radial basis functions (RBF): http://vision.stanford.edu/cs598_spring07/papers/Lecun98.pdf. At that time, SVMs and hand-designed feature classifiers were dominant. With ReLU, dropout, GPUs and big data, AlexNet made a historic breakthrough in 2012.

Network deepening.

LeNet. The input layer is 32x32; images are reshaped during preprocessing so that potential distinctive features appear at the center of the receptive fields of the highest-layer feature-detecting convolution kernels. Three convolutional layers (C1, C3, C5) enhance the original signal features and reduce noise; online demo: https://graphics.stanford.edu/courses/cs178/applets/convolution.html - different convolution kernels produce different output feature maps. Two subsampling layers (S2, S4) reduce the number of training parameters and the degree of overfitting; max pooling takes the maximum of the selected region, average (mean) pooling takes its mean. One fully connected layer (F6) computes the dot product of the input vector and the weight vector plus a bias and passes the result to a sigmoid function, producing the state of unit i. The output layer (Gaussian connections) consists of Euclidean radial basis function (RBF) units, one for each of the 10 classes, with 84 inputs each; each RBF unit computes the Euclidean distance between the input vector and its class label vector - the farther apart they are, the larger the output.
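A minimal LeNet-5-style forward pass, sketched with tf.layers for illustration (average pooling stands in for the original subsampling layers, and a plain dense layer replaces the Gaussian/RBF output; the layer sizes follow the C1/S2/C3/S4/C5/F6 description above):

import tensorflow as tf

def lenet(x):                                                   # x: [batch, 32, 32, 1]
    c1 = tf.layers.conv2d(x, 6, 5, activation=tf.nn.tanh)      # C1: 6 feature maps, 28x28
    s2 = tf.layers.average_pooling2d(c1, 2, 2)                 # S2: 6 maps, 14x14
    c3 = tf.layers.conv2d(s2, 16, 5, activation=tf.nn.tanh)    # C3: 16 maps, 10x10
    s4 = tf.layers.average_pooling2d(c3, 2, 2)                 # S4: 16 maps, 5x5
    c5 = tf.layers.conv2d(s4, 120, 5, activation=tf.nn.tanh)   # C5: 120 maps, 1x1
    f6 = tf.layers.dense(tf.reshape(c5, [-1, 120]), 84,
                         activation=tf.nn.tanh)                 # F6: 84 units
    return tf.layers.dense(f6, 10)                              # 10 output classes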

AlexNet, 2012, Geoffrey Hinton and his students Alex Krizhevsky and Ilya Sutskever, "ImageNet Classification with Deep Convolutional Neural Networks". Different GPUs process different parts of the image and communicate only at certain layers. 5 convolutional layers and 3 fully connected layers, with about 50 million adjustable parameters. The last fully connected layer feeds a 1000-way softmax layer, producing a distribution over the 1000 class labels. To prevent overfitting, dropout sets the output of hidden-layer neurons to 0 with probability 0.5; the sampled sub-networks share weights, complex co-adaptation between neurons is reduced, and the number of iterations required to converge roughly doubles. Data augmentation adds new data by distorting the images: horizontal flips and reflections, random shifts and crops of the original image, random illumination and color transforms, color jitter. The nonlinear activation function ReLU converges faster than sigmoid/tanh. Big-data training on 1.2 million ImageNet images. A GPU implementation that reads and writes directly from GPU memory. LRN (local response normalization) layers.
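A minimal sketch of this kind of data augmentation using tf.image ops on a single [height, width, channels] image tensor; the crop size and jitter parameters below are illustrative, not the values used by AlexNet:

import tensorflow as tf

def augment(image):
    image = tf.image.random_flip_left_right(image)             # horizontal flip / reflection
    image = tf.random_crop(image, [24, 24, 3])                 # random shift via cropping
    image = tf.image.random_brightness(image, max_delta=0.2)   # random illumination change
    image = tf.image.random_contrast(image, 0.8, 1.2)          # simple color/contrast jitter
    return image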

Enhanced convolutional layer functionality.

VGGNet, Karen Simonyan and Andrew Zisserman, "Very Deep Convolutional Networks for Large-Scale Visual Recognition", http://www.robots.ox.ac.uk/~vgg/research/very_deep/. Five convolution groups (8-16 convolutional layers), 2 fully connected layers for image features, and 1 fully connected layer for classification. Simply increasing the number of convolutional layers eventually hits a bottleneck in accuracy improvement.
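A minimal sketch of one VGG-style convolution group (stacked 3x3 convolutions followed by 2x2 max pooling), written with tf.layers; the filter count and the use of two convolutions per group are illustrative assumptions:

import tensorflow as tf

def vgg_block(x, filters):
    # Two stacked 3x3 convolutions, then halve the spatial resolution
    x = tf.layers.conv2d(x, filters, 3, padding='same', activation=tf.nn.relu)
    x = tf.layers.conv2d(x, filters, 3, padding='same', activation=tf.nn.relu)
    return tf.layers.max_pooling2d(x, 2, 2)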

GoogLeNet. Builds on the NIN (Network in Network) idea; Min Lin, Qiang Chen, Shuicheng Yan, "Network in Network", https://arxiv.org/abs/1312.4400. The linear convolutional layer is replaced by a multilayer perceptron convolutional layer, and the fully connected layer is replaced by global average pooling. 2014, GoogLeNet (Inception V1), Christian Szegedy, Wei Liu et al., "Going Deeper with Convolutions", https://arxiv.org/abs/1409.4842. The 1x1, 3x3 and 5x5 convolution results are concatenated, and 1x1 convolution kernels provide dimensionality reduction in place of fully connected layers. Width and depth are expanded and computation is accelerated. The network is deeper, 22 layers, and two auxiliary loss functions are added at different depths to avoid vanishing gradients during backpropagation. Multiple kernel sizes are combined, and the dimension-reduced Inception module uses 1x1 convolution kernels to reduce the thickness (channel count) of the feature maps.
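A minimal sketch of a dimension-reduced Inception-style module: parallel 1x1, 3x3 and 5x5 branches plus a pooled branch, with 1x1 convolutions reducing the feature-map thickness before the larger kernels, all concatenated along the channel axis. The filter counts are left as parameters and are illustrative:

import tensorflow as tf

def inception_module(x, f1, f3r, f3, f5r, f5, fp):
    b1 = tf.layers.conv2d(x, f1, 1, padding='same', activation=tf.nn.relu)    # 1x1 branch
    b3 = tf.layers.conv2d(x, f3r, 1, padding='same', activation=tf.nn.relu)   # 1x1 reduce
    b3 = tf.layers.conv2d(b3, f3, 3, padding='same', activation=tf.nn.relu)   # 3x3 branch
    b5 = tf.layers.conv2d(x, f5r, 1, padding='same', activation=tf.nn.relu)   # 1x1 reduce
    b5 = tf.layers.conv2d(b5, f5, 5, padding='same', activation=tf.nn.relu)   # 5x5 branch
    bp = tf.layers.max_pooling2d(x, 3, 1, padding='same')                     # pooled branch
    bp = tf.layers.conv2d(bp, fp, 1, padding='same', activation=tf.nn.relu)
    return tf.concat([b1, b3, b5, bp], axis=3)  # concatenate along the channel axis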

Combining network deepening with enhanced convolution module functionality.

ResNet. 2015, champion of the ILSVRC object detection and object recognition tasks without relying on external data, from MSRA's Kaiming He, with 152 layers. Champion of ImageNet classification, detection and localization, and of COCO dataset detection and segmentation. Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, "Deep Residual Learning for Image Recognition", https://arxiv.org/abs/1512.03385. It addresses network degradation with a shortcut structure: the input both skips across layers and is added to the convolution result. Residual: instead of directly learning the complex nonlinear mapping H(x) that predicts the image class, the network learns the residual function F(x) = H(x) - x; optimizing the residual mapping is simpler than directly optimizing H(x).
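A minimal sketch of a residual block corresponding to F(x) = H(x) - x: the stacked convolutions learn the residual F(x), and the shortcut adds the input back so the block outputs H(x) = F(x) + x. This sketch assumes the input and output channel counts match, so no projection is needed on the shortcut:

import tensorflow as tf

def residual_block(x, filters):
    f = tf.layers.conv2d(x, filters, 3, padding='same', activation=tf.nn.relu)
    f = tf.layers.conv2d(f, filters, 3, padding='same')   # F(x), activation applied after the add
    return tf.nn.relu(x + f)                               # H(x) = F(x) + x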

From classification tasks to detection tasks. Image object detection and video object detection (VID).

R-CNN: region proposals combined with a CNN. Region Proposal Networks (RPNs) take an image of any size and produce a series of proposal regions, each with a probability score of containing a recognizable object. A small network slides over the final convolutional feature map; at each position the feature-map window is mapped to a low-dimensional vector and fed into two sibling fully connected layers, a box-regression layer and a box-classification layer. The drawback is repeated computation: thousands of proposal regions overlap each other, so features are extracted over and over.
Fast R-CNN, the accelerated version, maps the proposal regions onto the feature map of the CNN's final convolutional layer, so features are extracted only once per image, improving speed; the bottleneck moves to the region proposal step. It supports simultaneous detection of multiple object classes, such as pedestrian and vehicle detection.
Faster R-CNN hands the region proposal step to the CNN itself (the RPN), reaching real-time speed. Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", https://arxiv.org/abs/1506.01497.
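A minimal sketch of an RPN-style head: a small 3x3 convolution slides over the final convolutional feature map, and two sibling 1x1 convolutions play the roles of the box-classification and box-regression layers. The k anchors per position and the 256-channel intermediate width are illustrative assumptions:

import tensorflow as tf

def rpn_head(feature_map, k=9):
    # Small network slid over the feature map
    h = tf.layers.conv2d(feature_map, 256, 3, padding='same', activation=tf.nn.relu)
    scores = tf.layers.conv2d(h, 2 * k, 1)   # box-classification layer: object / not object per anchor
    deltas = tf.layers.conv2d(h, 4 * k, 1)   # box-regression layer: box offsets per anchor
    return scores, deltas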

Add a new function module.

FCN (deconvolution), STNet, and CNN+RNN/LSTM hybrid structures.

MNIST AlexNet implementation. See the network structure diagram.
1. Study the network's paper carefully and understand the input, output and structure of every layer.
2. Implement the network in four steps: load the data, define the network model, train the model, and evaluate the model.

https://github.com/aymericdamien/TensorFlow-Examples/blob/master/examples/3_NeuralNetworks/convolutional_network.py.
https://github.com/tensorflow/models/blob/master/tutorials/image/alexnet/alexnet_benchmark.py.

#!/usr/bin/python
# -*- coding: utf-8 -*-
# Input data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

import tensorflow as tf

# Define network hyper-parameters
learning_rate = 0.001
training_iters = 20000
batch_size = 128
display_step = 10

# Define network parameters
n_input = 784   # input dimension (img shape: 28x28)
n_classes = 10  # label dimension (0-9 digits)
dropout = 0.75  # dropout: probability of keeping a unit's output

# Placeholder inputs
x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_classes])
keep_prob = tf.placeholder(tf.float32)  # dropout keep probability

# Convolution operation
def conv2d(name, x, W, b, strides=1):
    x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME')
    x = tf.nn.bias_add(x, b)
    return tf.nn.relu(x, name=name)  # ReLU activation function

# Max-pooling (down-sampling) operation
def maxpool2d(name, x, k=2):
    return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1],
                          padding='SAME', name=name)

# Normalization (LRN) operation
def norm(name, l_input, lsize=4):
    return tf.nn.lrn(l_input, lsize, bias=1.0, alpha=0.001 / 9.0, beta=0.75, name=name)
# All the network parameters
weights = {
    'wc1': tf.Variable(tf.random_normal([11, 11, 1, 96])),
    'wc2': tf.Variable(tf.random_normal([5, 5, 96, 256])),
    'wc3': tf.Variable(tf.random_normal([3, 3, 256, 384])),
    'wc4': tf.Variable(tf.random_normal([3, 3, 384, 384])),
    'wc5': tf.Variable(tf.random_normal([3, 3, 384, 256])),
    'wd1': tf.Variable(tf.random_normal([4 * 4 * 256, 4096])),
    'wd2': tf.Variable(tf.random_normal([4096, 4096])),
    'out': tf.Variable(tf.random_normal([4096, n_classes]))
}
biases = {
    'bc1': tf.Variable(tf.random_normal([96])),
    'bc2': tf.Variable(tf.random_normal([256])),
    'bc3': tf.Variable(tf.random_normal([384])),
    'bc4': tf.Variable(tf.random_normal([384])),
    'bc5': tf.Variable(tf.random_normal([256])),
    'bd1': tf.Variable(tf.random_normal([4096])),
    'bd2': tf.Variable(tf.random_normal([4096])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}
# Define the AlexNet network model
def alex_net(x, weights, biases, dropout):
    # Reshape the input vector to a 28x28 image matrix
    x = tf.reshape(x, shape=[-1, 28, 28, 1])
    # First convolutional layer
    conv1 = conv2d('conv1', x, weights['wc1'], biases['bc1'])
    # Max pooling (down-sampling)
    pool1 = maxpool2d('pool1', conv1, k=2)
    # Normalization
    norm1 = norm('norm1', pool1, lsize=4)
    # Second convolutional layer
    conv2 = conv2d('conv2', norm1, weights['wc2'], biases['bc2'])
    # Max pooling (down-sampling)
    pool2 = maxpool2d('pool2', conv2, k=2)
    # Normalization
    norm2 = norm('norm2', pool2, lsize=4)
    # Third convolutional layer (no pooling, as in AlexNet)
    conv3 = conv2d('conv3', norm2, weights['wc3'], biases['bc3'])
    # Fourth convolutional layer (no pooling)
    conv4 = conv2d('conv4', conv3, weights['wc4'], biases['bc4'])
    # Fifth convolutional layer
    conv5 = conv2d('conv5', conv4, weights['wc5'], biases['bc5'])
    # Max pooling (down-sampling)
    pool5 = maxpool2d('pool5', conv5, k=2)
    # Normalization
    norm5 = norm('norm5', pool5, lsize=4)
    # First fully connected layer: flatten the 4x4x256 feature map to a vector first
    fc1 = tf.reshape(norm5, [-1, weights['wd1'].get_shape().as_list()[0]])
    fc1 = tf.add(tf.matmul(fc1, weights['wd1']), biases['bd1'])
    fc1 = tf.nn.relu(fc1)
    # Dropout
    fc1 = tf.nn.dropout(fc1, dropout)
    # Second fully connected layer
    fc2 = tf.add(tf.matmul(fc1, weights['wd2']), biases['bd2'])
    fc2 = tf.nn.relu(fc2)
    # Dropout
    fc2 = tf.nn.dropout(fc2, dropout)
    # Network output layer
    out = tf.add(tf.matmul(fc2, weights['out']), biases['out'])
    return out
# Build the model
pred = alex_net(x, weights, biases, keep_prob)
# Define the loss function and the optimizer (learning step)
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
# Evaluation functions
correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
# Train and evaluate the model
# Initialize all shared variables
init = tf.global_variables_initializer()
# Start training
with tf.Session() as sess:
    sess.run(init)
    step = 1
    # Keep training until training_iters is reached
    while step * batch_size < training_iters:
        # Get a batch of data
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        sess.run(optimizer, feed_dict={x: batch_x, y: batch_y, keep_prob: dropout})
        if step % display_step == 0:
            # Compute and print the loss value and accuracy
            loss, acc = sess.run([cost, accuracy],
                                 feed_dict={x: batch_x, y: batch_y, keep_prob: 1.})
            print("Iter " + str(step * batch_size) + ", Minibatch Loss= " +
                  "{:.6f}".format(loss) + ", Training Accuracy= " + "{:.5f}".format(acc))
        step += 1
    print("Optimization Finished!")
    # Compute the test accuracy
    test_acc = sess.run(accuracy, feed_dict={x: mnist.test.images[:256],
                                             y: mnist.test.labels[:256],
                                             keep_prob: 1.})
    print("Testing Accuracy: " + "{:.5f}".format(test_acc))


Resources:
"TensorFlow Technical Analysis and actual combat"

Recommendations for machine learning positions in Shanghai are welcome; contact: qingxingfengzi

