Introduction to TensorFlow: a simple neural network training method

I have been learning TensorFlow for the past few days, so this post is a record of what I have learned so far.

1. Steps for solving a problem with a neural network:

1. Extract the feature vector of each entity as the input of the neural network. In other words, perform feature engineering on the dataset and determine the feature dimension of each sample, which defines the number of input neurons.

2. Define the structure of the neural network and how the output is obtained from the input, that is, define the input layer, the hidden layers, and the output layer.

3. Adjust the parameter values of the network on the training data; this is the training process. In general, this means defining the model's loss function and a parameter optimization method, for example a cross-entropy loss function optimized by gradient descent.

4. Use the trained model to predict unknown data; this is also how the quality of the model is evaluated. (A minimal sketch mapping these four steps to TensorFlow code follows this list.)
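
As a rough preview, the four steps above map onto TensorFlow 1.x code roughly as follows. This is a minimal sketch only: the names NUM_FEATURES, NUM_CLASSES and HIDDEN_NODES are placeholders whose values are simply borrowed from the worked example later in this post, and the full training script appears in the sections below.

import tensorflow as tf

# Placeholder sizes borrowed from the worked example later in this post
# (315 features, 33 classes, 300 hidden nodes).
NUM_FEATURES = 315
NUM_CLASSES = 33
HIDDEN_NODES = 300

# Step 1: the input placeholder is sized by the feature dimension of each sample.
x = tf.placeholder(tf.float32, shape=[None, NUM_FEATURES], name='x-input')
y_ = tf.placeholder(tf.float32, shape=[None, NUM_CLASSES], name='y-input')

# Step 2: define the network structure (a single hidden layer here, purely for illustration).
w1 = tf.Variable(tf.truncated_normal([NUM_FEATURES, HIDDEN_NODES], stddev=0.1))
b1 = tf.Variable(tf.constant(0.1, shape=[HIDDEN_NODES]))
hidden = tf.nn.relu(tf.matmul(x, w1) + b1)
w2 = tf.Variable(tf.truncated_normal([HIDDEN_NODES, NUM_CLASSES], stddev=0.1))
b2 = tf.Variable(tf.constant(0.1, shape=[NUM_CLASSES]))
y = tf.matmul(hidden, w2) + b2

# Step 3: define a loss function (cross entropy) and an optimization method (gradient descent).
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=y, labels=y_))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

# Step 4: evaluate the trained model on data it has not seen before.
correct = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))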

2. Training a simple forward-propagation neural network

The model trained here is the simplest kind: a purely linear network with no activation function and no backpropagation step. It is only meant to illustrate the basic workflow of a neural network in TensorFlow.

import tensorflow as tf

# Define the hidden-layer parameters. Each w variable is a tensor (an n*m array, where n is
# the number of nodes in the previous layer and m the number of nodes in the current layer)
# holding the connection weights between the two layers. The weights are initialized randomly.
w1 = tf.Variable(tf.random_normal([2, 3], stddev=1))
w2 = tf.Variable(tf.random_normal([3, 1], stddev=1))

# Define a placeholder for the input data, i.e. the x vector. shape=(None, 2) leaves the
# number of samples fed in per run unspecified and fixes the dimension of each sample at 2.
x = tf.placeholder(tf.float32, shape=(None, 2), name="input")

# Matrix multiplication.
a = tf.matmul(x, w1)
y = tf.matmul(a, w2)

with tf.Session() as sess:
    # Newer versions replace initialize_all_variables() with global_variables_initializer().
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    # feed_dict supplies values for x; feeding three samples here makes y a tensor of three outputs.
    print(sess.run(y, feed_dict={x: [[0.7, 0.9], [1.0, 1.5], [2.1, 2.3]]}))

So far this defines a single linear network: each input sample x is simply multiplied by the two weight matrices in turn.
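
Because there is no activation function, the two matrix multiplications collapse into one: multiplying x by w1 and then by w2 is the same as multiplying x by the single matrix w1·w2, so a purely linear network is no more expressive than a single layer. A small NumPy sketch of this, using the same shapes as above (the random values are only for illustration):

import numpy as np

np.random.seed(0)
x = np.random.randn(3, 2)    # three samples with two features, as in the example above
w1 = np.random.randn(2, 3)
w2 = np.random.randn(3, 1)

y_two_steps = x.dot(w1).dot(w2)   # the forward pass the network performs
y_one_step = x.dot(w1.dot(w2))    # a single equivalent weight matrix

print(np.allclose(y_two_steps, y_one_step))   # True: without an activation, the layers collapse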

3. Defining the loss function and the backpropagation algorithm

With the above foundation, we can define a loss function and a backpropagation algorithm to fit the data. For non-linear data we also add an activation function to introduce non-linearity. There are a few details about the learning rate as well: this time we use a dynamic learning rate, starting with a relatively large value to speed up convergence and then decaying it as the number of iterations increases, so that the optimizer does not step over the minimum. Another issue is preventing overfitting; for neural networks the two common techniques are regularization and dropout, and the latter is not used in the code below.
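
For reference, tf.train.exponential_decay (used in the training script below, with staircase left at its default) computes the current rate as base_rate * decay_rate^(global_step / decay_steps). A quick sketch of how the rate shrinks, using the settings from the script below (base 0.5, decay 0.99, decay_steps 900):

# Reproduces the formula used by tf.train.exponential_decay (staircase=False):
# decayed_rate = base_rate * decay_rate ** (global_step / decay_steps)
LEARNING_RATE_BASE = 0.5
LEARNING_RATE_DECAY = 0.99
DECAY_STEPS = 900

for step in (0, 500, 1000, 1500, 2000):
    rate = LEARNING_RATE_BASE * LEARNING_RATE_DECAY ** (step / DECAY_STEPS)
    print("step %4d: learning rate %.4f" % (step, rate))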

Loss function: cross entropy

Backpropagation algorithm: gradient descent

Activation function: relu

# -*- coding: utf-8 -*-
"""
Created on Fri Aug 18 14:02:19 2017
@author: osT
"""
import tensorflow as tf
import numpy as np

# Load the data set; the file path is elided here, so supply your own.
data = np.loadtxt('your', dtype='float', delimiter=',')

# Convert the sample labels into one-hot vectors.
def label_change(before_label):
    label_num = len(before_label)
    change_arr = np.zeros((label_num, 33))
    for i in range(label_num):
        # The labels should be 0-32; 32 was accidentally recorded as 33 in the data.
        if before_label[i] == 33.0:
            change_arr[i, int(before_label[i]) - 1] = 1
        else:
            change_arr[i, int(before_label[i])] = 1
    return change_arr

# Input and output nodes of the network: each sample is 1*315-dimensional and the output is
# one of 33 classes.
INPUT_NODE = 315
OUTPUT_NODE = 33

# Two hidden layers, one with 300 nodes and one with 100 nodes.
LAYER1_NODE = 300
LAYER2_NODE = 100

# Learning rate, learning-rate decay, regularization coefficient, number of training steps
# and moving-average decay rate.
LEARNING_RATE_BASE = 0.5
LEARNING_RATE_DECAY = 0.99
REGULARIZATION_RATE = 0.0001
TRAINING_STEPS = 2000
MOVING_AVERAGE_DECAY = 0.99

# The structure of the whole network, i.e. the forward-propagation process. avg_class is the
# moving-average class; if None is passed in, the moving averages are not used.
def inference(input_tensor, avg_class, w1, b1, w2, b2, w3, b3):
    if avg_class is None:
        # First hidden layer: input times weight matrix plus bias, through the activation.
        layer1 = tf.nn.relu(tf.matmul(input_tensor, w1) + b1)
        # Second hidden layer: output of the first layer times its weight matrix plus bias.
        layer2 = tf.nn.relu(tf.matmul(layer1, w2) + b2)
        # Output layer: second hidden layer times its weight matrix plus bias.
        return tf.matmul(layer2, w3) + b3
    else:
        # avg_class.average() returns the moving average of each trainable variable.
        layer1 = tf.nn.relu(tf.matmul(input_tensor, avg_class.average(w1)) + avg_class.average(b1))
        layer2 = tf.nn.relu(tf.matmul(layer1, avg_class.average(w2)) + avg_class.average(b2))
        return tf.matmul(layer2, avg_class.average(w3)) + avg_class.average(b3)

def train(data):
    # Shuffle the data.
    np.random.shuffle(data)
    # The first 850 samples are used for training, the remaining (about 250) for testing.
    data_train_x = data[:850, :315]
    data_train_y = label_change(data[:850, -1])
    data_test_x = data[850:, :315]
    data_test_y = label_change(data[850:, -1])

    # Placeholders for the input data; None leaves the number of samples fed in per run
    # unspecified. y_ holds the sample labels.
    x = tf.placeholder(tf.float32, shape=[None, INPUT_NODE], name='x-input')
    y_ = tf.placeholder(tf.float32, shape=[None, OUTPUT_NODE], name='y-input')

    # Weights and biases of each layer, initialized with random numbers; note the shapes.
    w1 = tf.Variable(tf.truncated_normal(shape=[INPUT_NODE, LAYER1_NODE], stddev=0.1))
    b1 = tf.Variable(tf.constant(0.1, shape=[LAYER1_NODE]))
    w2 = tf.Variable(tf.truncated_normal(shape=[LAYER1_NODE, LAYER2_NODE], stddev=0.1))
    b2 = tf.Variable(tf.constant(0.1, shape=[LAYER2_NODE]))
    w3 = tf.Variable(tf.truncated_normal(shape=[LAYER2_NODE, OUTPUT_NODE], stddev=0.1))
    b3 = tf.Variable(tf.constant(0.1, shape=[OUTPUT_NODE]))

    # Forward-propagation result without the moving averages.
    y = inference(x, None, w1, b1, w2, b2, w3, b3)

    # global_step is incremented after each training step and is not itself trainable.
    global_step = tf.Variable(0, trainable=False)

    # The moving-average class; the arguments are the decay rate and global_step.
    variable_averages = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)
    # Apply the moving average to all trainable variables.
    variable_averages_op = variable_averages.apply(tf.trainable_variables())
    # Forward-propagation result using the moving-averaged variables.
    average_y = inference(x, variable_averages, w1, b1, w2, b2, w3, b3)

    # Cross entropy. Why the argmax of the label, i.e. the index of the correct class, is
    # passed in is explained after the code.
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
        logits=y, labels=tf.argmax(y_, 1))
    # Average cross entropy over all training samples in this round.
    cross_entropy_mean = tf.reduce_mean(cross_entropy)

    # L2 regularization of the weights, added to the cross entropy to form the loss function.
    regularizer = tf.contrib.layers.l2_regularizer(REGULARIZATION_RATE)
    regularization = regularizer(w1) + regularizer(w2) + regularizer(w3)
    loss = cross_entropy_mean + regularization

    # Dynamic (exponentially decaying) learning rate.
    learning_rate = tf.train.exponential_decay(
        LEARNING_RATE_BASE, global_step, 900, LEARNING_RATE_DECAY)

    # Backpropagation with gradient descent; passing global_step to minimize() increments it.
    train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(
        loss, global_step=global_step)

    # Group the training step and the moving-average update into a single op.
    train_op = tf.group(train_step, variable_averages_op)

    # Accuracy of the moving-averaged model.
    correct_prediction = tf.equal(tf.argmax(average_y, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    with tf.Session() as sess:
        # Initialize all variables.
        tf.global_variables_initializer().run()
        # Input dictionary for the training set.
        validate_feed = {x: data_train_x, y_: data_train_y}
        # Input dictionary for the test set.
        test_feed = {x: data_test_x, y_: data_test_y}
        for i in range(TRAINING_STEPS):
            if i % 1000 == 0:
                validate_acc = sess.run(accuracy, feed_dict=validate_feed)
                print("After %d training step(s), validation accuracy using average model is %g"
                      % (i, validate_acc))
            # Each round trains on the same training set because there are too few samples.
            sess.run(train_op, feed_dict=validate_feed)
        # Check the model's accuracy on the test set.
        test_acc = sess.run(accuracy, feed_dict=test_feed)
        print("After %d training step(s), test accuracy using average model is %g"
              % (TRAINING_STEPS, test_acc))

train(data)
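
Once training finishes, step 4 from the first section, predicting unknown data, amounts to feeding new samples through the smoothed forward pass. A minimal sketch, assuming it is placed inside the with tf.Session() block of train() above and that new_samples is a NumPy array of shape [k, 315] (new_samples is a hypothetical name, not part of the original script):

# Hypothetical: classify new, unlabeled samples with the moving-averaged model.
# new_samples is assumed to be a NumPy array of shape [k, 315].
predicted_class = tf.argmax(average_y, 1)
print(sess.run(predicted_class, feed_dict={x: new_samples}))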

Now let's look at why we pass in the index of the correct class when computing the cross entropy:

First, recall that there are 33 output nodes. After the final matrix multiplication, each node produces an output, which for the moment we can treat as a score for the corresponding class: the larger the value, the more confident we are that the sample belongs to that class. The logits parameter is the raw output of the network, that is, the output before it is processed by the softmax function. The labels parameter takes a single value per sample, namely the index of the correct class. That is because we use tf.nn.sparse_softmax_cross_entropy_with_logits, which speeds up the calculation for models where each sample has exactly one correct class; its labels input is that correct class, given as the index of the corresponding output node. There is also a non-sparse cross-entropy function, tf.nn.softmax_cross_entropy_with_logits(logits=, labels=), which instead takes the one-hot label vectors themselves.
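
A short sketch of the difference between the two calls (the numbers are made up purely to show the shapes): the sparse version takes the class index directly, the non-sparse version takes the one-hot vector, and for the same sample both return the same loss.

import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1]])          # raw network output for one sample, 3 classes
one_hot_label = tf.constant([[1.0, 0.0, 0.0]])   # one-hot label: the correct class is index 0
index_label = tf.constant([0])                   # the same label, given as a class index

sparse_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=index_label)
dense_loss = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=one_hot_label)

with tf.Session() as sess:
    print(sess.run([sparse_loss, dense_loss]))   # the two losses are equal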

Finally, let's summarize the methods to improve the accuracy of the model:

1. Use an activation function. This step is almost always necessary; it is what introduces non-linearity.
2. Add hidden layers or nodes. In this example, a single hidden layer with 300 nodes gives an accuracy of about 89%, a single hidden layer with 400 nodes about 93%, and two hidden layers with 300 and 100 nodes about 94%. However, adding hidden layers also increases the training time.
3. Use a dynamic learning rate. This not only speeds up training but also increases the probability that the network converges to a lower minimum, which improves accuracy.
4. Use a moving-average (smoothed) model. This mainly increases the robustness of the model and helps it generalize better.
5. Add regularization or use dropout to prevent overfitting (a minimal dropout sketch follows this list).
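
Dropout is not used in the code above, but for completeness, here is a minimal sketch of how it is typically added to a hidden layer in TensorFlow 1.x. keep_prob is a hypothetical placeholder name, not part of the script above; it would be fed as something like 0.5 during training and 1.0 during evaluation.

# Hypothetical addition to the inference() function above: apply dropout to the first hidden layer.
keep_prob = tf.placeholder(tf.float32, name='keep-prob')

layer1 = tf.nn.relu(tf.matmul(input_tensor, w1) + b1)
layer1 = tf.nn.dropout(layer1, keep_prob)  # randomly zeroes a fraction of activations during training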

The training set is attached.

That is all for this article. I hope it is helpful for your learning.
