tags (space delimited): Wang Cao TensorFlow notes
Note-taker: Wang Cao
Note completion date: February 24, 2017
TensorFlow official English documentation: https://www.tensorflow.org/get_started/mnist/beginners
Official documentation last updated as of this writing: February 15, 2017
1. Case Background
This article corresponds to the second tutorial in the official TensorFlow documentation: recognizing handwritten digits.
MNIST is a simple computer vision dataset consisting of a series of handwritten digit images, such as:
In the dataset, each image carries a label indicating which digit it shows. For example, the labels for the images above are 5, 0, 4, 1.
Why introduce this case for beginners? To give an analogy: when we learn to program, the first program we write is "Hello world". MNIST is to machine learning what "Hello world" is to programming.
This article trains a model to predict which digit is written in an input image. The goal, however, is not to train a superb, state-of-the-art model (that is covered in a later document), but simply to build a basic model (softmax regression) so that everyone can get a taste of TensorFlow.
Although the model takes only a few lines of code to build, it is important to understand the principles behind how TensorFlow operates as well as the core machine learning concepts involved. The whole process and the underlying principles are therefore explained in detail below.
2. Data Acquisition: MNIST
The MNIST dataset can be downloaded from http://yann.lecun.com/exdb/mnist/.
However, to make it easier for learners to obtain this data, TensorFlow wraps the download in a helper method; we only need to call it, and the program automatically downloads and loads the dataset. The code is as follows:
# Import the input_data module
from tensorflow.examples.tutorials.mnist import input_data
# Call the read_data_sets method from this module; it downloads the data if needed
mnist = input_data.read_data_sets("mnist_data/", one_hot=True)
The dataset obtained has 3 parts:
(1) 55,000 training samples (used to train the model)
(2) 10,000 test samples (used to evaluate the model and guard against overfitting)
(3) 5,000 validation samples (used to tune the hyper-parameters)
(Typically in machine learning modeling you need to prepare these 3 kinds of datasets; their sizes can be checked with the short sketch below.)
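A minimal sanity check, assuming the read_data_sets call above succeeded (num_examples is a property of the returned dataset objects):
print(mnist.train.num_examples)       # 55000
print(mnist.test.num_examples)        # 10000
print(mnist.validation.num_examples)  # 5000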
Each sample consists of two parts:
(1) a handwritten digit image, denoted x
(2) a label, denoted y
Each handwritten digit image is 28*28 pixels and can be represented as a 28*28 array of numbers. Here's an example:
This 28*28 array can then be flattened into a 1*784 vector. How it is flattened does not matter, as long as every image is flattened in the same way. So each image can be represented by a 1*784 vector.
You may ask: doesn't flattening the image out of its two-dimensional structure lose information and hurt performance? Rest assured, the best computer vision methods do exploit the 2-D structure (this will be discussed later in the tutorial), but the simple model built here does not need it.
If the two lines of code above run successfully, the data has been downloaded. You can then access mnist.train.images to get the training images: a tensor of shape [55000, 784], whose first dimension indexes the 55,000 images and whose second dimension holds the 784 pixels. Each pixel intensity is expressed as a number between 0 and 1.
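A quick sketch for inspecting this tensor (assuming the mnist object loaded above; the images are returned as a numpy array):
print(mnist.train.images.shape)                             # (55000, 784)
print(mnist.train.images.min(), mnist.train.images.max())   # pixel intensities lie between 0 and 1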
As mentioned above, each image also corresponds to a label indicating which handwritten digit it contains.
As required, we convert these 0-9 digit labels into one-hot encoding. What is one-hot encoding? For example, the label 3 becomes [0,0,0,1,0,0,0,0,0,0] after encoding: a 1*10 vector in which the position corresponding to 3 (among 0-9) holds a 1 and every other position holds a 0.
After this encoding, the training-set labels become a [55000, 10] array of floating-point numbers.
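A minimal sketch of what this encoding looks like (plain numpy for illustration; the one_hot helper below is hypothetical, since read_data_sets has already returned one-hot labels):
import numpy as np

def one_hot(label, num_classes=10):
    # Hypothetical helper: a vector with a 1 at the label's position and 0 elsewhere
    vec = np.zeros(num_classes)
    vec[label] = 1.0
    return vec

print(one_hot(3))                # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
print(mnist.train.labels.shape)  # (55000, 10), floating point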
OK, the data has been downloaded and we know its format and content, so let's start building a model.
3. Establish a multi-classification model: Softmax regression
3.1 The principle of softmax regression
Because our goal is to distinguish the digits 0-9, i.e. to classify images into these 10 classes, this is a multi-class classification problem. For a given image, we want the model to output the probability of it belonging to each of the 10 categories. For example, if the model decides an image belongs to 9 with probability 80%, to 8 with probability 5%, and to the other digits with only tiny probabilities, then the image should be assigned to category 9.
Softmax regression is a classic method for multi-class classification. Even the more complex models introduced later in these notes apply softmax in their last layer.
Softmax regression has 2 main steps:
(1) add up the evidence that the input belongs to each category
(2) convert this evidence into probabilities
OK, so what exactly is this "evidence"? It is simply a linear model, composed of the weights W and a bias b:
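The original image with the formula is missing; reconstructed from the official tutorial's definition, it is:
\text{evidence}_i = \sum_j W_{i,j} x_j + b_i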
Here i indexes the class and j indexes the j-th pixel of the input image. In other words, each pixel is multiplied by its corresponding weight in W, the products are summed, and the bias is added.
After computing the evidence, we use the softmax function to convert it into probabilities.
The softmax here acts like an activation function or link function: it converts the output into the form we want (here, a probability distribution over 10 categories). So what does this softmax step compute? The following formula:
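The missing formula, reconstructed from the official tutorial, is:
y = \text{softmax}(\text{evidence})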
Expand the formula above:
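Again reconstructed from the standard definition, the expanded form is:
\text{softmax}(x)_i = \frac{\exp(x_i)}{\sum_j \exp(x_j)}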
That is, the linear output "evidence" is used as the input x to the softmax function: each component first passes through the exponential function, and the results are then normalized so that all the probabilities add up to 1.
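A tiny numerical sketch of this exponentiate-then-normalize step (plain numpy, illustrative values only; not part of the tutorial code):
import numpy as np

evidence = np.array([2.0, 1.0, 0.1])
exp = np.exp(evidence)       # exponentiate each component
probs = exp / exp.sum()      # normalize so the entries sum to 1
print(probs)                 # approx. [0.659 0.242 0.099]
print(probs.sum())           # 1.0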
The process of Softmax regression is drawn as follows:
Written out as equations, it looks like this:
The equations above can be rewritten as a matrix multiplication plus a vector addition:
Or, most simply and intuitively:
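The original images with these formulas are missing; the compact vectorized form from the official tutorial is:
y = \text{softmax}(Wx + b)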
At this point we know how the multi-class softmax regression computation works, so let's try to implement softmax regression with TensorFlow.
3.2 Implementing Softmax Regression in TensorFlow
1. To use TensorFlow, first import the TensorFlow library:
import tensorflow as tf
2. Create a placeholder for the input data x
x = tf.placeholder(tf.float32, [None, 784])
The x here is not a specific value but a placeholder: it reserves a spot for the input data, and the real data for x is passed in only when TensorFlow actually runs the computation. Because each input image is a 1*784 vector, the input can be represented as a 2-D tensor of shape [None, 784]; None means the first dimension can have any length, i.e. any number of samples can be passed in later.
3. Create the weight and bias variables
W and b are continuously optimized during training, so we create them with tf.Variable:
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
Here both variables are initialized to zero; they will be optimized to other values during training. Note that W has shape [784, 10]: multiplying a 784-pixel input by W produces a 10-dimensional vector (one entry per category). b has shape [10].
4. Building a Softmax model
y = tf.nn.softmax(tf.matmul(x, W) + b)
As the code above shows, we first multiply x by W, then add b, and finally apply softmax to convert the linear output into probabilities.
With that, the softmax model is written.
4. Model Training
1. Create a loss function
Above we set up an initial softmax model; note that its parameters W and b were simply initialized by us. The goal of training is to let the model keep optimizing these parameters as it learns from the samples, so that it performs as well as possible.
So how do we evaluate whether the model performs well or badly? In machine learning we generally use a loss function: the farther the model's predictions are from the true results, the greater the loss and the worse the model. We therefore want to minimize the loss and obtain the optimal model.
Here we introduce the most common loss function: the cross-entropy loss. The formula is as follows:
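The missing formula, reconstructed from the official tutorial, is:
H_{y'}(y) = -\sum_i y'_i \log(y_i)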
y is the probability distribution the model predicts, and y' is the true category distribution (that is, the one-hot encoded label). y_i is the predicted probability of class i, and y'_i is the true probability of class i (1 for the correct class and 0 for all others).
Cross entropy measures, in a certain sense, how well the model fits the true situation: the larger the cross entropy, the worse the fit and the worse the model.
To implement the cross-entropy function, the code is as follows:
# Add a placeholder for the true labels
y_ = tf.placeholder(tf.float32, [None, 10])
# Create the cross-entropy loss
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
To explain the code above: tf.log takes the log of each element of y, which is then multiplied element-wise by the corresponding entry of the true label y_. tf.reduce_sum sums over axis 1 (the reduction_indices=[1] argument), and tf.reduce_mean takes the mean of the cross entropy over all samples.
Note that the official source code does not use this formula, because it is numerically unstable. Instead, tf.nn.softmax_cross_entropy_with_logits is applied directly to the linear output (i.e. without passing through the softmax function separately), which is more stable.
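A minimal sketch of that more stable variant, assuming the x, W, b, and y_ defined above (it replaces the explicit softmax + log formulation; the rest of this walkthrough keeps using the cross_entropy defined earlier):
# Keep the linear output (logits) and let TensorFlow apply softmax + cross entropy internally
logits = tf.matmul(x, W) + b
stable_cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits))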
2. Optimize the parameters with the backpropagation algorithm
The loss function is our objective function. To find its minimum, one would take the partial derivatives with respect to the parameters and set them to zero; in practice we use an optimizer to keep reducing the loss until the optimal parameters are found. The most commonly used is gradient descent. The code is as follows:
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
The line above uses gradient descent to minimize the cross-entropy loss with a learning rate of 0.5. On every iteration the parameters W and b are updated, until the loss is minimized and the optimal W and b are obtained.
TensorFlow also offers many other optimizers (introduced in later notes); one alternative is sketched below.
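For example, a sketch of swapping in the Adam optimizer instead (the 0.001 learning rate is just an illustrative choice, not from the tutorial):
# Alternative: Adam optimizer with a hypothetical learning rate of 0.001
# (this would replace the gradient-descent train_step above)
adam_train_step = tf.train.AdamOptimizer(0.001).minimize(cross_entropy)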
3. Running iterations
The graph for training the model is basically complete. Now we can initialize the variables, create a session, and train in a loop.
# Create a session
sess = tf.InteractiveSession()
# Initialize all variables
tf.global_variables_initializer().run()
# Loop 1000 times to train the model
for _ in range(1000):
    # Take a batch of 100 training samples and their labels each time
    batch_xs, batch_ys = mnist.train.next_batch(100)
    # Feed the data and run one training step
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
In each iteration, a batch of 100 samples is taken from the training set, and the train_step operation is run with the batch fed into the placeholders defined earlier.
You might wonder why we do not run each training pass over the whole training set, and instead randomly take 100 samples at a time. This is stochastic gradient descent. Plain gradient descent, which uses all the samples in every iteration, is ideal in principle but greatly increases the computational cost; stochastic gradient descent reduces the amount of computation while maintaining a comparable accuracy.
5. Model Evaluation
Through training we obtain the optimal parameters, but how well does the model actually perform with them?
We can check how many of the model's predictions on the test set match the actual labels of the samples.
tf.argmax returns the index of the maximum value of a tensor along a given dimension. For example, tf.argmax(y, 1) is the index of the largest probability in the predicted 10-category probability vector (i.e. the predicted category), and tf.argmax(y_, 1) is the index of the sample's true category. If the two are equal for a sample, the prediction for that sample is correct. The code below uses tf.equal and returns a list of booleans.
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
To get the accuracy, we just cast the boolean list to floats and take the mean:
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
Now we feed in the test dataset and compute the accuracy on the test set:
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
This prints roughly 92%.
Is that a good result? Um, actually no; you could even say it is quite bad.
The reason is that we are using a very simple model. With some minor improvements the accuracy can reach 97%, and the best models can reach about 99.7% accuracy.
This completes a full example of using TensorFlow to train a softmax regression multi-class classification model.