When learning a new programming language, we always start by printing "Hello, World!", signaling that we have opened the door to that language. In the field of machine learning, recognizing handwritten digits plays the same role: it is the "Hello, World!" that serves as the gateway to machine learning.
I. Introduction to MNIST
MNIST: Implementing handwritten digit recognition requires pictures of handwritten digits, which are downloaded from MNIST. MNIST (Modified National Institute of Standards and Technology) is a large database of handwritten digits commonly used to train various image processing systems. MNIST divides the handwritten digit data into two parts: a training set and a test set. The training set contains 60,000 examples and the test set contains 10,000 examples, each consisting of a picture of a handwritten digit and its corresponding label. Each picture is 28*28 pixels (height and width), for a total of 784 pixels.
II. Data Preprocessing
Data preprocessing: When training a machine learning model, preprocessing the data is very important; it refers to the processing applied to the data right after it is obtained from the data source. In general, the raw data needs to be processed before it can be used to train a model. In the handwritten digit recognition model, the input data is a picture. To simplify processing, each 28*28 picture is flattened into a 784-dimensional vector. This conversion obviously discards the two-dimensional structure of the image; if you want to keep that two-dimensional structure, you can use convolution, but in this example we stick with the simplified flattened form. The ultimate goal of the model is to map each picture to a digit (0-9), which turns the task into a multi-class classification problem. In machine learning, multi-class classification problems can be handled with Softmax regression.

In the MNIST dataset, mnist.train.images is a tensor of shape [60000, 784]. The first dimension indexes the training images, running from 0 (the first picture) to 59999 (the last picture). The second dimension holds the intensity values (between 0 and 1) of the 784 pixels of that image, for example [0, 0, ..., 0.342, 0.4232, ...]. Similarly, mnist.train.labels is a [60000, 10] tensor: the first dimension is the image index and the second dimension is a 10-dimensional one-hot vector for the digit, so 9 is represented as [0,0,0,0,0,0,0,0,0,1].
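To make these shapes concrete, here is a minimal sketch that loads the dataset and inspects the tensors; it assumes the TensorFlow 1.x tutorial helper input_data, the same helper used in the full example later in this post:

from tensorflow.examples.tutorials.mnist import input_data

# Download (or reuse a cached copy of) the MNIST data; one_hot=True gives 10-dimensional labels
mnist = input_data.read_data_sets("mnist_data/", one_hot=True)

print(mnist.train.images.shape)  # (number of training images, 784): one flattened 28*28 picture per row
print(mnist.train.labels.shape)  # (number of training images, 10): one one-hot label per row
print(mnist.train.labels[0])     # a 10-dimensional one-hot vector, e.g. [0,0,0,0,0,0,0,0,0,1] for a 9

Note that this helper may reserve part of the 60,000 training images as a separate validation set, so the first dimension it reports can be smaller than 60,000.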
III. The Softmax Function
Softmax is an extension of logistic regression. Logistic regression is mainly used for binary classification: when the result y is greater than 0.5 the example is assigned to class 1, and when it is less than 0.5 it is assigned to class 0. Softmax, by contrast, is used for multi-class classification. The Softmax function is a generalization of the logistic function that "compresses" a K-dimensional vector of arbitrary real numbers into another K-dimensional real vector whose elements all lie in the range (0, 1) and sum to 1. The Softmax function has the following form:
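In standard notation, for a K-dimensional input vector z, the j-th component of the output is:

\sigma(\mathbf{z})_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}, \qquad j = 1, \dots, K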
Here is an example from Wikipedia that illustrates softmax on a simple input vector:
The value of the Softmax function for the input vector [1, 2, 3, 4, 1, 2, 3] is [0.024, 0.064, 0.175, 0.475, 0.024, 0.064, 0.175]. The largest item in the output vector corresponds to the maximum value "4" in the input vector. This also shows the general meaning of the function: it normalizes the vector, highlighting the maximum value and suppressing the components that are far below it.
The following is sample Python code that computes this function:
import math

z = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0]

z_exp = [math.exp(i) for i in z]
print(z_exp)  # Result: [2.72, 7.39, 20.09, 54.6, 2.72, 7.39, 20.09]

sum_z_exp = sum(z_exp)
print(sum_z_exp)  # Result: 114.98

softmax = [round(i / sum_z_exp, 3) for i in z_exp]
print(softmax)  # Result: [0.024, 0.064, 0.175, 0.475, 0.024, 0.064, 0.175]
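For comparison, the same calculation can be done with the TensorFlow op that appears in the full example below; this is a minimal sketch assuming a TensorFlow 1.x session, as used later in this post:

import tensorflow as tf

z = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0]
with tf.Session() as sess:
    # tf.nn.softmax normalizes along the last dimension, giving approximately the same values as above
    print(sess.run(tf.nn.softmax(z)))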
IV. The Cross-Entropy Cost Function
In a previous blog post on using TensorFlow to build a linear regression model, the squared-error cost function was introduced; it too is a commonly used cost function. However, one drawback is that it descends too slowly, so many iterations are needed to get close to the desired result. To give an analogy: when we get a question wrong on an exam and the graded paper comes back, we see the mistake and do not repeat it next time; the squared-error cost function, by contrast, only corrects itself slowly after making the same error many times. The cross-entropy cost function keeps the advantages of the squared-error cost function while compensating well for this drawback.
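Concretely, writing y for the predicted probability distribution (the softmax output) and y' for the true distribution (the one-hot label), the cross-entropy cost used in the code below is:

H_{y'}(y) = -\sum_i y'_i \log(y_i)

which is exactly what the line cross_entropy = -tf.reduce_sum(y_ * tf.log(y)) computes, summed over the whole batch.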
V. Code Implementation of Handwritten Digit Recognition
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf

if __name__ == "__main__":
    # Define the input variable using a TensorFlow placeholder
    x = tf.placeholder("float", [None, 784])
    # Define the model weights, initialized to 0
    w = tf.Variable(tf.zeros([784, 10]))
    # Define the model bias, initialized to 0
    b = tf.Variable(tf.zeros([10]))
    # Define the model output: y is a 10-dimensional vector giving the probability
    # that the handwritten digit picture is each of 0-9
    y = tf.nn.softmax(tf.matmul(x, w) + b)
    # Use cross-entropy as the cost function; y_ holds the true one-hot labels
    y_ = tf.placeholder("float", [None, 10])
    # Compute the cross-entropy; it is always positive, because each element of y
    # lies between 0 and 1, so log(y) is less than 0
    cross_entropy = -tf.reduce_sum(y_ * tf.log(y))
    # Use gradient descent to minimize the cross-entropy
    train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
    # Initialize the Variables
    init = tf.initialize_all_variables()
    session = tf.Session()
    session.run(init)
    # Download the MNIST handwritten digit dataset
    mnist = input_data.read_data_sets("mnist_data/", one_hot=True)
    for i in range(1000):
        # 100 examples per iteration: batch_xs is the input x,
        # batch_ys is the corresponding true output of the model
        batch_xs, batch_ys = mnist.train.next_batch(100)
        session.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
    # If the model's output matches the actual label, the prediction is correct;
    # this returns an array of bool values
    correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    # Convert the bool array into an accuracy, e.g. [True, False, True] gives 2/3
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print(session.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))  # 0.9182
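As a quick sanity check after training, the trained graph can also be asked for its prediction on a single test image. The snippet below is a hypothetical addition, not part of the original code; it simply reuses the session, x, y, y_ and mnist objects defined above:

# Hypothetical follow-up: predict the digit for the first test image
first_image = mnist.test.images[:1]                                           # shape [1, 784]
predicted = session.run(tf.argmax(y, 1), feed_dict={x: first_image})          # model's predicted digit
actual = session.run(tf.argmax(y_, 1), feed_dict={y_: mnist.test.labels[:1]}) # true label for comparison
print(predicted, actual)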