Machine learning has many applications in pattern recognition, and handwritten digit recognition is one of the simplest. It is a multi-class classification problem, and we use it here to introduce TensorFlow, the framework Google recently open-sourced. The deep learning material that follows will be presented and demonstrated with TensorFlow.
Please respect the original work: when reposting, credit the source website www.shareditor.com and include the original link.
What is TensorFlow
The name combines "tensor" and "flow".
Tensor was originally a term in mechanics, describing the stress state at each point of an elastic medium. In mathematics, a tensor is a generalized "quantity": an order-0 tensor is a scalar (for example 0, 1, 2, ...), an order-1 tensor is a vector (for example (1, 3, 4)), and an order-2 tensor is a matrix. These forms look unrelated, but they all count as tensors because they share certain properties: 1) they can be expressed in a coordinate system, 2) they obey the same transformation law under a change of coordinates, and 3) they support the same basic operations (such as addition, subtraction, multiplication, division, scaling, dot product, symmetry, ...).
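To make the notion of tensor order concrete, here is a minimal sketch in plain Python. It treats nested lists or tuples as tensors and counts the nesting depth; the helper name tensor_order is hypothetical and is not part of TensorFlow.

```python
def tensor_order(value):
    """Return the nesting depth of a list/tuple-based tensor (0 for a plain number)."""
    order = 0
    while isinstance(value, (list, tuple)):
        order += 1
        if not value:  # an empty list has nothing further to descend into
            break
        value = value[0]
    return order

scalar = 2                  # order 0
vector = (1, 3, 4)          # order 1
matrix = [[1, 0], [0, 1]]   # order 2

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix)]:
    print(name, "has order", tensor_order(t))
```

The sketch only inspects the first element at each level, so it assumes the nested lists are rectangular.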
TensorFlow can therefore be understood as a framework that processes tensors as they "flow" through a computation. It was developed and open-sourced by Google and has been used in Google Brain projects.
TensorFlow Installation
sudo pip install https://storage.googleapis.com/tensorflow/mac/tensorflow-0.9.0-py2-none-any.whl
For other platforms, find the corresponding .whl package.
Problems you may encounter:
If TensorFlow cannot be imported, the cause is an incorrect protobuf version. Uninstall protobuf and then reinstall TensorFlow, which automatically installs protobuf 3.0:
sudo pip uninstall protobuf
sudo pip install --upgrade https://storage.googleapis.com/tensorflow/mac/tensorflow-0.9.0-py2-none-any.whl
Obtaining the handwritten digit dataset
The handwritten digit dataset can be downloaded from http://yann.lecun.com/exdb/mnist/, for example http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz and http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz. After decompression the files turn out not to be in an image format but in a dataset-specific binary format. To show what the data looks like, I wrote a program that prints these digits:
/************************
 * author: shareditor
 * date: 2016-08-02
 * brief: read MNIST data
 ************************/
#include <stdio.h>
#include <stdint.h>
#include <assert.h>
#include <stdlib.h>

unsigned char *labels = NULL;

/**
 * All the integers in the files are stored in the MSB first (high endian) format.
 * This byte swap assumes the host machine is little-endian.
 */
void copy_int(uint32_t *target, unsigned char *src)
{
    *(((unsigned char*)target) + 0) = src[3];
    *(((unsigned char*)target) + 1) = src[2];
    *(((unsigned char*)target) + 2) = src[1];
    *(((unsigned char*)target) + 3) = src[0];
}

int read_labels()
{
    FILE *fp = fopen("./train-labels-idx1-ubyte", "rb");
    if (NULL == fp)
    {
        return -1;
    }
    /* header: magic number, item count */
    unsigned char head[8];
    fread(head, sizeof(unsigned char), 8, fp);
    uint32_t magic_number = 0;
    uint32_t item_num = 0;
    copy_int(&magic_number, &head[0]);
    // magic number check
    assert(magic_number == 2049);
    copy_int(&item_num, &head[4]);

    uint64_t values_size = sizeof(unsigned char) * item_num;
    labels = (unsigned char*)malloc(values_size);
    fread(labels, sizeof(unsigned char), values_size, fp);

    fclose(fp);
    return 0;
}

int read_images()
{
    FILE *fp = fopen("./train-images-idx3-ubyte", "rb");
    if (NULL == fp)
    {
        return -1;
    }
    /* header: magic number, image count, rows, cols */
    unsigned char head[16];
    fread(head, sizeof(unsigned char), 16, fp);
    uint32_t magic_number = 0;
    uint32_t images_num = 0;
    uint32_t rows = 0;
    uint32_t cols = 0;
    copy_int(&magic_number, &head[0]);
    // magic number check
    assert(magic_number == 2051);
    copy_int(&images_num, &head[4]);
    copy_int(&rows, &head[8]);
    copy_int(&cols, &head[12]);

    uint64_t image_size = rows * cols;
    uint64_t values_size = sizeof(unsigned char) * images_num * rows * cols;
    unsigned char *values = (unsigned char*)malloc(values_size);
    fread(values, sizeof(unsigned char), values_size, fp);

    for (uint32_t image_index = 0; image_index < images_num; image_index++)
    {
        // print the label
        printf("=========================================%d ======================================\n", labels[image_index]);
        for (uint32_t row_index = 0; row_index < rows; row_index++)
        {
            for (uint32_t col_index = 0; col_index < cols; col_index++)
            {
                // print the pixels of image
                printf("%3d", values[image_index * image_size + row_index * cols + col_index]);
            }
            printf("\n");
        }
        printf("\n");
    }

    free(values);
    fclose(fp);
    return 0;
}

int main(int argc, char *argv[])
{
    if (-1 == read_labels())
    {
        return -1;
    }
    if (-1 == read_images())
    {
        return -1;
    }
    return 0;
}
Download and decompress the dataset files train-images-idx3-ubyte and train-labels-idx1-ubyte into the directory containing the source code, then compile and run:
gcc -o read_images read_images.c
./read_images
The output covers 60,000 images in total, each printed as its label followed by its grid of pixel values. As the code makes clear, the dataset stores the actual pixel values of each image.
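The same header layout can also be read in Python with the standard struct module. The following is a minimal sketch, not the author's program: the function name parse_idx_images is hypothetical, and it is run on a small synthetic buffer rather than the real file so that it stays self-contained.

```python
import struct

def parse_idx_images(data):
    """Parse an IDX3 image buffer into (rows, cols, list of flat pixel lists)."""
    # Header: four big-endian (MSB first) unsigned 32-bit integers.
    magic, count, rows, cols = struct.unpack(">IIII", data[:16])
    if magic != 2051:
        raise ValueError("unexpected magic number: %d" % magic)
    size = rows * cols
    images = []
    for i in range(count):
        start = 16 + i * size
        # One unsigned byte per pixel, stored row by row.
        images.append(list(bytearray(data[start:start + size])))
    return rows, cols, images

# Synthetic buffer with two 2x2 images, standing in for the real file contents.
fake = struct.pack(">IIII", 2051, 2, 2, 2) + bytes(bytearray([0, 255, 128, 7, 1, 2, 3, 4]))
rows, cols, images = parse_idx_images(fake)
print(rows, cols, images)
```

For the real dataset, the buffer would come from reading the decompressed train-images-idx3-ubyte file in binary mode.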
Softmax model
We introduced the logistic regression model in Machine Learning Tutorial 13: Using scikit-learn for Logistic Regression. Logistic regression solves two-class classification problems (using the sigmoid function), while the softmax model extends logistic regression to multi-class classification.
Softmax means "soft maximum": if one x_i is larger than all the others, its mapped component approaches 1 and the other components approach 0, so the sample is assigned to that class. Multiple components correspond to multiple classes. Its mathematical form differs from the sigmoid:
$$\mathrm{softmax}(x)_i = \frac{\exp(x_i)}{\sum_j \exp(x_j)}$$
Its defining property is that the softmax outputs sum to 1, so each output represents a probability: the probability that the sample belongs to the corresponding class.
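As an illustration of the "soft maximum" behavior, here is a minimal plain-Python sketch of softmax. The input scores are made up, and subtracting the maximum is a standard numerical-stability trick that does not change the result.

```python
import math

def softmax(scores):
    """Map a list of real scores to probabilities that sum to 1."""
    # Subtracting the max leaves the result unchanged but avoids overflow in exp.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 5.0])  # made-up scores for three classes
print(probs)
print(sum(probs))
```

The largest score receives most of the probability mass, while the smaller scores still get small non-zero probabilities.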
During training, the x_i here is computed as:
$$x_i = \sum_j W_{i,j} x_j + b_i$$
where W_{i,j} is the weight of feature j for class i, the x_j on the right-hand side is the j-th feature value of the sample, and b_i is the bias for class i.
In detail: suppose the model is trained on two features whose values are f1 and f2, and their weights for class i are 0.2 and 0.8, with a bias of 1. Then
x_i = f1*0.2 + f2*0.8 + 1
This value of x is computed for every class. If the model is well trained, the class the sample actually belongs to should have the largest softmax value.
The softmax regression algorithm is based on this principle: it uses a large number of samples to train W and b, which are then used for classification.
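The worked example above can be sketched in plain Python. The weights 0.2 and 0.8 and the bias 1 come from the text; the feature values and the second class's weights are made up for illustration, and the function names are hypothetical.

```python
import math

def class_scores(features, weights, biases):
    """Compute x_i = sum_j W[i][j] * f_j + b_i for every class i."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, biases)]

def predict(features, weights, biases):
    """Return (index of the most probable class, softmax probabilities)."""
    scores = class_scores(features, weights, biases)
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best, probs

features = [2.0, 3.0]        # f1, f2 (made-up values)
weights = [[0.2, 0.8],       # class 0: weights from the text
           [0.5, -0.5]]      # class 1: made-up weights
biases = [1.0, 0.0]          # class 0 bias from the text, class 1 made up

scores = class_scores(features, weights, biases)
best, probs = predict(features, weights, biases)
print(scores, best, probs)
```

Training would adjust the weights and biases so that, for each sample, the true class ends up with the largest probability.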
Advantages of TensorFlow
TensorFlow delegates complex operations to code written in another, more efficient language. But switching between languages and transferring data between different computing resources is itself costly. TensorFlow therefore describes a whole series of operations as a graph, hands the entire graph to the external computation, and returns only the final result, so the transfer cost is minimal and the computation is as efficient as possible.
As an example:
import tensorflow as tf
x = tf.placeholder(tf.float32, [None, 784])
Here x is not an actual value but a placeholder, that is, a description: a two-dimensional floating-point tensor whose actual values are filled in later, much like the %d in printf("%d", 10). The first dimension is None, meaning it can be of any length; the second dimension is 784 floating-point values.
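The idea of describing a computation first and supplying values later can be sketched without TensorFlow. The classes below are hypothetical toy stand-ins, not TensorFlow's implementation: building the expression computes nothing, and values are only supplied when evaluate is called with a feed dictionary.

```python
class Placeholder(object):
    """A named slot whose value is supplied only at evaluation time."""
    def __init__(self, name):
        self.name = name

    def evaluate(self, feed):
        return feed[self.name]

class Add(object):
    """A graph node that adds the results of two other nodes."""
    def __init__(self, left, right):
        self.left = left
        self.right = right

    def evaluate(self, feed):
        return self.left.evaluate(feed) + self.right.evaluate(feed)

class Scale(object):
    """A graph node that multiplies another node's result by a constant."""
    def __init__(self, node, factor):
        self.node = node
        self.factor = factor

    def evaluate(self, feed):
        return self.node.evaluate(feed) * self.factor

a = Placeholder("a")
b = Placeholder("b")
expr = Scale(Add(a, b), 10)               # only a description; nothing is computed here
result = expr.evaluate({"a": 1, "b": 2})  # values are filled in now
print(result)
```

The same expression object can be evaluated repeatedly with different feeds, which is what makes a placeholder-based graph reusable across batches.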
To define a modifiable tensor, use a Variable:
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
Here W has shape [784, 10] and b has shape [10].
With x, W, and b defined, we can define our softmax model:
y = tf.nn.softmax(tf.matmul(x,W) + b)
This only defines the model; nothing is actually computed yet, because it is just a graph describing the computation.
Here matmul is matrix multiplication. Since x has shape [None, 784] and W has shape [784, 10], the product has shape [None, 10], and the vector b can be added to each of its rows.
The softmax function then turns each row into 10 probability values, so y has shape [None, 10].
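The shape bookkeeping can be checked with a small plain-Python sketch that uses nested lists in place of tensors. The helpers matmul and add_bias are hypothetical, and the batch size of 3 is an arbitrary stand-in for None.

```python
def matmul(a, b):
    """Multiply an [n, k] matrix by a [k, m] matrix, both given as nested lists."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "inner dimensions must match"
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]

def add_bias(m, bias):
    """Add the bias vector to every row of the matrix."""
    return [[v + bj for v, bj in zip(row, bias)] for row in m]

x = [[1.0] * 784 for _ in range(3)]    # 3 samples standing in for the None dimension
W = [[0.0] * 10 for _ in range(784)]   # shape [784, 10], zeros as in the text
b = [0.5] * 10                         # shape [10], made-up values

logits = add_bias(matmul(x, W), b)
shape = (len(logits), len(logits[0]))
print(shape)
```

Whatever the number of input rows, the output keeps that many rows and always has 10 columns, one per digit class.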
Implementing the digit recognition model
Based on the x, W, and b defined above, and the model we defined:
y = tf.nn.softmax(tf.matmul(x, W) + b)
We need to define an objective function. We use the cross-entropy (which measures how inefficient the predictions are at describing the truth) and minimize it:
$$H_{y'}(y) = -\sum_i y'_i \log(y_i)$$
where y' is the actual distribution and y is the predicted distribution, i.e.:
y_ = tf.placeholder("float", [None, 10])
cross_entropy = -tf.reduce_sum(y_ * tf.log(y))
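To see how the cross-entropy behaves, here is a minimal plain-Python sketch for a single sample. The label and the two predicted distributions are made up; terms where the actual probability is 0 are skipped, following the usual convention that 0 * log(0) counts as 0.

```python
import math

def cross_entropy(actual, predicted):
    """Compute -sum_i actual_i * log(predicted_i) for one sample."""
    # Skip terms with actual_i == 0, treating 0 * log(0) as 0.
    return -sum(a * math.log(p) for a, p in zip(actual, predicted) if a > 0)

label = [0, 0, 1]                   # one-hot: the true class is index 2
good = [0.05, 0.05, 0.9]            # confident and correct prediction
bad = [0.7, 0.2, 0.1]               # confident and wrong prediction

ce_good = cross_entropy(label, good)
ce_bad = cross_entropy(label, bad)
print(ce_good, ce_bad)
```

The better the predicted distribution matches the actual one, the smaller the cross-entropy, which is why minimizing it improves the model.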
Gradient descent is used to optimize the variables defined above:
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
Here 0.01 is the learning rate, which controls how large an adjustment is made to the variables at each step.
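The role of the learning rate can be illustrated on a toy problem. The sketch below applies plain gradient descent to the made-up function f(w) = (w - 3)^2, not to the MNIST model; the function name gradient_descent is hypothetical.

```python
def gradient_descent(grad, start, learning_rate, steps):
    """Repeatedly move w against the gradient, scaled by the learning rate."""
    w = start
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w

def grad(w):
    # Gradient of the toy objective f(w) = (w - 3)^2, whose minimum is at w = 3.
    return 2.0 * (w - 3.0)

w_few = gradient_descent(grad, 0.0, 0.01, 10)      # only 10 small steps
w_many = gradient_descent(grad, 0.0, 0.01, 1000)   # 1000 small steps
print(w_few, w_many)
```

With a learning rate of 0.01 each step is small, so a few steps leave w far from the minimum while many steps bring it very close.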
Following the ideas above, the final code, digital_recognition.py (placed in the parent directory of the mnist_data folder), is as follows:
# coding:utf-8

import sys
reload(sys)
sys.setdefaultencoding("utf-8")

from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf

flags = tf.app.flags
FLAGS = flags.FLAGS
flags.DEFINE_string('data_dir', 'mnist_data/', 'Directory for storing data')

# Load the data, downloading it into data_dir if it is not already there
mnist = input_data.read_data_sets(FLAGS.data_dir, one_hot=True)

# Model: y = softmax(x * W + b)
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)

# Objective: cross-entropy between the true labels y_ and the predictions y
y_ = tf.placeholder("float", [None, 10])
cross_entropy = -tf.reduce_sum(y_ * tf.log(y))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

init = tf.initialize_all_variables()
sess = tf.InteractiveSession()
sess.run(init)

# Train for 1000 steps on batches of 100 samples
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

# Evaluate accuracy on the test set
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(accuracy.eval({x: mnist.test.images, y_: mnist.test.labels}))
The results are as follows:
$ python digital_recognition.py
Extracting ./train-images-idx3-ubyte.gz
Extracting ./train-labels-idx1-ubyte.gz
Extracting ./t10k-images-idx3-ubyte.gz
Extracting ./t10k-labels-idx1-ubyte.gz
0.9039
Explanation
flags.DEFINE_string('data_dir', 'mnist_data/', 'Directory for storing data')
This sets the directory where the training data is stored. If the training and test data have not been prepared there in advance, the program downloads them automatically.
mnist = input_data.read_data_sets(FLAGS.data_dir, one_hot=True)
This directly uses the library's ready-made routine for reading the training data, so there is no need to parse the files ourselves.
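With one_hot=True, each label is represented as a 10-element vector rather than a single digit, which matches the [None, 10] shape of y_. A minimal plain-Python sketch of that encoding follows; the helper name one_hot is hypothetical and is not the library's own function.

```python
def one_hot(label, num_classes=10):
    """Encode a class index as a vector with a single 1.0 at that index."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec

encoded = one_hot(3)
print(encoded)
```

Because exactly one entry is 1.0, a one-hot label is itself a valid probability distribution, which is what the cross-entropy compares the prediction against.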
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
These lines loop 1000 times, each time selecting 100 samples from the training set for one training step. You can change these settings to observe the effect on running speed.
The last few lines print the prediction accuracy. When you adjust the number of iterations, you can see that training on more samples gives higher accuracy.
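The accuracy computation can be sketched in plain Python: take the index of the largest entry in each prediction and each one-hot label, compare them, and average the matches. The predictions and labels below are made up for illustration, and the helper names are hypothetical.

```python
def argmax(values):
    """Return the index of the largest element."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(predictions, labels):
    """Fraction of samples whose predicted class matches the true class."""
    correct = [argmax(p) == argmax(l) for p, l in zip(predictions, labels)]
    return sum(correct) / float(len(correct))

predictions = [[0.1, 0.8, 0.1],   # predicts class 1
               [0.6, 0.3, 0.1],   # predicts class 0
               [0.2, 0.2, 0.6],   # predicts class 2
               [0.5, 0.4, 0.1]]   # predicts class 0
labels = [[0, 1, 0],              # true class 1
          [1, 0, 0],              # true class 0
          [0, 0, 1],              # true class 2
          [0, 1, 0]]              # true class 1 (misclassified)

acc = accuracy(predictions, labels)
print(acc)
```

This mirrors the roles of tf.argmax, tf.equal, and tf.reduce_mean in the script: compare class indices, then average the resulting booleans.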
Original link: http://www.shareditor.com/blogshow/?blogId=94
"Turn" machine learning Tutorial 14-handwritten numeral recognition using TensorFlow