Recognizing handwritten digits with TensorFlow
TensorFlow, an open-source Google project, has overtaken Caffe and appears to be the most popular deep learning framework. Building a network in TensorFlow feels more like real programming, because the model is written as code; Caffe, by contrast, generates the network from a configuration file. The environment used here is TensorFlow 0.10. Statements may fail in other versions, because the API is not compatible across TensorFlow versions.
You also need to install PIL: pip install Pillow
Image Format:
- Images are size-normalized to fit in a 20 × 20 pixel box while preserving their aspect ratio.
- Images are centered in a 28 × 28 image.
- Pixels are stored row by row. Pixel values range from 0 to 255, where 0 indicates the background (white) and 255 indicates the foreground (black).
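As a small illustration of this layout, the sketch below maps a (row, column) position in a 28 × 28 image to its index in the flattened 784-value vector that the network receives. It assumes the pixels are stored row by row, which is the layout the 28 × 28 reshape in the code relies on. The helper names `flat_index` and `to_rows` are made up for this example.

```python
# Sketch: index arithmetic for a 28x28 image flattened row by row.
# flat_index and to_rows are hypothetical helpers, not part of any library.
SIDE = 28


def flat_index(row, col, side=SIDE):
    """Position of pixel (row, col) in the flattened vector."""
    return row * side + col


def to_rows(flat, side=SIDE):
    """Split a flat list of side*side values back into a list of rows."""
    return [flat[r * side:(r + 1) * side] for r in range(side)]


flat = list(range(SIDE * SIDE))  # stand-in for 784 pixel values
rows = to_rows(flat)
print(len(flat), len(rows), len(rows[0]))
print(flat_index(1, 0))  # first pixel of the second row
```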
Create a .png file with a white background and a black handwritten digit.
Below is the training code: a convolutional neural network with two convolutional layers. After training, the model parameters are saved with the Saver.
# coding: utf-8
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import input_data

# Load the MNIST data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
training = mnist.train.images
trainlabel = mnist.train.labels
testing = mnist.test.images
testlabel = mnist.test.labels
print("MNIST loaded")

# Use an interactive session
sess = tf.InteractiveSession()

# Placeholders and variables
x = tf.placeholder("float", shape=[None, 784])
y_ = tf.placeholder("float", shape=[None, 10])
# W and b are not used by the network below. They are kept because the
# prediction script defines them too, so the variable names in the
# checkpoint match when the model is restored.
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))


def weight_variable(shape):
    """Create a weight variable; shape is the shape of the data."""
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)


def bias_variable(shape):
    """Create a bias variable; shape is the shape of the data."""
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)


def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')


def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding='SAME')


# First convolutional layer
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
x_image = tf.reshape(x, [-1, 28, 28, 1])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

# Second convolutional layer
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

# Fully connected layer
W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

# Dropout
keep_prob = tf.placeholder("float")
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

# Output layer
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])
y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)

cross_entropy = -tf.reduce_sum(y_ * tf.log(y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

# Saver for the trained network parameters
saver = tf.train.Saver()
sess.run(tf.initialize_all_variables())
for i in range(8000):
    batch = mnist.train.next_batch(50)
    if i % 100 == 0:
        train_accuracy = accuracy.eval(
            feed_dict={x: batch[0], y_: batch[1], keep_prob: 1.0})
        print("step %d, training accuracy %g" % (i, train_accuracy))
    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

save_path = saver.save(sess, "model_mnist.ckpt")
print("Model saved in file: %s" % save_path)

print("test accuracy %g" % accuracy.eval(
    feed_dict={x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))
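The size 7 * 7 * 64 of the first fully connected layer follows from the shapes of the preceding layers. Here is a minimal sketch of that arithmetic, assuming TensorFlow's 'SAME' padding rule (output size = ceil(input size / stride)). `same_out` is a hypothetical helper written only for this illustration.

```python
import math


def same_out(size, stride):
    """Output length along one dimension under 'SAME' padding (assumed rule)."""
    return int(math.ceil(float(size) / stride))


size = 28       # input images are 28x28
channels = 64   # feature maps after the second convolution
for layer in range(2):
    size = same_out(size, 1)  # 5x5 convolution, stride 1: size unchanged
    size = same_out(size, 2)  # 2x2 max pooling, stride 2: size halved

flat_size = size * size * channels
print(size, flat_size)
```

Each pooling step halves the spatial size (28 to 14 to 7), so the flattened input to the fully connected layer has 7 * 7 * 64 = 3136 values.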
input_data.py: the following code downloads the MNIST dataset. It is the official loader script (copyright Google) used by the TensorFlow MNIST tutorial.
# Copyright 2015 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Functions for downloading and reading MNIST data."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import gzip
import os

import tensorflow.python.platform

import numpy
from six.moves import urllib
from six.moves import xrange  # pylint: disable=redefined-builtin
import tensorflow as tf

SOURCE_URL = 'http://yann.lecun.com/exdb/mnist/'


def maybe_download(filename, work_directory):
    """Download the data from Yann's website, unless it's already here."""
    if not os.path.exists(work_directory):
        os.mkdir(work_directory)
    filepath = os.path.join(work_directory, filename)
    if not os.path.exists(filepath):
        filepath, _ = urllib.request.urlretrieve(SOURCE_URL + filename, filepath)
        statinfo = os.stat(filepath)
        print('Successfully downloaded', filename, statinfo.st_size, 'bytes.')
    return filepath


def _read32(bytestream):
    dt = numpy.dtype(numpy.uint32).newbyteorder('>')
    return numpy.frombuffer(bytestream.read(4), dtype=dt)[0]


def extract_images(filename):
    """Extract the images into a 4D uint8 numpy array [index, y, x, depth]."""
    print('Extracting', filename)
    with gzip.open(filename) as bytestream:
        magic = _read32(bytestream)
        if magic != 2051:
            raise ValueError(
                'Invalid magic number %d in MNIST image file: %s' %
                (magic, filename))
        num_images = _read32(bytestream)
        rows = _read32(bytestream)
        cols = _read32(bytestream)
        buf = bytestream.read(rows * cols * num_images)
        data = numpy.frombuffer(buf, dtype=numpy.uint8)
        data = data.reshape(num_images, rows, cols, 1)
        return data


def dense_to_one_hot(labels_dense, num_classes=10):
    """Convert class labels from scalars to one-hot vectors."""
    num_labels = labels_dense.shape[0]
    index_offset = numpy.arange(num_labels) * num_classes
    labels_one_hot = numpy.zeros((num_labels, num_classes))
    labels_one_hot.flat[index_offset + labels_dense.ravel()] = 1
    return labels_one_hot


def extract_labels(filename, one_hot=False):
    """Extract the labels into a 1D uint8 numpy array [index]."""
    print('Extracting', filename)
    with gzip.open(filename) as bytestream:
        magic = _read32(bytestream)
        if magic != 2049:
            raise ValueError(
                'Invalid magic number %d in MNIST label file: %s' %
                (magic, filename))
        num_items = _read32(bytestream)
        buf = bytestream.read(num_items)
        labels = numpy.frombuffer(buf, dtype=numpy.uint8)
        if one_hot:
            return dense_to_one_hot(labels)
        return labels


class DataSet(object):

    def __init__(self, images, labels, fake_data=False, one_hot=False,
                 dtype=tf.float32):
        """Construct a DataSet.

        one_hot arg is used only if fake_data is true.  `dtype` can be either
        `uint8` to leave the input as `[0, 255]`, or `float32` to rescale into
        `[0, 1]`.
        """
        dtype = tf.as_dtype(dtype).base_dtype
        if dtype not in (tf.uint8, tf.float32):
            raise TypeError('Invalid image dtype %r, expected uint8 or float32' %
                            dtype)
        if fake_data:
            self._num_examples = 10000
            self.one_hot = one_hot
        else:
            assert images.shape[0] == labels.shape[0], (
                'images.shape: %s labels.shape: %s' % (images.shape,
                                                       labels.shape))
            self._num_examples = images.shape[0]

            # Convert shape from [num examples, rows, columns, depth]
            # to [num examples, rows*columns] (assuming depth == 1)
            assert images.shape[3] == 1
            images = images.reshape(images.shape[0],
                                    images.shape[1] * images.shape[2])
            if dtype == tf.float32:
                # Convert from [0, 255] -> [0.0, 1.0].
                images = images.astype(numpy.float32)
                images = numpy.multiply(images, 1.0 / 255.0)
        self._images = images
        self._labels = labels
        self._epochs_completed = 0
        self._index_in_epoch = 0

    @property
    def images(self):
        return self._images

    @property
    def labels(self):
        return self._labels

    @property
    def num_examples(self):
        return self._num_examples

    @property
    def epochs_completed(self):
        return self._epochs_completed

    def next_batch(self, batch_size, fake_data=False):
        """Return the next `batch_size` examples from this data set."""
        if fake_data:
            fake_image = [1] * 784
            if self.one_hot:
                fake_label = [1] + [0] * 9
            else:
                fake_label = 0
            return [fake_image for _ in xrange(batch_size)], [
                fake_label for _ in xrange(batch_size)]
        start = self._index_in_epoch
        self._index_in_epoch += batch_size
        if self._index_in_epoch > self._num_examples:
            # Finished epoch
            self._epochs_completed += 1
            # Shuffle the data
            perm = numpy.arange(self._num_examples)
            numpy.random.shuffle(perm)
            self._images = self._images[perm]
            self._labels = self._labels[perm]
            # Start next epoch
            start = 0
            self._index_in_epoch = batch_size
            assert batch_size <= self._num_examples
        end = self._index_in_epoch
        return self._images[start:end], self._labels[start:end]


def read_data_sets(train_dir, fake_data=False, one_hot=False, dtype=tf.float32):
    class DataSets(object):
        pass
    data_sets = DataSets()

    if fake_data:
        def fake():
            return DataSet([], [], fake_data=True, one_hot=one_hot, dtype=dtype)
        data_sets.train = fake()
        data_sets.validation = fake()
        data_sets.test = fake()
        return data_sets

    TRAIN_IMAGES = 'train-images-idx3-ubyte.gz'
    TRAIN_LABELS = 'train-labels-idx1-ubyte.gz'
    TEST_IMAGES = 't10k-images-idx3-ubyte.gz'
    TEST_LABELS = 't10k-labels-idx1-ubyte.gz'
    VALIDATION_SIZE = 5000

    local_file = maybe_download(TRAIN_IMAGES, train_dir)
    train_images = extract_images(local_file)

    local_file = maybe_download(TRAIN_LABELS, train_dir)
    train_labels = extract_labels(local_file, one_hot=one_hot)

    local_file = maybe_download(TEST_IMAGES, train_dir)
    test_images = extract_images(local_file)

    local_file = maybe_download(TEST_LABELS, train_dir)
    test_labels = extract_labels(local_file, one_hot=one_hot)

    validation_images = train_images[:VALIDATION_SIZE]
    validation_labels = train_labels[:VALIDATION_SIZE]
    train_images = train_images[VALIDATION_SIZE:]
    train_labels = train_labels[VALIDATION_SIZE:]

    data_sets.train = DataSet(train_images, train_labels, dtype=dtype)
    data_sets.validation = DataSet(validation_images, validation_labels,
                                   dtype=dtype)
    data_sets.test = DataSet(test_images, test_labels, dtype=dtype)

    return data_sets
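To make the header parsing in `extract_images` more concrete, the sketch below reads the same four big-endian 32-bit integers (magic number, image count, rows, columns) from a synthetic in-memory header. It uses the standard `struct` module instead of numpy, and the header bytes are fabricated for the illustration; no real MNIST file is read. `read32` is a stand-in name mirroring `_read32`.

```python
import io
import struct


def read32(stream):
    """Read one big-endian unsigned 32-bit integer from the stream."""
    return struct.unpack('>I', stream.read(4))[0]


# Synthetic header only: magic 2051, 2 images of 28 rows x 28 columns.
header = struct.pack('>IIII', 2051, 2, 28, 28)
stream = io.BytesIO(header)

magic = read32(stream)
if magic != 2051:
    raise ValueError('Invalid magic number %d' % magic)
num_images = read32(stream)
rows = read32(stream)
cols = read32(stream)
print(magic, num_images, rows, cols)
```

In a real image file the header is followed by rows * cols * num_images pixel bytes, which is the amount `extract_images` reads into its buffer.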
Then run the prediction script on your own image:
# import modules
import sys
import tensorflow as tf
from PIL import Image, ImageFilter


def predictint(imvalue):
    """
    This function returns the predicted integer.
    The input is the pixel values from the imageprepare() function.
    """
    # Define the model (same as when creating the model file)
    x = tf.placeholder(tf.float32, [None, 784])
    W = tf.Variable(tf.zeros([784, 10]))
    b = tf.Variable(tf.zeros([10]))

    def weight_variable(shape):
        initial = tf.truncated_normal(shape, stddev=0.1)
        return tf.Variable(initial)

    def bias_variable(shape):
        initial = tf.constant(0.1, shape=shape)
        return tf.Variable(initial)

    def conv2d(x, W):
        return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

    def max_pool_2x2(x):
        return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                              strides=[1, 2, 2, 1], padding='SAME')

    W_conv1 = weight_variable([5, 5, 1, 32])
    b_conv1 = bias_variable([32])
    x_image = tf.reshape(x, [-1, 28, 28, 1])
    h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
    h_pool1 = max_pool_2x2(h_conv1)

    W_conv2 = weight_variable([5, 5, 32, 64])
    b_conv2 = bias_variable([64])
    h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
    h_pool2 = max_pool_2x2(h_conv2)

    W_fc1 = weight_variable([7 * 7 * 64, 1024])
    b_fc1 = bias_variable([1024])
    h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

    keep_prob = tf.placeholder(tf.float32)
    h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

    W_fc2 = weight_variable([1024, 10])
    b_fc2 = bias_variable([10])
    y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)

    init_op = tf.initialize_all_variables()
    saver = tf.train.Saver()

    # Load the model_mnist.ckpt file. The file is stored in the directory
    # from which this Python script is started.
    # Use the model to predict the integer. The integer is returned as a list.
    # Based on the documentation at
    # https://www.tensorflow.org/versions/master/how_tos/variables/index.html
    with tf.Session() as sess:
        sess.run(init_op)
        saver.restore(sess, "model_mnist.ckpt")
        # print("Model restored.")
        prediction = tf.argmax(y_conv, 1)
        return prediction.eval(feed_dict={x: [imvalue], keep_prob: 1.0},
                               session=sess)


def imageprepare(argv):
    """
    This function returns the pixel values.
    The input is a png file location.
    """
    im = Image.open(argv).convert('L')
    width = float(im.size[0])
    height = float(im.size[1])
    newImage = Image.new('L', (28, 28), (255))  # white canvas of 28x28 pixels

    # Note: Image.ANTIALIAS was removed in Pillow 10; use Image.LANCZOS there.
    if width > height:  # check which dimension is bigger
        # Width is bigger. Width becomes 20 pixels.
        nheight = int(round((20.0 / width * height), 0))  # scale height by the same ratio
        if (nheight == 0):  # rare case but minimum is 1 pixel
            nheight = 1
        # resize and sharpen
        img = im.resize((20, nheight), Image.ANTIALIAS).filter(ImageFilter.SHARPEN)
        wtop = int(round(((28 - nheight) / 2), 0))  # calculate vertical position
        newImage.paste(img, (4, wtop))  # paste resized image on white canvas
    else:
        # Height is bigger. Height becomes 20 pixels.
        nwidth = int(round((20.0 / height * width), 0))  # scale width by the same ratio
        if (nwidth == 0):  # rare case but minimum is 1 pixel
            nwidth = 1
        # resize and sharpen
        img = im.resize((nwidth, 20), Image.ANTIALIAS).filter(ImageFilter.SHARPEN)
        wleft = int(round(((28 - nwidth) / 2), 0))  # calculate horizontal position
        newImage.paste(img, (wleft, 4))  # paste resized image on white canvas

    # newImage.save("sample.png")

    tv = list(newImage.getdata())  # get pixel values

    # normalize pixels to 0 and 1. 0 is pure white, 1 is pure black.
    tva = [(255 - x) * 1.0 / 255.0 for x in tv]
    return tva


def main(argv):
    """
    Main function.
    """
    imvalue = imageprepare(argv)
    predint = predictint(imvalue)
    print(predint[0])  # first value in list


if __name__ == "__main__":
    main('2.png')
You can save the image under another path and test it by changing the file name passed to main().
The image preparation in the test code works as follows:
(1) Load an image of a handwritten digit.
(2) Convert the image to grayscale (mode "L").
(3) Determine which dimension of the original image is larger.
(4) Resize the image so that the larger dimension (either height or width) is 20 pixels and the smaller dimension is scaled by the same ratio.
(5) Sharpen the image. This greatly improves the results.
(6) Paste the image onto a 28x28 pixel white canvas. Along the larger dimension, the image is placed 4 pixels from the top or the side; that dimension is always 20 pixels, and 4 + 20 + 4 = 28. Along the smaller dimension, the offset is half the difference between 28 and the new size of the scaled image.
(7) Get the pixel values of the new image (canvas + centered image).
(8) Normalize the pixel values to the range 0 to 1 (this is also done in the TensorFlow MNIST tutorial). 0 is pure white and 1 is pure black. The pixel values obtained in step 7 are the opposite, where 255 is white and 0 is black, so the values must be inverted. The following formula performs both the inversion and the normalization: (255 - x) * 1.0 / 255.0
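The geometry in steps (3), (4) and (6) and the formula in step (8) can be checked without any image library. The following is a rough sketch in plain Python; `fit_in_box` and `invert_and_normalize` are hypothetical helper names, and the offsets use integer floor division, which may differ by one pixel from rounding when the difference from 28 is odd.

```python
def fit_in_box(width, height, box=20, canvas=28):
    """Return (new_width, new_height, left, top) for pasting on the canvas."""
    if width > height:
        # Width is the larger dimension: it becomes `box` pixels.
        new_w = box
        new_h = max(1, int(round(box * height / float(width))))
    else:
        # Height is the larger (or equal) dimension: it becomes `box` pixels.
        new_h = box
        new_w = max(1, int(round(box * width / float(height))))
    left = (canvas - new_w) // 2  # 4 when new_w == 20
    top = (canvas - new_h) // 2   # 4 when new_h == 20
    return new_w, new_h, left, top


def invert_and_normalize(pixels):
    """Map 255 (white) to 0.0 and 0 (black) to 1.0."""
    return [(255 - p) / 255.0 for p in pixels]


print(fit_in_box(200, 100))             # (20, 10, 4, 9)
print(fit_in_box(50, 100))              # (10, 20, 9, 4)
print(invert_and_normalize([255, 0]))   # [0.0, 1.0]
```

For a 200 x 100 input, the width is scaled to 20 and the height to 10, so the image is pasted 4 pixels from the left and 9 pixels from the top of the 28x28 canvas.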