TensorFlow: Create and read the 17flowers dataset

I recently began learning TensorFlow. In most of the video tutorials and blogs I read, the authors simply call the datasets provided with the official documentation, but for a beginner like me who wants to train on my own data, converting my own pictures into a format the network can consume turned out to be genuinely difficult. Skipping this image preprocessing step only makes the road ahead harder, so today I am summarizing the part I have managed to get working over the past few days.

My main reference is a blog post (linked at the end of this article); following that blogger's method, I successfully generated my own dataset. First, two libraries are used: os and PIL. PIL (Python Imaging Library) is the most commonly used image processing library in Python, and Image is a very important class in it: through Image you can create instances by loading image files directly, and the class provides several ways to read and process the loaded images.
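As a quick illustration of the PIL calls used later, here is a minimal sketch of my own (the file path is a made-up example):

from PIL import Image

# load an image file into an Image instance (the path is hypothetical)
img = Image.open("flower.jpg")
print(img.size, img.mode)      # e.g. (500, 375) 'RGB'

img = img.resize((224, 224))   # scale to the network's input size
raw = img.tobytes()            # raw pixel bytes: 224 * 224 * 3 for RGB
print(len(raw))                # 150528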
The dataset I used is the 17 Category Flower Dataset. 17flowers consists of the 17 most common flowers in the UK, selected by the Visual Geometry Group at Oxford University. Each flower has 80 pictures, so the whole dataset has 1360 pictures; it can be downloaded from the official website. (The follow-up training ran into an overfitting problem, explained later.) Since the 17flowers data is organized as sketched below, with the label being the name of the outermost folder, the label can be read directly from the file path.
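The illustration from the original post is not reproduced here. The code that follows assumes the pictures have been sorted into one subfolder per class, with integer folder names acting as labels (the folder names must be integers because the writer code below calls int(name)); roughly:

17flowers/jpg/
    0/     image_0001.jpg ...
    1/     ...
    ...
    16/    ...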
We create the dataset with TFRecords. A TFRecords file is actually a binary file; although it is not as human-readable as other formats, it makes better use of memory, is easier to copy and move, and does not require a separate label file.
import os
import tensorflow as tf
from PIL import Image

cwd = os.getcwd()
classes = os.listdir(cwd + "/17flowers/jpg")
writer = tf.python_io.TFRecordWriter("train.tfrecords")

for index, name in enumerate(classes):
    class_path = cwd + "/17flowers/jpg/" + name + '/'
    if os.path.isdir(class_path):
        for img_name in os.listdir(class_path):
            img_path = class_path + img_name
            img = Image.open(img_path)
            img = img.resize((224, 224))
            img_raw = img.tobytes()  # convert the image to raw bytes
            example = tf.train.Example(features=tf.train.Features(feature={
                "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[int(name)])),
                "img_raw": tf.train.Feature(bytes_list=tf.train.BytesList(value=[img_raw]))
            }))
            writer.write(example.SerializeToString())  # serialize to string
            print(img_name)
writer.close()

We use tf.train.Example to define the data format we want to fill in, where label is the label (the outermost folder name) and img_raw is the binary image data. Then we use tf.python_io.TFRecordWriter to write it out. Basically, an Example contains Features, Features contains a dictionary of feature (no 's'), and each feature holds a FloatList, a BytesList, or an Int64List. In this way all the related information is packed into a single file, which is why, as mentioned above, no separate label file is needed. Reading it back is also very convenient.
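To make the Example → Features → feature nesting concrete, here is a minimal sketch of my own showing all three list types (the 'bbox' field is made up for illustration):

import tensorflow as tf

example = tf.train.Example(features=tf.train.Features(feature={
    # Int64List: integer data such as class labels
    'label': tf.train.Feature(int64_list=tf.train.Int64List(value=[3])),
    # BytesList: raw byte strings such as image data
    'img_raw': tf.train.Feature(bytes_list=tf.train.BytesList(value=[b'\x00\x01'])),
    # FloatList: floating-point data, e.g. a bounding box (hypothetical field)
    'bbox': tf.train.Feature(float_list=tf.train.FloatList(value=[0.1, 0.2])),
}))
print(example.SerializeToString()[:20])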

Here's a test to see if the saved training set is available:

for serialized_example in tf.python_io.tf_record_iterator("train.tfrecords"):
    example = tf.train.Example()
    example.ParseFromString(serialized_example)

    # the key must match the one used when writing ('img_raw', not 'image')
    img_raw = example.features.feature['img_raw'].bytes_list.value
    label = example.features.feature['label'].int64_list.value
    # any preprocessing can be done here
    print(img_raw, label)
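As an extra sanity check (my own addition, not from the original post), the raw bytes can be turned back into a viewable picture with PIL, since they were written as 224x224 RGB:

from PIL import Image

img = Image.frombytes('RGB', (224, 224), img_raw[0])
img.show()   # or img.save('check.png')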

If this prints values, the dataset has been created successfully and is stored in train.tfrecords in the current directory. The next task is to read the data in this training set through a queue.

def read_and_decode(filename):
    # generate a queue from the file name
    filename_queue = tf.train.string_input_producer([filename])
    reader = tf.TFRecordReader()
    # the reader returns the file name and the serialized example
    _, serialized_example = reader.read(filename_queue)
    features = tf.parse_single_example(serialized_example,
                                       features={
                                           'label': tf.FixedLenFeature([], tf.int64),
                                           'img_raw': tf.FixedLenFeature([], tf.string),
                                       })
    img = tf.decode_raw(features['img_raw'], tf.uint8)
    img = tf.reshape(img, [224, 224, 3])
    # scale the pixel values from [0, 255] to [-0.5, 0.5]
    img = tf.cast(img, tf.float32) * (1. / 255) - 0.5
    label = tf.cast(features['label'], tf.int64)
    return img, label
The filename argument is the training set just generated. tf.train.string_input_producer converts it into a queue of file names, and the reader then reads the files in the queue; the corresponding label and picture data are obtained through the feature names 'label' and 'img_raw'. After that comes a series of decoding and reshaping work (the cast and scaling map the uint8 pixel values from [0, 255] to [-0.5, 0.5]). With the training set in place, the next step is to use the resulting label and img to train the network.
img, label = read_and_decode("train.tfrecords")
img_batch, label_batch = tf.train.shuffle_batch([img, label],
                                                batch_size=100,
                                                capacity=2000,
                                                min_after_dequeue=1000)
labels = tf.one_hot(label_batch, 17, 1, 0)   # one-hot encode the 17 classes

# xs, ys, keep_prob, train_step, cross_entropy, compute_accuracy and sess
# come from the network definition, which is not shown here
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(coord=coord, sess=sess)
for i in range(200):
    batch_xs, batch_ys = sess.run([img_batch, labels])
    sess.run(train_step, feed_dict={xs: batch_xs, ys: batch_ys, keep_prob: 0.5})
    print("loss:", sess.run(cross_entropy,
                            feed_dict={xs: batch_xs, ys: batch_ys, keep_prob: 0.5}))
    if i % 50 == 0:   # the interval is garbled in the source; 50 is a guess
        print(compute_accuracy(mnist.test.images, mnist.test.labels))

coord.request_stop()
coord.join(threads)

Note that because a queue is used here to read the training set, the reading is asynchronous: the queue runner threads are started through the coordinator and terminated after the last read from the queue ends. However, while printing the loss during training I found that it dropped to 0 after only 5 iterations. My current thinking is that the training set may be too small (each class has only 80 pictures), or that the network structure is too deep: I used VGGNet, which has too many trainable parameters and overfits easily. Next time I will test with a smaller network.
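For reference, here is a minimal, self-contained sketch of my own (assuming train.tfrecords and the read_and_decode function above exist) showing the coordinator pattern with the usual try/finally cleanup, fetching one batch without any network attached:

import tensorflow as tf

img, label = read_and_decode("train.tfrecords")
img_batch, label_batch = tf.train.shuffle_batch([img, label],
                                                batch_size=4,
                                                capacity=200,
                                                min_after_dequeue=100)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord, sess=sess)
    try:
        imgs, lbls = sess.run([img_batch, label_batch])
        print(imgs.shape, lbls)   # (4, 224, 224, 3) and 4 integer labels
    finally:
        # always stop and join the queue threads, even if an error occurs
        coord.request_stop()
        coord.join(threads)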


Original blog: http://ycszen.github.io/2016/08/17/TensorFlow efficient reading of data/


