Reading data efficiently in TensorFlow with TFRecords
Overview
The newly uploaded mcnn project contains complete data read/write examples; refer to it for details.
The official TensorFlow website describes three ways to read data:
- Feeding: Python code supplies the data at each step of TensorFlow execution.
- Reading from files: an input pipeline at the beginning of the TensorFlow graph reads the data from files.
- Preloaded data: constants or variables defined in the TensorFlow graph hold all the data (only suitable when the data volume is small).
For a small amount of data, you can simply load everything into memory and then feed it to the network batch by batch for training (tip: this is even more concise when combined with yield; try it yourself, I won't go into details here). For large data sets, however, this approach does not work because it consumes too much memory. In that case it is best to use the queues that TensorFlow provides, that is, the second method: reading data from files. Reading some specific formats, such as CSV files, is described on the official website. Here I will introduce a general-purpose and efficient way of reading data that the official website says little about: TFRecords.
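The yield tip above can be sketched roughly as follows. This is a minimal illustration in plain Python, assuming the whole data set already sits in memory as two lists; the names `batch_iterator`, `images`, and `targets` are placeholders made up for this sketch and do not come from the mcnn code.

```python
import random


def batch_iterator(samples, labels, batch_size, shuffle=True, seed=None):
    """Yield (sample_batch, label_batch) pairs from in-memory data.

    Hypothetical helper for illustration; the last batch may be smaller.
    """
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    indices = list(range(len(samples)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        chunk = indices[start:start + batch_size]
        yield [samples[i] for i in chunk], [labels[i] for i in chunk]


# Toy stand-ins for real images and labels.
images = ["img%d" % i for i in range(10)]
targets = [i % 2 for i in range(10)]

for x_batch, y_batch in batch_iterator(images, targets, batch_size=4, seed=0):
    # In a feeding setup, each batch would be passed via feed_dict here.
    print(len(x_batch), y_batch)
```

Because the generator only builds one batch at a time, the training loop stays short, but the full data set still has to fit in memory, which is exactly the limitation described above.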
If this is too long and you would rather read the source code directly, please visit my GitHub. Remember to add a star.
TFRecords
TFRecords is in fact a binary file format. Although it is not as easy to inspect as other formats, it makes better use of memory, is easier to copy and move, and does not need a separate label file (you will see why later)... All in all, this file format has many advantages, so let's use it.
A TFRecords file contains tf.train.Example protocol buffers (each of which contains a Features field). We can write a piece of code that gets the data, fills it into an Example protocol buffer, serializes the protocol buffer to a string, and writes it to a TFRecords file with tf.python_io.TFRecordWriter.
To read data from a TFRecords file, use tf.TFRecordReader together with the tf.parse_single_example parser. This op parses an Example protocol buffer into tensors.
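To make "binary file" a little more concrete, here is a rough sketch of how the records are framed on disk, as far as I understand the format: each record is a little-endian uint64 length, a uint32 checksum of the length, the payload bytes, and a uint32 checksum of the payload. The functions `iter_tfrecord_payloads` and `frame_record` are hypothetical helpers written for this sketch, and the checksums are skipped rather than verified.

```python
import io
import struct


def iter_tfrecord_payloads(stream):
    """Yield the raw payload of each record in a TFRecord byte stream.

    Assumed framing per record: uint64 length, uint32 CRC of the length,
    `length` payload bytes, uint32 CRC of the payload (little-endian).
    The CRC fields are skipped, not verified.
    """
    while True:
        header = stream.read(12)  # 8-byte length + 4-byte length CRC
        if not header:
            return  # clean end of stream
        if len(header) < 12:
            raise ValueError("truncated record header")
        (length,) = struct.unpack("<Q", header[:8])
        payload = stream.read(length)
        footer = stream.read(4)  # 4-byte payload CRC
        if len(payload) < length or len(footer) < 4:
            raise ValueError("truncated record body")
        yield payload


def frame_record(payload):
    """Frame a payload for the demo below with zeroed CRC fields.

    Demo only: a real TFRecords file carries proper checksums, so
    TensorFlow would reject bytes produced by this function.
    """
    zero_crc = struct.pack("<I", 0)
    return struct.pack("<Q", len(payload)) + zero_crc + payload + zero_crc


demo = io.BytesIO(frame_record(b"first") + frame_record(b"second record"))
print(list(iter_tfrecord_payloads(demo)))
```

For a real file, the stream would be something like `open("train.tfrecords", "rb")`, and each payload would be one serialized tf.train.Example. The file itself stores only opaque byte strings; the image and the label travel together inside each payload.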
Next, let's start reading data ~
Generate a TFRecords File
We use tf.train.Example to define the data format we want to fill in, and then use tf.python_io.TFRecordWriter to write the data.
import os
import tensorflow as tf
from PIL import Image

cwd = os.getcwd()

'''
The data directory I load here is laid out as follows:
0 -- img1.jpg
     img2.jpg
     img3.jpg
     ...
1 -- img1.jpg
     img2.jpg
     ...
2 -- ...
Here 0, 1, 2, ... are the categories, that is, the `classes` used below.
It is a list I defined for my own data; adapt it to your own data.
'''
classes = ["0", "1", "2"]  # example value only: list your own class folder names

writer = tf.python_io.TFRecordWriter("train.tfrecords")
for index, name in enumerate(classes):
    class_path = os.path.join(cwd, name)
    for img_name in os.listdir(class_path):
        img_path = os.path.join(class_path, img_name)
        # Force 3 channels so the reader can later reshape to [224, 224, 3]
        img = Image.open(img_path).convert("RGB")
        img = img.resize((224, 224))
        img_raw = img.tobytes()  # convert the image to raw bytes
        example = tf.train.Example(features=tf.train.Features(feature={
            "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[index])),
            "img_raw": tf.train.Feature(bytes_list=tf.train.BytesList(value=[img_raw])),
        }))
        writer.write(example.SerializeToString())  # serialize to a string
writer.close()
For the definitions and details of Example and Feature, I recommend checking the related API documentation on the official website.
Basically, an Example contains a Features message, and Features holds a dictionary of Feature entries (singular, no trailing s). Finally, each Feature contains a FloatList, a BytesList, or an Int64List.
In this way, all the relevant information is stored in one file, so we no longer need a separate label file. Reading it back is also easy.
The following is a simple example of reading a small amount of data:
for serialized_example in tf.python_io.tf_record_iterator("train.tfrecords"):
    example = tf.train.Example()
    example.ParseFromString(serialized_example)

    # The keys must match the ones used when writing the file
    image = example.features.feature['img_raw'].bytes_list.value
    label = example.features.feature['label'].int64_list.value
    # print(image, label)
Read from queue
Once a TFRecords file has been generated, we can use TensorFlow's queues to read the data efficiently.
def read_and_decode(filename):
    # Build a filename queue from the given file name
    filename_queue = tf.train.string_input_producer([filename])

    reader = tf.TFRecordReader()
    # read() returns a (key, value) pair; value is one serialized Example
    _, serialized_example = reader.read(filename_queue)
    features = tf.parse_single_example(serialized_example,
                                       features={
                                           'label': tf.FixedLenFeature([], tf.int64),
                                           'img_raw': tf.FixedLenFeature([], tf.string),
                                       })

    img = tf.decode_raw(features['img_raw'], tf.uint8)
    img = tf.reshape(img, [224, 224, 3])
    img = tf.cast(img, tf.float32) * (1. / 255) - 0.5  # scale to [-0.5, 0.5]
    label = tf.cast(features['label'], tf.int32)

    return img, label
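Two details of read_and_decode are easy to trip over: the reshape to [224, 224, 3] only works if every stored byte string has exactly 224 × 224 × 3 bytes, and the cast line rescales pixel values from [0, 255] to [-0.5, 0.5]. The following plain-Python sketch mirrors that arithmetic without TensorFlow; `decode_pixels` is a hypothetical stand-in made up for illustration, not a library function.

```python
HEIGHT, WIDTH, CHANNELS = 224, 224, 3
EXPECTED_BYTES = HEIGHT * WIDTH * CHANNELS  # one uint8 per channel value


def decode_pixels(img_raw):
    """Mimic decode_raw plus the cast on a raw byte string (hypothetical).

    Each byte is read as an unsigned 8-bit value and mapped from
    [0, 255] to [-0.5, 0.5], as in read_and_decode.
    """
    if len(img_raw) != EXPECTED_BYTES:
        raise ValueError(
            "expected %d bytes, got %d" % (EXPECTED_BYTES, len(img_raw)))
    return [value * (1.0 / 255) - 0.5 for value in img_raw]


rgb_bytes = bytes([0, 128, 255]) * (HEIGHT * WIDTH)  # fake 3-channel image
gray_bytes = bytes([128]) * (HEIGHT * WIDTH)         # fake 1-channel image

pixels = decode_pixels(rgb_bytes)
print(EXPECTED_BYTES, min(pixels), max(pixels))

try:
    decode_pixels(gray_bytes)
except ValueError as err:
    print("single-channel input rejected:", err)
```

In practice this means that images which are not stored as 3-channel RGB (grayscale or RGBA files, for example) need to be converted before their bytes are written, otherwise the reshape fails at read time.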
Then we can use it during training.
img, label = read_and_decode("train.tfrecords")

# Use shuffle_batch to shuffle the input randomly
img_batch, label_batch = tf.train.shuffle_batch([img, label],
                                                batch_size=30, capacity=2000,
                                                min_after_dequeue=1000)
init = tf.initialize_all_variables()

with tf.Session() as sess:
    sess.run(init)
    threads = tf.train.start_queue_runners(sess=sess)
    for i in range(3):
        val, l = sess.run([img_batch, label_batch])
        # We can also process val and l as needed
        # l = to_categorical(l, 12)
        print(val.shape, l)
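To get a feel for what capacity and min_after_dequeue control in shuffle_batch, here is a toy simulation in plain Python. It is not TensorFlow's implementation: `shuffle_batches` is a hypothetical function that only mimics the idea of refilling a buffer up to capacity and drawing random elements while never letting the buffer drop below min_after_dequeue as long as input remains. Unlike the real op, this sketch also emits a final partial batch.

```python
import random


def shuffle_batches(source, batch_size, capacity, min_after_dequeue, seed=None):
    """Toy model of a shuffling queue (not TensorFlow's actual code)."""
    if capacity <= min_after_dequeue:
        raise ValueError("capacity must exceed min_after_dequeue")
    rng = random.Random(seed)
    source = iter(source)
    buffer, batch, exhausted = [], [], False
    while True:
        # Producer side: refill the buffer up to `capacity`.
        while not exhausted and len(buffer) < capacity:
            try:
                buffer.append(next(source))
            except StopIteration:
                exhausted = True
        if not buffer:
            break
        # Consumer side: draw random elements, but while input remains
        # never let the buffer drop below `min_after_dequeue`.
        floor = 0 if exhausted else min_after_dequeue
        while len(buffer) > floor:
            batch.append(buffer.pop(rng.randrange(len(buffer))))
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:
        yield batch  # final partial batch (toy behaviour)


for batch in shuffle_batches(range(20), batch_size=5, capacity=8,
                             min_after_dequeue=4, seed=0):
    print(batch)
```

A larger min_after_dequeue means each element is drawn from a bigger pool, so the mixing is better, at the cost of more memory and a longer warm-up before the first batch is available.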
At this point, efficient reading of data from files in TensorFlow is more or less done.
Hm? Wait... is something missing? Right, there are a few things to watch out for:
First, a TensorFlow graph can hold state, which is what lets TFRecordReader remember its position in the tfrecord file and always return the next record. This requires the whole graph to be initialized before use. Here we use the tf.initialize_all_variables() function for that (later TensorFlow 1.x releases deprecate it in favor of tf.global_variables_initializer()).
Second, queues in TensorFlow behave much like ordinary queues. However, operations and tensors in TensorFlow are symbolic and are executed only when sess.run() is called.
Third, TFRecordReader keeps dequeuing file names from the filename queue until the queue is empty.
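The second and third points can be pictured with Python's standard queue and threading modules. This is only an analogy for the filename queue, the queue-runner threads, and the reader; nothing here is TensorFlow code, and the file names and function names are made up for the sketch.

```python
import queue
import threading

filename_queue = queue.Queue(maxsize=2)
SENTINEL = None  # marks "no more files"


def producer(filenames):
    """Play the role of the queue-runner thread: enqueue file names."""
    for name in filenames:
        filename_queue.put(name)  # blocks while the queue is full
    filename_queue.put(SENTINEL)


def read_all():
    """Play the role of the reader: dequeue names until none are left."""
    seen = []
    while True:
        name = filename_queue.get()  # blocks until a name is available
        if name is SENTINEL:
            return seen
        seen.append(name)


# Nothing is enqueued until the thread starts, just as a TensorFlow
# input pipeline stays idle until the queue runners are started.
worker = threading.Thread(
    target=producer, args=(["a.tfrecords", "b.tfrecords", "c.tfrecords"],))
worker.start()
result = read_all()
worker.join()
print(result)
```

The same reasoning explains a common stumbling block: if tf.train.start_queue_runners is never called, nothing fills the queues, and the first sess.run on a batch simply blocks.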
Summary
- Generate a tfrecord file
- Define a record reader to parse the tfrecord file
- Construct a batch generator (batcher)
- Build the other operations
- Initialize all operations
- Start the QueueRunner
For the sample code, please see my GitHub. If you find it helpful, you can add a star.