TensorFlow TFRecords file generation and reading methods
TensorFlow provides the TFRecords format to store data in a unified manner. Theoretically, TFRecords can store any form of data.
Data in a TFRecords file is stored as serialized tf.train.Example protocol buffers. The following protobuf code defines tf.train.Example:
message Example {
  Features features = 1;
};

message Features {
  map<string, Feature> feature = 1;
};

message Feature {
  oneof kind {
    BytesList bytes_list = 1;
    FloatList float_list = 2;
    Int64List int64_list = 3;
  }
};
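As a concrete illustration of how these message types nest (the key names match this tutorial, but the values here are made up), one image record rendered in protobuf text format might look like this:

```
features {
  feature {
    key: "train/image"
    value { bytes_list { value: "\x00\x01..." } }
  }
  feature {
    key: "train/label"
    value { int64_list { value: 5 } }
  }
}
```

Each feature is a named, typed list, which is why the writer code below wraps every value in an Int64List, BytesList, or FloatList.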
The following describes how to generate and read TFRecords files.
First, we will introduce how to generate a TFRecords file, using the following code:
from random import shuffle
import numpy as np
import glob
import tensorflow as tf
import cv2
import sys
import os

# I installed the CPU version, so warnings appear at runtime; this suppresses them
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

shuffle_data = True
image_path = '/path/to/image/*.jpg'

# get the paths of all images under the directory; type(addrs) = list
addrs = glob.glob(image_path)
# label data acquisition depends on your dataset; type(labels) = list
labels = ...

# shuffle the data order
if shuffle_data:
    c = list(zip(addrs, labels))
    shuffle(c)
    addrs, labels = zip(*c)

# split the dataset 70/20/10 into train, validation, and test
train_addrs = addrs[0:int(0.7 * len(addrs))]
train_labels = labels[0:int(0.7 * len(labels))]
val_addrs = addrs[int(0.7 * len(addrs)):int(0.9 * len(addrs))]
val_labels = labels[int(0.7 * len(labels)):int(0.9 * len(labels))]
test_addrs = addrs[int(0.9 * len(addrs)):]
test_labels = labels[int(0.9 * len(labels)):]

# we collected the image addresses above; this function loads an image from its address
def load_image(addr):
    img = cv2.imread(addr)
    img = cv2.resize(img, (224, 224), interpolation=cv2.INTER_CUBIC)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    # dividing by 255. here would normalize pixel values to [0, 1]
    img = img.astype(np.float32)
    return img

# convert data to the corresponding tf.train.Feature types
def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _float_feature(value):
    return tf.train.Feature(float_list=tf.train.FloatList(value=[value]))

# write data to the TFRecords file
train_filename = '/path/to/train.tfrecords'  # output file address
# create a writer to write the TFRecords file
writer = tf.python_io.TFRecordWriter(train_filename)
for i in range(len(train_addrs)):
    # print progress every 1000 records
    if not i % 1000:
        print('Train data: {}/{}'.format(i, len(train_addrs)))
        sys.stdout.flush()
    # load the image
    img = load_image(train_addrs[i])
    label = train_labels[i]
    # create the features for this record
    feature = {'train/label': _int64_feature(label),
               'train/image': _bytes_feature(tf.compat.as_bytes(img.tostring()))}
    # create an Example protocol buffer
    example = tf.train.Example(features=tf.train.Features(feature=feature))
    # serialize the Example and write it to the file
    writer.write(example.SerializeToString())

writer.close()
sys.stdout.flush()
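The shuffle-and-split pattern above can be tried in isolation. This is a minimal, self-contained sketch; the file names and labels are made up for illustration, and a fixed seed is used only to make it repeatable:

```python
from random import seed, shuffle

# illustrative stand-ins for the real address and label lists
addrs = ['img{}.jpg'.format(i) for i in range(10)]
labels = list(range(10))

# shuffle addresses and labels together so each pair stays aligned
pairs = list(zip(addrs, labels))
seed(0)
shuffle(pairs)
addrs, labels = zip(*pairs)

# 70/20/10 split into train, validation, and test sets
n = len(addrs)
train_addrs = addrs[0:int(0.7 * n)]
val_addrs = addrs[int(0.7 * n):int(0.9 * n)]
test_addrs = addrs[int(0.9 * n):]

print(len(train_addrs), len(val_addrs), len(test_addrs))  # 7 2 1
```

Zipping before shuffling is the important step: shuffling `addrs` and `labels` separately would destroy the image-to-label correspondence.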
The above section only describes the generation of the train.tfrecords file; the validation and test files are produced in the same way.
Next, we will introduce how to read a TFRecords file:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import os

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

data_path = 'train.tfrecords'  # address of the TFRecords file

with tf.Session() as sess:
    # define the features first; they must be consistent with those used when writing
    feature = {'train/image': tf.FixedLenFeature([], tf.string),
               'train/label': tf.FixedLenFeature([], tf.int64)}
    # create a queue to maintain the input file list
    filename_queue = tf.train.string_input_producer([data_path], num_epochs=1)
    # define a reader and read the next record
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)
    # parse a single record
    features = tf.parse_single_example(serialized_example, features=feature)
    # decode the string into the pixel array of the image
    image = tf.decode_raw(features['train/image'], tf.float32)
    # cast the label to int32
    label = tf.cast(features['train/label'], tf.int32)
    # restore the image to its original dimensions
    image = tf.reshape(image, [224, 224, 3])
    # other preprocessing operations can be performed here ...
    # create randomly shuffled batches
    images, labels = tf.train.shuffle_batch([image, label], batch_size=10,
                                            capacity=30, min_after_dequeue=10)
    # initialize global and local variables
    init_op = tf.group(tf.global_variables_initializer(),
                       tf.local_variables_initializer())
    sess.run(init_op)
    # start multithreading to process the input data
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    # ... consume the batches here ...
    # shut down the threads
    coord.request_stop()
    coord.join(threads)
    sess.close()
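The decode step can be sanity-checked without TensorFlow: tf.decode_raw is the counterpart of the raw-byte serialization done at write time, which is why the dtype (float32) and shape must match exactly. A NumPy sketch of that round trip (tobytes is the current name for the older tostring used in the writer code):

```python
import numpy as np

# a small fake "image" in the same dtype used when writing (float32)
img = np.arange(2 * 2 * 3, dtype=np.float32).reshape(2, 2, 3)

# serialize to raw bytes, as tostring()/tobytes() does before writing
raw = img.tobytes()

# decoding must use the same dtype; reshape then restores the dimensions
restored = np.frombuffer(raw, dtype=np.float32).reshape(2, 2, 3)

print(np.array_equal(img, restored))  # True
```

Reading with a different dtype (or reshaping to different dimensions) would silently scramble the pixel data, since the raw bytes carry no type or shape information of their own.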
That concludes this introduction. If you have any questions, please leave a comment so we can learn together. I hope it is helpful for your studies.