TensorFlow TFRecords file generation and reading methods
TensorFlow provides the TFRecords format to store data in a unified manner. Theoretically, TFRecords can store any form of data.
Data in a TFRecords file is stored as serialized tf.train.Example protocol buffers. The following protobuf code defines tf.train.Example:
message Example {
  Features features = 1;
};

message Features {
  map<string, Feature> feature = 1;
};

message Feature {
  oneof kind {
    BytesList bytes_list = 1;
    FloatList float_list = 2;
    Int64List int64_list = 3;
  }
};
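As a concrete illustration of how these message types nest (the key names match this tutorial, but the values here are made up), one image record rendered in protobuf text format might look like this:

```
features {
  feature {
    key: "train/image"
    value { bytes_list { value: "\x00\x01..." } }
  }
  feature {
    key: "train/label"
    value { int64_list { value: 5 } }
  }
}
```

Each feature is a named, typed list, which is why the writer code below wraps every value in an Int64List, BytesList, or FloatList.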
The following describes how to generate and read TFRecords files.
First, we will introduce how to generate a TFRecords file, using the following code:
from random import shuffle
import numpy as np
import glob
import tensorflow as tf
import cv2
import sys
import os

# I installed the CPU version, so warnings appear at runtime; this suppresses them
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

shuffle_data = True
image_path = '/path/to/image/*.jpg'

# get the paths of all images under the directory; type(addrs) = list
addrs = glob.glob(image_path)
# label data acquisition depends on your dataset; type(labels) = list
labels = ...

# shuffle the data order
if shuffle_data:
    c = list(zip(addrs, labels))
    shuffle(c)
    addrs, labels = zip(*c)

# split the dataset 70/20/10 into train, validation, and test
train_addrs = addrs[0:int(0.7 * len(addrs))]
train_labels = labels[0:int(0.7 * len(labels))]
val_addrs = addrs[int(0.7 * len(addrs)):int(0.9 * len(addrs))]
val_labels = labels[int(0.7 * len(labels)):int(0.9 * len(labels))]
test_addrs = addrs[int(0.9 * len(addrs)):]
test_labels = labels[int(0.9 * len(labels)):]

# we collected the image addresses above; this function loads an image from its address
def load_image(addr):
    img = cv2.imread(addr)
    img = cv2.resize(img, (224, 224), interpolation=cv2.INTER_CUBIC)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    # dividing by 255. here would normalize pixel values to [0, 1]
    img = img.astype(np.float32)
    return img

# convert data to the corresponding tf.train.Feature types
def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _float_feature(value):
    return tf.train.Feature(float_list=tf.train.FloatList(value=[value]))

# write data to the TFRecords file
train_filename = '/path/to/train.tfrecords'  # output file address
# create a writer to write the TFRecords file
writer = tf.python_io.TFRecordWriter(train_filename)
for i in range(len(train_addrs)):
    # print progress every 1000 records
    if not i % 1000:
        print('Train data: {}/{}'.format(i, len(train_addrs)))
        sys.stdout.flush()
    # load the image
    img = load_image(train_addrs[i])
    label = train_labels[i]
    # create the features for this record
    feature = {'train/label': _int64_feature(label),
               'train/image': _bytes_feature(tf.compat.as_bytes(img.tostring()))}
    # create an Example protocol buffer
    example = tf.train.Example(features=tf.train.Features(feature=feature))
    # serialize the Example and write it to the file
    writer.write(example.SerializeToString())

writer.close()
sys.stdout.flush()
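The shuffle-and-split pattern above can be tried in isolation. This is a minimal, self-contained sketch; the file names and labels are made up for illustration, and a fixed seed is used only to make it repeatable:

```python
from random import seed, shuffle

# illustrative stand-ins for the real address and label lists
addrs = ['img{}.jpg'.format(i) for i in range(10)]
labels = list(range(10))

# shuffle addresses and labels together so each pair stays aligned
pairs = list(zip(addrs, labels))
seed(0)
shuffle(pairs)
addrs, labels = zip(*pairs)

# 70/20/10 split into train, validation, and test sets
n = len(addrs)
train_addrs = addrs[0:int(0.7 * n)]
val_addrs = addrs[int(0.7 * n):int(0.9 * n)]
test_addrs = addrs[int(0.9 * n):]

print(len(train_addrs), len(val_addrs), len(test_addrs))  # 7 2 1
```

Zipping before shuffling is the important step: shuffling `addrs` and `labels` separately would destroy the image-to-label correspondence.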
The above section only describes the generation of the train.tfrecords file; the validation and test files are produced in the same way.
Next, we will introduce how to read a TFRecords file:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import os

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

data_path = 'train.tfrecords'  # address of the TFRecords file

with tf.Session() as sess:
    # define the features first; they must be consistent with those used when writing
    feature = {'train/image': tf.FixedLenFeature([], tf.string),
               'train/label': tf.FixedLenFeature([], tf.int64)}
    # create a queue to maintain the input file list
    filename_queue = tf.train.string_input_producer([data_path], num_epochs=1)
    # define a reader and read the next record
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)
    # parse a single record
    features = tf.parse_single_example(serialized_example, features=feature)
    # decode the string into the pixel array of the image
    image = tf.decode_raw(features['train/image'], tf.float32)
    # cast the label to int32
    label = tf.cast(features['train/label'], tf.int32)
    # restore the image to its original dimensions
    image = tf.reshape(image, [224, 224, 3])
    # other preprocessing operations can be performed here ...
    # create randomly shuffled batches
    images, labels = tf.train.shuffle_batch([image, label], batch_size=10,
                                            capacity=30, min_after_dequeue=10)
    # initialize global and local variables
    init_op = tf.group(tf.global_variables_initializer(),
                       tf.local_variables_initializer())
    sess.run(init_op)
    # start multithreading to process the input data
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    # ... consume the batches here ...
    # shut down the threads
    coord.request_stop()
    coord.join(threads)
    sess.close()
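The decode step can be sanity-checked without TensorFlow: tf.decode_raw is the counterpart of the raw-byte serialization done at write time, which is why the dtype (float32) and shape must match exactly. A NumPy sketch of that round trip (tobytes is the current name for the older tostring used in the writer code):

```python
import numpy as np

# a small fake "image" in the same dtype used when writing (float32)
img = np.arange(2 * 2 * 3, dtype=np.float32).reshape(2, 2, 3)

# serialize to raw bytes, as tostring()/tobytes() does before writing
raw = img.tobytes()

# decoding must use the same dtype; reshape then restores the dimensions
restored = np.frombuffer(raw, dtype=np.float32).reshape(2, 2, 3)

print(np.array_equal(img, restored))  # True
```

Reading with a different dtype (or reshaping to different dimensions) would silently scramble the pixel data, since the raw bytes carry no type or shape information of their own.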
That concludes this introduction. If you have any questions, please leave a comment so we can learn together. I hope it is helpful for your studies.