Brief introduction

This article describes the second method of importing data into TensorFlow. This approach takes some work to set up, but is efficient. It is divided into several steps:
- Write all samples to a binary file (executed only once)
- Create a tensor that reads one sample from the binary file
- Create tensors that randomly read a mini-batch from the binary file
- Feed the mini-batch tensors into the network as input nodes

Write binary files
Use tf.python_io.TFRecordWriter to create a writer for TensorFlow's dedicated storage format; the file name extension is '.tfrecord'. Samples of type tf.train.Example are serialized and stored sequentially in the file.
writer = tf.python_io.TFRecordWriter('/tmp/data.tfrecord')
for i in range(0, 2):
    # create sample example
    # ...
    serialized = example.SerializeToString()  # serialize
    writer.write(serialized)  # write to file
writer.close()
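Under the hood, a '.tfrecord' file is simply a sequence of length-delimited records. The sketch below illustrates the idea with plain length-prefixed framing; it is a simplification, since the real TFRecord format also stores masked CRC32C checksums of the length and of the payload, which are omitted here:

```python
import io
import struct

def write_records(stream, records):
    # each record: little-endian uint64 payload length, then the payload bytes
    for payload in records:
        stream.write(struct.pack('<Q', len(payload)))
        stream.write(payload)

def read_records(stream):
    # read records back until the stream is exhausted
    out = []
    while True:
        header = stream.read(8)
        if len(header) < 8:
            break  # end of file
        (length,) = struct.unpack('<Q', header)
        out.append(stream.read(length))
    return out
```

A serialized tf.train.Example would play the role of each payload here.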
Each Example's features member variable is a dict that stores the different parts of a sample (for example, image pixels + class label). The sample in the following example contains three keys: a, b, c:
# create sample example
a_data = 0.618 + i  # float
b_data = [2016 + i, 2017 + i]  # int64
c_data = numpy.array([[0, 1, 2], [3, 4, 5]]) + i  # bytes
c_data = c_data.astype(numpy.uint8)
c_raw = c_data.tostring()  # convert to string
example = tf.train.Example(
    features=tf.train.Features(
        feature={
            'a': tf.train.Feature(
                float_list=tf.train.FloatList(value=[a_data])  # square brackets wrap the scalar as a list
            ),
            'b': tf.train.Feature(
                int64_list=tf.train.Int64List(value=b_data)  # b_data is already a list
            ),
            'c': tf.train.Feature(
                bytes_list=tf.train.BytesList(value=[c_raw])
            )
        }
    )
)
The value portion of each dict member accepts three types of data:
- tf.train.FloatList: each element of the list is a float, e.g. a.
- tf.train.Int64List: each element of the list is an int64, e.g. b.
- tf.train.BytesList: each element of the list is a string, e.g. c.
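A side note on the bytes case: storing pixels as uint8 means each element serializes to exactly one byte, whereas an int64 element takes eight. The standard library's array module (used here only as a numpy-free illustration) makes the difference visible:

```python
from array import array

# 'B' holds unsigned 8-bit integers, like numpy.uint8: one byte per element
pixels = array('B', [0, 1, 2, 3, 4, 5])
raw = pixels.tobytes()  # 6 elements -> 6 bytes

# the same values stored as int64 ('q') take 8 bytes per element
wide = array('q', [0, 1, 2, 3, 4, 5]).tobytes()  # 6 elements -> 48 bytes

# decoding the uint8 bytes is an exact round trip
restored = array('B')
restored.frombytes(raw)
```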
The third type is especially suitable for image samples. Note that you should cast to the uint8 type before converting to a string.

Read a sample
Next, we define a function that creates the "read one sample from file" operations and returns the resulting tensors.

def read_single_sample(filename):
    # read the members a, b, c of one sample example
    # ...
    return a, b, c
First create a file name queue, then use tf.TFRecordReader to read a serialized sample from the file queue.
# output file name string to a queue
filename_queue = tf.train.string_input_producer([filename], num_epochs=None)  # no limit on the number of reads
reader = tf.TFRecordReader()
_, serialized_example = reader.read(filename_queue)
If the sample set is very large, it can be split across several files; pass the list of file names to tf.train.string_input_producer.
Unlike the writer above, this reader is symbolic and only runs inside a Session.
Next, parse the symbolic sample:
# get feature from serialized example
features = tf.parse_single_example(
    serialized_example,
    features={
        'a': tf.FixedLenFeature([], tf.float32),  # 0-D, scalar
        'b': tf.FixedLenFeature([2], tf.int64),   # 1-D, length 2
        'c': tf.FixedLenFeature([], tf.string)    # 0-D, scalar
    }
)
a = features['a']
b = features['b']
c_raw = features['c']
c = tf.decode_raw(c_raw, tf.uint8)
c = tf.reshape(c, [2, 3])
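What the decode and reshape steps do to c can be mimicked in plain Python (an illustration only, using the same 2x3 sample values as above):

```python
# the serialized bytes held under key 'c' in the record
raw = bytes([0, 1, 2, 3, 4, 5])

# like tf.decode_raw: reinterpret each byte as one uint8 value
flat = list(raw)

# like tf.reshape(c, [2, 3]): regroup the flat values into 2 rows of 3
c = [flat[i * 3:(i + 1) * 3] for i in range(2)]
```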
For a BytesList, decode the 0-D tensor of type string into a 1-D tensor of type uint8.

Read a mini-batch
Using tf.train.shuffle_batch, the a, b, c defined above are shuffled to obtain mini-batch tensors:
a_batch, b_batch, c_batch = tf.train.shuffle_batch(
    [a, b, c], batch_size=2, capacity=200, min_after_dequeue=100, num_threads=2)
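Conceptually, shuffle_batch keeps a buffer of up to capacity samples and dequeues random elements from it, while leaving at least min_after_dequeue samples behind so that later dequeues stay well mixed. The following is a toy pure-Python model of that buffer logic (an assumption about the idea, not TensorFlow's real implementation):

```python
import random

def shuffle_batch_sim(samples, batch_size, min_after_dequeue, rng):
    # Keep a buffer; once it holds more than min_after_dequeue elements,
    # dequeue one element chosen at random. Incoming samples keep
    # refilling the buffer, so every dequeue picks from a mixed pool.
    buf, dequeued = [], []
    for s in samples:
        buf.append(s)
        if len(buf) > min_after_dequeue:
            dequeued.append(buf.pop(rng.randrange(len(buf))))
    # group the dequeued stream into mini-batches (the last may be partial)
    return [dequeued[i:i + batch_size] for i in range(0, len(dequeued), batch_size)]
```

With 100 input samples and min_after_dequeue=10, the model dequeues 90 samples, i.e. 45 batches of size 2; the last 10 samples stay in the buffer, just as the real queue retains samples until it is closed.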
Use
Create a session and initialize:
# sess
sess = tf.Session()
init = tf.initialize_all_variables()
sess.run(init)
tf.train.start_queue_runners(sess=sess)
Because a file name queue is used, tf.train.start_queue_runners must be called to start the threads that feed the queues.
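The queue-runner idea can be modeled with a background thread that keeps a bounded queue filled while the main thread consumes from it (a toy stdlib sketch of the concept, not TensorFlow's actual mechanism):

```python
import queue
import threading

# a bounded queue stands in for TensorFlow's internal example queue
q = queue.Queue(maxsize=8)

def runner():
    # the "queue runner": keeps feeding samples into the queue
    for i in range(8):
        q.put(i)  # would block if the queue were full

t = threading.Thread(target=runner)
t.start()
batch = [q.get() for _ in range(4)]  # analogous to one sess.run on a batch tensor
t.join()
```

Until the runner thread is started, a consumer calling q.get() would block forever, which is exactly what happens in TensorFlow if start_queue_runners is forgotten.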
Each time it is run, a random mini-batch of samples is produced:
a_val, b_val, c_val = sess.run([a_batch, b_batch, c_batch])
a_val, b_val, c_val = sess.run([a_batch, b_batch, c_batch])
Such mini-batches can be used as input nodes of the network.

Summarize
If you want to learn more about the queue mechanism in the example, see this article.
This article refers to the following examples:
https://github.com/mnuke/tf-slim-mnist
https://indico.io/blog/tensorflow-data-inputs-part1-placeholders-protobufs-queues/
https://github.com/tensorflow/tensorflow/tree/r0.11/tensorflow/models/image/cifar10
The complete code is as follows:
import tensorflow as tf
import numpy

def write_binary():
    writer = tf.python_io.TFRecordWriter('/tmp/data.tfrecord')
    for i in range(0, 2):
        a = 0.618 + i
        b = [2016 + i, 2017 + i]
        c = numpy.array([[0, 1, 2], [3, 4, 5]]) + i
        c = c.astype(numpy.uint8)
        c_raw = c.tostring()
        example = tf.train.Example(
            features=tf.train.Features(
                feature={
                    'a': tf.train.Feature(
                        float_list=tf.train.FloatList(value=[a])
                    ),
                    'b': tf.train.Feature(
                        int64_list=tf.train.Int64List(value=b)
                    ),
                    'c': tf.train.Feature(
                        bytes_list=tf.train.BytesList(value=[c_raw])
                    )
                }
            )
        )
        serialized = example.SerializeToString()
        writer.write(serialized)
    writer.close()

def read_single_sample(filename):
    # output file name string to a queue
    filename_queue = tf.train.string_input_producer([filename], num_epochs=None)
    # create a reader from file queue
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)
    # get feature from serialized example
    features = tf.parse_single_example(
        serialized_example,
        features={
            'a': tf.FixedLenFeature([], tf.float32),
            'b': tf.FixedLenFeature([2], tf.int64),
            'c': tf.FixedLenFeature([], tf.string)
        }
    )
    a = features['a']
    b = features['b']
    c_raw = features['c']
    c = tf.decode_raw(c_raw, tf.uint8)
    c = tf.reshape(c, [2, 3])
    return a, b, c

# ----- main -----
if 1:
    write_binary()
else:
    # create tensor
    a, b, c = read_single_sample('/tmp/data.tfrecord')
    a_batch, b_batch, c_batch = tf.train.shuffle_batch(
        [a, b, c], batch_size=3, capacity=200, min_after_dequeue=100, num_threads=2)
    queues = tf.get_collection(tf.GraphKeys.QUEUE_RUNNERS)

    # sess
    sess = tf.Session()
    init = tf.initialize_all_variables()
    sess.run(init)
    tf.train.start_queue_runners(sess=sess)

    a_val, b_val, c_val = sess.run([a_batch, b_batch, c_batch])
    print(a_val, b_val, c_val)
    a_val, b_val, c_val = sess.run([a_batch, b_batch, c_batch])
    print(a_val, b_val, c_val)