"TensorFlow to play" Data import 2_tensorflow

Source: Internet
Author: User
Brief introduction

This article describes the second method of data import for TensorFlow.

This approach is somewhat cumbersome, but it keeps data loading efficient. It breaks down into the following steps:
- Write all samples to a binary file (done only once)
- Create a tensor that reads a single sample from the binary file
- Create a tensor that randomly reads a mini-batch from the binary file
- Feed the mini-batch tensor into the network as the input node

Binary files

Use tf.python_io.TFRecordWriter to create a writer dedicated to storing TensorFlow data; by convention the file uses the extension '.tfrecord'.
Serialized samples of type tf.train.Example are stored sequentially in the file.
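
To make "stored sequentially" concrete, here is a minimal sketch of a length-prefixed record file that uses only the Python standard library. It is an illustration, not the real TFRecord reader or writer: actual TFRecord files also store CRC checksums next to each length and payload, which this sketch leaves out. The helper names write_records and read_records are hypothetical.

```python
import io
import struct


def write_records(stream, payloads):
    """Append each payload as an 8-byte little-endian length followed by the raw bytes."""
    for payload in payloads:
        stream.write(struct.pack('<Q', len(payload)))
        stream.write(payload)


def read_records(stream):
    """Yield the payloads back in the order they were written."""
    while True:
        header = stream.read(8)
        if len(header) < 8:      # end of stream reached
            return
        (length,) = struct.unpack('<Q', header)
        yield stream.read(length)


# Stand-ins for serialized samples (hypothetical byte strings).
buffer = io.BytesIO()
write_records(buffer, [b'sample-0', b'sample-1'])
buffer.seek(0)
records = list(read_records(buffer))
print(records)
```

Because every record carries its own length, a reader can walk the file one sample at a time without loading it whole.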

writer = tf.python_io.TFRecordWriter('/tmp/data.tfrecord')
for i in range(0, 2):
    # create the sample example
    # ...
    serialized = example.SerializeToString()   # serialize
    writer.write(serialized)    # write to file
writer.close()

Each Example's features member is a dict that stores the different parts of a sample (for example, image pixels + class label). The sample in the following example contains three keys: a, b and c:

    # create the sample example
    a_data = 0.618 + i         # float
    b_data = [2016 + i, 2017 + i]     # int64
    c_data = numpy.array([[0, 1, 2], [3, 4, 5]]) + i    # bytes
    c_data = c_data.astype(numpy.uint8)
    c_raw = c_data.tobytes()             # convert to a byte string

    example = tf.train.Example(
        features=tf.train.Features(
            feature={
                'a': tf.train.Feature(
                    float_list=tf.train.FloatList(value=[a_data])   # square brackets wrap the scalar in a list
                ),
                'b': tf.train.Feature(
                    int64_list=tf.train.Int64List(value=b_data)    # b_data is already a list
                ),
                'c': tf.train.Feature(
                    bytes_list=tf.train.BytesList(value=[c_raw])
                )
            }
        )
    )
    

The value of each dict entry accepts one of three data types:
- tf.train.FloatList: each list element is a float, as in a.
- tf.train.Int64List: each list element is an int64, as in b.
- tf.train.BytesList: each list element is a byte string, as in c.

The third type is especially suitable for image samples. Note that the array should be cast to uint8 before it is converted to a byte string.

Read a sample

Next, we define a function that creates the "read one sample from file" operation and returns the resulting tensors.

def read_single_sample(filename):
    # read each member a, b, c of the sample example
    # ...
    return a, b, c

First create a filename queue, then use tf.TFRecordReader to read one serialized sample from that queue.

    # read each member a, b, c of the sample example
    filename_queue = tf.train.string_input_producer([filename], num_epochs=None)    # no limit on the number of reads
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)

If the dataset is very large, it can be split across several files; pass the list of file names to tf.train.string_input_producer.
Unlike the writer above, this reader is symbolic: it only does work when it is run inside a session.
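
How a filename queue hands out several files can be pictured with a plain-Python generator. This is only an analogy for tf.train.string_input_producer, assuming its documented defaults (file names reshuffled every epoch, endless cycling when num_epochs is None). The function filename_epochs and the shard paths are hypothetical.

```python
import itertools
import random


def filename_epochs(filenames, num_epochs=None, shuffle=True, seed=0):
    """Yield file names epoch by epoch; cycle forever when num_epochs is None."""
    names = list(filenames)
    if not names:
        raise ValueError('at least one file name is required')
    rng = random.Random(seed)
    epoch = 0
    while num_epochs is None or epoch < num_epochs:
        order = names[:]
        if shuffle:
            rng.shuffle(order)   # a new order for each epoch
        for name in order:
            yield name
        epoch += 1


# Hypothetical shard names, not files from the article.
shards = ['/tmp/data-0.tfrecord', '/tmp/data-1.tfrecord', '/tmp/data-2.tfrecord']
first_two_epochs = list(itertools.islice(filename_epochs(shards), 6))
print(first_two_epochs)
```

Every file appears exactly once per epoch, so splitting a dataset into shards does not change which samples are seen, only the order in which files are visited.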

Then parse the symbolic sample:

    # get feature from serialized example
    features = tf.parse_single_example(
        serialized_example,
        features={
            'a': tf.FixedLenFeature([], tf.float32),    # 0-D, scalar
            'b': tf.FixedLenFeature([2], tf.int64),   # 1-D, length 2
            'c': tf.FixedLenFeature([], tf.string)  # 0-D, scalar
        }
    )
    a = features['a']
    b = features['b']
    c_raw = features['c']
    c = tf.decode_raw(c_raw, tf.uint8)
    c = tf.reshape(c, [2, 3])
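
The byte-level effect of the decode and reshape steps can be checked without TensorFlow. The sketch below uses NumPy only, with numpy.frombuffer standing in for tf.decode_raw (an analogy, not the TensorFlow implementation). It also shows why the array is cast to uint8 before serialization: the number of bytes depends on the dtype.

```python
import numpy

c_data = numpy.array([[0, 1, 2], [3, 4, 5]])

# Without the cast, every element takes c_data.dtype.itemsize bytes
# (platform dependent); after the cast, one byte per element.
raw_default = c_data.tobytes()
raw_uint8 = c_data.astype(numpy.uint8).tobytes()
print(len(raw_default), len(raw_uint8))

# Reading side: reinterpret the bytes as a flat uint8 array, then restore the shape.
flat = numpy.frombuffer(raw_uint8, dtype=numpy.uint8)
restored = flat.reshape(2, 3)
print(restored)
```

Decoding with a dtype other than the one used for writing would yield a different number of elements, so the reshape to [2, 3] would no longer match.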

For a BytesList, tf.decode_raw decodes the 0-D string tensor into a 1-D uint8 tensor, which is then reshaped to the original shape.

Read a mini-batch

Using tf.train.shuffle_batch, the a, b, c tensors above are sampled at random to obtain mini-batch tensors:

a_batch, b_batch, c_batch = tf.train.shuffle_batch([a, b, c], batch_size=2, capacity=200, min_after_dequeue=100, num_threads=2)
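
The roles of capacity and min_after_dequeue can be pictured with a single-threaded buffer in plain Python. This is an analogy under simplifying assumptions, not how TensorFlow implements it: the real op uses a queue filled by background threads. The function shuffle_batch_sketch and its seed parameter are hypothetical.

```python
import itertools
import random


def shuffle_batch_sketch(samples, batch_size, capacity, min_after_dequeue, seed=0):
    """Yield random mini-batches from a bounded buffer (single-threaded analogy)."""
    samples = list(samples)
    if not samples or capacity <= min_after_dequeue:
        raise ValueError('need samples and capacity > min_after_dequeue')
    rng = random.Random(seed)
    source = itertools.cycle(samples)       # endless input, like num_epochs=None
    buffer = []
    while True:
        batch = []
        while len(batch) < batch_size:
            # Refill to capacity once a draw could leave fewer than
            # min_after_dequeue elements in the buffer.
            if len(buffer) <= min_after_dequeue:
                while len(buffer) < capacity:
                    buffer.append(next(source))
            # Remove one element chosen uniformly at random.
            index = rng.randrange(len(buffer))
            buffer[index], buffer[-1] = buffer[-1], buffer[index]
            batch.append(buffer.pop())
        yield batch


batches = shuffle_batch_sketch(range(10), batch_size=2, capacity=8, min_after_dequeue=4)
first_batch = next(batches)
second_batch = next(batches)
print(first_batch, second_batch)
```

A larger min_after_dequeue means each draw picks from more candidates, which mixes the samples better at the cost of more memory and a longer initial fill.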

Usage

Create a session and initialize:

# sess
sess = tf.Session()
init = tf.initialize_all_variables()
sess.run(init)
tf.train.start_queue_runners(sess=sess)

Because a file-reading queue is used, tf.train.start_queue_runners must be called; otherwise the queues are never filled and sess.run blocks.
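
Why the runners have to be started can be seen in a small standard-library analogy: a bounded queue only yields data if some thread is filling it. This is not TensorFlow code; the producer function and the queue size are hypothetical stand-ins for the queue runner threads.

```python
import queue
import threading


def producer(q, count):
    """Background thread body: enqueue `count` items, blocking while the queue is full."""
    for item in range(count):
        q.put(item)


q = queue.Queue(maxsize=2)           # bounded, like the capacity of an input queue
worker = threading.Thread(target=producer, args=(q, 5), daemon=True)
worker.start()                       # analogous to starting the queue runners

# Each get() blocks until the producer has supplied an item.
items = [q.get() for _ in range(5)]
worker.join()
print(items)
```

Without worker.start(), the first q.get() would block forever, which mirrors a sess.run call hanging when the queue runners were never started.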

Each run randomly produces one mini-batch of samples:

a_val, b_val, c_val = sess.run([a_batch, b_batch, c_batch])
a_val, b_val, c_val = sess.run([a_batch, b_batch, c_batch])

Such mini-batches can be used as the input nodes of the network.

Summary

If you want to learn more about the queue mechanism in the example, see this article.

This article refers to the following examples:
https://github.com/mnuke/tf-slim-mnist
https://indico.io/blog/tensorflow-data-inputs-part1-placeholders-protobufs-queues/
https://github.com/tensorflow/tensorflow/tree/r0.11/tensorflow/models/image/cifar10

The complete code is as follows:

import tensorflow as tf
import numpy


def write_binary():
    writer = tf.python_io.TFRecordWriter('/tmp/data.tfrecord')

    for i in range(0, 2):
        a = 0.618 + i
        b = [2016 + i, 2017 + i]
        c = numpy.array([[0, 1, 2], [3, 4, 5]]) + i
        c = c.astype(numpy.uint8)
        c_raw = c.tobytes()

        example = tf.train.Example(
            features=tf.train.Features(
                feature={
                    'a': tf.train.Feature(float_list=tf.train.FloatList(value=[a])),
                    'b': tf.train.Feature(int64_list=tf.train.Int64List(value=b)),
                    'c': tf.train.Feature(bytes_list=tf.train.BytesList(value=[c_raw]))
                }
            )
        )

        serialized = example.SerializeToString()
        writer.write(serialized)

    writer.close()


def read_single_sample(filename):
    # output file name string to a queue
    filename_queue = tf.train.string_input_producer([filename], num_epochs=None)

    # create a reader from file queue
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)

    # get feature from serialized example
    features = tf.parse_single_example(
        serialized_example,
        features={
            'a': tf.FixedLenFeature([], tf.float32),
            'b': tf.FixedLenFeature([2], tf.int64),
            'c': tf.FixedLenFeature([], tf.string)
        }
    )

    a = features['a']
    b = features['b']
    c_raw = features['c']
    c = tf.decode_raw(c_raw, tf.uint8)
    c = tf.reshape(c, [2, 3])

    return a, b, c


# ----- main function -----
if 1:    # writes the file; change to 0 to run the reading branch
    write_binary()
else:
    # create tensor
    a, b, c = read_single_sample('/tmp/data.tfrecord')
    a_batch, b_batch, c_batch = tf.train.shuffle_batch([a, b, c], batch_size=3, capacity=200, min_after_dequeue=100, num_threads=2)

    queues = tf.get_collection(tf.GraphKeys.QUEUE_RUNNERS)

    # sess
    sess = tf.Session()
    init = tf.initialize_all_variables()
    sess.run(init)

    tf.train.start_queue_runners(sess=sess)
    a_val, b_val, c_val = sess.run([a_batch, b_batch, c_batch])
    print(a_val, b_val, c_val)
    a_val, b_val, c_val = sess.run([a_batch, b_batch, c_batch])
    print(a_val, b_val, c_val)
