Reading Image Data with TensorFlow

Source: Internet
Author: User
Tags: glob

This article describes how to read image data with TensorFlow in two ways: by writing and reading a TFRecord file, and by reading images directly from a filename queue. Suppose our image directory structure is as follows:

|---a
|   |---1.jpg
|   |---2.jpg
|   |---3.jpg
|
|---b
|   |---1.jpg
|   |---2.jpg
|   |---3.jpg
|
|---c
|   |---1.jpg
|   |---2.jpg
|   |---3.jpg
1 Using TFRecord

The idea: write each image and its corresponding label into a TFRecord file. TFRecord stores data in binary form and internally uses a protobuf-defined protocol, which means the defined format is serialized to binary. We can use tf.train.Example, provided by TensorFlow, to specify the serialization format. All files under directory a are given label a, and the same goes for the other two directories b and c.
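Before the full TFRecord writer, the core of the labeling scheme, turning each subdirectory name into an integer class index, can be sketched without TensorFlow. The helper name `build_label_map` is hypothetical, and unlike the writer below it sorts the directory names so the mapping is deterministic:

```python
import os

def build_label_map(root):
    # Each subdirectory name under root is a class label; map it to an
    # integer index, the same mapping build_data saves to map_str.
    labels = sorted(os.listdir(root))
    return {label: index for index, label in enumerate(labels)}
```

For the directory tree shown above, this would produce `{'a': 0, 'b': 1, 'c': 2}`.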

The code is as follows:

import os

import tensorflow as tf
from PIL import Image

def build_data(dir, file_str, map_str):
    """
    :param dir: root directory; each subdirectory name is a label
    :param file_str: output tfrecord file path
    :param map_str: path for saving the mapping between indices 0~n and labels
    :return: None
    """
    files = os.listdir(dir)
    writer = tf.python_io.TFRecordWriter(file_str)  # the file to generate
    # tf.train.Feature only accepts float, int, and bytes, so map each label
    # to an int and save the mapping to a file
    map_file = open(map_str, 'w')
    for index, label in enumerate(files):        # iterate over the subdirectories
        data_dir = os.path.join(dir, label)
        map_file.write(str(index) + ":" + label + "\n")
        for img_name in os.listdir(data_dir):    # iterate over the images
            img_path = os.path.join(data_dir, img_name)
            img = Image.open(img_path)           # read the image
            img = img.resize((256, 256))         # resize the image to 256x256
            img_raw = img.tobytes()              # convert the image to bytes
            example = tf.train.Example(features=tf.train.Features(feature={
                'label': tf.train.Feature(int64_list=tf.train.Int64List(value=[index])),
                'img': tf.train.Feature(bytes_list=tf.train.BytesList(value=[img_raw]))
            }))
            writer.write(example.SerializeToString())  # serialize to a string and write to the file
    writer.close()
    map_file.close()

The next step is to read the TFRecord file back. Note that the feature names ('label', 'img') and their types must match what was written:

def read_data(file_str):
    # generate a queue from the file name
    file_path_queue = tf.train.string_input_producer([file_str])

    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(file_path_queue)  # returns the file name and the file
    features = tf.parse_single_example(serialized_example,
                                       features={
                                           'label': tf.FixedLenFeature([], tf.int64),
                                           'img': tf.FixedLenFeature([], tf.string),
                                       })
    label = tf.cast(features['label'], tf.int64)       # read the label
    img = tf.decode_raw(features['img'], tf.uint8)
    img = tf.reshape(img, [256, 256, 3])               # reshape to 256x256, 3 channels
    img = tf.cast(img, tf.float32) * (1 / 255) - 0.5   # scale the pixel values into [-0.5, 0.5]

    return img, label

Next, let's see how to use it:

build_data("D:/test", "D:/data/tf.tfrecorde", "D:/data/map.txt")
img, label = read_data("D:/data/tf.tfrecorde")

# shuffle_batch randomly shuffles the input order
img_batch, label_batch = tf.train.shuffle_batch([img, label],
                                                batch_size=30, capacity=2000,
                                                min_after_dequeue=1000)
init = tf.initialize_all_variables()

with tf.Session() as sess:
    sess.run(init)
    threads = tf.train.start_queue_runners(sess=sess)
    for i in range(3):
        imgs, labels = sess.run([img_batch, label_batch])
        # imgs and labels can also be processed further here as needed
        print(imgs.shape, labels)

The results of the operation are as follows:

(30, 256, 256, 3) [1 2 2 1 1 2 2 1 0 1 0 1 0 0 2 0 0 0 2 1 1 1 1 0 0 1 2 1 2 0]
(30, 256, 256, 3) [2 1 1 0 0 1 1 0 2 2 2 0 0 0 0 2 1 0 0 2 0 0 2 2 2 1 0 1 0 2]
(30, 256, 256, 3) [2 0 2 0 1 2 1 2 2 1 0 2 0 0 2 2 2 1 1 1 1 1 0 0 2 0 2 2 0 0]
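The batches above draw 30 items at a time from only 9 files, because the filename queue reads the files cyclically. That cycling behaviour can be sketched with itertools.cycle (the file names below are illustrative, matching the directory tree at the top):

```python
from itertools import cycle, islice

# nine illustrative file names, three per class
filenames = ["%s/%d.jpg" % (d, i) for d in ("a", "b", "c") for i in (1, 2, 3)]
queue = cycle(filenames)          # string_input_producer loops over names like this
batch = list(islice(queue, 30))   # one batch of 30 draws from just 9 files
print(len(batch))                 # -> 30
```

After 9 draws the queue wraps around, so `batch[9]` is the same file as `batch[0]`.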

From the results we can see that although we only have 9 images (3 per class), we can read out 90 of them (3 batches of 30). This works because the queue reads the files cyclically; the count goes up, but it is still the same 9 images underneath.

2 Not Using TFRecord

TFRecord is suitable when you want to package labels, image data, and other related data into one object and read them together. Sometimes we don't need labels and only need the images; in that case we can read directly from a path queue, without going through a TFRecord file.

Straight to the code:

import glob
import os

import tensorflow as tf

def read_data(dir):
    """
    :param dir: image root directory
    """
    input_paths = glob.glob(os.path.join(dir, "*.jpg"))
    decode = tf.image.decode_jpeg
    if len(input_paths) == 0:    # no jpg images found, look for png images instead
        input_paths = glob.glob(os.path.join(dir, "*.png"))
        decode = tf.image.decode_png
    if len(input_paths) == 0:    # no png images either, throw an exception
        raise Exception("input_dir contains no image files")

    # build the file path queue and shuffle the order
    path_queue = tf.train.string_input_producer(input_paths, shuffle=True)
    reader = tf.WholeFileReader()              # create the file reader
    paths, contents = reader.read(path_queue)  # read from the queue
    img_raw = decode(contents)
    # resize the image to 256x256; if the images were already resized during
    # preprocessing, this step can be omitted
    img_raw = tf.image.resize_images(img_raw, [256, 256])
    img_raw = tf.image.convert_image_dtype(img_raw, dtype=tf.float32)
    img_raw.set_shape([256, 256, 3])  # set the static shape
    return img_raw
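The jpg-to-png fallback at the top of read_data can be isolated and tested without TensorFlow; `find_images` is a hypothetical helper mirroring that logic:

```python
import glob
import os

def find_images(d):
    # Try jpg first, then fall back to png, as read_data does before
    # choosing the matching decode function.
    paths = glob.glob(os.path.join(d, "*.jpg"))
    if len(paths) == 0:
        paths = glob.glob(os.path.join(d, "*.png"))
    if len(paths) == 0:
        raise Exception("input_dir contains no image files")
    return paths
```

The same two-step pattern generalizes to any list of preferred extensions.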

Next, let's see how to use it:

img = read_data("d:/test/*")
img_batch = tf.train.batch([img], batch_size=30)

init = tf.initialize_all_variables()

with tf.Session() as sess:
    sess.run(init)
    threads = tf.train.start_queue_runners(sess=sess)
    for i in range(3):
        imgs = sess.run(img_batch)
        print(imgs.shape)

Look at the results of the operation:

(30, 256, 256, 3)
(30, 256, 256, 3)
(30, 256, 256, 3)
