Generating TFRecords-format data and using the Dataset API to read TFRecords data


TFRecords is TensorFlow's built-in file format. It is a binary format with the following advantages:

    • Provides a unified framework for different input files
    • Makes better use of memory and is easier to copy and move (a TFRecord is a compressed binary file, serialized with protocol buffers)
    • Stores binary data and label (training category label) data in the same file

Storing other data as a TFRecords file requires two steps:

Build the TFRecord writer

In TensorFlow, the following statement creates a TFRecord writer:

tf.python_io.TFRecordWriter(path)

path: the path of the TFRecords file to create

Method:

    • write(record): writes one string record (that is, one serialized sample) to the file
    • close(): closes the file writer after all records have been written

Note: the string here is a serialized example, produced by example.SerializeToString(), which compresses the feature map in the example into binary form, saving a lot of space.

Construct the example module for each sample

The example module is defined as follows:

message Example {
    Features features = 1;
};
message Features {
    map<string, Feature> feature = 1;
};
message Feature {
    oneof kind {
        BytesList bytes_list = 1;
        FloatList float_list = 2;
        Int64List int64_list = 3;
    }
};

As you can see, an Example can include data in three formats: tf.int64, tf.float32, and binary (bytes) types.
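To make the three kinds concrete, here is a rough pure-Python sketch of how a value could be dispatched to one of the three list types. Plain dicts stand in for the real protobuf classes (tf.train.BytesList, tf.train.FloatList, tf.train.Int64List), and the helper name make_feature is hypothetical, not a TensorFlow API:

```python
# Hypothetical helper: picks one of the three Feature kinds based on the
# Python type of the value, mirroring the oneof in the Feature message.
# Plain dicts stand in for the real protobuf classes.
def make_feature(value):
    if isinstance(value, bytes):
        return {"bytes_list": {"value": [value]}}   # like tf.train.BytesList
    if isinstance(value, float):
        return {"float_list": {"value": [value]}}   # like tf.train.FloatList
    if isinstance(value, int):
        return {"int64_list": {"value": [value]}}   # like tf.train.Int64List
    raise TypeError("unsupported feature type: %r" % type(value))

# build a feature map for one sample, as tf.train.Features would hold it
features = {name: make_feature(v)
            for name, v in {"label": 1.0, "img_raw": b"\x00\x01", "count": 3}.items()}
```

This mirrors why the example code below uses float_list for the label and offsets but bytes_list for the raw image data.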

Features are saved in the form of key-value pairs. The sample code is as follows:

example = tf.train.Example(features=tf.train.Features(feature={
    'label': tf.train.Feature(float_list=tf.train.FloatList(value=[string[1]])),
    'img_raw': tf.train.Feature(bytes_list=tf.train.BytesList(value=[img_raw])),
    'x1_offset': tf.train.Feature(float_list=tf.train.FloatList(value=[string[2]])),
    'y1_offset': tf.train.Feature(float_list=tf.train.FloatList(value=[string[3]])),
    'x2_offset': tf.train.Feature(float_list=tf.train.FloatList(value=[string[4]])),
    'y2_offset': tf.train.Feature(float_list=tf.train.FloatList(value=[string[5]])),
    'beta_det': tf.train.Feature(float_list=tf.train.FloatList(value=[string[6]])),
    'beta_bbox': tf.train.Feature(float_list=tf.train.FloatList(value=[string[7]]))
}))

Once the example module is constructed, we can write the sample to the file:

writer.write(example.SerializeToString())

Do not forget to close the file writer after all the records have been written.
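The overall write workflow (open a writer, serialize one record per sample, write it, close the file) can be sketched without TensorFlow. The length-prefixed JSON format below is only a stand-in for the real TFRecord binary layout (which also stores CRC checksums), and the file name "demo.records" is just an illustration:

```python
import json
import struct

def write_records(path, samples):
    # Stand-in for tf.python_io.TFRecordWriter: one length-prefixed
    # serialized record per sample; closing the file is what flushes it.
    with open(path, "wb") as f:
        for sample in samples:
            data = json.dumps(sample).encode("utf-8")  # stand-in for SerializeToString()
            f.write(struct.pack("<Q", len(data)))      # 8-byte little-endian length
            f.write(data)

def read_records(path):
    # Read records back one at a time, using the length prefix as the frame.
    records = []
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:
                break
            (length,) = struct.unpack("<Q", header)
            records.append(json.loads(f.read(length).decode("utf-8")))
    return records

write_records("demo.records", [{"label": 1.0}, {"label": 0.0}])
```

The same framing idea (length prefix, then the serialized payload) is why a TFRecords file can be streamed record by record without loading it all into memory.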

Second, after creating our own TFRecords file, we can use it for training. TensorFlow provides the Dataset API to make it easy to consume TFRecords files.

First, we define a function to parse the TFRecords data, which converts each binary record into tensors. The sample code is as follows:

def parse_tf(example_proto):
    # define the parsing dictionary
    dics = {
        'label': tf.FixedLenFeature([], tf.float32),
        'img_raw': tf.FixedLenFeature([], tf.string),
        'x1_offset': tf.FixedLenFeature([], tf.float32),
        'y1_offset': tf.FixedLenFeature([], tf.float32),
        'x2_offset': tf.FixedLenFeature([], tf.float32),
        'y2_offset': tf.FixedLenFeature([], tf.float32),
        'beta_det': tf.FixedLenFeature([], tf.float32),
        'beta_bbox': tf.FixedLenFeature([], tf.float32)
    }
    # call the parsing interface on one serialized sample
    parsed_example = tf.parse_single_example(serialized=example_proto, features=dics)
    image = tf.decode_raw(parsed_example['img_raw'], out_type=tf.uint8)
    image = tf.reshape(image, shape=[12, 12, 3])
    # normalize the image data
    image = tf.cast(image, tf.float32) / 255.0
    label = parsed_example['label']
    label = tf.reshape(label, shape=[1])
    label = tf.cast(label, tf.float32)
    x1_offset = tf.reshape(parsed_example['x1_offset'], shape=[1])
    y1_offset = tf.reshape(parsed_example['y1_offset'], shape=[1])
    x2_offset = tf.reshape(parsed_example['x2_offset'], shape=[1])
    y2_offset = tf.reshape(parsed_example['y2_offset'], shape=[1])
    beta_det = tf.reshape(parsed_example['beta_det'], shape=[1])
    beta_bbox = tf.reshape(parsed_example['beta_bbox'], shape=[1])
    return image, label, x1_offset, y1_offset, x2_offset, y2_offset, beta_det, beta_bbox

Next, we read the TFRecords file using tf.data.TFRecordDataset(filenames).

A Dataset becomes a new Dataset through transformations. A series of transformations usually handles converting the data, shuffling it, composing batches, generating epochs, and so on.

Common transformations are: map, batch, shuffle, and repeat.

Map

  map receives a function; each element of the dataset is passed to that function as input, and the function's return values form the new dataset.

Batch

batch combines multiple consecutive elements into one batch.

Repeat

repeat repeats the entire sequence several times, primarily to handle epochs in machine learning. Assuming the original data is one epoch, repeat(5) turns it into 5 epochs.

Shuffle

shuffle shuffles the elements in the dataset; its buffer_size parameter indicates the size of the buffer used when shuffling.
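Conceptually, the four transformations behave like the following list operations on a plain Python sequence. This is only a rough analogy, not the Dataset implementation: real Dataset transformations build a graph and evaluate lazily, and Dataset.shuffle() streams through a fixed-size buffer rather than shuffling the whole sequence. The my_* helper names are hypothetical:

```python
import random

def my_map(fn, data):
    # apply fn to every element, like Dataset.map(fn)
    return [fn(x) for x in data]

def my_batch(data, batch_size):
    # group consecutive elements into batches, like Dataset.batch(batch_size)
    return [data[i:i + batch_size] for i in range(0, len(data), batch_size)]

def my_repeat(data, count):
    # repeat the whole sequence `count` times, one copy per epoch
    return data * count

def my_shuffle(data, seed=None):
    # unlike Dataset.shuffle(buffer_size), this shuffles everything at once
    out = list(data)
    random.Random(seed).shuffle(out)
    return out

# chain the transformations the way a Dataset pipeline does
pipeline = my_batch(my_repeat(my_map(lambda x: x * 2, [1, 2, 3]), 2), 4)
# pipeline == [[2, 4, 6, 2], [4, 6]]
```

Note that order matters: batching after repeat lets a batch span the epoch boundary, which is also what happens with a real Dataset when repeat() comes before batch().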

Example code:

dataset = tf.data.TFRecordDataset(filenames=[filename])
dataset = dataset.map(parse_tf)
dataset = dataset.batch(16).repeat(1)  # use the whole sequence only once, with 16 samples per batch

Now that the batches are ready, how do we take them out for training? The answer is to use an iterator; in TensorFlow the statement is as follows:

iterator = dataset.make_one_shot_iterator()

The so-called one-shot means the iterator can only read from beginning to end once. So how do we take a different batch in each training round? The iterator's get_next() method does this. Note that get_next() only returns a tensor, not a concrete value; to use the value during training, we must fetch it in a session.
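The one-shot behaviour is essentially that of a plain Python iterator: each call yields the next batch until the data is exhausted, after which the iterator cannot be reset. A rough analogy (the sample values are made up for illustration):

```python
# a stand-in for a dataset that yields two batches
batches = iter([["sample1", "sample2"], ["sample3", "sample4"]])

# like calling iterator.get_next() once per training step
first = next(batches)
second = next(batches)

# a one-shot iterator signals the end of the data and cannot be rewound,
# much like tf.errors.OutOfRangeError at the end of a Dataset
exhausted = False
try:
    next(batches)
except StopIteration:
    exhausted = True
```

The difference in real TensorFlow 1.x code is that get_next() returns tensors, so the "next()" step actually happens inside sess.run(), as the complete example below shows.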

The complete code for reading the TFRecords file using the Dataset API is as follows:

import tensorflow as tf

def parse_tf(example_proto):
    # define the parsing dictionary
    dics = {
        'label': tf.FixedLenFeature([], tf.float32),
        'img_raw': tf.FixedLenFeature([], tf.string),
        'x1_offset': tf.FixedLenFeature([], tf.float32),
        'y1_offset': tf.FixedLenFeature([], tf.float32),
        'x2_offset': tf.FixedLenFeature([], tf.float32),
        'y2_offset': tf.FixedLenFeature([], tf.float32),
        'beta_det': tf.FixedLenFeature([], tf.float32),
        'beta_bbox': tf.FixedLenFeature([], tf.float32)
    }
    # call the parsing interface on one serialized sample
    parsed_example = tf.parse_single_example(serialized=example_proto, features=dics)
    image = tf.decode_raw(parsed_example['img_raw'], out_type=tf.uint8)
    image = tf.reshape(image, shape=[12, 12, 3])
    # normalize the image data
    image = tf.cast(image, tf.float32) / 255.0
    label = parsed_example['label']
    label = tf.reshape(label, shape=[1])
    label = tf.cast(label, tf.float32)
    x1_offset = tf.reshape(parsed_example['x1_offset'], shape=[1])
    y1_offset = tf.reshape(parsed_example['y1_offset'], shape=[1])
    x2_offset = tf.reshape(parsed_example['x2_offset'], shape=[1])
    y2_offset = tf.reshape(parsed_example['y2_offset'], shape=[1])
    beta_det = tf.reshape(parsed_example['beta_det'], shape=[1])
    beta_bbox = tf.reshape(parsed_example['beta_bbox'], shape=[1])
    return image, label, x1_offset, y1_offset, x2_offset, y2_offset, beta_det, beta_bbox

dataset = tf.data.TFRecordDataset(filenames=[filename])
dataset = dataset.map(parse_tf)
dataset = dataset.batch(16).repeat(1)
iterator = dataset.make_one_shot_iterator()
next_element = iterator.get_next()

with tf.Session() as sess:
    img, label, x1_offset, y1_offset, x2_offset, y2_offset, beta_det, beta_bbox = sess.run(fetches=next_element)

