Generating TFRecords-format data and using the Dataset API to read TFRecords data


TFRecords is TensorFlow's built-in file format. It is a binary format with the following advantages:

    • Provides a unified framework for different input files
    • Makes better use of memory and is easier to copy and move (a TFRecord is a compressed binary file, serialized with protocol buffers)
    • Stores binary data and label (training category label) data in the same file

Storing other data as a TFRecords file requires two steps:

Build the TFRecord writer

In TensorFlow, the following statement creates a TFRecord writer:

tf.python_io.TFRecordWriter(path)

path: the path of the TFRecords file to create

Method:

    • write(record): writes one string record (that is, one serialized sample) to the file
    • close(): closes the file writer after all records have been written

Note: the string here is a serialized example, produced by example.SerializeToString(), which compresses the feature map in the example into binary form, saving a lot of space.

Construct the example module for each sample

The example module is defined as follows:

message Example {
    Features features = 1;
};
message Features {
    map<string, Feature> feature = 1;
};
message Feature {
    oneof kind {
        BytesList bytes_list = 1;
        FloatList float_list = 2;
        Int64List int64_list = 3;
    }
};

As you can see, an Example can include data in three formats: tf.int64, tf.float32, and binary (bytes) types.
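To make the three kinds concrete, here is a rough pure-Python sketch of how a value could be dispatched to one of the three list types. Plain dicts stand in for the real protobuf classes (tf.train.BytesList, tf.train.FloatList, tf.train.Int64List), and the helper name make_feature is hypothetical, not a TensorFlow API:

```python
# Hypothetical helper: picks one of the three Feature kinds based on the
# Python type of the value, mirroring the oneof in the Feature message.
# Plain dicts stand in for the real protobuf classes.
def make_feature(value):
    if isinstance(value, bytes):
        return {"bytes_list": {"value": [value]}}   # like tf.train.BytesList
    if isinstance(value, float):
        return {"float_list": {"value": [value]}}   # like tf.train.FloatList
    if isinstance(value, int):
        return {"int64_list": {"value": [value]}}   # like tf.train.Int64List
    raise TypeError("unsupported feature type: %r" % type(value))

# build a feature map for one sample, as tf.train.Features would hold it
features = {name: make_feature(v)
            for name, v in {"label": 1.0, "img_raw": b"\x00\x01", "count": 3}.items()}
```

This mirrors why the example code below uses float_list for the label and offsets but bytes_list for the raw image data.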

Features are saved in the form of key-value pairs. The sample code is as follows:

example = tf.train.Example(features=tf.train.Features(feature={
    'label': tf.train.Feature(float_list=tf.train.FloatList(value=[string[1]])),
    'img_raw': tf.train.Feature(bytes_list=tf.train.BytesList(value=[img_raw])),
    'x1_offset': tf.train.Feature(float_list=tf.train.FloatList(value=[string[2]])),
    'y1_offset': tf.train.Feature(float_list=tf.train.FloatList(value=[string[3]])),
    'x2_offset': tf.train.Feature(float_list=tf.train.FloatList(value=[string[4]])),
    'y2_offset': tf.train.Feature(float_list=tf.train.FloatList(value=[string[5]])),
    'beta_det': tf.train.Feature(float_list=tf.train.FloatList(value=[string[6]])),
    'beta_bbox': tf.train.Feature(float_list=tf.train.FloatList(value=[string[7]]))
}))

Once the example module is constructed, we can write the sample to the file:

writer.write(example.SerializeToString())

Do not forget to close the file writer after all the records have been written.
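The overall write workflow (open a writer, serialize one record per sample, write it, close the file) can be sketched without TensorFlow. The length-prefixed JSON format below is only a stand-in for the real TFRecord binary layout (which also stores CRC checksums), and the file name "demo.records" is just an illustration:

```python
import json
import struct

def write_records(path, samples):
    # Stand-in for tf.python_io.TFRecordWriter: one length-prefixed
    # serialized record per sample; closing the file is what flushes it.
    with open(path, "wb") as f:
        for sample in samples:
            data = json.dumps(sample).encode("utf-8")  # stand-in for SerializeToString()
            f.write(struct.pack("<Q", len(data)))      # 8-byte little-endian length
            f.write(data)

def read_records(path):
    # Read records back one at a time, using the length prefix as the frame.
    records = []
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:
                break
            (length,) = struct.unpack("<Q", header)
            records.append(json.loads(f.read(length).decode("utf-8")))
    return records

write_records("demo.records", [{"label": 1.0}, {"label": 0.0}])
```

The same framing idea (length prefix, then the serialized payload) is why a TFRecords file can be streamed record by record without loading it all into memory.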

Second, after creating our own TFRecords file, we can use it for training. TensorFlow provides the Dataset API to make it easy to consume TFRecords files.

First, we define a function to parse the TFRecords data, which converts each binary record into tensors. The sample code is as follows:

def parse_tf(example_proto):
    # define the parsing dictionary
    dics = {
        'label': tf.FixedLenFeature([], tf.float32),
        'img_raw': tf.FixedLenFeature([], tf.string),
        'x1_offset': tf.FixedLenFeature([], tf.float32),
        'y1_offset': tf.FixedLenFeature([], tf.float32),
        'x2_offset': tf.FixedLenFeature([], tf.float32),
        'y2_offset': tf.FixedLenFeature([], tf.float32),
        'beta_det': tf.FixedLenFeature([], tf.float32),
        'beta_bbox': tf.FixedLenFeature([], tf.float32)
    }
    # call the parsing interface on one serialized sample
    parsed_example = tf.parse_single_example(serialized=example_proto, features=dics)
    image = tf.decode_raw(parsed_example['img_raw'], out_type=tf.uint8)
    image = tf.reshape(image, shape=[12, 12, 3])
    # normalize the image data
    image = tf.cast(image, tf.float32) / 255.0
    label = parsed_example['label']
    label = tf.reshape(label, shape=[1])
    label = tf.cast(label, tf.float32)
    x1_offset = tf.reshape(parsed_example['x1_offset'], shape=[1])
    y1_offset = tf.reshape(parsed_example['y1_offset'], shape=[1])
    x2_offset = tf.reshape(parsed_example['x2_offset'], shape=[1])
    y2_offset = tf.reshape(parsed_example['y2_offset'], shape=[1])
    beta_det = tf.reshape(parsed_example['beta_det'], shape=[1])
    beta_bbox = tf.reshape(parsed_example['beta_bbox'], shape=[1])
    return image, label, x1_offset, y1_offset, x2_offset, y2_offset, beta_det, beta_bbox

Next, we read the TFRecords file using tf.data.TFRecordDataset(filenames).

A Dataset becomes a new Dataset through transformations. A series of transformations usually handles converting the data, shuffling it, composing batches, generating epochs, and so on.

Common transformations are: map, batch, shuffle, and repeat.

Map

  map receives a function; each element of the dataset is passed to that function as input, and the function's return values form the new dataset.

Batch

batch combines multiple consecutive elements into one batch.

Repeat

repeat repeats the entire sequence several times, primarily to handle epochs in machine learning. Assuming the original data is one epoch, repeat(5) turns it into 5 epochs.

Shuffle

shuffle shuffles the elements in the dataset; its buffer_size parameter indicates the size of the buffer used when shuffling.
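Conceptually, the four transformations behave like the following list operations on a plain Python sequence. This is only a rough analogy, not the Dataset implementation: real Dataset transformations build a graph and evaluate lazily, and Dataset.shuffle() streams through a fixed-size buffer rather than shuffling the whole sequence. The my_* helper names are hypothetical:

```python
import random

def my_map(fn, data):
    # apply fn to every element, like Dataset.map(fn)
    return [fn(x) for x in data]

def my_batch(data, batch_size):
    # group consecutive elements into batches, like Dataset.batch(batch_size)
    return [data[i:i + batch_size] for i in range(0, len(data), batch_size)]

def my_repeat(data, count):
    # repeat the whole sequence `count` times, one copy per epoch
    return data * count

def my_shuffle(data, seed=None):
    # unlike Dataset.shuffle(buffer_size), this shuffles everything at once
    out = list(data)
    random.Random(seed).shuffle(out)
    return out

# chain the transformations the way a Dataset pipeline does
pipeline = my_batch(my_repeat(my_map(lambda x: x * 2, [1, 2, 3]), 2), 4)
# pipeline == [[2, 4, 6, 2], [4, 6]]
```

Note that order matters: batching after repeat lets a batch span the epoch boundary, which is also what happens with a real Dataset when repeat() comes before batch().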

Example code:

dataset = tf.data.TFRecordDataset(filenames=[filename])
dataset = dataset.map(parse_tf)
dataset = dataset.batch(16).repeat(1)  # use the whole sequence only once, with 16 samples per batch

Now that the batches are ready, how do we take them out for training? The answer is to use an iterator; in TensorFlow the statement is as follows:

iterator = dataset.make_one_shot_iterator()

The so-called one-shot means the iterator can only read from beginning to end once. So how do we take a different batch in each training round? The iterator's get_next() method does this. Note that get_next() only returns a tensor, not a concrete value; to use the value during training, we must fetch it in a session.
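The one-shot behaviour is essentially that of a plain Python iterator: each call yields the next batch until the data is exhausted, after which the iterator cannot be reset. A rough analogy (the sample values are made up for illustration):

```python
# a stand-in for a dataset that yields two batches
batches = iter([["sample1", "sample2"], ["sample3", "sample4"]])

# like calling iterator.get_next() once per training step
first = next(batches)
second = next(batches)

# a one-shot iterator signals the end of the data and cannot be rewound,
# much like tf.errors.OutOfRangeError at the end of a Dataset
exhausted = False
try:
    next(batches)
except StopIteration:
    exhausted = True
```

The difference in real TensorFlow 1.x code is that get_next() returns tensors, so the "next()" step actually happens inside sess.run(), as the complete example below shows.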

The complete code for reading the TFRecords file using the Dataset API is as follows:

import tensorflow as tf

def parse_tf(example_proto):
    # define the parsing dictionary
    dics = {
        'label': tf.FixedLenFeature([], tf.float32),
        'img_raw': tf.FixedLenFeature([], tf.string),
        'x1_offset': tf.FixedLenFeature([], tf.float32),
        'y1_offset': tf.FixedLenFeature([], tf.float32),
        'x2_offset': tf.FixedLenFeature([], tf.float32),
        'y2_offset': tf.FixedLenFeature([], tf.float32),
        'beta_det': tf.FixedLenFeature([], tf.float32),
        'beta_bbox': tf.FixedLenFeature([], tf.float32)
    }
    # call the parsing interface on one serialized sample
    parsed_example = tf.parse_single_example(serialized=example_proto, features=dics)
    image = tf.decode_raw(parsed_example['img_raw'], out_type=tf.uint8)
    image = tf.reshape(image, shape=[12, 12, 3])
    # normalize the image data
    image = tf.cast(image, tf.float32) / 255.0
    label = parsed_example['label']
    label = tf.reshape(label, shape=[1])
    label = tf.cast(label, tf.float32)
    x1_offset = tf.reshape(parsed_example['x1_offset'], shape=[1])
    y1_offset = tf.reshape(parsed_example['y1_offset'], shape=[1])
    x2_offset = tf.reshape(parsed_example['x2_offset'], shape=[1])
    y2_offset = tf.reshape(parsed_example['y2_offset'], shape=[1])
    beta_det = tf.reshape(parsed_example['beta_det'], shape=[1])
    beta_bbox = tf.reshape(parsed_example['beta_bbox'], shape=[1])
    return image, label, x1_offset, y1_offset, x2_offset, y2_offset, beta_det, beta_bbox

dataset = tf.data.TFRecordDataset(filenames=[filename])
dataset = dataset.map(parse_tf)
dataset = dataset.batch(16).repeat(1)
iterator = dataset.make_one_shot_iterator()
next_element = iterator.get_next()

with tf.Session() as sess:
    img, label, x1_offset, y1_offset, x2_offset, y2_offset, beta_det, beta_bbox = sess.run(fetches=next_element)

