Data objects in the TensorFlow dataset

Source: Internet
Author: User
Tags: generator, random seed, scalar, shuffle


Basic concepts

The official TensorFlow documentation describes the Dataset object as follows:

A Dataset represents a collection of input-pipeline elements (a nested structure of tensors) together with a "logical plan" of the transformations that act on those elements. An element of a dataset can be a single tensor, a tuple, or a dictionary.
In addition, a Dataset is used together with the Iterator class: an Iterator object iterates over the elements of the dataset.

To see a simple example:

import tensorflow as tf

# Create a Dataset object
dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4, 5, 6, 7, 8, 9])
# Create an iterator
iterator = dataset.make_one_shot_iterator()
# get_next() fetches the next element from the iterator
element = iterator.get_next()
# Traverse the iterator to get all the elements
with tf.Session() as sess:
    for i in range(9):
        print(sess.run(element))

The above code prints: 1 2 3 4 5 6 7 8 9


DataSet method


1.from_tensor_slices

from_tensor_slices creates a dataset whose elements are slices of the given tensors.

function form: from_tensor_slices(tensors)

Parameter tensors: a nested structure of tensors, each of which has the same size in the 0th dimension.

Specific examples

# Create a dataset of slices
dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4, 5, 6, 7, 8, 9])
# Create an iterator
iterator = dataset.make_one_shot_iterator()
# get_next() fetches the next element from the iterator
element = iterator.get_next()
# Fetch the first three elements
with tf.Session() as sess:
    for i in range(3):
        print(sess.run(element))

The above code prints: 1 2 3


2.from_tensors

Creates a dataset that contains a single element of the given tensor.

function form: from_tensors(tensors)

Parameter tensors: a nested structure of tensors.

Specific examples

dataset = tf.data.Dataset.from_tensors([1, 2, 3, 4, 5, 6, 7, 8, 9])
iterator = dataset.make_one_shot_iterator()
element = iterator.get_next()
with tf.Session() as sess:
    for i in range(1):
        print(sess.run(element))

The above code prints: [1 2 3 4 5 6 7 8 9]
That is, from_tensors treats the input tensors as a single element, while from_tensor_slices produces one element per slice of the input.


3.from_generator

Creates a dataset whose elements are produced by a generator.

function form: from_generator(generator, output_types, output_shapes=None, args=None)

Parameter generator: a callable object that returns an object supporting the iter() protocol. If args is not specified, generator must take no arguments; otherwise it must take as many arguments as there are values in args.
Parameter output_types: a nested structure of tf.DType objects corresponding to each component of an element yielded by the generator.
Parameter output_shapes: (optional) a nested structure of tf.TensorShape objects corresponding to each component of an element yielded by the generator.
Parameter args: (optional) a tuple of tf.Tensor objects that will be evaluated and passed to the generator as NumPy-array arguments.

Specific examples

import numpy as np

# Define a generator
def data_generator():
    dataset = np.array(range(9))
    for i in dataset:
        yield i

# Build a Dataset from the generator
dataset = tf.data.Dataset.from_generator(data_generator, (tf.int32))
iterator = dataset.make_one_shot_iterator()
element = iterator.get_next()
with tf.Session() as sess:
    for i in range(3):
        print(sess.run(element))

The above code prints: 0 1 2


4.batch

batch combines consecutive elements of a dataset into batches.

function form: batch(batch_size, drop_remainder=False)

Parameter batch_size: the number of consecutive elements of this dataset to combine into a single batch.
Parameter drop_remainder: whether the last batch should be dropped if it has fewer than batch_size elements; the default is to keep it.

Specific examples:

# Create a Dataset object
dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4, 5, 6, 7, 8, 9])
# Combine the elements into batches of 3
dataset = dataset.batch(3)
# Create an iterator
iterator = dataset.make_one_shot_iterator()
# get_next() fetches the next element from the iterator
element = iterator.get_next()
# Traverse the iterator; there are only 3 batches
with tf.Session() as sess:
    for i in range(3):
        print(sess.run(element))

The above code prints:
[1 2 3]
[4 5 6]
[7 8 9]

That is, the elements are combined into batches of 3, and the return value is a new Dataset object.


5.concatenate

concatenate merges two Dataset objects by appending one to the other.

function form: concatenate(dataset)

Parameter dataset: the Dataset object to append.

Specific examples:

# Create two Dataset objects
dataset_a = tf.data.Dataset.from_tensor_slices([1, 2, 3])
dataset_b = tf.data.Dataset.from_tensor_slices([4, 5, 6])
# Concatenate the datasets
concat_dataset = dataset_a.concatenate(dataset_b)
iterator = concat_dataset.make_one_shot_iterator()
element = iterator.get_next()
with tf.Session() as sess:
    for i in range(6):
        print(sess.run(element))

The above code prints: 1 2 3 4 5 6


6.filter

filter applies conditional filtering to the elements of a dataset.

function form: filter(predicate)

Parameter predicate: the condition function; elements for which it returns True are kept.

Specific examples

dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4, 5, 6, 7, 8, 9])
# Filter the elements of the dataset by a condition
dataset = dataset.filter(lambda x: x > 3)
iterator = dataset.make_one_shot_iterator()
element = iterator.get_next()
with tf.Session() as sess:
    for i in range(6):
        print(sess.run(element))

The above code prints: 4 5 6 7 8 9


7.map

map applies the function map_func to each element of a dataset.

function form: map(map_func, num_parallel_calls=None)

Parameter map_func: the mapping function.
Parameter num_parallel_calls: the number of elements to process in parallel. If not specified, elements are processed sequentially.

Specific examples

dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4, 5, 6, 7, 8, 9])
# Apply the map operation
dataset = dataset.map(lambda x: x + 1)
iterator = dataset.make_one_shot_iterator()
element = iterator.get_next()
with tf.Session() as sess:
    for i in range(6):
        print(sess.run(element))

The above code prints: 2 3 4 5 6 7


8.flat_map

flat_map also maps map_func over a dataset, but unlike map, map_func must return a Dataset, and the resulting datasets are flattened into one.

function form: flat_map(map_func)

Parameter map_func: the mapping function, which must return a Dataset.

Specific examples

dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4, 5, 6, 7, 8, 9])
# Apply the flat_map operation; map_func must return a Dataset
dataset = dataset.flat_map(lambda x: tf.data.Dataset.from_tensor_slices(x + [1]))
iterator = dataset.make_one_shot_iterator()
element = iterator.get_next()
with tf.Session() as sess:
    for i in range(6):
        print(sess.run(element))

The above code prints: 2 3 4 5 6 7


9.make_one_shot_iterator

Creates an Iterator for enumerating the elements of this dataset. (It is initialized automatically.)

function form: make_one_shot_iterator()

Specific examples

dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4, 5, 6, 7, 8, 9])
iterator = dataset.make_one_shot_iterator()
element = iterator.get_next()
with tf.Session() as sess:
    for i in range(6):
        print(sess.run(element))


10.make_initializable_iterator

Creates an Iterator for enumerating the elements of this dataset. (The iterator must be explicitly initialized before use.)

function form: make_initializable_iterator(shared_name=None)

Parameter shared_name: (optional) if non-empty, the returned iterator will be shared under the given name across multiple sessions that use the same devices (e.g., when using a remote server).

Specific examples

dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4, 5, 6, 7, 8, 9])
iterator = dataset.make_initializable_iterator()
element = iterator.get_next()
with tf.Session() as sess:
    # Initialize the iterator before fetching elements
    sess.run(iterator.initializer)
    for i in range(5):
        print(sess.run(element))


11.padded_batch

padded_batch combines consecutive elements of a dataset into padded batches; like batch, it combines multiple consecutive elements of the input dataset into a single element.

function form: padded_batch(batch_size, padded_shapes, padding_values=None, drop_remainder=False)

Parameter batch_size: the number of consecutive elements of this dataset to combine into a single batch.
Parameter padded_shapes: a nested structure of tf.TensorShape objects or vector-like tf.int64 tensors, representing the shape to which the corresponding component of each input element should be padded before batching. Any unknown dimension (e.g., tf.Dimension(None) in a tf.TensorShape, or -1 in a tensor-like object) will be padded to the maximum size of that dimension within each batch.
Parameter padding_values: (optional) a nested structure of scalar tf.Tensor values representing the padding value to use for each component. Defaults to 0 for numeric types and the empty string for string types.
Parameter drop_remainder: (optional) a tf.bool scalar indicating whether the last batch should be dropped if it has fewer than batch_size elements; the default is to keep smaller batches.

Specific examples

dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4, 5, 6, 7, 8, 9])
dataset = dataset.padded_batch(2, padded_shapes=[])
iterator = dataset.make_one_shot_iterator()
element = iterator.get_next()
with tf.Session() as sess:
    for i in range(5):
        print(sess.run(element))

The above code prints:
[1 2]
[3 4]
[5 6]
[7 8]
[9]
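Because the elements above are scalars, nothing is actually padded. Padding matters for variable-length elements: each component is padded to the longest element in its batch, with 0 by default. That behavior can be sketched in plain Python (an illustration of the semantics, not TensorFlow code; the helper name padded_batch is ours):

```python
def padded_batch(elements, batch_size, padding_value=0):
    """Group 1-D elements into batches, padding each element in a
    batch to the length of the longest element in that batch."""
    batches = []
    for start in range(0, len(elements), batch_size):
        batch = elements[start:start + batch_size]
        max_len = max(len(e) for e in batch)
        # Right-pad every element up to the batch's maximum length
        batches.append([e + [padding_value] * (max_len - len(e)) for e in batch])
    return batches

# Variable-length rows, batched two at a time:
print(padded_batch([[1], [2, 3], [4, 5, 6], [7]], batch_size=2))
# [[[1, 0], [2, 3]], [[4, 5, 6], [7, 0, 0]]]
```

In TensorFlow itself, the equivalent for 1-D elements would be padded_batch(2, padded_shapes=[None]), where None marks the dimension to pad.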


12.repeat

Repeats this dataset count times.

function form: repeat(count=None)

Parameter count: (optional) the number of times the dataset should be repeated. The default behavior (if count is None or -1) is to repeat the dataset indefinitely.

Specific examples

dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4, 5, 6, 7, 8, 9])
# Repeat the dataset indefinitely
dataset = dataset.repeat()
iterator = dataset.make_one_shot_iterator()
element = iterator.get_next()
with tf.Session() as sess:
    for i in range(5):
        print(sess.run(element))

The above code prints: 1 2 3 4 5


13.shard

Splits a dataset into num_shards sub-datasets. This function is useful in distributed training, where it lets each worker read a unique subset.

function form: shard(num_shards, index)

Parameter num_shards: the number of shards operating in parallel.
Parameter index: the index of the current worker.
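shard keeps every element whose position i in the dataset satisfies i % num_shards == index, so each worker sees a disjoint subset. The selection rule can be sketched in plain Python (an illustration of the semantics, not TensorFlow code; the helper name shard is ours):

```python
def shard(elements, num_shards, index):
    """Keep the elements whose position modulo num_shards equals index."""
    return [e for i, e in enumerate(elements) if i % num_shards == index]

data = list(range(10))
print(shard(data, num_shards=3, index=0))  # [0, 3, 6, 9]
print(shard(data, num_shards=3, index=1))  # [1, 4, 7]
print(shard(data, num_shards=3, index=2))  # [2, 5, 8]
```

Note that the three shards together cover every element exactly once, which is what makes the method safe for distributing one dataset across workers.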


14.shuffle

Randomly shuffles the elements of a dataset.

function form: shuffle(buffer_size, seed=None, reshuffle_each_iteration=None)

Parameter buffer_size: the number of elements from this dataset from which the new dataset will sample.
Parameter seed: (optional) the random seed used to create the distribution.
Parameter reshuffle_each_iteration: (optional) a boolean; if True, the dataset is pseudo-randomly reshuffled each time it is iterated over. (Defaults to True.)

Specific examples

dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4, 5, 6, 7, 8, 9])
# Randomly shuffle the data through a 3-element buffer
dataset = dataset.shuffle(3)
iterator = dataset.make_one_shot_iterator()
element = iterator.get_next()
with tf.Session() as sess:
    for i in range(3):
        print(sess.run(element))

One possible output is: 3 2 4 (the order is random and varies between runs).
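shuffle does not reorder the whole dataset at once: it keeps a buffer of up to buffer_size elements, emits a random element from the buffer, and refills the buffer from the input. A small buffer therefore produces only a "local" shuffle. That buffered sampling can be sketched in plain Python (an illustration of the semantics, not TensorFlow code; the helper name buffered_shuffle is ours):

```python
import random

def buffered_shuffle(elements, buffer_size, seed=None):
    """Emit elements by sampling uniformly from a sliding buffer."""
    rng = random.Random(seed)
    buffer, out = [], []
    for e in elements:
        buffer.append(e)
        if len(buffer) >= buffer_size:
            # Emit a random element once the buffer is full
            out.append(buffer.pop(rng.randrange(len(buffer))))
    while buffer:  # drain whatever remains in the buffer
        out.append(buffer.pop(rng.randrange(len(buffer))))
    return out

shuffled = buffered_shuffle(range(1, 10), buffer_size=3, seed=0)
print(shuffled)  # a permutation of 1..9; the order depends on the seed
```

With buffer_size=1 the output equals the input order, and a buffer at least as large as the dataset gives a full uniform shuffle, which is why buffer_size should generally be large relative to the dataset.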


15.skip

Creates a dataset that skips the first count elements.

function form: skip(count)

Parameter count: the number of elements of this dataset to skip when forming the new dataset. If count is greater than the size of this dataset, the new dataset contains no elements. If count is -1, the entire dataset is skipped.

Specific examples

dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4, 5, 6, 7, 8, 9])
# Skip the first 5 elements
dataset = dataset.skip(5)
iterator = dataset.make_one_shot_iterator()
element = iterator.get_next()
with tf.Session() as sess:
    for i in range(4):
        print(sess.run(element))

The above code prints: 6 7 8 9


16.take

Creates a dataset from the first count elements of this dataset.

function form: take(count)

Parameter count: the number of elements of this dataset to use to form the new dataset. If count is -1, or count is greater than the size of this dataset, the new dataset contains all the elements of this dataset.

Specific examples

dataset = tf.data.Dataset.from_tensor_slices([1, 2, 2, 3, 4, 5, 6, 7, 8, 9])
# Take the first 5 elements to form a new dataset
dataset = dataset.take(5)
iterator = dataset.make_one_shot_iterator()
element = iterator.get_next()
with tf.Session() as sess:
    for i in range(5):
        print(sess.run(element))

The above code prints: 1 2 2 3 4


17.zip

Zips the given datasets together.

Function form: zip(datasets)

Parameter datasets: a nested structure of datasets.

Specific examples

dataset_a = tf.data.Dataset.from_tensor_slices([1, 2, 3])
dataset_b = tf.data.Dataset.from_tensor_slices([2, 6, 8])
zip_dataset = tf.data.Dataset.zip((dataset_a, dataset_b))
iterator = zip_dataset.make_one_shot_iterator()
element = iterator.get_next()
with tf.Session() as sess:
    for i in range(3):
        print(sess.run(element))

The above code runs the result:
(1, 2)
(2, 6)
(3, 8)

Most of the common Dataset methods are covered here; used well, they can play a big role in building input pipelines for modeling.



See TensorFlow official documentation for more information
