fluid.data_feeder

DataFeeder

class paddle.fluid.data_feeder.DataFeeder(feed_list, place, program=None)

DataFeeder converts the data returned by a reader into a data structure that can be fed into Executor and ParallelExecutor. The reader usually returns a list of mini-batch data entries. Each entry in the list is one sample, and each sample is a list or tuple containing one or more features.
Simple usage is shown below:
import paddle.fluid as fluid

place = fluid.CPUPlace()
img = fluid.layers.data(name='image', shape=[1, 28, 28])
label = fluid.layers.data(name='label', shape=[1], dtype='int64')
feeder = fluid.DataFeeder([img, label], fluid.CPUPlace())
result = feeder.feed([([0] * 784, [9]), ([1] * 784, [1])])
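The conversion that feed performs can be sketched in plain numpy. The feed_sketch helper below is hypothetical (not part of the fluid API); it assumes every sample is a tuple whose i-th feature corresponds to the i-th entry in feed_list, and stacks each feature across samples into one batched array keyed by variable name.

```python
import numpy as np

def feed_sketch(feed_names, samples):
    # Hypothetical helper (not part of the fluid API): illustrates the
    # conversion feed performs, assuming each sample is a tuple whose
    # i-th feature belongs to the i-th name in feed_list.
    batched = {}
    for i, name in enumerate(feed_names):
        batched[name] = np.stack([np.asarray(s[i]) for s in samples])
    return batched

result = feed_sketch(['image', 'label'],
                     [([0] * 784, [9]), ([1] * 784, [1])])
print(result['image'].shape)  # (2, 784)
print(result['label'].shape)  # (2, 1)
```

Two samples with a 784-element feature and a 1-element label become a dict of two batched arrays, one per fed variable.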
If you want to feed data to each GPU separately in advance when training a model on multiple GPUs, you can use the decorate_reader function.
import paddle
import paddle.fluid as fluid

place = fluid.CUDAPlace(0)
data = fluid.layers.data(name='data', shape=[3, 224, 224], dtype='float32')
label = fluid.layers.data(name='label', shape=[1], dtype='int64')
feeder = fluid.DataFeeder(place=place, feed_list=[data, label])
reader = feeder.decorate_reader(
    paddle.batch(paddle.dataset.flowers.train(), batch_size=16),
    multi_devices=False)
- Parameters
feed_list (list) – The Variables, or the names of Variables, that will be fed into the model.
place (Place) – Indicates whether to feed the data into CPU or GPU memory. If you want to feed data to the GPU, use fluid.CUDAPlace(i) (where i is the GPU id); if you want to feed data to the CPU, use fluid.CPUPlace().
program (Program) – The Program that the data will be fed into. If program is None, default_main_program() is used. Default: None.
- Raises
ValueError
– If some Variable is not in this Program.
Examples
import numpy as np
import paddle
import paddle.fluid as fluid

place = fluid.CPUPlace()

def reader():
    yield [np.random.random([4]).astype('float32'),
           np.random.random([3]).astype('float32')],

main_program = fluid.Program()
startup_program = fluid.Program()
with fluid.program_guard(main_program, startup_program):
    data_1 = fluid.layers.data(name='data_1', shape=[1, 2, 2])
    data_2 = fluid.layers.data(name='data_2', shape=[1, 1, 3])
    out = fluid.layers.fc(input=[data_1, data_2], size=2)
    # ...

feeder = fluid.DataFeeder([data_1, data_2], place)
exe = fluid.Executor(place)
exe.run(startup_program)
for data in reader():
    outs = exe.run(program=main_program,
                   feed=feeder.feed(data),
                   fetch_list=[out])
feed(iterable)

Converts the input, according to feed_list and iterable, into a data structure that can be fed into Executor and ParallelExecutor.
- Parameters
iterable (list|tuple) – the input data.
- Returns
the result of conversion.
- Return type
dict
Examples
import numpy.random as random
import paddle.fluid as fluid

def reader(limit=5):
    for i in range(limit):
        yield (random.random([784]).astype('float32'),
               random.random([1]).astype('int64'),
               random.random([256]).astype('float32'))

data_1 = fluid.layers.data(name='data_1', shape=[1, 28, 28])
data_2 = fluid.layers.data(name='data_2', shape=[1], dtype='int64')
data_3 = fluid.layers.data(name='data_3', shape=[16, 16], dtype='float32')
feeder = fluid.DataFeeder(['data_1', 'data_2', 'data_3'], fluid.CPUPlace())
result = feeder.feed(reader())
feed_parallel(iterable, num_places=None)

Takes multiple mini-batches. Each mini-batch will be fed to a device in advance.
- Parameters
iterable (list|tuple) – the input data.
num_places (int) – the number of devices. Default None.
- Returns
the result of conversion.
- Return type
dict
Notes
The number of devices and the number of mini-batches must be the same.
Examples
import numpy.random as random
import paddle.fluid as fluid

def reader(limit=10):
    for i in range(limit):
        yield [random.random([784]).astype('float32'),
               random.randint(10)],

x = fluid.layers.data(name='x', shape=[1, 28, 28])
y = fluid.layers.data(name='y', shape=[1], dtype='int64')
feeder = fluid.DataFeeder(['x', 'y'], fluid.CPUPlace())

place_num = 2
places = [fluid.CPUPlace() for x in range(place_num)]
data = []
exe = fluid.Executor(fluid.CPUPlace())
exe.run(fluid.default_startup_program())
program = fluid.CompiledProgram(
    fluid.default_main_program()).with_data_parallel(places=places)
for item in reader():
    data.append(item)
    if place_num == len(data):
        exe.run(program=program,
                feed=list(feeder.feed_parallel(data, place_num)),
                fetch_list=[])
        data = []
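The constraint in the note above can be sketched in plain Python. The feed_parallel_sketch helper is hypothetical (not the fluid implementation); it pairs the i-th mini-batch with the i-th device, which is why the two counts must match.

```python
def feed_parallel_sketch(mini_batches, num_places):
    # Hypothetical sketch (not the fluid implementation): pair each
    # mini-batch with exactly one device; mismatched counts are an error.
    if len(mini_batches) != num_places:
        raise ValueError("got %d mini-batches for %d devices"
                         % (len(mini_batches), num_places))
    return [(device_id, batch)
            for device_id, batch in enumerate(mini_batches)]

assignments = feed_parallel_sketch([['batch0'], ['batch1']], num_places=2)
print(assignments)  # [(0, ['batch0']), (1, ['batch1'])]
```

Passing one mini-batch for two devices raises a ValueError, mirroring the requirement stated in the note.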
decorate_reader(reader, multi_devices, num_places=None, drop_last=True)

Converts the data returned by reader into multiple mini-batches. Each mini-batch will be fed to a device.
- Parameters
reader (function) – a function that generates the data.
multi_devices (bool) – whether to use multiple devices.
num_places (int) – if multi_devices is True, the number of GPUs to use; if num_places is None, all GPUs of the current machine are used. Default: None.
drop_last (bool) – whether to drop the last batch if its size is less than batch_size. Default: True.
- Returns
the decorated reader.
- Return type
function
- Raises
ValueError
– If drop_last is False and the last batch cannot be evenly distributed across the devices.
Examples
import numpy.random as random
import paddle
import paddle.fluid as fluid

def reader(limit=5):
    for i in range(limit):
        yield (random.random([784]).astype('float32'),
               random.random([1]).astype('int64')),

place = fluid.CUDAPlace(0)
data = fluid.layers.data(name='data', shape=[1, 28, 28], dtype='float32')
label = fluid.layers.data(name='label', shape=[1], dtype='int64')
feeder = fluid.DataFeeder(place=place, feed_list=[data, label])
reader = feeder.decorate_reader(reader, multi_devices=False)

exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
for data in reader():
    exe.run(feed=data)
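The drop_last behaviour described in the parameters above can be sketched without paddle. The batch_sketch helper is hypothetical (not the fluid implementation); it collects samples into fixed-size mini-batches and either drops or keeps a short final batch.

```python
def batch_sketch(samples, batch_size, drop_last=True):
    # Hypothetical sketch of mini-batching with drop_last (not the
    # fluid implementation): emit full batches, and keep the short
    # tail only when drop_last is False.
    buf = []
    for sample in samples:
        buf.append(sample)
        if len(buf) == batch_size:
            yield buf
            buf = []
    if buf and not drop_last:
        yield buf

print(len(list(batch_sketch(range(10), 4, drop_last=True))))   # 2
print(len(list(batch_sketch(range(10), 4, drop_last=False))))  # 3
```

With ten samples and batch_size=4, drop_last=True discards the two-sample tail; drop_last=False yields it as a third, short batch — the case that decorate_reader cannot split evenly across devices.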