fluid.data_feeder

DataFeeder

class paddle.fluid.data_feeder.DataFeeder(feed_list, place, program=None)

DataFeeder converts the data returned by a reader into a data structure that can be fed into Executor and ParallelExecutor. The reader usually returns a list of mini-batch data entries. Each entry in the list is one sample, and each sample is a list or tuple containing one or more features.
Simple usage is shown below:
import paddle.fluid as fluid

place = fluid.CPUPlace()
img = fluid.layers.data(name='image', shape=[1, 28, 28])
label = fluid.layers.data(name='label', shape=[1], dtype='int64')
feeder = fluid.DataFeeder([img, label], fluid.CPUPlace())
result = feeder.feed([([0] * 784, [9]), ([1] * 784, [1])])
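The conversion that feed performs can be sketched in plain numpy. The feed_sketch helper below is hypothetical (not part of the fluid API); it assumes every sample is a tuple whose i-th feature corresponds to the i-th entry in feed_list, and stacks each feature across samples into one batched array keyed by variable name.

```python
import numpy as np

def feed_sketch(feed_names, samples):
    # Hypothetical helper (not part of the fluid API): illustrates the
    # conversion feed performs, assuming each sample is a tuple whose
    # i-th feature belongs to the i-th name in feed_list.
    batched = {}
    for i, name in enumerate(feed_names):
        batched[name] = np.stack([np.asarray(s[i]) for s in samples])
    return batched

result = feed_sketch(['image', 'label'],
                     [([0] * 784, [9]), ([1] * 784, [1])])
print(result['image'].shape)  # (2, 784)
print(result['label'].shape)  # (2, 1)
```

Two samples with a 784-element feature and a 1-element label become a dict of two batched arrays, one per fed variable.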
If you want to feed data to each GPU separately in advance when training a model on multiple GPUs, you can use the decorate_reader function.
import paddle
import paddle.fluid as fluid

place = fluid.CUDAPlace(0)
data = fluid.layers.data(name='data', shape=[3, 224, 224], dtype='float32')
label = fluid.layers.data(name='label', shape=[1], dtype='int64')
feeder = fluid.DataFeeder(place=place, feed_list=[data, label])
reader = feeder.decorate_reader(
    paddle.batch(paddle.dataset.flowers.train(), batch_size=16),
    multi_devices=False)
- Parameters
feed_list (list) – The Variables, or the names of Variables, that will be fed into the model.
place (Place) – Indicates whether to feed the data into CPU or GPU memory. If you want to feed data to the GPU, use fluid.CUDAPlace(i) (where i is the GPU id); if you want to feed data to the CPU, use fluid.CPUPlace().
program (Program) – The Program that the data will be fed into. If program is None, default_main_program() is used. Default: None.
- Raises
ValueError
– If some Variable is not in this Program.
Examples
import numpy as np
import paddle
import paddle.fluid as fluid

place = fluid.CPUPlace()

def reader():
    yield [np.random.random([4]).astype('float32'),
           np.random.random([3]).astype('float32')],

main_program = fluid.Program()
startup_program = fluid.Program()
with fluid.program_guard(main_program, startup_program):
    data_1 = fluid.layers.data(name='data_1', shape=[1, 2, 2])
    data_2 = fluid.layers.data(name='data_2', shape=[1, 1, 3])
    out = fluid.layers.fc(input=[data_1, data_2], size=2)
    # ...

feeder = fluid.DataFeeder([data_1, data_2], place)
exe = fluid.Executor(place)
exe.run(startup_program)
for data in reader():
    outs = exe.run(program=main_program,
                   feed=feeder.feed(data),
                   fetch_list=[out])
feed(iterable)

Converts the input, according to feed_list and iterable, into a data structure that can be fed into Executor and ParallelExecutor.
- Parameters
iterable (list|tuple) – the input data.
- Returns
the result of conversion.
- Return type
dict
Examples
import numpy.random as random
import paddle.fluid as fluid

def reader(limit=5):
    for i in range(limit):
        yield (random.random([784]).astype('float32'),
               random.random([1]).astype('int64'),
               random.random([256]).astype('float32'))

data_1 = fluid.layers.data(name='data_1', shape=[1, 28, 28])
data_2 = fluid.layers.data(name='data_2', shape=[1], dtype='int64')
data_3 = fluid.layers.data(name='data_3', shape=[16, 16], dtype='float32')
feeder = fluid.DataFeeder(['data_1', 'data_2', 'data_3'], fluid.CPUPlace())
result = feeder.feed(reader())
feed_parallel(iterable, num_places=None)

Takes multiple mini-batches. Each mini-batch will be fed to a device in advance.
- Parameters
iterable (list|tuple) – the input data.
num_places (int) – the number of devices. Default None.
- Returns
the result of conversion.
- Return type
dict
Notes
The number of devices and the number of mini-batches must be the same.
Examples
import numpy.random as random
import paddle.fluid as fluid

def reader(limit=10):
    for i in range(limit):
        yield [random.random([784]).astype('float32'),
               random.randint(10)],

x = fluid.layers.data(name='x', shape=[1, 28, 28])
y = fluid.layers.data(name='y', shape=[1], dtype='int64')
feeder = fluid.DataFeeder(['x', 'y'], fluid.CPUPlace())

place_num = 2
places = [fluid.CPUPlace() for x in range(place_num)]
data = []
exe = fluid.Executor(fluid.CPUPlace())
exe.run(fluid.default_startup_program())
program = fluid.CompiledProgram(
    fluid.default_main_program()).with_data_parallel(places=places)
for item in reader():
    data.append(item)
    if place_num == len(data):
        exe.run(program=program,
                feed=list(feeder.feed_parallel(data, place_num)),
                fetch_list=[])
        data = []
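The constraint in the note above can be sketched in plain Python. The feed_parallel_sketch helper is hypothetical (not the fluid implementation); it pairs the i-th mini-batch with the i-th device, which is why the two counts must match.

```python
def feed_parallel_sketch(mini_batches, num_places):
    # Hypothetical sketch (not the fluid implementation): pair each
    # mini-batch with exactly one device; mismatched counts are an error.
    if len(mini_batches) != num_places:
        raise ValueError("got %d mini-batches for %d devices"
                         % (len(mini_batches), num_places))
    return [(device_id, batch)
            for device_id, batch in enumerate(mini_batches)]

assignments = feed_parallel_sketch([['batch0'], ['batch1']], num_places=2)
print(assignments)  # [(0, ['batch0']), (1, ['batch1'])]
```

Passing one mini-batch for two devices raises a ValueError, mirroring the requirement stated in the note.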
decorate_reader(reader, multi_devices, num_places=None, drop_last=True)

Converts the data returned by reader into multiple mini-batches. Each mini-batch will be fed to a device.
- Parameters
reader (function) – a function that generates the data.
multi_devices (bool) – whether to use multiple devices.
num_places (int) – if multi_devices is True, the number of GPUs to use; if num_places is None, all GPUs of the current machine are used. Default: None.
drop_last (bool) – whether to drop the last batch if its size is less than batch_size. Default: True.
- Returns
the decorated reader.
- Return type
function
- Raises
ValueError
– If drop_last is False and the last batch cannot be evenly distributed across the devices.
Examples
import numpy.random as random
import paddle
import paddle.fluid as fluid

def reader(limit=5):
    for i in range(limit):
        yield (random.random([784]).astype('float32'),
               random.random([1]).astype('int64')),

place = fluid.CUDAPlace(0)
data = fluid.layers.data(name='data', shape=[1, 28, 28], dtype='float32')
label = fluid.layers.data(name='label', shape=[1], dtype='int64')
feeder = fluid.DataFeeder(place=place, feed_list=[data, label])
reader = feeder.decorate_reader(reader, multi_devices=False)

exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
for data in reader():
    exe.run(feed=data)
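The drop_last behaviour described in the parameters above can be sketched without paddle. The batch_sketch helper is hypothetical (not the fluid implementation); it collects samples into fixed-size mini-batches and either drops or keeps a short final batch.

```python
def batch_sketch(samples, batch_size, drop_last=True):
    # Hypothetical sketch of mini-batching with drop_last (not the
    # fluid implementation): emit full batches, and keep the short
    # tail only when drop_last is False.
    buf = []
    for sample in samples:
        buf.append(sample)
        if len(buf) == batch_size:
            yield buf
            buf = []
    if buf and not drop_last:
        yield buf

print(len(list(batch_sketch(range(10), 4, drop_last=True))))   # 2
print(len(list(batch_sketch(range(10), 4, drop_last=False))))  # 3
```

With ten samples and batch_size=4, drop_last=True discards the two-sample tail; drop_last=False yields it as a third, short batch — the case that decorate_reader cannot split evenly across devices.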