io

batch

paddle.fluid.layers.batch(reader, batch_size)[source]

This layer is a reader decorator. It takes a reader and adds ‘batching’ decoration to it. When reading from the resulting decorated reader, the output data is automatically organized into batches.

Parameters
  • reader (Variable) – The reader to be decorated with ‘batching’.

  • batch_size (int) – The batch size.

Returns

The reader which has been decorated with ‘batching’.

Return type

Variable

Examples

import paddle.fluid as fluid
raw_reader = fluid.layers.io.open_files(filenames=['./data1.recordio',
                                               './data2.recordio'],
                                        shapes=[(3,224,224), (1,)],
                                        lod_levels=[0, 0],
                                        dtypes=['float32', 'int64'],
                                        thread_num=2,
                                        buffer_size=2)
batch_reader = fluid.layers.batch(reader=raw_reader, batch_size=5)

# If we read data with the raw_reader:
#     data = fluid.layers.read_file(raw_reader)
# We can only get data instance by instance.
#
# However, if we read data with the batch_reader:
#     data = fluid.layers.read_file(batch_reader)
# Every 5 adjacent instances will be automatically combined into a batch,
# so what we get ('data') is a batch of data instead of a single instance.
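
The following is a minimal follow-up sketch (not part of the original example) showing how the decorated reader might be consumed; it assumes the recordio files above exist, reuses batch_reader from the example, and the CPU place is an arbitrary choice.

# Continuing the example above: each read now returns a batch of 5 instances.
image, label = fluid.layers.read_file(batch_reader)

exe = fluid.Executor(fluid.CPUPlace())
exe.run(fluid.default_startup_program())
try:
    while True:
        # Every run consumes one batch from the decorated reader.
        img_batch, label_batch = exe.run(fetch_list=[image, label])
except fluid.core.EOFException:
    pass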

create_py_reader_by_data

paddle.fluid.layers.create_py_reader_by_data(capacity, feed_list, name=None, use_double_buffer=True)[source]

Create a Python reader for data feeding in Python

This layer returns a Reader Variable.

Works much like py_reader, except that its input is feed_list instead of shapes, dtypes and lod_levels.

Parameters
  • capacity (int) – The buffer capacity maintained by py_reader.

  • feed_list (list(Variable)) – The data feed list.

  • name (basestring) – The prefix of the Python queue name and Reader name. If None, a name will be generated automatically.

  • use_double_buffer (bool) – Whether to use double buffer or not.

Returns

A Reader from which we can get feeding data.

Return type

Variable

Examples

import paddle
import paddle.fluid as fluid
import paddle.dataset.mnist as mnist

def network(img, label):
    # User-defined network. Here a simple softmax regression is used as an example.
    predict = fluid.layers.fc(input=img, size=10, act='softmax')
    loss = fluid.layers.cross_entropy(input=predict, label=label)
    return fluid.layers.mean(loss)

image = fluid.layers.data(name='image', shape=[1, 28, 28], dtype='float32')
label = fluid.layers.data(name='label', shape=[1], dtype='int64')
reader = fluid.layers.create_py_reader_by_data(capacity=64,
                                               feed_list=[image, label])
reader.decorate_paddle_reader(
    paddle.reader.shuffle(paddle.batch(mnist.train(), batch_size=5),
                          buf_size=500))

img, label = fluid.layers.read_file(reader)
loss = network(img, label)  # some network definition

fluid.Executor(fluid.CUDAPlace(0)).run(fluid.default_startup_program())

exe = fluid.ParallelExecutor(use_cuda=True, loss_name=loss.name)
for epoch_id in range(10):
    reader.start()
    try:
        while True:
            exe.run(fetch_list=[loss.name])
    except fluid.core.EOFException:
        reader.reset()

data

paddle.fluid.layers.data(name, shape, append_batch_size=True, dtype='float32', lod_level=0, type=VarType.LOD_TENSOR, stop_gradient=True)[source]

Data Layer

This function takes in the input and, based on whether the data has to be returned as a minibatch, creates a global variable using the helper functions. The global variable can be accessed by all the following operators in the graph.

All the input variables of this function are passed in as local variables to the LayerHelper constructor.

Notice that paddle only uses shape to infer the shapes of the following variables in the network at compile time. At run time, paddle does not check whether the shape of the fed data matches the shape set in this function.

Parameters
  • name (str) – The name/alias of the data variable

  • shape (list) – List declaring the shape. If append_batch_size is True and there is no -1 inside shape, it should be considered as the shape of each sample. Otherwise, it should be considered as the shape of the batched data.

  • append_batch_size (bool) –

    1. If true, it prepends -1 to the shape.

    For example, if shape=[1], the resulting shape is [-1, 1]. This is useful for setting different batch sizes at run time.

    2. If shape contains -1, such as shape=[1, -1], append_batch_size will be enforced to be False (ineffective), because PaddlePaddle cannot set more than one unknown dimension in the shape.

  • dtype (np.dtype|VarType|str) – The type of the data, e.g., float32, float16, int64.

  • type (VarType) – The output type. By default it is LOD_TENSOR.

  • lod_level (int) – The LoD Level. 0 means the input data is not a sequence.

  • stop_gradient (bool) – A boolean indicating whether gradient propagation should be stopped at this variable.

Returns

The global variable that gives access to the data.

Return type

Variable

Examples

import paddle.fluid as fluid
data = fluid.layers.data(name='x', shape=[784], dtype='float32')
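
As an additional illustrative sketch (not from the original docs), the snippet below shows how append_batch_size affects the compile-time shape and how the created variable is fed at run time; the fc layer and the batch size of 8 are arbitrary assumptions.

import numpy as np
import paddle.fluid as fluid

# shape=[784] with append_batch_size=True gives a compile-time shape of [-1, 784].
x = fluid.layers.data(name='x', shape=[784], dtype='float32')
y = fluid.layers.fc(input=x, size=10)

exe = fluid.Executor(fluid.CPUPlace())
exe.run(fluid.default_startup_program())

# Any batch size can be fed at run time because the leading dimension is -1.
feed_x = np.random.random(size=(8, 784)).astype('float32')
out, = exe.run(feed={'x': feed_x}, fetch_list=[y])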

double_buffer

paddle.fluid.layers.double_buffer(reader, place=None, name=None)[source]

Wrap a double buffer reader. The data will be copied to the target place through a double buffer queue. If the target place is None, the place the executor runs on will be used.

Parameters
  • reader (Variable) – the reader variable that needs to be wrapped.

  • place (Place) – the place of the target data. Defaults to the place the executor runs on.

  • name (str) – Variable name. None if the user does not care.

Returns

wrapped reader with double buffer.

Examples

>>> import paddle.fluid as fluid
>>> reader = fluid.layers.open_files(filenames=['mnist.recordio'],
>>>                                  shapes=[[-1, 784], [-1, 1]],
>>>                                  lod_levels=[0, 0],
>>>                                  dtypes=['float32', 'int64'])
>>> reader = fluid.layers.double_buffer(reader)
>>> img, label = fluid.layers.read_file(reader)
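
As an additional illustrative sketch (not from the original docs), double_buffer is typically applied as the outermost decorator of a reader pipeline so that the next batch is copied while the current one is being consumed; the file name and batch size below are assumptions.

import paddle.fluid as fluid

reader = fluid.layers.open_files(filenames=['mnist.recordio'],
                                 shapes=[[-1, 784], [-1, 1]],
                                 lod_levels=[0, 0],
                                 dtypes=['float32', 'int64'])
# Batch first, then double-buffer the batched reader.
reader = fluid.layers.double_buffer(fluid.layers.batch(reader, batch_size=64))
img, label = fluid.layers.read_file(reader)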

load

paddle.fluid.layers.load(out, file_path, load_as_fp16=None)[source]

The Load operator loads a LoDTensor / SelectedRows variable from a disk file.

>>> import paddle.fluid as fluid
>>> tmp_tensor = fluid.layers.create_tensor(dtype='float32')
>>> fluid.layers.load(tmp_tensor, "./tmp_tensor.bin")
Parameters
  • out (Variable) – The LoDTensor / SelectedRows variable to load the data into.

  • file_path (str) – The variable will be loaded from “file_path”.

  • load_as_fp16 (bool) – If true, the tensor will first be loaded and then converted to float16. Otherwise, the tensor will be loaded directly without data type conversion. Default is false.

Returns

None
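
A slightly fuller, illustrative sketch follows (not from the original docs); it assumes “./tmp_tensor.bin” already holds a tensor written by a save operator, and the explicit programs and CPU place are arbitrary choices.

import paddle.fluid as fluid

main_prog = fluid.Program()
startup_prog = fluid.Program()
with fluid.program_guard(main_prog, startup_prog):
    # Create an empty float32 tensor variable and fill it from disk.
    tmp_tensor = fluid.layers.create_tensor(dtype='float32')
    fluid.layers.load(tmp_tensor, file_path="./tmp_tensor.bin")

exe = fluid.Executor(fluid.CPUPlace())
exe.run(startup_prog)
# Running the main program executes the load op; the tensor can then be fetched.
loaded, = exe.run(main_prog, fetch_list=[tmp_tensor])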

open_files

paddle.fluid.layers.open_files(filenames, shapes, lod_levels, dtypes, thread_num=None, buffer_size=None, pass_num=1, is_test=None)[source]

Open files

This layer takes a list of files to read from and returns a Reader Variable. Via the Reader Variable, we can get data from the given files. All files must have name suffixes to indicate their formats, e.g., ‘*.recordio’.

Parameters
  • filenames (list) – The list of file names.

  • shapes (list) – List of tuples declaring the data shapes.

  • lod_levels (list) – List of ints declaring the data lod_levels.

  • dtypes (list) – List of strs declaring the data types.

  • thread_num (int|None) – The number of threads used to read the files. Default: min(len(filenames), cpu_number).

  • buffer_size (int|None) – The buffer size of the reader. Default: 3 * thread_num.

  • pass_num (int) – Number of passes to run.

  • is_test (bool|None) – Whether open_files is used for testing. If it is used for testing, the order of the generated data is the same as the file order. Otherwise, the data order is not guaranteed to be the same between epochs. [Default: False].

Returns

A Reader Variable via which we can get file data.

Return type

Variable

Examples

import paddle.fluid as fluid
reader = fluid.layers.io.open_files(filenames=['./data1.recordio',
                                            './data2.recordio'],
                                    shapes=[(3,224,224), (1,)],
                                    lod_levels=[0, 0],
                                    dtypes=['float32', 'int64'])

# Via the reader, we can use 'read_file' layer to get data:
image, label = fluid.layers.io.read_file(reader)

Preprocessor

class paddle.fluid.layers.Preprocessor(reader, name=None)[source]

A block for data pre-processing in reader.

Parameters
  • reader (Variable) – A reader variable.

  • name (str, default None) – The name of the reader.

Examples

import paddle.fluid as fluid

reader = fluid.layers.io.open_files(
    filenames=['./data1.recordio', './data2.recordio'],
    shapes=[(3, 224, 224), (1, )],
    lod_levels=[0, 0],
    dtypes=['float32', 'int64'])

preprocessor = fluid.layers.io.Preprocessor(reader=reader)
with preprocessor.block():
    img, lbl = preprocessor.inputs()
    img_out = img / 2
    lbl_out = lbl + 1
    preprocessor.outputs(img_out, lbl_out)

data_file = fluid.layers.io.double_buffer(preprocessor())
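
As a short, illustrative continuation of the example above (not part of the original snippet), the pre-processed reader is consumed like any other reader variable:

# Continuing the example above: read the pre-processed image and label.
img, lbl = fluid.layers.io.read_file(data_file)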

py_reader

paddle.fluid.layers.py_reader(capacity, shapes, dtypes, lod_levels=None, name=None, use_double_buffer=True)[source]

Create a Python reader for data feeding in Python

This layer returns a Reader Variable. The Reader provides decorate_paddle_reader() and decorate_tensor_provider() to set a Python generator as the data source. For more details, see Use PyReader to read training and test data. When Executor::Run() is invoked on the C++ side, the data from the generator is read automatically. Unlike DataFeeder.feed(), with py_reader the data reading process and the Executor::Run() process can run in parallel. The start() method of the Reader should be called when each pass begins, and the reset() method should be called when the pass ends and fluid.core.EOFException is raised. Note that the Program.clone() method cannot clone py_reader.

Parameters
  • capacity (int) – The buffer capacity maintained by py_reader.

  • shapes (list|tuple) – List of tuples declaring the data shapes.

  • dtypes (list|tuple) – List of strs declaring the data types.

  • lod_levels (list|tuple) – List of ints declaring the data lod_levels.

  • name (basestring) – The prefix of the Python queue name and Reader name. If None, a name will be generated automatically.

  • use_double_buffer (bool) – Whether to use double buffer or not.

Returns

A Reader from which we can get feeding data.

Return type

Variable

Examples

  1. The basic usage of py_reader is as follows:

import paddle
import paddle.fluid as fluid
import paddle.dataset.mnist as mnist

def network(image, label):
    # User-defined network; here a softmax regression is used as an example.
    predict = fluid.layers.fc(input=image, size=10, act='softmax')
    return fluid.layers.cross_entropy(input=predict, label=label)

reader = fluid.layers.py_reader(capacity=64,
                                shapes=[(-1, 1, 28, 28), (-1, 1)],
                                dtypes=['float32', 'int64'])
reader.decorate_paddle_reader(
    paddle.reader.shuffle(paddle.batch(mnist.train(), batch_size=5),
                          buf_size=1000))

img, label = fluid.layers.read_file(reader)
loss = network(img, label)

fluid.Executor(fluid.CUDAPlace(0)).run(fluid.default_startup_program())
exe = fluid.ParallelExecutor(use_cuda=True)
for epoch_id in range(10):
    reader.start()
    try:
        while True:
            exe.run(fetch_list=[loss.name])
    except fluid.core.EOFException:
        reader.reset()

fluid.io.save_inference_model(dirname='./model',
                              feeded_var_names=[img.name, label.name],
                              target_vars=[loss],
                              executor=fluid.Executor(fluid.CUDAPlace(0)))

  2. When both training and testing are performed, two different py_readers should be created with different names, e.g.:

import paddle
import paddle.fluid as fluid
import paddle.dataset.mnist as mnist

def network(reader):
    img, label = fluid.layers.read_file(reader)
    # User-defined network. Here a simple softmax regression is used as an example.
    predict = fluid.layers.fc(input=img, size=10, act='softmax')
    loss = fluid.layers.cross_entropy(input=predict, label=label)
    return fluid.layers.mean(loss)

# Create train_main_prog and train_startup_prog
train_main_prog = fluid.Program()
train_startup_prog = fluid.Program()
with fluid.program_guard(train_main_prog, train_startup_prog):
    # Use fluid.unique_name.guard() to share parameters with test program
    with fluid.unique_name.guard():
        train_reader = fluid.layers.py_reader(capacity=64,
                                              shapes=[(-1, 1, 28, 28),
                                                      (-1, 1)],
                                              dtypes=['float32', 'int64'],
                                              name='train_reader')
        train_reader.decorate_paddle_reader(
            paddle.reader.shuffle(paddle.batch(mnist.train(), batch_size=5),
                                  buf_size=500))
        train_loss = network(train_reader)  # some network definition
        adam = fluid.optimizer.Adam(learning_rate=0.01)
        adam.minimize(train_loss)

# Create test_main_prog and test_startup_prog
test_main_prog = fluid.Program()
test_startup_prog = fluid.Program()
with fluid.program_guard(test_main_prog, test_startup_prog):
    # Use fluid.unique_name.guard() to share parameters with train program
    with fluid.unique_name.guard():
        test_reader = fluid.layers.py_reader(capacity=32,
                                             shapes=[(-1, 1, 28, 28), (-1, 1)],
                                             dtypes=['float32', 'int64'],
                                             name='test_reader')
        test_reader.decorate_paddle_reader(paddle.batch(mnist.test(), 512))
        test_loss = network(test_reader)

fluid.Executor(fluid.CUDAPlace(0)).run(train_startup_prog)
fluid.Executor(fluid.CUDAPlace(0)).run(test_startup_prog)

train_exe = fluid.ParallelExecutor(use_cuda=True,
                                   loss_name=train_loss.name,
                                   main_program=train_main_prog)
test_exe = fluid.ParallelExecutor(use_cuda=True,
                                  loss_name=test_loss.name,
                                  main_program=test_main_prog)
for epoch_id in range(10):
    train_reader.start()
    try:
        while True:
            train_exe.run(fetch_list=[train_loss.name])
    except fluid.core.EOFException:
        train_reader.reset()

    test_reader.start()
    try:
        while True:
            test_exe.run(fetch_list=[test_loss.name])
    except fluid.core.EOFException:
        test_reader.reset()
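
As a further illustrative sketch (not from the original docs), decorate_paddle_reader also accepts a hand-written Python sample generator wrapped with paddle.batch, which is convenient for quick smoke tests; the random data, batch size and reader name below are assumptions.

import numpy as np
import paddle
import paddle.fluid as fluid

def fake_mnist_reader():
    # Yield one (image, label) sample at a time, matching the per-sample
    # shapes (1, 28, 28) and (1,) declared for the reader below.
    for _ in range(100):
        img = np.random.random(size=(1, 28, 28)).astype('float32')
        label = np.random.randint(0, 10, size=(1,)).astype('int64')
        yield img, label

reader = fluid.layers.py_reader(capacity=64,
                                shapes=[(-1, 1, 28, 28), (-1, 1)],
                                dtypes=['float32', 'int64'],
                                name='fake_reader')
reader.decorate_paddle_reader(paddle.batch(fake_mnist_reader, batch_size=32))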

random_data_generator

paddle.fluid.layers.random_data_generator(low, high, shapes, lod_levels, for_parallel=True)[source]

Create a uniform random data generator

This layer returns a Reader Variable. Instead of opening a file and reading data from it, this Reader Variable generates float uniform random data by itself. It can be used as a dummy reader to test a network without opening a real file.

Parameters
  • low (float) – The lower bound of data’s uniform distribution.

  • high (float) – The upper bound of data’s uniform distribution.

  • shapes (list) – List of tuples declaring the data shapes.

  • lod_levels (list) – List of ints declaring the data lod_levels.

  • for_parallel (bool) – Set it to True if you are going to run subsequent operators in parallel.

Returns

A Reader Variable from which we can get random data.

Return type

Variable

Examples

import paddle.fluid as fluid
reader = fluid.layers.random_data_generator(
                                 low=0.0,
                                 high=1.0,
                                 shapes=[[3,224,224], [1]],
                                 lod_levels=[0, 0])
# Via the reader, we can use 'read_file' layer to get data:
image, label = fluid.layers.read_file(reader)
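
A brief illustrative continuation (not from the original docs): since the reader produces data by itself, a forward fetch can be run without any real files; the CPU place is an arbitrary choice, and note that both outputs are float tensors drawn from the uniform distribution.

# Continuing the example above: fetch one randomly generated instance.
exe = fluid.Executor(fluid.CPUPlace())
exe.run(fluid.default_startup_program())
img_val, label_val = exe.run(fetch_list=[image, label])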

read_file

paddle.fluid.layers.read_file(reader)[source]

Execute the given reader and get data via it.

A reader is also a Variable. It can be a raw reader generated by fluid.layers.open_files() or a decorated one generated by fluid.layers.double_buffer() and so on.

Parameters

reader (Variable) – The reader to execute.

Returns

Data read via the given reader.

Return type

Tuple[Variable]

Examples

import paddle.fluid as fluid
data_file = fluid.layers.open_files(
     filenames=['mnist.recordio'],
     shapes=[(-1, 784), (-1, 1)],
     lod_levels=[0, 0],
     dtypes=["float32", "int64"])
data_file = fluid.layers.double_buffer(
     fluid.layers.batch(data_file, batch_size=64))
input, label = fluid.layers.read_file(data_file)

shuffle

paddle.fluid.layers.shuffle(reader, buffer_size)[source]

Creates a data reader whose output is shuffled. The output from the iterator created by the original reader is buffered into a shuffle buffer and then shuffled. The size of the shuffle buffer is determined by the argument buffer_size.

Parameters
  • reader (Variable) – the original reader whose output will be shuffled.

  • buffer_size (int) – the size of the shuffle buffer.

Returns

the new reader whose output is shuffled.

Return type

Variable

Examples

import paddle.fluid as fluid
raw_reader = fluid.layers.io.open_files(filenames=['./data1.recordio',
                                               './data2.recordio'],
                                        shapes=[(3,224,224), (1,)],
                                        lod_levels=[0, 0],
                                        dtypes=['float32', 'int64'],
                                        thread_num=2,
                                        buffer_size=2)
batch_reader = fluid.layers.batch(reader=raw_reader, batch_size=5)
shuffle_reader = fluid.layers.shuffle(reader=batch_reader, buffer_size=5000)
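
As a short illustrative continuation of the example above (not part of the original snippet), the shuffled reader is consumed through read_file like any other decorated reader:

# Continuing the example above: each read returns a shuffled batch of 5 instances.
image, label = fluid.layers.read_file(shuffle_reader)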