Prepare Data

PaddlePaddle Fluid supports two methods to feed data into networks:

  1. Synchronous method - Python Reader: First, use fluid.layers.data to define the data input layer. Then feed in the training data through executor.run(feed=...) of fluid.Executor or fluid.ParallelExecutor.

  2. Asynchronous method - py_reader: First, use fluid.layers.py_reader to define the data input layer. Then configure the data source with the decorate_paddle_reader or decorate_tensor_provider function of py_reader. Finally, call fluid.layers.read_file to read the data.

Comparisons of the two methods:

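========================  ==================================  ==================================
Aspect                    Synchronous Python Reader           Asynchronous py_reader
========================  ==================================  ==================================
API interface             executor.run(feed=...)              fluid.layers.py_reader
Data type                 NumPy array                         NumPy array or LoDTensor
Data augmentation         Other libraries on the Python side  Other libraries on the Python side
Speed                     Slow                                Fast
Recommended applications  Model debugging                     Industrial training
========================  ==================================  ==================================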

Synchronous Python Reader

Fluid provides Python Reader to feed in data.

Python Reader is a pure Python-side interface, and data feeding is synchronized with the model training/prediction process: each step waits until its data has been passed in. Users pass in data as NumPy arrays. For specific operations, refer to the corresponding section of the user guide.
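The control flow of synchronous feeding can be sketched in plain Python. This is only an illustration under stated assumptions, not Fluid code: run_step is a hypothetical stand-in for executor.run(feed=...), the input names "x" and "y" are made up, and nested lists stand in for the NumPy arrays that Fluid would receive.

```python
# Sketch of synchronous feeding: the loop obtains a batch, then runs one step.
# `run_step` is a hypothetical stand-in for executor.run(feed=...).


def sample_batches(num_batches, batch_size):
    """Yield feed dictionaries keyed by (hypothetical) input-layer names."""
    for b in range(num_batches):
        xs = [[float(b * batch_size + i)] for i in range(batch_size)]
        ys = [[2.0 * x[0]] for x in xs]
        yield {"x": xs, "y": ys}


def run_step(feed):
    """Pretend training step: mean squared error of the guess y = 1.5 * x."""
    errors = [(1.5 * x[0] - y[0]) ** 2 for x, y in zip(feed["x"], feed["y"])]
    return sum(errors) / len(errors)


def train_sync(batches):
    """Load and compute strictly in turn, recording the order of events."""
    events, losses = [], []
    for step, feed in enumerate(batches):
        events.append(("load", step))   # the batch for this step is now ready
        losses.append(run_step(feed))   # computation starts only after the load
        events.append(("run", step))
    return events, losses


events, losses = train_sync(sample_batches(num_batches=3, batch_size=2))
```

The recorded events alternate strictly between loading and running, which is why this method is simple to debug but leaves the device idle while each batch is prepared.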

Python Reader also supports advanced features such as batching and shuffling. For specific operations, refer to the corresponding section of the user guide.
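Batching and shuffling are typically expressed as decorators that wrap one reader and return another. Paddle ships its own decorators for this (for example paddle.batch and paddle.reader.shuffle); the sketch below does not call them. The names sample_reader, shuffle_reader and batch_reader are hypothetical stand-ins written in plain Python, meant only to show the idea of buffered shuffling followed by grouping into batches.

```python
import random


def sample_reader():
    """Hypothetical reader: yields (feature, label) samples one at a time."""
    for i in range(10):
        yield [float(i)], i % 2


def shuffle_reader(reader, buf_size, seed=None):
    """Return a reader that shuffles samples within a buffer of buf_size."""
    def shuffled():
        rng = random.Random(seed)
        buf = []
        for sample in reader():
            buf.append(sample)
            if len(buf) >= buf_size:
                rng.shuffle(buf)
                yield from buf
                buf = []
        if buf:                      # flush the partially filled last buffer
            rng.shuffle(buf)
            yield from buf
    return shuffled


def batch_reader(reader, batch_size, drop_last=False):
    """Return a reader that groups consecutive samples into lists."""
    def batched():
        batch = []
        for sample in reader():
            batch.append(sample)
            if len(batch) == batch_size:
                yield batch
                batch = []
        if batch and not drop_last:  # keep the short final batch unless asked not to
            yield batch
    return batched


# Compose the decorators: shuffle first, then batch.
train_reader = batch_reader(
    shuffle_reader(sample_reader, buf_size=5, seed=0), batch_size=4
)
batches = list(train_reader())
```

Because each decorator takes a reader function and returns a new one, train_reader can be called once per epoch to obtain a fresh pass over the data.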

Asynchronous py_reader

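Fluid provides PyReader (py_reader), an asynchronous data feeding method. It is more efficient because data feeding is not synchronized with the model training/prediction process, so data can be prepared while the model is running. For specific operations, refer to the corresponding section of the user guide.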
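The idea behind asynchronous feeding can be sketched with a background thread and a bounded queue. This is a plain-Python illustration under stated assumptions, not how py_reader is implemented: prefetch and batch_source are hypothetical names, and the capacity argument is only meant to play a role similar to a queue size.

```python
import queue
import threading

_END = object()  # sentinel marking the end of the data stream


def prefetch(reader, capacity):
    """Yield batches from reader() while a background thread fills a queue."""
    q = queue.Queue(maxsize=capacity)

    def producer():
        for batch in reader():
            q.put(batch)      # blocks only while the queue is full
        q.put(_END)

    # Daemon thread: it will not keep the process alive if the consumer stops.
    threading.Thread(target=producer, daemon=True).start()

    while True:
        batch = q.get()       # blocks only while the queue is empty
        if batch is _END:
            return
        yield batch


def batch_source():
    """Hypothetical data source yielding feed dictionaries."""
    for i in range(5):
        yield {"x": [float(i)], "y": [2.0 * i]}


# The consumer (the training loop) overlaps with the producer thread.
consumed = [feed["y"][0] for feed in prefetch(batch_source, capacity=2)]
```

Batches still arrive in order, but loading the next batch no longer waits for the current step to finish. The sketch leaves out error handling: an exception raised inside the producer would need to be forwarded to the consumer in real code.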