Prepare Data

PaddlePaddle Fluid supports two methods of feeding data into a network:

Synchronous method - Python Reader: First, use fluid.layers.data to set up the data input layer. Then feed in the training data through executor.run(feed=...) of fluid.Executor or fluid.ParallelExecutor.
Asynchronous method - py_reader: First, use fluid.layers.py_reader to set up the data input layer. Then configure the data source with the py_reader's decorate_paddle_reader or decorate_tensor_provider function. After that, call fluid.layers.read_file to read the data.

Comparison of the two methods:

| Aspect | Synchronous Python Reader | Asynchronous py_reader |
|---|---|---|
| API | `executor.run(feed=...)` | `fluid.layers.py_reader` |
| Data type | NumPy array | NumPy array or LoDTensor |
| Data augmentation | Done on the Python side with other libraries | Done on the Python side with other libraries |
| Speed | Slow | Fast |
| Recommended use | Model debugging | Industrial training |

Synchronous Python Reader

Fluid provides the Python Reader for feeding data.

The Python Reader is a pure Python interface, and data feeding runs synchronously with model training/prediction. Users pass in data as NumPy arrays. For details, see:

Take Numpy Array as Training Data
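As a rough illustration of what the synchronous path expects, the sketch below turns a batch of samples into the `{name: NumPy array}` dictionary passed to `executor.run(feed=...)`. The helper `make_feed` and the input names `image` and `label` are hypothetical; in a real program the keys would have to match the `name` arguments given to `fluid.layers.data`. Fluid itself is not imported here.

```python
import numpy as np

def make_feed(samples, names=("image", "label"), dtypes=("float32", "int64")):
    """Stack a list of (image, label) tuples into a feed dict.

    Hypothetical helper: each key must match the `name` of a
    fluid.layers.data input in the real program.
    """
    columns = zip(*samples)  # transpose: one tuple of values per input field
    feed = {}
    for name, dtype, column in zip(names, dtypes, columns):
        # Convert every sample to the target dtype, then stack along a new batch axis.
        feed[name] = np.stack([np.asarray(v, dtype=dtype) for v in column])
    return feed

# Four fake samples: a 784-dim image vector and a 1-element label.
batch = [(np.random.rand(784), [i % 10]) for i in range(4)]
feed = make_feed(batch)
print(feed["image"].shape, feed["label"].shape)  # (4, 784) (4, 1)
```

The resulting dictionary would then be handed to something like `exe.run(program, feed=feed, fetch_list=[...])`; because the call is synchronous, the next batch is only prepared after that step returns.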

The Python Reader supports advanced features such as batching and shuffling. For details, see:

Python Reader
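Batching and shuffling are typically expressed as decorators that wrap one reader (a no-argument function returning a sample generator) in another; in Paddle 1.x these roles are played by decorators such as `paddle.reader.shuffle` and `paddle.batch`. The sketch below re-implements the idea in plain Python to show how such wrappers compose. It is not Paddle's code: `sample_reader`, `shuffle_reader`, and `batch_reader` are hypothetical names, and the buffer-based shuffle is one common strategy among several.

```python
import random

def sample_reader():
    """Hypothetical reader: yields one ([feature], label) sample at a time."""
    for i in range(10):
        yield [float(i)], i % 2

def shuffle_reader(reader, buf_size, seed=None):
    """Wrap a reader so samples are shuffled within buffers of buf_size."""
    def shuffled():
        rng = random.Random(seed)
        buf = []
        for sample in reader():
            buf.append(sample)
            if len(buf) >= buf_size:
                rng.shuffle(buf)
                yield from buf
                buf = []
        rng.shuffle(buf)  # flush whatever is left in the last, partial buffer
        yield from buf
    return shuffled

def batch_reader(reader, batch_size, drop_last=False):
    """Wrap a reader so it yields lists of batch_size samples."""
    def batched():
        batch = []
        for sample in reader():
            batch.append(sample)
            if len(batch) == batch_size:
                yield batch
                batch = []
        if batch and not drop_last:
            yield batch  # final short batch
    return batched

# Decorators compose: shuffle first, then group into batches of 3.
train_reader = batch_reader(shuffle_reader(sample_reader, buf_size=4, seed=0), batch_size=3)
for b in train_reader():
    print(len(b))  # prints 3, 3, 3, then 1
```

Because each wrapper returns a new reader function rather than a generator, `train_reader()` can be called again at the start of every epoch to get a fresh pass over the data.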

Asynchronous py_reader

Fluid provides PyReader, an asynchronous data feeding method. It is more efficient because data feeding runs asynchronously with model training/prediction, so preparing the next batch can overlap with computation on the current one. For details, see:

Use PyReader to read training and test data