Prepare Data
PaddlePaddle Fluid supports two methods to feed data into networks:
Synchronous method - Python Reader: First, use fluid.layers.data to set up the data input layer. Then feed in the training data through executor.run(feed=...) in fluid.Executor or fluid.ParallelExecutor.
Asynchronous method - py_reader: First, use fluid.layers.py_reader to set up the data input layer. Then configure the data source with the decorate_paddle_reader or decorate_tensor_provider function of py_reader. After that, call fluid.layers.read_file to read data.
Comparison of the two methods:
| Aspect | Synchronous Python Reader | Asynchronous py_reader |
|---|---|---|
| API interface | executor.run(feed=...) | fluid.layers.py_reader |
| Data type | Numpy Array | Numpy Array or LoDTensor |
| Data augmentation | carried out by other libraries on the Python end | carried out by other libraries on the Python end |
| Speed | slow | fast |
| Recommended applications | model debugging | industrial training |
Synchronous Python Reader
Fluid provides Python Reader to feed in data.
Python Reader is a pure Python-side interface, and data feeding is synchronized with the model training/prediction process. Users pass in data as Numpy Arrays. For specific operations, please refer to:
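The synchronous pattern can be pictured with a small, framework-free sketch. Everything in it is a hypothetical stand-in rather than Fluid's API: `run_step` plays the role of `executor.run(feed=...)`, `sample_reader` is toy data, and plain Python lists stand in for Numpy Arrays. The point is only that data preparation and computation alternate on a single thread, so each step waits for its feed dictionary to be built.

```python
def sample_reader():
    """Yield (features, label) pairs; hypothetical toy data."""
    for i in range(6):
        yield [float(i), float(i) * 2.0], i % 2


def run_step(feed):
    """Stand-in for executor.run(feed=...): consumes one feed dict.

    Returns the mean of all feature values as a fake 'loss'.
    """
    values = [v for row in feed["x"] for v in row]
    return sum(values) / len(values)


def train_synchronously(reader, batch_size):
    """Build one batch, feed it, wait for the result, then repeat."""
    losses = []
    xs, ys = [], []
    for features, label in reader():
        xs.append(features)
        ys.append(label)
        if len(xs) == batch_size:
            # Data preparation and computation alternate on one thread.
            losses.append(run_step({"x": xs, "y": ys}))
            xs, ys = [], []
    return losses


losses = train_synchronously(sample_reader, batch_size=2)
```

In this sketch a trailing partial batch is simply dropped; whether that is appropriate depends on the model being trained.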
Python Reader supports advanced functions such as batching and shuffling. For specific operations, please refer to:
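As an illustration of how batching and shuffling can be layered on top of a reader, here is a minimal sketch in plain Python. It is not the library's own implementation: `make_reader`, `shuffle`, and `batch` are hypothetical helpers written for this example, and the `seed` parameter exists only to make the sketch reproducible. Each helper takes a reader (a function returning an iterable of samples) and returns a new reader, so they compose.

```python
import random


def make_reader(n):
    """Reader creator: returns a function that yields n integer samples."""
    def reader():
        for i in range(n):
            yield i
    return reader


def shuffle(reader, buf_size, seed=None):
    """Shuffle samples within consecutive buffers of buf_size items."""
    def shuffled_reader():
        rng = random.Random(seed)
        buf = []
        for sample in reader():
            buf.append(sample)
            if len(buf) >= buf_size:
                rng.shuffle(buf)
                yield from buf
                buf = []
        # Flush whatever is left in the final, partially filled buffer.
        rng.shuffle(buf)
        yield from buf
    return shuffled_reader


def batch(reader, batch_size, drop_last=False):
    """Group consecutive samples into lists of batch_size items."""
    def batched_reader():
        current = []
        for sample in reader():
            current.append(sample)
            if len(current) == batch_size:
                yield current
                current = []
        # Keep the final short batch unless the caller asked to drop it.
        if current and not drop_last:
            yield current
    return batched_reader


train_reader = batch(shuffle(make_reader(10), buf_size=4, seed=0), batch_size=3)
batches = list(train_reader())
```

Because shuffling here only reorders samples inside each buffer, a larger `buf_size` gives a more thorough shuffle at the cost of more memory.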
Asynchronous py_reader
Fluid provides an asynchronous data feeding method, PyReader. It is more efficient because data feeding runs independently of the model training/prediction process instead of being synchronized with it. For specific operations, please refer to:
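The idea behind asynchronous feeding can be sketched without any framework: a background thread fills a bounded queue while the training loop drains it. This is only a conceptual illustration under stated assumptions, not how py_reader is implemented; `sample_batches`, `start_prefetch`, and `train_asynchronously` are hypothetical names, and `sum(b)` stands in for one training step.

```python
import queue
import threading

_END = object()  # sentinel marking the end of the data stream


def sample_batches():
    """Hypothetical data source yielding small batches."""
    for i in range(5):
        yield [i, i + 1]


def start_prefetch(batch_source, capacity):
    """Fill a bounded queue from a background thread."""
    q = queue.Queue(maxsize=capacity)

    def worker():
        for b in batch_source():
            q.put(b)  # blocks while the queue is full
        q.put(_END)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return q, thread


def train_asynchronously(batch_source, capacity=2):
    """Consume prefetched batches; data loading overlaps with compute."""
    q, thread = start_prefetch(batch_source, capacity)
    results = []
    while True:
        b = q.get()  # blocks only if the producer has fallen behind
        if b is _END:
            break
        results.append(sum(b))  # stand-in for one training step
    thread.join()
    return results


results = train_asynchronously(sample_batches)
```

The queue capacity bounds how far the producer can run ahead, which caps memory use while still hiding data preparation latency behind computation.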