io
batch
paddle.fluid.layers.batch(reader, batch_size)
This layer is a reader decorator. It takes a reader and adds a ‘batching’ decoration to it. When data is read through the resulting decorated reader, the output is automatically organized into batches.
- Parameters
reader (Variable) – The reader to be decorated with ‘batching’.
batch_size (int) – The batch size.
- Returns
The reader which has been decorated with ‘batching’.
- Return type
Variable
Examples
    import paddle.fluid as fluid

    raw_reader = fluid.layers.io.open_files(filenames=['./data1.recordio',
                                                       './data2.recordio'],
                                            shapes=[(3,224,224), (1,)],
                                            lod_levels=[0, 0],
                                            dtypes=['float32', 'int64'],
                                            thread_num=2,
                                            buffer_size=2)
    batch_reader = fluid.layers.batch(reader=raw_reader, batch_size=5)

    # If we read data with the raw_reader:
    #     data = fluid.layers.read_file(raw_reader)
    # We can only get data instance by instance.
    #
    # However, if we read data with the batch_reader:
    #     data = fluid.layers.read_file(batch_reader)
    # Each 5 adjacent instances will be automatically combined together
    # to become a batch. So what we get('data') is a batch data instead
    # of an instance.
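The batching decoration can be pictured with a small pure-Python sketch. It is not the Paddle implementation: batch_decorator and raw_reader are hypothetical names, readers are modelled as plain generator functions, and emitting a final, smaller batch is an assumption rather than documented behaviour.

```python
def batch_decorator(reader, batch_size):
    """Wrap a sample-level reader so that it yields lists of samples."""
    def batched_reader():
        buf = []
        for sample in reader():
            buf.append(sample)
            if len(buf) == batch_size:
                yield buf
                buf = []
        if buf:
            # Assumption: the last, incomplete batch is still emitted.
            yield buf
    return batched_reader


def raw_reader():
    # Hypothetical data source producing 12 instances one by one.
    for i in range(12):
        yield i


batches = list(batch_decorator(raw_reader, batch_size=5)())
print(batches)
```

Each group of 5 adjacent instances becomes one batch, which mirrors the behaviour described in the comments of the example above.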
create_py_reader_by_data
paddle.fluid.layers.create_py_reader_by_data(capacity, feed_list, name=None, use_double_buffer=True)
Create a Python reader for data feeding in Python.
This layer returns a Reader Variable.
It works much like py_reader, except that its input is feed_list instead of shapes, dtypes and lod_levels.
- Parameters
capacity (int) – The buffer capacity maintained by py_reader.
feed_list (list(Variable)) – The data feed list.
name (basestring) – The prefix of the Python queue name and the Reader name. If None, a name is generated automatically.
use_double_buffer (bool) – Whether to use a double buffer.
- Returns
A Reader from which we can get feeding data.
- Return type
Variable
Examples
    import paddle
    import paddle.fluid as fluid
    import paddle.dataset.mnist as mnist

    def network(img, label):
        # User defined network. Here a simple regression as example
        predict = fluid.layers.fc(input=img, size=10, act='softmax')
        loss = fluid.layers.cross_entropy(input=predict, label=label)
        return fluid.layers.mean(loss)

    image = fluid.layers.data(name='image', shape=[1, 28, 28], dtype='float32')
    label = fluid.layers.data(name='label', shape=[1], dtype='int64')
    reader = fluid.layers.create_py_reader_by_data(capacity=64,
                                                   feed_list=[image, label])
    reader.decorate_paddle_reader(
        paddle.reader.shuffle(paddle.batch(mnist.train(), batch_size=5),
                              buf_size=500))

    img, label = fluid.layers.read_file(reader)
    loss = network(img, label)  # some network definition

    fluid.Executor(fluid.CUDAPlace(0)).run(fluid.default_startup_program())

    exe = fluid.ParallelExecutor(use_cuda=True, loss_name=loss.name)
    for epoch_id in range(10):
        reader.start()
        try:
            while True:
                exe.run(fetch_list=[loss.name])
        except fluid.core.EOFException:
            reader.reset()
data
paddle.fluid.layers.data(name, shape, append_batch_size=True, dtype='float32', lod_level=0, type=VarType.LOD_TENSOR, stop_gradient=True)
Data Layer
This function takes in the input and, based on whether the data has to be returned as a minibatch, creates the global variable by using the helper functions. The global variable can be accessed by all the following operators in the graph.
All the input variables of this function are passed in as local variables to the LayerHelper constructor.
Notice that Paddle only uses shape to infer the shapes of the following variables in the network at compile time. At run time, Paddle does not check whether the shape of the fed data matches the shape setting in this function.
- Parameters
name (str) – The name/alias of the variable.
shape (list) – Tuple declaring the shape. If append_batch_size is True and there is no -1 inside shape, it is treated as the shape of each sample. Otherwise, it is treated as the shape of the batched data.
append_batch_size (bool) –
If True, -1 is prepended to the shape. For example, if shape=[1], the resulting shape is [-1, 1]. This is useful for setting different batch sizes at run time.
If shape contains -1, such as shape=[1, -1], append_batch_size is forced to False (it has no effect), because PaddlePaddle cannot set more than one unknown dimension in the shape.
dtype (np.dtype|VarType|str) – The type of the data: float32, float16, int, etc.
type (VarType) – The output type. By default it is LOD_TENSOR.
lod_level (int) – The LoD Level. 0 means the input data is not a sequence.
stop_gradient (bool) – Whether to stop gradient propagation at this variable. If True, no gradient flows back through it.
- Returns
The global variable that gives access to the data.
- Return type
Variable
Examples
    import paddle.fluid as fluid

    data = fluid.layers.data(name='x', shape=[784], dtype='float32')
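The interaction between shape and append_batch_size described above can be sketched in plain Python. The helper effective_shape is a hypothetical name used only for illustration; it follows the documented rule and is not Paddle's own code.

```python
def effective_shape(shape, append_batch_size=True):
    """Return the shape the data layer would declare, per the rule above."""
    shape = list(shape)
    if any(dim == -1 for dim in shape):
        # A -1 is already present, so append_batch_size has no effect.
        append_batch_size = False
    if append_batch_size:
        # Prepend the unknown batch dimension.
        shape = [-1] + shape
    return shape


print(effective_shape([784]))                           # [-1, 784]
print(effective_shape([1, -1]))                         # [1, -1]
print(effective_shape([784], append_batch_size=False))  # [784]
```

With shape=[784] and the default append_batch_size=True, as in the example above, the declared shape is [-1, 784], so the batch size can vary at run time.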
double_buffer
paddle.fluid.layers.double_buffer(reader, place=None, name=None)
Wrap a reader with a double buffer. The data is copied to the target place through a double-buffer queue. If the target place is None, the place the executor runs on is used.
- Parameters
reader (Variable) – The reader variable to be wrapped.
place (Place) – The target place of the data. Default is the same place the executor runs on.
name (str) – Variable name. None if the user does not care.
- Returns
The wrapped reader with a double buffer.
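The idea behind a double buffer, preparing the next item while the current one is being consumed, can be sketched with the Python standard library. This is only a conceptual illustration: double_buffered and slow_reader are hypothetical names, readers are modelled as generator functions, and the copy to a device place is not modelled at all.

```python
import queue
import threading

_END = object()  # sentinel marking the end of the stream


def double_buffered(reader, capacity=2):
    """Prefetch items from reader() on a background thread."""
    def buffered_reader():
        q = queue.Queue(maxsize=capacity)

        def producer():
            for item in reader():
                q.put(item)  # blocks while the buffer is full
            q.put(_END)

        threading.Thread(target=producer, daemon=True).start()
        while True:
            item = q.get()
            if item is _END:
                return
            yield item
    return buffered_reader


def slow_reader():
    # Hypothetical data source.
    for i in range(5):
        yield i


result = list(double_buffered(slow_reader)())
print(result)
```

The consumer sees the same items in the same order; only the timing changes, because up to capacity items are fetched ahead of time.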
Examples
    >>> import paddle.fluid as fluid
    >>> reader = fluid.layers.open_files(filenames=['mnist.recordio'],
    ...                                  shapes=[[-1, 784], [-1, 1]],
    ...                                  lod_levels=[0, 0],
    ...                                  dtypes=['float32', 'int64'])
    >>> reader = fluid.layers.double_buffer(reader)
    >>> img, label = fluid.layers.read_file(reader)
load
paddle.fluid.layers.load(out, file_path, load_as_fp16=None)
The load operator loads a LoDTensor / SelectedRows variable from a file on disk.
    >>> import paddle.fluid as fluid
    >>> tmp_tensor = fluid.layers.create_tensor(dtype='float32')
    >>> fluid.layers.load(tmp_tensor, "./tmp_tensor.bin")
- Parameters
out (Variable) – The LoDTensor / SelectedRows to be loaded.
file_path (STRING) – Variable will be loaded from “file_path”.
load_as_fp16 (BOOLEAN) – If true, the tensor will be first loaded and then converted to float16 data type. Otherwise, the tensor will be directly loaded without data type conversion. Default is false.
- Returns
None
open_files
paddle.fluid.layers.open_files(filenames, shapes, lod_levels, dtypes, thread_num=None, buffer_size=None, pass_num=1, is_test=None)
Open files
This layer takes a list of files to read from and returns a Reader Variable. Via the Reader Variable, we can get data from the given files. All files must have name suffixes that indicate their formats, e.g., ‘*.recordio’.
- Parameters
filenames (list) – The list of file names.
shapes (list) – List of tuples declaring the data shapes.
lod_levels (list) – List of ints declaring the data lod_level.
dtypes (list) – List of strs declaring the data types.
thread_num (int|None) – The number of threads used to read files. Default: min(len(filenames), cpu_number).
buffer_size (int|None) – The buffer size of the reader. Default: 3 * thread_num.
pass_num (int) – Number of passes to run.
is_test (bool|None) – Whether open_files is used for testing. If it is, the data is generated in the same order as the files. Otherwise, the order of the data is not guaranteed to be the same across epochs. [Default: False].
- Returns
A Reader Variable via which we can get file data.
- Return type
Variable
Examples
    import paddle.fluid as fluid

    reader = fluid.layers.io.open_files(filenames=['./data1.recordio',
                                                   './data2.recordio'],
                                        shapes=[(3,224,224), (1,)],
                                        lod_levels=[0, 0],
                                        dtypes=['float32', 'int64'])

    # Via the reader, we can use 'read_file' layer to get data:
    image, label = fluid.layers.io.read_file(reader)
Preprocessor
class paddle.fluid.layers.Preprocessor(reader, name=None)
A block for data pre-processing in a reader.
- Parameters
reader (Variable) – A reader variable.
name (str, default None) – The name of the reader.
Examples
    import paddle.fluid as fluid

    reader = fluid.layers.io.open_files(
        filenames=['./data1.recordio', './data2.recordio'],
        shapes=[(3, 224, 224), (1, )],
        lod_levels=[0, 0],
        dtypes=['float32', 'int64'])

    preprocessor = fluid.layers.io.Preprocessor(reader=reader)
    with preprocessor.block():
        img, lbl = preprocessor.inputs()
        img_out = img / 2
        lbl_out = lbl + 1
        preprocessor.outputs(img_out, lbl_out)

    data_file = fluid.layers.io.double_buffer(preprocessor())
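What the preprocessing block in the example does to each sample can be mimicked in plain Python. This is a rough analogy only: preprocess, raw_reader and halve_and_shift are hypothetical names, and readers are modelled as generator functions rather than Reader Variables.

```python
def preprocess(reader, fn):
    """Apply fn to every sample produced by reader()."""
    def preprocessed_reader():
        for sample in reader():
            yield fn(*sample)
    return preprocessed_reader


def raw_reader():
    # Hypothetical (image, label) pairs; scalars stand in for tensors.
    yield (4.0, 0)
    yield (10.0, 1)


def halve_and_shift(img, lbl):
    # Same arithmetic as the block above: img / 2 and lbl + 1.
    return img / 2, lbl + 1


samples = list(preprocess(raw_reader, halve_and_shift)())
print(samples)
```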
py_reader
paddle.fluid.layers.py_reader(capacity, shapes, dtypes, lod_levels=None, name=None, use_double_buffer=True)
Create a Python reader for data feeding in Python.
This layer returns a Reader Variable. The Reader provides decorate_paddle_reader() and decorate_tensor_provider() to set a Python generator as the data source. For more details, see "Use PyReader to read training and test data". When Executor::Run() is invoked on the C++ side, the data from the generator is read automatically. Unlike DataFeeder.feed(), the data reading process and the Executor::Run() process can run in parallel when using py_reader. The start() method of the Reader should be called when each pass begins, while the reset() method should be called when the pass ends and fluid.core.EOFException is raised. Note that the Program.clone() method cannot clone py_reader.
- Parameters
capacity (int) – The buffer capacity maintained by py_reader.
shapes (list|tuple) – List of tuples declaring the data shapes.
dtypes (list|tuple) – List of strs declaring the data types.
lod_levels (list|tuple) – List of ints declaring the data lod_level.
name (basestring) – The prefix of the Python queue name and the Reader name. If None, a name is generated automatically.
use_double_buffer (bool) – Whether to use a double buffer.
- Returns
A Reader from which we can get feeding data.
- Return type
Variable
Examples
1. The basic usage of py_reader is as follows:
    import paddle
    import paddle.fluid as fluid
    import paddle.dataset.mnist as mnist

    def network(image, label):
        # user defined network, here a softmax regression example
        predict = fluid.layers.fc(input=image, size=10, act='softmax')
        return fluid.layers.cross_entropy(input=predict, label=label)

    reader = fluid.layers.py_reader(capacity=64,
                                    shapes=[(-1, 1, 28, 28), (-1, 1)],
                                    dtypes=['float32', 'int64'])
    reader.decorate_paddle_reader(
        paddle.reader.shuffle(paddle.batch(mnist.train(), batch_size=5),
                              buf_size=1000))

    img, label = fluid.layers.read_file(reader)
    loss = network(img, label)

    fluid.Executor(fluid.CUDAPlace(0)).run(fluid.default_startup_program())
    exe = fluid.ParallelExecutor(use_cuda=True)
    for epoch_id in range(10):
        reader.start()
        try:
            while True:
                exe.run(fetch_list=[loss.name])
        except fluid.core.EOFException:
            reader.reset()

    fluid.io.save_inference_model(dirname='./model',
                                  feeded_var_names=[img.name, label.name],
                                  target_vars=[loss],
                                  executor=fluid.Executor(fluid.CUDAPlace(0)))
2. When training and testing are both performed, two different py_reader instances should be created with different names, e.g.:
    import paddle
    import paddle.fluid as fluid
    import paddle.dataset.mnist as mnist

    def network(reader):
        img, label = fluid.layers.read_file(reader)
        # User defined network. Here a simple regression as example
        predict = fluid.layers.fc(input=img, size=10, act='softmax')
        loss = fluid.layers.cross_entropy(input=predict, label=label)
        return fluid.layers.mean(loss)

    # Create train_main_prog and train_startup_prog
    train_main_prog = fluid.Program()
    train_startup_prog = fluid.Program()
    with fluid.program_guard(train_main_prog, train_startup_prog):
        # Use fluid.unique_name.guard() to share parameters with test program
        with fluid.unique_name.guard():
            train_reader = fluid.layers.py_reader(capacity=64,
                                                  shapes=[(-1, 1, 28, 28), (-1, 1)],
                                                  dtypes=['float32', 'int64'],
                                                  name='train_reader')
            train_reader.decorate_paddle_reader(
                paddle.reader.shuffle(paddle.batch(mnist.train(), batch_size=5),
                                      buf_size=500))
            train_loss = network(train_reader)  # some network definition
            adam = fluid.optimizer.Adam(learning_rate=0.01)
            adam.minimize(train_loss)

    # Create test_main_prog and test_startup_prog
    test_main_prog = fluid.Program()
    test_startup_prog = fluid.Program()
    with fluid.program_guard(test_main_prog, test_startup_prog):
        # Use fluid.unique_name.guard() to share parameters with train program
        with fluid.unique_name.guard():
            test_reader = fluid.layers.py_reader(capacity=32,
                                                 shapes=[(-1, 1, 28, 28), (-1, 1)],
                                                 dtypes=['float32', 'int64'],
                                                 name='test_reader')
            test_reader.decorate_paddle_reader(paddle.batch(mnist.test(), 512))
            test_loss = network(test_reader)

    fluid.Executor(fluid.CUDAPlace(0)).run(train_startup_prog)
    fluid.Executor(fluid.CUDAPlace(0)).run(test_startup_prog)

    train_exe = fluid.ParallelExecutor(use_cuda=True,
                                       loss_name=train_loss.name,
                                       main_program=train_main_prog)
    test_exe = fluid.ParallelExecutor(use_cuda=True,
                                      loss_name=test_loss.name,
                                      main_program=test_main_prog)
    for epoch_id in range(10):
        train_reader.start()
        try:
            while True:
                train_exe.run(fetch_list=[train_loss.name])
        except fluid.core.EOFException:
            train_reader.reset()

        test_reader.start()
        try:
            while True:
                test_exe.run(fetch_list=[test_loss.name])
        except fluid.core.EOFException:
            test_reader.reset()
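The start() / EOF / reset() cycle used in both examples can be tried out without Paddle through a small stand-in. ToyReader, ToyEOF and three_batches are hypothetical names invented for this sketch; they only imitate the calling protocol and say nothing about how py_reader is implemented.

```python
class ToyEOF(Exception):
    """Hypothetical stand-in for fluid.core.EOFException."""


class ToyReader:
    """Hypothetical reader exposing the start()/reset() protocol."""

    def __init__(self, generator_fn):
        self._generator_fn = generator_fn
        self._iterator = None

    def start(self):
        # Begin a new pass over the data source.
        self._iterator = iter(self._generator_fn())

    def read(self):
        # Return the next item, or signal the end of the pass.
        try:
            return next(self._iterator)
        except StopIteration:
            raise ToyEOF() from None

    def reset(self):
        # Drop the exhausted iterator so that start() can be called again.
        self._iterator = None


def three_batches():
    yield from ([1, 2], [3, 4], [5, 6])


reader = ToyReader(three_batches)
seen = []
for epoch_id in range(2):
    reader.start()
    try:
        while True:
            seen.append(reader.read())
    except ToyEOF:
        reader.reset()

print(len(seen))
```

Every pass is bracketed by start() and reset(), and the end of a pass is signalled by an exception rather than by a return value, which is why the training loops above are written as while True inside try/except.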
random_data_generator
paddle.fluid.layers.random_data_generator(low, high, shapes, lod_levels, for_parallel=True)
Create a uniform random data generator.
This layer returns a Reader Variable. Instead of opening a file and reading data from it, this Reader Variable generates float uniform random data by itself. It can be used as a dummy reader to test a network without opening a real file.
- Parameters
low (float) – The lower bound of data’s uniform distribution.
high (float) – The upper bound of data’s uniform distribution.
shapes (list) – List of tuples declaring the data shapes.
lod_levels (list) – List of ints declaring the data lod_level.
for_parallel (Bool) – Set it as True if you are going to run subsequent operators in parallel.
- Returns
A Reader Variable from which we can get random data.
- Return type
Variable
Examples
    import paddle.fluid as fluid

    reader = fluid.layers.random_data_generator(
        low=0.0,
        high=1.0,
        shapes=[[3,224,224], [1]],
        lod_levels=[0, 0])

    # Via the reader, we can use 'read_file' layer to get data:
    image, label = fluid.layers.read_file(reader)
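To get a feel for what such a dummy reader produces, the following standard-library sketch draws one sample of uniform random data for a list of shapes. random_sample and its seed argument are hypothetical, nested lists stand in for tensors, and small shapes are used to keep the output readable.

```python
import random


def random_sample(low, high, shapes, seed=None):
    """Return one tuple of nested lists filled with uniform random floats."""
    rng = random.Random(seed)

    def fill(shape):
        if not shape:
            return rng.uniform(low, high)
        return [fill(shape[1:]) for _ in range(shape[0])]

    return tuple(fill(list(shape)) for shape in shapes)


image, label = random_sample(0.0, 1.0, shapes=[[3, 2, 2], [1]], seed=0)
print(len(image), len(image[0]), len(image[0][0]), len(label))
```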
read_file
paddle.fluid.layers.read_file(reader)
Execute the given reader and get data via it.
A reader is also a Variable. It can be a raw reader generated by fluid.layers.open_files() or a decorated one generated by fluid.layers.double_buffer() and so on.
- Parameters
reader (Variable) – The reader to execute.
- Returns
Data read via the given reader.
- Return type
Tuple[Variable]
Examples
    import paddle.fluid as fluid

    data_file = fluid.layers.open_files(
        filenames=['mnist.recordio'],
        shapes=[(-1, 784), (-1, 1)],
        lod_levels=[0, 0],
        dtypes=["float32", "int64"])
    data_file = fluid.layers.double_buffer(
        fluid.layers.batch(data_file, batch_size=64))
    input, label = fluid.layers.read_file(data_file)
shuffle
paddle.fluid.layers.shuffle(reader, buffer_size)
Creates a data reader whose output is shuffled. Output from the iterator created by the original reader is buffered into a shuffle buffer and then shuffled. The size of the shuffle buffer is determined by the argument buffer_size.
- Parameters
reader (callable) – The original reader whose output will be shuffled.
buffer_size (int) – The shuffle buffer size.
- Returns
the new reader whose output is shuffled.
- Return type
callable
Examples
    import paddle.fluid as fluid

    raw_reader = fluid.layers.io.open_files(filenames=['./data1.recordio',
                                                       './data2.recordio'],
                                            shapes=[(3,224,224), (1,)],
                                            lod_levels=[0, 0],
                                            dtypes=['float32', 'int64'],
                                            thread_num=2,
                                            buffer_size=2)
    batch_reader = fluid.layers.batch(reader=raw_reader, batch_size=5)
    shuffle_reader = fluid.layers.shuffle(reader=batch_reader, buffer_size=5000)
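Buffered shuffling, as described above, can be illustrated with a few lines of plain Python. This is a sketch under assumptions: buffered_shuffle and its seed argument are hypothetical, readers are modelled as generator functions, and the buffer is shuffled and flushed whenever it fills up, which may differ in detail from the actual operator.

```python
import random


def buffered_shuffle(reader, buffer_size, seed=None):
    """Shuffle the output of reader() within windows of buffer_size items."""
    rng = random.Random(seed)

    def shuffled_reader():
        buf = []
        for item in reader():
            buf.append(item)
            if len(buf) >= buffer_size:
                rng.shuffle(buf)
                for x in buf:
                    yield x
                buf = []
        if buf:
            # Flush whatever is left at the end of the stream.
            rng.shuffle(buf)
            for x in buf:
                yield x
    return shuffled_reader


def raw_reader():
    for i in range(10):
        yield i


out = list(buffered_shuffle(raw_reader, buffer_size=4, seed=0)())
print(sorted(out) == list(range(10)))
```

Items only move within one buffer-sized window, so a larger buffer_size gives a more thorough shuffle at the cost of more memory.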