fluid¶
BuildStrategy¶
-
class
paddle.fluid.
BuildStrategy
BuildStrategy allows the user to more precisely control how the SSA Graph is built in ParallelExecutor by setting its properties.
Examples
import paddle.fluid as fluid

build_strategy = fluid.BuildStrategy()
build_strategy.reduce_strategy = fluid.BuildStrategy.ReduceStrategy.Reduce
-
debug_graphviz_path
The type is STR. debug_graphviz_path indicates the path to which the SSA Graph is written, in graphviz format, which is useful for debugging. Default "".
Examples
import paddle.fluid as fluid

build_strategy = fluid.BuildStrategy()
build_strategy.debug_graphviz_path = ""
-
enable_sequential_execution
The type is BOOL. If set True, the execution order of ops would be the same as what is in the program. Default False.
Examples
import paddle.fluid as fluid

build_strategy = fluid.BuildStrategy()
build_strategy.enable_sequential_execution = True
-
fuse_broadcast_ops
The type is BOOL. fuse_broadcast_ops indicates whether to fuse the broadcast ops. Note that, in Reduce mode, fusing broadcast ops may make the program faster, because fusing broadcast ops is equivalent to delaying the execution of all broadcast ops; in that case, all NCCL streams are used only for NCCLReduce operations for a period of time. Default False.
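Examples
A minimal sketch, assuming this property is set like the other boolean build options:

import paddle.fluid as fluid

build_strategy = fluid.BuildStrategy()
build_strategy.fuse_broadcast_ops = True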
-
fuse_elewise_add_act_ops
The type is BOOL. fuse_elewise_add_act_ops indicates whether to fuse elementwise_add_op and activation_op; it may make the execution faster. Default False.
Examples
import paddle.fluid as fluid

build_strategy = fluid.BuildStrategy()
build_strategy.fuse_elewise_add_act_ops = True
-
fuse_relu_depthwise_conv
The type is BOOL. fuse_relu_depthwise_conv indicates whether to fuse relu and depthwise_conv2d; it saves GPU memory and may make the execution faster. This option is only available on GPU devices. Default False.
Examples
import paddle.fluid as fluid

build_strategy = fluid.BuildStrategy()
build_strategy.fuse_relu_depthwise_conv = True
-
gradient_scale_strategy
The type is STR. There are three ways of defining \(loss@grad\) in ParallelExecutor: ‘CoeffNumDevice’, ‘One’ and ‘Customized’. By default, ParallelExecutor sets \(loss@grad\) according to the number of devices. If you want to customize \(loss@grad\), you can choose ‘Customized’. Default ‘CoeffNumDevice’.
Examples
import paddle.fluid as fluid

build_strategy = fluid.BuildStrategy()
build_strategy.gradient_scale_strategy = fluid.BuildStrategy.GradientScaleStrategy.Customized
-
memory_optimize
[source] The type is BOOL. memory_optimize aims to reduce total memory consumption; set it to True to enable it.
Memory optimize is our experimental feature: some variables may be reused or removed by the optimization strategy. If you need to fetch some variable values when using this feature, please set the persistable property of those variables to True.
Default False.
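Examples
A minimal sketch, assuming this property is set like the other boolean build options (remember to mark fetched variables as persistable, as noted above):

import paddle.fluid as fluid

build_strategy = fluid.BuildStrategy()
build_strategy.memory_optimize = True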
-
reduce_strategy
The type is STR. There are two reduce strategies in ParallelExecutor: ‘AllReduce’ and ‘Reduce’. If you want all parameters to be optimized on all devices independently, choose ‘AllReduce’; if you choose ‘Reduce’, the optimization of the parameters is evenly distributed across the devices, and the optimized parameters are then broadcast to the other devices. In some models, Reduce is faster. Default ‘AllReduce’.
Examples
import paddle.fluid as fluid

build_strategy = fluid.BuildStrategy()
build_strategy.reduce_strategy = fluid.BuildStrategy.ReduceStrategy.Reduce
-
remove_unnecessary_lock
The type is BOOL. If set True, some locks in GPU ops would be released and ParallelExecutor would run faster. Default True.
Examples
import paddle.fluid as fluid

build_strategy = fluid.BuildStrategy()
build_strategy.remove_unnecessary_lock = True
-
sync_batch_norm
The type is BOOL. sync_batch_norm indicates whether to use synchronous batch normalization, which synchronizes the mean and variance across multiple devices during the training phase.
The current implementation does not support FP16 training or CPU execution, and synchronization happens only within one machine, not across machines.
Default False.
Examples
import paddle.fluid as fluid

build_strategy = fluid.BuildStrategy()
build_strategy.sync_batch_norm = True
-
CompiledProgram¶
-
class
paddle.fluid.
CompiledProgram
(program_or_graph)[source] Compiles to Graph for execution.
Users first create the program with layers.
Optionally, users can use CompiledProgram to optimize the program before running it.
The original program or CompiledProgram is run by the executor.
The CompiledProgram is used to transform a program for various optimizations, for example:
Pre-compute some logic once so that each run is faster.
Transform the program so that it can run on multiple devices.
Transform the program for optimized inference or distributed training (note that this part is not finished yet).
Example
import paddle.fluid as fluid
import paddle.fluid.compiler as compiler
import numpy
import os

place = fluid.CUDAPlace(0) # fluid.CPUPlace()
exe = fluid.Executor(place)

data = fluid.layers.data(name='X', shape=[1], dtype='float32')
hidden = fluid.layers.fc(input=data, size=10)
loss = fluid.layers.mean(hidden)
fluid.optimizer.SGD(learning_rate=0.01).minimize(loss)

fluid.default_startup_program().random_seed=1
exe.run(fluid.default_startup_program())
compiled_prog = compiler.CompiledProgram(
    fluid.default_main_program())

x = numpy.random.random(size=(10, 1)).astype('float32')
loss_data, = exe.run(compiled_prog,
                     feed={"X": x},
                     fetch_list=[loss.name])
- Parameters
program_or_graph (Graph|Program) – If it's a Program, it will first be lowered to a graph for further optimizations. If it's a graph (potentially optimized before), it will be used directly for further optimizations. Note: a graph is only supported when compiled with the with_data_parallel option.
-
with_data_parallel
(loss_name=None, build_strategy=None, exec_strategy=None, share_vars_from=None, places=None) Configures the program to run in a data parallel way.
Example
import paddle.fluid as fluid
import paddle.fluid.compiler as compiler
import numpy
import os

use_cuda = True
place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()

# NOTE: If you use CPU to run the program, you need
# to specify the CPU_NUM, otherwise, fluid will use
# all the number of the logic core as the CPU_NUM,
# in that case, the batch size of the input should be
# greater than CPU_NUM, if not, the process will be
# failed by an exception.
if not use_cuda:
    os.environ['CPU_NUM'] = str(2)

exe = fluid.Executor(place)

data = fluid.layers.data(name='X', shape=[1], dtype='float32')
hidden = fluid.layers.fc(input=data, size=10)
loss = fluid.layers.mean(hidden)
fluid.optimizer.SGD(learning_rate=0.01).minimize(loss)

fluid.default_startup_program().random_seed=1
exe.run(fluid.default_startup_program())
compiled_prog = compiler.CompiledProgram(
    fluid.default_main_program()).with_data_parallel(
        loss_name=loss.name)

x = numpy.random.random(size=(10, 1)).astype('float32')
loss_data, = exe.run(compiled_prog,
                     feed={"X": x},
                     fetch_list=[loss.name])
- Parameters
loss_name (str) – The loss name, which must be set during training. Default None.
build_strategy (BuildStrategy) – build_strategy is used to build the graph so it can run on multiple devices/cores with optimized topology. For more information, please refer to fluid.BuildStrategy. Default None.
exec_strategy (ExecutionStrategy) – exec_strategy is used to select the way to execute the graph, for example how many threads are used, and how many iterations to run before cleaning up the temp variables. For more information, please refer to fluid.ExecutionStrategy. Default None.
share_vars_from (CompiledProgram) – If provided, this CompiledProgram will share variables from share_vars_from. share_vars_from must be run by the executor before this CompiledProgram so that vars are ready.
places (list(CUDAPlace)|list(CPUPlace)|None) – If provided, only compile program in the given places. Otherwise, the places used when compiled is determined by the Executor, and the places used are controlled by environment variables: FLAGS_selected_gpus or CUDA_VISIBLE_DEVICES if using GPU; or CPU_NUM if using CPU. For example, if you want to run on GPU 0 and 1, set places=[fluid.CUDAPlace(0), fluid.CUDAPlace(1)]. If you want to run on 2 CPU cores, set places=[fluid.CPUPlace()]*2.
- Returns
self
-
with_inference_optimize
(config) Adds inference optimizations.
- Parameters
config – an instance of NativeConfig or AnalysisConfig used to create the predictor.
- Returns
self
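Examples
A minimal sketch, assuming an AnalysisConfig built from a saved inference model directory (the "./infer_model" path is hypothetical):

import paddle.fluid as fluid
import paddle.fluid.compiler as compiler
from paddle.fluid.core import AnalysisConfig

config = AnalysisConfig("./infer_model")  # hypothetical model directory
compiled_prog = compiler.CompiledProgram(
    fluid.default_main_program()).with_inference_optimize(config)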
cpu_places¶
-
paddle.fluid.
cpu_places
(device_count=None)[source] Create a list of fluid.CPUPlace objects.
If device_count is None, the device count would be determined by the environment variable CPU_NUM. If CPU_NUM is not set, the default value is 1, i.e. CPU_NUM=1.
- Parameters
device_count (None|int) – device number.
- Returns
cpu place list.
- Return type
out (list(fluid.CPUPlace))
Examples
import paddle.fluid as fluid
cpu_places = fluid.cpu_places()
CPUPlace¶
-
class
paddle.fluid.
CPUPlace
CPUPlace is a descriptor of a device. It represents a CPU, and the memory of a CPUPlace can be accessed by the CPU.
Examples
import paddle.fluid as fluid
cpu_place = fluid.CPUPlace()
create_lod_tensor¶
-
paddle.fluid.
create_lod_tensor
(data, recursive_seq_lens, place)[source] Create a lod tensor from a numpy array, a list, or an existing lod tensor.
Create a lod tensor by doing the following:
Check that the length-based level of detail (LoD), also known as recursive_sequence_lengths, of the input is valid.
Convert recursive_sequence_lengths to an offset-based LoD.
Copy the data from a numpy array, a list or an existing lod tensor to a CPU or GPU device (based on the input place).
Set the level of detail (LoD) using the offset-based LoD.
Examples
Suppose we want a LoDTensor to hold data for sequences of words, where each word is represented by an integer, and we want to create a LoDTensor to represent two sentences, one of 2 words and one of 3 words.
Then data can be a numpy array of integers with shape (5, 1). recursive_seq_lens will be [[2, 3]], indicating the length (number of words) of each sentence. This length-based recursive_seq_lens [[2, 3]] will be converted to the offset-based LoD [[0, 2, 5]] inside the function call.

import paddle.fluid as fluid
import numpy as np

t = fluid.create_lod_tensor(np.ndarray([5, 30]), [[2, 3]], fluid.CPUPlace())
Please refer to api_guide_low_level_lod_tensor for more details regarding LoD.
- Parameters
data (numpy.ndarray|list|LoDTensor) – a numpy array or a LoDTensor or a list holding the data to be copied.
recursive_seq_lens (list) – a list of lists indicating the length-based level of detail info specified by the user.
place (Place) – CPU or GPU place indicating where the data in the new LoDTensor will be stored.
- Returns
A fluid LoDTensor object with tensor data and recursive_seq_lens info.
create_random_int_lodtensor¶
-
paddle.fluid.
create_random_int_lodtensor
(recursive_seq_lens, base_shape, place, low, high)[source] Create a LoDTensor containing random integers.
This function is frequently used in the book examples. So we revised it based on the new create_lod_tensor API and put it here in the lod_tensor module to simplify the code.
The function does the following:
Calculate the overall shape of the LoDTensor based on the length-based recursive_seq_lens input and the shape of the basic element in base_shape.
Create a numpy array of this shape.
Create the LoDTensor using the create_lod_tensor API.
Suppose we want a LoDTensor to hold data for sequences of words, where each word is represented by an integer, and we want to create a LoDTensor to represent two sentences, one of 2 words and one of 3 words. Then base_shape is [1], and the input length-based recursive_seq_lens is [[2, 3]]. The overall shape of the LoDTensor would then be [5, 1], holding 5 words for two sentences.
- Parameters
recursive_seq_lens (list) – a list of lists indicating the length-based level of detail info specified by the user.
base_shape (list) – the shape of the basic element to be held by the LoDTensor.
place (Place) – CPU or GPU place indicating where the data in the new LoDTensor will be stored.
low (int) – the lower bound of the random integers.
high (int) – the upper bound of the random integers.
- Returns
A fluid LoDTensor object with tensor data and recursive_seq_lens info.
Examples
import paddle.fluid as fluid

t = fluid.create_random_int_lodtensor(recursive_seq_lens=[[2, 3]],
                                      base_shape=[30],
                                      place=fluid.CPUPlace(),
                                      low=0,
                                      high=10)
cuda_pinned_places¶
-
paddle.fluid.
cuda_pinned_places
(device_count=None)[source] Create a list of fluid.CUDAPinnedPlace objects.
If device_count is None, the device count would be determined by the environment variable CPU_NUM. If CPU_NUM is not set, the device count would be determined by multiprocessing.cpu_count().
- Parameters
device_count (None|int) – device number.
- Returns
cuda pinned place list.
- Return type
out (list(fluid.CUDAPinnedPlace))
Examples
import paddle.fluid as fluid
cuda_pinned_places_cpu_num = fluid.cuda_pinned_places()
# or
cuda_pinned_places = fluid.cuda_pinned_places(1)
cuda_places¶
-
paddle.fluid.
cuda_places
(device_ids=None)[source] Create a list of fluid.CUDAPlace objects.
If device_ids is None, the environment variable FLAGS_selected_gpus would be checked first. If FLAGS_selected_gpus=0,1,2, the returned list would be [fluid.CUDAPlace(0), fluid.CUDAPlace(1), fluid.CUDAPlace(2)]. If FLAGS_selected_gpus is not set, all visible gpu places would be returned.
If device_ids is not None, it should be the device ids of gpus. For example, if device_ids=[0,1,2], the returned list would be [fluid.CUDAPlace(0), fluid.CUDAPlace(1), fluid.CUDAPlace(2)].
- Parameters
device_ids (None|list(int)|tuple(int)) – gpu device id list.
- Returns
gpu place list.
- Return type
out (list(fluid.CUDAPlace))
Examples
import paddle.fluid as fluid
cuda_places = fluid.cuda_places()
CUDAPinnedPlace¶
-
class
paddle.fluid.
CUDAPinnedPlace
CUDAPinnedPlace is a descriptor of a device. The memory of CUDAPinnedPlace can be accessed by GPU and CPU.
Examples
import paddle.fluid as fluid
place = fluid.CUDAPinnedPlace()
CUDAPlace¶
-
class
paddle.fluid.
CUDAPlace
CUDAPlace is a descriptor of a device. It represents a GPU, and each CUDAPlace has a dev_id that indicates which GPU card the current CUDAPlace represents. The memory of CUDAPlaces with different dev_id values is not mutually accessible.
Examples
import paddle.fluid as fluid
gpu_place = fluid.CUDAPlace(0)
DataFeedDesc¶
-
class
paddle.fluid.
DataFeedDesc
(proto_file)[source] Datafeed descriptor, describing input training data format. This class is currently only used for AsyncExecutor (See comments for class AsyncExecutor for a brief introduction)
DataFeedDesc shall be initialized from a valid protobuf message from disk.
See paddle/fluid/framework/data_feed.proto for the message definition. A typical message might look like (the original example used Python 2 print >> syntax; here it is written with f.write so it also runs on Python 3):

import paddle.fluid as fluid

f = open("data.proto", "w")
f.write('''name: "MultiSlotDataFeed"
batch_size: 2
multi_slot_desc {
    slots {
        name: "words"
        type: "uint64"
        is_dense: false
        is_used: true
    }
    slots {
        name: "label"
        type: "uint64"
        is_dense: false
        is_used: true
    }
}''')
f.close()
data_feed = fluid.DataFeedDesc('data.proto')
However, users usually shouldn't care about the message format; instead, they are encouraged to use the Data Generator as a tool to generate a valid data description, in the process of converting their raw log files to training files acceptable to AsyncExecutor.
A DataFeedDesc can also be changed during runtime. Once you are familiar with what each field means, you can modify it to better suit your need. E.g.:
import paddle.fluid as fluid

data_feed = fluid.DataFeedDesc('data.proto')
data_feed.set_batch_size(128)
data_feed.set_dense_slots('wd')    # The slot named 'wd' will be dense
data_feed.set_use_slots('wd')      # The slot named 'wd' will be used
Finally, the content can be dumped out for debugging purpose:
print(data_feed.desc())
- Parameters
proto_file (string) – Disk file containing a data feed description.
-
set_batch_size
(batch_size) Set the batch size. It will take effect during training.
Example
import paddle.fluid as fluid

f = open("data.proto", "w")
f.write('''name: "MultiSlotDataFeed"
batch_size: 2
multi_slot_desc {
    slots {
        name: "words"
        type: "uint64"
        is_dense: false
        is_used: true
    }
    slots {
        name: "label"
        type: "uint64"
        is_dense: false
        is_used: true
    }
}''')
f.close()
data_feed = fluid.DataFeedDesc('data.proto')
data_feed.set_batch_size(128)
- Parameters
batch_size – batch size
-
set_dense_slots
(dense_slots_name) Set whether a specific slot will be dense. It will take effect during training. Features for a dense slot will be fed into a Tensor, while those for a sparse slot will be fed into a LoDTensor.
Example
import paddle.fluid as fluid

f = open("data.proto", "w")
f.write('''name: "MultiSlotDataFeed"
batch_size: 2
multi_slot_desc {
    slots {
        name: "words"
        type: "uint64"
        is_dense: false
        is_used: true
    }
    slots {
        name: "label"
        type: "uint64"
        is_dense: false
        is_used: true
    }
}''')
f.close()
data_feed = fluid.DataFeedDesc('data.proto')
data_feed.set_dense_slots(['words'])
- Parameters
dense_slots_name – a list of slot names which will be set dense
Note
By default, all slots are sparse.
-
set_use_slots
(use_slots_name) Set whether a specific slot will be used for training. A dataset usually contains a lot of features; through this function one can select which ones will be used for a specific model.
Example
import paddle.fluid as fluid

f = open("data.proto", "w")
f.write('''name: "MultiSlotDataFeed"
batch_size: 2
multi_slot_desc {
    slots {
        name: "words"
        type: "uint64"
        is_dense: false
        is_used: true
    }
    slots {
        name: "label"
        type: "uint64"
        is_dense: false
        is_used: true
    }
}''')
f.close()
data_feed = fluid.DataFeedDesc('data.proto')
data_feed.set_use_slots(['words'])
- Parameters
use_slots_name – a list of slot names which will be used in training
Note
By default, no slot is used.
-
desc
() Returns a protobuf message for this DataFeedDesc
Example
import paddle.fluid as fluid

f = open("data.proto", "w")
f.write('''name: "MultiSlotDataFeed"
batch_size: 2
multi_slot_desc {
    slots {
        name: "words"
        type: "uint64"
        is_dense: false
        is_used: true
    }
    slots {
        name: "label"
        type: "uint64"
        is_dense: false
        is_used: true
    }
}''')
f.close()
data_feed = fluid.DataFeedDesc('data.proto')
print(data_feed.desc())
- Returns
A string message
DataFeeder¶
-
class
paddle.fluid.
DataFeeder
(feed_list, place, program=None)[source] DataFeeder converts the data returned by a reader into a data structure that can be fed into Executor and ParallelExecutor. The reader usually returns a list of mini-batch data entries. Each data entry in the list is one sample. Each sample is a list or a tuple with one feature or multiple features.
Simple usage is shown below:
import paddle.fluid as fluid

place = fluid.CPUPlace()
img = fluid.layers.data(name='image', shape=[1, 28, 28])
label = fluid.layers.data(name='label', shape=[1], dtype='int64')
feeder = fluid.DataFeeder([img, label], fluid.CPUPlace())
result = feeder.feed([([0] * 784, [9]), ([1] * 784, [1])])
If you want to feed data to the GPU side separately in advance when using multiple GPUs to train a model, you can use the decorate_reader function.
import paddle
import paddle.fluid as fluid

place = fluid.CUDAPlace(0)
data = fluid.layers.data(name='data', shape=[3, 224, 224], dtype='float32')
label = fluid.layers.data(name='label', shape=[1], dtype='int64')

feeder = fluid.DataFeeder(place=place, feed_list=[data, label])
reader = feeder.decorate_reader(
    paddle.batch(paddle.dataset.flowers.train(), batch_size=16),
    multi_devices=False)
- Parameters
feed_list (list) – The Variables or Variables' names that will be fed into the model.
place (Place) – place indicates whether to feed data into the CPU or GPU. If you want to feed data into the GPU, please use fluid.CUDAPlace(i) (i represents the GPU id); if you want to feed data into the CPU, please use fluid.CPUPlace().
program (Program) – The Program that the data will be fed into; if program is None, it will use default_main_program(). Default None.
- Raises
ValueError
– If some Variable is not in this Program.
Examples
import numpy as np
import paddle
import paddle.fluid as fluid

place = fluid.CPUPlace()

def reader():
    yield [np.random.random([4]).astype('float32'),
           np.random.random([3]).astype('float32')],

main_program = fluid.Program()
startup_program = fluid.Program()
with fluid.program_guard(main_program, startup_program):
    data_1 = fluid.layers.data(name='data_1', shape=[1, 2, 2])
    data_2 = fluid.layers.data(name='data_2', shape=[1, 1, 3])
    out = fluid.layers.fc(input=[data_1, data_2], size=2)
    # ...

feeder = fluid.DataFeeder([data_1, data_2], place)

exe = fluid.Executor(place)
exe.run(startup_program)
for data in reader():
    outs = exe.run(program=main_program,
                   feed=feeder.feed(data),
                   fetch_list=[out])
-
feed
(iterable) According to feed_list and iterable, converts the input into a data structure that can be fed into Executor and ParallelExecutor.
- Parameters
iterable (list|tuple) – the input data.
- Returns
the result of conversion.
- Return type
dict
Examples
import numpy.random as random
import paddle.fluid as fluid

def reader(limit=5):
    for i in range(limit):
        yield random.random([784]).astype('float32'), random.random([1]).astype('int64'), random.random([256]).astype('float32')

data_1 = fluid.layers.data(name='data_1', shape=[1, 28, 28])
data_2 = fluid.layers.data(name='data_2', shape=[1], dtype='int64')
data_3 = fluid.layers.data(name='data_3', shape=[16, 16], dtype='float32')
feeder = fluid.DataFeeder(['data_1','data_2', 'data_3'], fluid.CPUPlace())

result = feeder.feed(reader())
-
feed_parallel
(iterable, num_places=None) Takes multiple mini-batches. Each mini-batch will be fed to a separate device in advance.
- Parameters
iterable (list|tuple) – the input data.
num_places (int) – the number of devices. Default None.
- Returns
the result of conversion.
- Return type
dict
Notes
The number of devices and the number of mini-batches must be the same.
Examples
import numpy.random as random
import paddle.fluid as fluid

def reader(limit=10):
    for i in range(limit):
        yield [random.random([784]).astype('float32'), random.randint(10)],

x = fluid.layers.data(name='x', shape=[1, 28, 28])
y = fluid.layers.data(name='y', shape=[1], dtype='int64')

feeder = fluid.DataFeeder(['x','y'], fluid.CPUPlace())
place_num = 2
places = [fluid.CPUPlace() for x in range(place_num)]
data = []
exe = fluid.Executor(fluid.CPUPlace())
exe.run(fluid.default_startup_program())
program = fluid.CompiledProgram(fluid.default_main_program()).with_data_parallel(places=places)
for item in reader():
    data.append(item)
    if place_num == len(data):
        exe.run(program=program,
                feed=list(feeder.feed_parallel(data, place_num)),
                fetch_list=[])
        data = []
-
decorate_reader
(reader, multi_devices, num_places=None, drop_last=True) Converts the data returned by the reader into multiple mini-batches. Each mini-batch will be fed to a separate device.
- Parameters
reader (function) – the reader is the function which can generate data.
multi_devices (bool) – whether to use multiple devices or not.
num_places (int) – if multi_devices is True, you can specify the number of GPUs to use; if num_places is None, the function will use all the GPUs of the current machine. Default None.
drop_last (bool) – whether to drop the last batch if the size of the last batch is less than batch_size. Default True.
- Returns
the result of conversion.
- Return type
dict
- Raises
ValueError
– If drop_last is False and the data batch cannot be evenly distributed across the devices.
Examples
import numpy.random as random
import paddle
import paddle.fluid as fluid

def reader(limit=5):
    for i in range(limit):
        yield (random.random([784]).astype('float32'), random.random([1]).astype('int64')),

place = fluid.CUDAPlace(0)
data = fluid.layers.data(name='data', shape=[1, 28, 28], dtype='float32')
label = fluid.layers.data(name='label', shape=[1], dtype='int64')

feeder = fluid.DataFeeder(place=place, feed_list=[data, label])
reader = feeder.decorate_reader(reader, multi_devices=False)

exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
for data in reader():
    exe.run(feed=data)
default_main_program¶
-
paddle.fluid.
default_main_program
()[source] Get the default/global main program. The main program is used for training or testing.
All layer functions in fluid.layers will append operators and variables to the default_main_program.
The default_main_program is the default program in a lot of APIs. For example, Executor.run() will execute the default_main_program when the program is not specified.
- Returns
main program
- Return type
Program
Examples
import paddle.fluid as fluid

# Sample Network:
data = fluid.layers.data(name='image', shape=[3, 224, 224], dtype='float32')
label = fluid.layers.data(name='label', shape=[1], dtype='int64')

conv1 = fluid.layers.conv2d(data, 4, 5, 1, act=None)
bn1 = fluid.layers.batch_norm(conv1, act='relu')
pool1 = fluid.layers.pool2d(bn1, 2, 'max', 2)
conv2 = fluid.layers.conv2d(pool1, 16, 5, 1, act=None)
bn2 = fluid.layers.batch_norm(conv2, act='relu')
pool2 = fluid.layers.pool2d(bn2, 2, 'max', 2)

fc1 = fluid.layers.fc(pool2, size=50, act='relu')
fc2 = fluid.layers.fc(fc1, size=102, act='softmax')

loss = fluid.layers.cross_entropy(input=fc2, label=label)
loss = fluid.layers.mean(loss)
opt = fluid.optimizer.Momentum(
    learning_rate=0.1,
    momentum=0.9,
    regularization=fluid.regularizer.L2Decay(1e-4))
opt.minimize(loss)

print(fluid.default_main_program())
default_startup_program¶
-
paddle.fluid.
default_startup_program
()[source] Get default/global startup program.
The layer functions in fluid.layers will create parameters, readers, and NCCL handles as global variables. The startup_program will initialize them by the operators in the startup program, and the layer functions will append these initialization operators into the startup program.
This method will return the default or the current startup program. Users can use fluid.program_guard to switch programs.
- Returns
startup program
- Return type
Program
Examples
import paddle.fluid as fluid

main_program = fluid.Program()
startup_program = fluid.Program()
with fluid.program_guard(main_program=main_program, startup_program=startup_program):
    x = fluid.layers.data(name="x", shape=[-1, 784], dtype='float32')
    y = fluid.layers.data(name="y", shape=[-1, 1], dtype='int32')
    z = fluid.layers.fc(name="fc", input=x, size=10, act="relu")

print("main program is: {}".format(fluid.default_main_program()))
print("start up program is: {}".format(fluid.default_startup_program()))
DistributeTranspiler¶
-
class
paddle.fluid.
DistributeTranspiler
(config=None)[source]
Convert the fluid program to distributed data-parallelism programs. Supports two modes: pserver mode and nccl2 mode.
In pserver mode, the main_program will be transformed to use a remote parameter server to do parameter optimization. And the optimization graph will be put into a parameter server program.
In nccl2 mode, the transpiler will append a NCCL_ID broadcasting op in startup_program to share the NCCL_ID across the job nodes. After transpile() is called in nccl2 mode, you *must* pass the trainer_id and num_trainers arguments to ParallelExecutor to enable NCCL2 distributed mode.
Examples
x = fluid.layers.data(name='x', shape=[13], dtype='float32')
y = fluid.layers.data(name='y', shape=[1], dtype='float32')
y_predict = fluid.layers.fc(input=x, size=1, act=None)

cost = fluid.layers.square_error_cost(input=y_predict, label=y)
avg_loss = fluid.layers.mean(cost)

sgd_optimizer = fluid.optimizer.SGD(learning_rate=0.001)
sgd_optimizer.minimize(avg_loss)

# for pserver mode
pserver_endpoints = "192.168.0.1:6174,192.168.0.2:6174"
trainer_endpoints = "192.168.0.1:6174,192.168.0.2:6174"
current_endpoint = "192.168.0.1:6174"
trainer_id = 0
trainers = 4
role = "PSERVER"

t = fluid.DistributeTranspiler()
t.transpile(
    trainer_id, pservers=pserver_endpoints, trainers=trainers)
if role == "PSERVER":
    pserver_program = t.get_pserver_program(current_endpoint)
    pserver_startup_program = t.get_startup_program(current_endpoint,
                                                    pserver_program)
elif role == "TRAINER":
    trainer_program = t.get_trainer_program()

# for nccl2 mode
trainer_num = 2
trainer_id = 0

config = fluid.DistributeTranspilerConfig()
config.mode = "nccl2"
trainer_endpoints = "192.168.0.1:6174,192.168.0.2:6174"

t = fluid.DistributeTranspiler(config=config)
t.transpile(trainer_id=trainer_id, trainers=trainer_endpoints,
            current_endpoint="192.168.0.1:6174")
exe = fluid.ParallelExecutor(
    use_cuda=True,
    loss_name=avg_loss.name,
    num_trainers=trainer_num,
    trainer_id=trainer_id
)
-
transpile
(trainer_id, program=None, pservers='127.0.0.1:6174', trainers=1, sync_mode=True, startup_program=None, current_endpoint='127.0.0.1:6174') Run the transpiler. Transpile the input program.
- Parameters
trainer_id (int) – the id of the current trainer worker; if you have n workers, the id ranges from 0 to n-1.
program (Program|None) – program to transpile, default is fluid.default_main_program().
startup_program (Program|None) – startup_program to transpile, default is fluid.default_startup_program().
pservers (str) – comma separated ip:port string for the pserver list.
trainers (int|str) – in pserver mode this is the number of trainers, in nccl2 mode this is a string of trainer endpoints.
sync_mode (bool) – Do sync training or not, default is True.
current_endpoint (str) – need pass current endpoint when transpile as nccl2 distributed mode. In pserver mode this argument is not used.
Examples
transpiler = fluid.DistributeTranspiler()
transpiler.transpile(
    trainer_id=0,
    pservers="127.0.0.1:7000,127.0.0.1:7001",
    trainers=2,
    sync_mode=False,
    current_endpoint="127.0.0.1:7000")
-
get_trainer_program
(wait_port=True) Get transpiled trainer side program.
- Returns
trainer side program.
- Return type
Program
Examples
import paddle.fluid as fluid

# this is an example, find available endpoints in your case
pserver_endpoints = "192.168.0.1:6174,192.168.0.2:6174"
trainer_id = 0
trainers = 4

t = fluid.DistributeTranspiler()
t.transpile(trainer_id, trainers=trainers, pservers=pserver_endpoints)
trainer_program = t.get_trainer_program()
-
get_pserver_program
(endpoint) Get parameter server side program.
- Parameters
endpoint (str) – current parameter server endpoint.
- Returns
the program for current parameter server to run.
- Return type
Program
Examples
import paddle.fluid as fluid

# this is an example, find available endpoints in your case
pserver_endpoints = "192.168.0.1:6174,192.168.0.2:6174"
current_endpoint = "192.168.0.1:6174"
trainer_id = 0
trainers = 4

t = fluid.DistributeTranspiler()
t.transpile(
    trainer_id, pservers=pserver_endpoints, trainers=trainers)
pserver_program = t.get_pserver_program(current_endpoint)
-
get_pserver_programs
(endpoint) Get pserver side main program and startup program for distributed training.
- Parameters
endpoint (str) – current pserver endpoint.
- Returns
(main_program, startup_program), of type “Program”
- Return type
tuple
Examples
import paddle.fluid as fluid

# this is an example, find available endpoints in your case
pserver_endpoints = "192.168.0.1:6174,192.168.0.2:6174"
current_endpoint = "192.168.0.1:6174"
trainer_id = 0
trainers = 4

t = fluid.DistributeTranspiler()
t.transpile(
    trainer_id, pservers=pserver_endpoints, trainers=trainers)
pserver_program, pserver_startup_program = t.get_pserver_programs(current_endpoint)
-
get_startup_program
(endpoint, pserver_program=None, startup_program=None) Deprecated
Get the startup program for the current parameter server. It modifies operator input variables if there are variables that were split into several blocks.
- Parameters
endpoint (str) – current pserver endpoint.
pserver_program (Program) – deprecated, call get_pserver_program first.
startup_program (Program) – deprecated, should pass startup_program when initializing
- Returns
parameter server side startup program.
- Return type
Program
Examples
pserver_endpoints = "192.168.0.1:6174,192.168.0.2:6174"
trainer_endpoints = "192.168.0.1:6174,192.168.0.2:6174"
current_endpoint = "192.168.0.1:6174"
trainer_id = 0
trainers = 4

t = fluid.DistributeTranspiler()
t.transpile(trainer_id, pservers=pserver_endpoints, trainers=trainers)
pserver_program = t.get_pserver_program(current_endpoint)
pserver_startup_program = t.get_startup_program(current_endpoint,
                                                pserver_program)
-
DistributeTranspilerConfig¶
-
class
paddle.fluid.
DistributeTranspilerConfig
[source]
-
slice_var_up
(bool)¶ Whether to do Tensor slicing for pservers; default is True.
-
split_method
(PSDispatcher)¶ RoundRobin or HashName can be used. Try to choose the best method to balance loads for pservers.
-
min_block_size
(int)¶ The minimum number of split elements in a block.
According to https://github.com/PaddlePaddle/Paddle/issues/8638#issuecomment-369912156, bandwidth can be used efficiently when the data size is larger than 2MB. If you want to change it, please make sure you have read the slice_variable function.
Examples
config = fluid.DistributeTranspilerConfig()
config.slice_var_up = True
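A sketch of the remaining options, assuming RoundRobin is importable from paddle.fluid.transpiler.ps_dispatcher:

import paddle.fluid as fluid
from paddle.fluid.transpiler.ps_dispatcher import RoundRobin

config = fluid.DistributeTranspilerConfig()
config.split_method = RoundRobin
config.min_block_size = 8192  # elements per block; see the issue linked above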
-
ExecutionStrategy¶
-
class
paddle.fluid.
ExecutionStrategy
ExecutionStrategy allows the user to more precisely control how to run the program in ParallelExecutor by setting its properties.
Examples
import paddle.fluid as fluid

x = fluid.layers.data(name='x', shape=[13], dtype='float32')
y = fluid.layers.data(name='y', shape=[1], dtype='float32')
y_predict = fluid.layers.fc(input=x, size=1, act=None)

cost = fluid.layers.square_error_cost(input=y_predict, label=y)
avg_loss = fluid.layers.mean(cost)

sgd_optimizer = fluid.optimizer.SGD(learning_rate=0.001)
sgd_optimizer.minimize(avg_loss)

exec_strategy = fluid.ExecutionStrategy()
exec_strategy.num_threads = 4

train_exe = fluid.ParallelExecutor(use_cuda=False,
                                   loss_name=avg_loss.name,
                                   exec_strategy=exec_strategy)
-
allow_op_delay
The type is BOOL. allow_op_delay represents whether to delay the communication operators to run, which may make the execution faster. Note that this option is invalid now, and it will be removed in the next version. Default False.
-
num_iteration_per_drop_scope
The type is INT. num_iteration_per_drop_scope indicates how many iterations to run before cleaning up the temp variables which are generated during execution. It may make the execution faster, because the shapes of the temp variables may be the same between two iterations. Default 1.
Notes
If you fetch data when calling 'run', the ParallelExecutor will clean up the temp variables at the end of the current iteration.
In some NLP models, this setting may cause GPU memory to be insufficient; in that case, you should reduce num_iteration_per_drop_scope.
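Examples
A minimal sketch, assuming this property is set like num_threads in the class example above:

import paddle.fluid as fluid

exec_strategy = fluid.ExecutionStrategy()
exec_strategy.num_iteration_per_drop_scope = 10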
-
num_iteration_per_run
The type is INT. num_iteration_per_run specifies how many iterations the executor will run each time the user calls pe.run() in Python.
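Examples
A minimal sketch, assuming this property is set like the other execution options:

import paddle.fluid as fluid

exec_strategy = fluid.ExecutionStrategy()
exec_strategy.num_iteration_per_run = 2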
-
num_threads
The type is INT. num_threads represents the size of the thread pool used to run the operators of the current program in ParallelExecutor. If \(num\_threads=1\), all the operators will execute one by one, but the order may differ between iterations. If it is not set, it will be set in ParallelExecutor according to the device type and device count: for GPU, \(num\_threads=device\_count*4\); for CPU, \(num\_threads=CPU\_NUM*4\). The explanation of \(CPU\_NUM\) is in ParallelExecutor; if it is not set, ParallelExecutor will get the CPU count by calling multiprocessing.cpu_count(). Default 0.
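Examples
A minimal sketch (the value 4 is an arbitrary illustration):

import paddle.fluid as fluid

exec_strategy = fluid.ExecutionStrategy()
exec_strategy.num_threads = 4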
-
Executor¶
-
class
paddle.fluid.
Executor
(place)[source] An Executor in Python; it supports single/multiple-GPU running and single/multiple-CPU running. The Python executor takes a program and adds feed operators and fetch operators to this program according to the feed map and fetch_list. The feed map provides input data for the program. fetch_list provides the variables (or names) that the user wants to get after the program runs. Note: the executor will run all operators in the program, not only the operators that the fetch_list depends on. It stores the global variables into the global scope, and creates a local scope for the temporary variables. The contents in the local scope may be discarded after every mini-batch forward/backward finishes, but the global scope variables will persist through different runs.
Examples
import paddle.fluid as fluid
import paddle.fluid.compiler as compiler
import numpy
import os

use_cuda = True
place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
exe = fluid.Executor(place)

train_program = fluid.Program()
startup_program = fluid.Program()
with fluid.program_guard(train_program, startup_program):
    data = fluid.layers.data(name='X', shape=[1], dtype='float32')
    hidden = fluid.layers.fc(input=data, size=10)
    loss = fluid.layers.mean(hidden)
    fluid.optimizer.SGD(learning_rate=0.01).minimize(loss)

# Run the startup program once and only once.
# Not need to optimize/compile the startup program.
startup_program.random_seed=1
exe.run(startup_program)

# Run the main program directly without compile.
x = numpy.random.random(size=(10, 1)).astype('float32')
loss_data, = exe.run(train_program,
                     feed={"X": x},
                     fetch_list=[loss.name])

# Or, compiled the program and run. See `CompiledProgram`
# for more detail.
# NOTE: If you use CPU to run the program, you need
# to specify the CPU_NUM, otherwise, fluid will use
# all the number of the logic core as the CPU_NUM,
# in that case, the batch size of the input should be
# greater than CPU_NUM, if not, the process will be
# failed by an exception.
if not use_cuda:
    os.environ['CPU_NUM'] = str(2)

compiled_prog = compiler.CompiledProgram(
    train_program).with_data_parallel(
        loss_name=loss.name)
loss_data, = exe.run(compiled_prog,
                     feed={"X": x},
                     fetch_list=[loss.name])
- Parameters
place (fluid.CPUPlace|fluid.CUDAPlace(n)) – indicates which device the executor runs on.
-
close
() Close this executor.
You can no longer use this executor after calling this method. For distributed training, this method frees the resources on PServers related to the current Trainer.
Examples
import paddle.fluid as fluid

cpu = fluid.CPUPlace()
exe = fluid.Executor(cpu)
# execute training or testing
exe.close()
-
run
(program=None, feed=None, fetch_list=None, feed_var_name='feed', fetch_var_name='fetch', scope=None, return_numpy=True, use_program_cache=False) Run a program by this Executor. Feed data by the feed map, fetch the results by fetch_list. The Python executor takes a program and adds feed operators and fetch operators to this program according to the feed map and fetch_list. The feed map provides input data for the program. fetch_list provides the variables (or names) that the user wants to get after the program runs.
Note: the executor will run all operators in the program, not only the operators that the fetch_list depends on.
Examples
import paddle.fluid as fluid
import numpy

# First create the Executor.
place = fluid.CPUPlace() # fluid.CUDAPlace(0)
exe = fluid.Executor(place)

data = fluid.layers.data(name='X', shape=[1], dtype='float32')
hidden = fluid.layers.fc(input=data, size=10)
loss = fluid.layers.mean(hidden)
adam = fluid.optimizer.Adam()
adam.minimize(loss)

# Run the startup program once and only once.
exe.run(fluid.default_startup_program())

x = numpy.random.random(size=(10, 1)).astype('float32')
outs = exe.run(feed={'X': x},
               fetch_list=[loss.name])
- Parameters
program (Program|CompiledProgram) – the program that needs to run; if not provided, default_main_program (not compiled) will be used.
feed (dict) – feed variable map, e.g. {"image": ImageData, "label": LabelData}
fetch_list (list) – a list of variables or variable names that the user wants to get; this method will return them according to this list.
feed_var_name (str) – the name for the input variable of the feed Operator.
fetch_var_name (str) – the name for the output variable of the fetch Operator.
scope (Scope) – the scope used to run this program; you can switch it to a different scope. Default is global_scope.
return_numpy (bool) – whether to convert the fetched tensors to numpy arrays.
use_program_cache (bool) – whether to use the cached program settings across batches. Setting it to True would be faster only when (1) the program is not compiled with data parallelism, and (2) the program, feed variable names and fetch_list variable names have not changed compared to the last step (see the sketch after this list).
- Returns
fetch result according to fetch_list.
- Return type
list(numpy.array)
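Examples
A minimal sketch of when use_program_cache pays off, reusing the setup from the run example above (exe, loss and the 'X' feed come from that example); the program, feed names and fetch_list stay the same across steps, so the cached settings can be reused:

import numpy

for _ in range(10):
    x = numpy.random.random(size=(10, 1)).astype('float32')
    loss_data, = exe.run(fluid.default_main_program(),
                         feed={'X': x},
                         fetch_list=[loss.name],
                         use_program_cache=True)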
-
infer_from_dataset
(program=None, dataset=None, scope=None, thread=0, debug=False, fetch_list=None, fetch_info=None, print_period=100) The documentation of infer_from_dataset is almost the same as that of train_from_dataset, except that in distributed training, pushing gradients is disabled in infer_from_dataset. infer_from_dataset() can easily be used for multi-threaded evaluation.
- Parameters
program (Program|CompiledProgram) – the program that needs to be run, if not provided, then default_main_program (not compiled) will be used.
dataset (paddle.fluid.Dataset) – dataset created outside this function, a user should provide a well-defined dataset before calling this function. Please check the document of Dataset if needed. default is None
scope (Scope) – the scope used to run this program, you can switch it to different scope for each run. default is global_scope
thread (int) – the number of threads a user wants to run in this function. The actual number of threads will be min(Dataset.thread_num, thread) if thread > 0. Default is 0.
debug (bool) – whether to run infer_from_dataset in debug mode. Default is False.
fetch_list (Variable List) – fetch variable list, each variable will be printed during training, default is None
fetch_info (String List) – print information for each variable, default is None
print_period (int) – the number of mini-batches for each print, default is 100
- Returns
None
Examples
import paddle.fluid as fluid

place = fluid.CPUPlace() # you can set place = fluid.CUDAPlace(0) to use gpu
exe = fluid.Executor(place)
x = fluid.layers.data(name="x", shape=[10, 10], dtype="int64")
y = fluid.layers.data(name="y", shape=[1], dtype="int64", lod_level=1)

dataset = fluid.DatasetFactory().create_dataset()
dataset.set_use_var([x, y])
dataset.set_thread(1)
filelist = [] # you should set your own filelist, e.g. filelist = ["dataA.txt"]
dataset.set_filelist(filelist)

exe.run(fluid.default_startup_program())
exe.infer_from_dataset(program=fluid.default_main_program(),
                       dataset=dataset)
-
train_from_dataset
(program=None, dataset=None, scope=None, thread=0, debug=False, fetch_list=None, fetch_info=None, print_period=100) Train from a pre-defined Dataset. Dataset is defined in paddle.fluid.dataset. Given a program (either a plain program or a compiled program), train_from_dataset will consume all data samples in the dataset. The input scope can be given by the user; by default, scope is global_scope(). The number of threads used in training is the minimum of the thread_num of the Dataset and the value of thread in this interface. debug can be set so that the executor will display the run time of all operators and the throughput of the current training task.
Note: train_from_dataset will destroy all resources created within the executor for each run.
- Parameters
program (Program|CompiledProgram) – the program that needs to be run, if not provided, then default_main_program (not compiled) will be used.
dataset (paddle.fluid.Dataset) – dataset created outside this function, a user should provide a well-defined dataset before calling this function. Please check the document of Dataset if needed.
scope (Scope) – the scope used to run this program, you can switch it to different scope for each run. default is global_scope
thread (int) – the number of threads a user wants to run in this function. The actual number of threads will be min(Dataset.thread_num, thread).
debug (bool) – whether to run train_from_dataset in debug mode.
fetch_list (Variable List) – fetch variable list; each variable will be printed during training.
fetch_info (String List) – print information for each variable.
print_period (int) – the number of mini-batches between each print.
- Returns
None
Examples
import paddle.fluid as fluid

place = fluid.CPUPlace() # you can set place = fluid.CUDAPlace(0) to use gpu
exe = fluid.Executor(place)
x = fluid.layers.data(name="x", shape=[10, 10], dtype="int64")
y = fluid.layers.data(name="y", shape=[1], dtype="int64", lod_level=1)

dataset = fluid.DatasetFactory().create_dataset()
dataset.set_use_var([x, y])
dataset.set_thread(1)
filelist = [] # you should set your own filelist, e.g. filelist = ["dataA.txt"]
dataset.set_filelist(filelist)

exe.run(fluid.default_startup_program())
exe.train_from_dataset(program=fluid.default_main_program(),
                       dataset=dataset)
global_scope¶
-
paddle.fluid.
global_scope
()[source] Get the global/default scope instance. Many APIs use global_scope as their default value, e.g., Executor.run.
Examples
import paddle.fluid as fluid
import numpy

fluid.global_scope().var("data").get_tensor().set(numpy.ones((2, 2)), fluid.CPUPlace())
numpy.array(fluid.global_scope().find_var("data").get_tensor())
- Returns
The global/default scope instance.
- Return type
Scope
gradients¶
-
paddle.fluid.
gradients
(targets, inputs, target_gradients=None, no_grad_set=None)[source] Backpropagate the gradients of targets to inputs.
- Parameters
targets (Variable|list[Variable]) – The target variables.
inputs (Variable|list[Variable]) – The input variables.
target_gradients (Variable|list[Variable]|None) – The gradient variables of targets, which have the same shape as targets. If None, ones will be created for them.
no_grad_set (set[string]) – The names of variables that have no gradients in Block 0. All variables with stop_gradient=True from all blocks will be automatically added.
- Returns
A list of gradients for inputs. If an input does not affect targets, the corresponding gradient variable will be None.
- Return type
(list[Variable])
Examples
import paddle.fluid as fluid

x = fluid.layers.data(name='x', shape=[2,8,8], dtype='float32')
x.stop_gradient=False

y = fluid.layers.conv2d(x, 4, 1, bias_attr=False)
y = fluid.layers.relu(y)
y = fluid.layers.conv2d(y, 4, 1, bias_attr=False)
y = fluid.layers.relu(y)

z = fluid.gradients([y], x)
print(z)
in_dygraph_mode¶
-
paddle.fluid.
in_dygraph_mode
()[source] Check the program status (tracer): whether it is running in dygraph mode or not.
- Returns
True if the program is running in dynamic graph mode
- Return type
out (boolean)
Examples
import paddle.fluid as fluid

if fluid.in_dygraph_mode():
    pass
LoDTensor¶
-
class
paddle.fluid.
LoDTensor
LoDTensor is a Tensor with optional LoD information.
np.array(lod_tensor) can convert LoDTensor to numpy array. lod_tensor.lod() can retrieve the LoD information.
LoD is short for Level of Details and is usually used for variable-length sequences. You can skip the following comment if you don't need optional LoD.
For example, a LoDTensor X can look like the example below. It contains 2 sequences. The first has length 2 and the second has length 3, as described by x.lod.
The first tensor dimension 5=2+3 is calculated from the LoD if it's available; it is the total number of sequence elements. In X, each element has 2 columns, hence the shape [5, 2].
x.lod = [[2, 3]]
x.data = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
x.shape = [5, 2]
LoD can have multiple levels (for example, a paragraph can have multiple sentences and a sentence can have multiple words). In the following LoDTensor Y, the lod_level is 2. It means there are 2 sequences: the first sequence has length 2 (it has 2 sub-sequences) and the second has length 1. The first sequence's 2 sub-sequences have lengths 2 and 2, respectively, and the second sequence's single sub-sequence has length 3.
y.lod = [[2, 1], [2, 2, 3]]
y.shape = [2+2+3, …]
- Examples:
import paddle.fluid as fluid
t = fluid.LoDTensor()
Note
In the above description, LoD is length-based. In the Paddle internal implementation, lod is offset-based. Hence, internally, y.lod is represented as [[0, 2, 3], [0, 2, 4, 7]] (the length-based equivalent would be [[2-0, 3-2], [2-0, 4-2, 7-4]]).
Sometimes LoD is called recursive_sequence_lengths to be more self-explanatory; in this case, it must be length-based. For historical reasons, when LoD is called lod in public APIs, it might be offset-based. Users should be careful about this.
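A short sketch of the note above, building the two-level Y from the earlier example with length-based lengths and reading back the offset-based lod (the printed values follow from the conversion described above):

import paddle.fluid as fluid
import numpy as np

y = fluid.LoDTensor()
y.set(np.ndarray([7, 1]), fluid.CPUPlace())   # 7 = 2 + 2 + 3 sequence elements
y.set_recursive_sequence_lengths([[2, 1], [2, 2, 3]])
print(y.recursive_sequence_lengths())  # [[2, 1], [2, 2, 3]] (length-based)
print(y.lod())                         # [[0, 2, 3], [0, 2, 4, 7]] (offset-based)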
-
has_valid_recursive_sequence_lengths
(self: paddle.fluid.core_avx.LoDTensor) → bool Check whether the lod of the LoDTensor is valid.
- Returns
whether the lod is valid.
- Return type
out (bool)
Examples
import paddle.fluid as fluid
import numpy as np

t = fluid.LoDTensor()
t.set(np.ndarray([5, 30]), fluid.CPUPlace())
t.set_recursive_sequence_lengths([[2, 3]])
print(t.has_valid_recursive_sequence_lengths()) # True
-
lod
(self: paddle.fluid.core_avx.LoDTensor) → List[List[int]] Return the LoD of the LoDTensor.
- Returns
the lod of the LoDTensor.
- Return type
out (List[List[int]])
Examples
import paddle.fluid as fluid
import numpy as np

t = fluid.LoDTensor()
t.set(np.ndarray([5, 30]), fluid.CPUPlace())
t.set_lod([[0, 2, 5]])
print(t.lod()) # [[0, 2, 5]]
-
recursive_sequence_lengths
(self: paddle.fluid.core_avx.LoDTensor) → List[List[int]] Return the sequence length of the LoDTensor corresponding to LoD.
- Returns
the sequence lengths.
- Return type
out (List[List[int]])
Examples
import paddle.fluid as fluid
import numpy as np

t = fluid.LoDTensor()
t.set(np.ndarray([5, 30]), fluid.CPUPlace())
t.set_recursive_sequence_lengths([[2, 3]])
print(t.recursive_sequence_lengths()) # [[2, 3]]
-
set
(*args, **kwargs) Overloaded function.
set(self: paddle.fluid.core_avx.Tensor, arg0: numpy.ndarray[float32], arg1: paddle::platform::CPUPlace) -> None
set(self: paddle.fluid.core_avx.Tensor, arg0: numpy.ndarray[int32], arg1: paddle::platform::CPUPlace) -> None
set(self: paddle.fluid.core_avx.Tensor, arg0: numpy.ndarray[float64], arg1: paddle::platform::CPUPlace) -> None
set(self: paddle.fluid.core_avx.Tensor, arg0: numpy.ndarray[int64], arg1: paddle::platform::CPUPlace) -> None
set(self: paddle.fluid.core_avx.Tensor, arg0: numpy.ndarray[bool], arg1: paddle::platform::CPUPlace) -> None
set(self: paddle.fluid.core_avx.Tensor, arg0: numpy.ndarray[uint16], arg1: paddle::platform::CPUPlace) -> None
set(self: paddle.fluid.core_avx.Tensor, arg0: numpy.ndarray[uint8], arg1: paddle::platform::CPUPlace) -> None
set(self: paddle.fluid.core_avx.Tensor, arg0: numpy.ndarray[int8], arg1: paddle::platform::CPUPlace) -> None
-
set_lod
(self: paddle.fluid.core_avx.LoDTensor, lod: List[List[int]]) → None Set LoD of the LoDTensor.
- Parameters
lod (List[List[int]]) – the lod to be set.
Examples
import paddle.fluid as fluid
import numpy as np

t = fluid.LoDTensor()
t.set(np.ndarray([5, 30]), fluid.CPUPlace())
t.set_lod([[0, 2, 5]])
-
set_recursive_sequence_lengths
(self: paddle.fluid.core_avx.LoDTensor, recursive_sequence_lengths: List[List[int]]) → None Set LoD of the LoDTensor according to recursive sequence length.
For example, if recursive_sequence_lengths=[[2, 3]], meaning that there are two sequences with length 2 and 3 respectively, the corresponding lod would be [[0, 2, 2+3]], i.e, [[0, 2, 5]].
- Parameters
recursive_sequence_lengths (List[List[int]]) – sequence lengths.
Examples
import paddle.fluid as fluid
import numpy as np

t = fluid.LoDTensor()
t.set(np.ndarray([5, 30]), fluid.CPUPlace())
t.set_recursive_sequence_lengths([[2, 3]])
-
shape
(self: paddle.fluid.core_avx.Tensor) → List[int]
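Examples
A minimal sketch of reading a tensor's shape back:

import paddle.fluid as fluid
import numpy as np

t = fluid.LoDTensor()
t.set(np.ndarray([5, 30]), fluid.CPUPlace())
print(t.shape()) # [5, 30]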
LoDTensorArray¶
-
class
paddle.fluid.
LoDTensorArray
Array of LoDTensor.
Examples
import paddle.fluid as fluid
arr = fluid.LoDTensorArray()
-
append
(self: paddle.fluid.core_avx.LoDTensorArray, tensor: paddle.fluid.core_avx.LoDTensor) → None Append a LoDTensor to the LoDTensorArray.
Examples
import paddle.fluid as fluid
import numpy as np

arr = fluid.LoDTensorArray()
t = fluid.LoDTensor()
t.set(np.ndarray([5, 30]), fluid.CPUPlace())
arr.append(t)
-
memory_optimize¶
-
paddle.fluid.
memory_optimize
(input_program, skip_opt_set=None, print_log=False, level=0, skip_grads=True)[source] Legacy memory optimization strategy; it reduces total memory consumption by reusing variable memory between different operators.
A simple sample to explain the algorithm:

c = a + b  # assume this is the last time a is used
d = b * c

Since a will not be used anymore after "c = a + b", and the sizes of a and d are the same, we can use variable a to replace variable d, so we can actually optimize the above code to:

c = a + b
a = b * c

Please notice that, in this legacy design, we use variable a to replace d directly, which means that after you call this API, some variables may disappear, and some variables may hold unexpected values, like in the above case, where a actually holds the value of d after execution.
To protect important variables from being reused or removed in the optimization, we provide skip_opt_set to let you specify a variable whitelist. The variables in skip_opt_set will not be affected by the memory_optimize API.
Note
This API is deprecated; please avoid using it in your new code. It does not support operators which create sub-blocks, like While, IfElse, etc.
- Parameters
input_program (Program) – Input Program.
skip_opt_set (set) – variables that will be skipped during memory optimization.
print_log (bool) – whether to print a debug log.
level (int) – 0 or 1; 0 means we replace a with b only when a.size == b.size, 1 means we can replace a with b if a.size <= b.size.
- Returns
None
Examples
import paddle.fluid as fluid

main_prog = fluid.Program()
startup_prog = fluid.Program()

place = fluid.CPUPlace()
exe = fluid.Executor(place)

exe.run(startup_prog)
fluid.memory_optimize(main_prog)
name_scope¶
-
paddle.fluid.
name_scope
(prefix=None)[source] Generate hierarchical name prefix for the operators.
Note: This should only be used for debugging and visualization purposes. Don't use it for serious analysis such as graph/program transformations.
- Parameters
prefix (str) – prefix.
Examples
import paddle.fluid as fluid

with fluid.name_scope("s1"):
    a = fluid.layers.data(name='data', shape=[1], dtype='int32')
    b = a + 1
    with fluid.name_scope("s2"):
        c = b * 1
    with fluid.name_scope("s3"):
        d = c / 1
with fluid.name_scope("s1"):
    f = fluid.layers.pow(d, 2.0)
with fluid.name_scope("s4"):
    g = f - 1
ParallelExecutor¶
-
class
paddle.fluid.
ParallelExecutor
(use_cuda, loss_name=None, main_program=None, share_vars_from=None, exec_strategy=None, build_strategy=None, num_trainers=1, trainer_id=0, scope=None)[source] ParallelExecutor is designed for data parallelism, which focuses on distributing the data across different nodes and every node operates on the data in parallel. If you use ParallelExecutor to run the current program on GPU, the node means GPU device, and ParallelExecutor will get the available GPU device automatically on the current machine. If you use ParallelExecutor to run the current program on CPU, the node means the CPU device, and you can specify the CPU device number by adding ‘CPU_NUM’ environment variable, for example ‘CPU_NUM=4’, if the environment variable is not found, ParallelExecutor will call multiprocessing.cpu_count to get the number of CPUs in the system.
Examples
import paddle.fluid as fluid
import numpy
import os

use_cuda = True
place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()

# NOTE: If you use CPU to run the program, you need
# to specify the CPU_NUM, otherwise, fluid will use
# all the number of the logic core as the CPU_NUM,
# in that case, the batch size of the input should be
# greater than CPU_NUM, if not, the process will be
# failed by an exception.
if not use_cuda:
    os.environ['CPU_NUM'] = str(2)

exe = fluid.Executor(place)

train_program = fluid.Program()
startup_program = fluid.Program()
with fluid.program_guard(train_program, startup_program):
    data = fluid.layers.data(name='X', shape=[1], dtype='float32')
    hidden = fluid.layers.fc(input=data, size=10)
    loss = fluid.layers.mean(hidden)
    test_program = fluid.default_main_program().clone(for_test=True)
    fluid.optimizer.SGD(learning_rate=0.01).minimize(loss)

startup_program.random_seed=1
exe.run(startup_program)

train_exe = fluid.ParallelExecutor(use_cuda=use_cuda,
                                   main_program=train_program,
                                   loss_name=loss.name)
test_exe = fluid.ParallelExecutor(use_cuda=use_cuda,
                                  main_program=test_program,
                                  share_vars_from=train_exe)

x = numpy.random.random(size=(10, 1)).astype('float32')
loss_data, = train_exe.run(feed={"X": x},
                           fetch_list=[loss.name])
loss_data, = test_exe.run(feed={"X": x},
                          fetch_list=[loss.name])
- Parameters
use_cuda (bool) – Whether to use CUDA or not.
loss_name (str) – The name of the loss variable; it must be set for training. Default None.
main_program (Program) – The program that needs to run; if not provided, default_main_program will be used. Default None.
share_vars_from (ParallelExecutor) – If provided, variables will be shared from the specified ParallelExecutor. Default None.
exec_strategy (ExecutionStrategy) – exec_strategy is used to control how to run the program in ParallelExecutor, for example, how many threads are used to execute the program, or how many iterations to run before cleaning up the temporary variables generated during execution. For more information, please refer to fluid.ExecutionStrategy. Default None.
build_strategy (BuildStrategy) – build_strategy is used to control how to build the SSA Graph in ParallelExecutor, by setting properties such as reduce_strategy and gradient_scale_strategy. For more information, please refer to fluid.BuildStrategy. Default None.
num_trainers (int) – If greater than 1, NCCL will be initialized with multiple ranks of nodes; each node should have the same number of GPUs. Distributed training will then be enabled. Default 1.
trainer_id (int) – Must be used together with num_trainers. trainer_id is the “rank” of the current node, starting from 0. Default 0.
scope (Scope) – The scope to run with. Default fluid.global_scope().
- Returns
The initialized ParallelExecutor object.
- Return type
ParallelExecutor
- Raises
TypeError
– If share_vars_from is provided but is not a ParallelExecutor object.
-
run
(fetch_list, feed=None, feed_dict=None, return_numpy=True) Run a parallel executor with fetch_list.
The feed parameter can be a dict or a list. If feed is a dict, the feed data will be split across the devices. If feed is a list, we assume the data has already been split across the devices, and each element in the list will be copied to a device directly.
Examples
import paddle.fluid as fluid
import numpy
import os

use_cuda = True
place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()

# NOTE: If you run the program on CPU, you need to specify CPU_NUM;
# otherwise fluid uses all logical cores as CPU_NUM. In that case
# the input batch size should be greater than CPU_NUM, or the
# process will fail with an exception.
if not use_cuda:
    os.environ['CPU_NUM'] = str(2)

exe = fluid.Executor(place)

train_program = fluid.Program()
startup_program = fluid.Program()
with fluid.program_guard(train_program, startup_program):
    data = fluid.layers.data(name='X', shape=[1], dtype='float32')
    hidden = fluid.layers.fc(input=data, size=10)
    loss = fluid.layers.mean(hidden)
    fluid.optimizer.SGD(learning_rate=0.01).minimize(loss)

startup_program.random_seed = 1
exe.run(startup_program)

train_exe = fluid.ParallelExecutor(use_cuda=use_cuda,
                                   main_program=train_program,
                                   loss_name=loss.name)

# If the feed is a dict:
# the input will be split across the devices. With two devices,
# each device processes a batch of shape (5, 1).
x = numpy.random.random(size=(10, 1)).astype('float32')
loss_data, = train_exe.run(feed={"X": x}, fetch_list=[loss.name])

# If the feed is a list:
# each device processes one element of the list.
# The 1st device processes a batch of shape (10, 1),
# the 2nd device a batch of shape (9, 1).
# You can use exe.device_count to get the device number.
x2 = numpy.random.random(size=(9, 1)).astype('float32')
loss_data, = train_exe.run(feed=[{"X": x}, {"X": x2}], fetch_list=[loss.name])
- Parameters
fetch_list (list) – The names of the variables to fetch.
feed (list|dict|None) – The feed variables. If the feed is a dict, tensors in the dict will be split across the devices. If the feed is a list, each element of the list will be copied to a device. Default None.
feed_dict – Alias for the feed parameter, kept for backward compatibility. This parameter has been deprecated. Default None.
return_numpy (bool) – Whether to convert the fetched tensors to numpy arrays. Default: True.
- Returns
The fetched result list.
- Return type
List
- Raises
ValueError
– If the feed is a list but its length does not equal the number of active places, or if any of its elements is not a dict.
Notes
If the feed is a dict, the number of samples fed to ParallelExecutor must be larger than the number of active places; otherwise an exception will be thrown from the C++ side. Pay special attention to whether the last batch of the dataset is larger than the number of active places.
If there is more than one active place, the fetch result for each variable is a list, and each element of this list is the variable on the respective active place.
Examples
pe = fluid.ParallelExecutor(use_cuda=use_cuda,
                            loss_name=avg_cost.name,
                            main_program=fluid.default_main_program())
loss = pe.run(feed=feeder.feed(cur_batch), fetch_list=[avg_cost.name])
-
drop_local_exe_scopes
() Drop the local execution scope immediately.
During the execution of the Program, the generated intermediate results are placed in the local execution scope; in some models the creation and deletion of those intermediate results is time-consuming. To resolve that problem, ParallelExecutor provides an option in ExecutionStrategy, i.e. num_iteration_per_drop_scope, which indicates how many iterations to run before dropping the local execution scope. But in some situations each iteration generates different intermediate results, so the memory needed by the local execution scope gradually increases. If you want to run another program at such a time, there may be insufficient storage; at that point you should drop the local execution scope of the other Programs.
Examples
import paddle.fluid as fluid
import numpy
import os

use_cuda = True
# NOTE: If you run the program on CPU, you need to specify CPU_NUM;
# otherwise fluid uses all logical cores as CPU_NUM. In that case
# the input batch size should be greater than CPU_NUM, or the
# process will fail with an exception.
if not use_cuda:
    os.environ['CPU_NUM'] = str(2)

train_program = fluid.Program()
startup_program = fluid.Program()
with fluid.program_guard(train_program, startup_program):
    data = fluid.layers.data(name='X', shape=[1], dtype='float32')
    hidden = fluid.layers.fc(input=data, size=10)
    loss = fluid.layers.mean(hidden)

place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(startup_program)

parallel_exe = fluid.ParallelExecutor(use_cuda=use_cuda,
                                      main_program=train_program,
                                      loss_name=loss.name)

x = numpy.random.random(size=(10, 1)).astype('float32')
loss_data, = parallel_exe.run(feed={"X": x}, fetch_list=[loss.name])

parallel_exe.drop_local_exe_scopes()
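As an alternative to calling drop_local_exe_scopes() by hand, the num_iteration_per_drop_scope option mentioned above can be set through ExecutionStrategy. A minimal sketch, assuming a CPU run and an illustrative interval of 10 iterations:

import paddle.fluid as fluid
import numpy
import os

os.environ['CPU_NUM'] = str(2)

train_program = fluid.Program()
startup_program = fluid.Program()
with fluid.program_guard(train_program, startup_program):
    data = fluid.layers.data(name='X', shape=[1], dtype='float32')
    loss = fluid.layers.mean(fluid.layers.fc(input=data, size=10))

place = fluid.CPUPlace()
fluid.Executor(place).run(startup_program)

exec_strategy = fluid.ExecutionStrategy()
# Drop the local execution scopes automatically every 10 iterations
# instead of calling drop_local_exe_scopes() manually.
exec_strategy.num_iteration_per_drop_scope = 10

pe = fluid.ParallelExecutor(use_cuda=False,
                            main_program=train_program,
                            loss_name=loss.name,
                            exec_strategy=exec_strategy)
x = numpy.random.random(size=(10, 1)).astype('float32')
loss_data, = pe.run(feed={"X": x}, fetch_list=[loss.name])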
ParamAttr¶
-
class
paddle.fluid.
ParamAttr
(name=None, initializer=None, learning_rate=1.0, regularizer=None, trainable=True, gradient_clip=None, do_model_average=False)[source] Parameter attributes object. To fine-tune the network training process, the user can set a parameter’s attributes to control training details, such as the learning rate, regularization, trainability, do_model_average, and the method used to initialize the parameter.
- Parameters
name (str) – The parameter’s name. Default None.
initializer (Initializer) – The method to initialize this parameter. Default None.
learning_rate (float) – The parameter’s learning rate. The effective learning rate during optimization is \(global\_lr * parameter\_lr * scheduler\_factor\). Default 1.0.
regularizer (WeightDecayRegularizer) – Regularization factor. Default None.
trainable (bool) – Whether this parameter is trainable. Default True.
gradient_clip (BaseGradientClipAttr) – The method to clip this parameter’s gradient. Default None.
do_model_average (bool) – Whether this parameter should do model average. Default False.
Examples
import paddle.fluid as fluid

w_param_attrs = fluid.ParamAttr(name="fc_weight",
                                learning_rate=0.5,
                                regularizer=fluid.regularizer.L2Decay(1.0),
                                trainable=True)
x = fluid.layers.data(name='X', shape=[1], dtype='float32')
y_predict = fluid.layers.fc(input=x, size=10, param_attr=w_param_attrs)
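To illustrate the \(global\_lr * parameter\_lr * scheduler\_factor\) formula above: with an optimizer learning rate of 0.1 and a parameter learning rate of 0.5, the weight below is effectively updated with a rate of 0.1 * 0.5 = 0.05 (before any scheduler factor). A sketch; the names are illustrative:

import paddle.fluid as fluid

# parameter_lr = 0.5 for this weight only.
w_attr = fluid.ParamAttr(name="fc_w", learning_rate=0.5)
x = fluid.layers.data(name='X', shape=[1], dtype='float32')
y = fluid.layers.fc(input=x, size=10, param_attr=w_attr)
loss = fluid.layers.mean(y)
# global_lr = 0.1, so fc_w is updated with 0.1 * 0.5 = 0.05.
fluid.optimizer.SGD(learning_rate=0.1).minimize(loss)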
Program¶
-
class
paddle.fluid.
Program
[source] Python Program. Beneath it is a ProgramDesc, which is used to create the C++ Program. A Program is a self-contained, programming-language-like container. It has at least one Block; when control flow ops like conditional_block or while_op are included, it will contain nested blocks. Please refer to framework.proto for details.
Notes: we have default_startup_program and default_main_program by default; this pair shares the parameters. The default_startup_program runs only once to initialize parameters, while default_main_program runs in every mini-batch and adjusts the weights.
- Returns
An empty Program.
Examples
import paddle.fluid as fluid

main_program = fluid.Program()
startup_program = fluid.Program()
with fluid.program_guard(main_program=main_program, startup_program=startup_program):
    x = fluid.layers.data(name="x", shape=[-1, 784], dtype='float32')
    y = fluid.layers.data(name="y", shape=[-1, 1], dtype='int32')
    z = fluid.layers.fc(name="fc", input=x, size=10, act="relu")

print("main program is: {}".format(main_program))
print("start up program is: {}".format(startup_program))
-
to_string
(throw_on_error, with_details=False) Convert the Program to a debug string.
- Parameters
throw_on_error (bool) – raise a ValueError when any required field is not set.
with_details (bool) – True if more details about variables and parameters, e.g., trainable and optimize_attr, need to be printed.
- Returns
The debug string.
- Return type
str
- Raises
ValueError
– If any required field is not set and throw_on_error is True.
Examples
import paddle.fluid as fluid

prog = fluid.default_main_program()
prog_string = prog.to_string(throw_on_error=True, with_details=False)
print(prog_string)
-
clone
(for_test=False) Create a new, duplicated program.
Some operators, e.g., batch_norm, behave differently between training and testing. They have an attribute, is_test, to control this behaviour. This method sets their is_test attribute to True when for_test=True.
Set for_test to False when you want to clone the program for training.
Set for_test to True when you want to clone the program for testing. No pruning is performed on the program here, so if you only want a forward program for testing, please use clone before using Optimizer.minimize.
Notes: 1. The Program.clone() method DOES NOT clone py_reader. 2. This API DOES NOT prune any operator. Please use clone(for_test=True) before backward and optimization. E.g.

# Clone for testing BEFORE calling minimize; 'loss' is the loss
# variable of the network being optimized.
test_program = fluid.default_main_program().clone(for_test=True)
optimizer = fluid.optimizer.Momentum(learning_rate=0.01, momentum=0.9)
optimizer.minimize(loss)
- Parameters
for_test (bool) – True to set the is_test attribute of operators to True.
- Returns
The new, duplicated Program object.
- Return type
Program
Examples:
Notes: The order of the Program Descs may differ after clone, and this will not affect your training or testing progress. In the following examples we provide a simple method, print_prog(program), to print the Program Descs in order, so that you can verify the print result is the same after clone:

import paddle.fluid as fluid
import six

def print_prog(prog):
    for name, value in sorted(six.iteritems(prog.block(0).vars)):
        print(value)
    for op in prog.block(0).ops:
        print("op type is {}".format(op.type))
        print("op inputs are {}".format(op.input_arg_names))
        print("op outputs are {}".format(op.output_arg_names))
        for key, value in sorted(six.iteritems(op.all_attrs())):
            if key not in ['op_callstack', 'op_role_var']:
                print(" [ attrs: {}: {} ]".format(key, value))
- To clone a test program, the sample code is:
import paddle.fluid as fluid
import six

def print_prog(prog):
    for name, value in sorted(six.iteritems(prog.block(0).vars)):
        print(value)
    for op in prog.block(0).ops:
        print("op type is {}".format(op.type))
        print("op inputs are {}".format(op.input_arg_names))
        print("op outputs are {}".format(op.output_arg_names))
        for key, value in sorted(six.iteritems(op.all_attrs())):
            if key not in ['op_callstack', 'op_role_var']:
                print(" [ attrs: {}: {} ]".format(key, value))

train_program = fluid.Program()
startup_program = fluid.Program()
with fluid.program_guard(train_program, startup_program):
    with fluid.unique_name.guard():
        img = fluid.layers.data(name='image', shape=[784])
        hidden = fluid.layers.fc(input=img, size=200, act='relu')
        hidden = fluid.layers.dropout(hidden, dropout_prob=0.5)
        loss = fluid.layers.cross_entropy(
            input=fluid.layers.fc(hidden, size=10, act='softmax'),
            label=fluid.layers.data(name='label', shape=[1], dtype='int64'))
        avg_loss = fluid.layers.mean(loss)
        test_program = train_program.clone(for_test=False)

print_prog(test_program)

with fluid.program_guard(train_program, startup_program):
    with fluid.unique_name.guard():
        sgd = fluid.optimizer.SGD(learning_rate=1e-3)
        sgd.minimize(avg_loss)
- The clone method can be avoided if you create the program for training and the program for testing individually.
import paddle.fluid as fluid
import six

def print_prog(prog):
    for name, value in sorted(six.iteritems(prog.block(0).vars)):
        print(value)
    for op in prog.block(0).ops:
        print("op type is {}".format(op.type))
        print("op inputs are {}".format(op.input_arg_names))
        print("op outputs are {}".format(op.output_arg_names))
        for key, value in sorted(six.iteritems(op.all_attrs())):
            if key not in ['op_callstack', 'op_role_var']:
                print(" [ attrs: {}: {} ]".format(key, value))

def network(is_test):
    img = fluid.layers.data(name='image', shape=[784])
    hidden = fluid.layers.fc(input=img, size=200, act='relu')
    hidden = fluid.layers.dropout(hidden, dropout_prob=0.5)
    loss = fluid.layers.cross_entropy(
        input=fluid.layers.fc(hidden, size=10, act='softmax'),
        label=fluid.layers.data(name='label', shape=[1], dtype='int64'))
    avg_loss = fluid.layers.mean(loss)
    return avg_loss

train_program_2 = fluid.Program()
startup_program_2 = fluid.Program()
test_program_2 = fluid.Program()
with fluid.program_guard(train_program_2, startup_program_2):
    with fluid.unique_name.guard():
        # build the network before optimizing it
        avg_loss = network(is_test=False)
        sgd = fluid.optimizer.SGD(learning_rate=1e-3)
        sgd.minimize(avg_loss)
# the test startup program is not used.
with fluid.program_guard(test_program_2, fluid.Program()):
    with fluid.unique_name.guard():
        loss = network(is_test=True)
print(test_program_2)
The two code snippets above will generate and print the same programs.
-
static
parse_from_string
(binary_str) Deserialize a program desc from a protobuf binary string.
Notes: All information about parameters will be lost after serialization and deserialization.
- Parameters
binary_str (str) – The binary protobuf string.
- Returns
The deserialized Program.
- Return type
Program
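A round trip between a Program and its protobuf representation can be sketched as follows; the desc.serialize_to_string() call on the underlying ProgramDesc is an assumption of this sketch:

import paddle.fluid as fluid

prog = fluid.default_main_program()
# Serialize the underlying ProgramDesc to a protobuf binary string ...
binary_str = prog.desc.serialize_to_string()
# ... and rebuild a Program from it. Parameter information is lost,
# as noted above.
prog_restored = fluid.Program.parse_from_string(binary_str)
print(prog_restored)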
-
num_blocks
The number of blocks in this program.
Examples
import paddle.fluid as fluid

prog = fluid.default_main_program()
num_blocks = prog.num_blocks
print(num_blocks)
-
random_seed
The default random seed for random operators in the Program. Zero means the random seed is obtained from a random device.
Notes: It must be set before any operators have been added.
Examples
import paddle.fluid as fluid

prog = fluid.default_main_program()
random_seed = prog.random_seed
print(random_seed)
prog.random_seed = 1
print(prog.random_seed)
-
global_block
() Get the first block of this program.
Examples
import paddle.fluid as fluid

prog = fluid.default_main_program()
gb_block = prog.global_block()
print(gb_block)
-
block
(index) Get the index-th block of this program.
- Parameters
index (int) – The index of the block to get.
- Returns
The index-th block.
- Return type
Block
Examples
import paddle.fluid as fluid

prog = fluid.default_main_program()
block_0 = prog.block(0)
print(block_0)
-
current_block
() Get the current block. The current block is the block to which operators are appended.
Examples
import paddle.fluid as fluid

prog = fluid.default_main_program()
current_blk = prog.current_block()
print(current_blk)
-
list_vars
() Get all variables from this Program. An iterable object is returned.
- Returns
The generator will yield every variable in this program.
- Return type
iterable
Examples
import paddle.fluid as fluid

prog = fluid.default_main_program()
img = fluid.layers.data(name='img', shape=[1, 28, 28], dtype='float32')
label = fluid.layers.data(name='label', shape=[128, 1], dtype='int64')
for var in prog.list_vars():
    print(var)
program_guard¶
-
paddle.fluid.
program_guard
(main_program, startup_program=None)[source] Change the global main program and startup program with a “with” statement. Layer functions in the Python “with” block will append operators and variables to the new main program.
Examples
import paddle.fluid as fluid

main_program = fluid.Program()
startup_program = fluid.Program()
with fluid.program_guard(main_program, startup_program):
    data = fluid.layers.data(name='image', shape=[784, 784], dtype='float32')
    hidden = fluid.layers.fc(input=data, size=10, act='relu')
Notes: A temporary Program can be passed when the user does not need to construct either the startup program or the main program.
Examples
import paddle.fluid as fluid

main_program = fluid.Program()
# Does not care about the startup program; just pass a temporary value.
with fluid.program_guard(main_program, fluid.Program()):
    data = fluid.layers.data(name='image', shape=[784, 784], dtype='float32')
- Parameters
main_program (Program) – New main program inside “with” statement.
startup_program (Program) – New startup program inside “with” statement. None means the startup program is not changed.
release_memory¶
-
paddle.fluid.
release_memory
(input_program, skip_opt_set=None)[source] Modify the input program and insert delete_op to drop unused variables early. The modification is performed in place.
Notes: This is an experimental API and could be removed in the next few releases. Users should not use this API.
- Parameters
input_program (Program) – The program into which delete_op will be inserted.
skip_opt_set (set) – Variables that will be skipped in memory optimization.
- Returns
None
Examples
import paddle.fluid as fluid

# build network
# ...

# deprecated API
fluid.release_memory(fluid.default_main_program())
scope_guard¶
-
paddle.fluid.
scope_guard
(scope)[source] Change the global/default scope instance via a Python with statement. All variables at runtime will be assigned to the new scope.
- Parameters
scope – The new global/default scope.
Examples
import paddle.fluid as fluid
import numpy

new_scope = fluid.Scope()
with fluid.scope_guard(new_scope):
    fluid.global_scope().var("data").get_tensor().set(numpy.ones((2, 2)), fluid.CPUPlace())
numpy.array(new_scope.find_var("data").get_tensor())
Tensor¶
-
paddle.fluid.
Tensor
alias of
paddle.fluid.core_avx.LoDTensor
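Since Tensor is just an alias of LoDTensor, it can be constructed directly and filled from a numpy array; a minimal sketch:

import paddle.fluid as fluid
import numpy

# fluid.Tensor and fluid.LoDTensor refer to the same class.
t = fluid.Tensor()
t.set(numpy.ones((2, 2), dtype='float32'), fluid.CPUPlace())
print(numpy.array(t))  # convert back to a numpy array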
WeightNormParamAttr¶
-
class
paddle.fluid.
WeightNormParamAttr
(dim=None, name=None, initializer=None, learning_rate=1.0, regularizer=None, trainable=True, gradient_clip=None, do_model_average=False)[source] Used for weight normalization. Weight normalization is a reparameterization of the weight vectors in a neural network that decouples the length of those weight vectors from their direction. It has been implemented as discussed in this paper: Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks.
- Parameters
dim (int) – The dimension along which to compute the weight norm. Default None.
name (str) – The parameter’s name. Default None.
initializer (Initializer) – The method to initial this parameter. Default None.
learning_rate (float) – The parameter’s learning rate. The effective learning rate during optimization is \(global\_lr * parameter\_lr * scheduler\_factor\). Default 1.0.
regularizer (WeightDecayRegularizer) – Regularization factor. Default None.
trainable (bool) – Whether this parameter is trainable. Default True.
gradient_clip (BaseGradientClipAttr) – The method to clip this parameter’s gradient. Default None.
do_model_average (bool) – Whether this parameter should do model average. Default False.
Examples
import paddle.fluid as fluid

data = fluid.layers.data(name="data", shape=[3, 32, 32], dtype="float32")
fc = fluid.layers.fc(input=data,
                     size=1000,
                     param_attr=fluid.WeightNormParamAttr(
                         dim=None, name='weight_norm_param'))