fluid.transpiler¶
DistributeTranspiler¶
class paddle.fluid.transpiler.DistributeTranspiler(config=None)[source]
Convert the fluid program to distributed data-parallelism programs. Supports two modes: pserver mode and nccl2 mode.
In pserver mode, the main_program will be transformed to use a remote parameter server for parameter optimization, and the optimization graph will be put into a parameter server program.
In nccl2 mode, the transpiler will append a NCCL_ID broadcasting op in startup_program to share the NCCL_ID across the job nodes. After transpile_nccl2 is called, you *must* pass the trainer_id and num_trainers arguments to ParallelExecutor to enable NCCL2 distributed mode.
Examples
import paddle.fluid as fluid

x = fluid.layers.data(name='x', shape=[13], dtype='float32')
y = fluid.layers.data(name='y', shape=[1], dtype='float32')
y_predict = fluid.layers.fc(input=x, size=1, act=None)

cost = fluid.layers.square_error_cost(input=y_predict, label=y)
avg_loss = fluid.layers.mean(cost)

sgd_optimizer = fluid.optimizer.SGD(learning_rate=0.001)
sgd_optimizer.minimize(avg_loss)

# for pserver mode
pserver_endpoints = "192.168.0.1:6174,192.168.0.2:6174"
trainer_endpoints = "192.168.0.1:6174,192.168.0.2:6174"
current_endpoint = "192.168.0.1:6174"
trainer_id = 0
trainers = 4
role = "PSERVER"

t = fluid.DistributeTranspiler()
t.transpile(
    trainer_id, pservers=pserver_endpoints, trainers=trainers)
if role == "PSERVER":
    pserver_program = t.get_pserver_program(current_endpoint)
    pserver_startup_program = t.get_startup_program(current_endpoint,
                                                    pserver_program)
elif role == "TRAINER":
    trainer_program = t.get_trainer_program()

# for nccl2 mode
trainer_num = 2
trainer_id = 0

config = fluid.DistributeTranspilerConfig()
config.mode = "nccl2"
trainer_endpoints = "192.168.0.1:6174,192.168.0.2:6174"

t = fluid.DistributeTranspiler(config=config)
t.transpile(trainer_id=trainer_id, trainers=trainer_endpoints,
            current_endpoint="192.168.0.1:6174")
exe = fluid.ParallelExecutor(
    use_cuda=True,
    loss_name=avg_loss.name,
    num_trainers=trainer_num,
    trainer_id=trainer_id
)
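The example above stops once the transpiled programs are obtained. A hedged sketch of how they are typically executed, continuing that example (it assumes role, pserver_startup_program, pserver_program, trainer_program, and avg_loss are the variables defined there; the Executor usage is the standard fluid pattern rather than something DistributeTranspiler itself mandates):

if role == "PSERVER":
    exe = fluid.Executor(fluid.CPUPlace())
    exe.run(pserver_startup_program)  # initialize pserver-side variables
    exe.run(pserver_program)          # block and serve parameter updates
elif role == "TRAINER":
    exe = fluid.Executor(fluid.CPUPlace())
    exe.run(fluid.default_startup_program())
    # then run trainer_program in the usual training loop, e.g.
    # exe.run(trainer_program, feed=..., fetch_list=[avg_loss.name])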
transpile(trainer_id, program=None, pservers='127.0.0.1:6174', trainers=1, sync_mode=True, startup_program=None, current_endpoint='127.0.0.1:6174')
Run the transpiler. Transpile the input program.
- Parameters
trainer_id (int) – ID of the current trainer worker; if you have n workers, the ID ranges from 0 to n-1.
program (Program|None) – program to transpile, default is fluid.default_main_program().
startup_program (Program|None) – startup_program to transpile, default is fluid.default_startup_program().
pservers (str) – comma separated ip:port string for the pserver list.
trainers (int|str) – in pserver mode this is the number of trainers, in nccl2 mode this is a string of trainer endpoints.
sync_mode (bool) – whether to do synchronous training; default is True.
current_endpoint (str) – the current endpoint must be passed when transpiling in nccl2 distributed mode; in pserver mode this argument is not used.
Examples
import paddle.fluid as fluid

transpiler = fluid.DistributeTranspiler()
transpiler.transpile(
    trainer_id=0,
    pservers="127.0.0.1:7000,127.0.0.1:7001",
    trainers=2,
    sync_mode=False,
    current_endpoint="127.0.0.1:7000")
get_trainer_program(wait_port=True)
Get the transpiled trainer side program.
- Returns
trainer side program.
- Return type
Program
Examples
import paddle.fluid as fluid

# this is an example, find available endpoints in your case
pserver_endpoints = "192.168.0.1:6174,192.168.0.2:6174"
trainer_id = 0
trainers = 4

t = fluid.DistributeTranspiler()
t.transpile(trainer_id, trainers=trainers, pservers=pserver_endpoints)
trainer_program = t.get_trainer_program()
get_pserver_program(endpoint)
Get parameter server side program.
- Parameters
endpoint (str) – current parameter server endpoint.
- Returns
the program for current parameter server to run.
- Return type
Program
Examples
import paddle.fluid as fluid

# this is an example, find available endpoints in your case
pserver_endpoints = "192.168.0.1:6174,192.168.0.2:6174"
current_endpoint = "192.168.0.1:6174"
trainer_id = 0
trainers = 4

t = fluid.DistributeTranspiler()
t.transpile(
    trainer_id, pservers=pserver_endpoints, trainers=trainers)
pserver_program = t.get_pserver_program(current_endpoint)
get_pserver_programs(endpoint)
Get pserver side main program and startup program for distributed training.
- Parameters
endpoint (str) – current pserver endpoint.
- Returns
(main_program, startup_program), of type “Program”
- Return type
tuple
Examples
import paddle.fluid as fluid

# this is an example, find available endpoints in your case
pserver_endpoints = "192.168.0.1:6174,192.168.0.2:6174"
current_endpoint = "192.168.0.1:6174"
trainer_id = 0
trainers = 4

t = fluid.DistributeTranspiler()
t.transpile(
    trainer_id, pservers=pserver_endpoints, trainers=trainers)
pserver_program, pserver_startup_program = t.get_pserver_programs(current_endpoint)
get_startup_program(endpoint, pserver_program=None, startup_program=None)
Deprecated.
Get the startup program for the current parameter server. Modify operator input variables if there are variables that were split into several blocks.
- Parameters
endpoint (str) – current pserver endpoint.
pserver_program (Program) – deprecated, call get_pserver_program first.
startup_program (Program) – deprecated; startup_program should be passed when initializing.
- Returns
parameter server side startup program.
- Return type
Program
Examples
import paddle.fluid as fluid

pserver_endpoints = "192.168.0.1:6174,192.168.0.2:6174"
trainer_endpoints = "192.168.0.1:6174,192.168.0.2:6174"
current_endpoint = "192.168.0.1:6174"
trainer_id = 0
trainers = 4

t = fluid.DistributeTranspiler()
t.transpile(trainer_id, pservers=pserver_endpoints, trainers=trainers)
pserver_program = t.get_pserver_program(current_endpoint)
pserver_startup_program = t.get_startup_program(current_endpoint,
                                                pserver_program)
DistributeTranspilerConfig¶
class paddle.fluid.transpiler.DistributeTranspilerConfig[source]

slice_var_up (bool)
Do Tensor slice for pservers, default is True.
split_method (PSDispatcher)
RoundRobin or HashName can be used; try to choose the best method to balance loads for pservers.
min_block_size (int)
Minimum number of split elements in a block.
According to https://github.com/PaddlePaddle/Paddle/issues/8638#issuecomment-369912156, bandwidth can be used efficiently when the data size is larger than 2MB. If you want to change it, please make sure you have read the slice_variable function.
Examples
import paddle.fluid as fluid

config = fluid.DistributeTranspilerConfig()
config.slice_var_up = True
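A slightly fuller sketch that sets all three options and passes the config to the transpiler (the concrete values are illustrative only; RoundRobin and HashName are the PSDispatcher implementations documented below):

import paddle.fluid as fluid
from paddle.fluid.transpiler import RoundRobin

config = fluid.DistributeTranspilerConfig()
config.slice_var_up = True        # slice large tensors into blocks across pservers
config.split_method = RoundRobin  # or HashName; the transpiler instantiates it internally
config.min_block_size = 8192      # illustrative value; see the note above before changing it

t = fluid.DistributeTranspiler(config=config)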
HashName¶
class paddle.fluid.transpiler.HashName(pserver_endpoints)[source]
Hash variable names to several endpoints using the Python hash() function.
- Parameters
pserver_endpoints (list) – list of endpoint(ip:port).
Examples
pserver_endpoints = ["127.0.0.1:6007", "127.0.0.1:6008"]
vars = ["var1", "var2", "var3", "var4", "var5"]

hash_name = HashName(pserver_endpoints)
hash_name.dispatch(vars)
dispatch(varlist)
- Parameters
varlist (list) – a list of Variables
- Returns
a map of pserver endpoint -> varname
memory_optimize¶
paddle.fluid.transpiler.memory_optimize(input_program, skip_opt_set=None, print_log=False, level=0, skip_grads=True)[source]
Legacy memory optimization strategy that reduces total memory consumption by reusing variable memory between different operators.
A simple example to explain the algorithm:
c = a + b  # assume this is the last time a is used
d = b * c
Since a will not be used anymore after "c = a + b", and a and d have the same size, we can use variable a to replace variable d, so the code above can actually be optimized to:

c = a + b
a = b * c
Please note that, in this legacy design, variable a is used to replace d directly, which means that after you call this API some variables may disappear and some variables may hold unexpected values; in the case above, a actually holds the value of d after execution. So, to protect important variables from being reused or removed by the optimization, skip_opt_set lets you specify a variable whitelist: the variables in skip_opt_set will not be affected by the memory_optimize API (see the sketch after the example below).
Note
This API is deprecated; please avoid using it in new code. It does not support operators that create sub-blocks, such as While and IfElse.
- Parameters
input_program (Program) – Input Program to optimize.
skip_opt_set (set) – variables in this set will be skipped during memory optimization.
print_log (bool) – whether to print debug log.
level (int) – 0 or 1; 0 means a is replaced with b only when a.size == b.size, 1 means a can be replaced with b if a.size <= b.size.
- Returns
None
Examples
import paddle.fluid as fluid

main_prog = fluid.Program()
startup_prog = fluid.Program()

place = fluid.CPUPlace()
exe = fluid.Executor(place)

exe.run(startup_prog)
fluid.memory_optimize(main_prog)
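A hedged sketch of the skip_opt_set whitelist described above (the tiny network and the avg_loss variable are illustrative placeholders, not part of this API): whitelisting the loss variable keeps its value from being overwritten by the reuse pass.

import paddle.fluid as fluid

main_prog = fluid.Program()
startup_prog = fluid.Program()

with fluid.program_guard(main_prog, startup_prog):
    x = fluid.layers.data(name='x', shape=[13], dtype='float32')
    y = fluid.layers.data(name='y', shape=[1], dtype='float32')
    y_predict = fluid.layers.fc(input=x, size=1, act=None)
    avg_loss = fluid.layers.mean(
        fluid.layers.square_error_cost(input=y_predict, label=y))

# whitelist avg_loss so it is not reused/removed by the optimization
fluid.memory_optimize(main_prog, skip_opt_set=set([avg_loss.name]))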
release_memory¶
paddle.fluid.transpiler.release_memory(input_program, skip_opt_set=None)[source]
Modify the input program and insert delete_op to drop unused variables early. The modification is performed in place.
Notes: This is an experimental API and could be removed in the next few releases. Users should not use this API.
- Parameters
input_program (Program) – the program into which delete_op will be inserted.
skip_opt_set (set) – variables in this set will be skipped during memory optimization.
- Returns
None
Examples
import paddle.fluid as fluid

# build network
# ...

# deprecated API
fluid.release_memory(fluid.default_main_program())
RoundRobin¶
class paddle.fluid.transpiler.RoundRobin(pserver_endpoints)[source]
Distribute variables to several endpoints using the round-robin (https://en.wikipedia.org/wiki/Round-robin_scheduling) method.
- Parameters
pserver_endpoints (list) – list of endpoint(ip:port).
Examples
pserver_endpoints = ["127.0.0.1:6007", "127.0.0.1:6008"]
vars = ["var1", "var2", "var3", "var4", "var5"]

rr = RoundRobin(pserver_endpoints)
rr.dispatch(vars)
dispatch(varlist)
- Parameters
varlist (list) – a list of Variables
- Returns
a map of pserver endpoint -> varname