tf.experimental.dtensor.initialize_accelerator_system

Initializes accelerators and communication fabrics for DTensor.

Main aliases

tf.experimental.dtensor.initialize_multi_client, tf.experimental.dtensor.initialize_tpu_system

tf.experimental.dtensor.initialize_accelerator_system(
    device_type: Optional[str] = None,
    enable_coordination_service: Optional[bool] = False
) -> str

DTensor configures TensorFlow to run in either local mode or multi-client mode.

In local mode, a mesh can only use devices attached to the current process.
In multi-client mode, a mesh can span devices from multiple clients.

If DTENSOR_JOBS is non-empty, DTensor configures TensorFlow to run in multi-client mode using the distributed runtime. In multi-client mode, devices on different clients can communicate with each other.

The following environment variables control the behavior of this function.

DTENSOR_JOBS: string, a comma-separated list. Each item in the list has the format {hostname}:{port}. If empty, DTensor runs in local mode. Examples of valid DTENSOR_JOBS values:
- 4 clients on localhost: localhost:10000,localhost:10001,localhost:10002,localhost:10003
- 2 clients on host1, 2 clients on host2: host1:10000,host1:10001,host2:10000,host2:10003
If the hostnames are BNS addresses, the items must be sorted in alphabetical order.
DTENSOR_CLIENT_ID: integer, between 0 and num_clients - 1, identifying the client id of the current process. The default value is 0.
DTENSOR_JOB_NAME: string, the name of the TensorFlow job. The job name controls the job name section of the TensorFlow DeviceSpecs, e.g., job:worker in /job:worker/replica:0/task:0/device:TPU:0 when the job name is worker. The default value is localhost in local mode, and worker in multi-client mode. All DTensor clients within the same multi-client cluster share the same job name.
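
The variables above can be assembled per client before the function is called. The following is a minimal sketch, not part of the DTensor API: `dtensor_env` is a hypothetical helper introduced here for illustration, and the host list reuses the 4-client localhost example. The actual initialization call is left commented out because it needs TensorFlow installed and every client in DTENSOR_JOBS running.

```python
import os


def dtensor_env(jobs, client_id, job_name="worker"):
    """Build the DTENSOR_* variables for one client (hypothetical helper)."""
    # DTENSOR_CLIENT_ID must lie between 0 and num_clients - 1.
    if not 0 <= client_id < len(jobs):
        raise ValueError(f"client_id must be in [0, {len(jobs) - 1}]")
    return {
        # Comma-separated list of {hostname}:{port} items.
        "DTENSOR_JOBS": ",".join(jobs),
        "DTENSOR_CLIENT_ID": str(client_id),
        # All clients in the same cluster share the same job name.
        "DTENSOR_JOB_NAME": job_name,
    }


jobs = [f"localhost:{port}" for port in (10000, 10001, 10002, 10003)]
env = dtensor_env(jobs, client_id=1)

# In each client process, export the variables before initializing DTensor:
# os.environ.update(env)
# from tensorflow.experimental import dtensor
# dtensor.initialize_accelerator_system("CPU", enable_coordination_service=True)
```

Each client process would use the same `jobs` list and job name, and differ only in `client_id`.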

Args
`device_type`	Type of accelerator to use, can be CPU, GPU, or TPU. If None, uses `tf.experimental.dtensor.preferred_device_type()`.
`enable_coordination_service`	If true, enables the distributed coordination service so that workers know about each other's devices when there is more than one client.

Returns
`device_type`	The type of accelerator that was initialized.
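
As a rough illustration of local-mode use, the sketch below wraps the call in a hypothetical helper, `init_local_dtensor`, which is not part of TensorFlow. It assumes TensorFlow is installed and that local mode is intended, so it refuses to run when DTENSOR_JOBS is non-empty. TensorFlow is imported lazily inside the helper.

```python
import os


def init_local_dtensor(device_type=None):
    """Initialize DTensor in local mode (hypothetical helper).

    Returns the device type string reported by
    initialize_accelerator_system.
    """
    # A non-empty DTENSOR_JOBS would select multi-client mode instead.
    if os.environ.get("DTENSOR_JOBS"):
        raise RuntimeError("DTENSOR_JOBS is set; unset it to run in local mode")
    # Imported here so the guard above can run without TensorFlow.
    from tensorflow.experimental import dtensor

    # With device_type=None, DTensor picks its preferred device type.
    return dtensor.initialize_accelerator_system(device_type)
```

A caller might write `device_type = init_local_dtensor("CPU")` and then use the returned string when building a mesh.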
