tff.simulation.datasets.TestClientData

A tff.simulation.datasets.ClientData intended for test purposes.

Inherits From: ClientData

tff.simulation.datasets.TestClientData(
    tensor_slices_dict
)

The implementation is based on tf.data.Dataset.from_tensor_slices. This class is intended only for constructing toy federated datasets, especially to support simulation tests. Using this for large datasets is not recommended, as it requires putting all client data into the underlying TensorFlow graph (which is memory intensive).

Args
`tensor_slices_dict`	A dictionary keyed by client_id, where values are lists, tuples, or dicts for passing to `tf.data.Dataset.from_tensor_slices`. Note that namedtuples and attrs classes are not explicitly supported, but a user can convert their data from those formats to a dict, and then use this class. The leaves of this dictionary must not be `tf.Tensor`s, in order to avoid putting eager tensors into graphs.

Raises
`ValueError`	If a client with no data is found.
`TypeError`	If `tensor_slices_dict` is not a dictionary, if its value structures are namedtuples, or if its value structures are not all lists, all standard (non-named) tuples, or all dictionaries.
`TypeError`	If any leaf of `tensor_slices_dict` is a `tf.Tensor`.

Attributes
`client_ids`	A list of string identifiers for clients in this dataset.
`dataset_computation`	A `tff.Computation` accepting a client ID, returning a dataset. Note: the `dataset_computation` property is intended as a TFF-specific performance optimization for distributed execution.
`element_type_structure`	The element type information of the client datasets, that is, the type structure of the elements returned by datasets in this `ClientData` object.
`serializable_dataset_fn`	A callable accepting a client ID and returning a `tf.data.Dataset`. Note that this callable must be traceable by TF, as it will be used in the context of a `tf.function`.

Methods

`create_tf_dataset_for_client`

create_tf_dataset_for_client(
    client_id
)

Creates a new tf.data.Dataset containing the client training examples.

This function creates a dataset for the given client, provided that `client_id` is contained in the `client_ids` property of the `ClientData`. Unlike `serializable_dataset_fn`, this method need not be serializable.

Args
`client_id`	The string client_id for the desired client.

Returns
A `tf.data.Dataset` object.

`create_tf_dataset_from_all_clients`

create_tf_dataset_from_all_clients(
    seed: Optional[Union[int, Sequence[int]]] = None
) -> tf.data.Dataset

Creates a new tf.data.Dataset containing all client examples.

This function is intended for training centralized, non-distributed models (num_clients=1). This can be useful as a point of comparison against federated models.

Currently, the implementation produces a dataset in which all examples from one client appear in order before the examples of the next client, so additional shuffling should generally be performed.

Args
`seed`	Optional, a seed to determine the order in which clients are processed in the joined dataset. The seed can be any nonnegative 32-bit integer, an array of such integers, or `None`.

Returns
A `tf.data.Dataset` object.

`datasets`

datasets(
    limit_count: Optional[int] = None,
    seed: Optional[Union[int, Sequence[int]]] = None
) -> Iterable[tf.data.Dataset]

Yields the tf.data.Dataset for each client in random order.

This function is intended for building a static array of client data to be provided to the top-level federated computation.

Args
`limit_count`	Optional, a maximum number of datasets to return.
`seed`	Optional, a seed to determine the order in which the client datasets are yielded. The seed can be any nonnegative 32-bit integer, an array of such integers, or `None`.

`from_clients_and_tf_fn`

@classmethod
from_clients_and_tf_fn(
    client_ids: Iterable[str],
    serializable_dataset_fn: Callable[[str], tf.data.Dataset]
) -> 'ClientData'

Constructs a ClientData based on the given function.

Args
`client_ids`	A non-empty list of strings to use as input to `serializable_dataset_fn`.
`serializable_dataset_fn`	A function that takes a client_id from the above list, and returns a `tf.data.Dataset`. This function must be serializable and usable within the context of a `tf.function` and `tff.Computation`.

Raises
`TypeError`	If `serializable_dataset_fn` is a `tff.Computation`.

Returns
A `ClientData` object.

`preprocess`

preprocess(
    preprocess_fn: Callable[[tf.data.Dataset], tf.data.Dataset]
) -> 'ClientData'

Applies preprocess_fn to each client's data.

Args
`preprocess_fn`	A callable accepting a `tf.data.Dataset` and returning a preprocessed `tf.data.Dataset`. This function must be traceable by TF.

Returns
A `tff.simulation.datasets.ClientData`.

Raises
`IncompatiblePreprocessFnError`	If `preprocess_fn` is a `tff.Computation`.

`train_test_client_split`

@classmethod
train_test_client_split(
    client_data: 'ClientData',
    num_test_clients: int,
    seed: Optional[Union[int, Sequence[int]]] = None
) -> tuple['ClientData', 'ClientData']

Returns a pair of (train, test) ClientData.

This method partitions the clients of client_data into two ClientData objects with disjoint sets of ClientData.client_ids. All clients in the test ClientData are guaranteed to have non-empty datasets, but the training ClientData may have clients with no data.

Args
`client_data`	The base `ClientData` to split.
`num_test_clients`	How many clients to hold out for testing. This can be at most `len(client_data.client_ids) - 1`, so that the training `ClientData` is not empty.
`seed`	Optional seed to fix shuffling of clients before splitting. The seed can be any nonnegative 32-bit integer, an array of such integers, or `None`.

Returns
A pair (train_client_data, test_client_data), where test_client_data has `num_test_clients` clients selected at random, subject to the constraint that each has at least 1 batch in its dataset.

Raises
`ValueError`	If `num_test_clients` cannot be satisfied by `client_data`, or too many clients have empty datasets.
