Mock tfds to generate random data.
@contextlib.contextmanager
tfds.testing.mock_data(
    num_examples: int = 1,
    num_sub_examples: int = 1,
    max_value: Optional[int] = None,
    *,
    policy: MockPolicy = tfds.testing.MockPolicy.AUTO,
    as_dataset_fn: Optional[Callable[..., tf.data.Dataset]] = None,
    data_dir: Optional[str] = None,
    mock_array_record_data_source: Optional[PickableDataSourceMock] = None
) -> Iterator[None]
Usage
- Usage (automated):
    with tfds.testing.mock_data(num_examples=5):
        ds = tfds.load('some_dataset', split='train')
        for ex in ds:  # ds will yield randomly generated examples.
            ex

All calls to tfds.load / tfds.data_source within the context manager then return deterministic mocked data.
- Usage (manual):
For more control over the generated examples, you can manually overwrite the DatasetBuilder._as_dataset method:
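    import numpy as np
    import tensorflow as tf
    import tensorflow_datasets as tfds

    num_examples = 5  # Number of fake examples the generator below yields.

    def as_dataset(self, *args, **kwargs):
        # Replacement for the builder's _as_dataset: yields fixed fake examples.
        return tf.data.Dataset.from_generator(
            lambda: ({
                'image': np.ones(shape=(28, 28, 1), dtype=np.uint8),
                'label': i % 10,
            } for i in range(num_examples)),
            output_types=self.info.features.dtype,
            output_shapes=self.info.features.shape,
        )

    with tfds.testing.mock_data(as_dataset_fn=as_dataset):
        ds = tfds.load('some_dataset', split='train')
        for ex in ds:  # ds will yield the fake examples from 'as_dataset'.
            ex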
Policy
For improved results, you can copy the true metadata files (dataset_info.json, label.txt, vocabulary files) into data_dir/dataset_name/version. This will allow the mocked dataset to use the true metadata computed during generation (split names, ...).
If metadata files are not found, then info from the original class will be used, but the features computed during generation won't be available (e.g. unknown split names, so any splits are accepted).
Miscellaneous
- The examples are deterministically generated. Train and test split will yield the same examples.
- The actual examples will be randomly generated using builder.info.features.get_tensor_info().
- Download and prepare step will always be a no-op.
- Warning: info.splits['train'].num_examples won't match len(list(ds_train)).

Some of those points could be improved. If you have suggestions or issues with this function, please open a new issue on our GitHub.
Args | |
---|---|
num_examples | Number of fake examples to generate. |
num_sub_examples | Number of examples to generate in nested Dataset features. |
max_value | The maximum value present in generated tensors. If max_value is None or 0, random numbers are generated in the range 0 to 255. |
policy | Strategy to use to generate the fake examples. See tfds.testing.MockPolicy. |
as_dataset_fn | If provided, replaces the default random example generator. This function mocks FileAdapterBuilder._as_dataset. |
data_dir | Folder containing the metadata files (searched in data_dir/dataset_name/version). Overrides the data_dir kwarg from tfds.load. Used in MockPolicy.USE_FILES mode. |
mock_array_record_data_source | Overwrites the mock used for the underlying ArrayRecord data source, if one is used. |
Yields | |
---|---|
None |