- Description:
The dataset first described in the "Stanford 3D Objects" section of the paper Disentangling by Subspace Diffusion. The data consists of 100,000 renderings each of the Bunny and Dragon objects from the Stanford 3D Scanning Repository. More objects may be added in the future, but only the Bunny and Dragon are used in the paper. Each object is rendered with a uniformly sampled illumination from a point on the 2-sphere, and a uniformly sampled 3D rotation. The true latent states are provided as NumPy arrays along with the images. The lighting is given as a 3-vector with unit norm, while the rotation is provided both as a quaternion and a 3x3 orthogonal matrix.
There are many similarities between S3O4D and existing ML benchmark datasets like NORB, 3D Chairs, 3D Shapes and many others, which also include renderings of a set of objects under different pose and illumination conditions. However, none of these existing datasets include the full manifold of rotations in 3D - most include only a subset of changes to elevation and azimuth. S3O4D images are sampled uniformly and independently from the full space of rotations and illuminations, meaning the dataset contains objects that are upside down and illuminated from behind or underneath. We believe that this makes S3O4D uniquely suited for research on generative models where the latent space has non-trivial topology, as well as for general manifold learning methods where the curvature of the manifold is important.
Additional Documentation: Explore on Papers With Code
Source code:
tfds.datasets.s3o4d.Builder
Versions:
1.0.0
(default): Initial release.
Download size:
911.68 MiB
Dataset size:
1.01 GiB
Auto-cached (documentation): No
Splits:
Split | Examples |
---|---|
'bunny_test' |
20,000 |
'bunny_train' |
80,000 |
'dragon_test' |
20,000 |
'dragon_train' |
80,000 |
- Feature structure:
FeaturesDict({
'illumination': Tensor(shape=(3,), dtype=float32),
'image': Image(shape=(256, 256, 3), dtype=uint8),
'label': ClassLabel(shape=(), dtype=int64, num_classes=2),
'pose_mat': Tensor(shape=(3, 3), dtype=float32),
'pose_quat': Tensor(shape=(4,), dtype=float32),
})
- Feature documentation:
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
FeaturesDict | ||||
illumination | Tensor | (3,) | float32 | |
image | Image | (256, 256, 3) | uint8 | |
label | ClassLabel | int64 | ||
pose_mat | Tensor | (3, 3) | float32 | |
pose_quat | Tensor | (4,) | float32 |
Supervised keys (See
as_supervised
doc):None
Figure (tfds.show_examples):
- Examples (tfds.as_dataframe):
- Citation:
@article{pfau2020disentangling,
title={Disentangling by Subspace Diffusion},
author={Pfau, David and Higgins, Irina and Botev, Aleksandar and Racani\`ere,
S{\'e}bastian},
journal={Advances in Neural Information Processing Systems (NeurIPS)},
year={2020}
}