- Description:
PASS is a large-scale image dataset that does not include any humans, human parts, or other personally identifiable information. It can be used for high-quality self-supervised pretraining while significantly reducing privacy concerns.
PASS contains 1,439,589 images without any labels sourced from YFCC-100M.
All images in this dataset are licenced under the CC-BY licence, as is the dataset itself. For YFCC-100M see http://www.multimediacommons.org/
Additional Documentation: Explore on Papers With Code
Source code: tfds.datasets.pass.Builder
Versions:
1.0.0: Initial release.
2.0.0: v2: Removed 472 images from v1 as they contained humans. Also added metadata: datetaken and GPS.
3.0.0 (default): v3: Removed 131 images from v2 as they contained humans/tattoos.
Download size: 167.30 GiB
Dataset size: 166.43 GiB
Auto-cached (documentation): No
Splits:
Split | Examples |
---|---|
'train' | 1,439,588 |
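As a minimal sketch of how the version and split listed above might be used, the helpers below pin a version and read a few examples from the 'train' split through the public `tfds.load` and `tfds.as_numpy` calls. The helper names `dataset_name` and `iter_examples` are hypothetical and exist only for this illustration; the sketch also assumes `tensorflow_datasets` is installed and that there is enough disk space for the download size stated above.

```python
from typing import Any, Dict, Iterator, Optional


def dataset_name(version: str = "3.0.0") -> str:
    """Return the versioned TFDS name, e.g. 'pass:3.0.0' (hypothetical helper)."""
    return f"pass:{version}"


def iter_examples(
    version: str = "3.0.0",
    data_dir: Optional[str] = None,
    limit: int = 3,
) -> Iterator[Dict[str, Any]]:
    """Yield up to `limit` examples from the 'train' split as NumPy values."""
    # Imported lazily so that defining these helpers does not require TFDS.
    import tensorflow_datasets as tfds

    # Assumption: the first call downloads and prepares the full dataset,
    # because it is not auto-cached and is far too large to hold in memory.
    ds = tfds.load(dataset_name(version), split="train", data_dir=data_dir)
    for example in tfds.as_numpy(ds.take(limit)):
        yield example
```

Iterating over `iter_examples(limit=1)` would trigger the download on first use. Pinning the version in the name keeps a pipeline reproducible if the default version changes later.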
- Feature structure:
FeaturesDict({
'image': Image(shape=(None, None, 3), dtype=uint8),
'image/creator_uname': Text(shape=(), dtype=string),
'image/date_taken': Text(shape=(), dtype=string),
'image/gps_lat': float32,
'image/gps_lon': float32,
'image/hash': Text(shape=(), dtype=string),
})
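The image shape `(None, None, 3)` means height and width vary per example, so images need a common size before they can be batched. One possible approach, sketched here under that assumption only, is a center crop; the function name `center_crop_box` and the crop size are illustrative and are not part of the dataset or of TFDS.

```python
from typing import Tuple


def center_crop_box(height: int, width: int, size: int) -> Tuple[int, int, int, int]:
    """Return (top, left, bottom, right) of a centered size x size crop."""
    if size <= 0:
        raise ValueError("size must be positive")
    if size > min(height, width):
        # Assumption: smaller images are resized or skipped by the caller.
        raise ValueError("image is smaller than the requested crop")
    top = (height - size) // 2
    left = (width - size) // 2
    return top, left, top + size, left + size


# With an array-like image of shape (height, width, 3), the crop is:
#   top, left, bottom, right = center_crop_box(image.shape[0], image.shape[1], 224)
#   cropped = image[top:bottom, left:right, :]
```

Self-supervised pipelines usually apply random crops and other augmentations instead; the fixed center crop is only the simplest way to obtain a uniform shape.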
- Feature documentation:
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
FeaturesDict | | | | |
image | Image | (None, None, 3) | uint8 | |
image/creator_uname | Text | | string | |
image/date_taken | Text | | string | |
image/gps_lat | Tensor | | float32 | |
image/gps_lon | Tensor | | float32 | |
image/hash | Text | | string | |
Supervised keys (See as_supervised doc): None
- Citation:
@Article{asano21pass,
author = "Yuki M. Asano and Christian Rupprecht and Andrew Zisserman and Andrea Vedaldi",
title = "PASS: An ImageNet replacement for self-supervised pretraining without humans",
journal = "NeurIPS Track on Datasets and Benchmarks",
year = "2021"
}