- Description:
The LAION-400M dataset is completely open and freely accessible.
Check https://laion.ai/laion-400-open-dataset/ for the full description of this dataset.
All images and texts in the LAION-400M dataset have been filtered with OpenAI's CLIP by calculating the cosine similarity between the text and image embeddings and dropping pairs with a similarity below 0.3. The threshold of 0.3 was determined through human evaluations and serves as a good heuristic for estimating semantic image-text content matching.
The image-text pairs were extracted from the Common Crawl web data dump and come from random web pages crawled between 2014 and 2021.
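As an illustration of the filtering rule described above (not the actual LAION pipeline), the following minimal sketch keeps or drops a pair based on the cosine similarity of its CLIP text and image embeddings; the function and constant names are hypothetical.

```python
import numpy as np


def cosine_similarity(text_emb: np.ndarray, image_emb: np.ndarray) -> float:
    """Cosine similarity between a text embedding and an image embedding."""
    return float(
        np.dot(text_emb, image_emb)
        / (np.linalg.norm(text_emb) * np.linalg.norm(image_emb))
    )


# Threshold used for LAION-400M, per the description above.
SIMILARITY_THRESHOLD = 0.3


def keep_pair(text_emb: np.ndarray, image_emb: np.ndarray) -> bool:
    """Return True if the image-text pair passes the CLIP similarity filter."""
    return cosine_similarity(text_emb, image_emb) >= SIMILARITY_THRESHOLD
```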
Additional Documentation: Explore on Papers With Code
Source code:
tfds.vision_language.laion400m.Laion400m
Versions:
1.0.0 (default): Initial release.
Download size: Unknown size
Dataset size: Unknown size
Manual download instructions: This dataset requires you to download the source data manually into download_config.manual_dir (defaults to ~/tensorflow_datasets/downloads/manual/):
Refer to the "Download Information" section on https://laion.ai/blog/laion-400-open-dataset/
Auto-cached (documentation): Unknown
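A minimal sketch of preparing and loading the dataset once the source data has been placed in the manual directory; the manual_dir path shown is the default mentioned above, and the 'train' split name is an assumption.

```python
import tensorflow_datasets as tfds

# Point manual_dir at the directory containing the manually downloaded source
# data (the default location is shown here); adjust as needed.
download_config = tfds.download.DownloadConfig(
    manual_dir='~/tensorflow_datasets/downloads/manual/'
)

builder = tfds.builder('laion400m/images')
builder.download_and_prepare(download_config=download_config)

# Split name is an assumption; check builder.info.splits for the actual splits.
ds = builder.as_dataset(split='train')
```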
Splits:
Split | Examples
--- | ---
Supervised keys (See as_supervised doc): None
Figure (tfds.show_examples): Not supported.
Examples (tfds.as_dataframe): Missing.
Citation:
@article{DBLP:journals/corr/abs-2111-02114,
author = {Christoph Schuhmann and
Richard Vencu and
Romain Beaumont and
Robert Kaczmarczyk and
Clayton Mullis and
Aarush Katta and
Theo Coombes and
Jenia Jitsev and
Aran Komatsuzaki},
title = { {LAION-400M:} Open Dataset of CLIP-Filtered 400 Million Image-Text
Pairs},
journal = {CoRR},
volume = {abs/2111.02114},
year = {2021},
url = {https://arxiv.org/abs/2111.02114},
eprinttype = {arXiv},
eprint = {2111.02114},
timestamp = {Fri, 05 Nov 2021 15:25:54 +0100},
biburl = {https://dblp.org/rec/journals/corr/abs-2111-02114.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
laion400m/images (default config)
- Feature structure:
FeaturesDict({
'caption': Text(shape=(), dtype=string),
'image': Image(shape=(None, None, 3), dtype=uint8, description=image),
'license': Text(shape=(), dtype=string),
'nsfw': ClassLabel(shape=(), dtype=int64, num_classes=4),
'original_height': Scalar(shape=(), dtype=int32, description=original height of the image),
'original_width': Scalar(shape=(), dtype=int32, description=original width of the image),
'similarity': Scalar(shape=(), dtype=float64, description=cosine similarity score between the text and image embedding. Missing values default to -1.0),
'url': Text(shape=(), dtype=string),
})
- Feature documentation:
Feature | Class | Shape | Dtype | Description | Value range
--- | --- | --- | --- | --- | ---
 | FeaturesDict | | | |
caption | Text | | string | HTML alt-text attribute |
image | Image | (None, None, 3) | uint8 | image |
license | Text | | string | type of Creative Commons license (if applicable) |
nsfw | ClassLabel | | int64 | NSFW tag (detected with CLIP). Inconsistent and missing tags are replaced with UNTAGGED |
original_height | Scalar | | int32 | original height of the image |
original_width | Scalar | | int32 | original width of the image |
similarity | Scalar | | float64 | cosine similarity score between the text and image embedding. Missing values default to -1.0 | [0.0, 1.0]
url | Text | | string | image URL |
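A short usage sketch for the images config, assuming the dataset has already been prepared from the manually downloaded source data as described above; the 'train' split name is an assumption.

```python
import tensorflow_datasets as tfds

ds = tfds.load('laion400m/images', split='train')  # split name is an assumption
for example in ds.take(1):
    image = example['image']            # uint8 tensor of shape (height, width, 3)
    caption = example['caption']        # HTML alt-text attribute
    similarity = example['similarity']  # CLIP text-image cosine similarity (-1.0 if missing)
    print(caption.numpy().decode('utf-8'), image.shape, float(similarity))
```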
laion400m/embeddings
- Feature structure:
FeaturesDict({
'caption': Text(shape=(), dtype=string),
'image_embedding': Tensor(shape=(512,), dtype=float16, description=CLIP image embedding),
'license': Text(shape=(), dtype=string),
'nsfw': ClassLabel(shape=(), dtype=int64, num_classes=4),
'original_height': Scalar(shape=(), dtype=int32, description=original height of the image),
'original_width': Scalar(shape=(), dtype=int32, description=original width of the image),
'similarity': Scalar(shape=(), dtype=float64, description=cosine similarity score between the text and image embedding. Missing values default to -1.0),
'text_embedding': Tensor(shape=(512,), dtype=float16, description=CLIP text embedding),
'url': Text(shape=(), dtype=string),
})
- Feature documentation:
Feature | Class | Shape | Dtype | Description | Value range
--- | --- | --- | --- | --- | ---
 | FeaturesDict | | | |
caption | Text | | string | HTML alt-text attribute |
image_embedding | Tensor | (512,) | float16 | CLIP image embedding |
license | Text | | string | type of Creative Commons license (if applicable) |
nsfw | ClassLabel | | int64 | NSFW tag (detected with CLIP). Inconsistent and missing tags are replaced with UNTAGGED |
original_height | Scalar | | int32 | original height of the image |
original_width | Scalar | | int32 | original width of the image |
similarity | Scalar | | float64 | cosine similarity score between the text and image embedding. Missing values default to -1.0 | [0.0, 1.0]
text_embedding | Tensor | (512,) | float16 | CLIP text embedding |
url | Text | | string | image URL |
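A sketch of reading the embeddings config and recomputing the text-image cosine similarity from the stored CLIP embeddings, which should roughly match the stored similarity field; the 'train' split name is an assumption.

```python
import tensorflow as tf
import tensorflow_datasets as tfds

ds = tfds.load('laion400m/embeddings', split='train')  # split name is an assumption
for example in ds.take(1):
    img = tf.cast(example['image_embedding'], tf.float32)  # (512,) CLIP image embedding
    txt = tf.cast(example['text_embedding'], tf.float32)   # (512,) CLIP text embedding
    cos = tf.reduce_sum(img * txt) / (tf.norm(img) * tf.norm(txt))
    print(float(cos), float(example['similarity']))  # recomputed vs. stored similarity
```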