- Description:
The LAION-400M dataset is completely open and freely accessible.
Check https://laion.ai/laion-400-open-dataset/ for the full description of this dataset.
All images and texts in the LAION-400M dataset have been filtered with OpenAI's CLIP by calculating the cosine similarity between the text and image embeddings and dropping pairs with a similarity below 0.3. The threshold of 0.3 was determined through human evaluations and serves as a good heuristic for estimating semantic image-text content matching.
The image-text pairs were extracted from the Common Crawl web data dump and come from random web pages crawled between 2014 and 2021.
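As an illustration of the filtering rule described above (not the actual LAION pipeline), the following minimal sketch keeps or drops a pair based on the cosine similarity of its CLIP text and image embeddings; the function and constant names are hypothetical.

```python
import numpy as np


def cosine_similarity(text_emb: np.ndarray, image_emb: np.ndarray) -> float:
    """Cosine similarity between a text embedding and an image embedding."""
    return float(
        np.dot(text_emb, image_emb)
        / (np.linalg.norm(text_emb) * np.linalg.norm(image_emb))
    )


# Threshold used for LAION-400M, per the description above.
SIMILARITY_THRESHOLD = 0.3


def keep_pair(text_emb: np.ndarray, image_emb: np.ndarray) -> bool:
    """Return True if the image-text pair passes the CLIP similarity filter."""
    return cosine_similarity(text_emb, image_emb) >= SIMILARITY_THRESHOLD
```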
Additional Documentation: Explore on Papers With Code
Source code:
tfds.vision_language.laion400m.Laion400m
Versions:
1.0.0 (default): Initial release.
Download size: Unknown size
Dataset size: Unknown size
Manual download instructions: This dataset requires you to download the source data manually into download_config.manual_dir (defaults to ~/tensorflow_datasets/downloads/manual/):
Refer to the "Download Information" section on https://laion.ai/blog/laion-400-open-dataset/
Auto-cached (documentation): Unknown
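A minimal sketch of preparing and loading the dataset once the source data has been placed in the manual directory; the manual_dir path shown is the default mentioned above, and the 'train' split name is an assumption.

```python
import tensorflow_datasets as tfds

# Point manual_dir at the directory containing the manually downloaded source
# data (the default location is shown here); adjust as needed.
download_config = tfds.download.DownloadConfig(
    manual_dir='~/tensorflow_datasets/downloads/manual/'
)

builder = tfds.builder('laion400m/images')
builder.download_and_prepare(download_config=download_config)

# Split name is an assumption; check builder.info.splits for the actual splits.
ds = builder.as_dataset(split='train')
```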
Splits:
Split | Examples
--- | ---
Supervised keys (See as_supervised doc): None
Figure (tfds.show_examples): Not supported.
Examples (tfds.as_dataframe): Missing.
Citation:
@article{DBLP:journals/corr/abs-2111-02114,
author = {Christoph Schuhmann and
Richard Vencu and
Romain Beaumont and
Robert Kaczmarczyk and
Clayton Mullis and
Aarush Katta and
Theo Coombes and
Jenia Jitsev and
Aran Komatsuzaki},
title = { {LAION-400M:} Open Dataset of CLIP-Filtered 400 Million Image-Text
Pairs},
journal = {CoRR},
volume = {abs/2111.02114},
year = {2021},
url = {https://arxiv.org/abs/2111.02114},
eprinttype = {arXiv},
eprint = {2111.02114},
timestamp = {Fri, 05 Nov 2021 15:25:54 +0100},
biburl = {https://dblp.org/rec/journals/corr/abs-2111-02114.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
laion400m/images (default config)
- Feature structure:
FeaturesDict({
'caption': Text(shape=(), dtype=string),
'image': Image(shape=(None, None, 3), dtype=uint8, description=image),
'license': Text(shape=(), dtype=string),
'nsfw': ClassLabel(shape=(), dtype=int64, num_classes=4),
'original_height': Scalar(shape=(), dtype=int32, description=original height of the image),
'original_width': Scalar(shape=(), dtype=int32, description=original width of the image),
'similarity': Scalar(shape=(), dtype=float64, description=cosine similarity score between the text and image embedding. Missing values default to -1.0),
'url': Text(shape=(), dtype=string),
})
- Feature documentation:
Feature | Class | Shape | Dtype | Description | Value range
--- | --- | --- | --- | --- | ---
 | FeaturesDict | | | |
caption | Text | | string | HTML alt-text attribute |
image | Image | (None, None, 3) | uint8 | image |
license | Text | | string | type of Creative Commons license (if applicable) |
nsfw | ClassLabel | | int64 | NSFW tag (detected with CLIP). Inconsistent and missing tags are replaced with UNTAGGED |
original_height | Scalar | | int32 | original height of the image |
original_width | Scalar | | int32 | original width of the image |
similarity | Scalar | | float64 | cosine similarity score between the text and image embedding. Missing values default to -1.0 | [0.0, 1.0]
url | Text | | string | image URL |
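A short usage sketch for the images config, assuming the dataset has already been prepared from the manually downloaded source data as described above; the 'train' split name is an assumption.

```python
import tensorflow_datasets as tfds

ds = tfds.load('laion400m/images', split='train')  # split name is an assumption
for example in ds.take(1):
    image = example['image']            # uint8 tensor of shape (height, width, 3)
    caption = example['caption']        # HTML alt-text attribute
    similarity = example['similarity']  # CLIP text-image cosine similarity (-1.0 if missing)
    print(caption.numpy().decode('utf-8'), image.shape, float(similarity))
```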
laion400m/embeddings
- Feature structure:
FeaturesDict({
'caption': Text(shape=(), dtype=string),
'image_embedding': Tensor(shape=(512,), dtype=float16, description=CLIP image embedding),
'license': Text(shape=(), dtype=string),
'nsfw': ClassLabel(shape=(), dtype=int64, num_classes=4),
'original_height': Scalar(shape=(), dtype=int32, description=original height of the image),
'original_width': Scalar(shape=(), dtype=int32, description=original width of the image),
'similarity': Scalar(shape=(), dtype=float64, description=cosine similarity score between the text and image embedding. Missing values default to -1.0),
'text_embedding': Tensor(shape=(512,), dtype=float16, description=CLIP text embedding),
'url': Text(shape=(), dtype=string),
})
- Feature documentation:
Feature | Class | Shape | Dtype | Description | Value range
--- | --- | --- | --- | --- | ---
 | FeaturesDict | | | |
caption | Text | | string | HTML alt-text attribute |
image_embedding | Tensor | (512,) | float16 | CLIP image embedding |
license | Text | | string | type of Creative Commons license (if applicable) |
nsfw | ClassLabel | | int64 | NSFW tag (detected with CLIP). Inconsistent and missing tags are replaced with UNTAGGED |
original_height | Scalar | | int32 | original height of the image |
original_width | Scalar | | int32 | original width of the image |
similarity | Scalar | | float64 | cosine similarity score between the text and image embedding. Missing values default to -1.0 | [0.0, 1.0]
text_embedding | Tensor | (512,) | float16 | CLIP text embedding |
url | Text | | string | image URL |
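A sketch of reading the embeddings config and recomputing the text-image cosine similarity from the stored CLIP embeddings, which should roughly match the stored similarity field; the 'train' split name is an assumption.

```python
import tensorflow as tf
import tensorflow_datasets as tfds

ds = tfds.load('laion400m/embeddings', split='train')  # split name is an assumption
for example in ds.take(1):
    img = tf.cast(example['image_embedding'], tf.float32)  # (512,) CLIP image embedding
    txt = tf.cast(example['text_embedding'], tf.float32)   # (512,) CLIP text embedding
    cos = tf.reduce_sum(img * txt) / (tf.norm(img) * tf.norm(txt))
    print(float(cos), float(example['similarity']))  # recomputed vs. stored similarity
```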