
deep1b

Description:

Pre-trained embeddings for approximate nearest neighbor search using the cosine distance. This dataset consists of two splits:

'database': 9,990,000 data points, each with the features 'embedding' (96 floats), 'index' (int64), and 'neighbors' (empty list).
'test': 10,000 data points, each with the features 'embedding' (96 floats), 'index' (int64), and 'neighbors' (a list of the 'index' and 'distance' of the nearest neighbors in the database).
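
To make the relationship between the two splits concrete, the following is a minimal, dependency-free sketch of exact (brute-force) nearest neighbor search under cosine distance. It assumes cosine distance is defined as 1 minus cosine similarity; the page does not state the exact convention behind the stored 'distance' values, so treat that as an assumption. The names `cosine_distance`, `brute_force_neighbors`, `toy_database`, and `toy_query` are hypothetical, and the toy vectors are 3-dimensional stand-ins for the 96-dimensional embeddings.

```python
import math


def cosine_distance(a, b):
    """Cosine distance, taken here as 1 - cosine similarity (assumed convention).

    Raises ZeroDivisionError if either vector has zero norm.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def brute_force_neighbors(query, database, k):
    """Return the k closest database rows as (index, distance) pairs, nearest first."""
    scored = [(i, cosine_distance(query, row)) for i, row in enumerate(database)]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Tiny hypothetical stand-in for the 'database' split (3-D instead of 96-D).
toy_database = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
]
toy_query = [2.0, 0.1, 0.0]

toy_result = brute_force_neighbors(toy_query, toy_database, k=2)
print(toy_result)
```

A linear scan like this is only practical for small subsets. At the scale of the full 'database' split, a vectorized or approximate index is the usual choice, with an exact scan serving as the reference to compare against.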

Homepage: http://sites.skoltech.ru/compvision/noimi/
Source code: tfds.nearest_neighbors.deep1b.Deep1b
Versions:
- 1.0.0 (default): Initial release.
Download size: 3.58 GiB
Dataset size: 4.46 GiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'database'`	9,990,000
`'test'`	10,000

Feature structure:

FeaturesDict({
    'embedding': Tensor(shape=(96,), dtype=float32),
    'index': Scalar(shape=(), dtype=int64, description=Index within the split.),
    'neighbors': Sequence({
        'distance': Scalar(shape=(), dtype=float32, description=Neighbor distance.),
        'index': Scalar(shape=(), dtype=int64, description=Neighbor index.),
    }),
})

Feature documentation:

Feature	Class	Shape	Dtype	Description
	FeaturesDict
embedding	Tensor	(96,)	float32
index	Scalar		int64	Index within the split.
neighbors	Sequence			The computed neighbors, which are only available for the test split.
neighbors/distance	Scalar		float32	Neighbor distance.
neighbors/index	Scalar		int64	Neighbor index.
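
One common use of the 'neighbors' feature is scoring an approximate search method against the stored ground truth. The sketch below computes recall@k for a single test example. It assumes that a decoded example exposes 'neighbors' as a dictionary of parallel lists ('index' and 'distance') and that the neighbors are ordered nearest first; neither detail is stated on this page. The function `recall_at_k` and the values in `toy_example` and `toy_prediction` are hypothetical.

```python
def recall_at_k(predicted_indices, true_neighbor_indices, k):
    """Fraction of the k true nearest neighbors found in the top-k predictions."""
    truth = set(true_neighbor_indices[:k])
    if not truth:
        # No ground truth available (e.g. a 'database' example with empty neighbors).
        return 0.0
    hits = truth.intersection(predicted_indices[:k])
    return len(hits) / len(truth)


# Hypothetical test-split example shaped like the feature structure above.
toy_example = {
    "embedding": [0.0] * 96,
    "index": 0,
    "neighbors": {
        "index": [17, 4, 256, 9],
        "distance": [0.01, 0.02, 0.05, 0.07],
    },
}

# Hypothetical output of an approximate search, as database indices.
toy_prediction = [4, 17, 3, 256]

score = recall_at_k(toy_prediction, toy_example["neighbors"]["index"], k=3)
print(score)
```

Averaging this value over all 10,000 test examples gives a single recall figure for a given k.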

Supervised keys (See as_supervised doc): None
Figure (tfds.show_examples): Not supported.

Citation:

@inproceedings{babenko2016efficient,
  title={Efficient indexing of billion-scale datasets of deep descriptors},
  author={Babenko, Artem and Lempitsky, Victor},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={2055--2063},
  year={2016}
}