natural_instructions

Description:

A compilation of 1600+ tasks phrased as natural instructions. The original task collection can be found at https://github.com/allenai/natural-instructions. No preprocessing or other changes were made to the original version.

Note that users of this task collection should consult the underlying licenses of the contained datasets, and cite them accordingly.

Homepage: https://github.com/allenai/natural-instructions
Source code: tfds.datasets.natural_instructions.Builder
Versions:
- 1.0.0: Initial release.
- 1.0.1 (default): Added task name field, and fixed ID used for shuffling to use stable IDs.
Download size: 3.08 GiB
Dataset size: 4.73 GiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'train'`	5,040,134
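
Because only a `'train'` split is provided, a held-out set has to be carved out by the user. The following is a minimal sketch, assuming the standard TFDS percent-slicing syntax and the `tfds.load` entry point; the helper names `holdout_splits` and `load_with_holdout` are hypothetical and not part of TFDS.

```python
def holdout_splits(holdout_pct: int, split: str = "train") -> tuple[str, str]:
    """Return TFDS slicing strings for a (training, held-out) partition."""
    if not 0 < holdout_pct < 100:
        raise ValueError("holdout_pct must be between 1 and 99")
    cut = 100 - holdout_pct
    return f"{split}[:{cut}%]", f"{split}[{cut}%:]"


def load_with_holdout(holdout_pct: int = 5):
    """Load version 1.0.1 as two tf.data.Dataset objects.

    Calling this triggers the full download and preparation of the dataset.
    """
    # Imported lazily so holdout_splits() can be used without TFDS installed.
    import tensorflow_datasets as tfds

    train_split, holdout_split = holdout_splits(holdout_pct)
    # Passing a list of splits returns a list of datasets in the same order.
    return tfds.load(
        "natural_instructions:1.0.1",
        split=[train_split, holdout_split],
    )
```

Note that the held-out slice is a contiguous range of the stored example order, not a task-level split, so the same `task_name` may appear on both sides. Evaluating generalization to unseen tasks would require filtering on `task_name` instead.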

Feature structure:

FeaturesDict({
    'definition': Text(shape=(), dtype=string),
    'id': Text(shape=(), dtype=string),
    'input': Text(shape=(), dtype=string),
    'output': Text(shape=(), dtype=string),
    'source': Text(shape=(), dtype=string),
    'task_name': Text(shape=(), dtype=string),
})
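
Since every feature is a plain string, one example can be turned into a prompt and target pair with ordinary string formatting. The sketch below is one possible layout, not a format prescribed by the dataset; the function names and the `Definition:` / `Input:` / `Output:` template are assumptions made for illustration. It accepts both `str` and `bytes` values, because iterating with `tfds.as_numpy` yields string features as `bytes`.

```python
def _to_str(value) -> str:
    """Decode bytes (as yielded by tfds.as_numpy) or pass through str."""
    return value.decode("utf-8") if isinstance(value, bytes) else str(value)


def to_prompt_and_target(example: dict) -> tuple[str, str]:
    """Build a (prompt, target) pair from one natural_instructions example.

    Uses the 'definition', 'input', and 'output' features; the template
    itself is a hypothetical choice, not something the dataset mandates.
    """
    definition = _to_str(example["definition"]).strip()
    task_input = _to_str(example["input"]).strip()
    target = _to_str(example["output"]).strip()
    prompt = f"Definition: {definition}\n\nInput: {task_input}\n\nOutput:"
    return prompt, target


# Made-up record for illustration only; it is not taken from the dataset.
demo = {
    "definition": b"Answer the question with yes or no.",
    "input": b"Is water wet?",
    "output": b"yes",
}
prompt, target = to_prompt_and_target(demo)
```

The remaining features (`id`, `source`, `task_name`) are left untouched here; they are useful for grouping or filtering examples rather than for building the prompt text.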

Feature documentation:

Feature	Class	Dtype
	FeaturesDict
definition	Text	string
id	Text	string
input	Text	string
output	Text	string
source	Text	string
task_name	Text	string

Supervised keys (See as_supervised doc): None
Figure (tfds.show_examples): Not supported.

Citation:

@article{wang2022benchmarking,
  title={Benchmarking generalization via in-context instructions on 1,600+ language tasks},
  author={Wang, Yizhong and Mishra, Swaroop and Alipoormolabashi, Pegah and Kordi, Yeganeh and Mirzaei, Amirreza and Arunkumar, Anjana and Ashok, Arjun and Dhanasekaran, Arut Selvan and Naik, Atharva and Stap, David and others},
  journal={arXiv preprint arXiv:2204.07705},
  year={2022}
}