trivia_qa

Description:

TriviaQA is a reading comprehension dataset containing over 650K question-answer-evidence triples. TriviaQA includes 95K question-answer pairs authored by trivia enthusiasts and independently gathered evidence documents, six per question on average, that provide high-quality distant supervision for answering the questions.

Additional Documentation: Explore on Papers With Code
Homepage: http://nlp.cs.washington.edu/triviaqa/
Source code: tfds.datasets.trivia_qa.Builder
Versions:
- 1.1.0 (default): No release notes.
Feature structure:

FeaturesDict({
    'answer': FeaturesDict({
        'aliases': Sequence(Text(shape=(), dtype=string)),
        'matched_wiki_entity_name': Text(shape=(), dtype=string),
        'normalized_aliases': Sequence(Text(shape=(), dtype=string)),
        'normalized_matched_wiki_entity_name': Text(shape=(), dtype=string),
        'normalized_value': Text(shape=(), dtype=string),
        'type': Text(shape=(), dtype=string),
        'value': Text(shape=(), dtype=string),
    }),
    'entity_pages': Sequence({
        'doc_source': Text(shape=(), dtype=string),
        'filename': Text(shape=(), dtype=string),
        'title': Text(shape=(), dtype=string),
        'wiki_context': Text(shape=(), dtype=string),
    }),
    'question': Text(shape=(), dtype=string),
    'question_id': Text(shape=(), dtype=string),
    'question_source': Text(shape=(), dtype=string),
    'search_results': Sequence({
        'description': Text(shape=(), dtype=string),
        'filename': Text(shape=(), dtype=string),
        'rank': int32,
        'search_context': Text(shape=(), dtype=string),
        'title': Text(shape=(), dtype=string),
        'url': Text(shape=(), dtype=string),
    }),
})
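
As a rough illustration of how the nested structure above can be traversed: TFDS generally exposes a `Sequence` of dicts as a dict of parallel sequences, so `search_results` is assumed here to arrive as one list per field rather than as a list of records. The record below is hypothetical (made-up values, and plain `str` where TFDS would yield byte strings), and `contexts_with_answer` is an illustrative helper, not part of TFDS.

```python
def contexts_with_answer(example):
    """Return the ranks of search results whose context mentions a normalized alias."""
    aliases = example["answer"]["normalized_aliases"]
    results = example["search_results"]
    hits = []
    # 'rank' and 'search_context' are parallel lists: index i describes result i.
    for rank, context in zip(results["rank"], results["search_context"]):
        text = context.lower()
        if any(alias in text for alias in aliases):
            hits.append(rank)
    return hits


# Hypothetical record shaped like the feature structure (only the fields used here).
example = {
    "question": "Which planet is known as the Red Planet?",
    "answer": {
        "value": "Mars",
        "normalized_aliases": ["mars", "red planet"],
    },
    "search_results": {
        "rank": [0, 1],
        "search_context": [
            "Mars is the fourth planet from the Sun.",
            "Venus is the second planet from the Sun.",
        ],
    },
}

print(contexts_with_answer(example))  # [0]
```

The same pattern should apply to `entity_pages`, with `wiki_context` in place of `search_context`.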

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
answer	FeaturesDict
answer/aliases	Sequence(Text)	(None,)	string
answer/matched_wiki_entity_name	Text		string
answer/normalized_aliases	Sequence(Text)	(None,)	string
answer/normalized_matched_wiki_entity_name	Text		string
answer/normalized_value	Text		string
answer/type	Text		string
answer/value	Text		string
entity_pages	Sequence
entity_pages/doc_source	Text		string
entity_pages/filename	Text		string
entity_pages/title	Text		string
entity_pages/wiki_context	Text		string
question	Text		string
question_id	Text		string
question_source	Text		string
search_results	Sequence
search_results/description	Text		string
search_results/filename	Text		string
search_results/rank	Tensor		int32
search_results/search_context	Text		string
search_results/title	Text		string
search_results/url	Text		string

Supervised keys (See as_supervised doc): None
Figure (tfds.show_examples): Not supported.
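
Because no supervised keys are defined, `(input, target)` pairs have to be built by hand. A minimal sketch, assuming the question text is the input and `answer/value` is the target; `to_pair` is a hypothetical helper name, and the record shown is made up for illustration.

```python
def to_pair(example):
    """Map a trivia_qa-style example dict to a (question, answer) pair."""
    return example["question"], example["answer"]["value"]


# Made-up record containing only the fields the helper reads.
record = {
    "question": "Which planet is known as the Red Planet?",
    "answer": {"value": "Mars"},
}

print(to_pair(record))  # ('Which planet is known as the Red Planet?', 'Mars')
```

Since the helper only uses dictionary indexing, it should also be usable with `tf.data.Dataset.map` on a loaded split, although that is not exercised here.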
Citation:

@article{2017arXivtriviaqa,
       author = { {Joshi}, Mandar and {Choi}, Eunsol and {Weld},
                 Daniel and {Zettlemoyer}, Luke},
        title = "{TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension}",
      journal = {arXiv e-prints},
         year = 2017,
          eid = {arXiv:1705.03551},
        pages = {arXiv:1705.03551},
archivePrefix = {arXiv},
       eprint = {1705.03551},
}

trivia_qa/rc (default config)

Config description: Question-answer pairs where all documents for a given question contain the answer string(s). Includes context from Wikipedia and search results.
Download size: 2.48 GiB
Dataset size: 14.99 GiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'test'`	17,210
`'train'`	138,384
`'validation'`	18,669
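
A config and version can be selected through the dataset name passed to `tfds.load`, which accepts the `name/config:version` form. The sketch below is one possible wrapper; `dataset_name` and `load_split` are hypothetical helper names, and the TFDS import is deferred so that the name-building part runs without TFDS installed.

```python
# Config names as listed on this page.
CONFIGS = ("rc", "rc.nocontext", "unfiltered", "unfiltered.nocontext")


def dataset_name(config="rc", version="1.1.0"):
    """Build a 'trivia_qa/<config>:<version>' name for tfds.load."""
    if config not in CONFIGS:
        raise ValueError(f"unknown trivia_qa config: {config!r}")
    return f"trivia_qa/{config}:{version}"


def load_split(config="rc", split="validation"):
    """Load one split. Calling this triggers the (multi-GiB) download."""
    # Imported lazily: only needed when data is actually loaded.
    import tensorflow_datasets as tfds

    return tfds.load(dataset_name(config), split=split)


print(dataset_name("rc"))  # trivia_qa/rc:1.1.0
```

Given the dataset size above, a slice such as `split="train[:1%]"` may be a more practical starting point than a full split.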


trivia_qa/rc.nocontext

Config description: Question-answer pairs where all documents for a given question contain the answer string(s).
Download size: 2.48 GiB
Dataset size: 196.84 MiB
Auto-cached (documentation): Yes (test, validation), Only when shuffle_files=False (train)
Splits:

Split	Examples
`'test'`	17,210
`'train'`	138,384
`'validation'`	18,669


trivia_qa/unfiltered

Config description: 110k question-answer pairs for open domain QA where not all documents for a given question contain the answer string(s). This makes the unfiltered dataset more appropriate for IR-style QA. Includes context from Wikipedia and search results.
Download size: 3.07 GiB
Dataset size: 27.27 GiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'test'`	10,832
`'train'`	87,622
`'validation'`	11,313


trivia_qa/unfiltered.nocontext

Config description: 110k question-answer pairs for open domain QA where not all documents for a given question contain the answer string(s). This makes the unfiltered dataset more appropriate for IR-style QA.
Download size: 603.25 MiB
Dataset size: 119.78 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'test'`	10,832
`'train'`	87,622
`'validation'`	11,313
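
The split tables can be cross-checked with a little arithmetic. The sketch below copies the counts from this page into a plain dictionary (the `SPLITS` name and the helpers are illustrative only) and sums them per config; the `.nocontext` variants list the same counts as their counterparts, so only two entries are needed.

```python
# Example counts copied from the split tables on this page.
SPLITS = {
    "rc": {"test": 17_210, "train": 138_384, "validation": 18_669},
    "unfiltered": {"test": 10_832, "train": 87_622, "validation": 11_313},
}


def total_examples(config):
    """Total number of examples across all splits of a config."""
    return sum(SPLITS[config].values())


def split_fractions(config):
    """Share of each split relative to the config total."""
    total = total_examples(config)
    return {name: count / total for name, count in SPLITS[config].items()}


print(total_examples("rc"))          # 174263
print(total_examples("unfiltered"))  # 109767
```

The unfiltered total of 109,767 is consistent with the "110k question-answer pairs" in the config description.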
