- Description:
TriviaQA is a reading comprehension dataset containing over 650K question-answer-evidence triples. TriviaQA includes 95K question-answer pairs authored by trivia enthusiasts and independently gathered evidence documents, six per question on average, that provide high-quality distant supervision for answering the questions.
Homepage: http://nlp.cs.washington.edu/triviaqa/
Source code:
tfds.datasets.trivia_qa.Builder
Versions:
1.1.0 (default): No release notes.
Feature structure:
FeaturesDict({
'answer': FeaturesDict({
'aliases': Sequence(Text(shape=(), dtype=string)),
'matched_wiki_entity_name': Text(shape=(), dtype=string),
'normalized_aliases': Sequence(Text(shape=(), dtype=string)),
'normalized_matched_wiki_entity_name': Text(shape=(), dtype=string),
'normalized_value': Text(shape=(), dtype=string),
'type': Text(shape=(), dtype=string),
'value': Text(shape=(), dtype=string),
}),
'entity_pages': Sequence({
'doc_source': Text(shape=(), dtype=string),
'filename': Text(shape=(), dtype=string),
'title': Text(shape=(), dtype=string),
'wiki_context': Text(shape=(), dtype=string),
}),
'question': Text(shape=(), dtype=string),
'question_id': Text(shape=(), dtype=string),
'question_source': Text(shape=(), dtype=string),
'search_results': Sequence({
'description': Text(shape=(), dtype=string),
'filename': Text(shape=(), dtype=string),
'rank': int32,
'search_context': Text(shape=(), dtype=string),
'title': Text(shape=(), dtype=string),
'url': Text(shape=(), dtype=string),
}),
})
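As a sanity check on the nested structure above, here is a minimal sketch of what one decoded example looks like as plain Python data. Field names match the FeaturesDict; all values are made-up placeholders, not a real record from the dataset:

```python
# Hypothetical decoded example mirroring the FeaturesDict above.
# Every value here is a placeholder; real records come from tfds.load.
example = {
    "question": "Which river runs through the city of Bath?",
    "question_id": "qid-000",          # placeholder id
    "question_source": "placeholder",
    "answer": {
        "value": "River Avon",
        "normalized_value": "river avon",
        "aliases": ["River Avon", "Avon"],
        "normalized_aliases": ["river avon", "avon"],
        "matched_wiki_entity_name": "River Avon",
        "normalized_matched_wiki_entity_name": "river avon",
        "type": "WikipediaEntity",
    },
    # Sequence features: each sub-field holds one entry per document.
    "entity_pages": {
        "doc_source": ["TagMe"],
        "filename": ["placeholder.txt"],
        "title": ["River Avon"],
        "wiki_context": ["placeholder wikipedia text"],
    },
    "search_results": {
        "description": ["placeholder snippet"],
        "filename": ["placeholder.txt"],
        "rank": [0],
        "search_context": ["placeholder web text"],
        "title": ["placeholder title"],
        "url": ["http://example.com/placeholder"],
    },
}

# Nested fields are reached with ordinary dict indexing:
print(example["answer"]["normalized_aliases"])  # ['river avon', 'avon']
```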
- Feature documentation:
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
 | FeaturesDict | | | |
answer | FeaturesDict | | | |
answer/aliases | Sequence(Text) | (None,) | string | |
answer/matched_wiki_entity_name | Text | | string | |
answer/normalized_aliases | Sequence(Text) | (None,) | string | |
answer/normalized_matched_wiki_entity_name | Text | | string | |
answer/normalized_value | Text | | string | |
answer/type | Text | | string | |
answer/value | Text | | string | |
entity_pages | Sequence | | | |
entity_pages/doc_source | Text | | string | |
entity_pages/filename | Text | | string | |
entity_pages/title | Text | | string | |
entity_pages/wiki_context | Text | | string | |
question | Text | | string | |
question_id | Text | | string | |
question_source | Text | | string | |
search_results | Sequence | | | |
search_results/description | Text | | string | |
search_results/filename | Text | | string | |
search_results/rank | Tensor | | int32 | |
search_results/search_context | Text | | string | |
search_results/title | Text | | string | |
search_results/url | Text | | string | |
Supervised keys (See as_supervised doc): None
Figure (tfds.show_examples): Not supported.
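Because the supervised keys are None, `as_supervised=True` is not available; examples come back as the full feature dict. A minimal loading sketch using the standard `tfds.load` API (assumes `tensorflow_datasets` is installed; the first call triggers the multi-GiB download, so it is wrapped in a function that only runs when invoked):

```python
def load_trivia_qa(config="rc", split="train"):
    """Load a TriviaQA config as a tf.data.Dataset of feature dicts.

    Supervised keys are None for this dataset, so as_supervised=True is
    not supported; iterate the raw feature dicts instead.
    """
    import tensorflow_datasets as tfds  # lazy import: heavy dependency

    # First call downloads and prepares the data (2.48+ GiB for "rc").
    return tfds.load(f"trivia_qa/{config}", split=split)


# Usage (not run here, since it downloads the dataset):
# ds = load_trivia_qa("rc.nocontext", split="validation")
# for ex in ds.take(1):
#     print(ex["question"], ex["answer"]["value"])
```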
Citation:
@article{2017arXivtriviaqa,
  author = {{Joshi}, Mandar and {Choi}, Eunsol and {Weld}, Daniel and {Zettlemoyer}, Luke},
  title = "{TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension}",
  journal = {arXiv e-prints},
  year = 2017,
  eid = {arXiv:1705.03551},
  pages = {arXiv:1705.03551},
  archivePrefix = {arXiv},
  eprint = {1705.03551},
}
trivia_qa/rc (default config)
Config description: Question-answer pairs where all documents for a given question contain the answer string(s). Includes context from Wikipedia and search results.
Download size:
2.48 GiB
Dataset size:
14.99 GiB
Auto-cached (documentation): No
Splits:
Split | Examples |
---|---|
'test' | 17,210 |
'train' | 138,384 |
'validation' | 18,669 |
- Examples (tfds.as_dataframe):
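The "rc" guarantee above (every attached document contains the answer string(s)) can be illustrated with a small stand-alone check. This only mirrors the distant-supervision idea; it is not the actual TriviaQA filtering pipeline:

```python
def all_docs_contain_answer(docs, normalized_aliases):
    """True if every document mentions at least one answer alias.

    docs: list of document strings; normalized_aliases: lowercased answer
    forms. Illustrative only, not the dataset's real matching logic.
    """
    return all(
        any(alias in doc.lower() for alias in normalized_aliases)
        for doc in docs
    )


docs = [
    "the river avon flows through bath",
    "bath stands on the avon in somerset",
]
print(all_docs_contain_answer(docs, ["river avon", "avon"]))                       # True
print(all_docs_contain_answer(docs + ["no mention here"], ["river avon", "avon"]))  # False
```

Under this reading, "rc" keeps only questions where the check passes for every evidence document, which is what makes the answer-in-context assumption safe for reading-comprehension training.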
trivia_qa/rc.nocontext
Config description: Question-answer pairs where all documents for a given question contain the answer string(s).
Download size:
2.48 GiB
Dataset size:
196.84 MiB
Auto-cached (documentation): Yes (test, validation); only when shuffle_files=False (train)
Splits:
Split | Examples |
---|---|
'test' | 17,210 |
'train' | 138,384 |
'validation' | 18,669 |
- Examples (tfds.as_dataframe):
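The .nocontext variants carry the same questions and answers but are far smaller (196.84 MiB vs 14.99 GiB for "rc") because the heavy evidence-context payloads are omitted. A sketch of that reduction on a plain feature dict, with placeholder values and field names taken from the structure above (illustrative only, not the actual config-generation code):

```python
def strip_context(example):
    """Keep only the lightweight QA fields, dropping evidence context.

    Illustrative: approximates what the .nocontext configs leave out
    (the entity_pages / search_results context payloads).
    """
    return {
        "question": example["question"],
        "question_id": example["question_id"],
        "question_source": example["question_source"],
        "answer": example["answer"],
    }


full = {
    "question": "Which river runs through Bath?",
    "question_id": "qid-000",          # placeholder id
    "question_source": "placeholder",
    "answer": {"value": "River Avon", "normalized_value": "river avon"},
    "entity_pages": {"wiki_context": ["long wikipedia text"]},
    "search_results": {"search_context": ["long web text"]},
}
light = strip_context(full)
print(sorted(light))  # ['answer', 'question', 'question_id', 'question_source']
```

Dropping the context is also why this config auto-caches: the remaining QA pairs fit comfortably in memory.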
trivia_qa/unfiltered
Config description: 110k question-answer pairs for open domain QA where not all documents for a given question contain the answer string(s). This makes the unfiltered dataset more appropriate for IR-style QA. Includes context from Wikipedia and search results.
Download size:
3.07 GiB
Dataset size:
27.27 GiB
Auto-cached (documentation): No
Splits:
Split | Examples |
---|---|
'test' | 10,832 |
'train' | 87,622 |
'validation' | 11,313 |
- Examples (tfds.as_dataframe):
trivia_qa/unfiltered.nocontext
Config description: 110k question-answer pairs for open domain QA where not all documents for a given question contain the answer string(s). This makes the unfiltered dataset more appropriate for IR-style QA.
Download size:
603.25 MiB
Dataset size:
119.78 MiB
Auto-cached (documentation): Yes
Splits:
Split | Examples |
---|---|
'test' | 10,832 |
'train' | 87,622 |
'validation' | 11,313 |
- Examples (tfds.as_dataframe):