- Description:
WikiDialog is a large dataset of synthetically generated information-seeking conversations. Each conversation in the dataset contains two speakers grounded in a passage from English Wikipedia: one speaker’s utterances consist of exact sentences from the passage; the other speaker is generated by a large language model.
Config description: WikiDialog generated from the dialog inpainter finetuned on OR-QuAC and QReCC.
OQ stands for OR-QuAC and QReCC.
Homepage: https://github.com/google-research/dialog-inpainting#wikidialog-oq
Source code: tfds.text.wiki_dialog.WikiDialog
Versions:
1.0.0 (default): Initial release.
Download size: 7.04 GiB
Dataset size: 36.58 GiB
Auto-cached (documentation): No
Splits:
Split | Examples |
---|---|
'train' | 11,264,129 |
'validation' | 113,822 |
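
Because the prepared dataset is about 36.58 GiB, it can help to read only a slice of a split while exploring. The sketch below builds a TFDS split-slicing string and passes it to tfds.load. It assumes the dataset is registered as wiki_dialog with config OQ (inferred from the config description and source path on this page, not confirmed here). The names SPLIT_SIZES, split_slice, and load_wiki_dialog are hypothetical helpers introduced for illustration.

```python
# Split sizes copied from the Splits table on this page.
SPLIT_SIZES = {"train": 11_264_129, "validation": 113_822}


def split_slice(split: str, num_examples: int) -> str:
    """Return a TFDS slicing string such as 'validation[:1000]'."""
    if split not in SPLIT_SIZES:
        raise ValueError(f"unknown split: {split!r}")
    if not 0 < num_examples <= SPLIT_SIZES[split]:
        raise ValueError(f"{split!r} holds {SPLIT_SIZES[split]} examples")
    return f"{split}[:{num_examples}]"


def load_wiki_dialog(split: str = "validation", num_examples: int = 1000):
    """Load the first `num_examples` examples of a split.

    Assumption: the dataset name and config are 'wiki_dialog/OQ'.
    """
    # Imported lazily so split_slice can be used without TFDS installed.
    import tensorflow_datasets as tfds

    return tfds.load("wiki_dialog/OQ", split=split_slice(split, num_examples))
```

Slicing limits what is read, not what is fetched: as far as I know, TFDS still downloads and prepares the whole dataset the first time it is loaded.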
- Feature structure:
FeaturesDict({
'author_num': Sequence(int32),
'passage': Text(shape=(), dtype=string),
'pid': Text(shape=(), dtype=string),
'sentences': Sequence(Text(shape=(), dtype=string)),
'title': Text(shape=(), dtype=string),
'utterances': Sequence(Text(shape=(), dtype=string)),
})
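
The feature structure suggests that author_num and utterances are parallel sequences, one speaker id per turn. The sketch below pairs them into a readable transcript. That one-to-one alignment is an assumption, and so is the bytes handling (it targets examples converted with tfds.as_numpy). Which speaker id belongs to the passage-grounded speaker is not stated on this page, so turns are labelled only by number. dialog_turns and format_dialog are hypothetical helper names.

```python
from typing import List, Mapping, Tuple


def _to_str(value) -> str:
    """Decode bytes (e.g. from tfds.as_numpy) and pass strings through."""
    return value.decode("utf-8") if isinstance(value, bytes) else str(value)


def dialog_turns(example: Mapping) -> List[Tuple[int, str]]:
    """Pair each utterance with its speaker id.

    Assumption: author_num[i] is the speaker of utterances[i].
    """
    authors = [int(a) for a in example["author_num"]]
    utterances = [_to_str(u) for u in example["utterances"]]
    if len(authors) != len(utterances):
        raise ValueError("author_num and utterances differ in length")
    return list(zip(authors, utterances))


def format_dialog(example: Mapping) -> str:
    """Render the title followed by one line per turn."""
    lines = [f"# {_to_str(example['title'])}"]
    lines += [f"[speaker {a}] {u}" for a, u in dialog_turns(example)]
    return "\n".join(lines)
```

With a real example this might be called as `print(format_dialog(ex))` for each `ex` in `tfds.as_numpy(ds)`.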
- Feature documentation:
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
FeaturesDict | | | | |
author_num | Sequence(Tensor) | (None,) | int32 | |
passage | Text | | string | |
pid | Text | | string | |
sentences | Sequence(Text) | (None,) | string | |
title | Text | | string | |
utterances | Sequence(Text) | (None,) | string | |
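
The description says one speaker's utterances are exact sentences from the passage. A rough way to inspect that on a single example is to separate utterances that match an entry in sentences from those that do not. This sketch assumes that sentences holds the passage split into sentences and that matching after whitespace stripping is sufficient. split_by_grounding is a hypothetical helper, not part of TFDS.

```python
from typing import List, Mapping, Tuple


def _to_str(value) -> str:
    """Decode bytes (e.g. from tfds.as_numpy) and pass strings through."""
    return value.decode("utf-8") if isinstance(value, bytes) else str(value)


def split_by_grounding(example: Mapping) -> Tuple[List[str], List[str]]:
    """Return (grounded, generated) utterances for one example.

    Assumption: a grounded utterance equals some entry of `sentences`
    once surrounding whitespace is stripped.
    """
    sentences = {_to_str(s).strip() for s in example["sentences"]}
    grounded: List[str] = []
    generated: List[str] = []
    for utterance in example["utterances"]:
        text = _to_str(utterance).strip()
        (grounded if text in sentences else generated).append(text)
    return grounded, generated
```

If the exact-match assumption holds, the grounded list should line up with the turns of a single author_num value. Mismatches would suggest that the stored utterances differ from sentences in normalization.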
Supervised keys (See as_supervised doc): None
Figure (tfds.show_examples): Not supported.
- Citation:
@inproceedings{dai2022dialoginpainting,
title={Dialog Inpainting: Turning Documents to Dialogs},
author={Dai, Zhuyun and Chaganty, Arun Tejasvi and Zhao, Vincent and Amini, Aida and Green, Mike and Rashid, Qazi and Guu, Kelvin},
booktitle={International Conference on Machine Learning (ICML)},
year={2022},
organization={PMLR}
}