wiki_dialog

Description:

WikiDialog is a large dataset of synthetically generated information-seeking conversations. Each conversation in the dataset has two speakers grounded in a passage from English Wikipedia: one speaker’s utterances are exact sentences from the passage, while the other speaker’s utterances are generated by a large language model.
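The two-speaker structure can be checked directly on an example: one speaker's utterances should all appear verbatim among the passage sentences. The sketch below assumes an example already converted to Python values (for instance with `tfds.as_numpy`, which returns text as `bytes`) and uses the `sentences`, `utterances`, and `author_num` fields from the feature structure on this page. The helper `grounded_speaker` and the toy record are hypothetical; the helper infers which speaker id quotes the passage instead of assuming a fixed 0/1 mapping.

```python
def _as_text(value):
    # tfds.as_numpy returns Text features as bytes; decode them to str.
    return value.decode("utf-8") if isinstance(value, bytes) else str(value)


def grounded_speaker(example):
    """Return the author_num whose utterances all appear in `sentences`.

    Hypothetical helper. Returns None if no speaker, or more than one,
    consists only of passage sentences.
    """
    sentences = {_as_text(s) for s in example["sentences"]}
    # Group utterances by speaker id.
    by_speaker = {}
    for author, utt in zip(example["author_num"], example["utterances"]):
        by_speaker.setdefault(int(author), []).append(_as_text(utt))
    matches = [
        author
        for author, utts in by_speaker.items()
        if all(u in sentences for u in utts)
    ]
    return matches[0] if len(matches) == 1 else None


# Toy record shaped like a WikiDialog example (invented content).
toy = {
    "sentences": [b"Sentence one of the passage.", b"Sentence two of the passage."],
    "utterances": [
        b"A generated question?",
        b"Sentence one of the passage.",
        b"A generated follow-up?",
        b"Sentence two of the passage.",
    ],
    "author_num": [0, 1, 0, 1],
}
print(grounded_speaker(toy))  # 1
```

The speaker ids in the toy record are invented; consult the homepage for how ids are assigned in the real data.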

Config description: WikiDialog generated by the dialog inpainter fine-tuned on OR-QuAC and QReCC. The config name OQ stands for OR-QuAC and QReCC.
Homepage: https://github.com/google-research/dialog-inpainting#wikidialog-oq
Source code: tfds.text.wiki_dialog.WikiDialog
Versions:
- 1.0.0 (default): Initial release.
Download size: 7.04 GiB
Dataset size: 36.58 GiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'train'`	11,264,129
`'validation'`	113,822
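A minimal loading sketch, assuming `tensorflow_datasets` is installed. It names the default `wiki_dialog/OQ` config explicitly and converts the resulting `tf.data.Dataset` to NumPy with `tfds.as_numpy`; the function name `load_wiki_dialog` is illustrative. The first call downloads and prepares the whole dataset, so even reading only `'validation'` needs the full download and disk space listed above.

```python
def load_wiki_dialog(split="validation"):
    """Return an iterable of NumPy examples for one WikiDialog split.

    `split` accepts any TFDS split string, e.g. "train", "validation",
    or a slice such as "train[:1%]" for a quick experiment.
    """
    # Imported lazily so this module can be imported without TFDS installed.
    import tensorflow_datasets as tfds

    ds = tfds.load("wiki_dialog/OQ", split=split)
    # as_numpy yields dicts of NumPy values; Text features arrive as bytes.
    return tfds.as_numpy(ds)


# Usage (triggers the download and preparation on first run):
# for example in load_wiki_dialog("validation"):
#     print(example["title"].decode("utf-8"))
#     break
```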

Feature structure:

FeaturesDict({
    'author_num': Sequence(int32),
    'passage': Text(shape=(), dtype=string),
    'pid': Text(shape=(), dtype=string),
    'sentences': Sequence(Text(shape=(), dtype=string)),
    'title': Text(shape=(), dtype=string),
    'utterances': Sequence(Text(shape=(), dtype=string)),
})
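In NumPy form, each `Text` field above arrives as `bytes`, each `Sequence(Text)` as an array of `bytes`, and `author_num` as an `int32` array. A small sketch, assuming examples come from `tfds.as_numpy`, that converts one record into plain Python strings and integers; `decode_example` is a hypothetical name, and the sample record is invented.

```python
TEXT_FIELDS = ("passage", "pid", "title")
TEXT_SEQUENCE_FIELDS = ("sentences", "utterances")


def _as_text(value):
    # np.bytes_ subclasses bytes, so this also handles NumPy scalars.
    return value.decode("utf-8") if isinstance(value, bytes) else str(value)


def decode_example(example):
    """Convert one NumPy-format WikiDialog example into plain Python types."""
    decoded = {name: _as_text(example[name]) for name in TEXT_FIELDS}
    for name in TEXT_SEQUENCE_FIELDS:
        decoded[name] = [_as_text(v) for v in example[name]]
    # int() also converts NumPy int32 scalars.
    decoded["author_num"] = [int(a) for a in example["author_num"]]
    return decoded


example = {
    "passage": b"First sentence. Second sentence.",
    "pid": b"hypothetical-pid",
    "title": b"Example title",
    "sentences": [b"First sentence.", b"Second sentence."],
    "utterances": [b"A question?", b"First sentence."],
    "author_num": [0, 1],
}
print(decode_example(example)["title"])  # Example title
```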

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
author_num	Sequence(Tensor)	(None,)	int32
passage	Text		string
pid	Text		string
sentences	Sequence(Text)	(None,)	string
title	Text		string
utterances	Sequence(Text)	(None,)	string
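Both `utterances` and `author_num` have shape `(None,)`, i.e. one variable-length entry per turn. The sketch below assumes the two sequences are aligned one-to-one (utterance i was spoken by `author_num[i]`) and renders them as a transcript; `format_dialog` and the `speaker_names` mapping are hypothetical.

```python
def format_dialog(utterances, author_num, speaker_names=None):
    """Render parallel `utterances` / `author_num` sequences as a transcript.

    Assumes utterance i was spoken by author_num[i]. `speaker_names`
    optionally maps speaker ids to display labels.
    """
    if len(utterances) != len(author_num):
        raise ValueError(
            f"length mismatch: {len(utterances)} utterances vs "
            f"{len(author_num)} author_num entries"
        )
    names = speaker_names or {}
    lines = []
    for author, utt in zip(author_num, utterances):
        text = utt.decode("utf-8") if isinstance(utt, bytes) else str(utt)
        label = names.get(int(author), f"Speaker {int(author)}")
        lines.append(f"{label}: {text}")
    return "\n".join(lines)


print(format_dialog([b"A question?", b"An answer."], [0, 1]))
# Speaker 0: A question?
# Speaker 1: An answer.
```

Passing `speaker_names` is only meaningful once you have confirmed which id corresponds to which speaker.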

Supervised keys (See as_supervised doc): None
Figure (tfds.show_examples): Not supported.
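Because no supervised keys are defined, `as_supervised=True` does not yield input/target pairs; any pairing has to be built by hand. One possible sketch, assuming aligned `utterances` and `author_num` sequences, builds (context, response) pairs for a chosen speaker. The helper `context_response_pairs`, its `target_author` and `max_history` parameters, and the toy dialog are hypothetical choices, not part of the dataset.

```python
def context_response_pairs(utterances, author_num, target_author, max_history=None):
    """Build (context, response) pairs for turns spoken by `target_author`.

    Each context is the list of preceding utterances, optionally limited
    to the last `max_history` turns. The first turn is skipped because it
    has no preceding context.
    """
    texts = [u.decode("utf-8") if isinstance(u, bytes) else str(u) for u in utterances]
    pairs = []
    for i, author in enumerate(author_num):
        if i == 0 or int(author) != target_author:
            continue
        # Slice start; max(0, ...) keeps short dialogs intact.
        start = 0 if max_history is None else max(0, i - max_history)
        pairs.append((texts[start:i], texts[i]))
    return pairs


dialog = [b"Question one?", b"Answer one.", b"Question two?", b"Answer two."]
for context, response in context_response_pairs(dialog, [0, 1, 0, 1], target_author=1):
    print(len(context), "->", response)
# 1 -> Answer one.
# 3 -> Answer two.
```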

Citation:

@inproceedings{dai2022dialoginpainting,
  title={Dialog Inpainting: Turning Documents to Dialogs},
  author={Dai, Zhuyun and Chaganty, Arun Tejasvi and Zhao, Vincent and Amini, Aida and Green, Mike and Rashid, Qazi and Guu, Kelvin},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2022},
  organization={PMLR}
}

wiki_dialog/OQ (default config)