- Description:
The data contains sets of 1 to 7 subject-predicate-object triples extracted from [DBpedia](https://wiki.dbpedia.org/), each paired with natural language text that verbalises the triples. The test data spans 15 domains, of which only 10 appear in the training data. The dataset follows a standardized table format.
Additional Documentation: Explore on Papers With Code
Source code:
tfds.structured.web_nlg.WebNlg
Versions:
0.1.0 (default): No release notes.
Download size:
19.76 MiB
Dataset size:
13.78 MiB
Auto-cached (documentation): Yes
Splits:
Split | Examples |
---|---|
'test_all' | 4,928 |
'test_unseen' | 2,433 |
'train' | 18,102 |
'validation' | 2,268 |
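As a rough illustration of how one of these splits might be loaded, here is a minimal sketch. The dataset name `"web_nlg"` is an assumption inferred from the source module path, and `load_split` and `SPLIT_SIZES` are hypothetical helpers, not part of the library. The first call needs `tensorflow_datasets` installed and downloads the data.

```python
# Split sizes copied from the table above.
SPLIT_SIZES = {
    "test_all": 4928,
    "test_unseen": 2433,
    "train": 18102,
    "validation": 2268,
}


def load_split(split="train"):
    """Load one split of the dataset (hypothetical helper)."""
    if split not in SPLIT_SIZES:
        raise ValueError(
            f"Unknown split {split!r}; expected one of {sorted(SPLIT_SIZES)}"
        )
    # Imported lazily so the split check works without TensorFlow installed.
    import tensorflow_datasets as tfds

    # Assumption: the dataset is registered under the name "web_nlg".
    return tfds.load("web_nlg", split=split)
```

Calling `load_split("validation")` should then return a `tf.data.Dataset` whose elements follow the feature structure documented in this page.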
- Feature structure:
FeaturesDict({
    'input_text': FeaturesDict({
        'context': string,
        'table': Sequence({
            'column_header': string,
            'content': string,
            'row_number': int16,
        }),
    }),
    'target_text': string,
})
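To make the nesting above concrete, the following sketch builds a plain Python dictionary with the same shape from a list of triples. It is only an illustration under stated assumptions: the cell layout (one table row per triple, with column headers `subject`, `predicate`, and `object`) is assumed rather than taken from the builder, and `triples_to_example` is a hypothetical helper. The `Sequence` of a dict is represented here as a dict of parallel lists, which is how TFDS exposes such features when decoded.

```python
def triples_to_example(triples, target_text, context=""):
    """Arrange (subject, predicate, object) triples into the documented structure."""
    if not 1 <= len(triples) <= 7:
        raise ValueError("expected between 1 and 7 triples")

    # One list per sub-feature of the 'table' Sequence.
    table = {"column_header": [], "content": [], "row_number": []}
    for row, (subject, predicate, obj) in enumerate(triples):
        # Assumed layout: each triple fills one row with three cells.
        cells = (("subject", subject), ("predicate", predicate), ("object", obj))
        for header, value in cells:
            table["column_header"].append(header)
            table["content"].append(value)
            table["row_number"].append(row)

    return {
        "input_text": {"context": context, "table": table},
        "target_text": target_text,
    }


example = triples_to_example(
    [("Alan_Bean", "occupation", "Test_pilot")],
    "Alan Bean worked as a test pilot.",
)
```

The triple and sentence used here are made up for illustration and are not taken from the dataset.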
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | FeaturesDict | | | |
| input_text | FeaturesDict | | | |
| input_text/context | Tensor | | string | |
| input_text/table | Sequence | | | |
| input_text/table/column_header | Tensor | | string | |
| input_text/table/content | Tensor | | string | |
| input_text/table/row_number | Tensor | | int16 | |
| target_text | Tensor | | string | |
Supervised keys (See as_supervised doc): ('input_text', 'target_text')
Figure (tfds.show_examples): Not supported.
- Citation:
@inproceedings{gardent2017creating,
    title = "Creating Training Corpora for {NLG} Micro-Planners",
    author = "Gardent, Claire and
      Shimorina, Anastasia and
      Narayan, Shashi and
      Perez-Beltrachini, Laura",
    booktitle = "Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2017",
    address = "Vancouver, Canada",
    publisher = "Association for Computational Linguistics",
    doi = "10.18653/v1/P17-1017",
    pages = "179--188",
    url = "https://www.aclweb.org/anthology/P17-1017.pdf"
}