ted_hrlr_translate

Description:

Datasets derived from TED talk transcripts, intended for comparing pairs of similar languages in which one language is high-resource and the other is low-resource.

Homepage: https://github.com/neulab/word-embeddings-for-nmt
Source code: tfds.datasets.ted_hrlr_translate.Builder
Versions:
- 1.0.0 (default): New split API (https://tensorflow.org/datasets/splits)
Download size: 124.94 MiB
Auto-cached (documentation): Yes
Figure (tfds.show_examples): Not supported.
Citation:

@inproceedings{Ye2018WordEmbeddings,
  author    = {Qi, Ye and Sachan, Devendra and Felix, Matthieu and Padmanabhan, Sarguna and Neubig, Graham},
  title     = {When and Why are pre-trained word embeddings useful for Neural Machine Translation?},
  booktitle = {HLT-NAACL},
  year      = {2018},
}
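
The metadata above can be turned into a loading call. The following is a minimal sketch, assuming `tensorflow_datasets` is installed and that downloading the data (about 125 MiB according to this page) is acceptable. `config_name`, `split_slice`, and `load_pair` are hypothetical helper names used for illustration; they are not part of TFDS. The naming rule (drop the underscore from the source language code) is inferred from the config names listed on this page, such as `aztr_to_en` for `az_tr`.

```python
def config_name(source: str, target: str) -> str:
    """Build the full TFDS name for a language pair, e.g. ('az_tr', 'en')."""
    # Config names on this page drop the underscore from the source code:
    # 'az_tr' becomes 'aztr_to_en'.
    return f"ted_hrlr_translate/{source.replace('_', '')}_to_{target}"


def split_slice(split: str, percent: int) -> str:
    """Return a split-API string selecting the first `percent`% of a split."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    return f"{split}[:{percent}%]"


def load_pair(source: str, target: str, split: str = "train"):
    """Load one config as a tf.data.Dataset (downloads data on first use)."""
    # Imported lazily so the helpers above work without TensorFlow installed.
    import tensorflow_datasets as tfds

    return tfds.load(config_name(source, target), split=split)
```

For example, `load_pair("az", "en", split_slice("train", 10))` would request the first 10% of the `az_to_en` training split.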

ted_hrlr_translate/az_to_en (default config)

Config description: Translation dataset from az to en in plain text.
Dataset size: 1.61 MiB
Splits:

Split	Examples
`'test'`	903
`'train'`	5,946
`'validation'`	671

Feature structure:

Translation({
    'az': Text(shape=(), dtype=string),
    'en': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Dtype
	Translation
az	Text	string
en	Text	string

Supervised keys (See as_supervised doc): ('az', 'en')
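
With `as_supervised=True`, TFDS yields `(input, target)` tuples ordered by the supervised keys, so for this config each element is `(az, en)`. The sketch below is one way to iterate over decoded sentence pairs. It assumes `tensorflow_datasets` is installed; `decode_pair` and `iter_pairs` are hypothetical helper names, and the UTF-8 decoding step reflects how string features arrive as `bytes` after `tfds.as_numpy`.

```python
from typing import Iterator, Tuple


def decode_pair(source: bytes, target: bytes) -> Tuple[str, str]:
    """Decode one (source, target) pair of UTF-8 byte strings to str."""
    return source.decode("utf-8"), target.decode("utf-8")


def iter_pairs(split: str = "validation") -> Iterator[Tuple[str, str]]:
    """Yield decoded (az, en) sentence pairs from the az_to_en config."""
    # Imported lazily so decode_pair is usable without TensorFlow installed.
    import tensorflow_datasets as tfds

    ds = tfds.load(
        "ted_hrlr_translate/az_to_en", split=split, as_supervised=True
    )
    # tfds.as_numpy converts scalar string tensors to Python bytes objects.
    for az, en in tfds.as_numpy(ds):
        yield decode_pair(az, en)
```

The same pattern should apply to the other configs by swapping the config name; the tuple order then follows that config's supervised keys.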

ted_hrlr_translate/aztr_to_en

Config description: Translation dataset from az_tr to en in plain text.
Dataset size: 42.54 MiB
Splits:

Split	Examples
`'test'`	903
`'train'`	188,396
`'validation'`	671

Feature structure:

Translation({
    'az_tr': Text(shape=(), dtype=string),
    'en': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Dtype
	Translation
az_tr	Text	string
en	Text	string

Supervised keys (See as_supervised doc): ('az_tr', 'en')

ted_hrlr_translate/be_to_en

Config description: Translation dataset from be to en in plain text.
Dataset size: 1.47 MiB
Splits:

Split	Examples
`'test'`	664
`'train'`	4,509
`'validation'`	248

Feature structure:

Translation({
    'be': Text(shape=(), dtype=string),
    'en': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Dtype
	Translation
be	Text	string
en	Text	string

Supervised keys (See as_supervised doc): ('be', 'en')

ted_hrlr_translate/beru_to_en

Config description: Translation dataset from be_ru to en in plain text.
Dataset size: 62.45 MiB
Splits:

Split	Examples
`'test'`	664
`'train'`	212,614
`'validation'`	248

Feature structure:

Translation({
    'be_ru': Text(shape=(), dtype=string),
    'en': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Dtype
	Translation
be_ru	Text	string
en	Text	string

Supervised keys (See as_supervised doc): ('be_ru', 'en')

ted_hrlr_translate/es_to_pt

Config description: Translation dataset from es to pt in plain text.
Dataset size: 9.62 MiB
Splits:

Split	Examples
`'test'`	1,763
`'train'`	44,938
`'validation'`	1,016

Feature structure:

Translation({
    'es': Text(shape=(), dtype=string),
    'pt': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Dtype
	Translation
es	Text	string
pt	Text	string

Supervised keys (See as_supervised doc): ('es', 'pt')

ted_hrlr_translate/fr_to_pt

Config description: Translation dataset from fr to pt in plain text.
Dataset size: 9.74 MiB
Splits:

Split	Examples
`'test'`	1,494
`'train'`	43,873
`'validation'`	1,131

Feature structure:

Translation({
    'fr': Text(shape=(), dtype=string),
    'pt': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Dtype
	Translation
fr	Text	string
pt	Text	string

Supervised keys (See as_supervised doc): ('fr', 'pt')

ted_hrlr_translate/gl_to_en

Config description: Translation dataset from gl to en in plain text.
Dataset size: 2.41 MiB
Splits:

Split	Examples
`'test'`	1,007
`'train'`	10,017
`'validation'`	682

Feature structure:

Translation({
    'en': Text(shape=(), dtype=string),
    'gl': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Dtype
	Translation
en	Text	string
gl	Text	string

Supervised keys (See as_supervised doc): ('gl', 'en')

ted_hrlr_translate/glpt_to_en

Config description: Translation dataset from gl_pt to en in plain text.
Dataset size: 12.90 MiB
Splits:

Split	Examples
`'test'`	1,007
`'train'`	61,802
`'validation'`	682

Feature structure:

Translation({
    'en': Text(shape=(), dtype=string),
    'gl_pt': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Dtype
	Translation
en	Text	string
gl_pt	Text	string

Supervised keys (See as_supervised doc): ('gl_pt', 'en')

ted_hrlr_translate/he_to_pt

Config description: Translation dataset from he to pt in plain text.
Dataset size: 11.71 MiB
Splits:

Split	Examples
`'test'`	1,623
`'train'`	48,511
`'validation'`	1,145

Feature structure:

Translation({
    'he': Text(shape=(), dtype=string),
    'pt': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Dtype
	Translation
he	Text	string
pt	Text	string

Supervised keys (See as_supervised doc): ('he', 'pt')

ted_hrlr_translate/it_to_pt

Config description: Translation dataset from it to pt in plain text.
Dataset size: 9.94 MiB
Splits:

Split	Examples
`'test'`	1,669
`'train'`	46,259
`'validation'`	1,162

Feature structure:

Translation({
    'it': Text(shape=(), dtype=string),
    'pt': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Dtype
	Translation
it	Text	string
pt	Text	string

Supervised keys (See as_supervised doc): ('it', 'pt')

ted_hrlr_translate/pt_to_en

Config description: Translation dataset from pt to en in plain text.
Dataset size: 10.89 MiB
Splits:

Split	Examples
`'test'`	1,803
`'train'`	51,785
`'validation'`	1,193

Feature structure:

Translation({
    'en': Text(shape=(), dtype=string),
    'pt': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Dtype
	Translation
en	Text	string
pt	Text	string

Supervised keys (See as_supervised doc): ('pt', 'en')

ted_hrlr_translate/ru_to_en

Config description: Translation dataset from ru to en in plain text.
Dataset size: 63.22 MiB
Splits:

Split	Examples
`'test'`	5,476
`'train'`	208,106
`'validation'`	4,805

Feature structure:

Translation({
    'en': Text(shape=(), dtype=string),
    'ru': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Dtype
	Translation
en	Text	string
ru	Text	string

Supervised keys (See as_supervised doc): ('ru', 'en')

ted_hrlr_translate/ru_to_pt

Config description: Translation dataset from ru to pt in plain text.
Dataset size: 13.00 MiB
Splits:

Split	Examples
`'test'`	1,588
`'train'`	47,278
`'validation'`	1,184

Feature structure:

Translation({
    'pt': Text(shape=(), dtype=string),
    'ru': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Dtype
	Translation
pt	Text	string
ru	Text	string

Supervised keys (See as_supervised doc): ('ru', 'pt')

ted_hrlr_translate/tr_to_en

Config description: Translation dataset from tr to en in plain text.
Dataset size: 42.33 MiB
Splits:

Split	Examples
`'test'`	5,029
`'train'`	182,450
`'validation'`	4,045

Feature structure:

Translation({
    'en': Text(shape=(), dtype=string),
    'tr': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Dtype
	Translation
en	Text	string
tr	Text	string

Supervised keys (See as_supervised doc): ('tr', 'en')
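
The split tables above allow a quick consistency check on how the combined configs (`aztr`, `beru`, `glpt`) relate to their parts. The sketch below copies the train counts from this page and compares each combined config with the sum of its low-resource and high-resource configs. That the combined configs are concatenations of the two is an assumption suggested by the config names and the counts; this page does not state it. `TRAIN_COUNTS` and `train_gap` are illustrative names.

```python
# Train-split example counts, copied from the split tables on this page.
TRAIN_COUNTS = {
    "az_to_en": 5_946, "tr_to_en": 182_450, "aztr_to_en": 188_396,
    "be_to_en": 4_509, "ru_to_en": 208_106, "beru_to_en": 212_614,
    "gl_to_en": 10_017, "pt_to_en": 51_785, "glpt_to_en": 61_802,
}


def train_gap(low: str, high: str, combined: str) -> int:
    """Combined train count minus the sum of its two component configs."""
    return TRAIN_COUNTS[combined] - (TRAIN_COUNTS[low] + TRAIN_COUNTS[high])


# (low-resource, high-resource, combined) config triples to compare.
TRIPLES = [
    ("az_to_en", "tr_to_en", "aztr_to_en"),
    ("be_to_en", "ru_to_en", "beru_to_en"),
    ("gl_to_en", "pt_to_en", "glpt_to_en"),
]

for low, high, combined in TRIPLES:
    print(combined, train_gap(low, high, combined))
```

By these figures, `aztr_to_en` and `glpt_to_en` match the sum of their parts exactly, while `beru_to_en` is one example short. In all three cases the test and validation counts of the combined config equal those of the low-resource config.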