- Description:
VoxForge is a language classification dataset. It consists of audio clips that users submitted to the website. This release collects data from 6 languages: English, Spanish, French, German, Russian, and Italian. Because the website is constantly updated, this release contains only recordings submitted before 2020-01-01, for the sake of reproducibility. The samples are split between train, validation, and test so that all samples from a given speaker belong to exactly one split.
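The speaker-disjoint property described above can be illustrated with a minimal sketch. This is not necessarily how the dataset builder assigns speakers; the hashing scheme, the 80/10/10 ratios, the `assign_split` helper, and the sample speaker IDs are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def assign_split(speaker_id, train_pct=80, validation_pct=10):
    """Deterministically map a speaker to one split (hypothetical ratios)."""
    # Hash the speaker ID so the same speaker always lands in the same bucket.
    digest = hashlib.sha256(speaker_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    if bucket < train_pct:
        return "train"
    if bucket < train_pct + validation_pct:
        return "validation"
    return "test"


# Hypothetical (speaker_id, clip) pairs; real IDs come from the archives.
clips = [
    ("speaker_a", "clip_001.wav"),
    ("speaker_a", "clip_002.wav"),
    ("speaker_b", "clip_003.wav"),
    ("speaker_c", "clip_004.wav"),
]

# Group clips by the split of their speaker, never by the clip itself.
splits = defaultdict(list)
for speaker_id, clip in clips:
    splits[assign_split(speaker_id)].append((speaker_id, clip))
```

Because the split is a function of the speaker ID alone, every clip from one speaker ends up in the same split, which prevents a model from being evaluated on voices it saw during training.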
Additional Documentation: Explore on Papers With Code
Homepage: http://www.voxforge.org/
Source code:
tfds.audio.Voxforge
Versions:
1.0.0 (default): No release notes.
Download size:
Unknown size
Dataset size:
Unknown size
Manual download instructions: This dataset requires you to download the source data manually into download_config.manual_dir (defaults to ~/tensorflow_datasets/downloads/manual/):
VoxForge requires manual download of the audio archives. The complete list of archives can be found in https://storage.googleapis.com/tfds-data/downloads/voxforge/voxforge_urls.txt. The archives can be downloaded using the following command:
wget -i voxforge_urls.txt -x
Note that downloading and building the dataset locally requires ~100GB of disk space (but only ~60GB will be used permanently).
Auto-cached (documentation): Unknown
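Once the archives are in the manual directory, loading the dataset might look like the sketch below. The `load_voxforge` helper is hypothetical, and the split name `"train"` is an assumption based on the description, since the splits table on this page is empty. The `tensorflow_datasets` import is placed inside the function only so the snippet can be read and imported without the library installed.

```python
import os

# Default manual directory quoted in the instructions above.
DEFAULT_MANUAL_DIR = "~/tensorflow_datasets/downloads/manual/"


def load_voxforge(split="train", manual_dir=DEFAULT_MANUAL_DIR):
    """Build (if needed) and load VoxForge from manually downloaded archives."""
    import tensorflow_datasets as tfds  # imported lazily; requires TFDS

    # Point the builder at the directory holding the downloaded archives.
    config = tfds.download.DownloadConfig(
        manual_dir=os.path.expanduser(manual_dir)
    )
    return tfds.load(
        "voxforge",
        split=split,
        download_and_prepare_kwargs={"download_config": config},
    )
```

Calling `load_voxforge()` triggers the build on first use, so the disk space noted above should be available before running it.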
Splits:
Split | Examples |
---|---|
- Feature structure:
FeaturesDict({
'audio': Audio(shape=(None,), dtype=int64),
'label': ClassLabel(shape=(), dtype=int64, num_classes=6),
'speaker_id': string,
})
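Since the `audio` feature is stored as `int64` samples, a model input pipeline will usually rescale it to floating point. The sketch below assumes the samples are 16-bit PCM values (an assumption, not stated on this page) and uses a hand-built example dictionary with hypothetical values in place of a real record.

```python
import numpy as np


def pcm16_to_float(audio):
    """Rescale integer samples to float32 in [-1.0, 1.0), assuming 16-bit PCM."""
    samples = np.asarray(audio, dtype=np.int64)
    # 32768 = 2**15, the magnitude of the most negative 16-bit sample.
    return (samples / 32768.0).astype(np.float32)


# Hypothetical record mirroring the feature structure above.
example = {
    "audio": np.array([0, 16384, -32768], dtype=np.int64),
    "label": 1,
    "speaker_id": b"hypothetical_speaker",
}

waveform = pcm16_to_float(example["audio"])
```

If the source recordings use a different bit depth, the divisor would need to change accordingly.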
- Feature documentation:
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
FeaturesDict | | | | |
audio | Audio | (None,) | int64 | |
label | ClassLabel | | int64 | |
speaker_id | Tensor | | string | |
Supervised keys (See as_supervised doc): ('audio', 'label')
Figure (tfds.show_examples): Not supported.
Examples (tfds.as_dataframe): Missing.
Citation:
@article{maclean2018voxforge,
title={Voxforge},
author={MacLean, Ken},
journal={Ken MacLean.[Online]. Available: http://www.voxforge.org/home.[Acedido em 2012]},
year={2018}
}