- Description:
An audio dataset of spoken words designed to help train and evaluate keyword spotting systems. Its primary goal is to provide a way to build and test small models that detect when a single word from a set of ten target words is spoken, with as few false positives as possible from background noise or unrelated speech. Note that in the train and validation sets, the label "unknown" is much more prevalent than the labels of the target words or background noise. One difference from the release version is the handling of silent segments. In the test set, the silence segments are regular 1-second files, whereas in the training data they are provided as long segments under the "background_noise" folder. Here we split this background noise into 1-second clips and keep one of the files for the validation set.
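The splitting of long background-noise recordings into 1-second clips can be illustrated with a minimal sketch. It is not the builder's actual code: the 16 kHz sample rate, the non-overlapping windows, and the dropping of a trailing partial clip are all assumptions made for illustration, and `split_into_clips` is a hypothetical helper name.

```python
SAMPLE_RATE = 16_000  # assumed sample rate in Hz; not stated in this section


def split_into_clips(samples, clip_len=SAMPLE_RATE):
    """Split a long sample sequence into consecutive, non-overlapping clips.

    Assumption: a trailing remainder shorter than clip_len is dropped.
    """
    n_clips = len(samples) // clip_len
    return [samples[i * clip_len:(i + 1) * clip_len] for i in range(n_clips)]


# A synthetic 2.5-second "recording" of silence stands in for a real noise file.
recording = [0] * int(2.5 * SAMPLE_RATE)
clips = split_into_clips(recording)
print(len(clips), len(clips[0]))
```

With these assumptions, a 2.5-second recording yields two full clips and the final half second is discarded.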
Homepage: https://arxiv.org/abs/1804.03209
Source code:
tfds.datasets.speech_commands.Builder
Versions:
0.0.3 (default): Fix audio data type with dtype=tf.int16.
Download size:
2.37 GiB
Dataset size:
8.17 GiB
Auto-cached (documentation): No
Splits:
Split | Examples |
---|---|
'test' | 4,890 |
'train' | 85,511 |
'validation' | 10,102 |
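As a quick check of how the examples are distributed, the split sizes listed above sum to 100,503. The following sketch computes each split's share of that total; the variable names are illustrative only.

```python
# Example counts copied from the splits table.
split_sizes = {"train": 85_511, "validation": 10_102, "test": 4_890}

total = sum(split_sizes.values())
shares = {name: count / total for name, count in split_sizes.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```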
- Feature structure:
FeaturesDict({
'audio': Audio(shape=(None,), dtype=int16),
'label': ClassLabel(shape=(), dtype=int64, num_classes=12),
})
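Given this feature structure, one way to consume the dataset is sketched below. It loads `(audio, label)` pairs with `tfds.load` and shows how int16 samples might be scaled to floats. The helper names `pcm16_to_float` and `load_speech_commands` are hypothetical, and scaling by 32768 is a common convention assumed here rather than something this page prescribes.

```python
INT16_SCALE = 32768.0  # magnitude of the most negative int16 value


def pcm16_to_float(samples):
    """Scale int16 PCM samples to floats in [-1.0, 1.0) (assumed convention)."""
    return [s / INT16_SCALE for s in samples]


def load_speech_commands(split="train"):
    """Return a tf.data.Dataset of (audio, label) pairs for the given split.

    Requires tensorflow_datasets; the first call downloads and prepares the data.
    """
    # Imported lazily so the pure-Python helper above runs without TFDS installed.
    import tensorflow_datasets as tfds

    return tfds.load("speech_commands", split=split, as_supervised=True)


print(pcm16_to_float([0, -32768, 16384]))  # [0.0, -1.0, 0.5]
```

Calling `load_speech_commands("validation")` would, under these assumptions, yield tensors whose `audio` component has dtype int16 and variable length, matching the `Audio(shape=(None,), dtype=int16)` entry above.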
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | FeaturesDict | | | |
| audio | Audio | (None,) | int16 | |
| label | ClassLabel | | int64 | |
Supervised keys (See as_supervised doc): ('audio', 'label')
Figure (tfds.show_examples): Not supported.
- Citation:
@article{speechcommandsv2,
author = { {Warden}, P.},
title = "{Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition}",
journal = {ArXiv e-prints},
archivePrefix = "arXiv",
eprint = {1804.03209},
primaryClass = "cs.CL",
keywords = {Computer Science - Computation and Language, Computer Science - Human-Computer Interaction},
year = 2018,
month = apr,
url = {https://arxiv.org/abs/1804.03209},
}