- Description:
The Free Universal Sound Separation (FUSS) Dataset is a database of arbitrary sound mixtures and source-level references, for use in experiments on arbitrary sound separation.
This is the official sound separation data for the DCASE2020 Challenge Task 4: Sound Event Detection and Separation in Domestic Environments.
Overview: FUSS audio data is sourced from a pre-release of the Freesound Dataset known as FSD50K, a sound event dataset composed of Freesound content annotated with labels from the AudioSet Ontology. Using the FSD50K labels, these source files have been screened so that each likely contains only a single type of sound. Labels are not provided for these source files and are not considered part of the challenge. For the purpose of the DCASE Task 4 Sound Separation and Event Detection challenge, systems should not use FSD50K labels, even though they may become available upon FSD50K release.
To create mixtures, 10-second clips of sources are convolved with simulated room impulse responses and added together. Each 10-second mixture contains between 1 and 4 sources. Source files longer than 10 seconds are considered "background" sources. Every mixture contains one background source, which is active for the entire duration. We provide a software recipe to create the dataset, the room impulse responses, and the original source audio.
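The mixing step described above can be illustrated with a toy sketch. This is not the FUSS recipe itself: the function names `convolve` and `mix_sources`, the cropping and zero-padding behaviour, and the tiny example signals are all assumptions made for illustration. A real pipeline would work on full 10-second waveforms and would likely use an array library rather than plain lists.

```python
def convolve(signal, impulse_response):
    """Full linear convolution of two sequences of floats."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def mix_sources(sources, rirs, clip_len):
    """Reverberate each source with its own RIR, fit it to clip_len, and sum.

    Hypothetical helper: returns the mixture plus the per-source
    reverberated references, all of length clip_len.
    """
    if not 1 <= len(sources) <= 4:
        raise ValueError("a mixture holds between 1 and 4 sources")
    if len(sources) != len(rirs):
        raise ValueError("expected one impulse response per source")

    mixture = [0.0] * clip_len
    references = []
    for source, rir in zip(sources, rirs):
        wet = convolve(source, rir)[:clip_len]   # crop the reverb tail
        wet += [0.0] * (clip_len - len(wet))     # zero-pad short clips
        references.append(wet)
        mixture = [m + w for m, w in zip(mixture, wet)]
    return mixture, references


# Toy signals: a "background" active for the whole clip and one short event.
background = [1.0, 1.0, 1.0, 1.0]
foreground = [0.0, 2.0, 0.0, 0.0]
toy_rirs = [[1.0], [1.0, 0.5]]  # made-up impulse responses

mixture, references = mix_sources([background, foreground], toy_rirs, clip_len=4)
```

Because the mixture is a plain sum, the source-level references add up to the mixture sample by sample, which is what makes them usable as separation targets.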
Additional Documentation: Explore on Papers With Code
Source code:
tfds.audio.Fuss
Versions:
1.2.0 (default): No release notes.
Auto-cached (documentation): No
Splits:
Split | Examples |
---|---|
'test' | 1,000 |
'train' | 20,000 |
'validation' | 1,000 |
- Feature structure:
    FeaturesDict({
        'id': string,
        'jams': string,
        'mixture_audio': Audio(shape=(160000,), dtype=int16),
        'segments': Sequence({
            'end_time_seconds': float32,
            'label': string,
            'start_time_seconds': float32,
        }),
        'sources': Sequence({
            'audio': Audio(shape=(160000,), dtype=int16),
            'label': ClassLabel(shape=(), dtype=int64, num_classes=4),
        }),
    })
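The audio shape of 160000 samples, combined with the 10-second clip length given in the description, implies a sample rate of 16000 Hz. The sketch below uses that inferred rate to turn a segment's start and end times into a sample range. The helper name `segment_to_slice` is hypothetical, and it assumes that segment times are measured from the start of the mixture, which this page does not state.

```python
CLIP_SECONDS = 10        # from the description: 10-second mixtures
CLIP_SAMPLES = 160000    # from the feature spec: Audio(shape=(160000,))
SAMPLE_RATE = CLIP_SAMPLES // CLIP_SECONDS  # inferred, not stated on this page


def segment_to_slice(start_time_seconds, end_time_seconds,
                     sample_rate=SAMPLE_RATE):
    """Map a segment's times to a slice over the mixture samples.

    Assumes times are relative to the start of the mixture; the result is
    clamped to the clip boundaries.
    """
    start = max(0, round(start_time_seconds * sample_rate))
    end = min(CLIP_SAMPLES, round(end_time_seconds * sample_rate))
    return slice(start, end)


example_slice = segment_to_slice(1.5, 4.0)
```

The resulting slice can be applied directly to a decoded `mixture_audio` or `sources/audio` array to pick out the samples covered by a segment.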
- Feature documentation:
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
 | FeaturesDict | | | |
id | Tensor | | string | |
jams | Tensor | | string | |
mixture_audio | Audio | (160000,) | int16 | |
segments | Sequence | | | |
segments/end_time_seconds | Tensor | | float32 | |
segments/label | Tensor | | string | |
segments/start_time_seconds | Tensor | | float32 | |
sources | Sequence | | | |
sources/audio | Audio | (160000,) | int16 | |
sources/label | ClassLabel | | int64 | |
Supervised keys (See as_supervised doc): ('mixture_audio', 'sources')
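A minimal loading sketch, assuming the `tensorflow_datasets` package is installed. The wrapper names `fuss_name` and `load_fuss` are hypothetical conveniences, not part of the library. The config names come from this page, and `tfds.load` with `as_supervised=True` yields tuples following the supervised keys above. Since the dataset is not auto-cached and is several GiB in size, the first call triggers a long download and preparation step.

```python
FUSS_CONFIGS = ("reverberant", "unprocessed")  # configs listed on this page


def fuss_name(config="reverberant"):
    """Build the dataset name string, e.g. 'fuss/reverberant'."""
    if config not in FUSS_CONFIGS:
        raise ValueError(f"unknown FUSS config: {config!r}")
    return f"fuss/{config}"


def load_fuss(config="reverberant", split="train", as_supervised=True):
    """Load one split of FUSS through TensorFlow Datasets.

    The import is done lazily so that this module can be imported
    without the third-party dependency installed.
    """
    import tensorflow_datasets as tfds

    return tfds.load(fuss_name(config), split=split,
                     as_supervised=as_supervised)


# Example use (not executed here because it downloads the dataset):
#   for mixture_audio, sources in load_fuss(split="validation").take(1):
#       print(mixture_audio.shape)
```

With `as_supervised=False`, each example is instead a dictionary carrying every feature in the structure, including `id`, `jams`, and `segments`.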
Figure (tfds.show_examples): Not supported.
Citation:
@inproceedings{wisdom2020fuss,
title = {What's All the {FUSS} About Free Universal Sound Separation Data?},
author = {Scott Wisdom and Hakan Erdogan and Daniel P. W. Ellis and Romain Serizel and Nicolas Turpault and Eduardo Fonseca and Justin Salamon and Prem Seetharaman and John R. Hershey},
year = {2020},
url = {https://arxiv.org/abs/2011.00803},
}
@inproceedings{fonseca2020fsd50k,
author = {Eduardo Fonseca and Xavier Favory and Jordi Pons and Frederic Font Corbera and Xavier Serra},
title = {{FSD}50k: an open dataset of human-labeled sound events},
year = {2020},
url = {https://arxiv.org/abs/2010.00475},
}
fuss/reverberant (default config)
Config description: Default reverberated audio.
Download size:
7.35 GiB
Dataset size:
43.20 GiB
fuss/unprocessed
Config description: Unprocessed audio without additional reverberation.
Download size:
8.28 GiB
Dataset size:
45.58 GiB