The DESED dataset contains:
- Recorded soundscapes.
- Synthetic soundbank (plus code to create new soundscapes using Scaper) and the DCASE 2019 soundscapes.
- Public evaluation set (recorded soundscapes) used in DCASE 2019 (a.k.a. the Youtube eval set in DCASE; the Vimeo data is not available).
The recorded soundscapes are split into two subsets, as described below.
- Verified and unverified subsets of Audioset:
  - Unlabel_in_domain data: unverified data whose labels have been discarded: 14412 files.
  - Weakly labeled data: training data whose labels have been verified at the clip level: 1578 files.
  - Validation data: labels include time boundaries (strong labels): 1168 files.
- Evaluation public files: 692 Youtube files.
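As a quick sanity check after downloading, the file counts listed above can be compared against a local copy. The sketch below is illustrative only: the helper names `count_wav_files` and `check_subset`, the dictionary keys other than `unlabel_in_domain`, the assumption that clips are stored as `.wav` files, and any folder path you pass in are all hypothetical and not part of the DESED tooling.

```python
from pathlib import Path

# File counts as stated in the list above.
# The key names are illustrative labels, not official folder names.
EXPECTED_RECORDED = {
    "unlabel_in_domain": 14412,
    "weak": 1578,
    "validation": 1168,
    "eval_public": 692,
}


def count_wav_files(folder):
    """Count .wav files under `folder`, recursively (assumes .wav storage)."""
    return sum(1 for p in Path(folder).rglob("*.wav") if p.is_file())


def check_subset(name, folder):
    """Return (found, expected, matches) for one subset of recorded soundscapes."""
    found = count_wav_files(folder)
    expected = EXPECTED_RECORDED[name]
    return found, expected, found == expected


# Total number of recorded clips across all subsets listed above.
total_recorded = sum(EXPECTED_RECORDED.values())
print(total_recorded)  # 17850
```

Calling `check_subset("weak", some_folder)` on your own download location then shows whether the number of files found matches the count given above. A mismatch most likely means some clips could not be retrieved.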
The synthetic part of the dataset is built from the following material:
- Background files are extracted from SINS, MUSAN, or Youtube and were selected because they contain very few of our sound event classes.
- Foreground files are extracted from Freesound, manually verified for quality, and segmented to remove silences.
- Mixtures are described in "Generating new synthetic data".
- Sound bank:
  - Training: 2060 background files (SINS) and 1009 foreground files (Freesound).
  - Eval: 12 (Freesound) + 5 (Youtube) background files and 314 foreground files (Freesound).
You can find information about this dataset in these papers:
- Turpault et al. Description of DESED dataset + official results of DCASE 2019 task 4.
- Serizel et al. Robustness of DCASE 2019 systems on synthetic evaluation set.