7. Single-channel classification and clustering approaches

Speech Enhancement in Non-Stationary Noise

Each example below is provided in five versions: the original noisy recording, minimum statistics spectral subtraction [1, 2], LSTM denoising [3], sparse NMF [4], and exemplar-based sparse NMF [5]. (Simplified code sketches of the spectral subtraction and sparse NMF baselines are given after the examples.)

Multi-condition example 1: TUM NAVIC corpus, English, city noise (bicycle) @ 5 dB(A)
Multi-condition example 2: TUM NAVIC corpus, English, music noise @ 5 dB(A)
Application to a real phone recording (close-talk microphone, Munich-Maxvorstadt city noise)
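A minimal, illustrative Python sketch of the spectral subtraction baseline, assuming a NumPy/SciPy environment: the noise power in each frequency bin is tracked as the minimum of a recursively smoothed power spectrum, in the spirit of the minimum statistics approach of [2], and then subtracted with over-subtraction and spectral flooring. The function name enhance and all smoothing, window, and flooring parameters are placeholders; the full algorithm in [2] additionally uses time-varying optimal smoothing and bias compensation, which are omitted here.

# Simplified sketch of spectral subtraction with minimum-statistics noise
# tracking (inspired by [1, 2]; not the full optimal-smoothing algorithm).
import numpy as np
from scipy.signal import stft, istft

def enhance(noisy, fs=16000, win_len=512, alpha=0.85, min_win=100,
            oversub=2.0, floor=0.05):
    """Return an enhanced waveform for a 1-D noisy signal."""
    _, _, Y = stft(noisy, fs=fs, nperseg=win_len)           # noisy STFT
    power = np.abs(Y) ** 2

    # Recursively smoothed power spectrum, per frequency bin.
    smoothed = np.copy(power)
    for n in range(1, power.shape[1]):
        smoothed[:, n] = alpha * smoothed[:, n - 1] + (1 - alpha) * power[:, n]

    # Minimum statistics: estimate the noise power as the minimum of the
    # smoothed spectrum over a sliding window of min_win frames, scaled by a
    # fixed bias compensation factor (the full algorithm adapts this factor).
    noise = np.empty_like(power)
    for n in range(power.shape[1]):
        lo = max(0, n - min_win + 1)
        noise[:, n] = smoothed[:, lo:n + 1].min(axis=1)
    noise *= 1.5                                             # crude bias correction

    # Power spectral subtraction with over-subtraction and spectral flooring,
    # reusing the noisy phase for resynthesis.
    clean_power = np.maximum(power - oversub * noise, floor * power)
    enhanced = np.sqrt(clean_power) * np.exp(1j * np.angle(Y))
    _, x_hat = istft(enhanced, fs=fs, nperseg=win_len)
    return x_hat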

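For the NMF-based baselines, the following NumPy sketch illustrates the general recipe behind [4, 5]: the noisy magnitude spectrogram is decomposed on a fixed, concatenated speech-plus-noise dictionary with a sparsity penalty on the activations, and a Wiener-style mask is built from the speech part of the reconstruction. The dictionaries here are random placeholders; in practice W_speech is learned from clean speech (or built from multi-frame exemplars, as in [5]) and W_noise from noise recordings, and the update rule shown is the standard multiplicative KL-NMF update with an L1 term, not the exact formulation of either paper.

# Minimal sketch of sparse NMF-based enhancement in the spirit of [4, 5].
import numpy as np

def sparse_nmf_enhance(Y, W_speech, W_noise, n_iter=100, lam=0.1, eps=1e-12):
    """Enhance a noisy complex STFT Y given fixed speech/noise dictionaries."""
    V = np.abs(Y)                                   # noisy magnitude spectrogram
    W = np.hstack([W_speech, W_noise])              # joint dictionary (fixed)
    H = np.random.rand(W.shape[1], V.shape[1])      # activations to estimate

    # Multiplicative updates for KL-divergence NMF with an L1 penalty (lam)
    # on the activations, keeping the dictionary W fixed.
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.T @ np.ones_like(V) + lam)

    # Wiener-style mask from the speech part of the reconstruction.
    n_s = W_speech.shape[1]
    speech_part = W[:, :n_s] @ H[:n_s] + eps
    mask = speech_part / (W @ H + eps)
    return mask * Y                                 # enhanced complex STFT

# Toy usage with random placeholder dictionaries (513 bins, 40 + 20 atoms).
rng = np.random.default_rng(0)
Y = rng.standard_normal((513, 200)) + 1j * rng.standard_normal((513, 200))
W_s, W_n = rng.random((513, 40)) + 1e-3, rng.random((513, 20)) + 1e-3
S_hat = sparse_nmf_enhance(Y, W_s, W_n)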
DNN/LSTM benchmark on CHiME-2 data [6]

Each example is provided in four versions: the original noisy mixture, DNN enhancement, LSTM enhancement, and the noise-free speech reference. (A minimal sketch of LSTM mask-based enhancement follows the examples.)

male speech + child noise @ 9 dB input SNR, si_dt_05
female speech + music noise @ 0 dB input SNR, si_dt_05
female speech + child noise @ -6 dB input SNR, si_dt_05
male speech + child noise @ 0 dB input SNR, si_et_05
male speech + child noise + female speech + telephone noise @ -6 dB input SNR, si_et_05
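As a rough illustration of the recurrent approach behind these examples, the following PyTorch sketch trains an LSTM to predict a time-frequency mask from log-magnitude features, using a signal-approximation loss similar in spirit to the discriminative objective of [6]. The layer sizes, the 257-bin feature dimension, and the random toy batch are placeholder assumptions; the networks in [3] and [6] use different features, architectures (e.g. bidirectional layers), and training data.

# Minimal PyTorch sketch of LSTM-based mask estimation in the spirit of [3, 6].
import torch
import torch.nn as nn

class LSTMDenoiser(nn.Module):
    def __init__(self, n_bins=257, hidden=256, layers=2):
        super().__init__()
        self.lstm = nn.LSTM(n_bins, hidden, layers, batch_first=True)
        self.out = nn.Linear(hidden, n_bins)

    def forward(self, log_mag):                 # (batch, frames, bins)
        h, _ = self.lstm(log_mag)
        return torch.sigmoid(self.out(h))       # mask in [0, 1]

model = LSTMDenoiser()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# One toy training step: the masked noisy magnitude should approximate the
# clean magnitude (a signal-approximation objective, cf. [6]).
noisy_mag = torch.rand(8, 100, 257)             # |Y|, placeholder batch
clean_mag = torch.rand(8, 100, 257)             # |S|, placeholder batch
opt.zero_grad()
mask = model(torch.log1p(noisy_mag))
loss = ((mask * noisy_mag - clean_mag) ** 2).mean()
loss.backward()
opt.step()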

DNN-based speech enhancement on Aurora-4 data [7]

Each example is provided in three versions: the original noisy recording, DNN enhancement, and the noise-free speech reference. (A minimal sketch of DNN regression-based enhancement follows the examples.)

test set: babble noise
test set: airport noise
test set: car noise
test set: street noise
test set: restaurant noise
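The following PyTorch sketch outlines the regression-style enhancement used in this line of work [7]: a feed-forward DNN maps noisy log-power spectra, stacked with a few frames of acoustic context, to the clean log-power spectrum of the centre frame, trained with mean squared error. The layer sizes, context width, and random toy data below are placeholders, not the configuration reported in [7].

# Minimal sketch of DNN regression-based enhancement in the spirit of [7].
import torch
import torch.nn as nn

N_BINS, CONTEXT = 257, 3                        # +/- 3 frames of context (assumed)

dnn = nn.Sequential(
    nn.Linear(N_BINS * (2 * CONTEXT + 1), 2048), nn.ReLU(),
    nn.Linear(2048, 2048), nn.ReLU(),
    nn.Linear(2048, N_BINS),                    # clean log-power estimate
)

def add_context(frames, c=CONTEXT):
    """Stack each frame with its c left and c right neighbours."""
    padded = torch.cat([frames[:1].repeat(c, 1), frames, frames[-1:].repeat(c, 1)])
    return torch.cat([padded[i:i + len(frames)] for i in range(2 * c + 1)], dim=1)

# One toy training step with mean squared error on log-power spectra.
noisy_lps = torch.rand(100, N_BINS)             # placeholder utterance
clean_lps = torch.rand(100, N_BINS)
opt = torch.optim.Adam(dnn.parameters(), lr=1e-4)
opt.zero_grad()
loss = nn.functional.mse_loss(dnn(add_context(noisy_lps)), clean_lps)
loss.backward()
opt.step()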

[1] Mike Brookes, VOICEBOX: Speech Processing Toolbox for MATLAB, Imperial College London

[2] Rainer Martin, Noise power spectral density estimation based on optimal smoothing and minimum statistics, IEEE Trans. Speech and Audio Processing, 9(5):504-512, 2001

[3] Felix Weninger, Florian Eyben, and Björn Schuller, Single-Channel Speech Separation With Memory-Enhanced Recurrent Neural Networks, Proceedings 39th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014, Florence, Italy, 2014 [pdf]

[4] P. D. O’Grady and B. A. Pearlmutter, Discovering convolutive speech phones using sparseness and non-negativity, Proceedings 7th International Conference on Independent Component Analysis and Signal Separation, ICA 2007, London, UK, pp. 520-527, 2007

[5] Jort F. Gemmeke, Tuomas Virtanen, and Antti Hurmalainen, Exemplar-Based Speech Enhancement and its Application to Noise-Robust Automatic Speech Recognition, Proceedings of the CHiME Workshop, Florence, Italy, 2011 [pdf]

[6] Felix Weninger et al., Discriminatively trained recurrent neural networks for single-channel speech separation, Proceedings of the IEEE Global Conference on Signal and Information Processing (GlobalSIP), Atlanta, GA, 2014

[7] Jun Du et al., Robust speech recognition with speech enhanced deep neural networks, Proc. INTERSPEECH, 2014, pp. 616-620 [pdf]

[8] Tian Gao et al., A unified speaker-dependent speech separation and enhancement system based on deep neural networks, Proc. ChinaSIP, 2015, pp. 687-691 [pdf]

[9] Tian Gao et al., SNR-Based Progressive Learning of Deep Neural Network for Speech Enhancement, Proc. INTERSPEECH, 2016, pp. 3713-3717 [pdf] [poster]
