NMF-based Improvement of DNN and LSTM Pre-Training for Speech Enhancement
A novel pre-training method is proposed to improve deep neural network (DNN) and long short-term memory (LSTM) performance and to mitigate the local-minimum problem in speech enhancement. We propose initializing the last-layer weights of the DNN and LSTM with the transposed basis matrix obtained from non-negative matrix factorization (NMF), instead of random weights. Because NMF can extract speech features even in the presence of non-stationary noise, this initialization yields faster and more reliable network convergence than previous pre-training methods. We also propose using the NMF basis matrix in the first layer in combination with another pre-training method. To further improve results, we train an individual model for each noise type based on a noise classification strategy. Evaluation of the proposed method on the TIMIT corpus shows that it significantly outperforms the baselines in terms of perceptual evaluation of speech quality (PESQ) and other objective measures, improving PESQ by up to 0.17, a relative improvement of 3.4%.
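As a rough illustration of the core idea, the sketch below factorizes a (hypothetical) clean-speech magnitude spectrogram with multiplicative-update NMF and uses the transposed basis matrix as the last-layer weight initialization, in place of random weights. The spectrogram dimensions, number of basis vectors, and iteration count are illustrative assumptions, not values from the paper.

```python
import numpy as np

def nmf_basis(V, k, n_iter=200, eps=1e-9, seed=0):
    """Multiplicative-update NMF: V (freq x frames) ~= W (freq x k) @ H (k x frames).

    Returns the non-negative basis matrix W.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, k)) + eps
    H = rng.random((k, n_frames)) + eps
    for _ in range(n_iter):
        # Standard multiplicative updates for the Frobenius-norm objective
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W

# Hypothetical clean-speech magnitude spectrogram: 129 freq bins x 400 frames
V = np.abs(np.random.default_rng(1).standard_normal((129, 400)))

k = 64                      # assumed number of NMF basis vectors / last hidden units
W = nmf_basis(V, k)         # basis: (129, k)

# Proposed initialization: last layer maps k hidden units -> 129 output bins,
# so its weight matrix is set to the transposed NMF basis instead of random values.
last_layer_weights = W.T    # shape (k, 129)
```

In a framework such as PyTorch or Keras this matrix would simply be copied into the output layer's weight tensor before training begins; the remaining layers can still use any conventional pre-training or random initialization.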