A Two Phase Speech Enhancement Based on Deep Denoising Autoencoder
Both short-term and long-term information in a speech signal are useful for speech enhancement, especially when the signal is corrupted by both stationary and non-stationary noise. This paper proposes a new approach for providing long-term speech input to a deep denoising autoencoder by reducing the number of frequency sub-bands in the input data. It also proposes a two-phase speech enhancement approach. The first phase performs short-term speech enhancement using a deep denoising autoencoder. In the second phase, a long-term speech enhancement denoising autoencoder is applied to the output of the first phase, that is, to the short-term enhanced speech data. The proposed models were evaluated on the Aurora-2 speech recognition corpus, and our results show a significant improvement of 0.3 in PESQ score at lower SNR values. The proposed models were also evaluated on the recognition task, where the proposed method yields a 4% reduction in word error rate for multi-condition training compared with the baseline MFCC front-end.
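The idea of trading frequency resolution for temporal context can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `reduce_subbands` and `stack_context`, the averaging of adjacent sub-bands, and all sizes (40 sub-bands, 5 and 25 context frames, 8 reduced sub-bands) are assumptions chosen only so that the short-term and long-term inputs end up with the same dimensionality.

```python
import numpy as np

def reduce_subbands(spec, n_out):
    """Average adjacent frequency sub-bands so each frame keeps n_out bands.

    spec has shape (n_frames, n_bands). Averaging is an assumed reduction
    scheme used for illustration only.
    """
    n_frames, n_bands = spec.shape
    if n_bands % n_out != 0:
        raise ValueError("n_bands must be divisible by n_out in this sketch")
    return spec.reshape(n_frames, n_out, n_bands // n_out).mean(axis=2)

def stack_context(spec, n_context):
    """Build one input vector per frame from n_context centered frames.

    n_context is assumed to be odd; edge frames are repeated as padding.
    """
    half = n_context // 2
    padded = np.pad(spec, ((half, half), (0, 0)), mode="edge")
    n_frames = spec.shape[0]
    return np.stack([padded[t:t + n_context].ravel() for t in range(n_frames)])

# Hypothetical spectrogram: 100 frames, 40 frequency sub-bands.
rng = np.random.default_rng(0)
spec = rng.random((100, 40))

# Short-term input: full frequency resolution, narrow temporal context.
short_in = stack_context(spec, 5)                        # 5 frames x 40 bands

# Long-term input: fewer sub-bands, wider temporal context, same input size.
long_in = stack_context(reduce_subbands(spec, 8), 25)    # 25 frames x 8 bands

print(short_in.shape, long_in.shape)
```

With these assumed sizes, both inputs have 200 values per frame, so the long-term autoencoder sees five times as many frames as the short-term one without a larger input layer. In a two-phase setup, the long-term input would be built from the short-term autoencoder's enhanced output rather than from the noisy spectrogram directly.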