Robust Speech Recognition using Long Short Term Memory Networks and Bottleneck Features
Deep neural networks have been widely used in speech recognition systems in recent years. However, the robustness of these models in the presence of environmental noise has been less discussed. In this paper, we propose two approaches for the robustness of deep neural networks models against environmental additive noise. In the first approach, we propose to increase the robustness of long short-term memory (LSTM) networks in the presence of noise based on their abilities in learning long-term noise behavior. For this purpose, we propose to use noisy speech for training models. In this way, LSTMs are trained in a noise-aware manner. The results on the noisy TIMIT dataset show that if the models are trained with noisy speech rather than clean speech, recognition accuracy will be improved up to 18%. In the second approach, we propose to reduce noise effects on the extracted features using a denoised autoencoder network and to use the bottleneck features to compress the feature vector and represent the higher level of input features. This method increases the accuracy of the proposed recognition system in the first approach by 4% in the presence of noise.