Robust Speech Recognition using Long Short Term Memory Networks and Bottleneck Features

Deep neural networks have been widely used in speech recognition systems in recent years. However, the robustness of these models to environmental noise has received less attention. In this paper, we propose two approaches for improving the robustness of deep neural network models to additive environmental noise. The first approach increases the robustness of long short-term memory (LSTM) networks by exploiting their ability to learn long-term noise behavior. To this end, we train the models on noisy speech, so that the LSTMs are trained in a noise-aware manner. Results on the noisy TIMIT dataset show that training on noisy rather than clean speech improves recognition accuracy by up to 18%. The second approach reduces the effect of noise on the extracted features using a denoising autoencoder network, and uses bottleneck features to compress the feature vector and provide a higher-level representation of the input features. In the presence of noise, this method improves the accuracy of the recognition system from the first approach by a further 4%.
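
The noise-aware training described above depends on noisy training speech. As a minimal sketch of how additive noise can be mixed into a clean signal at a chosen signal-to-noise ratio (SNR), the following may help. The function names `mix_at_snr` and `measured_snr_db`, the stand-in sine and Gaussian signals, and the SNR values are illustrative assumptions; the abstract does not state which noise types or SNR levels were used.

```python
import math
import random


def mix_at_snr(clean, noise, snr_db):
    """Return clean + scaled noise so the mixture has the requested SNR in dB.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    p_clean = sum(x * x for x in clean) / len(clean)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_clean == 0.0 or p_noise == 0.0:
        raise ValueError("signals must have non-zero power")
    # Choose gain g so that p_clean / (g^2 * p_noise) == 10^(snr_db / 10).
    gain = math.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [x + gain * n for x, n in zip(clean, noise)]


def measured_snr_db(clean, noisy):
    """SNR of a mixture in dB, given the clean reference signal."""
    p_clean = sum(x * x for x in clean)
    p_resid = sum((y - x) ** 2 for x, y in zip(clean, noisy))
    return 10.0 * math.log10(p_clean / p_resid)


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-in signals (assumptions): a 440 Hz tone sampled at 16 kHz
    # and white Gaussian noise of the same length.
    clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]
    for snr in (20, 10, 0):
        noisy = mix_at_snr(clean, noise, snr)
        print(snr, round(measured_snr_db(clean, noisy), 2))
```

In a real pipeline the clean list would be replaced by a speech waveform and the noise list by a recorded noise segment of equal length; feature extraction would then run on the mixture.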

Article Type: Research/Original Article
Journal of Electrical Engineering, Volume 49, Issue 3, 2020, pp. 1333-1343