Adaptive Windows Convolutional Neural Network for Speech Recognition

Message:
Article Type:
Research/Original Article (دارای رتبه معتبر)
Abstract:
Although, speech recognition systems are widely used and their accuracies are continuously increased, there is a considerable performance gap between their accuracies and human recognition ability. This is partially due to high speaker variations in speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov model (HMM) leads to considerable performance achievement in speech recognition problem because deep networks model complex correlations between features. The main aim of this paper is to achieve a better acoustic modeling by changing the structure of deep Convolutional Neural Network (CNN) in order to adapt speaking variations. In this way, existing models and corresponding inference task have been improved and extended.
Here, we propose adaptive windows convolutional neural network (AWCNN) to analyze joint temporal-spectral features variation. AWCNN changes the structure of CNN and estimates the probabilities of HMM states. We propose adaptive windows convolutional neural network in order to make the model more robust against the speech signal variations for both single speaker and among various speakers. This model can better model speech signals. The AWCNN method applies to the speech spectrogram and models time-frequency varieties.
This network handles speaker feature variations, speech signal varieties, and variations in phone duration. The obtained results and analysis on FARSDAT and TIMIT datasets show that, for phone recognition task, the proposed structure achieves 1.2%, 1.1% absolute error reduction with respect to CNN models respectively, which is a considerable improvement in this problem. Based on the results obtained by the conducted experiments, we conclude that the use of speaker information is very beneficial for recognition accuracy.
Language:
Persian
Published:
Signal and Data Processing, Volume:15 Issue: 3, 2018
Pages:
13 to 30
magiran.com/p1919846  
دانلود و مطالعه متن این مقاله با یکی از روشهای زیر امکان پذیر است:
اشتراک شخصی
با عضویت و پرداخت آنلاین حق اشتراک یک‌ساله به مبلغ 1,390,000ريال می‌توانید 70 عنوان مطلب دانلود کنید!
اشتراک سازمانی
به کتابخانه دانشگاه یا محل کار خود پیشنهاد کنید تا اشتراک سازمانی این پایگاه را برای دسترسی نامحدود همه کاربران به متن مطالب تهیه نمایند!
توجه!
  • حق عضویت دریافتی صرف حمایت از نشریات عضو و نگهداری، تکمیل و توسعه مگیران می‌شود.
  • پرداخت حق اشتراک و دانلود مقالات اجازه بازنشر آن در سایر رسانه‌های چاپی و دیجیتال را به کاربر نمی‌دهد.
دسترسی سراسری کاربران دانشگاه پیام نور!
اعضای هیئت علمی و دانشجویان دانشگاه پیام نور در سراسر کشور، در صورت ثبت نام با ایمیل دانشگاهی، تا پایان فروردین ماه 1403 به مقالات سایت دسترسی خواهند داشت!
In order to view content subscription is required

Personal subscription
Subscribe magiran.com for 70 € euros via PayPal and download 70 articles during a year.
Organization subscription
Please contact us to subscribe your university or library for unlimited access!