Implementation of a Real-time Speaker Independent Discrete Utterance Speech Recognition System with Large Vocabulary

Abstract:
During the design and implementation of classic real-time, speaker-independent, discrete-utterance speech recognition systems with large vocabularies (1000 to 10000 words), one encounters two major problems: first, the time-consuming preparation of a large-vocabulary data set with a considerable number of speakers (100 to 10000), needed to train the system satisfactorily and reliably, and second, the impossibility of executing the recognition phase in real time on available personal computers. To solve these problems, we carried out extensive and detailed research. Regarding the first problem, we prepared a large speech data set (50 to 60 pronunciations per word for each speaker) using 50 to 100 speakers chosen according to a special methodology (the number of males is 1.5 times the number of females). We then designed a speaker-dependent speech recognition system for each speaker and, through a special combination of reference speakers, obtained a speaker-independent speech recognition system with a recognition rate of 97.4% and a standard deviation of 2.1%. However, due to the high computational cost of the ML (Maximum Likelihood) training method, real-time implementation of the recognition phase is impossible. To solve this problem, we used several Tied Mixtures methods to represent the pdf (probability density function) of the HMM states. Finally, using Tied Mixtures methods, SCD (Semi Continuous Density) modeling, and fast search algorithms in the SCD codebook, we achieved real-time operation of our system during the recognition phase. Because these methods are sub-optimal, the recognition rate of the resulting system is 1.5 percentage points lower than the previous result. Consequently, we obtained a speaker-independent speech recognition system with a recognition rate of 95.9% and a standard deviation of 2.8%.
In speaker-dependent mode, the recognition rate is 98.5% with a standard deviation of 1.2%. The system runs in real time on a Pentium IV PC with a clock speed above 2.4 GHz and 512 MB of RAM.
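To illustrate why tied mixtures with a semi-continuous density codebook reduce recognition cost, the sketch below evaluates a shared set of Gaussians once per frame and reuses the result for every HMM state, keeping only the K best-scoring codewords. This is a minimal sketch, not the authors' implementation: the abstract does not specify the fast search algorithm, so the top-K pruning, the diagonal covariances, and all names and numeric values (`codebook`, `state_weights`, `frame`) are assumptions chosen for illustration.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def codebook_densities(x, codebook):
    """Evaluate every shared codebook Gaussian once for this frame."""
    return [math.exp(log_gauss_diag(x, mean, var)) for mean, var in codebook]

def top_k(densities, k):
    """Indices of the k largest densities (assumed pruning rule)."""
    order = sorted(range(len(densities)), key=lambda i: densities[i], reverse=True)
    return order[:k]

def state_likelihood(weights, densities, indices):
    """Semi-continuous output probability: state-specific weights
    applied to the shared densities, restricted to the kept codewords."""
    return sum(weights[i] * densities[i] for i in indices)

# Hypothetical toy codebook: (mean, variance) pairs shared by all states.
codebook = [
    ([0.0, 0.0], [1.0, 1.0]),
    ([3.0, 3.0], [1.0, 1.0]),
    ([-3.0, 0.0], [1.0, 1.0]),
]
# Hypothetical per-state mixture weights over the shared codebook.
state_weights = {"s1": [0.7, 0.2, 0.1], "s2": [0.1, 0.8, 0.1]}

frame = [0.2, -0.1]                      # one hypothetical feature vector
dens = codebook_densities(frame, codebook)   # computed once per frame
best = top_k(dens, 2)                        # keep the 2 best codewords
scores = {s: state_likelihood(w, dens, best) for s, w in state_weights.items()}
full = {s: state_likelihood(w, dens, range(len(dens))) for s, w in state_weights.items()}
```

With this layout the Gaussian evaluations grow with the codebook size rather than with the number of states times mixtures per state. The pruned score is a lower bound on the full mixture sum, which is one way a small accuracy loss like the one reported in the abstract can arise.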
Language:
Persian
Published:
Signal and Data Processing, Volume 4, Issue 2, 2008
Page:
27
magiran.com/p883434  