Introducing a new information retrieval method applicable to speech-recognized texts

Abstract:
In this article, a pre-processing method is introduced for the task of retrieving speech-recognized texts. The inputs are a text corpus generated by a speech recognition system and a query. The goal is to search for the query in these documents and find the relevant ones. The main problem is that speech-recognized texts typically contain some percentage of recognition errors, which cause terms to be erroneously assigned to irrelevant documents.
The idea of the proposed method is to detect error-prone terms and to find similar words for each of them. A parameter is defined that estimates the probability of an error occurring in an error-prone word. To identify similar words for a given term, an initial set of candidate similar words is chosen based on a criterion called the average detection rate (ADR) together with the Levenshtein distance. Then a conversion probability is defined based on the conversion rate (CR) and the noisy channel model (NCM), and the candidates whose probability exceeds a threshold are selected as the final similar words. In the retrieval process, these words are searched for in addition to the base word. Implementation results show a significant improvement of up to 30% in F-measure when the information retrieval method is combined with this pre-processing.
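The abstract names the pipeline steps but not the exact definitions of ADR, CR, or the error parameter, which are given only in the full paper. As a rough illustration, the following Python sketch assumes that a word-aligned set of (reference word, recognized word) pairs is available. It defines a hypothetical error probability as the share of a term's occurrences that were not recognized as the term itself, selects initial candidates by Levenshtein distance only (ADR filtering is omitted), and uses the channel component of a noisy channel model, P(recognized = w | spoken = term), estimated from confusion counts, as a stand-in for the conversion probability. The names `confusion_counts`, `error_probability`, `similar_words`, `expand_query` and the parameters `max_dist`, `threshold`, `min_error` are illustrative assumptions, not the authors' method.

```python
from collections import Counter, defaultdict


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions, substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def confusion_counts(aligned_pairs):
    """Count how often each reference word was recognized as each output word.

    aligned_pairs: iterable of (reference_word, recognized_word), assumed to
    come from a hypothetical word-aligned ASR development set.
    """
    counts = defaultdict(Counter)
    for ref, hyp in aligned_pairs:
        counts[ref][hyp] += 1
    return counts


def error_probability(term, counts):
    """Hypothetical error-proneness: share of occurrences not recognized as the term."""
    total = sum(counts[term].values())
    if total == 0:
        return 0.0
    return 1.0 - counts[term][term] / total


def similar_words(term, vocabulary, counts, max_dist=2, threshold=0.05):
    """Return (word, probability) pairs for the final similar-word set."""
    # Step 1: initial candidates by edit distance (ADR filtering not modeled).
    candidates = [w for w in vocabulary
                  if w != term and levenshtein(term, w) <= max_dist]
    # Step 2: channel probability P(recognized = w | spoken = term), thresholded.
    total = sum(counts[term].values())
    selected = []
    for w in candidates:
        p = counts[term][w] / total if total else 0.0
        if p >= threshold:
            selected.append((w, p))
    return sorted(selected, key=lambda pair: -pair[1])


def expand_query(query_terms, vocabulary, counts, min_error=0.1, **kwargs):
    """Search each base word plus its similar words, but only for error-prone terms."""
    expanded = {}
    for t in query_terms:
        words = [t]
        if error_probability(t, counts) > min_error:
            words += [w for w, _ in similar_words(t, vocabulary, counts, **kwargs)]
        expanded[t] = words
    return expanded
```

For example, if "data" was recognized correctly 8 times and as "date" 2 times, while "code" was always recognized correctly, then `expand_query(["data", "code"], vocab, counts)` expands only "data" (to "data" and "date") and leaves "code" alone. Gating on the error probability keeps reliably recognized terms from pulling in spurious matches.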
Language: Persian
Published: Signal and Data Processing, Volume 13, Issue 4, 2017
Pages: 93 to 108
magiran.com/p1702018  