An Application and the Adjustment of Zipf Law and Zou Statistical Model in the Recognition of Stop Words in Persian language by utilizing Language Corpus of Articles of scientific research in the field of Library and Information Science

Abstract:
Purpose
The aim of this research was to identify and extract a systematic list of stop words for use in the automatic indexing of Persian texts in the field of Library and Information Science.
Method
We used content analysis. The research population comprised 56 articles, from which 20 were selected by simple random sampling.
Findings
Of the 15557 words in the corpus, the Zou model recognized 1368 words as stop words in the pre-adjustment list and 468 in the post-adjustment list. Zipf's law recognized 217 words in the pre-adjustment list and 607 in the post-adjustment list. The abstracts of the articles contained 1989 words in total. For the abstracts, the Zou model extracted 148 stop words in the pre-adjustment list and 173 in the post-adjustment list, while Zipf's law recognized 60 words in the pre-adjustment list and 186 in the post-adjustment list. In both methods there was a direct relation between the frequency of a word and the probability of its being a stop word. The highest percentage of stop words (39.44 percent) was obtained in the full texts of the articles by applying the Zou statistical model. The results of this research can help increase the efficiency of information storage and retrieval, reduce input, and save time and expense.
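The frequency-based idea behind the Zipf's law list can be illustrated with a minimal sketch. It is not the authors' procedure: the function names, the `min_share` cutoff, and the toy transliterated tokens are assumptions made for illustration, and neither the Zou model nor the adjustment step is reproduced here.

```python
from collections import Counter


def rank_by_frequency(tokens):
    """Return (word, frequency, rank) tuples, most frequent word first."""
    counts = Counter(tokens)
    # Sort by descending frequency, then alphabetically so ties are stable.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(word, freq, rank)
            for rank, (word, freq) in enumerate(ordered, start=1)]


def stop_word_candidates(tokens, min_share=0.01):
    """Return words whose share of all tokens is at least min_share.

    min_share is a hypothetical cutoff chosen for this sketch; the
    abstract does not state the thresholds used in the study.
    """
    total = len(tokens)
    return [word for word, freq, _ in rank_by_frequency(tokens)
            if freq / total >= min_share]


# Toy corpus of transliterated Persian tokens (illustrative only).
sample = ["va", "dar", "va", "ketab", "be", "va", "dar", "az", "va", "namayeh"]

for word, freq, rank in rank_by_frequency(sample):
    # Under Zipf's law, frequency * rank stays roughly constant in large corpora.
    print(word, freq, rank, freq * rank)

print(stop_word_candidates(sample, min_share=0.2))
```

In this sketch a word becomes a candidate purely because of its relative frequency, which mirrors the reported direct relation between frequency and the probability of being a stop word; a real list would still need manual review of the candidates.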
Language:
Persian
Published:
Library and Information Science Research, Volume:3 Issue: 2, 2014
Pages:
191 to 208
magiran.com/p1307507  