An Application and the Adjustment of Zipf Law and Zou Statistical Model in the Recognition of Stop Words in Persian language by utilizing Language Corpus of Articles of scientific research in the field of Library and Information Science
Author(s):
Abstract:
Purpose
The aim of this research was to identify and extract a systematic list of stop words for use in the automatic indexing of Persian texts in the field of Library and Information Science.
Method
We used content analysis. The research population comprised 56 articles, from which 20 were selected by simple random sampling.
Findings
Of the 15557 words in the corpus, the Zou model identified 1368 stop words in the pre-adjustment list and 468 in the post-adjustment list. Zipf's law identified 217 stop words in the pre-adjustment list and 607 in the post-adjustment list. The abstracts of the articles contained 1989 words in total. For these, the Zou model extracted 148 stop words before adjustment and 173 after adjustment, while Zipf's law identified 60 before adjustment and 186 after adjustment. In both methods there was a direct relationship between the frequency of a word and its probability of being a stop word. The highest percentage of stop words (39.44 percent) was obtained in the texts of the articles with the Zou statistical model. The results of this research can increase the efficiency of information storage and retrieval, reduce input, and save time and cost.
Keywords:
Language:
Persian
Published:
Library and Information Science Research, Volume:3 Issue: 2, 2014
Pages:
191 to 208
https://www.magiran.com/p1307507
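The frequency-based idea behind the Zipf law results in the abstract can be illustrated with a minimal sketch. Zipf's law states that a word's frequency is roughly inversely proportional to its frequency rank, so the product of rank and frequency stays roughly constant. The abstract does not give the study's cutoff criteria, its adjustment procedure, or the details of the Zou model, so nothing below reproduces them: the function names, the `top_k` cutoff, and the English sample sentence are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def rank_frequency_table(tokens):
    """Return (rank, word, frequency, rank * frequency) rows, most frequent first."""
    counts = Counter(tokens)
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [
        (rank, word, freq, rank * freq)
        for rank, (word, freq) in enumerate(ordered, start=1)
    ]


def stop_word_candidates(tokens, top_k):
    """Take the top_k most frequent words as stop word candidates.

    top_k is a hypothetical cutoff; the study's actual criterion is not
    stated in the abstract.
    """
    return [word for _, word, _, _ in rank_frequency_table(tokens)[:top_k]]


# Toy English sample, used only to show the mechanics.
sample = "the index of the library is the key to the catalogue of the library".split()

for row in rank_frequency_table(sample):
    print(row)

print(stop_word_candidates(sample, top_k=3))
```

On a real corpus, a candidate list produced this way would still need manual review, since high-frequency subject terms (here, "library") rank alongside true function words. This may be the kind of correction that the pre-adjustment and post-adjustment lists in the abstract reflect, though the abstract does not say so.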
Other articles by this author(s)
- Factors Affecting the Quantitative Growth of Iran’s Scientific Production after the Islamic Revolution of Iran
  International Journal of Information Science and Management, Autumn 2024
- Content Analysis of the articles in Journal of Linguistic and Rhetorical Studies of Semnan University (2010 - 2023)
  Hossein Moradi Moghadam *
  Journal Of Linguistic and Rhetorical Studies,