The Study of Persian Writing Style Variations and their Impacts on Information Retrieval: The case of Hamshahri Corpus
In its interactions with the digital environment, the Persian language faces a variety of challenges, especially regarding its writing style, which can affect the effectiveness of Persian information retrieval (IR). Persian writers may either omit some characters or replace them with others.
Using a conceptual content analysis method, the present research investigated seven challenging characters to explore the writing behaviors reflected in the Hamshahri Corpus from 1996 to 2007. The degree to which this writing behavior conforms to the standard Persian writing style was also measured using an "Engagement Index" (i.e., the arithmetic ratio of words that conform to the standard writing style).
Results showed that the Corpus typists generally tend to omit the challenging characters or replace them with simplified ones. It seems, therefore, that ignoring these challenges in Persian IR systems would not greatly affect IR effectiveness. The engagement index equals 0.033, indicating very low conformity of the writing behavior to the standard. This may stem from the tendency of Persian writers to simplify their writing in line with the “least effort” principle and to feel no need to adhere to the traditional Arabic writing style prescribed by the standard.
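As a rough illustration of how a ratio such as the engagement index might be computed, the sketch below divides the number of conforming words by the number of words examined. The abstract does not state the denominator, so treating it as the total number of examined words is an assumption. The function name `engagement_index` and the counts used are hypothetical, chosen only to show the scale of the reported value, and are not figures from the study.

```python
def engagement_index(conforming: int, total: int) -> float:
    """Share of examined words written in the standard style.

    Assumption: the denominator is the total number of examined words;
    the abstract does not specify it.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= conforming <= total:
        raise ValueError("conforming must be between 0 and total")
    return conforming / total


# Hypothetical counts, not data from the study: 33 conforming words
# out of 1000 examined words give a ratio on the scale reported above.
index = engagement_index(33, 1000)
print(round(index, 3))
```

Under this reading, a value near 0 means almost no examined words follow the standard style, and a value near 1 means almost all do.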