A Corpus-based Study of Using Function and Content Words in Persian Authorship Attribution

Article Type:
Research/Original Article (accredited ranking)
Abstract:
Corpora are now widely used in authorship attribution. In this research, a corpus of contemporary Persian texts was used to identify the authors of texts, and the effectiveness of function words and content words in this task was compared. To this end, seven contemporary writers were selected (Hoshang Golshiri, Bozorg Alavi, Ahmad Mahmoud, Mahmoud Dolatabadi, Nader Ebrahimi, Jalal Al Ahmad and Gholamhossein Saedi) and their books were collected. Using this corpus and deep learning models such as the multilayer perceptron (MLP) and long short-term memory (LSTM), the effectiveness of function and content words was then evaluated. The results indicated that the function-word-based method outperformed the content-word-based method in authorship attribution. In addition, among the types of function words, pronouns, especially demonstrative and personal pronouns, were the most effective for determining the author of a text. Features based on conjunctions and auxiliary verbs were also valuable for recognizing Persian writers.
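The abstract does not specify the feature extraction or the network configurations. As a minimal sketch of the general idea only, the code below builds relative-frequency vectors over either a function-word vocabulary or a content-word vocabulary and assigns a text to the closest author profile. Several assumptions apply: whitespace tokenization, a tiny hypothetical list of Persian function words (`FUNCTION_WORDS`), invented toy sentences for two hypothetical authors "A" and "B", and a simple nearest-centroid classifier swapped in for the study's MLP and LSTM models. None of the names or data come from the study.

```python
import math
from collections import Counter

# Hypothetical, deliberately tiny list of Persian function words
# (conjunctions, prepositions, object marker, pronouns). The study's
# actual word lists are not given in the abstract.
FUNCTION_WORDS = ["و", "که", "از", "به", "را", "این", "آن", "من", "او"]


def feature_vector(text, vocabulary):
    """Relative frequency of each vocabulary word in a whitespace-tokenized text."""
    tokens = text.split()
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[word] / total for word in vocabulary]


def content_vocabulary(samples):
    """Every training token that is not in FUNCTION_WORDS, in sorted order."""
    function_set = set(FUNCTION_WORDS)
    return sorted({tok for _, text in samples for tok in text.split()} - function_set)


def train(samples, vocabulary):
    """Average the feature vectors of each author's texts into one centroid.

    Nearest-centroid is a stand-in here, not the MLP/LSTM used in the study.
    """
    by_author = {}
    for author, text in samples:
        by_author.setdefault(author, []).append(feature_vector(text, vocabulary))
    return {
        author: [sum(column) / len(vectors) for column in zip(*vectors)]
        for author, vectors in by_author.items()
    }


def predict(model, text, vocabulary):
    """Return the author whose centroid is closest (Euclidean) to the text."""
    vector = feature_vector(text, vocabulary)
    return min(model, key=lambda author: math.dist(vector, model[author]))


# Invented toy sentences for two hypothetical authors "A" and "B".
TRAINING = [
    ("A", "من و او که رفتیم و من ماندم"),
    ("A", "من که دیدم و او رفت"),
    ("B", "این کتاب را از آن خانه به شهر بردند"),
    ("B", "آن مرد این نامه را به او داد"),
]

function_model = train(TRAINING, FUNCTION_WORDS)
content_words = content_vocabulary(TRAINING)
content_model = train(TRAINING, content_words)

print(predict(function_model, "من و او که خندیدیم", FUNCTION_WORDS))  # prints A
print(predict(function_model, "این نامه را به آن مرد دادند", FUNCTION_WORDS))  # prints B
print(predict(content_model, "این نامه را به آن مرد دادند", content_words))  # prints B
```

Swapping the vocabulary passed to `train` and `predict` is the only change needed to compare the two feature types, which mirrors the function-word versus content-word comparison described above; a real experiment would use the full corpus, held-out evaluation, and a trained neural classifier instead of centroids.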
Language:
Persian
Published:
Language and Linguistics, Volume: 19, Issue: 37, 2024
Pages:
193 to 220
https://www.magiran.com/p2708094  
Authors:
  • Fatemeh Soltanzadeh
    Corresponding Author (1)
    Ph.D. in Linguistics, Allameh Tabataba'i University, Tehran, Iran
  • Azadeh Mirzaei
    Author (2)
    Associate Professor of Linguistics, Allameh Tabataba'i University, Tehran, Iran
  • Mohammad Bahrani
    Author (3)
    Assistant Professor, Faculty of Statistics, Mathematics and Computer, Allameh Tabataba'i University, Tehran, Iran