A Corpus-based Study of Using Function and Content Words in Persian Authorship Attribution
Author(s):
Article Type:
Research/Original Article (accredited ranking)
Abstract:
Corpora are now widely used in authorship attribution. In this study, a corpus of contemporary Persian texts was used to identify the authors of texts and to compare the effectiveness of function words and content words in this task. To this end, seven contemporary writers (Hoshang Golshiri, Bozor Alavi, Ahmad Mahmoud, Mahmoud Dolatabadi, Nader Ebrahimi, Jalal Al Ahmad and Gholamhossein Saedi) were selected and their books were collected. The effectiveness of function and content words was then evaluated on this corpus using deep learning algorithms such as the multilayer perceptron and Long Short-Term Memory. The results indicated that the function-word-based method outperformed the content-word-based one in authorship attribution. Among the types of function words, pronouns, especially demonstrative and personal pronouns, were the most effective for determining the author of a text. Features based on conjunctions and auxiliary verbs were also valuable for recognizing Persian writers.
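The split between function-word and content-word features described in the abstract can be illustrated with a minimal sketch. It is not the study's implementation: the function-word list, the whitespace tokenization, and the name `split_features` are assumptions made here for illustration, and the actual word lists and preprocessing are not given in this abstract.

```python
from collections import Counter

# Hypothetical, very small Persian function-word list (assumption for
# illustration only; the study's actual list is not given in the abstract).
FUNCTION_WORDS = ["و", "که", "از", "به", "را", "این", "آن", "من", "او"]


def split_features(text, function_words=FUNCTION_WORDS):
    """Return (function_vec, content_counts) for one text.

    function_vec: relative frequency of each listed function word,
                  in the same order as `function_words`.
    content_counts: raw counts of every token not in the list.
    """
    tokens = text.split()  # naive whitespace tokenization (assumption)
    total = len(tokens)
    counts = Counter(tokens)
    listed = set(function_words)

    # Relative frequencies; an empty text yields an all-zero vector.
    function_vec = [counts[w] / total if total else 0.0 for w in function_words]

    # Everything outside the list is treated as a content word here.
    content_counts = {t: c for t, c in counts.items() if t not in listed}
    return function_vec, content_counts


if __name__ == "__main__":
    vec, content = split_features("من این کتاب را از او گرفتم")
    print(vec)
    print(content)
```

Vectors of this kind, computed per text, could then be passed to a classifier such as a multilayer perceptron, with one model trained on the function-word features and another on the content-word features so that the two can be compared.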
Keywords:
Language:
Persian
Published:
Language and Linguistics, Volume:19 Issue: 37, 2024
Pages:
193 to 220
https://www.magiran.com/p2708094
Other articles by the author(s)
- Exploring a Novel Multi-Channel Structure to Improve Facial Expression Recognition on Occluded Samples Using Deep Convolutional Neural Network
Mohammadhossein Zolfagharnasab, *, Masood Hamed Saghayan, Fatemeh Sadat Masoumi
Journal of Artificial Intelligence, Applications, and Innovations, Spring 2024
- Automatic Recognition of Authors Identity in Persian based on Systemic Functional Grammar
*, , , Shahram Modarres Khiabani
Journal of Western Iranian Languages and Dialects,
- Accusative Case Marker “rā” in the Persian Language: A Corpus-Based Description
A Mirzaei
Namaeh Farhangistan,
- A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features
L. Jafar Tafreshi *, F. Soltanzadeh
Journal of Artificial Intelligence and Data Mining, Spring 2020