A Corpus-based study of Persian noun and adjective homographs to help right POS tagging

Author(s):
Message:
Article Type:
Research/Original Article (دارای رتبه معتبر)
Abstract:
Present research studies morphological structure of nouns and adjectives; there are two main reasons for studying them in the process of making any POS tagger system for tagging nouns: 1. If the system faces an out of vocabulary word (OOV word), one way to identify its tag would be considering its morphological structure. 2. In Persian, lots of homographs are made due to Persian complex morphology; studying morphological structure of nouns in order to distinguish them from adjectives seems to be necessary, since many adjectives, having the same orthographic forms of nouns, would be wrongly tagged as “noun” or vice versa. After studying morphological structure of nouns and adjectives in present study, Persian writing system is studied; then definition of homographs and the related classifications are presented. Finally, the study uses different famous Persian corpora (including Bijankhan, and syntactical dependency corpus (vabastegi ye nahvi) for searching for homographs (using search tools) and Data center for Persian language (Paygah e Dadegan) whose non-tagged file was available (the homographs are searched and tagged manually)) to make a list of homographs. The result of studying the mentioned list showed that the frequency of homographs, especially those which are made due to identical orthographic form of indefinite morpheme, adjective-maker morpheme and second person inflectional morpheme is high is Persian corpora which makes POS tagging difficult.
Language:
Persian
Published:
Journal of Information Processing and Management, Volume:34 Issue: 2, 2019
Pages:
897 to 921
magiran.com/p1950472  
دانلود و مطالعه متن این مقاله با یکی از روشهای زیر امکان پذیر است:
اشتراک شخصی
با عضویت و پرداخت آنلاین حق اشتراک یک‌ساله به مبلغ 1,390,000ريال می‌توانید 70 عنوان مطلب دانلود کنید!
اشتراک سازمانی
به کتابخانه دانشگاه یا محل کار خود پیشنهاد کنید تا اشتراک سازمانی این پایگاه را برای دسترسی نامحدود همه کاربران به متن مطالب تهیه نمایند!
توجه!
  • حق عضویت دریافتی صرف حمایت از نشریات عضو و نگهداری، تکمیل و توسعه مگیران می‌شود.
  • پرداخت حق اشتراک و دانلود مقالات اجازه بازنشر آن در سایر رسانه‌های چاپی و دیجیتال را به کاربر نمی‌دهد.
دسترسی سراسری کاربران دانشگاه پیام نور!
اعضای هیئت علمی و دانشجویان دانشگاه پیام نور در سراسر کشور، در صورت ثبت نام با ایمیل دانشگاهی، تا پایان فروردین ماه 1403 به مقالات سایت دسترسی خواهند داشت!
In order to view content subscription is required

Personal subscription
Subscribe magiran.com for 70 € euros via PayPal and download 70 articles during a year.
Organization subscription
Please contact us to subscribe your university or library for unlimited access!