Corpus-based classification of Persian homographs

Article Type:
Research/Original Article (accredited ranking)
Abstract:
Ambiguity is one of the major challenges in natural language processing, and homograph sense disambiguation is highly important in the computational processing of texts. Languages with complex morphology have many homographs, which are worth studying and classifying. In the present study, Persian homographs were extracted from a corpus in two steps. First, all words carrying more than one POS tag were extracted from an annotated corpus, yielding 10978 words. Then the frequency of each POS tag for every homograph was examined, and a second list was drawn from the first, containing only homographs with a high frequency of the first tag (more than 20) and a considerable frequency of the second tag (more than 10); this list includes 1675 homographs. The morphological, phonological, and semantic structures of the homographs were then studied, and on that basis all homographs were classified into 11 categories. Of these 11 categories, only one was defined by semantic criteria; the rest were defined by morphological and phonological criteria. The output of the study is a large list of homographs extracted from a Persian text corpus, each assigned to one or more categories according to its morphological and phonological characteristics. This list and its categorization could be used in word sense disambiguation systems.
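The two-step extraction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the annotated corpus is available as (word, POS tag) pairs and that "first tag" and "second tag" mean the most frequent and second most frequent tag of a word; the function name, parameter names, and the romanized toy data are hypothetical.

```python
from collections import Counter, defaultdict


def find_homograph_candidates(tagged_tokens, first_min=20, second_min=10):
    """Return (multi_tag, selected) from an iterable of (word, tag) pairs.

    multi_tag: words that occur with more than one POS tag.
    selected:  the subset whose most frequent tag occurs more than
               first_min times and whose second most frequent tag occurs
               more than second_min times (assumed reading of the
               abstract's "first tag" and "second tag").
    """
    tag_counts = defaultdict(Counter)
    for word, tag in tagged_tokens:
        tag_counts[word][tag] += 1

    # Step 1: keep only words seen with at least two different tags.
    multi_tag = {w: c for w, c in tag_counts.items() if len(c) > 1}

    # Step 2: apply the two frequency thresholds (strictly "more than").
    selected = {}
    for word, counts in multi_tag.items():
        ranked = counts.most_common()  # sorted by descending count
        if ranked[0][1] > first_min and ranked[1][1] > second_min:
            selected[word] = ranked
    return multi_tag, selected


# Toy data with invented counts, for illustration only.
toy_corpus = (
    [("shir", "N")] * 25 + [("shir", "ADJ")] * 11   # passes both thresholds
    + [("dar", "P")] * 30 + [("dar", "N")] * 5      # second tag too rare
    + [("mard", "N")] * 20 + [("mard", "V")] * 15   # first tag not above 20
    + [("ketab", "N")] * 40                         # only one tag
)
multi_tag, selected = find_homograph_candidates(toy_corpus)
```

On the toy data, three words carry more than one tag, and only one of them survives the frequency filter. The classification into 11 categories relies on linguistic analysis and is not covered by this sketch.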
Language:
Persian
Published:
Journal of Information Processing and Management, Volume:38 Issue: 3, 2023
Pages:
825 to 900
magiran.com/p2555084  