Capabilities and Limitations of Persian Stemming in Natural Language Processing

Author(s):

Maryam Assadi*, Vida Shaghaghi, Mohsen Kahani

Article Type:

Research/Original Article (accredited rating)

Abstract:

This article reviews stemming techniques for the Persian language, covering structural methods, statistical approaches, and lookup tables. We also examine how Persian stemming could be improved by drawing on theoretical research and experimental results for languages that share Persian's challenges. Based on this analysis, we propose incorporating Byte Pair Encoding (BPE) and Sequence-to-Sequence (Seq2Seq) models into the Persian stemming framework. Both methods suit Persian's complex morphology, extensive integration of loanwords, and script diversity: BPE captures frequent morphemes and handles out-of-vocabulary terms, while Seq2Seq models show promise in learning implicit morphological rules and accommodating linguistic irregularities. Because Persian is a low-resource language that still lacks advanced technological resources, we propose a novel enhancement to Persian stemming that combines BPE and Seq2Seq models in a unified NLP pipeline, offering a promising direction for further research in Persian language processing. By building on linguistic insights, this approach could help bridge the digital language divide for Persian.
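To make the BPE component of the proposal more concrete, here is a minimal sketch of the standard greedy BPE merge-learning procedure. It is a generic illustration, not the authors' implementation: the function names `learn_bpe`, `merge_pair`, and `apply_bpe`, the `</w>` end-of-word marker, and the toy word-frequency table are all hypothetical. It also assumes words are already tokenized and written without the zero-width non-joiner (ZWNJ) that Persian orthography often places before suffixes such as ها; a real pipeline would train on a large corpus and normalize ZWNJ first.

```python
from collections import Counter

Symbols = tuple[str, ...]
END = "</w>"  # hypothetical end-of-word marker, keeps suffix merges word-final


def merge_pair(symbols: Symbols, pair: tuple[str, str]) -> Symbols:
    """Replace every adjacent occurrence of `pair` with one joined symbol."""
    out: list[str] = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_freqs: dict[str, int], num_merges: int) -> list[tuple[str, str]]:
    """Greedily learn up to `num_merges` merge operations from word frequencies."""
    # Start from single characters (Unicode code points) plus the end marker.
    vocab = {tuple(word) + (END,): freq for word, freq in word_freqs.items()}
    merges: list[tuple[str, str]] = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pair_counts: Counter = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # every word has been merged into a single symbol
        # Most frequent pair; ties go to the pair that was counted first.
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        vocab = {merge_pair(symbols, best): freq for symbols, freq in vocab.items()}
    return merges


def apply_bpe(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Segment a (possibly unseen) word by replaying the learned merges in order."""
    symbols: Symbols = tuple(word) + (END,)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    # Hypothetical toy corpus: word -> frequency.
    corpus = {"کتاب": 5, "کتابها": 4, "کتابی": 3, "دفتر": 3, "دفترها": 2}
    merges = learn_bpe(corpus, num_merges=5)
    print(apply_bpe("کتابها", merges))  # ['کتاب', 'ها</w>']
```

With five merges on this toy corpus, the frequent stem کتاب ("book") and the plural suffix ها both emerge as single units, so کتابها segments as کتاب + ها. BPE units are driven by frequency rather than morphology, so on real data they will not always coincide with true stems.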

Keywords:

Morphology, Morphological Analysis, Persian Language, Stemming, Natural Language Processing (NLP), Pre-Processing, Sequence-to-Sequence Model

Language:

Persian

Published:

Journal of Western Iranian Languages and Dialects, Volume 13, Issue 48, 2025

Pages:

1 to 17

https://www.magiran.com/p2837410

Author Affiliation:

Mohsen Kahani (third author): Professor of Computer Engineering, Ferdowsi University, Mashhad, Iran


Journal Information:

Journal of Western Iranian Languages and Dialects (نشریه مطالعات زبان و گویش های غرب ایران)

Approved scientific journal; humanities quarterly

ISSN: 2345-2579

License Holder:

Razi University of Kermanshah

Managing Director:

Dr. Shoja Tafakori Rezaei (شجاع تفکری رضایی)

Editor-in-Chief:

Amer Gheytouri (عامر قیطوری)
