Capabilities and Limitations of Persian Stemming in Natural Language Processing
This article reviews stemming techniques for the Persian language, covering structural methods, statistical approaches, and lookup tables. We also explore how Persian stemming could be improved by drawing on theoretical research and experimental results for languages that share Persian's challenges. Based on this analysis, we propose incorporating Byte Pair Encoding (BPE) and Sequence-to-Sequence (Seq2Seq) models into the Persian stemming framework. The recommendation rests on the particular strengths of these methods in addressing Persian's intricate morphology, extensive loanword integration, and script diversity: BPE excels at capturing frequent morphemes and handling out-of-vocabulary terms, while Seq2Seq models show promise in learning implicit morphological rules and accommodating linguistic idiosyncrasies. Because Persian is a low-resource language in need of advanced technological resources, we put forward a novel enhancement for Persian stemming that combines BPE and Seq2Seq models within a unified NLP pipeline, which we see as a promising path for further research in Persian language processing. By drawing on linguistic insights, this approach has the potential to help bridge the digital language divide for Persian.
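To make the BPE idea concrete, the following is a minimal, character-level sketch of how BPE learns frequent symbol pairs and then segments unseen words. It is a toy illustration under stated assumptions, not the pipeline proposed in the article: the function names (`learn_bpe`, `merge_pair`, `segment`) are hypothetical, and the sample words are Latin transliterations chosen only for readability.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (toy version)."""
    # Each word starts as a tuple of single characters, weighted by its frequency.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Pick the most frequent pair; ties are broken deterministically by the pair itself.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def segment(word, merges):
    """Segment a (possibly unseen) word by applying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    # Assumed toy corpus: transliterated nouns, some carrying the plural suffix "-ha".
    corpus = ["ketabha", "golha", "derakhtha", "ketab"]
    rules = learn_bpe(corpus, num_merges=5)
    print(rules)
    print(segment("medadha", rules))  # a word not in the corpus
```

Because merges are learned from frequency alone, a recurring affix tends to surface as its own unit, and an out-of-vocabulary word still decomposes into known pieces. Turning such pieces into stems would require a further step, such as the Seq2Seq model the abstract mentions.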
Inferring organizational duties from Persian administrative and employment laws using Large Language Models (LLMs) and few-shot learning
Hojjat Hajizadeh Nowkhandan *
Journal of Innovations in Computer Science and Engineering, Winter and Spring 2025
Description-based Post-hoc Explanation for Twitter List Recommendations
Havva Alizadeh Noughabi, Behshid Behkamal *
Journal of Computer and Knowledge Engineering, Summer-Autumn 2024