A Survey of Semantic Search and Retrieval Approaches for Persian and Arabic Texts

Author(s):
Article Type:
Research/Original Article (holds an accredited ranking)
Abstract:
Purpose

In recent decades, web search engines have become one of the most essential tools for accessing information. As the volume of information on the web grows, so does the demand for locating relevant and meaningful information. Traditional search engines typically retrieve results based on keyword matching and on how often similar terms occur in the texts, which often yields undesirable and irrelevant results. These problems are even more pronounced in Persian and Arabic, because the complex grammar of these languages is difficult for machines to process. The aim of this research is to review and present solutions for semantic search and retrieval of Persian and Arabic texts.

Method

This research is a content analysis study, and the library method was used to collect data. The required resources were drawn from various sources, including scientific articles, books, theses, and reports. For collecting Persian articles, sources, and for collecting English articles, sources with publication dates from 2020 onwards were used. The collected data were analyzed with the content analysis method. Using data analysis and interpretation, the results of previous studies were reviewed and evaluated alongside the new findings of the research. This evaluation involved identifying the issues and constraints of current semantic search engines and offering suggestions for improvement.

Findings

Research on semantic search and information retrieval for Persian and Arabic texts employs methods based on semantic text analysis and processing with pre-trained language models, clustering algorithms such as K-Means, and knowledge resources such as knowledge graphs. The dataset, the choice and use of models and algorithms, and the way semantic relationships between words are searched and retrieved all influence a system's performance and accuracy. The reviewed studies show a wide range of methods and algorithms for semantic text search and retrieval, each of which can produce different results. Each method is able to retrieve the meaning of texts, but the methods differ in search accuracy, and some outperform others. The stronger methods achieve their semantic search capabilities with techniques such as topic analysis, neural networks, and vector representations. Even so, the appropriate method should be chosen based on the nature of the problem and the characteristics of the data, since each problem and dataset may have its own requirements. Selecting a suitable method and tuning its parameters is critical for optimal performance.
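
As a rough illustration of the clustering step mentioned above, the following is a minimal K-Means sketch in plain Python. It is not taken from any of the surveyed systems: the function name `kmeans`, the two-dimensional `doc_vectors`, and the choice of the first k points as starting centroids are all assumptions made for the example. In practice the vectors would come from a language model and have many more dimensions.

```python
import math

def kmeans(points, k, iterations=10):
    """Plain K-Means over a list of equal-length numeric vectors."""
    # Assumption for the sketch: the first k points act as starting centroids,
    # which keeps the example deterministic.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

# Hypothetical 2-D "document vectors" standing in for real text embeddings.
doc_vectors = [(0.9, 0.1), (0.1, 0.9), (0.8, 0.2), (0.2, 0.8)]
labels, centroids = kmeans(doc_vectors, k=2)
print(labels)
```

Documents that share a label can then be searched together, so a query only has to be compared against the cluster closest to it.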

Conclusion

Each of the presented methods offers its own solutions for the issues and linguistic characteristics of the two languages, Persian and Arabic. The methods variously utilize pre-trained language models like BERT, clustering algorithms such as K-Means, and knowledge resource-based retrieval systems like knowledge graphs. The presented solutions also rely on specific datasets and resources for training and evaluation, and differences in the dataset and in how the models and algorithms are used and configured are critical. Some methods retrieve information based on meaning and semantic relationships between words, while others use keyword and root-based methods. This variation in the search and retrieval method can affect the system's performance and accuracy. The methods therefore differ in retrieval performance and accuracy, which is attributed to the varied ways in which models, algorithms, and data sources are utilized.
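
The difference between keyword-based and meaning-based retrieval can be sketched as follows. This is an illustrative toy, not a method from the surveyed studies: the English query and documents are used only for readability, and the three-dimensional vectors are hand-made stand-ins for what a pre-trained language model would produce. All names (`keyword_score`, `cosine`, `query_vec`, `doc_vecs`) are hypothetical.

```python
import math

def keyword_score(query, document):
    """Count the query tokens that literally appear in the document."""
    doc_tokens = set(document.lower().split())
    return sum(1 for token in query.lower().split() if token in doc_tokens)

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

query = "car price"
documents = ["automobile cost overview", "price of a rail ticket"]

# Hand-made vectors (an assumption for this sketch); a real system would
# obtain them from a pre-trained language model.
query_vec = [0.9, 0.8, 0.1]
doc_vecs = [[0.8, 0.9, 0.0], [0.1, 0.7, 0.9]]

# Index of the best document under each scoring scheme.
best_by_keyword = max(range(len(documents)),
                      key=lambda i: keyword_score(query, documents[i]))
best_by_meaning = max(range(len(documents)),
                      key=lambda i: cosine(query_vec, doc_vecs[i]))

print(documents[best_by_keyword])
print(documents[best_by_meaning])
```

Keyword scoring favors the second document because it shares the literal token "price", while the vector comparison favors the first, which is closer in meaning despite sharing no words with the query.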

Language:
Persian
Published:
Journal of Sciences and Techniques of Information Management, Volume: 9, Issue: 4, 2023
Pages:
185 to 204
magiran.com/p2708965  