Application of the Neural Network-based Machine Learning Method to Classify Scientific Articles

Article Type:
Research/Original Article (accredited ranking)
Abstract:

Since the early 2000s (the 1380s in the Iranian solar calendar), the rate at which scientific articles are written and published in Iran has grown sharply. As a result, in addition to governmental organizations such as Irandoc and the National Library and Archives of the Islamic Republic of Iran, numerous online systems, such as the General Portal of Humanities, Noormags, Magiran, Elmnet, and Civilica, have begun to manage this knowledge and provide structured archives of scientific documents. Each of these archives offers facilities to its users, one of which is document search. Accurate search can greatly improve the usefulness of these online systems, and to increase the accuracy of search results it is necessary to determine the scientific field of each article. Classifying large volumes of scientific resources into different fields is very time-consuming; automated methods can reduce this burden. The main contribution of this paper is a classification model for Persian scientific articles. Whereas previous studies have mainly applied classification to simple texts, this study uses neural network-based classification models, such as convolutional and perceptron neural networks, together with a contextualized semantic representation, such as ParsBERT, and compares the results with another common vectorization method, namely Word2Vec. To this end, we use data from the General Portal of Humanities, which contains articles across the Humanities, each labeled with its field. One characteristic of neural networks is that they create a set of hidden features from the data in the vector space and use these features to train the model.
According to the experimental results, the Perceptron classifier using the ParsBERT representation achieved the highest performance: a Micro F-score of 74.71% and a Macro F-score of 72.55%.
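To make the two reported metrics concrete, the following is a minimal sketch of how micro- and macro-averaged F-scores can be computed for a single-label, multi-class classifier. The function name `f_scores` and the toy field labels are assumptions made for illustration; they are not taken from the paper or its dataset.

```python
from collections import Counter


def f_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # predicted class p, but it was wrong
            fn[t] += 1          # true class t was missed

    def f1(tp_, fp_, fn_):
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro: pool the counts over all classes, then compute one F1
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    return micro, macro


# Hypothetical toy data (not from the paper)
y_true = ["history", "history", "history", "law", "law", "philosophy"]
y_pred = ["history", "history", "law", "law", "law", "history"]
micro_f1, macro_f1 = f_scores(y_true, y_pred)
print(f"micro F1 = {micro_f1:.4f}, macro F1 = {macro_f1:.4f}")
```

Because the macro average weights every field equally, a rarely occurring field that is classified poorly pulls the Macro F-score down more than the Micro F-score, which is one possible reading of the gap between the two figures reported above.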

Language:
Persian
Published:
Journal of Information Processing and Management, Volume 37, Issue 4, 2022
Pages:
1217 to 1244
magiran.com/p2477020  