A General Investigation on the Combination of Local and Global Feature Selection Methods for Request Identification on Telegram

Article Type:
Research/Original Article (accredited ranking)
Abstract:

The use of messaging services is expanding worldwide with the rapid development of Internet technologies. Telegram is a cloud-based, open-source text messaging service. According to the US Securities and Exchange Commission, statistics from October 2019 onward show that 300 million people worldwide use Telegram each month. Telegram users are concentrated in countries such as Iran, Venezuela, Nigeria, Kenya, Russia, and Ukraine. The messenger has become popular and widely used because it supports various languages and provides diverse services, such as creating groups and channels with large numbers of users and members. Telegram groups hold a large amount of textual data containing hidden knowledge, and extracting this knowledge can be beneficial. The requests in Telegram users' messages are one example of such data. Identifying requests makes it possible to respond to users' needs promptly, which in turn supports business development. The authors identified these requests in a Telegram search engine, the Idekav system of Yazd University, and then created opportunities to earn money by forwarding the requests to business owners who were able to respond to them. Given the high dimensionality of the feature space in textual data, the number of attributes must be reduced through feature selection (FS). In the present study, appropriate features were selected for Persian text classification and request identification. Among the feature selection methods, local and global filter-based methods were chosen. Through a general investigation and combination of the most widely used filter-based FS methods, an optimal subset of important features was obtained.
This hybrid feature selection method increased request identification accuracy, improved the efficiency of Persian text classification, and reduced training time and computation by optimizing feature reduction. Classification accuracy does decrease for some methods; however, the decrease is negligible compared with the amount of feature reduction achieved. Incorporating opinion mining into the analysis of emotions and questions can be a way to identify positive or negative demand in social networks. Therefore, the requests in Persian Telegram messages can be identified using opinion mining research. The experiments in the present article use a dataset called Persian, which is extracted from the Idekav system. Selecting suitable features to increase model accuracy in request identification is an important part of this research. A support vector machine (SVM) was employed to measure accuracy. Given the acceptable results of the SVM, its various kernels were also evaluated. Micro-averaging and macro-averaging criteria were also used for evaluation. Model inputs include many optimal feature subsets. Furthermore, feature selection methods are proposed to produce suitable features for each model in order to increase its accuracy. Then, among all the features investigated, appropriate features are selected for each of the applied feature selection models.
The main innovations of the present study are as follows: (1) using the most common local and global filter-based feature selection methods to find the optimal feature set; (2) using hybrid methods to produce suitable features for accurate predictive models in Persian text classification, and applying them to identify requests in Persian messages on Telegram; (3) selecting suitable features to increase accuracy and reduce computational time for each of the models under consideration, which, in addition to picking an efficient algorithm, aims to provide a method for making more appropriate choices; and (4) evaluating and testing the proposed models on a large set of Persian data and many different features.
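
To make the idea of combining local and global filter-based selection concrete, here is a minimal sketch in Python. It is not the authors' method: the abstract does not name the filters used or how they are combined. The sketch assumes chi-square as the filter, treats the per-class (one-vs-rest) score as the local score, the maximum over classes as the global score, and the union of the top-ranked terms as the hybrid subset. The toy English corpus, the three labels, and every function name are hypothetical.

```python
from collections import Counter

def chi2_scores(docs, labels):
    """Local scores: {class: {term: chi-square}} from binary term presence (one-vs-rest)."""
    n = len(docs)
    doc_sets = [set(d) for d in docs]
    vocab = set().union(*doc_sets)
    scores = {}
    for cls, n_cls in Counter(labels).items():
        scores[cls] = {}
        for term in vocab:
            # 2x2 contingency table for (term present?, document in cls?)
            a = sum(1 for d, y in zip(doc_sets, labels) if term in d and y == cls)
            b = sum(1 for d, y in zip(doc_sets, labels) if term in d and y != cls)
            c = n_cls - a            # term absent, document in cls
            d_ = (n - n_cls) - b     # term absent, document not in cls
            denom = (a + c) * (b + d_) * (a + b) * (c + d_)
            scores[cls][term] = n * (a * d_ - b * c) ** 2 / denom if denom else 0.0
    return scores

def top_k(score_map, k):
    """Return the k best terms; ties are broken alphabetically for determinism."""
    ranked = sorted(score_map.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]

def hybrid_select(docs, labels, k_global, k_local):
    """Assumed hybrid rule: union of the global top-k and each class's local top-k."""
    local = chi2_scores(docs, labels)
    vocab = set().union(*(set(d) for d in docs))
    # Assumed global score: one value per term, the maximum over all classes.
    global_scores = {t: max(local[cls][t] for cls in local) for t in vocab}
    selected = set(top_k(global_scores, k_global))
    for cls in local:
        selected |= set(top_k(local[cls], k_local))
    return selected

# Hypothetical toy corpus (tokenized messages) with hypothetical labels.
DOCS = [
    ["need", "laptop", "buy"],
    ["looking", "for", "apartment", "need"],
    ["selling", "laptop", "cheap"],
    ["selling", "car", "cheap"],
    ["hello", "friends"],
    ["good", "morning", "friends"],
]
LABELS = ["request", "request", "offer", "offer", "other", "other"]

if __name__ == "__main__":
    print(sorted(hybrid_select(DOCS, LABELS, k_global=2, k_local=1)))
```

The union keeps terms that rank highly for a single class even when the global ranking would drop them, which is one plausible reason to combine the two views. The reduced term set could then be used to build the document vectors passed to a classifier such as an SVM.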

Language:
Persian
Published:
Signal and Data Processing, Volume:19 Issue: 2, 2022
Pages:
175 to 196
magiran.com/p2491245  