Using Data Augmentation Techniques for Sentiment Analysis of Users’ Opinions on Reopening of Schools During the Covid-19 Epidemic
Sentiment analysis, also called opinion mining, is a sub-area of natural language processing that aims to classify texts according to the sentiments, beliefs and attitudes expressed in them. In most current research, texts are divided into two categories, "positive" and "negative". However, other category pairs such as "good/bad" and "agree/disagree" are also used, each with its own applications. The purpose of this paper is to analyze the opinions that users expressed on social media about the reopening of schools during the Covid-19 outbreak using supervised machine learning techniques, and to classify them into two categories, "agree" and "disagree". The users' opinions in this paper are in Persian. The lack of sufficient datasets and the low accuracy of natural language processing tools are the most important problems of text processing in Persian. Because of these limitations, using supervised machine learning algorithms and extracting effective features for training classifiers in Persian pose a serious challenge. In this paper, a small dataset of users' opinions about the reopening of schools was first collected and manually labeled. Then, a combined method was used to augment the dataset. In the proposed method, the Persian sentences were first translated into English. Then the nouns, verbs and adjectives of the English sentences were replaced with their synonyms. Next, the English sentences were translated back into Persian. Each new sentence was added to the training set with the class label of its original sentence. As a result, the size of the training set increased by 97 percent. After that, the common pre-processing steps and feature sets used in sentiment analysis of English texts were evaluated for Persian, and the most effective of them were selected.
Given the low accuracy of Persian natural language processing tools, features that depend less on these tools were preferred. Finally, machine learning classifiers were used to determine the agree/disagree class of the user opinions in the test sets. The experimental results indicated that, with the proposed data augmentation method and the selected features, precisions of 81 and 79 percent were obtained for the polarity classification of opinions using the SVM and CNN algorithms, respectively.
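The augmentation procedure described above (translate to English, replace words with synonyms, translate back, keep the original label) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `SYNONYMS`, `replace_synonyms` and `augment` are hypothetical, the translation steps are passed in as placeholder functions, and a small hand-written synonym table stands in for a lexical resource. The sketch also skips part-of-speech tagging, so it replaces any word found in the table rather than only nouns, verbs and adjectives, and it assumes that a new sentence is kept only when it differs from the original.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical toy synonym table (assumption); a real system would use a
# lexical resource and restrict replacement to nouns, verbs and adjectives.
SYNONYMS: Dict[str, List[str]] = {
    "schools": ["academies"],
    "reopen": ["restart"],
    "dangerous": ["risky", "unsafe"],
}

Example = Tuple[str, str]  # (sentence, class label)


def replace_synonyms(sentence: str, rng: random.Random) -> str:
    """Replace every word found in SYNONYMS with a randomly chosen synonym."""
    replaced = []
    for word in sentence.split():
        candidates = SYNONYMS.get(word.lower())
        replaced.append(rng.choice(candidates) if candidates else word)
    return " ".join(replaced)


def augment(
    dataset: List[Example],
    to_english: Callable[[str], str],
    to_persian: Callable[[str], str],
    seed: int = 0,
) -> List[Example]:
    """Return the dataset plus back-translated paraphrases with inherited labels."""
    rng = random.Random(seed)
    augmented = list(dataset)
    for text, label in dataset:
        english = to_english(text)                    # Persian -> English
        paraphrase = replace_synonyms(english, rng)   # synonym replacement
        new_text = to_persian(paraphrase)             # English -> Persian
        # Assumption: keep only sentences that differ from the original.
        if new_text != text:
            augmented.append((new_text, label))
    return augmented


if __name__ == "__main__":
    # Identity functions stand in for the two translation steps (assumption).
    data = [("schools are dangerous", "disagree"), ("fine", "agree")]
    for sentence, label in augment(data, lambda s: s, lambda s: s):
        print(label, "|", sentence)
```

Passing the translators as arguments keeps the sketch independent of any particular machine translation service. In practice, each translation pass can itself introduce paraphrasing, so the sentence returned in Persian may differ from the original even where no synonym was replaced.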
-
Task Placement in Fog Computing Considering User Mobility and Overload
S. Ansari Moghaddam, S. Noferesti *, M. Rajaei
Journal of Electrical Engineering
Combination of Instance Selection and Data Augmentation Techniques for Imbalanced Data Classification
Parastoo Mohaghegh *, Mehri Rajaei
Iranian Journal of Electrical and Computer Engineering