Sentiment analysis: articles in electrical engineering journals
Accurate prediction of stock market trends is crucial for informed investment decisions and effective portfolio management, supporting both wealth creation and risk mitigation. This study proposes a novel approach to stock price prediction that integrates Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks and combines sentiment analysis of social network data with candlestick (price) data. The methodology has two main components: sentiment analysis of social network posts and analysis of candlestick data. Combining candlestick data with information extracted from Twitter allows a more detailed examination of market trends and patterns and, in turn, more accurate stock price predictions. A Random Forest classifier labels each tweet as positive or negative, giving a finer-grained assessment of market sentiment. For price prediction, the CNN extracts short-term features while the LSTM models long-term dependencies; integrating the two networks gives a more comprehensive view of market trends and patterns and leads to more accurate predictions.
Keywords: Stock Price Prediction, Deep Learning, Sentiment Analysis, Long Short-Term Memory, Convolutional Neural Network -
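The stock-prediction abstract above does not say how the daily tweet labels and the candlestick series are aligned before they reach the CNN-LSTM model. The following is a minimal sketch under stated assumptions: one aggregate sentiment score per trading day derived from binary tweet labels, only the closing price taken from each candle, and the next day's close as the target. `daily_sentiment` and `build_windows` are hypothetical helper names, and the network itself is not shown.

```python
from typing import List, Tuple


def daily_sentiment(labels_per_day: List[List[int]]) -> List[float]:
    """Map each day's tweet labels (1 = positive, 0 = negative) to a score in [-1, 1].

    Assumption: days without tweets get a neutral score of 0.0.
    """
    return [2 * sum(day) / len(day) - 1 if day else 0.0 for day in labels_per_day]


def build_windows(
    closes: List[float], sentiment: List[float], window: int
) -> Tuple[List[List[List[float]]], List[float]]:
    """Cut aligned daily series into fixed-length windows.

    Each time step is the feature vector [close, sentiment]; the target of a
    window is the closing price on the day right after it.
    """
    if len(closes) != len(sentiment):
        raise ValueError("closes and sentiment must be aligned day by day")
    inputs, targets = [], []
    for start in range(len(closes) - window):
        inputs.append([[closes[t], sentiment[t]] for t in range(start, start + window)])
        targets.append(closes[start + window])
    return inputs, targets
```

Each window has shape (window, 2), which is the kind of sequence a 1-D convolution followed by an LSTM can consume; richer candle features (open, high, low, volume) would simply widen each time step.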
Journal of Theoretical and Applied Research in Machine Intelligence, Vol. 2, No. 1 (Serial No. 3), Spring and Summer 1403 (2024), pp. 120-130
Depression is one of the most significant and prevalent mental health disorders in today's world. Early detection is critical, and this study aims to identify depression in individuals using information derived from social media. The use of social media for various purposes has grown in recent years, as these platforms provide valuable insights into both individuals and society, and they can be used effectively to detect depression. Researchers have attempted to identify depression from various types of data, such as images, text, and audio, but most studies use only a single type, such as text or images. Although these methods have achieved notable results, their accuracy is limited, a limitation that can be addressed by integrating multiple data modalities into the model.
In this study, we propose a multimodal deep learning model that analyzes text and images together to detect depression, using the BERT text encoder and ResNet for feature extraction. Compared to similar models, our approach improves accuracy by approximately 5%, reaching 89.87%, while using significantly less of the original dataset.
Keywords: Depression Detection, Multi-Modal Data Fusion, Sentiment Analysis, Social Networks
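The depression-detection abstract above does not state how the text and image features are combined. A minimal sketch of one common option, early fusion by concatenating a text embedding (for example from BERT) with an image embedding (for example from ResNet) and scoring the result with a logistic head, is shown below. The function names and the weights are hypothetical; in practice the embeddings come from the encoders and the head is trained.

```python
import math
from typing import List, Sequence


def early_fusion(text_vec: Sequence[float], image_vec: Sequence[float]) -> List[float]:
    """Concatenate a text embedding and an image embedding into one feature vector."""
    return list(text_vec) + list(image_vec)


def depression_probability(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Logistic-regression head over the fused features (weights are hypothetical)."""
    if len(features) != len(weights):
        raise ValueError("feature and weight dimensions differ")
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))
```

Concatenation keeps both modalities intact and lets the classifier learn cross-modal weights; late fusion (averaging per-modality predictions) is a common alternative.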
-
Journal of Electrical and Computer Engineering Innovations, Volume 13, Issue 1, Winter-Spring 2025, pp. 27-42
Background and Objectives: The lack of a suitable tool for analyzing Persian conversational (informal) texts has made various analyses of these texts, including sentiment analysis, difficult. In this research, we tried to make these texts easier for machines to understand by providing PSC (Persian Slang Convertor), a tool that converts conversational texts into formal ones, and, by combining PSC with up-to-date deep learning methods, to improve machine learning of the sentiment of short Persian texts.
Methods: More than 10 million unlabeled texts from various social networks and movie subtitles (as conversational texts) and about 10 million news texts (as formal texts) were used to train the unsupervised models and to implement the formalization tool. A total of 60,000 Instagram user comments labeled positive, negative, or neutral served as supervised data for training the short-text sentiment classification model. Recent methods such as LSTM, CNN, BERT, and ELMo were used, together with deep learning techniques such as learning rate decay, regularization, and dropout. The best accuracy was achieved with LSTM.
Results: Using the formalization tool, 57% of the words in the conversational corpus were converted. Finally, combining the formalizer, a FastText model, and a deep LSTM network yielded an accuracy of 81.91% on the test data.
Conclusion: In this research, models were pre-trained on unlabeled data, and in some cases existing pre-trained models such as ParsBERT were used. A model was then trained on labeled data to classify the sentiment of short Persian texts.
Keywords: Natural Language Processing, Persian Conversational Text, Sentiment Analysis, Deep Learning
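The PSC abstract above reports the share of conversational-corpus words that the formalizer converted. A minimal sketch of dictionary-based formalization and of measuring that share is shown below, assuming a token-to-token lookup table. The two dictionary entries and the names `SLANG_TO_FORMAL` and `formalize` are hypothetical illustrations, not PSC's actual resources or rules.

```python
from typing import Dict, List, Tuple

# Toy slang-to-formal dictionary for illustration only (hypothetical entries;
# the PSC resources themselves are not described in the abstract).
# "\u200c" is the zero-width non-joiner used in formal Persian spelling.
SLANG_TO_FORMAL: Dict[str, str] = {
    "میخوام": "می\u200cخواهم",   # "I want"
    "نمیدونم": "نمی\u200cدانم",  # "I don't know"
}


def formalize(tokens: List[str], mapping: Dict[str, str]) -> Tuple[List[str], float]:
    """Replace known informal tokens and report the fraction of tokens converted."""
    converted = 0
    output = []
    for token in tokens:
        if token in mapping:
            output.append(mapping[token])
            converted += 1
        else:
            output.append(token)
    rate = converted / len(tokens) if tokens else 0.0
    return output, rate
```

A real formalizer would also need rules for productive patterns (verb suffixes, clitics) rather than whole-word lookups, which is one reason a conversion rate well below 100% is plausible.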
-
Nowadays, as people are increasingly willing to shop through online stores and social media, the amount of unstructured data such as text on the internet is growing. Hence, text processing and the development of efficient algorithms for extracting knowledge have drawn researchers' attention to this field. One text processing task is classifying texts into sentiment classes using various algorithms.
In this paper, we propose a framework that classifies comments according to user sentiment using character-level processing. The framework is based on the embeddings-from-language-models architecture and comprises four layers: embedding (mapping characters to a vector space), one-dimensional convolution (extracting a feature vector for each word), mapping, and a recurrent neural network. First, the character-level embedding layer assigns a fixed vector to each character. Next, parallel one-dimensional convolution operators capture the semantic and logical relations among the characters of each word, producing a 128-dimensional vector per word. Finally, two recurrent neural network architectures model the relationships between words and determine the sentiment of each comment. The proposed framework achieves an accuracy of 79.87% and an F-score of 79.90% on comment classification.
Keywords: Natural Language Processing, Sentiment Analysis, Context Based Model, Deep Neural Network, Internet Platforms
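The character-to-word step in the framework above (character embeddings, parallel 1-D convolutions, one vector per word) can be sketched as follows. This is a minimal illustration, assuming ReLU activations and max-over-time pooling; the filter widths, the 8-dimensional embeddings, and the 6-dimensional output are illustrative stand-ins for the paper's 128-dimensional word vectors, the weights are random rather than learned, and the mapping and recurrent layers are omitted. All function names are hypothetical.

```python
import random
from typing import Dict, List, Sequence

Vector = List[float]


def make_char_embeddings(alphabet: str, dim: int, seed: int = 0) -> Dict[str, Vector]:
    """Assign each character a fixed random vector (learned in a real model)."""
    rng = random.Random(seed)
    return {ch: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for ch in alphabet}


def make_filters(widths: Sequence[int], per_width: int, dim: int, seed: int = 1) -> List[List[Vector]]:
    """Create convolution filters; a filter of width w is a w x dim weight matrix."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(w)]
        for w in widths
        for _ in range(per_width)
    ]


def encode_word(word: str, embeddings: Dict[str, Vector], filters: List[List[Vector]]) -> Vector:
    """Slide every filter over the word's characters (1-D convolution), apply ReLU,
    and keep the maximum response per filter (max-over-time pooling)."""
    chars = [embeddings[ch] for ch in word]
    features = []
    for filt in filters:
        width, dim = len(filt), len(filt[0])
        if len(chars) < width:
            features.append(0.0)  # word shorter than the filter: no valid position
            continue
        responses = []
        for start in range(len(chars) - width + 1):
            s = sum(filt[k][d] * chars[start + k][d] for k in range(width) for d in range(dim))
            responses.append(max(s, 0.0))  # ReLU
        features.append(max(responses))
    return features
```

Because every filter contributes exactly one pooled value, each word maps to a fixed-length vector (here 3 widths x 2 filters = 6 values) regardless of its length, which is what allows the recurrent layers to treat words uniformly.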
-
Sentiment analysis, also called opinion mining, is a subfield of natural language processing that aims to classify texts according to the sentiments, beliefs, and attitudes expressed in them. Most current research divides texts into two categories, "positive" and "negative"; however, other category pairs such as "good/bad" and "agree/disagree" are also used, each with its own applications. Supervised learning methods, which train a classifier on a dataset of sentiment-labeled opinions to build a model that can classify new sentences, are among the most successful approaches. The purpose of this paper is to analyze Persian-language opinions posted by social media users about reopening schools during the COVID-19 outbreak using supervised machine learning, and to classify them as "agree" or "disagree". The lack of sufficient datasets and the low accuracy of natural language processing tools are the most important problems in Persian text processing; because of these limitations, applying supervised machine learning algorithms and extracting effective features for training classifiers in Persian are seriously challenging. In this paper, a small dataset of users' opinions about reopening schools was first collected and manually labeled. Then a combined method was used for data augmentation. In the proposed method, Persian sentences were first translated into English; the nouns, verbs, and adjectives of the English sentences were then replaced with their synonyms, and the sentences were translated back into Persian. Each new sentence was added to the training set with the class label of its original sentence, increasing the size of the training set by 97 percent. Next, common preprocessing steps and feature sets used in English sentiment analysis were evaluated on Persian text, and the best of them were selected.
Given the low accuracy of Persian natural language processing tools, we selected features that depend less on these tools. Finally, machine learning classifiers were used to assign the agree/disagree class to the user opinions in the test sets. The experiments showed that with the proposed data augmentation method and the selected features, precision of 81 and 79 percent was obtained for opinion polarity classification using SVM and CNN, respectively.
Keywords: Sentiment Analysis, Opinion Mining, Supervised Learning, Deep Learning, Data Augmentation, Covid-19 -
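The augmentation method in the school-reopening abstract above (translate to English, replace synonyms, translate back, keep the original label) can be sketched as a small pipeline. The translators are passed in as functions because the abstract does not name a translation service; `augment`, `replace_synonyms`, and the synonym table are hypothetical. As a simplification, every word found in the table is replaced, whereas the paper restricts replacement to nouns, verbs, and adjectives, which would require a part-of-speech tagger. Skipping round trips that reproduce the original sentence is also an assumption of this sketch.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]  # (sentence, class label)


def replace_synonyms(sentence: str, synonyms: Dict[str, str]) -> str:
    """Swap every word that has an entry in the synonym table.

    Simplification: no part-of-speech filtering is applied.
    """
    return " ".join(synonyms.get(word, word) for word in sentence.split())


def augment(
    train: List[Example],
    to_english: Callable[[str], str],
    to_persian: Callable[[str], str],
    synonyms: Dict[str, str],
) -> List[Example]:
    """Back-translate each training sentence with synonym replacement in the pivot
    language; the new sentence inherits the original label."""
    augmented = list(train)
    for sentence, label in train:
        pivot = replace_synonyms(to_english(sentence), synonyms)
        new_sentence = to_persian(pivot)
        if new_sentence != sentence:  # assumption: skip unchanged round trips
            augmented.append((new_sentence, label))
    return augmented
```

Passing the translators in keeps the sketch testable offline (identity functions work) and makes it easy to swap in any machine translation backend.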
Transformer-based models have transformed natural language processing, offering strong performance in capturing contextual relationships. This paper studies sentiment analysis of Persian Twitter using state-of-the-art transformer architectures. Through experiments on a dedicated Persian sentiment dataset, we explore how well transformers recognize the nuanced emotions expressed in tweets. The results show that these models are effective at capturing sentiment in Persian and highlight the value of transformer architectures for analyzing Persian social media discourse. We trained several transformer-based deep learning architectures for sentiment analysis on Persian Twitter data and achieved an F-score of 60.37% on the test set.
Keywords: Sentiment Analysis, Persian Language, Deep Learning, Transformers, Social Media Sentiment Analysis -
Sarcasm is a form of speech in which a person expresses an opinion implicitly: a sentence may seem positive while the speaker actually holds the opposite opinion. In spoken language, sarcasm can be recognized from body language and tone of voice, but these cues are absent in text, which makes sarcasm hard to detect there. In recent years, Twitter has become a popular platform for sharing opinions and viewpoints, and people commonly use sarcasm on it as an indirect way of expressing their opinions. Because sarcasm makes sentiment harder to recognize, methods for detecting it are needed. This study proposes a deep learning approach to detecting sarcasm on Twitter, using two Twitter datasets, one balanced and one imbalanced. The main idea is to use additional features, such as sentiment, subjectivity, the number of hashtags, and punctuation, alongside the features that deep learning algorithms extract automatically; the impact of each feature is reported in the paper. A GRU-Capsule neural network is used. According to the results, the proposed model improves accuracy by 5% on the balanced data and by 2% on the imbalanced data.
Keywords: Sarcasm Detection, Deep Learning, Sentiment Analysis -
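The sarcasm abstract above augments learned features with handcrafted ones (sentiment, subjectivity, hashtag count, punctuation). A minimal sketch of extracting the countable features and appending them to a deep feature vector is shown below. The sentiment and subjectivity scores are taken as inputs because the abstract does not say which tool produced them, and the function names are hypothetical.

```python
import string
from typing import List, Sequence


def handcrafted_features(tweet: str, sentiment: float, subjectivity: float) -> List[float]:
    """Extra features for one tweet.

    `sentiment` and `subjectivity` are assumed to come from an external scorer.
    Note that '#' is in string.punctuation, so hashtag symbols are also
    counted as punctuation here.
    """
    hashtags = sum(1 for tok in tweet.split() if tok.startswith("#") and len(tok) > 1)
    punctuation = sum(1 for ch in tweet if ch in string.punctuation)
    return [sentiment, subjectivity, float(hashtags), float(punctuation)]


def combine(deep_features: Sequence[float], extra: Sequence[float]) -> List[float]:
    """Append the handcrafted features to the automatically learned ones."""
    return list(deep_features) + list(extra)
```

In a full model the combined vector would typically be concatenated just before the final classification layer, after the GRU-Capsule encoder.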
The ParsiAzma challenges in 2023 focused on improving Persian text analysis in social media. We designed four shared tasks on social media posts: stance detection, sentiment analysis, emotion detection, and claim detection. The goal was to bring together teams to develop the best models for these tasks and to establish a standard test platform for future Persian language research. A total of 28 teams participated. The most effective models in all shared tasks used BERT: text embeddings were first obtained from a BERT-based model, and final predictions were made with either a multilayer perceptron (MLP) or a convolutional neural network (CNN). Several meta-classifiers were also developed as fusion models to combine the strengths of individual models. The best accuracies for the four tasks (stance detection, sentiment analysis, emotion detection, and claim detection) were 0.67, 0.67, 0.45, and 0.56, respectively. Emotion detection thus had lower accuracy than the other three tasks, reflecting its greater complexity.
Keywords: Stance Detection, Claim Detection, Sentiment Analysis, Emotion Detection, Social Media, Competition -
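The ParsiAzma abstract above mentions meta-classifiers that fuse individual models but does not describe them. One of the simplest fusion schemes, weighted soft voting over class probabilities, is sketched below as an illustration; a learned stacking meta-classifier is a common alternative. `soft_vote` is a hypothetical name.

```python
from typing import List, Optional, Sequence

Probs = Sequence[Sequence[float]]  # [example][class] -> probability


def soft_vote(model_probs: Sequence[Probs], weights: Optional[Sequence[float]] = None) -> List[int]:
    """Fuse several classifiers by (weighted) averaging of their class probabilities."""
    n_models = len(model_probs)
    weights = list(weights) if weights is not None else [1.0] * n_models
    total_weight = sum(weights)
    predictions = []
    for i in range(len(model_probs[0])):
        n_classes = len(model_probs[0][i])
        averaged = [
            sum(weights[m] * model_probs[m][i][c] for m in range(n_models)) / total_weight
            for c in range(n_classes)
        ]
        # predict the class with the highest averaged probability
        predictions.append(max(range(n_classes), key=averaged.__getitem__))
    return predictions
```

Soft voting lets a confident model outweigh an uncertain one; the per-model weights could, for example, be set from validation accuracy.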
BERT-based models have gained popularity for addressing various NLP tasks, yet the optimal utilization of knowledge embedded in distinct layers of BERT remains an open question. In this paper, we introduce and compare diverse architectures that integrate the hidden layers of BERT for text classification tasks, with a specific focus on Persian social media. We conduct sentiment analysis and stance detection on Persian tweet datasets. This work represents the first investigation into the impact of various neural network architectures on combinations of BERT hidden layers for Persian text classification. The experimental results demonstrate that our proposed approaches can outperform the vanilla BERT that utilizes an MLP classifier on top of the corresponding output of the CLS token in terms of performance and generalization.
Keywords: BERT, Persian Text Classification, Social Media, Sentiment Analysis, Stance Detection, CNN, LSTM -
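The BERT-layer abstract above compares architectures that combine BERT's hidden layers instead of using only the final [CLS] output. Two simple combinations, concatenating or averaging the [CLS] vectors of the last k layers, are sketched below on plain lists. The input is assumed to be the per-layer [CLS] states (with Hugging Face Transformers, passing `output_hidden_states=True` returns the embedding output plus one hidden state per layer). `combine_cls_layers` is a hypothetical name, and the CNN/LSTM combiners studied in the paper are not shown.

```python
from typing import List, Sequence


def combine_cls_layers(
    layer_cls: Sequence[Sequence[float]], last_k: int = 4, mode: str = "concat"
) -> List[float]:
    """Combine the [CLS] vectors of the last `last_k` layers.

    layer_cls[l] is the [CLS] hidden state of layer l (embedding output first,
    final layer last).
    """
    chosen = layer_cls[-last_k:]
    if mode == "concat":
        return [x for vec in chosen for x in vec]
    if mode == "mean":
        dim = len(chosen[0])
        return [sum(vec[d] for vec in chosen) / len(chosen) for d in range(dim)]
    raise ValueError(f"unknown mode: {mode}")
```

Concatenation preserves layer-specific information at the cost of a k-times wider classifier input, while averaging keeps the original hidden size.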
In recent years, the massive growth of user-generated content on social networks and online shopping sites has allowed people to share their feelings and opinions about a variety of products and services. Sentiment analysis, which uses natural language processing (NLP), computational methods, and text analysis to extract the polarity of unstructured documents, is an important aid to better decision-making. The complexity of human languages and of sentiment itself makes this a challenging research area in computer science and computational linguistics. Many researchers have used supervised machine learning algorithms such as Naïve Bayes (NB), Stochastic Gradient Descent (SGD), Support Vector Machine (SVM), Logistic Regression (LR), and Random Forest (RF), and deep learning algorithms such as Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM); others have used dictionary-based methods. Despite effective text mining techniques, unresolved challenges remain. Because user comments are unstructured, structuring them usually involves parsing, adding features and linguistic interpretations, removing extraneous items, and storing the resulting terms in a database; patterns are then extracted from the structured data, and finally the outputs are evaluated and interpreted. Class imbalance, a difference in the number of samples per class, is an important challenge in the learning phase: it degrades classifier performance because the model does not learn the features of the minority classes well. In this paper, words are weighted according to a purpose-built dictionary so that the most important words have greater influence on the opinion mining result.
In addition, combining adjacent words with n-gram methods improves the outcome. Dictionaries are highly domain-dependent: some words that are important in one application carry little weight in mobile phone comments. Another challenge is imbalanced training data, in which the number of positive sentences differs from the number of negative sentences. This paper applies two ideas to build an efficient opinion mining algorithm: first, a precise dictionary for Persian mobile phone comments, and second, balancing the positive and negative comments in the training data. In summary, the main contributions of this research are: a weighted, comprehensive dictionary of mobile phone opinions that increases the accuracy of opinion analysis; balancing positive and negative opinions to improve that accuracy; eliminating the negative effect of overfitting; and a precise approach to determining the polarity of users' opinions about mobile phones using machine learning and recurrent deep learning algorithms. The method is applied to mobile phone product comments from the Digikala website and to the Senti-Pers dataset. Results are reported for Naive Bayes, Support Vector Machine, Stochastic Gradient Descent, Logistic Regression, Random Forest, and the deep learning methods Convolutional Neural Network and Long Short-Term Memory, using accuracy, precision, recall, and F-measure. On Digikala, the proposed method increases accuracy by 10% to 34% with NB, 5% to 24% with SVM, 7% to 38% with SGD, 5% to 38% with LR, 4% to 22% with RF, and 4% with CNN. On Senti-Pers, it increases accuracy by 12% to 46% with NB, 5% to 46% with SVM, 5% to 35% with SGD, 6% to 46% with LR, and 4% to 46% with RF.
Keywords: Sentiment Analysis, Opinion Mining, Machine Learning, Deep Learning, Polarity -
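The two ideas in the mobile-phone opinion abstract above, a weighted dictionary applied to words and adjacent-word n-grams and balanced positive/negative training data, can be sketched as follows. The paper's dictionary and weights are not given, so the lexicon used with this code is hypothetical, and random oversampling is only one balancing option; the abstract does not say which technique was used. All names are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Example = Tuple[str, str]  # (comment, label)


def ngrams(tokens: List[str], n: int) -> List[str]:
    """All contiguous n-token sequences, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def polarity_score(text: str, lexicon: Dict[str, float], max_n: int = 2) -> float:
    """Sum the weights of all 1..max_n-grams that appear in the weighted lexicon."""
    tokens = text.split()
    return sum(lexicon.get(g, 0.0) for n in range(1, max_n + 1) for g in ngrams(tokens, n))


def oversample(data: List[Example], seed: int = 0) -> List[Example]:
    """Randomly duplicate minority-class examples until all classes are equal in size."""
    rng = random.Random(seed)
    by_label: Dict[str, List[Example]] = {}
    for example in data:
        by_label.setdefault(example[1], []).append(example)
    target = max(len(items) for items in by_label.values())
    balanced: List[Example] = []
    for items in by_label.values():
        balanced.extend(items)
        balanced.extend(rng.choice(items) for _ in range(target - len(items)))
    return balanced
```

Scoring bigrams lets a negated phrase such as "not good" carry its own (negative) weight, which a unigram-only dictionary would miss.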
Text analysis for identifying users' sentiments has been a focus of recent research. Most studies on Persian have identified only positive and negative polarity, and little research has analyzed emotions in Persian sentences across the primary emotional states. In this study, we first prepared JAMFA, a dataset of emotional sentences labeled with six basic emotional states, containing 2,350 sentences (31,222 words). This paper presents two models, efficient BERT-BiLSTM (EBB) and XLM-R CatBoost (XLM-RC), that improve the performance of Persian text emotion classification. The study combines the advantages of human judgment and statistical approaches to achieve more accurate sentence labeling. The evaluation shows a labeling accuracy of 92% and a dataset reliability of 88% with respect to emotion type. At best, the models achieved 86% accuracy in basic emotion classification and an 81% F-score in binary classification.
Keywords: Sentiment Analysis, Annotated Corpora, Basic Emotions, Deep Learning, Emotion Detection
-
With pervasive internet use and the dominance of social networks, researchers face significant challenges in Persian text mining, including the scarcity of adequate Persian datasets and the inefficiency of existing language models. This paper addresses these challenges with the aim of improving the efficiency of language models for Persian. To improve sentiment analysis, our approach uses an aspect-based method built on the ParsBERT model and augmented with a relevant lexicon. The study focuses on sentiment analysis of user opinions collected from the Persian website Digikala. The experimental results show the proposed method's stronger semantic capabilities and its efficiency, with an accuracy of 88.2% and an F1 score of 61.7. Improving language models matters here because they are central to extracting nuanced sentiment from user-generated content, and more accurate and efficient models advance sentiment analysis in Persian text mining.
Keywords: Opinion Mining, Sentiment Analysis, aspect-based sentiment analysis, lexical semantic disambiguation, WordNet -
Journal of Electrical and Computer Engineering Innovations, Volume 12, Issue 1, Winter-Spring 2024, pp. 57-68
Background and Objectives: Twitter is a microblogging platform for expressing assessments, opinions, and sentiments on different topics and events. While several studies have addressed sentiment analysis of tweets and their popularity as measured by the number of retweets, predicting the sentiment of first-order replies has remained a neglected challenge, although it is useful for both users and enterprises. In this study, we define a novel problem: given only a tweet's text, predict the overall sentiment polarity of its upcoming replies.
Methods: To address this problem, we propose a graph convolutional neural network model that exploits the text's dependencies. The model contains two parallel branches. The first branch extracts a contextual representation of the input tweet, using a Bi-LSTM network and a self-attention layer to capture syntactic relations. The second branch extracts structural and semantic information, using an affective-knowledge-enhanced dependency tree to capture semantic relations. A graph convolutional network on top of the two branches learns a joint feature representation. Finally, a retrieval-based attention mechanism applied to the output of the graph convolutional network learns the essential features from the final affective picture of the tweet.
Results: In the experiments, only the original tweets of the RETWEET dataset were used to train the models; the replies were ignored during training. The results on three versions of the RETWEET dataset show that the proposed model outperforms LSTM-based models and similar state-of-the-art graph convolutional network models.
Conclusion: The results confirm that the overall sentiment of a tweet's replies can be predicted from the tweet's content alone. The proposed model achieves results similar or comparable to simpler deep models when trained on a public tweet dataset such as the ACL 2014 dataset, while outperforming both simple deep models and state-of-the-art graph convolutional models when trained on the RETWEET dataset. This shows the model's effectiveness in extracting structural and semantic relations in tweets.
Keywords: Sentiment Analysis, Deep Learning, Social Media, Twitter, Graph Convolutional Neural Networks
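The reply-sentiment model above applies a graph convolutional network over a tweet's dependency tree. A minimal sketch of the standard GCN propagation rule, ReLU(Â H W) with Â = D^{-1/2}(A + I)D^{-1/2}, is shown below on plain lists. The edge list would come from a dependency parser; the paper's affective-knowledge weighting of edges, the Bi-LSTM branch, and the attention layers are not shown, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def normalized_adjacency(n: int, edges: Sequence[Tuple[int, int]]) -> Matrix:
    """Build D^{-1/2} (A + I) D^{-1/2} for a dependency tree with n tokens."""
    a = [[0.0] * n for _ in range(n)]
    for head, dep in edges:
        a[head][dep] = a[dep][head] = 1.0  # treat dependency arcs as undirected
    for i in range(n):
        a[i][i] = 1.0  # self-loop so each token keeps its own features
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain matrix product x @ y."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))] for i in range(len(x))]


def gcn_layer(a_hat: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One graph convolution: ReLU(A_hat @ H @ W)."""
    return [[max(v, 0.0) for v in row] for row in matmul(matmul(a_hat, h), w)]
```

Each layer mixes every token's features with those of its syntactic neighbors, so stacking two or three layers lets information flow along short dependency paths.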
-
With the significant growth of social media, individuals and organizations increasingly use public opinion expressed in these media to make decisions. The purpose of sentiment analysis is to automatically extract people's emotions from social networks. Social networks related to financial markets, including stock markets, have recently attracted the attention of many individuals and organizations. Users of these networks share their opinions about each stock in the form of posts or tweets, so sentiment analysis in this area measures people's attitudes toward each stock. Lexicon-based methods are one of the basic approaches to automatic sentiment analysis, but most conventional lexicons are built manually, which is a difficult and costly process. In this article, we propose a new method for automatically extracting a lexicon for stock market social networks. A special feature of these networks is that daily price information is available for each stock. Taking into account the stock's price on the day a post was published, we extracted a lexicon to improve the quality of opinion mining in these networks. To evaluate it, we compared the resulting lexicon with the Persian version of the SentiStrength lexicon, which is designed for general-purpose use. Experimental results show a 20% improvement in accuracy compared with using the general lexicon.
Keywords: Sentiment Analysis, Opinion Mining, Lexicon Creation, Persian Lexicon -
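The stock-lexicon abstract above builds a lexicon from the stock's price movement on the day each post was published, but does not give the scoring formula. A minimal sketch under an assumed rule is shown below: each word is scored by how often it appears in posts on up days versus down days, normalized to [-1, 1], with flat days ignored and rare words filtered out. `build_price_lexicon` and `min_count` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def build_price_lexicon(posts: Iterable[Tuple[str, float]], min_count: int = 2) -> Dict[str, float]:
    """posts: (text, price change of the discussed stock on the posting day).

    Assumed score: (up_count - down_count) / (up_count + down_count), in [-1, 1].
    """
    up, down = Counter(), Counter()
    for text, change in posts:
        words = set(text.split())  # count each word at most once per post
        if change > 0:
            up.update(words)
        elif change < 0:
            down.update(words)
        # days with no price change are ignored
    lexicon = {}
    for word in set(up) | set(down):
        total = up[word] + down[word]
        if total >= min_count:  # drop words seen too rarely to score reliably
            lexicon[word] = (up[word] - down[word]) / total
    return lexicon
```

Using price movement as a weak label avoids manual annotation, at the cost of noise from posts that do not actually comment on price direction.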
Humor is a linguistic device that can make people laugh, and when used to express opinions it can change a phrase's polarity. Humorous sentences that convey ideas and criticism, often in informal language, appear on social media platforms such as Twitter in almost every domain, and Persian speakers likewise express their opinions through humorous tweets. As one of the early efforts to detect humor in Persian, this research proposes a model that fine-tunes a transformer-based language model on a Persian humor detection dataset. The proposed model achieves an accuracy of 84.7% on the test set. This research also introduces a dataset of 14,946 automatically labeled tweets for Persian humor detection.
Keywords: Humor Detection, Sentiment Analysis, Natural Language Processing, Deep learning, Persian language -
During the coronavirus crisis, we face a wide range of thoughts, feelings, attitudes, and behaviors on social media, and processing this data is necessary for a comprehensive understanding of society's attitudes. The data contains valuable information that people and administrators can use to respond to the crisis. The goal of this study is to identify the characteristics of messages that lead to different emotional polarities. We investigate Persian posts by Twitter, Instagram, and Telegram users and news related to the COVID-19 pandemic in Iran, shared in Iran between January 21 and April 29, 2020. The dataset and its labels were published by the Cognitive Sciences and Technologies Council (CSTC) of Iran. The content of each post was preprocessed by removing stop words, normalizing words, tokenizing, and stemming. The emotion labels were based on Plutchik's model and included joy, trust, fear, surprise, sadness, anticipation, anger, disgust, stress, and other emotions. Clustering algorithms were used to analyze the posts. We applied a two-stage clustering method that combines a self-organizing map (SOM) neural network with the K-means algorithm: the data were first clustered with the SOM, whose results provided the initial cluster centers for K-means. The implementation used Python 3.7 and MATLAB R2015a; the Hazm toolkit was used for preprocessing, and clustering was done in MATLAB. The Davies-Bouldin index was used to find the optimal number of clusters; it was computed for 2 to 50 clusters with the two-stage method, and the optimal number of clusters was ten.
Analysis of the results showed that posts related to health and culture with negative polarity led to negative emotions such as fear, hatred, sadness, and anger. Messages about people's emotional and improper behavior led to feelings of sadness, fear, and stress, and reduced hope in society. The results revealed a strong correlation between anger and disgust, as well as a positive correlation among fear, stress, and sadness. To reduce negative feelings and build trust in the authorities, we suggest providing clear information about the COVID-19 pandemic.
Keywords: COVID-19, Social media, Sentiment analysis, Clustering
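The COVID-19 clustering abstract above picks the number of clusters with the Davies-Bouldin index. A minimal sketch of that criterion follows; the function names and the `cluster_fn` callable passed to `select_k` are hypothetical, and any clustering routine that returns one label per point could be plugged in. The index averages, over clusters, each cluster's worst-case ratio of combined scatter to centroid separation, so lower is better.

```python
# Hypothetical sketch of the Davies-Bouldin index and of choosing k by minimising it.
import math


def davies_bouldin(data, labels):
    """Davies-Bouldin index of a labelled clustering; lower values mean
    more compact, better separated clusters."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("the index needs at least two clusters")
    centroid, scatter = {}, {}
    for c in clusters:
        members = [p for p, lab in zip(data, labels) if lab == c]
        centroid[c] = [sum(col) / len(members) for col in zip(*members)]
        # S_i: mean Euclidean distance from the cluster's points to its centroid
        scatter[c] = sum(math.dist(p, centroid[c]) for p in members) / len(members)
    # For each cluster i, keep its worst-case R_ij = (S_i + S_j) / d(c_i, c_j)
    worst = [
        max((scatter[i] + scatter[j]) / math.dist(centroid[i], centroid[j])
            for j in clusters if j != i)
        for i in clusters
    ]
    return sum(worst) / len(clusters)


def select_k(data, cluster_fn, k_values):
    """Run `cluster_fn(data, k)` (hypothetical; returns one label per point) for
    each k and keep the k with the lowest Davies-Bouldin index."""
    scores = {k: davies_bouldin(data, cluster_fn(data, k)) for k in k_values}
    return min(scores, key=scores.get), scores
```

Mirroring the sweep described in the abstract would mean calling `select_k(data, cluster_fn, range(2, 51))`. The sketch assumes no two centroids coincide, since that would make a denominator zero.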
Today, the Internet, and especially social networks such as Twitter, Facebook, and Telegram, has become a platform for exchanging ideas and sharing user opinions. Analyzing sentiment based on user opinions in these networks can greatly help explain and predict social phenomena and find suitable products or services for individuals, companies, and organizations. Much research has been done on English social network data, but only limited research exists for Persian. This paper proposes a sentiment analysis system for Persian Telegram data. To this end, several feature extraction methods, namely count vectors, term frequency-inverse document frequency (TF-IDF), and a word embedding matrix, are examined for converting textual data into numerical form. Then, to classify the data, the study examines classical machine learning methods (support vector machine, decision tree, K-nearest neighbors, naive Bayes, and logistic regression), combinations of these classical methods, and deep learning methods (deep neural network, convolutional neural network, and unidirectional and bidirectional long short-term memory networks). Finally, evaluation on data collected from Persian Telegram shows that the best performance is obtained with word-embedding-matrix features and a bidirectional long short-term memory network, with an accuracy of 90.67%, precision of 90.01%, recall of 89.54%, and F-measure of 89.77%.
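As a quick worked check, the F1 score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). With the precision (90.01%) and recall (89.54%) reported for the word-embedding plus bidirectional LSTM configuration, 2 × 90.01 × 89.54 / (90.01 + 89.54) ≈ 89.77, which matches the reported F-measure. The helper names below are illustrative only; if the paper averaged metrics per class (macro-averaging), this identity would not be guaranteed to hold exactly, and the abstract does not say which averaging was used.

```python
# Sketch: F1 as the harmonic mean of precision and recall, checked against the
# percentages reported for the word-embedding + BiLSTM model.


def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive and false-negative counts."""
    return tp / (tp + fp), tp / (tp + fn)


def f1_score(precision, recall):
    """F1 = 2PR / (P + R); defined as 0 when both inputs are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


reported_f1 = f1_score(90.01, 89.54)  # precision and recall in percent
print(round(reported_f1, 2))  # 89.77
```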
Today, the Internet, especially social networks such as Twitter, Facebook, and Telegram, has become a platform for exchanging ideas and sharing user opinions. Sentiment analysis based on user opinions in these networks can help explain and predict social phenomena and find suitable products or services for individuals, companies, and organizations. Much research has been done on English social media data, but only limited research exists for Persian. This paper proposes a sentiment analysis system for Persian Telegram data. For this purpose, several feature extraction methods, including CountVectorizer, TF-IDF, and a word embedding matrix, have been studied to represent textual data numerically. Then, to classify the data, classical machine learning methods (support vector machine, decision tree, K-nearest neighbors, naive Bayes, and logistic regression), combinations of these classical methods, and deep learning methods (deep neural network (DNN), convolutional neural network (CNN), long short-term memory network, and bidirectional long short-term memory network) have been investigated. Finally, evaluation on data collected from Persian Telegram shows that the best performance is obtained with word embeddings and a bidirectional long short-term memory network, with an accuracy of 90.67%, precision of 90.01%, recall of 89.54%, and F1 of 89.77%.
Keywords: Sentiment Analysis, Telegram Messages, Machine Learning, Deep Learning, SVM
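Two of the feature extractors named in the Telegram study, count vectors and TF-IDF, can be sketched from scratch as below. This is an illustration, not the authors' pipeline: messages are assumed to be already tokenized (Persian normalization and tokenization are not shown), and the idf formula ln((1 + n) / (1 + df)) + 1 with L2-normalized rows is the smoothed variant that scikit-learn's `TfidfVectorizer` uses by default, chosen here as an assumption because the abstract does not give the exact weighting.

```python
# Hypothetical sketch of two feature extractors named in the abstract: count vectors
# and TF-IDF. Messages are assumed to be already tokenised.
import math
from collections import Counter


def count_vectors(docs):
    """Bag-of-words count vectors over a sorted vocabulary."""
    vocab = sorted({tok for doc in docs for tok in doc})
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[t] for t in vocab])
    return vocab, rows


def tfidf_vectors(docs):
    """TF-IDF with raw counts, smoothed idf = ln((1 + n) / (1 + df)) + 1, and
    L2-normalised rows."""
    vocab, rows = count_vectors(docs)
    n = len(docs)
    df = [sum(1 for row in rows if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n) / (1 + d)) + 1 for d in df]
    out = []
    for row in rows:
        v = [c * w for c, w in zip(row, idf)]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0  # avoid dividing by zero for empty docs
        out.append([x / norm for x in v])
    return vocab, out
```

A word that appears in every message (df = n) gets the minimum idf of 1, so frequent but uninformative tokens are down-weighted relative to the raw counts.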
Today, owing to the huge volume of opinions people publish online, sentiment analysis plays a key role in information extraction. According to previous studies, one of the newer techniques for determining sentence polarity more accurately in sentiment analysis is based on deep learning algorithms. This research uses the LSTM and RNN deep learning algorithms to determine the polarity of textual opinions, comparing the two in order to select the more suitable algorithm for sentiment analysis. In the proposed method, pre-trained Word2vec word embeddings are also used to capture semantic relationships between words and thereby increase accuracy. The method was evaluated on two datasets, airline-tweet and IMDB. The results show that, with Word2vec embeddings, the proposed method achieves an accuracy of 0.78 on the airline-tweet dataset and 0.84 on the IMDB dataset.
Today, due to the large volume of opinions people publish online, sentiment analysis plays a key role in information extraction. According to previous studies, deep learning algorithms are among the newer techniques for determining sentence polarity more precisely in sentiment analysis. In this research, two deep learning algorithms, RNN and LSTM, were used to determine sentence polarity in order to achieve more accurate results. In addition, pre-trained Word2vec word embeddings were used to capture semantic relationships between words and increase the accuracy of the proposed method. The method was evaluated on two datasets, airline-tweet and IMDB. The results show an accuracy of 0.78 on the airline-tweet dataset and 0.84 on the IMDB dataset.
Keywords: Sentiment Analysis, Deep Learning, RNN, LSTM, Word Embedding, Word2vec
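The difference between the two recurrent models compared in this study can be seen in a single forward step. The sketch below runs a vanilla RNN and an LSTM over a sentence of word vectors and reads out a polarity probability. It is illustrative only: the weights are random and untrained, the word vectors are random stand-ins for pre-trained Word2vec embeddings, and all function names are hypothetical. The LSTM's forget gate `f` and input gate `i` control how much of the cell memory `c` is kept or overwritten at each step, which is what lets it retain information over longer spans than the plain `tanh` recurrence.

```python
# Hypothetical sketch: forward pass of a vanilla RNN and an LSTM over a sentence of
# word vectors (random stand-ins for pre-trained Word2vec embeddings). Untrained weights.
import math
import random


def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def vadd(*vs):
    return [sum(t) for t in zip(*vs)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_params(in_dim, hid, gates, rng):
    """Input weights W, recurrent weights U and bias b for each gate name."""
    p = {}
    for g in gates:
        p["W" + g] = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(hid)]
        p["U" + g] = [[rng.uniform(-0.1, 0.1) for _ in range(hid)] for _ in range(hid)]
        p["b" + g] = [0.0] * hid
    return p


def rnn_step(x, h, p):
    """Vanilla RNN: h' = tanh(W x + U h + b)."""
    return [math.tanh(z) for z in vadd(matvec(p["W"], x), matvec(p["U"], h), p["b"])]


def lstm_step(x, h, c, p):
    """LSTM: input (i), forget (f) and output (o) gates plus a candidate cell value (g)."""
    pre = {k: vadd(matvec(p["W" + k], x), matvec(p["U" + k], h), p["b" + k]) for k in "ifog"}
    i = [sigmoid(z) for z in pre["i"]]
    f = [sigmoid(z) for z in pre["f"]]
    o = [sigmoid(z) for z in pre["o"]]
    cand = [math.tanh(z) for z in pre["g"]]
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, cand)]  # gated memory update
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def encode(sentence, hid, p, cell):
    """Run the chosen cell ("rnn" or "lstm") over the sentence; return the final hidden state."""
    h, c = [0.0] * hid, [0.0] * hid
    for x in sentence:
        if cell == "lstm":
            h, c = lstm_step(x, h, c, p)
        else:
            h = rnn_step(x, h, p)
    return h


def polarity(h, v, b=0.0):
    """Logistic read-out: probability that the sentence is positive."""
    return sigmoid(sum(vi * hi for vi, hi in zip(v, h)) + b)
```

Usage: build parameters with `init_params(emb_dim, hid, "ifog", rng)` for the LSTM or `init_params(emb_dim, hid, [""], rng)` for the RNN, then call `encode` and `polarity`. Training (backpropagation through time) is deliberately left out.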
With the remarkable growth of social media such as Twitter and the increase in user comments on e-commerce and news websites, individuals and organizations increasingly rely on opinions in these media for decision-making. Sentiment analysis is one method for analyzing user opinions that has received attention in recent years. Sentiment analysis in each language has its own requirements, and directly applying methods, tools, and resources developed for English to Persian faces limitations. Persian texts have particular characteristics that call for sentiment analysis methods different from those used for English. This paper reviews and compares sentiment analysis research conducted on Persian texts. First, sentiment analysis approaches, tasks, and levels are described. Next, the methods used for Persian sentiment analysis tasks are surveyed, and the position of existing Persian work is clarified. The data resources created and published for Persian sentiment analysis are also introduced. Finally, based on recent advances in sentiment analysis, issues and challenges not yet addressed for Persian are listed, and a roadmap for future research on Persian text processing is presented.
With the explosive growth of social media such as Twitter and Instagram, reviews on e-commerce websites, and comments on news websites, individuals and organizations increasingly analyze opinions in these media for decision-making and strategy design. Sentiment analysis is one of the techniques used in recent years to analyze users' opinions. The Persian language has specific features and therefore requires sentiment analysis methods and models that differ from those for English and other languages. This paper identifies the characteristics and limitations of the Persian language. Sentiment analysis in each language has its own prerequisites; hence, directly using methods, tools, and resources developed for English in Persian has limitations. The present study investigates and compares previous sentiment analysis studies on Persian texts and describes the views presented in articles published in the last decade. First, sentiment analysis levels, approaches, and tasks are described. Then, a detailed survey of the sentiment analysis methods applied to Persian texts is presented, previous work in this field is discussed, and the advantages and disadvantages of each proposed method are shown. Moreover, the publicly available Persian sentiment analysis resources are studied, and the characteristics and differences of each are highlighted. Finally, in light of recent developments in the field, issues and challenges not yet addressed for Persian texts are listed, guidelines are provided for future research, and future requirements for improving Persian sentiment analysis systems are detailed.
Keywords: Sentiment Analysis, Opinion Mining, Sentiment Classification, Sentiment Data Resource, Persian Language
One of the most important types of textual data on the web is people's sentiments and views about a particular topic or concept. However, finding and monitoring the websites containing these sentiments and extracting the required information from them is difficult because of the proliferation of diverse websites. Accordingly, developing automatic sentiment analysis systems that can extract opinions and express the associated line of thought has attracted much attention in recent years, and deep learning methods are among the approaches that have achieved remarkable results in various natural language processing applications, especially sentiment analysis. Despite their notable performance, however, these methods still face challenges, and progress in this area is still needed. The goal of this paper is therefore to combine deep learning models into a new method for textual sentiment analysis that exploits the advantages of deep neural networks while overcoming their problems. To this end, the paper introduces a method that combines a convolutional neural network with a recursive neural network. To preserve long-term dependencies in sentences and reduce the loss of local information, both of which are considered challenges for convolutional neural networks, a generalized recursive layer, which uses an intermediate feature obtained by combining child nodes, replaces the pooling layer in an attention-based convolutional neural network. According to the experimental results, the proposed method achieves accuracies of 53.92% and 92.89% on the SST1 and SST2 datasets, respectively, which is higher than other existing methods.
People's opinions about specific concepts are among the most important textual data available on the web. However, finding and monitoring web pages containing these comments and extracting valuable information from them is very difficult. Accordingly, developing automatic sentiment analysis systems that can extract opinions and express the reasoning behind them has attracted considerable attention in recent years. Sentiment analysis is one of the most active research areas in natural language processing; it aims to classify a piece of opinionated text by polarity, determining whether an opinion expressed about a specific topic, event, or product is positive or negative. Over roughly the past decade, many studies have investigated traditional classification models such as support vector machines (SVM), naive Bayes, and logistic regression for sentiment analysis. Although these machine learning models have been very successful in this field, they still face limitations, notably the need for manual feature engineering: their classification performance depends heavily on the extracted features, which play an important role in achieving higher accuracy. To address these problems, deep learning models have been widely adopted as an alternative to traditional machine learning models and have achieved impressive results. Despite their remarkable performance, however, these methods still have limitations and are at an early stage of development. The goal of this paper is therefore to propose a combined deep learning model that overcomes these problems while retaining their benefits.
To this end, this paper proposes an efficient method that combines convolutional and recursive neural networks. It employs a generalized recursive neural network, in which an intermediate feature is obtained by combining child nodes, in place of the pooling layer of an attention-based convolutional neural network, with the aim of capturing long-term dependencies and reducing the loss of local information. Experimental results show that the proposed method, with accuracies of 53.92% and 92.89% on the SST1 and SST2 datasets respectively, not only outperforms other existing models but can also be trained much faster.
Keywords: Sentiment analysis, Deep Learning, Convolutional neural network, Recursive neural network, Attention mechanism
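The core idea of this abstract, replacing a CNN's pooling layer with a recursive composition of child nodes, can be sketched as follows. This is a simplified illustration under explicit assumptions: the recursive layer merges adjacent convolutional feature vectors pairwise in a balanced binary tree with a shared composition `parent = tanh(W [left; right] + b)`, the attention mechanism is omitted, weights are random, and all function names are hypothetical. The paper's actual tree structure and composition function are not specified in the abstract.

```python
# Hypothetical sketch: 1-D convolution over word vectors, then a recursive
# (tree-structured) composition of the feature vectors used in place of pooling.
# Balanced binary tree, tanh composition and names are assumptions; no attention.
import math
import random


def conv1d(seq, filters, width):
    """Valid 1-D convolution: each filter sees `width` consecutive word vectors."""
    out = []
    for t in range(len(seq) - width + 1):
        window = [v for vec in seq[t:t + width] for v in vec]  # concatenate the window
        out.append([math.tanh(sum(w * x for w, x in zip(f, window))) for f in filters])
    return out


def compose(left, right, W, b):
    """Combine two child nodes into a parent: tanh(W [left; right] + b)."""
    children = left + right
    return [math.tanh(sum(w * x for w, x in zip(row, children)) + bj) for row, bj in zip(W, b)]


def recursive_reduce(nodes, W, b):
    """Merge adjacent nodes pairwise, level by level, until one vector remains."""
    while len(nodes) > 1:
        merged = [compose(nodes[i], nodes[i + 1], W, b) for i in range(0, len(nodes) - 1, 2)]
        if len(nodes) % 2 == 1:
            merged.append(nodes[-1])  # an unpaired last node is carried up unchanged
        nodes = merged
    return nodes[0]


def max_pool(nodes):
    """Conventional max-over-time pooling, shown for comparison."""
    return [max(col) for col in zip(*nodes)]
```

Unlike `max_pool`, which keeps only the largest activation per filter and discards word order, `recursive_reduce` builds each parent from its ordered children, so information from neighbouring positions is combined rather than dropped. The composition matrix `W` must have shape k × 2k for k convolution filters.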