-
In recent years, the use of the Internet of Things (IoT) has grown widely, and a new technology called Software-Defined Networking (SDN) has been proposed to address IoT challenges. The security problems of software-defined networks and the IoT have made SDN-IoT security one of the most important concerns in these networks. At the same time, intelligent algorithms have offered an opportunity: applying them in areas such as security and intrusion detection has led to significant progress. Nevertheless, intrusion detection systems for SDN-IoT environments still face the problem of a high false alarm rate. This article proposes a new hybrid method based on intelligent algorithms that integrates a supervised gated recurrent unit (GRU) network with an unsupervised k-means classifier to obtain good results in intrusion detection. The simulation results show that the proposed method, by exploiting the advantages of each integrated algorithm and covering the other's weaknesses, achieves higher accuracy and, in particular, a lower false alarm rate than other methods such as Hamza's. The proposed method reduces the false alarm rate to 1.1% while maintaining accuracy at around 99%.
Keywords: Software-Defined Networking, Intelligent Algorithms, Internet of Things, Intrusion Detection, Machine Learning -
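The abstract gives no implementation details, but the hybrid idea it describes, a supervised recurrent (GRU-style) encoder whose embeddings feed an unsupervised k-means stage, can be sketched as follows. This is a minimal NumPy illustration, not the authors' method: the GRU weights are untrained, and the flow features and two-cluster setup are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def gru_step(x, h, W, U):
    """One GRU step. W/U hold the update, reset, and candidate weights."""
    z = 1.0 / (1.0 + np.exp(-(W["z"] @ x + U["z"] @ h)))  # update gate
    r = 1.0 / (1.0 + np.exp(-(W["r"] @ x + U["r"] @ h)))  # reset gate
    h_cand = np.tanh(W["h"] @ x + U["h"] @ (r * h))       # candidate state
    return (1.0 - z) * h + z * h_cand

def encode(seq, hidden, W, U):
    """Run a GRU over a (time, features) sequence; return the final state."""
    h = np.zeros(hidden)
    for x in seq:
        h = gru_step(x, h, W, U)
    return h

def kmeans(X, k=2, iters=25):
    """Plain k-means over the GRU embeddings (the unsupervised stage)."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Hypothetical traffic: 20 "normal" and 20 "attack" flows, 5 steps x 4 features each.
feat, hidden = 4, 8
W = {g: rng.standard_normal((hidden, feat)) * 0.3 for g in "zrh"}
U = {g: rng.standard_normal((hidden, hidden)) * 0.3 for g in "zrh"}
normal = rng.normal(0.0, 0.3, size=(20, 5, feat))
attack = rng.normal(2.0, 0.3, size=(20, 5, feat))
emb = np.array([encode(s, hidden, W, U) for s in np.concatenate([normal, attack])])
labels = kmeans(emb, k=2)
```

In the paper's setting the supervised network would be trained on labeled flows and combined with the clustering stage; here the clustering merely illustrates how the unsupervised component operates on the learned embeddings.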
Iranian Journal of Electrical and Computer Engineering, Vol. 24, No. 4 (Serial 83, Winter 1402/2024), pp. 284-290
Today, social networks play a crucial role in disseminating information worldwide. Twitter is one of the most popular social networks, with 500 million tweets sent daily. Its popularity among users has led spammers to exploit it for distributing spam posts. This paper employs a combination of machine learning methods to identify spam at the tweet level. The proposed method is a two-stage feature-extraction framework: in the first stage, a stacked autoencoder extracts features, and in the second stage, the features from the autoencoder's last layer are fed into a softmax layer for prediction. The method is compared and evaluated against several well-known methods on the Twitter Spam Detection corpus using the accuracy, precision, recall, and F1-score metrics. The results indicate that the proposed method achieves a detection accuracy of 78.1%. Overall, using the hard-voting majority approach in ensemble learning, it identifies spam tweets more accurately than the CNN, LSTM, and SCCL methods.
Keywords: Neural networks, spam detection, Twitter, Autoencoder, softmax -
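The two-stage pipeline above (stacked autoencoder for features, softmax for prediction) can be sketched with a plain-NumPy forward pass. This is an architectural illustration only: the weights are random and untrained, and the layer sizes are invented; in practice each autoencoder layer would be trained to reconstruct its input before the softmax head is attached.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))  # numerically stabilized
    return e / e.sum(axis=1, keepdims=True)

class StackedAutoencoderClassifier:
    """Encoder stack followed by a softmax head, as in the two-stage pipeline."""

    def __init__(self, layer_dims, n_classes):
        # layer_dims, e.g. [300, 128, 64]: input -> hidden -> bottleneck.
        self.enc = [rng.standard_normal((a, b)) * 0.1
                    for a, b in zip(layer_dims[:-1], layer_dims[1:])]
        self.head = rng.standard_normal((layer_dims[-1], n_classes)) * 0.1

    def features(self, X):
        """Stage 1: push inputs through every encoder layer."""
        h = X
        for W in self.enc:
            h = sigmoid(h @ W)
        return h

    def predict_proba(self, X):
        """Stage 2: softmax over the bottleneck features."""
        return softmax(self.features(X) @ self.head)

# Hypothetical tweet features: 10 tweets x 300 dims, 2 classes (spam / not spam).
model = StackedAutoencoderClassifier([300, 128, 64], n_classes=2)
proba = model.predict_proba(rng.standard_normal((10, 300)))
```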
Journal of Electrical and Computer Engineering Innovations, Volume 12, Issue 1, Winter-Spring 2024, pp. 57-68
Background and Objectives: Twitter is a microblogging platform for expressing assessments, opinions, and sentiments on different topics and events. While there have been several studies on sentiment analysis of tweets and on their popularity measured by retweet counts, predicting the sentiment of first-order replies has remained a neglected challenge. Predicting the sentiment of tweet replies is helpful for both users and enterprises. In this study, we define a novel problem: given only a tweet's text, the goal is to predict the overall sentiment polarity of its upcoming replies.
Methods: To address this problem, we propose a graph convolutional neural network model that exploits the text's dependencies. The model contains two parallel branches. The first branch extracts the contextual representation of the input tweets; the second extracts structural and semantic information. Specifically, a Bi-LSTM network and a self-attention layer are used in the first branch for extracting syntactic relations, and an affective knowledge-enhanced dependency tree is used in the second branch for extracting semantic relations. A graph convolutional network on top of these branches learns the joint feature representation, and a retrieval-based attention mechanism over its output learns essential features from the final affective picture of tweets.
Results: In the experiments, we used only the original tweets of the RETWEET dataset for training the models and ignored the replies during training. The results on three versions of the RETWEET dataset show that the proposed model outperforms LSTM-based models and similar state-of-the-art graph convolutional network models.
Conclusion: The proposed model showed promising results, confirming that using only the content of a tweet we can predict the overall sentiment of its replies. Moreover, it achieves similar or comparable results to simpler deep models when trained on a public tweet dataset such as the ACL 2014 dataset, while outperforming both simple deep models and state-of-the-art graph convolutional deep models when trained on the RETWEET dataset. This shows the model's effectiveness in extracting structural and semantic relations in tweets.
Keywords: Sentiment Analysis, Deep Learning, Social Media, Twitter, Graph Convolutional Neural Networks
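The propagation rule at the heart of the graph convolutional network described above can be sketched as a single layer that mean-aggregates each token's neighbors in the dependency tree (with self-loops) and then applies a linear map and ReLU. The toy dependency tree and feature sizes below are invented; the paper's model adds affective knowledge to the tree and stacks this on top of Bi-LSTM features.

```python
import numpy as np

rng = np.random.default_rng(2)

def gcn_layer(A, H, W):
    """One graph-convolution layer: mean-aggregate neighbors (with self-loops),
    then apply a linear map followed by ReLU."""
    A_hat = A + np.eye(A.shape[0])             # add self-loops
    D_inv = np.diag(1.0 / A_hat.sum(axis=1))   # row-normalize the aggregation
    return np.maximum(0.0, D_inv @ A_hat @ H @ W)

# Toy dependency tree for a 5-token tweet: token 1 is the root with
# dependents 0, 2, and 4; token 2 governs token 3.
edges = [(1, 0), (1, 2), (1, 4), (2, 3)]
A = np.zeros((5, 5))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0                    # undirected dependency edges

H = rng.standard_normal((5, 16))               # token features (e.g. Bi-LSTM states)
W = rng.standard_normal((16, 8)) * 0.2
H_out = gcn_layer(A, H, W)
```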
-
The result of this research is a proposed model for text analysis that identifies the subject and content of texts on Twitter. The model performs classification in two main phases. Because the data used in text mining tasks is unstructured text, the first phase is preprocessing, in which features are extracted from the unstructured data. In the second phase, a multilayer neural network algorithm and random graphs are used to classify the texts; this algorithm classifies a text based on the trained model. The results show a significant improvement: comparing the proposed method with other methods, the proposed algorithm improves accuracy by a high percentage and performs better than the alternatives. All reported statistics and simulation outputs of the proposed method are based on an implementation in MATLAB.
Keywords: Text mining, subject, content recognition, multilayer neural network, random graphs, Twitter -
The spread of the internet and smartphones in recent years has made social networks popular and easily accessible. Despite their benefits, such as easing interpersonal communication and providing a space for free expression of opinions, these networks also enable destructive activities such as spreading false information or using fake accounts for fraud. Fake accounts are mainly managed by bots, so identifying and suspending bots could significantly increase the popularity and trustworthiness of social networks. In this paper, we try to identify Persian bots on Twitter, a challenging task given the difficulties of processing colloquial Persian. To this end, a set of features based on account information and user activity was added to content features of tweets, and users were classified with several machine learning algorithms such as Random Forest, Logistic Regression, and SVM. The results of experiments on a dataset of Persian-language users show the good performance of the proposed methods: with a balanced accuracy of 93.86%, Random Forest is the most accurate of the classifiers considered.
Keywords: social networks, Twitter, bot detection, classification, Persian language -
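The classification setup described above can be sketched with scikit-learn's `RandomForestClassifier`. The features and labels below are synthetic placeholders: the paper's actual account, activity, and Persian-text content features are not reproduced here, only the shape of the pipeline (feature matrix, train/test split, forest, balanced accuracy).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import balanced_accuracy_score

rng = np.random.default_rng(3)

# Hypothetical per-user features standing in for the paper's feature set.
n = 400
followers_ratio = rng.random(n)          # followers / (followers + friends)
tweets_per_day = rng.exponential(5, n)   # posting intensity
default_avatar = rng.integers(0, 2, n)   # 1 if the profile image is the default
url_share = rng.random(n)                # share of tweets containing a URL

# Synthetic labels loosely tied to the features so the forest has signal to learn.
score = 2.0 * default_avatar + 0.2 * tweets_per_day + 2.0 * url_share - followers_ratio
y = (score + rng.normal(0, 0.5, n) > 2.5).astype(int)  # 1 = bot

X = np.column_stack([followers_ratio, tweets_per_day, default_avatar, url_share])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_tr, y_tr)
bal_acc = balanced_accuracy_score(y_te, clf.predict(X_te))
```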
Stance detection aims to identify an author's stance towards a specific topic and has become a critical component of applications such as fake news detection, claim validation, and author profiling. However, while humans detect stance easily, machine learning models fall short of this task. In English, thanks to large and appropriate datasets, relatively good accuracy has been achieved in this field, but in Persian, due to the lack of data, stance detection has not progressed significantly. We therefore present a stance detection dataset containing 3813 labeled tweets. We provide a detailed description of the newly created dataset and develop deep learning models on it. Our best model achieves a macro-average F1-score of 58%. Moreover, our dataset can facilitate research in fields such as cross-lingual stance detection and author profiling in Persian.
Keywords: stance detection, fake news, social media, twitter, Persian dataset, author profiling -
Today, online social media, with numerous users ranging from ordinary citizens to top government officials, organizations, artists, and celebrities, are among the most important platforms for sharing information and communication. These media give users quick and easy access to information, so the content of shared posts can reach millions of users in a matter of seconds. Twitter is one of the most popular and widely used online social networks for spreading information; while it can be reliable, it can also be a source of unrealistic and deceptive rumors that can have irreversible effects on individuals and society. Recently, several studies have addressed rumor detection and verification using models based on deep learning and machine learning. Previous research has focused mostly on linguistic, user, and structural features, examining the retweet propagation graph for the structural ones. In this study, unlike previous work, a complete set of new structural features of the reply tree and user graph was extracted for detecting rumored conversations on Twitter. These features were extracted, using both traditional graph measures and rumor-propagation-specific measures, at different time intervals over 24 hours from the start of conversations about critical events, and the important features in each interval were identified using the Sequential Forward Selection approach. The analysis of the new features provides deep insight into the structure of information diffusion in conversations. To evaluate the usefulness of the new structural features, they were compared with linguistic and user-specific features; experiments showed that combining the new structural features with linguistic and user-specific features increases the accuracy of rumor detection classification.
Therefore, a rumor classification algorithm based on the new structural, linguistic, and user-specific features was proposed for detecting rumored conversations on English-language Twitter. This algorithm performs better than the baseline methods and detects rumored conversations with greater accuracy. In addition, given the importance of the source-tweet user in conversations, this user was examined and analyzed from different aspects. The results showed that most rumored conversations were started by a small number of users; rumors could be prevented by identifying these users early in Twitter events.
Keywords: Conversation, Rumor Detection, Twitter, Reply Tree, User Graph -
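The structural features of a reply tree that the study above relies on can be illustrated with a minimal sketch. The three measures below (node count, depth, maximum branching) are just examples of such features; the paper's actual feature set, its time-windowed extraction, and the user-graph features are much richer.

```python
from collections import deque

def reply_tree_features(children, root):
    """Structural features of a reply tree given a parent -> [children] map."""
    n_nodes, max_depth, max_branching = 0, 0, 0
    queue = deque([(root, 0)])            # breadth-first walk from the source tweet
    while queue:
        node, depth = queue.popleft()
        n_nodes += 1
        max_depth = max(max_depth, depth)
        kids = children.get(node, [])
        max_branching = max(max_branching, len(kids))
        for kid in kids:
            queue.append((kid, depth + 1))
    return {"n_nodes": n_nodes, "depth": max_depth, "max_branching": max_branching}

# Toy conversation: tweet 0 gets replies 1 and 2; 1 is answered by 3, then 3 by 4.
tree = {0: [1, 2], 1: [3], 3: [4]}
feats = reply_tree_features(tree, root=0)
# feats == {"n_nodes": 5, "depth": 3, "max_branching": 2}
```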
Journal of Electrical and Computer Engineering Innovations, Volume 8, Issue 2, Summer-Autumn 2020, pp. 183-192
Background and Objectives: With the growth of web applications, review sentiment classification has attracted increasing interest in text mining. Traditional approaches did not capture the multiple relationships connecting words, even though the preprocessing phase and data reduction techniques make a huge difference in classification performance.
Methods: This study proposes an efficient model for sentiment classification that combines preprocessing techniques, sampling methods, feature selection methods, and ensemble supervised classification to increase classification performance. In the feature selection phase, we applied n-grams to extract features based on the relationships between words, and then refined the selected features with the particle swarm optimization algorithm, which iteratively tries to improve the feature subset.
Results: In the experimental study, a comprehensive range of comparative experiments was conducted to assess the effectiveness of the proposed model against the best methods in the literature on Twitter datasets. The proposed model achieves up to 97.33, 92.61, 97.16, and 96.23% in terms of precision, accuracy, recall, and f-measure, respectively.
Conclusion: The proposed model classifies the sentiment of tweets and online reviews through ensemble methods; in addition, two sampling techniques were applied in the preprocessing phase. The results confirmed the superiority of the proposed model over state-of-the-art systems.
The author(s). This is an open access article distributed under the terms of the Creative Commons Attribution (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, as long as the original authors and source are cited. No permission is required from the authors or the publishers.
Keywords: Text Classification, Sampling Technique, Feature selection, optimization algorithm, Twitter -
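The particle-swarm feature selection step named above can be sketched with a binary PSO over a synthetic feature matrix. This is a sketch under stated assumptions, not the paper's implementation: the fitness function (nearest-centroid training accuracy), the data, and the PSO hyperparameters are all invented for the example; the paper's model wraps n-gram features, sampling, and an ensemble classifier around this step.

```python
import numpy as np

rng = np.random.default_rng(4)

def fitness(mask, X, y):
    """Score a feature subset by nearest-centroid training accuracy."""
    sel = mask.astype(bool)
    if not sel.any():
        return 0.0
    Xs = X[:, sel]
    c0, c1 = Xs[y == 0].mean(axis=0), Xs[y == 1].mean(axis=0)
    pred = (np.linalg.norm(Xs - c1, axis=1) < np.linalg.norm(Xs - c0, axis=1)).astype(int)
    return float((pred == y).mean())

def binary_pso(X, y, n_particles=12, iters=40, w=0.7, c1=1.5, c2=1.5):
    """Binary PSO: positions are 0/1 masks; velocities pass through a sigmoid."""
    n = X.shape[1]
    pos = (rng.random((n_particles, n)) < 0.5).astype(float)
    vel = rng.standard_normal((n_particles, n)) * 0.1
    pbest, pbest_f = pos.copy(), np.array([fitness(p, X, y) for p in pos])
    g = int(pbest_f.argmax())
    gbest, gbest_f = pbest[g].copy(), float(pbest_f[g])
    for _ in range(iters):
        r1, r2 = rng.random((n_particles, n)), rng.random((n_particles, n))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = (rng.random((n_particles, n)) < 1.0 / (1.0 + np.exp(-vel))).astype(float)
        f = np.array([fitness(p, X, y) for p in pos])
        better = f > pbest_f
        pbest[better], pbest_f[better] = pos[better], f[better]
        if f.max() > gbest_f:
            gbest, gbest_f = pos[f.argmax()].copy(), float(f.max())
    return gbest, gbest_f

# Synthetic "n-gram" matrix: only the first two of 30 features carry the label.
y = rng.integers(0, 2, size=120)
X = rng.normal(0, 1, size=(120, 30))
X[:, 0] += 2.0 * y
X[:, 1] -= 2.0 * y
mask, best_f = binary_pso(X, y)
```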
Journal of Electrical and Computer Engineering Innovations, Volume 8, Issue 1, Winter-Spring 2020, pp. 41-52
Background and Objectives: Twitter sentiment classification is one of the most popular fields in information retrieval and text mining. Millions of people around the world intensively use social networks like Twitter, which lets users publish tweets to say what they think about topics. Numerous websites built on the Internet present Twitter content, where a user can enter a sentiment target and search for tweets containing positive, negative, or neutral opinions. This is remarkable for consumers who want to investigate products automatically before purchase.
Methods: This paper proposes a model for sentiment classification. The goal is to investigate the role of n-grams and sampling techniques in sentiment classification using an ensemble method on Twitter datasets. The model handles both binary and multi-class classification, dividing datasets into positive, negative, or neutral classes.
Results: Twitter classification is an outstanding problem with very few free resources, many of which are unavailable due to changed authorization status; most Twitter datasets are neither labeled nor free, unlike the dataset applied here. We show that the combination of ensemble methods, sampling techniques, and n-grams can improve the accuracy of Twitter sentiment classification.
Conclusion: The results confirmed the superiority of the proposed model over state-of-the-art systems, with the highest results obtained in terms of accuracy, precision, recall, and f-measure.
Keywords: Text Mining, Text Classification, Machine Learning, Ensemble method, Twitter -
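The ensemble method named in the abstracts above can be combined by hard (majority) voting over the predictions of the member classifiers. A minimal sketch, with three hypothetical classifiers' label vectors standing in for real base learners:

```python
import numpy as np

def hard_vote(predictions):
    """Majority vote over class labels; predictions is (n_classifiers, n_samples)."""
    preds = np.asarray(predictions)
    n_classes = int(preds.max()) + 1
    out = np.empty(preds.shape[1], dtype=int)
    for i in range(preds.shape[1]):
        # count the votes each class received for sample i, keep the winner
        out[i] = np.bincount(preds[:, i], minlength=n_classes).argmax()
    return out

# Three hypothetical classifiers disagreeing on five tweets (0 = negative, 1 = positive).
p1 = np.array([1, 0, 1, 1, 0])
p2 = np.array([1, 1, 0, 1, 0])
p3 = np.array([0, 0, 1, 1, 1])
combined = hard_vote([p1, p2, p3])
# combined == [1, 0, 1, 1, 0]
```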
With the introduction of Web 2.0, the static content of Web 1.0 was replaced by structured user-generated content; wikis, blogs, social networks, and social bookmarking systems are examples in which users generate and publish content. One problem with user-generated content is its lack of uniformity, which produces heterogeneous data and makes it hard to apply computational algorithms and techniques. Web 2.0's remedy is to let users label their published content with hashtags (tags). This remedy still falls short in microblogs such as Twitter, where the character limit (140 characters per tweet) can cause some hashtag characters to be cut from a post, leaving tags truncated or heterogeneous. In this paper, a method based on Latent Dirichlet Allocation with collapsed Gibbs sampling is introduced to address hashtag recommendation in Twitter's heterogeneous environment: each tweet is numericalized into a vector called the topic vector (TV), which is also used to model users' tastes and thereby improve hashtag recommendation. The proposed method was tested on 8,396,744 real tweets in English, recommending the 1 to 5 most relevant hashtags for each tweet. The results show precision above 20% and recall above 45% across the different settings, corresponding to an increase in precision from 3% to 21% and in recall from 32% to 46% compared with the most accurate hashtag recommendation method tested by the authors, unmodified LDA.
Keywords: Recommender Systems, Hashtag Recommendation, Topic Vector, Latent Dirichlet Allocation, Gibbs Sampling, Microblog, Twitter
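Once every tweet and hashtag is represented as an LDA topic vector, the recommendation step reduces to a similarity ranking. The sketch below assumes the topic vectors have already been inferred (the paper uses collapsed Gibbs sampling for that); the hashtag names, the four-topic vectors, and the cosine ranking are illustrative assumptions, not the authors' exact scoring.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two topic vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def recommend(tweet_tv, hashtag_tvs, top_k=3):
    """Rank hashtags by cosine similarity between topic vectors."""
    scored = sorted(hashtag_tvs.items(),
                    key=lambda kv: cosine(tweet_tv, kv[1]), reverse=True)
    return [tag for tag, _ in scored[:top_k]]

# Hypothetical 4-topic vectors (in the paper they would come from LDA).
hashtag_tvs = {
    "#machinelearning": np.array([0.7, 0.1, 0.1, 0.1]),
    "#football":        np.array([0.1, 0.7, 0.1, 0.1]),
    "#elections":       np.array([0.1, 0.1, 0.7, 0.1]),
    "#travel":          np.array([0.1, 0.1, 0.1, 0.7]),
}
tweet_tv = np.array([0.6, 0.05, 0.3, 0.05])  # mostly topic 0, some topic 2
top = recommend(tweet_tv, hashtag_tvs, top_k=2)
# top == ["#machinelearning", "#elections"]
```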