Rumor Detection on Twitter Using Tweet and User Features
When a news item is posted on social media, it draws varied reactions and arouses curiosity from different viewpoints. Events of every kind, ordinary or anomalous, reach people through social networks. Depending on its importance, a news item may be widely covered or may attract no particular reaction, and the more widely it spreads, the more it raises the question of whether it is true or false, valid or invalid. The most important task is therefore to assess the accuracy of the news. A rumor is news that has not yet been confirmed, and it may cause irreparable damage if it turns out to be false. Rumor detection, that is, determining the validity of such news, therefore plays an essential role in preventing the spread of fake news. In this work, rumors on social networks are identified using hand-crafted features based on tweets, on users, and on a combination of the two, together with oversampling, normalization, and machine learning classification. Four classifiers are used: support vector machine (SVM), logistic regression (LR), k-nearest neighbors (KNN), and random forest (RF). Experiments are carried out on two datasets, PHEME 2017 and PHEME 2018. On PHEME 2017, the random forest classifier achieves an accuracy of 0.988 with the tweet and combined feature sets. With these feature sets it also achieves a precision of 0.987, higher than the other classifiers used in this work, and a recall of 0.986, the best value, which it shares with logistic regression. It also outperforms the other classifiers with these two feature sets, scoring 0.987. On PHEME 2018, the random forest classifier reaches an accuracy of 0.969 with the tweet and combined feature sets, and it also performs best in precision, recall, and F1.
In addition, with the k-nearest neighbors classifier, the user features give better results than the other two feature sets.
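
To make the pipeline described above concrete, the following is a minimal sketch of its three stages (normalization, oversampling, classification) followed by the four reported metrics. It is an illustration under stated assumptions, not the implementation used in this work: the abstract does not name the normalization scheme or the oversampling method, so min-max scaling and random duplication of minority-class rows are assumed here. The two feature columns (a retweet count and a follower count), the toy values, and all function names are hypothetical, and k-nearest neighbors with k = 3 stands in for the four classifiers.

```python
import random
from collections import Counter

def fit_min_max(rows):
    """Return per-column (lows, highs) computed on the training rows."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]

def apply_min_max(rows, lows, highs):
    """Scale columns with the fitted bounds; training rows land in [0, 1]."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until classes are balanced."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels

def knn_predict(train_rows, train_labels, query, k=3):
    """Majority vote among the k training rows closest to the query."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, query)), label)
        for row, label in zip(train_rows, train_labels)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 with `positive` as the rumor class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical toy data: [retweet_count, follower_count]; label 1 = rumor.
train_rows = [[5, 5000], [8, 7000], [3, 6500], [10, 8000], [6, 5500],
              [90, 200], [120, 350]]
train_labels = [0, 0, 0, 0, 0, 1, 1]
test_rows = [[100, 300], [7, 6000]]
test_labels = [1, 0]

# Fit the scaler on training data only, then reuse it for the test rows.
lows, highs = fit_min_max(train_rows)
scaled_train = apply_min_max(train_rows, lows, highs)
scaled_test = apply_min_max(test_rows, lows, highs)

# Balance the classes before fitting the classifier.
bal_rows, bal_labels = random_oversample(scaled_train, train_labels)

predictions = [knn_predict(bal_rows, bal_labels, q) for q in scaled_test]
print(binary_metrics(test_labels, predictions))
```

In this sketch the scaler is fitted on the training rows only and oversampling is applied only to the training split, so that duplicated rows cannot leak into the evaluation data. Whether the reported experiments follow the same order is not stated in the abstract.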