Ensemble Bayesian Classification Using Genetic Algorithm Wrapper Feature Selection in Spam Detection
Author(s):
Article Type:
Research/Original Article (accredited ranking)
Abstract:
The role of email in communication is seriously threatened by spam. Many methods have been proposed to deal with it, one of the most important of which is to classify emails by their content into two categories: spam and non-spam. Content-based classification mechanisms use words as features, so an efficient feature selection mechanism is critical because the number of features is large. The main focus of this paper is therefore to select useful features by proposing a wrapper feature selection approach based on a genetic algorithm. We then apply a Bayesian classifier, which has demonstrated high efficiency in text classification. The main steps of the proposed method are as follows: first, an initial feature vector is chosen; then it is optimized by multiplying it by a transformation matrix produced by the genetic algorithm; finally, a set of k feature vectors is generated. An ensemble of k Bayesian classifiers is applied to the feature vectors, and the final class label is determined by voting among the ensemble members. The proposed method is evaluated on two datasets, PU1 and PU2. The results show that the classification accuracy of the proposed method with k=7 reaches 87.86 and 87.91 on PU1 and PU2, respectively. The results also indicate that the proposed method outperforms naïve Bayes and two well-known classifiers, SVM and KNN.
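The voting step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes binary word-presence features, a Bernoulli naive Bayes model with Laplace smoothing (the abstract says only "Bayesian classifier"), and fixed 0/1 feature masks as hypothetical stand-ins for the k feature vectors that the genetic algorithm would produce. The genetic search and the transformation matrix themselves are not shown, and the toy data is invented for the example.

```python
import math
from collections import Counter

def train_nb(X, y):
    """Fit a Bernoulli naive Bayes model with Laplace smoothing."""
    model = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        prior = len(rows) / len(X)
        # P(word present | class), smoothed so no probability is 0 or 1
        probs = [(sum(col) + 1) / (len(rows) + 2) for col in zip(*rows)]
        model[c] = (prior, probs)
    return model

def predict_nb(model, x):
    """Return the class with the highest log-posterior for one binary vector."""
    def score(c):
        prior, probs = model[c]
        return math.log(prior) + sum(
            math.log(p if v else 1 - p) for v, p in zip(x, probs))
    return max(model, key=score)

def apply_mask(x, mask):
    """Keep only the features whose mask bit is 1."""
    return [v for v, m in zip(x, mask) if m]

def train_ensemble(X, y, masks):
    """Train one classifier per feature mask (k masks give k members)."""
    return [(mask, train_nb([apply_mask(x, mask) for x in X], y))
            for mask in masks]

def predict_ensemble(ensemble, x):
    """Majority vote over the predictions of all ensemble members."""
    votes = Counter(predict_nb(model, apply_mask(x, mask))
                    for mask, model in ensemble)
    return votes.most_common(1)[0][0]

# Toy data (invented): 4 word-presence features, label 1 = spam, 0 = non-spam.
X = [[1, 1, 0, 1], [1, 1, 1, 0], [1, 0, 0, 0],
     [0, 0, 1, 1], [0, 0, 0, 1], [0, 1, 1, 0]]
y = [1, 1, 1, 0, 0, 0]

# Hypothetical masks standing in for GA-selected feature vectors (k = 3).
masks = [[1, 1, 0, 0], [1, 0, 1, 0], [1, 1, 1, 1]]

ensemble = train_ensemble(X, y, masks)
print(predict_ensemble(ensemble, [1, 1, 0, 0]))
print(predict_ensemble(ensemble, [0, 0, 1, 1]))
```

Using an odd k, such as the k=7 reported in the abstract, avoids tied votes in a two-class problem.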
Keywords:
Language:
Persian
Published:
Information management, Volume: 6, Issue: 2, 2021
Pages:
250 to 277
https://www.magiran.com/p2332917
Other articles by this author(s)
-
Improved Multi-scale Local Binary Pattern for Feature Extraction and Coral Reef Classification
Zahra Nazmi, MohammadHossein Shakoor *,
Machine Vision and Image Processing,