A Clustering Based Feature Selection Method in Spam Detection
Author(s):
Article Type:
Research/Original Article (with accredited ranking)
Abstract:
One way to detect spam is to classify emails into two categories: spam and non-spam. The high performance of machine learning methods in various fields has led to their wide use in text classification problems. Machine learning-based classifiers that categorize emails by their content rely on a set of features, and because of the high volume of emails, an efficient feature reduction algorithm plays an important role. Unlike previous methods, which keep only the top-ranked features and discard the rest, the method proposed in this article also makes use of the unselected features. After an initial feature selection step, the unselected features are clustered, each cluster is mapped to a single new feature, and the final feature vector is formed from the selected features together with the features mapped from the clusters. By combining two initial feature selection methods with two mapping functions, four methods were presented and evaluated on two datasets, PU2 and PU3. The results showed that the method based on DF (document frequency) feature selection combined with the advanced mapping function achieves the highest performance among all the proposed methods. The proposed methods also outperform the baseline feature selection methods (without clustering).
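The pipeline described in the abstract (initial feature selection, clustering of the unselected features, mapping each cluster to one new feature, and concatenating the results) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: emails are treated as sets of tokens, DF is used as the initial selector, the unselected terms are clustered with a plain k-means over their binary document-occurrence vectors, and `map_cluster` is a hypothetical simple mapping (the fraction of a cluster's terms present in the email). The abstract does not name the clustering algorithm or define the "advanced" mapping function, so neither is reproduced here; all function names and the toy data are illustrative.

```python
from collections import Counter


def document_frequency(docs):
    """Count how many documents contain each term (docs are sets of tokens)."""
    df = Counter()
    for doc in docs:
        df.update(doc)
    return df


def select_by_df(docs, k):
    """Split the vocabulary into the k highest-DF terms and the remaining terms."""
    df = document_frequency(docs)
    # Sort by descending DF, then alphabetically so ties are deterministic.
    ranked = sorted(df, key=lambda t: (-df[t], t))
    return ranked[:k], ranked[k:]


def occurrence_vector(term, docs):
    """Binary vector marking which training documents contain the term."""
    return [1.0 if term in doc else 0.0 for doc in docs]


def kmeans_terms(terms, docs, n_clusters, iterations=10):
    """Cluster terms by their occurrence vectors with plain k-means (assumed clusterer)."""
    vectors = {t: occurrence_vector(t, docs) for t in terms}
    # Deterministic initialisation: the first n_clusters terms serve as centroids.
    centroids = [vectors[t][:] for t in terms[:n_clusters]]
    assignment = {}
    for _ in range(iterations):
        # Assign each term to the nearest centroid (squared Euclidean distance).
        for t in terms:
            dists = [sum((a - b) ** 2 for a, b in zip(vectors[t], c)) for c in centroids]
            assignment[t] = dists.index(min(dists))
        # Recompute each centroid as the mean vector of its member terms.
        for j in range(len(centroids)):
            members = [vectors[t] for t in terms if assignment[t] == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    clusters = [[] for _ in centroids]
    for t in terms:
        clusters[assignment[t]].append(t)
    return [c for c in clusters if c]


def map_cluster(doc, cluster):
    """Hypothetical simple mapping: fraction of the cluster's terms present in doc."""
    return sum(1 for t in cluster if t in doc) / len(cluster)


def build_features(doc, selected, clusters):
    """Final vector: binary selected-term features followed by one value per cluster."""
    base = [1.0 if t in doc else 0.0 for t in selected]
    return base + [map_cluster(doc, c) for c in clusters]


# Toy training set (illustrative only, not PU2/PU3 data).
train = [
    {"free", "money", "click", "now"},
    {"free", "offer", "click"},
    {"meeting", "agenda", "monday"},
    {"meeting", "notes", "free"},
]
selected, rest = select_by_df(train, k=2)
clusters = kmeans_terms(rest, train, n_clusters=2)
vec = build_features({"free", "money", "offer"}, selected, clusters)
print(selected, clusters, vec)
```

Here the two highest-DF terms stay as ordinary features, while the remaining seven terms are compressed into two cluster features, so each email is represented by four values instead of nine without discarding the unselected terms outright. Swapping `select_by_df` for another scoring function, or `map_cluster` for a different mapping, would give the other variants the abstract mentions.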
Keywords:
Language:
Persian
Published:
Information management, Volume:8 Issue: 1, 2022
Pages:
202 to 224
https://www.magiran.com/p2640037
Other articles by the same author(s):
Improved Multi-scale Local Binary Pattern for Feature Extraction and Coral Reef Classification
Zahra Nazmi, MohammadHossein Shakoor *,
Machine Vision and Image Processing
A New Framework for Distributed Multivariate Feature Selection
Mona Sharifnezhad, *, Hosein Ghafarian
Signal and Data Processing
Ensemble Bayesian Classification Using Genetic Algorithm Wrapper Feature Selection in Spam Detection
*,
Information management