Applying Regression Models on Subsets with High Correlations for a Better Numeric Missing Values Imputation
Author(s):
Article Type:
Research/Original Article (with an accredited ranking)
Abstract:
The presence of missing values in real-world data is a very common and unavoidable problem. It is therefore necessary to fill in these missing values accurately before the data are used in the knowledge discovery process. This paper proposes three novel methods for imputing numeric missing values. All of the proposed methods apply regression models to subsets of data whose attributes are strongly correlated with one another. These subsets are selected using forward-selection-based approaches. In selecting the desired subsets, the aim is to maximize the correlation between the attribute with missing values and the other attributes. The correlation coefficient is used to measure the relationships between attributes. The proposed methods also take into account the priority of each missing attribute for imputation. The performance of the proposed methods is evaluated on five real-world datasets with different missing ratios and compared with five estimation methods: mean imputation, k-nearest-neighbours imputation, a fuzzy c-means-based imputation, a decision-tree-based imputation, and a regression-based imputation algorithm called “Incremental Attribute Regression Imputation” (IARI). Two well-known evaluation criteria, Root Mean Squared Error (RMSE) and Coefficient of Determination (CoD), are used to compare the proposed methods with the other imputation methods. Experimental results show that the proposed methods outperform the compared methods, even when the missing ratio is high.
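The abstract outlines the general scheme (correlation-driven forward selection of predictor attributes, regression on the selected subset, attributes imputed in priority order) but not its details. The Python sketch below is a minimal illustration under assumptions that are not taken from the paper and is not any of its three methods: the selection score is the multiple correlation coefficient R of an ordinary least-squares fit, the greedy stopping rule uses hypothetical parameters `max_features` and `min_gain`, a hypothetical priority rule imputes attributes with fewer missing values first, and mean imputation is the fallback. Missing values are assumed to be encoded as NaN in a NumPy array.

```python
import numpy as np


def multiple_r(X, y):
    """Multiple correlation R between y and the columns of X (OLS with intercept)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    ss_res = np.sum((y - A @ coef) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    if ss_tot == 0:
        return 0.0
    return float(np.sqrt(max(1.0 - ss_res / ss_tot, 0.0)))


def forward_select(data, target, candidates, max_features=3, min_gain=0.01):
    """Greedily add the predictor that most increases R with the target (assumed rule)."""
    selected, best_r = [], 0.0
    candidates = list(candidates)
    while candidates and len(selected) < max_features:
        scores = {}
        for c in candidates:
            cols = selected + [c]
            # Use only rows where the target and all chosen predictors are observed.
            rows = ~np.isnan(data[:, [target] + cols]).any(axis=1)
            if rows.sum() <= len(cols) + 1:
                continue
            scores[c] = multiple_r(data[rows][:, cols], data[rows, target])
        if not scores:
            break
        best = max(scores, key=scores.get)
        if scores[best] - best_r < min_gain:  # hypothetical stopping rule
            break
        selected.append(best)
        best_r = scores[best]
        candidates.remove(best)
    return selected


def impute(data, max_features=3, min_gain=0.01):
    """Regression imputation on correlation-selected subsets; NaN marks missing values."""
    data = np.array(data, dtype=float)  # work on a copy
    missing_counts = np.isnan(data).sum(axis=0)
    # Hypothetical priority rule: attributes with fewer missing values go first,
    # so already-imputed attributes can serve as predictors for later ones.
    order = [int(j) for j in np.argsort(missing_counts, kind="stable") if missing_counts[j] > 0]
    for target in order:
        others = [j for j in range(data.shape[1]) if j != target]
        preds = forward_select(data, target, others, max_features, min_gain)
        col_mean = np.nanmean(data[:, target])
        miss = np.isnan(data[:, target])
        if preds:
            train = ~np.isnan(data[:, [target] + preds]).any(axis=1)
            A = np.column_stack([np.ones(train.sum()), data[train][:, preds]])
            coef, *_ = np.linalg.lstsq(A, data[train, target], rcond=None)
            # Predict only where every selected predictor is observed.
            usable = miss & ~np.isnan(data[:, preds]).any(axis=1)
            B = np.column_stack([np.ones(usable.sum()), data[usable][:, preds]])
            data[usable, target] = B @ coef
            miss = np.isnan(data[:, target])
        data[miss, target] = col_mean  # fallback: mean imputation
    return data


# Toy data: column 1 is an exact linear function of column 0; column 2 is noise.
rng = np.random.default_rng(0)
x1, x3 = rng.normal(size=50), rng.normal(size=50)
X = np.column_stack([x1, 2 * x1 + 1, x3])
X[[3, 10, 20], 1] = np.nan
X[5, 0] = np.nan
filled = impute(X)
```

In this toy example the selection step picks the strongly correlated column as the sole predictor for each incomplete attribute and ignores the noise column, so the imputed entries recover the linear relationship. Imputing in priority order lets an attribute that has just been filled act as a predictor for the next one, which is one plausible reading of the priority idea in the abstract.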
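The abstract names RMSE and CoD as evaluation criteria without giving formulas. A minimal Python sketch follows, assuming RMSE is computed over the entries whose true values were hidden, and assuming CoD is the squared Pearson correlation between true and imputed values. Some works instead define CoD as 1 - SS_res/SS_tot, which can give different (even negative) values, so the paper's exact definition should be checked before comparing numbers.

```python
import numpy as np


def rmse(true_vals, imputed_vals):
    """Root Mean Squared Error between true and imputed values."""
    t = np.asarray(true_vals, dtype=float)
    p = np.asarray(imputed_vals, dtype=float)
    return float(np.sqrt(np.mean((t - p) ** 2)))


def cod(true_vals, imputed_vals):
    """CoD as the squared Pearson correlation of true vs. imputed values (assumed convention)."""
    t = np.asarray(true_vals, dtype=float)
    p = np.asarray(imputed_vals, dtype=float)
    r = np.corrcoef(t, p)[0, 1]
    return float(r ** 2)
```

Lower RMSE and higher CoD indicate better imputation. For example, `rmse([1, 2, 3], [1, 2, 4])` is about 0.577, while `cod([1, 2, 3], [2, 4, 6])` is 1.0 up to floating-point rounding even though every imputed value is wrong by a factor of two. Because this CoD convention rewards any perfect linear relationship, it is usually reported alongside RMSE rather than on its own.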
Keywords:
Language:
Persian
Published:
Journal of Electrical Engineering, Volume:48 Issue: 3, 2018
Pages:
1187 to 1200
https://www.magiran.com/p1921987