Outlier detection in test samples and supervised training set selection
Outlier detection is a technique for identifying samples that lie outside the main population of a data set. Outliers degrade classification performance, so detected outliers are generally removed to improve classification accuracy. This paper proposes a method for outlier detection in test samples together with supervised training set selection. The training set is selected based on the intersection of three well-known similarity measures: Jaccard, cosine, and Dice. Each test sample is then evaluated against the selected training set to determine whether it is an outlier. The selected training set is used for a two-stage classification. Classifier accuracy increases after outlier removal, and a majority voting function is used to further improve the classifiers' results.
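
As an illustration only, the following minimal sketch shows one way the three similarity measures (Jaccard, cosine, and Dice), an intersection-based training set selection, an outlier check, and majority voting could fit together. It is not the procedure of this paper: representing samples as feature sets, the top-k selection rule, the parameter k, the outlier threshold, and all function names are assumptions introduced for the example.

```python
import math
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity of two feature sets: |A & B| / |A | B|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def cosine(a, b):
    """Cosine similarity of two feature sets seen as binary vectors."""
    denom = math.sqrt(len(a) * len(b))
    return len(a & b) / denom if denom else 0.0


def dice(a, b):
    """Dice similarity of two feature sets: 2|A & B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


MEASURES = (jaccard, cosine, dice)


def top_k(test, training, measure, k):
    """Names of the k training samples most similar to `test` (assumed rule)."""
    ranked = sorted(training, key=lambda n: measure(test, training[n]), reverse=True)
    return set(ranked[:k])


def select_training(test, training, k):
    """Keep only samples ranked in the top k by all three measures (assumption)."""
    selected = set(training)
    for measure in MEASURES:
        selected &= top_k(test, training, measure, k)
    return selected


def is_outlier(test, training, selected, threshold):
    """Flag `test` when no selected sample reaches `threshold` on every measure.

    The threshold rule is hypothetical; the source does not state its criterion.
    """
    best = max(
        (min(m(test, training[n]) for m in MEASURES) for n in selected),
        default=0.0,
    )
    return best < threshold


def majority_vote(labels):
    """Return the label predicted by the largest number of classifiers."""
    return Counter(labels).most_common(1)[0][0]


# Hypothetical toy data: each sample is a set of feature identifiers.
training = {"a": {1, 2, 3}, "b": {2, 3, 4}, "c": {7, 8, 9}, "d": {1, 2}}
test_sample = {1, 2, 3}
selected = select_training(test_sample, training, k=2)
outlier = is_outlier(test_sample, training, selected, threshold=0.5)
```

In this sketch a test sample that shares no features with any selected training sample scores zero on all three measures and is therefore flagged; real-valued feature vectors would need the vector forms of the same measures.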