Improvement of the performance of machine learning algorithms in predicting breast cancer
Breast cancer is one of the most common cancers among women. Machine learning (ML) techniques can contribute substantially to the prediction and early diagnosis of breast cancer; they have become a research hotspot and have proved to be effective. By applying ML models to a multidimensional dataset, this article aims to identify the most efficient and accurate ML models for tumor classification.
Several supervised ML algorithms, such as Logistic Regression, Decision Tree, Random Forest and KNN, were used to diagnose and predict cancer tumors. The algorithms were applied to a dataset of 699 samples taken from the UCI repository. The dataset contains breast cancer features. To enhance the algorithms' performance, these features were analyzed, and feature importance scores and cross-validation were taken into account. In this research, the ML algorithms were combined with a limited, selected set of features to achieve high accuracy in tumor classification.
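The workflow described above (feature importance scoring, a reduced feature set, and cross-validation) might be sketched as follows. This is a minimal illustration, not the authors' pipeline. It assumes scikit-learn is available and uses its bundled Wisconsin Diagnostic dataset (569 samples) as a stand-in for the 699-sample UCI dataset. The cutoff of 10 features, the 5 folds, and the choice of an Extra Trees ranker are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: stand-in dataset, not the 699-sample UCI data from the article.
X, y = load_breast_cancer(return_X_y=True)

# Keep the 10 features with the highest impurity-based importance.
# threshold=-inf makes max_features the only selection criterion.
selector = SelectFromModel(
    ExtraTreesClassifier(n_estimators=100, random_state=0),
    max_features=10,      # assumed cutoff
    threshold=-np.inf,
)

# Selection sits inside the pipeline so it is refit on each training fold,
# which keeps held-out samples from influencing the feature ranking.
model = make_pipeline(StandardScaler(), selector, LogisticRegression(max_iter=1000))

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = cross_validate(model, X, y, cv=cv, scoring=["accuracy", "roc_auc"])

print("mean accuracy:", results["test_accuracy"].mean())
print("mean ROC AUC: ", results["test_roc_auc"].mean())
```

Swapping `LogisticRegression` for another classifier (for example a decision tree or KNN) in the same pipeline would allow the models to be compared under identical folds.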
In the evaluation, the Logistic Regression algorithm (accuracy 99.14%, AUC ROC 99.6%) and the Extra Tree algorithm (accuracy 99.14%, AUC ROC 99.1%) performed better than the other algorithms. These techniques can therefore be useful for diagnosing and predicting cancer tumors and for prescribing treatment correctly.
ML techniques can be used in medicine to analyze data collections related to a disease and to predict it. The area under the ROC curve and other evaluation criteria of several ML classification algorithms are compared for the diagnosis and prediction of breast cancer in order to determine the most appropriate classifier.
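To make the comparison criterion concrete, the area under the ROC curve can be computed as the probability that a randomly chosen malignant case receives a higher score than a randomly chosen benign one. The sketch below uses only the Python standard library. The labels, scores, and model names are made-up toy values for illustration, not results from the article.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = malignant, 0 = benign.
labels = [1, 1, 1, 0, 0, 0]
candidate_scores = {
    "model_a": [0.9, 0.8, 0.4, 0.6, 0.2, 0.1],
    "model_b": [0.9, 0.7, 0.6, 0.5, 0.3, 0.2],
}

# Rank the candidate classifiers by AUC and pick the highest.
aucs = {name: roc_auc(labels, s) for name, s in candidate_scores.items()}
best = max(aucs, key=aucs.get)
print(aucs, "best:", best)
```

The pairwise form is quadratic in the number of samples, which is acceptable for a dataset of a few hundred cases; a rank-based computation would be preferable for much larger data.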