Search results for articles matching the keyword "feature selection" in medical journals
  • Tahereh Manouchehri, Reza Fereidooni, Seyyed Taghi Heydari, Kamran Bagheri Lankarani*
    Background

    Traffic accidents remain a critical global public health issue, resulting in numerous fatalities and injuries annually.

    Objectives

    This study aims to explore the application of machine learning (ML) in analyzing traffic accident data obtained from self-report questionnaires to identify factors influencing the incidence and severity of accidents.

    Methods

    This cross-sectional study collected 660 completed questionnaires; 43 were incomplete or invalid and were excluded, leaving 617 participants who answered all items in full. Participants were selected by convenience sampling from five districts in Shiraz to ensure diversity, including outreach to taxi and heavy-vehicle terminals. Data were collected through face-to-face questionnaires administered by trained researchers, and all responses were self-reported. The resulting dataset covers demographics, vehicle and road features, personality traits, driving habits, and risky driving behavior; the questionnaire incorporated multiple validated instruments capturing driving behavior, demographics (age, gender, marital status, education, income), and habits (e.g., driving duration, cellphone use, fatigue, and substance use). A random forest model combined with SHapley Additive exPlanations (SHAP) analysis was used to identify factors influencing both the occurrence and severity of accidents (a code sketch of this importance analysis follows this entry). The C5.0 algorithm was used to extract specific decision patterns, while prediction tasks were addressed with random forest, support vector machine (SVM), logistic regression, and Naive Bayes classifiers.

    Results

    The random forest algorithm highlighted that factors such as income, driving time, working time, age, duration of non-stop driving, type of law enforcement, openness, normlessness, sensation seeking, and vehicle safety significantly influence the occurrence of accidents. For accident severity, important predictors included driving time, non-stop driving, working time, age, aggressive violations, income, road quality, type of law enforcement, driving while tired, vehicle safety, foreign-car status, and vehicle comfort. Additionally, the C5.0 algorithm revealed specific patterns, such as the combination of high normlessness and extended driving hours, that increased the likelihood of accidents, while factors such as low normlessness and balanced income acted as protective elements.

    Conclusions

    The findings highlight the impact of lifestyle and work-related factors, as well as certain personality traits of drivers, on the incidence and severity of accidents. Although the results should be interpreted with caution because they rely on self-reported data, the study supports the application of ML to accident data. It also advocates strategies including social and economic interventions, psychological assessments, enhanced road-safety education, and regulatory measures tailored to individual risk assessments to prevent traffic accidents more effectively.

    Keywords: Traffic Accidents, Predictive Analytics, Machine Learning, Feature Selection
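Below is a minimal, hypothetical sketch of the random-forest plus SHAP importance analysis described in the Methods above, assuming the shap package is available. The synthetic data frame, column names, and model settings are illustrative placeholders, not the study's questionnaire data or exact configuration.

```python
# Hypothetical sketch: random-forest feature importance plus SHAP attributions
# for a binary "accident occurred" outcome. Synthetic data stand in for the
# self-report questionnaire variables mentioned in the abstract.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 617  # number of valid questionnaires reported in the abstract
X = pd.DataFrame({
    "income": rng.normal(size=n),
    "driving_time": rng.normal(size=n),
    "working_time": rng.normal(size=n),
    "age": rng.integers(18, 70, size=n).astype(float),
    "normlessness": rng.normal(size=n),
    "sensation_seeking": rng.normal(size=n),
    "vehicle_safety": rng.normal(size=n),
})
# Synthetic outcome loosely tied to two of the columns, only so the model has signal.
y = (X["driving_time"] + 0.5 * X["normlessness"] + rng.normal(size=n) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
model = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_train, y_train)

# Impurity-based importances give a quick global ranking of the predictors.
print(pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False))

# SHAP values attribute each individual prediction to the features.
sv = shap.TreeExplainer(model).shap_values(X_test)
sv = sv[1] if isinstance(sv, list) else (sv[..., 1] if sv.ndim == 3 else sv)  # class-1 attributions
print(pd.Series(np.abs(sv).mean(axis=0), index=X.columns).sort_values(ascending=False))
```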
  • Jamshid Pirgazi *, Mohammad Mehdi Pourhashem Kallehbasti, Ali Ghanbari Sorkhi, Ali Kermani
    Introduction

    High-dimensional datasets often contain an abundance of features, many of which are irrelevant to the subject of interest. This issue is compounded by frequently small sample sizes and imbalanced class distributions. These factors can degrade the performance of classification algorithms, making feature selection necessary before classification. The primary objective of feature selection algorithms is to identify a minimal subset of features that enables accurate classification.

    Methods

    In this paper, we propose a two-stage hybrid method for the optimal selection of relevant features. In the first stage, a filter method assigns weights to the features, allowing redundant and irrelevant features to be removed and reducing the computational cost of the classification algorithms; a subset of high-weight features is retained for the second stage. In the second stage, an enhanced Harris Hawks Optimization algorithm and GRASP, augmented with crossover and mutation operators borrowed from genetic algorithms, use the weights computed in the first stage to identify the optimal feature set (a simplified two-stage sketch follows this entry).

    Results

    Experimental results demonstrate that the proposed algorithm successfully identifies the optimal subset of features.

    Conclusion

    The two-stage hybrid method effectively selects the optimal subset of features, improving the performance of classification algorithms on high-dimensional datasets. This approach addresses the challenges posed by the abundance of features, low number of samples, and imbalanced class samples, demonstrating its potential for application in various fields.

    Keywords: Feature Selection, High-Dimensional Data, Harris Hawks Optimization, Global Search
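A loose sketch of the two-stage idea under stated assumptions: a mutual-information filter plays the role of the first-stage weighting, and a simple stochastic bit-flip search stands in for the enhanced Harris Hawks Optimization/GRASP stage, which is not reproduced here. Data and parameters are synthetic placeholders.

```python
# Two-stage sketch: filter weighting first, then a wrapper search over the
# retained features. The bit-flip local search below is only a stand-in for
# the paper's enhanced HHO/GRASP stage.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=120, n_features=500, n_informative=15, random_state=0)
rng = np.random.default_rng(0)

# Stage 1: filter weights; keep the 50 highest-weight features.
weights = mutual_info_classif(X, y, random_state=0)
kept = np.argsort(weights)[-50:]

def fitness(mask):
    """Wrapper objective: cross-validated accuracy minus a small size penalty."""
    if not mask.any():
        return -1.0
    acc = cross_val_score(KNeighborsClassifier(), X[:, kept[mask]], y, cv=5).mean()
    return acc - 0.002 * mask.sum()

# Stage 2: stochastic search over subsets of the filtered features.
best = rng.random(kept.size) < 0.3
best_fit = fitness(best)
for _ in range(200):
    cand = best.copy()
    cand[rng.integers(kept.size)] ^= True            # flip one bit
    if rng.random() < 0.2:                           # occasional multi-bit move
        cand ^= rng.random(kept.size) < 0.05
    f = fitness(cand)
    if f > best_fit:
        best, best_fit = cand, f

print("selected original feature indices:", kept[best])
print("fitness:", round(best_fit, 3))
```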
  • Leila Nezamabadi Farahani, Anoshirvan Kazemnejad *, Mahlagha Afrasiabi, Leili Tapak
    Objective

    This study aimed to develop a hybrid model for variable selection in high-dimensional survival analysis using support vector regression (SVR) to identify prognostic biomarkers associated with survival in oral cancer (OC) patients through the analysis of gene expression data.

    Materials and Methods

    In this retrospective cohort study, gene expression profiles (54,613 probes) from 97 patients in the GSE41613 dataset of the GEO repository were used. First, martingale residuals were obtained from a Cox regression without covariates and used as a pseudo-survival outcome (a code sketch of this step follows this entry). Then, particle swarm optimization (PSO) and a genetic algorithm (GA) were combined with SVR to select features related to the pseudo-survival outcome. The concordance index (C-index), mean absolute error (MAE), mean squared error (MSE), and R-squared were used to evaluate the performance of the models built on the selected features. Functional enrichment analysis was performed using the DAVID database, and external validation used independent datasets (GSE9844, GSE75538, GSE37991, GSE42743).

    Results

    The PSO-based method outperformed the GA-based method, achieving a smaller MAE (0.061) and MSE (0.005) together with a higher R-squared (0.99) and C-index (0.973), while selecting 291 of the 1,069 screened probes. A protein-protein interaction (PPI) network comprising 200 nodes and 120 edges was constructed. Eleven key genes with the highest degree (RBM25, SMC3, PRPF40A, POLE, SRRT, BCLAF1, PDS5B, HNRNPR, JAK1, MED23, and SULT1A1) were identified as significant biomarkers associated with OC survival.

    Conclusion

    The PSO-based hybrid model effectively improved SVR performance in survival prediction for OC patients and identified key prognostic biomarkers. Despite its promising results and validation on independent datasets, limitations in generalizability and signs of overfitting suggest the model is not yet ready for clinical use. Further studies with larger, diverse datasets are recommended.

    Keywords: Feature Selection, Gene Expression, Genetic Algorithm, Mouth Neoplasms, Particle Swarm Optimization
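A rough sketch of the pseudo-outcome construction described above, under simplifying assumptions: the martingale residuals of a covariate-free Cox model reduce to event indicators minus the Nelson-Aalen cumulative hazard, computed by hand here; a plain correlation screen then stands in for the PSO/GA search before fitting the SVR. The expression matrix and survival times are simulated, not GSE41613.

```python
# Pseudo-survival outcome via martingale residuals of a null Cox model,
# then SVR on a screened probe subset. Correlation screening is only a
# stand-in for the paper's PSO/GA feature search.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)
n_patients, n_probes = 97, 2000
expr = rng.normal(size=(n_patients, n_probes))            # placeholder expression matrix
time = rng.exponential(scale=24, size=n_patients)         # follow-up times
event = rng.integers(0, 2, size=n_patients)               # 1 = death observed

# Nelson-Aalen cumulative hazard evaluated at each patient's own time.
order = np.argsort(time)
at_risk = np.arange(n_patients, 0, -1)                    # risk-set size at each ordered time
H = np.empty(n_patients)
H[order] = np.cumsum(event[order] / at_risk)

# Martingale residual of the covariate-free model: observed minus expected events.
martingale = event - H

# Screen probes by |correlation| with the pseudo-outcome, then fit the SVR.
corr = np.abs([np.corrcoef(expr[:, j], martingale)[0, 1] for j in range(n_probes)])
selected = np.argsort(corr)[-30:]
svr = SVR(kernel="rbf").fit(expr[:, selected], martingale)
print("training R^2 on the pseudo-outcome:", round(svr.score(expr[:, selected], martingale), 3))
```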
  • Deepak Painuli, Suyash Bhardwaj, Utku Kose

    Parkinson’s disease (PD) is a neurodegenerative disorder that progressively worsens with age, particularly affecting the elderly. Symptoms of PD include visual hallucinations, depression, autonomic dysfunction, and motor difficulties. Conventional diagnostic methods often rely on subjective interpretations of movement, which can be subtle and challenging to assess accurately, potentially leading to misdiagnoses. However, recent studies indicate that over 90% of individuals with PD exhibit vocal abnormalities at the onset of the disease. Machine learning (ML) techniques have shown promise in addressing these diagnostic challenges because of their efficiency and low error rates in analyzing complex, high-dimensional datasets, particularly those derived from speech signals. This study investigates 12 machine learning models to develop a robust classifier capable of reliably identifying PD cases: logistic regression (LR), support vector machine (SVM, linear/RBF), K-nearest neighbor (KNN), Naïve Bayes (NB), decision tree (DT), random forest (RF), extra trees (ET), gradient boosting (GbBoost), extreme gradient boosting (XgBoost), AdaBoost, and multi-layer perceptron (MLP). The analysis used a PD voice dataset comprising 756 acoustic samples from 252 participants, including 188 individuals with PD and 64 healthy controls. The dataset included 130 male and 122 female subjects, with age ranges of 33 - 87 years and 41 - 82 years, respectively. To enhance model performance, the GridSearchCV method was employed for hyperparameter tuning, alongside recursive feature elimination (RFE) and minimum redundancy maximum relevance (mRMR) feature selection techniques (a sketch of this tuning-plus-RFE setup follows this entry). Among the 12 ML models evaluated, the RF model with the RFE-generated feature subset (RFE-50) emerged as the top performer, achieving an accuracy of 96.46%, a recall of 0.96, a precision of 0.97, an F1-score of 0.96, and an AUC of 0.998, the highest performance reported for this dataset in recent studies.

    Keywords: Medical Diagnosis, Parkinson’s Disease, Machine Learning, Data Preprocessing, Feature Selection, GridSearchCV
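A sketch of the RFE-plus-GridSearchCV combination named above, assuming a scikit-learn pipeline; the synthetic data, grid values, and estimator choices are placeholders rather than the study's exact configuration.

```python
# Recursive feature elimination wrapped in a pipeline and tuned with
# GridSearchCV; n_features_to_select=50 mirrors the RFE-50 subset mentioned
# in the abstract, but everything else here is illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline

# Class imbalance roughly like 188 PD vs 64 control subjects.
X, y = make_classification(n_samples=756, n_features=200, n_informative=25,
                           weights=[0.25, 0.75], random_state=0)

pipe = Pipeline([
    ("rfe", RFE(RandomForestClassifier(n_estimators=50, random_state=0), step=10)),
    ("clf", RandomForestClassifier(random_state=0)),
])
param_grid = {
    "rfe__n_features_to_select": [25, 50],
    "clf__n_estimators": [200, 400],
    "clf__max_depth": [None, 10],
}
search = GridSearchCV(pipe, param_grid, scoring="roc_auc",
                      cv=StratifiedKFold(5, shuffle=True, random_state=0), n_jobs=-1)
search.fit(X, y)
print("best params:", search.best_params_, "| best CV AUC:", round(search.best_score_, 3))
```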
  • Shiva Kanani*, Iraj Mahdavi, Naghmeh Ziaie, Bagher Rahimpour Cami
    Introduction

    Cardiovascular diseases remain a leading global cause of mortality, with ischemic heart disease projected to account for 23.3 million deaths by 2030. Heart failure and cardiogenic shock account for a significant proportion of these deaths and require timely treatment as medical emergencies. This study aims to predict mortality within one month in patients experiencing cardiogenic shock secondary to heart failure using a concise set of predictive features.

    Method

    An analytical cross-sectional study was conducted at Babol Razi Hospital, involving 201 adult patients (≥18 years) treated for cardiogenic shock in 2020. Data on 34 clinical variables, including age, history of cardiac surgery, pH, lactate concentration, diabetes status, and blood pressure, were analyzed. Mortality within one month was assessed via structured telephone follow-up. Logistic regression and gradient boosting machine (GBM) algorithms were used for predictive modeling (a minimal sketch of these two models follows this entry).

    Results

    The average age of patients was 69.44 ±15.71 years. Among them, 47.7% died. The study identified age, lactate levels, diabetes, and initial confusion as significant predictors of mortality risk. Each additional year of age was associated with a 7% higher probability of mortality. Diabetic patients faced more than double the mortality risk compared to non-diabetics. Confusion at presentation increased the mortality risk fourfold, while elevated lactate levels raised it by 1.5 times.

    Conclusion

    Logistic regression and GBM algorithms demonstrated effectiveness in predicting one-month mortality among cardiogenic shock patients with heart failure based on selected features. This approach holds promise for improving referral processes and reducing costs in healthcare settings.

    Keywords: Heart Failure, Cardiogenic Shock, Death Prediction, Logistic Regression, Feature Selection
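A minimal sketch of the two model families used above, on simulated data: logistic regression coefficients are exponentiated into odds ratios for predictors such as age, lactate, diabetes, and confusion, and a gradient boosting machine supplies an importance ranking. The coefficients and cohort are placeholders, not the study's results.

```python
# Logistic regression odds ratios plus GBM feature importances for a
# simulated one-month mortality outcome.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 201
df = pd.DataFrame({
    "age": rng.normal(69, 15, n),
    "lactate": rng.gamma(2.0, 1.5, n),
    "diabetes": rng.integers(0, 2, n),
    "confusion": rng.integers(0, 2, n),
})
# Simulated outcome so the models have signal to recover.
logit = 0.05 * (df["age"] - 69) + 0.4 * df["lactate"] + 0.8 * df["diabetes"] + 1.2 * df["confusion"] - 2
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)   # 1 = death within one month

lr = LogisticRegression(max_iter=1000).fit(df, y)
print("odds ratio per unit increase:")
print(pd.Series(np.exp(lr.coef_[0]), index=df.columns).round(2))

gbm = GradientBoostingClassifier(random_state=0).fit(df, y)
print("GBM feature importances:")
print(pd.Series(gbm.feature_importances_, index=df.columns).round(3))
```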
  • Hamed Sabbaghgol, Hamid Saadatfar*, Mahdi Khazaiepoor
    Introduction

    Alzheimer’s disease, a progressive and debilitating neurological disorder, significantly impacts the quality of life, particularly in the elderly. Given the increasing prevalence of this disease, developing accurate methods for early prediction and diagnosis is crucial. This study aims to identify key factors influencing Alzheimer’s disease prediction using novel feature selection techniques and machine learning models. The primary objective of this study is to contribute to the development of more accurate diagnostic tools, thereby improving the management and treatment of this disease.

    Methods

    In this study, we employed ten wrapper-based feature selection methods to identify the most accurate and relevant features of Alzheimer’s disease. The performance of these models was evaluated with widely used machine learning algorithms and standard metrics such as accuracy, precision, recall, specificity, F1-score, and ROC curve analysis, and the results were compared (a wrapper-selection sketch follows this entry). All evaluations were conducted on the standard ADNI Alzheimer’s disease dataset.

    Results

    The influential features included cognitive test results (e.g., Mini-Mental State Examination), functional assessments, patient-reported memory and behavioral problems, and activities of daily living scores, which were identified as key indicators for Alzheimer’s disease diagnosis.

    Discussion

    The results demonstrate that employing novel feature selection techniques and machine learning algorithms can lead to more accurate models for predicting Alzheimer’s disease. These findings can contribute to improving early diagnosis and management of this disease.

    Keywords: Dimensionality Reduction, Feature Selection, Alzheimer's Disease, Machine Learning
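One illustrative wrapper method in the spirit of the ten used above: scikit-learn's SequentialFeatureSelector wraps a classifier and greedily adds the features that maximize cross-validated accuracy. The synthetic matrix stands in for the ADNI variables, and the estimator choice is an assumption.

```python
# Wrapper-based selection with SequentialFeatureSelector, then evaluation of
# the chosen subset with the standard metrics listed in the Methods.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

X, y = make_classification(n_samples=400, n_features=60, n_informative=8, random_state=0)

estimator = LogisticRegression(max_iter=2000)
selector = SequentialFeatureSelector(estimator, n_features_to_select=8,
                                     direction="forward", scoring="accuracy", cv=5)
mask = selector.fit(X, y).get_support()

scores = cross_validate(estimator, X[:, mask], y, cv=5,
                        scoring=["accuracy", "precision", "recall", "f1", "roc_auc"])
print("selected feature indices:", mask.nonzero()[0])
for metric in ["accuracy", "precision", "recall", "f1", "roc_auc"]:
    print(f"{metric}: {scores['test_' + metric].mean():.3f}")
```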
  • Mehrnoosh Ahangarani, Mohammadjafar Tarokh*
    Background and Aim

    In recent years, machine learning and evolutionary algorithms have drawn the attention of researchers and specialists in various fields, especially healthcare, because of their practical applications in processing large datasets to provide valuable insights. At the same time, rapid and accurate diagnosis of diabetes is one of the most critical issues in medicine, and the rising prevalence of the disease is a major concern for communities worldwide. The present study aimed to create a diagnostic model based on evolutionary algorithms and machine learning for diabetes diagnosis.

    Materials and Methods

    Using the Pima Indians diabetes dataset, this research presents a framework for intelligent diabetes diagnosis. The proposed method consists of two main stages. The first stage is a classification approach using the K-nearest neighbors and random forest algorithms. The second stage is a combined feature selection and classification approach intended to improve the first-stage results, in which the grey wolf optimizer, whale optimization algorithm, and particle swarm optimization are used for feature selection. The approaches are compared using accuracy, precision, recall, and F1-score.

    Results

    After comparing the proposed models, the random forest model based on the grey wolf optimizer, with a prediction accuracy of 81.38%, was selected and introduced as the final model.

    Conclusion

    The findings of this research indicate that the use of evolutionary algorithms alongside machine learning models can often enhance the efficiency and accuracy of diabetes diagnosis and its associated complications.

    Keywords: Diabetes Diagnosis, Machine Learning, Evolutionary Algorithms, Feature Selection
  • Elnaz Sheikhian, Majid Ghoshuni, Mahdi Azarnoosh, Mohammad Mahdikhalilzadeh
    Background

    This study explores a novel approach to detecting arousal levels through the analysis of electroencephalography (EEG) signals. Leveraging the Faller database with data from 18 healthy participants, we employ a 64‑channel EEG system.

    Methods

    Our approach extracts ten frequency characteristics from every channel, yielding a 640-dimensional feature vector for each signal instance. To enhance classification accuracy, we employ a genetic algorithm for feature selection, treating it as a multi-objective optimization task (a simplified genetic-search sketch follows this entry). The approach uses fast bit hopping for efficiency, overcoming traditional bit-string limitations; a hybrid operator speeds up convergence, and a solution selection strategy identifies the most suitable feature subset.

    Results

    Experimental results demonstrate the method’s effectiveness in detecting arousal levels across diverse states, with improvements in accuracy, sensitivity, and specificity. In scenario one, the proposed method achieves an average accuracy, sensitivity, and specificity of 93.11%, 98.37%, and 99.14%, respectively. In scenario two, the averages stand at 81.35%, 88.65%, and 84.64%.

    Conclusions

    The obtained results indicate that the proposed method has a high capability of detecting arousal levels in different scenarios. In addition, the advantage of employing the proposed feature reduction method has been demonstrated.

    Keywords: Arousal Level, Feature Selection, Genetic Algorithms, Machine Learning
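A much-simplified genetic search over binary feature masks, illustrating the optimization framing described above. The paper's fast bit hopping, hybrid operator, and explicit multi-objective handling are not reproduced; accuracy and subset size are folded into a single penalized fitness, and the data are synthetic.

```python
# Toy genetic algorithm for feature selection: truncation selection,
# one-point crossover, and bit-flip mutation over boolean masks.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=100, n_informative=12, random_state=0)
rng = np.random.default_rng(3)
pop_size, n_gen, n_feat = 30, 25, X.shape[1]

def fitness(mask):
    if not mask.any():
        return 0.0
    acc = cross_val_score(SVC(), X[:, mask], y, cv=3).mean()
    return acc - 0.001 * mask.sum()            # penalize large subsets

population = rng.random((pop_size, n_feat)) < 0.2
for _ in range(n_gen):
    scores = np.array([fitness(ind) for ind in population])
    parents = population[np.argsort(scores)[-pop_size // 2:]]     # keep the better half
    children = []
    while len(children) < pop_size - len(parents):
        a, b = parents[rng.integers(len(parents), size=2)]
        cut = rng.integers(1, n_feat)                             # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        child ^= rng.random(n_feat) < 0.02                        # bit-flip mutation
        children.append(child)
    population = np.vstack([parents, np.array(children)])

best = max(population, key=fitness)
print("selected", int(best.sum()), "of", n_feat, "features; fitness:", round(fitness(best), 3))
```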
  • Reza Sheibani*, Mohammad Reza Mazaheri Habibi, Hojjat Azadravesh
    Introduction

    The marked rise in lung cancer, its associated impacts and consequences, and the substantial costs it imposes on society have driven the medical community to pursue programs aimed at further examination, prevention, early detection, and diagnosis. In medicine, timely detection and diagnosis of disease can prevent many life-threatening conditions and save lives.

    Material and Methods

    This study aims to predict lung cancer using a novel feature selection method integrated with a classifier. Our approach is a four-stage method. First, we calculate feature similarities within a lung cancer dataset using the absolute value of the Pearson correlation coefficient; the initial features are then clustered with the Louvain community detection algorithm. Next, we determine the optimal subset of features using the concept of node centrality. Finally, lung cancer diagnosis is performed with the selected features using a classifier (a sketch of the graph-based selection step follows this entry).

    Results

    Comparative analysis shows that the proposed method outperforms existing techniques in execution time, prediction accuracy, and the number of selected features. The method reduced 12,600 features to 118, with accuracies of 95.28%, 95.49%, 95.23%, and 95.32% for the support vector machine (SVM), decision tree (DT), Naive Bayes (NB), and K-nearest neighbor (KNN) classifiers, respectively. The runtime comparison shows that the proposed method, at 2.146 seconds, is significantly faster than the other methods.

    Conclusion

    The proposed feature selection method successfully reduced the initial feature set and significantly decreased computational time. Moreover, the achieved prediction accuracies underscore the reliability of our approach. This significant reduction in feature space while maintaining consistently high prediction accuracies serves as a strong validation of the potency and practical applicability of our methodology in the domain of lung cancer prediction. These compelling results strongly advocate for the potential real-world impact of our approach.

    Keywords: Lung Cancer Prediction, Feature Selection, Community Detection Algorithm
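A sketch of the graph-based selection step referenced in the Methods, assuming NetworkX's built-in Louvain implementation: features become nodes, absolute Pearson correlations above an illustrative threshold become weighted edges, and the most degree-central feature of each community is kept. The data and threshold are placeholders, not the paper's choices.

```python
# Correlation graph -> Louvain communities -> one central representative
# feature per community.
import networkx as nx
import numpy as np
from sklearn.datasets import make_classification

X, _ = make_classification(n_samples=150, n_features=300, n_informative=20, random_state=0)
corr = np.abs(np.corrcoef(X, rowvar=False))               # feature-by-feature |Pearson r|

G = nx.Graph()
G.add_nodes_from(range(X.shape[1]))
threshold = 0.3                                            # illustrative similarity cutoff
rows, cols = np.where(np.triu(corr, k=1) > threshold)
G.add_weighted_edges_from((int(i), int(j), float(corr[i, j])) for i, j in zip(rows, cols))

communities = nx.community.louvain_communities(G, weight="weight", seed=0)
centrality = nx.degree_centrality(G)
selected = [max(c, key=centrality.get) for c in communities if len(c) > 1]
print(f"{len(communities)} communities -> {len(selected)} representative features")
```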
  • Sanaz Rezvani, Ali Chaibakhsh*
    Background

    Applying efficient feature extraction and selection methods is essential in improving the performance of machine learning algorithms employed in brain-computer interface (BCI) systems.

    Objectives

    The current study aims to enhance the performance of a motor imagery-based BCI by improving the feature extraction and selection stages of the machine-learning algorithm applied to classify the different imagined movements.

    Materials & Methods

    In this study, a multi-rate system for spectral decomposition of the signal is designed, and spatial and temporal features are then extracted from each sub-band (a band-decomposition sketch follows this entry). To maximize classification accuracy while simplifying the model and using the smallest set of features, the feature selection stage is treated as a multi-objective optimization problem, and the Pareto-optimal solutions of these two conflicting objectives are obtained. For feature selection, the non-dominated sorting genetic algorithm II (NSGA-II), an evolutionary algorithm, is used in a wrapper fashion, and its effect on BCI performance is explored. The proposed method is implemented on a public dataset known as BCI competition III dataset IVa.

    Results

    Extracting spatial and temporal features from different sub-bands and selecting features with an evolutionary optimization approach led to an improved classification accuracy of 92.19%, which exceeds the state of the art.

    Conclusion

    The results show that the proposed approach, with its improved classification accuracy, can yield a high-performance subject-specific BCI system.

    Keywords: Brain-computer interface, Motor imagery, Feature extraction, Feature selection, Optimization
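A sketch of the spectral decomposition and feature extraction steps only, under an assumed sampling rate and band edges: Butterworth band-pass filters split the EEG into sub-bands and a log-variance (band-power) feature is taken per channel and band. The spatial-feature extraction and NSGA-II wrapper selection described in the Methods are not reproduced here.

```python
# Filter-bank decomposition of one EEG trial into sub-bands, followed by
# log-variance features per channel and band.
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 250                                       # assumed sampling rate (Hz)
rng = np.random.default_rng(4)
eeg = rng.normal(size=(118, fs * 4))           # placeholder trial: 118 channels x 4 s

bands = [(4, 8), (8, 12), (12, 16), (16, 24), (24, 32)]   # example sub-bands (Hz)
features = []
for lo, hi in bands:
    sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
    filtered = sosfiltfilt(sos, eeg, axis=-1)
    features.append(np.log(filtered.var(axis=-1)))          # log band power per channel

feature_vector = np.concatenate(features)
print("feature vector length (channels x bands):", feature_vector.shape[0])
```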
  • Ameneh Mehrjerd, Hassan Rezaei *, Saeid Eslami, Nayyere Khadem Ghaebi
    Background

    Previous research has identified key factors affecting in vitro fertilization or intracytoplasmic sperm injection success, yet the lack of a standardized approach for various treatments remains a challenge.

    Objective

    The objective of this study is to utilize a machine learning approach to identify the principal predictors of success in in vitro fertilization and intracytoplasmic sperm injection treatments.

    Materials and Methods

    We collected data from 734 individuals at 2 infertility centers in Mashhad, Iran (a public university-affiliated center, Milad, and a private center, Novin) between November 2016 and March 2017. We employed feature selection methods (filter, embedded, and wrapper) to reduce dimensionality in a random forest model, with hesitant fuzzy sets (HFSs) used to choose the most effective method. A hybrid approach then enhanced predictor identification and accuracy (ACC), assessed using machine learning metrics such as the Matthews correlation coefficient, runtime, ACC, area under the receiver operating characteristic curve, precision (positive predictive value), recall, and F-score, demonstrating the effectiveness of combining feature selection methods.

    Results

    Our hybrid feature selection method excelled with the highest ACC (0.795), area under the receiver operating characteristic curve (0.72), and F-score (0.8), while selecting only 7 features: follicle-stimulating hormone (FSH), 16Cells, FAge, oocytes, quality of transferred embryos (GIII), compact, and unsuccessful.

    Conclusion

    We introduced HFSs in our novel method to select influential features for predicting infertility success rates. Using a multi-center dataset, HFSs improved feature selection by reducing the number of features based on standard deviation among criteria. Results showed significant differences between pregnant and non-pregnant groups for selected features, including FSH, FAge, 16Cells, oocytes, GIII, and compact. We also found a significant correlation between FAge and fetal heart rate and clinical pregnancy rate, with the highest FSH level (31.87%) observed for doses ranging from 10-13 (mIU/ml).

    Keywords: Machine learning, Feature selection, Infertility treatment, Hesitant fuzzy set
  • Karlo Abnoosian, Rahman Farnoosh*, MohammadHassan Behzadi
    Introduction

    Diabetes is a chronic disease worldwide, with an increasing annual death rate. Many health professionals seek innovative ways to detect and treat it early. Rapid advances in machine learning have improved disease diagnosis. However, because of the small amount of labeled data, the frequency of null and missing values, and class imbalance, building an optimal predictor for disease diagnosis has become a major challenge. This study presents a pipeline-based classification framework for predicting diabetes on two datasets: an Indian dataset with two classes (diabetic and healthy) and an Iraqi dataset with three classes (diabetic, healthy, and prediabetic).

    Method

    An important part of this framework is preprocessing. Different ML models, using a one-vs-one approach for the three-class case, are implemented in the framework. Because the datasets are imbalanced, the area under the receiver operating characteristic (ROC) curve is used alongside accuracy as an evaluation criterion. To improve both criteria, the hyperparameters of each model are optimized with grid search and Bayesian optimization, and various feature selection methods are used to build a powerful model with short training and testing times (a pipeline sketch follows this entry).

    Results

    The proposed framework was assessed for diabetes prediction on the Indian and Iraqi datasets. AdaBoost gave the best results on the Indian dataset (ACC = 89.98, AUC = 94.11) and random forest on the Iraqi dataset (ACC = 98.66, AUC = 98.62), showing good accuracy and performance.

    Conclusion

    In terms of accuracy, precision, recall, and F1-score, the pipeline-based framework performs well in predicting diabetes; therefore, it can be used in clinical decision support systems.

    Keywords: Diabetes Prediction, Machine Learning, Classification, Pipeline, Feature Selection, The Area Under the Receiver Operating Characteristic Curve (AUC)
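A rough pipeline sketch in the spirit of the framework above, assuming scikit-learn components: imputation, scaling, univariate feature selection, and a classifier are chained, and hyperparameters are tuned for ROC-AUC with a grid search (Bayesian optimization and the one-vs-one multi-class handling are omitted). Shapes and grids are illustrative only.

```python
# Pipeline (impute -> scale -> select -> classify) tuned with GridSearchCV
# for ROC-AUC; for a three-class dataset, 'roc_auc_ovo' scoring and a
# one-vs-one strategy would replace the binary setup shown here.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=768, n_features=8, n_informative=5,
                           weights=[0.65, 0.35], random_state=0)   # imbalanced, Pima-like shape

pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif)),
    ("clf", RandomForestClassifier(random_state=0)),
])
grid = {
    "select__k": [4, 6, 8],
    "clf__n_estimators": [200, 400],
    "clf__max_depth": [3, 6, None],
}
search = GridSearchCV(pipe, grid, scoring="roc_auc", cv=5, n_jobs=-1)
search.fit(X, y)
print("best CV AUC:", round(search.best_score_, 3), "| params:", search.best_params_)
```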
  • Mehmet Tahir HUYUT *

    It is important to diagnose coronavirus disease 2019 (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 at an early stage and to monitor severely infected patients in order to reduce the lethality of the disease. In addition, there is a need for alternative methods with lower costs and faster results to determine the severity of the disease. In this context, routine blood values can be used to determine the diagnosis/prognosis and mortality of COVID-19. In this study, three optimized datasets were prepared to determine the features that affect the diagnosis, prognosis, and mortality of COVID-19. These datasets can be used by researchers to determine the diagnosis and severity of COVID-19 with various classifier machine learning models and artificial intelligence methods. It is hoped that studies on these datasets will reduce the negative pressures on the health system and provide important clinical guidance for decision-makers in the diagnosis and prognosis of COVID-19.

    Keywords: COVID-19, Diagnosis, Prognosis, Mortality, Biochemical, Hematological Biomarkers, Routine Blood Values, Feature Selection, Artificial Intelligence, Machine Learning, Neural Network
  • Fazel Amirvhaedi, Mazaher Maghsoudloo, Farhad Adhami-Moghadam
    Background

    Age-related macular degeneration (AMD) is a progressive degenerative disease of the macula and the main cause of blindness in older adults. Disease progression varies among individuals and is associated with risk factors such as aging, genetic susceptibility, environmental exposures, and lifestyle. Since the etiology of AMD is not fully known, identifying established and novel predictive risk factors is essential for detecting AMD at an early stage.

    Material and Methods

    The expression data were obtained from the Gene Expression Omnibus database. Samples were quantile normalized and log2 transformed, and outlier samples were removed by hierarchical clustering. The R package limma was used to fit a linear model and identify differentially expressed genes (DEGs); 33 genes were found with a q-value below 0.05 and |log(FC)| ≥ 0.7 (a simplified differential-expression sketch follows this entry). The DEGs were then used in a machine learning (ML) approach to discriminate between case and control samples, and the FeatureSelect tool was used to extract the most effective separator genes. Nine genes were identified as the best disease discriminators across 11 feature selection algorithms.

    Results

    The gene set found in the study distinguishes healthy samples from patient samples with an accuracy of 87.5%. Using ML models and feature selection, we identified DEF119B, UBD, and GRP as three novel potential AMD candidate biomarkers.

    Conclusion

    Machine learning can be beneficial in diagnosing, preventing and treating diseases, especially in diseases such as AMD that do not have a clear etiology.

    Keywords: Gene Expression, Machine Learning, AMD, Gene Selection, Feature Selection
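A loose Python stand-in for the differential-expression step described above (the study itself used R/limma): per-gene Welch t-tests with Benjamini-Hochberg correction and a log-fold-change cutoff matching the reported thresholds. The expression values are simulated, not the GEO data.

```python
# Simplified DEG screen: Welch t-test per gene, BH-adjusted q-values,
# and a |logFC| >= 0.7 cutoff (values assumed to be on a log2 scale).
import numpy as np
from scipy.stats import ttest_ind
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(6)
n_case, n_ctrl, n_genes = 20, 20, 5000
case = rng.normal(8, 1, size=(n_case, n_genes))
ctrl = rng.normal(8, 1, size=(n_ctrl, n_genes))
case[:, :40] += 1.0                                   # spike in a few truly changed genes

log_fc = case.mean(axis=0) - ctrl.mean(axis=0)
pvals = ttest_ind(case, ctrl, axis=0, equal_var=False).pvalue
reject, qvals, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")

degs = np.where(reject & (np.abs(log_fc) >= 0.7))[0]   # q < 0.05 and |logFC| >= 0.7
print(f"{degs.size} differentially expressed genes")
```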
  • Rouhollah Maghsoudi, Mitra Mirzarezaee *, Mehdi Sadeghi, Babak Najar-Araabi
    Introduction

    Using artificial intelligence tools in pharmacogenomics is one of the newest bioinformatics research fields. The anticoagulant warfarin is one of the most important drugs whose initial therapeutic dose is difficult to determine: it is an oral anticoagulant whose optimal dose is hard to select because of its narrow therapeutic window and the complex interplay of individual factors. The aim of this study is to determine the optimal initial dose.

    Method

    Among the relatively successful kernel-based estimation methods, the comparison and identification of suitable kernels has received little attention. In the present research, this approach is examined in detail, different feature selection algorithms are analyzed and combined with expert opinion, and an appropriate subset of effective predictor variables is identified for dose estimation (a kernel-comparison sketch follows this entry).

    Results

    In the current study, a dataset collected by the International Warfarin Consortium was used. The results showed that the support vector machine with a suitable kernel and a subset of the proposed features can successfully predict the ideal dose of warfarin for a significant percentage of patients with an error of approximately 0.7 mg per week.

    Conclusion

    The estimation was conducted using the least squares version of the support vector regression based on a suitable kernel and feature selection strategy. In this method, a better approach for predicting the optimal therapeutic dose of warfarin was presented, which can significantly reduce the wrong dose error and its consequences.

    Keywords: Pharmacogenomics, Initial Warfarin Dose Estimation, Feature Selection, Least Squares Support Vector Regression
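An illustrative kernel comparison for the dose-estimation setting, under clearly hypothetical features and data: kernel ridge regression, which is closely related to least-squares SVR, is evaluated with several kernels by cross-validated mean absolute error. Nothing here uses the IWPC dataset or the paper's chosen kernel.

```python
# Compare kernels for a least-squares kernel regressor on simulated
# warfarin-dose data; column meanings and coefficients are made up.
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(5)
n = 500
X = np.column_stack([
    rng.normal(60, 15, n),      # age (years)
    rng.normal(170, 10, n),     # height (cm)
    rng.integers(0, 3, n),      # CYP2C9 genotype, coded 0-2 (hypothetical coding)
    rng.integers(0, 3, n),      # VKORC1 genotype, coded 0-2 (hypothetical coding)
])
dose = 35 - 0.2 * (X[:, 0] - 60) - 6 * X[:, 2] - 8 * X[:, 3] + rng.normal(0, 5, n)  # mg/week

for kernel, params in [("linear", {}), ("rbf", {"gamma": 0.1}), ("polynomial", {"degree": 2})]:
    model = make_pipeline(StandardScaler(), KernelRidge(alpha=1.0, kernel=kernel, **params))
    mae = -cross_val_score(model, X, dose, cv=5, scoring="neg_mean_absolute_error").mean()
    print(f"{kernel:>10}: cross-validated MAE = {mae:.2f} mg/week")
```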
  • Xue Bai, Wenjun Liu, Hui Huang, Huan You
    Background

    Hypertension is the main reason the incidence of cardiovascular disease has increased year by year, and early diagnosis of hypertension is necessary to reduce that incidence, which in turn places higher demands on diagnostic accuracy. We tried a variety of feature selection methods to improve the accuracy of logistic regression (LR).

    Methods

    We collected 397 samples from Nanjing, Jiangsu, China between Jan 2016 and Dec 2017, including 178 hypertension samples and 219 control samples. It includes not only clinical and laboratory data, but also imaging data. We focused on the difference of imaging attributes between the control group and the hypertension group, and analyzed the correlation coefficients of all attributes. In order to establish the optimal LR model, this study tried three different feature selection methods, including statistical analysis, random forest (RF) and extreme gradient boosting (XGBoost). The area under the ROC curve (AUC) and accuracy were used as the main criterion for model evaluation.

    Results

    In the prediction of hypertension, the performance of LR with RF as the feature selection method (accuracy: 0.910; AUC: 0.924) was better than the performance of LR with XGBoost as the feature selection method (accuracy: 0.897; AUC: 0.915) and the performance of LR with statistical analysis as the feature selection method (accuracy: 0.872; AUC: 0.926).

    Conclusion

    LR with RF as the feature selection method may provide accurate results in predicting hypertension. Carotid intima-media thickness (cIMT) and pulse wave velocity at the end of systole (ESPWV) are two key imaging indicators in the prediction of hypertension.

    Keywords: Hypertension, Ultrafast pulse wave velocity, Feature selection, Logistic regression
  • Maryam Talachian *
    Background and objective

    Proper and quick diagnosis is necessary in medicine for correct and timely treatment. This becomes more important for diseases with overlapping symptoms, such as thyroid disease, whose symptoms resemble those of conditions such as cardiovascular disease. Data mining and machine learning techniques are reliable, valuable methods that can improve physicians' ability to diagnose and treat correctly. The main goal of this research is to extract diagnostic rules for thyroid disease.

    Method

    Features were created, and feature selection algorithms, including filter-based and wrapper-based methods and the genetic algorithm, were analyzed to select the most effective features for thyroid diagnosis. The analysis also used decision tree models, random forest, bagging, boosting, and stacking methods to diagnose and improve the precision of the disease classes, namely hypothyroidism and hyperthyroidism. Model evaluation was performed with four metrics: accuracy, precision, recall, and F-measure.

    Results

    This research was conducted on data from the University of California, Irvine (UCI) repository, comprising 7,200 records with 21 features. Experimental results showed that the genetic algorithm (GA) achieved the highest efficiency in feature selection, and the boosted tree with the created features produced the highest F-measure among the classifiers.

    Keywords: Thyroid Disease, Data mining, Tree Algorithms, Feature selection
  • Arash Maghsoudi, Ahmad Shalbaf*
    Introduction

    Mental arithmetic analysis based on electroencephalogram (EEG) signals can help in understanding disorders such as attention-deficit hyperactivity disorder, dyscalculia, and autism spectrum disorder, in which difficulty learning or understanding arithmetic exists. Most mental arithmetic recognition systems rely on features of a single EEG channel; however, the relationships between EEG channels, in the form of effective brain connectivity analysis, can contain valuable information. This study aims to find distinctive effective-connectivity features and to create a hierarchical feature selection scheme for classifying mental arithmetic and baseline tasks effectively.

    Methods

    We estimated effective connectivity using the Directed Transfer Function (DTF), direct DTF (dDTF), and Generalized Partial Directed Coherence (GPDC), measures that describe causal relationships between different brain areas. A hierarchical feature subset selection method then selects the most significant connectivity features: the Kruskal-Wallis test is applied first, and five feature selection algorithms, namely SVM-based recursive feature elimination, Fisher score, mutual information, minimum redundancy maximum relevance (mRMR), and concave minimization with SVM, are subsequently used to select the most discriminative features (a filter-plus-RFE sketch follows this entry). Finally, an SVM is used for classification.

    Results

    The obtained results indicated that the best EEG classification performance in 29 participants and 60 trials is obtained using GPDC and feature selection via concave minimization method in Beta2 (15-22Hz) frequency band with 89% accuracy. 

    Conclusion

    This new hierarchical automated system could be helpful in the discrimination of mental arithmetic and baseline tasks from EEG signals effectively.

    Keywords: Electroencephalogram (EEG), Mental arithmetic, Effective connectivity, Feature selection
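A compact sketch of the hierarchical selection described above, with one of the five rankers: connectivity-style features are first screened with the Kruskal-Wallis test, the survivors are then ranked by SVM-based recursive feature elimination, and a linear SVM classifies the result. The feature matrix is synthetic, not the 29-participant EEG data.

```python
# Stage 1: Kruskal-Wallis screening; stage 2: SVM-RFE on the survivors;
# final classification with a linear SVM.
import numpy as np
from scipy.stats import kruskal
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=120, n_features=361, n_informative=15, random_state=0)

pvals = np.array([kruskal(X[y == 0, j], X[y == 1, j]).pvalue for j in range(X.shape[1])])
screened = np.where(pvals < 0.05)[0]

rfe = RFE(SVC(kernel="linear"), n_features_to_select=20).fit(X[:, screened], y)
selected = screened[rfe.support_]

acc = cross_val_score(SVC(kernel="linear"), X[:, selected], y, cv=5).mean()
print(f"{screened.size} screened -> {selected.size} selected, CV accuracy = {acc:.2f}")
```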
  • Seyed Ataddin Mahmoudinejad*, Naser Safdarian
    Background

    Cardiovascular disease (CVD) is the leading cause of death worldwide, and myocardial infarction (MI) is one of the five primary CVD disorders; analysis of the patient's electrocardiogram (ECG) plays a dominant role in MI diagnosis. This research aims to evaluate several features extracted from ECG data for diagnosing MI.

    Methods

    In this paper, we used the Physikalisch-Technische Bundesanstalt database and extracted morphological features, such as the total integral of the ECG, the integral of the T-wave section, the integral of the QRS complex, and J-point elevation, from a cycle of normal and abnormal ECG waveforms. Since the morphology of healthy and abnormal ECG signals differs, we applied the integral to different ECG cycles and intervals (a minimal integral-feature sketch follows this entry). We executed 100 iterations of 10-fold and 5-fold cross-validation and averaged the statistical parameters to show the performance and stability of four classifiers: logistic regression (LR), simple decision tree, weighted K-nearest neighbor, and linear support vector machine. Furthermore, different combinations of the proposed features were evaluated as a feature selection procedure based on the performance of these classifiers.

    Results

    Using all the proposed features with an LR classifier, the proposed method diagnosed MI with an accuracy of 90.37%, a sensitivity of 94.87%, and a specificity of 86.44%, with a standard deviation of 0.006 for accuracy.

    Conclusion

    Our classification-based method successfully classified and diagnosed MI using different combinations of the presented features. Consequently, all proposed features are valuable for MI diagnosis and merit further investigation in future work.

    Keywords: Biological signal processing, classification, cross-validation, electrocardiography, feature selection, linear support vector machine, myocardial infarction, simple tree, weighted K-nearest neighbor
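A minimal sketch of the integral-feature idea under assumed segment boundaries: areas under ECG segments are computed by trapezoidal integration and fed to a logistic regression evaluated with repeated stratified cross-validation. The synthetic beats, boundaries, and crude J-point proxy are placeholders, not the PTB recordings.

```python
# Trapezoidal integrals over assumed ECG segments as features, evaluated
# with 100 repeats of 10-fold cross-validation as in the abstract.
import numpy as np
from scipy.integrate import trapezoid
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

fs = 1000                                        # samples per second
rng = np.random.default_rng(7)
n_beats = 300
t = np.arange(fs) / fs
beats = np.sin(2 * np.pi * 1.2 * t) + 0.1 * rng.normal(size=(n_beats, fs))
labels = rng.integers(0, 2, n_beats)
beats[labels == 1, 600:800] += 0.3               # crude "ST/T change" for the abnormal class

qrs, t_wave = slice(350, 500), slice(600, 800)   # assumed segment boundaries (sample indices)
features = np.column_stack([
    trapezoid(beats, dx=1 / fs, axis=1),             # total integral of the cycle
    trapezoid(beats[:, qrs], dx=1 / fs, axis=1),     # integral of the QRS complex
    trapezoid(beats[:, t_wave], dx=1 / fs, axis=1),  # integral of the T-wave section
    beats[:, 500],                                   # crude J-point amplitude proxy
])

cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=100, random_state=0)
acc = cross_val_score(LogisticRegression(max_iter=1000), features, labels, cv=cv)
print(f"mean accuracy over {acc.size} folds: {acc.mean():.3f} (std {acc.std():.3f})")
```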
  • Atefe Biglari Saleh, Farhad Soleimanian Gharehchopogh*
    Introduction

    The timely diagnosis and prediction of diseases are among the main issues in medical sciences. The use of decision-making systems to discover the underlying knowledge in the disease information package and patient records is one of the most effective ways of diagnosing and preventing disease. This study aimed to design a medical decision system that can detect hepatitis.

    Materials & Methods

    This study was conducted based on a descriptive-analytic design. Its dataset contains 155 records with 19 features in the University of California-Irvine machine learning database. This study utilized the Binary Artificial Algae Algorithm (BAAA) for Feature Selection (FS). Moreover, K-Nearest Neighbor (KNN) was used to classify hepatitis into two healthy and unhealthy classes. In total, 80% of the data was employed for training, and the remaining (20%) was used for testing. Furthermore, Precision, Recall, F-measure, and Accuracy were utilized to evaluate the model.

    Findings

    According to the results, the accuracy of the proposed model was estimated at 96.45%. After selecting the features with the BAAA, the percentage of the accuracy reached 98.36% in the best situation. In the proposed model with 300 repetitions, the Precision, Recall, F-Measure, and error rate were 96.23%, 96.74%, 96.48%, and 3.55%, respectively.

    Discussion & Conclusions

    Hepatitis is one of the most common diseases among females and males. A timely diagnosis of this disease not only reduces the costs but also increases the chance of successful treatment. In this study, the disease was diagnosed using the hybrid method, and a high accuracy level was obtained in disease diagnosis by FS.

    Keywords: Binary artificial algae algorithm, Feature selection, Hepatitis disease diagnosis, K-nearest neighbor, Medical decision making system
Note
  • Results are sorted by publication date.
  • Your keyword was searched only in the keywords field of the articles. To exclude unrelated results, the search was limited to journals in the same subject area as the source journal.
  • To repeat the search across all subject areas or with different criteria, use the advanced journal search page.