supervised machine learning
in journals of the medical sciences group
-
Interdisciplinary Journal of Virtual Learning in Medical Sciences, Volume:15 Issue: 4, Dec 2024, PP 369-387
Background
Online training has gained popularity as an effective teaching method, necessitating diligent monitoring of learner progress and engagement. The challenge of predicting academic performance in online courses is crucial for supporting learners at risk of academic loss. This study aimed to develop a robust model for predicting learners' performance using ensemble machine learning and feature engineering techniques.
Methods
This research employed a classification approach based on the Digital Electronic Education and Design Suite (DEEDS) dataset, which records real-time interactions of learners within an online educational environment. The dataset analyzed in this research included activity logs from 115 undergraduate students majoring in computer engineering who participated in a digital electronics course at the University of Genoa, Italy, between September and December 2015. Various machine learning algorithms, including Random Forest (RF), Adaptive Boosting (AdaBoost), Gradient Boosting (GB), Light Gradient-Boosting Machine (LightGBM), and eXtreme Gradient Boosting (XGBoost), were applied. The study also utilized ensemble learning methods such as Boosting and Stacking to enhance prediction accuracy. Feature engineering techniques were implemented to extract and select relevant features from the dataset, leading to the development of a predictive model.
Results
The proposed model achieved an accuracy of 97.43%, a precision of 96.20%, and an F1-score of 98.06%, indicating an acceptable predictive capability. Notably, the findings revealed that feature selection significantly enhanced performance; in the absence of feature selection, the accuracy dropped to 92.15%. Additionally, ensemble methods like Boosting and Stacking provided a 15% enhancement in prediction accuracy compared to traditional approaches. Overall, the integration of feature engineering and ensemble techniques acceptably optimized the model's ability to predict learners' academic performance in online educational settings.
Conclusion
This research validates the effectiveness of employing ensemble machine learning techniques and feature engineering in predicting learners' academic performance in online education. Future studies should explore additional ensemble methods and incorporate diverse feature types to enhance prediction accuracy.
Keywords: Information Science, Supervised Machine Learning, Educational, Data Mining, Dimensionality Reduction, Computer-Assisted
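The reported precision and F1-score are linked through recall, because the F1-score is the harmonic mean of precision and recall. As a rough consistency check (not part of the study), the sketch below solves F1 = 2PR / (P + R) for R using the figures quoted in the abstract; the function name `implied_recall` is illustrative.

```python
def implied_recall(precision: float, f1: float) -> float:
    """Solve F1 = 2*P*R / (P + R) for the recall R."""
    denominator = 2 * precision - f1
    if denominator <= 0:
        raise ValueError("no valid recall exists for this precision/F1 pair")
    return f1 * precision / denominator


# Figures reported in the abstract above, expressed as fractions.
recall = implied_recall(precision=0.9620, f1=0.9806)
print(f"implied recall: {recall:.4f}")
```

With the reported values the implied recall comes out very close to 1, which would mean the model misses almost none of the positive class; the abstract does not report recall directly, so this is only an inference from the rounded figures.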
-
Objective
Machine learning is a complex process that involves specifying a model and training it on a large volume of data. In the past, the main focus in this field was on improving the structure of models and algorithms, but recently more attention has been paid to the quality and quantity of data. The aim of this review article is to examine the challenges of data collection and model evaluation in supervised machine learning and to offer solutions for them.
Materials and Methods
In this study, the challenges facing researchers in collecting data and evaluating supervised machine learning models were examined through a review. Documents were retrieved from the PubMed, Scopus, and Science Direct databases and the Google Scholar search engine for the period 2001 to 2023; after full-text screening, 17 articles were reviewed and included in the study.
Findings
The reviewed studies revealed four major data-collection challenges in supervised machine learning: an insufficient number of samples, non-representative training data, low data quality, and irrelevant features. Four challenges were likewise identified in model evaluation: overfitting, underfitting, unavailability of sufficient data for validation, and data mismatch.
Conclusion
Increasing the number of samples, using a random selection algorithm, data cleaning, using the correct statistical test, feature selection, feature extraction, using a simpler model, the K-fold technique, and data processing are among the measures that lead to a model with better performance.
Keywords: Supervised machine learning, Data collection, Model evaluation
Koomesh, Volume:25 Issue: 6, 2024, PP 551-561
Introduction
Machine learning is a complex process that is carried out by specifying a model and training it on a large volume of data. In the past, the main focus in this field was on improving the structures of models and algorithms, but recently more emphasis has been placed on the quality and quantity of data. This article aims to provide an overview of the problems in data collection and offer solutions for them.
Materials and Methods
In this study, the challenges faced by researchers in collecting data and evaluating supervised machine learning models were examined through a review. Documents were retrieved from the PubMed, Scopus, and Science Direct databases and the Google Scholar search engine for the period 2001 to 2023. After full-text screening, 17 articles were reviewed and included in the study.
Results
The findings indicate that researchers in supervised machine learning studies face four challenges in data collection: an insufficient number of samples, unrepresentative training data, poor data quality, and irrelevant features. In model evaluation they face four further challenges: overfitting, underfitting, lack of sufficient data for validation, and mismatched data.
Conclusion
Increasing the sample size, utilizing a random selection algorithm, data cleansing, using the correct statistical test, feature selection, feature extraction, using a simpler model, the K-fold technique, and data processing are among the factors that contribute to achieving a model with better performance.
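The K-fold technique mentioned above reuses a limited dataset for validation by rotating which part is held out. A minimal sketch using only the Python standard library follows; the function name `k_fold_indices` and the default seed are illustrative assumptions, not something specified by the review.

```python
import random


def k_fold_indices(n_samples: int, k: int, seed: int = 0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # random assignment of samples to folds
    # Slicing with a stride of k gives fold sizes that differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, val_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx


splits = list(k_fold_indices(n_samples=10, k=5))
for train_idx, val_idx in splits:
    print("validate on", sorted(val_idx), "train on", len(train_idx), "samples")
```

Each sample is used for validation exactly once and for training in the remaining folds, which is why the technique helps when too little data is available for a separate validation set.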
Keywords: Supervised Machine Learning, Data Collection, Model Evaluation -
Introduction
The presence of pigmented skin lesions is a significant global concern in the prevention of skin cancer. Detecting skin cancer at an early stage is essential for proper management and effective treatment. This study aimed to combine image processing and data mining to develop an intelligent model to screen skin cancer from skin lesions.
Material and Methods
The images were taken in a clinic by smartphone. Patients over 40 years of age participated in the study. During the segmentation phase, the lesions were separated from the original images through machine vision techniques. Various features such as symmetry, border irregularity, color variation, and diameter were extracted from the images, while some features were also obtained through face-to-face examination. Finally, a neural network was employed to classify whether the lesion was cancerous or non-cancerous. MATLAB version 2022 was used to design the model.
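To make the feature-extraction step more concrete, the sketch below computes two simple shape descriptors from a segmented lesion. It is an illustration in Python rather than the authors' MATLAB code, and it assumes the segmentation result is a binary mask cropped so that the lesion is roughly centred; the function name and the exact definitions of asymmetry and diameter are assumptions.

```python
import math


def lesion_shape_features(mask):
    """Compute simple shape descriptors from a binary lesion mask.

    mask: list of equal-length rows containing 0 (skin) or 1 (lesion).
    """
    area = sum(sum(row) for row in mask)
    if area == 0:
        raise ValueError("mask contains no lesion pixels")
    # Compare every pixel with its left-right mirror image. Each unmatched
    # lesion pixel is seen twice (once from each side), hence the factor 2.
    mismatched = sum(
        1
        for row in mask
        for left, right in zip(row, reversed(row))
        if left != right
    )
    asymmetry = mismatched / (2 * area)  # 0.0 = mirror-symmetric, 1.0 = no overlap
    # Diameter (in pixels) of a circle with the same area as the lesion.
    equivalent_diameter = 2 * math.sqrt(area / math.pi)
    return {
        "area": area,
        "asymmetry": asymmetry,
        "equivalent_diameter": equivalent_diameter,
    }


symmetric = lesion_shape_features([[0, 1, 1, 0], [1, 1, 1, 1]])
lopsided = lesion_shape_features([[1, 1, 0, 0]])
print(symmetric["asymmetry"], lopsided["asymmetry"])
```

Descriptors of this kind, together with border and color measures, would form the input vector passed to the classifier.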
Results
The study results indicated excellent segmentation. The neural network-based model classified skin lesions with 98.4% accuracy and 97% sensitivity, indicating that the designed model can screen for skin cancer with high accuracy.
Conclusion
This model can help patients to manage self-care and become aware of their skin lesions before consulting a physician.
Keywords: Supervised Machine Learning, Image Processing, Skin Neoplasms, Early Detection of Cancer, Tele-Medicine -
Background
The rise of antibiotic resistance has become a major concern, signaling the end of the golden age of antibiotics. Bacterial biofilms, which exhibit high resistance to antibiotics, significantly contribute to the emergence of antibiotic resistance. Therefore, there is an urgent need to discover new therapeutic agents with specific characteristics to effectively combat biofilm-related infections. Studies have shown the promising potential of peptides as antimicrobial agents.
Objectives
This study aimed to establish a cost-effective and streamlined computational method for predicting the antibiofilm effects of peptides. This method can assist in addressing the intricate challenge of designing peptides with strong antibiofilm properties, a task that can be both challenging and costly.
Methods
A positive library, consisting of peptide sequences with antibiofilm activity exceeding 50%, was assembled, along with a negative library containing quorum-sensing peptides. For each peptide sequence, feature vectors were calculated, while considering the primary structure, the order of amino acids, their physicochemical properties, and their distributions. Multiple supervised learning algorithms were used to classify peptides with significant antibiofilm effects for subsequent experimental evaluations.
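As an illustration of turning a peptide sequence into a feature vector, the sketch below computes the amino-acid composition plus a crude net-charge estimate. The study's actual feature set is not listed in the abstract, so the choice of features, the function name `peptide_features`, and the example sequence are all assumptions made for demonstration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def peptide_features(sequence: str) -> list[float]:
    """Return amino-acid composition (20 values) plus a crude net charge."""
    seq = sequence.upper()
    if not seq or any(residue not in AMINO_ACIDS for residue in seq):
        raise ValueError("sequence must contain only the 20 standard residues")
    # Fraction of the sequence made up by each residue type.
    composition = [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]
    # Crude charge near neutral pH: +1 for K/R, -1 for D/E (histidine ignored).
    net_charge = sum(seq.count(aa) for aa in "KR") - sum(seq.count(aa) for aa in "DE")
    return composition + [float(net_charge)]


# Hypothetical sequence, used only to show the output shape.
features = peptide_features("KKLLKE")
print(len(features), features[-1])
```

Vectors like this, one per peptide in the positive and negative libraries, are what a supervised classifier would be trained on.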
Results
The computational approach exhibited high accuracy in predicting the antibiofilm effects of peptides, with accuracy, precision, Matthews correlation coefficient (MCC), and F1 score of 99%, 99%, 0.97, and 0.99, respectively. The performance level of this computational approach was comparable to that of previous methods. This study introduced a novel approach by combining the feature space with high antibiofilm activity.
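For readers less familiar with the Matthews correlation coefficient, the sketch below computes it from the four cells of a binary confusion matrix. The counts in the example are hypothetical and are not taken from the study.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the MCC is taken as 0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0


# Hypothetical counts, for illustration only.
print(round(mcc(tp=95, tn=98, fp=2, fn=5), 3))
```

Unlike accuracy, the MCC ranges from -1 to 1 and stays informative when the positive and negative libraries differ in size.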
Conclusions
In this study, a reliable and cost-effective method was developed for predicting the antibiofilm effects of peptides using a computational approach. This approach allows for the identification of peptide sequences with substantial antibiofilm activities for further experimental investigations. Accessible source codes and raw data of this study can be found online (hiABF), providing easy access and enabling future updates.
Keywords: Antibiotics Resistance, Biofilm Inhibition, Peptides, Supervised Machine Learning -
Background and Objective
Numerous studies show that mortality among patients hospitalized for ST-segment elevation myocardial infarction (STEMI) increases considerably when cardiogenic shock (CS) occurs. Patient demographics, type of myocardial infarction, clinical symptoms, and the treatment methods adopted by physicians are among the factors affecting the death of STEMI-CS patients. In this study, a hybrid supervised machine learning model using the anti-coronavirus optimization algorithm (ACVO) and the support vector machine (SVM) is presented for predicting the death of patients hospitalized for STEMI-CS. The proposed model is also useful for determining the parameters that most affect patient death.
Methods
To predict the status of STEMI-CS patients, the ACVO-SVM method is presented, which takes the patient's symptoms, demographic characteristics, and treatment history and determines whether the patient will survive. The proposed method combines the ACVO algorithm with the SVM model. ACVO is used to select the set of parameters that are effective in predicting patient status and to determine optimal values for the parameters of the SVM model, so that the learner is trained with higher quality and classifies the data efficiently. The proposed model was evaluated on a dataset containing information on 410 STEMI-CS inpatients at Shahid Madani Hospital, Tabriz University of Medical Sciences. The data cover a 10-year period from 1388 to 1397 in the Solar Hijri calendar (2009 to 2018).
Findings
The proposed ACVO-SVM model was compared with established predictive models such as LASSO regression, the adaptive neuro-fuzzy inference system (ANFIS), extreme gradient boosting (XGBoost), and the standard SVM model. The experimental results show that ACVO-SVM achieves better classification performance than its counterparts. Results on the test dataset showed that age, sex, type of myocardial infarction, smoking, percutaneous vascular interventions, and coronary artery bypass surgery are the most influential factors in the death of STEMI-CS patients.
Conclusion
In this study, a supervised machine learning model was presented for determining the status of STEMI-CS patients. The results indicate that the proposed ACVO-SVM model can be trained easily on different training datasets and classifies patients well. In this research, the models were evaluated on a small dataset. Therefore, one of the steps needed to improve this research is to evaluate the proposed method and its counterpart models on large datasets in order to determine their strengths and weaknesses.
Keywords: Myocardial infarction, Cardiogenic shock, Supervised machine learning, Classification, ACVO-SVM model
Background & Aims
According to a report released by the World Health Organization (WHO), ST-segment elevation myocardial infarction with cardiogenic shock (STEMI-CS) is one of the important factors in patient mortality within hospitals (1), (2), (3), (4). CS and its related complications impose a huge financial and medical burden. Some researchers have stated that the high mortality and complication rates of STEMI-CS patients are associated with the lack of effective early preventive treatments. Given the risk of CS and the different risk factors associated with it, accurate clinical risk prediction tools need to be developed to predict the onset of CS. Recently, researchers have used various machine learning (ML) methods to build predictive models that identify the in-hospital mortality risk of STEMI-CS patients. The existing methods achieved encouraging results; however, their performance is not ideal, and more effort is needed to improve it. The aim of this study is to present a hybrid machine learning method for predicting the risk of mortality in STEMI-CS patients. Our proposed method combines a powerful swarm intelligence strategy, the anti-coronavirus optimization algorithm (ACVO), with the support vector machine (SVM) in the risk prediction phase. The proposed model is compared with the standard SVM, the least absolute shrinkage and selection operator (LASSO), and the adaptive neuro-fuzzy inference system (ANFIS) on a real-world benchmark dataset.
Methods
To predict the mortality status of STEMI-CS patients, we proposed the ACVO-SVM algorithm. The proposed method is a hybrid machine learning algorithm that combines the SVM with the ACVO algorithm to identify the parameters that most affect patient death. The motivation for using ACVO is to configure the parameters of the SVM optimally and so improve its prediction performance. The proposed ACVO-SVM is also useful in determining the optimal subset of features and treatment strategies that have the greatest impact on predicting the status of STEMI-CS patients. The proposed approach models the problem of predicting patient status as an optimization problem. To determine the most effective features for predicting the survival or death of STEMI-CS patients, the proposed ACVO-SVM model is trained with different combinations of patient characteristics and adopted treatment strategies. The combination of features that provides the highest performance is then considered the superior combination. To select the most effective features, all the features are first used to train the SVM model; the features are then left out one at a time and a model with the same structure is trained each time. The models were compared based on accuracy, recall, and F1 score. Finally, the best model is used to predict the status of patients in the test dataset. The dataset used to evaluate the proposed method includes 410 records of patients hospitalized due to STEMI-CS complications in Shahid Madani Hospital of Tabriz University of Medical Sciences. The data were collected over a 10-year period from 2009 to 2018. The dataset includes five main categories of characteristics: demographic characteristics, type of myocardial infarction, risk factors, clinical symptoms, and type of treatment used. Of the records in the dataset, 80% are used as training data and 20% as the test dataset. The proposed method is implemented in MATLAB.
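The evaluation procedure described above (an 80/20 split followed by leaving features out one at a time) can be sketched as follows. This is an outline in Python rather than the authors' MATLAB code; `split_records`, `leave_one_feature_out`, and the stand-in `toy_score` function are hypothetical, and in a real study the scoring function would train the SVM on the kept features and return its accuracy.

```python
import random


def split_records(records, test_fraction=0.2, seed=0):
    """Shuffle the records and split them into training and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def leave_one_feature_out(feature_names, score_fn):
    """Score the full feature set, then each set with one feature removed."""
    results = {"all features": score_fn(list(feature_names))}
    for dropped in feature_names:
        kept = [name for name in feature_names if name != dropped]
        results[f"without {dropped}"] = score_fn(kept)
    return results


def toy_score(kept_features):
    """Stand-in for 'train the model and return its accuracy' (not a real model)."""
    informative = {"age", "smoking"}
    return len(informative & set(kept_features)) / len(informative)


train, test = split_records(range(410))  # 410 patient records, as in the study
results = leave_one_feature_out(["age", "sex", "smoking"], toy_score)
print(len(train), len(test))
print(results)
```

A feature whose removal lowers the score is one the model relies on; a feature whose removal changes nothing is a candidate for exclusion.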
Results
Among the M1 to M5 feature combination models, the experimental results show that the M1 model predicts the patient's condition better than the other combination models on both the training and test datasets. Model M1 combines age, sex, type of myocardial infarction, smoking, percutaneous vascular interventions, and coronary artery bypass surgery. This shows that these features have the greatest effect on the final condition of STEMI-CS patients. The results are in line with previous studies (2), (3) in this field, which stated that age, gender, smoking, coronary artery bypass surgery, and percutaneous vascular interventions have the greatest effect on the mortality rate of patients. The M2 model ranks second in determining the status of patients, which shows that smoking also has a strong effect on the mortality of patients with STEMI-CS. The M3 model indicates that the use of the balloon pump treatment strategy, along with the patient's demographic characteristics, history of myocardial infarction, and smoking, has a great effect on the mortality rate of patients. In summary, it can be concluded that the demographic characteristics of the patient such as age and gender, smoking, history of illness, and the use of coronary bypass surgery and percutaneous vascular interventions have a great impact on the mortality of STEMI-CS patients. The proposed ACVO-SVM approach was compared with several other popular approaches: the standard SVM model, LASSO regression, ANFIS, and XGBoost. The experimental results show that the proposed ACVO-SVM outperformed its counterparts.
Conclusion
In this study, a hybrid supervised machine learning model was presented to determine the status of patients with cardiogenic shock due to ST-segment elevation myocardial infarction. The proposed ACVO-SVM model uses the ACVO optimization algorithm to estimate the optimal parameters of the SVM model, making the SVM training process more efficient. The proposed model was evaluated using a dataset of patients with cardiogenic shock and the results were compared with the LASSO, ANFIS, XGBoost, and SVM models. The results showed that the proposed method performed well compared to the other classification models. We also found that age, gender, type of myocardial infarction, smoking, percutaneous vascular interventions, and coronary artery bypass surgery are the most influential factors for survival in STEMI-CS patients. In this research, the models were evaluated on a small dataset. Therefore, one of the necessary tasks to improve this research is to evaluate the proposed method and other counterpart models on large datasets to determine their strengths and weaknesses. Another limitation of this research is that not all factors affecting the survival of STEMI-CS patients, such as blood sugar level and duration of ischemia, were examined. For this reason, all factors affecting the mortality of STEMI-CS patients should be investigated to improve the quality of classification and prediction of the final status of patients.
Keywords: Myocardial Infarction, Cardiogenic shock, Supervised machine learning, Classification, ACVO-SVM -
Background and Objective
Driver drowsiness is one of the major causes of severe accidents worldwide. In this study, an electroencephalography (EEG) measurement-based approach has been proposed to detect driver drowsiness.
Materials and Methods
The driving tests were conducted in a driving simulator to collect brain data in the alert and drowsy states. Nineteen healthy men participated in these tests. The EEG signals were recorded from the central, parietal, and occipital regions of the brain. Twelve features of the EEG signal were extracted; then, through neighborhood component analysis (NCA), a feature selection method, six features were selected: mean, standard deviation (SD), kurtosis, energy, entropy, and power of the alpha band in 11-15 Hz, where alpha spindles occur. The Observer Rating of Drowsiness (ORD) was applied to assess the drowsiness stages. Four classifiers, k-nearest neighbor (KNN), support vector machine (SVM), classification tree, and Naive Bayes, were employed to classify the data.
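Five of the six selected features are simple statistics of a signal window and can be sketched with the Python standard library alone. The exact definitions used in the study are not given in the abstract, so the choices below (population SD, kurtosis as the fourth standardized moment, Shannon entropy of an amplitude histogram with `n_bins` bins) are assumptions; alpha-band power is omitted because it requires a spectral estimate.

```python
import math
from collections import Counter


def window_features(samples, n_bins=8):
    """Time-domain features of one EEG window (a sequence of amplitudes)."""
    n = len(samples)
    if n == 0:
        raise ValueError("window is empty")
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / n  # population variance
    sd = math.sqrt(variance)
    # Fourth standardized moment; equals 3.0 for a normal distribution.
    kurtosis = (sum((x - mean) ** 4 for x in samples) / n) / variance ** 2 if variance else 0.0
    energy = sum(x * x for x in samples)
    # Shannon entropy (in bits) of the amplitude histogram.
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for a constant signal
    counts = Counter(min(int((x - lo) / width), n_bins - 1) for x in samples)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {"mean": mean, "sd": sd, "kurtosis": kurtosis, "energy": energy, "entropy": entropy}


feats = window_features([1.0, -1.0, 1.0, -1.0])
print(feats)
```

One such feature dictionary per window and per channel would form the rows passed to the classifiers.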
Results
The classification tree detected drowsiness in the early stage with a performance of 88.55%. The classification results showed that using only the single channel P4 to detect drowsiness gave better performance than using the data of all channels (C3, C4, P3, P4, O1, O2) together. The best performance, 93.13%, was obtained by the classification tree on data from single-channel P4.
Conclusion
This study suggested that driver drowsiness is detectable in the early stage based on single-channel P4.
Keywords: Automobile driving, Electroencephalography, Supervised machine learning, Classification