Several procedures can be used to develop predictive models for medical data with a binary response. In this study, we examined the process of developing two common predictive models, decision tree (DT) and logistic regression (LR). We also investigated how to set the model parameters, how to develop accurate models efficiently, and how to assess predictive performance.
The main purpose of this study was to determine the prevalence of, and risk factors associated with, functional dyspepsia (FD) and gastroesophageal reflux disease (GERD) in a sample of the Iranian population.
This cross-sectional study was conducted in Tehran, from May 2016 to December 2017, on 18,180 participants who were selected randomly and interviewed using a reliable questionnaire.
The areas under the ROC curve (AUC) of DT and LR were 0.93 and 0.94 for GERD and 0.98 and 0.95 for FD, respectively. Overall, 63.8% and 37.2% of the participants had FD and GERD, respectively. Multiple logistic regression analysis showed that men had a higher risk of FD than women and that the prevalence of FD increased with age.
This study showed a low rate of FD and GERD in the urban population of Tehran. The two models also gave similar prediction results. Therefore, when dealing with multiple independent variables and a binary response variable in data from a large sample, more statistical techniques and strategies should be considered in developing a prediction model.
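
The AUC used above to compare DT and LR can be read as the probability that a randomly chosen case receives a higher predicted score than a randomly chosen non-case, with ties counted as one half. The following is a minimal sketch of that calculation in plain Python, not the procedure used in this study. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the study data.

```python
def roc_auc(labels, scores):
    """AUC as the share of (case, non-case) pairs ranked correctly.

    labels: 1 for a case, 0 for a non-case.
    scores: predicted probabilities or any other ranking score.
    Tied pairs contribute 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (illustration only, not study results).
labels = [1, 1, 1, 0, 0, 0, 1, 0]
# A decision tree yields one score per leaf, so ties are common.
tree_scores = [0.9, 0.9, 0.4, 0.4, 0.1, 0.1, 0.9, 0.4]
# Logistic regression yields a continuous predicted probability.
logit_scores = [0.92, 0.81, 0.25, 0.48, 0.12, 0.05, 0.77, 0.30]

print(f"DT AUC: {roc_auc(labels, tree_scores):.4f}")   # 0.9375
print(f"LR AUC: {roc_auc(labels, logit_scores):.4f}")  # 0.8750
```

The pairwise loop is quadratic in the number of participants, so for a sample of the size used here a rank-based formula or a library routine would be the more practical choice; the sketch is meant only to make the definition concrete.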