Magiran | Journal of Biostatistics and Epidemiology, Volume 9, Issue 1, Winter 2023

A Modification on Intra Class Correlation Estimation for Ordinal Scale Variable Using Latent Variable Model

Samira Chaibakhsh, Asma Poorhosseingholi Pages 1-10

Background

A common way of computing test-retest reliability is the intraclass correlation coefficient (ICC), which was developed for continuous variables. However, it is widely used to assess test-retest reliability in questionnaires with Likert scales, where consecutive integers are usually used as the option labels of a question. This practice is reasonable if all options are equally likely to be chosen; otherwise it is not. Therefore, this study proposes a modified ICC estimator for ordinal-scale variables based on a latent variable model.

Method

In this method, test-retest answers were treated as a bivariate variable, and a cumulative probit latent variable model was fitted. A simulation study with N = 1500 replicates was conducted to compare ICC estimates from the Likert scale approach with those from the latent variable approach. Samples of different sizes (n = 20, 30) were generated with different correlation parameters. The simulations were repeated for questions with 3 and 5 options and with different probabilities of selecting each option. Finally, both approaches were applied to the Beck suicidal ideation questionnaire.
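The Likert scale approach that the modified estimator is compared against can be sketched as follows: ordinal answers are produced by cutting a latent bivariate normal variable at thresholds, and the usual ICC is then computed on the integer labels. This is a minimal illustration under stated assumptions: it uses the one-way random-effects ICC(1,1), and the thresholds, correlation and sample size are hypothetical. It does not reproduce the authors' cumulative probit estimator.

```python
import numpy as np

def icc_oneway(x):
    """One-way random-effects ICC(1,1) for an (n_subjects, k_occasions) array."""
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    grand_mean = x.mean()
    row_means = x.mean(axis=1)
    # Between-subject and within-subject mean squares from a one-way ANOVA
    msb = k * np.sum((row_means - grand_mean) ** 2) / (n - 1)
    msw = np.sum((x - row_means[:, None]) ** 2) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

def to_likert(latent, thresholds):
    """Map latent values to option labels 1..K using K-1 increasing thresholds."""
    return np.digitize(latent, thresholds) + 1

# Hypothetical settings: n = 30 subjects, latent test-retest correlation 0.7,
# 3 options with skewed thresholds (most answers fall in option 1).
rng = np.random.default_rng(2023)
rho = 0.7
cov = [[1.0, rho], [rho, 1.0]]
latent = rng.multivariate_normal([0.0, 0.0], cov, size=30)
answers = to_likert(latent, thresholds=[0.5, 1.5])

print("ICC on latent scores:", round(icc_oneway(latent), 3))
print("ICC on Likert labels:", round(icc_oneway(answers), 3))
```

Repeating the last block over many replicates and comparing both estimates with rho is the kind of comparison the simulation study describes; with skewed thresholds the label-based ICC is computed on coarse, unequally spaced categories, which is the situation the latent variable approach is meant to handle.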

Result

In general, the difference between the Likert scale approach and the latent variable approach was larger for questions with 3 options than for questions with 5 options, and the root mean square error (RMSE) and bias decreased as the sample size and the correlation between the bivariate data increased. When the options had different selection probabilities, the RMSE, bias and standard deviation (SD) of the ICC estimates differed considerably between the two models. The latent variable approach yielded smaller bias, SD and RMSE, especially for smaller sample sizes.
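These three summaries are linked by RMSE² = bias² + SD² (with the SD computed using the number of replicates as divisor), which is why an estimator can lower its RMSE mainly by reducing the SD rather than the bias, as the conclusion notes. A minimal sketch of how such summaries are usually computed from simulation replicates; the function name and the example estimates are hypothetical.

```python
import numpy as np

def simulation_summary(estimates, true_value):
    """Bias, SD and RMSE of replicate estimates of a single parameter."""
    est = np.asarray(estimates, dtype=float)
    bias = est.mean() - true_value
    sd = est.std()  # ddof=0, so that RMSE^2 = bias^2 + SD^2 holds exactly
    rmse = np.sqrt(np.mean((est - true_value) ** 2))
    return bias, sd, rmse

# Hypothetical ICC estimates from four replicates, true ICC = 0.5
bias, sd, rmse = simulation_summary([0.42, 0.55, 0.47, 0.61], 0.5)
print(f"bias={bias:.4f}  SD={sd:.4f}  RMSE={rmse:.4f}")
assert np.isclose(rmse ** 2, bias ** 2 + sd ** 2)
```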

Conclusion

The simulations showed that when the probabilities of choosing the options of a question are skewed, this method reduces the RMSE, especially when there are fewer options. The method had a larger effect on the standard deviation of the estimates than on their bias.

Keywords: Intraclass correlation, Reliability, Latent variable, Multivariate model, Test-retest

Survival Rate Estimation in Patients with Colorectal Cancer by applying Fuzzy Product Limit Estimator

Galawezh Khedrizadeh, Tohid Koshki, Roya Dolatkhah, Saeid Mousavi Pages 11-19

Background

Survival rates are important for showing the progression of a disease and the effect of treatments. Estimating survival probabilities is challenging, especially when the data are highly censored. In this study, the Fuzzy Product Limit Estimator (FPLE) is introduced to mitigate this challenge.

Methods

In a longitudinal study, data from 173 colorectal cancer (CRC) patients were analyzed. The FPLE, a data-driven method, was applied to estimate survival probabilities and the mean and median survival times. It provides a smooth survival curve, and the curve can be continued even when the largest observed time is censored.

Results

The one-year survival rate of CRC patients was estimated to be 83% by both the FPLE and the Kaplan-Meier (KM) methods. The five-year survival rate was estimated to be 37% and 52% by the FPLE and KM methods, respectively. Because the largest observed time in the data (71.96 months) was censored, the survival rate beyond 71.96 months could not be estimated by the KM method, whereas FPLE estimated the 10-year and 20-year survival rates as 0.21 and 0.09. The mean (median) survival time was estimated as 45.97 (65) months by KM and 82.69 (41.70) months by FPLE.
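For comparison, the Kaplan-Meier (KM) product-limit estimator that FPLE is benchmarked against can be written in a few lines of Python. The sketch shows why KM cannot be extrapolated: the curve drops only at observed event times, so when the largest follow-up time is censored, the estimate after that time is undefined. This is a plain KM sketch with hypothetical data; the FPLE itself is not reproduced here.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier (product-limit) survival estimate.

    times  : follow-up times
    events : 1 if the event was observed, 0 if the time is censored
    Returns a list of (event_time, survival_probability) steps.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all observations tied at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            steps.append((t, surv))
        n_at_risk -= removed
    return steps

# Hypothetical follow-up in months; the largest time (30) is censored,
# so the estimate is undefined after t = 30.
for t, s in kaplan_meier([3, 8, 12, 20, 25, 30], [1, 0, 1, 1, 0, 0]):
    print(t, round(s, 3))
```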

Conclusion

For highly censored survival data, the FPLE method provides acceptable estimates of the survival rate of CRC patients, and it also extends the survival curve beyond the largest observed time. The smaller 5-year estimate from FPLE could be considered a warning that the actual survival rate may be lower than that reported by the KM method.

Keywords: Colorectal cancer, Survival, Fuzzy logic, FPLE

Random-Splitting Random Forest with Multiple Mixed-Data Covariates

Mohammad Fayaz, Alireza Abadi, Soheila Khodakarim Pages 20-29

Background

Bagging (BG) and random forest (RF) are well-known supervised statistical learning methods based on classification and regression trees. BG and RF can handle different types of responses, such as categorical and continuous ones. In many statistical applications, the data are curves, time series, functional data, or observations that are related to each other through their domain. Many studies have extended RF methods to functional data as covariates or responses. Among these extensions, random splitting is used to summarize functional data into multiple related summary statistics, such as averages.

Methods

This article extends this method and introduces mixed-data BG (MD-BG) and RF (MD-RF) algorithms for multiple functional and non-functional (mixed or hybrid) covariates, and it computes a variable importance plot (VIP) for each covariate.
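A minimal sketch of the random-splitting idea, under stated assumptions: each functional covariate is cut at random points into contiguous intervals, each curve is replaced by its interval means, and these summaries are stacked with the non-functional covariates into one mixed-data design matrix. In a bagged or random forest ensemble a new random split would typically be drawn for each tree. The function name and settings are hypothetical, and this is not the API of the RSRF package.

```python
import numpy as np

def random_split_features(curves, n_intervals, rng):
    """Summarize functional covariates by averaging over random contiguous intervals.

    curves      : (n_samples, n_points) array, one curve per row on a common grid
    n_intervals : number of contiguous intervals to cut the grid into
    Returns (features, cuts); features has shape (n_samples, n_intervals).
    """
    curves = np.asarray(curves, dtype=float)
    n_points = curves.shape[1]
    # Choose n_intervals - 1 distinct interior cut points and sort them
    cuts = np.sort(rng.choice(np.arange(1, n_points), size=n_intervals - 1, replace=False))
    pieces = np.split(curves, cuts, axis=1)
    features = np.column_stack([piece.mean(axis=1) for piece in pieces])
    return features, cuts

rng = np.random.default_rng(0)
grid = np.linspace(0, 1, 100)
curves = np.sin(2 * np.pi * grid)[None, :] + rng.normal(0, 0.1, size=(5, 100))
scalar_covariates = rng.normal(size=(5, 2))             # non-functional covariates
func_features, cuts = random_split_features(curves, n_intervals=4, rng=rng)
design = np.hstack([func_features, scalar_covariates])  # mixed-data design matrix
print(design.shape)  # (5, 6)
```

Because each summary column corresponds to a known interval of the domain, tree-based importance scores for those columns can be traced back to parts of the curve.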

Results

The main difference between MD-BG and MD-RF lies in how covariates are chosen: MD-BG keeps all covariates in the model, whereas MD-RF uses a random sample of covariates. MD-RF helps to reveal the most important parts of the functional covariates and the most important non-functional covariates.

Conclusions

We apply our methods to the DTI and Tecator datasets and compare their performance for continuous and categorical responses using the R package “RSRF” that we developed, available on GitHub.

Keywords: Bagging, Functional data, Random forest, Random splitting, Statistical learning

Investigation on Determinants and Choice of Contraceptive Usage among Nigeria Women of Reproductive Age

Alabi Banjoko, Waheed Yahya, Mohammed Garba, Razaq Afolayan, Kazeem Dauda, Dorcas Adewara Pages 30-45

Background

Contraception is the intentional prevention of conception, and of sexually transmitted diseases, using devices. This study investigates the prevalence, use and choice of different contraceptive methods among Nigerian women of reproductive age (15-49 years).

Materials and Methods

This study used the most recent dataset from the National Demographic and Health Survey (NDHS). Chi-square tests of homogeneity of proportions were used to test the equality of proportions across the different groups of contraceptive methods. In addition, multinomial logistic regression was used to model the determinants of contraceptive choice among selected factors.
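The two analyses named here can be illustrated with a short numpy sketch: the Pearson chi-square statistic for homogeneity of proportions, and the category probabilities of a baseline-category multinomial logit model. The counts and coefficients are hypothetical, and fitting the model to NDHS data would require a statistical package. The p-value of the test can be obtained from the statistic with `scipy.stats.chi2.sf(stat, dof)`.

```python
import numpy as np

def chi2_homogeneity(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    obs = np.asarray(table, dtype=float)
    row = obs.sum(axis=1, keepdims=True)
    col = obs.sum(axis=0, keepdims=True)
    expected = row @ col / obs.sum()      # counts expected under equal proportions
    stat = np.sum((obs - expected) ** 2 / expected)
    dof = (obs.shape[0] - 1) * (obs.shape[1] - 1)
    return stat, dof

def multinomial_logit_probs(x, betas):
    """Baseline-category multinomial logit probabilities.

    x     : covariate vector including a leading 1 for the intercept
    betas : (J-1, p) coefficients for the non-baseline categories
    Returns probabilities for the baseline category followed by the others.
    """
    eta = np.concatenate([[0.0], np.asarray(betas, dtype=float) @ np.asarray(x, dtype=float)])
    expo = np.exp(eta - eta.max())  # subtract the max for numerical stability
    return expo / expo.sum()

# Hypothetical counts: rows = two groups of women,
# columns = no method, modern method, traditional method
stat, dof = chi2_homogeneity([[400, 60, 20], [350, 90, 15]])
print(round(stat, 2), dof)

# Hypothetical coefficients (intercept, one binary covariate) for two non-baseline choices
print(multinomial_logit_probs([1.0, 1.0], [[-1.5, 0.4], [-3.0, 0.2]]).round(3))
```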

Results

As revealed in this study, 83.86% of Nigerian women of reproductive age do not use any method of contraception, while only 16.14% use some form of contraception. Although all the selected factors contributed significantly (p-value < 0.05) to the choice of contraceptive method, certain levels of some factors, such as living in the South-West region, a greater desire for children, and age 20-24 or 45-49 years, were not significantly associated with contraceptive use among Nigerian women. The significant factors indicated either an increased or a decreased relative risk of using contraceptive methods.

Conclusion

The choice of contraceptive methods among Nigerian women is influenced by most of the maternal and socio-demographic factors considered in this study. Media campaigns on the importance and use of contraceptives are needed to increase contraceptive use among Nigerian women.

Keywords: Contraceptives, NDHS, Chi-squared test, Multinomial logistic regression

The Geometric Generalized Birnbaum–Saunders Model with Long-Term Survivors

Ahmad Reza Baghestani, Farid Zayeri, Mojtaba Meshkat Pages 46-58

Introduction

A cure rate survival model was developed under the assumptions that the number of competing causes of the event of interest follows a geometric distribution and that the time to the event of interest follows the generalized Birnbaum-Saunders distribution.

Methods

The geometric generalized Birnbaum-Saunders (GGB-S) distribution was defined, and two useful representations of its density function were derived, which help to establish some of its mathematical properties. Furthermore, the parameters of the cure rate model were estimated by maximum likelihood.
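In the usual formulation of cure rate models, if the number of competing causes N has probability generating function G and each cause's latent time has CDF F, the population survival is S_pop(t) = G(1 - F(t)) and the cure fraction is P(N = 0). A minimal sketch, assuming the geometric parameterization P(N = n) = θ^n / (1 + θ)^(n+1), which gives S_pop(t) = 1 / (1 + θF(t)) and a cure fraction of 1 / (1 + θ), and, for simplicity, the classical two-parameter Birnbaum-Saunders CDF in place of the generalized one used in the article. Parameter values are hypothetical.

```python
import math

def bs_cdf(t, alpha, beta):
    """CDF of the classical Birnbaum-Saunders distribution (shape alpha, scale beta)."""
    z = (math.sqrt(t / beta) - math.sqrt(beta / t)) / alpha
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def geometric_cure_survival(t, theta, alpha, beta):
    """Population survival 1 / (1 + theta * F(t)) under the assumed geometric N."""
    return 1.0 / (1.0 + theta * bs_cdf(t, alpha, beta))

def cure_fraction(theta):
    """P(N = 0) = 1 / (1 + theta) under the assumed geometric parameterization."""
    return 1.0 / (1.0 + theta)

theta, alpha, beta = 2.0, 0.8, 10.0
for t in (1.0, 10.0, 50.0, 1e6):
    print(t, round(geometric_cure_survival(t, theta, alpha, beta), 4))
print("cure fraction:", round(cure_fraction(theta), 4))
```

The survival curve levels off at the cure fraction instead of dropping to zero, which is what makes such models suitable for data with long-term survivors.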

Results

Several simulations were performed for different sample sizes and censoring percentages, and a real dataset from the medical field was analyzed.

Conclusion

Given the advantages of the GGB-S model, it can be used as an appropriate alternative for explaining or predicting survival time in populations with long-term survivors.

Keywords: Cure fraction models, Generalized Birnbaum-Saunders distribution, Geometric distribution, Lifetime data, Fatigue Life Distribution

Beta-Geometric Regression for Modeling Count Data on First Antenatal Care Visit (ANC) with Application

Zainab Al-Balushi, Amadou Sarr, M. Mazharul Islam Pages 59-76

Introduction

Although the geometric distribution, a special case of the Negative Binomial (NB) distribution, also belongs to the family of discrete distributions, little attention has been paid to modeling count data with it. Many real-life phenomena follow a geometric distribution with a constant probability of first success. In practice, however, the probability of first success may vary from trial to trial, which makes simple geometric models unsuitable for such data. In this paper, assuming that the probability of first success follows a Beta distribution, we develop a Beta-geometric distribution and a Beta-geometric regression model for count data that follow a geometric distribution, and we illustrate the suitability of the model with an application to count data on the time to the first antenatal care (ANC) visit.

Methods

The statistical properties of the Beta-geometric distribution are discussed, and estimators of its parameters based on the method of moments, maximum likelihood estimation (MLE), and a Bayesian approach are provided. Based on the Beta-geometric distribution, we develop a new Beta-geometric regression model for analyzing count data that follow a geometric distribution. The goodness of fit of the model was assessed using real data on the time to the first ANC visit.
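A minimal sketch of the Beta-geometric pmf, assuming that Y counts failures before the first success, P(Y = y | p) = p(1 - p)^y, and that p ~ Beta(a, b). Integrating p out gives P(Y = y) = B(a + 1, b + y) / B(a, b), with mean b / (a - 1) for a > 1. The article's exact parameterization may differ (for example, counting trials instead of failures shifts y by one), and the parameter values are hypothetical.

```python
import math

def log_beta(x, y):
    """Logarithm of the Beta function B(x, y)."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)

def beta_geometric_pmf(y, a, b):
    """P(Y = y) for y = 0, 1, 2, ... under the assumed parameterization."""
    return math.exp(log_beta(a + 1, b + y) - log_beta(a, b))

def beta_geometric_mean(a, b):
    """E[Y] = b / (a - 1), finite only for a > 1."""
    if a <= 1:
        return math.inf
    return b / (a - 1)

a, b = 3.0, 2.0
probs = [beta_geometric_pmf(y, a, b) for y in range(20000)]
print(round(probs[0], 4))  # a / (a + b) = 0.6
print(round(sum(probs), 6))
print(round(sum(y * p for y, p in enumerate(probs)), 4), beta_geometric_mean(a, b))
```

In a regression setting, the mean (or one of a and b) would be linked to covariates; that step is not shown here.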

Results

The Beta-geometric distribution has a simple probability mass function (pmf) and is flexible in capturing both the underdispersion and the overdispersion that may be present in count data. The proposed Beta-geometric regression model fit the count data on the first ANC visit better than the simple geometric or the Negative Binomial distribution.

Conclusion

Unlike the Poisson or Negative Binomial distribution, the Beta-geometric distribution does not need an additional parameter to accommodate underdispersion or overdispersion, and thus could be a flexible choice for analyzing count data.

Keywords: Beta-geometric regression, Geometric regression, Count data, Antenatal care visits (ANC)

Detection of Space-Time Clusters and Ambient Temperature Effects on Non-Toxigenic Vibrio Cholerae in Russia from 2005 To 2021

Vadim Leonov Pages 77-92

Introduction

Identifying temperature-sensitive pathogens and infectious diseases is essential for addressing the health risks of global warming. Such research is especially important in regions such as Russia, where climate change may have a greater impact. Recent studies have suggested that the abundance of V. cholerae is environmentally driven. The aim of this study is to investigate the spatio-temporal trends and thermo-climatic sensitivity of non-toxigenic V. cholerae abundance in Russia.

Methods

This study used spatial epidemiology tools to identify persistent clusters of V. cholerae ctx- isolation and areas in which to explore temperature-dependent patterns of vibrio distribution. Correlation analysis was used to identify regions where Vibrio abundance in water samples is temperature-driven.
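Two building blocks of this kind of analysis can be sketched with numpy, under stated assumptions: the Poisson log-likelihood ratio that scan statistics (such as Kulldorff's space-time scan statistic) maximize over candidate clusters, and a Pearson correlation between water temperature and Vibrio findings. The abstract does not name the exact cluster-detection method, so the scan statistic here is an assumption, and all numbers are hypothetical.

```python
import numpy as np

def poisson_scan_llr(c, expected, total):
    """Poisson log-likelihood ratio for one candidate cluster (high-rate clusters only).

    c        : observed cases inside the candidate cluster
    expected : cases expected inside it under the null hypothesis
    total    : total observed cases in the study region and period
    """
    if c <= expected:
        return 0.0
    llr = c * np.log(c / expected)
    if c < total:
        llr += (total - c) * np.log((total - c) / (total - expected))
    return float(llr)

# Hypothetical monthly mean water temperature (deg C) and positive samples
temperature = np.array([14.2, 16.8, 19.5, 21.3, 22.0, 18.7])
positives = np.array([0, 2, 5, 9, 11, 4])
r = np.corrcoef(temperature, positives)[0, 1]
print("Pearson r:", round(r, 3))

# Hypothetical candidate cluster: 18 of 60 isolates where 7.5 were expected
print("LLR:", round(poisson_scan_llr(18, 7.5, 60), 3))
```

A scan statistic evaluates this ratio over many candidate areas and time windows and assesses the maximum by Monte Carlo replication, which is omitted here.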

Results

The spatial analysis detected 16 persistent (7-8 year) clusters of V. cholerae ctx- over the 2005-2021 study period. These persistent clusters should become target areas for improving sanitation. A distinct and significant thermo-climatic effect on the abundance of V. cholerae ctx- in water basins was found in three Russian regions, one with a temperate marine climate (the Kaliningrad region) and two with sharply continental climates (the Irkutsk region and the Republic of Sakha).

Conclusion

The study provides results that support simplified empirical assessments of the potential hazards of vibrio abundance, which may be useful locally to public health authorities and globally as part of Russia's warning system for the effects of climate change.

Keywords: V. cholerae, Spatial analysis, Climate change, Space-time clustering, Russia

Variable Selection for Recurrent Events Using Heuristic Approaches: Identifying Informative Variables for Rehospitalization in Schizophrenia Patients

Mahya Arayeshgari, Leili Tapak, Sharareh Parami, Behnaz Alafchi Pages 93-120

Introduction

Recurrent event data, a generalization of survival data, are frequently observed in many areas of medical research, including repeated hospitalizations of patients with schizophrenia. Because multiple relapses of schizophrenia can have serious consequences, such as self-harm or harm to others, loss of education or employment, and other adverse outcomes, it is essential to identify the most important factors related to relapse in this disorder. This study aimed to apply heuristic approaches to select predictor variables for recurrent events, with an application to schizophrenia.

Methods

A two-step algorithm was used that combined two variable selection methods, recursive feature elimination (RFE) and genetic algorithm feature selection (GAFS), with four modeling techniques, gradient boosting (GB), artificial neural network (ANN), random forest (RF), and support vector machine (SVM), and applied these combinations to simulated recurrent event datasets.
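A simplified sketch of the RFE-on-deviance-residual idea: compute deviance residuals from a null recurrent-event model (observed event counts N versus fitted cumulative hazards Λ, with martingale residual M = N - Λ and deviance residual sign(M)·sqrt(-2[M + N log(Λ/N)])), then repeatedly drop the covariate with the lowest importance for predicting those residuals. In the article the importance comes from a learner such as a random forest; here absolute correlation is a hypothetical stand-in so the example needs only numpy, and the simulated data are hypothetical.

```python
import numpy as np

def deviance_residuals(n_events, cum_hazard):
    """Deviance residuals from observed event counts and fitted cumulative hazards."""
    n = np.asarray(n_events, dtype=float)
    lam = np.asarray(cum_hazard, dtype=float)
    mart = n - lam                      # martingale residuals
    log_term = np.zeros_like(n)
    pos = n > 0                         # N * log(Lambda / N) is taken as 0 when N = 0
    log_term[pos] = n[pos] * np.log(lam[pos] / n[pos])
    return np.sign(mart) * np.sqrt(np.maximum(-2.0 * (mart + log_term), 0.0))

def recursive_feature_elimination(X, y, importance, n_keep):
    """Repeatedly drop the least important feature until n_keep remain."""
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_keep:
        scores = importance(X[:, remaining], y)
        remaining.pop(int(np.argmin(scores)))
    return remaining

def abs_correlation(X, y):
    """Stand-in importance: |Pearson correlation| of each column with y."""
    return np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])

# Hypothetical data: 200 patients, 5 covariates, equal follow-up of 2 years
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
rate = np.exp(0.8 * X[:, 0] - 0.5 * X[:, 2])     # hypothetical true effects
n_events = rng.poisson(rate * 2.0)               # number of rehospitalizations
cum_hazard = np.full(200, n_events.mean())       # null-model expected counts
dr = deviance_residuals(n_events, cum_hazard)
print(recursive_feature_elimination(X, dr, abs_correlation, n_keep=2))
```

Working on residuals from a null model reduces the recurrent-event problem to a single continuous response per subject, which is what lets ordinary learners and RFE be applied.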

Results

In most simulation scenarios, the combination of RFE and RF applied to the deviance residuals (DR) outperformed the other methods. When applied to the schizophrenia data, RFE-RF-DR selected the following predictor variables: number of children, age, marital status, and history of substance abuse.

Conclusions

Our findings indicate that the proposed machine learning-based approach is a promising technique for selecting predictor variables associated with a recurrent outcome when analyzing multivariate time-to-event data with recurrent events.

Keywords: Random forest, Recursive feature elimination, Deviance residual, Recurrent event datasets, Variable selection, Schizophrenia

Prevalence of Restless Legs Syndrome in Rheumatoid Arthritis: A Systematic Review and Meta-analysis

Mehrdad Bagherpour-Kalo, Parvaneh Darabi, Ali Moghadas Jafari, Hamid Najafimehr, Kamal Azam, Mostafa Hosseini Pages 121-131

Background

Restless legs syndrome (RLS) is a common sensorimotor sleep disorder, and rheumatoid arthritis (RA) is an inflammatory autoimmune disease that causes disability. Previous studies reported that the prevalence of RLS varies across RA populations (13.2-68.4%), which calls for a meta-analysis to obtain a more reliable pooled estimate. Therefore, we aimed to perform a meta-analysis to assess the pooled prevalence of RLS in RA patients.

Methods

The meta-analysis was performed according to the PRISMA checklist. Embase, MEDLINE, Ovid, Web of Science, and Scopus were searched systematically, and eligible studies were analyzed using R version 4.0.3. Sensitivity analyses were also performed to identify influential studies.
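The analysis was run in R; as a language-neutral illustration, here is a minimal Python sketch of one common way to pool prevalences: DerSimonian-Laird random effects on the logit scale. The abstract does not state which transformation or between-study variance estimator the authors used, so both are assumptions here, and the study counts in the usage line are hypothetical.

```python
import numpy as np

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-z))

def pooled_prevalence_dl(events, totals):
    """Random-effects (DerSimonian-Laird) pooled proportion on the logit scale.

    Returns (pooled proportion, lower 95% limit, upper 95% limit, tau^2).
    Assumes 0 < events < totals in every study (no continuity correction).
    """
    x = np.asarray(events, dtype=float)
    n = np.asarray(totals, dtype=float)
    y = np.log(x / (n - x))                 # logit of each study's proportion
    v = 1.0 / x + 1.0 / (n - x)             # approximate within-study variance
    w = 1.0 / v
    y_fixed = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - y_fixed) ** 2)      # Cochran's Q
    df = len(y) - 1
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c) if df > 0 else 0.0   # between-study variance
    w_re = 1.0 / (v + tau2)
    y_re = np.sum(w_re * y) / np.sum(w_re)
    se = np.sqrt(1.0 / np.sum(w_re))
    return expit(y_re), expit(y_re - 1.96 * se), expit(y_re + 1.96 * se), tau2

# Hypothetical studies: RLS cases and RA patients per study
print(pooled_prevalence_dl([12, 40, 25], [60, 90, 70]))
```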

Results

Of 763 studies in total, 11 (3 from Europe, 4 from North America, and 4 from Asia) were suitable for synthesis. They included 931 RA patients, 300 of whom had symptoms of RLS. The pooled prevalence of RLS among people with RA across the 11 studies was 34% (95% CI: 26-43%). The pooled prevalence of RLS in Europe, Asia, and North America was 48% (95% CI: 32-65%), 32% (95% CI: 18-45%), and 28% (95% CI: 15-42%), respectively. RLS prevalence was markedly higher in women with RA (32%; 95% CI: 23-41%) than in men with RA (3%; 95% CI: 2-5%).

Conclusion

This systematic review and meta-analysis indicates that the pooled prevalence of RLS in RA patients is 34% and that female patients with RA are more prone to RLS than male patients.

Keywords: Restless legs syndrome, Rheumatoid arthritis, Prevalence, Meta-analysis
