Table of Contents

Journal of Data Science and Modeling
Volume: 1, Issue: 1, Jun 2022

  • Publication date: 1400/03/11 (Solar Hijri)
  • Number of articles: 12
  • Raheleh Zamini * Pages 1-9
    In various statistical models, such as density estimation and the estimation of regression curves or hazard rates, monotonicity constraints can arise naturally. A frequently encountered problem in nonparametric statistics is to estimate a monotone density function f on a compact interval. A well-known estimator of a density f under the restriction that f is decreasing is the Grenander estimator, the left derivative of the least concave majorant of the empirical distribution function of the data. Many authors have worked on this estimator and obtained very useful properties of it. The Grenander estimator is a step function and consequently is not smooth. In this paper, we discuss the estimation of a decreasing density function by the kernel smoothing method (a sketch of both estimators follows the keywords). Much work has been done owing to the importance and applicability of Berry-Esseen bounds for density estimators. In this paper, we study a Berry-Esseen type bound for a smoothed version of the Grenander estimator.
    Keywords: Berry-Esseen, Grenander Estimator, Kernel, Least Concave Majorant
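    As a rough illustration of the objects in the abstract above (not the authors' code), the following Python sketch computes the Grenander estimator as the slopes of the least concave majorant of the empirical CDF, together with a Gaussian-kernel-smoothed version; the function names and the assumption of distinct, nonnegative observations are ours.

```python
import numpy as np
from scipy.stats import norm

def grenander(x):
    """Grenander estimator of a decreasing density on [0, max(x)]: the left
    derivative of the least concave majorant (LCM) of the empirical CDF.
    Assumes distinct, nonnegative observations."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    pts = np.column_stack([np.concatenate([[0.0], x]),
                           np.arange(n + 1) / n])   # ECDF knots plus the origin
    hull = []                                       # upper convex hull = LCM
    for px, py in pts:
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # drop the middle knot if it lies on or below the chord
            if (y2 - y1) * (px - x1) <= (py - y1) * (x2 - x1):
                hull.pop()
            else:
                break
        hull.append((px, py))
    hull = np.asarray(hull)
    breaks = hull[:, 0]
    heights = np.diff(hull[:, 1]) / np.diff(breaks)  # decreasing step heights
    return breaks, heights

def smoothed_grenander(t, breaks, heights, h):
    """Kernel-smoothed Grenander estimate at points t: the step density
    convolved with a Gaussian kernel of bandwidth h, in closed form."""
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    for a, b, f in zip(breaks[:-1], breaks[1:], heights):
        out += f * (norm.cdf((t - a) / h) - norm.cdf((t - b) / h))
    return out
```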
  • Rahim Mahmoudvand *, Paulo Rodrigues Pages 11-19
    In a referendum conducted in the United Kingdom (UK) on June 23, 2016, $51.6\%$ of the participants voted to leave the European Union (EU). The outcome of this referendum had major policy and financial impacts for both the UK and the EU, and was seen as a surprise because the predictions had consistently indicated that ``Remain'' would win a majority. In this paper, we investigate whether the outcome of the Brexit referendum could have been predicted from polls data. The data consist of 233 polls conducted between January 2014 and June 2016 by YouGov, Populus, ComRes, Opinion, and others, with sample sizes ranging from 500 to 20,058. We used Singular Spectrum Analysis (SSA), an increasingly popular and widely adopted filtering technique for both short and long time series (a minimal SSA sketch follows the keywords). We found that the real outcome of the referendum is very close to our point estimate and within our prediction interval, which reinforces the usefulness of SSA for predicting polls data.
    Keywords: Singular Spectrum Analysis, Recurrent SSA forecasting algorithm, Polls data
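    A minimal numpy sketch of basic SSA with the recurrent (R-)forecasting algorithm referenced in the keywords; the window length L, number of components r, and the toy series are our own choices, not the authors' implementation or their polls data.

```python
import numpy as np

def ssa_forecast(y, L, r, steps):
    """Basic SSA: embed, truncate the SVD at rank r, hankelize, then extend
    the reconstruction with the linear recurrence of R-forecasting."""
    y = np.asarray(y, dtype=float)
    N = len(y)
    K = N - L + 1
    X = np.column_stack([y[j:j + L] for j in range(K)])   # trajectory matrix
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Xr = (U[:, :r] * s[:r]) @ Vt[:r]                      # rank-r approximation
    # diagonal averaging (hankelization) back to a series of length N
    rec = np.array([Xr[::-1].diagonal(i).mean() for i in range(-L + 1, K)])
    # recurrence coefficients from the leading left singular vectors
    # (requires the last coordinates pi to satisfy ||pi|| < 1)
    P, pi = U[:-1, :r], U[-1, :r]
    R = P @ pi / (1.0 - pi @ pi)
    out = list(rec)
    for _ in range(steps):
        out.append(R @ np.array(out[-(L - 1):]))          # one-step recurrence
    return np.array(out)

# usage: forecast five steps ahead from a toy series
y = np.sin(np.linspace(0, 8 * np.pi, 120)) + 0.1 * np.random.default_rng(0).normal(size=120)
print(ssa_forecast(y, L=40, r=2, steps=5)[-5:])
```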
  • Mina Norouzirad *, Mohammad Arashi, Mahdi Roozbeh Pages 21-32
    The partial linear model is very flexible because the relation between the covariates and the response has both parametric and nonparametric components. However, estimating the regression coefficients is challenging, since the nonparametric component must be estimated simultaneously. As a remedy, the differencing approach can be used to eliminate the nonparametric component and estimate the regression coefficients. Here, suppose the regression parameter vector is suspected to lie in a given subspace. For situations where a difference-based least absolute shrinkage and selection operator (D-LASSO) is desired, we propose a restricted D-LASSO estimator (the differencing step is sketched after the keywords). To improve its performance, LASSO-type shrinkage estimators are also developed. The relative dominance picture of the suggested estimators is investigated. In particular, the suitability of estimating the nonparametric component by the Speckman approach is explored. A real data example is given to compare the proposed estimators. The numerical analysis shows that the partial difference-based shrinkage estimators outperform the difference-based regression model in terms of average prediction error.
    Keywords: Double shrinking, Partial linear model, Preliminary test LASSO, Restricted LASSO, Stein-type Shrinkage LASSO
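    The differencing step that removes the nonparametric component can be sketched as follows, using first-order differencing weights and scikit-learn's Lasso (our choices; the paper's restricted and double-shrinkage variants impose the subspace hypothesis on top of this).

```python
import numpy as np
from sklearn.linear_model import Lasso

def d_lasso(X, y, t, alpha=0.1):
    """First-order difference-based LASSO for y = X b + g(t) + error:
    sorting by t and differencing adjacent rows removes the smooth g(t),
    after which an ordinary LASSO is fit to the differenced data."""
    order = np.argsort(t)
    Xs, ys = np.asarray(X)[order], np.asarray(y)[order]
    w = 1.0 / np.sqrt(2.0)                 # first-order differencing weight
    Xd, yd = w * (Xs[1:] - Xs[:-1]), w * (ys[1:] - ys[:-1])
    return Lasso(alpha=alpha, fit_intercept=False).fit(Xd, yd)
```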
  • Soroush Pakniat * Pages 33-44
    This paper presents approximate confidence intervals for functions of parameters in a Banach space, based on a bootstrap algorithm. We apply a kernel density approach to estimate the persistence landscape. In addition, we evaluate the quality of the distribution function estimator of random variables using the integrated mean squared error (IMSE). The results of simulation studies show a significant improvement achieved by our approach compared to the standard version of the confidence interval algorithm. Finally, real data analysis shows the accuracy of our method compared with that of previous works for computing confidence intervals (a generic bootstrap-band sketch follows the keywords).
    Keywords: nonparametric topological data analysis, persistence landscape, persistent homology, bootstrap method, density estimation
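    A generic nonparametric bootstrap sup-norm band for a kernel density estimate, as a stand-in for the paper's kernel-estimated persistence landscape (actual landscapes require a TDA library, which we do not assume here); all names and defaults are ours.

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_bootstrap_band(x, grid, B=500, alpha=0.05, seed=0):
    """Bootstrap a (1 - alpha) sup-norm confidence band around a kernel
    density estimate evaluated on a fixed grid."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    f_hat = gaussian_kde(x)(grid)
    sup = np.empty(B)
    for b in range(B):
        xb = rng.choice(x, size=len(x), replace=True)     # resample the data
        sup[b] = np.max(np.abs(gaussian_kde(xb)(grid) - f_hat))
    q = np.quantile(sup, 1 - alpha)                       # sup-norm quantile
    return f_hat - q, f_hat + q
```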
  • Reza Pourtaheri * Pages 45-58
    Traditionally, statistical quality control techniques utilize either an attribute or a variable product quality measure. Recently, methods such as the three-level control chart have been developed for monitoring multi-attribute processes. A control chart usually has three design parameters: the sample size (n), the sampling interval (h), and the control limit coefficient (k). These design parameters are generally specified according to statistical and/or economic criteria. The variable sampling interval (VSI) scheme has been shown to improve the detection efficiency of control charts relative to a fixed sampling rate (FRS) scheme. In this paper, a method is proposed for the economic-statistical design of VSI three-level control charts (the VSI rule is sketched after the keywords). We use the cost model developed by Costa and Rahim and optimize it with a genetic algorithm. We compare the expected cost per unit time of the VSI and FRS three-level control charts. The results indicate that the proposed chart has improved performance.
    Keywords: three-level control chart, the variable sampling interval (VSI) control scheme, economic- statistical design (ESD), genetic algorithm (GA)
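    The VSI rule itself is simple to state in code: sample sooner when the chart statistic falls near a control limit. A toy sketch with our own warning limit w and interval lengths:

```python
def next_interval(z, w, k, h_short, h_long):
    """Variable sampling interval rule for a standardized chart statistic z:
    signal outside the control limits (+/- k), use the short interval in the
    warning region (w < |z| < k), and the long interval otherwise."""
    if abs(z) >= k:
        return 0.0          # out-of-control signal: investigate immediately
    return h_short if abs(z) > w else h_long
```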
  • Bahman Tarvirdizade, Nader Nematollahi * Pages 59-76
    In this article, we consider the problem of estimating the stress-strength reliability $Pr(X > Y)$ based on upper record values when $X$ and $Y$ are two independent but not identically distributed random variables from the power hazard rate distribution with common scale parameter $k$. When $k$ is known, the maximum likelihood estimator (MLE), the approximate Bayes estimator, and exact confidence intervals of the stress-strength reliability are obtained. When $k$ is unknown, we obtain the MLE and some bootstrap confidence intervals of the stress-strength reliability. We also apply the Gibbs sampling technique to study the Bayesian estimation of the stress-strength reliability and the corresponding credible interval. An example is presented to illustrate these inferences, and a Monte Carlo simulation study is conducted to investigate and compare the performance of the different proposed methods (a Monte Carlo sketch for $Pr(X > Y)$ follows the keywords).
    Keywords: Bayes estimation, Maximum likelihood estimation, Monte Carlo simulation, Power hazard rate distribution, Record values, Stress-strength reliability
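    Assuming the common parametrization of the power hazard rate distribution, with hazard h(x) = m x^k and survival S(x) = exp(-m x^(k+1)/(k+1)) (the paper's exact parametrization may differ), Pr(X > Y) reduces to m2/(m1 + m2) when k is shared, which a quick Monte Carlo check confirms:

```python
import numpy as np

def rphr(n, m, k, rng):
    """Inverse-transform sampling from the power hazard rate distribution
    with hazard m * x**k, via an Exp(1) draw E and x = ((k+1) E / m)^(1/(k+1))."""
    e = -np.log1p(-rng.random(n))          # Exp(1), safe at the boundary
    return ((k + 1) * e / m) ** (1.0 / (k + 1))

rng = np.random.default_rng(1)
m1, m2, k = 2.0, 1.0, 1.5
x, y = rphr(100_000, m1, k, rng), rphr(100_000, m2, k, rng)
print(np.mean(x > y))        # Monte Carlo estimate of R = Pr(X > Y)
print(m2 / (m1 + m2))        # closed form under a common k (= 1/3 here)
```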
  • Farzad Eskandari * Pages 77-97
    Imprecise measurement tools produce imprecise data, and interval-valued data are commonly used to represent such imprecision; interval-valued variables therefore appear in estimation methods and have recently been modeled by linear regression models. When the response variable has a statistical distribution, interval-valued variables are modeled in the generalized linear models framework. In this article, we propose a new consistent estimator of the parameters of generalized linear models for response distributions in the exponential family (a toy center-point example follows the keywords). A simulation study shows that the new estimator outperforms others for particular distributions of the response variable. We also present optimality properties of the estimators.
    Keywords: Interval-valued data, Generalized linear models, Consistent estimator, Simulation, Optimal properties
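    A toy baseline, not the authors' proposed estimator: represent each interval-valued covariate by its center point and fit a Poisson GLM (an exponential-family response) with statsmodels; the data below are simulated purely for illustration.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
lo = rng.uniform(0, 1, 200)
hi = lo + rng.uniform(0, 0.5, 200)            # hypothetical interval bounds
x_mid = (lo + hi) / 2                         # center-point representation
y = rng.poisson(np.exp(0.3 + 0.8 * x_mid))    # Poisson response
fit = sm.GLM(y, sm.add_constant(x_mid), family=sm.families.Poisson()).fit()
print(fit.params)                             # intercept and slope estimates
```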
  • MohammadHossein Naderi, Mohammad Bameni Moghadam *, Asghar Seif Pages 99-127
    A proper method of monitoring a stochastic system is to use statistical process control charts, in which a drift in the characteristics of the output may be due to one or several assignable causes. In establishing X control charts in statistical process control, it is usually assumed that there is no correlation within samples. In practice, however, there are many cases where correlation within samples does exist, and it is more appropriate to assume that each sample is a realization of a multivariate normal random vector. Using three different loss functions in the economic and economic-statistical design of quality control charts leads to better decisions in industry. Although some research works have considered the economic design of control charts under a single assignable cause and correlated data, the economic-statistical design of the X control chart for multiple assignable causes and correlated data under a Weibull shock model with three different loss functions has not been presented yet. Based on the optimization of the average cost per unit time, and taking into account different combinations of the Weibull distribution parameters, optimal values of the sample size, the sampling interval, and the control limit coefficient were derived and calculated. The cost models under non-uniform and uniform sampling schemes were then compared. The results revealed that the model under multiple assignable causes with correlated samples and non-uniform sampling, integrated with three different loss functions, has a lower cost than the model with uniform sampling (a toy design-optimization sketch follows the keywords).
    Keywords: Economic-statistical design, X control chart, Multiple assignable causes
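    A heavily simplified sketch of the design problem: minimize a surrogate expected cost per unit time over (n, h, k). We use scipy's differential evolution as a stand-in for the paper's genetic algorithm, and a toy normal-theory cost in place of the Weibull shock model with multiple assignable causes and three loss functions; n is treated as continuous here.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import differential_evolution

def expected_cost(params, delta=1.0, c_sample=0.5, c_false=50.0, c_oc=100.0):
    """Toy surrogate for the expected cost per unit time of a control chart
    with design (n, h, k) under a mean shift of delta standard deviations."""
    n, h, k = params
    alpha = 2 * norm.sf(k)                               # false-alarm rate
    beta = norm.cdf(k - delta * np.sqrt(n)) - norm.cdf(-k - delta * np.sqrt(n))
    arl_oc = 1.0 / (1.0 - beta)                          # out-of-control ARL
    return c_sample * n / h + c_false * alpha / h + c_oc * arl_oc * h

res = differential_evolution(expected_cost, bounds=[(2, 20), (0.5, 8), (1, 4)])
print(res.x)    # surrogate-optimal (n, h, k)
```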
  • Ehsan Ormoz * Pages 129-141
    In the meta-analysis of clinical trials, the data of each trial are usually summarized by one or more outcome measure estimates, which are reported along with their standard errors. When the summary data are multi-dimensional, the analysis is often performed as a number of separate univariate analyses, in which case the correlations between the summary statistics are ignored. In contrast, a multivariate meta-analysis model uses these correlations to synthesize the outcomes jointly and estimate the multiple pooled effects simultaneously. In this paper, we present a nonparametric Bayesian bivariate random-effects meta-analysis (the parametric model it generalizes is sketched after the keywords).
    Keywords: Bayesian Nonparametric, Gibbs algorithm, Meta-analysis, Bivariate Distribution, Bayesian Model Selection
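    For orientation, the parametric bivariate random-effects model that the paper's nonparametric Bayesian approach generalizes has the marginal likelihood y_i ~ N(mu, S_i + T); below is a small negative log-likelihood in our own parametrization, ready to hand to any optimizer.

```python
import numpy as np
from scipy.stats import multivariate_normal

def neg_loglik(par, y, S):
    """Negative marginal log-likelihood of the bivariate normal-normal
    random-effects model y_i ~ N(mu, S_i + T). par packs mu (2 entries),
    log between-study sds (2) and the arctanh of their correlation (1);
    minimize with e.g. scipy.optimize.minimize."""
    mu, t, rho = par[:2], np.exp(par[2:4]), np.tanh(par[4])
    T = np.array([[t[0] ** 2, rho * t[0] * t[1]],
                  [rho * t[0] * t[1], t[1] ** 2]])   # between-study covariance
    return -sum(multivariate_normal.logpdf(yi, mu, Si + T)
                for yi, Si in zip(y, S))
```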
  • Esmaeil Shirazi * Pages 143-158
    Estimation of a quantile density function from biased data is a frequent problem in industrial life testing experiments and medical studies. We investigate the estimation of a quantile density function in the biased nonparametric regression model and propose a new wavelet-based methodology for this problem. In particular, an adaptive hard thresholding wavelet estimator is constructed. Under mild assumptions on the model, we prove that it enjoys powerful mean integrated squared error properties over Besov balls, and we assess the performance of the proposed estimator in a numerical study. Specifically, we develop two types of wavelet estimators for the quantile density function when the data come from a biased distribution function. Our wavelet hard thresholding estimator, introduced as a nonlinear estimator, has the feature of being adaptive with respect to q(x) (the thresholding mechanics are sketched after the keywords). We show that these estimators attain optimal and nearly optimal rates of convergence over a wide range of Besov function classes.
    Keywords: Adaptivity, Biased Data, Quantile density estimation, Wavelets
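    A generic hard-thresholding wavelet estimate with PyWavelets, using the universal threshold; this illustrates only the thresholding mechanics, not the paper's bias-corrected quantile density estimator, and the wavelet, level, and noise-scale rule are our choices.

```python
import numpy as np
import pywt

def hard_threshold_estimate(y, wavelet="db4", level=4):
    """Hard-thresholded wavelet reconstruction of a noisy function sample:
    estimate the noise scale from the finest detail level, apply the
    universal threshold, and invert the transform."""
    coeffs = pywt.wavedec(y, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745    # MAD noise estimate
    thr = sigma * np.sqrt(2 * np.log(len(y)))         # universal threshold
    coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode="hard")
                            for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[:len(y)]
```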
  • Mozhgan Taavoni * Pages 159-170
    This paper considers an extension of the linear mixed model, the semiparametric mixed-effects model, for longitudinal data when multicollinearity is present. To overcome this problem, a new mixed ridge estimator is proposed, while the nonparametric function in the semiparametric model is approximated by the kernel method. The proposed approach integrates the ridge method into the semiparametric mixed-effects modeling framework to account both for the correlation induced by repeatedly measuring an outcome on each individual over time and for the potentially high degree of correlation among the predictor variables (the ridge core is sketched after the keywords). The asymptotic normality of the proposed estimator is established. To improve efficiency, the covariance function is estimated using an iterative algorithm. The performance of the proposed estimator is assessed through a simulation study and an analysis of CD4 data.
    Keywords: Kernel, Longitudinal Data, Mixed Effect, Ridge Regression, Semiparametric
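    The fixed-effects core of the proposal is the familiar ridge solve, sketched below; the full method additionally profiles out the kernel-estimated nonparametric component and the random effects, which we omit here.

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge estimator (X'X + lam I)^{-1} X'y, computed via a linear solve
    rather than an explicit inverse for numerical stability."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
```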
  • Sima Naghizadeh * Pages 171-188
    Bayesian variable selection is widely used as a methodology in air quality control trials and generalized linear models. One of the important and, of course, controversial topics in this area is the selection of the prior distribution of the unknown model parameters. The aim of this study is to present a substitute for the mixture of priors that preserves its benefits and computational efficiencies while avoiding the known paradoxes and contradictions (the g-prior Bayes factor behind Bartlett's paradox is sketched after the keywords). We consider two points of view, empirical and fully Bayesian. In particular, a mixture of priors and its theoretical characteristics are introduced. Finally, the proposed model is illustrated with a real example.
    Keywords: Bayesian Variable Selection, Mixture of Priors, Bartlett’s Paradox, Information Paradox, Empirical Bayesian analysis
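    To see the paradox that mixture priors are designed to avoid: under Zellner's g-prior, the Bayes factor of a model with p regressors and fit R^2 against the null model has a closed form (Liang et al., 2008), and letting g grow drives the log Bayes factor to minus infinity regardless of the data, which is Bartlett's paradox. A small check with toy numbers of our choosing:

```python
import numpy as np

def log_bf_gprior(R2, n, p, g):
    """Log Bayes factor of a p-regressor model with coefficient of
    determination R2 against the null, under Zellner's g-prior."""
    return 0.5 * ((n - 1 - p) * np.log(1 + g)
                  - (n - 1) * np.log(1 + g * (1 - R2)))

for g in [1e0, 1e2, 1e4, 1e8]:
    print(g, log_bf_gprior(R2=0.6, n=50, p=3, g=g))   # decreases without bound
```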