Table of Contents
Journal of Data Science and Modeling
Volume:1 Issue: 1, Summer and Autumn 2022
Publication date: 1400/03/11
Number of titles: 12


Pages 1-9
In various statistical models, such as density estimation and the estimation of regression curves or hazard rates, monotonicity constraints can arise naturally. A frequently encountered problem in nonparametric statistics is to estimate a monotone density function f on a compact interval. A well-known estimator of f under the restriction that f is decreasing is the Grenander estimator, which is the left derivative of the least concave majorant of the empirical distribution function of the data. Many authors have worked on this estimator and obtained useful properties of it. The Grenander estimator is a step function and, as a consequence, it is not smooth. In this paper, we discuss the estimation of a decreasing density function by the kernel smoothing method. Much work has been done on Berry-Esseen bounds for density estimators because of their importance and applicability. In this paper, we study a Berry-Esseen type bound for a smoothed version of the Grenander estimator.
Keywords: Berry-Esseen, Grenander Estimator, Kernel, Least Concave Majorant
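
As an illustration of the definition in the abstract above, the following is a minimal sketch of the unsmoothed Grenander estimator. It assumes a strictly positive sample and a density supported on an interval starting at 0. The names `grenander` and `_cross` are hypothetical and are not taken from the paper. The least concave majorant of the empirical distribution function is computed as the upper convex hull of its jump points, and the estimator is the slope of each hull segment.

```python
def _cross(o, a, b):
    """Cross product of vectors o->a and o->b (positive for a left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def grenander(sample):
    """Grenander estimate of a decreasing density on [0, max(sample)].

    Returns (knots, heights): the estimate equals heights[i] on the
    interval (knots[i], knots[i + 1]].
    Assumption of this sketch: all observations are strictly positive.
    """
    xs = sorted(sample)
    n = len(xs)
    if n == 0 or xs[0] <= 0:
        raise ValueError("sample must be non-empty and strictly positive")

    # Jump points of the empirical distribution function, starting at (0, 0).
    # For tied observations only the highest point at that abscissa is kept.
    points = [(0.0, 0.0)]
    for i, x in enumerate(xs):
        if points[-1][0] == x:
            points[-1] = (x, (i + 1) / n)
        else:
            points.append((x, (i + 1) / n))

    # The least concave majorant is the upper convex hull of these points:
    # drop the middle point whenever it lies on or below the chord.
    hull = []
    for p in points:
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)

    # The left derivative of the majorant is the slope of each hull segment.
    knots = [x for x, _ in hull]
    heights = [(y1 - y0) / (x1 - x0)
               for (x0, y0), (x1, y1) in zip(hull, hull[1:])]
    return knots, heights


knots, heights = grenander([1, 3, 4])
print(knots, heights)
```

For the sample 1, 3, 4 the hull keeps the points at 0, 1 and 4, so the estimate is 1/3 on (0, 1] and 2/9 on (1, 4]. The result is a step function, which is the lack of smoothness that motivates the kernel-smoothed version studied in the paper.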

Pages 11-19
In a referendum held in the United Kingdom (UK) on June 23, 2016, $51.6\%$ of the participants voted to leave the European Union (EU). The outcome of this referendum had a major policy and financial impact on both the UK and the EU, and was seen as a surprise because predictions had consistently indicated that ``Remain'' would win a majority. In this paper, we investigate whether the outcome of the Brexit referendum could have been predicted from poll data. The data consist of 233 polls conducted between January 2014 and June 2016 by YouGov, Populus, ComRes, Opinion, and others. The sample sizes range from 500 to 20058. We used Singular Spectrum Analysis (SSA), an increasingly popular and widely adopted filtering technique for both short and long time series. We found that the real outcome of the referendum is very close to our point estimate and lies within our prediction interval, which reinforces the usefulness of SSA for predicting poll data.
Keywords: Singular Spectrum Analysis, Recurrent SSA forecasting algorithm, Polls data

Pages 21-32
The partial linear model is very flexible when the relation between the covariates and the responses is either parametric or nonparametric. However, estimating the regression coefficients is challenging, since the nonparametric component must be estimated simultaneously. As a remedy, the differencing approach can be used to eliminate the nonparametric component and estimate the regression coefficients. Here, suppose the regression parameter vector is restricted to lie in a subspace hypothesis. For situations where the difference-based least absolute shrinkage and selection operator (DLASSO) is desired, we propose a restricted DLASSO estimator. To improve its performance, LASSO-type shrinkage estimators are also developed. The relative dominance picture of the suggested estimators is investigated. In particular, the suitability of estimating the nonparametric component based on the Speckman approach is explored. A real data example is given to compare the proposed estimators. The numerical analysis shows that the partial difference-based shrinkage estimators perform better than the difference-based regression model in the sense of average prediction error.
Keywords: Double shrinking, Partial linear model, Preliminary test LASSO, Restricted LASSO, Stein-type Shrinkage LASSO

Pages 33-44
This paper presents approximate confidence intervals for each function of parameters in a Banach space, based on a bootstrap algorithm. We apply a kernel density approach to estimate the persistence landscape. In addition, we evaluate the quality of the distribution function estimator of random variables using the integrated mean square error (IMSE). The results of simulation studies show a significant improvement achieved by our approach compared to the standard version of the confidence interval algorithm. Finally, a real data analysis illustrates the accuracy of our method relative to that of previous works for computing the confidence interval.
Keywords: nonparametric topological data analysis, persistence landscape, persistent homology, bootstrap method, density estimation

Pages 45-58
Traditionally, statistical quality control techniques use either an attribute or a variable measure of product quality. Recently, some methods, such as the three-level control chart, have been developed for monitoring multi-attribute processes. A control chart usually has three design parameters: the sample size (n), the sampling interval (h) and the control limit coefficient (k). The design parameters of the control chart are generally specified according to statistical and/or economic criteria. The variable sampling interval (VSI) control scheme has been shown to increase the detection efficiency of the control chart relative to the fixed sampling rate (FRS) scheme. In this paper, a method is proposed for the economic-statistical design of three-level control charts with a variable sampling interval. We use the cost model developed by Costa and Rahim and optimize it with a genetic algorithm. We compare the expected cost per unit time of the VSI and FRS three-level control charts. The results indicate that the proposed chart has improved performance.
Keywords: three-level control chart, the variable sampling interval (VSI) control scheme, economic statistical design (ESD), genetic algorithm (GA)

Pages 59-76
In this article, we consider the problem of estimating the stress-strength reliability $Pr (X > Y)$ based on upper record values when $X$ and $Y$ are two independent but not identically distributed random variables from the power hazard rate distribution with common scale parameter $k$. When the parameter $k$ is known, the maximum likelihood estimator (MLE), the approximate Bayes estimator and the exact confidence intervals of stress-strength reliability are obtained. When the parameter $k$ is unknown, we obtain the MLE and some bootstrap confidence intervals of stress-strength reliability. We also apply the Gibbs sampling technique to study the Bayesian estimation of stress-strength reliability and the corresponding credible interval. An example is presented in order to illustrate the inferences discussed in the previous sections. Finally, to investigate and compare the performance of the different proposed methods in this paper, a Monte Carlo simulation study is conducted.
Keywords: Bayes estimation, Maximum likelihood estimation, Monte Carlo simulation, Power hazard rate distribution, Record values, Stress-strength reliability

Pages 77-97
Imprecise measurement tools produce imprecise data. Interval-valued data are usually used to deal with such imprecision, so interval-valued variables are used in estimation methods. They have recently been modeled by linear regression models. If the response variable follows some statistical distribution, interval-valued variables are modeled within the generalized linear models framework. In this article, we propose a new consistent estimator of a parameter in generalized linear models in which the distribution of the response variable belongs to the exponential family. A simulation study shows that the new estimator performs better than others for particular distributions of the response variable. We also present optimal properties of the estimators in this research.
Keywords: Interval-valued data, Generalized linear models, Consistent estimator, Simulation, Optimal properties

Pages 99-127
A proper method of monitoring a stochastic system is to use the control charts of statistical process control in which a drift in characteristics of output may be due to one or several assignable causes. In the establishment of X charts in statistical process control, an assumption is made that there is no correlation within the samples. However, in practice, there are many cases where the correlation does exist within the samples. It would be more appropriate to assume that each sample is a realization of a multivariate normal random vector. Using three different loss functions in the concept of quality control charts with economic and economic statistical design leads to better decisions in the industry. Although some research works have considered the economic design of control charts under single assignable cause and correlated data, the economic statistical design of X control chart for multiple assignable causes and correlated data under Weibull shock model with three different loss functions have not been presented yet. Based on the optimization of the average cost per unit of time and taking into account the different combination values of Weibull distribution parameters, optimal design values of sample size, sampling interval and control limit coefficient were derived and calculated. Then the cost models under nonuniform and uniform sampling scheme were compared. The results revealed that the model under multiple assignable causes with correlated samples with nonuniform sampling integrated with three different loss functions has a lower cost than the model with uniform sampling.
Keywords: Economic statistical design, X control chart, Multiple assignable causes

Pages 129-141
In the meta-analysis of clinical trials, the data of each trial are usually summarized by one or more outcome measure estimates, which are reported along with their standard errors. When the summary data are multidimensional, the analysis is usually performed as a number of separate univariate analyses. In that case, the correlation between the summary statistics is ignored. In contrast, a multivariate meta-analysis model uses these correlations to synthesize the outcomes jointly and to estimate the multiple pooled effects simultaneously. In this paper, we present a nonparametric Bayesian bivariate random effects meta-analysis.
Keywords: Bayesian Nonparametric, Gibbs algorithm, Meta-analysis, Bivariate Distribution, Bayesian Model Selection

Pages 143-158
Estimation of a quantile density function from biased data is a frequent problem in industrial life testing experiments and medical studies. The estimation of a quantile density function in the biased nonparametric regression model is investigated. We propose and develop a new wavelet-based methodology for this problem. In particular, an adaptive hard thresholding wavelet estimator is constructed. Under mild assumptions on the model, we prove that it enjoys powerful mean integrated squared error properties over Besov balls. The performance of the proposed estimator is investigated by a numerical study. In this study, we develop two types of wavelet estimators for the quantile density function when the data come from a biased distribution function. Our wavelet hard thresholding estimator, which is introduced as a nonlinear estimator, is adaptive with respect to q(x). We show that these estimators attain optimal and nearly optimal rates of convergence over a wide range of Besov function classes.
Keywords: Adaptivity, Biased Data, Quantile density estimation, Wavelets
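
To make the hard thresholding step mentioned in the abstract above concrete, here is a minimal sketch on a plain signal. It is not the paper's quantile density estimator. It assumes an orthonormal Haar basis, a signal whose length is a power of two, and a fixed threshold chosen by hand; the function names are hypothetical. Hard thresholding keeps a detail coefficient unchanged when its absolute value exceeds the threshold and sets it to zero otherwise.

```python
import math


def haar_forward(signal):
    """Full orthonormal Haar transform of a list whose length is a power of two.

    Returns (scaling, details), where details[0] is the coarsest level.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    approx = list(signal)
    details = []
    while len(approx) > 1:
        half = len(approx) // 2
        pairs = [(approx[2 * i], approx[2 * i + 1]) for i in range(half)]
        # Insert at the front so that the coarsest level ends up first.
        details.insert(0, [(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return approx[0], details


def haar_inverse(scaling, details):
    """Invert haar_forward, rebuilding the signal from coarse to fine."""
    approx = [scaling]
    for level in details:
        finer = []
        for a, d in zip(approx, level):
            finer.append((a + d) / math.sqrt(2))
            finer.append((a - d) / math.sqrt(2))
        approx = finer
    return approx


def hard_threshold(details, threshold):
    """Keep coefficients with |c| > threshold, set the others to zero."""
    return [[c if abs(c) > threshold else 0.0 for c in level]
            for level in details]


# A step signal with one small perturbation in the second half.
signal = [4.0, 4.0, 4.0, 4.0, 1.0, 1.1, 1.0, 1.0]
scaling, details = haar_forward(signal)
smoothed = haar_inverse(scaling, hard_threshold(details, 0.5))
print(smoothed)
```

In this example only the coarsest detail coefficient, which carries the jump between the two halves, survives the threshold, so the small perturbation is averaged out while the jump is preserved. An adaptive estimator such as the one in the abstract would choose the threshold from the data instead of fixing it.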

Pages 159-170
This paper considers an extension of the linear mixed model, called the semiparametric mixed effects model, for longitudinal data in the presence of multicollinearity. To overcome this problem, a new mixed ridge estimator is proposed, while the nonparametric function in the semiparametric model is approximated by the kernel method. The proposed approach integrates the ridge method into the semiparametric mixed effects modeling framework in order to account for both the correlation induced by repeatedly measuring an outcome on each individual over time and the potentially high degree of correlation among possible predictor variables. The asymptotic normality of the proposed estimator is established. To improve efficiency, the covariance function is estimated using an iterative algorithm. The performance of the proposed estimator is assessed through a simulation study and an analysis of CD4 data.
Keywords: Kernel, Longitudinal Data, Mixed Effect, Ridge Regression, Semiparametric

Pages 171-188
Bayesian variable selection analysis is widely used as a new methodology in air quality control trials and generalized linear models. One of the important and, of course, controversial topics in this area is the selection of the prior distribution of the unknown model parameters. The aim of this study is to present an alternative to the mixture of priors that, while preserving its benefits and computational efficiency, avoids the known paradoxes and contradictions. In this research, we consider two points of view: empirical and fully Bayesian. In particular, a mixture of priors and its theoretical characteristics are introduced. Finally, the proposed model is illustrated with a real example.
Keywords: Bayesian Variable Selection, Mixture of Priors, Bartlett's Paradox, Information Paradox, Empirical Bayesian analysis