Computers, Materials & Continua
L-Moments Based Calibrated Variance Estimators Using Double Stratified Sampling
1Department of Mathematics and Statistics, International Islamic University, Islamabad, 44000, Pakistan
2Department of Mathematics and Statistics, PMAS-Arid Agriculture University, Rawalpindi, 46300, Pakistan
3Department of Mathematics, College of Science, King Khalid University, Abha, 62529, Saudi Arabia
4Statistical Research and Studies Support Unit, King Khalid University, Abha, 62529, Saudi Arabia
5Department of Mathematics, College of Science, Mustansiriyah University, Baghdad, 10011, Iraq
*Corresponding Author: Usman Shahzad. Email: email@example.com
Received: 19 January 2021; Accepted: 04 March 2021
Abstract: Variance is one of the most important measures of dispersion and is widely used in practice. A commonly used approach for variance estimation is the traditional method of moments, which is strongly influenced by the presence of extreme values, so its results can be unreliable in such cases. Taking motivation from Koyuncu’s recent work, the present paper first proposes two classes of variance estimators based on linear moments (L-moments) and then employs them with auxiliary data under double stratified sampling to introduce a new class of calibration variance estimators using important properties of L-moments (L-location, L-cv, L-variance). Three populations are used to assess the efficiency of the new estimators. The first population consists of artificial (simulated) data, and the second and third populations consist of real data. The percentage relative efficiency of the proposed estimators over existing ones is evaluated. In the presence of extreme values, our findings show that the proposed classes are more efficient than the traditional classes. Hence, when auxiliary data are available and extreme values are present, the proposed classes of estimators may be implemented in a wide variety of sample surveys.
Keywords: Variance estimation; L-moments; calibration approach; double sampling; stratified random sampling
Planning is an integral part of the administrative process for the development of any field. Among the most important outputs of the planning process are the plans and programs that institutions seek to execute. One of the most important pillars of planning success is the availability of data and information that enables the decision-maker to conduct scientific analysis. In statistical literature, the additional information attached to each element is referred to as auxiliary (or ancillary, supplementary, supporting, concomitant) information. Whatever type of information is offered, it can be used to identify better sampling strategies. Auxiliary information has been used with sampling techniques for many years. The authors of [1,2] were pioneers in the usage of auxiliary information regarding the development of estimation techniques with high estimation accuracy. Recently, there have been many interesting works using auxiliary information in different ways [3–12].
In all sample surveys, the major concern is the derivation of point estimators for various parameters of interest. Nevertheless, it is equally important to evaluate the performance of these estimators. The estimated variance of any estimator is a major component of its quality. Reference  pointed out that variance estimation matters because it offers an indicator of the quality of estimators: it can be used to calculate confidence intervals, to draw accurate conclusions, and to provide indicators of data quality. The sampling design that underlies a sample survey is one of the most important factors determining both the sample size and the procedure needed to estimate the variances. More specifically, many components of a sample design affect the estimation of variances, including the number of sampling stages. In single-stage (one-stage) sample designs, the situation is straightforward, and a closed-form formula can be derived for the estimation of variance. In designs with more than one stage, the situation becomes more complicated since there is more than one source of variance. At each stage, the sampling of units (primary, secondary, etc.) contributes an additional component of variance. In cases where all other components of sampling and estimation are rather simple, a closed-form formula can be obtained by calculating the variance at each stage. However, common practice is to approximate the variance by estimating the variation among the primary sampling units, since this is the dominant component of the overall variance. For example, with double or two-stage sampling, there are two sources of variance: the variation resulting from the selection of the primary sampling units and the variation resulting from the selection of the secondary sampling units (for more details, see ). There are also many studies that have employed double sampling for real data [15–18].
In this paper, we consider double stratified random sampling. With stratified sampling, the population is split into non-overlapping subpopulations known as strata, which are typically internally homogeneous, resulting in reduced overall variability. A random sample is chosen from each stratum, independently of the other strata. The sampling scheme used in one stratum may be the same as, or different from, that used in the other strata.
Consider and as the auxiliary and study variables associated with a finite population of size , and , where is stratified into R strata with the hth stratum including Nh units. , and . For the first stage, a simple random sample with size is chosen from the stratum without replacement such as . Then the sample nh for the second stage is selected. , represents the observed values of X and Y with , and () and () represent the variances of X and Y for the first and second stage samples, respectively. In view of this double stratified sampling design, the traditional variance estimator is
It is worth noting that is based on traditional moments and hence is highly affected by the presence of extreme values. Note also that is the stratum’s weight.
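The two-phase selection described above can be illustrated with a minimal Python sketch. It assumes the population is available as one index array per stratum. The names `double_stratified_sample`, `first_sizes`, `second_sizes`, `m_h`, and `n_h` are hypothetical labels chosen for this illustration, since the paper's own symbol for the first-stage sample size is not recoverable here.

```python
import numpy as np

def double_stratified_sample(strata, first_sizes, second_sizes, rng):
    """Draw a two-phase SRSWOR sample independently within each stratum.

    strata       : list of 1-D arrays of unit indices, one per stratum (assumed layout)
    first_sizes  : first-stage sample size per stratum (called m_h here)
    second_sizes : second-stage sample size per stratum (called n_h here)
    Returns a list of (first_stage_units, second_stage_units) pairs.
    """
    samples = []
    for units, m_h, n_h in zip(strata, first_sizes, second_sizes):
        if not (n_h <= m_h <= len(units)):
            raise ValueError("need n_h <= m_h <= N_h in every stratum")
        # Phase 1: simple random sample without replacement from the stratum.
        first = rng.choice(units, size=m_h, replace=False)
        # Phase 2: simple random subsample without replacement from phase 1.
        second = rng.choice(first, size=n_h, replace=False)
        samples.append((first, second))
    return samples

# Illustrative population of 120 units split into two strata.
rng = np.random.default_rng(0)
strata = [np.arange(0, 50), np.arange(50, 120)]
samples = double_stratified_sample(strata, [20, 30], [8, 10], rng)
```

In such a design the auxiliary variable would typically be observed on the larger first-stage sample, and the study variable only on the second-stage subsample.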
The analysis of sample data is complex. The complexity of the analysis increases when the data contains unusual points (outliers or extreme values) that affect the robustness of the variance estimation under traditional central moments. One of the solutions to tackle this issue is to use L-moments instead of traditional central moments. L-moments provide a robust statistical framework for the analysis. L-moments  are determined by linear combinations of the expected values of the order statistics (O.S.). Furthermore, calibration estimation is another common statistical approach that relies on the use of auxiliary information to adjust the original weights of the design and improve the accuracy of estimators. The authors of  were pioneers in the use of calibration estimation with survey data and several additional works on mean estimation have been published since (for example, see [21–23]).
In the present paper, our objective is to develop some new classes of variance estimators for a variable of interest, based on L-moments and the calibration approach under double stratified random sampling. The remainder of this article is organized as follows. In Section 2, the L-moments and proposed classes are presented in detail. Numerical illustrations of three populations are offered in Section 3 to evaluate the performance of the new estimators. Finally, Section 4 provides conclusions.
2 L-Moments and Proposed Classes
Reference  described the L-moments as expectations of certain linear combinations of order statistics. L-moments can be specified for any random variable for which a mean exists. They are used to describe probability distributions and estimate parameters, and their sample estimates are used to summarize and describe observed data. L-moments have several advantages over traditional moments: they are linear functions of the data, they suffer less from sampling variability, they are more robust to outliers/extreme values in the data, and they enable safer inferences from small samples about the underlying population parameters. The population forms of the first four L-moments for the auxiliary variable X in the hth stratum are defined as follows:
Similarly, we can write second-stage sample L-moments of the auxiliary variable as
where represents the dth order statistics with binomial coefficient (:). Similarly, we can write the L-moments expression for the first-stage sample as , , , and . Furthermore, we can write the mathematical expressions of L-moments for the study variable Y by adapting the structure of auxiliary variable X.
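As an illustration of how sample L-moments can be computed from order statistics, the following Python sketch uses the standard unbiased probability-weighted-moment estimators. This is a generic textbook construction, not necessarily the exact expressions used in this paper, and the function name `sample_l_moments` is hypothetical. The ratio `l2 / l1` is the usual L-cv. How the paper's L-variance maps onto these quantities is not recoverable from the text, so it is not computed here.

```python
import numpy as np
from math import comb

def sample_l_moments(x):
    """Return the first four sample L-moments (l1, l2, l3, l4) of x."""
    x = np.sort(np.asarray(x, dtype=float))   # order statistics x_(1) <= ... <= x_(n)
    n = x.size
    if n < 4:
        raise ValueError("at least four observations are needed")
    b = []
    for r in range(4):
        # Unbiased probability-weighted moment b_r: the weight of the (i+1)-th
        # order statistic is C(i, r) / C(n - 1, r), with i counted from 0.
        w = np.array([comb(i, r) for i in range(n)], dtype=float) / comb(n - 1, r)
        b.append(np.mean(w * x))
    b0, b1, b2, b3 = b
    l1 = b0                                   # L-location (the sample mean)
    l2 = 2 * b1 - b0                          # L-scale
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l1, l2, l3, l4

# Small worked example on the values 1, ..., 5 (order of input does not matter).
l1, l2, l3, l4 = sample_l_moments([5, 3, 1, 4, 2])
l_cv = l2 / l1                                # L-cv: ratio of L-scale to L-location
```

For the values 1 to 5 this gives an L-location of 3 and an L-scale of 1, which is half the mean absolute difference between pairs of observations. Because each L-moment is a linear function of the ordered data, a single extreme value changes it far less than it changes a squared-deviation moment.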
2.1 First Proposed Class of Estimators
The authors of [9,10] used robust regression and robust covariance matrix methodologies for improved estimation of the population mean. Their use of robust regression and robust covariance matrices motivates us to utilize robust moments (L-moments) instead of traditional moments. Hence, taking motivation from , we propose the following class of L-moments based calibration estimators of variance under double stratified sampling:
where the calibrated weights are selected to minimize the measure of chi-square distance
is subject to the following calibration constraints
where is the second-stage L-variance of Y; is the first-stage L-location, L-cv, and L-variance; and is the second-stage L-location, L-cv, and L-variance of X.
The Lagrange function is given as
where and are the Lagrange multipliers. To obtain the optimum value for the calibration weight, we differentiate the Lagrange function with respect to and set it equal to zero. Thus the calibration weight can be obtained in the form
Now, and can be obtained by replacing in Eqs. (4) and (5) with its value given by Eq. (7). Thus, we obtain a weight of calibration of
By substituting the value of from Eq. (8) into Eq. (2), we obtain the proposed calibration estimator as follows:
The members of the first proposed class are provided in Tab. 1.
2.2 Second Proposed Class of Estimators
By extending the idea of Vai, we propose the second class of estimators of variance under double stratified sampling as given below:
Through using the distance of chi-square,
which is subject to the following three calibration constraints:
The Lagrange function is given as
After taking the derivative of T with respect to and setting it equal to zero, we get
The following system of equations can be obtained by substituting Eq. (16) into Eqs. (13)–(15), respectively:
Upon solving this system of equations for s, we get
Substituting these s into Eq. (16) and then into Eq. (11), we obtain the following:
where , , and
The members of the second proposed class are listed in Tab. 2.
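The mechanics shared by both proposed classes (chi-square distance, linear calibration constraints, Lagrange multipliers) can be sketched generically in Python. The sketch below assumes the common chi-square distance of the form sum over strata of (w_h - d_h)^2 / (q_h d_h) and constraints of the form A^T w = t. The function name `calibrate_weights`, the tuning constants `q`, and all numerical values are hypothetical and are not taken from the paper.

```python
import numpy as np

def calibrate_weights(d, A, targets, q=None):
    """Chi-square calibration: minimise sum((w - d)**2 / (q * d)) subject to A.T @ w = targets.

    d       : design weights, one per stratum
    A       : (R, k) array; column j holds the second-stage statistic of constraint j
    targets : length-k vector of calibration totals (e.g. built from first-stage statistics)
    q       : optional positive tuning constants (defaults to 1 for every stratum)
    """
    d = np.asarray(d, dtype=float)
    A = np.asarray(A, dtype=float).reshape(d.size, -1)
    t = np.atleast_1d(np.asarray(targets, dtype=float))
    q = np.ones_like(d) if q is None else np.asarray(q, dtype=float)
    dq = d * q
    # Setting the derivative of the Lagrange function to zero gives
    # w_h = d_h + q_h * d_h * (a_h . lam); inserting this into the constraints
    # yields the linear system M @ lam = t - A.T @ d for the multipliers.
    M = A.T @ (dq[:, None] * A)
    lam = np.linalg.solve(M, t - A.T @ d)
    return d + dq * (A @ lam)

# Hypothetical example: four strata, three constraints.
d = np.array([0.25, 0.25, 0.25, 0.25])
second_stage = np.array([[10.2, 0.31, 3.1],
                         [12.5, 0.28, 3.4],
                         [ 9.8, 0.35, 3.5],
                         [11.1, 0.30, 3.2]])
first_stage = np.array([[10.0, 0.30, 3.0],
                        [12.8, 0.29, 3.6],
                        [ 9.5, 0.33, 3.3],
                        [11.4, 0.31, 3.4]])
targets = d @ first_stage
w = calibrate_weights(d, second_stage, targets)
```

The number of constraints must not exceed the number of strata, and the constraint columns must be linearly independent, otherwise the linear system is singular. Chi-square calibration does not restrict the sign of the weights, so negative calibrated weights are possible in principle.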
3 Numerical Illustrations
Here, we evaluate the performance of the proposed estimators through three populations.
3.1 Simulation Design (Population-1)
In this article, we consider the population with size . Utilizing an equal allocation of a sample with size 100 is selected from stratum, and the total sample size . Furthermore, for stratum , random variables and are defined as follows:
where for follows Gamma distributions with parameter values as given below:
follows a standard normal distribution, and , p = 1.6, and K = 2.
Figs. 1–4 show the scatter plots for each stratum. These figures clearly demonstrate the existence of extreme values, so the simulated population is suitable for evaluating our proposed estimators.
The simulation steps are as below:
Step 1: Select a random sample with size nh through SRSWOR from stratum h.
Step 2: Find the value of variance estimate, say where and .
Step 3: Repeat Steps 1 and 2 L = 5000 times. Obtain .
Step 4: Compute the mean square error (MSE) as
Step 5: Compute the percentage relative efficiency (PRE) as
The estimators’ PREs obtained from the above five steps are provided in Tab. 3.
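The simulation loop in Steps 1 to 5 can be sketched in Python as follows. The sketch assumes the usual definitions, namely MSE as the average squared deviation of the replicated estimates from the true value, and PRE as 100 times the ratio of the reference estimator's MSE to the candidate's MSE. For brevity it draws from a single stratum rather than a double stratified design. The function names and the Gamma parameters are hypothetical and are not the paper's values.

```python
import numpy as np

def mse(estimates, true_value):
    """Mean square error of replicated estimates around the true value."""
    e = np.asarray(estimates, dtype=float)
    return float(np.mean((e - true_value) ** 2))

def pre(mse_reference, mse_candidate):
    """Percentage relative efficiency of a candidate over a reference estimator."""
    return 100.0 * mse_reference / mse_candidate

def monte_carlo_pre(y, n, reference, candidate, target, reps=5000, seed=0):
    """Repeat SRSWOR draws of size n and compare two estimators of `target`."""
    rng = np.random.default_rng(seed)
    ref_est, cand_est = [], []
    for _ in range(reps):
        s = rng.choice(y, size=n, replace=False)   # Step 1: SRSWOR sample
        ref_est.append(reference(s))               # Step 2: compute both estimates
        cand_est.append(candidate(s))
    # Steps 4 and 5: MSE of each estimator, then their ratio as a percentage.
    return pre(mse(ref_est, target), mse(cand_est, target))

# Illustrative single-stratum population with a long right tail.
rng = np.random.default_rng(1)
y = rng.gamma(shape=2.0, scale=1.5, size=1000)
s2 = lambda s: np.var(s, ddof=1)                   # classical sample variance
value = monte_carlo_pre(y, 100, s2, s2, np.var(y, ddof=1), reps=200)
```

Passing the same estimator as both reference and candidate returns a PRE of 100, which is a convenient sanity check before plugging in a competing estimator. A PRE above 100 indicates that the candidate has a smaller MSE than the reference.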
3.2 Real Life Data
The apple fruit is one of the most common types of fruits. It is native to Central Asia, but today it grows worldwide with different colors and sizes. The apple fruit is rich in fiber, vitamins, and antioxidants and has many health benefits.
In the present article, we use the apple fruit data used by , where
Population-2: of apple trees in 1999, of apple production in 1999.
Population-3: of apple production in 1998, of apple production in 1999.
It should be noted that we consider 477 villages in four strata in 1999, termed (1: Marmara), (2: Aegean), (3: Mediterranean), and (4: Central Anatolia). The scatter plots for each stratum, which show the extreme values, are given in Figs. 5–12. The estimators’ PREs are computed as defined in Subsection 3.1 and are presented in Tabs. 4 and 5. The first-stage samples with sizes , , and are selected, and then from these samples the second-stage samples with sizes , , and are selected.
1: From Tab. 3, we can see that the results of Population-1 indicate that
, w.r.t. Vai
, w.r.t. Vbi.
The proposed estimators Va11 and Vb3 record the highest efficiency compared to other competitor estimators.
2: Meanwhile, the results of Population-2 in Tab. 4 indicate that
, w.r.t. Vai
, w.r.t. Vbi.
The proposed estimators Va11 and Vb5 record the highest efficiency compared to other competitor estimators.
3: The results of Population-3 (see Tab. 5) reveal that
, w.r.t. Vai
, w.r.t. Vbi.
Hence, the proposed estimators Va11 and Vb1 record the highest efficiency of all compared estimators.
4: Comparing the two proposed classes for each population leads us to the following findings:
5: Overall, all the members of the new classes have PRE > 100 with respect to To, and this clearly indicates that the performance of the proposed estimators is better than that of traditional estimators.
6: Furthermore, the proposed variance estimator Va11 is the best estimator among all proposed estimators, having PREs of 478.67, 28051.41, and 77307.88 for populations 1–3, respectively.
4 Conclusions
The difficulty of data analysis arises from the presence of extreme values that adversely affect variance estimation based on central moments. One way to address this issue is to use L-moments, which provide a robust statistical framework for analysis. Calibration estimation is a common statistical approach that uses auxiliary information to adjust the original design weights and improve the accuracy of estimators. Motivated by , we propose new classes of estimators of the population variance based on L-moments and the calibration approach under double stratified random sampling. The percentage relative efficiency is adopted to compare the performance of the proposed estimators over three populations, covering one simulated population and two real-life populations. Our numerical results show that, in all three populations, the proposed estimators are more efficient than the existing estimators.
Funding Statement: The authors thank the Deanship of Scientific Research at King Khalid University, Kingdom of Saudi Arabia for funding this study through the research groups program under Project Number R.G.P.1/64/42. Ishfaq Ahmad and Ibrahim Mufrah Almanjahie received the grant.
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.