L-Moments Based Calibrated Variance Estimators Using Double Stratified Sampling

Abstract: Variance is one of the most important measures of dispersion and is widely used in practice. A commonly used approach to variance estimation is the traditional method of moments, which is strongly influenced by the presence of extreme values, so its results cannot be relied on in that setting. Taking motivation from Koyuncu's recent work, the present paper first proposes two classes of variance estimators based on linear moments (L-moments), and then employs them with auxiliary data under double stratified sampling to introduce a new class of calibration variance estimators using important properties of L-moments (L-location, L-cv, L-variance). Three populations are taken into account to assess the efficiency of the new estimators. The first population is based on simulated data, and the second and third populations are based on real data. The percentage relative efficiency of the proposed estimators over existing ones is evaluated. In the presence of extreme values, our findings show that the proposed classes are more efficient than the traditional ones. Hence, when auxiliary data are available along with extreme values, the proposed classes of estimators may be implemented in a wide variety of sampling surveys.


Introduction
Planning is an integral part of the administrative process for the development of any field. Among the most important outputs of the planning process are the plans and programs that institutions seek to execute. One of the most important pillars of planning success is the availability of data and information that enables the decision-maker to conduct scientific analysis. In statistical literature, the additional information attached to each element is referred to as auxiliary (or ancillary, supplementary, supporting, concomitant) information. Whatever type of information is offered, it can be used to identify better sampling strategies. Auxiliary information has been used with sampling techniques for many years. The authors of [1,2] were pioneers in the usage of auxiliary information regarding the development of estimation techniques with high estimation accuracy. Recently, there have been many interesting works using auxiliary information in different ways [3][4][5][6][7][8][9][10][11][12].
In all sample surveys, the major concern is the derivation of point estimators for various parameters of interest. Nevertheless, it is equally important to evaluate the performance of these estimators. The importance of variance estimators lies primarily in the fact that the estimated variance of any estimator is a major component of its quality. Reference [13] pointed out that the importance of variance estimation lies in the fact that it offers an indicator of the quality of estimators: it can be used in calculating confidence intervals and drawing accurate conclusions, and it can provide indicators of data quality. The sampling design that underlies a sample survey is one of the most important factors determining both the sample size and the procedure needed to estimate the variances. More specifically, many components of a sample design are related to the estimation of variances, including the number of sampling stages. In single-stage sample designs, the situation is straightforward, and a closed-form formula can be derived for the estimation of variance. In designs with more than one stage, the situation becomes more complicated since there is more than one source of variance. At each stage, the sampling of units (primary, secondary, etc.) contributes an additional component of variance. In cases where all other components of sampling and estimation are rather simple, a closed-form formula can be obtained by calculating the variance at each stage. However, common practice is to approximate the variance by estimating the variation among the primary sampling units, since this is the dominant component of the overall variance. For example, with double or two-stage sampling, there are two sources of variance, namely the variation resulting from the selection of the primary sampling units and the variation resulting from the selection of the secondary sampling units (for more details, see [14]).
There are also many studies that have employed double sampling for real data [15][16][17][18]. In this paper, we consider double stratified random sampling. With stratified sampling, the population is split into non-overlapping subpopulations, known as strata, which are typically internally homogeneous, resulting in reduced overall variability. A random sample is chosen from each stratum, independently of the other strata. The sampling scheme used in one stratum may be the same as or different from that used in the others. Consider $X$ and $Y$ as the auxiliary and study variables associated with a finite population of size $N$ with units $\{\nu_1, \nu_2, \ldots, \nu_N\}$, which is stratified into $R$ strata, with the $h$th stratum including $N_h$ units, $h = 1, 2, \ldots, R$, and $\sum_{h=1}^{R} N_h = N$. For the first stage, a simple random sample of size $n_h^{*}$ is chosen without replacement from stratum $h$, such that $\sum_{h=1}^{R} n_h^{*} = n^{*}$. Then a sample of size $n_h$ ($n_h < n_h^{*}$) is selected for the second stage. For $h = 1, 2, \ldots, R$, $(x_{hi}, y_{hi})$ represents the observed values of $X$ and $Y$ with $i = 1, 2, \ldots, N_h$, and $(s_{xh}^{*2}, s_{xh}^{2})$ and $(s_{yh}^{*2}, s_{yh}^{2})$ represent the variances of $X$ and $Y$ for the first-stage and second-stage samples, respectively. Under this double stratified sampling design, the traditional variance estimator, denoted $T_o$ in what follows, is built from the second-stage sample variances $s_{yh}^{2}$ and the stratum weights $W_h = N_h / N$. It is worth noting that $s_{yh}^{2}$ is based on traditional moments and hence is highly affected by the presence of extreme values.
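The two-stage selection described above can be illustrated with a minimal Python sketch. It assumes that the second-stage sample is drawn as a subsample of the first-stage sample, which is the usual convention in double sampling but is not stated explicitly here; the function names and the toy strata are hypothetical.

```python
import random

def double_stratified_sample(strata, first_sizes, second_sizes, seed=None):
    """Draw a two-stage sample independently within each stratum.

    strata       : list of lists, one list of units per stratum
    first_sizes  : first-stage sample sizes (n*_h), one per stratum
    second_sizes : second-stage sample sizes (n_h), each smaller than n*_h
    """
    rng = random.Random(seed)
    first, second = [], []
    for units, n_star, n in zip(strata, first_sizes, second_sizes):
        if not n < n_star <= len(units):
            raise ValueError("need n_h < n*_h <= N_h in every stratum")
        stage1 = rng.sample(units, n_star)  # SRSWOR from the stratum
        stage2 = rng.sample(stage1, n)      # assumption: subsample of stage 1
        first.append(stage1)
        second.append(stage2)
    return first, second

def stratum_weights(strata):
    """Stratum weights W_h = N_h / N."""
    total = sum(len(units) for units in strata)
    return [len(units) / total for units in strata]

# Hypothetical population: three strata of sizes 50, 30, and 20.
strata = [[(h, i) for i in range(size)] for h, size in enumerate([50, 30, 20])]
first, second = double_stratified_sample(strata, [20, 12, 8], [10, 6, 4], seed=1)
weights = stratum_weights(strata)
```

Each stratum is sampled with its own random draws, so the selections are independent across strata, as the design requires.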
The analysis of sample data is complex. The complexity of the analysis increases when the data contains unusual points (outliers or extreme values) that affect the robustness of the variance estimation under traditional central moments. One of the solutions to tackle this issue is to use L-moments instead of traditional central moments. L-moments provide a robust statistical framework for the analysis. L-moments [19] are determined by linear combinations of the expected values of the order statistics (O.S.). Furthermore, calibration estimation is another common statistical approach that relies on the use of auxiliary information to adjust the original weights of the design and improve the accuracy of estimators. The authors of [20] were pioneers in the use of calibration estimation with survey data and several additional works on mean estimation have been published since (for example, see [21][22][23]).
In the present paper, our objective is to develop some new classes of variance estimators for a variable of interest, based on L-moments and the calibration approach under double stratified random sampling. The remainder of this article is organized as follows. In Section 2, the L-moments and proposed classes are presented in detail. Numerical illustrations of three populations are offered in Section 3 to evaluate the performance of the new estimators. Finally, Section 4 provides conclusions.

L-Moments and Proposed Classes
Reference [19] described L-moments as expectations of certain linear combinations of order statistics. L-moments can be specified for any random variable for which a mean exists. They are used to describe probability distributions and estimate parameters, and their estimates are used for summarizing and describing samples of observed data. L-moments have several advantages over traditional moments: they are linear functions of the data, they suffer less from the effects of sampling variability, they are more robust to outliers/extreme values in data, and they enable safer inferences to be made from small samples about an underlying population parameter. The population forms of the first four L-moments of the auxiliary variable $X$ in stratum $h$ follow the definitions of [19]. The second-stage sample L-moments of the auxiliary variable are written analogously, where $x_{h(d)}$ represents the $d$th order statistic and $\binom{\cdot}{\cdot}$ denotes the binomial coefficient. Similarly, the first-stage sample L-moments are written as $l^{*}_{1xl}$, $l^{*}_{2xl}$, $l^{*}_{3xl}$, and $l^{*}_{4xl}$. Furthermore, the L-moments of the study variable $Y$ are obtained by adapting the structure used for the auxiliary variable $X$.
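As an illustration of how sample L-moments are computed from order statistics, the sketch below uses the standard unbiased estimators based on probability-weighted moments. This is an assumed, generic formulation and not necessarily the exact expressions used in this paper; the function names are hypothetical.

```python
from math import comb

def sample_l_moments(data):
    """Return the first four sample L-moments (l1, l2, l3, l4).

    Uses the unbiased probability-weighted moments
        b_r = (1/n) * sum_j [C(j-1, r) / C(n-1, r)] * x_(j),  j = 1..n,
    where x_(j) is the j-th order statistic.
    """
    x = sorted(data)
    n = len(x)
    if n < 4:
        raise ValueError("at least four observations are needed")
    b = []
    for r in range(4):
        # i is the 0-based rank, so comb(i, r) equals C(j-1, r) with j = i + 1
        total = sum(comb(i, r) * x[i] for i in range(n))
        b.append(total / (n * comb(n - 1, r)))
    l1 = b[0]                                         # L-location
    l2 = 2 * b[1] - b[0]                              # L-scale
    l3 = 6 * b[2] - 6 * b[1] + b[0]
    l4 = 20 * b[3] - 30 * b[2] + 12 * b[1] - b[0]
    return l1, l2, l3, l4

def l_cv(data):
    """L-cv, the ratio of the second L-moment to the first."""
    l1, l2, _, _ = sample_l_moments(data)
    return l2 / l1

moments = sample_l_moments([1, 2, 3, 4, 5])
```

Because every observation enters linearly, a single extreme value changes these quantities far less than it changes a sample variance, in which deviations are squared.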

First Proposed Class of Estimators
The authors of [9,10] used robust regression and robust covariance matrix methodologies for improved estimation of the population mean. Their use of robust methods motivates us to utilize robust moments (L-moments) instead of traditional moments. Hence, taking motivation from [21], we propose a class of L-moments based calibration estimators of variance under double stratified sampling, given in Eq. (2). The calibrated weights $\gamma_h$ are selected to minimize a chi-square distance measure subject to the calibration constraints in Eqs. (4) and (5), which are expressed in terms of the first-stage and the second-stage L-location, L-cv, and L-variance of $X$. In the corresponding Lagrange function, $\mu_{11}$ and $\mu_{12}$ are the Lagrange multipliers. To obtain the optimum value of the calibration weight, we differentiate the Lagrange function with respect to $\gamma_h$ and set the derivative equal to zero, which gives the calibration weight in Eq. (7). The multipliers $\mu_{11}$ and $\mu_{12}$ are then obtained by replacing $\gamma_h$ in Eqs. (4) and (5) with its value from Eq. (7), which yields the calibration weight in Eq. (8). Substituting the value of $\gamma_h$ from Eq. (8) into Eq. (2) gives the proposed calibration estimator. The members of the first proposed class are provided in Tab. 1.
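The mechanics of chi-square calibration with two constraints can be sketched in Python as follows. This is a generic illustration under stated assumptions, not the paper's estimator: the tuning constants `q`, the stratum-level quantities `a` and `b`, and the targets are hypothetical stand-ins for whichever L-moment characteristics enter Eqs. (4) and (5).

```python
def calibrate_two_constraints(W, a, b, target_a, target_b, q=None):
    """Minimise sum((g_h - W_h)^2 / (q_h * W_h)) over g subject to
    sum(g_h * a_h) = target_a and sum(g_h * b_h) = target_b.

    Setting the derivative of the Lagrange function to zero gives
    g_h = W_h + q_h * W_h * (mu_a * a_h + mu_b * b_h); the two multipliers
    then solve a 2 x 2 linear system.
    """
    q = [1.0] * len(W) if q is None else q
    d = [w * qi for w, qi in zip(W, q)]
    s_aa = sum(di * ai * ai for di, ai in zip(d, a))
    s_ab = sum(di * ai * bi for di, ai, bi in zip(d, a, b))
    s_bb = sum(di * bi * bi for di, bi in zip(d, b))
    r_a = target_a - sum(w * ai for w, ai in zip(W, a))
    r_b = target_b - sum(w * bi for w, bi in zip(W, b))
    det = s_aa * s_bb - s_ab * s_ab
    if abs(det) < 1e-12:
        raise ValueError("constraints are collinear; multipliers are not unique")
    mu_a = (r_a * s_bb - r_b * s_ab) / det   # Cramer's rule
    mu_b = (r_b * s_aa - r_a * s_ab) / det
    return [w + di * (mu_a * ai + mu_b * bi)
            for w, di, ai, bi in zip(W, d, a, b)]

# Hypothetical inputs: stratum weights, second-stage characteristics (a, b),
# and targets built from first-stage characteristics.
W = [0.4, 0.3, 0.2, 0.1]
a = [10.0, 12.0, 15.0, 20.0]
b = [2.0, 2.5, 3.0, 5.0]
target_a = sum(w * v for w, v in zip(W, [10.5, 11.8, 15.4, 19.0]))
target_b = sum(w * v for w, v in zip(W, [2.1, 2.4, 3.2, 4.6]))
gamma = calibrate_two_constraints(W, a, b, target_a, target_b)
```

When the targets already equal the weighted second-stage totals, both multipliers are zero and the calibrated weights reduce to the original stratum weights.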

Second Proposed Class of Estimators
By extending the idea of $V_{ai}$, we propose a second class of variance estimators under double stratified sampling, given in Eq. (11). The calibrated weights again minimize the chi-square distance, now subject to the three calibration constraints in Eqs. (13)-(15). Taking the derivative of the Lagrange function $T$ with respect to $\gamma_h$ and setting it equal to zero gives the calibration weight in Eq. (16). Substituting Eq. (16) into Eqs. (13)-(15), respectively, yields a system of equations in the Lagrange multipliers. Solving this system for the $\mu$s and substituting the solutions into Eq. (16) and then into Eq. (11) gives the final form of the estimators. The members of the second proposed class are listed in Tab. 2.
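The derivation steps for the second class follow the usual pattern for chi-square calibration. As a generic sketch only, the three-constraint case can be written as below. The symbols $q_h$ (tuning constants), $z_{jh}$ (stratum-level calibration variables), and $t_j$ (calibration targets) are illustrative assumptions and not the paper's notation, and Eqs. (13)-(16) may differ from this sketch in scaling or sign conventions.

```latex
\begin{aligned}
&\min_{\gamma_h}\ \sum_{h=1}^{R} \frac{(\gamma_h - W_h)^2}{q_h W_h}
\quad \text{subject to} \quad \sum_{h=1}^{R} \gamma_h z_{jh} = t_j,
\qquad j = 1, 2, 3, \\[4pt]
&T = \sum_{h=1}^{R} \frac{(\gamma_h - W_h)^2}{q_h W_h}
  - 2 \sum_{j=1}^{3} \mu_j \Bigl( \sum_{h=1}^{R} \gamma_h z_{jh} - t_j \Bigr), \\[4pt]
&\frac{\partial T}{\partial \gamma_h} = 0
\ \Longrightarrow\
\gamma_h = W_h + q_h W_h \sum_{j=1}^{3} \mu_j z_{jh}, \\[4pt]
&\sum_{j=1}^{3} \mu_j \sum_{h=1}^{R} q_h W_h z_{jh} z_{kh}
  = t_k - \sum_{h=1}^{R} W_h z_{kh},
\qquad k = 1, 2, 3 .
\end{aligned}
```

The last line is a $3 \times 3$ linear system in $(\mu_1, \mu_2, \mu_3)$. Solving it and substituting back into the expression for $\gamma_h$ gives the calibrated weights in closed form.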

Numerical Illustrations
Here, we evaluate the performance of the proposed estimators through three populations.

Simulation Design (Population-1)
We consider a population of size $N = 1000$ divided into four strata. Under equal allocation, a sample of size $n_h = 100$ is selected from the $h$th stratum, giving a total sample size of $n = 400$. For stratum $h$, the random variables $X_h$ and $Y_h$ are generated from a model in which $X_h$, for $h = 1, 2, 3, 4$, follows a Gamma distribution with stratum-specific parameter values, $\varepsilon$ follows a standard normal distribution, and $\delta = 5$, $p = 1.6$, and $K = 2$.
Figs. 1-4 show the scatter plots for each stratum. These figures clearly demonstrate the existence of extreme values, so the simulated data are well suited for evaluating the proposed estimators. The simulation steps are as follows:
Step 1: Select a random sample of size $n_h$ through SRSWOR from stratum $h$.
Step 4: Compute the mean square error (MSE) of each estimator.
Step 5: Compute the percentage relative efficiency (PRE) of each estimator.
The PREs of the estimators obtained from the above five steps are provided in Tab. 3.
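The last two steps can be sketched in Python as follows. The sketch assumes the usual definitions, which are not reproduced in the text above: the MSE is the average squared deviation of the replicated estimates from the true population variance, and the PRE of an estimator is 100 times the ratio of the traditional estimator's MSE to its own MSE. The replicate values are made up purely for illustration.

```python
def mse(estimates, true_value):
    """Mean square error of replicated estimates around the true value."""
    return sum((e - true_value) ** 2 for e in estimates) / len(estimates)

def pre(mse_reference, mse_candidate):
    """Percentage relative efficiency of a candidate estimator relative to
    a reference estimator; values above 100 favour the candidate."""
    return 100.0 * mse_reference / mse_candidate

# Hypothetical replicated variance estimates (not results from the paper).
true_variance = 4.0
traditional_estimates = [2.0, 6.0, 8.0, 4.0]
proposed_estimates = [3.0, 5.0, 4.0, 4.0]

mse_traditional = mse(traditional_estimates, true_variance)
mse_proposed = mse(proposed_estimates, true_variance)
efficiency = pre(mse_traditional, mse_proposed)
```

Under these definitions, a PRE above 100 means that the candidate estimator has a smaller MSE than the reference estimator.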

Real Life Data
The apple fruit is one of the most common types of fruits. It is native to Central Asia, but today it grows worldwide with different colors and sizes. The apple fruit is rich in fiber, vitamins, and antioxidants and has many health benefits.
In the present article, we use the apple fruit data used by [24], with the following two populations:
Population-2: X = number of apple trees in 1999, Y = level of apple production in 1999.
Population-3: X = level of apple production in 1998, Y = level of apple production in 1999.
It should be noted that we consider 477 villages in four strata in 1999, termed (1: Marmarian),
The proposed estimators $V_{a11}$ and $V_{b3}$ record the highest efficiency among the competing estimators.
The proposed estimators $V_{a11}$ and $V_{b5}$ record the highest efficiency among the competing estimators.
Hence, the proposed estimators $V_{a11}$ and $V_{b1}$ record the highest efficiency of all compared estimators.
4: Comparing the two proposed classes for each population leads us to the following findings:
5: Overall, all the members of the new classes have PRE > 100 with respect to $T_o$, which clearly indicates that the proposed estimators perform better than the traditional estimators.
6: Furthermore, the proposed variance estimator $V_{a11}$ is the best estimator among all proposed estimators, having PREs of 478.67, 28051.41, and 77307.88 for populations 1-3, respectively.

Conclusion
The difficulty of data analysis arises from the presence of extreme values that adversely impact variance estimation based on central moments. One way to address this issue is to use L-moments, which provide a robust statistical structure for analysis. Calibration estimation is a common statistical approach that relies on the use of auxiliary information to adjust the original design weights and to improve the accuracy of estimators. Motivated by [21], we propose new classes of estimators of the population variance based on L-moments and the calibration approach under double stratified random sampling. The percentage relative efficiency is adopted to compare the performance of the proposed estimators across three populations, comprising a simulation study and an application to real-life data. Our numerical results show that, in all three populations considered, the proposed estimators are more efficient than the existing estimators.

Conflicts of Interest:
The authors declare that they have no conflicts of interest to report regarding the present study.