Computer Systems Science & Engineering
Generalized Class of Mean Estimators with Known Measures for Outliers Treatment
1Department of Mathematics, College of Science, King Khalid University, Abha, 62529, Saudi Arabia
2Statistical Research and Studies Support Unit, King Khalid University, Abha, 62529, Saudi Arabia
3Department of Mathematics, Faculty of Science, Al al-Bayt University, Mafraq, 25113, Jordan
4Department of Statistics, Michael Okpara University of Agriculture, Umudike, Abia, Nigeria
5Jammu and Kashmir Institute of Mathematical Sciences, Srinagar, 190008, India
*Corresponding Author: Amer Ibrahim Al-Omari. Email: email@example.com
Received: 14 December 2020; Accepted: 14 January 2021
Abstract: In estimation theory, researchers have devoted considerable effort to developing estimators of the population mean that give more precise results when the ordinary least squares (OLS) method or robust regression techniques are adopted for estimating regression coefficients. However, when the correlation is negative and outliers are present, the results can be distorted, and OLS-type estimators may give misleading or highly biased estimates. This paper addresses these issues through the use of non-conventional measures of dispersion and a robust estimation method. Specifically, we propose generalized estimators that use the ancillary information of non-conventional measures of dispersion (Gini’s mean difference, Downton’s method and probability-weighted moments) under ordinary least squares, and then adopt the Huber M-estimation technique for the suggested estimators. The proposed estimators are investigated in the presence of outliers under both negative and positive correlation between the study and auxiliary variables. Theoretical comparisons and real data applications are provided to show the strength of the proposed generalized estimators. It is found that the proposed generalized Huber-M-type estimators are more efficient than the suggested generalized estimators under the OLS estimation method considered in this study. The proposed estimators should therefore be useful for data analysis and decision making.
Keywords: Product estimators; ratio estimators; regression estimators; ordinary least squares; Huber M; mean squared error; efficiency
MSC: 62D05; 62G35
1 Introduction
Many techniques have been used to obtain efficient estimators in sampling theory. The most common is simple random sampling without replacement (SRSWOR), which is used to obtain an estimator of the population mean when auxiliary information is not available. When auxiliary information is available and is related to the study variable, it can be incorporated through several methods, viz., ratio, product, difference and regression estimation. Utilizing this auxiliary information increases the efficiency of estimation, and it has been used in a number of ways to obtain improved estimates of population parameters. Some recent uses of auxiliary information are provided in [1–4]. Data collected from different fields, which form the basis for statistical inference, are often not symmetrical and may contain outliers. Outliers can distort results since the classical methods are sensitive to them . However, , and [7–9] have recommended different estimators that adopt different robust regression techniques when the correlation is positive. For more details on robust regression methods for estimating the mean of sensitive variables using auxiliary information, see [10–12]. In this study, we focus on a more generalized form of estimators when outliers are present. To deal with that situation, we first propose generalized estimators that utilize the auxiliary information of non-conventional measures of dispersion under OLS. We then adopt Huber M-estimation instead of ordinary least squares for the proposed generalized estimators, so that the findings remain valid in the presence of outliers and the resulting inference is useful for future analysis or application.
Hence, the contribution of the present paper is its use of the robust (Huber M) estimation method together with non-conventional measures of dispersion, which can curb the influence of outliers in the estimation of the population mean.
The rest of the paper is organized as follows. Section 2 presents the generalized estimators under the OLS method for the case in which outliers are present and the correlation is negative, together with expressions for the bias and the mean squared error (MSE) derived up to the second degree of approximation. The generalized estimators based on Huber M-estimation instead of OLS, along with their bias and MSE expressions, are proposed in Section 3. Efficiency comparisons between the proposed and existing estimators are considered in Section 4. The results of the numerical examples are reported in Section 5. Section 6 is devoted to the discussion, and the paper is concluded in the last section.
2 Proposed Generalized Estimators Using OLS
Let $U$ be a finite population of $M$ units. Let $y$ and $x$ be the response and ancillary variables, respectively. Let a sample of size $m$ ($m < M$) be drawn using SRSWOR to estimate the population mean $\bar{Y}$. Based on the $m$ observations, let $\bar{y}$ and $\bar{x}$ be the sample means, which are unbiased estimators of the population means $\bar{Y}$ and $\bar{X}$. The usual ratio and product estimators for $\bar{Y}$ are, respectively, $\bar{y}_R = \bar{y}\,\bar{X}/\bar{x}$ and $\bar{y}_P = \bar{y}\,\bar{x}/\bar{X}$. When $\rho C_y / C_x > 1/2$, the ratio estimator is efficient, and when $\rho C_y / C_x < -1/2$, the product method is efficient. Here, $C_y$, $C_x$ and $\rho$ are the coefficients of variation of $y$ and $x$ and the correlation coefficient between $y$ and $x$, respectively. Hence,
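$C_y = S_y/\bar{Y}$, $C_x = S_x/\bar{X}$, $\rho = S_{xy}/(S_x S_y)$, $S_y^2 = \frac{1}{M-1}\sum_{i=1}^{M}(y_i - \bar{Y})^2$, $S_x^2 = \frac{1}{M-1}\sum_{i=1}^{M}(x_i - \bar{X})^2$, $S_{xy} = \frac{1}{M-1}\sum_{i=1}^{M}(x_i - \bar{X})(y_i - \bar{Y})$.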
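As a minimal sketch of the classical ratio and product estimators described above, the Python snippet below draws an SRSWOR sample and computes the ratio estimate $\bar{y}\bar{X}/\bar{x}$, the product estimate $\bar{y}\bar{x}/\bar{X}$, and the quantity $\rho C_y / C_x$, whose well-known first-order thresholds ($> 1/2$ for the ratio form, $< -1/2$ for the product form) indicate which estimator improves on the plain sample mean. The function names, the toy population and the random seed are illustrative assumptions and are not taken from the study.

```python
import numpy as np

def ratio_estimate(y_s, x_s, X_bar):
    """Classical ratio estimator: ybar * Xbar / xbar."""
    return np.mean(y_s) * X_bar / np.mean(x_s)

def product_estimate(y_s, x_s, X_bar):
    """Classical product estimator: ybar * xbar / Xbar."""
    return np.mean(y_s) * np.mean(x_s) / X_bar

def efficiency_index(y, x):
    """rho * C_y / C_x computed from the full population values."""
    c_y = np.std(y, ddof=1) / np.mean(y)   # coefficient of variation of y
    c_x = np.std(x, ddof=1) / np.mean(x)   # coefficient of variation of x
    rho = np.corrcoef(x, y)[0, 1]          # correlation between x and y
    return rho * c_y / c_x

# Hypothetical toy population (M = 10) in which y is exactly proportional to x.
x_pop = np.arange(1.0, 11.0)
y_pop = 2.0 * x_pop

# SRSWOR sample of size m = 4 (seed chosen only for reproducibility).
rng = np.random.default_rng(0)
idx = rng.choice(x_pop.size, size=4, replace=False)

k = efficiency_index(y_pop, x_pop)
est = ratio_estimate(y_pop[idx], x_pop[idx], x_pop.mean())
```

Because $y$ is exactly proportional to $x$ in this toy population, the ratio estimate reproduces the population mean for any sample, which makes the example easy to check by hand.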
Reference  proposed ratio estimators of the mean based on the simple random sampling (SRS) method as and , where $\bar{y}$ and $\bar{x}$ are the sample means of the variable of interest and the auxiliary variable, and $Q_1$ and $Q_3$ represent the first and third quartiles, respectively, of the auxiliary variable. Also,  introduced ratio estimators of the population mean using extreme ranked set sampling. Later,  investigated some ratio estimators of the population mean using auxiliary information based on simple random sampling and median ranked set sampling. Reference  investigated some ratio estimators of the population mean with missing values using ranked set sampling. The dual to ratio estimator was first introduced by Srivenkataramana , the dual to ratio product estimator was discussed by Bandyopadhyay , and ratio-cum-product estimators are due to  and . Work on ratio, dual to ratio and dual to product estimators for estimating the population mean using the OLS method is due to  and . Reference  used dual auxiliary information to develop a new optimal estimator. For another method that uses statistical tests to construct an estimator of the finite population mean, see .
Reference  and [26–32] ultimately suggested generalized estimators using ancillary information for estimating population parameters such as the mean under SRSWOR. Motivated by their work, our proposed estimators are given as
where , is a reasonably selected constant, is an unknown constant and is also a suitably chosen constant, where , , and , the Gini’s mean difference, Downton’s method and probability-weighted moments, respectively, or their functions. It is assumed that the population mean of the auxiliary variable is known. The is obtained by the OLS method. To determine the bias and MSE of the proposed generalized estimators using OLS, whose class members are given in Tab. 1, we let
Eqs. (1) and (2) can be transformed as
Using Taylor expansion of order 2 of for Eq. (3) we have
Therefore, the bias of the estimator is
The MSE of the proposed estimator in (1) can be obtained by using the Taylor series approximation as:
3 Proposed Class of Estimators Using Huber M-Estimation
The main aim of the present study is to propose a generalized class of ratio and product estimators that is suitable for data containing outliers. To deal with this situation, we apply the Huber M-estimation technique to the generalized class of estimators displayed in (1), so as to obtain valid results when estimating parameters in that situation, i.e.,
By adopting the Huber M-estimates, the negative effect of outliers is reduced and valid results are obtained; hence, valid inferences can be drawn from the results. The Huber M-estimator uses a function $\rho(e)$ that is a compromise between $e^2$ and $|e|$, where $e$ is the error term in the regression model $y = a + bx + e$, $a$ being the constant of the model. The function $\rho(e)$ has the form
where is a tuning constant that controls the robustness of the estimator, and the value of the regression coefficient is obtained by minimizing
with respect to and . To determine the bias and MSE of the developed generalized estimators using Huber M-estimation, we use Eq. (2) and transform it into Eq. (6) to obtain
Then, using the Taylor expansion of order 2 of in (7) we determine
Hence, the bias of the estimator is
and the MSE of (7) can be obtained based on the Taylor series approximation as
Substituting different values of , , and yields particular members of this family of estimators. Moreover, using the robust (non-parametric) measure of the regression coefficient together with the different non-conventional measures of dispersion helps to produce estimators that are less affected by outliers. These estimators, listed in Tab. 2, may be used when a data set contains outliers.
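The two outlier-resistant ingredients mentioned above, a Huber M-estimate of the regression coefficient and the non-conventional dispersion measures of the auxiliary variable, can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the authors' computation: the Huber fit uses iteratively reweighted least squares with a normalised-MAD residual scale and a tuning constant of 1.345 (a common default), the dispersion measures use their usual order-statistic forms, and the function names and toy data are hypothetical.

```python
import numpy as np

def huber_fit(x, y, v=1.345, max_iter=200, tol=1e-10):
    """Fit y = a + b*x by Huber M-estimation using IRLS; returns array([a, b]).

    v is the tuning constant applied to scaled residuals (assumed default).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]        # OLS starting values
    for _ in range(max_iter):
        r = y - X @ beta
        # Robust residual scale: normalised MAD (an assumed, common choice).
        s = np.median(np.abs(r - np.median(r))) / 0.6745
        if s == 0.0:
            break
        u = np.abs(r) / s
        w = v / np.maximum(u, v)                       # weight 1 if u <= v, else v/u
        sw = np.sqrt(w)
        new = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(new - beta)) < tol
        beta = new
        if converged:
            break
    return beta

def dispersion_measures(x):
    """Gini's mean difference, Downton's method and probability-weighted moments.

    With t = sum_i (2i - n - 1) * x_(i) over the ordered values:
    G = 2t / (n(n-1)),  D = sqrt(pi) t / (n(n-1)),  S_pw = sqrt(pi) t / n^2.
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    t = np.sum((2.0 * np.arange(1, n + 1) - n - 1.0) * x)
    gini = 2.0 * t / (n * (n - 1))
    downton = np.sqrt(np.pi) * t / (n * (n - 1))
    pwm = np.sqrt(np.pi) * t / n**2
    return gini, downton, pwm

# Hypothetical data: negative linear trend plus one gross outlier.
x = np.arange(1.0, 21.0)
y = 10.0 - 0.5 * x + 0.1 * np.sin(x)
y[-1] += 15.0

b_ols = np.polyfit(x, y, 1)[0]      # OLS slope
b_huber = huber_fit(x, y)[1]        # Huber M slope
```

On this toy data the single outlier pulls the OLS slope away from the underlying value of -0.5, while the Huber slope stays close to it. Either slope, together with a chosen dispersion measure of the auxiliary variable, could then be substituted into an estimator of the class.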
4 Comparison of Efficiencies
The efficiencies of the generalized estimators using ancillary information under OLS are compared with those of the generalized estimators using the same ancillary information under Huber M-estimation. For to be more efficient than , we have
Since, , either and . This implies that
When the conditions given in (14) or (15) are satisfied, the proposed class of estimators based on Huber M-estimation is more efficient than the generalized estimators based on OLS.
5 Application and Numerical Illustration
In this section, we consider three real data populations, whose descriptive statistics are summarized in Tab. 3. The first population (Pop.) is taken from . The second population is taken from the book “Advanced Sampling Theory with Applications” by Singh , p. 147, Example (22.214.171.124). These data come from a small town in the USA in which a psychologist wants to estimate the average sleep duration (in minutes) during the night for people aged 50 years and over. There are 30 people aged 50 and over living in the town. Rather than asking everyone, the psychologist selects an SRSWOR sample of six people from this age group and records the data. The third population is taken from Myers , in which the study concerns transistor gain between emitter and collector in an integrated circuit device (hFC), where emitter drive-in time (in minutes) is denoted by $x$ and gain, or hFC, is denoted by $y$.
We applied different members of the class of estimators to these data under both proposed methods, OLS and Huber M-estimation, using the same auxiliary information. The bias, mean squared error and percent relative efficiency (PRE) of some product-type estimators for populations 1, 2 and 3 are given in Tabs. 4–6, respectively. Tabs. 7–9 present the bias, MSE and PRE of some ratio-type estimators for populations 1, 2 and 3, respectively.
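For readers who wish to reproduce the PRE column of such tables, the usual definition is the ratio of a reference estimator's MSE to the MSE of the estimator of interest, multiplied by 100. The helper below is a small sketch of that definition. The text does not state which estimator serves as the reference, so the reference MSE is left as an input, and the numbers in the example are hypothetical rather than values from the tables.

```python
def percent_relative_efficiency(mse_reference, mse_estimator):
    """PRE = 100 * MSE(reference) / MSE(estimator); above 100 favours the estimator."""
    if mse_reference <= 0 or mse_estimator <= 0:
        raise ValueError("MSE values must be positive")
    return 100.0 * mse_reference / mse_estimator

# Hypothetical MSE values, for illustration only.
pre_example = percent_relative_efficiency(4.0, 2.5)
```

A PRE above 100 means the estimator of interest has a smaller MSE than the reference, which is the sense in which the Huber-M-type estimators are reported as more efficient.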
6 Discussion
From Tabs. 1 and 2, it can be seen that the generalized classes can deliver various kinds of product and ratio estimators utilizing different auxiliary information under the OLS and Huber-M methods, respectively. Tabs. 4–6 give a numerical illustration of the efficiency of certain members of these generalized classes of estimators. From these tables, it is found that, when the same auxiliary information is used with the product method of estimation, Huber-M-type (robust) estimators provide more efficient results than OLS-type estimators when outliers are present in the data. It is also observed that the Huber-M product regression estimator has the smallest MSE in all the populations under consideration. This is seconded by . Similarly, from Tabs. 7–9, it is found that, when the same auxiliary information is used with the ratio method of estimation, Huber-M-type (robust) estimators still provide more efficient results than OLS-type estimators in the presence of outliers. It is also observed that the Huber-M ratio regression estimator has the smallest MSE in all the populations under investigation. The present study thus shows that the Huber-M-type classes of estimators have higher efficiencies than the OLS-type estimators, mainly where outliers influence the data. One can also generate different ratio and product estimators from the generalized class by substituting different parameters of the auxiliary variable when outliers exist in the data.
7 Conclusion
Based on the above discussion and numerical study, we conclude that adopting Huber M-estimation instead of OLS yields superior precision, especially when outliers are present (see Tabs. 7–9). The main feature of the Huber M-estimation method is that it provides an estimator that is easy to compute in practice and gives more efficient results. Moreover, the proposed estimators should be useful in future studies for data analysis and decision making, since valid inferences can be drawn from accurate results; they therefore provide better alternative estimators in practical situations. The proposed generalized estimators can be modified using different robust regression techniques  under different sampling techniques such as , systematic, two-phase, and may be based on ranked set sampling methods [39–45].
Acknowledgement: The authors would like to thank the editor in chief and the referees for their valuable suggestions, which helped shape the final version of the manuscript.
Funding Statement: The authors extend their appreciation to the Deanship of Scientific Research at King Khalid University for funding this work through the Research Groups Program under grant number R.G.P. 2/82/42, received by I.M.A. (www.kku.edu.sa).
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.