Topp-Leone odd Fréchet generated family of distributions with applications to COVID-19 data sets

Recent studies have pointed out the potential of the odd Fréchet family (or class) of continuous distributions in fitting data of all kinds. In this article, we propose an extension of this family through the so-called “Topp-Leone strategy”, aiming to improve its overall flexibility by adding a shape parameter. The main objective is to offer original distributions with modifiable properties, from which adaptive and pliant statistical models can be derived. For the new family, these aspects are illustrated by means of comprehensive mathematical and numerical results. In particular, we emphasize a special distribution with three parameters based on the exponential distribution. The related model is shown to be well suited to fitting various lifetime data, more or less heterogeneous. Among all the possible applications, we consider two data sets of current interest, linked to the COVID-19 pandemic. They concern daily cases confirmed and recovered in Pakistan from March 24 to April 28, 2020. As a result of our analyses, the proposed model has the best fitting results in comparison to serious challengers, including the former odd Fréchet model. © 2020 Tech Science Press. All rights reserved.


Introduction
The ability of a statistical model to fit data depends on the flexibility of some characteristics of the related probability distribution (overall asymmetry, pliancy of the modes, heaviness of the tails…). In order to draw solid conclusions from a data analysis, efficient models are required. This has motivated the development of versatile probability distributions based on various mathematical techniques. A myriad of families (or classes) of continuous probability distributions have been proposed in this regard. A complete overview of this subject can be found in [1]. Recent developments include families based on a cumulative distribution function (CDF) having the following exponential form:
$$F(x;\Pi,\xi)=\exp\left\{-T\left(G(x;\xi);\Pi\right)\right\},\quad x\in\mathbb{R}, \qquad (1)$$
where T(y; Π) is a function defined on (0,1) which extends or modifies the following "odd transformation": odd(y) = y/(1 − y), such that (i) it is decreasing with respect to y, (ii) $\lim_{y\to 0}T(y;\Pi)=+\infty$ and (iii) $\lim_{y\to 1}T(y;\Pi)=0$; Π represents a possible generic set (or vector) of parameters, and G(x; ξ) denotes the CDF of a parent continuous distribution, with ξ as possible set of parameters.
With appropriate choices for the parent distributions, these "post year 2018" families offer distributions enjoying versatile shapes (symmetric, left and right skewed…), kurtosis (lepto or meso or platy-kurtic) and tails (light or heavy tails…). Thanks to the tractable exponential expression in Eq. (1), most of them are of relative simplicity in their definitions, as well as their mathematical and statistical treatments. Hence, it is reasonable to investigate some simple extensions of them, maintaining the fragile balance between the overall gain and complexity. As an immediate idea, such simple extensions can be based on a proper transformation of the CDF in Eq. (1). We can think of simple transformations involving new parameters aimed at improving certain adjustment functionalities or at making the choice of parent distribution less delicate, opening up new modeling perspectives.
In this study, we introduce a simple extension of the odd Fréchet-generated (OFr-G) family pioneered by [2], i.e., defined by the CDF given as $F(x;\theta,\xi)=\exp\{-[\mathrm{odd}(G(x;\xi))]^{-\theta}\}$, $x\in\mathbb{R}$. Our extension is based on the so-called "Topp-Leone strategy". This strategy consists in transforming F(x; θ, ξ) by composition with the CDF of the Topp-Leone distribution specified by $F(x;\alpha)=[1-(1-x)^{2}]^{\alpha}=x^{\alpha}(2-x)^{\alpha}$, $x\in(0,1)$, with α > 0. This yields the CDF expressed by
$$F(x;\alpha,\theta,\xi)=\left\{1-\left[1-\exp\left\{-[\mathrm{odd}(G(x;\xi))]^{-\theta}\right\}\right]^{2}\right\}^{\alpha},\quad x\in\mathbb{R}. \qquad (2)$$
The name of the related family is chosen as "Topp-Leone odd Fréchet-generated (TLOFr-G) family". The basic idea of the Topp-Leone strategy is to make a given CDF more flexible by introducing a shape parameter α, while keeping the following uniform ordering: $2^{\alpha}x^{\alpha}\ge F(x;\alpha)\ge x^{\alpha}$, $x\in(0,1)$. Thus, it provides a simple one-parameter transformation as an alternative to the power transformation. Another motivation comes from its numerous successes in the existing literature, which highlight the importance of the parameter α for improving the fitting capabilities of standard models. In this regard, we can mention the Topp-Leone-generated (TL-G) family pioneered by [8], the power Topp-Leone-generated (PTL-G) family by [9] and the new power Topp-Leone-generated (NPTL-G) family by [10]. The Topp-Leone strategy is also used in a less direct way to construct the generalized Topp-Leone-generated (GTL-G) family by [11], the type II Topp-Leone-generated (TIITL-G) family by [12], the type II generalized Topp-Leone-generated (TIIGTL-G) family by [13], and the type II power Topp-Leone-generated (TIIPTL-G) family by [14].
However, to our knowledge, the use of this strategy remains new in the context of the OFr-G family, and the potential for new applications motivates this study. In this perspective, the main probabilistic and statistical characteristics of the TLOFr-G family are described, with an emphasis on the modeling framework. We provide diverse theoretical, graphical and numerical evidence of the importance of the new parameter α in this regard. On the applied side, a special focus is put on a specific TLOFr-G model, constructed with the exponential distribution as parent. It is selected for its simplicity and its overall elasticity, allowing the fitting of positive right-skewed data of various kinds. In our study, as a modern contribution, we illustrate this aspect by considering two different right-skewed data sets coming from observations of the COVID-19 pandemic in Pakistan. Quite favorable results are obtained; the new model proves to fit better than five competing models from the literature. In particular, it is better than the OFr-G model defined with the same parent model, emphasizing the advantage of using the Topp-Leone strategy. This supports the use of the considered TLOFr-G model for further analysis of the COVID-19 pandemic in other countries, among other applications of interest.
The rest of the study is organized as follows. Section 2 is devoted to the basic definition of the new family. Some mathematical facts are derived in Section 3. Inference on the model is discussed in Section 4. Applications to the COVID-19 data sets are developed in Section 5. Section 6 concludes the paper with some remarks.

Functions of the TLOFr-G Family
A first approach of the TLOFr-G family is possible by defining some functions, which is the aim of this section.

Main Probability Functions
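As described in the introduction, the TLOFr-G family is defined by the CDF in Eq. (2). In expanded form, it can also be expressed as
$$F(x;\alpha,\theta,\xi)=\exp\left\{-\alpha\left[\frac{1-G(x;\xi)}{G(x;\xi)}\right]^{\theta}\right\}\left(2-\exp\left\{-\left[\frac{1-G(x;\xi)}{G(x;\xi)}\right]^{\theta}\right\}\right)^{\alpha},\quad x\in\mathbb{R}. \qquad (3)$$
Here, α and θ are both positive shape parameters. We recall that G(x; ξ) represents the CDF of a parent continuous distribution, with ξ as possible set of parameters. A random variable X having this CDF will be denoted by X ∼ TLOFr-G.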
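As a second essential function, the probability density function (PDF) follows by differentiation of F(x; α, θ, ξ) with respect to x; it is given as
$$f(x;\alpha,\theta,\xi)=2\alpha\theta\,g(x;\xi)\,\frac{[1-G(x;\xi)]^{\theta-1}}{G(x;\xi)^{\theta+1}}\exp\left\{-\alpha\left[\frac{1-G(x;\xi)}{G(x;\xi)}\right]^{\theta}\right\}\left(1-\exp\left\{-\left[\frac{1-G(x;\xi)}{G(x;\xi)}\right]^{\theta}\right\}\right)\left(2-\exp\left\{-\left[\frac{1-G(x;\xi)}{G(x;\xi)}\right]^{\theta}\right\}\right)^{\alpha-1},\quad x\in\mathbb{R}, \qquad (4)$$
where g(x; ξ) denotes the PDF corresponding to the parent CDF G(x; ξ). Note that the power parameter α − 1 is negative if α ∈ (0,1), but this causes no technical problem in the definition; this case is allowed throughout the study.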
From the PDF f(x; α, θ, ξ), the TLOFr-G family is characterized by the following property: for X ∼ TLOFr-G and any set S ⊆ ℝ, the probability that X belongs to S is
$$P(X\in S)=\int_{S}f(x;\alpha,\theta,\xi)\,dx.$$
In particular, one has F(x; α, θ, ξ) = P(X ∈ (−∞, x]).
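The CDF and PDF above can be evaluated for any parent distribution once G and g are available as functions. The following is a minimal sketch under stated assumptions: the function names are hypothetical, and the standard logistic parent is chosen only for illustration (it is not a parent considered in the study).

```python
import math

def tlofr_cdf(x, alpha, theta, G):
    """TLOFr-G CDF in expanded form, for a parent CDF G (a callable)."""
    p = G(x)
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    u = ((1.0 - p) / p) ** theta        # [odd(G(x))]^(-theta)
    w = math.exp(-u)                    # value of the OFr-G CDF
    return (w * (2.0 - w)) ** alpha     # exp(-alpha*u) * (2 - exp(-u))^alpha

def tlofr_pdf(x, alpha, theta, G, g):
    """TLOFr-G PDF, i.e. the derivative of tlofr_cdf with respect to x."""
    p = G(x)
    if p <= 0.0 or p >= 1.0:
        return 0.0
    u = ((1.0 - p) / p) ** theta
    w = math.exp(-u)
    return (2.0 * alpha * theta * g(x)
            * (1.0 - p) ** (theta - 1.0) * p ** (-theta - 1.0)
            * w ** alpha * (1.0 - w) * (2.0 - w) ** (alpha - 1.0))

# Hypothetical parent used only for illustration: the standard logistic law.
def logistic_cdf(x):
    return 1.0 / (1.0 + math.exp(-x))

def logistic_pdf(x):
    p = logistic_cdf(x)
    return p * (1.0 - p)

print(tlofr_cdf(0.5, 2.0, 1.5, logistic_cdf))
print(tlofr_pdf(0.5, 2.0, 1.5, logistic_cdf, logistic_pdf))
```

This sketch does not guard against overflow when G(x) is extremely close to 0; a production version would work on the log scale there.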

Hazard and Reliability Functions
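We now express some hazard and reliability functions which are involved in many concepts in probability and statistics. The overall theory and general definitions can be found in [15]. The survival function and hazard rate function (HRF) of the TLOFr-G family are given by
$$\bar{F}(x;\alpha,\theta,\xi)=1-\exp\left\{-\alpha\left[\frac{1-G(x;\xi)}{G(x;\xi)}\right]^{\theta}\right\}\left(2-\exp\left\{-\left[\frac{1-G(x;\xi)}{G(x;\xi)}\right]^{\theta}\right\}\right)^{\alpha}$$
and
$$h(x;\alpha,\theta,\xi)=\frac{2\alpha\theta\,g(x;\xi)\,[1-G(x;\xi)]^{\theta-1}G(x;\xi)^{-\theta-1}\exp\left\{-\alpha\left[\frac{1-G(x;\xi)}{G(x;\xi)}\right]^{\theta}\right\}\left(1-\exp\left\{-\left[\frac{1-G(x;\xi)}{G(x;\xi)}\right]^{\theta}\right\}\right)\left(2-\exp\left\{-\left[\frac{1-G(x;\xi)}{G(x;\xi)}\right]^{\theta}\right\}\right)^{\alpha-1}}{1-\exp\left\{-\alpha\left[\frac{1-G(x;\xi)}{G(x;\xi)}\right]^{\theta}\right\}\left(2-\exp\left\{-\left[\frac{1-G(x;\xi)}{G(x;\xi)}\right]^{\theta}\right\}\right)^{\alpha}},$$
respectively. Also, the reversed hazard rate and cumulative hazard rate functions of the TLOFr-G family are given by
$$r(x;\alpha,\theta,\xi)=2\alpha\theta\,g(x;\xi)\,\frac{[1-G(x;\xi)]^{\theta-1}}{G(x;\xi)^{\theta+1}}\cdot\frac{1-\exp\left\{-\left[\frac{1-G(x;\xi)}{G(x;\xi)}\right]^{\theta}\right\}}{2-\exp\left\{-\left[\frac{1-G(x;\xi)}{G(x;\xi)}\right]^{\theta}\right\}}$$
and
$$H(x;\alpha,\theta,\xi)=-\log\left[1-\exp\left\{-\alpha\left[\frac{1-G(x;\xi)}{G(x;\xi)}\right]^{\theta}\right\}\left(2-\exp\left\{-\left[\frac{1-G(x;\xi)}{G(x;\xi)}\right]^{\theta}\right\}\right)^{\alpha}\right],$$
respectively.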
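Hereafter, a random variable X following the TLOFr-Ex distribution, i.e., the member of the TLOFr-G family defined with the exponential parent CDF $G(x;\lambda)=1-e^{-\lambda x}$, x > 0, λ > 0, will be denoted by X ∼ TLOFr-Ex. From Eq. (3), its CDF is
$$F(x;\alpha,\theta,\lambda)=\exp\left\{-\alpha(e^{\lambda x}-1)^{-\theta}\right\}\left(2-\exp\left\{-(e^{\lambda x}-1)^{-\theta}\right\}\right)^{\alpha},\quad x>0, \qquad (7)$$
and F(x; α, θ, λ) = 0 for x ≤ 0. The PDF corresponding to F(x; α, θ, λ) is given as
$$f(x;\alpha,\theta,\lambda)=2\alpha\theta\lambda\,e^{-\theta\lambda x}(1-e^{-\lambda x})^{-\theta-1}\exp\left\{-\alpha(e^{\lambda x}-1)^{-\theta}\right\}\left(1-\exp\left\{-(e^{\lambda x}-1)^{-\theta}\right\}\right)\left(2-\exp\left\{-(e^{\lambda x}-1)^{-\theta}\right\}\right)^{\alpha-1},\quad x>0, \qquad (8)$$
and f(x; α, θ, λ) = 0 for x ≤ 0.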
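We now present the corresponding HRF, given its importance in the modelling of lifetime phenomena. Formally, it is given as
$$h(x;\alpha,\theta,\lambda)=\frac{2\alpha\theta\lambda\,e^{-\theta\lambda x}(1-e^{-\lambda x})^{-\theta-1}\exp\left\{-\alpha(e^{\lambda x}-1)^{-\theta}\right\}\left(1-\exp\left\{-(e^{\lambda x}-1)^{-\theta}\right\}\right)\left(2-\exp\left\{-(e^{\lambda x}-1)^{-\theta}\right\}\right)^{\alpha-1}}{1-\exp\left\{-\alpha(e^{\lambda x}-1)^{-\theta}\right\}\left(2-\exp\left\{-(e^{\lambda x}-1)^{-\theta}\right\}\right)^{\alpha}},\quad x>0, \qquad (9)$$
and h(x; α, θ, λ) = 0 for x ≤ 0. In particular, from Fig. 1a, we see that the PDF of the TLOFr-Ex distribution is unimodal, possibly decreasing (reversed-J shape) or bell-shaped, and mainly right-skewed. Also, Fig. 1b indicates that the corresponding HRF is very flexible; a wide range of shapes is observed, such as increasing, decreasing, U, upside-down U and constant shapes. These characteristics are important for modelling versatile random phenomena from observed data.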
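As a complement, the three TLOFr-Ex functions can be coded directly from their closed forms. The sketch below is illustrative only: the function names and the parameter values (α, θ, λ) = (1.5, 0.8, 1) are assumptions, not settings taken from the study.

```python
import math

def tlofr_ex_cdf(x, alpha, theta, lam):
    """TLOFr-Ex CDF; equal to 0 for x <= 0."""
    if x <= 0.0:
        return 0.0
    u = math.expm1(lam * x) ** (-theta)   # (e^{lam*x} - 1)^(-theta)
    w = math.exp(-u)
    return (w * (2.0 - w)) ** alpha

def tlofr_ex_pdf(x, alpha, theta, lam):
    """TLOFr-Ex PDF; equal to 0 for x <= 0."""
    if x <= 0.0:
        return 0.0
    u = math.expm1(lam * x) ** (-theta)
    w = math.exp(-u)
    return (2.0 * alpha * theta * lam * math.exp(-theta * lam * x)
            * (-math.expm1(-lam * x)) ** (-theta - 1.0)
            * w ** alpha * (-math.expm1(-u)) * (2.0 - w) ** (alpha - 1.0))

def tlofr_ex_hrf(x, alpha, theta, lam):
    """HRF as PDF / survival; unreliable once 1 - CDF underflows to 0."""
    return tlofr_ex_pdf(x, alpha, theta, lam) / (1.0 - tlofr_ex_cdf(x, alpha, theta, lam))

# Illustrative parameter values (assumed, not from the paper's tables).
for x in (0.5, 1.0, 2.0, 5.0, 10.0):
    print(x, tlofr_ex_hrf(x, 1.5, 0.8, 1.0))
```

For large x the printed values settle near 2θλ, which gives a quick sanity check of the implementation.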

Properties
This section discusses the following properties of the TLOFr-G family: first-order stochastic dominance, asymptotic properties, quantile function, expansions of the CDF and PDF, crude moments, order statistics, and multidimensional extensions.

First-Order Stochastic Dominance
Some stochastic relations between the TLOFr-G family and other known families exist. In particular, the following first-order stochastic dominance result holds:
$$F^{*}(x;\alpha,\theta,\xi)\le F(x;\alpha,\theta,\xi),\quad x\in\mathbb{R},$$
where F*(x; α, θ, ξ) is the exponentiated CDF of the OFr-G family with power parameter α, i.e., $F^{*}(x;\alpha,\theta,\xi)=\exp\left\{-\alpha\left[\frac{1-G(x;\xi)}{G(x;\xi)}\right]^{\theta}\right\}$. This means that the TLOFr-G and OFr-G families can produce different kinds of models, with a certain hierarchical order.

Asymptotic Properties with Application
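Some asymptotic properties of the CDF, PDF and HRF of the TLOFr-G family are described below, depending on whether G(x; ξ) tends to 0 or 1. In particular, this aims to clarify the role of the parameters in the limit properties. When G(x; ξ) tends to 0, we immediately get
$$F(x;\alpha,\theta,\xi)\sim 2^{\alpha}\exp\left\{-\alpha\left[\frac{1-G(x;\xi)}{G(x;\xi)}\right]^{\theta}\right\},\quad f(x;\alpha,\theta,\xi)\sim h(x;\alpha,\theta,\xi)\sim 2^{\alpha}\alpha\theta\,g(x;\xi)\,G(x;\xi)^{-\theta-1}\exp\left\{-\alpha\left[\frac{1-G(x;\xi)}{G(x;\xi)}\right]^{\theta}\right\}.$$
Let us now investigate the case where G(x; ξ) tends to 1. Owing to the standard equivalences $e^{-y}\sim 1-y$ and $(1-y)^{\alpha}\sim 1-\alpha y$ when y → 0, by putting these equivalences together, we obtain
$$1-F(x;\alpha,\theta,\xi)\sim\alpha[1-G(x;\xi)]^{2\theta},\quad f(x;\alpha,\theta,\xi)\sim 2\alpha\theta\,g(x;\xi)[1-G(x;\xi)]^{2\theta-1},\quad h(x;\alpha,\theta,\xi)\sim 2\theta\,h_{G}(x;\xi),$$
where $h_{G}(x;\xi)=g(x;\xi)/[1-G(x;\xi)]$ corresponds to the HRF of the parent distribution. These results are applied to the TLOFr-Ex distribution below.
Application. The CDF, PDF and HRF of the TLOFr-Ex distribution given as Eqs. (7)-(9), respectively, possess the following asymptotic properties. When x tends to 0, we get
$$F(x;\alpha,\theta,\lambda)\sim 2^{\alpha}\exp\left\{-\alpha(e^{\lambda x}-1)^{-\theta}\right\},\quad f(x;\alpha,\theta,\lambda)\sim h(x;\alpha,\theta,\lambda)\sim 2^{\alpha}\alpha\theta\lambda^{-\theta}x^{-\theta-1}\exp\left\{-\alpha(e^{\lambda x}-1)^{-\theta}\right\}.$$
In this case, f(x; α, θ, λ) and h(x; α, θ, λ) tend to zero with an exponential decay which mainly depends on θ, for all values of the parameters. Also, when x tends to +∞, we get
$$1-F(x;\alpha,\theta,\lambda)\sim\alpha e^{-2\theta\lambda x},\quad f(x;\alpha,\theta,\lambda)\sim 2\alpha\theta\lambda e^{-2\theta\lambda x},\quad h(x;\alpha,\theta,\lambda)\sim 2\theta\lambda.$$
Hence, f(x; α, θ, λ) tends to zero with an exponential decay which mainly depends on θ and λ, whereas h(x; α, θ, λ) tends to a constant.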

Quantile Function with Application
The quantile function of the TLOFr-G family is defined by the inverse function $Q(y;\alpha,\theta,\xi)=F^{-1}(y;\alpha,\theta,\xi)$, y ∈ (0,1). Based on the expression in Eq. (3) and after some algebra, we arrive at
$$Q(y;\alpha,\theta,\xi)=G^{-1}\left(\left[1+\left\{-\log\left[1-\left(1-y^{1/\alpha}\right)^{1/2}\right]\right\}^{1/\theta}\right]^{-1};\xi\right),\quad y\in(0,1),$$
where $G^{-1}(y;\xi)$ denotes the quantile function of the parent distribution.
Hence, the quantile function of the TLOFr-G family is fully available and easy to manipulate. This allows the determination of the standard quartiles, i.e., $Q_k=Q(k/4;\alpha,\theta,\xi)$, k ∈ {1,2,3}, the generation of values from the TLOFr-G family via the inversion method, and the definition of several measures of asymmetry (skewness), among others. We may refer to the book of [16], for instance.

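Application. The quantile function of the TLOFr-Ex distribution is given by
$$Q(y;\alpha,\theta,\lambda)=\frac{1}{\lambda}\log\left(1+\left\{-\log\left[1-\left(1-y^{1/\alpha}\right)^{1/2}\right]\right\}^{-1/\theta}\right),\quad y\in(0,1).$$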
Random values from X ∼ TLOFr-Ex will be generated by the use of this quantile function in Subsection 4.2.
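The inversion method mentioned above can be sketched as follows for the TLOFr-Ex distribution. This is an illustration under assumptions: the function names, the seed and the parameter values (α, θ, λ) = (2, 1.5, 1) are hypothetical choices, and the CDF is re-implemented only to check the round trip.

```python
import math
import random

def tlofr_ex_quantile(y, alpha, theta, lam):
    """Quantile function Q(y), 0 < y < 1, obtained by inverting the CDF."""
    u = -math.log(1.0 - math.sqrt(1.0 - y ** (1.0 / alpha)))
    return math.log1p(u ** (-1.0 / theta)) / lam

def tlofr_ex_cdf(x, alpha, theta, lam):
    if x <= 0.0:
        return 0.0
    w = math.exp(-math.expm1(lam * x) ** (-theta))
    return (w * (2.0 - w)) ** alpha

def tlofr_ex_sample(n, alpha, theta, lam, rng):
    """Inversion method: apply Q to uniform draws on (0, 1)."""
    out = []
    while len(out) < n:
        y = rng.random()
        if y > 0.0:              # random() lies in [0, 1); skip the endpoint 0
            out.append(tlofr_ex_quantile(y, alpha, theta, lam))
    return out

rng = random.Random(2020)        # fixed seed for reproducibility
xs = tlofr_ex_sample(2000, 2.0, 1.5, 1.0, rng)
q1, q2, q3 = (tlofr_ex_quantile(k / 4.0, 2.0, 1.5, 1.0) for k in (1, 2, 3))
print("quartiles:", q1, q2, q3)
print("share of draws below the median:", sum(x <= q2 for x in xs) / len(xs))
```

The share printed on the last line should be close to one half when the sampler and the quantile function agree.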

Expansion of the CDF with Applications
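Also, one can express F(x; α, θ, ξ) as a mixture of CDFs of the OFr-G family by using the (generalized) binomial formula. Indeed, based on Eq. (3), we have
$$F(x;\alpha,\theta,\xi)=\sum_{k=0}^{+\infty}\binom{\alpha}{k}(-1)^{k}2^{\alpha-k}F^{*}(x;k+\alpha,\theta,\xi), \qquad (10)$$
where $F^{*}(x;k+\alpha,\theta,\xi)=\exp\left\{-(k+\alpha)\left[\frac{1-G(x;\xi)}{G(x;\xi)}\right]^{\theta}\right\}$. Thus, some properties of the former OFr-G family can be exploited to derive those of the proposed family. One can go beyond this decomposition by expressing F*(x; k + α, θ, ξ) in terms of exponentiated functions of the parent distribution. In this regard, let us consider the survival function of the parent distribution given as S(x; ξ) = 1 − G(x; ξ), which is often easier to handle than G(x; ξ). Then, it follows from the exponential series expansion and the (generalized) binomial formula that
$$F^{*}(x;k+\alpha,\theta,\xi)=\sum_{\ell=0}^{+\infty}\sum_{m=0}^{+\infty}\binom{-\theta\ell}{m}\frac{(-1)^{\ell+m}(k+\alpha)^{\ell}}{\ell!}S(x;\xi)^{\theta\ell+m}. \qquad (11)$$
By virtue of Eqs. (10) and (11), F(x; α, θ, ξ) can be expressed as
$$F(x;\alpha,\theta,\xi)=\sum_{k,\ell,m=0}^{+\infty}\binom{\alpha}{k}\binom{-\theta\ell}{m}\frac{(-1)^{k+\ell+m}2^{\alpha-k}(k+\alpha)^{\ell}}{\ell!}S(x;\xi)^{\theta\ell+m}. \qquad (12)$$
A precise approximation follows by substituting the infinite limits by any large integer.
Application 2. The CDF of the TLOFr-Ex distribution given as Eq. (7) can be expressed by a simple series expansion involving exponential functions as
$$F(x;\alpha,\theta,\lambda)=\sum_{k,\ell,m=0}^{+\infty}\binom{\alpha}{k}\binom{-\theta\ell}{m}\frac{(-1)^{k+\ell+m}2^{\alpha-k}(k+\alpha)^{\ell}}{\ell!}e^{-\lambda(\theta\ell+m)x},\quad x>0.$$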
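The mixture representation of F(x; α, θ, ξ) in terms of exponentiated OFr-G CDFs, obtained from the generalized binomial formula, can be checked numerically. The sketch below does so for the exponential parent; the function names, the truncation level `kmax` and the parameter values are assumptions made for illustration.

```python
import math

def gen_binom(a, k):
    """Generalized binomial coefficient a(a-1)...(a-k+1)/k! for real a."""
    c = 1.0
    for j in range(k):
        c *= (a - j) / (j + 1.0)
    return c

def ofr_ex_cdf(x, theta, lam):
    """OFr-G CDF with exponential parent: exp{-(e^{lam*x} - 1)^(-theta)}."""
    return math.exp(-math.expm1(lam * x) ** (-theta)) if x > 0.0 else 0.0

def cdf_closed(x, alpha, theta, lam):
    w = ofr_ex_cdf(x, theta, lam)
    return (w * (2.0 - w)) ** alpha

def cdf_mixture(x, alpha, theta, lam, kmax=60):
    """Truncated binomial mixture: sum_k C(alpha,k)(-1)^k 2^(alpha-k) w^(k+alpha)."""
    w = ofr_ex_cdf(x, theta, lam)
    return sum(gen_binom(alpha, k) * (-1.0) ** k * 2.0 ** (alpha - k) * w ** (k + alpha)
               for k in range(kmax + 1))

for x in (0.3, 1.0, 3.0):
    print(x, cdf_closed(x, 2.5, 1.5, 1.0), cdf_mixture(x, 2.5, 1.5, 1.0))
```

Since w/2 ≤ 1/2, the terms decay geometrically, so a moderate truncation level is enough for this first expansion; the nested triple series is not tested here.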

Expansion of the PDF with Applications
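By differentiation of F(x; α, θ, ξ) in Eq. (12) with respect to x, we get the following series expansion for the corresponding PDF:
$$f(x;\alpha,\theta,\xi)=\sum_{(k,\ell,m)\in\Lambda}u_{k,\ell,m}(\alpha,\theta)\,g(x;\xi)\,S(x;\xi)^{\theta\ell+m-1}, \qquad (13)$$
where $u_{k,\ell,m}(\alpha,\theta)=\binom{\alpha}{k}\binom{-\theta\ell}{m}\frac{(-1)^{k+\ell+m+1}2^{\alpha-k}(k+\alpha)^{\ell}(\theta\ell+m)}{\ell!}$ and $\Lambda=\{(k,\ell,m)\in\mathbb{N}^{3};\ \ell+m>0\}$. Thus, we express a complicated PDF as a series expansion involving simpler functions. From the computational point of view, this series expression can be more productive than working directly with the analytical expression of the PDF. This aspect is developed in the book of [16], among others. Two direct applications of Eq. (13) are given below.
Application 1. The PDF of the TLOFr-Ex distribution specified by Eq. (8) can be expressed by a tractable series expansion involving exponential functions as
$$f(x;\alpha,\theta,\lambda)=\sum_{(k,\ell,m)\in\Lambda}v_{k,\ell,m}(\alpha,\theta,\lambda)\,e^{-\lambda(\theta\ell+m)x},\quad x>0,\qquad v_{k,\ell,m}(\alpha,\theta,\lambda)=\lambda\binom{\alpha}{k}\binom{-\theta\ell}{m}\frac{(-1)^{k+\ell+m+1}2^{\alpha-k}(k+\alpha)^{\ell}(\theta\ell+m)}{\ell!}. \qquad (14)$$
We can use it to derive series expansions of various essential probabilistic and statistical tools, including moments of all kinds, as explained in full generality in the next application.
Application 2. Thanks to Eq. (13), the expectation of a transformation of a random variable X ∼ TLOFr-G can be expressed in a straightforward manner. Indeed, the transfer theorem says that, for any function K(x) and X ∼ TLOFr-G, the expectation of K(X) is
$$E[K(X)]=\int_{-\infty}^{+\infty}K(x)f(x;\alpha,\theta,\xi)\,dx=\sum_{(k,\ell,m)\in\Lambda}u_{k,\ell,m}(\alpha,\theta)\int_{-\infty}^{+\infty}K(x)\,g(x;\xi)\,S(x;\xi)^{\theta\ell+m-1}\,dx,$$
where $u_{k,\ell,m}(\alpha,\theta)$ denotes the coefficients of the series expansion in Eq. (13), provided that all the terms exist in the convergence sense, as assumed in the rest of the study.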

Crude Moments and Application
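The s-th crude moment of X ∼ TLOFr-G is obtained as
$$\mu'_{s}=E(X^{s})=\int_{-\infty}^{+\infty}x^{s}f(x;\alpha,\theta,\xi)\,dx.$$
Equivalently, by making the change of variable y = G(x; ξ), we can define it by
$$\mu'_{s}=2\alpha\theta\int_{0}^{1}[G^{-1}(y;\xi)]^{s}\frac{(1-y)^{\theta-1}}{y^{\theta+1}}\exp\left\{-\alpha\left[\frac{1-y}{y}\right]^{\theta}\right\}\left(1-\exp\left\{-\left[\frac{1-y}{y}\right]^{\theta}\right\}\right)\left(2-\exp\left\{-\left[\frac{1-y}{y}\right]^{\theta}\right\}\right)^{\alpha-1}dy.$$
In the general case, there is no standard integration technique giving a closed-form expression for this integral. However, numerical techniques can be employed. Alternatively, the series expansion in Eq. (13) gives
$$\mu'_{s}=\sum_{(k,\ell,m)\in\Lambda}u_{k,\ell,m}(\alpha,\theta)\,I_{\ell,m}(\theta,\xi,s),$$
where $I_{\ell,m}(\theta,\xi,s)=\int_{-\infty}^{+\infty}x^{s}g(x;\xi)S(x;\xi)^{\theta\ell+m-1}dx$. The coefficient $u_{k,\ell,m}(\alpha,\theta)$ as well as the integral $I_{\ell,m}(\theta,\xi,s)$ remain more practical to manipulate than the former integral definition. From the crude moments, several important measures can be derived, beginning with the mean, variance and standard deviation of X defined by $\mu=\mu'_{1}$, $\sigma^{2}=\mu'_{2}-\mu^{2}$ and $\sigma=\sqrt{\sigma^{2}}$, respectively. Also, the coefficients of skewness, kurtosis and variation are defined by
$$SK=\frac{\mu'_{3}-3\mu'_{2}\mu+2\mu^{3}}{\sigma^{3}},\quad KU=\frac{\mu'_{4}-4\mu'_{3}\mu+6\mu'_{2}\mu^{2}-3\mu^{4}}{\sigma^{4}},\quad CV=\frac{\sigma}{\mu},$$
respectively. For X ∼ TLOFr-Ex and a positive integer s, the s-th crude moment becomes
$$\mu'_{s}=\sum_{(k,\ell,m)\in\Lambda}v_{k,\ell,m}(\alpha,\theta,\lambda)\,U_{\ell,m}(\theta,\lambda,s),$$
where $v_{k,\ell,m}(\alpha,\theta,\lambda)$ are defined by Eq. (14) and $U_{\ell,m}(\theta,\lambda,s)=\int_{0}^{+\infty}x^{s}e^{-\lambda(\theta\ell+m)x}dx=s!\,\lambda^{-s-1}(\theta\ell+m)^{-s-1}$.
In Tabs. 1 and 2, we provide numerical values for the following measures of X ∼ TLOFr-Ex: μ, μ′₂, μ′₃ and μ′₄, σ², SK, KU and CV. Tabs. 1 and 2 indicate the possible range of values of the considered measures, as well as some empirical monotonic tendencies for the considered parameter values. For instance, in Tab. 1, at λ = 1.5 and θ = 5, all the measures tend to increase when α increases, except the variance, which decreases, and hence the CV as a logical consequence. In Tab. 2, we see that, at λ = 1.5 and α = 4, all the measures tend to decrease when θ increases, except KU between θ = 10 and θ = 12. This exception shows the complex relations between some characteristics and the values of the parameters.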
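The crude moments and the derived measures can also be approximated by direct numerical integration of the PDF. The sketch below uses a composite Simpson rule in plain Python; it is not the procedure used to produce Tabs. 1 and 2, and the function names, the integration range and the parameter values are assumptions.

```python
import math

def tlofr_ex_pdf(x, alpha, theta, lam):
    if x <= 0.0:
        return 0.0
    u = math.expm1(lam * x) ** (-theta)
    w = math.exp(-u)
    return (2.0 * alpha * theta * lam * math.exp(-theta * lam * x)
            * (-math.expm1(-lam * x)) ** (-theta - 1.0)
            * w ** alpha * (-math.expm1(-u)) * (2.0 - w) ** (alpha - 1.0))

def simpson(fun, a, b, n):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (b - a) / n
    total = fun(a) + fun(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * fun(a + i * h)
    return total * h / 3.0

def crude_moment(s, alpha, theta, lam, upper=40.0, n=20000):
    """Approximate E(X^s); 'upper' truncates the negligible right tail."""
    return simpson(lambda x: x ** s * tlofr_ex_pdf(x, alpha, theta, lam), 0.0, upper, n)

def summary(alpha, theta, lam):
    m1, m2, m3, m4 = (crude_moment(s, alpha, theta, lam) for s in (1, 2, 3, 4))
    var = m2 - m1 ** 2
    sd = math.sqrt(var)
    sk = (m3 - 3.0 * m2 * m1 + 2.0 * m1 ** 3) / sd ** 3
    ku = (m4 - 4.0 * m3 * m1 + 6.0 * m2 * m1 ** 2 - 3.0 * m1 ** 4) / sd ** 4
    return {"mean": m1, "var": var, "SK": sk, "KU": ku, "CV": sd / m1}

print(summary(2.0, 1.5, 1.0))   # illustrative parameter values
```

The truncation point `upper` must be enlarged when θλ is small, since the right tail decays like exp(−2θλx).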

Order Statistics
Some basics on the concept of order statistics in the context of the TLOFr-G family are described below.

General Formula
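First of all, the PDF of the r-th order statistic of a random sample of size n from X ∼ TLOFr-G, denoted by X₍ᵣ₎, is defined as
$$f_{(r)}(x;\alpha,\theta,\xi)=\frac{n!}{(r-1)!(n-r)!}f(x;\alpha,\theta,\xi)F(x;\alpha,\theta,\xi)^{r-1}[1-F(x;\alpha,\theta,\xi)]^{n-r},\quad x\in\mathbb{R}.$$
Hence, owing to Eqs. (3) and (4), after simplifications, we get
$$f_{(r)}(x;\alpha,\theta,\xi)=\frac{n!}{(r-1)!(n-r)!}2\alpha\theta\,g(x;\xi)\frac{[1-G(x;\xi)]^{\theta-1}}{G(x;\xi)^{\theta+1}}\exp\left\{-r\alpha\left[\frac{1-G(x;\xi)}{G(x;\xi)}\right]^{\theta}\right\}\left(1-\exp\left\{-\left[\frac{1-G(x;\xi)}{G(x;\xi)}\right]^{\theta}\right\}\right)\left(2-\exp\left\{-\left[\frac{1-G(x;\xi)}{G(x;\xi)}\right]^{\theta}\right\}\right)^{r\alpha-1}\left[1-\exp\left\{-\alpha\left[\frac{1-G(x;\xi)}{G(x;\xi)}\right]^{\theta}\right\}\left(2-\exp\left\{-\left[\frac{1-G(x;\xi)}{G(x;\xi)}\right]^{\theta}\right\}\right)^{\alpha}\right]^{n-r}.$$
In particular, the PDF of the maximum order statistic, corresponding to r = n, is given as
$$f_{(n)}(x;\alpha,\theta,\xi)=2n\alpha\theta\,g(x;\xi)\frac{[1-G(x;\xi)]^{\theta-1}}{G(x;\xi)^{\theta+1}}\exp\left\{-n\alpha\left[\frac{1-G(x;\xi)}{G(x;\xi)}\right]^{\theta}\right\}\left(1-\exp\left\{-\left[\frac{1-G(x;\xi)}{G(x;\xi)}\right]^{\theta}\right\}\right)\left(2-\exp\left\{-\left[\frac{1-G(x;\xi)}{G(x;\xi)}\right]^{\theta}\right\}\right)^{n\alpha-1},\quad x\in\mathbb{R},$$
and that of the minimum order statistic, corresponding to r = 1, is obtained as
$$f_{(1)}(x;\alpha,\theta,\xi)=n\,f(x;\alpha,\theta,\xi)\left[1-\exp\left\{-\alpha\left[\frac{1-G(x;\xi)}{G(x;\xi)}\right]^{\theta}\right\}\left(2-\exp\left\{-\left[\frac{1-G(x;\xi)}{G(x;\xi)}\right]^{\theta}\right\}\right)^{\alpha}\right]^{n-1},\quad x\in\mathbb{R}.$$
These functions find numerous applications in various areas of applied probability and statistics, such as reliability and survival analysis, and the modelling of the lifetime of various practical series or parallel systems. We may refer the reader to the book of [17].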
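For the exponential parent, the PDF of an order statistic can be evaluated from the general formula with a few lines of code. This is a sketch only: the function names are hypothetical and the sample size n = 5 is an arbitrary illustrative choice.

```python
import math

def tlofr_ex_cdf(x, alpha, theta, lam):
    if x <= 0.0:
        return 0.0
    w = math.exp(-math.expm1(lam * x) ** (-theta))
    return (w * (2.0 - w)) ** alpha

def tlofr_ex_pdf(x, alpha, theta, lam):
    if x <= 0.0:
        return 0.0
    u = math.expm1(lam * x) ** (-theta)
    w = math.exp(-u)
    return (2.0 * alpha * theta * lam * math.exp(-theta * lam * x)
            * (-math.expm1(-lam * x)) ** (-theta - 1.0)
            * w ** alpha * (-math.expm1(-u)) * (2.0 - w) ** (alpha - 1.0))

def order_stat_pdf(x, r, n, alpha, theta, lam):
    """PDF of the r-th order statistic in a sample of size n (1 <= r <= n)."""
    F = tlofr_ex_cdf(x, alpha, theta, lam)
    f = tlofr_ex_pdf(x, alpha, theta, lam)
    c = math.factorial(n) / (math.factorial(r - 1) * math.factorial(n - r))
    return c * f * F ** (r - 1) * (1.0 - F) ** (n - r)

# Minimum (r = 1) and maximum (r = n) of a sample of size 5, at x = 1.
print(order_stat_pdf(1.0, 1, 5, 2.0, 1.5, 1.0))
print(order_stat_pdf(1.0, 5, 5, 2.0, 1.5, 1.0))
```

For r = n the result should agree with the closed form of the maximum, which amounts to replacing α by nα in the TLOFr-Ex PDF.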

Expansion of the PDF and Application
Since the functions related to the order statistics are complicated to manipulate, one can consider tractable series expansions of them. A tractable series expansion for the PDF of the r-th order statistic is described below.

Multidimensional Extensions
There are several ways to extend the dimensional applicability of the TLOFr-G family, for multivariate data analysis purposes among others. The most common way is to consider a multivariate parent distribution. That is, a natural d-dimensional TLOFr-G family is defined by the CDF given by
$$F(x_{1},\ldots,x_{d};\alpha,\theta,\xi)=\exp\left\{-\alpha\left[\frac{1-G(x_{1},\ldots,x_{d};\xi)}{G(x_{1},\ldots,x_{d};\xi)}\right]^{\theta}\right\}\left(2-\exp\left\{-\left[\frac{1-G(x_{1},\ldots,x_{d};\xi)}{G(x_{1},\ldots,x_{d};\xi)}\right]^{\theta}\right\}\right)^{\alpha},\quad(x_{1},\ldots,x_{d})\in\mathbb{R}^{d},$$
where $G(x_{1},\ldots,x_{d};\xi)$ denotes a d-dimensional CDF of a continuous distribution.
A more refined strategy can be derived through the use of copulas. In our context, a copula can ensure that all the marginal distributions belong to the TLOFr-G family while keeping control of the dependence between the corresponding marginal random variables. Indeed, based on d CDFs of the TLOFr-G family, say $F(x;\alpha_{1},\theta_{1},\xi_{1}),\ldots,F(x;\alpha_{d},\theta_{d},\xi_{d})$, we can define a new d-dimensional distribution by considering the following CDF:
$$F(x_{1},\ldots,x_{d})=C\left(F(x_{1};\alpha_{1},\theta_{1},\xi_{1}),\ldots,F(x_{d};\alpha_{d},\theta_{d},\xi_{d})\right),\quad(x_{1},\ldots,x_{d})\in\mathbb{R}^{d},$$
where $C(u_{1},\ldots,u_{d})$ denotes a d-dimensional copula, such as the Gaussian copula or those belonging to the Archimedean copula family (Clayton, Frank, Gumbel…). Then, the first marginal random variable has the CDF $F(x_{1};\alpha_{1},\theta_{1},\xi_{1})$, the second marginal random variable has the CDF $F(x_{2};\alpha_{2},\theta_{2},\xi_{2})$, and so on. Also, the copula $C(u_{1},\ldots,u_{d})$ can modulate the dependence between these random variables, the independent case corresponding to $C(u_{1},\ldots,u_{d})=\prod_{i=1}^{d}u_{i}$. We may refer the reader to [19] for further details on the concept of copula and its possible applications.
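To make the copula construction concrete, the following sketch builds a bivariate CDF with TLOFr-Ex margins coupled by a Clayton copula. The choice of the Clayton copula, its dependence parameter `phi`, the function names and the parameter values are all illustrative assumptions.

```python
import math

def tlofr_ex_cdf(x, alpha, theta, lam):
    if x <= 0.0:
        return 0.0
    w = math.exp(-math.expm1(lam * x) ** (-theta))
    return (w * (2.0 - w)) ** alpha

def clayton(u, v, phi):
    """Bivariate Clayton copula with dependence parameter phi > 0."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    return (u ** (-phi) + v ** (-phi) - 1.0) ** (-1.0 / phi)

def joint_cdf(x1, x2, par1, par2, phi):
    """C(F(x1; par1), F(x2; par2)), each par being a tuple (alpha, theta, lam)."""
    return clayton(tlofr_ex_cdf(x1, *par1), tlofr_ex_cdf(x2, *par2), phi)

par1 = (2.0, 1.5, 1.0)   # assumed parameters of the first margin
par2 = (1.0, 2.0, 0.5)   # assumed parameters of the second margin
print(joint_cdf(1.0, 2.0, par1, par2, phi=2.0))
print(joint_cdf(1.0, 60.0, par1, par2, phi=2.0), tlofr_ex_cdf(1.0, *par1))
```

The last line illustrates the marginal property: letting the second argument grow recovers the first marginal CDF.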

Inference on the TLOFr-Ex Model
We now focus on some inferential properties of the TLOFr-Ex model, with the use of the maximum likelihood method, validated numerically by a simulation study.

On the Maximum Likelihood Method
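We consider n independent observations x₁, …, xₙ from X ∼ TLOFr-Ex. Then, based on Eq. (8), the log-likelihood function is obtained as
$$\ell(\alpha,\theta,\lambda)=n\log(2\alpha\theta\lambda)-\theta\lambda\sum_{i=1}^{n}x_{i}-(\theta+1)\sum_{i=1}^{n}\log\left(1-e^{-\lambda x_{i}}\right)-\alpha\sum_{i=1}^{n}(e^{\lambda x_{i}}-1)^{-\theta}+\sum_{i=1}^{n}\log\left[1-\exp\left\{-(e^{\lambda x_{i}}-1)^{-\theta}\right\}\right]+(\alpha-1)\sum_{i=1}^{n}\log\left[2-\exp\left\{-(e^{\lambda x_{i}}-1)^{-\theta}\right\}\right].$$
Then, the maximum likelihood estimates (MLEs) of α, θ, and λ are defined by $(\hat{\alpha},\hat{\theta},\hat{\lambda})=\mathrm{argmax}_{(\alpha,\theta,\lambda)\in(0,+\infty)^{3}}\,\ell(\alpha,\theta,\lambda)$. Precisely, the "ideal" MLEs can be obtained by solving simultaneously the score equations given by $\partial\ell(\alpha,\theta,\lambda)/\partial\alpha=0$, $\partial\ell(\alpha,\theta,\lambda)/\partial\theta=0$ and $\partial\ell(\alpha,\theta,\lambda)/\partial\lambda=0$. Analytical solutions of these equations are not available; they require numerical optimization techniques such as Newton-Raphson-like techniques (see [20], for instance).
The well-known asymptotic theory on the MLEs, as described in [21], ensures that, for n large, the distribution of $(\hat{\alpha},\hat{\theta},\hat{\lambda})$ can be approximated by the three-dimensional normal distribution $N_{3}((\alpha,\theta,\lambda),J^{-1})$, where J denotes the observed information matrix, i.e., the 3 × 3 matrix of the negative second-order partial derivatives of ℓ(α, θ, λ) evaluated at the MLEs, whose elements can be expressed analytically, with some mathematical effort. From this asymptotic result, one can construct various estimation tools for the parameters, including two-sided asymptotic confidence intervals as described below. Let us set $(v_{\hat{\alpha}},v_{\hat{\theta}},v_{\hat{\lambda}})=\mathrm{diag}(J^{-1})$. Then, the asymptotic two-sided confidence interval (CI) of α at the level 100(1 − γ)% is obtained as
$$CI_{\alpha}=\left[\hat{\alpha}-z_{\gamma/2}\sqrt{v_{\hat{\alpha}}},\ \hat{\alpha}+z_{\gamma/2}\sqrt{v_{\hat{\alpha}}}\right], \qquad (21)$$
where $z_{\gamma/2}=F_{Z}^{-1}(1-\gamma/2)$ and $F_{Z}(x)$ denotes the CDF of the (standard) normal distribution N(0, 1). The CIs of θ and λ can be expressed in a similar manner, by replacing α by θ or λ in Eq. (21), respectively. All the details on the general theory of the maximum likelihood method of estimation can be found in [21].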
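The point-estimation step can be illustrated with a small, dependency-free sketch. Note the assumptions: the study relies on Newton-Raphson-like techniques, whereas the code below swaps in a simple derivative-free compass search on the log-parameters; the function names, the seed, the simulated sample and the starting point are hypothetical, and confidence intervals (which need the observed information matrix) are not computed.

```python
import math
import random

def quantile(y, alpha, theta, lam):
    u = -math.log(1.0 - math.sqrt(1.0 - y ** (1.0 / alpha)))
    return math.log1p(u ** (-1.0 / theta)) / lam

def neg_loglik(params, data):
    """Negative TLOFr-Ex log-likelihood; returns inf on numerical failure."""
    alpha, theta, lam = params
    total = 0.0
    try:
        for x in data:
            u = math.expm1(lam * x) ** (-theta)
            total += (math.log(2.0 * alpha * theta * lam) - theta * lam * x
                      - (theta + 1.0) * math.log(-math.expm1(-lam * x))
                      - alpha * u + math.log(-math.expm1(-u))
                      + (alpha - 1.0) * math.log(2.0 - math.exp(-u)))
    except (OverflowError, ValueError, ZeroDivisionError):
        return math.inf
    return -total

def compass_search(f, start, step=0.5, tol=1e-3, max_iter=500):
    """Derivative-free local search on log-parameters (keeps them positive)."""
    z = [math.log(p) for p in start]
    best = f([math.exp(v) for v in z])
    it = 0
    while step > tol and it < max_iter:
        improved = False
        for i in range(len(z)):
            for d in (step, -step):
                cand = z[:]
                cand[i] += d
                val = f([math.exp(v) for v in cand])
                if val < best:
                    z, best, improved = cand, val, True
        if not improved:
            step /= 2.0      # refine the mesh when no neighbour improves
        it += 1
    return [math.exp(v) for v in z], best

rng = random.Random(1)
true = (2.0, 1.5, 1.0)                       # assumed target values
data = [quantile(rng.random() or 0.5, *true) for _ in range(100)]
start = [1.0, 1.0, 1.0]
est, nll = compass_search(lambda p: neg_loglik(p, data), start)
print("estimates:", est, "negative log-likelihood:", nll)
```

A compass search only finds a local optimum, so in practice several starting points would be tried and the best fit retained.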

Monte-Carlo Simulation Study
This section provides a Monte-Carlo simulation study to validate the use of the MLEs as described above. In this regard, root mean square errors (RMSEs) and standard errors (SEs), along with the lower bounds (LBs), upper bounds (UBs) and average lengths (ALs) of the CIs at the levels 90% and 95%, are calculated. The software Mathematica 9 is employed. As a first step, we generate 5000 random samples of size n = 50, 100, 200, 500 and 1000 from the TLOFr-Ex distribution. Several sets of target values are considered for the parameters. Then, the values of the MLEs, RMSEs, and SEs, as well as those of the LBs, UBs and ALs of the CIs for the considered target values of the parameters, are listed in Tabs. 3-5.
From Tabs. 3-5, we notice that the MLEs are close to the fixed values of the parameters; they nearly coincide for n = 1000, among others. Moreover, the RMSEs, SEs and ALs decrease to 0 when n increases, supporting the efficiency of the maximum likelihood method in the framework of the TLOFr-Ex model.

Data Analysis
The prime interest of the TLOFr-Ex model is to be applied for data analysis purposes, making it useful in many applied fields. Here, we illustrate this aspect by considering two data sets based on numerical observations of the COVID-19 pandemic. We recall that the COVID-19 pandemic has spread across the whole globe since the end of 2019, claiming a considerable number of victims. The analysis of related data from various countries has been considered through various approaches. See, for instance, the exponential trend model by [22], the SIR model by [23] and the deep-learning approach by [24]. As far as we know, the first probabilistic model was proposed by [25], providing the first steps to the analysis of such data through a semi-parametric statistical approach. We contribute to the subject by applying the TLOFr-Ex model to analyze the following two datasets on the COVID-19 pandemic in Pakistan, never considered before.
The first data set, called COVID-19 data set I, is obtained from the following official electronic address: http://covid.gov.pk/stats/pakistan. It contains the daily confirmed cases of COVID-19 in Pakistan from 24 March to 28 April 2020 (36 days). The considered values are collected in Tab. 6.
The second data set, called COVID-19 data set II, has the same source, i.e., http://covid.gov.pk/stats/pakistan. It contains the daily recovered cases of COVID-19 in Pakistan from 24 March to 28 April 2020 (36 days). The considered values are collected in Tab. 7.
A histogram analysis shows that data sets I and II are right-skewed, with a heavy tail for the second data set (see the histograms in Figs. 2a and 3a). These characteristics can be handled well by the TLOFr-Ex model, as attested by the possible shapes of the corresponding PDF exhibited in Fig. 1a. This motivates the use of the TLOFr-Ex model to fit these two data sets. In addition, we show that it fits better than notable competitor models with three or fewer parameters, namely: the Weibull-exponential (WEx) model by [26], the gamma-exponentiated exponential (GEx) model by [27], the cosine-sine exponential (CEx) model by [28], the odd Fréchet exponential (OFr-Ex) model by [2] and the standard exponential (Ex) model (see [29]). "Better" is employed in a sense that needs to be clarified; the details are given below.
Then, via the MLEs, we can evaluate the fit behavior of the models for comparison purposes. After analysis, the (estimated) TLOFr-Ex model proves to be the best among the competitors for the two data sets because it has the smallest values for the following goodness-of-fit measures: minus the maximized log-likelihood (−ℓ), Akaike information criterion (AIC), Bayesian information criterion (BIC), Cramér-von Mises (CVM), Anderson-Darling (AD), and Kolmogorov-Smirnov (KS). Also, it has the largest KS p-value (PV). This can be deduced from Tabs. 10 and 11 for COVID-19 data sets I and II, respectively.
From the values of the criteria in Tabs. 10 and 11, we see that the TLOFr-Ex model outperforms its competitors. Indeed, for data set I, for instance, it satisfies AIC ≈ 486 against AIC ≈ 489 for the second-best model. A graphical check of the fits of all the estimated models is proposed in Figs. 2 and 3 for COVID-19 data sets I and II, respectively. More specifically, we plot the curves of the estimated PDFs and CDFs over the corresponding histograms and empirical CDFs, respectively.
The above practical results show that the TLOFr-Ex model fits various kinds of data well, such as those of the considered COVID-19 cases in Pakistan. We believe that it can be applied to similar COVID-19 data from other countries, allowing model comparisons and, in all modesty, a better understanding of some aspects of the COVID-19 pandemic.
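Three of the criteria listed above have simple standard definitions that can be sketched in a few lines. The code below covers only AIC, BIC and the KS statistic; the function names are hypothetical, the numerical inputs are toy values chosen for illustration (they are not the values of Tabs. 10 and 11), and the CVM, AD and p-value computations are left out.

```python
import math

def aic(loglik, k):
    """Akaike information criterion for maximized log-likelihood and k parameters."""
    return 2.0 * k - 2.0 * loglik

def bic(loglik, k, n):
    """Bayesian information criterion for a sample of size n."""
    return k * math.log(n) - 2.0 * loglik

def ks_statistic(data, cdf):
    """Kolmogorov-Smirnov distance between the empirical CDF and a fitted CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        fx = cdf(x)
        d = max(d, i / n - fx, fx - (i - 1) / n)
    return d

# Toy inputs, assumed for illustration only.
print(aic(-100.0, 3), bic(-100.0, 3, 36))
print(ks_statistic([0.1, 0.4, 0.7], lambda x: min(max(x, 0.0), 1.0)))
```

With a fitted TLOFr-Ex model, `cdf` would be the estimated CDF evaluated at the MLEs and `loglik` the maximized log-likelihood.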

Conclusions
Since its creation, the odd Fréchet family introduced by [2] has shown great applicability in statistics. In this study, a new extension of this family was proposed following the "Topp-Leone strategy". We investigate the theoretical and practical interests of the new family, supported by graphical and numerical tools. An emphasis is placed on a special distribution of the family, extending the exponential distribution, through the use of three tuning parameters. Among others, it shows desirable characteristics for various statistical approaches, including data fitting. Then, the related model is considered to analyze two different data sets related to the COVID-19 pandemic in Pakistan. In this regard, new models are introduced, with better performance in comparison to other competitors, including the parent model and its main challenger: the odd Fréchet model defined with the same parent. Of course, the proposed models can be applied for the analysis of data in various fields, beyond the scope of this paper.