
Binomial-exponential 2 Distribution: Different Estimation Methods with Weather Applications

ABSTRACT

In this paper, we consider different methods for estimating the unknown parameters of the binomial-exponential 2 distribution. First, we briefly describe the estimation methods, namely maximum likelihood, the method of moments, percentile-based estimation, least squares, the method of maximum product of spacings, the Cramér-von-Mises method, and the Anderson-Darling and right-tail Anderson-Darling methods, and compare them using extensive simulation studies. Finally, the potential of the model is studied using three real data sets related to the total monthly rainfall during April, May and September at São Carlos, Brazil.

Keywords:
binomial-exponential 2; maximum likelihood estimation; Cramér-von-Mises type minimum distance estimators; right-tail Anderson-Darling estimators

RESUMO

In this work, we present different estimation methods for the parameters of the binomial-exponential 2 distribution, namely the maximum likelihood estimator, the method of moments, the percentile method, the least squares estimator, the maximum product of spacings method, the Cramér-von-Mises estimator, the Anderson-Darling estimator and the right-tail Anderson-Darling estimator. Based on a numerical simulation study, we find that the Anderson-Darling estimator returns more efficient estimates than the other estimators. Finally, our proposal is applied to three data sets related to the total monthly rainfall during the months of April, May and September in São Carlos, Brazil.

Keywords:
binomial-exponential 2 distribution; maximum likelihood estimator; Cramér-von-Mises estimator; Anderson-Darling estimator

1 INTRODUCTION

The binomial-exponential 2 (BE2) distribution was introduced by Bakouch et al. [6] (H.S. Bakouch, M.A. Jazi, S. Nadarajah, A. Dolati & R. Roozegar. A lifetime model with increasing failure rate. Applied Mathematical Modelling, 38 (2014), 5392-5406) as the distribution of a random sum of independent exponential random variables when the sample size (the number of terms in the sum) has a zero-truncated binomial distribution. The BE2 distribution has the probability density function (pdf)

$$f(x;\theta,\lambda) = \left[1 + \frac{(\lambda x - 1)\theta}{2-\theta}\right]\lambda e^{-\lambda x}, \tag{1.1}$$

and the cumulative distribution function (cdf)

$$F(x;\theta,\lambda) = 1 - \left(1 + \frac{\lambda\theta x}{2-\theta}\right)e^{-\lambda x}, \tag{1.2}$$

where x ≥ 0, 0 ≤ θ ≤ 1 is the shape parameter and λ > 0 is the scale parameter. The failure rate of the BE2 distribution is increasing for θ > 0 and constant for θ = 0. A generalization of the BE2 distribution was discussed by Asgharzadeh et al. [4] (A. Asgharzadeh, H.S. Bakouch & M. Habibi. A generalized binomial exponential 2 distribution: modeling and applications to hydrologic events. Journal of Applied Statistics, (2016), 1-20. Available online: http://dx.doi.org/10.1080/02664763.2016.1254729).
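As a quick illustration of (1.1) and (1.2), the following is a minimal Python sketch. Python is used only for illustration, and the function names `be2_pdf`, `be2_cdf` and `be2_hazard` and the parameter values are assumptions introduced here, not part of the original paper. It checks numerically that the pdf is the derivative of the cdf and prints the failure rate at a few points.

```python
import math

def be2_pdf(x, theta, lam):
    """Density (1.1); intended for x >= 0, 0 <= theta <= 1, lam > 0."""
    return (1.0 + (lam * x - 1.0) * theta / (2.0 - theta)) * lam * math.exp(-lam * x)

def be2_cdf(x, theta, lam):
    """Distribution function (1.2)."""
    return 1.0 - (1.0 + lam * theta * x / (2.0 - theta)) * math.exp(-lam * x)

def be2_hazard(x, theta, lam):
    """Failure rate f(x) / (1 - F(x))."""
    return be2_pdf(x, theta, lam) / (1.0 - be2_cdf(x, theta, lam))

# Illustrative parameter values (hypothetical, not taken from the data sets).
theta, lam, x, h = 0.8, 1.0, 1.5, 1e-6

# A central difference of the cdf should reproduce the pdf.
numeric_pdf = (be2_cdf(x + h, theta, lam) - be2_cdf(x - h, theta, lam)) / (2.0 * h)
print(abs(numeric_pdf - be2_pdf(x, theta, lam)) < 1e-6)

# Failure rate evaluated on a small grid of x values.
print([round(be2_hazard(t, theta, lam), 4) for t in (0.0, 1.0, 2.0, 4.0)])
```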

In their paper, Bakouch et al. [6] considered only the maximum likelihood estimation (MLE) method to estimate the parameters of the BE2 distribution. However, it is of interest to compare the MLE method with other estimation procedures, such as the method of moments (ME), ordinary least-squares estimation (LSE), weighted least-squares estimation (WLSE), percentile estimation (PCE), maximum product of spacings estimation (MPS), Cramér-von-Mises type minimum distance estimation (CME), Anderson-Darling estimation (ADE) and right-tail Anderson-Darling estimation (RADE).

Several estimation methods for parametric distributions are available in the literature, and some of them are well studied from a theoretical point of view. However, it is worth noting that for small samples there is often evidence that the maximum likelihood method does not perform well, and other estimation methods have therefore been developed. The appeal of an estimation method varies from user to user and with the area of application. For instance, one may prefer the moment estimator even when it does not have a closed-form expression. The objective of this article is to develop a guideline for choosing the best estimation method for the BE2 distribution, which would be of interest to applied statisticians. Comparisons of estimation methods for other distributions have been investigated in the literature; see, e.g., [1] (M.R. Alkasasbeh & M.Z. Raqab. Estimation of the generalized logistic distribution parameters: Comparative study. Statistical Methodology, 6 (2009), 262-279), [5] (A. Asgharzadeh, R. Rezaie & M. Abdi. Comparisons of methods of estimation for the half-logistic distribution. Selcuk Journal of Applied Mathematics, Special Issue (2011), 93-108), [11] (S. Dey, T. Dey & D. Kundu. Two-parameter Rayleigh distribution: different methods of estimation. American Journal of Mathematical and Management Sciences, 33 (2014), 55-74), [13] (R.D. Gupta & D. Kundu. Generalized exponential distribution: Different method of estimations. Journal of Statistical Computation and Simulation, 69 (2001), 315-337), [17] (F. Louzada, P.L. Ramos & G.S. Perdoná. Different estimation procedures for the parameters of the extended exponential geometric distribution for medical data. Computational and Mathematical Methods in Medicine, 2016 (2016). Article ID 8727951, 12 pages. doi:10.1155/2016/8727951), [19] (J. Mazucheli, F. Louzada & M. Ghitany. Comparison of estimation methods for the parameters of the weighted Lindley distribution. Applied Mathematics and Computation, 220 (2013), 463-471), [21] (P. Ramos & F. Louzada. The generalized weighted Lindley distribution: Properties, estimation and applications. Cogent Mathematics, 3 (2016), 1-18) and [26] (M. Teimouri, S.M. Hoseini & S. Nadarajah. Comparison of estimation methods for the Weibull distribution. Statistics, 47 (2013), 93-109).

The main goal of this paper is twofold: first, to show how different frequentist estimators of the parameters of the BE2 distribution perform for different sample sizes; and second, to show that the distribution outperforms several common two-parameter distributions on three real data sets.

Another motivation for using the BE2 distribution is that stochastic models that accommodate zero values are of great importance in practical applications. For example, in forecasting models for monthly rainfall, precipitation is often absent in dry periods, so zero values can be observed in different measures such as the average, maximum and minimum. Popular models such as the Gamma, Weibull, Lognormal and Generalized Exponential distributions do not accommodate this characteristic. In this paper we demonstrate that the BE2 distribution allows the occurrence of zero values, making it a simple alternative for use in weather forecast models.

The paper is organized as follows. In Section 2, we present some notes and properties for the model. In Section 3, we discuss the nine estimation methods considered in this paper. In Section 4 a simulation study is presented in order to identify the most efficient estimators. In Section 5 we apply our proposed methodology to three real data sets related to the total monthly rainfall during April, May and September at São Carlos, Brazil. Finally in Section 6 we conclude the paper.

2 NOTES AND PROPERTIES

Note that the family of Lindley distributions is a subfamily of the BE2 family, obtained for $\theta = 2/(2+\lambda)$. As a further motivation, recall that the pdf of the BE2 distribution can be expressed as a two-component mixture of an exponential distribution (with scale parameter λ) and a gamma distribution (with shape parameter 2 and scale parameter λ), i.e., $f(x; p, \lambda) = p\lambda^{2} x e^{-\lambda x} + (1-p)\lambda e^{-\lambda x}$, where the mixing proportion is $p = \theta/(2-\theta)$.
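Both statements can be checked directly from the pdf (1.1). The short derivation below is a sketch added for illustration; it uses only the symbols defined above and the standard form of the Lindley density, $\frac{\lambda^{2}}{1+\lambda}(1+x)e^{-\lambda x}$.

```latex
\begin{align*}
f(x;\theta,\lambda)
  &= \left[1 - \frac{\theta}{2-\theta} + \frac{\theta}{2-\theta}\,\lambda x\right]\lambda e^{-\lambda x} \\
  &= (1-p)\,\lambda e^{-\lambda x} + p\,\lambda^{2} x e^{-\lambda x},
  \qquad p = \frac{\theta}{2-\theta}.
\end{align*}
% Lindley case: \theta = 2/(2+\lambda) gives p = 1/(1+\lambda), hence
\begin{equation*}
f(x) = \frac{\lambda}{1+\lambda}\,\lambda e^{-\lambda x}
     + \frac{1}{1+\lambda}\,\lambda^{2} x e^{-\lambda x}
     = \frac{\lambda^{2}}{1+\lambda}\,(1+x)\,e^{-\lambda x}.
\end{equation*}
```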

Let X ~ BE2(θ, λ). The r-th raw moment of X is given by

$$E(X^{r}) = \frac{r!}{\lambda^{r}}\left(1 + \frac{r\theta}{2-\theta}\right), \tag{2.1}$$

and the survival function of X is given by

$$S(x;\theta,\lambda) = \left(1 + \frac{\lambda\theta x}{2-\theta}\right)e^{-\lambda x}.$$

Many distributions, such as the Gamma, Weibull and Lognormal, to list a few, do not allow the occurrence of zero values. The following proposition shows that the BE2 distribution can be used as a model when zero values occur.

Proposition 1. Let X be a random variable with BE2 distribution. Then $f_X(0;\theta,\lambda) \geq 0$ for all 0 ≤ θ ≤ 1 and λ > 0.

Proof. Note that

$$f_X(0;\theta,\lambda) = \left.\frac{d}{dx}F_X(x;\theta,\lambda)\right|_{x=0} = \frac{2-2\theta}{2-\theta}\,\lambda, \tag{2.2}$$

which is nonnegative for all 0 ≤ θ ≤ 1 and λ > 0.

This result allows us to use the BE2 distribution as a simple alternative in problems where zero values occur.

3 METHODS OF ESTIMATION

In this section, nine estimation procedures are discussed to obtain the estimates of the BE2 distribution parameters.

3.1 Maximum Likelihood Estimation

The method of maximum likelihood is the most frequently used method of parameter estimation. Its success stems no doubt from its many desirable properties, including consistency, asymptotic efficiency, asymptotic normality and invariance, as well as its intuitive appeal. Let x_1, ..., x_n be a random sample of size n from (1.1). The likelihood function is given by

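$$L(\theta,\lambda;\boldsymbol{x}) = \prod_{i=1}^{n} f(x_i;\theta,\lambda) = \lambda^{n}\exp\left(-\lambda\sum_{i=1}^{n}x_i\right)\prod_{i=1}^{n}\left[1+\frac{(\lambda x_i-1)\theta}{2-\theta}\right]. \tag{3.1}$$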

The log-likelihood function without constant terms is given by

$$\ell(\theta,\lambda;\boldsymbol{x}) = n\log\lambda - \lambda\sum_{i=1}^{n}x_i - n\log(2-\theta) + \sum_{i=1}^{n}\log\left(2-2\theta+\lambda\theta x_i\right). \tag{3.2}$$

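Setting $\frac{\partial}{\partial\lambda}\ell(\theta,\lambda;\boldsymbol{x}) = 0$ and $\frac{\partial}{\partial\theta}\ell(\theta,\lambda;\boldsymbol{x}) = 0$, the likelihood equations are

$$\frac{n}{\lambda} - \sum_{i=1}^{n}x_i + \sum_{i=1}^{n}\frac{\theta x_i}{2-2\theta+\lambda\theta x_i} = 0 \tag{3.3}$$

and

$$\frac{n}{2-\theta} + \sum_{i=1}^{n}\frac{\lambda x_i-2}{2-2\theta+\lambda\theta x_i} = 0. \tag{3.4}$$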

The maximum likelihood estimators $\hat\theta$ and $\hat\lambda$ are obtained by solving the non-linear equations (3.3) and (3.4). It is important to point out that non-linear optimization algorithms, such as the quasi-Newton algorithm, can be used to maximize the likelihood function given in (3.1) directly.
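To make the log-likelihood (3.2) and the likelihood equations (3.3) and (3.4) concrete, here is a minimal Python sketch. The names `be2_loglik` and `be2_score` and the sample values are hypothetical and introduced only for illustration. The sketch does not perform the maximization itself; it evaluates the log-likelihood and compares the analytic left-hand sides of (3.3) and (3.4) with finite-difference derivatives.

```python
import math

def be2_loglik(theta, lam, xs):
    """Log-likelihood (3.2) for the observations xs."""
    n = len(xs)
    return (n * math.log(lam) - lam * sum(xs) - n * math.log(2.0 - theta)
            + sum(math.log(2.0 - 2.0 * theta + lam * theta * x) for x in xs))

def be2_score(theta, lam, xs):
    """Left-hand sides of (3.3) (derivative in lam) and (3.4) (derivative in theta)."""
    n = len(xs)
    d_lam = n / lam - sum(xs) + sum(
        theta * x / (2.0 - 2.0 * theta + lam * theta * x) for x in xs)
    d_theta = n / (2.0 - theta) + sum(
        (lam * x - 2.0) / (2.0 - 2.0 * theta + lam * theta * x) for x in xs)
    return d_lam, d_theta

# Hypothetical observations and trial parameter values, for illustration only.
xs = [0.3, 1.2, 2.5, 0.8, 1.9, 3.1]
theta, lam, h = 0.6, 0.9, 1e-6

# Central differences of the log-likelihood in each parameter.
num_d_lam = (be2_loglik(theta, lam + h, xs) - be2_loglik(theta, lam - h, xs)) / (2.0 * h)
num_d_theta = (be2_loglik(theta + h, lam, xs) - be2_loglik(theta - h, lam, xs)) / (2.0 * h)

d_lam, d_theta = be2_score(theta, lam, xs)
print(abs(d_lam - num_d_lam) < 1e-5, abs(d_theta - num_d_theta) < 1e-5)
```

In practice these functions would be handed to a numerical optimizer or root finder; which one to use is a separate choice and is not prescribed by this sketch.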

3.2 Moments Estimators

The method of moments is a fairly simple procedure and has been widely used for estimating parameters in statistical models. The moment estimators (MEs) of the BE2 distribution are obtained by equating the theoretical moments of (1.1) with the sample moments. Consider that

$$E(X\mid\theta,\lambda) = \frac{2}{\lambda(2-\theta)} \quad\text{and}\quad Var(X\mid\theta,\lambda) = \frac{2(2-\theta^{2})}{\lambda^{2}(2-\theta)^{2}} \tag{3.5}$$

are the theoretical mean and variance of the BE2 distribution. Note that the population coefficient of variation, given by

$$CV(X\mid\theta,\lambda) = \frac{\sqrt{2(2-\theta^{2})}}{2},$$

is independent of the scale parameter λ. So the estimators $\hat\theta_{ME}$ for θ and $\hat\lambda_{ME}$ for λ are easily obtained by equating the coefficient of variation to $s/\bar{x}$ and the mean to $\bar{x}$, which gives

$$\hat\theta_{ME} = \sqrt{2 - 2\left(\frac{s}{\bar{x}}\right)^{2}} \quad\text{and}\quad \hat\lambda_{ME} = \frac{2}{\bar{x}\left(2-\sqrt{2-2\left(\frac{s}{\bar{x}}\right)^{2}}\right)}, \tag{3.6}$$

where $\bar{x}$ and s are the sample mean and sample standard deviation, respectively.
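The closed-form expressions in (3.6) translate directly into code. The following Python sketch is illustrative only: the function name `be2_moment_estimates` and the data values are hypothetical, and it assumes that s is computed with the usual n - 1 divisor. It raises an error when the quantity under the square root is negative or would give an estimate of θ above 1.

```python
import math
import statistics

def be2_moment_estimates(xs):
    """Moment estimators (3.6), assuming s uses the n - 1 divisor."""
    xbar = statistics.mean(xs)
    s = statistics.stdev(xs)
    radicand = 2.0 - 2.0 * (s / xbar) ** 2
    if radicand < 0.0 or radicand > 1.0:
        # theta would be complex (radicand < 0) or larger than 1 (radicand > 1).
        raise ValueError("moment estimate of theta falls outside [0, 1]")
    theta = math.sqrt(radicand)
    lam = 2.0 / (xbar * (2.0 - theta))
    return theta, lam

# Hypothetical data, chosen so that the estimates exist.
data = [0.2, 0.5, 1.0, 1.5, 3.0]
theta_hat, lam_hat = be2_moment_estimates(data)
print(theta_hat, lam_hat)
```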

3.3 Least-Square Estimators

Ordinary least squares and weighted least squares are well-known methods for estimating unknown parameters [25] (J.J. Swain, S. Venkatraman & J.R. Wilson. Least-squares estimation of distribution functions in Johnson’s translation system. Journal of Statistical Computation and Simulation, 29 (1988), 271-297). Let F(x) be the distribution function of the random variables {X_1, X_2, ..., X_n} and let X_(1) < X_(2) < ... < X_(n) be the corresponding order statistics. The least squares estimators of θ and λ, denoted by $\hat\theta_{LSE}$ and $\hat\lambda_{LSE}$, are obtained by minimizing the function

$$LS(\theta,\lambda) = \sum_{i=1}^{n}\left[F(x_{(i)}\mid\theta,\lambda) - \frac{i}{n+1}\right]^{2} \tag{3.7}$$

with respect to θ and λ, where F(⋅) is given by (1.2). Equivalently, they can be obtained by solving the following non-linear equations:

$$\sum_{i=1}^{n}\left[F(x_{(i)}\mid\theta,\lambda) - \frac{i}{n+1}\right]\eta_1(x_{(i)}\mid\theta,\lambda) = 0, \qquad \sum_{i=1}^{n}\left[F(x_{(i)}\mid\theta,\lambda) - \frac{i}{n+1}\right]\eta_2(x_{(i)}\mid\theta,\lambda) = 0.$$

Consider the following weight function (see Gupta & Kundu [13]):

$$w_i = \frac{1}{Var(F(X_{(i)}))} = \frac{(n+1)^{2}(n+2)}{i(n-i+1)}.$$

The WLSEs, $\hat\theta_{WLSE}$ and $\hat\lambda_{WLSE}$, are obtained by minimizing

$$WLS(\theta,\lambda) = \sum_{i=1}^{n}\frac{(n+1)^{2}(n+2)}{i(n-i+1)}\left[F(x_{(i)}\mid\theta,\lambda) - \frac{i}{n+1}\right]^{2}. \tag{3.8}$$

These estimators can also be obtained by solving:

$$\sum_{i=1}^{n}\frac{(n+1)^{2}(n+2)}{i(n-i+1)}\left[F(x_{(i)}\mid\theta,\lambda) - \frac{i}{n+1}\right]\eta_1(x_{(i)}\mid\theta,\lambda) = 0, \qquad \sum_{i=1}^{n}\frac{(n+1)^{2}(n+2)}{i(n-i+1)}\left[F(x_{(i)}\mid\theta,\lambda) - \frac{i}{n+1}\right]\eta_2(x_{(i)}\mid\theta,\lambda) = 0,$$

where

$$\eta_1(x_{(i)}\mid\theta,\lambda) = -\frac{2\lambda x_{(i)}e^{-\lambda x_{(i)}}}{(2-\theta)^{2}}, \tag{3.9}$$

and

$$\eta_2(x_{(i)}\mid\theta,\lambda) = x_{(i)}e^{-\lambda x_{(i)}}\left(1+\frac{\lambda\theta x_{(i)}}{2-\theta}\right) - \frac{\theta x_{(i)}e^{-\lambda x_{(i)}}}{2-\theta}. \tag{3.10}$$
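The objective functions (3.7) and (3.8) and the derivatives (3.9) and (3.10) can be sketched in Python as follows. All function names and the sample values are hypothetical and serve only as an illustration. The sketch evaluates the two objectives and checks that η1 and η2 agree with finite-difference derivatives of the cdf (1.2) with respect to θ and λ.

```python
import math

def be2_cdf(x, theta, lam):
    """Distribution function (1.2)."""
    return 1.0 - (1.0 + lam * theta * x / (2.0 - theta)) * math.exp(-lam * x)

def eta1(x, theta, lam):
    """Derivative of the cdf with respect to theta, as in (3.9)."""
    return -2.0 * lam * x * math.exp(-lam * x) / (2.0 - theta) ** 2

def eta2(x, theta, lam):
    """Derivative of the cdf with respect to lam, as in (3.10)."""
    e = math.exp(-lam * x)
    return x * e * (1.0 + lam * theta * x / (2.0 - theta)) - theta * x * e / (2.0 - theta)

def ls_objective(theta, lam, xs):
    """Ordinary least-squares criterion (3.7)."""
    xs, n = sorted(xs), len(xs)
    return sum((be2_cdf(x, theta, lam) - i / (n + 1)) ** 2
               for i, x in enumerate(xs, start=1))

def wls_objective(theta, lam, xs):
    """Weighted least-squares criterion (3.8)."""
    xs, n = sorted(xs), len(xs)
    return sum((n + 1) ** 2 * (n + 2) / (i * (n - i + 1))
               * (be2_cdf(x, theta, lam) - i / (n + 1)) ** 2
               for i, x in enumerate(xs, start=1))

# Hypothetical observations and trial parameter values.
xs = [0.3, 1.2, 2.5, 0.8, 1.9, 3.1]
theta, lam, x0, h = 0.5, 1.2, 1.3, 1e-6
fd_theta = (be2_cdf(x0, theta + h, lam) - be2_cdf(x0, theta - h, lam)) / (2.0 * h)
fd_lam = (be2_cdf(x0, theta, lam + h) - be2_cdf(x0, theta, lam - h)) / (2.0 * h)
print(abs(fd_theta - eta1(x0, theta, lam)) < 1e-6,
      abs(fd_lam - eta2(x0, theta, lam)) < 1e-6)
print(ls_objective(theta, lam, xs), wls_objective(theta, lam, xs))
```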

3.4 Percentile Estimators

The percentile estimator was originally suggested by Kao [15] (J.H. Kao. Computer methods for estimating Weibull parameters in reliability studies. IRE Transactions on Reliability and Quality Control, 13 (1958), 15-22), [16] (J.H. Kao. A graphical estimation of mixed Weibull parameters in life-testing of electron tubes. Technometrics, 1 (1959), 389-407). This method is commonly used to estimate the unknown parameters of distributions whose quantile function has a closed form. The percentile estimates (PCEs) are obtained by minimizing, with respect to the unknown parameters, the Euclidean distance between the ordered sample points and the corresponding theoretical points computed through the quantile function. Since

$$F(x;\theta,\lambda) = 1 - \left(1+\frac{\lambda\theta x}{2-\theta}\right)e^{-\lambda x},$$

the p-th quantile $x_p$ satisfies

$$x_p = \frac{1}{\lambda}\log\left(\frac{2-\theta+\lambda\theta x_p}{(2-\theta)(1-p)}\right).$$

Let X_(j) be the j-th order statistic, i.e., X_(1) < X_(2) < ... < X_(n). If $p_j$ denotes an estimate of $F(x_{(j)};\theta,\lambda)$, then the estimators of θ and λ are obtained by minimizing

$$\sum_{j=1}^{n}\left[x_{(j)} - \frac{1}{\lambda}\log\left(\frac{2-\theta+\lambda\theta x_p}{(2-\theta)(1-p_j)}\right)\right]^{2} \tag{3.11}$$

with respect to θ and λ. The percentile estimators $\hat\theta_{PCE}$ and $\hat\lambda_{PCE}$ can be obtained by solving the following nonlinear equations:

$$\sum_{j=1}^{n}\left[x_{(j)} - \frac{1}{\lambda}\log\left(\frac{2-\theta+\lambda\theta x_p}{(2-\theta)(1-p_j)}\right)\right]\frac{x_p}{(2-\theta+\lambda\theta x_p)(2-\theta)} = 0,$$

$$\sum_{j=1}^{n}\left[x_{(j)} - \frac{1}{\lambda}\log\left(\frac{2-\theta+\lambda\theta x_p}{(2-\theta)(1-p_j)}\right)\right]\left[\frac{1}{\lambda^{2}}\log\left(\frac{2-\theta+\lambda\theta x_p}{(2-\theta)(1-p_j)}\right) - \frac{1}{\lambda}\,\frac{\theta x_p}{2-\theta+\lambda\theta x_p}\right] = 0.$$

In this paper, we take $p_j = j/(n+1)$. However, different estimators can be used instead; see, for example, Mann et al. (1974).
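Because $x_p$ appears on both sides of the quantile relation, one way to evaluate it numerically is to solve F(x) = p by root finding. The Python sketch below uses bisection, which is a choice made here for illustration and not necessarily the authors' procedure. The names `be2_quantile` and `pce_objective` and the parameter values are hypothetical.

```python
import math

def be2_cdf(x, theta, lam):
    """Distribution function (1.2)."""
    return 1.0 - (1.0 + lam * theta * x / (2.0 - theta)) * math.exp(-lam * x)

def be2_quantile(p, theta, lam, iterations=200):
    """Solve F(x) = p for x >= 0 by bisection (F is continuous and increasing)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    lo, hi = 0.0, 1.0
    while be2_cdf(hi, theta, lam) < p:  # expand until the root is bracketed
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if be2_cdf(mid, theta, lam) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def pce_objective(theta, lam, xs):
    """Sum of squared distances between order statistics and quantiles, p_j = j/(n+1)."""
    xs, n = sorted(xs), len(xs)
    return sum((x - be2_quantile(j / (n + 1), theta, lam)) ** 2
               for j, x in enumerate(xs, start=1))

# Hypothetical parameter values: the cdf at the computed quantile should return p.
theta, lam = 0.8, 1.0
for p in (0.1, 0.5, 0.9):
    print(p, abs(be2_cdf(be2_quantile(p, theta, lam), theta, lam) - p) < 1e-10)
```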

3.5 Method of Maximum Product of Spacings

The maximum product of spacings (MPS) method was introduced by Cheng & Amin [9] as an alternative to the MLE for estimating the unknown parameters of continuous univariate distributions. The MPS method was also derived independently by Ranneby [22] (B. Ranneby. The maximum spacing method. An estimation method related to the maximum likelihood method. Scandinavian Journal of Statistics, 11 (1984), 93-112) as an approximation to the Kullback-Leibler measure of information. The MPS estimators are as efficient as the MLEs and are consistent under more general conditions. Using the same notation as in Subsection 3.3, define the uniform spacings of a random sample from the BE2 distribution as

$$D_i(\theta,\lambda) = F(x_{i:n}\mid\theta,\lambda) - F(x_{i-1:n}\mid\theta,\lambda), \qquad i = 1, 2, \ldots, n+1,$$

where $F(x_{0:n}\mid\theta,\lambda) = 0$ and $F(x_{n+1:n}\mid\theta,\lambda) = 1$. Clearly, $\sum_{i=1}^{n+1}D_i(\theta,\lambda) = 1$.

The maximum product of spacings estimators $\hat\theta_{MPS}$ and $\hat\lambda_{MPS}$ of the parameters θ and λ are obtained by maximizing the geometric mean of the spacings,

$$G(\theta,\lambda) = \left[\prod_{i=1}^{n+1}D_i(\theta,\lambda)\right]^{\frac{1}{n+1}}, \tag{3.12}$$

or, equivalently, by maximizing the function

$$g(\theta,\lambda) = \frac{1}{n+1}\sum_{i=1}^{n+1}\log D_i(\theta,\lambda) \tag{3.13}$$

with respect to θ and λ. Although Cheng & Amin [9] (R. Cheng & N. Amin. Estimating parameters in continuous univariate distributions with a shifted origin. Journal of the Royal Statistical Society, Series B (Methodological), 45 (1983), 394-403) proved that the MPS is asymptotically equivalent to the MLE, the authors did not present a motivation for maximizing the geometric mean. However, Cheng & Stephens [10] (R. Cheng & M. Stephens. A goodness-of-fit test using Moran’s statistic with estimated parameters. Biometrika, (1989), 385-392) showed that the MPS is also a minimum goodness-of-fit estimator based on Moran’s statistic, given by

$$M(\theta,\lambda) = -\sum_{i=1}^{n+1}\log D_i(\theta,\lambda),$$

i.e., minimizing Moran’s statistic is the same as maximizing the geometric mean of the spacings. Hence, the estimators $\hat\theta_{MPS}$ and $\hat\lambda_{MPS}$ of the parameters θ and λ can be obtained by solving the nonlinear equations

$$\frac{\partial g(\theta,\lambda)}{\partial\theta} = \frac{1}{n+1}\sum_{i=1}^{n+1}\frac{1}{D_i(\theta,\lambda)}\left[\eta_1(x_{i:n}\mid\theta,\lambda) - \eta_1(x_{i-1:n}\mid\theta,\lambda)\right] = 0, \tag{3.14}$$

$$\frac{\partial g(\theta,\lambda)}{\partial\lambda} = \frac{1}{n+1}\sum_{i=1}^{n+1}\frac{1}{D_i(\theta,\lambda)}\left[\eta_2(x_{i:n}\mid\theta,\lambda) - \eta_2(x_{i-1:n}\mid\theta,\lambda)\right] = 0, \tag{3.15}$$

where η1(· | θ, λ) and η2(· | θ, λ) are given by (3.9) and (3.10), respectively.
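The spacings and the criterion (3.13) can be computed as in the following Python sketch. The names `mps_spacings` and `mps_objective` and the data values are hypothetical. Returning minus infinity when a spacing is zero (tied observations, or an observation equal to zero) is a convention adopted here for illustration, not something stated in the paper.

```python
import math

def be2_cdf(x, theta, lam):
    """Distribution function (1.2)."""
    return 1.0 - (1.0 + lam * theta * x / (2.0 - theta)) * math.exp(-lam * x)

def mps_spacings(theta, lam, xs):
    """Spacings D_1, ..., D_{n+1}, with F(x_{0:n}) = 0 and F(x_{n+1:n}) = 1."""
    cdf_values = [0.0] + [be2_cdf(x, theta, lam) for x in sorted(xs)] + [1.0]
    return [b - a for a, b in zip(cdf_values, cdf_values[1:])]

def mps_objective(theta, lam, xs):
    """Mean log-spacing g(theta, lam) in (3.13); -inf if any spacing is zero."""
    spacings = mps_spacings(theta, lam, xs)
    if min(spacings) <= 0.0:
        return float("-inf")
    return sum(math.log(d) for d in spacings) / len(spacings)

# Hypothetical observations and trial parameter values.
xs = [0.3, 1.2, 2.5, 0.8, 1.9, 3.1]
theta, lam = 0.8, 1.0
print(sum(mps_spacings(theta, lam, xs)))   # the spacings add up to one
print(mps_objective(theta, lam, xs))
```

Moran's statistic M(θ, λ) is then -(n + 1) times the value returned by `mps_objective`.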

3.6 Methods of Minimum Distances

In this subsection, we present three minimum distance estimators (also called maximum goodness-of-fit estimators) for θ and λ. This class of estimators is based on minimizing an empirical distribution function (EDF) statistic with respect to the unknown parameters [18] (A. Luceño. Fitting the generalized Pareto distribution to data using maximum goodness-of-fit estimators. Computational Statistics & Data Analysis, 51 (2006), 904-917).

3.6.1 Method of Cramér-von-Mises

To motivate our choice of Cramér-von-Mises (CVM) type minimum distance estimators, we note that MacDonald (1971) provided empirical evidence that the bias of this estimator is smaller than that of the other minimum distance estimators. The proposed estimators are based on the Cramér-von-Mises statistic, given by

$$W_n^{2} = n\int_{-\infty}^{\infty}\left[F(x) - E_n(x)\right]^{2}dF(x),$$

where $E_n(\cdot)$ is the empirical distribution function. Boos [7] (D.D. Boos. Minimum distance estimators for location and goodness of fit. Journal of the American Statistical Association, 76 (1981), 663-670) presented a detailed discussion of this CVM estimator, together with its computational form, which is given by

$$C(\theta,\lambda) = \frac{1}{12n} + \sum_{i=1}^{n}\left[F(x_{(i)}\mid\theta,\lambda) - \frac{2i-1}{2n}\right]^{2}. \tag{3.16}$$

The CME estimators are then obtained by minimizing (3.16) with respect to θ and λ. These estimators can also be obtained by solving the following non-linear equations:

$$\sum_{i=1}^{n}\left[F(x_{(i)}\mid\theta,\lambda) - \frac{2i-1}{2n}\right]\eta_1(x_{(i)}\mid\theta,\lambda) = 0, \qquad \sum_{i=1}^{n}\left[F(x_{(i)}\mid\theta,\lambda) - \frac{2i-1}{2n}\right]\eta_2(x_{(i)}\mid\theta,\lambda) = 0,$$

where η1(· | θ, λ) and η2(· | θ, λ) are given by (3.9) and (3.10), respectively.
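The computational form (3.16) is straightforward to evaluate. The following Python sketch is for illustration only; the name `cvm_objective` and the data values are hypothetical.

```python
import math

def be2_cdf(x, theta, lam):
    """Distribution function (1.2)."""
    return 1.0 - (1.0 + lam * theta * x / (2.0 - theta)) * math.exp(-lam * x)

def cvm_objective(theta, lam, xs):
    """Cramér-von-Mises criterion C(theta, lam) in (3.16)."""
    xs, n = sorted(xs), len(xs)
    return 1.0 / (12.0 * n) + sum(
        (be2_cdf(x, theta, lam) - (2 * i - 1) / (2.0 * n)) ** 2
        for i, x in enumerate(xs, start=1))

# Hypothetical observations; the criterion can never fall below 1 / (12 n).
xs = [0.3, 1.2, 2.5, 0.8, 1.9, 3.1]
print(cvm_objective(0.8, 1.0, xs), 1.0 / (12.0 * len(xs)))
```

With a single observation at the median of an exponential distribution (θ = 0, λ = 1, x = log 2), the squared term vanishes and the criterion reduces to 1/12.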

3.6.2 Methods of Anderson-Darling and Right-tail Anderson-Darling

The Anderson-Darling estimator is another type of minimum distance estimator and is based on the Anderson-Darling statistic (Anderson & Darling [2], [3]; T.W. Anderson & D.A. Darling. Asymptotic theory of certain “goodness of fit” criteria based on stochastic processes. The Annals of Mathematical Statistics, 23(2) (1952), 193-212; T.W. Anderson & D.A. Darling. A test of goodness of fit. Journal of the American Statistical Association, 49 (1954), 765-769). The Anderson-Darling statistic is given by

$$ADS_n^{2} = n\int_{-\infty}^{\infty}\frac{\left[F(x) - E_n(x)\right]^{2}}{F(x)\left(1-F(x)\right)}dF(x).$$

Boos [7] also discussed the properties of the AD estimators and presented their computational form, which is given by

$$A(\theta,\lambda) = -n - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\left[\log F(x_{(i)}\mid\theta,\lambda) + \log\bar{F}(x_{(n+1-i)}\mid\theta,\lambda)\right], \tag{3.17}$$

where $\bar{F} = 1 - F = S$ is the survival function. Therefore, the Anderson-Darling estimators $\hat\theta_{ADE}$ and $\hat\lambda_{ADE}$ of the parameters θ and λ are obtained by minimizing (3.17) with respect to θ and λ. Analogously, these estimators can also be obtained by solving the following non-linear equations:

$$\sum_{i=1}^{n}(2i-1)\left[\frac{\eta_1(x_{(i)}\mid\theta,\lambda)}{F(x_{(i)}\mid\theta,\lambda)} - \frac{\eta_1(x_{(n+1-i)}\mid\theta,\lambda)}{S(x_{(n+1-i)}\mid\theta,\lambda)}\right] = 0, \qquad \sum_{i=1}^{n}(2i-1)\left[\frac{\eta_2(x_{(i)}\mid\theta,\lambda)}{F(x_{(i)}\mid\theta,\lambda)} - \frac{\eta_2(x_{(n+1-i)}\mid\theta,\lambda)}{S(x_{(n+1-i)}\mid\theta,\lambda)}\right] = 0,$$

where η1(· | θ, λ) and η2(· | θ, λ) are given by (3.9) and (3.10), respectively.

Further, Luceño [18] discussed modifications of the standard AD statistic. The most used statistic [11], [17], [21] is the right-tail AD statistic, given by

$$RADS_n^{2} = n\int_{-\infty}^{\infty}\frac{\left[F(x) - E_n(x)\right]^{2}}{1-F(x)}dF(x).$$

Additionally, its computational form was presented as

$$R(\theta,\lambda) = \frac{n}{2} - 2\sum_{i=1}^{n}F(x_{(i)}\mid\theta,\lambda) - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\log S(x_{n+1-i:n}\mid\theta,\lambda). \tag{3.18}$$

Hence, the right-tail Anderson-Darling estimators $\hat\theta_{RADE}$ and $\hat\lambda_{RADE}$ of the parameters θ and λ are obtained by minimizing (3.18) with respect to θ and λ. These estimators can also be obtained by solving the following non-linear equations:

$$-2\sum_{i=1}^{n}\eta_1(x_{(i)}\mid\theta,\lambda) + \frac{1}{n}\sum_{i=1}^{n}(2i-1)\frac{\eta_1(x_{n+1-i:n}\mid\theta,\lambda)}{S(x_{n+1-i:n}\mid\theta,\lambda)} = 0, \qquad -2\sum_{i=1}^{n}\eta_2(x_{(i)}\mid\theta,\lambda) + \frac{1}{n}\sum_{i=1}^{n}(2i-1)\frac{\eta_2(x_{n+1-i:n}\mid\theta,\lambda)}{S(x_{n+1-i:n}\mid\theta,\lambda)} = 0,$$

where η1(· | θ, λ) and η2(· | θ, λ) are given by (3.9) and (3.10), respectively.
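The computational forms (3.17) and (3.18) can be sketched in Python as follows. The names `ad_objective` and `rad_objective` and the data values are hypothetical and used only for illustration. Both functions assume strictly positive observations, since log F(x) is not finite at x = 0.

```python
import math

def be2_cdf(x, theta, lam):
    """Distribution function (1.2)."""
    return 1.0 - (1.0 + lam * theta * x / (2.0 - theta)) * math.exp(-lam * x)

def be2_sf(x, theta, lam):
    """Survival function S(x) = 1 - F(x)."""
    return (1.0 + lam * theta * x / (2.0 - theta)) * math.exp(-lam * x)

def ad_objective(theta, lam, xs):
    """Anderson-Darling criterion A(theta, lam) in (3.17)."""
    xs, n = sorted(xs), len(xs)
    total = sum((2 * i - 1) * (math.log(be2_cdf(xs[i - 1], theta, lam))
                               + math.log(be2_sf(xs[n - i], theta, lam)))
                for i in range(1, n + 1))
    return -n - total / n

def rad_objective(theta, lam, xs):
    """Right-tail Anderson-Darling criterion R(theta, lam) in (3.18)."""
    xs, n = sorted(xs), len(xs)
    sum_cdf = sum(be2_cdf(x, theta, lam) for x in xs)
    sum_log = sum((2 * i - 1) * math.log(be2_sf(xs[n - i], theta, lam))
                  for i in range(1, n + 1))
    return n / 2.0 - 2.0 * sum_cdf - sum_log / n

# Hypothetical observations and trial parameter values.
xs = [0.3, 1.2, 2.5, 0.8, 1.9, 3.1]
print(ad_objective(0.8, 1.0, xs), rad_objective(0.8, 1.0, xs))
```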

4 SIMULATION STUDY

In this section, we conduct Monte Carlo simulation studies to compare the performance of the frequentist estimators discussed in the previous sections. Using the mixture representation described in Section 2, the values of the BE2 distribution were generated using the following algorithm:

1. Generate Ui ~ Uniform(0, 1), i = 1, …, n;

2. Generate Xi ~ Gamma(2, λ), i = 1, ..., n;

3. Generate Yi ~ Exponential(λ), i = 1, ..., n;

4. If Ui ≤ p = θ/(2 - θ), then set Ti = Xi; otherwise, set Ti = Yi, i = 1, ..., n.
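The four steps above can be sketched in Python with the standard `random` module. This is an illustration only, since the authors report using R. The name `be2_sample` is hypothetical, and the sketch assumes that Gamma(2, λ) denotes shape 2 and rate λ, so that the scale passed to `gammavariate` is 1/λ.

```python
import random

def be2_sample(n, theta, lam, rng):
    """Draw n values from BE2(theta, lam) through the mixture representation."""
    p = theta / (2.0 - theta)                  # mixing proportion
    sample = []
    for _ in range(n):
        u = rng.random()                       # step 1: U ~ Uniform(0, 1)
        x = rng.gammavariate(2.0, 1.0 / lam)   # step 2: shape 2, scale 1/lam
        y = rng.expovariate(lam)               # step 3: rate lam
        sample.append(x if u <= p else y)      # step 4: choose the component
    return sample

# The sample mean should be close to the theoretical mean 2 / (lam (2 - theta)).
rng = random.Random(2017)
theta, lam = 0.8, 1.0
draws = be2_sample(100000, theta, lam, rng)
print(sum(draws) / len(draws), 2.0 / (lam * (2.0 - theta)))
```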

We evaluate the performance of the estimators of the BE2 distribution based on bias and MSE. The following procedure is adopted to evaluate the performance of the estimators:

1. Generate a pseudo-random sample of size n from the BE2(θ, λ) distribution.

2. Using the values obtained in step 1, calculate $\hat\theta$ and $\hat\lambda$ via MLE, ME, LSE, WLSE, PCE, MPS, CME, ADE and RADE.

3. Repeat steps 1 and 2 N times.

4. Using $\hat\Theta = (\hat\theta, \hat\lambda)$ and Θ = (θ, λ), compute the Bias $\frac{1}{N}\sum_{i=1}^{N}(\hat\Theta_{i,j} - \Theta_j)$ and the mean square error (MSE) $\frac{1}{N}\sum_{i=1}^{N}(\hat\Theta_{i,j} - \Theta_j)^{2}$ for j = 1, 2.
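The procedure above can be sketched in Python for a single estimator. The sketch below uses the closed-form moment estimators (3.6) because they need no optimizer. The function names, the small number of replications and the rule of skipping samples for which the moment estimate does not exist are all assumptions made here for illustration, not the authors' code.

```python
import math
import random
import statistics

def be2_sample(n, theta, lam, rng):
    """Mixture sampler: Gamma(2, rate lam) with probability theta / (2 - theta)."""
    p = theta / (2.0 - theta)
    return [rng.gammavariate(2.0, 1.0 / lam) if rng.random() <= p
            else rng.expovariate(lam) for _ in range(n)]

def moment_estimates(xs):
    """Moment estimators (3.6); None when theta would be complex or above 1."""
    xbar, s = statistics.mean(xs), statistics.stdev(xs)
    radicand = 2.0 - 2.0 * (s / xbar) ** 2
    if not 0.0 <= radicand <= 1.0:
        return None
    theta = math.sqrt(radicand)
    return theta, 2.0 / (xbar * (2.0 - theta))

def bias_and_mse(true_params, n, n_rep, seed):
    """Monte Carlo Bias and MSE of (theta, lam); failed samples are skipped."""
    rng = random.Random(seed)
    estimates, failures = [], 0
    for _ in range(n_rep):
        est = moment_estimates(be2_sample(n, true_params[0], true_params[1], rng))
        if est is None:
            failures += 1
        else:
            estimates.append(est)
    m = len(estimates)
    bias = [sum(e[j] - true_params[j] for e in estimates) / m for j in range(2)]
    mse = [sum((e[j] - true_params[j]) ** 2 for e in estimates) / m for j in range(2)]
    return bias, mse, failures

# Small illustrative run (far fewer replications than in the paper).
bias, mse, failures = bias_and_mse((0.8, 1.0), n=100, n_rep=500, seed=2017)
print(bias, mse, failures)
```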

For a well-performing estimator, the Bias and the MSE are expected to be close to zero. The results were computed using the software R (R Core Development Team). The seed used to generate the random values is 2017. The values chosen for this procedure are Θ = (θ, λ) = (0.8, 1), N = 500,000 and n = (15, 20, 25, …, 130).

The estimation methods are put under the same conditions (initial values and random samples). The initial values used to start the iterative methods are the true parameter values. The estimates are obtained by applying the maxBFGS function to (3.2), (3.13), (3.17) and (3.18). This function is available in the maxLik package [14] (A. Henningsen & O. Toomet. Maxlik: A package for maximum likelihood estimation in R. Computational Statistics, 26 (2011), 443-458. http://dx.doi.org/10.1007/s00180-010-0217-1). The nls function, available in the stats package, is used for (3.7), (3.8), (3.11) and (3.16). The programs can be obtained upon request.

In order to present a fair comparison, the estimation procedures are performed under the same conditions. However, for some samples and estimation procedures, the numerical techniques fail to find the parameter estimates. Hence, a numerical study is conducted to verify the frequency of convergence of each estimation method by counting the number of times that each method fails to find a numerical solution. In Figure 1, we present the proportion of failures for each estimation method.

Figure 1
Rate of convergence considering different values of Θ and different estimation procedures: 1-ME, 2-MLE, 3-MPS, 4-ADE, 5-RADE, 6-PCE, 7-LSE, 8-WLSE, 9-CME.

From Figure 1, the PCE, LSE, WLSE and CME estimators show a high proportion of failures in the numerical procedures. Because of these high failure rates, we discarded these estimators from our simulation study. Figure 2 shows the Bias and MSEs of the estimates of λ and θ obtained with the remaining estimation methods for 500,000 simulated samples and different values of n. Owing to space constraints, we present results only for λ = 1 and θ = 0.8; the results are similar for other choices of λ and θ.

Figure 2
Bias and MSEs of the estimates of λ and θ for N simulated samples, considering different values of n, obtained using the following estimation methods: 1-ME, 2-MLE, 3-MPS, 4-ADE, 5-RADE.

From the graphs we can see that the Bias and the MSE of all estimators tend to zero for large n, i.e., the estimators are asymptotically unbiased for the parameters. The MPS estimator is shown to be superior to the MLE; this result is consistent with other studies [21], [24] (V.K. Sharma, S.K. Singh, U. Singh & F. Merovci. The generalized inverse Lindley distribution: A new inverse statistical model for the study of upside-down bathtub data. Communications in Statistics-Theory and Methods, 45 (2016), 5709-5729). It is worth mentioning that, although the ME has closed-form expressions, we may have $2(s/\bar{x})^{2} > 2$, in which case the estimate of θ is a complex number. The ME and the MLE show positive bias for both parameters, while the MPS shows negative bias. Overall, the ADE and RADE provide better estimates than their counterparts in terms of Bias and MSE. Although the ADE and RADE show almost the same results, the ADE has desirable properties such as robustness, consistency and asymptotic normality [7], [8] (D.D. Boos. Minimum Anderson-Darling estimation. Communications in Statistics-Theory and Methods, 11 (1982), 2747-2774) and may be used for estimating the BE2 distribution parameters. Rodrigues et al. [23] (G.C. Rodrigues, F. Louzada & P.L. Ramos. Poisson-exponential distribution: different methods of estimation. Journal of Applied Statistics, (2016), pp. 1-17. Available online: http://dx.doi.org/10.1080/02664763.2016.1268571) observed a similar result for the Poisson-exponential distribution, where the ADE was the best method of estimation.

Figure 3
Bias and MSEs of the estimates of λ and θ for N simulated samples, considering different values of n, obtained using the following estimation methods: 1-ME, 2-MLE, 3-MPS, 4-ADE, 5-RADE.

5 APPLICATIONS

Located in southeastern Brazil, São Carlos is a city of 238,958 inhabitants. The city has an active industrial profile and high agricultural importance. Therefore, the study of the behaviour of dry and wet periods has proved to be strategic and economically significant for regional development. From Figure 4, we observe that the city has rainy periods from October to March, while the period from June to August is drier.

Figure 4
Average of the total monthly rainfall from January to December at São Carlos, Brazil.

Consequently, predicting the behavior of the transition periods between rainy seasons (April, May and September) enables agriculturists to prepare for different problems, such as water scarcity. In this paper, we consider three real data sets related to the total monthly rainfall during April, May and September at São Carlos. The data sets (see the Appendix for more details) were obtained from the Department of Water Resources and Power, the agency that manages the water resources of the State of São Paulo, and cover the period from 1960 to 2014.

5.1 Initial Values

Finding good initial values to start iterative procedures is an important problem in numerical analysis. In Section 4 we used the true values to start the iterative procedures, but in any application these values are unknown. A good choice of initial values would be the MEs (3.6), since these estimators have closed-form expressions. However, we have shown in Section 4 that the MEs cannot be computed in some cases.

Two problems can arise when applying the MEs. First, we may have $2(s/\bar{x})^{2} > 2$, in which case the estimate of θ is a complex number. Second, we may have $2(s/\bar{x})^{2} < 1$, in which case the estimate of θ is greater than 1. The first problem can be overcome by taking the absolute value of $2 - 2(s/\bar{x})^{2}$, while the second can be overcome by taking the minimum of $\sqrt{|2 - 2(s/\bar{x})^{2}|}$ and 1. Therefore, we use the modified estimator given by

$$\tilde\theta = \min\left(\sqrt{\left|2 - 2\left(\frac{s}{\bar{x}}\right)^{2}\right|},\ 1\right) \quad\text{and}\quad \tilde\lambda = \frac{2}{\bar{x}\,(2-\tilde\theta)}, \tag{5.1}$$

where |x| denotes the absolute value of x. In this case, $\tilde\lambda$ and $\tilde\theta$ can always be computed. Note that we suggest this estimator only as an initial value for the iterative methods.
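A minimal Python sketch of such a starting-value rule is given below. It is illustrative only and rests on assumptions: the function name `be2_initial_values` is hypothetical, s is computed with the n - 1 divisor, and the starting value for λ is taken by matching the sample mean to $2/(\lambda(2-\theta))$ from (3.5).

```python
import math
import statistics

def be2_initial_values(xs):
    """Starting values: theta from |2 - 2 (s / xbar)^2| capped at 1, lam from the mean."""
    xbar = statistics.mean(xs)
    s = statistics.stdev(xs)
    theta0 = min(math.sqrt(abs(2.0 - 2.0 * (s / xbar) ** 2)), 1.0)
    lam0 = 2.0 / (xbar * (2.0 - theta0))   # assumption: matches E(X) in (3.5)
    return theta0, lam0

# Hypothetical samples: low dispersion, and high dispersion including zeros.
print(be2_initial_values([1.0, 1.1, 0.9, 1.0]))
print(be2_initial_values([0.0, 0.0, 0.4, 2.5, 7.0]))
```

Both calls return a value of θ in [0, 1] and a positive λ, which is all that is needed to start an iterative method.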

5.2 Discrimination Criterion Methods

Here, different discrimination criteria based on the log-likelihood function are considered. Let k be the number of parameters to be fitted and $\hat\Theta$ the MLE of Θ. The discrimination criteria are, respectively: the Akaike information criterion, AIC = $-2\ell(\hat\Theta;\boldsymbol{x}) + 2k$; the corrected Akaike information criterion, AICC = AIC + 2k(k + 1)/(n - k - 1); the Hannan-Quinn information criterion, HQIC = $-2\ell(\hat\Theta;\boldsymbol{x}) + 2k\log(\log(n))$; and the consistent Akaike information criterion, CAIC = $-2\ell(\hat\Theta;\boldsymbol{x}) + k(\log(n) + 1)$. The best model is the one that provides the minimum values of these criteria. The Kolmogorov-Smirnov (KS) test is also considered in order to check the goodness of fit of the models. This procedure is based on the KS statistic $D_n = \sup_x |F_n(x) - F(x;\theta,\lambda)|$, where $\sup_x$ is the supremum of the set of distances, $F_n(x)$ is the empirical distribution function and F(x; θ, λ) is the cdf. The null hypothesis is that the data come from F(x; θ, λ); at a significance level of 5%, it is rejected if the p-value is smaller than 0.05.
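The criteria and the KS statistic can be computed as in the following Python sketch. The function names and the numerical inputs are hypothetical and serve only as an illustration. The KS statistic is evaluated with the usual order-statistic formula, and the p-value computation is not included.

```python
import math

def information_criteria(loglik, k, n):
    """AIC, AICC, HQIC and CAIC from a maximized log-likelihood."""
    aic = -2.0 * loglik + 2.0 * k
    return {
        "AIC": aic,
        "AICC": aic + 2.0 * k * (k + 1) / (n - k - 1),
        "HQIC": -2.0 * loglik + 2.0 * k * math.log(math.log(n)),
        "CAIC": -2.0 * loglik + k * (math.log(n) + 1.0),
    }

def ks_statistic(xs, cdf):
    """D_n = sup_x |F_n(x) - F(x)|, evaluated at the order statistics."""
    xs, n = sorted(xs), len(xs)
    return max(max(i / n - cdf(x), cdf(x) - (i - 1) / n)
               for i, x in enumerate(xs, start=1))

# Hypothetical inputs: log-likelihood -100 with k = 2 parameters and n = 55.
print(information_criteria(-100.0, 2, 55))

# KS distance of two points from the Uniform(0, 1) distribution function.
print(ks_statistic([0.25, 0.75], lambda x: x))
```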

For the sake of comparison, the results obtained with the BE2 distribution are compared with those of the Weibull, Gamma, Lognormal, Gumbel and Generalized Exponential [13] distributions and with the nonparametric survival function.

5.3 Results

The data sets for May and September contain zero values, i.e., months with no precipitation. Such data do not allow us to fit popular distributions such as the Gamma, Weibull, Lognormal and Generalized Exponential, since they are defined only for x > 0. To overcome this problem, we replace 0.0 in the data sets by 0.1. Although this is not a standard procedure, without this change we would not be able to fit these common distributions. Nadarajah & Haghighi [20] (S. Nadarajah & F. Haghighi. An extension of the exponential distribution. Statistics, 45 (2011), 543-558) observed that the maximum likelihood estimate of the shape parameter is non-unique for the Gamma, Weibull and Generalized Exponential distributions if the data set contains zeros, and therefore none of these three distributions can fit this kind of data set. On the other hand, the BE2 distribution is defined for x ≥ 0, which allows us to use the original values in the presence of zeros.

Table 1 presents the results for the AIC, AICC, HQIC and CAIC criteria and the p-values of the KS test for the different probability distributions. Figure 5 shows the survival functions fitted by the different distributions together with the non-parametric survival estimator.

Table 1
Results of the AIC, AICC, HQIC and CAIC criteria and the p-values of KS statistic for different probability distributions considering the data sets related to the total monthly rainfall during April, May and September at São Carlos.

Figure 5
Survival function adjusted by different distributions and a non-parametric method considering the data sets related to the total monthly rainfall during April, May and September at São Carlos.

Comparing the empirical survival function with the fitted distributions, we observe that the BE2 distribution fits best among the chosen models. This result is confirmed by the AIC, AICC, HQIC and CAIC criteria, for which the BE2 distribution has the minimum values. We considered parametric bootstrap confidence intervals [12] in order to build the confidence intervals for the parameters of the BE2 distribution using the Anderson-Darling estimates.
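A parametric bootstrap interval of this kind can be sketched as below. This is a minimal sketch rather than the procedure used for Table 2: the percentile form of the interval, the function names and the exponential stand-in model are all assumptions. For the BE2 distribution one would plug in a BE2 sampler and the Anderson-Darling estimator, and apply the same steps to each of θ and λ.

```python
import random


def parametric_bootstrap_ci(estimate, sampler, estimator, n,
                            n_boot=1000, level=0.95, seed=0):
    """Percentile interval from a parametric bootstrap.

    sampler(estimate, n, rng) draws n values from the fitted model;
    estimator(sample) re-estimates the parameter from that sample.
    """
    rng = random.Random(seed)
    reps = sorted(estimator(sampler(estimate, n, rng)) for _ in range(n_boot))
    alpha = 1.0 - level
    lower = reps[int((alpha / 2.0) * (n_boot - 1))]
    upper = reps[int((1.0 - alpha / 2.0) * (n_boot - 1))]
    return lower, upper


# Stand-in model (not BE2): exponential with rate estimated by 1 / mean.
def exp_sampler(rate, n, rng):
    return [rng.expovariate(rate) for _ in range(n)]


def exp_rate_estimator(sample):
    return len(sample) / sum(sample)


# Hypothetical point estimate and sample size.
ci = parametric_bootstrap_ci(0.02, exp_sampler, exp_rate_estimator, n=50)
```

The fixed seed only makes the sketch reproducible; in practice the number of bootstrap replicates would be chosen according to the precision required.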

Table 2 displays the MLEs and 95% confidence intervals for θ and λ of the BE2 distribution.

Table 2
MLE, 95% confidence intervals for θ and λ considering the data sets related to the total monthly rainfall during April, May and September at São Carlos.

The quantile-quantile (Q-Q) plot is a graphical technique which provides an assessment of goodness of fit. If the data set comes from the proposed distribution, the points should fall approximately along the 45-degree reference line. Figures 6 and 7 display the histogram and the Q-Q plot for the proposed data sets.

From Figure 7, we observe that the points lie approximately along the reference line. Therefore, the proposed methodology suggests that the data on the total monthly rainfall during April, May and September at São Carlos are well described by the binomial-exponential 2 distribution.
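The points of such a Q-Q plot can be computed as in the sketch below. Several assumptions are made: the BE2 c.d.f. is written in the form implied by the random-sum representation in the Introduction with binomial size 2, the plotting positions are (i - 0.5)/n, the quantiles are obtained by bisection, and the parameter values are illustrative rather than fitted. The five observations are the first April values from Appendix A.

```python
import math


def be2_cdf(x, theta, lam):
    """Assumed BE2 c.d.f.: mixture of Exp(lam) and Gamma(2, lam) lifetimes."""
    return 1.0 - (1.0 + theta * lam * x / (2.0 - theta)) * math.exp(-lam * x)


def quantile(cdf, p, tol=1e-9):
    """Invert a continuous c.d.f. on [0, inf) by bisection."""
    lo, hi = 0.0, 1.0
    while cdf(hi) < p:  # expand the bracket until it contains the quantile
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def qq_points(data, cdf):
    """Pairs (theoretical quantile, ordered observation) for a Q-Q plot."""
    xs = sorted(data)
    n = len(xs)
    return [(quantile(cdf, (i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]


# Illustrative parameter values only; they are not the fitted estimates.
theta, lam = 0.5, 0.02
points = qq_points([59.0, 102.2, 17.3, 23.0, 50.6],
                   lambda x: be2_cdf(x, theta, lam))
```

Plotting the pairs in `points` against the line y = x gives the Q-Q plot; points close to that line indicate agreement between the data and the fitted model.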

Figure 6
Histogram from the data sets related to the total monthly rainfall.

6 CONCLUSIONS

In this paper, the parameters of the binomial-exponential 2 distribution are estimated by nine methods of estimation, namely, maximum likelihood, moments, percentile, least squares, weighted least squares, maximum product of spacings, Cramér-von Mises, Anderson-Darling and right-tail Anderson-Darling. As it is not feasible to compare these methods of estimation theoretically, we have presented the results of a simulation study in order to identify the most efficient procedure. The simulation results show that the Anderson-Darling estimators outperform the other procedures, including the maximum likelihood method, for estimating the parameters of the BE2 distribution. The proposed methodology is applied to three real data sets related to the total monthly rainfall during April, May and September at São Carlos, Brazil, demonstrating that the BE2 distribution can be used as an alternative to some well-known distributions for weather-related data.

Figure 7
Q-Q plot from the data sets related to the total monthly rainfall.

ACKNOWLEDGEMENTS

The authors are very grateful to the Editor and the reviewers for their helpful and useful comments that improved the manuscript.

References

  • 1
    M.R. Alkasasbeh & M.Z. Raqab. Estimation of the generalized logistic distribution parameters: Comparative study. Statistical Methodology, 6 (2009), 262-279.
  • 2
    T.W. Anderson & D.A. Darling. Asymptotic theory of certain “goodness of fit” criteria based on stochastic processes. The Annals of Mathematical Statistics, 23(2) (1952), 193-212.
  • 3
    T.W. Anderson & D.A. Darling. A test of goodness of fit. Journal of the American Statistical Association, 49 (1954), 765-769.
  • 4
    A. Asgharzadeh, H.S. Bakouch & M. Habibi. A generalized binomial exponential 2 distribution: modeling and applications to hydrologic events. Journal of Applied Statistics, (2016), 1-20. Available online: http://dx.doi.org/10.1080/02664763.2016.1254729
  • 5
    A. Asgharzadeh, R. Rezaie & M. Abdi. Comparisons of methods of estimation for the half-logistic distribution. Selcuk Journal of Applied Mathematics, Special Issue (2011), 93-108.
  • 6
    H.S. Bakouch, M.A. Jazi, S. Nadarajah, A. Dolati & R. Roozegar. A lifetime model with increasing failure rate. Applied Mathematical Modelling, 38 (2014), 5392-5406.
  • 7
    D.D. Boos. Minimum distance estimators for location and goodness of fit. Journal of the American Statistical Association, 76 (1981), 663-670.
  • 8
    D.D. Boos. Minimum Anderson-Darling estimation. Communications in Statistics-Theory and Methods, 11 (1982), 2747-2774.
  • 9
    R. Cheng & N. Amin. Estimating parameters in continuous univariate distributions with a shifted origin. Journal of the Royal Statistical Society, Series B (Methodological), 45 (1983), 394-403.
  • 10
    R. Cheng & M. Stephens. A goodness-of-fit test using Moran’s statistic with estimated parameters. Biometrika, (1989), 385-392.
  • 11
    S. Dey, T. Dey & D. Kundu. Two-parameter Rayleigh distribution: different methods of estimation. American Journal of Mathematical and Management Sciences, 33 (2014), 55-74.
  • 12
    B. Efron & R.J. Tibshirani. An introduction to the bootstrap. CRC press, (1994), 1st Edition, 436 p.
  • 13
    R.D. Gupta & D. Kundu. Generalized exponential distribution: Different method of estimations. Journal of Statistical Computation and Simulation, 69 (2001), 315-337.
  • 14
    A. Henningsen & O. Toomet. Maxlik: A package for maximum likelihood estimation in R. Computational Statistics, 26 (2011), 443-458. http://dx.doi.org/10.1007/s00180-010-0217-1
  • 15
    J.H. Kao. Computer methods for estimating Weibull parameters in reliability studies. IRE Transactions on Reliability and Quality Control, 13 (1958), 15-22.
  • 16
    J.H. Kao. A graphical estimation of mixed Weibull parameters in life-testing of electron tubes. Technometrics, 1 (1959), 389-407.
  • 17
    F. Louzada, P.L. Ramos & G.S. Perdoná. Different estimation procedures for the parameters of the extended exponential geometric distribution for medical data. Computational and Mathematical Methods in Medicine, 2016 (2016). Article ID 8727951, 12 pages. https://doi.org/10.1155/2016/8727951
  • 18
    A. Luceño. Fitting the generalized Pareto distribution to data using maximum goodness-of-fit estimators. Computational Statistics & Data Analysis, 51 (2006), 904-917.
  • 19
    J. Mazucheli, F. Louzada & M. Ghitany. Comparison of estimation methods for the parameters of the weighted Lindley distribution. Applied Mathematics and Computation, 220 (2013), 463-471.
  • 20
    S. Nadarajah & F. Haghighi. An extension of the exponential distribution. Statistics, 45 (2011), 543-558.
  • 21
    P. Ramos & F. Louzada. The generalized weighted Lindley distribution: Properties, estimation and applications. Cogent Mathematics, 3 (2016), 1-18.
  • 22
    B. Ranneby. The maximum spacing method. An estimation method related to the maximum likelihood method. Scandinavian Journal of Statistics, 11 (1984), 93-112.
  • 23
    G.C. Rodrigues, F. Louzada & P.L. Ramos. Poisson-exponential distribution: different methods of estimation. Journal of Applied Statistics, (2016), pp. 1-17. Available online: http://dx.doi.org/10.1080/02664763.2016.1268571
  • 24
    V.K. Sharma, S.K. Singh, U. Singh & F. Merovci. The generalized inverse Lindley distribution: A new inverse statistical model for the study of upside-down bathtub data. Communications in Statistics-Theory and Methods, 45 (2016), 5709-5729.
  • 25
    J.J. Swain, S. Venkatraman & J.R. Wilson. Least-squares estimation of distribution functions in Johnson’s translation system. Journal of Statistical Computation and Simulation, 29 (1988), 271-297.
  • 26
    M. Teimouri, S.M. Hoseini & S. Nadarajah. Comparison of estimation methods for the Weibull distribution. Statistics, 47 (2013), 93-109.

7 APPENDIX A - DATA SET

  • April: 59.00, 102.20, 17.30, 23.00, 50.60, 27.00, 203.00, 40.90, 53.00, 177.40, 94.60, 129.40, 76.00, 93.20, 22.80, 98.80, 77.70, 204.20, 16.90, 55.10, 103.90, 34.90, 39.70, 137.70, 104.20, 117.60, 17.10, 120.80, 164.90, 50.20, 172.80, 58.50, 112.40, 24.50, 32.80, 64.00, 72.10, 139.30, 0.50, 70.90, 0.80, 82.70, 108.60, 32.30, 13.60, 25.70, 135.80, 136.80, 89.70, 139.20, 102.80, 97.30, 60.60.

  • May: 63.40, 41.70, 0.00, 0.00, 47.30, 31.50, 172.80, 93.50, 0.00, 60.10, 23.00, 90.10, 50.50, 67.50, 4.70, 7.10, 93.50, 0.20, 82.20, 112.90, 7.10, 35.50, 81.50, 202.60, 56.10, 19.20, 69.10, 133.00, 111.40, 25.90, 33.50, 46.80, 54.60, 43.00, 46.50, 83.60, 73.50, 18.00, 16.30, 70.00, 56.30, 70.90, 183.70, 78.20, 6.20, 86.00, 66.10, 72.80, 20.90, 17.20, 113.90, 169.60, 22.10.

  • September: 26.40, 12.50, 1.00, 44.80, 0.00, 74.20, 179.50, 76.70, 269.50, 49.00, 306.80, 102.70, 73.50, 35.20, 72.70, 28.80, 49.30, 132.00, 151.50, 39.70, 136.20, 112.00, 17.70, 11.60, 225.20, 102.60, 27.10, 17.50, 6.70, 82.20, 40.70, 54.60, 115.50, 89.50, 0.00, 17.00, 127.40, 41.70, 43.10, 84.70, 102.50, 120.90, 80.10, 18.10, 5.30, 59.50, 26.80, 0.00, 34.30, 101.10, 60.30, 31.50, 60.40, 45.30, 49.50, 70.44.

Publication Dates

  • Publication in this collection
    May-Aug 2017

History

  • Received
    08 Sept 2016
  • Accepted
    13 May 2017
Sociedade Brasileira de Matemática Aplicada e Computacional Rua Maestro João Seppe, nº. 900, 16º. andar - Sala 163 , 13561-120 São Carlos - SP, Tel. / Fax: (55 16) 3412-9752 - São Carlos - SP - Brazil
E-mail: sbmac@sbmac.org.br