
On Posterior Properties of the Two Parameter Gamma Family of Distributions

Abstract

The gamma distribution has been used extensively in many areas of application. In this paper, considering a Bayesian analysis, we provide necessary and sufficient conditions to check whether or not improper priors lead to proper posterior distributions. We also discuss sufficient conditions to verify whether the resulting posterior moments are finite. An interesting aspect of our findings is that one can check whether the posterior is proper, and whether its moments are finite, by looking directly at the behavior of the proposed improper prior. To illustrate the proposed methodology, these results are applied to different objective priors.

Key words
Gamma distribution; improper prior; objective prior; posterior property

1 - INTRODUCTION

The gamma distribution is one of the best-known distributions in statistical analysis. It arises naturally in many areas such as environmental analysis, reliability analysis, clinical trials, signal processing and other physical settings. Let X be a non-negative random variable following the gamma distribution with density

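$$f(x\mid\alpha,\beta)=\frac{\beta^{\alpha}}{\Gamma(\alpha)}\,x^{\alpha-1}e^{-\beta x},\qquad(1)$$
where $\alpha>0$ and $\beta>0$ are the unknown shape and rate (inverse scale) parameters, respectively, and $\Gamma(\phi)=\int_0^{\infty}e^{-x}x^{\phi-1}\,dx$ is the gamma function.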
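As a quick numerical check of the parameterization in (1), the sketch below evaluates the log-density with Python's standard library and integrates it on a grid. It assumes, as (1) implies, that β multiplies x in the exponent, so the total mass should be 1 and the mean should be α/β; the function names are illustrative and not taken from the paper.

```python
import math


def gamma_logpdf(x: float, alpha: float, beta: float) -> float:
    """Log of density (1): beta^alpha / Gamma(alpha) * x^(alpha - 1) * exp(-beta * x)."""
    if x <= 0.0:
        return -math.inf
    return (alpha * math.log(beta) - math.lgamma(alpha)
            + (alpha - 1.0) * math.log(x) - beta * x)


def mass_and_mean(alpha: float, beta: float, upper: float = 40.0, steps: int = 100_000):
    """Midpoint-rule approximations of the total mass and of the mean on (0, upper)."""
    h = upper / steps
    mass = mean = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        fx = math.exp(gamma_logpdf(x, alpha, beta))
        mass += fx * h
        mean += x * fx * h
    return mass, mean


mass, mean = mass_and_mean(alpha=4.0, beta=2.0)
print(round(mass, 6), round(mean, 6))  # mass close to 1, mean close to alpha / beta = 2
```

The upper limit of 40 is an assumption that only works because the tail mass beyond it is negligible for these parameter values.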

Frequentist methods of inference for the gamma distribution are standard in the statistical literature. Under the Bayesian approach, where a prior distribution must be assigned, different objective priors for the gamma distribution have been discussed by Miller 1980, Sun & Ye 1996, Berger et al. 2015 and Louzada & Ramos 2018. Although these priors are constructed by formal rules (see Kass & Wasserman 1996, Ramos et al. 2019), they are improper, i.e., they do not correspond to proper probability distributions, and they may lead to improper posteriors, which is undesirable. Northrop & Attalides 2016 argued that “… there is no general theory providing simple conditions under which an improper prior yields a proper posterior for a particular model, so this must be investigated case-by-case”. In this study, assuming that the sample is independent and identically distributed (iid), we address this problem for the gamma distribution by providing simple necessary and sufficient conditions to check whether or not these objective priors lead to proper posterior distributions.
Even when the posterior distribution is proper, the posterior moments of the parameters can be infinite. We therefore also provide sufficient conditions to verify whether the posterior moments are finite. Hence, one can easily check whether the posterior is proper, and whether its moments are finite, by considering directly the behavior of the improper prior. The proposed methodology is illustrated on more than ten objective priors, including independent uniform priors, Jeffreys' rule (Kass & Wasserman 1996), Jeffreys' prior (Jeffreys 1946), the maximal data information (MDI) prior (Zellner 1977, 1984), reference priors (Berger et al. 2015) and matching priors (Mukerjee & Dey 1993, Tibshirani 1989). Finally, the effect of these priors on the posterior distribution is compared via numerical simulation. We consider only improper objective priors; when prior information is available, one may use an elicited prior instead (see, for instance, Dey & Moala 2018).

The remainder of this paper is organized as follows. Section 2 presents a theorem that provides necessary and sufficient conditions for the posterior distributions to be proper, as well as sufficient conditions to check whether the posterior moments of the parameters are finite. Section 3 applies our main theorem to different objective priors. In Section 4, a simulation study is conducted to identify the most efficient estimation procedure. Finally, Section 5 summarizes the study.

2 - PROPER POSTERIOR

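Let $X_1,\dots,X_n$ be an iid sample with $X_i\sim\text{Gamma}(\alpha,\beta)$, and let $\boldsymbol\theta=(\alpha,\beta)$. Then the joint posterior distribution of $\boldsymbol\theta$ is given by the product of the likelihood function and the prior distribution $\pi(\boldsymbol\theta)$ divided by a normalizing constant $d(\mathbf x)$, resulting in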

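$$p(\boldsymbol\theta\mid\mathbf x)=\frac{\pi(\boldsymbol\theta)}{d(\mathbf x)}\,\frac{\beta^{n\alpha}}{\Gamma(\alpha)^n}\left\{\prod_{i=1}^n x_i^{\alpha}\right\}\exp\left\{-\beta\sum_{i=1}^n x_i\right\},\qquad(2)$$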
where
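$$d(\mathbf x)=\int_{\mathcal A}\pi(\boldsymbol\theta)\,\frac{\beta^{n\alpha}}{\Gamma(\alpha)^n}\left\{\prod_{i=1}^n x_i^{\alpha}\right\}\exp\left\{-\beta\sum_{i=1}^n x_i\right\}d\boldsymbol\theta\qquad(3)$$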
and $\mathcal A=(0,\infty)\times(0,\infty)$ is the parameter space of $\boldsymbol\theta$. For any prior of the form $\pi(\boldsymbol\theta)\propto\pi_1(\beta)\pi_2(\alpha)$, our purpose is to find necessary and sufficient conditions for this class of posteriors to be proper, i.e., $d(\mathbf x)<\infty$. The following propositions will be useful for this purpose. In what follows, $\overline{\mathbb R}$ denotes the extended real line $\mathbb R\cup\{-\infty,\infty\}$, and the subscript $*$ in $\mathbb R$ and $\overline{\mathbb R}$ denotes the exclusion of 0 from these sets.

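Definition 2.1. Let $g:\mathcal U\to\overline{\mathbb R}^{+}_{*}$ and $h:\mathcal U\to\overline{\mathbb R}^{+}_{*}$, where $\mathcal U\subset\mathbb R$. We say that $g(x)\propto h(x)$ if there exist $c_0\in\mathbb R^{+}_{*}$ and $c_1\in\mathbb R^{+}_{*}$ such that $c_0h(x)\le g(x)\le c_1h(x)$ for every $x\in\mathcal U$.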

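Definition 2.2. Let $a\in\overline{\mathbb R}$, $g:\mathcal U\to\mathbb R^{+}$ and $h:\mathcal U\to\mathbb R^{+}$, where $\mathcal U\subset\mathbb R$. We say that $g(x)\underset{x\to a}{\propto}h(x)$ if

$$\liminf_{x\to a}\frac{g(x)}{h(x)}>0\quad\text{and}\quad\limsup_{x\to a}\frac{g(x)}{h(x)}<\infty.$$
The relations $g(x)\underset{x\to a^+}{\propto}h(x)$ and $g(x)\underset{x\to a^-}{\propto}h(x)$ for $a\in\mathbb R$ are defined analogously, using one-sided limits.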

Note that, from the above definition, if $\lim_{x\to a}g(x)/h(x)=c$ for some $c\in\mathbb R^{+}_{*}$, then $g(x)\underset{x\to a}{\propto}h(x)$. The following proposition is a direct consequence of this definition.
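For instance, since $\alpha\Gamma(\alpha)=\Gamma(\alpha+1)\to1$ as $\alpha\to0^+$, the gamma function satisfies $\Gamma(\alpha)\underset{\alpha\to0^+}{\propto}\alpha^{-1}$ in the sense of Definition 2.2. The short sketch below simply tabulates the ratio $g(\alpha)/h(\alpha)$ for decreasing α; a ratio settling at a finite positive value is what the definition requires. It is a numerical illustration under that example, not a proof, and the helper name is hypothetical.

```python
import math


def ratio_table(g, h, points):
    """Return (x, g(x) / h(x)) pairs; a ratio settling at a finite positive value suggests g ∝ h."""
    return [(x, g(x) / h(x)) for x in points]


points = [10.0 ** (-k) for k in range(1, 8)]  # 0.1, 0.01, ..., 1e-7, i.e. alpha -> 0+
table = ratio_table(math.gamma, lambda a: 1.0 / a, points)
for alpha, r in table:
    print(f"alpha={alpha:.0e}  Gamma(alpha)/alpha^-1={r:.8f}")
```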

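Proposition 2.3. For $a\in\overline{\mathbb R}$ and $r\in\mathbb R$, let $f_1(x)\underset{x\to a}{\propto}f_2(x)$ and $g_1(x)\underset{x\to a}{\propto}g_2(x)$. Then

$$f_1(x)g_1(x)\underset{x\to a}{\propto}f_2(x)g_2(x)\quad\text{and}\quad f_1(x)^r\underset{x\to a}{\propto}f_2(x)^r.$$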

The following proposition gives us a relation between Definition 2.1 and Definition 2.2.

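Proposition 2.4. Let $g:(a,b)\to\mathbb R^{+}$ and $h:(a,b)\to\mathbb R^{+}$ be continuous functions on $(a,b)$, where $a\in\overline{\mathbb R}$ and $b\in\overline{\mathbb R}$. Then $g(x)\propto h(x)$ if and only if $g(x)\underset{x\to a}{\propto}h(x)$ and $g(x)\underset{x\to b}{\propto}h(x)$.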

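Proposition 2.5. Let $g:(a,b)\to\mathbb R^{+}$ and $h:(a,b)\to\mathbb R^{+}$ be continuous functions on $(a,b)$, where $a\in\overline{\mathbb R}$ and $b\in\overline{\mathbb R}$, and let $c\in(a,b)$. Then, if either $g(x)\underset{x\to a}{\propto}h(x)$ or $g(x)\underset{x\to b}{\propto}h(x)$, it follows, respectively, that

$$\int_a^c g(x)\,dx\propto\int_a^c h(x)\,dx\quad\text{or}\quad\int_c^b g(x)\,dx\propto\int_c^b h(x)\,dx.$$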

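Theorem 2.6. Let the behavior of π(β) be given by $\pi(\beta)\propto\beta^{c}$, for some $c\in\mathbb R$. Then we have that:

  1. If $c<-1$, then the posterior distribution (3) is improper.

  2. If $c\ge-1$ and $\lim_{\alpha\to0^+}\pi(\alpha)/\alpha^{s}=\infty$ for every $s\in\mathbb R$, then the posterior distribution (3) is improper.

  3. If $c\ge-1$ and the behavior of π(α) is given by

    $$\pi(\alpha)\underset{\alpha\to0^+}{\propto}\alpha^{s_0}\quad\text{and}\quad\pi(\alpha)\underset{\alpha\to\infty}{\propto}\alpha^{s_\infty},$$
    where $s_0\in\mathbb R$ and $s_\infty\in\mathbb R$, then the posterior distribution (3) is proper if and only if $n>-s_0$ in case $c=-1$, and proper if and only if $n>-s_0-1$ in case $c>-1$.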
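Items 1 and 3 of Theorem 2.6 reduce to comparisons between exponents, so they can be encoded directly. The sketch below is a hypothetical helper (not from the paper) that assumes the exponents c and $s_0$ of a prior are already known and that π(α) has power-law behavior at infinity; item 2 (priors that explode faster than any power at zero) cannot be expressed through an exponent and is deliberately left out. The exponents in the table are those derived for the priors in Section 3.

```python
from typing import Optional


def posterior_is_proper(c: float, s0: float, n: int) -> bool:
    """Theorem 2.6, items 1 and 3: pi(beta) ∝ beta^c and pi(alpha) ∝ alpha^s0 as alpha -> 0+."""
    if c < -1:
        return False       # item 1: improper for every n
    if c == -1:
        return n > -s0     # item 3, case c = -1
    return n > -s0 - 1     # item 3, case c > -1


def smallest_n(c: float, s0: float, n_max: int = 100) -> Optional[int]:
    """Smallest sample size giving a proper posterior, or None if there is none up to n_max."""
    return next((n for n in range(1, n_max + 1) if posterior_is_proper(c, s0, n)), None)


# (c, s0) pairs for priors of Section 3 with power-law tails.
PRIORS = {
    "uniform": (0.0, 0.0),
    "jeffreys_rule": (-1.0, -1.0),
    "jeffreys": (-1.0, -0.5),
    "miller": (-1.0, 0.0),
    "reference_alpha": (-1.0, -1.0),
    "reference_beta": (-1.0, -1.0),
    "tibshirani": (-1.0, -1.5),
}

for name, (c, s0) in PRIORS.items():
    print(f"{name:16s} proper for n >= {smallest_n(c, s0)}")
```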

Proof. See Appendix A. ◻

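Theorem 2.7. Let $\pi(\alpha,\beta)=\pi(\alpha)\pi(\beta)$, and suppose the behaviors of π(β) and π(α) are given by

$$\pi(\beta)\propto\beta^{c},\qquad\pi(\alpha)\underset{\alpha\to0^+}{\propto}\alpha^{s_0}\qquad\text{and}\qquad\pi(\alpha)\underset{\alpha\to\infty}{\propto}\alpha^{s_\infty},$$

for $c\in\mathbb R$, $s_0\in\mathbb R$ and $s_\infty\in\mathbb R$. Then, if the posterior relative to $\pi(\alpha,\beta)$ is proper, the posterior means of α and β are finite for this prior, as are all posterior moments $E[\alpha^r\beta^s\mid\mathbf x]$ with $r,s\in\mathbb N$.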

Proof. Since the posterior is proper, Theorem 2.6 gives $c\ge-1$, and moreover $n>-s_0-1$ if $c>-1$ and $n>-s_0$ if $c=-1$.

Now let π*(α,β)=απ(α,β). Then π*(α,β)=π*(α)π(β), where π*(α)=απ(α), and it follows that

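$$\pi(\beta)\propto\beta^{c},\qquad\pi^*(\alpha)\underset{\alpha\to0^+}{\propto}\alpha^{s_0+1}\qquad\text{and}\qquad\pi^*(\alpha)\underset{\alpha\to\infty}{\propto}\alpha^{s_\infty+1}.$$

Therefore, since $c\ge-1$, and since $n>-s_0-1>-(s_0+1)-1$ if $c>-1$ and $n>-s_0>-(s_0+1)$ if $c=-1$, it follows from Theorem 2.6 that the posterior

$$\pi^*(\alpha,\beta)\,\frac{\beta^{n\alpha}}{\Gamma(\alpha)^n}\left\{\prod_{i=1}^n x_i^{\alpha}\right\}\exp\left\{-\beta\sum_{i=1}^n x_i\right\}$$
relative to the prior $\pi^*(\alpha,\beta)$ is proper. Therefore
$$E[\alpha\mid\mathbf x]=\frac{1}{d(\mathbf x)}\int_0^\infty\!\!\int_0^\infty\alpha\,\pi(\alpha,\beta)\,\frac{\beta^{n\alpha}}{\Gamma(\alpha)^n}\left\{\prod_{i=1}^n x_i^{\alpha}\right\}\exp\left\{-\beta\sum_{i=1}^n x_i\right\}d\beta\,d\alpha<\infty.$$
Proceeding analogously with the prior $\beta\pi(\alpha,\beta)$, which satisfies the assumptions of Theorem 2.6 with $c+1>-1$ in place of c, it also follows that
$$E[\beta\mid\mathbf x]=\frac{1}{d(\mathbf x)}\int_0^\infty\!\!\int_0^\infty\beta\,\pi(\alpha,\beta)\,\frac{\beta^{n\alpha}}{\Gamma(\alpha)^n}\left\{\prod_{i=1}^n x_i^{\alpha}\right\}\exp\left\{-\beta\sum_{i=1}^n x_i\right\}d\beta\,d\alpha<\infty.$$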

Therefore, we have proved that if a prior π(α,β) satisfying the assumptions of the theorem leads to a proper posterior, then the priors απ(α,β) and βπ(α,β) also lead to proper posteriors, and it follows by induction that $\alpha^r\beta^s\pi(\alpha,\beta)$ also leads to a proper posterior for any r and s in $\mathbb N$, which concludes the proof. ◻
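The induction in this proof amounts to shifting exponents: multiplying the prior by $\alpha^r\beta^s$ turns $(c,s_0)$ into $(c+s,s_0+r)$, and for $r,s\ge0$ the conditions of Theorem 2.6 can only become easier to satisfy. The hypothetical helper below checks this for the Jeffreys-rule exponents ($c=s_0=-1$) at the smallest admissible sample size. It restates the Theorem 2.6 rule so that the block is self-contained and is an illustration, not part of the proof.

```python
def posterior_is_proper(c: float, s0: float, n: int) -> bool:
    """Theorem 2.6 (items 1 and 3) for pi(beta) ∝ beta^c and pi(alpha) ∝ alpha^s0 near 0."""
    if c < -1:
        return False
    return n > -s0 if c == -1 else n > -s0 - 1


def moment_is_finite(c: float, s0: float, n: int, r: int, s: int) -> bool:
    """E[alpha^r beta^s | x] is finite if alpha^r beta^s pi(alpha, beta) gives a proper posterior."""
    return posterior_is_proper(c + s, s0 + r, n)


c, s0 = -1.0, -1.0  # Jeffreys rule prior 1 / (alpha * beta)
n = 2               # smallest n with a proper posterior for this prior
checks = [moment_is_finite(c, s0, n, r, s) for r in range(5) for s in range(5)]
print(all(checks))  # True: every moment with r, s < 5 is finite, as Theorem 2.7 states
```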

Proposition 2.8. Suppose that $\pi_i(\alpha,\beta)$ leads to a proper posterior for a given sample size n, for $i=1,\dots,m$, and consider constants $k_i>0$, $i=1,\dots,m$. Then

  • (i) $\sum_{i=1}^m k_i\pi_i(\alpha,\beta)$ leads to a proper posterior;

  • (ii) $\prod_{i=1}^m\pi_i(\alpha,\beta)^{k_i}$ leads to a proper posterior if, additionally, $\sum_{i=1}^m k_i=1$.

Proof. Item (i) is a direct consequence of the linearity of the Lebesgue integral, while item (ii) is a direct consequence of Hölder's inequality. ◻

3 - APPLICATION

In this section, we apply the proposed theorems to different objective priors.

3.1 - Uniform prior

A simple noninformative prior can be obtained by taking independent uniform priors on the interval (0,∞) for both parameters. This prior is usually unattractive because it is not invariant under reparameterisation. The uniform prior is given by $\pi_1(\alpha,\beta)\propto1$. The joint posterior distribution for α and β produced by the uniform prior is

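$$\pi_1(\alpha,\beta\mid\mathbf x)\propto\frac{\beta^{n\alpha}}{\Gamma(\alpha)^n}\left\{\prod_{i=1}^n x_i^{\alpha}\right\}\exp\left\{-\beta\sum_{i=1}^n x_i\right\}.\qquad(4)$$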

Theorem 3.1. The posterior distribution (4) is proper for any sample size, and the posterior moments for α and β are finite.

Proof. Since $\pi(\beta)=\beta^0$ and $\pi(\alpha)=\alpha^0$, it follows that $c=0$ and $s_0=s_\infty=0$ are valid constants for the application of Theorem 2.6. Thus, since $c>-1$ and $n>-s_0-1=-1$ for all n, the result follows from Theorems 2.6 and 2.7. ◻

The marginal posterior distribution for α is

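$$\pi_1(\alpha\mid\mathbf x)\propto\frac{1}{\Gamma(\alpha)^n}\left\{\prod_{i=1}^n x_i^{\alpha}\right\}\int_0^\infty\beta^{n\alpha}\exp\left\{-\beta\sum_{i=1}^n x_i\right\}d\beta\propto\frac{\alpha\,\Gamma(n\alpha)}{\Gamma(\alpha)^n}\left(\frac{\prod_{i=1}^n x_i^{1/n}}{\sum_{i=1}^n x_i}\right)^{n\alpha}.$$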

The conditional posterior distribution for β is given by

$$\pi_1(\beta\mid\alpha,\mathbf x)\sim\text{Gamma}\left(n\alpha+1,\ \sum_{i=1}^n x_i\right).\qquad(5)$$

3.2 - Jeffreys rule

Jeffreys considered different procedures for constructing objective priors. For a parameter θ ∈ (0,∞), Jeffreys suggested the prior $\pi(\theta)\propto\theta^{-1}$ (see Kass & Wasserman 1996). The main justification for this choice was its invariance under power transformations of the parameter. Since both parameters of the gamma distribution lie in (0,∞), the prior obtained from Jeffreys' rule (Miller 1980) is

$$\pi_2(\alpha,\beta)\propto\frac{1}{\alpha\beta}.\qquad(6)$$

The joint posterior distribution for α and β produced by the Jeffreys rule prior is given by

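$$\pi_2(\alpha,\beta\mid\mathbf x)\propto\frac{\beta^{n\alpha-1}}{\alpha\,\Gamma(\alpha)^n}\left\{\prod_{i=1}^n x_i^{\alpha}\right\}\exp\left\{-\beta\sum_{i=1}^n x_i\right\}.\qquad(7)$$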

Theorem 3.2. The posterior density (7) is proper if and only if n2, in which case the posterior moments for α and β are finite.

Proof. Since $\pi(\beta)=\beta^{-1}$ and $\pi(\alpha)=\alpha^{-1}$, $c=-1$ and $s_0=s_\infty=-1$ are valid constants for the application of Theorem 2.6. Thus, since $c=-1$ and the inequality $n>-s_0=1$ holds if and only if $n\ge2$, the result follows from Theorems 2.6 and 2.7. ◻

The marginal posterior distribution for α is given by

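$$\pi_2(\alpha\mid\mathbf x)\propto\frac{\Gamma(n\alpha)}{\alpha\,\Gamma(\alpha)^n}\left(\frac{\prod_{i=1}^n x_i^{1/n}}{\sum_{i=1}^n x_i}\right)^{n\alpha}.$$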

The conditional posterior distribution for β is

$$\pi_2(\beta\mid\alpha,\mathbf x)\sim\text{Gamma}\left(n\alpha,\ \sum_{i=1}^n x_i\right).\qquad(8)$$

3.3 - Jeffreys prior

In a further study, Jeffreys 1946 proposed a general rule to obtain an objective prior. This prior is proportional to the square root of the determinant of the Fisher information matrix I(α,β) and has been widely used because of its invariance under one-to-one transformations. For the gamma distribution, the Jeffreys prior (see Miller 1980) is given by

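$$\pi_3(\alpha,\beta)\propto\frac{\sqrt{\alpha\psi'(\alpha)-1}}{\beta},\qquad(9)$$
where $\psi'(\alpha)=\frac{d^2}{d\alpha^2}\log\Gamma(\alpha)$ is the trigamma function.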

The joint posterior distribution for α and β produced by the Jeffreys prior is

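$$\pi_3(\alpha,\beta\mid\mathbf x)\propto\frac{\beta^{n\alpha-1}\sqrt{\alpha\psi'(\alpha)-1}}{\Gamma(\alpha)^n}\left\{\prod_{i=1}^n x_i^{\alpha}\right\}\exp\left\{-\beta\sum_{i=1}^n x_i\right\}.\qquad(10)$$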

Theorem 3.3. The posterior density (10) is proper for any sample size, and the posterior moments for α and β are finite.

Proof. Here, we have $\pi(\beta)=\beta^{-1}$. Following Abramowitz & Stegun 1972, we have that $\lim_{z\to0^+}\psi'(z)/z^{-2}=1$, and thus

$$\lim_{\alpha\to0^+}\frac{\sqrt{\alpha\psi'(\alpha)-1}}{\alpha^{-1/2}}=\lim_{\alpha\to0^+}\sqrt{\frac{\psi'(\alpha)}{\alpha^{-2}}-\alpha}=1,$$
which implies that
$$\sqrt{\alpha\psi'(\alpha)-1}\underset{\alpha\to0^+}{\propto}\alpha^{-1/2}.$$
Moreover, following Abramowitz & Stegun 1972, we also have that $\psi'(z)=\frac1z+\frac{1}{2z^2}+O\left(\frac{1}{z^3}\right)$ as $z\to\infty$, and thus
$$\frac{\alpha\psi'(\alpha)-1}{\alpha^{-1}}=\frac12+O\left(\frac1\alpha\right)\ \Rightarrow\ \lim_{\alpha\to\infty}\frac{\sqrt{\alpha\psi'(\alpha)-1}}{\alpha^{-1/2}}=\frac{1}{\sqrt2},$$
which implies that
$$\sqrt{\alpha\psi'(\alpha)-1}\underset{\alpha\to\infty}{\propto}\alpha^{-1/2}.$$

Therefore, $c=-1$ and $s_0=s_\infty=-\tfrac12$ are valid constants for the application of Theorem 2.6, and since $n>-s_0=\tfrac12$ for all $n\ge1$, Theorems 2.6 and 2.7 imply that the posterior is proper for any sample size and that the posterior moments are finite. ◻
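The two limits used in this proof can be checked numerically. Python's standard library has no trigamma function, so the sketch below implements $\psi'$ with the recurrence $\psi'(x)=\psi'(x+1)+1/x^2$ and the standard asymptotic series for large x; this implementation is an assumption of the sketch and not something used in the paper.

```python
import math


def trigamma(x: float) -> float:
    """psi'(x) for x > 0: shift x up with psi'(x) = psi'(x + 1) + 1/x^2, then use the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # 1/x + 1/(2x^2) + 1/(6x^3) - 1/(30x^5) + 1/(42x^7) - 1/(30x^9)
    return result + inv + 0.5 * inv2 + inv * inv2 * (
        1.0 / 6.0 - inv2 * (1.0 / 30.0 - inv2 * (1.0 / 42.0 - inv2 / 30.0)))


def jeffreys_alpha_part(alpha: float) -> float:
    """sqrt(alpha * psi'(alpha) - 1), the alpha-part of the Jeffreys prior (9)."""
    return math.sqrt(alpha * trigamma(alpha) - 1.0)


near_zero = jeffreys_alpha_part(1e-6) / (1e-6) ** -0.5  # should approach 1
near_inf = jeffreys_alpha_part(1e4) / (1e4) ** -0.5     # should approach 1/sqrt(2)
print(near_zero, near_inf, 1.0 / math.sqrt(2.0))
```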

The conditional posterior distribution for β is (8). The marginal posterior distribution for α is given by

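$$\pi_3(\alpha\mid\mathbf x)\propto\frac{\Gamma(n\alpha)\sqrt{\alpha\psi'(\alpha)-1}}{\Gamma(\alpha)^n}\left(\frac{\prod_{i=1}^n x_i^{1/n}}{\sum_{i=1}^n x_i}\right)^{n\alpha}.$$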

3.4 - Miller prior

Miller 1980 discussed three objective priors for the parameters of the gamma distribution, the first two being Jeffreys' rule and the Jeffreys prior. However, the author chose a third prior on the grounds that it requires fewer computational subroutines. This prior is given by

$$\pi_4(\alpha,\beta)\propto\frac{1}{\beta}.\qquad(11)$$

Note that much progress has been made in computation since then, and many of these limitations have been overcome, especially after Gelfand & Smith 1990 successfully applied Gibbs sampling to Bayesian analysis.

The joint posterior distribution for α and β produced by the Miller’s prior is

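$$\pi_4(\alpha,\beta\mid\mathbf x)\propto\frac{\beta^{n\alpha-1}}{\Gamma(\alpha)^n}\left\{\prod_{i=1}^n x_i^{\alpha}\right\}\exp\left\{-\beta\sum_{i=1}^n x_i\right\}.\qquad(12)$$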

Theorem 3.4. The posterior density (12) is proper for any sample size, and the posterior moments for α and β are finite.

Proof. Since $\pi(\beta)=\beta^{-1}$ and $\pi(\alpha)=\alpha^0$, $c=-1$ and $s_0=s_\infty=0$ are valid constants for the application of Theorem 2.6. Therefore, since $c=-1$ and $n>-s_0=0$ for all n, the result follows directly from Theorems 2.6 and 2.7. ◻

The conditional posterior distribution for β is (8). The marginal posterior distribution for α is given by

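$$\pi_4(\alpha\mid\mathbf x)\propto\frac{\Gamma(n\alpha)}{\Gamma(\alpha)^n}\left(\frac{\prod_{i=1}^n x_i^{1/n}}{\sum_{i=1}^n x_i}\right)^{n\alpha}.$$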
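Because β can be integrated out in closed form, the marginal posteriors of α in this section are one-dimensional and can be handled by simple quadrature. The sketch below is a hypothetical helper, assuming a prior of the form $\pi(\alpha)\beta^{c}$ with $c\ge-1$: it evaluates $\log\pi(\alpha)+\log\Gamma(n\alpha+c+1)-n\log\Gamma(\alpha)+\alpha\sum\log x_i-(n\alpha+c+1)\log\sum x_i$ on a grid, normalizes it, and returns $E[\alpha\mid\mathbf x]$ together with $E[\beta\mid\mathbf x]$, using $E[\beta\mid\alpha,\mathbf x]=(n\alpha+c+1)/\sum x_i$ from the conditional Gamma posterior ((8) when $c=-1$). The data vector and the grid limit `a_max` are illustrative assumptions.

```python
import math
from typing import Callable, Sequence, Tuple


def posterior_means(x: Sequence[float], log_prior_alpha: Callable[[float], float],
                    c: float, a_max: float = 50.0,
                    points: int = 20_000) -> Tuple[float, float]:
    """Grid approximation of E[alpha | x] and E[beta | x] under a prior pi(alpha) * beta^c, c >= -1.

    a_max must be large enough for the marginal of alpha to be negligible beyond it.
    """
    n, total = len(x), sum(x)
    sum_log = sum(math.log(v) for v in x)
    h = a_max / points
    grid = [(k + 0.5) * h for k in range(points)]  # midpoints, so alpha = 0 is never evaluated
    # log marginal of alpha after integrating beta out: Gamma(n a + c + 1) / S^(n a + c + 1)
    log_m = [log_prior_alpha(a) + math.lgamma(n * a + c + 1) - n * math.lgamma(a)
             + a * sum_log - (n * a + c + 1) * math.log(total) for a in grid]
    top = max(log_m)
    w = [math.exp(v - top) for v in log_m]  # rescaled weights, avoids overflow
    z = sum(w)
    e_alpha = sum(a * wi for a, wi in zip(grid, w)) / z
    # E[beta | alpha, x] = (n alpha + c + 1) / S, averaged over the marginal of alpha
    e_beta = sum((n * a + c + 1) / total * wi for a, wi in zip(grid, w)) / z
    return e_alpha, e_beta


data = [1.2, 2.5, 0.8, 3.1, 1.9, 2.2, 1.4, 2.8]  # illustrative sample, not from the paper
miller = posterior_means(data, lambda a: 0.0, c=-1.0)                  # prior (11)
jeffreys_rule = posterior_means(data, lambda a: -math.log(a), c=-1.0)  # prior (6)
print(miller, jeffreys_rule)
```

For the uniform prior one would pass `c=0.0` and a zero log-prior; a one-dimensional grid is used here only because it is easy to verify, not because it is what the authors used.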

3.5 - Reference prior

Bernardo 1979 proposed obtaining an objective prior by maximizing the expected Kullback-Leibler divergence between the posterior distribution and the prior. This yields a class of noninformative priors known as reference priors. Reference priors provide posterior distributions with interesting properties such as invariance under one-to-one transformations, consistent marginalization and consistent sampling properties (Bernardo 2005). The procedure to obtain reference priors is described as follows.

Corollary 3.5 (Bernardo 2005). Let $\boldsymbol\theta=(\theta_1,\theta_2)$ be the vector of parameters and let $p(\theta_1,\theta_2\mid\mathbf x)$ be the posterior distribution, which is asymptotically normal with dispersion matrix $S(\theta_1,\theta_2)=I^{-1}(\theta_1,\theta_2)$. Moreover, let $\theta_1$ be the parameter of interest and $\theta_2$ the nuisance parameter. Then, if the parameter space of $\theta_2$ is independent of $\theta_1$ and the functions $s_{1,1}(\theta_1,\theta_2)$ (the (1,1) entry of S) and $h_{2,2}(\theta_1,\theta_2)$ (the (2,2) entry of I) factorize in the form
$$s_{1,1}^{-1/2}(\theta_1,\theta_2)=f_1(\theta_1)g_1(\theta_2)\quad\text{and}\quad h_{2,2}^{1/2}(\theta_1,\theta_2)=f_2(\theta_1)g_2(\theta_2),$$
it follows that $\pi_{\theta_1}(\theta_1,\theta_2)\propto f_1(\theta_1)g_2(\theta_2)$, and there is no need for compact approximations.

3.5.1 - Reference prior when α is the parameter of interest

From Corollary 3.5 the reference prior when α is the parameter of interest and β is the nuisance parameter is given by

$$\pi_5(\alpha,\beta)\propto\frac{1}{\beta}\sqrt{\frac{\alpha\psi'(\alpha)-1}{\alpha}}.\qquad(13)$$
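Corollary 3.5 can be illustrated numerically for model (1). Differentiating the log of (1) gives the Fisher information $I(\alpha,\beta)=\begin{pmatrix}\psi'(\alpha)&-1/\beta\\-1/\beta&\alpha/\beta^2\end{pmatrix}$. The sketch below inverts it, takes $s_{1,1}^{-1/2}$ for α and $h_{2,2}^{1/2}=\sqrt\alpha/\beta$, and checks that the first factor does not depend on β and equals $\sqrt{(\alpha\psi'(\alpha)-1)/\alpha}$, so that $f_1(\alpha)g_2(\beta)$ reproduces (13). The trigamma routine is a recurrence-plus-asymptotic-series approximation written for this sketch, and the helper names are hypothetical.

```python
import math


def trigamma(x: float) -> float:
    """psi'(x) via psi'(x) = psi'(x + 1) + 1/x^2 and an asymptotic series once x >= 6."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return result + inv + 0.5 * inv2 + inv * inv2 * (
        1.0 / 6.0 - inv2 * (1.0 / 30.0 - inv2 * (1.0 / 42.0 - inv2 / 30.0)))


def fisher_information(alpha: float, beta: float):
    """Expected Fisher information of model (1), parameters ordered (alpha, beta)."""
    return [[trigamma(alpha), -1.0 / beta], [-1.0 / beta, alpha / beta ** 2]]


def reference_factors(alpha: float, beta: float):
    """Return s_11^(-1/2) (alpha of interest) and h_22^(1/2) from Corollary 3.5."""
    (a, b), (_, d) = fisher_information(alpha, beta)
    det = a * d - b * b
    s11 = d / det                       # (1,1) entry of S = I^{-1}
    return s11 ** -0.5, math.sqrt(d)


for alpha, beta in [(0.5, 1.0), (0.5, 7.0), (4.0, 2.0)]:
    f1, h = reference_factors(alpha, beta)
    closed = math.sqrt((alpha * trigamma(alpha) - 1.0) / alpha)
    print(alpha, beta, f1, closed, h * beta / math.sqrt(alpha))  # last column should be 1
```

Swapping the roles of the parameters in the same way gives $f_1(\beta)=1/\beta$ and $g_2(\alpha)=\sqrt{\psi'(\alpha)}$, which is the prior for β as the parameter of interest.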

Therefore, the joint posterior distribution for α and β, produced by the reference prior (13) is given by

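$$\pi_5(\alpha,\beta\mid\mathbf x)\propto\sqrt{\frac{\alpha\psi'(\alpha)-1}{\alpha}}\,\frac{\beta^{n\alpha-1}}{\Gamma(\alpha)^n}\left\{\prod_{i=1}^n x_i^{\alpha}\right\}\exp\left\{-\beta\sum_{i=1}^n x_i\right\}.\qquad(14)$$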

Theorem 3.6. The posterior density (14) is proper if and only if n2, in which case the posterior moments for α and β are finite.

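Proof. We proved in Theorem 3.3 that $\sqrt{\alpha\psi'(\alpha)-1}\underset{\alpha\to0^+}{\propto}\alpha^{-1/2}$ and $\sqrt{\alpha\psi'(\alpha)-1}\underset{\alpha\to\infty}{\propto}\alpha^{-1/2}$. It follows that

$$\sqrt{\frac{\alpha\psi'(\alpha)-1}{\alpha}}\underset{\alpha\to0^+}{\propto}\alpha^{-1}\quad\text{and}\quad\sqrt{\frac{\alpha\psi'(\alpha)-1}{\alpha}}\underset{\alpha\to\infty}{\propto}\alpha^{-1}.$$
Then $c=-1$ and $s_0=s_\infty=-1$, and since $n>-s_0=1$ if and only if $n\ge2$, the result follows directly from Theorems 2.6 and 2.7. ◻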

The conditional posterior distribution for β is (8). The marginal posterior distribution for α is given by

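$$\pi_5(\alpha\mid\mathbf x)\propto\sqrt{\frac{\alpha\psi'(\alpha)-1}{\alpha}}\,\frac{\Gamma(n\alpha)}{\Gamma(\alpha)^n}\left(\frac{\prod_{i=1}^n x_i^{1/n}}{\sum_{i=1}^n x_i}\right)^{n\alpha}.$$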

3.5.2 - Reference prior when β is the parameter of interest

The reference prior when β is the parameter of interest and α is the nuisance parameter is given by

$$\pi_6(\alpha,\beta)\propto\frac{\sqrt{\psi'(\alpha)}}{\beta}.\qquad(15)$$

The joint posterior distribution for α and β, produced by the reference prior (15) is given by

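$$\pi_6(\alpha,\beta\mid\mathbf x)\propto\frac{\beta^{n\alpha-1}\sqrt{\psi'(\alpha)}}{\Gamma(\alpha)^n}\left\{\prod_{i=1}^n x_i^{\alpha}\right\}\exp\left\{-\beta\sum_{i=1}^n x_i\right\}.\qquad(16)$$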

Theorem 3.7. The posterior density (16) is proper if and only if n ≥ 2, in which case the posterior moments for α and β are finite.

Proof. Following Abramowitz & Stegun 1972, we have that $\lim_{\alpha\to0^+}\psi'(\alpha)/\alpha^{-2}=1$ and $\lim_{\alpha\to\infty}\psi'(\alpha)/\alpha^{-1}=1$. Thus, $\sqrt{\psi'(\alpha)}\underset{\alpha\to0^+}{\propto}\alpha^{-1}$ and $\sqrt{\psi'(\alpha)}\underset{\alpha\to\infty}{\propto}\alpha^{-1/2}$. Therefore, $c=-1$, $s_0=-1$ and $s_\infty=-\tfrac12$ are valid constants for the application of Theorem 2.6. Since $n>-s_0=1$ if and only if $n\ge2$, the result follows from Theorems 2.6 and 2.7. ◻

The conditional posterior distribution for β is (8). The marginal posterior distribution for α is given by

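$$\pi_6(\alpha\mid\mathbf x)\propto\frac{\sqrt{\psi'(\alpha)}\,\Gamma(n\alpha)}{\Gamma(\alpha)^n}\left(\frac{\prod_{i=1}^n x_i^{1/n}}{\sum_{i=1}^n x_i}\right)^{n\alpha}.$$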

There are different ways to derive the same reference priors in the presence of nuisance parameters; see, e.g., Liseo 1993, Sun & Ye 1996 and Moala et al. 2013.

3.5.3 - Overall reference prior

The reference priors presented so far assume the presence of a nuisance parameter. However, in many situations we are interested in all parameters of the model simultaneously. Sun & Ye 1996 considered the two-parameter exponential family of Bar-Lev & Reiser 1982 and presented a straightforward procedure to derive overall reference priors. Since the gamma distribution can be expressed as a member of Bar-Lev and Reiser's two-parameter exponential family, the overall reference prior (Berger et al. 2015) is given by

$$\pi_7(\alpha,\beta)\propto\frac{1}{\beta}\sqrt{\frac{\alpha\psi'(\alpha)-1}{\alpha}},\qquad(17)$$
which is the same as the reference prior when α is the parameter of interest and β is the nuisance parameter.

3.6 - Maximal Data Information prior

Zellner 1977, 1984 introduced another objective prior, whose information is weak compared with the information in the data. This prior is known as the maximal data information (MDI) prior and is given by

$$\pi_8(\alpha,\beta)\propto\exp\left(\int_0^\infty\log\left(f(t\mid\alpha,\beta)\right)f(t\mid\alpha,\beta)\,dt\right).\qquad(18)$$
Therefore, the MDI prior (18) for the Gamma distribution (1) is given by
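$$\pi_8(\alpha,\beta)\propto\frac{\beta}{\Gamma(\alpha)}\exp\left\{(\alpha-1)\psi(\alpha)-\alpha\right\},\qquad(19)$$
where $\psi(\alpha)=\frac{d}{d\alpha}\log\Gamma(\alpha)$ is the digamma function.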
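The closed form (19) rests on $E[\log f(X\mid\alpha,\beta)]=\log\beta-\log\Gamma(\alpha)+(\alpha-1)\psi(\alpha)-\alpha$, which uses $E[\log X]=\psi(\alpha)-\log\beta$ and $E[X]=\alpha/\beta$ under (1). The sketch below checks this identity by numerical integration for one parameter value. The digamma routine (recurrence plus asymptotic series) is written for the sketch because Python's standard library does not provide one, and the integration range is an assumption that suits these parameter values.

```python
import math


def digamma(x: float) -> float:
    """psi(x) for x > 0: psi(x) = psi(x + 1) - 1/x until x >= 6, then the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (
        1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 * (1.0 / 252.0 - inv2 / 240.0)))


def log_pdf(x: float, alpha: float, beta: float) -> float:
    """Log of density (1)."""
    return alpha * math.log(beta) - math.lgamma(alpha) + (alpha - 1.0) * math.log(x) - beta * x


def expected_log_density(alpha: float, beta: float, upper: float = 40.0, steps: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral inside (18)."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        lf = log_pdf(x, alpha, beta)
        total += lf * math.exp(lf) * h
    return total


alpha, beta = 4.0, 2.0
numeric = expected_log_density(alpha, beta)
closed = math.log(beta) - math.lgamma(alpha) + (alpha - 1.0) * digamma(alpha) - alpha
print(numeric, closed, math.exp(closed))  # exp(closed) is the expression in (19) at (alpha, beta)
```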

The joint posterior distribution for α and β, produced by the MDI prior, is

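$$\pi_8(\alpha,\beta\mid\mathbf x)\propto\frac{\beta^{n\alpha+1}}{\Gamma(\alpha)^{n+1}}\left\{\prod_{i=1}^n x_i^{\alpha}\right\}\exp\left\{-\beta\sum_{i=1}^n x_i+(\alpha-1)\psi(\alpha)-\alpha\right\}.\qquad(20)$$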

Moala et al. 2013 argued that the posterior distribution (20) is improper, but did not present a proof of this result. The following theorem gives a formal proof that confirms this conjecture.

Theorem 3.8. The joint posterior density (20) is improper for any n.

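Proof. Following Abramowitz & Stegun 1972, $\lim_{\alpha\to0^+}\Gamma(\alpha)/\alpha^{-1}=1$ and $\lim_{\alpha\to0^+}\psi(\alpha)/\alpha^{-1}=-1$. Moreover, since $\psi(\alpha)+\alpha^{-1}=\psi(\alpha+1)$, we have $\alpha\psi(\alpha)\to-1$ as $\alpha\to0^+$. Thus,

$$\begin{aligned}
\lim_{\alpha\to0^+}\frac{\pi(\alpha)}{\alpha^{s_0}}
&=\lim_{\alpha\to0^+}\frac{e^{(\alpha-1)\psi(\alpha)-\alpha}}{\Gamma(\alpha)\,\alpha^{s_0}}
=\lim_{\alpha\to0^+}\frac{\alpha^{-1}}{\Gamma(\alpha)}\,e^{\alpha\psi(\alpha)-\alpha}\,e^{-\psi(\alpha)-\alpha^{-1}}\,e^{\alpha^{-1}}\,\alpha^{1-s_0}\\
&=\lim_{\alpha\to0^+}e^{\alpha\psi(\alpha)-\alpha}\,e^{-\psi(\alpha+1)}\,e^{\alpha^{-1}}\,\alpha^{1-s_0}
=e^{-1}e^{-\psi(1)}\lim_{u\to\infty}e^{u}u^{s_0-1}=\infty.
\end{aligned}\qquad(21)$$

Since $c=1\ge-1$ and $\lim_{\alpha\to0^+}\pi(\alpha)/\alpha^{s_0}=\infty$ for every $s_0\in\mathbb R$, the result follows from item 2 of Theorem 2.6. ◻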

3.6.1 - Modified MDI prior

Moala et al. 2013 introduced a modified maximal data information (MMDI) prior given by

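$$\pi_9(\alpha,\beta)\propto\frac{\beta}{\Gamma(\alpha)}\exp\left\{(\alpha-1)\frac{\psi(\alpha)}{\Gamma(\alpha)}-\alpha\right\}.\qquad(22)$$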

The joint posterior distribution for α and β, produced by the MMDI prior, is

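$$\pi_9(\alpha,\beta\mid\mathbf x)\propto\frac{\beta^{n\alpha+1}}{\Gamma(\alpha)^{n+1}}\left\{\prod_{i=1}^n x_i^{\alpha}\right\}\exp\left\{-\beta\sum_{i=1}^n x_i+(\alpha-1)\frac{\psi(\alpha)}{\Gamma(\alpha)}-\alpha\right\}.\qquad(23)$$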

Theorem 3.9. The posterior density (23) is proper for every n, and the posterior moments for α and β are finite.

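Proof. Following Abramowitz & Stegun 1972, $\lim_{\alpha\to0^+}\Gamma(\alpha)/\alpha^{-1}=1$ and $\lim_{\alpha\to0^+}\psi(\alpha)/\alpha^{-1}=-1$. Thus $\lim_{\alpha\to0^+}\psi(\alpha)/\Gamma(\alpha)=-1$ and

$$\lim_{\alpha\to0^+}\frac{\pi_9(\alpha)}{\alpha}=\lim_{\alpha\to0^+}\frac{1}{\alpha\,\Gamma(\alpha)}\,e^{(\alpha-1)\frac{\psi(\alpha)}{\Gamma(\alpha)}-\alpha}=\lim_{\alpha\to0^+}\frac{\alpha^{-1}}{\Gamma(\alpha)}\,e^{(\alpha-1)\frac{\psi(\alpha)}{\Gamma(\alpha)}-\alpha}=1\times e^{(-1)(-1)-0}=e.\qquad(24)$$

On the other hand, $\lim_{\alpha\to\infty}\psi(\alpha)/\log(\alpha)=1$, and by Stirling's approximation (see Abramowitz & Stegun 1972) we have $\lim_{\alpha\to\infty}\Gamma(\alpha)/(\alpha^{\alpha-1/2}e^{-\alpha})=\sqrt{2\pi}$ and $\lim_{\alpha\to\infty}\Gamma(\alpha)/\alpha^{2}=\infty$. Then

$$\lim_{\alpha\to\infty}\frac{\pi_9(\alpha)}{\alpha^{1/2-\alpha}}=\lim_{\alpha\to\infty}\frac{\alpha^{\alpha-1/2}e^{-\alpha}}{\Gamma(\alpha)}\,e^{(\alpha-1)\frac{\psi(\alpha)}{\Gamma(\alpha)}}=\frac{1}{\sqrt{2\pi}}\lim_{\alpha\to\infty}e^{\left(1-\frac1\alpha\right)\frac{\psi(\alpha)}{\log(\alpha)}\frac{\log(\alpha)}{\alpha}\frac{\alpha^{2}}{\Gamma(\alpha)}}=\frac{1}{\sqrt{2\pi}}\,e^{1\times1\times0\times0}=\frac{1}{\sqrt{2\pi}}.\qquad(25)$$

Now, define

$$\pi_9^*(\alpha)=\begin{cases}\alpha, & \text{if }\alpha\le1,\\ \alpha^{1/2-\alpha}, & \text{if }\alpha\ge1,\end{cases}\qquad\text{and}\qquad\chi(\alpha)=\begin{cases}\alpha, & \text{if }\alpha\le1,\\ \alpha^{-1/2}, & \text{if }\alpha\ge1.\end{cases}\qquad(26)$$

Then, from (24) and (25), $\pi_9(\alpha)\underset{\alpha\to0^+}{\propto}\pi_9^*(\alpha)$ and $\pi_9(\alpha)\underset{\alpha\to\infty}{\propto}\pi_9^*(\alpha)$, which implies $\pi_9(\alpha)\propto\pi_9^*(\alpha)$ by Proposition 2.4. Moreover, $\pi_9^*(\alpha)\le\chi(\alpha)$, and the prior $\pi_9(\beta)\chi(\alpha)=\beta\chi(\alpha)$ leads to a proper posterior with finite posterior moments for every n by Theorems 2.6 and 2.7 (with $c=1$, $s_0=1$ and $s_\infty=-\tfrac12$). Therefore $\alpha^r\beta^s\pi_9(\alpha,\beta)\propto\alpha^r\beta^s\pi_9(\beta)\pi_9^*(\alpha)\le\alpha^r\beta^s\pi_9(\beta)\chi(\alpha)$ also leads to a proper posterior for every n, r and s, which proves the result. ◻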
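The contrast between Theorems 3.8 and 3.9 is visible numerically near zero. The sketch below evaluates, on the log scale, $\pi_8(\alpha)/\alpha^{s_0}$ with a strongly negative $s_0=-50$ (the case where divergence is least obvious), which eventually grows without bound because of the factor $e^{1/\alpha}$, and the ratio $\pi_9(\alpha)/\alpha$, which should approach e as in (24). Only the α-parts of (19) and (22) are used; the digamma routine is an approximation written for this sketch.

```python
import math


def digamma(x: float) -> float:
    """psi(x) for x > 0: psi(x) = psi(x + 1) - 1/x until x >= 6, then the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (
        1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 * (1.0 / 252.0 - inv2 / 240.0)))


def log_pi8(alpha: float) -> float:
    """Log of the alpha-part of the MDI prior (19)."""
    return (alpha - 1.0) * digamma(alpha) - alpha - math.lgamma(alpha)


def log_pi9(alpha: float) -> float:
    """Log of the alpha-part of the MMDI prior (22)."""
    return (alpha - 1.0) * digamma(alpha) / math.gamma(alpha) - alpha - math.lgamma(alpha)


s0 = -50.0
for alpha in (1e-2, 1e-3, 5e-4, 1e-4):
    mdi = log_pi8(alpha) - s0 * math.log(alpha)        # log of pi8(alpha) / alpha^s0
    mmdi = math.exp(log_pi9(alpha) - math.log(alpha))  # pi9(alpha) / alpha, compare with e
    print(f"alpha={alpha:.0e}  log(pi8/alpha^s0)={mdi:10.2f}  pi9/alpha={mmdi:.5f}")
```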

The marginal posterior distribution for α is given by

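$$\pi_9(\alpha\mid\mathbf x)\propto\frac{\Gamma(n\alpha+2)}{\Gamma(\alpha)^{n+1}}\exp\left\{(\alpha-1)\frac{\psi(\alpha)}{\Gamma(\alpha)}-\alpha\right\}\left(\frac{\prod_{i=1}^n x_i^{1/n}}{\sum_{i=1}^n x_i}\right)^{n\alpha}.$$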

The conditional posterior distribution for β is given by

$$\pi_9(\beta\mid\alpha,\mathbf x)\sim\text{Gamma}\left(n\alpha+2,\ \sum_{i=1}^n x_i\right).$$

3.7 - Tibshirani priors

Tibshirani 1989 discussed an alternative method to derive a class of objective priors $\pi(\theta_1,\theta_2)$, where $\theta_1$ is the parameter of interest, such that the credible interval for $\theta_1$ has coverage error $O(n^{-1})$ in the frequentist sense, i.e.,

$$P\left[\theta_1\le\theta_1^{1-\alpha}(\pi;X)\mid(\theta_1,\theta_2)\right]=1-\alpha-O(n^{-1}),\qquad(27)$$
where $\theta_1^{1-\alpha}(\pi;X)$ denotes the $(1-\alpha)$th quantile of the posterior distribution of $\theta_1$. The priors satisfying (27) are known as matching priors up to $O(n^{-1})$. Mukerjee & Dey 1993 discussed necessary and sufficient conditions for a Tibshirani prior to be a matching prior up to $o(n^{-1})$.

Sun & Ye 1996 proved that the reference prior (13) is also a Tibshirani prior when α is the parameter of interest and β is the nuisance parameter, and they also considered the Tibshirani prior of order $O(n^{-1})$ when β is the parameter of interest and α is the nuisance parameter. They also proved that, when α is the parameter of interest, there is no matching prior up to order $o(n^{-1})$. Finally, they presented a Tibshirani prior for β as the parameter of interest that is a matching prior up to order $o(n^{-1})$; this prior is given by

$$\pi_{10}(\alpha,\beta)\propto\frac{\alpha\psi'(\alpha)-1}{\beta\sqrt\alpha}.\qquad(28)$$

The joint posterior distribution for α and β, produced by the Tibshirani prior (28) is given by

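$$\pi_{10}(\alpha,\beta\mid\mathbf x)\propto\frac{\alpha\psi'(\alpha)-1}{\sqrt\alpha}\,\frac{\beta^{n\alpha-1}}{\Gamma(\alpha)^n}\left\{\prod_{i=1}^n x_i^{\alpha}\right\}\exp\left\{-\beta\sum_{i=1}^n x_i\right\}.\qquad(29)$$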

Theorem 3.10. The posterior density (29) is proper if and only if n2, in which case the posterior moments for α and β are finite.

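Proof. We proved in Theorem 3.3 that $\sqrt{\alpha\psi'(\alpha)-1}\underset{\alpha\to0^+}{\propto}\alpha^{-1/2}$ and that $\sqrt{\alpha\psi'(\alpha)-1}\underset{\alpha\to\infty}{\propto}\alpha^{-1/2}$. By Proposition 2.3, it follows that

$$\frac{\alpha\psi'(\alpha)-1}{\sqrt\alpha}\underset{\alpha\to0^+}{\propto}\alpha^{-1}\alpha^{-1/2}=\alpha^{-3/2}\quad\text{and}\quad\frac{\alpha\psi'(\alpha)-1}{\sqrt\alpha}\underset{\alpha\to\infty}{\propto}\alpha^{-1}\alpha^{-1/2}=\alpha^{-3/2}.$$
Thus $c=-1$ and $s_0=s_\infty=-\tfrac32$, and since $n>-s_0=\tfrac32$ if and only if $n\ge2$, the result follows directly from Theorems 2.6 and 2.7. ◻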

The conditional posterior distribution for β is (8). The marginal posterior distribution for α is given by

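$$\pi_{10}(\alpha\mid\mathbf x)\propto\frac{\alpha\psi'(\alpha)-1}{\sqrt\alpha}\,\frac{\Gamma(n\alpha)}{\Gamma(\alpha)^n}\left(\frac{\prod_{i=1}^n x_i^{1/n}}{\sum_{i=1}^n x_i}\right)^{n\alpha}.$$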

3.8 - Consensus prior

A natural approach to finding an objective prior is to start with a collection of objective priors and take their average. Berger et al. 2015 discussed this prior-averaging approach under the two most natural averages, the geometric mean and the arithmetic mean.

3.8.1 - Geometric mean

Let $\pi_i(\alpha,\beta)$, $i=3,5,6,7,10$, be a collection of objective priors. These priors were selected because of their invariance under one-to-one transformations. Then, our geometric mean (GM) prior is given by

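$$\pi_{11}(\alpha,\beta)\propto\frac{1}{\beta}\left(\frac{(\alpha\psi'(\alpha)-1)^{5/2}\,\psi'(\alpha)^{1/2}}{\alpha^{3/2}}\right)^{1/5}\propto\frac{\sqrt{\alpha\psi'(\alpha)-1}\;\psi'(\alpha)^{1/10}}{\beta\,\alpha^{3/10}}.\qquad(30)$$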

Since this prior is a geometric mean of priors that are invariant under one-to-one transformations, it also has this invariance property.

The joint posterior distribution for α and β, produced by the consensus prior, is

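$$\pi_{11}(\alpha,\beta\mid\mathbf x)\propto\frac{\psi'(\alpha)^{1/10}\sqrt{\alpha\psi'(\alpha)-1}}{\alpha^{3/10}}\,\frac{\beta^{n\alpha-1}}{\Gamma(\alpha)^n}\left\{\prod_{i=1}^n x_i^{\alpha}\right\}\exp\left\{-\beta\sum_{i=1}^n x_i\right\}.\qquad(31)$$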

Theorem 3.11. The posterior density (31) is proper if and only if n2, in which case the posterior moments for α and β are finite.

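Proof. Each of the priors $\pi_i$, $i=3,5,6,7,10$, leads to a proper posterior for n ≥ 2 (Theorems 3.3, 3.6, 3.7 and 3.10), so propriety for n ≥ 2 follows from Proposition 2.8 with $k_i=1/5$. Moreover, $\pi_{11}(\alpha)\underset{\alpha\to0^+}{\propto}\alpha^{-1/2}\alpha^{-1/5}\alpha^{-3/10}=\alpha^{-1}$ and $\pi_{11}(\alpha)\underset{\alpha\to\infty}{\propto}\alpha^{-9/10}$, so the posterior is improper for n = 1 by Theorem 2.6 (c = −1, $s_0=-1$), and the posterior moments are finite by Theorem 2.7. ◻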

The conditional posterior distribution for β is (8). The marginal posterior distribution for α is given by

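$$\pi_{11}(\alpha\mid\mathbf x)\propto\frac{\psi'(\alpha)^{1/10}\sqrt{\alpha\psi'(\alpha)-1}}{\alpha^{3/10}}\,\frac{\Gamma(n\alpha)}{\Gamma(\alpha)^n}\left(\frac{\prod_{i=1}^n x_i^{1/n}}{\sum_{i=1}^n x_i}\right)^{n\alpha}.$$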

3.8.2 - Arithmetic mean

Let πi(α,β),i=3,5,6,7,10 be a collection of objective priors. Then, our arithmetic mean (AM) prior is given by

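$$\pi_{12}(\alpha,\beta)\propto\frac{\pi_{12}(\alpha)}{\beta},$$
where
$$\pi_{12}(\alpha)=\frac{2\sqrt{\alpha\psi'(\alpha)-1}+\sqrt{\alpha\psi'(\alpha)}+\sqrt{\alpha^2\psi'(\alpha)-\alpha}+\alpha\psi'(\alpha)-1}{\sqrt\alpha}.$$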

The joint posterior distribution for α and β, produced by the consensus prior, is

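$$\pi_{12}(\alpha,\beta\mid\mathbf x)\propto\pi_{12}(\alpha)\,\frac{\beta^{n\alpha-1}}{\Gamma(\alpha)^n}\left\{\prod_{i=1}^n x_i^{\alpha}\right\}\exp\left\{-\beta\sum_{i=1}^n x_i\right\}.\qquad(32)$$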

Theorem 3.12. The posterior density (32) is proper if and only if n2, in which case the posterior moments for α and β are finite.

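Proof. Each of the priors $\pi_i$, $i=3,5,6,7,10$, leads to a proper posterior for n ≥ 2, so propriety for n ≥ 2 follows from Proposition 2.8. Moreover, the term $(\alpha\psi'(\alpha)-1)/\sqrt\alpha$ dominates near zero, so that $\pi_{12}(\alpha)\underset{\alpha\to0^+}{\propto}\alpha^{-3/2}$, while $\pi_{12}(\alpha)\underset{\alpha\to\infty}{\propto}\alpha^{-1/2}$. Hence the posterior is improper for n = 1 by Theorem 2.6 (c = −1, $s_0=-3/2$), and the posterior moments are finite by Theorem 2.7. ◻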
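Near zero, the α-parts of the GM prior (30) and of the AM prior behave like $\alpha^{-1}$ and $\alpha^{-3/2}$, respectively, which by Theorem 2.6 (with c = −1) is why n = 1 is not enough in Theorems 3.11 and 3.12. The sketch below estimates these exponents from the local slope of $\log\pi(\alpha)$ against $\log\alpha$ at two small points; it uses a recurrence-plus-series trigamma approximation written for this purpose, and the slope estimate is a heuristic check of the ∝ relations, not a proof.

```python
import math


def trigamma(x: float) -> float:
    """psi'(x) via psi'(x) = psi'(x + 1) + 1/x^2 and an asymptotic series once x >= 6."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return result + inv + 0.5 * inv2 + inv * inv2 * (
        1.0 / 6.0 - inv2 * (1.0 / 30.0 - inv2 * (1.0 / 42.0 - inv2 / 30.0)))


def gm_alpha_part(a: float) -> float:
    """alpha-part of the GM prior (30): sqrt(a psi'(a) - 1) * psi'(a)^(1/10) / a^(3/10)."""
    t = trigamma(a)
    return math.sqrt(a * t - 1.0) * t ** 0.1 / a ** 0.3


def am_alpha_part(a: float) -> float:
    """alpha-part pi_12(alpha) of the AM prior."""
    t = trigamma(a)
    big_a = a * t - 1.0
    return (2.0 * math.sqrt(big_a) + math.sqrt(a * t) + math.sqrt(a * a * t - a) + big_a) / math.sqrt(a)


def local_exponent(f, a1: float = 1e-6, a2: float = 1e-7) -> float:
    """Slope of log f(alpha) against log(alpha) between two small points."""
    return (math.log(f(a1)) - math.log(f(a2))) / (math.log(a1) - math.log(a2))


print(local_exponent(gm_alpha_part), local_exponent(am_alpha_part))  # close to -1 and -1.5
```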

The conditional posterior distribution for β is (8). The marginal posterior distribution for α is given by

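$$\pi_{12}(\alpha\mid\mathbf x)\propto\pi_{12}(\alpha)\,\frac{\Gamma(n\alpha)}{\Gamma(\alpha)^n}\left(\frac{\prod_{i=1}^n x_i^{1/n}}{\sum_{i=1}^n x_i}\right)^{n\alpha}.$$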

4 - NUMERICAL EVALUATION

A simulation study is presented to compare the influence of different objective priors on the posterior distributions and to select an objective prior that returns good results in terms of the mean relative error (MRE) and the mean squared error (MSE), given by

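$$\mathrm{MRE}_i=\frac1N\sum_{j=1}^N\frac{\hat\theta_{i,j}}{\theta_i}\quad\text{and}\quad\mathrm{MSE}_i=\frac1N\sum_{j=1}^N\left(\hat\theta_{i,j}-\theta_i\right)^2,\qquad i=1,2,$$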
where $\boldsymbol\theta=(\theta_1,\theta_2)=(\alpha,\beta)$ and N = 10,000 is the number of estimates obtained through the posterior means of α and β. The 95% coverage probabilities (CP95%) of the credible intervals for α and β are also evaluated. Under this approach, the best estimators will show MRE closer to one and MSE closer to zero. In addition, over a large number of experiments with a 95% credibility level, the frequency of intervals that cover the true value of $\boldsymbol\theta$ should be close to 95%.
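These criteria are straightforward to compute once the N posterior means and credible intervals are available. The sketch below uses hypothetical helper names and toy numbers; it is not the simulation code or output of the paper.

```python
from typing import Sequence, Tuple


def mre_mse(estimates: Sequence[float], true_value: float) -> Tuple[float, float]:
    """MRE (mean of estimate / true value, ideally 1) and MSE (ideally 0) as defined above."""
    n = len(estimates)
    mre = sum(e / true_value for e in estimates) / n
    mse = sum((e - true_value) ** 2 for e in estimates) / n
    return mre, mse


def coverage(intervals: Sequence[Tuple[float, float]], true_value: float) -> float:
    """Fraction of credible intervals (lower, upper) that contain the true value."""
    return sum(lo <= true_value <= hi for lo, hi in intervals) / len(intervals)


est = [3.8, 4.4, 4.1, 3.7]  # toy posterior means for alpha = 4
print(mre_mse(est, 4.0))
print(coverage([(3.0, 5.0), (4.2, 6.0), (2.5, 4.5), (3.9, 4.1)], 4.0))  # 3 of 4 intervals cover 4.0
```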

The results were computed using the software R. We considered n = 10, 20, …, 120; for reasons of space, results are presented only for $\boldsymbol\theta=(4,2)$ and $\boldsymbol\theta=(0.5,5)$, although the results were similar for other choices of α and β. Using MCMC methods, we computed the posterior means of α and β and the credible intervals for both parameters. In terms of decision theory, we considered the squared error loss function (SELF). Moreover, the posterior mean is finite for n ≥ 2 and is optimal under the Kullback-Leibler divergence. Tables I and II, available in Appendix B, present the MREs, MSEs and CP95% of the different estimators of α and β.
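The paper uses MCMC. As an alternative for priors of the form $\pi(\alpha)/\beta$, one can sample directly from a grid approximation of the marginal posterior of α and then draw β from the conditional $\text{Gamma}(n\alpha,\sum x_i)$ in (8). The sketch below does this with Python's standard library and reports posterior means and 95% equal-tailed credible intervals; it is not the authors' code, and the grid limit, number of draws, seed and data are illustrative assumptions. Note that `random.gammavariate` takes a scale argument, so the rate $\sum x_i$ enters as its reciprocal.

```python
import bisect
import math
import random
from typing import Callable, List, Sequence, Tuple


def sample_posterior(x: Sequence[float], log_prior_alpha: Callable[[float], float],
                     draws: int = 20_000, a_max: float = 50.0, points: int = 20_000,
                     seed: int = 1) -> Tuple[List[float], List[float]]:
    """Draw (alpha, beta) from the posterior under the prior pi(alpha) / beta."""
    rng = random.Random(seed)
    n, total = len(x), sum(x)
    sum_log = sum(math.log(v) for v in x)
    h = a_max / points
    grid = [(k + 0.5) * h for k in range(points)]
    # log marginal of alpha: log pi(alpha) + lgamma(n alpha) - n lgamma(alpha) + alpha sum log x - n alpha log S
    log_m = [log_prior_alpha(a) + math.lgamma(n * a) - n * math.lgamma(a)
             + a * sum_log - n * a * math.log(total) for a in grid]
    top = max(log_m)
    cdf, acc = [], 0.0
    for v in log_m:
        acc += math.exp(v - top)
        cdf.append(acc)
    alphas, betas = [], []
    for _ in range(draws):
        k = bisect.bisect_left(cdf, rng.random() * acc)  # inverse-CDF draw on the grid
        a = grid[min(k, points - 1)]
        alphas.append(a)
        betas.append(rng.gammavariate(n * a, 1.0 / total))  # beta | alpha, x ~ Gamma(n alpha, rate S)
    return alphas, betas


def summarize(samples: Sequence[float]) -> Tuple[float, float, float]:
    """Posterior mean and 95% equal-tailed credible interval."""
    s = sorted(samples)
    m = len(s)
    return sum(s) / m, s[int(0.025 * m)], s[int(0.975 * m) - 1]


data = [1.2, 2.5, 0.8, 3.1, 1.9, 2.2, 1.4, 2.8]  # illustrative sample
alphas, betas = sample_posterior(data, lambda a: -math.log(a))  # Jeffreys rule prior (6)
print(summarize(alphas), summarize(betas))
```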

From these results, for both parameters the posterior mean under the Tibshirani prior performs better than those obtained with the other priors in terms of MREs and MSEs. The better performance of this approach is also confirmed by the coverage probabilities of the credible intervals. It is worth mentioning that the good frequentist coverage of the Tibshirani prior is a consequence of its construction. Although only a few scenarios for the parameters are presented here, the results were similar for other choices of $\boldsymbol\theta$. Overall, we conclude that the posterior distribution obtained with the Tibshirani prior should be used to make inference on the parameters of the gamma distribution.

5 - DISCUSSION

In this study, we presented a theorem that provides simple conditions under which an improper prior yields a proper posterior for the gamma distribution. Further, we provided sufficient conditions to verify whether the posterior moments of the parameters are finite. An interesting aspect of our findings is that one can check whether the posterior is proper, and whether its moments are finite, by looking directly at the behavior of the proposed improper prior.

The proposed methodology was applied to different objective priors. The MDI prior was the only one that yields an improper posterior for every sample size. An extensive simulation study showed that the posterior distribution obtained under the Tibshirani prior provided more accurate results in terms of MRE, MSE and coverage probabilities. Therefore, this posterior distribution should be used to make inference on the unknown parameters of the gamma distribution. This study can be extended to other distributions; for instance, in a homogeneous Poisson process, the inter-arrival times can be modeled by an exponential distribution Exp(λ) with the following hierarchical structure

$$y_1,\dots,y_n\sim f(y\mid\lambda),\qquad\lambda\sim\text{Gamma}(\alpha,\beta),\qquad\pi(\alpha,\beta)\propto\pi(\alpha)\pi(\beta).$$
In this case, the posterior distribution $\pi(\lambda,\alpha,\beta\mid\mathbf y)$ depends on three parameters (see Papadopoulos 1989). Although the results presented here cannot be used directly to select the best prior because of the additional parameter λ, the same approach will be considered in further research.

ACKNOWLEDGMENTS

The authors are thankful to the Editorial Board and two reviewers for their valuable comments and suggestions, which led to this improved version. Pedro L. Ramos is grateful to the São Paulo State Research Foundation (FAPESP Proc. 2017/25971-0). Eduardo Ramos acknowledges financial support from the São Paulo State Research Foundation (FAPESP Proc. 2019/27636-9). Francisco Louzada is supported by the Brazilian agencies CNPq (grant number 301976/2017-1) and FAPESP (grant number 2013/07375-0).

REFERENCES

  • ABRAMOWITZ M & STEGUN IA. 1972. Handbook of Mathematical Functions. 10th ed. Washington, D.C.: NBS, p. 1046.
  • BAR-LEV SK & REISER B. 1982. An exponential subfamily which admits UMPU tests based on a single test statistic. Ann Stat: 979-989.
  • BERGER JO, BERNARDO JM & SUN D. 2015. Overall objective priors. Bayesian Anal 10(1): 189-221.
  • BERNARDO JM. 1979. Reference posterior distributions for Bayesian inference. J Roy Stat Soc B: 113-147.
  • BERNARDO JM. 2005. Reference analysis. Handb Stat 25: 17-90.
  • DEY S & MOALA FA. 2018. Objective and subjective prior distributions for the Gompertz distribution. An Acad Bras Cienc 90: 2643-2661.
  • FOLLAND GB. 1999. Real analysis: modern techniques and their applications. 2nd ed. New York: Wiley, 408 p.
  • GELFAND AE & SMITH AF. 1990. Sampling-based approaches to calculating marginal densities. J Am Stat Assoc 85(410): 398-409.
  • JEFFREYS H. 1946. An invariant form for the prior probability in estimation problems. P Roy Soc A-Math Phy 186(1007): 453-461.
  • KASS RE & WASSERMAN L. 1996. The selection of prior distributions by formal rules. J Am Stat Assoc 91(435): 1343-1370.
  • LISEO B. 1993. Elimination of nuisance parameters with reference priors. Biometrika 80(2): 295-304.
  • LOUZADA F & RAMOS PL. 2018. Efficient closed-form maximum a posteriori estimators for the gamma distribution. J Stat Comput Sim 88(6): 1134-1146.
  • MILLER RB. 1980. Bayesian analysis of the two-parameter gamma distribution. Technometrics 22(1): 65-69.
  • MOALA FA, RAMOS PL & ACHCAR JA. 2013. Bayesian Inference for Two-Parameter Gamma Distribution Assuming Different Noninformative Priors. Rev Colomb Estad 36(2): 321-338.
  • MUKERJEE R & DEY DK. 1993. Frequentist validity of posterior quantiles in the presence of a nuisance parameter: higher order asymptotics. Biometrika 80(3): 499-505.
  • NORTHROP P & ATTALIDES N. 2016. Posterior propriety in Bayesian extreme value analyses using reference priors. Stat Sinica 26(2).
  • PAPADOPOULOS AG. 1989. A hierarchical approach to the study of the exponential failure model. Commun Stat-Theor M 18(12): 4375-4392.
  • RAMOS PL, ALMEIDA MP, TOMAZELLA VL & LOUZADA F. 2019. Improved Bayes estimators and prediction for the Wilson-Hilferty distribution. An Acad Bras Cienc 91: e20190002.
  • SUN D & YE K. 1996. Frequentist validity of posterior quantiles for a two-parameter exponential family. Biometrika 83(1): 55-65.
  • TIBSHIRANI R. 1989. Noninformative priors for one parameter of many. Biometrika 76(3): 604-608.
  • ZELLNER A. 1977. Maximal Data Information Prior Distributions. New Meth Appli Bay Meth 211-232.
  • ZELLNER A. 1984. Maximal Data Information Prior Distributions. Bas Iss Econ, 334 p.

APPENDIX A

PROOF OF THEOREM 2.7

Proof. Let

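$$d(\mathbf{x}) = \int_{(0,\infty)^2} \frac{\pi(\alpha)\,\beta^{n\alpha+c}}{\Gamma(\alpha)^n}\left\{\prod_{i=1}^n x_i^{\alpha}\right\}\exp\left\{-\beta\sum_{i=1}^n x_i\right\} d\boldsymbol{\Theta}. \tag{32}$$

Since $\frac{\pi(\alpha)\,\beta^{n\alpha+c}}{\Gamma(\alpha)^n}\prod_{i=1}^n x_i^{\alpha}\exp\left(-\beta\sum_{i=1}^n x_i\right) \ge 0$, by the Fubini-Tonelli Theorem (see Folland 1999) we have

$$\int_{(0,\infty)^2} \frac{\pi(\alpha)\,\beta^{n\alpha+c}}{\Gamma(\alpha)^n}\left\{\prod_{i=1}^n x_i^{\alpha}\right\}\exp\left\{-\beta\sum_{i=1}^n x_i\right\} d\boldsymbol{\Theta} = \int_0^\infty \frac{\pi(\alpha)}{\Gamma(\alpha)^n}\left\{\prod_{i=1}^n x_i^{\alpha}\right\}\int_0^\infty \beta^{n\alpha+c}\exp\left\{-\beta\sum_{i=1}^n x_i\right\}d\beta\,d\alpha. \tag{33}$$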
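The inner β-integral in (33) is a gamma integral: for $k = n\alpha + c + 1 > 0$ and $h = \sum_{i=1}^n x_i > 0$, $\int_0^\infty \beta^{k-1}e^{-h\beta}\,d\beta = \Gamma(k)/h^{k}$, which is where the factor $\Gamma(n\alpha+c+1)$ in $v(\alpha)$ comes from. A quick numerical sanity check in Python, using only the standard library, is sketched below; it applies composite Simpson's rule on a truncated range and, as an assumption of this sketch, is restricted to $k \ge 1$ so the integrand is bounded at zero. The helper names are hypothetical.

```python
import math


def inner_beta_integral_closed(k, h):
    """Closed form of int_0^inf beta**(k-1) * exp(-h*beta) d beta, valid for k > 0, h > 0."""
    return math.gamma(k) / h ** k


def inner_beta_integral_simpson(k, h, m=20000):
    """Composite Simpson approximation on (0, 60/h]; assumes k >= 1 so the integrand is bounded."""
    if k < 1 or h <= 0:
        raise ValueError("this sketch assumes k >= 1 and h > 0")
    upper = 60.0 / h  # neglected tail is Gamma(k, 60) / h**k, tiny for moderate k
    if m % 2:
        m += 1  # Simpson's rule needs an even number of sub-intervals
    step = upper / m

    def f(b):
        return b ** (k - 1) * math.exp(-h * b)

    total = f(0.0) + f(upper)
    for j in range(1, m):
        total += (4 if j % 2 else 2) * f(j * step)
    return total * step / 3


if __name__ == "__main__":
    x = [1.0, 1.5, 2.0]               # illustrative data
    n, alpha, c = len(x), 2.0, -1.0
    k, h = n * alpha + c + 1, sum(x)  # k = 6, h = 4.5
    print(inner_beta_integral_closed(k, h), inner_beta_integral_simpson(k, h))
```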

The rest of the proof is divided into three cases, given below.

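Case i): Suppose $c < -1$. Notice that $\int_0^\infty x^{k-1}e^{-hx}\,dx = \infty$ for any $k \le 0$ and $h \in \mathbb{R}$. Then, for $0 < \alpha < \frac{-c-1}{n}$ we have $n\alpha + c < n\,\frac{-c-1}{n} + c = -1$, so the inner integral in (33) diverges (take $k = n\alpha + c + 1 < 0$ and $h = \sum_{i=1}^n x_i$), and it follows that

$$\begin{aligned}
d(\mathbf{x}) &= \int_0^\infty \frac{\pi(\alpha)}{\Gamma(\alpha)^n}\left\{\prod_{i=1}^n x_i^{\alpha}\right\}\int_0^\infty \beta^{n\alpha+c}\exp\left\{-\beta\sum_{i=1}^n x_i\right\}d\beta\,d\alpha \\
&\ge \int_0^{\frac{-c-1}{n}} \frac{\pi(\alpha)}{\Gamma(\alpha)^n}\left\{\prod_{i=1}^n x_i^{\alpha}\right\}\int_0^\infty \beta^{n\alpha+c}\exp\left\{-\beta\sum_{i=1}^n x_i\right\}d\beta\,d\alpha \\
&= \int_0^{\frac{-c-1}{n}} \frac{\pi(\alpha)}{\Gamma(\alpha)^n}\left\{\prod_{i=1}^n x_i^{\alpha}\right\}\times\infty\,d\alpha = \int_0^{\frac{-c-1}{n}} \infty\,d\alpha = \infty,
\end{aligned}$$

and case i) is proved.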
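The divergence used in case i) can be seen from an elementary lower bound. On $(\varepsilon, 1]$ we have $e^{-h\beta} \ge e^{-\max(h,0)}$, so for $k \le 0$ the truncated integral $\int_\varepsilon^1 \beta^{k-1}e^{-h\beta}\,d\beta$ is at least $e^{-\max(h,0)}$ times $-\log\varepsilon$ (if $k = 0$) or $(\varepsilon^{k}-1)/(-k)$ (if $k < 0$), and both grow without bound as $\varepsilon \to 0^+$. The Python sketch below, with a hypothetical helper name and illustrative values, evaluates this bound with $k = n\alpha + c + 1$ for an $\alpha$ inside $\left(0, \frac{-c-1}{n}\right)$.

```python
import math


def truncated_lower_bound(k, h, eps):
    """Lower bound for int_eps^1 beta**(k-1) * exp(-h*beta) d beta when k <= 0."""
    if k > 0 or not 0 < eps < 1:
        raise ValueError("this bound is for k <= 0 and 0 < eps < 1")
    # exp(-h*beta) >= exp(-max(h, 0)) for every beta in (eps, 1]
    factor = math.exp(-max(h, 0.0))
    if k == 0:
        return factor * -math.log(eps)
    return factor * (eps ** k - 1.0) / (-k)


if __name__ == "__main__":
    n, c = 3, -2.5                 # illustrative values with c < -1
    alpha = 0.25                   # lies in (0, (-c - 1) / n) = (0, 0.5)
    k, h = n * alpha + c + 1, 4.5  # k = -0.75 <= 0
    for eps in (1e-2, 1e-4, 1e-8):
        print(eps, truncated_lower_bound(k, h, eps))
```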

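Now suppose $c \ge -1$. Denoting

$$v(\alpha) = \frac{\pi(\alpha)\,\Gamma(n\alpha+c+1)}{\Gamma(\alpha)^n} \quad \text{and} \quad q(\mathbf{x}) = \log\left(\frac{\frac{1}{n}\sum_{i=1}^n x_i}{\sqrt[n]{\prod_{i=1}^n x_i}}\right),$$

we have $q(\mathbf{x}) > 0$ by the inequality of arithmetic and geometric means (the inequality is strict whenever the $x_i$ are not all equal, which holds with probability one for $n \ge 2$), and

$$\begin{aligned}
d(\mathbf{x}) &= \int_0^\infty v(\alpha)\,\frac{\left(\prod_{i=1}^n x_i\right)^{\alpha}}{\left(\sum_{i=1}^n x_i\right)^{n\alpha+c+1}}\,d\alpha \propto \int_0^\infty v(\alpha)\,\frac{1}{n^{n\alpha}}\,\frac{\left(\sqrt[n]{\prod_{i=1}^n x_i}\right)^{n\alpha}}{\left(\frac{1}{n}\sum_{i=1}^n x_i\right)^{n\alpha}}\,d\alpha = \int_0^\infty v(\alpha)\,n^{-n\alpha}e^{-nq(\mathbf{x})\alpha}\,d\alpha \\
&= \int_0^1 v(\alpha)\,n^{-n\alpha}e^{-nq(\mathbf{x})\alpha}\,d\alpha + \int_1^\infty v(\alpha)\,n^{-n\alpha}e^{-nq(\mathbf{x})\alpha}\,d\alpha = d_0(\mathbf{x}) + d_\infty(\mathbf{x}),
\end{aligned}$$

where $d_0(\mathbf{x}) = \int_0^1 v(\alpha)\,n^{-n\alpha}e^{-nq(\mathbf{x})\alpha}\,d\alpha$ and $d_\infty(\mathbf{x}) = \int_1^\infty v(\alpha)\,n^{-n\alpha}e^{-nq(\mathbf{x})\alpha}\,d\alpha$.

Then $d(\mathbf{x}) < \infty$ if and only if $d_0(\mathbf{x}) < \infty$ and $d_\infty(\mathbf{x}) < \infty$. These results lead us to the two remaining cases.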
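The key identity in this step, $\left(\prod_i x_i\right)^{\alpha}/\left(\sum_i x_i\right)^{n\alpha} = n^{-n\alpha}e^{-nq(\mathbf{x})\alpha}$, with $q(\mathbf{x})$ the log-ratio of the arithmetic to the geometric mean, can be checked numerically on the log scale. The following is a minimal Python sketch with illustrative data; the function names are hypothetical.

```python
import math


def q_stat(x):
    """q(x) = log(arithmetic mean / geometric mean), computed on the log scale."""
    n = len(x)
    log_am = math.log(sum(x) / n)
    log_gm = sum(math.log(v) for v in x) / n
    return log_am - log_gm


def log_lhs(x, alpha):
    """log of (prod x_i)**alpha / (sum x_i)**(n*alpha)."""
    n = len(x)
    return alpha * sum(math.log(v) for v in x) - n * alpha * math.log(sum(x))


def log_rhs(x, alpha):
    """log of n**(-n*alpha) * exp(-n*q(x)*alpha)."""
    n = len(x)
    return -n * alpha * math.log(n) - n * q_stat(x) * alpha


if __name__ == "__main__":
    x = [1.0, 1.5, 2.0]
    print(q_stat(x))                         # positive, since the x_i differ
    print(log_lhs(x, 2.0), log_rhs(x, 2.0))  # agree up to rounding
```

`q_stat` returns (numerically) zero when all observations coincide, for example when n = 1, which is exactly the situation excluded by the strict inequality $q(\mathbf{x}) > 0$.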

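Case ii): Suppose $c \ge -1$ and $\lim_{\alpha\to 0^+}\pi(\alpha)\,\alpha^{s} = \infty$ for every $s \in \mathbb{R}$. From Abramowitz & Stegun (1972), we have $\Gamma(z) \underset{z\to 0^+}{\propto} \frac{1}{z}$. Then, if $c = -1$,

$$\begin{aligned}
d_0(\mathbf{x}) &= \int_0^1 \frac{\pi(\alpha)\,\Gamma(n\alpha)}{\Gamma(\alpha)^n}\,n^{-n\alpha}e^{-nq(\mathbf{x})\alpha}\,d\alpha \propto \int_0^1 \pi(\alpha)\,\frac{1}{n\alpha}\,\frac{1}{\alpha^{-n}}\times 1\times 1\,d\alpha \\
&\propto \int_0^1 \pi(\alpha)\,\alpha^{n-1}\,d\alpha = \int_1^\infty \pi(u^{-1})\,u^{-n-1}\,du = \infty,
\end{aligned}$$

where the last equality comes from the fact that $\lim_{u\to\infty}\pi(u^{-1})\,u^{-n-1} = \lim_{\alpha\to 0^+}\pi(\alpha)\,\alpha^{n+1} = \infty$ (substituting $u = \alpha^{-1}$). Therefore, $d(\mathbf{x}) = \infty$ if $c = -1$.

On the other hand, if $c > -1$ then $c + 1 > 0$, so $\Gamma(n\alpha+c+1) \to \Gamma(c+1) \in (0,\infty)$ as $\alpha \to 0^+$, that is, $\Gamma(n\alpha+c+1) \underset{\alpha\to 0^+}{\propto} 1$ and

$$d_0(\mathbf{x}) = \int_0^1 \frac{\pi(\alpha)\,\Gamma(n\alpha+c+1)}{\Gamma(\alpha)^n}\,n^{-n\alpha}e^{-nq(\mathbf{x})\alpha}\,d\alpha \propto \int_0^1 \pi(\alpha)\,\frac{1}{\alpha^{-n}}\times 1\times 1\,d\alpha = \int_0^1 \pi(\alpha)\,\alpha^{n}\,d\alpha = \int_1^\infty \pi(u^{-1})\,u^{-n-2}\,du = \infty.$$

Therefore, $d(\mathbf{x}) = \infty$ if $c > -1$, and case ii) is proved.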

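Case iii): Suppose that $c \ge -1$ and the behavior of $\pi(\alpha)$ is given by

$$\pi(\alpha) \underset{\alpha\to 0^+}{\propto} \alpha^{s_0} \quad \text{and} \quad \pi(\alpha) \underset{\alpha\to\infty}{\propto} \alpha^{s_\infty},$$

where $s_0 \in \mathbb{R}$ and $s_\infty \in \mathbb{R}$. Following Abramowitz & Stegun (1972, p. 260), we obtain $\Gamma(z) \underset{z\to\infty}{\propto} z^{z-\frac{1}{2}}e^{-z}$ and $\Gamma(z+a) \underset{z\to\infty}{\propto} \Gamma(z)\,z^{a}$ for $a \ge 0$. Then $\Gamma(n\alpha+c+1) \underset{\alpha\to\infty}{\propto} \Gamma(n\alpha)\,(n\alpha)^{c+1}$ and

$$v(\alpha) = \frac{\pi(\alpha)\,\Gamma(n\alpha+c+1)}{\Gamma(\alpha)^n} \underset{\alpha\to\infty}{\propto} \frac{\alpha^{s_\infty}\,(n\alpha)^{n\alpha-\frac{1}{2}}e^{-n\alpha}\,(n\alpha)^{c+1}}{\alpha^{n\alpha-\frac{n}{2}}e^{-n\alpha}} \propto \frac{\alpha^{s_\infty+c+1}\,(n\alpha)^{n\alpha-\frac{1}{2}}}{\alpha^{n\alpha-\frac{n}{2}}} \propto \alpha^{s_\infty+c+\frac{n+1}{2}}\,n^{n\alpha}.$$

Therefore

$$d_\infty(\mathbf{x}) = \int_1^\infty v(\alpha)\,n^{-n\alpha}e^{-nq(\mathbf{x})\alpha}\,d\alpha \propto \int_1^\infty \alpha^{s_\infty+c+\frac{n+1}{2}}\,e^{-nq(\mathbf{x})\alpha}\,d\alpha = \frac{\Gamma\left(s_\infty+c+\frac{n+3}{2},\,nq(\mathbf{x})\right)}{\left(nq(\mathbf{x})\right)^{s_\infty+c+\frac{n+3}{2}}} < \infty,$$

where $\Gamma(a, z) = \int_z^\infty t^{a-1}e^{-t}\,dt$ is the upper incomplete gamma function, which is finite for every $a \in \mathbb{R}$ when $z > 0$; i.e., $d_\infty(\mathbf{x}) < \infty$ for all $s_\infty$. Therefore $d(\mathbf{x}) < \infty \iff d_0(\mathbf{x}) < \infty$.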
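The asymptotic form of $v(\alpha)$ can be checked with log-gamma functions. The sketch below assumes, for illustration only, the exact power prior $\pi(\alpha) = \alpha^{s_\infty}$ and computes the residual $\log v(\alpha) - \left[\left(s_\infty + c + \tfrac{n+1}{2}\right)\log\alpha + n\alpha\log n\right]$; if the stated proportionality holds, this residual should settle to a constant as $\alpha$ grows. By Stirling's formula that constant is $\tfrac{1-n}{2}\log(2\pi) + \left(c + \tfrac{1}{2}\right)\log n$. The helper names and parameter values are hypothetical.

```python
import math


def log_v(alpha, n, c, s_inf):
    """log v(alpha) for the illustrative prior pi(alpha) = alpha**s_inf (an assumption)."""
    return (s_inf * math.log(alpha)
            + math.lgamma(n * alpha + c + 1)
            - n * math.lgamma(alpha))


def residual(alpha, n, c, s_inf):
    """log v(alpha) minus log of alpha**(s_inf + c + (n + 1)/2) * n**(n * alpha)."""
    main = (s_inf + c + (n + 1) / 2) * math.log(alpha) + n * alpha * math.log(n)
    return log_v(alpha, n, c, s_inf) - main


def stirling_constant(n, c):
    """Limit of residual(alpha, ...) as alpha -> infinity, from Stirling's formula."""
    return (1 - n) / 2 * math.log(2 * math.pi) + (c + 0.5) * math.log(n)


if __name__ == "__main__":
    n, c, s_inf = 3, -0.5, -1.0  # illustrative values with c >= -1
    for alpha in (10.0, 100.0, 1000.0, 10000.0):
        print(alpha, residual(alpha, n, c, s_inf), stirling_constant(n, c))
```

Working with `math.lgamma` avoids the overflow that `math.gamma` would hit for large $n\alpha$.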

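Now, proceeding as in case ii), if $c = -1$ we have

$$d_0(\mathbf{x}) = \int_0^1 \frac{\pi(\alpha)\,\Gamma(n\alpha)}{\Gamma(\alpha)^n}\,n^{-n\alpha}e^{-nq(\mathbf{x})\alpha}\,d\alpha \propto \int_0^1 \alpha^{s_0}\,\frac{1}{n\alpha}\,\frac{1}{\alpha^{-n}}\,d\alpha \propto \int_0^1 \alpha^{s_0+n-1}\,d\alpha,$$

and since $\int_0^1 \alpha^{p}\,d\alpha < \infty$ if and only if $p > -1$, $d(\mathbf{x}) < \infty$ if and only if $n > -s_0$ when $c = -1$. On the other hand, if $c > -1$,

$$d_0(\mathbf{x}) = \int_0^1 \frac{\pi(\alpha)\,\Gamma(n\alpha+c+1)}{\Gamma(\alpha)^n}\,n^{-n\alpha}e^{-nq(\mathbf{x})\alpha}\,d\alpha \propto \int_0^1 \alpha^{s_0}\,\frac{1}{\alpha^{-n}}\,d\alpha = \int_0^1 \alpha^{s_0+n}\,d\alpha,$$

i.e., $d(\mathbf{x}) < \infty$ if and only if $n > -s_0 - 1$ when $c > -1$, and the proof is complete. ◻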
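The case analysis can be summarized as a small decision rule. The Python sketch below assumes, as in (32), a prior of the form $\pi(\alpha,\beta) \propto \pi(\alpha)\,\beta^{c}$, and at least two observations that are not all equal; the caller supplies $c$, the sample size $n$, and either the exponent $s_0$ from case iii) or `None` when $\pi(\alpha)\alpha^{s} \to \infty$ for every $s$ (case ii)). The exponent $s_\infty$ does not enter, because case iii) shows $d_\infty(\mathbf{x}) < \infty$ for every $s_\infty$. The function name is hypothetical, and priors that fit none of the three cases are outside its scope.

```python
from typing import Optional


def posterior_is_proper(n: int, c: float, s0: Optional[float]) -> bool:
    """Propriety check following cases i) to iii) of the proof above (sketch).

    Assumes pi(alpha, beta) proportional to pi(alpha) * beta**c, n >= 2
    observations that are not all equal, and that pi(alpha) either behaves
    like alpha**s0 as alpha -> 0+ (with a power-law tail at infinity), or
    satisfies pi(alpha) * alpha**s -> inf as alpha -> 0+ for every s
    (pass s0=None in that case).
    """
    if c < -1:
        return False          # case i): the inner beta-integral diverges near alpha = 0
    if s0 is None:
        return False          # case ii): d_0(x) diverges
    if c == -1:
        return n > -s0        # case iii) with c = -1
    return n > -s0 - 1        # case iii) with c > -1
```

For example, with hypothetical exponents, `posterior_is_proper(3, -1.0, -0.5)` returns `True` (since 3 > 0.5), while `posterior_is_proper(3, -1.5, 0.0)` returns `False` because c < -1.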

APPENDIX B

Table I
The 95% coverage probabilities (CP95%) of the estimates of μ and Ω for different values of n, with N = 10,000 simulated samples.
Table II
The MRE (MSE) of the estimates of α and β for different sample sizes.

Publication Dates

  • Publication in this collection
    03 Dec 2021
  • Date of issue
    2021

History

  • Received
    22 Nov 2019
  • Accepted
    8 Feb 2020
Academia Brasileira de Ciências Rua Anfilófio de Carvalho, 29, 3º andar, 20030-060 Rio de Janeiro RJ Brasil, Tel: +55 21 3907-8100 - Rio de Janeiro - RJ - Brazil
E-mail: aabc@abc.org.br