Acessibilidade / Reportar erro

On estimation and influence diagnostics for a bivariate promotion lifetime model based on the FGM copula: a fully bayesian computation

Abstracts

In this paper we propose a bivariate long-term model based on the Farlie-Gumbel-Morgenstern copula to model, where the marginals are assumed to be long-term promotion time structured. The proposed model allows for the presence of censored data and covariates. For inferential purpose a Bayesian approach via Markov Chain Monte Carlo is considered. Further, some discussions on the model selection criteria are given. In order to examine outlying and influential observations, we present a Bayesian case deletion influence diagnostics based on the Kullback-Leibler divergence. The newly developed procedures are illustrated on artificial and real data.

Bayesian approach; case deletion influence diagnostics; copula modeling; long-term survival


Neste artigo nós propomos um modelo bivariado de longa duração baseado na copula de Farlie-Gumbel-Morgenstern, onde assumimos marginais com estrutura de tempo de promoção. O modelo proposto permite a presença de dados censurados e de covariáveis. Para fins inferenciais foi considerada uma abordagem bayesiana usando métodos MonteCarlo em Cadeias de Markov. Além disso, algumas discussões sobre os critérios de seleção de modelos são apresentadas. A fim de detectar outliers e observações influentes, nós apresentamos um método bayesiano de análise de influência de deleção de caso baseado na divergência de Kullback-Leibler. Os procedimentos desenvolvidos são ilustrados em dados artificiais e reais.

Abordagem Bayesiana; Diagnóstico de influência de deleção de caso; modelagem copula; sobrevivência de longa duração


On estimation and influence diagnostics for a bivariate promotion lifetime model based on the FGM copula: a fully bayesian computation

A.K. Suzuki* * Corresponding author: Adriano Kamimura Suzuki ; F. Louzada; V.G. Cancho

Department of Applied Mathematics and Statistics, University of São Paulo, Brazil. E-mails: suzuki@icmc.usp.br; louzada@icmc.usp.br; garibay@icmc.usp.br

ABSTRACT

In this paper we propose a bivariate long-term model based on the Farlie-Gumbel-Morgenstern copula to model, where the marginals are assumed to be long-term promotion time structured. The proposed model allows for the presence of censored data and covariates. For inferential purpose a Bayesian approach via Markov Chain Monte Carlo is considered. Further, some discussions on the model selection criteria are given. In order to examine outlying and influential observations, we present a Bayesian case deletion influence diagnostics based on the Kullback-Leibler divergence. The newly developed procedures are illustrated on artificial and real data.

Keywords: Bayesian approach, case deletion influence diagnostics, copula modeling, long-term survival.

RESUMO

Neste artigo nós propomos um modelo bivariado de longa duração baseado na copula de Farlie-Gumbel-Morgenstern, onde assumimos marginais com estrutura de tempo de promoção. O modelo proposto permite a presença de dados censurados e de covariáveis. Para fins inferenciais foi considerada uma abordagem bayesiana usando métodos MonteCarlo em Cadeias de Markov. Além disso, algumas discussões sobre os critérios de seleção de modelos são apresentadas. A fim de detectar outliers e observações influentes, nós apresentamos um método bayesiano de análise de influência de deleção de caso baseado na divergência de Kullback-Leibler. Os procedimentos desenvolvidos são ilustrados em dados artificiais e reais.

Palavras-chave: Abordagem Bayesiana, Diagnóstico de influência de deleção de caso,modelagem copula, sobrevivência de longa duração.

1 INTRODUCTION

In survival and reliability analysis we can observe two lifetimes for a same patient or equipment. For instance, in the medical area we can have interest in studying the lifetimes of matched human organs, as kidneys and eyes, and double recurrence of a certain disease. In industrial applications this type of data can occur for systems whose duration times depend on the durability of two components. For instance, the damage of dual generators in a power plant or the lifetime of motors in a twin-engine airplane. The examples set up above are illustrations of bivariate survival data. In literature, an extensive list of papers on modeling multivariate survival data can be found. For instance, we can cite [1, 17, 24] amongst others.

Bivariate survival data in general are correlated and the study of that dependence has been focus of many researches. The most popular approach is the frailty models in that one or more random effects are included in the model in order to model the dependence between the observations (see for instance, [6, 46, 25, 34, 47, 24]). In this case, the marginal times are conditionally independent given the frailty variable. Details on early developments of the such models may be seen in [6]. [40] considered a comparative study between the frailty models and the bivariate models based on the Exponential and Weibull distributions for bivariate survival data according to Bayesian criteria.

Recently, the copula models (see for instance, [16, 45, 33, 11, 10, 15, 49, 29]) have become a popular tools to model the dependence of multivariate data, especially in biological areas, actuarial sciences and finances. [35, 22] show some relationships between frailty models and copula models.

A copula is a function that connects the marginal distributions to restore the joint distribution. Different copula functions represent different dependence structures between variables ([33]). By comparing to the joint distribution approach, a copula model is a more convenient tool in studying the dependence structure. Indeed it is more flexible in applications, since, when the scatter of the data does not fit any known family of joint distributions, it may be difficult to specify the joint distribution. Using copulas, however, we can first estimate the marginal distributions and then estimate the copula. Another advantage of the copula modeling is its relatively mathematical simplicity. Also, it is possible to build a variety of dependence structures based on parametric or non-parametric models for the marginal distributions.

In survival analysis, models based on copulas are considered in [26, 34, 41, 23]. Particularly, from a Bayesian perspective, [39] considered an application of the Archimedian copula family for modeling the dependence of bivariate survival. In their analysis, the author considered a Weibull distribution for the marginal distributions of the bivariate lifetime components.

A difficulty arises if a part of the population is not susceptible to the event of interest. For instance, in bivariate clinical studies a population can respond favorably to a treatment, being considered cured. Considering univariate lifetimes, models which assumes part of the population is cured have been widely developed and are usually called long term survival models. Perhaps the most popular type of cure rate model is the mixture model introduced by [3, 2]. In this model, it is assumed that a certain proportion of the patients, say p, are cured, in the sense that they do not present the event of interest during a long period of time and can be seen as to be immune to the cause of failure under study. Later on, the literature on mixture long-term model is extensive and and interested readers can refer to [31, 37, 38], amongst others.

Long-term promotion time model ([48, 7]) assume that the number of causes or risks of a particular event of interest follows a Poisson distribution with mean θ. If the number of these cases follows a Bernoulli distribution with parameter p, we have the mixture model ([3, 2]), in other words, there is a single cause of occurrence. In this paper, we consider a copula model based on the Farlie-Gumbel-Morgenstern (FGM) distribution (see, [12]) assuming promotion time models as marginal distributions.

The motivation for considering the FGM copula is modeling bivariate data with a weak dependence between marginal times. Also, the FGM copula can measure both positive and negative dependence, while the usual copulas, such as the Archimedian copula family, can only measure positive one. Then, the main objective of the paper is to present the FGM long-term bivariate survival copula model assuming promotion time models as marginal distributions, and to develop diagnostic measures from a Bayesian perspective based on the Kullback-Leibler (K-L) divergence to the proposed modeling as proposed by [36]. For each individual lifetime variable we consider a long-term Weibull promotion time models. For inferential purpose joint estimation was performed and a sampling based approach via Markov Chain Monte Carlo (MCMC) was considered. A simulation study and an application on real data illustrates our approach. We compare the methodology propose with the Positive stable frailty (PSF) model ([25]).

The paper is organized as follows. Section 2 presents a survival bivariate model by considering a FGM copula distribution. Section 3 presents the Bayesian inferential procedure as well as some discussions on the model selection criteria are given. Section 4 a Bayesian case deletion influence diagnostics based on the K-L divergence is presented. Section 5 presents the results of a simulation study and re-analyses the dataset on the efficacy of photocoagulation treatment for proliferative retinopathy on diabetic patients reported by The Diabetic Retinopathy Study ([44]) comparing the methodology propose with the Positive stable frailty (PSF) model ([25]). Final comments in Section 6 conclude the paper.

2 THE BIVARIATE SURVIVAL MODEL BASED ON FGM COPULA

Long-term survivors models have been widely used for fitting data where some individuals may never suffer the cause of failure under study. In this type of modeling, it is assumed that, due to some unobserved prognostic factors, a certain fraction π of the population is immune to the cause of failure under study or a long-term survivor. The survivor function for the entire population can be written as

The long-term survivors cannot be identified but we can infer their presence in a data set if many of the largest times are censored. Common choices for S0(t) are the Gompertz, Exponential and Weibull distributions. For this model, the cure rate π = exp{-θ}. When S0(t) has an Exponential and Weibull distributions, we have the so called exponential promotion time and Weibull promotion time models, respectively.

The model (2.1) can be rewritten as follows (see, [7]),

where

obtaining a mathematical relationship between the model mixture model ([3], [2]).

In order to define a copula we first suppose that Cϕ is a distribution function with density function cϕ on [0, 1]2 for ϕ ∈. Then, let (T1, T2) denote the paired failure times, Spop j and fpop j denoting, respectively, the marginal long-term survival functions and the marginal long-term density function of Tj, j = 1, 2. Consider (T1, T2) comes from the Cϕ copula for some ϕ then the joint survival and density function of (T1, T2) are given

and

respectively. Notice that the marginal distributions and the dependence structure can be visualized separately and this dependence structure is represented by a copula. Following [12], considering the Farlie-Gumbel-Morgenstern distribution, hereafter FGM copula, we have thecopula distribution function given as

where 0 < u, v < 1 e -1 < ϕ < 1.

For the FGM copula, Cϕ given in (5), the Kendall's tau ([33]) is given by

for ϕ = 0, τϕ is equals 0.

Other dependence measures between the variables T1 and T2 can be found in [33]. For instance, if the random variables T1 and T2 have Weibull distribution with density function

it can be proved that the Pearson's correlation coefficient between T1 and T2, is given by

observe when ϕ = 0 the correlation coefficient, ρ is equals 0. When Tj has Exponential distribution, with parameter αj = 1, j = 1,2, the correlation coefficient is given by, ρ = ϕ/4.

Consider (T1, T2) comes from the FGM copula (2.5) then the joint long-term survival of (T1, T2) are given by

where ϕ parameter measures the intensity of the dependence between the lifetimes. Observe that when ϕ = 0, Spop(t1, t2) = Spop1(t1)Spop2(t2), leading to the conclusion that the random variables T1 and T2 are independent.

At least in principle, other choices of Cϕ in (3) can be made straightforwardly, leading to different copula survival models with promotion time models as marginal distributions. For instance, according to the dependence structure, other copula models can be equally considered such as the Gaussian copula as well as the Frank, Clayton and positive stable frailty families (Archimedean copulas) amongst others. The construction of the inferential procedure will be analogous to what shall be presented below, differing in the choice of the a prior distribution for the parameter dependence of the copula chosen. We however, in terns of comparison, consider in the numerical sections the positive stable frailty family.

3 INFERENCE

For inference, we adopt a full Bayesian approach and assume lifetimes in presence of random censoring. The likelihood function, prior distributions for the parameters in the model, details of the MCMC algorithm and the model comparison are described below.

Let (Ti1, Ti2) the ith bivariate lifetime and (Ci1, Ci2) the random censored bivariate lifetimes, for i = 1,..., n. Suppose that (Ti1, Ti2) and (Ci1, Ci2) are independent. For each individual i observed quantities are represented by the random variables tij = min(Tij, Cij) and δij = I(tij = Tij), which denotes a censorship indicator, j = 1, 2. Let Spop(t11) and Spop(t21) be the survival functions of Ti1 and Ti2, respectively, where γ1 and γ2 are parameter vectors of q1and q2 elements associated to each one of the marginal distributions.

Considering the bivariate survival function Spop(t1, t2|ϕ, γ1, γ2) given in (2.9), the contribution of the ith individual for the log-likelihood of θ = (ϕ, γ1, γ2) is given by [30]

3.1 Prior and posterior densities

The use of the Bayesian method besides being an alternative statistical approach, it allows the incorporation of previous knowledge of the parameters through an informative prior distribution. When there is not such previous knowledge one may consider a non-informative prior structure. For instance, in order to carry out a Bayesian inference procedure for the bivariate survival model of giving in (2.9), consider as in Section 2, that the parametric distribution family of the marginal lifetimes T1and T2 are known and indexed by the parameter vectors, γ1 and γ2, respectively. In order to guarantee proper posterior distributions, we adopt proper prior distributions according to the variation of the parametric space, but ensuring non-informativeness according to the fixed hyper-parameters, which lead to such situation. Thus, we consider a prior distribution ofθ = (ϕ, γ1, γ2) given by

Equation (3.2) implies that (1 - ϕ)/2 follows a Beta(r1, r2) distribution, π(γj) is the prior distribution of γj and that γ1, γ2 and ϕ are mutually independent, also ϕ ∈ (-1, 1).

Combining (3.2) with the likelihood function, L(θ) = exp((θ)), where i(θ) is given in (3.1), we straightforwardly obtain the joint posterior distribution of θ), π(θ)|), where is the observed dataset.

For cure rate model (2.1), the parameter vectors γj, consists of the parameters of the distribution chosen for S0(t) and the parameter θj, j = 1, 2. Following [7] we incorporate covariates for the parametric cure rate model (2.1) through the cure rate parameter θ as follows: For each subject i, i = 1,..., n, and for each time j, j = 1, 2, we consider θji≡ θ(x'iβj) = exp(x'iβj), where xi' = (xi1,..., xiq) denote the q × 1 vector of covariates for the ith subject and βj = (βj1,..., βjq)' denote the corresponding vector of regression coefficients. We choose the following independent prior distributions: β0j ~ Normaloj, ), β1j ~ Normal1j, ) j = 1, 2.

3.2 Computation

In the Bayesian approach, the target distribution for inference is the posterior of the parameters of interest. For this, we need to obtain the marginal posterior densities of each parameter, which are obtained by integrating the joint posterior density with respect to each parameter.

We point out that for any marginal distribution of T1and T2 the joint posterior distribution is not tractable analytically but Markov chain Monte Carlo (MCMC) methods such as the Gibbs sampler, can be used to draw samples, from which features of the marginal posterior distributions of interest can be inferred [21]. For the estimation procedure we consider joint estimation whereall the model parameters are estimated simultaneously in the MCMC algorithm. The Gibbssampler is an iterative procedure of a broad class of methods generically named Markov Chain Monte Carlo (MCMC). Many practical aspects of the MCMC methodology are described in [18, 17]. This method is applicable in situations where one is not able to generate samples directly from the joint posterior density. It however requires the full conditional densities for generating samples.

For each time Tj we assume that S0(tj) given in (2.1) has the Weibull distribution given in (2.7) with parameters αj and λj, j = 1, 2. We choose the following independent prior distributions, αj ~ Gamma(aj, bj), λj ~ Gamma(cj, dj) j = 1, 2.

Combining the likelihood function (3.1) and the prior distribution (3.2), we can obtain the joint posterior density of all unobservables which is not tractable analytically but MCMC methods such as the Gibbs samples, can be used to draw samples, from which features of marginal posterior distributions of interest can be inferred. Considering = (t1, t2, d1, d2, x), where for each individual i, i = 1, 2, ti and di, are the observed vectors of lifetimes and censoring tij and δij for j = 1,2, respectively.

The full conditional posterior densities for each parameter are given by

where

and

with

and

The conditional densities above do not belong to any known parametric density family. In order to generate our samples we then implement an Metropolis-Hasting algorithm within Gibbs iterations ([9]). The simulations were performed using the OpenBUGS software ([43]). OpenBUGS codes are available by mailing to one of the authors.

3.3 Model comparison criteria

In the literature, there are various methodologies which intend to analyze the suitability of a distribution, as well as selecting the best fit among a collection of distributions. In this paper we shall inspect some of the Bayesian model selection criteria; namely, the deviance information criterion (DIC) proposed by [42], the expected Akaike information criterion (EAIC) by [4], and the expected Bayesian (or Schwarz) information criterion (EBIC) by [5] was used. These criteria are based on the posterior mean of the deviance, E{D(θ)}, which is also a measure of fit and can be approximated from the MCMC output by

where the index v indicates the v-th realization of a total of V realizations and

where g(.) is pdf corresponding to our model. For observed data we have that g(t1i, t2i) = f1(t1i)f2(t2ii, for i = 1,...,n, with Λi = 1 + α(1 - 2Spop1(t1i) - 2Spop2(t2i) + 4Spop1(t1i)Spop2(t2i)), fpopj(.) and Spopj(.), j = 1, 2, are Weibull promotion density function and Weibull promotion survival function, respectively. The EAIC, EBIC and DIC criteria can be calculated using the MCMC output by means of = Dbar + 2q, = Dbar + q log(n) and = Dbar + = 2Dbar - Dhat, respectively, where q is the number of parameters in the model and ρD is the effective number of parameters, defined as E{D(θ)} - D{E(θ)}, where D{E(θ)} is the deviance evaluated at the expected values of the posterior distributions, which can be estimated as

Comparing alternative models, the preferred model is the one with the smallest criteria values.

Another criteria which is one of the most used in applied works is derived from the conditional predictive ordinate (CPO) statistics. For a detailed discussion on the CPO statistics and its applications to model selection, see [19]. Let the full data and (-i) denote the data with the ith observation deleted. We denote the posterior density of θ given (-i) by π(θ|(-i)), for i = 1,...,n. For the ith observation, the CPOi can be written as

For the proposed model a closed form of the CPOi is not available. However, a Monte Carlos estimate of CPOi can be obtained by using a single MCMC sample from the posterior distribution π(θ |). Let θ1, θ2,...,θQ be a sample of size Q of π(θ|) after the burn-in. A Monte Carlo approximation of CPOi ([8]) is given by

A summary statistics of the CPOi's is B = . The larger is the value of B, the better is the fit of the model.

4 BAYESIAN CASE INFLUENCE DIAGNOSTICS

The best known pertubation schemes are based on case deletion ([13]), in which the effects are studied of completely removing cases from the analysis. This reasoning will form the basis for our Bayesian global influence methodology and in doing so it will be possible to determine which subjects might be influential for the analysis. In order to investigate if some of the observations are influential for the analysis, we considered a Bayesian case influence diagnostic procedure based on the Kullback-Leibler (K-L) divergence between P and P(-i), denoted by K(P, P(-i)), where P denotes the posterior distribution of θ for the full data, and P(-i) denotes the posterior distribution of θ dropping the i-th observation. Specifically,

The K(P, P(-i)) measures the effect of deleting of i-th observation from the full data on the joint posterior distribution of θ. As pointed by [36], calibration of K(P, P(-i)) can be done by solving for pi the equation K(P, P(-i)) = K[B(0.5), B(pi)] = -log[4pi(1-pi)]/2 where B(p) denotes the Bernoulli distribution with success probability p. This implies that describing outcomes using π(θ|) instead of π(θ|(-i)) is compatible with describing an unobserved event as having probability pi when correct probability is 0.5. After some algebraic manipulation it can be shown that pi = . This equation implies that 0.5 < pi < 1.

That is, if pi >> 0.5 then the ith case is considered influential.

For our case it can be shown that (4.1) can be expressed as a posterior expectation

where Eθ|(.) denotes the expectation with respect to the joint posterior π(θ|). Thus (4.2) can be estimated by sampling from the posterior distribution of θ via MCMC methods. Let θ1, θ2,..., θQ be a sample of size Q of π(θ|). Then, a Monte Carlo estimate of K(P, P(-i)) is given by

5 APPLICATIONS

In this section, results from simulation studies and a real data example are presented in order to illustrate the performance of the proposed methodology.

5.1 Artificial Data

In the study we considered the survival model given in (9), assuming that S0(tj) has the Weibull distribution given in (2.7) with parameters αj and λj, j = 1,2. The covariates xi were generated from a Bernoulli distribution with parameter 0.5 and the artificial bivariate data (Ti1, Ti2), i = 1,... ,n, were generated assuming n = 160 according to the following steps: First we generated Ti1 = (-log(1 - ui1)/λi1) 1/α1 where ui1 ~ U(0, 1). Next Ti2 was generated using a random variable ui2 ~ U(0, 1) and the solution of the nonlinear equation, wi + ϕ(2u1i - 1)(wi2 - wi) - ui2 = 0, considering Ti2 = (-log(1 - wi)/λi2)1/α2. For each covariate value k, k = 0, 1, we fixed the cure rate πkj = expθkj value, j = 1, 2. For tkj simulating a cutoff point which was chosen so that the higher times represent πkj% of the lifetimes, and were assumed censored. The following cure rate values was fixed: π01 = 0.4, π11 = 0.3, π02 = 0.3 and π12 = 0.2. We considered the following values for the other parameters of the model α1 = 2, λ1 = 0.5, α2 = 1.5, λ2 = 0.2 and ϕ = 0.6.

For sake of illustration we also compare our FGM bivariate copula model with the well known positive stable frailty (PSF) model ([25]). The joint survival function of the PSF model isgiven by

When ϕ → 1, we obtain Spop(t1, t2) = Spop(t1)Spop(t2). The Kendall's tau7 is given by τϕ = 1 - ϕ.

For more details on frailty models where the random effects have a positive stable distribution interested readers can refer to [28, 32].

The following independent priors were considered to perform the Gibbs sampler:

βij ~ N(0, 103), λj ~ Gamma(1, 0.001) and αj ~ Gamma(1, 0.001),

where i = 0, 1 and j = 1, 2. For parameter of PSF model we assume ϕ ~ Beta(1, 1) and for FGM copula that 1 - 2π(ϕ) ~ Beta(1, 1), such choices guarantee that ϕ ∈ (-1, 1) and ensure non-informativeness.

The simulations were performed using the OpenBUGS software ([43]). For each generated data set we simulate two chain of size 50,000 for each parameter, disregarding the first 10,000 iterations to eliminate the effect of the initial values and to avoid autocorrelation problems, we consider a spacing of size 20, obtaining a effective sample of size 4,000 upon which the posterior inference is based on. For each sample the posterior mean of the parameter and the EAIC, EBIC, DIC and B are recorded. The MCMC convergence was monitored according to the methods recommended by [14] (CODA package). The number of iterations is considered sufficient for the approximate convergence since in all cases the Gelman-Rubin diagnostic is veryclose to 1.

Table 1 presents the summary for the FGM bivariate long-term survival model parameters.Table 2 presents the Bayesian criteria for the FGM and PSF models. The FGM model outperforms the PSF one in all considered criteria.

5.2 Influence of outlying observations

One of our main goals in this study is to show the need for robust models to deal with the presence of outliers in the data. In order to do so, we generated a sample of length 160 with fixed parameters α1 = 2, λ1 = 0.5, α2 = 1.5, λ2 = 0.2 and ϕ = 0.6. The percentage of censure was controlled as follows: for each covariate value k, k = 0,1, we fixed the cure rate πkj = exp{θkj} value, j = 1,2. For tkj simulating a cutoff point which was chosen so that the higher times represent πkj% of the lifetimes, and were assumed censored.

We selected cases 90, 107 and 120 for perturbation. The perturbation scheme were structured as following. To create influential observation artificially in the dataset, we choose one, two or three of these selected cases. For each case we perturbed one or both lifetimes as follows = ti + 7St, i = 1, 2, where St is the standard deviations of the ti's. For case 107 we perturbed only the lifetime t1 and for case 90, the lifetime t2, and for case 120, both lifetimes were perturbed.

The MCMC computations were made in a similar fashion to those in the last section and further to monitor the convergence of the Gibbs samples we also used the methods recommended by [14]. Table 3 shows that the posterior inferences are sensitive to the perturbation of the selected case(s). In Table 3, Dataset a denotes the original simulated data set with no perturbation, and Datasets b to g denote datasets with perturbed cases.

Table 4 displays the fit of different cases of the perturbed data set. We can observe that the original simulated data (Dataset a) had the best fit.

Now we consider the sample from the posterior distributions of the parameters of the model based on the FGM bivariate survival model to compute the K-L divergence and calibration of this divergence, as describe in Section 4. The results in Table 5 show, before perturbation (Dataset a), that all the selected cases are not influential with small K(P, P(-i)) and related calibration close to 0.596. However, after perturbations (Dataset b to i), the K(P, P(-i)) increases and the corresponding calibrations become larger than 0.5, indicating those cases are influential.

In the Figure 1 shows the K(P, P(-i)) for the FGM bivariate survival model. Clearly we can see that K(P, P(-i)) performed well to identifying influential case(s), providing larger K(P, P(-i)) when compared to the other cases.


We point out that, the way in which our copula survival model was constructed, we expect that K(P, P(-i)) detects the perturbed observations if the data is generated from a model assuming a particular copula and is adjusted assuming the other copulas. This is due to the fact that the marginals are the same, independent of the assumed copula function, differing only in the structure of dependence between them. Nevertheless, we considered the data sets generated according to the of FGM copula according to Subsection 5.1, with their perturbed observations and fitted the model assuming the FGM copula, as well as the Archimedean copulas PS, Frank and Clayton. In all situations K(P, P(-i)) detects the influential points in the same manner.

5.3 Real data

In this section we consider a Bayesian analysis of the dataset presented in [44] that consist of follow up times for 197 diabetic patients under 60 years of age. The main purpose of the study is to assess the efficacy of photocoagulation treatment for proliferative retinopathy. The treatment was randomly assigned to one eye of each patient and the other, was considered as a control. The first component is a vector of lifetimes which represents the time up to visual loss for the treatment eye (T1), while the second component (T2) is the time up to visual loss for the control eye. The subjects could be censored, which happened for 73% of the treated eyes and 49% of the untreated eyes. As [39] we consider as covariate the dichotomized patient age (less than 20 years and older than 20 years). For sake of illustration the perturbation scheme were structured as following. To create an influential observation artificially in the dataset we selected the case 100 and perturbed both lifetimes as follows: = ti + 7St, i = 1, 2, where St is the standard deviations of the ti's.

The data set was then reanalyzed though our approach by adopting the model in (2.1) with promotion time Weibull as marginal. The Bayes estimates were based on posterior samples recorded every 20th iteration from 50,000 Gibbs samples after a burn-in of 10,000 samples. Similarly to what was considered when the simulation study, we choose the following independent proper prior distributions: βij ~ N(0, 103), λj ~ Gamma(1, 0.001) and αj ~ Gamma (1, 0.001), where i = 0,1 and j = 1, 2. For parameter of PSF model we assume ϕ ~ Beta(1, 1) and for FGM copula that 1 - 2π(ϕ) ~ Beta(1, 1), such choices guarantee that ϕ ∈ (-1, 1) and ensure non-informativeness. To monitor the convergence of the Gibbs samples we uses the methods recommended by [14].

In Table 6 we report posterior summaries for the parameters under the FGM long-term bivariate survival copula model considering Weibull and Exponential promotion marginals for S0(t) given in (2.1), considering the original sample as well as the perturbed one. Following [20], we have checked the sensitivity of the routine use for gamma prior on the variance components and found that results are fairly robust under different prior. We also checked the sensitivity analysis for the variance components parameters for various choices of prior parameters by changing only on parameter at a time and keeping all other parameters constant to their default values. The posterior summaries of the parameters do not present remarkable difference and not impair the results in Table 6. Considering both the original and perturbed samples, we observed a small change in the posterior estimates of all parameters, showing the importance of verifying the posterior inference sensitivity to perturbations. Also, for both samples, the HPD credibility intervals for the location parameter of the marginal Weibull distributions contains the 1, leading to evidence in favor of a Exponential distribution to the marginal ones. Then, considering the FGM long-term bivariate survival copula model with exponential promotion marginals for S0(t) given in (2.1), the Kendall's tau (2.6) are equals to 0.184 and 0.173, respectively, for the original sample and the perturbed one, leading to the the existence of a weak dependence between the marginal quantities.

Considering the FGM long-term bivariate survival copula model with Exponential promotion marginals for S0(t) given in (1), we compute the K-L divergences and related calibrations for the original sample, but do not find highly influential cases. The K(P, P(-i)) are smaller that 0.096 and the corresponding calibrations are smaller than 1. For the perturbed sample however, for the case 100, the K(P, P(-i)) is equals to 1.019 and the corresponding calibration is equal to 0.966 (detecting the influence point). For the remaining cases, K(P, P(-i)) are smaller that 0.069 with corresponding calibrations smaller than 1. Figure 2 shows the index plot of K(P, P(-i)) from fitting the bivariate survival model based on FGM copula considering both samples (original and perturbed).

Table 7 presents the model comparison criteria discussed in Section 3 for comparing the FGM long-term bivariate survival copula model with Weibull promotion marginal distributions with its particular independence cases. There is positive evidence in favor The FGM model at the expense of the model which considers independence. Also, there is evidence in favor of the FGM long-term bivariate survival copula model with Exponential promotion marginal distributions.

Besides, we fitted the PSF model for the diabetic retinopathy study data considering both Weibull and Exponential marginal promotion distributions. Based on all Bayesian criteria, there is positive evidence in favor the FGM model at the expense of the PSF model, indicating that the FGM long-term bivariate survival copula model can be seen as a competitor to the PSF model, which is a well known model commonly used in literature for fitting bivariate lifetime data. As expected, all the criteria presented evidence in favor the original sample (without the influential point).

6 SOME FINAL REMARKS

In this paper, we presented the FGM long-term bivariate survival copula model. Parameter estimation is based on a Bayesian approach via MCMC. We have used Bayesian case influence diagnostic based on the Kullback-Leibler divergence in order to study the sensitivity of the Bayesian estimates under perturbations in the model/data. Finally, we demonstrate our approach with an artificial and a real dataset.

In the two-step approach, the marginal parameters are estimated firstly and then, the copula parameter is estimated in a second step. This approach provides consistent, but not efficient estimators using a frequentist approach. However, from the Bayesian perspective, one can estimate all the model parameters simultaneously via the MCMC algorithm such that the assumption of independence in the first step is avoided. We however also considered a two-step estimation method for all study. We omitted the results obtained since they are similar to those obtained by considering the joint estimation approach which we prefer.

Here we assume lifetimes in presence of random censoring. However, at least in principle, the proposed methodology can be develop straightforwardly for the type I and type II censoring particular cases. Interval and left censoring are also important and could be considered further.

ACKNOWLEDGMENTS

The researchers of Francisco Louzada and Vicente G. Cancho are supported by the Brazilian organization CNPq.

Received on November 28, 2012

Accepted on November 1, 2013

  • [1] H. Aslanidou, D.K. Dey & D. Sinha. Bayesian analysis of multivariate survival data using Monte Carlo methods. Canadian Journal of Statistics, 1 (2008), 33-48.
  • [2] J. Berkson & R.P. Gage. Survival cure for cancer patients following treatment. Journal of the American Statistical Association, 47(259) (1952), 501-515.
  • [3] J.W. Boag. Maximum likelihood estimates of the proportion of patients cured by cancer therapy. Journal of the Royal Statistical Society B, 11(1) (1949), 15-53.
  • [4] S.P. Brooks. Discussion on the paper by Spiegelhalter, Best, Carlin, and van der Linde. Journal Royal Statistical Society B, 64(3) (2002), 616-618.
  • [5] B.P. Carlin & T.A. Louis. Bayes and Empirical Bayes Methods for Data Analysis, 2nd ed., Chapman & Hall/CRC, Boca Raton (2001).
  • [6] D.G. Clayton. A model for association in bivariate life-tables and its application in epidemiological studies of familial tendency in chronic disease incidence. Biometrika, 65 (1978), 141-151.
  • [7] M.H. Chen, J.G. Ibrahim & D. Sinha. A new Bayesian model for survival data with a surviving fraction. Journal of the American Statistical Association, 94(447) (1999), 909-919.
  • [8] M.H. Chen, Q.M. Shao & J. Ibrahim. Monte Carlo Methods in Bayesian Computation. Springer-Verlag, New York (2000).
  • [9] S. Chib & E. Greenberg. Understanding the metropolis-hastings algorithm. The American Statistician, 49 (1995), 327-335.
  • [10] S.C. Chiou & R.S. Tsay. A Copula-based Approach to Option Pricing and Risk Assessment. Journal of Data Science, 6 (2008), 273-301.
  • [11] G. Colby & P. Rilstone. Simplified estimation of multivariate duration models with unobservedheterogeneity. Computational Statistics, 22 (2007), 17-29.
  • [12] D.A. Conway. Farlie-Gumbel-Morgenstern distributions, in Encyclopedia of Statistical Sciences 3,S. Kotz & N.L. Johnson, eds., John Wiley & Sons, New York, (1983), 28-31.
  • [13] R.D. Cook & S. Weisberg. Residuals and Influence in Regression. Chapman & Hall/CRC, BocaRaton (1982).
  • [14] M.K. Cowless & B.P. Carlim. Markov chain Monte Carlo convergence diagnostics: a comparative review. Journal of the American Statistical Association, 91 (1996), 883-904.
  • [15] G. De Luca & G. Rivieccio. Archimedean copulae for risk measurement. Journal of Applied Statistics, 36(8) (2009), 907-924.
  • [16] P. Embrechts, F. Linskog & A. McNiel. Modelling dependence with copulas and applications to risks management, available at http://www.math.ethz.ch/baltes/ftp/papers.html (2003).
  • [17] D. Gamerman & H.F. Lopes. Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference, 2nd ed., Chapman & Hall, Boca Raton (2006).
  • [18] A.E. Gelfand & A.F.M. Smith. Sampling-Based Approaches to Calculating Marginal Densities. Journal of the American Statistical Association, 85(410) (1990), 398-409.
  • [19] A.E. Gelfand, D.K. Dey & H. Chang. Model determination using predictive distributions with implementation via sampling-based methods. Bayesian Statistics 4, Peñíscola, Oxford Univ. Press,New York, (1991), 147-167.
  • [20] A. Gelman, J. Carlin & D. Rubin. Bayesian Data Analysis. Chapman & Hall/CRC, New York (2006).
  • [21] W.R. Gilks, S. Richardson & D.J. Spiegelhater. Markov Chain Monte Carlo in Practice. Chapman & Hall, London (1996).
  • [22] K. Goethals, P. Janssen & L. Duchateau. Frailty models and copulas: similarities and differences. Journal of Applied Statistics, 35(9) (2008), 1071-1079.
  • [23] P. Gustafson, D. Aeschliman & A.R. Levy. A simple approach to fitting Bayesian survival models. Lifetime Data Analysis, 9 (2003), 5-19.
  • [24] D. Hanagal. Modeling Survival Data Using Frailty Models. Chapman & Hall/CRC, Boca Raton (2011).
  • [25] P. Hougaard. A class of multivariate failure time distributions. Biometrika, 73 (1986), 671-678.
  • [26] P. Hougaard. Fitting a multivariate failure time distribution. IEEE Transactions on Reliability, 38 (1989), 444-448.
  • [27] P. Hougaard. Analysis of Multivariate Survival Data. Springer, Heidelberg (2000).
  • [28] J.G. Ibrahim, M-H. Chen & D. Sinha. Bayesian Survival Analysis. Springer Verlag, New York (2001).
  • [29] P. Jaworski. Copula Theory and Its Applications: Proceedings of the Workshop Held in Warsaw, 25-26 September 2009 Volume 198 de Lecture Notes in Statistics, eds. P. Jaworski, F. Durante, W. Härdle & T. Rychlik, Springer (2010).
  • [30] J.F. Lawless. Statistical Models and Methods for Lifetime Data. New York: Wiley & Sons (2003).
  • [31] R.A. Maller & X. Zhou. Survival Analysis with Long-Term Survivors. Wiley, New York (1996).
  • [32] A.K. Manatunga & D. Oakes. Parametric Analysis of Matched Pair Survival Data. Lifetime Data Analysis, 5 (1999), 371-387.
  • [33] R. Nelsen. An Introduction to Copulas, 2nd ed., Springer, New York (2006).
  • [34] D. Oakes. Bivariate survival models induced by frailties. Journal of the American Statistical Association, 84 (1989), 487-493.
  • [35] D. Oakes. On Frailty Models and Copulas Recent Advances in Statistical Methods. Proceedings of Statistics 2001 Canada: The 4th Conference in Applied Statistics, Montreal, (2001), 218-224.
  • [36] F. Peng & D. Dey. Bayesian analysis of outlier problems using divergence measures. The Canadian Journal of Statistics. La Revue Canadienne de Statistique, 23(2) (1995), 199-213.
  • [37] Y. Peng, K.B.G. Dear & J.W. Denham. A generalized F mixture model for cure rate estimation. Statistics in Medicine, 17 (1998), 813-830.
  • [38] J. Rodrigues, M. de Castro, V.G. Cancho & F. Louzada Neto. On the unification of long-survival models. Statistics and Probabilities Letters, 79(1) (2009), 753-759.
  • [39] J.S. Romeo, N.I. Tanaka & A.C. Pedroso de Lima. Bivariate survival modeling: a Bayesian approach based on Copulas. Lifetime Data Analysis, 12 (2006), 205-222.
  • [40] S.K. Sahu & D.K. Dey. A comparison of frailty and other models for bivariate survival data. Lifetime Data Analysis, 6 (2000), 207-228.
  • [41] J.H. Shih & T.A. Louis. Inferences on the association parameter in copula models for bivariatesurvival data. Biometrics, 51 (1995), 1384-1399.
  • [42] D.J. Spiegelhalter, N.G. Best, B.P. Carlin & A. van der Linde. Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society B, 64 (2002), 583-639.
  • [43] D. Spiegelhalter, A. Thomas, N. Best & D. Lunn. OpenBUGS User Manual, version 3.0.2, MRC Biostatistics Unit, Cambridge; software available at http://mathstat.helsinki.fi/openbugs
  • [44] The Diabetic Retinopathy Study Research Group. Preliminary report on the efeect of photo-coagulation therapy. American Journal of Ophthalmology, 81 (1976), 383-396.
  • [45] P.K. Trivedi & D.M. Zimmer. Copula modelling: an introduction for practitioners. Foundations and Trends in Econometrics, 1 (2005), 1-111.
  • [46] J.W. Vaupel, K.G. Manton & E. Stallard. The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography, 16 (1979), 439-454.
  • [47] A. Wienke. Frailty Models in Survival Analysis. Chapman & Hall/CRC, Boca Raton (2011).
  • [48] A.Y. Yakovlev & A.D. Tsodikov. Stochastic Models of Tumor Latency and Their BiostatisticalApplications. World Scientific, Singapore (1996).
  • [49] S. Zhang, Y. Zhang, K. Chaloner & J.T. Stapleton. A copula model for bivariate hybrid censored survival data with application to the MACS study. Lifetime Data Analysis, 16 (2010), 231-249.
  • *
    Corresponding author: Adriano Kamimura Suzuki
  • Publication Dates

    • Publication in this collection
      07 Mar 2014
    • Date of issue
      Dec 2013

    History

    • Received
      28 Nov 2012
    • Accepted
      01 Nov 2013
    Sociedade Brasileira de Matemática Aplicada e Computacional Rua Maestro João Seppe, nº. 900, 16º. andar - Sala 163 , 13561-120 São Carlos - SP, Tel. / Fax: (55 16) 3412-9752 - São Carlos - SP - Brazil
    E-mail: sbmac@sbmac.org.br