
GENERALIZED ADDITIVE MODEL FOR COUNT TIME SERIES: AN APPLICATION TO QUANTIFY THE IMPACT OF AIR POLLUTANTS ON HUMAN HEALTH

ABSTRACT

The generalized additive model (GAM) has been used in many epidemiological studies where the response variable is frequently a nonnegative integer-valued time series. However, the GAM assumes that the observations are independent, which is generally not the case in time series. In this paper, an autoregressive moving average (ARMA) component is incorporated into the GAM. The resulting GAM-ARMA model is based on the generalized linear autoregressive moving average (GLARMA) model, where some linear components are replaced by natural splines. Numerical simulations are presented and show that the ARMA component influences the estimation. In a real data analysis of the effects of air pollution on respiratory disease in the metropolitan area of Belo Horizonte, Brazil, it is shown that the proposed model provides a better fit than the classical GAM approach, which does not take into account the autocorrelation of the data.

Keywords:
GAM; ARMA model; semiparametric model; Poisson-valued time series

1 INTRODUCTION

Epidemiological data are frequently treated as time series of counts because they record the relative frequency of certain events that occur in successive time intervals and the observations are correlated.

Many epidemiological studies have investigated the impact of ambient air pollution concentrations and meteorological conditions on human health. Kelsall et al. (1997), Ostro et al. (1999), Goldberg et al. (2003) and other authors found a significant association between daily pollutant concentration levels and mortality. Alonso et al. (2010) studied the impact of atmospheric pressure, air humidity, and temperature on the number of hospitalizations. In addition, Roberts (2004), Stafoggia et al. (2008) and other authors found evidence of interactive effects between temperature and air pollution (e.g., particulate matter and ozone) on mortality and adverse health outcomes. Such studies highlight the importance of controlling and reducing air pollutant emissions, and provide support for health departments in resource allocation.

Nevertheless, most of these studies model the relation between the occurrence of a disease and air pollutants using procedures that cannot capture the dependence inherent to the observations, such as the generalized linear model (GLM) (Nelder and Wedderburn, 1972) and the GAM (Hastie and Tibshirani, 1990). New methodologies were then proposed to model time series of counts. Shephard (1995) introduced the GLARMA model, later generalized by Davis et al. (2003). This methodology adds an ARMA structure to the GLM and can model time series whose distribution belongs to the exponential family. In the same vein, Benjamin et al. (2003) proposed the generalized ARMA model. McKenzie (1985) and Al-Osh and Alzaid (1987) introduced the integer-valued autoregressive model. Heinen (2003) proposed the autoregressive conditional Poisson model for count data with time dependence and overdispersion. Gamerman et al. (2013) proposed a family of non-Gaussian state space models that allows the marginal likelihood to be calculated exactly.

The above models assume that the relation between the response variable and the covariates is linear. The GAM offers more flexibility and has been used by many authors to solve real problems in the environmental context; see, e.g., Schwartz (2000), Aldrin and Haff (2005), and Belusic et al. (2015). Despite its widespread use, care is required when the GAM is applied to time series because of the serial correlation present in the data. Very few works address this issue, in particular Yang et al. (2012), who proposed a GAM with autoregressive terms. Souza et al. (2018) also proposed a hybrid model, combining GAM, principal component analysis, and vector autoregression, to address the multicollinearity problems that can occur when several air pollutants are included in the analysis.

In this work, a more general model for count data is proposed, which handles both the autocorrelation structure of the time series and the nonlinear effects of the covariates. This model is composed of a GAM with an ARMA component and is called a GAM-ARMA model. The non-parametric components are estimated through smooth functions, such as splines. Numerical simulations are performed to assess the accuracy of parameter estimation in small-sample series following a Poisson distribution. Finally, a real time series is analyzed with and without taking into account the autocorrelation of the data. The example includes the fit of a GAM-ARMA model to evaluate the impact of air pollutants and meteorological variables on the number of chronic obstructive pulmonary disease cases in the metropolitan area of Belo Horizonte, Brazil.

The paper is organized as follows. Section 2 presents the GAM-ARMA model, detailing some properties and the inference procedure. Section 3 shows the simulation study. Section 4 presents the analysis of a real series of pulmonary disease counts. Section 5 concludes the work.

2 THE GAM-ARMA MODEL

2.1 Presentation of the model

We combine the GAM with the ARMA model proposed by Box and Jenkins (1976) to model linear and nonlinear relations between the response variable and the covariates, as well as the time correlation of the response. The advantage of this methodology is that semiparametric and non-parametric models can be fitted to the data, capturing both linear and non-linear relationships and thus yielding better estimates.

As in the GLARMA model, the conditional distribution of the observation $y_t$ given the past information $\mathcal{F}_{t-1}^{y} = \sigma\{y_s, s \le t-1\}$ is Poisson, i.e.,

$$ y_t \mid \mathcal{F}_{t-1}^{y} \sim \mathrm{Poi}(\mu_t), \tag{1} $$

where $\mu_t = E(y_t \mid \mathcal{F}_{t-1}^{y})$. Here, the predictor $\eta_t = \ln(\mu_t)$ follows the model

$$ \eta_t = \beta_0 + \sum_{j=1}^{k} \beta_j x_{t,j} + \sum_{j=1}^{l} s_j(w_{t,j}) + Z_t, \tag{2} $$

where $(x_{t,1}, \ldots, x_{t,k})$ denotes the covariates related linearly to $\eta_t$, $(w_{t,1}, \ldots, w_{t,l})$ denotes the covariates related to $\eta_t$ via smooth functions $s_1, \ldots, s_l$, and $Z_t$ models the time correlation. Following Davis et al. (2003),

$$ Z_t = \sum_{i=1}^{\infty} \tau_i \varepsilon_{t-i}, \tag{3} $$

where, for some $\lambda \in (0,1]$,

$$ \varepsilon_t = (y_t - \mu_t)\,\mu_t^{-\lambda} = (y_t - e^{\eta_t})\, e^{-\lambda \eta_t}, \tag{4} $$

and the parameters $\tau_i$ are the coefficients in the power series expansion

$$ \sum_{i=1}^{\infty} \tau_i z^i = \left(1 - \sum_{i=1}^{p} \phi_i z^i\right)^{-1} \left(1 + \sum_{i=1}^{q} \theta_i z^i\right) - 1, \quad |z| \le 1, \tag{5} $$

where the polynomials $\phi(z) = 1 - \phi_1 z - \cdots - \phi_p z^p$ and $\theta(z) = 1 + \theta_1 z + \cdots + \theta_q z^q$ have no common zeros and have all their zeros outside the unit circle. It follows from (3) and (5) that $Z_t$ can be calculated recursively with the difference equation

$$ Z_t = \phi_1 (Z_{t-1} + \varepsilon_{t-1}) + \cdots + \phi_p (Z_{t-p} + \varepsilon_{t-p}) + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}. \tag{6} $$
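The equivalence between the infinite sum (3) with coefficients from (5) and the recursion (6) can be checked numerically. The following is a minimal sketch, assuming zero start-up values ($Z_s = \varepsilon_s = 0$ for $s \le 0$); the function names `tau_coefficients`, `z_recursive` and `z_from_tau` are illustrative and do not come from the paper.

```python
def tau_coefficients(phi, theta, n_terms):
    """Return tau_1..tau_n, the coefficients of theta(z)/phi(z) - 1 as in (5)."""
    p, q = len(phi), len(theta)
    psi = [1.0]  # psi_0 = 1 is the constant term of theta(z)/phi(z)
    for j in range(1, n_terms + 1):
        value = theta[j - 1] if j <= q else 0.0
        for i in range(1, min(j, p) + 1):
            value += phi[i - 1] * psi[j - i]
        psi.append(value)
    return psi[1:]  # subtracting 1 in (5) removes psi_0


def z_recursive(phi, theta, eps):
    """Compute Z_1..Z_n from the difference equation (6).

    eps[t - 1] holds eps_t. Assumption: Z_s = eps_s = 0 for s <= 0.
    """
    z = []
    for t in range(1, len(eps) + 1):
        value = 0.0
        for i, coef in enumerate(phi, start=1):
            if t - i >= 1:
                value += coef * (z[t - i - 1] + eps[t - i - 1])
        for i, coef in enumerate(theta, start=1):
            if t - i >= 1:
                value += coef * eps[t - i - 1]
        z.append(value)
    return z


def z_from_tau(phi, theta, eps):
    """Compute Z_1..Z_n from the (truncated) moving-average form (3)."""
    n = len(eps)
    tau = tau_coefficients(phi, theta, n)
    return [
        sum(tau[i - 1] * eps[t - i - 1] for i in range(1, t))
        for t in range(1, n + 1)
    ]
```

For an AR(1) component the coefficients reduce to $\tau_i = \phi^i$, so `tau_coefficients([0.5], [], 3)` returns `[0.5, 0.25, 0.125]`. In practice the recursion is the cheaper of the two forms, since it needs only the last $\max(p, q)$ values.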

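According to (4), $E(\varepsilon_t \mid \mathcal{F}_{t-1}^{y}) = \mu_t^{-\lambda}\,(E(y_t \mid \mathcal{F}_{t-1}^{y}) - \mu_t) = 0$. Now, let $\mathcal{F}_{t-1}^{\varepsilon} = \sigma\{\varepsilon_s, s \le t-1\}$; (4) implies that $\mathcal{F}_{t-1}^{\varepsilon} \subseteq \mathcal{F}_{t-1}^{y}$. Therefore,

$$ E(\varepsilon_t \mid \mathcal{F}_{t-1}^{\varepsilon}) = E[\,E(\varepsilon_t \mid \mathcal{F}_{t-1}^{y}) \mid \mathcal{F}_{t-1}^{\varepsilon}\,] = 0, $$

which shows that $(\varepsilon_t)$ is a martingale difference sequence. Hence, $\mathrm{cov}(\varepsilon_s, \varepsilon_t) = 0$ for $s \ne t$, and the variance of $\varepsilon_t$ is

$$ \mathrm{var}(\varepsilon_t) = E(\varepsilon_t^2) = E[\,E(\varepsilon_t^2 \mid \mathcal{F}_{t-1}^{y})\,] = E\big(\mu_t^{-2\lambda}\, E[(y_t - \mu_t)^2 \mid \mathcal{F}_{t-1}^{y}]\big) = E(\mu_t^{1-2\lambda}). \tag{7} $$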

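Now, (2), (6) and (7) imply that

$$ E(\eta_t) = \beta_0 + \sum_{j=1}^{k} \beta_j x_{t,j} + \sum_{j=1}^{l} s_j(w_{t,j}), \qquad \mathrm{var}(\eta_t) = \sum_{i=1}^{\infty} \tau_i^2\, E(\mu_{t-i}^{1-2\lambda}), $$

and

$$ \mathrm{cov}(\eta_t, \eta_{t+h}) = \begin{cases} \sum_{i=1}^{\infty} \tau_i \tau_{i+h}\, E(\mu_{t-i}^{1-2\lambda}), & \text{if } h \ge 0, \\ \sum_{i=1}^{\infty} \tau_i \tau_{i-h}\, E(\mu_{t+h-i}^{1-2\lambda}), & \text{if } h < 0. \end{cases} $$

When $\lambda = 0.5$, the $\varepsilon_t$ are the Pearson residuals and the covariances of $(\eta_t)$ do not depend on $t$, even if $(\eta_t)$ is not strictly stationary.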
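As an illustration of the last remark, consider the special case $(p,q) = (1,0)$ with $|\phi| < 1$ and $\lambda = 0.5$. This worked case is not spelled out in the text; it follows from (5) and the covariance formulas above, since $1 - 2\lambda = 0$ makes every expectation equal to one.

```latex
% AR(1) case: theta(z) = 1 and phi(z) = 1 - phi z, so (5) gives
\sum_{i=1}^{\infty} \tau_i z^i = \frac{1}{1 - \phi z} - 1 = \sum_{i=1}^{\infty} \phi^i z^i
\quad\Longrightarrow\quad \tau_i = \phi^i .

% With lambda = 1/2, E(mu_t^{1 - 2 lambda}) = 1 for all t, hence
\mathrm{var}(\eta_t) = \sum_{i=1}^{\infty} \phi^{2i} = \frac{\phi^2}{1 - \phi^2},
\qquad
\mathrm{cov}(\eta_t, \eta_{t+h}) = \sum_{i=1}^{\infty} \phi^{i}\,\phi^{i+|h|}
= \frac{\phi^{|h|+2}}{1 - \phi^2}.
```

For example, $\phi = 0.6$ gives $\mathrm{var}(\eta_t) = 0.36 / 0.64 = 0.5625$, and the autocorrelation of $(\eta_t)$ at lag $h$ is $\phi^{|h|}$, as for a Gaussian AR(1) process.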

2.2 Parameter estimation

There are several approaches in the literature to estimate the functions $s_j$. Recent studies have used reduced-rank approaches because of their low computational cost and the ease of obtaining good estimators of the $s_j$. Wood (2006) reviews methods for choosing the $s_j$ within the GAM methodology, including thin plate regression splines (Wood, 2003), B-splines and basis splines (De Boor, 1978; Dierckx, 1993), among others.

In this work, B-spline curves are used because they provide flexible smoothing in a simple way. B-splines are constructed from polynomial pieces joined at control points called knots. By definition, the B-spline $B_{i,d}$ depends on the knots $t_i, \ldots, t_{i+d+1}$, where $d$ is the degree of the polynomial. If the knot vector is $(t_1, t_2, \ldots, t_{m+d+1})$ for some positive integer $m$, it is possible to form $m$ B-splines $B_{1,d}, \ldots, B_{m,d}$ of degree $d$ associated with this knot vector. A spline function $s_j$ is a linear combination of B-splines, i.e.,

$$ s_j = \sum_{i=1}^{m} \alpha_{i,j} B_{i,d}, \tag{8} $$

where the real numbers $\alpha_{1,j}, \ldots, \alpha_{m,j}$ are called the B-spline coefficients of $s_j$. For more properties, see De Boor (1978). Here, we take $d = 3$ and use natural cubic splines. In this case, the pieces before the first knot and after the last knot are linear functions, which means that the second derivatives at the two end points are zero. General accounts of splines can be found in the books by Hastie et al. (2008) and Ahlberg et al. (1967). The choice of the optimal number of knots is based on the work of Harrell (2004) and depends on the sample size $n$. Typically, when $n \le 100$, three or four knots give a good fit and a balanced trade-off between flexibility and loss of accuracy. For large $n$, five knots is a good starting point. Akaike's information criterion (AIC) can be used to choose the number of knots; see Akaike (1973).
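To make the boundary behaviour of natural cubic splines concrete, the following sketch evaluates a natural cubic spline basis in the truncated power form described by Hastie et al. (2008). This is a swapped-in parameterisation chosen for brevity, not the B-spline basis $B_{i,d}$ of (8), although both span the same space of natural cubic splines for a given set of knots. The function name and the knot values in the usage note are hypothetical.

```python
def natural_cubic_basis(x, knots):
    """Evaluate a natural cubic spline basis at the point x.

    Truncated power form: N_1 = 1, N_2 = x and
    N_{k+2} = d_k(x) - d_{K-1}(x) for k = 1..K-2, with
    d_k(x) = [(x - xi_k)_+^3 - (x - xi_K)_+^3] / (xi_K - xi_k).
    Returns a list of K values for K knots (K >= 3, strictly increasing).
    """
    last = knots[-1]

    def pos_cube(u):
        # (u)_+^3: the cubic is switched on only to the right of a knot
        return u ** 3 if u > 0 else 0.0

    def d(k):
        return (pos_cube(x - knots[k]) - pos_cube(x - last)) / (last - knots[k])

    basis = [1.0, float(x)]
    reference = d(len(knots) - 2)  # d_{K-1} in one-based notation
    for k in range(len(knots) - 2):
        basis.append(d(k) - reference)
    return basis
```

With the hypothetical knots `[10.0, 15.0, 20.0, 25.0]`, every basis function is exactly linear for `x < 10` and for `x > 25`, which is the zero-second-derivative property mentioned above.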

Combining (2) and (8), and dropping $d = 3$ from the notation, the model of the predictor can be written as

$$ \eta_t = \beta_0 + \sum_{j=1}^{k} \beta_j x_{t,j} + \sum_{j=1}^{l} \sum_{i=1}^{m} \alpha_{i,j} B_i(w_{t,j}) + Z_t, \tag{9} $$

where $Z_t$ is given by (6). Thus, for a fixed integer $m$ and fixed knots $(t_1, t_2, \ldots, t_{m+4})$, the parameter vector of the GAM-ARMA model is defined by

$$ \delta = (\beta_0, \ldots, \beta_k, \alpha_{1,1}, \ldots, \alpha_{m,l}, \phi_1, \ldots, \phi_p, \theta_1, \ldots, \theta_q). $$

According to (1), the conditional log-likelihood function is

$$ L_n(\delta) = \sum_{t=1}^{n} \big( y_t\, \eta_t(\delta) - e^{\eta_t(\delta)} \big), $$

where $\eta_t(\delta)$ is given by (9) and $Z_t(\delta)$ is obtained from (6). The maximization of $L_n(\delta)$ can be performed by Newton's method initialized with zero values for all parameters. In practice, convergence occurs within approximately 10 iterations.
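Evaluating the log-likelihood requires running the recursion for $Z_t$ alongside the sum. The sketch below does this for the AR(1) case only, with the linear and spline terms collected in a generic design row per time point. The function name, the argument layout and the zero start-up values $Z_0 = \varepsilon_0 = 0$ are assumptions made for illustration; the maximization step (Newton's method in the paper) is not shown.

```python
import math


def glarma_ar1_loglik(coefs, phi, y, design, lam=0.5):
    """Conditional Poisson log-likelihood sum_t (y_t * eta_t - exp(eta_t)).

    coefs  : regression coefficients (betas and spline coefficients)
    phi    : AR(1) coefficient of the Z_t recursion
    y      : observed counts y_1..y_n
    design : one row of regressors per time point, same length as coefs
    lam    : exponent lambda in the residual definition (4)
    Assumption: Z_0 = eps_0 = 0.
    """
    z_prev, eps_prev = 0.0, 0.0
    total = 0.0
    for y_t, row in zip(y, design):
        z_t = phi * (z_prev + eps_prev)           # recursion (6) with p = 1, q = 0
        eta_t = sum(c * v for c, v in zip(coefs, row)) + z_t
        mu_t = math.exp(eta_t)
        total += y_t * eta_t - mu_t               # log(y_t!) is constant and omitted
        eps_prev = (y_t - mu_t) * mu_t ** (-lam)  # residual (4)
        z_prev = z_t
    return total
```

With `phi = 0` and an intercept-only design, this reduces to the ordinary Poisson log-likelihood, which is maximized when the intercept equals the logarithm of the sample mean. A function of this form could be passed to any general-purpose numerical optimizer.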

Goodness of fit for the proposed methodology can be assessed with the AIC and the Bayesian information criterion (BIC), the latter being defined by

$$ \mathrm{BIC} = -2\, L_n(\hat{\delta}_n) + r \ln(n), $$

where $L_n$ is the conditional log-likelihood defined above, $\hat{\delta}_n$ is the parameter value that maximizes $L_n(\delta)$ and $r$ is the number of parameters estimated in the model.
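Both criteria are simple functions of the maximized log-likelihood. The sketch below uses the standard definitions; the AIC formula $-2 \times \text{log-likelihood} + 2r$ is the usual textbook form and is not printed in the paper, and the function names and the numbers in the usage note are hypothetical.

```python
import math


def aic(max_loglik, r):
    """Akaike's information criterion: -2 * log-likelihood + 2 * r."""
    return -2.0 * max_loglik + 2.0 * r


def bic(max_loglik, r, n):
    """Bayesian information criterion: -2 * log-likelihood + r * ln(n)."""
    return -2.0 * max_loglik + r * math.log(n)
```

For instance, with a hypothetical maximized log-likelihood of `-100.0`, `r = 3` and `n = 84`, the BIC penalty `3 * ln(84)` exceeds the AIC penalty `6`, so the BIC favours smaller models more strongly than the AIC whenever $n \ge 8$.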

The relative risk (RR) is widely used to measure the impact of air pollution on human health; see Baxter et al. (1997). The RR for the pollutant covariate $x_j = (x_{t,j})$ in (9) is the relative change in the expected count of respiratory disease events per $\xi$-unit change in $x_j$ while keeping the other covariates fixed, and is given by

$$ \mathrm{RR}_{x_j}(\xi) = \exp(\beta_j \xi). $$

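The RR and its confidence interval (CI) of level $1-\alpha$ are estimated as follows:

$$ \widehat{\mathrm{RR}}_{x_j}(\xi) = \exp(\hat{\beta}_j \xi), \tag{10} $$

$$ \widehat{\mathrm{CI}}\{\mathrm{RR}_{x_j}(\xi)\} = \exp\big(\hat{\beta}_j \xi \pm z_{\alpha/2}\, \mathrm{se}(\hat{\beta}_j)\, \xi\big), \tag{11} $$

where $\hat{\beta}_j$ is the conditional maximum likelihood estimator $\hat{\beta}_{j,n}$ of $\beta_j$, $\mathrm{se}(\hat{\beta}_j)$ is the estimated standard deviation (s.d.) of $\hat{\beta}_j$, and $z_{\alpha/2}$ denotes the $(1-\alpha/2)$-quantile of the standard normal distribution.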
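Formulas (10) and (11) translate directly into a few lines of code. The sketch below assumes a positive increment $\xi$, so that the lower and upper limits come out in that order; the function name and the numerical inputs in the usage note are hypothetical and are not estimates from the paper.

```python
import math
from statistics import NormalDist


def relative_risk(beta_hat, se, xi, alpha=0.05):
    """Return (RR, lower, upper) following (10) and (11).

    beta_hat : estimated coefficient of the pollutant covariate
    se       : estimated standard deviation of beta_hat
    xi       : increment in the covariate (assumed positive here)
    alpha    : one minus the confidence level
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # (1 - alpha/2)-quantile
    rr = math.exp(beta_hat * xi)
    lower = math.exp((beta_hat - z * se) * xi)
    upper = math.exp((beta_hat + z * se) * xi)
    return rr, lower, upper
```

Calling `relative_risk(0.01, 0.004, 10.0)` with these made-up inputs gives an RR of `exp(0.1)` and an interval whose lower limit is above one. An interval that excludes one corresponds to a coefficient that differs significantly from zero at level $\alpha$.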

3 SIMULATION STUDY

In our numerical experiment, the sample size is $n = 100$, the number of replications is $N = 1000$, $\lambda = 0.5$ in (4), $(p,q) = (1,0)$ in (6) and $(k,l,m) = (2,1,3)$ in (9). The predictor model is given by

$$ \eta_t = \beta_0 + \beta_1 x_{t,1} + \beta_2 x_{t,2} + \alpha_1 B_1(w_t) + \alpha_2 B_2(w_t) + \alpha_3 B_3(w_t) + Z_t, \tag{12} $$

where the $B_i$ form the B-spline basis for natural cubic splines and

$$ Z_t = \phi \left[ Z_{t-1} + (y_{t-1} - e^{\eta_{t-1}})\, e^{-\eta_{t-1}/2} \right]. \tag{13} $$

The covariates $(x_{t,1}, x_{t,2})$ are simulated once from the ARMA models $x_{t,1} = 0.42\, x_{t-1,1} + u_t + 0.13\, u_{t-1}$ and $x_{t,2} = 0.30\, x_{t-1,2} + v_t - 0.76\, v_{t-1} - 0.17\, v_{t-2}$, where $(u_t, v_t)$ is a sequence of independent Gaussian random variables with zero mean and unit variance. The covariate $(w_t)$ is the real time series of daily minimum temperature in Vitória, Brazil, between April 10, 2005 and July 19, 2005. The parameter values are

$$ \beta_0 = 0.8, \quad \beta_1 = 0.1, \quad \beta_2 = -0.2, \quad \alpha_1 = 0.5, \quad \alpha_2 = -1.0, \quad \alpha_3 = 0.8, $$

and three different values of $\phi$ are considered, $\phi = 0.1, 0.4, 0.6$, corresponding respectively to increasing values of the autocorrelation in the response variable.
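The data-generating mechanism of one replication can be sketched as follows. Because the Vitória temperature series is not reproduced here, the spline term of (12) is left out, so this is a simplified illustration rather than the exact experiment. The function names, the seed, the zero start-up values and the use of Knuth's multiplication method for Poisson draws are all assumptions of this sketch.

```python
import math
import random


def arma_series(ar, ma, n, rng):
    """Simulate x_t = sum_i ar_i x_{t-i} + u_t + sum_i ma_i u_{t-i}, u_t ~ N(0, 1)."""
    x, u = [], []
    for t in range(n):
        u.append(rng.gauss(0.0, 1.0))
        value = u[t]
        value += sum(a * x[t - i] for i, a in enumerate(ar, 1) if t - i >= 0)
        value += sum(b * u[t - i] for i, b in enumerate(ma, 1) if t - i >= 0)
        x.append(value)
    return x


def poisson_draw(mu, rng):
    """Draw from Poi(mu) with Knuth's multiplication method (fine for small mu)."""
    limit = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_counts(linear_part, phi, rng):
    """Simulate y_t ~ Poi(exp(linear_part[t] + Z_t)) with Z_t as in (13)."""
    y, z_prev, eps_prev = [], 0.0, 0.0   # assumption: Z_0 = eps_0 = 0
    for fixed in linear_part:
        z_t = phi * (z_prev + eps_prev)
        mu_t = math.exp(fixed + z_t)
        y_t = poisson_draw(mu_t, rng)
        eps_prev = (y_t - mu_t) / math.sqrt(mu_t)  # Pearson residual, lambda = 0.5
        z_prev = z_t
        y.append(y_t)
    return y


rng = random.Random(2005)                             # arbitrary seed
n = 100
x1 = arma_series([0.42], [0.13], n, rng)              # x_{t,1}
x2 = arma_series([0.30], [-0.76, -0.17], n, rng)      # x_{t,2}
linear = [0.8 + 0.1 * a - 0.2 * b for a, b in zip(x1, x2)]  # spline term omitted
y = simulate_counts(linear, 0.4, rng)
```

Repeating the last line for each of the $N$ replications, with the covariates held fixed, mirrors the design described above, in which the covariates are simulated only once.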

In Table 1, $\hat{\mu}_{\hat{\delta}_j}$ represents the average of the $N$ estimates of the parameter $\delta_j$, with the corresponding mean squared errors (MSE) in parentheses, for $\phi = 0.1, 0.4, 0.6$. The estimates are close to the true values of the parameters. In general, the MSE values are small, but they increase as $\phi$ increases.
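The two summaries reported for each parameter are the Monte Carlo mean and the MSE about the true value. A minimal sketch of this computation is given below; the function name and the toy inputs in the usage note are hypothetical and are not values from Table 1.

```python
def mean_and_mse(estimates, true_value):
    """Return the average of the estimates and their MSE about true_value."""
    n = len(estimates)
    mean = sum(estimates) / n
    mse = sum((e - true_value) ** 2 for e in estimates) / n
    return mean, mse


def bias_and_variance(estimates, true_value):
    """Split the MSE into squared bias and variance (divisor n)."""
    n = len(estimates)
    mean = sum(estimates) / n
    variance = sum((e - mean) ** 2 for e in estimates) / n
    return mean - true_value, variance
```

For the toy list `[0.3, 0.5, 0.4]` and a true value of `0.4`, the mean is `0.4` and the MSE is about `0.0067`. Since the MSE equals the squared bias plus the variance, a growing MSE can reflect either a larger bias or a more dispersed estimator.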

Table 1
Parameter estimates in Model (12)-(13) with MSE in parentheses.

Figure 1 presents the histograms of the $N$ estimates of $\phi$ and the $\beta_j$ for $\phi = 0.1, 0.4, 0.6$. While the empirical distribution of the estimates of $\phi$ is approximately symmetric about the true value when $\phi = 0.1, 0.4$, this distribution is asymmetric when $\phi = 0.6$. The empirical distribution of the estimates of $\beta_0$ is asymmetric about the true value for all values of $\phi$. Concerning $\beta_1$ and $\beta_2$, the distributions are approximately symmetric about their true values, even when $\phi = 0.6$.

Figure 1
Histograms of parameter estimates of $\phi$ and the $\beta_j$ in Model (12)-(13).

4 RESULTS

Here, we fit a GAM-ARMA model to the monthly number of chronic obstructive pulmonary disease (COPD) cases, popularly known as acute bronchitis, in the metropolitan area of Belo Horizonte, Brazil, between January 2007 and December 2013 ($n = 84$). According to the department of information technology of the Brazilian public health system, three Brazilian citizens die each hour as a result of this disease. The objective of this analysis is to evaluate the association of the concentration of atmospheric pollutants and meteorological conditions with the occurrence of COPD in Belo Horizonte.

Studies concerning air pollution in Belo Horizonte are relatively rare, and studies of the relation between pollutant series and respiratory diseases are rarer still. Information about the concentration of pollutants in this region is very limited, and all the series have missing observations. Among the pollutants measured by the state environment and water resources institute, we select nitrogen monoxide (NO) as the explanatory variable in this study since it has the largest significant correlation coefficient with COPD, $\rho = 0.3$. Missing observations are imputed before fitting the model. We use a robust procedure for imputation in time series based on Kalman smoothing and a state space model (Harvey, 1989), as implemented in the package “imputeTS” for the software R (Moritz, S., Package “imputeTS” Time series missing value imputation).

Figure 2 presents the time series of COPD cases, NO concentration, minimum temperature (Tmin) and relative humidity (RH) of the air. A positive trend can be detected in the number of COPD cases and NO concentration. Furthermore, all time series present a seasonal behaviour. Table 2 contains some descriptive statistics of the data, where Q1 and Q3 denote the first and third quartile, respectively.

Figure 2
Number of COPD cases, concentration of NO, minimum temperature and relative humidity of the air in the metropolitan area of Belo Horizonte, Brazil, between January 2007 and December 2013.

Table 2
Descriptive statistics of the data.

In our model, NO concentration is related linearly to $\eta_t$, while Tmin and RH have a non-linear relation with $\eta_t$. Besides these explanatory variables, a trend component and sine and cosine functions are also incorporated in the model. The trend is included to model the slight positive trend in the number of COPD cases. The sine and cosine functions are necessary to handle the annual and semi-annual seasonality in the response variable. Therefore, the model is written as

$$ \begin{aligned} \eta_t ={} & \beta_1 x_{t,1} + \beta_2 \sin(2\pi t/12) + \beta_3 \cos(2\pi t/12) + \beta_4 \sin(2\pi t/6) + \beta_5 \cos(2\pi t/6) + \beta_6 t \\ & + \alpha_{1,1} B_1(w_{t,1}) + \alpha_{2,1} B_2(w_{t,1}) + \alpha_{3,1} B_3(w_{t,1}) \\ & + \alpha_{1,2} B_1(w_{t,2}) + \alpha_{2,2} B_2(w_{t,2}) + \alpha_{3,2} B_3(w_{t,2}) + Z_t, \end{aligned} \tag{14} $$

where $t$ is the month number, $x_{t,1}$ is the NO concentration, $(w_{t,1})$ is Tmin and $(w_{t,2})$ is RH. A simple GAM, obtained by removing $Z_t$ from (14), is also fitted to show the benefit of modeling the autocorrelation of the data through $Z_t$ in the GAM-ARMA model. The choice of the optimal number of knots is based on the sample size. Thus, as recommended in Section 2, three and four knots are tested; comparing the AIC, the best model is obtained with three knots.
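The deterministic regressors of (14), namely the two pairs of harmonics and the linear trend, depend only on the month number and can be built directly. The sketch below is illustrative; the function name `seasonal_trend_row` is hypothetical.

```python
import math


def seasonal_trend_row(t):
    """Regressors multiplying beta_2..beta_6 in (14) for month number t."""
    return [
        math.sin(2 * math.pi * t / 12),  # annual cycle (period 12 months)
        math.cos(2 * math.pi * t / 12),
        math.sin(2 * math.pi * t / 6),   # semi-annual cycle (period 6 months)
        math.cos(2 * math.pi * t / 6),
        float(t),                        # linear trend
    ]


# Rows for the n = 84 months from January 2007 to December 2013
rows = [seasonal_trend_row(t) for t in range(1, 85)]
```

Each sine and cosine pair lets the model fit a cycle of the given period with arbitrary amplitude and phase, which is why both terms are included for each period.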

Table 3 presents the estimates $\hat{\beta}_i$ of the parameters $\beta_i$ in the fitted GAM, with the corresponding standard errors given by the software R. All estimates are significant at the 5% significance level. The value of BIC is 1297.514 and the in-sample MSE between the fitted values and the observed values of COPD cases (see Figure 4) is 531.642.

Table 3
Parameter estimates of a GAM model (14) ($Z_t = 0$) fitted to the COPD cases.

Figure 3
Sample ACF and PACF of the residuals in the GAM and GAM-AR(1) models.

Figure 4
Fits of GAM and GAM-AR(1) models to the number of COPD cases.

Figure 3(a) plots the sample autocorrelation function (ACF) and sample partial autocorrelation function (PACF) of the residuals in the GAM. Some correlation is still present in these residuals, indicating the need for a more elaborate model.
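A residual check of this kind relies on the sample ACF, which can be computed as in the sketch below. The estimator with divisor $n$ and the approximate white-noise band $\pm 1.96/\sqrt{n}$ are standard choices assumed here; the paper does not state which variant its software used, and the function names are hypothetical.

```python
import math


def sample_acf(x, max_lag):
    """Sample autocorrelations at lags 0..max_lag (divisor n at every lag)."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x) / n
    acf = []
    for h in range(max_lag + 1):
        ch = sum((x[t] - mean) * (x[t + h] - mean) for t in range(n - h)) / n
        acf.append(ch / c0)
    return acf


def white_noise_band(n, z=1.96):
    """Approximate 95% limit for the sample ACF of white noise."""
    return z / math.sqrt(n)
```

For a series of length `n = 84`, `white_noise_band(84)` is roughly `0.214`, so sample autocorrelations of the residuals well outside this limit would suggest remaining serial dependence.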

Applying the GAM-ARMA methodology, the best fit is obtained with a GAM-AR(1) model. Table 4 shows the estimates $\hat{\beta}_i$ and $\hat{\phi}$ of the parameters $\beta_i$ and $\phi$ in the fitted GAM-AR(1) model, with the corresponding standard errors given by the software R. Again, all estimates are significant at the 5% significance level. The value of BIC is 1155.059 and the in-sample MSE between the fitted values and the observed values of COPD cases (see Figure 4) is 356.169. Both values are smaller than the corresponding values obtained with the GAM. Furthermore, the sample ACF and PACF plots in Figure 3(b) are consistent with white noise, which indicates a good fit of the GAM-AR(1) model.

Table 4
Parameter estimates of a GAM-AR(1) model (14) fitted to the COPD cases.

Figure 4 shows that the GAM-AR(1) model fits the observed number of COPD cases better than the GAM.

The RR for NO is important information for regulatory agencies seeking to quantify the impact of this pollutant on population health. Table 5 presents the estimated RR and CI for NO, $\widehat{\mathrm{RR}}$ and $\widehat{\mathrm{CI}}$, given by (10) and (11) respectively with $\alpha = 5\%$, obtained with the GAM and GAM-AR(1) models. In both cases, $\widehat{\mathrm{RR}}$ is significant, which means that NO contributes significantly to the increase in the number of COPD cases; $\widehat{\mathrm{RR}}$ is slightly smaller for the GAM-AR(1) model. Although the values of $\widehat{\mathrm{RR}}$ are comparable in the two models, the GAM-AR(1) model gives the best fit in terms of BIC, MSE, and the correlation of the residuals.

Table 5
Estimated RR and 95% CI for the NO in the GAM and GAM-AR(1) models.

5 CONCLUSIONS

In this work, a new methodology called GAM-ARMA was proposed, based on the GLARMA model introduced by Davis et al. (2003). The GAM-ARMA model allows the fitting of semiparametric models, accommodating covariates with linear and non-linear relations with the response variable in count data with time correlation.

A numerical simulation study showed that the parameter estimates are close to the true values for a moderate sample size of $n = 100$, and that the precision of the estimation degrades as the correlation in the data increases.

The model was applied to the monthly number of COPD cases in Belo Horizonte, Brazil, to quantify the impact of NO concentrations and meteorological variables on the occurrence of this disease. The best fit was obtained with a GAM-AR(1) model. This model presented white noise residuals and smaller measures of BIC and MSE compared to the GAM. The RR analysis revealed that NO contributed significantly to the increase of COPD cases.

Acknowledgements

The authors thank the Brazilian Federal Agency for the Support and Evaluation of Graduate Education (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior-CAPES), National Council for Scientific and Technological Development (Conselho Nacional de Desenvolvimento Científico e Tecnológico CNPq), Minas Gerais State Research Foundation (Fundação de Amparo à Pesquisa do estado de Minas Gerais - FAPEMIG), and Espírito Santo State Research Foundation (Fundação de Amparo à Pesquisa do Espírito Santo - FAPES). This research was partially supported by CentraleSupélec and by the iCODE Institute, research project of the IDEX Paris-Saclay, and by the Hadamard Mathematics LabEx (LMH) through the grant number ANR11-LABX-0056-LMH in the Programme des Investissements d’Avenir. The authors are very grateful to the anonymous referee and the editor for their comments, which improved this paper.

References

  • 1
    AHLBERG J, NILSON E & WALSH J. 1967. The Theory of Splines and Their Application. New York: Academic Press Inc.
  • 2
    AKAIKE H. 1973. Information Theory and an Extension of the Maximum Likelihood Principle. Proceedings of the 2nd International Symposium on Information Theory, pp. 267-281.
  • 3
    AL-OSH M & ALZAID A. 1987. First order integer valued autoregressive (INAR (1)) process. Journal of Time Series Analysis, pp. 261-275.
  • 4
    ALDRIN M & HOBÆK HAFF I. 2005. Generalised additive modelling of air pollution, traffic volume and meteorology. Atmospheric Environment, 39: 2145-2155.
  • 5
    ALONSO J, ACHCAR J & HOTTA L. 2010. Climate changes and their effects in the public health: use of Poisson regression models. Pesquisa Operacional, 30: 427-442.
  • 6
    BAXTER L, FINCH S, LIPFERT F & YU Q. 1997. Comparing estimates of the effects of air pollution on human mortality obtained using different regression methodologies. Risk Anal., 17(13): 273-278.
  • 7
    BELUSIC A, HERCEG-BULIC I & KLAIC Z. 2015. Comparing estimates of the effects of air pollution on human mortality obtained using different regression methodologies. Geofizika, 32: 47-77.
  • 8
    BENJAMIN M, RIGBY R & STASINOPOULOS D. 2003. Generalized autoregressive moving average models. Journal of the American Statistical association, pp. 214-223.
  • 9
    BOX G & JENKINS G. 1976. Time series analysis. San Francisco: Holden-Day.
  • 10
    CHOCK D & CHEN C. 2000. A study of the association between daily mortality and ambient air pollutant concentrations in Pittsburgh, Pennsylvania. J. Air Waste Manage. Assoc., 50: 1481-1500.
  • 11
    CIFUENTES L, KOPFER K & LAVE L. 2000. Effect of the fine fraction of particulate matter versus the coarse mass and other pollutants on daily mortality in Santiago, Chile. J. Air Waste Manage. Assoc., 50: 1287-1298.
  • 12
    DAVIS R, DUNSMUIR W & STREETT S. 2003. Observation driven models for Poisson counts. Biometrika, 90.
  • 13
    DE BOOR C. 1978. A practical guide to Splines. Berlin: Springer.
  • 14
    DIERCKX P. 1993. Curve and surface fitting with Splines. Berlin: Springer.
  • 15
    GAMERMAN D, SANTOS T & FRANCO G. 2013. A non-Gaussian family of state-space models with exact marginal likelihood. Journal of Time Series Analysis, 34: 625-645.
  • 16
    GOLDBERG M, BURNETT R, VALOIS M, FLEGEL K, BAILAR J & BROOKS J. 2003. Associations between ambient air pollution and daily mortality among persons with congestive heart failure. Environ. Res., 91: 8-20.
  • 17
    GREENAWAY-MCGREVY R & SUL D. 2012. Estimating the number of common factors in serially dependent approximate factor models. Econ. Lett., 116: 531-534.
  • 18
    HAMILTON J. 1994. Time Series Analysis. Princeton, NJ: Princeton University Press.
  • 19
    HARRELL F. 2004. Biostatistical Modeling. Nashville, TN.
  • 20
    HARVEY A. 1989. Forecasting Structural Time Series Models and the Kalman Filter. Cambridge: Cambridge University Press.
  • 21
    HARVEY A. 1993. Time Series Models. 2nd ed. Cambridge, MA: MIT Press.
  • 22
    HARVEY A & FERNANDES C. 1989. Time series models for count or qualitative observations. Journal of Business Economic Statistics, 7: 407-417.
  • 23
    HASTIE T & TIBSHIRANI R. 1990. Generalized additive models. London: Chapman and Hall.
  • 24
    HASTIE T, TIBSHIRANI R & FRIEDMAN J. 2008. The elements of statistical learning. California: Springer.
  • 25
    HEINEN A. 2003. Modelling Time Series Count Data: Autoregressive Conditional Poisson Model observations. Munich Personal RePEc Archive.
  • 26
    HU Y & TSAY R. 2014. Principal volatility component analysis. J. Bus. Econ. Stat., 32: 153-164.
  • 27
    KELSALL J, SAMET J, ZEGER S & XU J. 1997. Air pollution and mortality in Philadelphia. Am. J. Epidemiol., 146: 750-762.
  • 28
    LI G, SUN J, JAYASINGHE R & PAN X. 2012. Temperature modifies the effects of particulate matter on non-accidental mortality: A comparative study of Beijing, China and Brisbane, Australia. Public health Research, 2: 21-27.
  • 29
    MATTESON D & TSAY R. 2011. Dynamic orthogonal components for multivariate time series. J. Am. Stat. Assoc., 106: 1450-1463.
  • 30
    MCKENZIE E. 1985. Some simple models for discrete variate time series. Water Resources Bulletin, 21: 645-650.
  • 31
    NELDER J & WEDDERBURN R. 1972. Generalized linear models. J. Roy.Statist. Soc. Ser. A, 135: 370-384.
  • 32
    OSTRO B, ESKELAND G, SANCHEZ J & FEYZIOGLU T. 1999. Air pollution and health effects: A study of medical visits among children in Santiago, Chile. Environ. Health Persp., 107: 69-73.
  • 33
    PEARSON K. 1901. On lines and planes of closest fit to systems of points in space. Philosophical Magazine, 2: 559-572.
  • 34
    REN C. 2007. Evaluation of interactive effects between temperature and air pollution on health outcomes. Ph.D. thesis. School of Public Health, Queensland University of Technology.
  • 35
    ROBERTS S. 2004. Interactions between particulate air pollution and temperature in air pollution mortality time series study. Environmental Research, 96: 328-337.
  • 36
    SCHWARTZ J. 2000. Harvesting and long term exposure effects in the relationship between air pollution and mortality. Am. J. Epidemiol., pp. 440-448.
  • 37
    SHEPHARD N. 1995. Generalized Linear Autoregressions. Technical report, Nuffield College.
  • 38
    SOUZA J, REISEN V, FRANCO G, ISPANY M, BONDON P & SANTOS J. 2018. Generalized additive models with principal component analysis: an application to time series of respiratory disease and air pollution data. Journal of the Royal Statistical Society. Series C: Applied Statistics, 67: 453-480.
  • 39
    STAFOGGIA M, SCHWARTZ J, FORASTIERE F, PERUCCI C & GROUP S. 2008. Does temperature modify the association between air pollution and mortality? A multicity case-crossover analysis in Italy. Am. Journal of Epidemiology, 167: 1476-1485.
  • 40
    WANG Y & PHAM H. 2011. Analyzing the effects of air pollution and mortality by generalized additive models with robust principal components. Int. J. Syst. Assur. Eng. Manag., 2: 253-259.
  • 41
    WOOD S. 2003. Thin plate regression splines. Royal Statistical Society, 65: 95-114.
  • 42
    WOOD S. 2006. Generalized Additive Models: An Introduction With R. Boca Raton, FL: Chapman and Hall/CRC Press.
  • 43
    YANG L, QIN G, ZHAO N, WANG C & SONG G. 2012. Using a generalized additive model with autoregressive terms to study the effects of daily temperature on mortality. Medical Research Methodology, 12.
  • 44
    ZAMPROGNO B. 2013. PCA in time series with short and long-memory time series. Ph.D. thesis. Programa de Pós-Graduação em Engenharia Ambiental do Centro Tecnológico, UFES, Vitória, Brazil.

Publication Dates

  • Publication in this collection
    11 Oct 2021
  • Date of issue
    2021

History

  • Received
    18 July 2020
  • Accepted
    01 May 2021
Sociedade Brasileira de Pesquisa Operacional Rua Mayrink Veiga, 32 - sala 601 - Centro, 20090-050 Rio de Janeiro RJ - Brasil, Tel.: +55 21 2263-0499, Fax: +55 21 2263-0501 - Rio de Janeiro - RJ - Brazil
E-mail: sobrapo@sobrapo.org.br