\documentclass[11pt]{article} \usepackage{amsmath,amssymb,geometry,textcomp} \usepackage{graphicx} \geometry{margin=1in} \usepackage[sort&compress]{natbib} \bibpunct{(}{)}{,}{n}{}{;} % commas between entries \renewcommand{\bibnumfmt}[1]{#1.} %refs without []'s around numbers %%\renewcommand{\familydefault}{cmss} \renewcommand{\baselinestretch}{1.8} \newcommand{\bsym}{\boldsymbol} \newcommand{\mb}{\mathbf} \newcommand{\PMTen}{PM$_{10}$} \newcommand{\mgmcube}{$\mu$g$/$m$^3$} \date{} \title{Seasonal Analyses of Air Pollution and Mortality in 100 U.S. Cities} \author{Roger D. Peng\and Francesca Dominici\and Roberto Pastor-Barriuso\and Scott L. Zeger\and Jonathan M. Samet} \begin{document} \maketitle \noindent Word count: 3843 \clearpage \begin{abstract} Time series models relating short-term changes in air pollution levels to daily mortality counts typically assume that the effects of air pollution on the log relative rate of mortality do not vary with time. However, these short-term effects might plausibly vary by season. Changes in the sources of air pollution and meteorology can result in changes in characteristics of the air pollution mixture across seasons. The authors develop Bayesian semi-parametric hierarchical models for estimating time-varying effects of pollution on mortality in multi-site time series studies. The methods are applied to the National Morbidity and Mortality Air Pollution Study database for the period 1987--2000, which includes data for 100 U.S. cities. At the national level, a 10~\mgmcube\ increase in \PMTen\ at lag 1 is associated with a $0.15$ (95\% posterior interval: $-0.08$, $0.39$), $0.14$ ($-0.14$, $0.42$), $0.36$ ($0.11$, $0.61$), and $0.14$ ($-0.06$, $0.34$) percent increase in mortality for winter, spring, summer, and fall, respectively. An analysis by geographical regions finds a strong seasonal pattern in the northeast (with a peak in summer) and little seasonal variation in the southern regions of the country. These results provide useful information for understanding particle toxicity and guiding future analyses of particle constituent data.\\ \noindent Word count: 200\\ \noindent MeSH headings: air pollution, epidemiology, models/statistical, mortality \end{abstract} \clearpage \section*{Acknowledgments} This research was supported in part by NIH/NHLBI grant T32HL07024, NIEHS grant R01ES012054, the NIEHS Center for Urban Environmental Health (P30ES03819). Research described in this article was partially supported by a contract and grant from the Health Effects Institute (HEI), an organization jointly funded by the U.S. Environmental Protection Agency (EPA) and automotive manufacturers. The contents of this article do not necessarily reflect the views and policies of HEI, nor do they necessarily reflect the views and policies of EPA, nor motor vehicle or engine manufacturers. Roberto Pastor-Barriuso was supported in part by grant EPY 1261/02 from the Instituto de Salud Carlos III. \clearpage Numerous time series studies have indicated a positive association between short-term variation in ambient levels of particulate matter (PM) and daily mortality counts~\citep[see e.g.][]{pope:dock:schw:1995,dock:pope:1996,bell:same:domi:2004}. The models used in these studies have typically assumed that the association between PM and daily mortality is constant over the study interval. However, the short-term effects of PM on mortality might plausibly exhibit seasonal variation. Studies in a number of locations have shown that that the characteristics of the PM mixture change throughout the year and that the relative and absolute contributions of particular components to PM mass may be different during different times of the year~\citep[][ch.~3 and references therein]{epa:2003}. Patterns of human activity also change from season to season, so that a particular air pollution concentration in one season may lead to a different exposure in another season. Other potential time-varying confounding and modifying factors, such as temperature and influenza epidemics, can also affect estimates of short-term effects of air pollution on mortality differently in different seasons. All of the issues described above indicate a need to extend current models for time series data on air pollution and health to incorporate time-varying pollution effects. The composition of particulate matter is known to vary in the spatial domain as well, suggesting that seasonal patterns should be examined by geographical region~\citep{spen:wils:1996}. For example, in the North West wood burning is a greater source of PM in the colder seasons than in the warmer months. The PM mixture in the Eastern U.S. contains a large fraction of sulfates (almost 40\% of total mass) originating from power plants in the Midwest, while PM in areas of the Western U.S. such as Southern California and the Pacific North West contains more nitrates and organic compounds (approximately 30\% of total mass)~\citep{spen:wils:1996,epa:1996,epa:2003}. In this paper we develop statistical methods for estimating seasonal patterns in the short-term effects of air pollution on mortality in multi-site time series studies. We propose Bayesian semi-parametric hierarchical models for estimating time-varying health effects within each city and for comparing temporal patterns across cities and geographical regions. Using data from the National Morbidity, Mortality, and Air Pollution Study~(NMMAPS)~\citep{same:zege:domi:curri:cour:dock:schw:zano:2000}, we estimate seasonal patterns in the short-term effects of PM less than 10~$\mu$m in aerodynamic diameter (\PMTen) on daily non-accidental mortality. The data have been extended from the original study to include 100 U.S. cities for the period 1987--2000, an addition of 10 cities and 6 years of data. The seasonal patterns are estimated for seven geographical regions and on average for the entire U.S. We explore the sensitivity of estimated seasonal patterns to temperature adjustment, copollutants, exposure lag, and adjustments for long-term mortality trends. \section*{MATERIALS AND METHODS} \label{sec:datamethods} \label{sec:models} The NMMAPS database contains daily time series of mortality, weather, and air pollution assembled from publicly available sources for the largest 100 cities in the United States. A full description of the construction of the database can be found in~\cite{same:zege:domi:curri:cour:dock:schw:zano:2000}. The most recent data are available at \texttt{http://www.ihapss.jhsph.edu}. Within each city, we specify a semi-parametric regression model for the time-varying log relative rate using a generalized additive model framework~\citep{hast:tibs:1990}. More specifically, let $Y^c_t$ be the total number of non-accidental deaths on day $t$ in city $c$. The $Y^c_t$ are Poisson distributed with expectation $\mu^c_t$ and with possible overdispersion $\phi^c$. The general form of the city-specific model is \begin{eqnarray} Y^c_t & \sim & \text{Poisson}(\mu^c_t) \nonumber\\ \text{Var}(Y^c_t) & = & \phi^c\mu^c_t \nonumber\\ \log(\mu_t^c) & = & \beta^c(t)\,x^c_{t-\ell} + \text{confounders} \label{model:general} \end{eqnarray} where $x^c_{t-\ell}$ is the lag $\ell$ \PMTen\ level for day $t$. The function $\beta^c(t)$ in equation~\ref{model:general} represents the time-varying effect of \PMTen\ on mortality and is a yearly periodic function for estimating seasonal patterns. To estimate smooth seasonal patterns in the city-specific log relative rates, we use a sine/cosine model for $\beta^c(t)$ of the form \begin{equation} \beta^c(t) = \beta^c_0 + \beta^c_{1}\sin(2\pi t/365) / c_1 + \beta^c_{2} \cos(2\pi t/365) / c_2 \label{model:periodic} \end{equation} where $\beta^c_0$, $\beta^c_{1}$, $\beta^c_{2}$ are estimated and $c_1$ and $c_2$ are known orthogonalizing constants. In this model, the effect of \PMTen\ is allowed to vary smoothly over the course of a year, but is constrained to be periodic across years~\citep[e.g.][]{stol:stra:ziel:1999}. While it is possible to include higher frequency basis terms for the representation of $\beta^c(t)$ in equation~\ref{model:periodic}, there is little reason to expect there to be much high frequency variation in the seasonal effects of \PMTen. To allow for season-specific \PMTen\ log relative rates, we use a pollutant-season interaction model with indicator functions for each season: \begin{eqnarray} \beta^c(t) = \beta^c_W\, I_{\text{winter}} + \beta^c_{Sp}\, I_{\text{spring}} + \beta^c_{Sm}\, I_{\text{summer}} + \beta^c_{F}\, I_{\text{fall}}, \label{model:seasonal} \end{eqnarray} where winter, spring, summer, and fall are defined as beginning on December 21st, March 21st, June 21st, and September 21st, respectively. %%As in model~\ref{model:main}, the pollutant-season %%interaction model assumes a log-linear effect of \PMTen\ on mortality, %%but it provides separate estimates for each season. Although these seasonal estimates serve as concise summaries, it is unlikely that the effect of \PMTen\ on mortality is discontinuous across seasons. Furthermore, the estimates depend on the specification of the season boundaries which are artificial and can differ considerably across geographic regions. Our main effect model, which does not contain any adjustment for season takes $\beta^c(t)$ to be constant across time, i.e. \begin{equation} \beta^c(t) = \beta^c. \label{model:main} \end{equation} This model assumes a homogeneous log-linear effect of \PMTen\ on mortality, a condition that was found appropriate in previous NMMAPS analyses~\citep{dani:domi:same:zege:2000,domi:mcde:zege:same:2002,domi:dani:zege:same:2002,domi:mcder:zege:same:2003}. Note that the main effect model is nested within the interaction and sine/cosine models, so that if $\beta_{W}=\beta_{Sp}=\beta_{Sm}=\beta_{F}$ in equation~\ref{model:seasonal} and $\beta^c_1 = \beta^c_2 = 0$ in equation~\ref{model:periodic}, both models reduce to equation~\ref{model:main}. The potential confounders included in the equation~\ref{model:general} are similar to those used in previous NMMAPS analyses~\citep[e.g.][]{domi:dani:zege:same:2002,domi:mcder:zege:same:2003} and consist of indicators for the day of the week, age specific intercepts corresponding to the categories of less than 65 years of age, 65--74 years, and 75 years or older, a smooth function of calendar time, and smooth functions of temperature and dew point temperature. In addition to the overall smooth function of time, two separate smooth functions of time are included for the older two age groups. All of the smooth functions are represented by natural cubic splines. The complexity of each of the smooth functions of time and temperature is controlled by the numbers of degrees of freedom assigned to each function. We use 7 degrees of freedom per year for the overall smooth function of time, which removes any fluctuations in mortality at time scales longer than two months. The separate smooth functions of time for the older two age categories each receive 1 degree of freedom per year to capture gradual trends specific to these age groups. For temperature we use 6 degrees of freedom and for dew point we use 3 degrees of freedom. A somewhat larger number of degrees of freedom is necessary for temperature in order to capture the well known ``J-shaped'' nonlinear relationship between temperature and mortality. Others have adjusted for temperature simply by doing separate analyses of the data by season~\citep{mool:lueb:hall:ande:1995,same:zege:kels:xu:kalk:1998,domi:same:zege:2000,same:domi:curr:cour:zege:2000,nmmaps1,same:zege:domi:curri:cour:dock:schw:zano:2000}. All of the above models were fit using quasi-likelihood methods as implemented in the R statistical software package~\citep{R:2004}. The data are available via the NMMAPSdata package~\citep{peng:welt:2004} and code for fitting the models is available on the web at~\texttt{http://www.ihapss.jhsph.edu/data/NMMAPS/R/}. \subsection*{Pooling information across cities} \label{sec:hnormal} After fitting each of the city-specific models we use a hierarchical normal model for pooling information and borrowing strength across cities~\citep[see][]{nmmaps1,domi:same:zege:2000,domi:dani:zege:same:2002}. For a particular model, we have a city-specific maximum likelihood estimate $\widehat{\boldsymbol{\beta}}^c$ which is a scalar for the main effect model in equation~\ref{model:main}, a vector of length four for the pollutant-season interaction model in equation~\ref{model:seasonal}, and a vector of length three for the sine/cosine model in equation~\ref{model:periodic}. $\widehat{\boldsymbol{\beta}}^c$ is assumed to be normally distributed around the true city-specific log relative rates $\boldsymbol{\beta}^c$ with covariance matrix $V^c$, estimated within each city. In addition, the true rates are assumed to vary independently across cities according to a normal distribution, i.e. \begin{eqnarray} \widehat{\boldsymbol{\beta}}^c\mid\boldsymbol{\beta}^c & \sim & \mathcal{N}(\boldsymbol{\beta}^c, V^c)\nonumber\\ \boldsymbol{\beta}^c\mid\boldsymbol{\alpha},\Sigma & \sim & \mathcal{N}(Z^c\,\boldsymbol{\alpha}, \Sigma) \label{model:hnorm} \end{eqnarray} where $\Sigma$ is the covariance matrix describing the between-city variation of $\boldsymbol{\beta}^c$ and $\boldsymbol{\alpha}$ is the overall mean for the cities. $Z^c$ is a matrix of second stage covariates for describing possible differences between cities. To characterize regional differences in seasonal patterns we include as a second stage covariate an indicator for the following seven regions (also used in~\cite{same:zege:domi:curri:cour:dock:schw:zano:2000}): Industrial Midwest (19 cities), North East (17), North West (13), Southern California (7), South East (26), South West (10), and Upper Midwest (8). The final national average estimate $\boldsymbol{\alpha}$ represents the combined information from all of the cities. The diagonal elements of $\Sigma$ measure the heterogeneity across cities and the off-diagonal elements represent the correlation of the estimates between cities. The hierarchical model is fit using the two level normal independent sampling estimation (TLNise) software of~\cite{ever:morr:2000} with uniform priors on $\boldsymbol{\alpha}$ and $\Sigma$. This software provides a sample from the posterior distribution of $\Sigma$ from which one can calculate posterior means and variances of the overall and city-specific pollution effects. \subsection*{Evidence for seasonality in the log relative rates} To quantify the amount of evidence supporting the presence of a seasonal pattern in the national and regional averages we examine the posterior distributions of the pooled log relative rate estimates. In particular, for the sine/cosine model in equation~\ref{model:periodic}, we can check the posterior probability that the coefficients for the harmonic terms are non-zero. While the values of the pooled coefficients $\beta_1$ and $\beta_2$ do not have meaningful interpretations, if either one of these coefficients is non-zero with high probability, then there is strong evidence of a seasonal trend. \section*{RESULTS} \label{sec:results} The daily mortality counts for the years 1987--2000 include approximately 10 million deaths. By city, the daily average ranged from 2 deaths per day in Arlington, VA to 190 per day in New York, NY. The daily mean of \PMTen\ ranged from 13~\mgmcube\ in Coventry, RI to 49~\mgmcube\ in Fresno, CA. Summary statistics for the dataset are shown in Table~\ref{datasum}. \begin{table}[ht] \begin{center} \begin{tabular}{lrrrrr} \hline & Min. & 25\% & Median & 75\% & Max. \\ \hline Avg. Daily Mortality & 2.2 & 7.6 & 12.2 & 20.4 & 190.2 \\ Avg. Daily \PMTen\ (\mgmcube) & 13.2 & 24.7 & 27.1 & 32.0 & 48.7 \\ Avg. Daily Temp. (\textdegree~F) & 37.0 & 51.8 & 58.1 & 64.7 & 77.8 \\ \hline \end{tabular} \end{center} \caption{Summary statistics for average daily mortality, \PMTen, and temperature for 100 U.S. cities, 1987--2000.} \label{datasum} \end{table} Mortality and \PMTen\ levels are known to vary considerably across seasons. Generally, mortality tends to be higher in the winter and fall and lower in the summer and spring. Figure~\ref{mort-top10} shows boxplots of the square root daily mortality counts for the largest 10 cities in the United States. \begin{figure}[tbh] \centering <>= library(NMMAPSdata) load(file.path("data", "CityDataCombined.rda")) data(cities) data(regions) SeasonNames <- c("Winter", "Spring", "Summer", "Fall") cc <- dget(file.path("data", "cityList.R")) cc <- cc[!(cc %in% c("hono", "anch"))] useNames <- paste(rep(cc, 4), rep(SeasonNames, each = length(cc)), sep = ".") Season <- citydata$Season[1:5114] regionID <- with(cities, region[match(cc, city)]) regionNames <- c("Industrial Midwest", "Northeast", "Northwest", "Southern California", "Southeast", "Southwest", "Upper Midwest") ## Top 10 cities cc <- with(cities, city[order(pop, decreasing = TRUE)])[1:10] dCity <- with(citydata, split(death, city)) dCity <- dCity[names(dCity) %in% cc] dSeas <- lapply(dCity, split, f = Season) ord <- order(with(cities, pop[match(names(dSeas), city)]), decreasing = TRUE) dSeas <- dSeas[ord] ## Modify city names cityNames <- with(cities, cityname[match(cc, city)]) cityNames[4] <- "Dallas-\nFort Worth" cityNames[7] <- "Anaheim" ## x11(width = 10, height = 4) par(mar = c(4, 4, 2, 0) + .1, las = 2) plot(0, 0, ylim = c(4, 17), xlim = c(1, 50), xlab = "Season", axes = FALSE, ylab = "Square root daily mortality counts", type = "n") for(i in seq(along = dSeas)) { y <- unname(unlist(dSeas[[i]])) n <- unname(unlist(lapply(dSeas[[i]], length))) f <- factor(unlist(mapply(rep, seq(length(n)), n))) boxplot(sqrt(y) ~ f, outline = FALSE, add = TRUE, at = seq(length(n)) + ((i-1) * 5), pars = list(boxwex = .5), lty = 1, axes = FALSE) } axis(1, at = 1:50, labels = rep(c("Winter", "Spring", "Summer", "Fall", NA), length(dSeas)), tick = FALSE, cex.axis = 0.7) axis(2) par(las = 1) text(seq(2.5, 50, 5), c(8, 10, 8, 11, 10, 10, 10, 10, 10, 10), cityNames) @ \caption{Boxplots of square root daily mortality by season for the 10 largest U.S. cities, 1987--2000.} \label{mort-top10} \end{figure} Each city shows a clear decrease in mortality towards summer and a peak in the winter. Figure~\ref{season-region-pm10} shows the mean daily levels of \PMTen\ by season for all cities in each of the seven regions of the U.S. The Southern California, North West, and South West regions have their highest mean levels of \PMTen\ in the fall while other regions have their highest levels in the summer. \begin{figure}[tbh] \centering <>= rm(list = ls(all = TRUE)) library(NMMAPSdata) load(file.path("data", "CityDataCombined.rda")) data(cities) data(regions) cc <- dget(file.path("data", "cityList.R")) cc <- cc[!(cc %in% c("hono", "anch"))] SeasonNames <- c("Winter", "Spring", "Summer", "Fall") useNames <- paste(rep(cc, 4), rep(SeasonNames, each = length(cc)), sep = ".") Season <- citydata$Season[1:5114] regionID <- with(cities, region[match(cc, city)]) regionNames <- c("Industrial Midwest", "Northeast", "Northwest", "Southern California", "Southeast", "Southwest", "Upper Midwest") pm10 <- pmax(citydata$pm10tmean + citydata$pm10mtrend, 0) pmCity <- split(pm10, citydata$city) pmCity <- pmCity[names(pmCity) %in% cc] pmSeas <- lapply(pmCity, split, f = Season) pmSeasRegion <- split(pmSeas, regionID) seasRegion <- lapply(pmSeasRegion, function(reg) { l <- lapply(SeasonNames, function(seas) lapply(reg, "[[", seas)) names(l) <- SeasonNames lapply(l, function(x) unlist(x, use.names = FALSE)) }) bstats <- lapply(seasRegion, function(x) sapply(x, function(y) boxplot.stats(y)$stats)) abstats <- array(unlist(bstats), c(5, 4, length(bstats))) med <- as.vector(rbind(abstats[3, ,], NA)) x <- 1:length(med) ## x11(width = 10, height = 4) par(mar = c(4, 5, 2, 0) + .1) par(las = 2) plot(0, 0, xlim = c(1, 35), ylim = c(0, 125), xaxt = "n", xlab = "Season", ylab = expression(paste(PM[10]," level ",bgroup("(",paste(mu,g/m^3),")"))), frame.plot = FALSE, type = "n") for(i in seq(along = seasRegion)) { y <- unlist(seasRegion[[i]]) n <- unlist(lapply(seasRegion[[i]], length)) f <- factor(unlist(mapply(rep, seq(length(n)), n))) boxplot(y ~ f, outline = FALSE, add = TRUE, at = seq(length(n)) + ((i-1) * 5), pars = list(boxwex = .5), lty = 1, axes = FALSE) } axis(1, at = x, labels = rep(c("Winter", "Spring", "Summer", "Fall", NA), 7), tick = FALSE, cex.axis = .7) text(seq(2.7, 35, 5), c(100, 100, 100, 120, 100, 100, 100), regionNames) @ \caption{Boxplots of regionally averaged daily levels of particulate matter less than 10~$\mu$m in aerodynamic diameter (\PMTen) by season for 100 U.S. cities, 1987--2000.} \label{season-region-pm10} \end{figure} The national average estimates of the overall and seasonal short-term effects of \PMTen\ on mortality for lags 0, 1, and 2 are summarized in Table~\ref{tab:natlavg}. These estimates were obtained by pooling the city-specific maximum likelihood estimates from the main effect and pollutant-season interaction models according to the hierarchical normal model. Across all seasons, we found that the national average estimate of the effect of \PMTen\ on mortality is largest at lag 1 and equal to an estimated 0.19 (95\% posterior interval of 0.10, 0.28) percent increase in mortality per 10~\mgmcube\ increase in \PMTen. Previous NMMAPS analyses using data from the eight-year period 1987--1994 have reported similar slightly higher national average estimates for \PMTen\ log relative rates~\cite{domi:mcde:zege:same:2002,domi:mcder:zege:same:2003}. For example, the national average estimate reported in~\cite{domi:mcder:zege:same:2003} was a 0.22 (0.03, 0.42) percent increase in mortality for a 10~\mgmcube\ increase in \PMTen. <>= source(file.path("data", "utils.R")) library(tlnise) library(xtable) Seasons <- c("Winter", "Spring", "Summer", "Fall", "All Seasons") Lags <- paste("Lag", 0:2); exclude <- c("hono", "anch") ## Load non-seasonal estimates load(file.path("results", "city-specific-est.pm10.rda")) results <- lapply(results, function(x) x[setdiff(names(x), exclude)]) ## Pool estimates betacovTotal <- lapply(results, extractBetaCov, pollutant = "pm10") pooledTotal <- lapply(betacovTotal, poolCoef) ## Load seasonal estimates load(file.path("results", "seasonal.factor2.lag.012.pm10.rda")) results <- lapply(results, function(x) x[setdiff(names(x), exclude)]) ## Pool estimates by season pooledSeas <- lapply(results, coefSeasonal, pollutant = "pm10", method = "factor2") pooled <- lapply(seq(along = pooledSeas), function(i) { m <- rbind(pooledSeas[[i]], pooledTotal[[i]]) rownames(m) <- Seasons m }) ## Make table names(pooled) <- Lags pm <- do.call("rbind", pooled) * 1000 ppp <- cbind(pm[,1], pm[,1] - 2*pm[,2], pm[,1] + 2*pm[,2]) lf <- gsub("\\$", "", sub("_", " ", LouisFormat(ppp, "confint"))) estmat <- matrix(lf, byrow = TRUE, ncol = 5, dimnames = list(Lags, Seasons)) print(xtable(estmat, caption = "National average estimates and 95\\% posterior intervals of the overall and season-specific effects of \PMTen\ at lags 0, 1, and 2 for 100 cities, 1987--2000. Estimates were obtained by pooling city-specific coefficients from the main effect and pollutant-season interaction models, respectively, and represent the percent increase in daily mortality for a 10~\mgmcube\ increase in \PMTen.", label = "tab:natlavg", align = c("l", rep("l", 5)))) @ For \PMTen\ at lag 1, the estimates for winter, spring, and fall are similar and equal to 0.15 ($-0.08, 0.39$), 0.14 ($-0.14, 0.42$), and 0.14 ($-0.06, 0.34$), respectively. The estimate for summer is more than twice as large at 0.36 ($0.11, 0.61$). \PMTen\ at lag 0 appears to have a larger effect in the spring and much smaller effects in the other seasons. In addition, estimates for lag 0 have a much larger between season difference (e.g. spring and winter) than those of lag 1. The estimates for lag 2 are generally smaller than those of lag 0 or 1 and, given the size of the posterior intervals, do not vary much across seasons. Regional differences in the seasonal patterns of the \PMTen\ relative rates were explored by including a region indicator variable in the second stage of the hierarchical model. For \PMTen\ at lag 1, Figure~\ref{periodic-seasonal} shows the results of estimating separate seasonal trends from the sine/cosine model for the seven regions of the U.S. The Industrial Midwest and the North East have seasonal trends characterized as being lower in the winter and higher in the summer. In Southern California there is a larger effect (0.5 percent increase in mortality per 10~\mgmcube\ increase in \PMTen) that is constant all year. The effect of \PMTen\ is close to zero all year round in the North West, South East, South West, and the Upper Midwest, but the North West experiences a slight increase during the summer months. With the exception of Southern California, all regions have a smaller effect in the winter months. Seasonal analyses for mortality due cardiovascular and respiratory diseases (not shown) provided results that are qualitatively similar to those for total non-accidental mortality, with larger summer effects in the Industrial Midwest and the North East regions, as well as overall for the entire U.S. \begin{figure}[tbh] \centering <>= rm(list = ls(all = TRUE)) library(NMMAPSdata) library(tlnise) library(tsModelSpec) source(file.path("data", "model-fitting.R")) source(file.path("data", "utils.R")) exclude <- c("anch", "hono") ## Load the results load(file.path("results", "seasonal.periodic.lag.012.dfSeas.1.rda")) r <- results[[2]][setdiff(names(results[[2]]), exclude)] ## Lag 1 ## Get region indicators for each city data(cities) cityRegion <- with(cities, region[match(names(r), city)]) regionNames <- sort(unique(cityRegion)) regionInd <- 1 * outer(cityRegion, regionNames, "==") colnames(regionInd) <- regionNames ndays <- 365 x <- 1:ndays B <- harmonic(x, 1, ndays, intercept = TRUE) ## Do pooling bc <- extractBetaCov(r, pollutant = "pm10") initTLNise() g <- tlnise(bc$beta, bc$cov, regionInd, prnt = FALSE, labelY = names(r), intercept = FALSE, seed = 54321) pooledRegion <- matrix(g$gamma[,1], byrow = TRUE, ncol = ncol(regionInd), nrow = nrow(g$theta)) pooledRegionSD <- matrix(g$gamma[,2], byrow = TRUE, ncol = ncol(regionInd), nrow = nrow(g$theta)) b <- B %*% pooledRegion zero <- matrix(0, nrow = ndays, ncol = 7 * 3) curveRegion <- lapply(1:ncol(pooledRegion), function(i) { B <- zero ## This is weird B[, seq(i, 7 * 3, 7)] <- harmonic(x, 1, ndays, intercept = TRUE) s <- sqrt(diag(B %*% g$Dgamma %*% t(B))) cbind(b[,i], b[,i] - 2*s, b[,i] + 2*s) }) ## Overall estimate initTLNise() g <- tlnise(bc$beta, bc$cov, rep(1, NROW(bc$beta)), seed = 12345, prnt = FALSE) V <- diag(B %*% g$Dgamma %*% t(B)) b <- B %*% g$gamma[,1] curve <- cbind(b, b - 2*sqrt(V), b + 2*sqrt(V)) ## Plot regionFullNames <- c("Industrial Midwest", "Northeast", "Northwest", "Southern California", "Southeast", "Southwest", "Upper Midwest") Y <- rbind(do.call("rbind", curveRegion), curve) * 1000 conf <- Y[,2:3] rng <- range(Y) + c(-1, 1) * .05 y <- Y[,1] nRegions <- length(curveRegion) + 1 x <- rep(1:ndays, nRegions) g <- ordered(rep(c(regionFullNames, "All Regions"), each = ndays), levels = c(regionFullNames, "All Regions")) library(lattice) ltheme <- canonical.theme(color = FALSE) ## in-built B&W theme ltheme$strip.background$col <- "transparent" ## change strip bg lattice.options(default.theme = ltheme) ## set as default ## trellis.device(height = 5, width = 8, color = FALSE) p <- xyplot(y ~ x | g, as.table = TRUE, subscripts = TRUE, type = "l", panel = function(x, y, subscripts, ...) { llines(x, Y[subscripts, 2], lty = 3, lwd = 2) llines(x, Y[subscripts, 3], lty = 3, lwd = 2) panel.xyplot(x, y, ...) panel.abline(h = 0, lty = 2) }, strip = strip.custom(bg = 0), ylim = rng, layout = c(4, 2), ylab = list(expression(paste("% increase in mortality with ", "10-", mu, "g/", m^3," increase in ", PM[10])), cex = 0.9), xlab = "Day in year", scales = list(alternating = 1, tck = c(0.8, 0)) ) print(p) @ \caption{National and regional smooth seasonal effects of \PMTen\ (particulate matter less than 10~$\mu$m in aerodynamic diameter) at lag 1 for 100 U.S. cities, 1987--2000. Estimates were obtained by pooling city-specific coefficients from the sine/cosine model (equation~\ref{model:periodic}). Dotted lines indicate pointwise 95\% posterior intervals.} \label{periodic-seasonal} \end{figure} Figure~\ref{postbeta-contour} shows samples from the joint posterior distributions of the regionally and nationally pooled harmonic coefficients $\beta_1$ and $\beta_2$ in the sine/cosine model for \PMTen\ at lag 1. The region with the strongest evidence of a seasonal pattern is the North East with a marginal posterior $\text{Prob}(\beta_2 > 0\mid\text{data}) = 0.94$. There is moderate evidence of seasonality in the Industrial Midwest and the North West with $\text{Prob}(\beta_2 > 0\mid\text{data}) = 0.83$ and $0.74$, respectively. The joint distributions of the coefficients for the South East, South West, Upper Midwest, and Southern California regions are centered at zero, indicating a lack of any seasonal variation. At the national level, the marginal posterior $\text{Prob}(\beta_2 > 0\mid\text{data}) = 0.91$, while the marginal distribution for $\beta_1$ is centered almost exactly around zero. \PMTen\ at lag 0 shows slightly more evidence of seasonality for the national average. However, the overall short-term effect of \PMTen\ at lag 0 is smaller on average, as indicated in Table~\ref{tab:natlavg}. There is little evidence of seasonal variation of the short-term effect of \PMTen\ at lag 2. \begin{figure}[tbh] \centering <>= rm(list = ls()) library(NMMAPSdata) source(file.path("data", "utils.R")) library(tlnise) library(ellipse) library(MASS) exclude <- c("anch", "hono") ## Load the results load(file.path("results", "seasonal.periodic-ortho.lag.012.pm10.rda")) r <- results[[2]][setdiff(names(results[[2]]), exclude)] ## Get region indicators for each city data(cities) cityRegion <- with(cities, region[match(names(r), city)]) regionNames <- sort(unique(cityRegion)) regionInd <- 1 * outer(cityRegion, regionNames, "==") colnames(regionInd) <- regionNames ## Pool estimates bc <- extractBetaCov(r, pollutant = "pm10") initTLNise() g <- tlnise(bc$beta, bc$cov, regionInd, prnt = FALSE, labelY = names(r), intercept = FALSE, seed = 10) ## Get sample from posterior distribution set.seed(20) gs <- tlnise:::sampleGamma(g, bc$cov, bc$beta, regionInd) set.seed(30) ds <- tlnise:::drawSample0(gs, n = 1500) ads <- array(t(ds), c(7, 3, nrow(ds))) ## Overall estimate initTLNise() g0 <- tlnise(bc$beta, bc$cov, rep(1, nrow(bc$beta)), seed = 100, prnt = FALSE) set.seed(200) gs0 <- tlnise:::sampleGamma(g0, bc$cov, bc$beta) set.seed(300) ds0 <- tlnise:::drawSample0(gs0, n = 1500) ## Compute posterior probs. postprob1 <- matrix(nrow = 8, ncol = 2) postprob2 <- matrix(nrow = 8, ncol = 2) n <- dim(ads)[3] for(i in 1:7) { postprob1[i, ] <- c(sum(ads[i, 2, ] > 0) / n, sum(ads[i, 2, ] < 0) / n) postprob2[i, ] <- c(sum(ads[i, 3, ] > 0) / n, sum(ads[i, 3, ] < 0) / n) } postprob1[8, ] <- c(sum(ds0[, 2] > 0)/nrow(ds0), sum(ds0[, 2] < 0)/nrow(ds0)) postprob2[8, ] <- c(sum(ds0[, 3] > 0)/nrow(ds0), sum(ds0[, 3] < 0)/nrow(ds0)) ## Draw ellipses x1 <- NULL y1 <- NULL x2 <- NULL y2 <- NULL for(i in 1:7) { V <- cov(t(ads[i, -1, ])) m <- colMeans(t(ads[i, -1, ])) e <- ellipse(V, centre = m, level = c(.75, .95)) e1 <- t(matrix(e[,1], nrow = 2)) e2 <- t(matrix(e[,2], nrow = 2)) x1 <- c(x1, e1[,1]) y1 <- c(y1, e2[,1]) x2 <- c(x2, e1[,2]) y2 <- c(y2, e2[,2]) } V0 <- cov(ds0[, -1]) m0 <- colMeans(ds0[, -1]) e <- ellipse(V0, centre = m0, level = c(.75, .95)) e1 <- t(matrix(e[,1], nrow = 2)) e2 <- t(matrix(e[,2], nrow = 2)) x1 <- c(x1, e1[,1]) y1 <- c(y1, e2[,1]) x2 <- c(x2, e1[,2]) y2 <- c(y2, e2[,2]) b <- vector("list", length = 8) ## Only plot a few points set.seed(1025301) for(i in 1:7) { b[[i]] <- t(ads[i, 2:3, sample(1500, 100)]) } set.seed(20) b[[8]] <- ds0[sample(1500, 15), 2:3] regionFullNames <- c("Industrial Midwest", "Northeast", "Northwest", "Southern California", "Southeast", "Southwest", "Upper Midwest", "All Regions") f <- gl(8, 50, ordered = TRUE) levels(f) <- regionFullNames library(lattice) ltheme <- canonical.theme(color = FALSE) ## in-built B&W theme ltheme$strip.background$col <- "transparent" ## change strip bg lattice.options(default.theme = ltheme) ## set as default ## trellis.device(dev = x11, height = 5, width = 8, color = FALSE) yrng <- c(-.1, .1) xrng <- c(-.1, .09) p <- xyplot(y2 ~ x2 | f, subscripts = TRUE, type = "l", lty = "43", lwd = 1.5, panel = function(x, y, subscripts, ...) { i <- (subscripts[1] - 1) / 50 + 1 lpoints(b[[i]], pch = 1, cex = 0.5) ## col = gray(0.5)) panel.xyplot(x, y, ...) panel.abline(h = 0, v = 0, lty = 3, lwd = 1) llines(x1[subscripts], y1[subscripts], lty = 1, col = 1, lwd = 1.5) ltext(-0.05, -0.075, labels = eval(substitute(expression(paste(italic(p), "(", beta[2] > 0, ")") == k), list(k = formatC(round(postprob2[i, 1], 2), digits = 2, flag = "#")))), cex = 0.7) ltext(-0.05, -0.09, labels = eval(substitute(expression(paste(italic(p), "(", beta[1] > 0, ")") == k), list(k = formatC(round(postprob1[i, 1], 2), digits = 2, flag = "#")))), cex = 0.7) }, as.table = TRUE, ylim = yrng, aspect = 1, xlim = xrng, xlab = expression(beta[1]), ylab = expression(beta[2]), scales = list(cex = 0.8, alternating = 1, tck = c(0.8, 0)), par.strip.text = list(cex = 0.8), strip = strip.custom(bg = 0) ) print(p) @ \caption{Samples from the national and regional joint posterior distributions of the pooled coefficients $\beta_1$ and $\beta_2$ from sine/cosine seasonal model (equation~\ref{model:periodic}) for \PMTen\ at lag 1, 100 U.S. cities, 1987--2000. The solid and dashed lines indicate the 75\% and 95\% regions for the joint posterior distribution of $\beta_1$ and $\beta_2$, given the data. Each panel includes the marginal posterior probabilities of each coefficient being greater than~0. Posterior probabilities closer to 1 indicate stronger evidence of seasonal patterns.} \label{postbeta-contour} \end{figure} \subsection*{Sensitivity analyses} We performed several additional analyses to explore the sensitivity of the estimated seasonal \PMTen\ log relative rates to model specification. Specifically, we examined sensitivity to (1) adjustment for long-term trends and seasonality in mortality; (2) the inclusion of other pollutants; (3) the exposure lag; and (4) the specification of the temperature component. Selecting the degrees of freedom of the smooth function of time used to control for long-term trends and seasonality is an important issue in time series models of air pollution and mortality because estimates of pollution coefficients can change considerably depending on the specification of the number of degrees of freedom~\citep{schw:1994,schw:1994a,toul:atki:lete:2004}. Our original model used a natural cubic spline with 7 degrees of freedom per year of data. For \PMTen\ at lag 1, Figure~\ref{periodic-dfTime-sens} shows the sensitivity of the sine/cosine model to using 3, 5, 7, 9, and 11 degrees of freedom per year in the smooth function of time. With only 3 degrees of freedom per year the curves deviate considerably from those in Figure~\ref{periodic-seasonal}, e.g. the estimate for Southern California exhibits much more seasonal variation. However, these deviations more likely reflect a lack of adjustment in the model rather than a real seasonal change. With more aggressive control for seasonality and long-term trends the estimates appear to be stable. \begin{figure}[tbh] \centering <>= rm(list = ls()) library(NMMAPSdata) library(tsModelSpec) library(tlnise) exclude <- c("anch", "hono") load("results/seasonal.periodic.sens.lag.012.df.1.pm10.rda") r <- results[[2]][setdiff(names(results[[2]]), exclude)] ## Lag 1 ## Get region indicators for each city data(cities) cityRegion <- with(cities, region[match(names(r), city)]) regionNames <- sort(unique(cityRegion)) regionInd <- 1 * outer(cityRegion, regionNames, "==") colnames(regionInd) <- regionNames ndays <- 365 x <- 1:ndays B <- harmonic(x, 1, ndays, intercept = TRUE) dfnames <- paste("df", seq(3, 11, 2), sep = "") source(file.path("data", "utils.R")) curveList <- lapply(seq(along = dfnames), function(i) { d <- lapply(r, "[[", dfnames[i]) bc <- extractBetaCov(d, "pm10") initTLNise() g <- tlnise(bc$beta, bc$cov, regionInd, prnt = FALSE, labelY = names(r), intercept = FALSE, seed = i * 10000) pooledRegion <- matrix(g$gamma[,1], byrow = TRUE, ncol = ncol(regionInd), nrow = nrow(g$theta)) pooledRegionSD <- matrix(g$gamma[,2], byrow = TRUE, ncol = ncol(regionInd), nrow = nrow(g$theta)) b <- B %*% pooledRegion zero <- matrix(0, nrow = ndays, ncol = 7 * 3) curveRegion <- lapply(1:ncol(pooledRegion), function(i) { B <- zero ## This is weird B[, seq(i, 7 * 3, 7)] <- harmonic(x, 1, ndays, intercept = TRUE) s <- sqrt(diag(B %*% g$Dgamma %*% t(B))) cbind(b[,i], b[,i] - 2*s, b[,i] + 2*s) }) g <- poolCoef(bc, method = "tlnise", extractors = list(function(x) x))[[1]] V <- diag(B %*% g$Dgamma %*% t(B)) b <- B %*% g$gamma[,1] curve <- cbind(b, b-2*sqrt(V), b+2*sqrt(V)) append(curveRegion, list(curve)) }) regionFullNames <- c("Industrial Midwest", "Northeast", "Northwest", "Southern California", "Southeast", "Southwest", "Upper Midwest", "All Regions") curve <- unlist(lapply(curveList, function(x) { as.vector(sapply(x, function(y) y[,1])) })) * 1000 lo <- unlist(lapply(curveList, function(x) { as.vector(sapply(x, function(y) y[,2])) })) * 1000 hi <- unlist(lapply(curveList, function(x) { as.vector(sapply(x, function(y) y[,3])) })) * 1000 xx <- rep(x, 8 * length(dfnames)) f <- ordered(rep(rep(regionFullNames, each = 365), length(dfnames)), levels=regionFullNames) g <- ordered(rep(dfnames, each = 365 * 8), levels = dfnames) rng <- range(cbind(curve, lo, hi)) library(lattice) ltheme <- canonical.theme(color = FALSE) ## in-built B&W theme ltheme$strip.background$col <- "transparent" ## change strip bg lattice.options(default.theme = ltheme) ## set as default ## trellis.device(dev = x11, height = 5, width = 8, color = FALSE) l <- trellis.par.get("superpose.line") l$lty <- c(2,3,1,4,5,6,7) l$lwd <- c(1,1,3,1,1,1,1) trellis.par.set("superpose.line", l) p <- xyplot(curve ~ xx | f, groups = g, type = "l", as.table = TRUE, ylim=rng, xlab = "Day in year", ylab = expression(paste("% increase in mortality with ", "10-", mu, "g/", m^3, " increase in ", PM[10])), panel = function(x, y, subscripts, groups, ...) { panel.superpose(x, y, subscripts, groups, ...) panel.abline(h = 0, lty = 1) }, strip = strip.custom(bg = 0), layout = c(4, 2), scales = list(alternating = 1, tck = c(0.8, 0), x = list(cex=0.8)) ) print(p) @ \caption{Sensitivity of national and regional estimates of smooth seasonal effects for \PMTen\ at lag 1 to the degrees of freedom assigned to the smooth function of time, 100 U.S. cities, 1987--2000. The degrees of freedom chosen were 3 (short dashed), 5 (dotted), 7 (solid), 9 (dot-dashed), and 11 (long dashed) degrees of freedom per year of data.} \label{periodic-dfTime-sens} \end{figure} Table~\ref{tab:multipollutant} shows the sensitivity of the lag 1 \PMTen\ log relative rate as other pollutants are included in the pollutant-season interaction model. The seasonal national average estimates exhibit the same pattern when either current day SO$_2$, O$_3$, or NO$_2$ are included as copollutants in the model. With SO$_2$ or NO$_2$ included, the summer effect for \PMTen\ increases slightly over the single pollutant estimate obtained with a restricted list of 45 cities. The inclusion of O$_3$ appears to attenuate the effect somewhat. Note that the lack of data for the other pollutants reduced the number of cities available for the copollutant analyses. <>= rm(list = ls(all = TRUE)) exclude <- c("anch", "hono") library(tlnise) source(file.path("data", "utils.R")) ## No copollutants load(file.path("results", "seasonal.factor2.lag.012.pm10.rda")) bc <- extractBetaCov(subset(results[[2]],!(names(results[[2]]) %in% exclude)),"pm10") pNoneall <- poolCoef(bc) * 1000 ## Load copollutant model results load(file.path("results", "seasonal.none.stepfun.pm10.lag.1.rda")) bc <- extractBetaCov(results, "pm10") initTLNise() g <- tlnise(bc$beta, bc$cov, rep(1, nrow(bc$beta)), seed = 10, prnt = FALSE) pNone <- g$gamma[, 1:2] * 1000 load(file.path("results", "seasonal.so2.stepfun.pm10.lag.1.rda")) bc <- extractBetaCov(results, "pm10") initTLNise() g <- tlnise(bc$beta, bc$cov, rep(1, nrow(bc$beta)), seed = 100, prnt = FALSE) pSO2 <- g$gamma[, 1:2] * 1000 load(file.path("results", "seasonal.o3.stepfun.pm10.lag.1.rda")) bc <- extractBetaCov(results, "pm10") initTLNise() g <- tlnise(bc$beta, bc$cov, rep(1, nrow(bc$beta)), seed = 1000, prnt = FALSE) pO3 <- g$gamma[, 1:2] * 1000 load(file.path("results", "seasonal.no2.stepfun.pm10.lag.1.rda")) bc <- extractBetaCov(results, "pm10") initTLNise() g <- tlnise(bc$beta, bc$cov, rep(1, nrow(bc$beta)), seed = 200, prnt = FALSE) pNO2 <- g$gamma[, 1:2] * 1000 combined <- rbind(pNoneall, pNone, pSO2, pO3, pNO2) ## Make table ppp <- cbind(combined[,1], combined[,1] - 2*combined[,2], combined[,1] + 2*combined[,2]) mest <- matrix(gsub("\\$", "", sub("_", " ", LouisFormat(ppp, type = "confint"))), byrow = TRUE, ncol = 4) dimnames(mest) <- list(c("\\PMTen\\ only (100 cities)", "\\PMTen\\ only (restricted)", "with SO$_2$ (restricted)", "with O$_3$ (restricted)", "with NO$_2$ (restricted)" ), c("Winter", "Spring", "Summer", "Fall")) print(xtable(mest, caption = "National average estimates and 95\\% posterior intervals of season-specific lag 1 \PMTen\ log relative rates adjusted for other pollutants. Estimates represent the percent increase in daily mortality for a 10~\mgmcube\ increase in \PMTen. Results for ``45 cities'' were obtained by pooling coefficients from the \textit{same} list of 45 cities for which measurements for all copollutants were simultaneously available.", label = "tab:multipollutant", align = c("l", rep("l", 4)))) @ Figure~\ref{periodic-lags012} shows the region-specific seasonal trends for \PMTen\ at lags 0, 1, and 2 with 95\% posterior regions for lag 1. The lag 0 seasonal trend for each region has a similar pattern to the lag 1 trend but is lower in general. In the North East and Industrial Midwest, the peak in the seasonal trend for lag 0 appears to come in late May while the peak for lag 1 comes in mid-July. The North West, South East, and Upper Midwest exhibit small changes in seasonal patterns across lags but remain largely flat. Southern California and the South West appear to pick up slightly stronger seasonal patterns in lags 0 and 2. \begin{figure}[tbh] \centering <>= ## Periodic Seasonal by Region (all lags at once) rm(list = ls()) source(file.path("data", "utils.R")) exclude <- c("anch", "hono") ## Load the results load(file.path("results", "seasonal.periodic.lag.012.dfSeas.1.rda")) results <- lapply(results, function(r) r[setdiff(names(r), exclude)]) ## Get region indicators for each city data(cities) cityRegion <- with(cities, region[match(names(results[[1]]), city)]) regionNames <- sort(unique(cityRegion)) regionInd <- 1 * outer(cityRegion, regionNames, "==") colnames(regionInd) <- regionNames ndays <- 365 x <- 1:ndays B <- harmonic(x, 1, ndays, intercept = TRUE) bc <- lapply(results, extractBetaCov, pollutant = "pm10") set.seed(100302) gg <- lapply(bc, function(x) { initTLNise() tlnise(x$beta, x$cov, regionInd, prnt = FALSE, intercept = FALSE, seed = sample(100000, 1)) }) pooledRegion <- lapply(gg, function(g) { matrix(g$gamma[,1], byrow = TRUE, ncol = ncol(regionInd), nrow = nrow(g$theta)) }) pooledRegionSD <- lapply(gg, function(g) { matrix(g$gamma[,2], byrow = TRUE, ncol = ncol(regionInd), nrow = nrow(g$theta)) }) b <- lapply(pooledRegion, function(p) B %*% p) zero <- matrix(0, nrow = ndays, ncol = 7 * 3) curveRegion <- lapply(seq(along = pooledRegion), function(j) { pr <- pooledRegion[[j]] lapply(1:ncol(pr), function(i) { B <- zero ## This is weird B[, seq(i, 7 * 3, 7)] <- harmonic(x, 1, ndays, intercept = TRUE) s <- sqrt(diag(B %*% gg[[j]]$Dgamma %*% t(B))) cbind(b[[j]][,i], b[[j]][,i] - 2*s, b[[j]][,i] + 2*s) }) }) ## Overall estimate set.seed(10321342) gg <- lapply(bc, function(b) { initTLNise() tlnise(b$beta, b$cov, rep(1, NROW(b$beta)), seed = sample(100000, 1), prnt = FALSE) }) V <- lapply(gg, function(g) diag(B %*% g$Dgamma %*% t(B))) b <- lapply(gg, function(g) B %*% g$gamma[,1]) curve <- lapply(seq(along = b), function(i) { cbind(b[[i]], b[[i]] - 2*sqrt(V[[i]]), b[[i]] + 2*sqrt(V[[i]])) }) curveTotal <- curveRegion for(i in seq(along = curveTotal)) { curveTotal[[i]] <- append(curveTotal[[i]], list(curve[[i]])) } regionFullNames <- c("Industrial Midwest", "Northeast", "Northwest", "Southern California", "Southeast", "Southwest", "Upper Midwest", "All Regions") y0 <- do.call("rbind", curveTotal[[1]])[,1] * 1000 y1 <- do.call("rbind", curveTotal[[2]])[,1] * 1000 conf1 <- do.call("rbind", curveTotal[[2]])[, 2:3] * 1000 conf1 <- rbind(conf1, conf1, conf1) y2 <- do.call("rbind", curveTotal[[3]])[,1] * 1000 rng <- range(unlist(curveTotal)) * 1000 + c(-1,1) * .07 xx <- rep(x, length(regionFullNames)) f <- ordered(rep(regionFullNames, each = 365), levels = regionFullNames) ## trellis.device(dev = x11, height = 5, width = 8, color = FALSE) l <- trellis.par.get("superpose.line") l$lty <- c(2, 1, 4) l$lwd <- c(1, 1.5, 1) trellis.par.set("superpose.line", l) p <- xyplot(y0 + y1 + y2 ~ xx | f, allow.multiple = TRUE, type = "l", ylim = rng, layout = c(4, 2), as.table = TRUE, panel = function(x, y, subscripts, groups, ...) { s <- subscripts[1:365] xx <- x[1:365] llines(xx, conf1[s, 1], lty = 1, lwd = 1.5) llines(xx, conf1[s, 2], lty = 1, lwd = 1.5) panel.superpose(x, y, subscripts, groups, ...) panel.abline(h = 0, lty = 5) }, xlab = "Day in year", strip = strip.custom(bg = 0), scales = list(alternating = 1, tck = c(0.8, 0), x = list(cex=.8)), ylab = expression(paste("% increase in mortality with ", "10-", mu, "g/", m^3, " increase in ", PM[10])) ) print(p) @ \caption{Sensitivity of national and regional estimates of smooth seasonal effects to \PMTen\ exposure lag, 100 U.S. cities, 1987--2000. Solid black lines indicate the log relative rate estimate and pointwise 95\% posterior intervals for \PMTen\ at lag 1. Also shown are the log relative rate estimates for \PMTen\ at lag 0 (short dashed) and lag 2 (dot-dashed).} \label{periodic-lags012} \end{figure} To explore sensitivity to temperature and to control for temperature effects spread out over multiple days, we fit separate models which included an additional interaction between the current day temperature and the running mean of lags 1 through 3 as well as a running mean of lags 1 through 7. Neither addition to the model made a noticeable impact on the regional or overall seasonal patterns of the \PMTen\ log relative rate. \section*{DISCUSSION} In this paper we have used a Bayesian semi-parametric hierarchical model for estimating time-varying effects of air pollution on daily mortality. The model combines information across multiple cities to increase the precision of seasonal relative rate estimates. We found seasonal patterns for the national average effect of \PMTen\ at both lag 0 and lag 1. Seasonal patterns varied by geographical region with a strong pattern for lag 1 appearing in the North East region. Equally interesting was the lack of seasonal variation in the southern regions of the country. Understanding the health effects of PM components is an increasingly important research problem, as noted in~\cite{nrc:2004}. Exploration of the spatial-temporal variation of the short-term effects of PM on mortality is essential to generating (or ruling out) specific hypotheses about the toxicity of PM components. Data are becoming available from the Environmental Protection Agency's PM$_{2.5}$ National Chemical Speciation Network which contain detailed time series information on the composition of PM~\citep{epa:2003}. Knowledge of the spatial-temporal patterns of the short-term effects of PM will be necessary for guiding future analyses of these PM constituent data. The modification of short-term effects of pollution by season has been explored previously in a number of single city studies. Styer, et~al.~\citep{stye:mcmi:gao:davi:sack:1995} analyzed data from Cook County and Salt Lake County and found (for Cook County) that the effect of \PMTen\ was higher in the spring and fall. In a review of research on particulate air pollution and mortality, Moolgavkar and Luebeck stated that analyses in Steubenville, Philadelphia, and Cook County indicated that the effects of pollutants are strongly modified by season~\citep{mool:lueb:1996}. Kelsall, et~al.~\citep{kels:same:zege:xu:1997} examined Philadelphia data and concluded that after adjusting for long-term variation in mortality and the effects of weather there was little evidence of different effects by season. More recently, Moolgavkar~\citep{mool:2003} analyzed data from Cook County and Los Angeles County and found between season variation of the effects of numerous pollutants on daily mortality in both counties. Estimation of short-term effects of air pollution on daily mortality for single cities is hampered by the inherent high variability of the resulting effect estimates. Estimation of seasonally varying effects poses an additional challenge because it involves further stratification of the data. Since fewer data are available for estimating season-specific effects, the variability of such estimates increases, making it difficult to discern any meaningful seasonal pattern. Rather, a multi-site approach, where information can be combined across neighboring cities, can provide more precise city-specific log relative rates as well as provide a natural framework for characterizing regional and national trends. For example, Moolgavkar~\citep{mool:2003} found an inconsistent relationship between \PMTen\ and mortality in Los Angeles County when SO$_2$ was included in the model. However, we find in Table~\ref{tab:multipollutant} that the seasonal pattern in the national average (as well as the Southern California regional average, not shown) is robust to the inclusion of copollutants. Regional differences in the short-term effects of \PMTen\ have been explored in NMMAPS~\citep{domi:dani:zege:same:2002,domi:mcder:zege:same:2003} and in the Air Pollution and Health: A European Approach (APHEA) study~\citep{kats:toul:samo:gryp:2001,samo:schw:wojt:2001}. Both studies found regional modification of the effect of \PMTen\ on daily non-accidental mortality. The results presented here are consistent with previous NMMAPS analyses with respect to regional average \PMTen\ effects. The estimated seasonal patterns for lag 1 appear to have two distinct shapes. The Industrial Midwest, North East, and North West regions all exhibit a larger effect during the summer months while the other regions exhibit little seasonal variation. These patterns are somewhat sensitive to the lag of pollution used. Therefore, an important question raised by this study is how the total effect of \PMTen\ in a distributed lags model would vary by season. Unfortunately, the U.S. pollution database has daily PM levels for a small fraction of cities making if difficult to answer this question. These analyses provide strong evidence that the \PMTen\ log relative rate is greater in the spring and summer in the northern regions, particularly in the North East. This result admits several competing hypotheses. First, the PM constituents may vary by season in these regions with the most toxic particles having a spring/summer maximum. A detailed analysis of the regional and seasonal variation in PM constituents is needed to better understand these patterns. PM$_{2.5}$ and associated speciation data have only been collected regularly in the U.S. since 1999. Therefore, we expect multi-site time series studies of PM$_{2.5}$ speciation data to be carried out in the near future. Second, even if the constituents do not vary substantially, it is possible that the higher short-term effect of ambient exposure to PM estimated in spring and summer in the North East regions could be the result of more time spent outdoors and therefore less exposure measurement error. While there has been some work suggesting that differences between personal and ambient exposure to \PMTen\ can vary across seasons~\citep{keel:dvon:yip:2002}, there are also data indicating that individual activity patterns are, on average, very similar across regions of the U.S.~\citep{nhaps:1996,klep:nels:ott:robi:tsan:swit:beha:hern:enge:2001,wall:mitc:ocon:2003}. If the seasonal variation in the \PMTen\ log relative rates were entirely attributable to activity patterns, then we would expect to have estimated similar seasonal patterns of PM effects across regions. However, Southern California exhibited a positive effect that does not vary by season. A third possibility is that the particle effect may be swamped by the more powerful effect of winter infectious diseases so that it can only be observed when infectious diseases are less prevalent. This hypothesis does not explain the absence of a \PMTen\ mortality association in the southern regions where infectious disease incidence is also seasonal. Finally, this result may reflect seasonally varying bias from an, as yet, unidentified source. Having established the pattern of regional and seasonal variation in the \PMTen\ log relative rate, a more targeted investigation of possible sources of such bias is now possible. \bibliographystyle{aje2} \bibliography{seasonal} \appendix \clearpage \setlength{\parindent}{0in} \section*{TABLE AND FIGURE LEGENDS} Table 1: Summary statistics for average daily mortality, \PMTen, and temperature for 100 U.S. cities, 1987--2000.\\ Table 2: National average estimates and 95\% posterior intervals of the overall and season-specific effects of \PMTen\ at lags 0, 1, and 2 for 100 cities, 1987--2000. Estimates were obtained by pooling city-specific coefficients from the main effect and pollutant-season interaction models, respectively, and represent the percent increase in daily mortality for a 10~\mgmcube\ increase in \PMTen.\\ Table 3: National average estimates and 95\% posterior intervals of season-specific lag 1 \PMTen\ log relative rates adjusted for other pollutants. Estimates represent the percent increase in daily mortality for a 10~\mgmcube\ increase in \PMTen. Results for ``45 cities'' were obtained by pooling coefficients from the \textit{same} list of 45 cities for which measurements for all copollutants were simultaneously available.\\ Figure 1: Boxplots of square root daily mortality by season for the 10 largest U.S. cities, 1987--2000.\\ Figure 2: Boxplots of regionally averaged daily levels of particulate matter less than 10~$\mu$m in aerodynamic diameter (\PMTen) by season for 100 U.S. cities, 1987--2000.\\ Figure 3: National and regional smooth seasonal effects of \PMTen\ (particulate matter less than 10~$\mu$m in aerodynamic diameter) at lag 1 for 100 U.S. cities, 1987--2000. Estimates were obtained by pooling city-specific coefficients from the sine/cosine model (equation~\ref{model:periodic}). Dotted lines indicate pointwise 95\% posterior intervals.\\ Figure 4: Samples from the national and regional joint posterior distributions of the pooled coefficients $\beta_1$ and $\beta_2$ from sine/cosine seasonal model (equation~\ref{model:periodic}) for \PMTen\ at lag 1, 100 U.S. cities, 1987--2000. The solid and dashed lines indicate the 75\% and 95\% regions for the joint posterior distribution of $\beta_1$ and $\beta_2$, given the data. Each panel includes the marginal posterior probabilities of each coefficient being greater than~0. Posterior probabilities closer to 1 indicate stronger evidence of seasonal patterns.\\ Figure 5: Sensitivity of national and regional estimates of smooth seasonal effects for \PMTen\ at lag 1 to the degrees of freedom assigned to the smooth function of time, 100 U.S. cities, 1987--2000. The degrees of freedom chosen were 3 (short dashed), 5 (dotted), 7 (solid), 9 (dot-dashed), and 11 (long dashed) degrees of freedom per year of data.\\ Figure 6: Sensitivity of national and regional estimates of smooth seasonal effects to \PMTen\ exposure lag, 100 U.S. cities, 1987--2000. Solid black lines indicate the log relative rate estimate and pointwise 95\% posterior intervals for \PMTen\ at lag 1. Also shown are the log relative rate estimates for \PMTen\ at lag 0 (short dashed) and lag 2 (dot-dashed).\\ \clearpage %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \end{document}