\documentclass[11pt]{article}

\usepackage{amsmath,amssymb,geometry,textcomp}
\usepackage{graphicx}

\geometry{margin=1in}

\usepackage[sort&compress]{natbib}
\bibpunct{(}{)}{,}{n}{}{;}  % commas between entries
\renewcommand{\bibnumfmt}[1]{#1.}  %refs without []'s around numbers

%%\renewcommand{\familydefault}{cmss}
\renewcommand{\baselinestretch}{1.8}
\newcommand{\bsym}{\boldsymbol}
\newcommand{\mb}{\mathbf}
\newcommand{\PMTen}{PM$_{10}$}
\newcommand{\mgmcube}{$\mu$g$/$m$^3$}

\date{}

\title{Seasonal Analyses of Air Pollution and Mortality in 100
U.S. Cities}

\author{Roger D. Peng\and Francesca Dominici\and Roberto
Pastor-Barriuso\and Scott L. Zeger\and Jonathan M. Samet}


\begin{document}

\maketitle

\noindent Word count:  3843

\clearpage 

\begin{abstract}
Time series models relating short-term changes in air pollution levels
to daily mortality counts typically assume that the effects of air
pollution on the log relative rate of mortality do not vary with time.
However, these short-term effects might plausibly vary by season.
Changes in the sources of air pollution and meteorology can result in
changes in characteristics of the air pollution mixture across
seasons.  The authors develop Bayesian semi-parametric hierarchical
models for estimating time-varying effects of pollution on mortality
in multi-site time series studies. The methods are applied to the
National Morbidity and Mortality Air Pollution Study database for the
period 1987--2000, which includes data for 100 U.S. cities.  At the
national level, a 10~\mgmcube\ increase in \PMTen\ at lag 1 is
associated with a $0.15$ (95\% posterior interval: $-0.08$, $0.39$),
$0.14$ ($-0.14$, $0.42$), $0.36$ ($0.11$, $0.61$), and $0.14$
($-0.06$, $0.34$) percent increase in mortality for winter, spring,
summer, and fall, respectively.  An analysis by geographical regions
finds a strong seasonal pattern in the northeast (with a peak in
summer) and little seasonal variation in the southern regions of the
country.  These results provide useful information for understanding
particle toxicity and guiding future analyses of particle constituent
data.\\

\noindent Word count:  200\\

\noindent MeSH headings: air pollution, epidemiology,
models/statistical, mortality
\end{abstract}

\clearpage

\section*{Acknowledgments}

This research was supported in part by NIH/NHLBI grant T32HL07024,
NIEHS grant R01ES012054, the NIEHS Center for Urban Environmental
Health (P30ES03819).  Research described in this article was partially
supported by a contract and grant from the Health Effects Institute
(HEI), an organization jointly funded by the U.S. Environmental
Protection Agency (EPA) and automotive manufacturers.  The contents of
this article do not necessarily reflect the views and policies of HEI,
nor do they necessarily reflect the views and policies of EPA, nor
motor vehicle or engine manufacturers.  Roberto Pastor-Barriuso was
supported in part by grant EPY 1261/02 from the Instituto de Salud
Carlos III.

\clearpage

Numerous time series studies have indicated a positive association
between short-term variation in ambient levels of particulate matter
(PM) and daily mortality counts~\citep[see
e.g.][]{pope:dock:schw:1995,dock:pope:1996,bell:same:domi:2004}.  The
models used in these studies have typically assumed that the
association between PM and daily mortality is constant over the study
interval.  However, the short-term effects of PM on mortality might
plausibly exhibit seasonal variation.  Studies in a number of
locations have shown that that the characteristics of the PM mixture
change throughout the year and that the relative and absolute
contributions of particular components to PM mass may be different
during different times of the year~\citep[][ch.~3 and references
therein]{epa:2003}.  Patterns of human activity also change from
season to season, so that a particular air pollution concentration in
one season may lead to a different exposure in another season.  Other
potential time-varying confounding and modifying factors, such as
temperature and influenza epidemics, can also affect estimates of
short-term effects of air pollution on mortality differently in
different seasons.  All of the issues described above indicate a need
to extend current models for time series data on air pollution and
health to incorporate time-varying pollution effects.

The composition of particulate matter is known to vary in the spatial
domain as well, suggesting that seasonal patterns should be examined
by geographical region~\citep{spen:wils:1996}.  For example, in the
North West wood burning is a greater source of PM in the colder
seasons than in the warmer months.  The PM mixture in the Eastern
U.S. contains a large fraction of sulfates (almost 40\% of total mass)
originating from power plants in the Midwest, while PM in areas of the
Western U.S. such as Southern California and the Pacific North West
contains more nitrates and organic compounds (approximately 30\% of
total mass)~\citep{spen:wils:1996,epa:1996,epa:2003}.

In this paper we develop statistical methods for estimating seasonal
patterns in the short-term effects of air pollution on mortality in
multi-site time series studies.  We propose Bayesian semi-parametric
hierarchical models for estimating time-varying health effects within
each city and for comparing temporal patterns across cities and
geographical regions.  Using data from the National Morbidity,
Mortality, and Air Pollution
Study~(NMMAPS)~\citep{same:zege:domi:curri:cour:dock:schw:zano:2000},
we estimate seasonal patterns in the short-term effects of PM less
than 10~$\mu$m in aerodynamic diameter (\PMTen) on daily
non-accidental mortality.  The data have been extended from the
original study to include 100 U.S. cities for the period 1987--2000,
an addition of 10 cities and 6 years of data.  The seasonal patterns
are estimated for seven geographical regions and on average for the
entire U.S.  We explore the sensitivity of estimated seasonal patterns
to temperature adjustment, copollutants, exposure lag, and adjustments
for long-term mortality trends.


\section*{MATERIALS AND METHODS}
\label{sec:datamethods}
\label{sec:models}

The NMMAPS database contains daily time series of mortality, weather,
and air pollution assembled from publicly available sources for the
largest 100 cities in the United States.  A full description of the
construction of the database can be found
in~\cite{same:zege:domi:curri:cour:dock:schw:zano:2000}.  The most
recent data are available at \texttt{http://www.ihapss.jhsph.edu}.

Within each city, we specify a semi-parametric regression model for
the time-varying log relative rate using a generalized additive model
framework~\citep{hast:tibs:1990}.  More specifically, let $Y^c_t$ be
the total number of non-accidental deaths on day $t$ in city $c$.  The
$Y^c_t$ are Poisson distributed with expectation $\mu^c_t$ and with
possible overdispersion $\phi^c$.  The general form of the
city-specific model is
\begin{eqnarray}
Y^c_t & \sim & \text{Poisson}(\mu^c_t) \nonumber\\
\text{Var}(Y^c_t) & = & \phi^c\mu^c_t \nonumber\\
\log(\mu_t^c) & = & \beta^c(t)\,x^c_{t-\ell} + \text{confounders}
\label{model:general}
\end{eqnarray}
where $x^c_{t-\ell}$ is the lag $\ell$ \PMTen\ level for day $t$.

The function $\beta^c(t)$ in equation~\ref{model:general} represents
the time-varying effect of \PMTen\ on mortality and is a yearly
periodic function for estimating seasonal patterns.  To estimate
smooth seasonal patterns in the city-specific log relative rates, we
use a sine/cosine model for $\beta^c(t)$ of the form
\begin{equation}
\beta^c(t) = \beta^c_0 + \beta^c_{1}\sin(2\pi t/365) / c_1
+ \beta^c_{2} \cos(2\pi t/365) / c_2
\label{model:periodic}
\end{equation}
where $\beta^c_0$, $\beta^c_{1}$, $\beta^c_{2}$ are estimated and
$c_1$ and $c_2$ are known orthogonalizing constants.  In this model,
the effect of \PMTen\ is allowed to vary smoothly over the course of a
year, but is constrained to be periodic across
years~\citep[e.g.][]{stol:stra:ziel:1999}. While it is possible to
include higher frequency basis terms for the representation of
$\beta^c(t)$ in equation~\ref{model:periodic}, there is little reason
to expect there to be much high frequency variation in the seasonal
effects of \PMTen.

To allow for season-specific \PMTen\ log relative rates, we use a
pollutant-season interaction model with indicator functions for each
season:
\begin{eqnarray}
\beta^c(t) = \beta^c_W\, I_{\text{winter}} + \beta^c_{Sp}\,
I_{\text{spring}} + \beta^c_{Sm}\, I_{\text{summer}} + \beta^c_{F}\,
I_{\text{fall}},
\label{model:seasonal}
\end{eqnarray}
where winter, spring, summer, and fall are defined as beginning on
December 21st, March 21st, June 21st, and September 21st,
respectively.  
%%As in model~\ref{model:main}, the pollutant-season
%%interaction model assumes a log-linear effect of \PMTen\ on mortality,
%%but it provides separate estimates for each season.  
Although these seasonal estimates serve as concise summaries, it is
unlikely that the effect of \PMTen\ on mortality is discontinuous
across seasons.  Furthermore, the estimates depend on the
specification of the season boundaries which are artificial and can
differ considerably across geographic regions.

Our main effect model, which does not contain any adjustment for
season takes $\beta^c(t)$ to be constant across time, i.e.
\begin{equation}
\beta^c(t) = \beta^c.
\label{model:main}
\end{equation}
This model assumes a homogeneous log-linear effect of \PMTen\ on
mortality, a condition that was found appropriate in previous NMMAPS
analyses~\citep{dani:domi:same:zege:2000,domi:mcde:zege:same:2002,domi:dani:zege:same:2002,domi:mcder:zege:same:2003}.
Note that the main effect model is nested within the interaction and
sine/cosine models, so that if
$\beta_{W}=\beta_{Sp}=\beta_{Sm}=\beta_{F}$ in
equation~\ref{model:seasonal} and $\beta^c_1 = \beta^c_2 = 0$ in
equation~\ref{model:periodic}, both models reduce to
equation~\ref{model:main}.

The potential confounders included in the equation~\ref{model:general}
are similar to those used in previous NMMAPS
analyses~\citep[e.g.][]{domi:dani:zege:same:2002,domi:mcder:zege:same:2003}
and consist of indicators for the day of the week, age specific
intercepts corresponding to the categories of less than 65 years of
age, 65--74 years, and 75 years or older, a smooth function of
calendar time, and smooth functions of temperature and dew point
temperature.  In addition to the overall smooth function of time, two
separate smooth functions of time are included for the older two age
groups.  All of the smooth functions are represented by natural cubic
splines.

The complexity of each of the smooth functions of time and temperature
is controlled by the numbers of degrees of freedom assigned to each
function.  We use 7 degrees of freedom per year for the overall smooth
function of time, which removes any fluctuations in mortality at time
scales longer than two months.  The separate smooth functions of time
for the older two age categories each receive 1 degree of freedom per
year to capture gradual trends specific to these age groups.  For
temperature we use 6 degrees of freedom and for dew point we use 3
degrees of freedom.  A somewhat larger number of degrees of freedom is
necessary for temperature in order to capture the well known
``J-shaped'' nonlinear relationship between temperature and mortality.
Others have adjusted for temperature simply by doing separate analyses
of the data by
season~\citep{mool:lueb:hall:ande:1995,same:zege:kels:xu:kalk:1998,domi:same:zege:2000,same:domi:curr:cour:zege:2000,nmmaps1,same:zege:domi:curri:cour:dock:schw:zano:2000}.

All of the above models were fit using quasi-likelihood methods as
implemented in the R statistical software package~\citep{R:2004}.  The
data are available via the NMMAPSdata package~\citep{peng:welt:2004}
and code for fitting the models is available on the web
at~\texttt{http://www.ihapss.jhsph.edu/data/NMMAPS/R/}.


\subsection*{Pooling information across cities}
\label{sec:hnormal}

After fitting each of the city-specific models we use a hierarchical
normal model for pooling information and borrowing strength across
cities~\citep[see][]{nmmaps1,domi:same:zege:2000,domi:dani:zege:same:2002}.
For a particular model, we have a city-specific maximum likelihood
estimate $\widehat{\boldsymbol{\beta}}^c$ which is a scalar for the
main effect model in equation~\ref{model:main}, a vector of length
four for the pollutant-season interaction model in
equation~\ref{model:seasonal}, and a vector of length three for the
sine/cosine model in equation~\ref{model:periodic}.
$\widehat{\boldsymbol{\beta}}^c$ is assumed to be normally distributed
around the true city-specific log relative rates
$\boldsymbol{\beta}^c$ with covariance matrix $V^c$, estimated within
each city.  In addition, the true rates are assumed to vary
independently across cities according to a normal distribution, i.e.
\begin{eqnarray}
\widehat{\boldsymbol{\beta}}^c\mid\boldsymbol{\beta}^c
& \sim & 
\mathcal{N}(\boldsymbol{\beta}^c, V^c)\nonumber\\
\boldsymbol{\beta}^c\mid\boldsymbol{\alpha},\Sigma
& \sim & 
\mathcal{N}(Z^c\,\boldsymbol{\alpha}, \Sigma) \label{model:hnorm}
\end{eqnarray}
where $\Sigma$ is the covariance matrix describing the between-city
variation of $\boldsymbol{\beta}^c$ and $\boldsymbol{\alpha}$ is the
overall mean for the cities.  $Z^c$ is a matrix of second stage
covariates for describing possible differences between cities.  To
characterize regional differences in seasonal patterns we include as a
second stage covariate an indicator for the following seven regions
(also used in~\cite{same:zege:domi:curri:cour:dock:schw:zano:2000}):
Industrial Midwest (19 cities), North East (17), North West (13),
Southern California (7), South East (26), South West (10), and Upper
Midwest (8).

The final national average estimate $\boldsymbol{\alpha}$ represents
the combined information from all of the cities. The diagonal elements
of $\Sigma$ measure the heterogeneity across cities and the
off-diagonal elements represent the correlation of the estimates
between cities.  The hierarchical model is fit using the two level
normal independent sampling estimation (TLNise) software
of~\cite{ever:morr:2000} with uniform priors on $\boldsymbol{\alpha}$
and $\Sigma$.  This software provides a sample from the posterior
distribution of $\Sigma$ from which one can calculate posterior means
and variances of the overall and city-specific pollution effects.

\subsection*{Evidence for seasonality in the log relative rates}

To quantify the amount of evidence supporting the presence of a
seasonal pattern in the national and regional averages we examine the
posterior distributions of the pooled log relative rate estimates.  In
particular, for the sine/cosine model in
equation~\ref{model:periodic}, we can check the posterior probability
that the coefficients for the harmonic terms are non-zero.  While the
values of the pooled coefficients $\beta_1$ and $\beta_2$ do not have
meaningful interpretations, if either one of these coefficients is
non-zero with high probability, then there is strong evidence of a
seasonal trend.


\section*{RESULTS}
\label{sec:results}

The daily mortality counts for the years 1987--2000 include
approximately 10 million deaths.  By city, the daily average ranged
from 2 deaths per day in Arlington, VA to 190 per day in New York, NY.
The daily mean of \PMTen\ ranged from 13~\mgmcube\ in Coventry, RI to
49~\mgmcube\ in Fresno, CA.  Summary statistics for the dataset are
shown in Table~\ref{datasum}.
\begin{table}[ht]
\begin{center}
\begin{tabular}{lrrrrr}
\hline
 & Min. & 25\% & Median & 75\% & Max. \\
\hline
Avg. Daily Mortality & 2.2 & 7.6 & 12.2 & 20.4 & 190.2 \\
Avg. Daily \PMTen\ (\mgmcube) & 13.2 & 24.7 & 27.1 & 32.0 & 48.7 \\
Avg. Daily Temp. (\textdegree~F) & 37.0 & 51.8 & 58.1 & 64.7 & 77.8 \\
\hline
\end{tabular}
\end{center}
\caption{Summary statistics for average daily mortality, \PMTen, and
temperature for 100 U.S. cities, 1987--2000.}
\label{datasum}
\end{table}

Mortality and \PMTen\ levels are known to vary considerably across
seasons.  Generally, mortality tends to be higher in the winter and
fall and lower in the summer and spring.  Figure~\ref{mort-top10}
shows boxplots of the square root daily mortality counts for the
largest 10 cities in the United States.  
\begin{figure}[tbh]
  \centering
<<mortalityTop10,fig=true,eps=false,echo=false,results=hide,height=4,width=10>>=
library(NMMAPSdata)

load(file.path("data", "CityDataCombined.rda"))
data(cities)
data(regions)

SeasonNames <- c("Winter", "Spring", "Summer", "Fall")
cc <- dget(file.path("data", "cityList.R"))
cc <- cc[!(cc %in% c("hono", "anch"))]
useNames <- paste(rep(cc, 4), rep(SeasonNames, each = length(cc)), sep = ".")
Season <- citydata$Season[1:5114]
regionID <- with(cities, region[match(cc, city)])
regionNames <- c("Industrial Midwest", "Northeast", "Northwest",
                 "Southern California", "Southeast", "Southwest",
                 "Upper Midwest")

## Top 10 cities
cc <- with(cities, city[order(pop, decreasing = TRUE)])[1:10]
dCity <- with(citydata, split(death, city))
dCity <- dCity[names(dCity) %in% cc]
dSeas <- lapply(dCity, split, f = Season)
ord <- order(with(cities, pop[match(names(dSeas), city)]), decreasing = TRUE)
dSeas <- dSeas[ord]

## Modify city names
cityNames <- with(cities, cityname[match(cc, city)])
cityNames[4] <- "Dallas-\nFort Worth"
cityNames[7] <- "Anaheim"

## x11(width = 10, height = 4)
par(mar = c(4, 4, 2, 0) + .1, las = 2)
plot(0, 0, ylim = c(4, 17), xlim = c(1, 50), xlab = "Season", axes = FALSE,
     ylab = "Square root daily mortality counts", type = "n")
for(i in seq(along = dSeas)) {
    y <- unname(unlist(dSeas[[i]]))
    n <- unname(unlist(lapply(dSeas[[i]], length)))
    f <- factor(unlist(mapply(rep, seq(length(n)), n)))
    boxplot(sqrt(y) ~ f, outline = FALSE, add = TRUE,
            at = seq(length(n)) + ((i-1) * 5),
            pars = list(boxwex = .5), lty = 1, axes = FALSE)
}
axis(1, at = 1:50,
     labels = rep(c("Winter", "Spring", "Summer", "Fall", NA), length(dSeas)), 
     tick = FALSE, cex.axis = 0.7)
axis(2)
par(las = 1)
text(seq(2.5, 50, 5), c(8, 10, 8, 11, 10, 10, 10, 10, 10, 10), cityNames)
@ 
\caption{Boxplots of square root daily mortality by season for the 10
largest U.S. cities, 1987--2000.}
\label{mort-top10}
\end{figure}
Each city shows a clear decrease in mortality towards summer and a
peak in the winter.  Figure~\ref{season-region-pm10} shows the mean
daily levels of \PMTen\ by season for all cities in each of the seven
regions of the U.S.  The Southern California, North West, and South
West regions have their highest mean levels of \PMTen\ in the fall
while other regions have their highest levels in the summer.
\begin{figure}[tbh]
  \centering
<<seasonRegionPM10,fig=true,eps=false,results=hide,echo=false,width=10,height=4,echo=false>>=
rm(list = ls(all = TRUE))

library(NMMAPSdata)

load(file.path("data", "CityDataCombined.rda"))
data(cities)
data(regions)
cc <- dget(file.path("data", "cityList.R"))
cc <- cc[!(cc %in% c("hono", "anch"))]

SeasonNames <- c("Winter", "Spring", "Summer", "Fall")
useNames <- paste(rep(cc, 4), rep(SeasonNames, each = length(cc)), sep = ".")
Season <- citydata$Season[1:5114]
regionID <- with(cities, region[match(cc, city)])
regionNames <- c("Industrial Midwest", "Northeast", "Northwest",
                 "Southern California", "Southeast", "Southwest",
                 "Upper Midwest")

pm10 <- pmax(citydata$pm10tmean + citydata$pm10mtrend, 0)
pmCity <- split(pm10, citydata$city)
pmCity <- pmCity[names(pmCity) %in% cc]
pmSeas <- lapply(pmCity, split, f = Season)
pmSeasRegion <- split(pmSeas, regionID)
seasRegion <- lapply(pmSeasRegion, function(reg) {
    l <- lapply(SeasonNames, function(seas) lapply(reg, "[[", seas))
    names(l) <- SeasonNames
    lapply(l, function(x) unlist(x, use.names = FALSE))
})
bstats <- lapply(seasRegion, function(x)
                 sapply(x, function(y) boxplot.stats(y)$stats))
abstats <- array(unlist(bstats), c(5, 4, length(bstats)))
med <- as.vector(rbind(abstats[3, ,], NA))
x <- 1:length(med)

## x11(width = 10, height = 4)
par(mar = c(4, 5, 2, 0) + .1)
par(las = 2)
plot(0, 0, xlim = c(1, 35), ylim = c(0, 125), xaxt = "n", xlab = "Season",
     ylab = expression(paste(PM[10]," level ",bgroup("(",paste(mu,g/m^3),")"))),
     frame.plot = FALSE, type = "n")
for(i in seq(along = seasRegion)) {
    y <- unlist(seasRegion[[i]])
    n <- unlist(lapply(seasRegion[[i]], length))
    f <- factor(unlist(mapply(rep, seq(length(n)), n)))
    boxplot(y ~ f, outline = FALSE, add = TRUE,
            at = seq(length(n)) + ((i-1) * 5),
            pars = list(boxwex = .5), lty = 1, axes = FALSE)    
}
axis(1, at = x, labels = rep(c("Winter", "Spring", "Summer", "Fall", NA), 7),
     tick = FALSE, cex.axis = .7)
text(seq(2.7, 35, 5), c(100, 100, 100, 120, 100, 100, 100), regionNames)
@ 
\caption{Boxplots of regionally averaged daily levels of particulate
matter less than 10~$\mu$m in aerodynamic diameter (\PMTen) by season
for 100 U.S. cities, 1987--2000.}
\label{season-region-pm10}
\end{figure}


The national average estimates of the overall and seasonal short-term
effects of \PMTen\ on mortality for lags 0, 1, and 2 are summarized in
Table~\ref{tab:natlavg}.  These estimates were obtained by pooling the
city-specific maximum likelihood estimates from the main effect and
pollutant-season interaction models according to the hierarchical
normal model.  Across all seasons, we found that the national average
estimate of the effect of \PMTen\ on mortality is largest at lag 1 and
equal to an estimated 0.19 (95\% posterior interval of 0.10, 0.28)
percent increase in mortality per 10~\mgmcube\ increase in \PMTen.
Previous NMMAPS analyses using data from the eight-year period
1987--1994 have reported similar slightly higher national average
estimates for \PMTen\ log relative
rates~\cite{domi:mcde:zege:same:2002,domi:mcder:zege:same:2003}.  For
example, the national average estimate reported
in~\cite{domi:mcder:zege:same:2003} was a 0.22 (0.03, 0.42) percent
increase in mortality for a 10~\mgmcube\ increase in \PMTen.
<<nationalAverageEstimates,results=tex,echo=false>>=
source(file.path("data", "utils.R"))

library(tlnise)
library(xtable)

Seasons <- c("Winter", "Spring", "Summer", "Fall", "All Seasons")
Lags <- paste("Lag", 0:2); exclude <- c("hono", "anch")

## Load non-seasonal estimates 
load(file.path("results", "city-specific-est.pm10.rda"))
results <- lapply(results, function(x) x[setdiff(names(x), exclude)])

## Pool estimates
betacovTotal <- lapply(results, extractBetaCov, pollutant = "pm10")
pooledTotal <- lapply(betacovTotal, poolCoef)

## Load seasonal estimates
load(file.path("results", "seasonal.factor2.lag.012.pm10.rda"))
results <- lapply(results, function(x) x[setdiff(names(x), exclude)])

## Pool estimates by season
pooledSeas <- lapply(results, coefSeasonal, pollutant = "pm10", method = "factor2")
pooled <- lapply(seq(along = pooledSeas), function(i) {
    m <- rbind(pooledSeas[[i]], pooledTotal[[i]])
    rownames(m) <- Seasons
    m
})

## Make table
names(pooled) <- Lags
pm <- do.call("rbind", pooled) * 1000
ppp <- cbind(pm[,1], pm[,1] - 2*pm[,2], pm[,1] + 2*pm[,2])
lf <- gsub("\\$", "", sub("_", " ", LouisFormat(ppp, "confint")))
estmat <- matrix(lf, byrow = TRUE, ncol = 5, dimnames = list(Lags, Seasons))
print(xtable(estmat, caption = "National average estimates and 95\\% posterior intervals of the overall and season-specific effects of \PMTen\ at lags 0, 1, and 2 for 100 cities, 1987--2000.  Estimates were obtained by pooling city-specific coefficients from the main effect and pollutant-season interaction models, respectively, and represent the percent increase in daily mortality for a 10~\mgmcube\ increase in \PMTen.", 
             label = "tab:natlavg", 
             align = c("l", rep("l", 5))))
@ 

For \PMTen\ at lag 1, the estimates for winter, spring, and fall are
similar and equal to 0.15 ($-0.08, 0.39$), 0.14 ($-0.14, 0.42$), and
0.14 ($-0.06, 0.34$), respectively.  The estimate for summer is more
than twice as large at 0.36 ($0.11, 0.61$).  \PMTen\ at lag 0 appears
to have a larger effect in the spring and much smaller effects in the
other seasons.  In addition, estimates for lag 0 have a much larger
between season difference (e.g. spring and winter) than those of lag
1.  The estimates for lag 2 are generally smaller than those of lag 0
or 1 and, given the size of the posterior intervals, do not vary much
across seasons.

Regional differences in the seasonal patterns of the \PMTen\ relative
rates were explored by including a region indicator variable in the
second stage of the hierarchical model.  For \PMTen\ at lag 1,
Figure~\ref{periodic-seasonal} shows the results of estimating
separate seasonal trends from the sine/cosine model for the seven
regions of the U.S.  The Industrial Midwest and the North East have
seasonal trends characterized as being lower in the winter and higher
in the summer.  In Southern California there is a larger effect (0.5
percent increase in mortality per 10~\mgmcube\ increase in \PMTen)
that is constant all year.  The effect of \PMTen\ is close to zero all
year round in the North West, South East, South West, and the Upper
Midwest, but the North West experiences a slight increase during the
summer months.  With the exception of Southern California, all regions
have a smaller effect in the winter months.  Seasonal analyses for
mortality due cardiovascular and respiratory diseases (not shown)
provided results that are qualitatively similar to those for total
non-accidental mortality, with larger summer effects in the Industrial
Midwest and the North East regions, as well as overall for the entire
U.S.
\begin{figure}[tbh]
  \centering
<<periodicSeasonByRegion,fig=true,eps=false,echo=false,height=5,width=8,results=hide>>=
rm(list = ls(all = TRUE))

library(NMMAPSdata)
library(tlnise)
library(tsModelSpec)
source(file.path("data", "model-fitting.R"))
source(file.path("data", "utils.R"))

exclude <- c("anch", "hono")

## Load the results
load(file.path("results", "seasonal.periodic.lag.012.dfSeas.1.rda"))
r <- results[[2]][setdiff(names(results[[2]]), exclude)]  ## Lag 1

## Get region indicators for each city
data(cities)
cityRegion <- with(cities, region[match(names(r), city)])
regionNames <- sort(unique(cityRegion))
regionInd <- 1 * outer(cityRegion, regionNames, "==")
colnames(regionInd) <- regionNames

ndays <- 365
x <- 1:ndays
B <- harmonic(x, 1, ndays, intercept = TRUE)

## Do pooling
bc <- extractBetaCov(r, pollutant = "pm10")
initTLNise()
g <- tlnise(bc$beta, bc$cov, regionInd, prnt = FALSE, labelY = names(r),
            intercept = FALSE, seed = 54321)
pooledRegion <- matrix(g$gamma[,1], byrow = TRUE, ncol = ncol(regionInd),
                       nrow = nrow(g$theta))
pooledRegionSD <- matrix(g$gamma[,2], byrow = TRUE, ncol = ncol(regionInd),
                         nrow = nrow(g$theta))
b <- B %*% pooledRegion
zero <- matrix(0, nrow = ndays, ncol = 7 * 3)
curveRegion <- lapply(1:ncol(pooledRegion), function(i) {    
    B <- zero
    ## This is weird
    B[, seq(i, 7 * 3, 7)] <- harmonic(x, 1, ndays, intercept = TRUE)
    s <- sqrt(diag(B %*% g$Dgamma %*% t(B)))
    cbind(b[,i], b[,i] - 2*s, b[,i] + 2*s)
})

## Overall estimate
initTLNise()
g <- tlnise(bc$beta, bc$cov, rep(1, NROW(bc$beta)), seed = 12345, prnt = FALSE)
V <- diag(B %*% g$Dgamma %*% t(B))
b <- B %*% g$gamma[,1]
curve <- cbind(b, b - 2*sqrt(V), b + 2*sqrt(V))

## Plot
regionFullNames <- c("Industrial Midwest", "Northeast", "Northwest",
                     "Southern California", "Southeast", "Southwest",
                     "Upper Midwest")
Y <- rbind(do.call("rbind", curveRegion), curve) * 1000
conf <- Y[,2:3]
rng <- range(Y) + c(-1, 1) * .05
y <- Y[,1]
nRegions <- length(curveRegion) + 1
x <- rep(1:ndays, nRegions)
g <- ordered(rep(c(regionFullNames, "All Regions"), each = ndays),
             levels = c(regionFullNames, "All Regions"))

library(lattice)
ltheme <- canonical.theme(color = FALSE)     ## in-built B&W theme  
ltheme$strip.background$col <- "transparent" ## change strip bg  
lattice.options(default.theme = ltheme)      ## set as default  

## trellis.device(height = 5, width = 8, color = FALSE)
p <- xyplot(y ~ x | g, as.table = TRUE, subscripts = TRUE, type = "l",
            panel = function(x, y, subscripts, ...) {
                llines(x, Y[subscripts, 2], lty = 3, lwd = 2)
                llines(x, Y[subscripts, 3], lty = 3, lwd = 2)
                panel.xyplot(x, y, ...)
                panel.abline(h = 0, lty = 2)
            },
            strip = strip.custom(bg = 0),
            ylim = rng, layout = c(4, 2),
            ylab = list(expression(paste("% increase in mortality with ",
                "10-", mu, "g/", m^3," increase in ", PM[10])), cex = 0.9),
            xlab = "Day in year",
            scales = list(alternating = 1, tck = c(0.8, 0))
            )
print(p)
@ 
\caption{National and regional smooth seasonal effects of \PMTen\
(particulate matter less than 10~$\mu$m in aerodynamic diameter) at
lag 1 for 100 U.S. cities, 1987--2000.  Estimates were obtained by
pooling city-specific coefficients from the sine/cosine model
(equation~\ref{model:periodic}).  Dotted lines indicate pointwise 95\%
posterior intervals.}
\label{periodic-seasonal}
\end{figure}


Figure~\ref{postbeta-contour} shows samples from the joint posterior
distributions of the regionally and nationally pooled harmonic
coefficients $\beta_1$ and $\beta_2$ in the sine/cosine model for
\PMTen\ at lag 1.  The region with the strongest evidence of a
seasonal pattern is the North East with a marginal posterior
$\text{Prob}(\beta_2 > 0\mid\text{data}) = 0.94$.  There is moderate
evidence of seasonality in the Industrial Midwest and the North West
with $\text{Prob}(\beta_2 > 0\mid\text{data}) = 0.83$ and $0.74$,
respectively.  The joint distributions of the coefficients for the
South East, South West, Upper Midwest, and Southern California regions
are centered at zero, indicating a lack of any seasonal variation.  At
the national level, the marginal posterior $\text{Prob}(\beta_2 >
0\mid\text{data}) = 0.91$, while the marginal distribution for
$\beta_1$ is centered almost exactly around zero.  \PMTen\ at lag 0
shows slightly more evidence of seasonality for the national average.
However, the overall short-term effect of \PMTen\ at lag 0 is smaller
on average, as indicated in Table~\ref{tab:natlavg}.  There is little
evidence of seasonal variation of the short-term effect of \PMTen\ at
lag 2.
\begin{figure}[tbh]
  \centering
<<periodicPosteriorBeta,fig=true,eps=false,echo=false,height=5,width=8,results=hide>>=
rm(list = ls())

library(NMMAPSdata)
source(file.path("data", "utils.R"))
library(tlnise)
library(ellipse)
library(MASS)

exclude <- c("anch", "hono")

## Load the results
load(file.path("results", "seasonal.periodic-ortho.lag.012.pm10.rda"))
r <- results[[2]][setdiff(names(results[[2]]), exclude)] 

## Get region indicators for each city
data(cities)
cityRegion <- with(cities, region[match(names(r), city)])
regionNames <- sort(unique(cityRegion))
regionInd <- 1 * outer(cityRegion, regionNames, "==")
colnames(regionInd) <- regionNames

## Pool estimates
bc <- extractBetaCov(r, pollutant = "pm10")
initTLNise()
g <- tlnise(bc$beta, bc$cov, regionInd, prnt = FALSE, labelY = names(r),
            intercept = FALSE, seed = 10)

## Get sample from posterior distribution
set.seed(20)
gs <- tlnise:::sampleGamma(g, bc$cov, bc$beta, regionInd)
set.seed(30)
ds <- tlnise:::drawSample0(gs, n = 1500)
ads <- array(t(ds), c(7, 3, nrow(ds)))

## Overall estimate
initTLNise()
g0 <- tlnise(bc$beta, bc$cov, rep(1, nrow(bc$beta)), seed = 100, prnt = FALSE)
set.seed(200)
gs0 <- tlnise:::sampleGamma(g0, bc$cov, bc$beta)
set.seed(300)
ds0 <- tlnise:::drawSample0(gs0, n = 1500)

## Compute posterior probs.
postprob1 <- matrix(nrow = 8, ncol = 2)
postprob2 <- matrix(nrow = 8, ncol = 2)
n <- dim(ads)[3]

for(i in 1:7) {
    postprob1[i, ] <- c(sum(ads[i, 2, ] > 0) / n, sum(ads[i, 2, ] < 0) / n)
    postprob2[i, ] <- c(sum(ads[i, 3, ] > 0) / n, sum(ads[i, 3, ] < 0) / n)
}
postprob1[8, ] <- c(sum(ds0[, 2] > 0)/nrow(ds0), sum(ds0[, 2] < 0)/nrow(ds0))
postprob2[8, ] <- c(sum(ds0[, 3] > 0)/nrow(ds0), sum(ds0[, 3] < 0)/nrow(ds0))

## Draw ellipses
x1 <- NULL
y1 <- NULL
x2 <- NULL
y2 <- NULL

for(i in 1:7) {
    V <- cov(t(ads[i, -1, ]))
    m <- colMeans(t(ads[i, -1, ]))
    e <- ellipse(V, centre = m, level = c(.75, .95))
    e1 <- t(matrix(e[,1], nrow = 2))
    e2 <- t(matrix(e[,2], nrow = 2))
    x1 <- c(x1, e1[,1])
    y1 <- c(y1, e2[,1])
    x2 <- c(x2, e1[,2])
    y2 <- c(y2, e2[,2])    
}
V0 <- cov(ds0[, -1])
m0 <- colMeans(ds0[, -1])
e <- ellipse(V0, centre = m0, level = c(.75, .95))
e1 <- t(matrix(e[,1], nrow = 2))
e2 <- t(matrix(e[,2], nrow = 2))
x1 <- c(x1, e1[,1])
y1 <- c(y1, e2[,1])
x2 <- c(x2, e1[,2])
y2 <- c(y2, e2[,2])

b <- vector("list", length = 8)

## Only plot a few points
set.seed(1025301)
for(i in 1:7) {
    b[[i]] <- t(ads[i, 2:3, sample(1500, 100)])
}
set.seed(20)
b[[8]] <- ds0[sample(1500, 15), 2:3]

regionFullNames <- c("Industrial Midwest", "Northeast", "Northwest",
                     "Southern California", "Southeast", "Southwest",
                     "Upper Midwest", "All Regions")
f <- gl(8, 50, ordered = TRUE)
levels(f) <- regionFullNames

library(lattice)
ltheme <- canonical.theme(color = FALSE)     ## in-built B&W theme  
ltheme$strip.background$col <- "transparent" ## change strip bg  
lattice.options(default.theme = ltheme)      ## set as default  

## trellis.device(dev = x11, height = 5, width = 8, color = FALSE)

yrng <- c(-.1, .1)
xrng <- c(-.1, .09)
p <- xyplot(y2 ~ x2 | f, subscripts = TRUE, type = "l", lty = "43", lwd = 1.5,
            panel = function(x, y, subscripts, ...) {
                i <- (subscripts[1] - 1) / 50 + 1
                lpoints(b[[i]], pch = 1, cex = 0.5) ## col = gray(0.5))
                panel.xyplot(x, y, ...)
                panel.abline(h = 0, v = 0, lty = 3, lwd = 1)
                llines(x1[subscripts], y1[subscripts], lty = 1, col = 1, lwd = 1.5)
                ltext(-0.05, -0.075,
                      labels = eval(substitute(expression(paste(italic(p), "(", beta[2] > 0, ")") == k), list(k = formatC(round(postprob2[i, 1], 2), digits = 2, flag = "#")))), cex = 0.7)
                ltext(-0.05, -0.09,
                      labels = eval(substitute(expression(paste(italic(p), "(", beta[1] > 0, ")") == k), list(k = formatC(round(postprob1[i, 1], 2), digits = 2, flag = "#")))), cex = 0.7)
            }, as.table = TRUE, ylim = yrng, aspect = 1, xlim = xrng,
            xlab = expression(beta[1]), ylab = expression(beta[2]),
            scales = list(cex = 0.8, alternating = 1, tck = c(0.8, 0)),
            par.strip.text = list(cex = 0.8),
            strip = strip.custom(bg = 0)
            )
print(p)
@ 
\caption{Samples from the national and regional joint posterior
distributions of the pooled coefficients $\beta_1$ and $\beta_2$ from
sine/cosine seasonal model (equation~\ref{model:periodic}) for \PMTen\
at lag 1, 100 U.S. cities, 1987--2000.  The solid and dashed lines
indicate the 75\% and 95\% regions for the joint posterior
distribution of $\beta_1$ and $\beta_2$, given the data.  Each panel
includes the marginal posterior probabilities of each coefficient
being greater than~0.  Posterior probabilities closer to 1 indicate
stronger evidence of seasonal patterns.}
\label{postbeta-contour}
\end{figure}


\subsection*{Sensitivity analyses}

We performed several additional analyses to explore the sensitivity of
the estimated seasonal \PMTen\ log relative rates to model
specification.  Specifically, we examined sensitivity to (1)
adjustment for long-term trends and seasonality in mortality; (2) the
inclusion of other pollutants; (3) the exposure lag; and (4) the
specification of the temperature component.

Selecting the degrees of freedom of the smooth function of time used
to control for long-term trends and seasonality is an important issue
in time series models of air pollution and mortality because estimates
of pollution coefficients can change considerably depending on the
specification of the number of degrees of
freedom~\citep{schw:1994,schw:1994a,toul:atki:lete:2004}. Our original
model used a natural cubic spline with 7 degrees of freedom per year
of data.  For \PMTen\ at lag 1, Figure~\ref{periodic-dfTime-sens}
shows the sensitivity of the sine/cosine model to using 3, 5, 7, 9,
and 11 degrees of freedom per year in the smooth function of time.
With only 3 degrees of freedom per year the curves deviate
considerably from those in Figure~\ref{periodic-seasonal}, e.g. the
estimate for Southern California exhibits much more seasonal
variation.  However, these deviations more likely reflect a lack of
adjustment in the model rather than a real seasonal change.  With more
aggressive control for seasonality and long-term trends the estimates
appear to be stable.
\begin{figure}[tbh]
  \centering
<<periodicDfTimeSensitivity,echo=false,results=hide,fig=true,eps=false,height=5,width=8>>=
rm(list = ls())

library(NMMAPSdata)
library(tsModelSpec)
library(tlnise)

exclude <- c("anch", "hono")

load("results/seasonal.periodic.sens.lag.012.df.1.pm10.rda")
r <- results[[2]][setdiff(names(results[[2]]), exclude)]  ## Lag 1

## Get region indicators for each city
data(cities)
cityRegion <- with(cities, region[match(names(r), city)])
regionNames <- sort(unique(cityRegion))
regionInd <- 1 * outer(cityRegion, regionNames, "==")
colnames(regionInd) <- regionNames

ndays <- 365
x <- 1:ndays
B <- harmonic(x, 1, ndays, intercept = TRUE)
dfnames <- paste("df", seq(3, 11, 2), sep = "")

source(file.path("data", "utils.R"))

curveList <- lapply(seq(along = dfnames), function(i) {
    d <- lapply(r, "[[", dfnames[i])
    bc <- extractBetaCov(d, "pm10")
    initTLNise()
    g <- tlnise(bc$beta, bc$cov, regionInd, prnt = FALSE, labelY = names(r),
                intercept = FALSE, seed = i * 10000)
    pooledRegion <- matrix(g$gamma[,1], byrow = TRUE, ncol = ncol(regionInd),
                           nrow = nrow(g$theta))
    pooledRegionSD <- matrix(g$gamma[,2], byrow = TRUE, ncol = ncol(regionInd),
                             nrow = nrow(g$theta))
    b <- B %*% pooledRegion
    zero <- matrix(0, nrow = ndays, ncol = 7 * 3)
    curveRegion <- lapply(1:ncol(pooledRegion), function(i) {
        B <- zero
        ## This is weird
        B[, seq(i, 7 * 3, 7)] <- harmonic(x, 1, ndays, intercept = TRUE)
        s <- sqrt(diag(B %*% g$Dgamma %*% t(B)))
        cbind(b[,i], b[,i] - 2*s, b[,i] + 2*s)
    })
    g <- poolCoef(bc, method = "tlnise", extractors = list(function(x) x))[[1]]
    V <- diag(B %*% g$Dgamma %*% t(B))
    b <- B %*% g$gamma[,1]
    curve <- cbind(b, b-2*sqrt(V), b+2*sqrt(V))
    append(curveRegion, list(curve))
})

regionFullNames <- c("Industrial Midwest", "Northeast", "Northwest",
                     "Southern California", "Southeast", "Southwest",
                     "Upper Midwest", "All Regions")

curve <- unlist(lapply(curveList, function(x) {
    as.vector(sapply(x, function(y) y[,1]))
})) * 1000
lo <- unlist(lapply(curveList, function(x) {
    as.vector(sapply(x, function(y) y[,2]))
})) * 1000
hi <- unlist(lapply(curveList, function(x) {
    as.vector(sapply(x, function(y) y[,3]))
})) * 1000
xx <- rep(x, 8 * length(dfnames))
f <- ordered(rep(rep(regionFullNames, each = 365), length(dfnames)),
             levels=regionFullNames)
g <- ordered(rep(dfnames, each = 365 * 8), levels = dfnames)
rng <- range(cbind(curve, lo, hi))

library(lattice)
ltheme <- canonical.theme(color = FALSE)     ## in-built B&W theme  
ltheme$strip.background$col <- "transparent" ## change strip bg  
lattice.options(default.theme = ltheme)      ## set as default  

## trellis.device(dev = x11, height = 5, width = 8, color = FALSE)
l <- trellis.par.get("superpose.line")
l$lty <- c(2,3,1,4,5,6,7)
l$lwd <- c(1,1,3,1,1,1,1)
trellis.par.set("superpose.line", l)

p <- xyplot(curve ~ xx | f, groups = g, type = "l", as.table = TRUE, ylim=rng,
            xlab = "Day in year",
            ylab = expression(paste("% increase in mortality with ",
                "10-", mu, "g/", m^3, " increase in ", PM[10])),
            panel = function(x, y, subscripts, groups, ...) {
                panel.superpose(x, y, subscripts, groups, ...)
                panel.abline(h = 0, lty = 1)
            },
            strip = strip.custom(bg = 0),
            layout = c(4, 2),
            scales = list(alternating = 1, tck = c(0.8, 0), x = list(cex=0.8))
            )
print(p)
@ 
\caption{Sensitivity of national and regional estimates of smooth
seasonal effects for \PMTen\ at lag 1 to the degrees of freedom
assigned to the smooth function of time, 100 U.S. cities, 1987--2000.
The degrees of freedom chosen were 3 (short dashed), 5 (dotted), 7
(solid), 9 (dot-dashed), and 11 (long dashed) degrees of freedom per
year of data.}
\label{periodic-dfTime-sens}
\end{figure}


Table~\ref{tab:multipollutant} shows the sensitivity of the lag 1
\PMTen\ log relative rate as other pollutants are included in the
pollutant-season interaction model.  The seasonal national average
estimates exhibit the same pattern when either current day SO$_2$,
O$_3$, or NO$_2$ are included as copollutants in the model.  With
SO$_2$ or NO$_2$ included, the summer effect for \PMTen\ increases
slightly over the single pollutant estimate obtained with a restricted
list of 45 cities.  The inclusion of O$_3$ appears to attenuate the
effect somewhat.  Note that the lack of data for the other pollutants
reduced the number of cities available for the copollutant analyses.
<<copollutantModelTable,results=tex,echo=false>>=
rm(list = ls(all = TRUE))

exclude <- c("anch", "hono")

library(tlnise)
source(file.path("data", "utils.R"))

## No copollutants
load(file.path("results", "seasonal.factor2.lag.012.pm10.rda"))
bc <- extractBetaCov(subset(results[[2]],!(names(results[[2]]) %in% exclude)),"pm10")
pNoneall <- poolCoef(bc) * 1000

## Load copollutant model results
load(file.path("results", "seasonal.none.stepfun.pm10.lag.1.rda"))
bc <- extractBetaCov(results, "pm10")
initTLNise()
g <- tlnise(bc$beta, bc$cov, rep(1, nrow(bc$beta)), seed = 10, prnt = FALSE)
pNone <- g$gamma[, 1:2] * 1000

load(file.path("results", "seasonal.so2.stepfun.pm10.lag.1.rda"))
bc <- extractBetaCov(results, "pm10")
initTLNise()
g <- tlnise(bc$beta, bc$cov, rep(1, nrow(bc$beta)), seed = 100, prnt = FALSE)
pSO2 <- g$gamma[, 1:2] * 1000

load(file.path("results", "seasonal.o3.stepfun.pm10.lag.1.rda"))
bc <- extractBetaCov(results, "pm10")
initTLNise()
g <- tlnise(bc$beta, bc$cov, rep(1, nrow(bc$beta)), seed = 1000, prnt = FALSE)
pO3 <- g$gamma[, 1:2] * 1000

load(file.path("results", "seasonal.no2.stepfun.pm10.lag.1.rda"))
bc <- extractBetaCov(results, "pm10")
initTLNise()
g <- tlnise(bc$beta, bc$cov, rep(1, nrow(bc$beta)), seed = 200, prnt = FALSE)
pNO2 <- g$gamma[, 1:2] * 1000

combined <- rbind(pNoneall, pNone, pSO2, pO3, pNO2)

## Make table
ppp <- cbind(combined[,1], combined[,1] - 2*combined[,2],
             combined[,1] + 2*combined[,2])
mest <- matrix(gsub("\\$", "", sub("_", " ", LouisFormat(ppp, type = "confint"))), 
               byrow = TRUE, ncol = 4)
dimnames(mest) <- list(c("\\PMTen\\ only (100 cities)",
                         "\\PMTen\\ only (restricted)", 
                         "with SO$_2$ (restricted)",
                         "with O$_3$ (restricted)",
                         "with NO$_2$ (restricted)"
                         ),
                       c("Winter", "Spring", "Summer", "Fall"))
                       
print(xtable(mest, caption = "National average estimates and 95\\% posterior intervals of season-specific lag 1 \PMTen\ log relative rates adjusted for other pollutants.  Estimates represent the percent increase in daily mortality for a 10~\mgmcube\ increase in \PMTen.  Results for ``45 cities'' were obtained by pooling coefficients from the \textit{same} list of 45 cities for which measurements for all copollutants were simultaneously available.", 
             label = "tab:multipollutant",
             align = c("l", rep("l", 4))))
@ 

Figure~\ref{periodic-lags012} shows the region-specific seasonal
trends for \PMTen\ at lags 0, 1, and 2 with 95\% posterior regions for
lag 1.  The lag 0 seasonal trend for each region has a similar pattern
to the lag 1 trend but is lower in general.  In the North East and
Industrial Midwest, the peak in the seasonal trend for lag 0 appears
to come in late May while the peak for lag 1 comes in mid-July.  The
North West, South East, and Upper Midwest exhibit small changes in
seasonal patterns across lags but remain largely flat.  Southern
California and the South West appear to pick up slightly stronger
seasonal patterns in lags 0 and 2.  
\begin{figure}[tbh]
  \centering
<<periodicLags012,fig=true,eps=false,results=hide,echo=false,height=5,width=8>>=
## Periodic Seasonal by Region (all lags at once)
rm(list = ls())

source(file.path("data", "utils.R"))

exclude <- c("anch", "hono")

## Load the results
load(file.path("results", "seasonal.periodic.lag.012.dfSeas.1.rda"))
results <- lapply(results, function(r) r[setdiff(names(r), exclude)])

## Get region indicators for each city
data(cities)
cityRegion <- with(cities, region[match(names(results[[1]]), city)])
regionNames <- sort(unique(cityRegion))
regionInd <- 1 * outer(cityRegion, regionNames, "==")
colnames(regionInd) <- regionNames

ndays <- 365
x <- 1:ndays
B <- harmonic(x, 1, ndays, intercept = TRUE)

bc <- lapply(results, extractBetaCov, pollutant = "pm10")
set.seed(100302)
gg <- lapply(bc, function(x) {
    initTLNise()
    tlnise(x$beta, x$cov, regionInd, prnt = FALSE, intercept = FALSE,
           seed = sample(100000, 1))
})
pooledRegion <- lapply(gg, function(g) {
    matrix(g$gamma[,1], byrow = TRUE, ncol = ncol(regionInd),
           nrow = nrow(g$theta))
})
pooledRegionSD <- lapply(gg, function(g) {
    matrix(g$gamma[,2], byrow = TRUE, ncol = ncol(regionInd),
           nrow = nrow(g$theta))
})
b <- lapply(pooledRegion, function(p) B %*% p)
zero <- matrix(0, nrow = ndays, ncol = 7 * 3)
curveRegion <- lapply(seq(along = pooledRegion), function(j) {
    pr <- pooledRegion[[j]]
    lapply(1:ncol(pr), function(i) {
        B <- zero
        ## This is weird
        B[, seq(i, 7 * 3, 7)] <- harmonic(x, 1, ndays, intercept = TRUE)
        s <- sqrt(diag(B %*% gg[[j]]$Dgamma %*% t(B)))
        cbind(b[[j]][,i], b[[j]][,i] - 2*s, b[[j]][,i] + 2*s)
    })
})

## Overall estimate
set.seed(10321342)
gg <- lapply(bc, function(b) {
    initTLNise()
    tlnise(b$beta, b$cov, rep(1, NROW(b$beta)), seed = sample(100000, 1),
           prnt = FALSE)
})
V <- lapply(gg, function(g) diag(B %*% g$Dgamma %*% t(B)))
b <- lapply(gg, function(g) B %*% g$gamma[,1])
curve <- lapply(seq(along = b), function(i) {
    cbind(b[[i]], b[[i]] - 2*sqrt(V[[i]]), b[[i]] + 2*sqrt(V[[i]]))
})

curveTotal <- curveRegion
for(i in seq(along = curveTotal)) {
    curveTotal[[i]] <- append(curveTotal[[i]], list(curve[[i]]))
}

regionFullNames <- c("Industrial Midwest", "Northeast", "Northwest",
                     "Southern California", "Southeast", "Southwest",
                     "Upper Midwest", "All Regions")

y0 <- do.call("rbind", curveTotal[[1]])[,1] * 1000
y1 <- do.call("rbind", curveTotal[[2]])[,1] * 1000
conf1 <- do.call("rbind", curveTotal[[2]])[, 2:3] * 1000
conf1 <- rbind(conf1, conf1, conf1)
y2 <- do.call("rbind", curveTotal[[3]])[,1] * 1000
rng <- range(unlist(curveTotal)) * 1000 + c(-1,1) * .07
xx <- rep(x, length(regionFullNames))
f <- ordered(rep(regionFullNames, each = 365), levels = regionFullNames)

## trellis.device(dev = x11, height = 5, width = 8, color = FALSE)

l <- trellis.par.get("superpose.line")
l$lty <- c(2, 1, 4)
l$lwd <- c(1, 1.5, 1)
trellis.par.set("superpose.line", l)

p <- xyplot(y0 + y1 + y2 ~ xx | f, allow.multiple = TRUE, type = "l",
            ylim = rng, layout = c(4, 2), as.table = TRUE,
            panel = function(x, y, subscripts, groups, ...) {
                s <- subscripts[1:365]
                xx <- x[1:365]
                llines(xx, conf1[s, 1], lty = 1, lwd = 1.5)
                llines(xx, conf1[s, 2], lty = 1, lwd = 1.5)
                panel.superpose(x, y, subscripts, groups, ...)
                panel.abline(h = 0, lty = 5)
            }, 
            xlab = "Day in year",
            strip = strip.custom(bg = 0),
            scales = list(alternating = 1, tck = c(0.8, 0), x = list(cex=.8)),
            ylab = expression(paste("% increase in mortality with ",
                "10-", mu, "g/", m^3, " increase in ", PM[10]))
            )
print(p)
@ 
\caption{Sensitivity of national and regional estimates of smooth
seasonal effects to \PMTen\ exposure lag, 100 U.S. cities, 1987--2000.
Solid black lines indicate the log relative rate estimate and
pointwise 95\% posterior intervals for \PMTen\ at lag 1.  Also shown
are the log relative rate estimates for \PMTen\ at lag 0 (short
dashed) and lag 2 (dot-dashed).}
\label{periodic-lags012}
\end{figure}

To explore sensitivity to temperature and to control for temperature
effects spread out over multiple days, we fit separate models which
included an additional interaction between the current day temperature
and the running mean of lags 1 through 3 as well as a running mean of
lags 1 through 7.  Neither addition to the model made a noticeable
impact on the regional or overall seasonal patterns of the \PMTen\ log
relative rate.


\section*{DISCUSSION}

In this paper we have used a Bayesian semi-parametric hierarchical
model for estimating time-varying effects of air pollution on daily
mortality.  The model combines information across multiple cities to
increase the precision of seasonal relative rate estimates.  We found
seasonal patterns for the national average effect of \PMTen\ at both
lag 0 and lag 1.  Seasonal patterns varied by geographical region with
a strong pattern for lag 1 appearing in the North East region.
Equally interesting was the lack of seasonal variation in the southern
regions of the country.

Understanding the health effects of PM components is an increasingly
important research problem, as noted in~\cite{nrc:2004}.  Exploration
of the spatial-temporal variation of the short-term effects of PM on
mortality is essential to generating (or ruling out) specific
hypotheses about the toxicity of PM components.  Data are becoming
available from the Environmental Protection Agency's PM$_{2.5}$
National Chemical Speciation Network which contain detailed time
series information on the composition of PM~\citep{epa:2003}.
Knowledge of the spatial-temporal patterns of the short-term effects
of PM will be necessary for guiding future analyses of these PM
constituent data.

The modification of short-term effects of pollution by season has been
explored previously in a number of single city studies.  Styer,
et~al.~\citep{stye:mcmi:gao:davi:sack:1995} analyzed data from Cook
County and Salt Lake County and found (for Cook County) that the
effect of \PMTen\ was higher in the spring and fall.  In a review of
research on particulate air pollution and mortality, Moolgavkar and
Luebeck stated that analyses in Steubenville, Philadelphia, and Cook
County indicated that the effects of pollutants are strongly modified
by season~\citep{mool:lueb:1996}.  Kelsall,
et~al.~\citep{kels:same:zege:xu:1997} examined Philadelphia data and
concluded that after adjusting for long-term variation in mortality
and the effects of weather there was little evidence of different
effects by season.  More recently, Moolgavkar~\citep{mool:2003}
analyzed data from Cook County and Los Angeles County and found
between season variation of the effects of numerous pollutants on
daily mortality in both counties.

Estimation of short-term effects of air pollution on daily mortality
for single cities is hampered by the inherent high variability of the
resulting effect estimates.  Estimation of seasonally varying effects
poses an additional challenge because it involves further
stratification of the data.  Since fewer data are available for
estimating season-specific effects, the variability of such estimates
increases, making it difficult to discern any meaningful seasonal
pattern.  Rather, a multi-site approach, where information can be
combined across neighboring cities, can provide more precise
city-specific log relative rates as well as provide a natural
framework for characterizing regional and national trends.  For
example, Moolgavkar~\citep{mool:2003} found an inconsistent
relationship between \PMTen\ and mortality in Los Angeles County when
SO$_2$ was included in the model.  However, we find in
Table~\ref{tab:multipollutant} that the seasonal pattern in the
national average (as well as the Southern California regional average,
not shown) is robust to the inclusion of copollutants.

Regional differences in the short-term effects of \PMTen\ have been
explored in
NMMAPS~\citep{domi:dani:zege:same:2002,domi:mcder:zege:same:2003} and
in the Air Pollution and Health: A European Approach (APHEA)
study~\citep{kats:toul:samo:gryp:2001,samo:schw:wojt:2001}.  Both
studies found regional modification of the effect of \PMTen\ on daily
non-accidental mortality.  The results presented here are consistent
with previous NMMAPS analyses with respect to regional average \PMTen\
effects.  The estimated seasonal patterns for lag 1 appear to have two
distinct shapes.  The Industrial Midwest, North East, and North West
regions all exhibit a larger effect during the summer months while the
other regions exhibit little seasonal variation.  These patterns are
somewhat sensitive to the lag of pollution used. Therefore, an
important question raised by this study is how the total effect of
\PMTen\ in a distributed lags model would vary by
season. Unfortunately, the U.S. pollution database has daily PM levels
for a small fraction of cities making if difficult to answer this
question.

These analyses provide strong evidence that the \PMTen\ log relative
rate is greater in the spring and summer in the northern regions,
particularly in the North East. This result admits several competing
hypotheses. First, the PM constituents may vary by season in these
regions with the most toxic particles having a spring/summer maximum.
A detailed analysis of the regional and seasonal variation in PM
constituents is needed to better understand these patterns.
PM$_{2.5}$ and associated speciation data have only been collected
regularly in the U.S. since 1999.  Therefore, we expect multi-site
time series studies of PM$_{2.5}$ speciation data to be carried out in
the near future.

Second, even if the constituents do not vary substantially, it is
possible that the higher short-term effect of ambient exposure to PM
estimated in spring and summer in the North East regions could be the
result of more time spent outdoors and therefore less exposure
measurement error.  While there has been some work suggesting that
differences between personal and ambient exposure to \PMTen\ can vary
across seasons~\citep{keel:dvon:yip:2002}, there are also data
indicating that individual activity patterns are, on average, very
similar across regions of the
U.S.~\citep{nhaps:1996,klep:nels:ott:robi:tsan:swit:beha:hern:enge:2001,wall:mitc:ocon:2003}.
If the seasonal variation in the \PMTen\ log relative rates were
entirely attributable to activity patterns, then we would expect to
have estimated similar seasonal patterns of PM effects across regions.
However, Southern California exhibited a positive effect that does not
vary by season.

A third possibility is that the particle effect may be swamped by the
more powerful effect of winter infectious diseases so that it can only
be observed when infectious diseases are less prevalent. This
hypothesis does not explain the absence of a \PMTen\ mortality
association in the southern regions where infectious disease incidence
is also seasonal.  Finally, this result may reflect seasonally varying
bias from an, as yet, unidentified source. Having established the
pattern of regional and seasonal variation in the \PMTen\ log relative
rate, a more targeted investigation of possible sources of such bias
is now possible.


\bibliographystyle{aje2} 
\bibliography{seasonal}

\appendix


\clearpage

\setlength{\parindent}{0in}

\section*{TABLE AND FIGURE LEGENDS}

Table 1: Summary statistics for average daily mortality, \PMTen, and
temperature for 100 U.S. cities, 1987--2000.\\

Table 2: National average estimates and 95\% posterior intervals of
the overall and season-specific effects of \PMTen\ at lags 0, 1, and 2
for 100 cities, 1987--2000.  Estimates were obtained by pooling
city-specific coefficients from the main effect and pollutant-season
interaction models, respectively, and represent the percent increase
in daily mortality for a 10~\mgmcube\ increase in \PMTen.\\

Table 3: National average estimates and 95\% posterior intervals of
season-specific lag 1 \PMTen\ log relative rates adjusted for other
pollutants.  Estimates represent the percent increase in daily
mortality for a 10~\mgmcube\ increase in \PMTen.  Results for ``45
cities'' were obtained by pooling coefficients from the \textit{same}
list of 45 cities for which measurements for all copollutants were
simultaneously available.\\

Figure 1: Boxplots of square root daily mortality by season for the 10
largest U.S. cities, 1987--2000.\\

Figure 2: Boxplots of regionally averaged daily levels of particulate
matter less than 10~$\mu$m in aerodynamic diameter (\PMTen) by season
for 100 U.S. cities, 1987--2000.\\

Figure 3: National and regional smooth seasonal effects of \PMTen\
(particulate matter less than 10~$\mu$m in aerodynamic diameter) at
lag 1 for 100 U.S. cities, 1987--2000.  Estimates were obtained by
pooling city-specific coefficients from the sine/cosine model
(equation~\ref{model:periodic}).  Dotted lines indicate pointwise 95\%
posterior intervals.\\

Figure 4: Samples from the national and regional joint posterior
distributions of the pooled coefficients $\beta_1$ and $\beta_2$ from
sine/cosine seasonal model (equation~\ref{model:periodic}) for \PMTen\
at lag 1, 100 U.S. cities, 1987--2000.  The solid and dashed lines
indicate the 75\% and 95\% regions for the joint posterior
distribution of $\beta_1$ and $\beta_2$, given the data.  Each panel
includes the marginal posterior probabilities of each coefficient
being greater than~0.  Posterior probabilities closer to 1 indicate
stronger evidence of seasonal patterns.\\

Figure 5: Sensitivity of national and regional estimates of smooth
seasonal effects for \PMTen\ at lag 1 to the degrees of freedom
assigned to the smooth function of time, 100 U.S. cities, 1987--2000.
The degrees of freedom chosen were 3 (short dashed), 5 (dotted), 7
(solid), 9 (dot-dashed), and 11 (long dashed) degrees of freedom per
year of data.\\

Figure 6: Sensitivity of national and regional estimates of smooth
seasonal effects to \PMTen\ exposure lag, 100 U.S. cities, 1987--2000.
Solid black lines indicate the log relative rate estimate and
pointwise 95\% posterior intervals for \PMTen\ at lag 1.  Also shown
are the log relative rate estimates for \PMTen\ at lag 0 (short
dashed) and lag 2 (dot-dashed).\\

\clearpage

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


\end{document}