Time to Vaccinate

bayes time series public health Covid-19

Using Bayesian Structural Times Series to estimate when some North Carolina counties will be vaccinated to a sufficient number.

Michael DeWitt https://michaeldewittjr.com
2021-05-07

This post is completely inspired by [@mjskay](https://twitter.com/mjskay) ’s very interesting analysis/ critique of a New York Times linear extrapolation of when the United States would reach the critical proportion for vaccination. The New York Times has made some amazing graphics during the pandemic, but their analyses have been pretty spotty. Between this linear extrapolation of vaccination rates and their SIRs for third and fourth waves, they really need to put some additional thought into the more math heavy type work.

Matt uses Bayesian Structural Time Series implemented in the bsts package(which I really like) in order to make a more nuanced analysis. Additionally, because we know that the percent vaccinated is monotonically increasing and bounded between 0 and 1, we can make some transforms to our data.

North Carolina

I along with some work colleauges have been maintaining a package called nccovid (available at remotes::install_github("conedatascience/nccovid)) which allows for ready access to North Carolina COVID-19 related data. This does not represent the views of my employer.

Data Prep

First I am going to pull in the needed packages:

Show code

And now we can pull in the North Carolina data:

Show code
dat <- nccovid::get_vaccinations()

nc_oa <- dat[,.(people_partial_vax = sum(people_partial_vax, na.rm = TRUE),
                                         population = sum(population, na.rm = TRUE)), by = "date"] %>% 
  .[,vax:=people_partial_vax/population] %>% 
  .[order(date)]

It’s good to plot these data just to make sure they “feel” right.

Show code
nc_oa %>% 
  ggplot(aes(date, vax))+
  geom_line()+
  labs(
    title = "Cumulative Percent of North Carolinians Vaccinated",
    caption = "Data: North Carolina Department of Health and Human Services\nAnalysis: @medewittjr",
    y = "Percent Vaccinated"
  )+
  scale_y_continuous(labels = scales::percent)+
  geom_smooth(method = "lm", color = "red",alpha = .1, lty = "dashed")
Running Percentage Vaccinated

Figure 1: Running Percentage Vaccinated

So we can see that linear fit for North Carolina likely isn’t valid, at least not in the last few weeks. Regardless, we have out data for the next steps of the analysis.

Data Transformation

As I mentioned earlier and a critical point of this analysis is that we know a few things about vaccinations. Firstly, vaccinations are non-reversible and are thus monotonically increasing (e.g., they can only increase). Additionally, depending on the definition you take, the percentage of the population vaccinated with a first dose is bound between 0 and 1.

With these features in mind, we can use a logit transform on our vaccinations and then fit that on our data.

Show code
nc_log_diff <- nc_oa %>%
  group_by(people_partial_vax) %>% 
  mutate(id = row_number()) %>% 
  filter(id == min(id)) %>% 
  dplyr::select(-id) %>% 
  ungroup() %>% 
  mutate(log_diff_vax = c(NA, log(diff(qlogis(vax))))) %>%
  slice(-1)

Now we can fit the model using the priors that Matt used.

Show code
fit = with(nc_log_diff, bsts(log_diff_vax, 
  state.specification = list() %>%
    AddSemilocalLinearTrend(log_diff_vax,
      level.sigma.prior = SdPrior(0.5, 1),
      slope.mean.prior = NormalPrior(0,0.5),
      initial.level.prior = NormalPrior(0,0.5),
      initial.slope.prior = NormalPrior(0,0.5),
      slope.sigma.prior = SdPrior(0.5, 1),
      slope.ar1.prior = Ar1CoefficientPrior(0, 0.5)
    ) %>%
    AddSeasonal(log_diff_vax, 7, 
      sigma.prior = SdPrior(0.5, 1)
    ),
  prior = SdPrior(0.5, 1),
  niter = 40000,
  seed = 336 
))
=-=-=-=-= Iteration 0 Sat May  8 00:37:50 2021 =-=-=-=-=
=-=-=-=-= Iteration 4000 Sat May  8 00:37:55 2021 =-=-=-=-=
=-=-=-=-= Iteration 8000 Sat May  8 00:38:01 2021 =-=-=-=-=
=-=-=-=-= Iteration 12000 Sat May  8 00:38:07 2021 =-=-=-=-=
=-=-=-=-= Iteration 16000 Sat May  8 00:38:12 2021 =-=-=-=-=
=-=-=-=-= Iteration 20000 Sat May  8 00:38:18 2021 =-=-=-=-=
=-=-=-=-= Iteration 24000 Sat May  8 00:38:24 2021 =-=-=-=-=
=-=-=-=-= Iteration 28000 Sat May  8 00:38:29 2021 =-=-=-=-=
=-=-=-=-= Iteration 32000 Sat May  8 00:38:35 2021 =-=-=-=-=
=-=-=-=-= Iteration 36000 Sat May  8 00:38:41 2021 =-=-=-=-=

Diagnostics (that I did not know could be leveraged from posterior. just a really great package).

Show code
draws <- as_draws(do.call(cbind, fit[startsWith(names(fit), "trend") | startsWith(names(fit), "sigma")]))
summary(draws, median, mad, quantile2, default_convergence_measures()) %>% 
  knitr::kable(digits = 3)
variable median mad q5 q95 rhat ess_bulk ess_tail
sigma.obs 0.377 0.098 0.231 0.548 1.000 1614.460 2614.554
trend.level.sd 0.259 0.064 0.173 0.387 1.000 2493.318 5814.679
trend.slope.mean -0.069 0.042 -0.141 0.002 1.000 8537.857 14278.566
trend.slope.ar.coefficient -0.667 0.190 -0.893 -0.132 1.001 2386.838 2985.400
trend.slope.sd 0.365 0.094 0.234 0.541 1.000 1864.859 4510.872
sigma.seasonal.7 0.240 0.060 0.162 0.363 1.001 2005.757 4519.323

All of the Gelman-Rubin statistics look good as well as the effective sample sizes, so I’m happy with what I see. However, it is always good to visualize the outputs as well:

Show code
bayesplot::mcmc_trace(draws)

Trace plots look pretty good as well.

Predictions

Now we can predict out with our model for 180 days.

Show code
forecast_days <- 180

fits <- nc_log_diff %>%
  add_draws(colSums(aperm(fit$state.contributions, c(2, 1, 3))))

predictions <- nc_log_diff %$%
  tibble(date = max(date) + days(1:forecast_days)) %>%
  add_draws(predict(fit, horizon = forecast_days)$distribution, value = "log_diff_vax")

Now we need to convert our predictions back into percentages:

Show code
pred_vax = predictions %>%
  group_by(.draw) %>%
  mutate(
    vax = plogis(cumsum(c(
      qlogis(tail(nc_log_diff$vax, 1)) + exp(log_diff_vax[[1]]),
      exp(log_diff_vax[-1])
    )))
  )

Results

Now we can visualize the outputs. We see that there is a high degree of uncertainty regarding what proportion of the population will likely be vaccinated in the coming 180 days.

Show code
widths = c(.5, .8, .95)
pred_vax %>%
  ggplot(aes(date, vax)) +
  stat_lineribbon(.width = widths, color = "#08519c") +
  scale_fill_brewer()+
  geom_line(data = nc_oa, size = 1)+
  labs(
    title = "Percent vaccinated (at least partially vaccinated)",
    subtitle = "For the State of North Carolina",
    y = NULL,
    x = NULL
  )+
  scale_y_continuous(limits = c(0,1), labels = scales::percent_format())

The next big question is what will be our likely vaccine coverage. If we want to target 80% of the population vaccinated, which is aggressive, but needed if \(R0\) of SARS-CoV-2 is close to 4-4.5 and the vaccine effectiveness is between 90-95%.

Show code
prob <- pred_vax %>%
  group_by(date) %>%
  summarise(prob_vax_gt_x = mean(vax > .80)) %>%
  ggplot(aes(date, prob_vax_gt_x)) +
  geom_line(color = "#08519c", size = 1) +
  theme_ggdist() +
  geom_hline(yintercept = 0.5, alpha = 0.25) +
  scale_y_continuous(limits = c(0,1), labels = scales::percent_format()) +
  labs(
    subtitle = "Pr(Percent vaccinated > 80%)",
    y = NULL,
    x = NULL
  )

prob

Sadly, it looks like that is a low probability outcome.

What about a county?

Just for fun I want to run through a single North Carolina County and see if there is any difference.

Show code
# Oddly one day showed a decrease...government records....
forsyth_log_diff <- dat[county=="Forsyth",c("county", 
                                            "people_partial_vax",
                                            "population",
                                            "date")] %>%
  .[,vax:=people_partial_vax/population] %>% 
  group_by(people_partial_vax) %>% 
  mutate(id = row_number()) %>% 
  filter(id == min(id)) %>% 
  dplyr::select(-id) %>% 
  ungroup() %>% 
  mutate(log_diff_vax = c(NA, log(diff(qlogis(vax))))) %>%
  filter(!is.na(log_diff_vax)) %>% 
  slice(-1)

fit = with(forsyth_log_diff, bsts(log_diff_vax, 
  state.specification = list() %>%
    AddSemilocalLinearTrend(log_diff_vax,
      level.sigma.prior = SdPrior(0.5, 1),
      slope.mean.prior = NormalPrior(0,0.5),
      initial.level.prior = NormalPrior(0,0.5),
      initial.slope.prior = NormalPrior(0,0.5),
      slope.sigma.prior = SdPrior(0.5, 1),
      slope.ar1.prior = Ar1CoefficientPrior(0, 0.5)
    ) %>%
    AddSeasonal(log_diff_vax, 7, 
      sigma.prior = SdPrior(0.5, 1)
    ),
  prior = SdPrior(0.5, 1),
  niter = 40000,
  seed = 336 
))
=-=-=-=-= Iteration 0 Sat May  8 00:39:21 2021 =-=-=-=-=
=-=-=-=-= Iteration 4000 Sat May  8 00:39:26 2021 =-=-=-=-=
=-=-=-=-= Iteration 8000 Sat May  8 00:39:31 2021 =-=-=-=-=
=-=-=-=-= Iteration 12000 Sat May  8 00:39:36 2021 =-=-=-=-=
=-=-=-=-= Iteration 16000 Sat May  8 00:39:41 2021 =-=-=-=-=
=-=-=-=-= Iteration 20000 Sat May  8 00:39:46 2021 =-=-=-=-=
=-=-=-=-= Iteration 24000 Sat May  8 00:39:51 2021 =-=-=-=-=
=-=-=-=-= Iteration 28000 Sat May  8 00:39:56 2021 =-=-=-=-=
=-=-=-=-= Iteration 32000 Sat May  8 00:40:02 2021 =-=-=-=-=
=-=-=-=-= Iteration 36000 Sat May  8 00:40:07 2021 =-=-=-=-=
Show code
fits <- forsyth_log_diff %>%
  add_draws(colSums(aperm(fit$state.contributions, c(2, 1, 3))))

predictions <- forsyth_log_diff %$%
  tibble(date = max(date) + days(1:forecast_days)) %>%
  add_draws(predict(fit, horizon = forecast_days)$distribution, value = "log_diff_vax")

pred_vax = predictions %>%
  group_by(.draw) %>%
  mutate(
    vax = plogis(cumsum(c(
      qlogis(tail(forsyth_log_diff$vax, 1)) + exp(log_diff_vax[[1]]),
      exp(log_diff_vax[-1])
    )))
  )
Show code
pred_vax %>%
  ggplot(aes(date, vax)) +
  stat_lineribbon(.width = widths, color = "#08519c") +
  scale_fill_brewer()+
  geom_line(data = forsyth_log_diff, size = 1)+
  labs(
    title = "Percent vaccinated (at least partially vaccinated)",
    subtitle = "For the Forsth County, North Carolina",
    y = NULL,
    x = NULL
  )+
  scale_y_continuous(limits = c(0,1), labels = scales::percent_format())

Show code
pred_vax %>%
  group_by(date) %>%
  summarise(prob_vax_gt_x = mean(vax > .80)) %>%
  ggplot(aes(date, prob_vax_gt_x)) +
  geom_line(color = "#08519c", size = 1) +
  theme_ggdist() +
  geom_hline(yintercept = 0.5, alpha = 0.25) +
  scale_y_continuous(limits = c(0,1), labels = scales::percent_format()) +
  labs(
    title = "Pr(Percent vaccinated > 80%)",
    subtitle = "For Forsyth County, North Carolina",
    y = NULL,
    x = NULL
  )

Hmm, not too much better

Show code
knitr::include_graphics("notbad.gif")

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/medewitt/medewitt.github.io, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

DeWitt (2021, May 7). Michael DeWitt: Time to Vaccinate. Retrieved from https://michaeldewittjr.com/programming/2021-05-07-time-to-vaccinate/

BibTeX citation

@misc{dewitt2021time,
  author = {DeWitt, Michael},
  title = {Michael DeWitt: Time to Vaccinate},
  url = {https://michaeldewittjr.com/programming/2021-05-07-time-to-vaccinate/},
  year = {2021}
}