ggdist and Epidemic Curves

pandemic scenarios curve statistics

This post explores using tools to summarise curves rather than fixed time summary methods. This includes using odin and ggdist to explore the risk of underestimating epidemic curves.

Michael DeWitt
Show code

This little post is based on a recent paper by Juul et al. (2020). For infectious disease modelling we are often running many scenarios through the models. Why? Because there are many parameters to estimate, but very few things that we actually observe (cases by day or incidence) so in some sense our models all suffer from identifiability issues. Additionally, many of the parameters that we do measure have some uncertainty around them. For this reason it is very common to run models with different parameters estimates or distributions in order to understand a range of possible values. A simple way to summarise these models is through point-wise quantiles through time. The genius of Juul’s paper is that this can underestimate many outcomes because it might pick up the beginning of one epidemic curve and miss the peak.

Enter curve estimates. These measures actually look at the curves proper through the trajectory rather than at discrete time points. Here we can rank the entire curve and capture the contribution of the curve rather than the point-wise estimate.

To see this let’s run a simple example from the odin package (FitzJohn 2019). Additionally, the amazing ggdist (Kay 2020) provides an easy way to summarise curves.

Show code
sir_model <- system.file("examples/discrete_stochastic_sir_arrays.R", package = "odin")

## Core equations for transitions between compartments:
update(S[]) <- S[i] - n_SI[i]
update(I[]) <- I[i] + n_SI[i] - n_IR[i]
update(R[]) <- R[i] + n_IR[i]

## Individual probabilities of transition:
p_SI[] <- 1 - exp(-beta * I[i] / N[i])
p_IR <- 1 - exp(-gamma)

## Draws from binomial distributions for numbers changing between
## compartments:
n_SI[] <- rbinom(S[i], p_SI[i])
n_IR[] <- rbinom(I[i], p_IR)

## Total population size
N[] <- S[i] + I[i] + R[i]

## Initial states:
initial(S[]) <- S_ini
initial(I[]) <- I_ini
initial(R[]) <- 0

## User defined parameters - default in parentheses:
S_ini <- user(1000)
I_ini <- user(1)
beta <- user(0.2)
gamma <- user(0.1)

## Number of replicates
nsim <- user(100)
dim(N) <- nsim
dim(S) <- nsim
dim(I) <- nsim
dim(R) <- nsim
dim(p_SI) <- nsim
dim(n_SI) <- nsim
dim(n_IR) <- nsim
Show code
x <- odin(sir_model)
─  installing *source* package ‘discrete.stochastic.sir.arrays30f32291’ ...

   ** using staged installation

   ** libs

   /usr/local/opt/llvm/bin/clang  -I"/usr/local/Cellar/r/4.1.0/lib/R/include" -DNDEBUG   -I/usr/local/opt/gettext/include -I/usr/local/opt/readline/include -I/usr/local/opt/xz/include -I/usr/local/include   -fPIC  -g -O3 -Wall -pedantic -std=gnu99 -mtune=native -pipe -c odin.c -o odin.o

   /usr/local/opt/llvm/bin/clang  -I"/usr/local/Cellar/r/4.1.0/lib/R/include" -DNDEBUG   -I/usr/local/opt/gettext/include -I/usr/local/opt/readline/include -I/usr/local/opt/xz/include -I/usr/local/include   -fPIC  -g -O3 -Wall -pedantic -std=gnu99 -mtune=native -pipe -c registration.c -o registration.o

   /usr/local/opt/llvm/bin/clang -dynamiclib -Wl,-headerpad_max_install_names -undefined dynamic_lookup -single_module -multiply_defined suppress -L/usr/local/Cellar/r/4.1.0/lib/R/lib -L/usr/local/opt/gettext/lib -L/usr/local/opt/readline/lib -L/usr/local/opt/xz/lib -L/usr/local/lib -o odin.o registration.o -L/usr/local/Cellar/r/4.1.0/lib/R/lib -lR -lintl -Wl,-framework -Wl,CoreFoundation

   installing to /private/var/folders/0x/6bnjy4n15kz8nbfbbwbyk3pr0000gn/T/RtmpNf7yrC/devtools_install_12c3a57ffc9a2/00LOCK-file12c3a59188454/00new/discrete.stochastic.sir.arrays30f32291/libs

   ** checking absolute paths in shared objects and dynamic libraries

─  DONE (discrete.stochastic.sir.arrays30f32291)
Show code
fit <- x()

sampz <-$run(step = 100))

Now that we have values, a simple analysis would look like the following:

Show code
val_long <- sampz %>% 
  select(step, contains("I")) %>% 
  tidyr::gather(key, value, -step) %>% 
  mutate(.draw = readr::parse_number(stringr::str_extract(key,"\\d+")))

sumz_point <- val_long %>% 
  group_by(step) %>% 
  summarise(q05 = quantile(value, probs = .05),
            q95 = quantile(value, probs = .95))

val_long %>% 
  ggplot(aes(step, y = value, group = .draw))+
  geom_line(alpha = .2)+
  geom_ribbon(data = sumz_point, 
              aes(x = step, ymin = q05, ymax = q95), 
              inherit.aes = FALSE, fill = "orange", alpha =.2)+
    title = "Simulated SIR Curve for Infections"

If we take the curve distribution approach we can see the following

Show code
p <- val_long %>% 
  group_by(step) %>%
  curve_interval(value, .width = c(.5, .8, .95)) %>%
  ggplot(aes(x = step, y = value)) +
  geom_hline(yintercept = 1, color = "gray75", linetype = "dashed") +
  geom_lineribbon(aes(ymin = .lower, ymax = .upper)) +
  scale_fill_brewer() +
    title = "Simulated SIR Curve for Infections"


Now when we lay these ontop of one another, we see the issue with using the time interval method:

Show code
  geom_ribbon(data = sumz_point, 
              aes(x = step, ymin = q05, ymax = q95), 
              inherit.aes = FALSE, fill = "orange", alpha =.5)

We see that using the fixed time method we underestimate the peak much of the time! This is very dangerous and important when setting policy and establishing needs.

FitzJohn, Rich. 2019. Odin: ODE Generation and Integration.
Juul, Jonas L., Kaare Græsbøll, Lasse Engbo Christiansen, and Sune Lehmann. 2020. “Fixed-Time Descriptive Statistics Underestimate Extremes of Epidemic Curve Ensembles.” arXiv:2007.05035 [Physics, q-Bio, Stat], July.
Kay, Matthew. 2020. ggdist: Visualizations of Distributions and Uncertainty.



If you see mistakes or want to suggest changes, please create an issue on the source repository.


Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".


For attribution, please cite this work as

DeWitt (2020, Aug. 9). Michael DeWitt: ggdist and Epidemic Curves. Retrieved from

BibTeX citation

  author = {DeWitt, Michael},
  title = {Michael DeWitt: ggdist and Epidemic Curves},
  url = {},
  year = {2020}