This is partially just for me to remember how to parameterise a Negative Binomial distribution. Typically, the Negative Binomial is used for the number of Bernoulli trials before the first success or number of successes is achieved (and thus a discrete distribution). For some probability of success, \(p\), given \(r\) failures and \(k\) successes we can define the probability mass function as:

\[ {k + r - 1 \choose k} * (1-p)^r*p^k \]

That is at least the textbook definition. Typically we encounter the negative binomial distribution as a way to deal with over-dispersed Poisson processes (and thus is really a mixture model of gamma and poison distributions).

## Basic Reproduction Number and Dispersion

In outbreaks and epidemics we are typically interested in the basic reproduction number, \(R_0\), which represents the average number of secondary infections caused by an index infection in a fully susceptible population. One could simulate such a branching process using a Poisson branching process.

Borrowing some simulation code from Althaus (2015) we can first simulate transmission chains with a Poisson distribution and an R of .8.

```
<- 2.5
R0 <- 1
k
<- epitrix::gamma_mucv2shapescale(mu = 3,cv = .5)
gamma_params
<- distcrete::distcrete(name = "gamma", gamma_params$scale,
serial_interval shape =gamma_params$shape,
w = 0)
# Set seed for random number generator
set.seed(645)
# Number of simulation runs
<- 1e2
runs # Number of initial cases
<- 1
seed # Initialize plot
plot(NA,xlim=c(0,100),ylim=c(0,100),xlab="Time (days)",
ylab="Cumulative number of cases",frame=FALSE)
```

```
Warning in plot.default(NA, xlim = c(0, 100), ylim = c(0, 100), xlab = "Time
(days)", : partial argument match of 'frame' to 'frame.plot'
```

```
# Set color scheme for different trajectories
<- sample(terrain.colors(runs))
cols # Simulate outbreak trajectories
for(i in 1:runs) {
<- seed
cases <- rep(0, seed)
t <- t
times while (cases > 0) {
<- rpois(cases,.8)
secondary <- numeric()
t.new for (j in 1:length(secondary)) {
<-
t.new c(
t.new,+ serial_interval$r(secondary[j])
t[j]
)
}<- length(t.new)
cases <- t.new
t <- c(times, t.new)
times
}lines(sort(times),
1:length(times),
col = cols[i],
lwd = 1)
points(max(times), length(times), col = cols[i], pch = 16)
}
```

### Super-Spreading

One of the buzzwords with the SARS-CoV-2 Pandemic has been super-spreading, and there is evidence that super-spreading events play a large role in the transmission of SARS-CoV-2 (Endo et al. 2020). Super-spreading refers to those index cases which transmit to a larger number of people than would have been expected (Lloyd-Smith et al. 2005). Larger than expected is a bit of a loose definition, do have a way of measuring this using our reproduction number (or even the basic reproduction number). Now it is true that the reproduction number and basic reproduction number are both estimated themselves from the data with associated measurement and missing data problems, but this is a thought exercise so let’s assume we know them.

We measure the degree of super-spreading using the dispersion parameter called k (or \(\phi\)). The closer the dispersion parameter is to 1, the more Poisson-like is the branching process. Conversely, as k the dispersion parameter approaches 0, the more super-spreading events there will be. We can see this as we incease the number of samples that the negative binomial with \(\phi\) converges to the mean of Poisson distribution, while the smaller \(\phi\) does not converge.

```
<- data.frame(
sims <- purrr::map_dbl(1:1000, ~mean(rpois(.x, R0))),
ran_poi <- purrr::map_dbl(1:1000, ~mean(rnbinom(.x, 1, mu = R0))),
ran_nb <- purrr::map_dbl(1:1000, ~mean(rnbinom(.x, .1, mu = R0)))
ran_nb_ss
)
matplot(sims, type = "l",
main = "Convergence of Mean",
xlab = "Samples", ylab = "Mean", adj = 0, ylim = c(0,10))
legend("topright", fill = c(1,2,3),legend = c("Poisson",
"NB with K=1",
"NB with k=.1"))
```

With the same reproduction number, but a dispersion of 0.1, we can see that some of the transmission chains are much larger. Thus is could be said that super-spreading transmission might be slow to take hold (easy to have chains that terminate quickly), but it can start wildfires; this tranmission pattern and its effect on controlling epidemics is discussed in Fraser et al. (2004).

```
<- epitrix::gamma_mucv2shapescale(mu = 3,cv = .5)
gamma_params
<- distcrete::distcrete(name = "gamma", gamma_params$scale,
serial_interval shape =gamma_params$shape,
w = 0)
# Set seed for random number generator
set.seed(645)
# Number of simulation runs
<- 1e2
runs # Number of initial cases
<- 1
seed # Initialize plot
plot(NA,xlim=c(0,100),ylim=c(0,100),xlab="Time (days)",
ylab="Cumulative number of cases",frame=FALSE)
```

```
Warning in plot.default(NA, xlim = c(0, 100), ylim = c(0, 100), xlab = "Time
(days)", : partial argument match of 'frame' to 'frame.plot'
```

```
# Set color scheme for different trajectories
<- sample(terrain.colors(runs))
cols # Simulate outbreak trajectories
<- list()
collector for(i in 1:runs) {
<- seed
cases <- rep(0, seed)
t <- t
times while (cases > 0) {
<-
secondary rnbinom(cases,
1,
.mu = .9)
<- numeric()
t.new for (j in 1:length(secondary)) {
<-
t.new c(
t.new,+ serial_interval$r(
t[j]
secondary[j]
)
)
}<- length(t.new)
cases <- t.new
t <- c(times, t.new)
times
}<- length(times)
collector[[i]] lines(sort(times),
1:length(times),
col = cols[i],
lwd = 1)
points(max(times), length(times), col = cols[i], pch = 16)
}
```

```
<- do.call(rbind, collector) - 1
transmission_chains hist(transmission_chains,
breaks = 30,
main = "Number of Secondary Cases",
adj = 0, col = "steelblue", xlab = "Secondary Cases")
```

## Proportion Responsible

Again, borrowing code from Endo et al. (2020) to estimate the proportion of infections that are responsible for a given proportion of cases.

`<- fitdistrplus::fitdist(data =as.vector(transmission_chains), distr = "nbinom" ) nb_fit `

```
=function(R0,k,prop){
prop_responsible=qnbinom(1-prop,k+1,mu=R0*(k+1)/k)
qm1=1-prop-pnbinom(qm1-1,k+1,mu=R0*(k+1)/k)
remq=remq/dnbinom(qm1,k+1,mu=R0*(k+1)/k)
remx=qm1+1
q1-pnbinom(q-1,k,mu=R0)-dnbinom(q,k,mu=R0)*remx
}
prop_responsible(nb_fit$estimate[[2]],nb_fit$estimate[[1]],.8)
```

`[1] 0.02311288`

## References

*The Lancet Infectious Diseases*15 (5): 507–8. https://doi.org/10.1016/S1473-3099(15)70135-0.

*Wellcome Open Research*5 (April): 67. https://doi.org/10.12688/wellcomeopenres.15842.1.

*Proceedings of the National Academy of Sciences*101 (16): 6146–51. https://doi.org/10.1073/pnas.0307506101.

*Nature*438 (7066): 355–59. https://doi.org/10.1038/nature04153.

## Reuse

## Citation

```
@online{dewitt2020,
author = {Michael DeWitt},
editor = {},
title = {Negative {Binomial} {Distribution} and {Epidemics}},
date = {2020-09-01},
url = {https://michaeldewittjr.com/programming/2020-08-30-negative-binomial-distribution-and-epidemics},
langid = {en}
}
```