Insufficient Statistics - Sensitivity and Specificity

One important concept that is discuss often is the role of sensitivity and specificity in testing. Here we are looking at the ability of a given diagnostic test to correctly identify true positives and correctly identify false positives. If you always forget which one is sensitivity and which one is specifity, no worries, I feel like everyone googles it.

There has been a lot written already about this topic, and I certainly won’t add anymore than has already been written. Regardless, I would like to just do some

library(rstan)

Loading required package: StanHeaders


rstan version 2.26.3 (Stan version 2.26.1)

For execution on a local, multicore CPU with excess RAM we recommend calling
options(mc.cores = parallel::detectCores()).
To avoid recompilation of unchanged Stan programs, we recommend calling
rstan_options(auto_write = TRUE)
For within-chain threading using `reduce_sum()` or `map_rect()` Stan functions,
change `threads_per_chain` option:
rstan_options(threads_per_chain = 1)

library(cmdstanr)

This is cmdstanr version 0.4.0

- Online documentation and vignettes at mc-stan.org/cmdstanr

- CmdStan path set to: /Users/michael/.cmdstanr/cmdstan-2.28.2

- Use set_cmdstan_path() to change the path


A newer version of CmdStan is available. See ?install_cmdstan() to install it.
To disable this check set option or environment variable CMDSTANR_NO_VER_CHECK=TRUE.

library(ggplot2)
theme_set(theme_minimal())

First, I’ll start with the model from Gelman and Carpenter and make a few tweaks, and then explore the results.

writeLines(readLines("seroprevalence.stan"))

// The input data is a vector 'y' of length 'N'.
data {
  int<lower = 0> y_sample;
  int<lower = 0> n_sample;
  real sensitivity;
  real specificity;
  real sensitivity_sd;
  real specificity_sd;
}

// The parameters accepted by the model. Our model
// accepts two parameters 'mu' and 'sigma'.
parameters {
  real<lower=0, upper = 1> p;
  real<lower=0, upper = 1> spec;
  real<lower=0, upper = 1> sens;
}

// The positivites observed are a function of 
// sensitivity and specificity of test then
// drawn from a bininomial distribution
model {
  real p_sample = p * sens + (1 - p) * (1 - spec);
  y_sample ~ binomial(n_sample, p_sample);
  
  // Use Priors for Sensitivity and Specificity
  // Truncated Normal Because That's What is 
  // Available
  
  spec ~ normal(specificity, specificity_sd);
  sens ~ normal(sensitivity, sensitivity_sd);
  
}

I made a slight tweak in that the sensitivity and specificity mean and variance are set by the user rather than estimating them from data. Most of the time we don’t have data like these in the hospital where we swab someone multiple times and know the ground “truth.” This would be the equivalent of reading these testing stats off of the proverbial box.

CmdStan path set to: /Users/michael/.cmdstanr/cmdstan-2.24.0-rc1

We can go ahead and compile our model using cmdstan_model. You’ll note that I am using the cmdstanr package rather than rstan for compiling and sampling from Stan. This decision is intentional…mainly because I chronically have issues with upgrading rstan on every platform. cmstand seeks to ameliorate this by being a thin wrapper to the command line version of stan. It’s an easy switch, but there are a few differences.

mod <- cmdstan_model("seroprevalence.stan")

Wake Forest University Baptist Medical Center is doing a large study for seroprevalence in North Carolina so that is a good place to start.

They list 19,057 study participants, but this is a multi-part study where not all the persons receive a test kit to test for antibodies. The participants receive a daily questionaire for symptoms and SARS-CoV-2 positive contacts (it doesn’t use that language, just Covid-19 but recall that’s the disease and not the virus). Let’s just assume 25% of the participants receive a test kit. Based on their numbers 9% on average (8-10%) have tested positive. This is somewhat inline with existing seroprevalence studies (Stringhini et al. 2020; Pollán et al. 2020; Havers et al. 2020).

Running Through Scenarios

So I looked up the sensitivity and specificity for the antibody test being used in this study (Scanwell) and plugged in those initial values.¹

postitives <- 19057*.09*.25
n <- as.integer(19057*.25)

dat_medium_sensi <- list(
    y_sample = as.integer(postitives),
    n_sample = n,
    sensitivity = .863,
    specificity = .999,
    sensitivity_sd = .01,
    specificity_sd= .01
)

fit_medium_sensi <- mod$sample(data = dat_medium_sensi,chains = 2,
                  iter_warmup = 500, refresh = 0,
                  iter_sampling = 1000, 
                  adapt_delta = .95)

Running MCMC with 2 chains, at most 4 in parallel...

Chain 1 finished in 0.1 seconds.
Chain 2 finished in 0.1 seconds.

Both chains finished successfully.
Mean chain execution time: 0.1 seconds.
Total execution time: 0.4 seconds.

Warning in seq.default(from = 1, len = along - 1): partial argument match of
'len' to 'length.out'

Warning in seq.default(to = N - 1, len = N - along): partial argument match of
'len' to 'length.out'

Warning in seq.default(len = N): partial argument match of 'len' to 'length.out'

Warning in seq.default(along = arg.names): partial argument match of 'along' to
'along.with'

Warning in seq.default(len = length(arg.list)): partial argument match of 'len'
to 'length.out'

Warning in seq.default(along = perm): partial argument match of 'along' to
'along.with'

Warning in seq.default(along = perm): partial argument match of 'along' to
'along.with'

Warning in seq.default(along = perm): partial argument match of 'along' to
'along.with'

Warning in seq.default(along = perm): partial argument match of 'along' to
'along.with'

Warning in seq.default(len = ncol(arg.dim)): partial argument match of 'len' to
'length.out'

Warning in seq.default(len = N): partial argument match of 'len' to 'length.out'

Warning in seq.default(along = arg.names): partial argument match of 'along' to
'along.with'

Warning in seq.default(along = arg.names): partial argument match of 'along' to
'along.with'

Warning in seq.default(len = length(arg.names)): partial argument match of 'len'
to 'length.out'

Warning in seq.default(along = perm): partial argument match of 'along' to
'along.with'

So let’s see where this gives us:

knitr::kable(fit_medium_sensi$summary("p")[,c("mean", "q5", "q95")], digits = 3)

Warning in seq.default(from = 1, len = along - 1): partial argument match of
'len' to 'length.out'

Warning in seq.default(to = N - 1, len = N - along): partial argument match of
'len' to 'length.out'

Warning in seq.default(len = N): partial argument match of 'len' to 'length.out'

Warning in seq.default(along = arg.names): partial argument match of 'along' to
'along.with'

Warning in seq.default(len = length(arg.list)): partial argument match of 'len'
to 'length.out'

Warning in seq.default(along = perm): partial argument match of 'along' to
'along.with'

Warning in seq.default(along = perm): partial argument match of 'along' to
'along.with'

Warning in seq.default(along = perm): partial argument match of 'along' to
'along.with'

Warning in seq.default(along = perm): partial argument match of 'along' to
'along.with'

Warning in seq.default(len = ncol(arg.dim)): partial argument match of 'len' to
'length.out'

Warning in seq.default(len = N): partial argument match of 'len' to 'length.out'

Warning in seq.default(along = arg.names): partial argument match of 'along' to
'along.with'

Warning in seq.default(along = arg.names): partial argument match of 'along' to
'along.with'

Warning in seq.default(len = length(arg.names)): partial argument match of 'len'
to 'length.out'

Warning in seq.default(along = perm): partial argument match of 'along' to
'along.with'

mean	q5	q95
0.096	0.081	0.108

bayesplot::mcmc_hist(fit_medium_sensi$draws("p"))+
  labs(
    title = "Posterior Draws for P",
    subtitle = "Seroprevalence"
  )+
  geom_vline(xintercept = .09, lty = "dashed", colour = "red")

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Which puts us right in the error range that Wake reported. That’s good.

Now let’s test to see what happens if we losen the test. Let’s say that the sensivity is lower…closer to 65% which some reason (could be that many people have below antibody titers below the lower detection threshold. This means that there would be false negative).

dat_low_sensi <- list(
    y_sample = as.integer(postitives),
    n_sample = n,
    sensitivity = .65,
    specificity = .999,
    sensitivity_sd = .05,
    specificity_sd= .01
)

fit_low_sensi <- mod$sample(data = dat_low_sensi,chains = 2,
                  iter_warmup = 500, refresh = 0,
                  iter_sampling = 1000, 
                  adapt_delta = .95)

Running MCMC with 2 chains, at most 4 in parallel...

Chain 1 finished in 0.1 seconds.
Chain 2 finished in 0.1 seconds.

Both chains finished successfully.
Mean chain execution time: 0.1 seconds.
Total execution time: 0.2 seconds.

Warning in seq.default(from = 1, len = along - 1): partial argument match of
'len' to 'length.out'

Warning in seq.default(to = N - 1, len = N - along): partial argument match of
'len' to 'length.out'

Warning in seq.default(len = N): partial argument match of 'len' to 'length.out'

Warning in seq.default(along = arg.names): partial argument match of 'along' to
'along.with'

Warning in seq.default(len = length(arg.list)): partial argument match of 'len'
to 'length.out'

Warning in seq.default(along = perm): partial argument match of 'along' to
'along.with'

Warning in seq.default(along = perm): partial argument match of 'along' to
'along.with'

Warning in seq.default(along = perm): partial argument match of 'along' to
'along.with'

Warning in seq.default(along = perm): partial argument match of 'along' to
'along.with'

Warning in seq.default(len = ncol(arg.dim)): partial argument match of 'len' to
'length.out'

Warning in seq.default(len = N): partial argument match of 'len' to 'length.out'

Warning in seq.default(along = arg.names): partial argument match of 'along' to
'along.with'

Warning in seq.default(along = arg.names): partial argument match of 'along' to
'along.with'

Warning in seq.default(len = length(arg.names)): partial argument match of 'len'
to 'length.out'

Warning in seq.default(along = perm): partial argument match of 'along' to
'along.with'

Now we can look at the results:

knitr::kable(fit_low_sensi$summary("p")[,c("mean", "q5", "q95")], digits = 3)

Warning in seq.default(from = 1, len = along - 1): partial argument match of
'len' to 'length.out'

Warning in seq.default(to = N - 1, len = N - along): partial argument match of
'len' to 'length.out'

Warning in seq.default(len = N): partial argument match of 'len' to 'length.out'

Warning in seq.default(along = arg.names): partial argument match of 'along' to
'along.with'

Warning in seq.default(len = length(arg.list)): partial argument match of 'len'
to 'length.out'

Warning in seq.default(along = perm): partial argument match of 'along' to
'along.with'

Warning in seq.default(along = perm): partial argument match of 'along' to
'along.with'

Warning in seq.default(along = perm): partial argument match of 'along' to
'along.with'

Warning in seq.default(along = perm): partial argument match of 'along' to
'along.with'

Warning in seq.default(len = ncol(arg.dim)): partial argument match of 'len' to
'length.out'

Warning in seq.default(len = N): partial argument match of 'len' to 'length.out'

Warning in seq.default(along = arg.names): partial argument match of 'along' to
'along.with'

Warning in seq.default(along = arg.names): partial argument match of 'along' to
'along.with'

Warning in seq.default(len = length(arg.names)): partial argument match of 'len'
to 'length.out'

Warning in seq.default(along = perm): partial argument match of 'along' to
'along.with'

mean	q5	q95
0.128	0.105	0.154

Yikes! Here we see that we could be under-estimating prevalence.

bayesplot::mcmc_hist(fit_low_sensi$draws("p"))+
  labs(
    title = "Posterior Draws for P",
    subtitle = "Seroprevalence"
  )+
  geom_vline(xintercept = .09, lty = "dashed", colour = "red")

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Now maybe another scenario, is that the sensitivity is the same as the “box” value, but there is some additional noise from testing error.

dat_medi_low_sensi <- list(
    y_sample = as.integer(postitives),
    n_sample = n,
    sensitivity = .85,
    specificity = .999,
    sensitivity_sd = .05,
    specificity_sd= .01
)

fit_medi_low_sensi <- mod$sample(data = dat_medi_low_sensi,
                            chains = 2,
                  iter_warmup = 500, refresh = 0,
                  iter_sampling = 1000, 
                  adapt_delta = .95)

Running MCMC with 2 chains, at most 4 in parallel...

Chain 1 finished in 0.1 seconds.
Chain 2 finished in 0.1 seconds.

Both chains finished successfully.
Mean chain execution time: 0.1 seconds.
Total execution time: 0.1 seconds.

Warning in seq.default(from = 1, len = along - 1): partial argument match of
'len' to 'length.out'

Warning in seq.default(to = N - 1, len = N - along): partial argument match of
'len' to 'length.out'

Warning in seq.default(len = N): partial argument match of 'len' to 'length.out'

Warning in seq.default(along = arg.names): partial argument match of 'along' to
'along.with'

Warning in seq.default(len = length(arg.list)): partial argument match of 'len'
to 'length.out'

Warning in seq.default(along = perm): partial argument match of 'along' to
'along.with'

Warning in seq.default(along = perm): partial argument match of 'along' to
'along.with'

Warning in seq.default(along = perm): partial argument match of 'along' to
'along.with'

Warning in seq.default(along = perm): partial argument match of 'along' to
'along.with'

Warning in seq.default(len = ncol(arg.dim)): partial argument match of 'len' to
'length.out'

Warning in seq.default(len = N): partial argument match of 'len' to 'length.out'

Warning in seq.default(along = arg.names): partial argument match of 'along' to
'along.with'

Warning in seq.default(along = arg.names): partial argument match of 'along' to
'along.with'

Warning in seq.default(len = length(arg.names)): partial argument match of 'len'
to 'length.out'

Warning in seq.default(along = perm): partial argument match of 'along' to
'along.with'

bayesplot::mcmc_hist(fit_medi_low_sensi$draws("p"))+
  labs(
    title = "Posterior Draws for P",
    subtitle = "Seroprevalence"
  )+
  geom_vline(xintercept = .09, lty = "dashed", colour = "red")

Warning in seq.default(from = 1, len = along - 1): partial argument match of
'len' to 'length.out'

Warning in seq.default(to = N - 1, len = N - along): partial argument match of
'len' to 'length.out'

Warning in seq.default(len = N): partial argument match of 'len' to 'length.out'

Warning in seq.default(along = arg.names): partial argument match of 'along' to
'along.with'

Warning in seq.default(len = length(arg.list)): partial argument match of 'len'
to 'length.out'

Warning in seq.default(along = perm): partial argument match of 'along' to
'along.with'

Warning in seq.default(along = perm): partial argument match of 'along' to
'along.with'

Warning in seq.default(along = perm): partial argument match of 'along' to
'along.with'

Warning in seq.default(along = perm): partial argument match of 'along' to
'along.with'

Warning in seq.default(len = ncol(arg.dim)): partial argument match of 'len' to
'length.out'

Warning in seq.default(len = N): partial argument match of 'len' to 'length.out'

Warning in seq.default(along = arg.names): partial argument match of 'along' to
'along.with'

Warning in seq.default(along = arg.names): partial argument match of 'along' to
'along.with'

Warning in seq.default(len = length(arg.names)): partial argument match of 'len'
to 'length.out'

Warning in seq.default(along = perm): partial argument match of 'along' to
'along.with'

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

We could still be missing some folks.

Specificity?

Just for completeness, it is worth looking at what happens if specificity degrades. This is generally unlikely in many of these tests (especially the case in PCR tests where if there is viral RNA present, it will grow..e.g. if you have a positive tests you are positive).

dat_low_spec <- list(
    y_sample = as.integer(postitives),
    n_sample = n,
    sensitivity = .85,
    specificity = .90,
    sensitivity_sd = .05,
    specificity_sd= .01
)

fit_low_spec <- mod$sample(data = dat_low_spec,
                            chains = 2,
                  iter_warmup = 500, refresh = 0,
                  iter_sampling = 1000, 
                  adapt_delta = .95)

Running MCMC with 2 chains, at most 4 in parallel...

Chain 1 finished in 0.1 seconds.
Chain 2 finished in 0.1 seconds.

Both chains finished successfully.
Mean chain execution time: 0.1 seconds.
Total execution time: 0.2 seconds.

Warning in seq.default(from = 1, len = along - 1): partial argument match of
'len' to 'length.out'

Warning in seq.default(to = N - 1, len = N - along): partial argument match of
'len' to 'length.out'

Warning in seq.default(len = N): partial argument match of 'len' to 'length.out'

Warning in seq.default(along = arg.names): partial argument match of 'along' to
'along.with'

Warning in seq.default(len = length(arg.list)): partial argument match of 'len'
to 'length.out'

Warning in seq.default(along = perm): partial argument match of 'along' to
'along.with'

Warning in seq.default(along = perm): partial argument match of 'along' to
'along.with'

Warning in seq.default(along = perm): partial argument match of 'along' to
'along.with'

Warning in seq.default(along = perm): partial argument match of 'along' to
'along.with'

Warning in seq.default(len = ncol(arg.dim)): partial argument match of 'len' to
'length.out'

Warning in seq.default(len = N): partial argument match of 'len' to 'length.out'

Warning in seq.default(along = arg.names): partial argument match of 'along' to
'along.with'

Warning in seq.default(along = arg.names): partial argument match of 'along' to
'along.with'

Warning in seq.default(len = length(arg.names)): partial argument match of 'len'
to 'length.out'

Warning in seq.default(along = perm): partial argument match of 'along' to
'along.with'

bayesplot::mcmc_hist(fit_low_spec$draws("p"))+
  labs(
    title = "Posterior Draws for P",
    subtitle = "Seroprevalence"
  )+
  geom_vline(xintercept = .09, lty = "dashed", colour = "red")

Warning in seq.default(from = 1, len = along - 1): partial argument match of
'len' to 'length.out'

Warning in seq.default(to = N - 1, len = N - along): partial argument match of
'len' to 'length.out'

Warning in seq.default(len = N): partial argument match of 'len' to 'length.out'

Warning in seq.default(along = arg.names): partial argument match of 'along' to
'along.with'

Warning in seq.default(len = length(arg.list)): partial argument match of 'len'
to 'length.out'

Warning in seq.default(along = perm): partial argument match of 'along' to
'along.with'

Warning in seq.default(along = perm): partial argument match of 'along' to
'along.with'

Warning in seq.default(along = perm): partial argument match of 'along' to
'along.with'

Warning in seq.default(along = perm): partial argument match of 'along' to
'along.with'

Warning in seq.default(len = ncol(arg.dim)): partial argument match of 'len' to
'length.out'

Warning in seq.default(len = N): partial argument match of 'len' to 'length.out'

Warning in seq.default(along = arg.names): partial argument match of 'along' to
'along.with'

Warning in seq.default(along = arg.names): partial argument match of 'along' to
'along.with'

Warning in seq.default(len = length(arg.names)): partial argument match of 'len'
to 'length.out'

Warning in seq.default(along = perm): partial argument match of 'along' to
'along.with'

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Yikes! This shows the importance of having high specificity!!!

References

Havers, Fiona P., Carrie Reed, Travis Lim, Joel M. Montgomery, John D. Klena, Aron J. Hall, Alicia M. Fry, et al. 2020. “Seroprevalence of Antibodies to SARS-CoV-2 in 10 Sites in the United States, March 23-May 12, 2020.” JAMA Internal Medicine, July. https://doi.org/10.1001/jamainternmed.2020.4130.

Pollán, Marina, Beatriz Pérez-Gómez, Roberto Pastor-Barriuso, Jesús Oteo, Miguel A. Hernán, Mayte Pérez-Olmeda, Jose L. Sanmartín, et al. 2020. “Prevalence of SARS-CoV-2 in Spain (ENE-COVID): A Nationwide, Population-Based Seroepidemiological Study.” The Lancet 0 (0). https://doi.org/10.1016/S0140-6736(20)31483-5.

Stringhini, Silvia, Ania Wisniak, Giovanni Piumatti, Andrew S Azman, Stephen A Lauer, Helene Baysson, David De Ridder, et al. 2020. “Repeated Seroprevalence of Anti-SARS-CoV-2 IgG Antibodies in a Population-Based Sample from Geneva, Switzerland.” Preprint. Infectious Diseases (except HIV/AIDS). https://doi.org/10.1101/2020.05.02.20088898.

Footnotes

Note that there was another study that found in a clinical setting the results were worse, but that’s why we have simulations.↩︎

Reuse

https://creativecommons.org/licenses/by/4.0/

Citation

BibTeX citation:

@online{dewitt2020,
  author = {Michael DeWitt},
  title = {Sensitivity and {Specificity}},
  date = {2020-08-09},
  url = {https://michaeldewittjr.com/programming/2020-08-09-sensitivity-and-specificity},
  langid = {en}
}

For attribution, please cite this work as:

Michael DeWitt. 2020. “Sensitivity and Specificity.” August 9, 2020. https://michaeldewittjr.com/programming/2020-08-09-sensitivity-and-specificity.