Advantage of Bayesian Hypothesis Testing

Bayes brms hypothesis contrasts Covid-19

Here looking at the differences in traditional linear contrasts versus Bayesian Hypothesis testing.

Michael DeWitt https://michaeldewittjr.com
2021-11-19

A colleague sent me the following table from a report from the UK REACT Survey:

REACT Supplemental Table of Reported Symptoms by Variant

The argument being made was that those persons who were infected with the AY4.2 variant were less likely to present with traditional COVID-19-like symptoms. This is problematic for a variety of reasons, mainly that we have been told for months what to look for as far as symptomology. Additionally, we have protocols in clinics and hospitals to triage patients, especially when testing resources are scarce. There is also a time component – if people don’t exhibit symptoms (that they think are COVID-19) or soon, they may unwittingly transmit the infection.

So Let’s Recreate the Analysis

Just for fun, I think it would be nice to re-analyze these data. I’ll make them here:

suppressPackageStartupMessages(library(tidyverse))
library(brms)

sx <- tribble(
  ~"variant", ~"sx", ~"n",
  "AY.4", 224, 484,
  "AY4.2", 33, 99,
  "AY.43", 34, 40,
  "AY.44", 7, 13,
  "AY.5", 13, 21,
  "AY.6", 7, 13,
  "B1.617.2", 50, 108
  ) %>% 
  mutate(variant_use = gsub(pattern = "\\.", "_", variant)) %>% 
  group_by(variant_use) %>% 
  nest() %>% 
  mutate(ci = map(data, ~broom::tidy(
    PropCIs::exactci(.x$sx, .x$n, conf.level = .95)))) %>% 
  unnest(cols = c(data, ci))

As always, it is a good practice to graph the data. I have added some exact confidence intervals as well given the large differences in sample size amongst the variants.

sx$prop <- with(sx, sx/n)

sx %>% 
  ggplot(aes(reorder(variant,prop)))+
  geom_pointrange(aes(y = prop, ymin = conf.low, ymax = conf.high))+
  theme_classic()+
  coord_flip()+
  labs(
    title = "Proportion of Patients with COVID-19 Sx by Variant",
    subtitle = "95% Confidence Intervals Shown"
  )

Frequentist Approach

Now we can model this using the MLE frequentist approach. Note the fancy binomial syntax given that we are looking at successes, here defined as having COVID-19 like symptoms, out of those testing positive.

We can then use the emmeans package to look at the contrasts.

fit <- glm(cbind(sx, n) ~ variant, data = sx, family = binomial())


library(emmeans)

o <- emmeans(fit, "variant", type = "response")
pwpm(o, diffs = TRUE)
            AY.4   AY.43   AY.44    AY.5    AY.6   AY4.2 B1.617.2
AY.4     [0.316]  0.1731  0.9999  0.9847  0.9999  0.7360   1.0000
AY.43      0.544 [0.459]  0.9767  0.9894  0.9767  0.0382   0.3522
AY.44      0.860   1.579 [0.350]  1.0000  1.0000  0.9660   0.9999
AY.5       0.748   1.373   0.870 [0.382]  1.0000  0.7303   0.9900
AY.6       0.860   1.579   1.000   1.150 [0.350]  0.9660   0.9999
AY4.2      1.388   2.550   1.615   1.857   1.615 [0.250]   0.8766
B1.617.2   1.000   1.836   1.163   1.337   1.163   0.720  [0.316]

Row and column labels: variant
Upper triangle: P values   null = 1  adjust = "tukey"
Diagonal: [Estimates] (prob)   type = "response"
Lower triangle: Comparisons (odds.ratio)   earlier vs. later

We can see from these contrasts that nothing really stands out. One important line in the printout is the “adjust = tukey”. One issue with multiple hypothesis tests in the frequentist framework is the need to adjust your alpha, or Type 1 error threshold when you do multiple significance tests. The classical adjustment is Bonferroni adjustments, but these are much more conservative than Tukey adjustments. Regardless, the more tests you do in this framework, the higher your threshold is.

Bayesian Hypothesis Testing

One of the many advantages of Bayesian hypothesis testing is that you don’t have to worry about multiple comparisons (there is even a Gelman paper titled something like that). There are spurious findings that can occur (enter Type M and Type S errors rather than Type 1 and Type 2 errors), but by comparing samples from the posterior distribution you can analyze exactly what your research question is.

So we can do this by building a similar model as before:

fit_bayes <- brm(sx | trials(n) ~ variant, data = sx, refresh =0,
                 family = binomial(), backend = "cmdstanr", file = "local.rds")

summary(fit_bayes)
 Family: binomial 
  Links: mu = logit 
Formula: sx | trials(n) ~ variant 
   Data: sx (Number of observations: 7) 
  Draws: 4 chains, each with iter = 1000; warmup = 0; thin = 1;
         total post-warmup draws = 4000

Population-Level Effects: 
                Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
Intercept          -0.15      0.09    -0.33     0.03 1.00     4274
variantAY.43        1.95      0.47     1.09     2.95 1.00     3826
variantAY.44        0.31      0.59    -0.83     1.47 1.00     3416
variantAY.5         0.66      0.46    -0.22     1.58 1.00     3511
variantAY.6         0.32      0.59    -0.84     1.48 1.00     3499
variantAY4.2       -0.55      0.24    -1.01    -0.08 1.00     3842
variantB1.617.2    -0.00      0.22    -0.43     0.42 1.00     4210
                Tail_ESS
Intercept           3524
variantAY.43        2585
variantAY.44        2847
variantAY.5         2850
variantAY.6         3113
variantAY4.2        2747
variantB1.617.2     2843

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

And now run our hypothesis test. Theoretically, we could have made this a random effects model, but for ease, I won’t.

hypothesis(fit_bayes, "variantAY4.2 < variantB1.617.2")
Hypothesis Tests for class b:
                Hypothesis Estimate Est.Error CI.Lower CI.Upper
1 (variantAY4.2)-(v... < 0    -0.55      0.29    -1.03    -0.07
  Evid.Ratio Post.Prob Star
1      33.19      0.97    *
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.

Now we can see in this framework that there does appear to be evidence of a difference in the fraction of those patients presenting with traditional COVID-19 symptoms!

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/medewitt/medewitt.github.io, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

DeWitt (2021, Nov. 19). Michael DeWitt: Advantage of Bayesian Hypothesis Testing. Retrieved from https://michaeldewittjr.com/programming/2021-11-19-advantage-of-bayesian-hypothesis-testing/

BibTeX citation

@misc{dewitt2021advantage,
  author = {DeWitt, Michael},
  title = {Michael DeWitt: Advantage of Bayesian Hypothesis Testing},
  url = {https://michaeldewittjr.com/programming/2021-11-19-advantage-of-bayesian-hypothesis-testing/},
  year = {2021}
}