MRP using brms

This post explores MRP using brms and tidyverse modeling.

Michael DeWitt https://michaeldewittjr.com
11-07-2018

Multi-Level Regression with Post-Stratification

This blogpost is a reproduction of this. Multi-level regression with post-stratification (MRP) is one of the more powerful predictive strategies for opinion polling and election forecasting. It combines the power of multi-level modeling which anticipate and predict group and population level effects which have good small N predictive properties with post-stratification which helps to temper the predictions to the demographics or other variables of interest. This method has been put out by Andrew Gelman et al and there is a good deal of literature on the subject.

This blog post then seeks to reproduce an example made by Tim Mastny using the brms package and tidyverse tooling.

Reproduction


library(tidyverse)
library(lme4)
library(brms)
library(rstan)
devtools::install_github("hrbrmstr/albersusa")

  
   checking for file ‘/private/var/folders/0x/6bnjy4n15kz8nbfbbwbyk3pr0000gn/T/Rtmp3SDOIg/remotes105a590b9a2b/hrbrmstr-albersusa-5b933bf/DESCRIPTION’ ...
  
✔  checking for file ‘/private/var/folders/0x/6bnjy4n15kz8nbfbbwbyk3pr0000gn/T/Rtmp3SDOIg/remotes105a590b9a2b/hrbrmstr-albersusa-5b933bf/DESCRIPTION’

  
─  preparing ‘albersusa’:

  
   checking DESCRIPTION meta-information ...
  
✔  checking DESCRIPTION meta-information

  
─  checking for LF line-endings in source and make files and shell scripts

  
─  checking for empty or unneeded directories

  
─  building ‘albersusa_0.3.1.tar.gz’

  
   

library(albersusa)
library(cowplot)

rstan_options(auto_write=TRUE)
options(mc.cores=parallel::detectCores())

Data

All the data required to reproduce this example is located here


marriage.data <- foreign::read.dta('gay_marriage_megapoll.dta',
                                   convert.underscore=TRUE)
Statelevel <- foreign::read.dta("state_level_update.dta",
                                convert.underscore = TRUE)
Census <- foreign::read.dta("poststratification 2000.dta",
                            convert.underscore = TRUE)

Tidy and Munge

In this section the data are being cleaned


# add state level predictors to marriage.data
Statelevel <- Statelevel %>% rename(state = sstate)

marriage.data <- Statelevel %>%
  select(state, p.evang, p.mormon, kerry.04) %>%
  right_join(marriage.data)

# Combine demographic groups
marriage.data <- marriage.data %>%
  mutate(race.female = (female * 3) + race.wbh) %>%
  mutate(age.edu.cat = 4 * (age.cat - 1) + edu.cat) %>%
  mutate(p.relig = p.evang + p.mormon) %>%
  filter(state != "")

Buildign the Post-stratifications

In this step Tim is


# change column names for natural join with marriage.data
Census <- Census %>% 
  rename(state = cstate,
         age.cat = cage.cat,
         edu.cat = cedu.cat,
         region = cregion)

Census <- Statelevel %>%
  select(state, p.evang, p.mormon, kerry.04) %>%
  right_join(Census)

Census <- Census %>%
  mutate(race.female = (cfemale * 3 ) + crace.WBH) %>%
  mutate(age.edu.cat = 4 * (age.cat - 1) + edu.cat) %>%
  mutate(p.relig = p.evang + p.mormon)

Now the fully Bayesian model can be built with the associated group level effects specified.


bayes.mod <- brm(yes.of.all ~ (1 | race.female) + (1 | age.cat) + (1 | edu.cat)
  + (1 | age.edu.cat) + (1 | state) + (1 | region)
  + p.relig + kerry.04,
data = marriage.data, family = bernoulli(),
prior = c(
  set_prior("normal(0,0.2)", class = "b"),
  set_prior("normal(0,0.2)", class = "sd", group = "race.female"),
  set_prior("normal(0,0.2)", class = "sd", group = "age.cat"),
  set_prior("normal(0,0.2)", class = "sd", group = "edu.cat"),
  set_prior("normal(0,0.2)", class = "sd", group = "age.edu.cat"),
  set_prior("normal(0,0.2)", class = "sd", group = "state"),
  set_prior("normal(0,0.2)", class = "sd", group = "region")
), iter = 1000, chains = 2, cores = 2, silent = TRUE
)

Examine the model output. If this were for a paper or a real prediction it would be important to run the model through a few more iterations with further diagnostic plots to ensure convergence.


summary(bayes.mod)

 Family: bernoulli 
  Links: mu = logit 
Formula: yes.of.all ~ (1 | race.female) + (1 | age.cat) + (1 | edu.cat) + (1 | age.edu.cat) + (1 | state) + (1 | region) + p.relig + kerry.04 
   Data: marriage.data (Number of observations: 6341) 
Samples: 2 chains, each with iter = 1000; warmup = 500; thin = 1;
         total post-warmup samples = 1000

Group-Level Effects: 
~age.cat (Number of levels: 4) 
              Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept)     0.42      0.10     0.26     0.64        829 1.00

~age.edu.cat (Number of levels: 16) 
              Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept)     0.09      0.06     0.00     0.23        420 1.01

~edu.cat (Number of levels: 4) 
              Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept)     0.32      0.09     0.18     0.53       1215 1.00

~race.female (Number of levels: 6) 
              Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept)     0.23      0.07     0.12     0.40        652 1.00

~region (Number of levels: 5) 
              Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept)     0.22      0.08     0.10     0.40        864 1.00

~state (Number of levels: 49) 
              Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept)     0.06      0.04     0.00     0.16        387 1.00

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept    -1.49      0.49    -2.40    -0.56        923 1.00
p.relig      -0.01      0.00    -0.02    -0.01       1469 1.00
kerry.04      0.02      0.01     0.01     0.03       2111 1.00

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
is a crude measure of effective sample size, and Rhat is the potential 
scale reduction factor on split chains (at convergence, Rhat = 1).

Do some visualisations


library(tidybayes)

bayes.mod %>%
  gather_draws(`sd_.*`, regex=TRUE) %>%
  ungroup() %>%
  mutate(group = stringr::str_replace_all(.variable, c("sd_" = "","__Intercept"=""))) %>%
  ggplot(aes(y=group, x = .value)) + 
  ggridges::geom_density_ridges(aes(height=..density..),
                                rel_min_height = 0.01, stat = "density",
                                scale=1.5)

Now Bring them Together

The next step then is to combine the predictions from the Bayesian HLM with the postratified values. This can then be used to estimate support.


ps.bayes.mod <- bayes.mod %>%
  add_predicted_draws(newdata=Census, allow_new_levels=TRUE) %>%
  rename(support = .prediction) %>%
  mean_qi() %>%
  mutate(support = support * cpercent.state) %>%
  group_by(state) %>%
  summarise(support = sum(support))

ps.bayes.mod %>% 
  knitr::kable()
state support
AK 0.3466636
AL 0.1764107
AR 0.1876421
AZ 0.3879724
CA 0.4472327
CO 0.4052016
CT 0.4226956
DC 0.5028808
DE 0.3407151
FL 0.2899568
GA 0.2378031
HI 0.4288982
IA 0.3062509
ID 0.2660855
IL 0.3523119
IN 0.2678527
KS 0.2650033
KY 0.2046088
LA 0.2364903
MA 0.4665508
MD 0.3383659
ME 0.4025905
MI 0.3285792
MN 0.3380396
MO 0.2666576
MS 0.1916718
MT 0.3340163
NC 0.2443973
ND 0.2694332
NE 0.2474155
NH 0.4137035
NJ 0.4124982
NM 0.3964261
NV 0.3893627
NY 0.4413092
OH 0.3231159
OK 0.1801253
OR 0.4013172
PA 0.3674844
RI 0.4470249
SC 0.2152091
SD 0.2621392
TN 0.2123831
TX 0.2371166
UT 0.1907503
VA 0.2821133
VT 0.4400084
WA 0.4206920
WI 0.3247024
WV 0.2618787
WY 0.2951907

With uncertainity

Now we can add some additional quantification of uncertainity by adding multiple draws from the posterior distribution.


pred<-bayes.mod %>%
  add_predicted_draws(newdata=Census, allow_new_levels=TRUE, n = 10) %>%
  rename(support = .prediction) %>%
  mutate(support = support * cpercent.state)%>%
  group_by(state, add = TRUE) %>%
  mean_qi()

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

DeWitt (2018, Nov. 7). Michael DeWitt: MRP using brms. Retrieved from https://michaeldewittjr.com/dewitt_blog/posts/2018-11-07-mrp-using-brms/

BibTeX citation

@misc{dewitt2018mrp,
  author = {DeWitt, Michael},
  title = {Michael DeWitt: MRP using brms},
  url = {https://michaeldewittjr.com/dewitt_blog/posts/2018-11-07-mrp-using-brms/},
  year = {2018}
}