Insufficient Statistics

Background

I recently got a question about using MRP and I thought it would be worthwhile to share some of the additional explanation of using this approach with a simulated data set. Simulating your data and testing your method is a really good way to understand if your model is sensitive enough to detect differences. This kind of fake data simulation will allow you to see if your model fails, before using it production or in the field where the cost of failure is high.

Population Data

I’m going to generate some synthetic data for this example. This will represent our population and provide a benchmark for “truth.” The data are completely made up and don’t represent anything in particular.

These data represent a population of 1 million persons, of binary gender, four different races, living in the US. Again, the proportions are made up.

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

✓ ggplot2 3.3.5.9000     ✓ purrr   0.3.4.9000
✓ tibble  3.1.6.9001     ✓ dplyr   1.0.8.9000
✓ tidyr   1.1.4.9000     ✓ stringr 1.4.0.9000
✓ readr   2.1.2          ✓ forcats 0.5.1

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()

Let’s imagine that each person has a probability , \(\theta\) of supporting a given opinion. Again, let’s suppose that the probability of support is partially determined by some combination of gender, race, location, and of course some random noise.

Now I’m going to draw my sample for my analysis. This would represent a completely random sample of my population.

Multi-Level Regression

Now we can step into the first component of MRP, the multi-level or hierarchical regression modeling. Here based on literature and inference that race, gender, state, and census division may be important, or help me make inference on the probability of supporting the given option. Additionally, I will use partial pooling to help make inferences for some of the small cell sizes that exist in my survey. I can build that given equation using brms.

library(brms)

Loading required package: Rcpp

Loading 'brms' package (version 2.16.1). Useful instructions
can be found by typing help('brms'). A more detailed introduction
to the package is available through vignette('brms_overview').


Attaching package: 'brms'

The following object is masked from 'package:stats':

    ar

my_equation <- bf(true_opinion ~ (1 | race * gender) + (1 | state) + (1 | division))

Now to see what priors I have to set. This step is important as building the above model with lots of variables may have difficulty converging.

get_prior(my_equation, data = survey)

                prior     class      coef       group resp dpar nlpar bound
 student_t(3, 0, 2.5) Intercept                                            
 student_t(3, 0, 2.5)        sd                                            
 student_t(3, 0, 2.5)        sd              division                      
 student_t(3, 0, 2.5)        sd Intercept    division                      
 student_t(3, 0, 2.5)        sd                gender                      
 student_t(3, 0, 2.5)        sd Intercept      gender                      
 student_t(3, 0, 2.5)        sd                  race                      
 student_t(3, 0, 2.5)        sd Intercept        race                      
 student_t(3, 0, 2.5)        sd           race:gender                      
 student_t(3, 0, 2.5)        sd Intercept race:gender                      
 student_t(3, 0, 2.5)        sd                 state                      
 student_t(3, 0, 2.5)        sd Intercept       state                      
 student_t(3, 0, 2.5)     sigma                                            
       source
      default
      default
 (vectorized)
 (vectorized)
 (vectorized)
 (vectorized)
 (vectorized)
 (vectorized)
 (vectorized)
 (vectorized)
 (vectorized)
 (vectorized)
      default

Now I can set my priors for the different coefficients in my model.

my_priors <- c(
      set_prior("normal(0,0.2)", class = "sd", group = "race:gender"),
      set_prior("normal(0,0.2)", class = "sd", group = "race"),
      set_prior("normal(0,0.2)", class = "sd", group = "gender"),
      set_prior("normal(0,0.2)", class = "sd", group = "state"),
      set_prior("normal(0,0.2)", class = "sd", group = "division")
    )

Now we can run the model in brms.

fit <- brm(my_equation, survey, prior = my_priors, 
           chains = 2, iter = 1000, cores = 2, family = bernoulli(),
           silent = TRUE)

Compiling Stan program...

Trying to compile a simple C file

Running /usr/local/Cellar/r/4.1.2/lib/R/bin/R CMD SHLIB foo.c
/usr/local/opt/llvm/bin/clang  -I"/usr/local/Cellar/r/4.1.2/lib/R/include" -DNDEBUG   -I"/usr/local/lib/R/4.1/site-library/Rcpp/include/"  -I"/usr/local/lib/R/4.1/site-library/RcppEigen/include/"  -I"/usr/local/lib/R/4.1/site-library/RcppEigen/include/unsupported"  -I"/usr/local/lib/R/4.1/site-library/BH/include" -I"/usr/local/lib/R/4.1/site-library/StanHeaders/include/src/"  -I"/usr/local/lib/R/4.1/site-library/StanHeaders/include/"  -I"/usr/local/lib/R/4.1/site-library/RcppParallel/include/"  -I"/usr/local/lib/R/4.1/site-library/rstan/include" -DEIGEN_NO_DEBUG  -DBOOST_DISABLE_ASSERTS  -DBOOST_PENDING_INTEGER_LOG2_HPP  -DSTAN_THREADS  -DUSE_STANC3 -DSTRICT_R_HEADERS  -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION  -DBOOST_NO_AUTO_PTR  -include '/usr/local/lib/R/4.1/site-library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp'  -D_REENTRANT -DRCPP_PARALLEL_USE_TBB=1   -I/usr/local/opt/gettext/include -I/usr/local/opt/readline/include -I/usr/local/opt/xz/include -I/usr/local/include   -fPIC  -g -O3 -Wall -pedantic -std=gnu99 -mtune=native -pipe -c foo.c -o foo.o
In file included from <built-in>:1:
In file included from /usr/local/lib/R/4.1/site-library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp:22:
In file included from /usr/local/lib/R/4.1/site-library/RcppEigen/include/Eigen/Dense:1:
In file included from /usr/local/lib/R/4.1/site-library/RcppEigen/include/Eigen/Core:88:
/usr/local/lib/R/4.1/site-library/RcppEigen/include/Eigen/src/Core/util/Macros.h:628:1: error: unknown type name 'namespace'
namespace Eigen {
^
/usr/local/lib/R/4.1/site-library/RcppEigen/include/Eigen/src/Core/util/Macros.h:628:16: error: expected ';' after top level declarator
namespace Eigen {
               ^
               ;
In file included from <built-in>:1:
In file included from /usr/local/lib/R/4.1/site-library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp:22:
In file included from /usr/local/lib/R/4.1/site-library/RcppEigen/include/Eigen/Dense:1:
/usr/local/lib/R/4.1/site-library/RcppEigen/include/Eigen/Core:96:10: fatal error: 'complex' file not found
#include <complex>
         ^~~~~~~~~
3 errors generated.
make: *** [foo.o] Error 1

Start sampling

Now we can visualise the outputs.

library(tidybayes)


Attaching package: 'tidybayes'

The following objects are masked from 'package:brms':

    dstudent_t, pstudent_t, qstudent_t, rstudent_t

fit %>%
  gather_draws(`sd_.*`, regex=TRUE) %>%
  ungroup() %>%
  mutate(group = stringr::str_replace_all(.variable, c("sd_" = "","__Intercept"=""))) %>%
  ggplot(aes(y=group, x = .value)) + 
  ggridges::geom_density_ridges(aes(height=..density..),
                                rel_min_height = 0.01, stat = "density",
                                scale=1.5)

Warning: as.mcmc.brmsfit is deprecated and will eventually be removed.

Additionally, we should do some additional posterior checks which includes checking our Rhat values for convervenges and our effective sample size. Additionally, some posterior predictive checks would also be helpful to ensure that our model is performing well. I won’t do that here, but it is a good practice to do those things.

We can also check some of the intercepts to see if the model detected some of the changes that we introduced.

library(bayesplot)

This is bayesplot version 1.8.1

- Online documentation and vignettes at mc-stan.org/bayesplot

- bayesplot theme set to bayesplot::theme_default()

   * Does _not_ affect other ggplot2 plots

   * See ?bayesplot_theme_set for details on theme setting

posterior <- as.matrix(fit)
dimnames(posterior)

$draw
   [1] "1"    "2"    "3"    "4"    "5"    "6"    "7"    "8"    "9"    "10"  
  [11] "11"   "12"   "13"   "14"   "15"   "16"   "17"   "18"   "19"   "20"  
  [21] "21"   "22"   "23"   "24"   "25"   "26"   "27"   "28"   "29"   "30"  
  [31] "31"   "32"   "33"   "34"   "35"   "36"   "37"   "38"   "39"   "40"  
  [41] "41"   "42"   "43"   "44"   "45"   "46"   "47"   "48"   "49"   "50"  
  [51] "51"   "52"   "53"   "54"   "55"   "56"   "57"   "58"   "59"   "60"  
  [61] "61"   "62"   "63"   "64"   "65"   "66"   "67"   "68"   "69"   "70"  
  [71] "71"   "72"   "73"   "74"   "75"   "76"   "77"   "78"   "79"   "80"  
  [81] "81"   "82"   "83"   "84"   "85"   "86"   "87"   "88"   "89"   "90"  
  [91] "91"   "92"   "93"   "94"   "95"   "96"   "97"   "98"   "99"   "100" 
 [101] "101"  "102"  "103"  "104"  "105"  "106"  "107"  "108"  "109"  "110" 
 [111] "111"  "112"  "113"  "114"  "115"  "116"  "117"  "118"  "119"  "120" 
 [121] "121"  "122"  "123"  "124"  "125"  "126"  "127"  "128"  "129"  "130" 
 [131] "131"  "132"  "133"  "134"  "135"  "136"  "137"  "138"  "139"  "140" 
 [141] "141"  "142"  "143"  "144"  "145"  "146"  "147"  "148"  "149"  "150" 
 [151] "151"  "152"  "153"  "154"  "155"  "156"  "157"  "158"  "159"  "160" 
 [161] "161"  "162"  "163"  "164"  "165"  "166"  "167"  "168"  "169"  "170" 
 [171] "171"  "172"  "173"  "174"  "175"  "176"  "177"  "178"  "179"  "180" 
 [181] "181"  "182"  "183"  "184"  "185"  "186"  "187"  "188"  "189"  "190" 
 [191] "191"  "192"  "193"  "194"  "195"  "196"  "197"  "198"  "199"  "200" 
 [201] "201"  "202"  "203"  "204"  "205"  "206"  "207"  "208"  "209"  "210" 
 [211] "211"  "212"  "213"  "214"  "215"  "216"  "217"  "218"  "219"  "220" 
 [221] "221"  "222"  "223"  "224"  "225"  "226"  "227"  "228"  "229"  "230" 
 [231] "231"  "232"  "233"  "234"  "235"  "236"  "237"  "238"  "239"  "240" 
 [241] "241"  "242"  "243"  "244"  "245"  "246"  "247"  "248"  "249"  "250" 
 [251] "251"  "252"  "253"  "254"  "255"  "256"  "257"  "258"  "259"  "260" 
 [261] "261"  "262"  "263"  "264"  "265"  "266"  "267"  "268"  "269"  "270" 
 [271] "271"  "272"  "273"  "274"  "275"  "276"  "277"  "278"  "279"  "280" 
 [281] "281"  "282"  "283"  "284"  "285"  "286"  "287"  "288"  "289"  "290" 
 [291] "291"  "292"  "293"  "294"  "295"  "296"  "297"  "298"  "299"  "300" 
 [301] "301"  "302"  "303"  "304"  "305"  "306"  "307"  "308"  "309"  "310" 
 [311] "311"  "312"  "313"  "314"  "315"  "316"  "317"  "318"  "319"  "320" 
 [321] "321"  "322"  "323"  "324"  "325"  "326"  "327"  "328"  "329"  "330" 
 [331] "331"  "332"  "333"  "334"  "335"  "336"  "337"  "338"  "339"  "340" 
 [341] "341"  "342"  "343"  "344"  "345"  "346"  "347"  "348"  "349"  "350" 
 [351] "351"  "352"  "353"  "354"  "355"  "356"  "357"  "358"  "359"  "360" 
 [361] "361"  "362"  "363"  "364"  "365"  "366"  "367"  "368"  "369"  "370" 
 [371] "371"  "372"  "373"  "374"  "375"  "376"  "377"  "378"  "379"  "380" 
 [381] "381"  "382"  "383"  "384"  "385"  "386"  "387"  "388"  "389"  "390" 
 [391] "391"  "392"  "393"  "394"  "395"  "396"  "397"  "398"  "399"  "400" 
 [401] "401"  "402"  "403"  "404"  "405"  "406"  "407"  "408"  "409"  "410" 
 [411] "411"  "412"  "413"  "414"  "415"  "416"  "417"  "418"  "419"  "420" 
 [421] "421"  "422"  "423"  "424"  "425"  "426"  "427"  "428"  "429"  "430" 
 [431] "431"  "432"  "433"  "434"  "435"  "436"  "437"  "438"  "439"  "440" 
 [441] "441"  "442"  "443"  "444"  "445"  "446"  "447"  "448"  "449"  "450" 
 [451] "451"  "452"  "453"  "454"  "455"  "456"  "457"  "458"  "459"  "460" 
 [461] "461"  "462"  "463"  "464"  "465"  "466"  "467"  "468"  "469"  "470" 
 [471] "471"  "472"  "473"  "474"  "475"  "476"  "477"  "478"  "479"  "480" 
 [481] "481"  "482"  "483"  "484"  "485"  "486"  "487"  "488"  "489"  "490" 
 [491] "491"  "492"  "493"  "494"  "495"  "496"  "497"  "498"  "499"  "500" 
 [501] "501"  "502"  "503"  "504"  "505"  "506"  "507"  "508"  "509"  "510" 
 [511] "511"  "512"  "513"  "514"  "515"  "516"  "517"  "518"  "519"  "520" 
 [521] "521"  "522"  "523"  "524"  "525"  "526"  "527"  "528"  "529"  "530" 
 [531] "531"  "532"  "533"  "534"  "535"  "536"  "537"  "538"  "539"  "540" 
 [541] "541"  "542"  "543"  "544"  "545"  "546"  "547"  "548"  "549"  "550" 
 [551] "551"  "552"  "553"  "554"  "555"  "556"  "557"  "558"  "559"  "560" 
 [561] "561"  "562"  "563"  "564"  "565"  "566"  "567"  "568"  "569"  "570" 
 [571] "571"  "572"  "573"  "574"  "575"  "576"  "577"  "578"  "579"  "580" 
 [581] "581"  "582"  "583"  "584"  "585"  "586"  "587"  "588"  "589"  "590" 
 [591] "591"  "592"  "593"  "594"  "595"  "596"  "597"  "598"  "599"  "600" 
 [601] "601"  "602"  "603"  "604"  "605"  "606"  "607"  "608"  "609"  "610" 
 [611] "611"  "612"  "613"  "614"  "615"  "616"  "617"  "618"  "619"  "620" 
 [621] "621"  "622"  "623"  "624"  "625"  "626"  "627"  "628"  "629"  "630" 
 [631] "631"  "632"  "633"  "634"  "635"  "636"  "637"  "638"  "639"  "640" 
 [641] "641"  "642"  "643"  "644"  "645"  "646"  "647"  "648"  "649"  "650" 
 [651] "651"  "652"  "653"  "654"  "655"  "656"  "657"  "658"  "659"  "660" 
 [661] "661"  "662"  "663"  "664"  "665"  "666"  "667"  "668"  "669"  "670" 
 [671] "671"  "672"  "673"  "674"  "675"  "676"  "677"  "678"  "679"  "680" 
 [681] "681"  "682"  "683"  "684"  "685"  "686"  "687"  "688"  "689"  "690" 
 [691] "691"  "692"  "693"  "694"  "695"  "696"  "697"  "698"  "699"  "700" 
 [701] "701"  "702"  "703"  "704"  "705"  "706"  "707"  "708"  "709"  "710" 
 [711] "711"  "712"  "713"  "714"  "715"  "716"  "717"  "718"  "719"  "720" 
 [721] "721"  "722"  "723"  "724"  "725"  "726"  "727"  "728"  "729"  "730" 
 [731] "731"  "732"  "733"  "734"  "735"  "736"  "737"  "738"  "739"  "740" 
 [741] "741"  "742"  "743"  "744"  "745"  "746"  "747"  "748"  "749"  "750" 
 [751] "751"  "752"  "753"  "754"  "755"  "756"  "757"  "758"  "759"  "760" 
 [761] "761"  "762"  "763"  "764"  "765"  "766"  "767"  "768"  "769"  "770" 
 [771] "771"  "772"  "773"  "774"  "775"  "776"  "777"  "778"  "779"  "780" 
 [781] "781"  "782"  "783"  "784"  "785"  "786"  "787"  "788"  "789"  "790" 
 [791] "791"  "792"  "793"  "794"  "795"  "796"  "797"  "798"  "799"  "800" 
 [801] "801"  "802"  "803"  "804"  "805"  "806"  "807"  "808"  "809"  "810" 
 [811] "811"  "812"  "813"  "814"  "815"  "816"  "817"  "818"  "819"  "820" 
 [821] "821"  "822"  "823"  "824"  "825"  "826"  "827"  "828"  "829"  "830" 
 [831] "831"  "832"  "833"  "834"  "835"  "836"  "837"  "838"  "839"  "840" 
 [841] "841"  "842"  "843"  "844"  "845"  "846"  "847"  "848"  "849"  "850" 
 [851] "851"  "852"  "853"  "854"  "855"  "856"  "857"  "858"  "859"  "860" 
 [861] "861"  "862"  "863"  "864"  "865"  "866"  "867"  "868"  "869"  "870" 
 [871] "871"  "872"  "873"  "874"  "875"  "876"  "877"  "878"  "879"  "880" 
 [881] "881"  "882"  "883"  "884"  "885"  "886"  "887"  "888"  "889"  "890" 
 [891] "891"  "892"  "893"  "894"  "895"  "896"  "897"  "898"  "899"  "900" 
 [901] "901"  "902"  "903"  "904"  "905"  "906"  "907"  "908"  "909"  "910" 
 [911] "911"  "912"  "913"  "914"  "915"  "916"  "917"  "918"  "919"  "920" 
 [921] "921"  "922"  "923"  "924"  "925"  "926"  "927"  "928"  "929"  "930" 
 [931] "931"  "932"  "933"  "934"  "935"  "936"  "937"  "938"  "939"  "940" 
 [941] "941"  "942"  "943"  "944"  "945"  "946"  "947"  "948"  "949"  "950" 
 [951] "951"  "952"  "953"  "954"  "955"  "956"  "957"  "958"  "959"  "960" 
 [961] "961"  "962"  "963"  "964"  "965"  "966"  "967"  "968"  "969"  "970" 
 [971] "971"  "972"  "973"  "974"  "975"  "976"  "977"  "978"  "979"  "980" 
 [981] "981"  "982"  "983"  "984"  "985"  "986"  "987"  "988"  "989"  "990" 
 [991] "991"  "992"  "993"  "994"  "995"  "996"  "997"  "998"  "999"  "1000"

$variable
 [1] "b_Intercept"                             
 [2] "sd_division__Intercept"                  
 [3] "sd_gender__Intercept"                    
 [4] "sd_race__Intercept"                      
 [5] "sd_race:gender__Intercept"               
 [6] "sd_state__Intercept"                     
 [7] "r_division[New.England,Intercept]"       
 [8] "r_division[Middle.Atlantic,Intercept]"   
 [9] "r_division[South.Atlantic,Intercept]"    
[10] "r_division[East.South.Central,Intercept]"
[11] "r_division[West.South.Central,Intercept]"
[12] "r_division[East.North.Central,Intercept]"
[13] "r_division[West.North.Central,Intercept]"
[14] "r_division[Mountain,Intercept]"          
[15] "r_division[Pacific,Intercept]"           
[16] "r_gender[Female,Intercept]"              
[17] "r_gender[Male,Intercept]"                
[18] "r_race[Asian,Intercept]"                 
[19] "r_race[Black,Intercept]"                 
[20] "r_race[Hispanic,Intercept]"              
[21] "r_race[White,Intercept]"                 
[22] "r_race:gender[Asian_Female,Intercept]"   
[23] "r_race:gender[Asian_Male,Intercept]"     
[24] "r_race:gender[Black_Female,Intercept]"   
[25] "r_race:gender[Black_Male,Intercept]"     
[26] "r_race:gender[Hispanic_Female,Intercept]"
[27] "r_race:gender[Hispanic_Male,Intercept]"  
[28] "r_race:gender[White_Female,Intercept]"   
[29] "r_race:gender[White_Male,Intercept]"     
[30] "r_state[AK,Intercept]"                   
[31] "r_state[AL,Intercept]"                   
[32] "r_state[AR,Intercept]"                   
[33] "r_state[AZ,Intercept]"                   
[34] "r_state[CA,Intercept]"                   
[35] "r_state[CO,Intercept]"                   
[36] "r_state[CT,Intercept]"                   
[37] "r_state[DE,Intercept]"                   
[38] "r_state[FL,Intercept]"                   
[39] "r_state[GA,Intercept]"                   
[40] "r_state[HI,Intercept]"                   
[41] "r_state[IA,Intercept]"                   
[42] "r_state[ID,Intercept]"                   
[43] "r_state[IL,Intercept]"                   
[44] "r_state[IN,Intercept]"                   
[45] "r_state[KS,Intercept]"                   
[46] "r_state[KY,Intercept]"                   
[47] "r_state[LA,Intercept]"                   
[48] "r_state[MA,Intercept]"                   
[49] "r_state[MD,Intercept]"                   
[50] "r_state[ME,Intercept]"                   
[51] "r_state[MI,Intercept]"                   
[52] "r_state[MN,Intercept]"                   
[53] "r_state[MO,Intercept]"                   
[54] "r_state[MS,Intercept]"                   
[55] "r_state[MT,Intercept]"                   
[56] "r_state[NC,Intercept]"                   
[57] "r_state[ND,Intercept]"                   
[58] "r_state[NE,Intercept]"                   
[59] "r_state[NH,Intercept]"                   
[60] "r_state[NJ,Intercept]"                   
[61] "r_state[NM,Intercept]"                   
[62] "r_state[NV,Intercept]"                   
[63] "r_state[NY,Intercept]"                   
[64] "r_state[OH,Intercept]"                   
[65] "r_state[OK,Intercept]"                   
[66] "r_state[OR,Intercept]"                   
[67] "r_state[PA,Intercept]"                   
[68] "r_state[RI,Intercept]"                   
[69] "r_state[SC,Intercept]"                   
[70] "r_state[SD,Intercept]"                   
[71] "r_state[TN,Intercept]"                   
[72] "r_state[TX,Intercept]"                   
[73] "r_state[UT,Intercept]"                   
[74] "r_state[VA,Intercept]"                   
[75] "r_state[VT,Intercept]"                   
[76] "r_state[WA,Intercept]"                   
[77] "r_state[WI,Intercept]"                   
[78] "r_state[WV,Intercept]"                   
[79] "r_state[WY,Intercept]"                   
[80] "lp__"

mcmc_areas(posterior,
           pars = c("r_race[Asian,Intercept]",
                    "r_gender[Female,Intercept]",
                    "r_state[SC,Intercept]"),
           prob = 0.8)

It looks like the model picked up the gender differences as well as the specific difference for Asians. However, it looks like the model did not do a great job discriminating on differences for South Carolina. We could explore this further, but it is important to check that our model is performing as expected.

Create the Census Data

Now we step into the post-stratification step. Here we have the population values from our fake data. In reality you would probably use estimates from a census. Here we are interested in predicting state level opinion, so we we want to stratify at that level. If we wanted to make inferences about a different level we would stratify to that given level. I’m going to do both state level overall and race within state for this example.

(
  post_strat_values <- population_data %>%
    group_by(division, state, race, gender) %>%
    summarise(n = n()) %>%
    group_by(state) %>% # The level at which you want to measure support
    mutate(perc = n / sum(n)) %>%
    ungroup() %>% 
    group_by(state, race) %>% 
    mutate(perc_2 = n/ sum(n)) %>% 
    ungroup()
  
)

`summarise()` has grouped output by 'division', 'state', 'race'. You can
override using the `.groups` argument.

# A tibble: 400 × 7
   division    state race     gender     n   perc perc_2
   <fct>       <chr> <chr>    <chr>  <int>  <dbl>  <dbl>
 1 New England CT    Asian    Female   958 0.0475  0.478
 2 New England CT    Asian    Male    1047 0.0519  0.522
 3 New England CT    Black    Female  2061 0.102   0.509
 4 New England CT    Black    Male    1989 0.0986  0.491
 5 New England CT    Hispanic Female  3053 0.151   0.499
 6 New England CT    Hispanic Male    3060 0.152   0.501
 7 New England CT    White    Female  3997 0.198   0.499
 8 New England CT    White    Male    4009 0.199   0.501
 9 New England MA    Asian    Female   999 0.0493  0.493
10 New England MA    Asian    Male    1026 0.0506  0.507
# … with 390 more rows

Now we can add some draws from the posterior distribution to our dataset and then make inferences on them.

pred<-fit %>%
  add_predicted_draws(newdata=post_strat_values, allow_new_levels=TRUE, n = 100) %>%
  mutate(individual_support = .prediction) %>% 
   rename(support = .prediction) %>%
  mean_qi() %>%
  mutate(state_support = support * perc) %>% # Post-stratified by state
  mutate(state_race = support *perc_2) # Post-stratified by gender within state

Warning: 
In add_predicted_draws(): The `n` argument is a deprecated alias for `ndraws`.
Use the `ndraws` argument instead.
See help("tidybayes-deprecated").

Warning: Argument 'nsamples' is deprecated. Please use argument 'ndraws'
instead.

Now we can do whatever we want to do with regard to inferences:

by_state_estimated <- pred %>%
  group_by(state) %>%
  summarise(estimated_support = sum(state_support)) %>%
  left_join(population_data %>%
              group_by(state) %>%
              summarise(true_support = mean(true_opinion)))

Joining, by = "state"

by_state_estimated_2 <- pred %>%
  group_by(state) %>%
  summarise(estimated_support = sum(state_support)) %>%
  left_join(population_data %>%
              group_by(state) %>%
              summarise(true_support = mean(true_opinion))) %>% 
  left_join(survey %>%
              group_by(state) %>%
              summarise(survey_support = mean(true_opinion))) %>% 
  gather(method, prediction, -true_support, - state)

Joining, by = "state"
Joining, by = "state"

Now we can look to see how our prediction did for the population, though we missed the Southern states. Probably because our decision to partial pool on division was a bad one given the effects we introduced at the state level did not necessarily coincide with the census division.

by_state_estimated %>% 
ggplot(aes(true_support, estimated_support, label = state))+
  geom_label()+
  geom_abline(slope = 1)+
  theme_minimal()+
  xlim(.35,.55)+
  ylim(.35,.55)

But we can at least be calmed by the fact that if we made direct prediction from our survey we would have been way wrong!

by_state_estimated_2 %>% 
  ggplot(aes(prediction, true_support, color = method, label = state))+
  geom_label()+
  geom_abline(slope = 1)+
  theme_minimal()+
  xlim(.35,.55)+
  ylim(.35,.55)

Warning: Removed 24 rows containing missing values (geom_label).

Looking at support by Race within a given community requires that we use a different post-stratification variable which we created earlier.

(by_state_race_estimated <- pred %>%
  group_by(state, race) %>%
  summarise(estimated_support = sum(state_race)) %>%
  left_join(population_data %>%
              group_by(state, race) %>%
              summarise(true_support = mean(true_opinion))))

`summarise()` has grouped output by 'state'. You can override using the
`.groups` argument.
`summarise()` has grouped output by 'state'. You can override using the
`.groups` argument.
Joining, by = c("state", "race")

# A tibble: 200 × 4
# Groups:   state [50]
   state race     estimated_support true_support
   <chr> <chr>                <dbl>        <dbl>
 1 AK    Asian                0.471        0.396
 2 AK    Black                0.510        0.494
 3 AK    Hispanic             0.490        0.496
 4 AK    White                0.465        0.508
 5 AL    Asian                0.518        0.388
 6 AL    Black                0.451        0.500
 7 AL    Hispanic             0.510        0.492
 8 AL    White                0.474        0.498
 9 AR    Asian                0.462        0.395
10 AR    Black                0.463        0.496
# … with 190 more rows

Reuse

https://creativecommons.org/licenses/by/4.0/

Citation

BibTeX citation:

@online{dewitt2019,
  author = {Michael DeWitt},
  title = {MRP {Redux}},
  date = {2019-04-05},
  url = {https://michaeldewittjr.com/programming/2019-04-05-mrp-redux},
  langid = {en}
}

For attribution, please cite this work as:

Michael DeWitt. 2019. “MRP Redux.” April 5, 2019. https://michaeldewittjr.com/programming/2019-04-05-mrp-redux.