Michael DeWitt

ShinyProxy Serving Websites

This post discusses using the ShinyProxy framework to serve static HTML sites. These sites can range from a single R Markdown document to an entire website. Serving them in containers gives you all the benefits of containerising your work, along with the ability to authenticate through ShinyProxy if desired.

Bayesian SIR

In this post I review how to build a compartmental model using the Stan probabilistic programming language. It is based largely on the case study [Bayesian workflow for disease transmission modeling in Stan](https://mc-stan.org/users/documentation/case-studies/boarding_school_case_study.html), which I have expanded to include an additional compartment for exposed individuals and to use case incidence data rather than prevalence.
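
To make the extra compartment and the incidence/prevalence distinction concrete, here is a minimal forward-Euler SEIR sketch in Python. It is not the Stan model from the post (which would use an ODE solver and a likelihood); the function name, the parameter values and the choice to count the E to I flow as "incident cases" are all assumptions made for illustration.

```python
def seir_daily_incidence(beta, sigma, gamma, s0, e0, i0, r0, days, steps_per_day=10):
    """Forward-Euler SEIR model (illustrative sketch).

    beta: transmission rate, sigma: rate of leaving E (1 / latent period),
    gamma: recovery rate. Returns (daily incidence, final (S, E, I, R)),
    where incidence is the E -> I flow each day (an assumption).
    """
    s, e, i, r = float(s0), float(e0), float(i0), float(r0)
    n = s + e + i + r
    dt = 1.0 / steps_per_day
    incidence = []
    for _ in range(days):
        new_cases = 0.0
        for _ in range(steps_per_day):
            infection = beta * s * i / n * dt  # S -> E
            onset = sigma * e * dt             # E -> I (counted as new cases)
            recovery = gamma * i * dt          # I -> R
            s -= infection
            e += infection - onset
            i += onset - recovery
            r += recovery
            new_cases += onset
        incidence.append(new_cases)  # incidence: new cases per day
    return incidence, (s, e, i, r)   # I alone would be prevalence


# Illustrative parameters only, not fitted values from the post.
daily_cases, final_state = seir_daily_incidence(
    beta=1.5, sigma=0.5, gamma=0.4, s0=999, e0=0, i0=1, r0=0, days=14
)
print([round(c, 1) for c in daily_cases])
```

Fitting to incidence means the observation model is attached to the daily E to I flows rather than to the size of the I compartment on a given day.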

Negative Binomial Distribution and Epidemics

Super-spreading events are characterised by a single case infecting a larger-than-expected number of people. This phenomenon is better represented by a negative binomial distribution than by a standard Poisson distribution. In this post I review the overdispersion parameter and how it can be parameterised in a model.
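
A small Python sketch (the post itself works in R and Stan) comparing the share of cases that infect nobody under each distribution when both have the same mean. It assumes the common mean/dispersion parameterisation, in which the variance is mu + mu^2 / k; Stan's `neg_binomial_2(mu, phi)` uses the same form with phi in place of k. The values R0 = 2.5 and K = 0.1 and the function names are illustrative assumptions, not estimates from the post.

```python
import math

def poisson_pmf(x, mu):
    """Poisson probability of x offspring cases with mean mu."""
    return math.exp(x * math.log(mu) - mu - math.lgamma(x + 1))

def negbin_pmf(x, mu, k):
    """Negative binomial with mean mu and dispersion k (variance = mu + mu**2 / k)."""
    log_p = (math.lgamma(x + k) - math.lgamma(k) - math.lgamma(x + 1)
             + k * math.log(k / (k + mu)) + x * math.log(mu / (k + mu)))
    return math.exp(log_p)

def moments(pmf, max_x=5000):
    """Mean and variance computed by summing the pmf over 0..max_x - 1."""
    probs = [pmf(x) for x in range(max_x)]
    mean = sum(x * p for x, p in enumerate(probs))
    var = sum((x - mean) ** 2 * p for x, p in enumerate(probs))
    return mean, var

R0, K = 2.5, 0.1  # hypothetical mean offspring count and dispersion
print(f"P(no onward infections), Poisson: {poisson_pmf(0, R0):.3f}")    # about 0.082
print(f"P(no onward infections), NegBin:  {negbin_pmf(0, R0, K):.3f}")  # about 0.722
print(moments(lambda x: negbin_pmf(x, R0, K)))  # mean 2.5, variance about 65
```

With the same mean, a small k pushes most cases to zero onward infections and leaves a long tail of large clusters, which is the super-spreading pattern a Poisson model cannot reproduce.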

Optimisation with Stan

Using Stan for optimisation.

ggdist and Epidemic Curves

This post explores tools for summarising whole curves rather than summarising at fixed time points. This includes using odin and ggdist to explore the risk of underestimating epidemic curves.

julia ABM SIR

Using Julia to run agent-based models and R to visualise them.

Sensitivity and Specificity

Here I explore the implications of different levels of sensitivity and specificity in a Bayesian framework, following the work of Gelman and Carpenter.
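
The relationship at the centre of this kind of analysis links the observed positive rate to the true prevalence pi through the test's sensitivity and specificity. The worked numbers below are illustrative assumptions, not values from the post.

```latex
p_{\text{pos}} = \pi \,\text{sens} + (1 - \pi)(1 - \text{spec}), \qquad y \sim \operatorname{Binomial}(n, p_{\text{pos}})

% Illustrative values: \pi = 0.01,\ \text{sens} = 0.90,\ \text{spec} = 0.99
p_{\text{pos}} = 0.01 \times 0.90 + 0.99 \times 0.01 = 0.009 + 0.0099 = 0.0189

\text{share of positives that are false} = \frac{0.0099}{0.0189} \approx 0.52
```

With these values, just over half of all positive results would be false positives, which is why uncertainty about specificity dominates when prevalence is low.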

Flatten the Curve

In this post I explore the potential growth rate of Covid-19 in Forsyth County, NC. This also includes looking at the kind of load that this virus could place on our existing healthcare systems. I strongly advocate acting to delay the flood of potential community-acquired infections.

Airflow on Windows Linux Subsystem

In this post I detail the process for getting a working instance of Apache Airflow on the Windows Subsystem for Linux (WSL). It combines instructions from several posts scattered across the internet. Apache Airflow is an exceptional program for scheduling and running tasks.

2020 Plans

A preview of some of the items that I will try to write about in 2020.

How About Impeachment?

In a previous blog post I looked at approval ratings. Now that impeachment is the topic of the day, I think it would be wise to try the same methodology on public opinion surrounding impeachment. While the data are much sparser, it will be fun to examine.

Approval Rating Now?

Given the current controversy regarding President Trump, let's use a state-space Bayesian model to estimate his current approval rating. As more surveys go into the field this estimate will change, but let's look at where it stands now.

Integrating Over Your Loss Function

When doing an analysis, it is often important to put the results in the context of a loss function. For example, a small effect that is cheap to implement might be the best use of resources. Using Bayesian modeling and loss functions, we can better assess impact and provide better information for decisions about allocating scarce resources (especially when effect sizes are small).
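
A minimal Python sketch of "integrating over the loss": average the loss across posterior draws instead of plugging in a point estimate. The posterior draws here are simulated normals standing in for real model output, and the costs, the value per unit of effect and the function names are all hypothetical.

```python
import random

def expected_loss(draws, loss):
    """Monte Carlo approximation of the posterior expected loss, E[loss(theta) | data]."""
    return sum(loss(theta) for theta in draws) / len(draws)

random.seed(2019)
# Hypothetical posterior draws for two interventions' effects (not real results).
effect_cheap = [random.gauss(0.01, 0.01) for _ in range(4000)]   # small effect
effect_costly = [random.gauss(0.04, 0.03) for _ in range(4000)]  # larger, more uncertain

VALUE_PER_UNIT = 100.0  # hypothetical value of a one-unit improvement

def loss_cheap(theta):
    return 0.2 - VALUE_PER_UNIT * theta  # implementation cost minus value delivered

def loss_costly(theta):
    return 5.0 - VALUE_PER_UNIT * theta

print(expected_loss(effect_cheap, loss_cheap))    # roughly -0.8 (a net gain)
print(expected_loss(effect_costly, loss_costly))  # roughly 1.0 (a net loss)
```

Under these assumptions the smaller, cheaper effect is the better choice. With a linear loss like this one the result matches plugging in the posterior mean; integrating over the draws starts to matter once the loss is asymmetric or non-linear (for example, a penalty that only applies when the effect turns out negative).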

Remembering Apollo

Some ruminations about the legacy of Apollo and doing things when failure isn't an option.

On the use of command line tools

Using `AWK` to parse court calendars

Defining a Project Workflow

Having a defined project workflow is important for many reasons. Consistency of design allows for easier sharing (you or other collaborators don't have to look for things) and reduces cognitive load by letting you focus on content rather than form. This is my lightly opinionated project structure. Of course, these views are ever evolving.

Finding the Needle in the Haystack

Sometimes we need to look at metrics other than accuracy. One such metric is sensitivity, which measures the share of actual targets that the model correctly identifies. Sensitivity can be the metric of choice over accuracy when you are dealing with a rare event such as a terrorist attack or even student retention. It is always important to understand which metric you are optimising your models for.
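
A short Python illustration of why accuracy misleads on rare events. The data are hypothetical (10 events in 1,000 cases), and the "model" simply never predicts the event.

```python
def confusion_counts(actual, predicted):
    """Return (true positives, false negatives, true negatives, false positives)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    tn = sum(1 for a, p in zip(actual, predicted) if not a and not p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    return tp, fn, tn, fp

def accuracy(actual, predicted):
    tp, fn, tn, fp = confusion_counts(actual, predicted)
    return (tp + tn) / len(actual)

def sensitivity(actual, predicted):
    """Share of actual positives the model flags: TP / (TP + FN)."""
    tp, fn, _, _ = confusion_counts(actual, predicted)
    return tp / (tp + fn)

# Hypothetical rare event: 10 positives among 1,000 cases.
actual = [True] * 10 + [False] * 990
always_no = [False] * 1000  # a "model" that never predicts the event

print(accuracy(actual, always_no))     # 0.99
print(sensitivity(actual, always_no))  # 0.0
```

The model scores 99% accuracy while finding none of the cases we actually care about.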

State Space Models for Poll Prediction

In this post I replicate some state space poll modeling that James Savage and Peter Ellis used in a few different scenarios. State space modeling provides a great way to model time series effects when the data are collected at irregular intervals (e.g. opinion polling).
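
To show how a state space model copes with irregular polling, here is a local-level Kalman filter in Python. This is a swapped-in technique: the post replicates Bayesian Stan models, whereas this sketch filters a random-walk state day by day. The poll values, sample sizes, daily drift variance and function names are hypothetical.

```python
def poll_variance(share, sample_size):
    """Sampling variance of a poll proportion: p(1 - p) / n."""
    return share * (1 - share) / sample_size

def local_level_filter(polls, n_days, state_var, prior_mean=0.5, prior_var=0.1):
    """Filter a daily random-walk opinion state; polls maps day -> (share, sample size)."""
    mean, var = prior_mean, prior_var
    path = []
    for day in range(n_days):
        var += state_var  # predict: opinion may drift, so uncertainty grows daily
        if day in polls:  # update only on days when a poll was fielded
            share, size = polls[day]
            obs_var = poll_variance(share, size)
            gain = var / (var + obs_var)
            mean += gain * (share - mean)
            var *= 1 - gain
        path.append((mean, var))
    return path

# Hypothetical polls on days 0, 7 and 20 of a 30-day window.
polls = {0: (0.42, 1000), 7: (0.44, 800), 20: (0.43, 1200)}
path = local_level_filter(polls, n_days=30, state_var=1e-5)
print(round(path[-1][0], 3), round(path[-1][1] ** 0.5, 4))  # latest estimate and its sd
```

Days without a poll simply skip the update step, so gaps widen the uncertainty instead of requiring interpolated data, and larger polls pull the estimate harder because their sampling variance is smaller.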

Re-districting in Winston-Salem

In this post I explore potential outcomes for the composition of the Winston-Salem city council.

Omitted Variable Bias

MRP Redux

Using fake data simulations to understand our MRP model.

Speeding Things Up with Rcpp

Metropolis-Hastings samplers are typically slow in R because each iteration depends on the previous one, so the loop cannot easily be vectorised or parallelised. The Rcpp package lets you write these MCMC loops in C++ and run them much faster. This post explores how to do this, achieving a more than 20x speed-up.

Latex in ggplot2

This is a quick overview of a trick for adding LaTeX to ggplot2.

MRP using brms

This post explores MRP using brms and tidyverse modeling.

Replicating gsynth

The purpose of this post is to replicate the examples in the gsynth package for synthetic controls, a method for causal inference that is especially useful at the state level.

Hierarchical Time Series with hts

This is a quick reproduction of the examples in the hts package. The package handles hierarchical time series, which matter whenever the data have a nested structure, such as counties within a state or precincts within counties within states.
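
One way hierarchical forecasts are kept consistent is bottom-up aggregation: forecast the lowest level and sum upwards, so every county total equals its precincts and the state equals its counties. hts supports bottom-up among other reconciliation approaches; the Python sketch below shows only the idea, with a hypothetical hierarchy, node names and single-horizon forecast values.

```python
def bottom_up(bottom_forecasts, hierarchy):
    """Aggregate bottom-level forecasts up a hierarchy so every level adds up.

    bottom_forecasts: dict of bottom node -> forecast
    hierarchy: dict of parent node -> list of child nodes
    """
    totals = dict(bottom_forecasts)

    def total(node):
        # Recursively sum children; memoise results in totals.
        if node not in totals:
            totals[node] = sum(total(child) for child in hierarchy[node])
        return totals[node]

    for parent in hierarchy:
        total(parent)
    return totals

# Hypothetical state -> counties -> precincts structure.
hierarchy = {
    "state": ["county_a", "county_b"],
    "county_a": ["precinct_1", "precinct_2"],
    "county_b": ["precinct_3", "precinct_4"],
}
bottom = {"precinct_1": 120.0, "precinct_2": 80.0, "precinct_3": 45.0, "precinct_4": 55.0}
totals = bottom_up(bottom, hierarchy)
print(totals["county_a"], totals["county_b"], totals["state"])
```

Bottom-up is simple and always coherent, but it ignores information at the upper levels; that trade-off is why packages like hts also offer combination approaches.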

the power of fake data simulations

A look at Andrew Gelman's blog post on fake data simulations and hierarchical linear models (HLM). Fake data simulations make you think twice about what kind of effect you are looking for and whether your research design has the power to detect it. This illustrates a really good practice for anyone doing this kind of analysis.
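
A deliberately simplified fake data simulation in Python: plant a known effect, simulate many studies, and count how often the design detects it. It assumes a two-group comparison with a known standard deviation of 1 and a z-test, not the hierarchical model from Gelman's post; the effect sizes, group size and function name are hypothetical.

```python
import random

def simulated_power(effect, n_per_group, n_sims=1000, seed=42):
    """Share of simulated two-group studies where a z-test (sd known = 1) detects the effect."""
    rng = random.Random(seed)
    se = (2 / n_per_group) ** 0.5  # standard error of a difference in means when sd = 1
    detections = 0
    for _ in range(n_sims):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(effect, 1.0) for _ in range(n_per_group)]
        diff = sum(treated) / n_per_group - sum(control) / n_per_group
        if abs(diff / se) > 1.96:  # two-sided test at the 5% level
            detections += 1
    return detections / n_sims

print(simulated_power(effect=0.1, n_per_group=50))  # small effect: rarely detected
print(simulated_power(effect=0.5, n_per_group=50))  # larger effect: usually detected
```

Because the true effect is known, the simulation shows directly whether the design could ever have found it, before any real data are collected.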

a foray into network analysis

Network analysis provides a way to analyse how the members of a network are interconnected. This can provide insight into social networks, interconnected groups of text, tweets, etc. Visualisations help to show these relationships, and numeric measures help to quantify them.

models of microeconomics

Exploring the examples in Kleiber and Zeileis' Applied Econometrics with R.

Analysis of Short Time Series

Using Fourier terms as regressors can help with prediction for short time series.
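
Fourier terms are pairs of sine and cosine columns at harmonics of the seasonal period, used as regressors in place of one dummy variable per season. The Python sketch below builds them by hand; it is similar in spirit to the forecast package's `fourier()`, but the function name, argument names and column order here are my own.

```python
import math

def fourier_terms(n, period, k):
    """Return n rows of [sin(2*pi*j*t/period), cos(2*pi*j*t/period)] for j = 1..k, t = 1..n."""
    rows = []
    for t in range(1, n + 1):
        row = []
        for j in range(1, k + 1):
            angle = 2 * math.pi * j * t / period
            row.extend([math.sin(angle), math.cos(angle)])
        rows.append(row)
    return rows

# Two years of monthly data with k = 2 harmonics: 4 regressors instead of 11 month dummies.
rows = fourier_terms(n=24, period=12, k=2)
print(len(rows), len(rows[0]))
```

The payoff for short series is parsimony: 2k columns can describe a smooth seasonal shape that would otherwise need period - 1 dummy coefficients, which leaves more degrees of freedom when there are only a few seasons of data.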

make your own api

Exploring the concept of developing internal APIs. An API could also be an R package that can be used by people in your organisation to more easily connect to common data sources. This is a good example of some internal tooling that can make data access easier.

IRT and the Rasch Model

Item Response Theory (IRT) is a method by which item difficulty is assessed and used to measure latent factors. Classical test theory has a shortcoming: the test-taker's ability and the difficulty of the item cannot be separated, so it is unclear whether the results generalise beyond the instrument. Additionally, classical test theory makes some assumptions that may not be mathematically justified. Enter IRT, which handles some of these issues.
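
The Rasch model puts ability and difficulty on the same logit scale: the probability of a correct answer is the logistic function of (ability minus difficulty). A short Python sketch (function names are my own) showing the property behind "separating" the two: the log-odds gap between two test-takers is the same on every item.

```python
import math

def rasch_probability(ability, difficulty):
    """P(correct response) under the Rasch model: logistic(ability - difficulty)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def log_odds(p):
    return math.log(p / (1 - p))

print(rasch_probability(1.0, 1.0))  # 0.5: ability equals difficulty

# The log-odds gap between two test-takers does not depend on the item.
for b in (-1.0, 0.0, 2.0):
    gap = log_odds(rasch_probability(1.5, b)) - log_odds(rasch_probability(0.5, b))
    print(round(gap, 6))  # 1.0 each time
```

Because item difficulty cancels out of person comparisons (and person ability cancels out of item comparisons), estimates are less tied to one particular set of items.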


So I'm moving to radix

Welcome to Michael DeWitt's Blog

Welcome to the rebooted blog!

Exploring forecast

Let's examine some of the functions inside the forecast package.

Speed it up!

This post explores how to see opportunities to make your code run faster.

Bayesian Time Series Analysis with bsts

Exploring the bsts package and what it provides for Bayesian structural time series modeling


ggrough is a great package that can be used to make graphs that look hand-drawn. This can be a great aesthetic choice when giving presentations and making handouts.

gghighlight for the win

Exploring the power of the gghighlight package to highlight charts automatically.

Let's Try Some Visualisation

An example of the value-suppressing uncertainty scale. Great uses include visualising forecast uncertainty.

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".