Omitted Variable Bias


Michael DeWitt

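One important concept to discuss is omitted variable bias. It occurs when a variable that affects the outcome and is correlated with one of your predictors is left out of the model. The included predictor is then endogenous: it is correlated with the error term, and its coefficient absorbs part of the omitted variable’s effect.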
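For the linear case, the size of the bias can be written down directly. The notation below (the coefficients β and the error term ε) is introduced here for illustration and does not appear in the original post. Suppose the true model includes both X and Z, but only X is used in the regression:

```latex
% True model, with Z affecting Y:
Y = \beta_0 + \beta_1 X + \beta_2 Z + \varepsilon

% Slope from regressing Y on X alone, in large samples:
\hat{\beta}_1^{\text{naive}} \;\xrightarrow{p}\;
  \beta_1 + \beta_2 \, \frac{\operatorname{Cov}(X, Z)}{\operatorname{Var}(X)}
```

The second term is the bias. It vanishes only if Z has no effect on Y (β₂ = 0) or Z is uncorrelated with X.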

Fake Data Simulation

As with any analysis, it is best to begin with a fake data simulation to build intuition about the problem. In this example, suppose we want to test a relationship in which X predicts Y. Additionally, suppose there is a variable Z that affects both X and Y.

Create the Fake Data

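n <- 1000

# U affects X only
U <- rnorm(n, 5, 1)

# Z affects both X and Y (the variable we will omit)
Z <- rnorm(n, 1, 1)

# X depends on U and Z, each with a coefficient of 1
X <- rnorm(n, U + 1 * Z, 1)

# Y depends on X and Z, each with a true coefficient of 1
Y <- rnorm(n, X + 1 * Z, 1)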

Let’s inspect our data and see what kind of relationship we would expect:
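One possible way to do this inspection is sketched below. It is not the original post's code: the seed, the `sim` data frame, and the choice of a correlation table plus a scatterplot matrix are assumptions made for illustration. The block regenerates the data so that it runs on its own.

```r
# Regenerate the fake data so this block is self-contained.
set.seed(2019)  # arbitrary seed, assumed here for reproducibility
n <- 1000
U <- rnorm(n, 5, 1)
Z <- rnorm(n, 1, 1)
X <- rnorm(n, U + 1 * Z, 1)
Y <- rnorm(n, X + 1 * Z, 1)

sim <- data.frame(X = X, Y = Y, Z = Z)

# Pairwise correlations: Z is correlated with both X and Y.
cor_mat <- cor(sim)
print(round(cor_mat, 2))

# Scatterplot matrix of the three variables.
pairs(sim, pch = 20, col = rgb(0, 0, 0, 0.3))

# Slope we would expect from Y ~ X when Z is left out:
# the true effect of X (1) plus the effect of Z (1) times cov(X, Z) / var(X).
expected_naive_slope <- 1 + 1 * cov(X, Z) / var(X)
print(expected_naive_slope)
```

By construction, Cov(X, Z) is 1 and Var(X) is 3, so the expected slope from a regression that omits Z is roughly 1 + 1/3 rather than the true value of 1.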

If we run a naive linear regression of Y on X, leaving Z out, we get the following results:

fit1 <- lm(Y ~ X)


lm(formula = Y ~ X)
            coef.est coef.se
(Intercept) -0.98     0.15
X            1.33     0.02
n = 1000, k = 2
residual sd = 1.27, R-Squared = 0.76

The fit looks good, but the slope on X is 1.33 even though the simulation sets the true effect of X on Y to 1. Let’s see what happens when we include the omitted variable.

fit2 <- lm(Y ~ X + Z)

lm(formula = Y ~ X + Z)
            coef.est coef.se
(Intercept) 0.17     0.13
X           0.98     0.02
Z           0.98     0.04
n = 1000, k = 3
residual sd = 1.00, R-Squared = 0.85

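Including the omitted variable raises R² from 0.76 to 0.85 and lowers the residual standard deviation from 1.27 to 1.00. The biggest change, though, is in the coefficient on X: it falls from 1.33 to 0.98, close to the true value of 1 used in the simulation. The standard error on X is 0.02 in both fits, so the naive model was confidently wrong rather than merely imprecise.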
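The gap between the two X coefficients can be rebuilt exactly from the fitted models. The sketch below assumes the same data-generating code as above with an arbitrary seed, and adds an auxiliary regression of Z on X (the name `aux` is introduced here for illustration). For ordinary least squares, the naive slope equals the full-model slope on X plus the full-model slope on Z times the auxiliary slope.

```r
# Regenerate the fake data so this block is self-contained.
set.seed(2019)  # arbitrary seed, assumed here for reproducibility
n <- 1000
U <- rnorm(n, 5, 1)
Z <- rnorm(n, 1, 1)
X <- rnorm(n, U + 1 * Z, 1)
Y <- rnorm(n, X + 1 * Z, 1)

fit1 <- lm(Y ~ X)      # naive model, Z omitted
fit2 <- lm(Y ~ X + Z)  # full model, Z included
aux  <- lm(Z ~ X)      # auxiliary regression of the omitted variable on X

naive_slope <- unname(coef(fit1)["X"])
rebuilt     <- unname(coef(fit2)["X"] + coef(fit2)["Z"] * coef(aux)["X"])

# The two values agree up to floating-point error.
print(c(naive = naive_slope, rebuilt = rebuilt))
print(isTRUE(all.equal(naive_slope, rebuilt)))
```

This is an in-sample algebraic identity, so it holds for any seed. It shows that the bias depends on two things: how strongly Z affects Y, and how strongly Z is related to X.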

All this to say that it is good to check for omitted variables and, more importantly, to run fake data simulations to see how sensitive your model is to them.
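A minimal sketch of such a sensitivity check follows. The helper `simulate_naive_slope()` and the parameter `gamma` are hypothetical names introduced here. They generalize the simulation above by letting Z enter both X and Y with strength `gamma` instead of a fixed coefficient of 1.

```r
set.seed(2019)  # arbitrary seed, assumed here for reproducibility

# Hypothetical helper: simulate one data set in which Z enters both
# X and Y with strength `gamma`, then return the slope from Y ~ X.
simulate_naive_slope <- function(gamma, n = 1000) {
  U <- rnorm(n, 5, 1)
  Z <- rnorm(n, 1, 1)
  X <- rnorm(n, U + gamma * Z, 1)
  Y <- rnorm(n, X + gamma * Z, 1)
  unname(coef(lm(Y ~ X))["X"])
}

gammas <- c(0, 0.25, 0.5, 1, 2)

results <- data.frame(
  gamma = gammas,
  naive = sapply(gammas, simulate_naive_slope),
  # Large-sample slope for this design: Cov(X, Z) = gamma and
  # Var(X) = 2 + gamma^2, so the bias is gamma^2 / (2 + gamma^2).
  implied = 1 + gammas^2 / (2 + gammas^2)
)

print(round(results, 2))
```

With `gamma = 0` the naive slope should sit near the true value of 1, and it drifts upward as `gamma` grows. Swapping in effect sizes that are plausible for your own problem gives a rough sense of how much an unmeasured confounder could move the estimate.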




Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".


For attribution, please cite this work as

DeWitt (2019, April 7). Michael DeWitt: Omitted Variable Bias. Retrieved from
