
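One important concept to discuss is omitted variable bias. It occurs when a variable that influences both a predictor and the outcome is left out of the model. The predictor is then endogenous: it is correlated with the error term, and its coefficient absorbs part of the omitted variable's effect.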

As with all analyses, it is best to begin with a fake data simulation to build intuition about the problem. In this example, suppose we want to test a relationship in which X predicts Y. Additionally, let’s suppose that there is some variable Z that affects both X and Y.

```
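n <- 1000
U <- rnorm(n, 5, 1)          # background variable that drives X only
Z <- rnorm(n, 1, 1)          # confounder: affects both X and Y
X <- rnorm(n, U + 1 * Z, 1)  # X depends on U and Z
Y <- rnorm(n, X + 1 * Z, 1)  # Y depends on X and Z; the true effect of X is 1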
```

Let’s inspect our data and see what kind of relationship we would expect:
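One way to carry out this inspection is sketched below. It re-simulates the data so that it runs on its own; the seed, the `sim` data frame, and the choice of a correlation matrix plus a scatterplot matrix are assumptions made for illustration, not necessarily what the original figure showed.

```r
# Minimal sketch: re-simulate the data, then inspect it.
set.seed(42)  # arbitrary seed (an assumption), only for reproducibility

n <- 1000
U <- rnorm(n, 5, 1)          # background variable that drives X only
Z <- rnorm(n, 1, 1)          # the variable we will later omit
X <- rnorm(n, U + 1 * Z, 1)  # X depends on U and Z
Y <- rnorm(n, X + 1 * Z, 1)  # Y depends on X and Z

sim <- data.frame(X = X, Y = Y, Z = Z)  # hypothetical helper object

# Pairwise correlations: Z should be correlated with both X and Y
cor_mat <- cor(sim)
print(round(cor_mat, 2))

# Scatterplot matrix of the three variables
pairs(sim, pch = 16, cex = 0.4, col = "grey40")
```

If Z is correlated with both X and Y, as the simulation is built to ensure, leaving it out of a regression of Y on X should be expected to distort the slope on X.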

If we run a naive linear regression of Y on X, ignoring Z, we get the following results:

```
fit1 <- lm(Y ~ X)
arm::display(fit1)
```

```
lm(formula = Y ~ X)
coef.est coef.se
(Intercept) -0.98 0.15
X 1.33 0.02
---
n = 1000, k = 2
residual sd = 1.27, R-Squared = 0.76
```

It’s a pretty good fit, but recall that the simulation set the effect of X on Y to 1, not 1.33. Let’s look at what happens when we include our omitted variable.

```
fit2 <- lm(Y ~ X + Z)
arm::display(fit2)
```

```
lm(formula = Y ~ X + Z)
coef.est coef.se
(Intercept) 0.17 0.13
X 0.98 0.02
Z 0.98 0.04
---
n = 1000, k = 3
residual sd = 1.00, R-Squared = 0.85
```

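So here we see that including the omitted variable moves the coefficient on X from 1.33 to 0.98, close to the true value of 1 used in the simulation. The $R^2$ increases from 0.76 to 0.85 and the residual standard deviation drops from 1.27 to 1.00, while the standard error on X stays at 0.02. The naive model was confidently wrong: a tight standard error around a biased estimate.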
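The naive slope of 1.33 is not a quirk of this particular sample. A rough sketch, using the textbook omitted variable bias expression for a single regressor and the variances implied by the `rnorm` calls in the simulation (the symbols $\beta_X$, $\beta_Z$ and $\varepsilon$ are notation introduced here for illustration):

$$
\begin{aligned}
Y &= \beta_X X + \beta_Z Z + \varepsilon, \qquad \beta_X = \beta_Z = 1 \\
\operatorname{plim} \hat{\beta}_X^{\text{naive}} &= \beta_X + \beta_Z \, \frac{\operatorname{Cov}(X, Z)}{\operatorname{Var}(X)} \\
\operatorname{Cov}(X, Z) &= \operatorname{Var}(Z) = 1, \qquad \operatorname{Var}(X) = \operatorname{Var}(U) + \operatorname{Var}(Z) + 1 = 3 \\
\operatorname{plim} \hat{\beta}_X^{\text{naive}} &= 1 + 1 \cdot \tfrac{1}{3} \approx 1.33
\end{aligned}
$$

So with Z left out, the regression should settle near 1.33 in large samples, which is what the naive fit reports.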

All this to say that it is good to check for omitted variables and, more importantly, to run fake data simulations to see how sensitive your model is to them.
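A sensitivity check of this kind might look like the sketch below. It reuses the simulation from this post but lets the strength of Z vary; the function name `sim_slopes()`, the parameter `gamma`, the seed, and the grid of values are all hypothetical choices made for illustration.

```r
# Sketch: how the naive slope on X responds to the strength of the
# omitted variable. `sim_slopes()` and `gamma` are hypothetical names.
set.seed(2019)  # arbitrary seed (an assumption), only for reproducibility

sim_slopes <- function(gamma, n = 1000) {
  U <- rnorm(n, 5, 1)
  Z <- rnorm(n, 1, 1)
  X <- rnorm(n, U + gamma * Z, 1)  # Z affects X with strength gamma
  Y <- rnorm(n, X + gamma * Z, 1)  # Z affects Y with strength gamma; true X effect is 1
  c(
    gamma    = gamma,
    naive    = unname(coef(lm(Y ~ X))["X"]),     # Z omitted
    adjusted = unname(coef(lm(Y ~ X + Z))["X"])  # Z included
  )
}

gammas  <- c(0, 0.5, 1, 2)
results <- as.data.frame(t(sapply(gammas, sim_slopes)))
print(round(results, 2))
```

Under this setup the adjusted slope should stay near 1 for every `gamma`, while the naive slope should drift upward as `gamma` grows, roughly following $1 + \gamma^2 / (2 + \gamma^2)$ in large samples. With `gamma = 0` there is nothing to omit and the two models should agree.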


Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/medewitt/medewitt.github.io, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

For attribution, please cite this work as

DeWitt (2019, April 7). Michael DeWitt: Omitted Variable Bias. Retrieved from https://michaeldewittjr.com/programming/2019-04-07-omitted-variable-bias/

BibTeX citation

@misc{dewitt2019omitted, author = {DeWitt, Michael}, title = {Michael DeWitt: Omitted Variable Bias}, url = {https://michaeldewittjr.com/programming/2019-04-07-omitted-variable-bias/}, year = {2019} }