I just wanted to explore a little more some of the topics covered in the fantastic Applied Econometrics with R. All of these examples come from their text in Chapter 3.

Loading required package: car

Loading required package: carData

Loading required package: lmtest

Loading required package: zoo


Attaching package: 'zoo'

The following objects are masked from 'package:base':

    as.Date, as.Date.numeric

Loading required package: sandwich

Loading required package: survival

Binary Depedent Modeling

  participation   income age education youngkids oldkids foreign
1            no 10.78750 3.0         8         1       1      no
2           yes 10.52425 4.5         8         0       1      no
3            no 10.96858 4.6         9         0       0      no
4            no 11.10500 3.1        11         2       0      no
5            no 11.10847 4.4        12         0       2      no
6           yes 11.02825 4.2        12         0       1      no


Call:
glm(formula = participation ~ . + I(age^2), family = binomial(link = "probit"), 
    data = SwissLabor)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.9191  -0.9695  -0.4792   1.0209   2.4803  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  3.74909    1.40695   2.665  0.00771 ** 
income      -0.66694    0.13196  -5.054 4.33e-07 ***
age          2.07530    0.40544   5.119 3.08e-07 ***
education    0.01920    0.01793   1.071  0.28428    
youngkids   -0.71449    0.10039  -7.117 1.10e-12 ***
oldkids     -0.14698    0.05089  -2.888  0.00387 ** 
foreignyes   0.71437    0.12133   5.888 3.92e-09 ***
I(age^2)    -0.29434    0.04995  -5.893 3.79e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1203.2  on 871  degrees of freedom
Residual deviance: 1017.2  on 864  degrees of freedom
AIC: 1033.2

Number of Fisher Scoring iterations: 4

Visualisation of the Data

Retrieving Average Marginal Effects

Average of the sample marginal effects is determined by the following:

 (Intercept)       income          age    education    youngkids      oldkids 
 1.241929965 -0.220931858  0.687466185  0.006358743 -0.236682273 -0.048690170 
  foreignyes     I(age^2) 
 0.236644422 -0.097504844

Goodness of Fit

This can be evauluated with a pseudo R^2 called _McFadden’s pseudo-R^2.

\[R^2 = 1- \frac{l(\hat\beta)}{l(\bar y)}\]

Warning in update.default(swiss_probit, formula = . ~ 1): partial argument match
of 'formula' to 'formula.'

Warning in match.call(definition, call, expand.dots, envir): partial argument
match of 'formula' to 'formula.'

[1] 0.1546416

     pred
true    0   1
  no  337 134
  yes 146 255


Attaching package: 'ROCR'

The following object is masked from 'package:margins':

    prediction

Residuals and Diagnostics


z test of coefficients:

             Estimate Std. Error z value  Pr(>|z|)    
(Intercept)  3.749091   1.327072  2.8251  0.004727 ** 
income      -0.666941   0.127292 -5.2395 1.611e-07 ***
age          2.075297   0.398580  5.2067 1.922e-07 ***
education    0.019196   0.017935  1.0703  0.284479    
youngkids   -0.714487   0.106095 -6.7344 1.646e-11 ***
oldkids     -0.146984   0.051609 -2.8480  0.004399 ** 
foreignyes   0.714373   0.122437  5.8346 5.391e-09 ***
I(age^2)    -0.294344   0.049527 -5.9430 2.798e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Count Data

  trips quality ski income userfee  costC   costS   costH
1     0       0 yes      4      no  67.59  68.620  76.800
2     0       0  no      9      no  68.86  70.936  84.780
3     0       0 yes      5      no  58.12  59.465  72.110
4     0       0  no      2      no  15.79  13.750  23.680
5     0       0 yes      3      no  24.02  34.033  34.547
6     0       0 yes      5      no 129.46 137.377 137.850


z test of coefficients:

              Estimate Std. Error  z value  Pr(>|z|)    
(Intercept)  0.2649934  0.0937222   2.8274  0.004692 ** 
quality      0.4717259  0.0170905  27.6016 < 2.2e-16 ***
skiyes       0.4182137  0.0571902   7.3127 2.619e-13 ***
income      -0.1113232  0.0195884  -5.6831 1.323e-08 ***
userfeeyes   0.8981653  0.0789851  11.3713 < 2.2e-16 ***
costC       -0.0034297  0.0031178  -1.1001  0.271309    
costS       -0.0425364  0.0016703 -25.4667 < 2.2e-16 ***
costH        0.0361336  0.0027096  13.3353 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Overdispersion??


    Overdispersion test

data:  rd_poisson
z = 2.4116, p-value = 0.007941
alternative hypothesis: true dispersion is greater than 1
sample estimates:
dispersion 
    6.5658

Yup

Warning: partial match of 'coef' to 'coefficients'


    Overdispersion test

data:  rd_poisson
z = 2.9381, p-value = 0.001651
alternative hypothesis: true alpha is greater than 0
sample estimates:
   alpha 
1.316051

Negative Binomial

Warning in glm.fitter(x = X, y = Y, w = w, start = start, etastart = etastart, :
partial argument match of 'w' to 'weights'

Warning in glm.fitter(x = X, y = Y, w = w, etastart = eta, offset = offset, :
partial argument match of 'w' to 'weights'

Warning in glm.fitter(x = X, y = Y, w = w, etastart = eta, offset = offset, :
partial argument match of 'w' to 'weights'

Warning in glm.fitter(x = X, y = Y, w = w, etastart = eta, offset = offset, :
partial argument match of 'w' to 'weights'

Warning in glm.fitter(x = X, y = Y, w = w, etastart = eta, offset = offset, :
partial argument match of 'w' to 'weights'

Warning in glm.fitter(x = X, y = Y, w = w, etastart = eta, offset = offset, :
partial argument match of 'w' to 'weights'

Warning in glm.fitter(x = X, y = Y, w = w, etastart = eta, offset = offset, :
partial argument match of 'w' to 'weights'

Warning in glm.fitter(x = X, y = Y, w = w, etastart = eta, offset = offset, :
partial argument match of 'w' to 'weights'

Warning in glm.fitter(x = X, y = Y, w = w, etastart = eta, offset = offset, :
partial argument match of 'w' to 'weights'

Warning in glm.fitter(x = X, y = Y, w = w, etastart = eta, offset = offset, :
partial argument match of 'w' to 'weights'

Warning in glm.fitter(x = X, y = Y, w = w, etastart = eta, offset = offset, :
partial argument match of 'w' to 'weights'

Warning in glm.fitter(x = X, y = Y, w = w, etastart = eta, offset = offset, :
partial argument match of 'w' to 'weights'

Warning in glm.fitter(x = X, y = Y, w = w, etastart = eta, offset = offset, :
partial argument match of 'w' to 'weights'

Warning in glm.fitter(x = X, y = Y, w = w, etastart = eta, offset = offset, :
partial argument match of 'w' to 'weights'

Warning in glm.fitter(x = X, y = Y, w = w, etastart = eta, offset = offset, :
partial argument match of 'w' to 'weights'

Warning in glm.fitter(x = X, y = Y, w = w, etastart = eta, offset = offset, :
partial argument match of 'w' to 'weights'

Warning in glm.fitter(x = X, y = Y, w = w, etastart = eta, offset = offset, :
partial argument match of 'w' to 'weights'

Warning in glm.fitter(x = X, y = Y, w = w, etastart = eta, offset = offset, :
partial argument match of 'w' to 'weights'

Warning in glm.fitter(x = X, y = Y, w = w, etastart = eta, offset = offset, :
partial argument match of 'w' to 'weights'

Warning in glm.fitter(x = X, y = Y, w = w, etastart = eta, offset = offset, :
partial argument match of 'w' to 'weights'

Warning in glm.fitter(x = X, y = Y, w = w, etastart = eta, offset = offset, :
partial argument match of 'w' to 'weights'

Warning in glm.fitter(x = X, y = Y, w = w, etastart = eta, offset = offset, :
partial argument match of 'w' to 'weights'

Warning in glm.fitter(x = X, y = Y, w = w, etastart = eta, offset = offset, :
partial argument match of 'w' to 'weights'

Warning in glm.fitter(x = X, y = Y, w = w, etastart = eta, offset = offset, :
partial argument match of 'w' to 'weights'


z test of coefficients:

              Estimate Std. Error  z value  Pr(>|z|)    
(Intercept) -1.1219363  0.2143029  -5.2353 1.647e-07 ***
quality      0.7219990  0.0401165  17.9976 < 2.2e-16 ***
skiyes       0.6121388  0.1503029   4.0727 4.647e-05 ***
income      -0.0260588  0.0424527  -0.6138   0.53933    
userfeeyes   0.6691676  0.3530211   1.8955   0.05802 .  
costC        0.0480087  0.0091848   5.2270 1.723e-07 ***
costS       -0.0926910  0.0066534 -13.9314 < 2.2e-16 ***
costH        0.0388357  0.0077505   5.0107 5.423e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

                        [,1]
poisson           -1529.4313
negative_binomial  -825.5576

Improvement!

      0   1  2  3  4  5  6  7  8 9
obs 417  68 38 34 17 13 11  2  8 1
exp 277 146 68 41 30 23 17 13 10 7

This model is under-predicting the zero number of trips. Perhaps it is time to use a zero-inflated model that will help to correct this undercounting.

\[f_{zeroinfl}(y) = p_i * I_{0}(y)+(1-p_i)*f_{count}(y;\mu_i)\]

Thus the linear predictor portion uses all of the independent variables and the inflation component to be a function of quality and income.

Classes and Methods for R developed in the
Political Science Computational Laboratory
Department of Political Science
Stanford University
Simon Jackman
hurdle and zeroinfl functions by Achim Zeileis

Warning in model.matrix.default(object$terms$count, object$model, contrasts
= object$contrasts$count): partial argument match of 'contrasts' to
'contrasts.arg'

Warning in model.matrix.default(object$terms$zero, object$model, contrasts =
object$contrasts$zero): partial argument match of 'contrasts' to 'contrasts.arg'

Warning in model.matrix.default(object$terms$count, object$model, contrasts
= object$contrasts$count): partial argument match of 'contrasts' to
'contrasts.arg'

Warning in model.matrix.default(object$terms$zero, object$model, contrasts =
object$contrasts$zero): partial argument match of 'contrasts' to 'contrasts.arg'


Call:
zeroinfl(formula = trips ~ . | quality + income, data = RecreationDemand, 
    dist = "negbin")

Pearson residuals:
     Min       1Q   Median       3Q      Max 
-1.08868 -0.20032 -0.05687 -0.04525 39.95749 

Count model coefficients (negbin with log link):
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  1.096094   0.257075   4.264 2.01e-05 ***
quality      0.169019   0.053135   3.181 0.001468 ** 
skiyes       0.500479   0.134496   3.721 0.000198 ***
income      -0.069203   0.043802  -1.580 0.114130    
userfeeyes   0.542557   0.282819   1.918 0.055062 .  
costC        0.040427   0.014522   2.784 0.005372 ** 
costS       -0.066202   0.007746  -8.547  < 2e-16 ***
costH        0.020609   0.010235   2.014 0.044061 *  
Log(theta)   0.189859   0.113134   1.678 0.093312 .  

Zero-inflation model coefficients (binomial with logit link):
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   5.7184     1.5596   3.667 0.000246 ***
quality      -8.3596     3.9380  -2.123 0.033768 *  
income       -0.2516     0.2847  -0.884 0.376832    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Theta = 1.2091

Warning: partial match of 'count' to 'counts'

Number of iterations in BFGS optimization: 26 
Log-likelihood:  -722 on 12 Df

Let’s check the fit!

Warning in model.matrix.default(object$terms$count, object$model, contrasts
= object$contrasts$count): partial argument match of 'contrasts' to
'contrasts.arg'

Warning in model.matrix.default(object$terms$zero, object$model, contrasts =
object$contrasts$zero): partial argument match of 'contrasts' to 'contrasts.arg'

      0  1  2  3  4  5  6  7 8 9
obs 417 68 38 34 17 13 11  2 8 1
exp 433 47 35 27 20 16 12 10 8 7

Looks a great deal better!

Hurdle Models

Useful for an excessive number of zeros (or a small number of zeros). This is more widely used in economics according to the text. The hurdle consists of two parts

Binary part given by a count distribution that is right censored at y = 1 (e.g. is the hurdle crossed)
A count part given by a left-truncated distribution at y = 1 (e,g, if y > 0, how large is y)

Warning in model.matrix.default(object$terms$count, object$model, contrasts
= object$contrasts$count): partial argument match of 'contrasts' to
'contrasts.arg'

Warning in model.matrix.default(object$terms$zero, object$model, contrasts =
object$contrasts$zero): partial argument match of 'contrasts' to 'contrasts.arg'

      0  1  2  3  4  5  6 7 8 9
obs 417 68 38 34 17 13 11 2 8 1
exp 417 74 42 27 19 14 10 8 6 5

Censored Depdent Variables

A Tobit model posits that Gaussian linear predictor exists for a latent variable, \(y_0\) exists. IT is reported only if the latent variable is positive.

Thus:

\[y_i^0= x_i^T\beta+\epsilon_i\]

\[y_i = \begin{cases} y_i, y_i^0 >0\\ 0, y_i^0 \le 0 \end{cases},\]

   affairs gender age yearsmarried children religiousness education occupation
4        0   male  37        10.00       no             3        18          7
5        0 female  27         4.00       no             4        14          6
11       0 female  32        15.00      yes             1        12          1
16       0   male  57        15.00      yes             5        18          6
23       0   male  22         0.75       no             2        17          6
29       0 female  32         1.50       no             2        17          5
   rating
4       4
5       4
11      4
16      5
23      3
29      5

Warning: partial match of 'coef' to 'coefficients'


Call:
tobit(formula = affairs ~ age + yearsmarried + religiousness + 
    occupation + rating, data = Affairs)

Observations:
         Total  Left-censored     Uncensored Right-censored 
           601            451            150              0 

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)    8.17420    2.74145   2.982  0.00287 ** 
age           -0.17933    0.07909  -2.267  0.02337 *  
yearsmarried   0.55414    0.13452   4.119 3.80e-05 ***
religiousness -1.68622    0.40375  -4.176 2.96e-05 ***
occupation     0.32605    0.25442   1.282  0.20001    
rating        -2.28497    0.40783  -5.603 2.11e-08 ***
Log(scale)     2.10986    0.06710  31.444  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Scale: 8.247 

Gaussian distribution
Number of Newton-Raphson Iterations: 4 
Log-likelihood: -705.6 on 7 Df
Wald-statistic: 67.71 on 5 Df, p-value: 3.0718e-13

Warning in linearHypothesis.tobit(aff_tob, c("age = 0", "occupation = 0"), :
partial argument match of 'vcov' to 'vcov.'

Linear hypothesis test

Hypothesis:
age = 0
occupation = 0

Model 1: restricted model
Model 2: affairs ~ age + yearsmarried + religiousness + occupation + rating

Note: Coefficient covariance matrix supplied.

  Res.Df Df  Chisq Pr(>Chisq)  
1    596                       
2    594  2 4.9078    0.08596 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Age and occupation are joinly weakyl significant.

Ordinal Response Variables


z test of coefficients:

                 Estimate Std. Error z value  Pr(>|z|)    
education        0.869998   0.093071  9.3476 < 2.2e-16 ***
minorityyes     -1.056438   0.411994 -2.5642   0.01034 *  
custodial|admin  7.951359   1.076932  7.3833 1.544e-13 ***
admin|manage    14.172125   1.474364  9.6124 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Reuse

https://creativecommons.org/licenses/by/4.0/

Citation

BibTeX citation:

@online{dewitt2018,
  author = {Michael DeWitt},
  title = {Models of Microeconomics},
  date = {2018-09-16},
  url = {https://michaeldewittjr.com/programming/2018-09-16-models-of-microeconomics},
  langid = {en}
}

For attribution, please cite this work as:

Michael DeWitt. 2018. “Models of Microeconomics.” September 16, 2018. https://michaeldewittjr.com/programming/2018-09-16-models-of-microeconomics.