Replicating gsynth

Introduction

Synthethic methods are a method of causal inference that seeks to combine traditional difference-in-difference types studies with time series cross sectional differences with factor analysis for uncontrolled/ unobserved measures. This method has been growing our of work initially from Abadie (Abadie, Diamond, and Hainmueller 2011a) and growing in importance/ research for state level policy analysis. It is interesting from the fact that it combines some elements of factor analysis to develop predictors and regression analysis to try to capture explained and unexplained variance.

Package

library(tidyverse)
library(gsynth)
library(panelView)

Method Assumptions

This methodology makes several critical assumptions.

  1. The model takes the functional form of Yi,t=δi,tDi,t+xi,tβ+λifi+ϵi,tY_{i,t}=\delta_{i,t}D_{i,t}+x_{i,t}^\prime\beta+\lambda_i^{\prime}f_i+\epsilon_{i,t}
  2. Strict exogeneity (e.g. the error terms are indepdent to D, X, λ\lambda and f)
  3. Weak serial dependence of the error terms
  4. Regularity of conditions

The method then uses bootstrapping for confidence intervals.

There is some basic data available in the package

data(gsynth)

The data that will be initially used will be the simdata dataset

idtimeYDX1X2efferrormualphaxiF1L1F2L2
11614.9600.750.230.00-2.605-0.361.130.251.390.01-0.18
1142916.9000.872.838.960.505-1.533.240.180.25-0.38-0.78
101101.4401.52-0.550.00-0.745-0.06-1.600.22-0.041.15-0.88
125615.7802.202.910.00-0.995-0.650.701.520.540.33-0.09
129119.6300.141.340.000.1951.62-0.530.38-0.730.99-0.51
101125.4701.430.330.000.045-0.06-1.46-0.50-0.040.55-0.88
119910.9402.652.170.00-0.095-1.20-1.510.12-0.60-0.331.02
1341813.7101.722.280.00-0.965-0.810.330.051.021.531.00
14047.8302.38-0.180.00-0.2151.281.911.37-0.930.64-1.13
1332512.0501.081.973.450.3050.32-0.37-0.690.66-0.95-0.29
1091517.4302.982.980.00-0.5650.69-1.29-1.070.181.361.37
1042321.1712.602.273.520.0751.34-0.371.012.02-0.25-0.62
128304.7902.31-0.508.95-2.5551.63-0.42-0.140.330.920.39
11986.0200.830.990.00-0.095-1.20-1.570.58-0.600.441.02
12278.5000.711.230.00-0.6650.56-0.26-1.550.671.100.45
12229.6802.001.010.000.3950.56-1.46-0.030.670.390.45
104259.3110.39-0.535.330.0251.34-0.37-0.692.02-0.95-0.62
1091612.810-1.412.240.001.7850.690.790.300.18-0.601.37
14386.430-0.360.900.000.4850.60-1.570.58-0.300.44-0.54
134272.320-0.130.368.13-0.715-0.81-1.05-0.281.02-0.781.00
125163.0000.66-0.670.00-1.015-0.650.790.300.54-0.60-0.09
13017-7.680-0.10-3.480.00-0.1950.790.770.45-1.222.19-1.35
129107.4701.150.750.00-0.2151.62-1.600.22-0.731.15-0.51
1322910.770-0.021.2311.04-0.195-0.963.240.181.39-0.380.58
1432913.1502.110.248.821.3350.603.240.18-0.30-0.38-0.54

It can be examined via the panelView package:

panelview(Y ~ D, data = simdata,  index = c("id","time")) 

Modeling

out <- gsynth(Y ~ D + X1 + X2, data = simdata, 
                  index = c("id","time"), force = "two-way",
                  CV = TRUE, r = c(0, 5), se = TRUE, 
                  inference = "parametric", nboots = 1000,
                  parallel = FALSE)
Cross-validating ... 
 r = 0; sigma2 = 1.84865; IC = 1.02023; PC = 1.74458; MSPE = 2.37280
 r = 1; sigma2 = 1.51541; IC = 1.20588; PC = 1.99818; MSPE = 1.71743
 r = 2; sigma2 = 0.99737; IC = 1.16130; PC = 1.69046; MSPE = 1.14540*
 r = 3; sigma2 = 0.94664; IC = 1.47216; PC = 1.96215; MSPE = 1.15032
 r = 4; sigma2 = 0.89411; IC = 1.76745; PC = 2.19241; MSPE = 1.21397
 r = 5; sigma2 = 0.85060; IC = 2.05928; PC = 2.40964; MSPE = 1.23876

 r* = 2


Simulating errors .............
Bootstrapping ...
..........
print(out)
Call:
gsynth.formula(formula = Y ~ D + X1 + X2, data = simdata, index = c("id", 
    "time"), force = "two-way", r = c(0, 5), CV = TRUE, se = TRUE, 
    nboots = 1000, inference = "parametric", parallel = FALSE)

Average Treatment Effect on the Treated:
        Estimate  S.E. CI.lower CI.upper p.value
ATT.avg    5.544 0.262     5.03    6.057       0

   ~ by Period (including Pre-treatment Periods):
          ATT   S.E. CI.lower CI.upper   p.value n.Treated
-19  0.392160 0.5468  -0.6796  1.46396 4.733e-01         0
-18  0.276548 0.4461  -0.5979  1.15098 5.354e-01         0
-17 -0.275393 0.5428  -1.3392  0.78841 6.119e-01         0
-16  0.441201 0.4389  -0.4191  1.30150 3.148e-01         0
-15 -0.889595 0.4914  -1.8527  0.07348 7.023e-02         0
-14  0.593891 0.4633  -0.3142  1.50199 1.999e-01         0
-13  0.528749 0.4132  -0.2811  1.33860 2.007e-01         0
-12  0.171569 0.5975  -0.9994  1.34258 7.740e-01         0
-11  0.610832 0.4474  -0.2661  1.48774 1.722e-01         0
-10  0.170597 0.4516  -0.7146  1.05581 7.056e-01         0
-9  -0.271892 0.5871  -1.4226  0.87886 6.433e-01         0
-8   0.094843 0.5037  -0.8924  1.08204 8.506e-01         0
-7  -0.651976 0.5567  -1.7430  0.43904 2.415e-01         0
-6   0.573686 0.4783  -0.3638  1.51113 2.304e-01         0
-5  -0.469686 0.5150  -1.4792  0.53978 3.618e-01         0
-4  -0.077766 0.5287  -1.1140  0.95850 8.831e-01         0
-3  -0.141785 0.5473  -1.2144  0.93085 7.956e-01         0
-2  -0.157100 0.4186  -0.9776  0.66341 7.075e-01         0
-1  -0.915575 0.4856  -1.8673  0.03616 5.936e-02         0
0   -0.003309 0.3638  -0.7164  0.70975 9.927e-01         0
1    1.235962 0.7289  -0.1927  2.66458 8.995e-02         5
2    1.630264 0.5615   0.5297  2.73079 3.691e-03         5
3    2.712178 0.5508   1.6326  3.79178 8.487e-07         5
4    3.466758 0.7437   2.0092  4.92431 3.136e-06         5
5    5.740132 0.5367   4.6882  6.79203 0.000e+00         5
6    5.280035 0.5741   4.1548  6.40529 0.000e+00         5
7    8.436485 0.4843   7.4874  9.38561 0.000e+00         5
8    7.839902 0.6408   6.5839  9.09592 0.000e+00         5
9    9.455115 0.5450   8.3869 10.52329 0.000e+00         5
10   9.638509 0.5035   8.6517 10.62527 0.000e+00         5

Coefficients for the Covariates:
    beta    S.E. CI.lower CI.upper p.value
X1 1.022 0.03004    0.963    1.081       0
X2 3.053 0.02958    2.995    3.111       0
out$est.att
$```

                 ATT      S.E.   CI.lower    CI.upper      p.value n.Treated
    -19  0.392159788 0.5468489 -0.6796443  1.46396386 4.732961e-01         0
    -18  0.276547958 0.4461493 -0.5978887  1.15098460 5.353532e-01         0
    -17 -0.275392926 0.5427657 -1.3391941  0.78840827 6.118824e-01         0
    -16  0.441201288 0.4389354 -0.4190963  1.30149888 3.148187e-01         0
    -15 -0.889595124 0.4913755 -1.8526735  0.07348322 7.023098e-02         0
    -14  0.593890957 0.4633226 -0.3142046  1.50198651 1.999097e-01         0
    -13  0.528749012 0.4131945 -0.2810973  1.33859533 2.006643e-01         0
    -12  0.171568737 0.5974650 -0.9994412  1.34257868 7.739889e-01         0
    -11  0.610832288 0.4474116 -0.2660784  1.48774295 1.721720e-01         0
    -10  0.170597468 0.4516495 -0.7146193  1.05581419 7.056379e-01         0
    -9  -0.271891657 0.5871305 -1.4226462  0.87886293 6.433030e-01         0
    -8   0.094842558 0.5036802 -0.8923526  1.08203768 8.506422e-01         0
    -7  -0.651975701 0.5566512 -1.7429921  0.43904067 2.414998e-01         0
    -6   0.573686472 0.4782973 -0.3637590  1.51113193 2.303589e-01         0
    -5  -0.469685905 0.5150445 -1.4791545  0.53978268 3.618041e-01         0
    -4  -0.077766449 0.5287181 -1.1140349  0.95850205 8.830650e-01         0
    -3  -0.141784521 0.5472736 -1.2144212  0.93085211 7.955779e-01         0
    -2  -0.157100323 0.4186374 -0.9776145  0.66341385 7.074627e-01         0
    -1  -0.915575087 0.4855890 -1.8673120  0.03616184 5.936318e-02         0
    0   -0.003308833 0.3638145 -0.7163722  0.70975454 9.927435e-01         0
    1    1.235962010 0.7289023 -0.1926602  2.66458419 8.995247e-02         5
    2    1.630264312 0.5615032  0.5297382  2.73079044 3.691436e-03         5
    3    2.712177702 0.5508277  1.6325753  3.79178010 8.486982e-07         5
    4    3.466757691 0.7436652  2.0092007  4.92431464 3.135799e-06         5
    5    5.740132310 0.5366920  4.6882353  6.79202929 0.000000e+00         5
    6    5.280034526 0.5741182  4.1547836  6.40528546 0.000000e+00         5
    7    8.436484821 0.4842566  7.4873594  9.38561026 0.000000e+00         5
    8    7.839901526 0.6408388  6.5838805  9.09592251 0.000000e+00         5
    9    9.455114684 0.5449999  8.3869345 10.52329485 0.000000e+00         5
    10   9.638509457 0.5034590  8.6517480 10.62527087 0.000000e+00         5

``` r
out$est.avg
$```

            Estimate     S.E. CI.lower CI.upper p.value
    ATT.avg 5.543534 0.262005 5.030014 6.057054       0

``` r
out$est.beta
$```

           beta       S.E.  CI.lower CI.upper p.value
    X1 1.021890 0.03004240 0.9630082 1.080772       0
    X2 3.052994 0.02957771 2.9950231 3.110966       0

The default plot:

``` r
plot(out) +
  theme(plot.title = element_text(size = 12))

Modified:

plot(out, theme.bw = TRUE) +
  theme(plot.title = element_text(size = 12))

With the raw data:

plot(out, type = "raw", theme.bw = TRUE)+
  theme(plot.title = element_text(size = 12))
plot(out,type = "raw", legendOff = TRUE, ylim=c(-10,40), main="")+
  theme(plot.title = element_text(size = 12))
plot(out, type = "counterfactual", raw = "none", main="")+
  theme(plot.title = element_text(size = 12))
plot(out, type = "counterfactual", raw = "band", xlab = "Time", 
     ylim = c(-5,35), theme.bw = TRUE)+
  theme(plot.title = element_text(size = 12))
plot(out, type = "counterfactual", raw = "all")+
  theme(plot.title = element_text(size = 12))
plot(out, type = "factors", xlab = "Time")+
  theme(plot.title = element_text(size = 12))

Voter Turnout

Now I am going to try to reproduce the results from Xu’s paper xugeneralized2017xu_generalized_2017 and implemented in his package (Xu and Liu 2018).

Data

abbyearturnoutpolicy_edrpolicy_mail_inpolicy_motor
AL192021.02107000
AL192413.58233000
AL192819.03850000
AL193217.62002000
AL193618.69238000
AL194018.86406000

Model

The paper builds two models, one will only the impact of voter education, another with additional controls. For the sake of time I will build the model with all of the controls used.

out <- gsynth(turnout ~ policy_edr + policy_mail_in + policy_motor, data = turnout, 
                  index = c("abb","year"), force = "two-way",
                  CV = TRUE, r = c(0, 5), se = TRUE, 
                  inference = "parametric", nboots = 1000,
                  parallel = FALSE)
Cross-validating ... 
 r = 0; sigma2 = 77.99366; IC = 4.82744; PC = 72.60594; MSPE = 22.13889
 r = 1; sigma2 = 13.65239; IC = 3.52566; PC = 21.67581; MSPE = 12.03686
 r = 2; sigma2 = 8.56312; IC = 3.48518; PC = 19.23841; MSPE = 10.31254*
 r = 3; sigma2 = 6.40268; IC = 3.60547; PC = 18.61783; MSPE = 11.48390
 r = 4; sigma2 = 4.74273; IC = 3.70145; PC = 16.93707; MSPE = 16.28613
 r = 5; sigma2 = 4.02488; IC = 3.91847; PC = 17.05226; MSPE = 15.78683

 r* = 2


Simulating errors .............
Bootstrapping ...
..........

The model completed and appeared that it did well. Now to examine the figures to see if they match:

plot(out)

Now we can look at the estimated average treatment

out$est.avg %>% 
$  knitr::kable()
EstimateS.E.CI.lowerCI.upperp.value
ATT.avg4.8957882.3152320.35801639.433560.0344641
plot(out, type = "counterfactual", raw = "all")+
  theme(plot.title = element_text(size = 12))

Now with Synth

Now I want to try to replicate the previous case study used in Synth (Abadie, Diamond, and Hainmueller 2011b) and see if it is possible.

library(Synth)

I am going to recode some variable in order to fit with the methods used in gsynth. The Synth package uses a dataprep function in order to format the data into the requirements.

data("basque")
basque_treat <- basque %>% 
  mutate(treat = ifelse(grepl(pattern = "Basque", x = regionname) & year > 1969,1,0))
head(basque_treat) %>% 
  knitr::kable()
regionnoregionnameyeargdpcapsec.agriculturesec.energysec.industrysec.constructionsec.services.ventasec.services.nonventaschool.illitschool.primschool.medschool.highschool.post.highpopdensinvesttreat
1Spain (Espana)19552.354542NANANANANANANANANANANANANA0
1Spain (Espana)19562.480149NANANANANANANANANANANANANA0
1Spain (Espana)19572.603613NANANANANANANANANANANANANA0
1Spain (Espana)19582.637104NANANANANANANANANANANANANA0
1Spain (Espana)19592.669880NANANANANANANANANANANANANA0
1Spain (Espana)19602.869966NANANANANANANANANANANANANA0
out_basque <- gsynth(gdpcap ~ treat, data = basque_treat, index = c("regionname","year"), force = "two-way",
                  CV = TRUE, r = c(0, 5), se = TRUE, 
                  inference = "parametric", nboots = 1000,
                  parallel = FALSE, )
Cross-validating ... 
 r = 0; sigma2 = 0.15833; IC = -1.31080; PC = 0.14556; MSPE = 0.02963
 r = 1; sigma2 = 0.02499; IC = -2.64303; PC = 0.04627; MSPE = 0.00994
 r = 2; sigma2 = 0.00985; IC = -3.07772; PC = 0.02745; MSPE = 0.00755*
 r = 3; sigma2 = 0.00581; IC = -3.12683; PC = 0.02166; MSPE = 0.00981
 r = 4; sigma2 = 0.00395; IC = -3.05355; PC = 0.01843; MSPE = 0.00329
 r = 5; sigma2 = 0.00256; IC = -3.04564; PC = 0.01435; MSPE = 0.00260

 r* = 5


Simulating errors .............
Bootstrapping ...
..........

Let’s check the results

plot(out_basque)+
  theme(plot.title = element_text(size = 12))
plot(out_basque, type = "counterfactual", raw = "all")+
  theme(plot.title = element_text(size = 12))

Interesting that this method lands on a slightly different estimate than the previous paper. I think that this is due to some missing covariates. It appears that gsynth doesn’t handle missing data too well, which is ok.

Thoughts

I think that a combination of Bayesian Hierarchical modeling with structural timeseries modeling could get somewhere close to the method, and take advantage of specifying knowns, group effects, etc. But the method is neat and a good way to do some quick analysis.

References

Abadie, Alberto, Alexis Diamond, and Jens Hainmueller. 2011a. “Synth: An R Package for Synthetic Control Methods in Comparative Case Studies.” Journal of Statistical Software 42 (13): 1–17. http://www.jstatsoft.org/v42/i13/.

———. 2011b. “Synth: An r Package for Synthetic Control Methods in Comparative Case Studies.” Journal of Statistical Software, Articles 42 (13): 1–17. https://doi.org/10.18637/jss.v042.i13.

Xu, Yiqing, and Licheng Liu. 2018. Gsynth: Generalized Synthetic Control Method. https://CRAN.R-project.org/package=gsynth.

Reuse

Citation

BibTeX citation:
@online{dewitt2018,
  author = {Michael E. DeWitt},
  title = {Replicating gsynth},
  date = {2018-10-28},
  url = {https://michaeldewittjr.com/blog/2018-10-28-replicating-gsynth/},
  langid = {en}
}
For attribution, please cite this work as:
Michael E. DeWitt. October 28, 2018. "Replicating gsynth". https://michaeldewittjr.com/blog/2018-10-28-replicating-gsynth/.