The goal of jive is to implement jackknife instrumental-variable estimators (JIVE) and various alternatives.
You can install the development version of jive like so:
remotes::install_github("kylebutts/jive")
This package requires sparse_model_matrix from the dev version of fixest. You can install that via
remotes::install_github("lrberge/fixest")
We are going to use the data from Stevenson (2018). Stevenson leverages the quasi-random assignment of 8 judges (magistrates) in Philadelphia to study the effects of pretrial detention on several outcomes, including whether or not a defendant subsequently pleads guilty.
library(jive)
#> Loading required package: fixest
data(stevenson)
jive(
  guilt ~ i(black) + i(white) | bailDate | jail3 ~ 0 | judge_pre,
  data = stevenson
)
#> Coefficients:
#> Estimate Robust SE Z value Pr(>z)
#> jail3 -0.0218460 -0.0075176 2.906 0.003661 **
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 331,971 observations, 7 instruments, 2,352 covariates
#> First-stage F: stat = 32.627
#> Sargan: stat = 3.342, p = 0.765
#> CD: stat = 3.319, p = 0.768
ujive(
  guilt ~ i(black) + i(white) | bailDate | jail3 ~ 0 | judge_pre,
  data = stevenson
)
#> Coefficients:
#> Estimate Robust SE Z value Pr(>z)
#> jail3 0.159077 0.070567 2.2543 0.02418 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 331,971 observations, 7 instruments, 2,352 covariates
#> First-stage F: stat = 32.627
#> Sargan: stat = 3.342, p = 0.765
#> CD: stat = 3.319, p = 0.768
ijive(
  guilt ~ i(black) + i(white) | bailDate | jail3 ~ 0 | judge_pre,
  data = stevenson
)
#> Coefficients:
#> Estimate Robust SE Z value Pr(>z)
#> jail3 0.159527 0.070533 2.2617 0.02371 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 331,971 observations, 7 instruments, 2,352 covariates
#> First-stage F: stat = 32.627
#> Sargan: stat = 3.342, p = 0.765
#> CD: stat = 3.319, p = 0.768
# Leave-cluster out
ijive(
  guilt ~ i(black) + i(white) | bailDate | jail3 ~ 0 | judge_pre,
  data = stevenson,
  cluster = ~bailDate,
  lo_cluster = TRUE # Default, but just to be explicit
)
#> Coefficients:
#> Estimate Clustered SE Z value Pr(>z)
#> jail3 0.174206 0.073553 2.3685 0.01786 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 331,971 observations, 7 instruments, 2,352 covariates
#> First-stage F: stat = 32.627
#> Sargan: stat = 3.342, p = 0.765
#> CD: stat = 3.319, p = 0.768
The package will allow you to estimate (leave-out) leniency measures:
out <- ijive(
  guilt ~ i(black) + i(white) | bailDate | jail3 ~ 0 | judge_pre,
  data = stevenson,
  return_leniency = TRUE
)
stevenson$judge_lo_leniency <- out$That
hist(
  stevenson$judge_lo_leniency,
  breaks = 30,
  xlab = "Judge leave-one-out leniency",
  main = NULL
)
library(tidyverse)
#> ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
#> ✔ dplyr 1.1.4 ✔ readr 2.1.5
#> ✔ forcats 1.0.0 ✔ stringr 1.5.1
#> ✔ ggplot2 3.5.2.9000 ✔ tibble 3.2.1
#> ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
#> ✔ purrr 1.0.4
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag() masks stats::lag()
#> ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
judge_summary <- stevenson |>
  summarize(
    .by = judge_pre,
    judge_leniency = mean(judge_lo_leniency),
    prop_jail3 = mean(jail3),
    prop_guilt = mean(guilt),
  )
# First-stage plot
ggplot(judge_summary, aes(x = judge_leniency, y = prop_jail3)) +
  geom_point() +
  # using `lm` because we have so few judges in our dataset
  stat_smooth(
    formula = y ~ x,
    method = "lm",
    geom = "ribbon",
    color = "#e64173",
    fill = NA,
    linetype = "dashed",
    linewidth = 1.25
  ) +
  stat_smooth(
    formula = y ~ x,
    method = "lm",
    geom = "line",
    color = "#e64173",
    linewidth = 1.25
  ) +
  labs(
    title = "First-stage",
    x = "Judge leniency",
    y = "Judge pre-trial detention rate"
  ) +
  theme_bw()
# Reduced-form plot
ggplot(judge_summary, aes(x = judge_leniency, y = prop_guilt)) +
  geom_point() +
  # using `lm` because we have so few judges in our dataset
  stat_smooth(
    formula = y ~ x,
    method = "lm",
    geom = "ribbon",
    color = "#e64173",
    fill = NA,
    linetype = "dashed",
    linewidth = 1.25
  ) +
  stat_smooth(
    formula = y ~ x,
    method = "lm",
    geom = "line",
    color = "#e64173",
    linewidth = 1.25
  ) +
  labs(
    title = "Reduced Form",
    x = "Judge leniency",
    y = "Judge guilty verdict rate"
  ) +
  theme_bw()
library(tidyverse)
library(fixest)
# Take residuals from first-stage but add back in judge fixed effects
# This is what Dobbie, Goldin, and Yang do in Figure 1
est_fs <- feols(
  jail3 ~ 0 + i(black) + i(white) | judge_pre + bailDate,
  data = stevenson
)
stevenson$resid <- resid(est_fs) +
  predict(est_fs, fixef = TRUE)[, "judge_pre"]
# First-stage plot
ggplot(stevenson, aes(x = judge_lo_leniency, y = resid)) +
  stat_smooth(
    geom = "ribbon",
    method = "lm",
    formula = y ~ x,
    color = "#e64173",
    fill = NA,
    linetype = "dashed",
    linewidth = 1.25
  ) +
  stat_smooth(
    geom = "line",
    method = "lm",
    formula = y ~ x,
    color = "#e64173",
    linewidth = 1.25
  ) +
  labs(
    title = "First-stage",
    y = "Residualized rate of pretrial release",
    x = "Judge Leniency (Leave-out measure)"
  ) +
  theme_bw()
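Consider the following instrumental variables setup

$$
y_i = T_i \beta + W_i' \psi + \varepsilon_i,
$$

where $T_i$ is a scalar endogenous variable, $W_i$ is a vector of exogenous covariates, and $\varepsilon_i$ is an error term. Given a vector of valid instruments $Z_i$, two-stage least squares uses the instruments to predict $T_i$ in a first stage:

$$
T_i = Z_i' \pi + W_i' \gamma + \eta_i.
$$

Then, the prediction, $\hat{T}_i$, is used in place of $T_i$ in the original regression to estimate $\beta$.

When the dimension of $Z_i$ grows with the number of observations, the two-stage least squares estimator is biased toward OLS because $\hat{T}_i$ overfits $T_i$. The jackknife estimators avoid this by forming the prediction for observation $i$ without using observation $i$ itself.

In general, the JIVE estimator and its variants are given by

$$
\hat{\beta} = \frac{\hat{P}' Y}{\hat{P}' T},
$$

where $\hat{P}$ is a function of $T$, $Z$, and $W$ that differs across estimators.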
Source: Kolesar (2013) and Angrist, Imbens, and Krueger (1999)
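The original JIVE estimate produces $\hat{T}$ via a leave-one-out first stage, which can be written in matrix notation as

$$
\hat{T}_{JIVE} = (I - D_{(Z,W)})^{-1} (H_{(Z,W)} - D_{(Z,W)}) T,
$$

where $H_{(Z,W)}$ is the projection (hat) matrix onto the columns of $(Z, W)$ and $D_{(Z,W)}$ is the diagonal matrix containing its diagonal. The covariates are then partialled out: $\hat{P}_{JIVE} = M_W \hat{T}_{JIVE}$, with $M_W = I - H_W$ and $H_W$ the projection matrix onto $W$.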
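As a sketch of the leave-one-out first stage behind JIVE: the prediction for observation $i$ comes from a regression that excludes observation $i$, and the standard leverage identity gives all $n$ predictions at once as $(I - D)^{-1}(H - D)T$, with $H$ the hat matrix of the first-stage regressors and $D$ its diagonal. The base R code below checks that shortcut against brute-force refitting on simulated data. The data and every name in it (`trt`, `That_jive`, `M_W`, ...) are made up for illustration; this is not the package's implementation, which relies on fixest.

```r
# Simulated data; all names are illustrative, not part of the jive package
set.seed(1)
n <- 200
W <- cbind(1, rnorm(n))                               # covariates, with intercept
Z <- matrix(rnorm(n * 5), n, 5)                       # instruments
trt <- drop(Z %*% rep(0.3, 5)) + W[, 2] + rnorm(n)    # endogenous variable T
y <- 0.5 * trt + W[, 2] + rnorm(n)                    # outcome Y

X <- cbind(Z, W)                        # first-stage regressors (Z, W)
H <- X %*% solve(crossprod(X), t(X))    # dense hat matrix (fine for small n)
d <- diag(H)                            # leverages, i.e. the diagonal D

# Shortcut: (I - D)^{-1} (H - D) T, computed element-wise
That_jive <- (drop(H %*% trt) - d * trt) / (1 - d)

# Brute force: refit the first stage n times, dropping one observation each time
That_loo <- vapply(seq_len(n), function(i) {
  b <- lm.fit(X[-i, , drop = FALSE], trt[-i])$coefficients
  sum(X[i, ] * b)
}, numeric(1))

all.equal(That_jive, That_loo)

# JIVE: partial W out of the leave-out prediction, then take the ratio
M_W <- function(v) lm.fit(W, v)$residuals
P_jive <- M_W(That_jive)
beta_jive <- sum(P_jive * y) / sum(P_jive * trt)
```

The dense $n \times n$ hat matrix is only practical for small samples; it is used here to keep the algebra visible.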
Source: Kolesar (2013)
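For UJIVE, a leave-out procedure is used in the first-stage for fitted values $\hat{T}$ and again when partialling out the covariates:

$$
\hat{P}_{UJIVE} = (I - D_{(Z,W)})^{-1} (H_{(Z,W)} - D_{(Z,W)}) T - (I - D_W)^{-1} (H_W - D_W) T,
$$

where each $H$ is the projection matrix onto the regressors in its subscript and each $D$ is the diagonal matrix containing the diagonal of the corresponding $H$.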
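A minimal sketch of the UJIVE construction, assuming the usual form in which $\hat{P}$ is the leave-one-out prediction of $T$ from $(Z, W)$ minus the leave-one-out prediction of $T$ from $W$ alone. The helper `loo_fitted()` and the simulated data are hypothetical names introduced for this example, written in base R rather than with the package's fixest-based code.

```r
# Simulated data; all names are illustrative, not part of the jive package
set.seed(2)
n <- 200
W <- cbind(1, rnorm(n))
Z <- matrix(rnorm(n * 5), n, 5)
trt <- drop(Z %*% rep(0.3, 5)) + W[, 2] + rnorm(n)
y <- 0.5 * trt + W[, 2] + rnorm(n)

# Leave-one-out fitted values from regressing v on X:
# (I - D_X)^{-1} (H_X - D_X) v, computed element-wise
loo_fitted <- function(X, v) {
  H <- X %*% solve(crossprod(X), t(X))
  d <- diag(H)
  (drop(H %*% v) - d * v) / (1 - d)
}

That_ZW <- loo_fitted(cbind(Z, W), trt)  # leave-out first stage on (Z, W)
That_W  <- loo_fitted(W, trt)            # leave-out prediction from W alone
P_ujive <- That_ZW - That_W

beta_ujive <- sum(P_ujive * y) / sum(P_ujive * trt)
```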
Source: Ackerberg and Devereux (2009)
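The IJIVE procedure first residualizes $T$, $Y$, and $Z$ on the covariates $W$, giving $\tilde{T} = M_W T$, $\tilde{Y} = M_W Y$, and $\tilde{Z} = M_W Z$. The leave-out procedure is then applied to the residualized variables:

$$
\hat{T}_{IJIVE} = (I - D_{\tilde{Z}})^{-1} (H_{\tilde{Z}} - D_{\tilde{Z}}) \tilde{T}, \qquad \hat{\beta}_{IJIVE} = \frac{\hat{T}_{IJIVE}' \tilde{Y}}{\hat{T}_{IJIVE}' \tilde{T}}.
$$

Note that this fits the general form with $\hat{P}_{IJIVE} = M_W \hat{T}_{IJIVE}$, since $\hat{T}_{IJIVE}' \tilde{Y} = (M_W \hat{T}_{IJIVE})' Y$.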
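The residualize-then-jackknife idea can be sketched in base R as follows. This is an illustration under the assumption that IJIVE applies the leave-one-out shortcut $(I - D)^{-1}(H - D)$ to variables already residualized on $W$; the simulated data and names (`Z_t`, `trt_t`, `y_t`, `P_ijive`) are hypothetical and the code is not taken from the package.

```r
# Simulated data; all names are illustrative, not part of the jive package
set.seed(3)
n <- 200
W <- cbind(1, rnorm(n))
Z <- matrix(rnorm(n * 5), n, 5)
trt <- drop(Z %*% rep(0.3, 5)) + W[, 2] + rnorm(n)
y <- 0.5 * trt + W[, 2] + rnorm(n)

# Step 1: residualize Z, T, and Y on the covariates W
Z_t   <- lm.fit(W, Z)$residuals      # lm.fit accepts a matrix response
trt_t <- lm.fit(W, trt)$residuals
y_t   <- lm.fit(W, y)$residuals

# Step 2: leave-one-out prediction of residualized T from residualized Z
H <- Z_t %*% solve(crossprod(Z_t), t(Z_t))
d <- diag(H)
P_ijive <- (drop(H %*% trt_t) - d * trt_t) / (1 - d)

# Step 3: ratio estimator on the residualized variables
beta_ijive <- sum(P_ijive * y_t) / sum(P_ijive * trt_t)
```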
Source: Frandsen, Leslie, and McIntyre (2023)
This is a modified version of IJIVE as proposed by Frandsen, Leslie, and McIntyre (2023). This is necessary if the errors are correlated within clusters (e.g. court cases assigned on the same day to the same judge). The modified version is given by:
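$$
\hat{T}_{CJIVE} = (I - \mathbb{D}_{\tilde{Z}})^{-1} (H_{\tilde{Z}} - \mathbb{D}_{\tilde{Z}}) \tilde{T},
$$

where $\tilde{Z}$ and $\tilde{T}$ are the instruments and the endogenous variable residualized on $W$, $H_{\tilde{Z}}$ is the projection matrix onto $\tilde{Z}$, and $\mathbb{D}_{\tilde{Z}}$ is the block-diagonal matrix that keeps the entries of $H_{\tilde{Z}}$ for pairs of observations in the same cluster and is zero elsewhere. Each prediction therefore leaves out the observation's entire cluster.

In this package, the same adjustment (replacing the diagonal $D$ with the cluster block-diagonal $\mathbb{D}$) is used for leave-cluster-out estimation.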
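To make the cluster adjustment concrete, here is a small base R check that swapping the diagonal of the hat matrix for its within-cluster blocks reproduces predictions from regressions that drop each observation's whole cluster. For brevity the sketch has no covariates (think of `Z` and `trt` as already residualized); the data, the cluster variable `cl`, and all other names are assumptions made for the example.

```r
# Simulated clustered data; all names are illustrative
set.seed(4)
G <- 40                                # number of clusters
n_g <- 5                               # observations per cluster
n <- G * n_g
cl <- rep(seq_len(G), each = n_g)      # cluster id for each observation
Z <- matrix(rnorm(n * 3), n, 3)
trt <- drop(Z %*% rep(0.5, 3)) + rnorm(n)

H <- Z %*% solve(crossprod(Z), t(Z))
# Block-diagonal part of H: keep H[i, j] only when i and j share a cluster
D_block <- H * outer(cl, cl, "==")

# Leave-cluster-out predictions: (I - D_block)^{-1} (H - D_block) T
That_cjive <- drop(solve(diag(n) - D_block, (H - D_block) %*% trt))

# Brute force: refit once per cluster, dropping that cluster entirely
That_lco <- numeric(n)
for (g in seq_len(G)) {
  idx <- cl == g
  b <- lm.fit(Z[!idx, , drop = FALSE], trt[!idx])$coefficients
  That_lco[idx] <- drop(Z[idx, , drop = FALSE] %*% b)
}

all.equal(That_cjive, That_lco)
```

With clusters of size one, `D_block` collapses to the ordinary diagonal and the expression reduces to the leave-one-out case.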
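Heteroskedasticity-robust standard errors are given by

$$
\frac{\sqrt{\sum_i \hat{P}_i^2 \hat{\varepsilon}_i^2}}{\sum_i \hat{P}_i T_i},
$$

where $\hat{\varepsilon} = M_W Y - M_W T \hat{\beta}$ is the vector of second-stage residuals.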
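A hedged sketch of a robust standard error for a ratio-form estimator $\hat{\beta} = \hat{P}'Y / \hat{P}'T$: since $\hat{\beta} - \beta = \hat{P}'\varepsilon / \hat{P}'T$, a heteroskedasticity-robust variance estimate is $\sum_i \hat{P}_i^2 \hat{\varepsilon}_i^2 / (\hat{P}'T)^2$. The code below assumes that form and uses a UJIVE-style $\hat{P}$ on simulated data; the helper `loo_fitted()` and all variable names are hypothetical, and the package's exact computation may differ.

```r
# Simulated data with an unobserved confounder u; names are illustrative
set.seed(5)
n <- 500
W <- cbind(1, rnorm(n))
Z <- matrix(rnorm(n * 4), n, 4)
u <- rnorm(n)
trt <- drop(Z %*% rep(0.5, 4)) + W[, 2] + u + rnorm(n)
y <- 0.5 * trt + W[, 2] + u + rnorm(n)           # true beta is 0.5

# Leave-one-out fitted values: (I - D_X)^{-1} (H_X - D_X) v
loo_fitted <- function(X, v) {
  H <- X %*% solve(crossprod(X), t(X))
  d <- diag(H)
  (drop(H %*% v) - d * v) / (1 - d)
}

# UJIVE-style P-hat and point estimate
P_hat <- loo_fitted(cbind(Z, W), trt) - loo_fitted(W, trt)
beta_hat <- sum(P_hat * y) / sum(P_hat * trt)

# Second-stage residuals: M_W Y - M_W T * beta_hat
eps_hat <- lm.fit(W, y)$residuals - lm.fit(W, trt)$residuals * beta_hat

# Robust standard error in ratio form
se_robust <- sqrt(sum(P_hat^2 * eps_hat^2)) / sum(P_hat * trt)
c(estimate = beta_hat, se = se_robust)
```

Because the denominator $\sum_i \hat{P}_i T_i$ is signed, the ratio as written inherits its sign.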
Quoting from the papers that propose each:
- UJIVE: “UJIVE is consistent for a convex combination of local average treatment effects under many instrument asymptotics that also allow for many covariates and heteroscedasticity”
- IJIVE: “We introduce two simple new variants of the jackknife instrumental variables (JIVE) estimator for overidentified linear models and show that they are superior to the existing JIVE estimator, significantly improving on its small-sample-bias properties”
- CJIVE: “In settings where inference must be clustered, however, the [IJIVE] fails to eliminate the many-instruments bias. We propose a cluster-jackknife approach in which first-stage predicted values for each observation are constructed from a regression that leaves out the observation’s entire cluster, not just the observation itself. The cluster-jackknife instrumental variables estimator (CJIVE) eliminates many-instruments bias, and consistently estimates causal effects in the traditional linear model and local average treatment effects in the heterogeneous treatment effects framework.”
This package uses a ton of algebra tricks to speed up the computation of
the JIVE (and variants). First, the package is written using fixest,
which estimates high-dimensional fixed effects very quickly. All the
credit to @lrberge for this.
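All uses of the projection matrix, $H$, are computed from fixest
estimates. For example, $H_W T$ is the vector of fitted values from
regressing $T$ on $W$.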
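For IJIVE, another trick is used. Instead of actually residualizing
everything, it is faster to keep the original $T$, $Z$, and $W$ and let
fixest estimate the pieces. For this reason, we can use the block
projection matrix decomposition

$$
H_{(Z,W)} = H_W + H_{\tilde{Z}}, \qquad \tilde{Z} = M_W Z,
$$

to transform the hat matrix $H_{\tilde{Z}}$ into two components that are
easier to calculate with fixest regressions:

$$
H_{\tilde{Z}} = H_{(Z,W)} - H_W, \qquad D_{\tilde{Z}} = D_{(Z,W)} - D_W.
$$

Then, through some algebra, $H_{\tilde{Z}} M_W T = H_{(Z,W)} T - H_W T$, so the IJIVE leave-out prediction can be written with unresidualized inputs as

$$
(I - D_{(Z,W)} + D_W)^{-1} \left[ H_{(Z,W)} T - H_W T - (D_{(Z,W)} - D_W)(T - H_W T) \right].
$$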
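The block projection decomposition can be checked numerically. The sketch below, in base R on simulated data, verifies that the hat matrix of the residualized instruments equals the difference of two ordinary hat matrices, and that the IJIVE leave-out prediction built from two plain regressions (of `trt` on `(Z, W)` and on `W`) matches the one built from residualized variables. All names are hypothetical, and the dense hat matrices stand in for whatever the package computes through fixest.

```r
# Simulated data; all names are illustrative, not part of the jive package
set.seed(6)
n <- 150
W <- cbind(1, rnorm(n))
Z <- matrix(rnorm(n * 4), n, 4)
trt <- drop(Z %*% rep(0.4, 4)) + W[, 2] + rnorm(n)

hat_matrix <- function(X) X %*% solve(crossprod(X), t(X))

H_ZW <- hat_matrix(cbind(Z, W))            # projection onto (Z, W)
H_W  <- hat_matrix(W)                      # projection onto W
Z_t  <- lm.fit(W, Z)$residuals             # instruments residualized on W
H_Zt <- hat_matrix(Z_t)                    # projection onto residualized Z

# Block projection decomposition: H_(Z,W) = H_W + H_Ztilde
all.equal(H_Zt, H_ZW - H_W)

# IJIVE prediction from two ordinary regressions, no residualized inputs
fit_ZW <- lm.fit(cbind(Z, W), trt)$fitted.values   # H_(Z,W) T
fit_W  <- lm.fit(W, trt)$fitted.values             # H_W T
d_t <- diag(H_ZW) - diag(H_W)                      # diagonal of H_Ztilde
P_fast <- (fit_ZW - fit_W - d_t * (trt - fit_W)) / (1 - d_t)

# Same quantity computed directly from residualized variables
trt_t <- trt - fit_W
P_resid <- (drop(H_Zt %*% trt_t) - diag(H_Zt) * trt_t) / (1 - diag(H_Zt))

all.equal(P_fast, P_resid)
```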
Angrist, Imbens, and Krueger (1999)