Martin Feldstein, Joel Slemrod, and Shlomo Yitzhaki

This document seeks to replicate a 1980 paper by Martin Feldstein, Joel Slemrod, and Shlomo Yitzhaki titled “The Effects of Taxation on the Selling of Corporate Stock and The Realization of Capital Gains” with and without formal privacy.

Preprocessing

Load Packages and Tax Functions

library(tidyverse)
library(stargazer)
library(gt)
library(broom)
library(yardstick)
library(DT)

theme_set(theme_minimal())
theme_update(plot.title.position = "plot")

source(here::here("tax-code", "puf.preprocess.R"))
source(here::here("tax-code", "utility.taxcalc.R"))
source(here::here("tax-code", "utility.taxcalc.prep_for_tax_calc.R"))
source(here::here("tax-code", "utility.taxcalc.run_calculator_distribution.R"))
source(here::here("tax-code", "utility.taxcalc.run_calculator_reform.R"))

Impute Age 65

Age is missing for about half of observations in the data. This will create issues for replicating Feldstein, Slemrod, and Yitzhaki (1980).

We impute age by assigning observations with missing age to age 65+ if the record has non-zero total pensions and annuities received, non-zero pensions and annuities in AGI, or Social Security benefits. All other records with missing age are assigned to ages 0-64. This is crude but should be somewhat precise for records with more than $3,000 in dividends. Furthermore, the estimates with and without noise will be flawed in the same way.

df <- synthpuf::puf_2012 %>%
  # drop the aggregate observations
  filter(!RECID %in% 999996:999999)

# impute age 65
df <- df %>%
  mutate(
    age65 = case_when(
      AGERANGE == 6 ~ "Age 65+",
      AGERANGE == 3 & DSI == 1 ~ "Age 65+",
      # total pensions and annuities received
      E01500 > 0 ~ "Age 65+",
      # pensions and annuities in AGI
      E01700 > 0 ~ "Age 65+",
      # social security benefits
      E02400 > 0 ~ "Age 65+",
      TRUE ~ "Age 0-64"
    )
  )

df <- df %>%
  mutate(AGERANGE = if_else(is.na(AGERANGE) & age65 == "Age 65+", 6, 0))

# standard preprocessing
puf <- puf.preprocess(df)

Marginal Tax Rates

“Last dollar” marginal tax rates on long-term capital gains are endogenous to long-term capital gains because filers may adjust their capital gains to hit lower tax rates through timing sales or manipulating reporting. We calculate the last dollar and first dollar marginal tax rates on long-term capital gains. The last dollar rate is calculated by running a tax calculator with observed long-term capital gains and then running a tax calculator with observed long-term capital gains plus one penny and then comparing the amount of taxes paid. The first dollar rate is calculated by running a tax calculator with zero long-term capital gains and then running a tax calculator with one penny of long-term capital gains and then comparing the amount of taxes paid.

Run the Open Source Policy Center’s tax calculator to calculate both marginal capital gains rates.

# calculate first dollar marginal tax rate on capital gains
results_first_dollar <- 
  puf %>%
  mutate(P23250 = 0) %>%
  utility.taxcalc(data = .,
                  year = 2012,
                  enforce_constraints = TRUE,
                  condaenv = "taxcalc-dev",
                  conda = "auto")

# calculate last dollar marginal tax rate on capital gains
results_last_dollar <- 
  utility.taxcalc(data = puf,
                  year = 2012,
                  enforce_constraints = TRUE,
                  condaenv = "taxcalc-dev",
                  conda = "auto")

# create one output data set
puf <- bind_cols(
  puf,
  mcg_first_dollar = results_first_dollar[[3]],
  mcg_last_dollar = results_last_dollar[[3]]
)

Visualize the results:

puf %>%
  filter(mcg_first_dollar < 1) %>%
  ggplot(aes(mcg_first_dollar)) +
  geom_density() +
  labs(title = "Density of Marginal Tax Rate on the First Dollar of Long-Term Capital Gains", 
       x = "Marginal Tax Rate on the First Dollar of Long-Term Capital Gains")

puf %>%
  filter(mcg_last_dollar < 1) %>%
  ggplot(aes(mcg_last_dollar)) +
  geom_density() +
  labs(title = "Density of Marginal Tax Rate on the Last Dollar of Long-Term Capital Gains", 
       x = "Marginal Tax Rate on the Last Dollar of Long-Term Capital Gains")

Prepare the necessary variables for the regression analysis.

# prepare variables
final_vars <- puf %>%
  mutate(
    net_agi = pmax(E00100 - P23250, 0.00001)
  ) %>%
  select(
    ltcg = P23250, 
    dividends = E00600,
    agi = E00100,
    net_agi,
    mcg_first_dollar,
    mcg_last_dollar,
    age65
  ) %>%
  mutate(ltcg_dividends = ltcg / dividends)

Results Without Noise

Last Dollar

Feldstein et al. filtered observations to those with $3,000 or more in dividends in 1973. We adjust the $3,000 threshold into 2012 dollars using the CPI-U.

# create inflation adjusted threshold
analysis_adjusted <- final_vars %>%
  filter(dividends > 12513.11)

# estimate the model
model_adjusted <- lm(ltcg_dividends ~ 
                       mcg_last_dollar + 
                       factor(age65) + 
                       log(dividends) + 
                       log(net_agi), 
                     data = analysis_adjusted)
stargazer(model_adjusted, type = "html")
Dependent variable:
ltcg_dividends
mcg_last_dollar 72.714***
(5.067)
factor(age65)Age 65+ -2.762***
(0.732)
log(dividends) -3.929***
(0.267)
log(net_agi) -0.671***
(0.052)
Constant 52.159***
(3.115)
Observations 30,016
R2 0.018
Adjusted R2 0.018
Residual Std. Error 63.227 (df = 30011)
F Statistic 135.166*** (df = 4; 30011)
Note: p<0.1; p<0.05; p<0.01

Because of endogeneity, the estimated coefficient for the marginal tax rate on the last dollar of capital gains is positive.

First dollar

Here, we use the first marginal tax rate. The estimated coefficient is now negative.

# create inflation adjusted threshold
analysis_adjusted <- final_vars %>%
  filter(dividends > 12513.11)

# estimate the model
model_adjusted <- lm(ltcg_dividends ~ mcg_first_dollar + factor(age65) + log(dividends) + log(net_agi), 
                     data = analysis_adjusted)
stargazer(model_adjusted, type = "html")
Dependent variable:
ltcg_dividends
mcg_first_dollar -28.381***
(8.204)
factor(age65)Age 65+ -3.163***
(0.735)
log(dividends) -3.533***
(0.269)
log(net_agi) -0.433***
(0.062)
Constant 61.881***
(3.148)
Observations 30,016
R2 0.011
Adjusted R2 0.011
Residual Std. Error 63.431 (df = 30011)
F Statistic 86.129*** (df = 4; 30011)
Note: p<0.1; p<0.05; p<0.01


These results are pretty similar to equation (1) from Table 1 in Feldstein et al. (1980).

Variable Coefficient Standard Error
Tax -62.4 (5.98)
Age 65 -1.13 (0.557)
Log Dividends -2.80 (0.188)
Log Adjusted Net AGI -0.483 (0.184)
Intercept 38.8 (2.11)
boom <- analysis_adjusted %>%
  mutate(
    log_dividends = log(dividends),
    log_net_agi = log(net_agi),
    age65 = factor(age65)
  ) 

boom %>%
  select(ltcg_dividends, mcg_first_dollar, age65, log_dividends, log_net_agi) %>%
  GGally::ggpairs()

model_adjusted %>%
  ggplot(aes(ltcg_dividends)) +
  geom_density()

model_adjusted %>%
  ggplot(aes(ltcg_dividends)) +
  geom_density() +
  scale_x_continuous(limits = c(0, 100))

model_adjusted %>%
  ggplot(aes(mcg_first_dollar)) +
  geom_density() +
  labs(title = "First Dollar MCG")

boom %>%
  ggplot(aes(log_dividends)) +
  geom_density() +
  labs(title = "log(dividends)")

boom %>%
  ggplot(aes(log_net_agi)) +
  geom_density() +
  labs(title = "log(net agi)")

Noisy Results

Our tests generated thousands of results and the problem is highly multi-dimensional with multiple methods, multiple tuning parameters, multiple coefficients, and multiple metrics. Our strategy is

  • Look at one value of Epsilon and Delta
    • Look at the medians for metrics for a sense of the “average result”
    • Look at deciles for metrics for a sense of the spread of results
  • Look at results across values of Epsilon and Delta

We focus on six metrics

  1. Relative bias
  2. Confidence Interval Overlap
  3. Relative Confidence Interval Length
  4. Proportion with matched signs
  5. Proportion with matched significance
  6. Relative RMSE
coefficients <- tidy(model_adjusted, conf.int = TRUE) %>%
  mutate(
    ci_length = conf.high - conf.low,
    ci_contains_zero = conf.high > 0 & conf.low < 0
  ) %>%
  mutate(term = 
           recode(
             term,
             `factor(age65)Age 65+` = "age65Age 65+",
             `log(dividends)` = "log_dividends",
             `log(net_agi)` = "log_net_agi"
           )
  )

coefficients_bootstrap <- read_csv(here::here("results", "02_soi_regression_coefficient_asymptotic.csv"),
                                   col_types = cols(
                                     method = col_character(),
                                     epsilon = col_double(),
                                     replicate = col_double(),
                                     term = col_character(),
                                     estimate_results = col_double(),
                                     estimate_noisy_results = col_double(),
                                     bias = col_double(),
                                     ci_overlap = col_double(),
                                     ci_length = col_double(),
                                     sign_match = col_logical(),
                                     ci_contains_zero = col_logical(),
                                     delta = col_double(),
                                     n = col_double()
                                   )) %>%
  mutate(method = if_else(method == "Brawner and Honaker Method", "BHM", method)) %>%
  mutate(method = if_else(!is.na(n), paste0(method, " (n = ", n, ")"), method)) 

coefficients_bootstrap <- read_csv(here::here("results", "02_soi_regression_coefficient_bootstrap.csv"),
                                   col_types = cols(
                                     method = col_character(),
                                     epsilon = col_double(),
                                     replicate = col_double(),
                                     term = col_character(),
                                     estimate_results = col_double(),
                                     estimate_noisy_results = col_double(),
                                     bias = col_double(),
                                     ci_overlap = col_double(),
                                     ci_length = col_double(),
                                     sign_match = col_logical(),
                                     ci_contains_zero = col_logical(),
                                     delta = col_double(),
                                     n = col_double()
                                   )) %>%
  mutate(method = if_else(method == "Brawner and Honaker Method", "BHM", method)) %>%
  mutate(method = if_else(!is.na(n), paste0(method, " (n = ", n, ")"), method)) 

true_rmse <- analysis_adjusted %>%
  modelr::add_predictions(model_adjusted) %>%
  rmse(truth = ltcg_dividends, estimate = pred) %>%
  pull(.estimate)

rmse_asymptotic <- read_csv(here::here("results", "02_soi_regression_rmse_asymptotic.csv"), 
                            col_types = cols(
                              method = col_character(),
                              epsilon = col_double(),
                              replicate = col_double(),
                              rmse = col_double(),
                              delta = col_double(),
                              n = col_double()
                            )) %>%
  mutate(method = if_else(method == "Brawner and Honaker Method", "BHM", method)) %>%
  mutate(method = if_else(!is.na(n), paste0(method, " (n = ", n, ")"), method)) 

rmse_bootstrap <- read_csv(here::here("results", "02_soi_regression_rmse_bootstrap.csv"),
                           col_types = cols(
                              method = col_character(),
                              epsilon = col_double(),
                              replicate = col_double(),
                              rmse = col_double(),
                              delta = col_double(),
                              n = col_double()
                            )) %>%
  mutate(method = if_else(method == "Brawner and Honaker Method", "BHM", method)) %>%
  mutate(method = if_else(!is.na(n), paste0(method, " (n = ", n, ")"), method)) 
coefficients_bootstrap_summary <- coefficients_bootstrap %>%
  left_join(
    coefficients,
    by = "term",
    suffix = c("_noisy", "_original")
  ) %>%
  group_by(method, epsilon, term, delta, n) %>%
  summarize(
    `Median bias` = median(bias),
    `Median Relative Bias` = median(bias / estimate_results),
    `Median CI Overlap` = median(ci_overlap),
    `Median Relative CI Length` = median(ci_length_noisy / ci_length_original),
    `Prop Sign Match` = mean(sign_match),
    `Prop. Sig. Match` = mean(ci_contains_zero_original == ci_contains_zero_noisy)
  ) %>%
  ungroup()

Summary Table

This table summarizes the results across all methods for regression. Detailed data visualizations are in subsequent sections.

coefficients_bootstrap_summary %>%
  mutate(
    `Median Relative Bias` = round(`Median Relative Bias`, digits = 3),
    `Median CI Overlap` = round(`Median CI Overlap`, digits = 3),
    `Median Relative CI Length` = round(`Median Relative CI Length`, digits = 3)
  ) %>%
  datatable()

Compare Methods with \(\epsilon = 5\) and \(\delta = 0.000001\)

eps5 <- coefficients_bootstrap %>%
  left_join(
    coefficients,
    by = "term",
    suffix = c("_noisy", "_original")
  ) %>%
  filter(epsilon == 5, delta %in% c(NA, 0.000001))

eps5 <- eps5 %>%
  mutate(
    relative_bias = bias / estimate_results,
    relative_ci_length = ci_length_noisy / ci_length_original
  )

probs <- seq(0.1, 0.9, 0.1)

eps5_summary <- eps5 %>%
  group_by(method, term) %>%
  summarize(
    n(), 
    q = probs,
    bias = quantile(bias, probs = probs),
    relative_bias = quantile(relative_bias, probs = probs),
    ci_overlap = quantile(ci_overlap, probs = probs),
    relative_ci_length = quantile(relative_ci_length, probs = probs)
  ) 

rmse_bootstrap_summary <- rmse_bootstrap %>%
  group_by(method, epsilon, delta, n) %>%
  summarize(
    q = probs,
    relative_rmse = quantile(rmse / true_rmse, probs = probs)
  ) %>%
  ungroup() 

Bias

Median

eps5_summary %>%
  filter(q == 0.5) %>%
  ggplot(aes(bias, method)) +
  geom_point() +
  geom_vline(xintercept = 0, color = "red") +
  facet_wrap(~term, scales = "free_x") +
  labs(
    title = "Median Bias with Epsilon == 5 and Delta == 0.000001",
    y = NULL
  )

Deciles

eps5_summary %>%
  filter(!method %in% c("BHM (n = 10)", "BHM (n = 25)")) %>% 
  ggplot(aes(bias, method)) +
  geom_vline(xintercept = 0, color = "red") +  
  geom_point(alpha = 0.2) +
  facet_wrap(~term, scales = "free_x") +
  labs(
    title = "Bias with Epsilon == 5 and Delta == 0.000001",
    subtitle = "Deciles", 
    y = NULL,
    caption = "BHM dropped for clarity"
  )

Violin Plots

eps5 %>%
  filter(!method %in% c("BHM (n = 10)", "BHM (n = 25)")) %>%  
  ggplot(aes(bias, method)) +
  geom_vline(xintercept = 0, color = "red") + 
  geom_violin(draw_quantiles = 0.5, alpha = 0.2) +
  facet_wrap(~term, scales = "free_x") +
  labs(
    title = "Bias with Epsilon == 5 and Delta == 0.000001",
    y = NULL,
    caption = "BHM dropped for clarity"
  )

Relative Bias

Median

eps5_summary %>%
  filter(q == 0.5) %>%
  ggplot(aes(relative_bias, method)) +
  geom_point() +
  geom_vline(xintercept = 0, color = "red") +
  facet_wrap(~term, scales = "free_x") +
  labs(
    title = "Median Relative Bias with Epsilon == 5 and Delta == 0.000001",
    y = NULL
  )

Deciles

eps5_summary %>%
  filter(!method %in% c("BHM (n = 10)", "BHM (n = 25)")) %>% 
  ggplot(aes(relative_bias, method)) +
  geom_vline(xintercept = 0, color = "red") +  
  geom_point(alpha = 0.2) +
  facet_wrap(~term, scales = "free_x") +
  labs(
    title = "Relative Bias with Epsilon == 5 and Delta == 0.000001",
    subtitle = "Deciles", 
    y = NULL,
    caption = "BHM dropped for clarity"
  )

Violin plots

eps5 %>%
  filter(!method %in% c("BHM (n = 10)", "BHM (n = 25)")) %>%  
  ggplot(aes(relative_bias, method)) +
  geom_vline(xintercept = 0, color = "red") + 
  geom_violin(draw_quantiles = 0.5, alpha = 0.2) +
  facet_wrap(~term, scales = "free_x") +
  labs(
    title = "Relative Bias with Epsilon == 5 and Delta == 0.000001",
    y = NULL,
    caption = "BHM dropped for clarity"
  )

CI Overlap

Median

eps5_summary %>%
  filter(q == 0.5) %>%  
  ggplot(aes(ci_overlap, method)) +
  geom_point() +
  geom_vline(xintercept = 1, color = "red") +  
  facet_wrap(~term, scales = "free_x") +
  labs(
    title = "Median CI Overlap with Epsilon == 5 and Delta == 0.000001",
    y = NULL
  )

Deciles

eps5_summary %>%
  filter(!method %in% c("BHM (n = 10)", "BHM (n = 25)")) %>% 
  ggplot(aes(ci_overlap, method)) +
  geom_vline(xintercept = 1, color = "red") +  
  geom_point(alpha = 0.2) +
  facet_wrap(~term, scales = "free_x") +
  labs(
    title = "CI Overlap with Epsilon == 5 and Delta == 0.000001",
    subtitle = "Deciles",   
    y = NULL,
    caption = "BHM dropped for clarity"
  )

Violin plots

eps5 %>%
  filter(!method %in% c("BHM (n = 10)", "BHM (n = 25)")) %>% 
  ggplot(aes(ci_overlap, method)) +
  geom_vline(xintercept = 1, color = "red") + 
  geom_violin(draw_quantiles = 0.5, alpha = 0.2) +
  facet_wrap(~term, scales = "free_x") +
  labs(
    title = "CI Overlap with Epsilon == 5 and Delta == 0.000001",
    y = NULL,
    caption = "BHM dropped for clarity"
  )

Relative CI Length

Median

eps5_summary %>%
  filter(q == 0.5) %>%    
  ggplot(aes(relative_ci_length, method)) +
  geom_point() +
  geom_vline(xintercept = 1, color = "red") +    
  facet_wrap(~term, scales = "free_x") +
  labs(
    title = "Median Relative CI Length with Epsilon == 5 and Delta == 0.000001",
    y = NULL
  )

Deciles

eps5_summary %>%
  filter(!method %in% c("BHM (n = 10)", "BHM (n = 25)")) %>% 
  ggplot(aes(relative_ci_length, method)) +
  geom_vline(xintercept = 1, color = "red") +  
  geom_point(alpha = 0.2) +
  facet_wrap(~term, scales = "free_x") +
  labs(
    title = "Relative CI Length with Epsilon == 5 and Delta == 0.000001",
    subtitle = "Deciles", 
    y = NULL,
    caption = "BHM dropped for clarity"
  )

Violin plots

eps5 %>%
  filter(!method %in% c("BHM (n = 10)", "BHM (n = 25)")) %>%  
  ggplot(aes(relative_ci_length, method)) +
  geom_vline(xintercept = 1, color = "red") + 
  geom_violin(draw_quantiles = 0.5, alpha = 0.2) +
  facet_wrap(~term, scales = "free_x") +
  labs(
    title = "Relative CI Length with Epsilon == 5 and Delta == 0.000001",
    y = NULL,
    caption = "BHM dropped for clarity"
  )

Sign Match

Proportion

eps5 %>%
  group_by(term, method, n) %>%
  summarize(prop_sign_match = mean(sign_match)) %>%
  ggplot(aes(prop_sign_match, method)) +
  geom_col(position = "dodge") +
  geom_vline(xintercept = 1, color = "red") +  
  facet_wrap(~term) +
  labs(
    title = "Sign Match with Epsilon == 5 and Delta == 0.000001",
    y = NULL
  )

Significance Match

Proportion

eps5 %>%
  group_by(term, method, n) %>%
  summarize(prop_sig_match = mean(ci_contains_zero_original == ci_contains_zero_noisy)) %>%
  ggplot(aes(prop_sig_match, method)) +
  geom_col(position = "dodge") +
  geom_vline(xintercept = 1, color = "red") +  
  facet_wrap(~term) +
  labs(
    title = "Significance Match with Epsilon == 5 and Delta == 0.000001",
    y = NULL
  )

Relative RMSE

Median

rmse_bootstrap_summary %>%
  filter(q == 0.5) %>%
  filter(epsilon == 5, delta %in% c(NA, 0.000001)) %>%
  ggplot(aes(relative_rmse, method)) +
  geom_point() +
  geom_vline(xintercept = 1, color = "red") +
  labs(
    title = "Median Relative RMSE with Epsilon == 5 and Delta == 0.000001",
    y = NULL    
  )

Deciles

rmse_bootstrap_summary %>%
  filter(!method %in% c("BHM (n = 10)", "BHM (n = 25)")) %>% 
  filter(epsilon == 5, delta %in% c(NA, 0.000001)) %>%
  ggplot(aes(relative_rmse, method)) +
  geom_point(alpha = 0.2) +
  geom_vline(xintercept = 1, color = "red") +
  labs(
    title = "Relative RMSE with Epsilon == 5 and Delta == 0.000001",
    subtitle = "Deciles",
    y = NULL,
    caption = "BHM dropped for clarity"
  )

Violin plots

rmse_bootstrap %>%
  filter(!method %in% c("BHM (n = 10)", "BHM (n = 25)")) %>%  
  mutate(relative_rmse = rmse / true_rmse) %>%
  ggplot(aes(relative_rmse, method)) +
  geom_vline(xintercept = 1, color = "red") +
  geom_violin(draw_quantiles = 0.5) +
  labs(
    title = "Relative RMSE with Epsilon == 5 and Delta == 0.000001",
    y = NULL,
    caption = "BHM dropped for clarity"
  )

Across Values of Epsilon

coefficients_bootstrap_summary %>%
  filter(
    epsilon < 100, 
    delta %in% c(NA, 0.000001),
    !method %in% c("BHM (n = 10)", "BHM (n = 25)")
  ) %>%
  ggplot(aes(epsilon, `Median bias`)) +
  geom_hline(yintercept = 0, color = "red") +  
  geom_line(aes(group = method), alpha = 0.2) +  
  geom_point(aes(color = method), alpha = 0.4) +
  facet_wrap(~term, scales = "free_y") +
  labs(
    title = "Median Bias with Delta == 0.000001",
    caption = "BHM dropped for clarity"
  )

coefficients_bootstrap_summary %>%
  filter(
    epsilon < 100, 
    delta %in% c(NA, 0.000001),
    !method %in% c("BHM (n = 10)", "BHM (n = 25)")
  ) %>%
  ggplot(aes(epsilon, `Median Relative Bias`)) +
  geom_hline(yintercept = 0, color = "red") +  
  geom_line(aes(group = method), alpha = 0.2) +  
  geom_point(aes(color = method), alpha = 0.4) +
  facet_wrap(~term, scales = "free_y") +
  labs(
    title = "Median Relative Bias with Delta == 0.000001",
    caption = "BHM dropped for clarity"
  )

coefficients_bootstrap_summary %>%
  filter(
    epsilon < 100, 
    delta %in% c(NA, 0.000001),
    !method %in% c("BHM (n = 10)", "BHM (n = 25)")
  ) %>%
  ggplot(aes(x = epsilon, y = `Median CI Overlap`)) +
  geom_hline(yintercept = 1, color = "red") +  
  geom_line(aes(group = method), alpha = 0.2) +  
  geom_point(aes(color = method), alpha = 0.4) + 
  facet_wrap(~term, scales = "free_y") +
  labs(
    title = "Median CI Overlap with Delta == 0.000001",
    caption = "BHM dropped for clarity"
  )

coefficients_bootstrap_summary %>%
  filter(
    epsilon < 100, 
    delta %in% c(NA, 0.000001),
    !method %in% c("BHM (n = 10)", "BHM (n = 25)")
  ) %>%
  ggplot(aes(x = epsilon, y = `Median Relative CI Length`)) +
  geom_hline(yintercept = 1, color = "red") +  
  geom_line(aes(group = method), alpha = 0.2) +  
  geom_point(aes(color = method), alpha = 0.4) +
  facet_wrap(~term, scales = "free_y") +
  labs(
    title = "Median Relative CI Length with Delta == 0.000001",
    caption = "BHM dropped for clarity"
  )

coefficients_bootstrap_summary %>%
  filter(
    epsilon < 100, 
    delta %in% c(NA, 0.000001),
    !method %in% c("BHM (n = 10)", "BHM (n = 25)")
  ) %>%
  ggplot(aes(`Prop Sign Match`, method, fill = factor(epsilon))) +
  geom_col(position = "dodge") +
  geom_vline(xintercept = 1, color = "red") +
  facet_wrap(~term) +
  labs(
    title = "Proportion Sign Match with Delta == 0.000001",
    caption = "BHM dropped for clarity"
  )

coefficients_bootstrap_summary %>%
  filter(
    epsilon < 100, 
    delta %in% c(NA, 0.000001),
    !method %in% c("BHM (n = 10)", "BHM (n = 25)")
  ) %>%
  ggplot(aes(`Prop. Sig. Match`, method, fill = factor(epsilon))) +
  geom_col(position = "dodge") +
  geom_vline(xintercept = 1, color = "red") +  
  facet_wrap(~term) +
  labs(
    title = "Proportion Significance Match with Delta == 0.000001",
    caption = "BHM dropped for clarity"
  )

# RMSE
rmse_bootstrap_summary %>%
  filter(
    q == 0.5,
    epsilon < 100, 
    delta %in% c(NA, 0.000001),
    !method %in% c("BHM (n = 10)", "BHM (n = 25)")
  ) %>%
  ggplot(aes(epsilon, relative_rmse)) +
  geom_hline(yintercept = 1, color = "red") +  
  geom_line(aes(group = method), alpha = 0.2) +  
  geom_point(aes(color = method), alpha = 0.4) + 
  facet_wrap(~method, scales = "free_y") +
  labs(
    title = "Relative RMSE with Delta == 0.000001",
    caption = "BHM dropped for clarity"
  )

Works Cited

  • Feldstein, Martin, Joel Slemrod, and Shlomo Yitzhaki, 1980. “The Effects of Taxation on the Selling of Corporate Stock and the Realization of Capital Gains.” The Quarterly Journal of Economics.