Drawing Inferences while Accounting for Survey Error
quantified-survey-error.Rmd
Because data from the American Community Survey are–you guessed it–survey data, they are subject to sampling error. Sampling error means, in very brief, that when we extrapolate from our sample–say, 200 households out of a neighborhood population of 1,000 households–we make assumptions that the sample is representative of the population.1 These assumptions are inherently imperfect, and so the estimates we use based on sample responses have a (quantifiable) amount of error associated with them.
Using the known error around an estimate allows us to better say what we do and do not know about the population. Without this information, we might say, based on our sample of 20 households, that 5% of neighborhood households are housing cost-burdened renter households. But when we factor in the error around the estimate, we might say something more nuanced, like: “Between 3.5% and 6.5% of households are housing cost-burdened renter households.” This reflects that survey estimates are more accurately thought of as a range of likely values, rather than a single absolute value.
Using quantified error around survey estimates will allow us to do three (at least) critical classes of activities to improve our inference-making with these data:
Evaluate estimate quality to inform whether we should, for example, aggregate our data to create more precise estimates or select a different measure or data source.
Conduct statistical significance testing to evaluate whether an estimate is statistically significantly greater than another estimate(s).
Report and visualize error to our audiences so that they can understand the limitations of the data.
Below, we illustrate how urbnindicators facilitates each of these three classes of activities. But first, a very quick overview of measures of error.
Margins Of Error (MOE) are our baseline measurement of error. The ACS reports MOEs at a 90% confidence level, which enables us to say something like: “20% (±5%) of households in the neighborhood have no access to a car.” (And then we should include, either as a footnote or in the body of the document, that this and other MOEs are calculated at the 90% confidence level). What this means in practice is that if we were to repeat 100 times–using exactly the same methods–our methods to calculate this estimate, 90 of those times we would produce a parallel estimate between 15% and 25%, while 10 of those times, our estimate would fall outside this range.
Standard Errors (SE) are derived from MOEs by dividing the MOE against a confidence level-related value. urbnindicators returns 90% SEs, which are calculated by dividing an MOE by 1.645.
Coefficients of Variation (CV) relate error to the size of the estimate. They are calculated by dividing the SE by the estimate and then multiplying by 100. CVs are helpful because they provide a unit-agnostic measurement of variable quality: a CV of 40 for a given variable means the same thing as a CV of 40 for another unrelated variable, whereas MOEs and SEs are not directly comparable across variables.
acs_df_county = compile_acs_data(
years = c(2022),
geography = "county",
states = "NJ")
#> Downloading: 15 kB Downloading: 15 kB Downloading: 66 kB Downloading: 66 kB Downloading: 66 kB Downloading: 66 kB Downloading: 87 kB Downloading: 87 kB Downloading: 95 kB Downloading: 95 kB Downloading: 110 kB Downloading: 110 kB Downloading: 110 kB Downloading: 110 kB Downloading: 130 kB Downloading: 130 kB Downloading: 150 kB Downloading: 150 kB Downloading: 150 kB Downloading: 150 kB Downloading: 160 kB Downloading: 160 kB Downloading: 160 kB Downloading: 160 kB Downloading: 180 kB Downloading: 180 kB Downloading: 200 kB Downloading: 200 kB Downloading: 200 kB Downloading: 200 kB Downloading: 220 kB Downloading: 220 kB Downloading: 220 kB Downloading: 220 kB Downloading: 250 kB Downloading: 250 kB Downloading: 250 kB Downloading: 250 kB Downloading: 250 kB Downloading: 250 kB Downloading: 260 kB Downloading: 260 kB Downloading: 280 kB Downloading: 280 kB Downloading: 280 kB Downloading: 280 kB Downloading: 280 kB Downloading: 280 kB Downloading: 280 kB Downloading: 280 kB Downloading: 290 kB Downloading: 290 kB Downloading: 290 kB Downloading: 290 kB Downloading: 310 kB Downloading: 310 kB Downloading: 310 kB Downloading: 310 kB Downloading: 310 kB Downloading: 310 kB Downloading: 380 kB Downloading: 380 kB Downloading: 440 kB Downloading: 440 kB Downloading: 440 kB Downloading: 440 kB Downloading: 510 kB Downloading: 510 kB Downloading: 510 kB Downloading: 510 kB Downloading: 520 kB Downloading: 520 kB Downloading: 570 kB Downloading: 570 kB Downloading: 570 kB Downloading: 570 kB Downloading: 590 kB Downloading: 590 kB Downloading: 640 kB Downloading: 640 kB Downloading: 710 kB Downloading: 710 kB Downloading: 710 kB Downloading: 710 kB Downloading: 710 kB Downloading: 710 kB Downloading: 780 kB Downloading: 780 kB Downloading: 780 kB Downloading: 780 kB Downloading: 780 kB Downloading: 780 kB Downloading: 840 kB Downloading: 840 kB Downloading: 840 kB Downloading: 840 kB Downloading: 840 kB Downloading: 840 kB Downloading: 910 kB Downloading: 910 kB Downloading: 970 kB Downloading: 970 kB Downloading: 970 kB Downloading: 970 kB Downloading: 970 kB Downloading: 970 kB Downloading: 1 MB Downloading: 1 MB Downloading: 1 MB Downloading: 1 MB Downloading: 1.1 MB Downloading: 1.1 MB Downloading: 1.1 MB Downloading: 1.1 MB Downloading: 1.1 MB Downloading: 1.1 MB Downloading: 1.1 MB Downloading: 1.1 MB Downloading: 1.1 MB Downloading: 1.1 MB Downloading: 1.3 MB Downloading: 1.3 MB Downloading: 1.3 MB Downloading: 1.3 MB Downloading: 1.3 MB Downloading: 1.3 MB Downloading: 1.3 MB Downloading: 1.3 MB Downloading: 1.4 MB Downloading: 1.4 MB Downloading: 1.4 MB Downloading: 1.4 MB Downloading: 1.4 MB Downloading: 1.4 MB Downloading: 1.5 MB Downloading: 1.5 MB Downloading: 1.5 MB Downloading: 1.5 MB Downloading: 1.5 MB Downloading: 1.5 MB Downloading: 1.6 MB Downloading: 1.6 MB Downloading: 1.6 MB Downloading: 1.6 MB Downloading: 1.7 MB Downloading: 1.7 MB Downloading: 1.7 MB Downloading: 1.7 MB Downloading: 1.7 MB Downloading: 1.7 MB Downloading: 1.7 MB Downloading: 1.7 MB Downloading: 1.9 MB Downloading: 1.9 MB Downloading: 2 MB Downloading: 2 MB Downloading: 2 MB Downloading: 2 MB Downloading: 2.1 MB Downloading: 2.1 MB Downloading: 2.2 MB Downloading: 2.2 MB Downloading: 2.2 MB Downloading: 2.2 MB Downloading: 2.2 MB Downloading: 2.2 MB Downloading: 2.3 MB Downloading: 2.3 MB Downloading: 2.3 MB Downloading: 2.3 MB Downloading: 2.3 MB Downloading: 2.3 MB Downloading: 2.5 MB Downloading: 2.5 MB Downloading: 2.5 MB Downloading: 2.5 MB Downloading: 2.5 MB Downloading: 2.5 MB Downloading: 2.5 MB Downloading: 2.5 MB Downloading: 2.6 MB Downloading: 2.6 MB Downloading: 2.8 MB Downloading: 2.8 MB Downloading: 2.8 MB Downloading: 2.8 MB Downloading: 2.8 MB Downloading: 2.8 MB Downloading: 2.9 MB Downloading: 2.9 MB Downloading: 3 MB Downloading: 3 MB Downloading: 3 MB Downloading: 3 MB Downloading: 3.1 MB Downloading: 3.1 MB Downloading: 3.2 MB Downloading: 3.2 MB Downloading: 3.3 MB Downloading: 3.3 MB Downloading: 3.3 MB Downloading: 3.3 MB Downloading: 3.4 MB Downloading: 3.4 MB Downloading: 3.4 MB Downloading: 3.4 MB Downloading: 3.4 MB Downloading: 3.4 MB Downloading: 3.5 MB Downloading: 3.5 MB Downloading: 3.5 MB Downloading: 3.5 MB Downloading: 5.1 MB Downloading: 5.1 MB Downloading: 6.7 MB Downloading: 6.7 MB Downloading: 8.4 MB Downloading: 8.4 MB Downloading: 10 MB Downloading: 10 MB Downloading: 12 MB Downloading: 12 MB Downloading: 12 MB Downloading: 12 MB Downloading: 12 MB Downloading: 12 MB
acs_df_tract = compile_acs_data(
years = c(2022),
geography = "tract",
states = "NJ",
spatial = TRUE)
#> | | | 0% | |==== | 6% | |======================= | 33% | |======================================= | 56% | |======================================================= | 79% | |======================================================================| 100%
Evaluate Estimate Quality
CVs allow us to assess whether estimates have problematically large errors. While there’s not a right-or-wrong threshold for what constitutes a good/bad CV, many people employ thresholds between 30 and 40 (that is, where the error is 30-40% of the size of the estimate).
As shown below, variables that rely on larger sample sizes tend to have smaller CVs. Typically, there are two strategies to reduce CVs: (1) aggregate estimates, either across geographies or across variables, or (2) use larger geographies.
We plan to add utilities to support users in aggregating estimates and calculating adjusted measurements of error. For now, we warn that any aggregation should be done with care, as error cannot be simply added (or otherwise summarize) the way that estimates can.
plot_df = bind_rows(
acs_df_county %>% dplyr::mutate(geography = "County"),
acs_df_tract %>% dplyr::mutate(geography = "Tract")) %>%
sf::st_drop_geometry() %>%
dplyr::select(
geography,
c(dplyr::matches("^age.*percent.*CV") & dplyr::matches("(_6|_7|_8)"))) %>%
dplyr::rename_with(
.cols = dplyr::everything(),
.fn = ~ .x %>%
stringr::str_replace_all(c("_" = " ", "percent" = "(%)", "age|CV" = "")) %>%
stringr::str_squish() %>% stringr::str_trim()) %>%
tidyr::pivot_longer(-geography) %>%
dplyr::mutate(plot_title = stringr::str_c(geography, ": ", name))
factor_levels = plot_df %>%
dplyr::arrange(name) %>%
dplyr::distinct(plot_title, .keep_all = TRUE) %>%
dplyr::pull(plot_title)
plot_df %>%
dplyr::mutate(
plot_title = factor(plot_title, levels = factor_levels, ordered = TRUE)) %>%
ggplot2::ggplot() +
ggplot2::geom_histogram(
ggplot2::aes(x = value, fill = if_else(value < 30, "1", "0"))) +
ggplot2::geom_vline(xintercept = 30, linetype = "dashed") +
ggplot2::facet_wrap(~ plot_title, ncol = 2, scales = "free") +
ggplot2::guides(fill = "none") +
urbnthemes::theme_urbn_print() +
theme(axis.text.y = element_blank()) +
ggplot2::labs(
x = "CV",
y = "Distribution")
Conduct Statistical Significance Testing
Statistical significance testing is critical to understanding whether estimates are meaningfully different. Estimates that may appear substantially different in isolation are frequently, especially at smaller geographies, not statistically significantly different because there errors are so significant.
We’ll illustrate this numerically and demonstrate the impacts of
accounting for error when visualizing data, leveraging
tidycensus::significance()
to conduct our actual tests.
Here, we’ll compare each tract-level value within a single county to
that of the corresponding county.
## utility to help us derive an MOE from a CV
cv_to_moe = function(cv, estimate) {
cv / 100 * 1.645 * estimate
}
plot_data = acs_df_tract %>%
dplyr::filter(str_detect(NAME, "Atlantic")) %>%
dplyr::select(GEOID, dplyr::matches("age_over_64_percent")) %>%
dplyr::mutate(county_geoid = stringr::str_sub(GEOID, 1, 5)) %>%
dplyr::left_join(
acs_df_county %>%
dplyr::filter(stringr::str_detect(NAME, "Atlantic")) %>%
dplyr::select(GEOID, dplyr::matches("age_over_64_percent")) %>%
dplyr::rename_with(
.cols = dplyr::matches("age_over_64_percent"),
.fn = ~ stringr::str_c(.x, "_county")),
by = c("county_geoid" = "GEOID")) %>%
dplyr::mutate(
naive_difference = dplyr::case_when(
age_over_64_percent > age_over_64_percent_county ~ "Larger",
age_over_64_percent < age_over_64_percent_county ~ "Smaller",
TRUE ~ "Equal"),
significance = tidycensus::significance(
est1 = age_over_64_percent,
est2 = age_over_64_percent_county,
moe1 = cv_to_moe(
cv = age_over_64_percent_CV, estimate = age_over_64_percent),
moe2 = cv_to_moe(
cv = age_over_64_percent_CV_county, estimate = age_over_64_percent_county),
clevel = 0.9),
statistical_difference = dplyr::case_when(
significance == FALSE ~ "Not significant",
significance == TRUE & age_over_64_percent > age_over_64_percent_county ~
"Larger",
significance == TRUE & age_over_64_percent < age_over_64_percent_county ~
"Smaller"))
plot_data %>%
dplyr::select(GEOID, naive_difference, statistical_difference) %>%
tidyr::pivot_longer(cols = -c(GEOID, geometry)) %>%
dplyr::mutate(
name = if_else(
name == "naive_difference",
"Difference in point estimates",
"Statistically-significant difference")) %>%
ggplot2::ggplot() +
ggplot2::geom_sf(aes(fill = value)) +
ggplot2::scale_fill_manual(
values = c(
"Larger" = "#73bfe2",
"Smaller" = "#fdd870",
"Not significant" = "lightgrey")) +
urbnthemes::theme_urbn_map() +
ggplot2::facet_wrap(~ name) +
ggplot2::theme(legend.position = "bottom", legend.direction = "horizontal") +
ggplot2::labs(
fill = "",
title = "Many Tract-Level Estimates Are Not Statistically Significantly Different from the County-Level Estimate" %>% stringr::str_wrap(90),
subtitle = "Tract-level share of the population 65+ vs. county-level estimate, Atlantic County, NJ")
signficance_test_data = plot_data %>%
dplyr::filter(GEOID %in% c("34001011901", "34001001900"))
## TRUE
significant_difference = tidycensus::significance(
est1 = signficance_test_data %>%
dplyr::filter(GEOID == "34001001900") %>%
dplyr::pull(age_over_64_percent),
est2 = signficance_test_data %>%
dplyr::filter(GEOID == "34001011901") %>%
dplyr::pull(age_over_64_percent),
moe1 = cv_to_moe(
cv = signficance_test_data %>%
dplyr::filter(GEOID == "34001001900") %>%
dplyr::pull(age_over_64_percent_CV),
estimate = signficance_test_data %>%
dplyr::filter(GEOID == "34001001900") %>%
dplyr::pull(age_over_64_percent)),
moe2 = cv_to_moe(
cv = signficance_test_data %>%
dplyr::filter(GEOID == "34001011901") %>%
dplyr::pull(age_over_64_percent_CV),
estimate = signficance_test_data %>%
dplyr::filter(GEOID == "34001011901") %>%
dplyr::pull(age_over_64_percent)))
plot_data %>%
dplyr::filter(GEOID %in% c("34001011901", "34001001900")) %>%
ggplot2::ggplot(aes(y = GEOID, x = age_over_64_percent, color = GEOID)) +
ggplot2::geom_point() +
ggplot2::geom_errorbar(
ggplot2::aes(
xmin = age_over_64_percent -
cv_to_moe(
cv = age_over_64_percent_CV,
estimate = age_over_64_percent),
xmax = age_over_64_percent +
cv_to_moe(
cv = age_over_64_percent_CV,
estimate = age_over_64_percent)),
width = 0.2) +
urbnthemes::theme_urbn_print() +
ggplot2::labs(
x = "Share of population over 64",
y = "Tract",
title = "One Tract Estimate Triples That of the Other, But the Difference is Barely Statistically Significant Owing to the Error" %>% str_wrap(100))
Report and Visualize Error
The ability to quantify error both allows us to conduct statistical significance testing and to directly report error, either in text and/or via visualizations. As illustrated above, MOEs can be reported directly alongside estimates in the text as parentheticals: estimate (±MOE). A similar approach can be employed for statistical significance testing, with statistical significance being notated as a “*”, for example, or via a longer “p < .10” statement.
For most such cases, it will be helpful to provide some guidance to your audience about what the MOE/statistical significance means and how it should shape their interpretation of point estimates, such as via a footnote/endnote, through a call-out box, or in a section on your methods.
In tables, you can either present MOEs/statistical significance in the same cell as the point estimates or via a companion column. Depending on your goals, CVs may be used to suppress altogether some estimates within a table, though this approach also has limitations (e.g., it may lead to obscuring important but small-population groups or geographies, not just in a single table but across many such tables). If you’re struggling with presenting results because many estimates are imprecise, this may be an indication that you need to re-evaluate your choice of geography and/or explore opportunities for collapsing variables to obtain more precise estimates.
There are myriad approaches to visualizing error. Generally, the use
of error bars (look no further than the visual immediately above) should
be avoided, as research consistently shows that most audiences do not
interpret such bars accurately. The excellent
library(ggdist)
provides an array of methods for different
styles and methods of visualizing error, while visualizing statistical
significance relative to a benchmark (see the map above) is another
common approach.
Limitations of Calculated Errors
MOEs for raw ACS estimates–i.e., those that are directly reported by
the Census Bureau–should be accurate measurements of survey error. This
is because these errors are calculated after accounting for estimate
covariance using individual-level responses (see footnote 1). However,
the calculations of error for derived estimates–that is, estimates that
library(urbnindicators)
calculates behind the scenes using
two or more raw ACS estimates are imperfect for at least two
reasons:
Errors for derived estimates do not account for covariance;
Errors for derived estimates are calculated using (Census Bureau-recommended) formulae that inherently inflate error when multiple variables are combined. For example,
age_over_64_percent
(see figure 1) is calculated by summing numerous raw estimates to produce the numerator and then dividing this numerator by the table universe. As shown, this process produces a much more precise estimate than is available for any of the component variables used to calculate the numerator–but the error for this newly-derived estimate is likely larger than the directly-calculated (and more accurate) error when using individual-level data.
So what to do? The Census Bureau produces a set of auxiliary variance
replicate estimate tables that allow users to calculate precise
errors akin to those calculated using individual-level responses. We’re
considering whether to integrate these tables into
library(urbnindicators)
. In the interim, referring to the
codebook can inform users’ understandings of the number of raw estimates
used to calculate a given variable, which in turn can shape their
thinking around the precision or possible imprecision of associated
errors.