Overview
urbnindicators aims to provide users with analysis-ready data from the American Community Survey (ACS).
With a single function call, you get:
Hundreds of standardized variables, such as percentages, in addition to the raw count variables used to produce them.
Meaningful, consistent variable names.
A codebook that describes how each variable is calculated.
The built-in capacity to pull data for multiple years and multiple states.
Supplemental measures, such as population density, that aren’t available from the ACS.
Built-in quality checks to help ensure that calculated variables are accurate.
Margins of error and coefficients of variation for (virtually) all variables.
Installation
Install the development version of urbnindicators
from GitHub with:
Note that this package is under active development with frequent updates–check to ensure you have the most recent version installed!
Use
Obtain data
# library(dplyr)
# library(tidyr)
# library(stringr)
# library(urbnindicators)
# library(ggplot2)
# library(urbnthemes)
# set_urbn_defaults(style = "print")
plot_data = compile_acs_data(
years = c(2017, 2022),
geography = "county",
states = "NJ") %>%
transmute(
county_name = NAME %>% str_remove(" County, New Jersey"),
race_personofcolor_percent,
data_source_year)
state_averages = plot_data %>%
group_by(data_source_year) %>%
summarize(mean_race_personofcolor_percent = mean(race_personofcolor_percent)) %>%
arrange(data_source_year) %>%
pull()
dumbbell_data = plot_data %>%
pivot_wider(
names_from = data_source_year,
values_from = race_personofcolor_percent,
names_prefix = "year_")
Plot data
ggplot() +
geom_segment(
data = dumbbell_data,
aes(
x = reorder(county_name, year_2017),
y = year_2017,
yend = year_2022),
color = palette_urbn_main[7],
linewidth = 1) +
geom_point(
data = plot_data,
aes(
x = reorder(county_name, race_personofcolor_percent),
y = race_personofcolor_percent,
color = factor(data_source_year))) +
annotate(
"text",
y = .31,
x = 21.5,
label = "State mean (2017)",
fontface = "bold.italic",
color = palette_urbn_main[1],
size = 9 / .pt) +
annotate(
"text",
y = .47,
x = 21.5,
label = "State mean (2022)",
fontface = "bold.italic",
color = palette_urbn_main[2],
size = 9 / .pt) +
geom_hline(
yintercept = state_averages[1],
linetype = "dashed",
color = palette_urbn_main[1]) +
geom_hline(
yintercept = state_averages[2],
linetype = "dashed",
color = palette_urbn_main[2]) +
labs(
title = "Change in NJ Counties' Racial Composition, 2017 to 2022",
subtitle = paste0(
"On average, counties in NJ saw a ",
round((state_averages[2] - state_averages[1]), digits = 3) * 100,
" pecentage point increase in the share of the population who are people of color from 2017 to 2022.") %>%
str_wrap(120),
x = "County",
y = "Share of population who are people of color",
color = "Year") +
scale_x_discrete(expand = expansion(mult = c(.03, 0.04))) +
scale_y_continuous(
breaks = c(0, .25, .50, .75, 1.0),
limits = c(0, .75),
labels = scales::percent) +
coord_flip() +
theme_urbn_print()
Learn More
A growing number of vignettes aim to support users in effectively using this package. These vignettes include:
A package overview to help users Get Started.
An interactive version of the package’s Codebook so that prospective users can know what to expect.
A brief description of the package’s Design Philosophy to clarify the use-cases that
urbnindicators
is built to support.An illustration of how Quantifying Survey Error can improve inference making.
Credits
This package is built on top of and enormously indebted to library(tidycensus)
, which provides the core functionality for accessing the Census Bureau API. For users who want additional variables, library(tidycensus)
exposes the entire range of pre-tabulated variables available from the ACS and provides access to ACS microdata and other Census Bureau datasets. Learn more here: https://walker-data.com/tidycensus/index.html.