Read IPUMS data leveraging a local cache
read_ipums_cached.Rd
This function wraps a standard ipumsr::read_ipums*() query workflow and addresses two common challenges: (1) the default workflow gives downloaded raw data files arbitrary, sequentially numbered names that depend on the total number of extracts a given user has submitted; and (2) the default workflow has no built-in way to check for a local copy of the query results before resubmitting to the API.
It addresses these challenges by taking a user-supplied filename and directory and checking whether a file already exists at that path. If none exists, it submits the user-specified extract definition and downloads the result to that path.
Value
A tibble containing IPUMS data corresponding to the supplied extract_definition. The structure varies by collection type:
- For microdata collections (e.g., "usa", "cps")
  Returns individual-level records with columns corresponding to the variables specified in the extract definition. Column names and types are determined by IPUMS variable specifications. The data are read via ipumsr::read_ipums_micro().
- For aggregate collections ("nhgis", "ihgis")
  Returns aggregate data (typically at geographic summary levels) with columns corresponding to the requested tables/variables. IPUMS variable attributes are applied via the collection's codebook. The data are read via ipumsr::read_ipums_agg().
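The choice of reader described in the list above can be pictured as a simple lookup on the collection code. The following is only an illustrative sketch: choose_reader_sketch() is a hypothetical helper, not part of this package or of ipumsr, and it returns the reader's name as a string so that it runs without ipumsr installed.

```r
# Hypothetical helper (an assumption for illustration, not the package's code):
# maps an IPUMS collection code to the name of the ipumsr reader that the
# documentation above says is used for that collection type.
choose_reader_sketch <- function(collection) {
  # Collections documented above as aggregate data
  aggregate_collections <- c("nhgis", "ihgis")

  if (collection %in% aggregate_collections) {
    "ipumsr::read_ipums_agg"
  } else {
    # Everything else is treated as microdata in this sketch
    "ipumsr::read_ipums_micro"
  }
}

micro_reader <- choose_reader_sketch("usa")
agg_reader <- choose_reader_sketch("nhgis")
```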
If a cached file exists at the specified path and refresh = FALSE, the cached data are returned with a warning. Otherwise, the extract is submitted to IPUMS, downloaded, and cached for future use.
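The cache-or-fetch behavior can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the function's actual implementation: read_cached_sketch() and its fetch argument are hypothetical, the .rds cache format is an assumption, and a stand-in function replaces the real IPUMS API request.

```r
# Hypothetical sketch of the cache-or-fetch pattern (not the package's source).
# `fetch` is an assumed zero-argument function that would submit and download
# the extract; the .rds cache format is also an assumption.
read_cached_sketch <- function(filename, download_directory, fetch,
                               refresh = FALSE) {
  cache_path <- file.path(download_directory, paste0(filename, ".rds"))

  # Cache hit: return the stored data and warn that no new request was made
  if (file.exists(cache_path) && !refresh) {
    warning("Returning cached data from ", cache_path, call. = FALSE)
    return(readRDS(cache_path))
  }

  # Cache miss or forced refresh: fetch, store for future calls, and return
  dir.create(download_directory, showWarnings = FALSE, recursive = TRUE)
  data <- fetch()
  saveRDS(data, cache_path)
  data
}

# Usage with a stand-in fetch function instead of a real IPUMS request
cache_dir <- file.path(tempdir(), "ipums_cache_sketch")
fake_fetch <- function() data.frame(HCOVANY = c(1L, 2L), RACE = c(1L, 2L))

first <- read_cached_sketch("demo_extract", cache_dir, fake_fetch)
second <- suppressWarnings(
  read_cached_sketch("demo_extract", cache_dir, fake_fetch)
)
```

In this sketch, passing refresh = TRUE skips the cache check, so the fetch function runs again and overwrites the stored file.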
Examples
if (FALSE) { # \dontrun{
read_ipums_cached(
  filename = "acs_insurance_race_2022_1yr_repweights",
  download_directory = file.path("data"),
  extract_definition = ipumsr::define_extract_micro(
    collection = "usa",
    description = "2022 ACS 1-year sample with replicate weights - insurance and race",
    samples = c("us2022a"),
    variables = list(
      "HCOVANY",
      ipumsr::var_spec("RACE", case_selections = c("1", "2"))
    )
  ),
  refresh = FALSE
)
} # }