urbnindicators makes a number of opinionated design choices. “Opinionated” doesn’t mean that these decisions are the best ones for every user or use case; rather, each is intended to either speed up common workflows or improve their accuracy.

Design choices

  • Support geographies from the tract level and up. Block groups are not supported because the margins of error for block group-level estimates are often so large as to make the estimates meaningless. Further, many estimates available from the tract level and up are not available for block groups.

  • Support five-year estimates only. One-year estimates carry larger margins of error and are not published for smaller-population geographies (areas with fewer than 65,000 residents), such as tracts, zip codes, and some places and counties.

  • Rename all variables. The default variable names returned by the API are not human-friendly. It is hard to tell what a single variable represents from a name like B01001_001E, and when you’re working with a dozen or a hundred such variables, it’s very easy to misinterpret or mis-select the variable(s) you want. For these reasons, we apply more meaningful names to every returned variable while keeping names from the same table consistent, so that it’s easy to select and operate on sets of interrelated variables. The downside of this approach is that other publications use the default API variable names, and you will find no documentation anywhere (apart from the codebook returned by this package!) of a variable named, for example, race_personofcolor_percent. Variables in the codebook include their original API names in their definitions so that you can cross-reference these as needed.

  • Use a consistent variable naming convention. Variable names follow the pattern [concept]_[subconcept]_[characteristic]_[metric], for example, race_nonhispanic_white_alone_percent. The _percent suffix always denotes a derived percentage, _universe denotes the denominator used to calculate percentages for that table, and _M denotes a margin of error. This consistency makes it easy to use dplyr::matches() or dplyr::starts_with() to select related groups of variables.

  • Express percentages on a 0–1 scale. All derived percentages are expressed as proportions (e.g., 0.25 rather than 25). This avoids ambiguity and simplifies downstream calculations (e.g., multiplying a proportion by a population count). Use scales::percent() for display formatting. If you prefer a 0–100 scale, multiply these values and their MOEs by 100; the MOEs need no other adjustment.

  • Always propagate margins of error. When urbnindicators derives a new variable from two or more raw ACS estimates, it also calculates a margin of error for that derived variable using Census Bureau-recommended formulae. This means every _percent variable in the output has a corresponding _percent_M variable. These derived MOEs have known limitations (see vignette("quantified-survey-error")) but are far preferable to dropping error information entirely.
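
As a rough illustration of the naming convention, the sketch below selects related columns by prefix and suffix. It uses base R on a small made-up data frame so that it runs without any packages. The GEOIDs, values, and every column name other than race_nonhispanic_white_alone_percent are assumptions for illustration, not names the package is known to return.

```r
# Made-up data: GEOIDs, values, and most column names are hypothetical.
acs <- data.frame(
  GEOID = c("00000000001", "00000000002"),
  race_universe = c(4000, 2500),
  race_nonhispanic_white_alone_percent = c(0.25, 0.60),
  race_nonhispanic_white_alone_percent_M = c(0.03, 0.05),
  age_example_percent = c(0.10, 0.20),
  age_example_percent_M = c(0.02, 0.04),
  stringsAsFactors = FALSE
)

# Every column belonging to the "race" concept.
race_cols <- names(acs)[startsWith(names(acs), "race_")]

# Every derived percentage (names ending in "_percent").
percent_cols <- grep("_percent$", names(acs), value = TRUE)

# Every margin of error (names ending in "_M").
moe_cols <- grep("_M$", names(acs), value = TRUE)

# Subset to the race estimates and their margins of error.
race_df <- acs[, c("GEOID", race_cols)]

# With dplyr attached, a comparable selection would be:
# dplyr::select(acs, GEOID, dplyr::starts_with("race_"))
# dplyr::select(acs, GEOID, dplyr::matches("_percent$"))
```

Anchoring the patterns with `$` matters here: an unanchored "_percent" would also match the corresponding _percent_M columns.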
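
To make the 0–1 scale concrete, here is a minimal sketch of the two operations mentioned above: multiplying a proportion by a population count, and rescaling to 0–100. The data values and the race_universe column name are assumptions for illustration; race_personofcolor_percent is the example name used earlier in this document.

```r
# Made-up values; race_universe is a hypothetical column name.
tracts <- data.frame(
  race_universe = c(4000, 2500),
  race_personofcolor_percent = c(0.25, 0.60),
  race_personofcolor_percent_M = c(0.03, 0.05)
)

# A proportion times its universe recovers an (approximate) count.
implied_count <- tracts$race_personofcolor_percent * tracts$race_universe

# Rescaling to 0-100: multiply the estimate and its MOE by the same constant.
percent_100 <- tracts$race_personofcolor_percent * 100
percent_100_moe <- tracts$race_personofcolor_percent_M * 100

# Display formatting in base R; scales::percent(0.25) is the packaged equivalent.
labels <- sprintf("%.0f%%", percent_100)
```

Keeping the stored values as proportions and formatting only at display time means the same column can feed both calculations and labels.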
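
For readers curious about what margin-of-error propagation involves, the sketch below implements the approximation formulas published in Census Bureau ACS guidance for a sum and for a proportion. The function names moe_sum() and moe_proportion() are hypothetical helpers written for this example, and the inputs are made up. This is not necessarily how urbnindicators computes its _percent_M columns internally.

```r
# Hypothetical helper: MOE of a sum (or difference) of estimates is the
# square root of the sum of squared component MOEs.
moe_sum <- function(moes) {
  sqrt(sum(moes^2))
}

# Hypothetical helper: MOE of a proportion p = num / denom, where num is a
# subset of denom. When the term under the square root is negative, the
# guidance is to fall back to the ratio formula, which adds instead of subtracts.
moe_proportion <- function(num, denom, moe_num, moe_denom) {
  p <- num / denom
  radicand <- moe_num^2 - p^2 * moe_denom^2
  ratio_radicand <- moe_num^2 + p^2 * moe_denom^2
  # Choose the radicand before taking the root so sqrt() never sees a negative.
  sqrt(ifelse(radicand >= 0, radicand, ratio_radicand)) / denom
}

# Made-up inputs: 250 of 1,000 people, with MOEs of 40 and 60.
standard_case <- moe_proportion(250, 1000, 40, 60)

# Made-up inputs where the radicand is negative, triggering the fallback.
fallback_case <- moe_proportion(900, 1000, 30, 60)

# MOE for a count built by adding two estimates with MOEs of 30 and 40.
summed_moe <- moe_sum(c(30, 40))
```

Because the result is already on the 0–1 scale, it pairs directly with a _percent estimate. These formulas ignore covariance between the component estimates, which is one of the sources of the limitations noted above.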