15 Limitations
The methodology behind this tool has at least six limitations that users should take into account.
Inappropriate Baseline Dataset
Quantifying geographic representativeness of data requires some measure of ground truth (i.e., baseline measures) for comparison. The website tool allows a limited number of different baseline measures that are listed in Chapter 4. Though these baseline measures are sensible in many cases, they cannot apply to every possible use of the tool.
However, the API overcomes this issue by allowing users to upload custom baseline datasets. For more details, see Chapter 11.
Ecological Fallacy
The tool imputes demographic data from user-uploaded geographic data, which assumes that every data point in a census tract inherits the attributes of that tract. This is problematic because data points from a majority-white census tract could have been generated by nonwhite residents, and vice versa. Likewise, a program or resource located in a tract may not be used equally by all residents of the tract. These are examples of the ecological fallacy.
We encourage users to try to incorporate information on which individuals within census tracts are more likely to use a resource or participate in a program where possible to contextualize the results provided by the tool.
Overly Exclusive Definition of Access
The tool currently assumes that only residents of the census tract where a data point resides have access to, or are provided services by, that point. In reality, many resources have catchment areas beyond their census tract. For example, senior centers, job-training centers, hospitals, and community colleges likely all provide services to residents who live outside of the census tract where the resource is located.
The Urban Institute Spatial Equity Data Tool team will be making updates to overcome this issue in 2024.
Too Few Data Points
This tool works best on medium to large datasets because the geographic unit of analysis for the demographic disparity score is the census tract. Although the tool can successfully run on fewer data points, a good rule of thumb is to aim for a number of points in the dataset at least as large as the number of tracts in the geography (e.g., approximately 73,000 for the US or hundreds of census tracts in a large city). With a small number of points, the tool generates less reliable disparity scores, as the vast majority of tracts will contain no points or just a few points.
Although we believe this tool may be useful on smaller datasets in certain cases, we recommend users rely on it for datasets that follow the rule of thumb above.
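As a rough illustration of the rule of thumb above, the following sketch compares the number of points in a dataset with the number of tracts in the geography and reports the share of tracts that contain no points. The function name `coverage_summary` and the tract identifiers are hypothetical and are not part of the tool. The sketch also assumes each point has already been assigned to a tract.

```python
from collections import Counter


def coverage_summary(point_tract_ids, all_tract_ids):
    """Summarize how well a set of points covers the tracts in a geography.

    point_tract_ids: one tract ID per data point (hypothetical input format).
    all_tract_ids: every tract ID in the geography being analyzed.
    """
    all_tracts = set(all_tract_ids)
    if not all_tracts:
        raise ValueError("all_tract_ids must contain at least one tract")

    # Count points per tract, ignoring points that fall outside the geography.
    counts = Counter(t for t in point_tract_ids if t in all_tracts)
    n_points = sum(counts.values())
    n_tracts = len(all_tracts)
    n_empty = n_tracts - len(counts)

    return {
        "n_points": n_points,
        "n_tracts": n_tracts,
        # Rule of thumb: at least as many points as tracts.
        "meets_rule_of_thumb": n_points >= n_tracts,
        "share_empty_tracts": n_empty / n_tracts,
    }


# Made-up example: four tracts, three points concentrated in two tracts.
summary = coverage_summary(["A", "A", "B"], ["A", "B", "C", "D"])
print(summary)
```

A dataset that fails the check, or that leaves a large share of tracts empty, can still be analyzed, but the resulting disparity scores should be read with more caution.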
Analysis Does Not Span Geographies
This tool currently supports assessing disparities in a single geography (city, state, county, or the US). If a dataset spans multiple geographies, the tool will only operate on the most frequently occurring geography in the data and remove the remainder of the observations from the dataset. This is particularly problematic for regional analyses that span multiple geographies but do not cover the entire US.
Users can mitigate this limitation by repeating their analysis at a smaller geographic level. For instance, users interested in analyzing multiple states in the Southern US should repeat the analysis for each state in the region of choice rather than use the national-level scope.
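One way to prepare a multi-state dataset for repeated state-level analyses is to split it into one subset per state before uploading. The sketch below is a minimal example under stated assumptions: the record layout and the `state_fips` field name are hypothetical, and the step that submits each subset to the tool is deliberately left out.

```python
from collections import defaultdict


def split_by_state(records, state_key="state_fips"):
    """Group records into one list per state.

    records: iterable of dicts, each with a state identifier under
    `state_key` (a hypothetical field name chosen for illustration).
    """
    groups = defaultdict(list)
    for record in records:
        groups[record[state_key]].append(record)
    return dict(groups)


# Made-up points in two Southern states (13 = Georgia, 01 = Alabama).
points = [
    {"state_fips": "13", "lat": 33.75, "lon": -84.39},
    {"state_fips": "01", "lat": 32.38, "lon": -86.30},
    {"state_fips": "13", "lat": 32.08, "lon": -81.09},
]

for state, subset in split_by_state(points).items():
    # Each subset would then be analyzed separately at the state level.
    print(state, len(subset))
```

Because each subset contains a single state, no observations are dropped for falling outside the most frequently occurring geography.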
Definition of a City
As noted in Chapter 13, the tool’s operational definition for the boundary of a city might differ slightly from the official city boundary that the Census Bureau uses. The tool defines a city as all census tracts whose area is at least 1 percent covered by the relevant census place. The boundaries of census places and census tracts often do not align perfectly, meaning some tracts are only partially covered by the place boundary. Because of this overinclusive definition, the tool treats many cities, particularly small- and medium-sized ones, as larger than they are in both geographic size and population.
We encourage users conducting city-level analyses to clearly communicate this limitation when presenting findings informed by the calculations from the tool.
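The 1 percent coverage rule described above can be sketched as a simple filter. This is only an illustration, not the tool's implementation: the function name, the input dictionaries, and the area values are hypothetical, and the overlap areas are assumed to have been computed beforehand with GIS software.

```python
def tracts_in_city(tract_areas, overlap_areas, threshold=0.01):
    """Return IDs of tracts whose area is at least `threshold` covered.

    tract_areas: total area of each tract, keyed by tract ID.
    overlap_areas: area of each tract that falls inside the census place.
    Both dictionaries must use the same area units.
    """
    selected = []
    for tract_id, area in tract_areas.items():
        overlap = overlap_areas.get(tract_id, 0.0)
        # Skip degenerate tracts with no area to avoid dividing by zero.
        if area > 0 and overlap / area >= threshold:
            selected.append(tract_id)
    return selected


# Made-up areas: T1 is fully inside the place, T2 is 2 percent covered,
# and T3 is only 0.5 percent covered.
tract_areas = {"T1": 2.0, "T2": 5.0, "T3": 4.0}
overlap_areas = {"T1": 2.0, "T2": 0.1, "T3": 0.02}

print(tracts_in_city(tract_areas, overlap_areas))
```

In this example T2 counts as part of the city even though 98 percent of its area lies outside the place boundary, which shows how the definition can overstate a city's size and population.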