8 Interpreting the Results
You have run your analysis, but what exactly do the results mean? In this chapter, we describe the geographic and demographic disparity scores mathematically and give illustrative examples. The tool can tell you which geographic areas and groups are under- or overrepresented in your data, but it cannot tell you why those patterns exist, and understanding why is a first step toward addressing them. Consequently, we also explore potential reasons for disparities in the data.
The Disparity Scores
Because the process of calculating statistical significance for both the geographic and demographic disparity scores is somewhat lengthy, we describe that process separately in Chapter 14.
Geographic Disparity Scores
The geographic disparity score for a given subgeography is the difference between the proportion of the user-uploaded data within that subgeography and the proportion of the geography’s baseline population within that subgeography. For example, if a state-level analysis is selected, the geographic disparity score for a given county is the difference between the proportion of the user-uploaded data within that county and the proportion of the state’s baseline population within that county. For more information on the methodology for calculating the geographic disparity scores, see Chapter 3.
These geographic disparity scores tell us, at a high level, how the data are over- and underrepresented for each subgeography relative to each baseline population. For example, a geographic disparity score of 1.9 percent overrepresented indicates that the subgeography’s share of total data in the geography is 1.9 percentage points higher than its share of the total baseline population in the geography. Conversely, if the geographic disparity score indicates underrepresentation, the subgeography’s share of the data points in the geography is less than its share of the baseline population in the geography.
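The calculation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the tool's actual implementation (see Chapter 3 for the full methodology); the function name, the dictionary inputs, and the county figures are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def geographic_disparity_scores(data_counts, baseline_pop):
    """Return each subgeography's disparity score in percentage points.

    data_counts:  hypothetical dict mapping subgeography -> number of uploaded data points
    baseline_pop: hypothetical dict mapping subgeography -> baseline population
    A positive score means overrepresented; a negative score means underrepresented.
    """
    total_data = sum(data_counts.values())
    total_pop = sum(baseline_pop.values())
    scores = {}
    for name, pop in baseline_pop.items():
        # Share of all data points that fall in this subgeography
        data_share = data_counts.get(name, 0) / total_data
        # Share of the geography's baseline population in this subgeography
        pop_share = pop / total_pop
        scores[name] = 100 * (data_share - pop_share)
    return scores


# Made-up state with three counties
data_counts = {"County A": 50, "County B": 30, "County C": 20}
baseline_pop = {"County A": 400, "County B": 400, "County C": 200}
scores = geographic_disparity_scores(data_counts, baseline_pop)
for county, score in scores.items():
    print(f"{county}: {score:+.1f} percentage points")
```

In this made-up example, County A holds 50 percent of the data but 40 percent of the baseline population, so it is overrepresented by about 10 percentage points; County B is underrepresented by about the same amount; and County C is represented in proportion to its population.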
Demographic Disparity Scores
The demographic disparity score is the percentage point difference between the representation of a demographic group in the data (the data-implied percentage) and the representation of that group in the geography (the geography-wide percentage). For more information on the methodology for calculating the demographic disparity scores, see Chapter 3.
At a high level, the demographic disparity score tells us, for a given demographic group, the difference between that group’s population share in the tracts where the data points are located and its population share in the geography as a whole. For example, if the renter population has a demographic disparity score of 6 percent underrepresented, the percentage of the population who are renters in the tracts where the data are located is 6 percentage points lower than in the geography as a whole. Conversely, a demographic disparity score that indicates overrepresentation signifies that the demographic group makes up a greater share of the population in the tracts where data points are located than in the geography as a whole.
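As a rough illustration of this comparison, the sketch below computes a demographic disparity score for a single group. It assumes, for illustration only, that the data-implied percentage is an average of tract-level group shares weighted by the number of data points in each tract; the exact method the tool uses is described in Chapter 3 and may differ. The function name, input layout, and tract figures are hypothetical, and the sketch assumes every tract has a nonzero population.

```python
def demographic_disparity_score(tracts):
    """Return the disparity score for one demographic group, in percentage points.

    tracts: hypothetical list of dicts, one per tract in the geography, with keys
            'points' (data points in the tract), 'group_pop', and 'total_pop'.
    Assumption: the data-implied percentage is a point-weighted average of
    tract-level group shares. Tracts are assumed to have total_pop > 0.
    """
    total_points = sum(t["points"] for t in tracts)
    # Group share in the tracts where the data are located, weighted by data points
    data_implied = sum(
        t["points"] * t["group_pop"] / t["total_pop"] for t in tracts
    ) / total_points
    # Group share in the geography as a whole
    geography_wide = sum(t["group_pop"] for t in tracts) / sum(
        t["total_pop"] for t in tracts
    )
    return 100 * (data_implied - geography_wide)


# Made-up geography with two tracts; the group of interest is renters
tracts = [
    {"points": 8, "group_pop": 200, "total_pop": 1000},  # 20 percent renters
    {"points": 2, "group_pop": 600, "total_pop": 1000},  # 60 percent renters
]
renter_score = demographic_disparity_score(tracts)
print(f"Renters: {renter_score:+.1f} percentage points")
```

Here most data points fall in the tract with few renters, so the data-implied renter share (28 percent) is below the geography-wide share (40 percent), and renters are underrepresented by about 12 percentage points.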
For the state- and national-level analyses, we choose to show the demographic disparity score for the relevant geography (state or US) and the demographic disparity score for the smaller subgeographies that make up the main geography (counties and states, respectively). For more information on how the tool calculates the demographic disparity scores at each geographic level, see Chapter 3.
Potential Explanations for Under- and Overrepresentation
Data Collection Issues
The design and implementation of data collection systems can yield unequal representation. For example, resident-generated datasets, such as 311 requests, may reflect higher usage of the system by some groups of residents (Kontokosta 2017). As a result, the data may not accurately represent the true need for city services. We encourage you to use the results of the tool to discuss how to improve data collection efforts among underrepresented groups and geographic areas.
Program Implementation
The program captured in the data may not have been designed for the equity objective our tool is assessing. For example, some cities place public Wi-Fi hotspots in government buildings or downtown commercial centers to cater to tourists and businesses. As a result, Wi-Fi hotspots would be overrepresented in commercial neighborhoods but underrepresented in less-central residential neighborhoods. In this case, you may decide that a subset of the data is more relevant to equity and use the tool’s filter function to examine only the hotspots not located in government buildings. We encourage you to use the results of this tool to discuss how the design or implementation of a place-based program could yield more equitable results.
Historical Inequities
Data reflect the biases of the systems that generate them. For example, police arrest data are often concentrated in low-income communities of color because of previous policy decisions to overpolice these communities. These biased data are often fed into predictive policing algorithms, which, in turn, send even more police officers into these neighborhoods, generating even more disproportionate arrest records (Lum 2016). We encourage you to use the results of this tool to discuss how historical inequities inform current policies and data.
Mismatched Baseline Datasets
Although our tool offers several baseline datasets, it may not offer the baseline that best represents the most equitable distribution of your data. For example, when analyzing disparities in pothole-repair requests, the most appropriate baseline might be a dataset on traffic flow or some other measure of where potholes are likely to occur. Unfortunately, traffic flow is not one of the datasets available in our tool. We encourage you to select the baseline dataset that most closely represents the ideal distribution of your data. If no built-in baseline dataset matches your ideal distribution, we encourage you to use the API’s functionality to add supplemental baseline and demographic datasets.
Mismatched Geographies
For both the geographic and demographic disparity analyses, we compare the distribution of your data against the baseline population distribution in the full geography. If your dataset is not intended to represent that entire geography (e.g., if your data cover only a single region of the US but you use the national-level analysis), the results may not accurately reflect disparities in your data.