7 Interpreting the Results

You have run your analysis, but what exactly do the results mean? In this chapter, we describe the geographic and demographic disparity scores. The tool can tell you which geographic areas and groups are under- or overrepresented in your data, but it cannot tell you why those patterns exist. Understanding why is a first step toward addressing them, so we also explore potential reasons for disparities in the data.

The interpretations of the disparity scores will vary slightly when using the beta travel shed functionality. For details on interpreting results using travel sheds, see Chapter 19.

The Disparity Scores

Because the process of calculating statistical significance for both the geographic and demographic disparity scores is somewhat lengthy, we describe that process separately in Chapter 14.

Geographic Disparity Scores

The geographic disparity score for a given subgeography is the difference between the proportion of the user-uploaded data within that subgeography and the proportion of the geography’s baseline population within that subgeography. For example, if a state-level analysis is selected, the geographic disparity score for a given county is the difference between the proportion of the user-uploaded data within that county and the proportion of the state’s baseline population within that county. For more information on the methodology for calculating the geographic disparity scores, see Chapter 12.

The geographic disparity score is built on the assumption that an equitable allocation is one in which the allocation of points is proportional to the allocation of the baseline population. More precisely, an equitable allocation of user-uploaded data is one where the proportion of the user-uploaded data in a subgeography equals the proportion of the geography’s total baseline population in that subgeography. Because this assumption is rarely met, the geographic disparity scores tell us, at a high level, how the data are over- or underrepresented in each subgeography relative to the baseline population. For example, a geographic disparity score of 1.9 percent overrepresented indicates that the subgeography’s share of total data in the geography is 1.9 percentage points higher than its share of the total baseline population in the geography. Conversely, if the geographic disparity score indicates underrepresentation, the subgeography’s share of the data points in the geography is less than its share of the baseline population in the geography.
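
The arithmetic behind the 1.9 percentage point example can be sketched in a few lines of Python. This is a minimal illustration of the definition above, not the tool's own code: the function name, the dictionary inputs, and the county counts are all hypothetical, and the sketch ignores statistical significance (Chapter 14) and any other details covered in Chapter 12.

```python
def geographic_disparity_scores(data_counts, baseline_counts):
    """Return {subgeography: score}, in percentage points.

    Hypothetical helper: score = share of uploaded data in the
    subgeography minus share of the baseline population there.
    Positive means overrepresented, negative means underrepresented.
    """
    total_data = sum(data_counts.values())
    total_baseline = sum(baseline_counts.values())
    if total_data <= 0 or total_baseline <= 0:
        raise ValueError("totals must be positive")

    scores = {}
    for name, baseline in baseline_counts.items():
        # Subgeographies with no uploaded points get a data share of 0.
        data_share = data_counts.get(name, 0) / total_data
        baseline_share = baseline / total_baseline
        scores[name] = 100 * (data_share - baseline_share)
    return scores


# Made-up counts for a state with three counties.
example_data = {"County A": 319, "County B": 400, "County C": 281}
example_baseline = {"County A": 30000, "County B": 40000, "County C": 30000}

example_scores = geographic_disparity_scores(example_data, example_baseline)
for county, score in example_scores.items():
    label = "overrepresented" if score > 0 else "underrepresented" if score < 0 else "proportional"
    print(f"{county}: {abs(score):.1f} percentage points {label}")
```

In this made-up example, County A holds 31.9 percent of the data but 30 percent of the baseline population, so it is 1.9 percentage points overrepresented; County C is underrepresented by the same amount.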

Demographic Disparity Scores

The demographic disparity score is the percentage point difference between the representation of a demographic group across the tracts where data points are located (the data-implied percentage) and the representation of that group in the overall geography (the geography-wide percentage). Positive scores mean that the demographic group tends to live in census tracts with disproportionately high levels of the original data. Negative scores indicate that the demographic group tends to live in census tracts with relatively low levels of the data. For more information on the methodology for calculating the demographic disparity scores, see Chapter 12.

At a high level, the demographic disparity score tells us, for a given demographic group, the difference between that group’s population share in the tracts where the data points are located and its population share in the geography as a whole. For example, if the renter population has a demographic disparity score of 6 percent underrepresented, the percentage of the population that are renters in the tracts where the data are located is 6 percentage points lower than in the geography as a whole. Conversely, a demographic disparity score that indicates overrepresentation signifies that the demographic group makes up a greater share of the population in the tracts where data points are located than in the geography as a whole.
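
The renter example can be illustrated with a short Python sketch. It rests on an assumption that this chapter does not confirm: that the data-implied percentage is the average of each tract's group share, weighted by the number of data points in the tract. The tool's actual calculation is described in Chapter 12 and may differ. The function name, the dictionary keys, and the tract figures are hypothetical.

```python
def demographic_disparity_score(tracts):
    """Return the score in percentage points for one demographic group.

    Each tract is a dict with hypothetical keys:
      "points"     - number of uploaded data points in the tract
      "population" - total baseline population of the tract
      "group"      - population of the demographic group in the tract
    ASSUMPTION: the data-implied percentage weights each tract's group
    share by the tract's share of data points.
    """
    # Skip unpopulated tracts so that group shares are always defined.
    populated = [t for t in tracts if t["population"] > 0]
    total_points = sum(t["points"] for t in populated)
    total_population = sum(t["population"] for t in populated)
    if total_points <= 0 or total_population <= 0:
        raise ValueError("need data points and population to compute a score")

    data_implied = sum(
        (t["points"] / total_points) * (t["group"] / t["population"])
        for t in populated
    )
    geography_wide = sum(t["group"] for t in populated) / total_population
    return 100 * (data_implied - geography_wide)


# Made-up geography with two tracts; the group of interest is renters.
example_tracts = [
    {"points": 65, "population": 1000, "group": 200},  # 20 percent renters
    {"points": 35, "population": 1000, "group": 600},  # 60 percent renters
]
renter_score = demographic_disparity_score(example_tracts)
print(f"Renters: {renter_score:.1f} percentage points")
```

Here most data points fall in the tract with few renters, so the data-implied renter share (34 percent) is 6 percentage points below the geography-wide share (40 percent), matching the underrepresentation case described above.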

For state- and national-level analyses, the web tool shows both the demographic disparity score for the relevant geography (state or US) and the demographic disparity scores for the smaller subgeographies that make up that geography (counties and states, respectively). For more information on how the tool calculates the demographic disparity scores at each geographic level, see Chapter 12.

Potential Explanations for Under- and Overrepresentation

Data Collection Issues

The design and implementation of data collection systems can yield unequal representation. For example, resident-generated datasets, such as 311 requests, may reflect higher usage of the system by some groups of residents (Kontokosta 2017). As a result, the data may not accurately represent the true need for city services. We encourage you to use the results of the tool to discuss how to improve data collection efforts among underrepresented groups and geographic areas.

Program Implementation

The program captured in the data may not have been designed for the equity objective our tool is assessing. For example, some cities put public Wi-Fi hotspots in government buildings or downtown commercial centers to cater to the tourist and business populations. As a result, Wi-Fi hotspots would be overrepresented in commercial neighborhoods but underrepresented in less-central residential neighborhoods. In this case, you may decide that a subset of the data is more relevant to equity and use the tool’s filter function to examine the hotspots not located in government buildings. We encourage you to use the results of this tool to discuss how the design or implementation of a place-based program could yield more equitable results.

Historical Inequities

Data reflect the biases of the systems that generate them. For example, police arrest data are often concentrated in low-income communities of color because of previous policy decisions to overpolice these communities. These biased data are often fed into predictive policing algorithms, which, in turn, send even more police officers into these neighborhoods, generating even more disproportionate arrest records (Lum 2016). We encourage you to use the results of this tool to discuss how historical inequities inform current policies and data.

Mismatched Baseline Datasets

Although our tool offers several baseline datasets, it may not offer the baseline that best represents the most equitable distribution of your data. For example, when analyzing disparities in pothole-repair requests, the correct baseline dataset to compare against might be a dataset on traffic flow or some other measurement of likelihood of potholes. Unfortunately, traffic flow is not one of the datasets available in our tool. We encourage you to select the baseline dataset that most closely represents the ideal distribution of your data. If no built-in baseline dataset matches your ideal distribution, we encourage you to make use of the API’s functionality to add supplemental baseline and demographic datasets.

Mismatched Geographies

For both the geographic and demographic disparity analyses, we compare the distribution of your data against the baseline population distribution in the full geography. If your dataset is not intended to represent that entire geography (e.g., if your data cover only a single region of the US but you use the national-level analysis), the results may not accurately reflect disparities in your data.

Please note that we provide possible explanations for under- and overrepresentation in travel shed analyses in Chapter 19.