10 Custom Geographic and Demographic Datasets

Note

This feature is only available in the API.

As noted in Chapter 1, the API allows users to compare resource data against additional geographic and demographic datasets. Supplemental geographic datasets allow geographic disparity scores to be calculated for the additional variables included in the uploaded file. Similarly, supplemental demographic datasets allow additional demographic disparity scores to be calculated for the variables included in the uploaded file.

Both datasets should contain counts of people at the census tract level. We envision users extracting data from the Census or similar sources to generate these supplemental files.

Both supplemental datasets should have the following characteristics:

Be a tabular dataset saved as a CSV file
Be at the census tract geographic level
Be no more than 200 MB
Have a geographic identifier column that stores the 11-digit census tract FIPS code
Have a column for each variable that the user wishes to include as an additional geographic or demographic variable and, if it exists, a separate column for that variable’s margin of error
Have data that reflect population counts of a specific group (this could be a population group such as SNAP recipients or an entity such as businesses)
Not have a column named GEOID_urbaninstitute
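
The characteristics above can be checked locally before uploading a file. The following is a minimal sketch using only the Python standard library, not part of the tool or its API. The function name `check_supplemental_csv`, the identifier column name `GEOID`, and the reading of "200 MB" as 200 × 1024² bytes are all assumptions made for illustration.

```python
import csv
import os

# Assumption: "200 MB" is interpreted as 200 * 1024**2 bytes.
MAX_BYTES = 200 * 1024 * 1024
RESERVED_COLUMN = "GEOID_urbaninstitute"


def check_supplemental_csv(path, geoid_column="GEOID"):
    """Return a list of problems found; an empty list means none were detected.

    geoid_column is a hypothetical name for the census tract identifier column.
    """
    problems = []
    if not path.lower().endswith(".csv"):
        problems.append("file is not a .csv")
    if os.path.getsize(path) > MAX_BYTES:
        problems.append("file is larger than 200 MB")

    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        columns = reader.fieldnames or []
        if RESERVED_COLUMN in columns:
            problems.append(f"column named {RESERVED_COLUMN} is not allowed")
        if geoid_column not in columns:
            problems.append(f"missing identifier column {geoid_column}")
            return problems
        # The header is line 1, so data rows start at line 2.
        for line_number, row in enumerate(reader, start=2):
            geoid = (row[geoid_column] or "").strip()
            if not (len(geoid) == 11 and geoid.isascii() and geoid.isdigit()):
                problems.append(
                    f"line {line_number}: {geoid!r} is not an 11-digit FIPS code"
                )
    return problems
```

A common cause of a failed identifier check is a spreadsheet program dropping the leading zero from FIPS codes for states such as Alabama (01), which leaves a 10-digit value.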

The tool also throws warnings and errors when columns in the supplemental demographic and geographic datasets appear not to reflect population counts for specific groups. The checks apply to each population count (values) column and each margin of error (margin) column in either dataset. For some warnings, the tool also modifies the data, as noted below. Warnings are thrown if any of the following conditions are met:

Any observations in a values column are negative. We impute these values to 0.
Any values or margin of error columns contain floats.
Greater than 0 but less than 50 percent of observations in a values column are greater than the total population of the census tract in the year the user specifies. We impute these values to the total population.
Any observations in a margin column are greater than the total population of the census tract in the year the user specifies. We do not update these values.
Greater than 0 but less than 50 percent of observations in a values column are missing. We impute these values to 0.
Any margin of error columns are missing.
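
To make the warning rules for a values column concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the tool's implementation: the function name `apply_value_warnings` is hypothetical, missing observations are represented as `None`, percentages are computed over all rows in the column, "floats" is read as non-whole numbers, and the order in which the rules are applied is a guess. Columns that reach the 50 percent thresholds are dropped rather than imputed, so that case is not handled here.

```python
def apply_value_warnings(values, total_population):
    """Apply the warning rules to one values column.

    values: list of numbers, with None marking a missing observation.
    total_population: list of tract totals, same length as values.
    Returns (cleaned_values, warnings).
    """
    n = len(values)
    cleaned = list(values)
    warnings = []

    # Missing: warn and impute to 0 when the share is above 0 and below 50 percent.
    missing = [i for i, v in enumerate(cleaned) if v is None]
    if 0 < len(missing) < 0.5 * n:
        warnings.append("missing values imputed to 0")
        for i in missing:
            cleaned[i] = 0

    # Floats: warn only (assumption: a float means a non-whole number).
    if any(isinstance(v, float) and not v.is_integer() for v in cleaned):
        warnings.append("column contains floats")

    # Negatives: warn and impute to 0.
    negative = [i for i, v in enumerate(cleaned) if v is not None and v < 0]
    if negative:
        warnings.append("negative values imputed to 0")
        for i in negative:
            cleaned[i] = 0

    # Above total population: warn and cap when the share is above 0 and below 50 percent.
    over = [
        i for i, v in enumerate(cleaned)
        if v is not None and v > total_population[i]
    ]
    if 0 < len(over) < 0.5 * n:
        warnings.append("values above total population imputed to total population")
        for i in over:
            cleaned[i] = total_population[i]

    return cleaned, warnings
```

For example, calling `apply_value_warnings([10, -3, None, 500, 7.5, 20], [100] * 6)` raises all four warnings in this sketch and replaces the negative, missing, and oversized observations while leaving `7.5` unchanged.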

For a given column in either dataset, if we determine that the column is highly likely to contain errors, we drop the column and report that it was dropped. We consider a column highly likely to contain errors if any of the following conditions are met:

50 percent or more of the values are negative.
50 percent or more of the values are greater than the total population (values columns only).
50 percent or more of the values are missing.
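
The drop rule can be sketched the same way. This is an illustration only: the function name `drop_reasons` and the `is_values_column` flag are hypothetical, missing observations are represented as `None`, and each percentage is assumed to be computed over all rows in the column.

```python
def drop_reasons(values, total_population, is_values_column=True):
    """Return the reasons a column would be dropped; an empty list means keep it.

    values: list of numbers, with None marking a missing observation.
    total_population: list of tract totals, same length as values.
    is_values_column: False for a margin of error column, which is exempt
        from the total population rule.
    """
    n = len(values)
    if n == 0:
        return []

    reasons = []

    negative = sum(1 for v in values if v is not None and v < 0)
    if negative / n >= 0.5:
        reasons.append("50 percent or more negative values")

    if is_values_column:
        over = sum(
            1 for v, total in zip(values, total_population)
            if v is not None and v > total
        )
        if over / n >= 0.5:
            reasons.append("50 percent or more values above total population")

    missing = sum(1 for v in values if v is None)
    if missing / n >= 0.5:
        reasons.append("50 percent or more missing values")

    return reasons
```

Returning the reasons rather than a single boolean makes it straightforward to report why a column was dropped.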