Democratizing Access to Education Data

The Urban Institute’s Education Data Portal

Erika Tyagi

The Education Data Portal bridges the gap between data availability and data accessibility.

  1. What do I mean by the availability-accessibility gap?
  2. How does the portal bridge this gap so effectively?
  3. Why does bridging this gap matter?

The Education Data Portal

  • What? A freely available one-stop shop for 100+ datasets released by government agencies and other institutions on schools, districts, and colleges in the U.S.
  • How? Harmonized to account for changes to data and file structures over time
  • Why? To make it easier for both technical and non-technical users to look at trends over time and combine data from different sources

What do I mean by the availability-accessibility gap?

Example: How has tuition at my alma mater risen over my lifetime?

Without the Education Data Portal…

  • Find the agency collecting the data
  • Read the data documentation
  • Download data files for each year
  • Load each file into R or Python
  • Notice anomalies in the data
  • Re-read the data documentation
  • Update the code per the documentation
  • Remember to repeat the process again next year
  • (And hope nothing changes)

This is tedious, error-prone, and not fun.
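
The manual steps above can be sketched in a few lines of Python. This is a minimal illustration, not the portal's own code: the two in-memory "files", their column names (`TUITION`, `tuition_fees`), the institution ID, and the renaming maps are all hypothetical, chosen only to show why every change in file structure across years needs its own handling.

```python
import csv
import io

# Hypothetical yearly extracts: the tuition column is renamed between years.
# (Values are made up for illustration; they are not real IPEDS figures.)
RAW_FILES = {
    2001: "UNITID,TUITION\n100001,1000\n",
    2002: "unitid,tuition_fees\n100001,1100\n",
}

# Per-year map from source column names to one harmonized schema.
COLUMN_MAPS = {
    2001: {"UNITID": "unitid", "TUITION": "tuition"},
    2002: {"unitid": "unitid", "tuition_fees": "tuition"},
}


def harmonize(raw_files, column_maps):
    """Stack yearly files into one list of rows with consistent column names."""
    rows = []
    for year in sorted(raw_files):
        reader = csv.DictReader(io.StringIO(raw_files[year]))
        for record in reader:
            row = {"year": year}
            for source_name, value in record.items():
                row[column_maps[year][source_name]] = int(value)
            rows.append(row)
    return rows


combined = harmonize(RAW_FILES, COLUMN_MAPS)
for row in combined:
    print(row)
```

Each new year of data potentially adds another entry to the renaming maps, which is the maintenance burden the portal aims to take off the user.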

Using the portal R package

Example: How has tuition at my alma mater risen over my lifetime?

library(educationdata)
library(ggplot2)

# Get data
data <- get_education_data(
  level = "college-university",
  source = "ipeds",
  topic = "academic-year-tuition",
  filters = list(
    year = 1990:2020,
    unitid = "173258",
    tuition_type = "4"
  )
)

# Plot data
ggplot(data, aes(x = year, y = tuition_fees_ft)) +
  geom_line()

The R package is available on CRAN.

Using the portal Python package

Example: How has tuition at my alma mater risen over my lifetime?

import educationdata

# Get data
data = educationdata.get_education_data(
    level="college-university",
    source="ipeds",
    topic="academic-year-tuition",
    filters={
        # range() excludes its end point, so 2021 covers 1990 through 2020
        "year": range(1990, 2021),
        "unitid": "173258",
        "tuition_type": "4",
    },
)

# Plot data
data.plot.line(x="year", y="tuition_fees_ft")

The Python package is not yet publicly available.

Using the portal Stata package

Example: How has tuition at my alma mater risen over my lifetime?

* Get data 
educationdata using ///
  "college ipeds academic-year-tuition", sub( ///
  year=1990/2020 ///
  unitid=173258 ///
  tuition_type=4 ///
)

* Plot data 
twoway (line tuition_fees_ft year)

The Stata package is available on SSC.

Using the portal Data Explorer

Example: How has tuition at my alma mater risen over my lifetime?

How does the portal bridge this gap so effectively?

  1. By focusing on the underlying API
  2. By focusing on data documentation

The underlying API

Provides the foundation of the portal

  • 100+ data endpoints (with the data)
  • 12+ metadata endpoints (about the data)
  • All other tools, packages, and documentation are built from these endpoints
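
To make the idea of a data endpoint concrete, here is a minimal sketch of calling one directly from Python using only the standard library. The base URL matches the one shown elsewhere in this talk, but the `level/source/topic/year` path layout and the helper names `build_url` and `fetch_results` are assumptions for illustration and should be checked against the portal's documentation.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://educationdata.urban.org/api/v1"


def build_url(level, source, topic, year, **filters):
    """Build a data-endpoint URL; the path layout here is an assumption."""
    url = f"{BASE_URL}/{level}/{source}/{topic}/{year}/"
    if filters:
        url += "?" + urlencode(filters)
    return url


def fetch_results(url):
    """Download one page of JSON and return its "results" list (needs network)."""
    with urlopen(url) as response:
        return json.load(response)["results"]


url = build_url(
    "college-university", "ipeds", "academic-year-tuition", 2020,
    unitid=173258, tuition_type=4,
)
print(url)
```

Calling `fetch_results(url)` would then return the records for that year. The R, Stata, and Python packages can be thought of as friendlier wrappers around requests like this one.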

Data documentation

Considered a first-order feature of the portal

  • Written for both humans and machines
  • Provides the user with details on demand

Data documentation

Written for both humans and machines

{
  "results": [
    {
      "variable": "urban_centric_locale",
      "label": "Degree of urbanization (urban-centric locale)",
      "format": "urban_centric_locale",
      "data_type": "integer",
      "values": "{
        1: '1 - Large city', 
        2: '2 - Midsize city', 
        3: '3 - Urban fringe of large city', 
        4: '4 - Urban fringe of midsize city',
        [...]
      }"
    }
  ]
}

https://educationdata.urban.org/api/v1/api-variables/?variable=urban_centric_locale
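
One way a machine can use this documentation is to turn coded values back into readable labels. The sketch below assumes the `values` field arrives as a dictionary-like string, as in the response above, and includes only the four values shown there. The `decode` helper is hypothetical and is not part of any portal package.

```python
import ast

# Metadata record abridged from the response above; only the four values
# shown there are included, and the string format of "values" is assumed.
metadata = {
    "variable": "urban_centric_locale",
    "label": "Degree of urbanization (urban-centric locale)",
    "values": "{1: '1 - Large city', 2: '2 - Midsize city', "
              "3: '3 - Urban fringe of large city', "
              "4: '4 - Urban fringe of midsize city'}",
}

# Safely turn the dictionary-like string into a real code -> label mapping.
value_labels = ast.literal_eval(metadata["values"])


def decode(codes, labels):
    """Replace each integer code with its label, keeping unknown codes as-is."""
    return [labels.get(code, code) for code in codes]


decoded = decode([2, 4, 99], value_labels)
print(decoded)
```

Because the labels come from the metadata endpoint rather than from hand-copied codebooks, the same few lines would work for any coded variable the endpoint describes.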

Data documentation

Provides the user with details on demand

How does the portal bridge this gap so effectively?

By focusing on the underlying API and data documentation… through collaboration with education and technology experts

  • Education contributors: Erica Blom, Jay Carter, and Leonardo Restrepo
  • Technology contributors: Ben Chartoff, David D’Orio, Graham MacDonald, Kyle Ueyama, and Vivian Zheng

Why does bridging this gap matter?

Different people ask different questions.

Each month, thousands of users ask questions through the portal

Why does bridging this gap matter?

By unlocking data for more people, we can allow more questions to find evidence-based answers that drive impact.

Get in touch