A Guide to Mapping at the Urban Institute

Authors

Jon Schwabish

Aleszu Bajak

Will Curran-Groome

Introduction

The purpose of this document is to provide a step-by-step guide to Urban Institute researchers interested in creating geographic maps with their data. The document describes the various design and style considerations around effective maps, intervals (bins), and other decisions.

Urban supports numerous data visualization tools for internal and external publications. At the time of this writing, most data visualizations on the Urban Wire blog are created in the browser-based Datawrapper data visualization tool. Most graphs created for the Data at Urban blog and in Urban reports and briefs are created in either Excel or the R programming language. This document lists the advantages and disadvantages of all three tools and provides examples and code snippets for creating maps in each.

Further questions and requests for help creating maps and other data visualizations can be directed to Aleszu Bajak (abajak@urban.org) and Jon Schwabish (jschwabish@urban.org).

Step 1: Should you use a map to visualize your data?

There are obvious advantages to plotting geographic data on a map: people can find themselves and their communities, they can see patterns across a state or country, and maps are generally familiar to readers. However, one should approach the desire to create a map with skepticism and critical thought. Is a map the best way to present geographic data? Or is the map simply showing where people live? Does it show interesting relationships to explore, or does it simply rely on the fact that a map can be familiar to readers?

Whether to create a map and which kind of map to use will depend on two questions: How important are the geographic patterns in the data? And how important is it for the reader to see a familiar map?

Choosing a Map Projection

There are myriad maps to choose from to create a data visualization. Geographic units on a map (e.g., states or counties) can be colored differently (i.e., “choropleth maps”) or can include different shapes such as dots or lines. Furthermore, there are hundreds of different map projections to choose from, all of which are attempting to take the three-dimensional spherical shape of the earth and flatten it to two dimensions.

Different projections stretch or shrink geographic areas in different ways, and which projection works best for the data being presented will depend on various factors. For example, the Mercator projection (bottom-left image) stretches geographic areas closer to the poles (e.g., Greenland) but has less distortion for areas near the equator. The Albers projection (middle image) is an equal-area projection, which means that any given area on the map is proportional to its size on the globe. As a specific example, in the Mercator projection map, Greenland appears to be larger than the entire continent of Africa when, in fact, it covers 836,300 square miles, while Africa covers about 11.7 million square miles. The two better resemble their true sizes under the Albers projection.
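The Mercator distortion can be quantified. On a spherical Mercator map, distances at latitude φ are stretched by a factor of sec φ relative to the equator in every direction, so areas are inflated by sec²φ. The base-R sketch below computes that factor; the function name is ours, for illustration, and GIS software uses ellipsoidal formulas that differ slightly.

```r
# Area inflation of the (spherical) Mercator projection relative to the equator.
# Linear scale grows as sec(latitude), so area grows as sec(latitude)^2.
mercator_area_factor <- function(lat_deg) {
  lat_rad <- lat_deg * pi / 180
  1 / cos(lat_rad)^2
}

# Equator, 45 degrees, 60 degrees, and roughly central Greenland (about 72 degrees north)
round(mercator_area_factor(c(0, 45, 60, 72)), 1)   # 1.0  2.0  4.0 10.5
```

An area inflation of roughly 10 at Greenland's latitude, against roughly 1 near the equator, is why Greenland can look comparable to Africa on a Mercator map despite being about one-fourteenth its size.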

Figure: The world under the Mercator, Albers Conic, and Miller projections. Source: Wikipedia.

For the United States specifically, the Mercator projection makes Alaska appear much more stretched out and larger than it is in reality, and the shape of the 48 contiguous states becomes more elongated east-west and compressed north-south. Because the Mercator projection draws lines of latitude as straight horizontal lines, the country's northwest border (which follows the 49th parallel) appears as a straight line. By comparison, the Albers projection, which Urban uses when creating maps for the entire country, draws lines of latitude as arcs and therefore generates a curved northwest border.

Figure: The United States under the Mercator and Albers Conic projections.

Figure: Illinois and Montana under both projections (Mercator projections in blue; Albers projections in black).

Tension in Creating Data-Driven Maps

The overarching challenge with creating data-driven maps is that a familiar map does not necessarily represent the data being plotted accurately. Take, for example, the US presidential election. What matters in the election is the number of electoral votes for each state, which largely reflects the number of people living in the state, not the state's land area. For example, Wyoming, Idaho, and Montana together cover more than 325,000 square miles and currently have 11 electoral votes combined. Massachusetts, by comparison, covers about 7,800 square miles, less than 2.5 percent of the combined area of the other three states, yet it also has 11 electoral votes. To address this imbalance between land area and the importance of each geography in the data, cartograms are one alternative to standard maps.
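The imbalance can be made concrete with the figures in the paragraph above. A choropleth map gives each state visual weight in proportion to its land area, so comparing electoral votes per unit of area shows how unevenly that weight tracks the data. A short R calculation using only those figures (the combined western area is the stated lower bound, so the result is approximate):

```r
# Figures from the text: combined area of Wyoming, Idaho, and Montana (a lower bound),
# the area of Massachusetts, and each group's electoral votes.
areas <- c(wy_id_mt = 325000, massachusetts = 7800)   # square miles
electoral_votes <- c(wy_id_mt = 11, massachusetts = 11)

# Massachusetts' area as a share of the three western states combined
ma_share <- areas[["massachusetts"]] / areas[["wy_id_mt"]]
round(100 * ma_share, 1)   # 2.4 (percent)

# Electoral votes per 10,000 square miles
votes_per_10k <- electoral_votes / (areas / 10000)
round(votes_per_10k, 2)
```

Per square mile, Massachusetts carries more than 40 times as many electoral votes as the three western states, yet it occupies a small sliver of a land-area map.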

Cartograms reshape geographic areas based on the data values. They might use differently sized circles or squares for each state, or same-sized shapes arranged to correspond to the data elements. In his book Cartography, Kenneth Field summarizes the purpose of cartograms:

The intent of most thematic maps is to provide the reader with a map from which comparisons can be made, and so geography is almost always inappropriate. This fact alone creates problems for perception and cognition. Accounting for these problems might be addressed in many ways such as manipulating the data itself. Alternatively, instead of changing the data and maintaining the geography, you can retain the data values but modify the geography to create a cartogram.
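One way a cartogram modifies the geography is to size each shape by the data. For a Demers-style cartogram or a proportional symbol map, a common approach is to make each shape's area, not its side length, proportional to the value, so the side length scales with the square root of the value. A minimal base-R sketch with a hypothetical function name and made-up values:

```r
# Side length for area-proportional squares: area ~ value, so side ~ sqrt(value).
# max_side is the side length given to the largest value (arbitrary units).
square_side <- function(value, max_side = 1) {
  max_side * sqrt(value / max(value))
}

sides <- square_side(c(100, 25), max_side = 10)
sides                    # 10 5
sides^2 / sum(sides^2)   # each square's share of total area equals its share of the values
```

Scaling the side length linearly instead would exaggerate differences: a value four times as large would get a square with sixteen times the area.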

Step 2: What kind of map should you create?

Once you have decided that a map is the best solution for your data, the next question is what kind of map to create. Urban typically produces three kinds of maps:

  • Choropleth maps. A map that uses colors, shades, or patterns on geographic units to show proportionate quantities and magnitudes.

  • Tile grid maps. A map that uses a single square for each geographic unit to approximate the shape of the overall geography. There are variations on the tile grid map, such as hex grid maps (which use hexagons for each geographic unit) and Demers cartograms (which use differently-sized squares for each geographic unit).

  • Point-based maps. Maps that use dots or other shapes to mark specific locations on a map or, in the case of dot density maps, use such shapes to indicate an overall pattern of distribution.

A secondary, but also important, question to consider when creating a map is how to choose the intervals (or bins) that will shade or define the geographic units. Placing data into discrete categories is, at its core, an aggregation problem: when several states or counties are combined into a single bin, the reader cannot see how different their values are from one another.

There are four primary binning methods for maps:

  1. No Bins. Essentially a continuous color palette (or “ramp”) in which each data value receives a unique color tone.

  2. Equal Interval Bins. In this approach, the range of the data is divided into intervals of equal width, such as 1-25, 26-50, 51-75, and 76-100.

  3. Data Distribution Bins. Instead of spacing the bins at equal intervals, the bins are arranged so that each holds the same number of observations, as with quartiles or quintiles.

  4. Arbitrary Bins. Here, the map creator chooses the bin cutoffs based on round numbers, natural breaks, or some other arbitrary criterion.
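The four binning approaches can be compared directly in base R. The sketch below applies each one to a small, made-up, right-skewed variable; the cutoffs are illustrative, not Urban defaults.

```r
# A small, skewed example variable (illustrative values, not real data)
x <- c(1, 2, 3, 4, 5, 10, 20, 40, 80, 100)

# 1. No bins: rescale each value to [0, 1], its position on a continuous color ramp
ramp_position <- (x - min(x)) / (max(x) - min(x))

# 2. Equal interval bins: five ranges of equal width (0-20, 20-40, ...)
equal_bins <- cut(x, breaks = seq(0, 100, by = 20), include.lowest = TRUE)

# 3. Data distribution bins: quintiles, so each bin holds about the same count
quintile_breaks <- quantile(x, probs = seq(0, 1, by = 0.2))
quintile_bins <- cut(x, breaks = quintile_breaks, include.lowest = TRUE)

# 4. Arbitrary bins: round-number cutoffs chosen by the map maker
arbitrary_bins <- cut(x, breaks = c(0, 5, 25, 50, 100), include.lowest = TRUE)

# Compare how many observations land in each bin
table(equal_bins)       # 7 1 0 1 1
table(quintile_bins)    # 2 2 2 2 2
table(arbitrary_bins)   # 5 2 1 2
```

With this skewed variable, equal intervals put 7 of the 10 values in the lowest bin and leave one bin empty, while quintiles place two values in each bin; the round-number cutoffs fall in between.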

Step 3: What tool should you use to create your map?

Urban supports numerous tools for data exploration, analysis, and visualization. Three primary tools are recommended for creating data-driven maps, each with its own advantages and disadvantages:

Tool | Advantages | Disadvantages
--- | --- | ---
Datawrapper | An online, browser-based tool that enables users to create interactive and responsive choropleth maps. | Sensitive data or data with PII should not be used. Users can open a free account to create a basic map, but styling specific to Urban will need to be completed by the data visualization team in the COMM department.
Excel | There are two options for creating maps in Excel: tile grid maps, for which an Urban-specific template already exists, or basic choropleth maps. | Choropleth maps in Excel do not always look particularly sharp. They are also limited in the geographies that can be used (typically states, counties, and ZIP codes).
R | With some fairly standard code, R can be used to create a wide array of maps with a wide array of projections. Coding languages have multiple advantages over drag-and-drop tools, and R in particular has excellent data visualization and graphics capabilities (through the ggplot2 library), as well as decent support for interactivity and webpage development (via leaflet, tmap, Plotly, and Quarto). | Users need to have a basic understanding of the R coding language, as well as how to use the Urban theme via library(urbnthemes). Adding custom features, such as labels, scales, and compasses, can be very challenging.

Some Urban researchers use other data and data visualization tools including QGIS, ArcGIS, Flourish, Tableau, SAS, and Stata.

Step 4: How do you create maps in these different tools?

Datawrapper: PUMA, County, State

Datawrapper is a free, browser-based tool that enables creators to generate interactive and responsive data visualizations. Urban has an enterprise license that enables members of the COMM team to add Urban-specific colors, fonts, and logos to graphs, especially for use on Urban Wire.

With the free license, it is relatively easy to create county- and state-level maps. Simply copy and paste the data (state names, county names, and FIPS codes will all work) into the Datawrapper interface and adjust the various settings. As shown in the images below, the default graphs do not use Urban's Lato font or the Urban color palette.

By working with Urban’s COMM team, however, users can easily update those graphs to include Urban branding elements such as font, colors, and logo. Those maps can be published as interactive visualizations that can be embedded on a website or exported as static images (e.g., PNG or PDF) for use in reports and presentations.

As with all tools, it is important to know how to load the map file and merge it with the data values. Creating the county-level map of Illinois, for example, requires just the name of the county (e.g., “Cook”) along with the data. When the initial map is chosen in Step 1 of the Datawrapper process, Datawrapper will provide examples of its preferred names for the geographic units. In the “Match” section of the “Add your data” tab, the user can select their preferred geographic identifier; for most county-level maps, Datawrapper will accept the county name or the 3- or 5-digit FIPS code.
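A 5-digit county FIPS code is the 2-digit state code followed by the 3-digit county code, with leading zeros preserved. Spreadsheets often store these codes as numbers and drop the zeros (Alabama's state code 01 becomes 1, for instance), which breaks matching. A short R sketch of rebuilding the codes; the function name is hypothetical:

```r
# Build 5-digit county FIPS codes from numeric state and county codes.
# sprintf() with %02d / %03d restores leading zeros that spreadsheets often drop.
county_fips5 <- function(state_fips, county_fips3) {
  sprintf("%02d%03d", as.integer(state_fips), as.integer(county_fips3))
}

county_fips5(17, 31)   # "17031": Cook County, Illinois
county_fips5(1, 1)     # "01001": Autauga County, Alabama, where the leading zero matters
```

Converting a numeric code with as.character() happens to work for Illinois (state code 17) but would not restore a dropped leading zero for states with codes below 10.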

Also available in Datawrapper default templates are maps at the ZIP Code and ZIP Code Tabulation Area (ZCTA) levels. There is also a default template that includes the five US territories (note: in some cases, Datawrapper codes “Northern Mariana Islands” as “Northern Marianas” and “U.S. Virgin Islands” as “Virgin Islands”).
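The territory naming differences noted above can be handled before pasting data into Datawrapper. A minimal sketch, assuming your data use the full names; the lookup table and function are hypothetical and contain only the two cases mentioned in this guide, so check the example names Datawrapper shows for your map before relying on them.

```r
# Hypothetical lookup from full territory names to the names Datawrapper may expect
datawrapper_names <- c(
  "Northern Mariana Islands" = "Northern Marianas",
  "U.S. Virgin Islands"      = "Virgin Islands"
)

recode_for_datawrapper <- function(x) {
  idx <- match(x, names(datawrapper_names))   # NA where no recode is needed
  out <- x
  out[!is.na(idx)] <- datawrapper_names[idx[!is.na(idx)]]
  unname(out)
}

recode_for_datawrapper(c("Guam", "U.S. Virgin Islands", "Puerto Rico"))
# "Guam" "Virgin Islands" "Puerto Rico"
```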

Datawrapper also permits the use of custom geographies in the form of GeoJSON file formats. This means that a PUMA-level map could be created in Datawrapper, though it will require some additional processing of the underlying shapefiles, which are typically available from the Census Bureau website. This Datawrapper blog post walks through the steps of how to upload a custom file and create a custom map in Datawrapper.

To create a PUMA-level map of Georgia, for example, the R code below will generate a GeoJSON file (“ga_puma_simple.geojson”) and save it to the user’s computer. That file can then be loaded into Datawrapper and matched to the data.

# Install packages if you don't already have them
# renv::install(c("sf", "rmapshaper", "tidycensus", "tidyverse"))

# Load packages
library(sf)
library(rmapshaper)
library(tidycensus)
library(tidyverse)

# Get dummy Georgia 2012 PUMA data
# You can pull PUMA files for another state by changing the state = "Georgia" line
# If you have newer PUMA codes, you can change the year = 2012 line to something else
ga_puma_data = get_acs(geography = "public use microdata area", 
                       variables = c(population = "B01003_001"), 
                       state = "Georgia", year = 2012, 
                       geometry = TRUE, progress_bar = FALSE)

# Write non-spatial attributes to a local .csv
# (st_drop_geometry() removes the geometry list-column, which a CSV cannot hold)
ga_puma_data %>% 
  st_drop_geometry() %>% 
  write_csv("ga_puma_data.csv")

# Simplify to get the GeoJSON under 2MB to upload to Datawrapper,
# then write it to a local file
ga_puma_data %>% 
  select(GEOID, geometry) %>%
  # GeoJSON expects WGS84 longitude/latitude coordinates (EPSG:4326)
  st_transform(4326) %>%
  # This step simplifies the shapefile boundaries, keeping 20% of the vertices
  ms_simplify(keep = 0.2) %>% 
  st_write("ga_puma_simple.geojson", append = FALSE, quiet = TRUE)

Finally, tile and hex grid maps can also be created in Datawrapper. The default tile grid map in Datawrapper does not include Washington, DC, or American Samoa, but a custom GeoJSON file can be uploaded to include those two geographies and to lay them out as preferred. (Note: At the moment, Urban has not defined a specific layout for maps that include US territories.)

Excel: County, State

There are two kinds of maps that are relatively easy to create in Excel: A tile grid map (a square for each state) and a state- or county-level choropleth map.

Urban has made available a tile grid map template with multiple options. Users need only import their data into the DATA tab, update the VLOOKUP formula in any one of the map tabs, adjust the cutoffs (if desired), and update the titles and other labels. Unlike in Datawrapper or R, creating different breaks in the data and the associated legends requires more manual work: changing formulas and adjusting columns and rows in the spreadsheets.

Microsoft introduced the “Filled Map” data visualization feature with the Excel 2016 package. The tool is fairly basic and allows for a limited number of geography types. However, because it is available directly in Excel, it is relatively easy to use and customize.

To create a map in Excel, the data should be arranged with the geographic identifier in a single column to the left of the data column. The geographic identifier may need to be adjusted to enable Excel to find the right matching variable. For example, to create the Illinois county-level map above, FIPS codes are converted to county names and concatenated with the state name, such as “Cook County, Illinois.”
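The identifier adjustment described above can be scripted before the data are copied into Excel. A sketch in R, assuming a table with bare county names, a state name column, and a placeholder value column (all names and values are hypothetical):

```r
# Hypothetical input: bare county names, the state name, and placeholder values
counties <- data.frame(
  county = c("Cook", "DuPage"),
  state  = c("Illinois", "Illinois"),
  value  = c(10, 20),
  stringsAsFactors = FALSE
)

# Build identifiers in the "Cook County, Illinois" pattern described above
counties$geo_id <- paste0(counties$county, " County, ", counties$state)

# Place the identifier in the column directly to the left of the data column
excel_ready <- counties[, c("geo_id", "value")]
write.csv(excel_ready, tempfile(fileext = ".csv"), row.names = FALSE)
```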

Selecting the Insert > Filled Map option will insert a map directly into the Excel worksheet. Right-clicking on the map will reveal the Format Data Series… menu, where the map projection, map area, map labels, and colors can all be adjusted. Excel Filled Maps cannot be customized in all the ways that typical Excel charts can, but they are a straightforward way to create basic choropleth maps.

R: PUMA, County, State

The R programming language sits somewhere in between Excel and Datawrapper: it is not a drag-and-drop tool but, with some basic code, can be extended to enable different and more detailed maps and data visualizations. Users should consult Urban’s public R User Group page and the internal R Slack channel (#r-users-group) for additional support and assistance.

To create maps in R, users will first need to load several packages (and first install them, if they haven’t done so previously). This can be done using two commands:

  • renv::install("package_name") (this only needs to be run once)

  • library(package_name) (this needs to be run at the top of every script)
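If a script is shared across machines, it can help to install only the packages that are missing rather than reinstalling every time. A sketch assuming the renv package is available; the helper names are ours, not part of renv:

```r
# Return the packages in `pkgs` that cannot be loaded (i.e., are not installed).
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Install whatever is missing (via renv), then attach every package.
ensure_packages <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) renv::install(to_install)
  for (p in pkgs) library(p, character.only = TRUE)
  invisible(pkgs)
}

# Example check using packages that ship with R, plus a made-up name
missing_packages(c("stats", "utils", "not.a.real.package"))   # "not.a.real.package"
```

A script would then call ensure_packages(c("sf", "tidyverse")) near the top, in place of separate install and library() calls.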

While multiple libraries are loaded below, many of these provide convenience functions for obtaining particular datasets, working with particular file types, or conducting specialized tasks (e.g., algorithms for binning values, or functions for placing labels without overlaps). The central libraries that almost every mapping exercise will require are library(tidyverse), which contains data manipulation and visualization functions, and library(sf), which provides utilities for working with spatial data and conducting spatial operations.

Two other Urban-specific packages—library(urbnmapr) and library(urbnthemes)—are also necessary and may require additional installation steps (commented in the code chunk below). For help with installing and working with R packages and code, send a message to the #r-users-group Slack channel.

## install.packages("renv")
library(here)
library(janitor)
library(skimr)
library(tidyverse)
library(tigris)
library(sf)
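## renv::install("UrbanInstitute/urbnthemes") ## not on CRAN; installs from GitHub
## renv::install("UrbanInstitute/urbnmapr") ## not on CRAN; installs from GitHub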
library(urbnthemes)
library(urbnmapr)
library(readxl)
library(tilemaps)
library(cowplot)
library(BAMMtools)
library(ggrepel)
library(gridExtra)

set_urbn_defaults(style = "map")
# Read in and clean various example datasets that we'll use for mapped examples

df_puma_georgia = readxl::read_excel(
  here("mapping", "data", "data-raw", "SampleDataForMaps_NotFinal.xlsx"), 
  sheet = 1) %>% ## first sheet in workbook
  clean_names()

df_county_illinois = readxl::read_excel(
  here("mapping", "data", "data-raw", "SampleDataForMaps_NotFinal.xlsx"), 
  sheet = 2, ## second sheet in workbook
  skip = 9) %>% ## the first nine rows are metadata / non-tabular; we don't read these
  clean_names() %>%
  ## multiple columns are duplicated; we select unique columns and rename for clarity
  select(
    state_name = state,
    county_fips = fips,
    county_name = county,
    observations,
    mean_monthly_people_eligible_snap = average_monthly_number_eligible_2) %>%
  mutate(county_fips = as.character(county_fips)) %>% ## convert to character for joining
  filter(!is.na(state_name)) ## in the raw data, there's a "total" row at the bottom with missingness; we don't need this row

df_state_territories = readxl::read_excel(
  here("mapping", "data", "data-raw", "SampleDataForMaps_NotFinal.xlsx"), 
  sheet = 3) %>% ## third sheet in the workbook
  clean_names() %>%
  rename(fips = fips_code, state_abbreviation = state_abbrev)

Standard choropleth maps are straightforward to create in R, and attributes like labels and legends can be modified with relatively small adjustments to the code. Urban also has a tile grid map template available as part of library(urbnthemes), which can be used to create Figure 4.

Spatial data for different geographic units can be loaded into R using either library(tidycensus) or library(tigris), both of which pull data from the US Census Bureau. library(tigris) offers a more complete set of spatial data (e.g., including options for roads and bodies of water), while library(tidycensus) allows users to query Census Bureau APIs for American Community Survey and other population data and—when the geometry = TRUE argument is specified—also returns the associated spatial data for the requested legal, political, or statistical geography. (library(tidycensus) actually uses library(tigris) to provide users with the spatial data that it returns).

The following lines of code, for example, import PUMA geographies for the state of Georgia using library(tigris); these data can then be joined to the non-spatial, PUMA-level data set using PUMA codes.
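PUMA codes in the TIGER files are zero-padded, 5-character strings (the PUMACE10 column), while codes read from Excel often arrive as numbers. If the key columns have different types, recent versions of dplyr's left_join() stop with an error, and if the zeros are missing the keys simply will not match. A base-R sketch with made-up codes and placeholder values, using merge(all.x = TRUE) as the base equivalent of a left join:

```r
# Hypothetical spatial attributes, formatted as in the TIGER files: zero-padded strings
shapes <- data.frame(PUMACE10 = c("00100", "00200", "01001"), stringsAsFactors = FALSE)

# Hypothetical non-spatial data read from Excel: codes stored as numbers
estimates <- data.frame(puma = c(100, 200, 1001), families = c(1500, 2300, 800))

# Convert the numeric codes back to 5-character, zero-padded strings before joining
estimates$puma <- sprintf("%05d", as.integer(estimates$puma))

# Base-R left join: keep every shape and attach matching estimates
joined <- merge(shapes, estimates, by.x = "PUMACE10", by.y = "puma", all.x = TRUE)
joined
```

If the puma column in the Excel sheet is numeric, the same fix in the Figure 1 pipeline would be mutate(puma = sprintf("%05d", as.integer(puma))) applied to df_puma_georgia before left_join().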

Figure 1 - PUMA-level Choropleth Map with Inset

pumas_sf = tigris::pumas(
  state = "GA", 
  year = 2019, 
  cb = TRUE, ## cb = "cartographic boundaries"; this excludes, for example, area in the ocean
  progress_bar = FALSE)

sf_puma_georgia = pumas_sf %>%
  ## join our spatial data to our non-spatial data
  left_join(df_puma_georgia, by = c("PUMACE10" = "puma"))

sf_puma_atlanta_metro = pumas_sf %>%
  filter(str_detect(NAME10, "Atlanta")) ## capture Atlanta-area PUMAs only

## create a bbox object (an xmin, ymin, xmax, ymax vector, not a spatial object)
## that contains the extent of the Atlanta metro area
atlanta_bbox = sf_puma_atlanta_metro %>% st_bbox()

puma_plot_main = sf_puma_georgia %>%
  ggplot() +
    geom_sf(aes(fill = avg_monthly_number_of_eligible_families)) +
    geom_sf(
      ## convert our bbox into a spatial polygon
      data = atlanta_bbox %>% st_as_sfc(),
      color = "black",
      fill = NA,
      linewidth = 1.3) + 
    scale_fill_continuous(
      labels = scales::comma, ## add commas to the labels
      breaks = c(0, 2000, 4000, 6000), ## specify which points on the scale to label
      trans = "reverse", ## reverse the scale so that higher values are darker
      limits = c(6000, 0)) + ## specify the range of the scale to ensure that 0 and 6000 are labeled
    ## move the legend to the top; by default, it's vertically centered  
    theme(legend.justification = "top") + 
    labs(fill = "Mean SNAP-eligible families per month (2019)")

puma_plot_inset = sf_puma_georgia %>% 
  ## crop to the Atlanta metro area
  st_crop(atlanta_bbox %>% st_as_sfc()) %>%
  ggplot() +
    geom_sf(aes(
      fill = avg_monthly_number_of_eligible_families), 
      show.legend = FALSE) + ## we've already got a legend for our main map, so we omit the legend here
    scale_fill_continuous(
      labels = scales::comma, 
      breaks = c(0, 2000, 4000, 6000),
      trans = "reverse",
      limits = c(6000, 0)) +
    theme(
      ## add a black border around the inset
      panel.border = element_rect(color = "black", linewidth = 1.3, fill = NA)) +
    coord_sf(expand = FALSE) + ## don't add buffer space around the plot
    labs(
      title = "",
      subtitle = "Atlanta Metro Area")

## combine our two plots
main_plot = ggdraw(puma_plot_main) +
  draw_plot(
    {puma_plot_inset},
    # (x,y) determine placement of the inset map
    x = 0.35, 
    y = .06,
    width = 0.6, 
    height = 0.6) 

## combine our plot with other plot info (title, note, source, logo)
puma_plot_final = grid.arrange(
  urbn_title("SNAP-eligible Families in Georgia by Public-Use Microdata Area (PUMA)"),
  main_plot,
  urbn_logo_text(),
  urbn_note(
    text = paste0(
      "PUMA-level estimates of monthly mean SNAP-eligible families."),
    width = 140),
  urbn_source(
    text = "Authors' analysis of simulated estimates from an unspecified model.",
    width = 140,
    plural = T),
  ncol = 1,
  nrow = 5,
  heights = c(
    2.5, # title
    50, # figure
    2, # logo 
    2, # note
    2 # source
  ))
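In the Figure 1 code, st_bbox() returns the extent of the Atlanta PUMAs as a named vector (xmin, ymin, xmax, ymax), which st_as_sfc() turns into a rectangle for the outline and st_crop() uses to cut out the inset. If polygons at the edge of an inset look clipped too tightly, one option is to pad the box before cropping. The base-R sketch below (hypothetical helper, made-up projected coordinates) computes a padded box in the same order; with sf, the same arithmetic would be applied to the four values st_bbox() returns.

```r
# Bounding box in the order sf::st_bbox() uses: xmin, ymin, xmax, ymax,
# optionally padded by a fraction of its width and height.
padded_bbox <- function(x, y, pad = 0) {
  dx <- diff(range(x)) * pad
  dy <- diff(range(y)) * pad
  c(xmin = min(x) - dx, ymin = min(y) - dy, xmax = max(x) + dx, ymax = max(y) + dy)
}

# Made-up projected coordinates (meters) for a handful of vertices
xs <- c(1000, 3000, 2000)
ys <- c(500, 1500, 2500)
padded_bbox(xs, ys)              # xmin 1000, ymin 500, xmax 3000, ymax 2500
padded_bbox(xs, ys, pad = 0.1)   # xmin 800, ymin 300, xmax 3200, ymax 2700
```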

Figure 2 - County-level Choropleth Map with Scaled Points

counties_sf = tidycensus::get_acs(
  state = "IL", 
  geography = "county", 
  year = 2019, 
  variables = "B01003_001", ## total population variable
  output = "wide", ## return data formatted wide (one variable per column)
  geometry = TRUE, ## include spatial data
  progress_bar = FALSE) %>%
  select(county_fips = GEOID, total_population = B01003_001E)

counties_illinois_sf = left_join(
    counties_sf,
    df_county_illinois, 
    by = "county_fips") %>%
  mutate(
    mean_monthly_people_eligible_snap_percent = mean_monthly_people_eligible_snap / total_population)

counties_plot_illinois = counties_illinois_sf %>% ## these are polygon data
  ggplot() +
  ## the basic choropleth (county boundary) geographies, filled with percentages
  geom_sf(aes(fill = mean_monthly_people_eligible_snap_percent)) +
  ## the point-based dots, scaled based on snap-eligible counts
  ## st_centroid converts polygons to points for plotting purposes
  ## st_point_on_surface will guarantee that generated points fall within polygons
  ## and is appropriate when polygons are less uniform shapes; st_centroid can generate
  ## a "centroid" that falls outside of the originating polygon
  geom_sf(
    ## this second geom_sf() call adds the points to the map based on the centroid
    ## of the original polygon
    data = counties_illinois_sf %>% st_centroid(),
    ## point size is determined by mean_monthly_people_eligible_snap
    aes(size = mean_monthly_people_eligible_snap),
    color = palette_urbn_main[2], ## color of the points (urban orange)
    alpha = .75) + ## transparency of dots helps if/when they overlap
  ## label Cook County / Chicago because this county has the greatest number of SNAP-eligible people
  ## (represented by point size) in a manner that is obscured by stand-alone choropleth mapping
  ggrepel::geom_text_repel(
    data = counties_illinois_sf %>% filter(county_name == "Cook County") %>% st_centroid(), 
    label = "Cook County (including Chicago)" %>% str_wrap(15),
    aes(geometry = geometry),
    stat = "sf_coordinates", ## this maps centroid c(x, y) values to the labeling function
    min.segment.length = 0, ## draw a line from the label to the point
    nudge_x = 3, ## nudge the label to the right
    nudge_y = .5) + ## nudge the label up
  theme(legend.box = "vertical") + ## stack the two legends (fill and size) vertically
  labs( ## scale titles
    fill = "Mean SNAP-eligible people per month, standardized by total population (2019)" %>% str_wrap(30),
    size = "Mean SNAP-eligible people per month (2019)" %>% str_wrap(30)) +
  scale_fill_continuous(
    labels = scales::percent, ## convert decimals to percentages
    trans = "reverse", ## reverse the scale so that higher values are darker
    breaks = c(0, .2, .4, .6), ## specify which points on the scale to label
    limits = c(.6, 0)) + ## specify the range of the scale to ensure that 0 and 60% are labeled
  scale_size_continuous(labels = scales::comma) + ## add commas to the labels
  ## don't draw buffer space around the plot
  coord_sf(expand = FALSE)

## combine our plot with other plot info (title, subtitle note, source, logo)
county_plot_final = urbn_plot(
  urbn_title("Simulated SNAP Eligibility in Illinois Counties (2019)"),
  urbn_subtitle("Cook County accounts for the greatest share of SNAP-eligible individuals.
  The counties with high rates of SNAP eligibility are scattered across the state and are often non-urban." %>% str_wrap(110)),
  counties_plot_illinois, 
  urbn_logo_text(),
  urbn_note(
    text = paste0(
      "County-level estimates of monthly mean SNAP-eligible individuals, 
      standardized by county-level population estimates from the 2015-2019 
      5-year American Community Survey."),
    width = 100),
  urbn_source(
    text = "Authors' analysis of simulated estimates from an unspecified model.",
    width = 100,
    plural = FALSE),
  ncol = 1,
  heights = c(
    2,
    3.5,
    30, # figure
    1.5, # logo
    3, # note
    1.5 # source
  )) 
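The comments in the Figure 2 code note that st_centroid() can return a point outside the polygon it came from, while st_point_on_surface() guarantees a point inside. The base-R sketch below shows why, using hypothetical helper functions on planar coordinates: the area-weighted centroid of a C-shaped polygon falls in its notch. For real maps, use the sf functions; this only demonstrates the geometry.

```r
# Area-weighted centroid of a simple polygon (shoelace formula).
# Vertices are listed in order without repeating the first vertex.
polygon_centroid <- function(x, y) {
  x2 <- c(x[-1], x[1]); y2 <- c(y[-1], y[1])   # next vertex, wrapping around
  cross <- x * y2 - x2 * y
  area <- sum(cross) / 2
  c(x = sum((x + x2) * cross) / (6 * area),
    y = sum((y + y2) * cross) / (6 * area))
}

# Even-odd (ray casting) test: is point (px, py) inside the polygon?
point_in_polygon <- function(px, py, x, y) {
  inside <- FALSE
  j <- length(x)
  for (i in seq_along(x)) {
    if ((y[i] > py) != (y[j] > py) &&
        px < (x[j] - x[i]) * (py - y[i]) / (y[j] - y[i]) + x[i]) {
      inside <- !inside
    }
    j <- i
  }
  inside
}

# A C-shaped polygon: a 3 x 3 square with a 2 x 1 notch cut from its right side
cx <- c(0, 3, 3, 1, 1, 3, 3, 0)
cy <- c(0, 0, 1, 1, 2, 2, 3, 3)

ctr <- polygon_centroid(cx, cy)
ctr                                            # x = 1.357..., y = 1.5
point_in_polygon(ctr["x"], ctr["y"], cx, cy)   # FALSE: the centroid sits in the notch
point_in_polygon(0.5, 1.5, cx, cy)             # TRUE: a point on the spine of the C
```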

Figure 3 - State-level Choropleth Map with Offset Labels

## spatial data for states and territories
states_sf = get_urbn_map(map = "territories_states", sf = TRUE) %>%
  rename(state_abbreviation = state_abbv) 

## join non-spatial data to our spatial data
state_poverty_sf = states_sf %>%
  left_join(df_state_territories, by = "state_abbreviation") %>%
  mutate(
    ## the distribution of state-level poverty rates is not normal; we use the Jenks
    ## natural breaks algorithm to create breaks that better reflect the data distribution
    poverty_rate_jenks = cut(
      poverty_rate, 
      breaks = getJenksBreaks(poverty_rate, 7), ## create six buckets of values
      include.lowest = TRUE, 
      ordered_result = TRUE, 
      labels = FALSE) %>%
      ## convert the labels to a more human-readable format
      ## clarifying the meaning of the bounds of the scale
      factor(
        levels = c(1, 2, 3, 4, 5, 6), 
        ordered = TRUE, 
        labels = c("1 - Lowest", "2", "3", "4", "5", "6 - Highest")))

## labels for some states need to be nudged so that they're legible and not 
## obscuring underlying geometries
states_shift_north = c("HI", "GU", "PR")
states_shift_south = c("AS", "VI", "MP")
states_shift_east = c("DC", "MD", "DE", "NJ", "CT", "MA", "VT", "NH", "RI")

state_choropleth_plot = state_poverty_sf %>%  
  ggplot() +
    ## this creates the basic choropleth
    geom_sf(aes(fill = poverty_rate_jenks)) +
    ## this labels most states (those where default labels are legible)
    geom_sf_text(
      data = state_poverty_sf %>% 
        filter(!state_abbreviation %in% c(states_shift_north, states_shift_east, states_shift_south)),
      aes(label = state_abbreviation), color = "black") +
    ## these geom_text_repel() calls label states where labels need to be offset
    ggrepel::geom_text_repel(
      ## these calls require centroids of the states to be calculated
      data = state_poverty_sf %>% filter(state_abbreviation %in% states_shift_north) %>% st_centroid(),
      aes(geometry = geometry, label = state_abbreviation),
      stat = "sf_coordinates",
      nudge_y = 100000, ## meters to nudge
      ## add a line back to the geometry if the label is more than .2 from the geometry
      min.segment.length = .2, 
      force = .5) + ## the force of repulsion between the labels
    ggrepel::geom_text_repel(
      data = state_poverty_sf %>% filter(state_abbreviation %in% states_shift_east) %>% st_centroid(),
      aes(geometry = geometry, label = state_abbreviation),
      stat = "sf_coordinates",
      nudge_x = 150000,
      min.segment.length = .2,
      force = .5) +
    ggrepel::geom_text_repel(
      data = state_poverty_sf %>% filter(state_abbreviation %in% states_shift_south) %>% st_centroid(),
      aes(geometry = geometry, label = state_abbreviation),
      stat = "sf_coordinates",
      nudge_x = -150000,
      min.segment.length = .2,
      force = .5) +
    coord_sf(expand = FALSE) + ## don't add buffer space around the plot
    labs(fill = "Poverty rate (jenks)")

subtitle = c("While this representation is more reflective of actual land area and spatial relationships than the tile map below, it creates an impression that country-wide poverty levels are lower than they are because some large-land-area, low-population states with low poverty rates occupy significant portions of the map.")

state_choropleth_final = urbn_plot(
  urbn_title("Poverty Rate by State / Territory"),
  urbn_subtitle(subtitle %>% str_wrap(100)),
  state_choropleth_plot, 
  urbn_logo_text(),
  urbn_source(
    text = "Authors' analysis of poverty rates; source unknown.",
    width = 100,
    plural = FALSE),
  ncol = 1,
  heights = c(
    2,
    5.5,
    30, # figure
    1.5, # logo
    #3, # note
    1.5 # source
  )) 
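The Figure 3 code uses BAMMtools::getJenksBreaks(), where, as its comment notes, 7 breaks yield six buckets. To show what "natural breaks" optimizes, here is a small dynamic-programming sketch of the Fisher-Jenks idea: choose class boundaries that minimize the total within-class sum of squared deviations from each class mean. It is an illustration only (the function name is ours): it runs in O(k·n²) time, returns k + 1 values (the minimum followed by each class's upper bound) for k classes, and may not match getJenksBreaks() exactly on every dataset.

```r
# Fisher-Jenks natural breaks via dynamic programming (illustrative sketch).
# Returns k + 1 break values: the minimum, then the upper bound of each class.
jenks_breaks <- function(x, k) {
  x <- sort(x)
  n <- length(x)
  cs <- cumsum(x)
  cs2 <- cumsum(x^2)
  # within-class sum of squared deviations for the sorted values x[i..j]
  ssd <- function(i, j) {
    s  <- cs[j]  - (if (i > 1) cs[i - 1]  else 0)
    s2 <- cs2[j] - (if (i > 1) cs2[i - 1] else 0)
    s2 - s^2 / (j - i + 1)
  }
  cost  <- matrix(Inf, nrow = k, ncol = n)  # cost[m, j]: best total for x[1..j] in m classes
  start <- matrix(1L, nrow = k, ncol = n)   # first index of the last class in that solution
  for (j in 1:n) cost[1, j] <- ssd(1, j)
  if (k > 1) {
    for (m in 2:k) {
      for (j in m:n) {
        for (i in m:j) {
          candidate <- cost[m - 1, i - 1] + ssd(i, j)
          if (candidate < cost[m, j]) {
            cost[m, j] <- candidate
            start[m, j] <- i
          }
        }
      }
    }
  }
  # walk back through the stored split points to recover class upper bounds
  upper <- numeric(k)
  j <- n
  for (m in k:1) {
    upper[m] <- x[j]
    j <- start[m, j] - 1
  }
  c(x[1], upper)
}

vals <- c(1, 2, 3, 10, 11, 12, 20, 21, 22)
jenks_breaks(vals, 3)   # 1 3 12 22
```

As in the Figure 3 code, the breaks can be passed to cut(vals, breaks = jenks_breaks(vals, 3), include.lowest = TRUE) to assign each value to a class.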

Figure 4 - State-level Tilemap, Including Territories

set_urbn_defaults(style = "map")
## create a data frame of the non-state entities (territories)
## that we add below to the state-level data frame urbnthemes::urbn_geofacet
non_state_entities = tibble::tribble(
  ~row, ~col, ~state_code, ~state_name,
  8, 11, "PR", "Puerto Rico",
  9, 11, "VI", "Virgin Islands",
  9, 0, "GU", "Guam",
  8, 0, "MP", "Northern Mariana Islands",
  9, 1, "AS", "American Samoa")

state_tile_plot = urbnthemes::urbn_geofacet %>%
  ## combine territories with urbn_geofacet states
  bind_rows(non_state_entities) %>%
  ## join our non-spatial data
  left_join(state_poverty_sf %>% st_drop_geometry, by = c("state_code" = "state_abbreviation")) %>%
  ## plot poverty rates by state, binned with Jenks breaks
  ggplot(aes(x = col, y = -row, fill = poverty_rate_jenks)) +
  ## create a small white space between each tile
  geom_tile(color = "white", linewidth = 1) +
  ## add black state abbreviation labels
  geom_text(aes(label = state_code), color = "black") +
  ## place the legend vertically, and to the right of the map
  theme(
    legend.position = "right",
    legend.direction = "vertical") +
  ## ensure the map plots such that tiles are square, not stretched in one direction
  coord_equal() +
  labs(fill = "Poverty rate (jenks)")

state_grid_final = urbn_plot(
  urbn_title("Poverty Rate by State / Territory"),
  urbn_subtitle("Tiles provide equal visual areas for each geographic unit, irrespective of actual spatial area. 
    This creates a more accurate visual representation of the distribution of state-level poverty rates." %>% str_wrap(110)),
  state_tile_plot, 
  urbn_logo_text(),
  urbn_source(
    text = "Authors' analysis of poverty rates; source unknown.",
    width = 100,
    plural = FALSE),
  ncol = 1,
  heights = c(
    2,
    3,
    30, # figure
    1.5, # logo
    #3, # note
    1.5 # source
  )) 
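Because Figure 4 adds territories to the urbn_geofacet layout by hand, it is easy to place two tiles in the same grid cell, where geom_tile() would draw one on top of the other. A quick check, sketched below with a hypothetical helper, flags duplicated row and column positions. In practice it would run on the combined data (urbn_geofacet bound with non_state_entities); the sketch uses only the territory positions from the Figure 4 code so that it runs without urbnthemes.

```r
# Flag tiles that share a grid position (same row and column).
find_tile_collisions <- function(tiles) {
  key <- paste(tiles$row, tiles$col)
  tiles[key %in% key[duplicated(key)], , drop = FALSE]
}

# The territory positions used for Figure 4
territories <- data.frame(
  row = c(8, 9, 9, 8, 9),
  col = c(11, 11, 0, 0, 1),
  state_code = c("PR", "VI", "GU", "MP", "AS"),
  stringsAsFactors = FALSE
)

find_tile_collisions(territories)   # no rows: every position is unique

# A hypothetical extra tile placed on Puerto Rico's cell is flagged
with_conflict <- rbind(
  territories,
  data.frame(row = 8, col = 11, state_code = "XX", stringsAsFactors = FALSE)
)
find_tile_collisions(with_conflict)   # the PR and XX rows
```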

Additional Resources, Books, and Blogs

Datawrapper Resources

Urban Institute R Resources

Books