Review

  • Use st_join to perform spatial joins
  • Use st_buffer() to buffer points (or polygons!)
  • Use group_by() and summarize() to aggregate spatial data

Exercise 0: Create a Project (if you didn’t do this last week)

Step 1: Create a new directory called mapping

Step 2: Open RStudio. Click “Project: (None)” in the top right corner. Click “New Project” and create a project based on the existing mapping directory.

Step 3: Submit install.packages("tidyverse") to the Console.

Step 4: Submit install.packages("devtools") to the Console.

Step 5: Submit install.packages("mapview") to the Console.

Step 6: Submit remotes::install_github("UrbanInstitute/urbnmapr") to the Console.

Step 7: Submit remotes::install_github("UrbanInstitute/urbnthemes") to the Console.

Step 8: Write library(tidyverse), library(urbnmapr), and library(urbnthemes) at the top of 01_intro-to-mapping.R. With the cursor on the line of text, click Control-Enter.

Exercise 0.5: Setup

Step 1: Open a .R script with the button in the top left. Save the script as 04_more_spatial_operations.R.

Step 2: Copy and paste the following to the top of 04_more_spatial_operations.R. This loads in all the necessary libraries and data for today. With the cursor highlighting all the below text, click Control-Enter.

library(tidyverse)
library(sf)
library(urbnmapr)
library(urbnthemes)
library(tigris)
library(mapview)

set_urbn_defaults(style = "map")

dc_tracts <- tracts(
  state = "DC",
  year = 2017,
  class = "sf",
  cb = TRUE,
  progress_bar = F
) %>%
  st_transform(crs = 6487)

dc_parks <- st_read("https://maps2.dcgis.dc.gov/dcgis/rest/services/DCGIS_DATA/Recreation_WebMercator/MapServer/9/query?where=1%3D1&outFields=NAME,ADDRESS,DRINKFOUNT&outSR=4326&f=json", quiet = TRUE) %>%
  st_transform(crs = 6487)

fire_stations <- st_read("https://maps2.dcgis.dc.gov/dcgis/rest/services/DCGIS_DATA/Public_Safety_WebMercator/MapServer/6/query?where=1%3D1&outFields=NAME,ADDRESS,TRUCK,AMBULANCE&outSR=4326&f=geojson", quiet = TRUE) %>%
  st_transform(crs = 6487)

Interactive Maps

One great feature for R is support for quickly and easily creating interactive maps. While we do not recommend creating interactive maps for any briefs/products that will be published (mostly because getting the styles to align with the Urban Style guide is a little difficult), interactive maps can be a great tool for exploratory data analyses and helping you understand your data. Today we are going to cover a package called mapview which will allow you to create interactive maps.

Once you have an sf dataframe, you can create an interactive map with just one line: mapview(sf_dataframe). You can save this map by assigning it to an object. For example map <- mapview(sf_dataframe)

Below are some of the many ways you can change the appearance of your interactive map. For more details, please see: https://r-spatial.github.io/mapview/articles/articles/mapview_01-basics.html

  • If you want to color in the tracts by a particular variable (ie make a chloropleth map), use the zcol argument and feed in a column name IN QUOTES. Note this is different from ggplot where you could use a column name without quotes

  • If you want to change the variable text displayed on hover, use the label argument and feed in a column name in QUOTES.

  • If you want to change the type of basemap being used in the background, use the map.types argument and feed in some of the following: "CartoDB.Positron", "CartoDB.DarkMatter", "OpenStreetMap", "Esri.WorldImagery", "OpenTopoMap". For a full list of basemaps, see here.

  • If you want to change the title of the legend, you can use the layer.name argument and feed in the "title" of your choice.

  • If you want to change the opacity of the colored in regions of the map, use the alpha.regions argument and choose a number from 0 to 1 (like 0.3).

Exercise 1

Step 1: Make an interactive map of dc_tracts colored in by the "ALAND" variable (the area of the land covered by that tract). Call this map dc_land.

Step 2: Make an interactive map of dc_tracts colored in by the "AWATER" variable (the area of the land covered by that tract). Call this map dc_water.

Step 3: Now make the Census tract names display on hover. They are contained in the "NAME" column.

Adding two layers together

With mapview, it’s really easy to plot two sf dataframes together on the same map. Just like ggplot, you use the + operator to add two different maps on top of each other:

mapview(sf_dataframe) +
  mapview(another_sf_dataframe)

You can also use the | operator to create side by side slider maps!

mapview(sf_dataframe) |
  mapview(another_sf_dataframe)

Exercise 2

Step 1: Now plot the DC parks on top of the DC census tracts using the + operator. What do you see?

Step 2: For the dc_tracts mapview statement, set col.regions = "steelblue" and for the dc_parks mapview statement set col.regions = "green"

Step 3: Plot the dc_land and dc_water maps side by side using the | operator.

Calculating Distances

Often, you will want to calculate the distance between two things - like the distance between two points, or a point and a polygon. There’s a handy function called st_distance() that does this for you! But it spits out a matrix instead of the dataframes we’re used to seeing. This can feel scary but don’t worry, we can easily coerce them into the dataframes we’re use to working with. Say we wanted to get the distance between the first 5 fire stations and 1 park

ggplot() +
  geom_sf(data = (dc_parks %>% slice(1)), fill = "green") +
  geom_sf(data = (fire_stations %>% slice(1:5)), color = "red")

dists <- st_distance(dc_parks %>% slice(1), fire_stations %>% slice(1:5))
dists
## Units: [m]
##          [,1]     [,2]     [,3]     [,4]     [,5]
## [1,] 11123.02 7310.232 9973.471 6946.382 8092.819

As you can see there are 5 rows and 1 column, and each column contains the distance between the park and the respective fire station. Keep in mind that the first argument in st_distance determines the number of rows and the second argument determines the number of columns. Let’s say you wanted the shortest distance between this park and these 5 fire stations. That’s just the minimum of the first row! But because it’s a matrix, we need to do some wrangling to get to the answer.

min_dists <- dists %>%
  # Conert matrix to tibble (ie a dataframe)
  as_tibble() %>%
  mutate(min_dist = pmap_dbl(., min)) %>% # This line looks complicated, but it just calculates the minimum of the row
  select(min_dist)
min_dists
## # A tibble: 1 × 1
##   min_dist
##      <dbl>
## 1    6946.

Now say you wanted to do this for every park in the city. All we have to do is remove the slice operation from the dc_parks dataframe:

dists <- st_distance(dc_parks, fire_stations %>% slice(1:5))

min_dists <- dists %>%
  as_tibble() %>%
  mutate(min_dist = pmap_dbl(., min)) %>%
  select(min_dist)
min_dists
## # A tibble: 256 × 1
##    min_dist
##       <dbl>
##  1    6946.
##  2    9399.
##  3    8297.
##  4    7283.
##  5    3329.
##  6    2918.
##  7     233.
##  8    4498.
##  9    4450.
## 10    1008.
## # … with 246 more rows

Now say you wanted to find the minimum distance between every park and its closest fire station: WARNING, the below code chunk may take a long time to run!

dists <- st_distance(dc_parks, fire_stations)

min_dists <- dists %>%
  as_tibble() %>%
  mutate(min_dist = pmap_dbl(., min)) %>%
  select(min_dist)
min_dists
## # A tibble: 256 × 1
##    min_dist
##       <dbl>
##  1    1163.
##  2     503.
##  3     637.
##  4     723.
##  5     119.
##  6     248.
##  7     233.
##  8     653.
##  9     660.
## 10     186.
## # … with 246 more rows

st_distance can also be used to measure the distance between points and polygons, or polygons and polygons. Bear in mind however that it will always calculate the distance to the closest edge of the polygon!

st_centroid()

This returns the centroid of a polygon geometry. This is useful if you want to calculate the distance between say fire stations and the center of a census tract.

ggplot() +
  geom_sf(data = dc_tracts) +
  geom_sf(data = dc_tracts %>% st_centroid())

Exercise 3: Calculating Distances

Step 1: Calculate the distance between the first fire station (use slice()) and the first 5 DC tracts (use slice() again).

Step 2: Calculate the minimum distance between each fire station and each DC tract. Be careful with the order of the sf dataframe you input into st_distance! What do these values look like and why?

Step 3: Calculate the minimum distance between each fire station and the centroid of each DC tract. Append this information to the fire_stations dataframe as a column named closest_fire_station_to_centroid

Step 4:Make a chloropleth map of distance to fire stations for every census tract centroid in DC. Use ggplot() and map the colors (ie the fill= variable) to closest_fire_station_to_centroid.

Miscellanious Spatial Operations

st_union()

Sometimes, you will need to combine multiple geometries together into one geometry. Let’s say for example you wanted the total boundary of DC as a single polygon, but all you had were the individual Census Tracts for all of DC. You can use st_union()! But be careful because by default it returns a sfg object not an sf dataframe, so you need to coerce it back to an sf dataframe

dc_boundary <- st_union(dc_tracts) %>% st_sf()

ggplot() +
  geom_sf(data = dc_tracts) +
  geom_sf(data = dc_boundary, color = palette_urbn_cyan[4], size = 2)

st_crop()

If you need to crop your data, you can use st_crop

You can crop an object into a rectangle by manually setting the coordinates of your rectangle. But remember the coordinates need to match the CRS of the sf dataframe! So if you want to crop using latitudes/longitudes, you need to set the CRS to a system that uses latitudes/longitudes

set_urbn_defaults(style = "print")
dc_tracts_cropped <- st_crop(dc_tracts %>% st_transform(4326),
  xmin = -77.05, xmax = -76, ymin = 38.78, ymax = 38.85
)

ggplot() +
  geom_sf(data = dc_tracts_cropped)

set_urbn_defaults(style = "map")

You can also crop an object using the bounds of another smaller object. This is very useful for example if you are working with county data, and have state boundaries, but want to limit the state boundaries to the area covered by your counties. The syntax is st_crop(x,y), where x is the larger geometry and y is the smaller geometry you are cropping x to.

one_tract <- dc_tracts %>% slice(1)

cropped_dc_tracts <- st_crop(dc_tracts, one_tract)

ggplot() +
  geom_sf(data = cropped_dc_tracts)

A really useful list of all the spatial operations available in SF can be found here:

https://github.com/rstudio/cheatsheets/blob/master/sf.pdf

Spatial Joins In Depth

Last week, we covered spatial joins between points and polygons. But what about a polygon to polygon spatial join? Say we wanted to find out the Census tract(s) that DC parks are located in. We will still use st_join but we need to think critically about what kind of join to use (ie the join= argument inside st_join. This can affect your results without you realizing it.

Exercise 1: Practicing Spatial Joins

Step 1: How many rows are in the dc_parks dataframe? Use the nrow() function.

Step 2: Spatially join dc_tracts to dc_parks and set join=st_intersects. Call the joined data parks_joined. How many rows does parks_joined have? How many rows have a NA value for NAME.y? What do these rows represent?

Step 3: Spatially join dc_tracts to dc_parks and set join=st_covered_by. Call the joined data parks_covered. How many rows does parks_joined have? How many rows have a NA value for NAME.y? What do these rows represent?

Step 4: Using the results from above, use st_join to find all tracts that are completely covered by a park. How many are there and what does this mean?

Spatial joins sometimes return more rows than were in the initial data. By default will never return fewer rows than were in the initial data because spatial joins are by default left joins. The unmatched columns just show up as NA’s after the join. You can turn this behavior off by setting left = FALSE in the st_join() function.

In Step 2, we got more rows that were in the initial data because some parks were located in many census tracts, and each park-tract combination is returned as a row. Let’s take the example of Marvin Gaye Park to see whats happening.

marvin_gaye_park_joined <- st_join(dc_parks %>% filter(NAME == "Marvin Gaye Park"),
  dc_tracts,
  join = st_intersects
)
tracts <- marvin_gaye_park_joined %>% pull(GEOID)
tracts_marvin_gaye <- dc_tracts %>% filter(GEOID %in% tracts)

ggplot() +
  geom_sf(data = tracts_marvin_gaye) +
  geom_sf(data = marvin_gaye_park_joined, fill = "green", col = "white")

Calculating Proportions

With st_join you retain the full spatial information of the left hand side variable. So if you were to look at the shape of each of the 4 rows in marvin_gaye_park_joined, they would be the exact same. Let’s see that with the st_equals() function which checks for equality of geometries.

st_equals(marvin_gaye_park_joined %>% slice(1),
  marvin_gaye_park_joined %>% slice(2),
  sparse = F
)
##      [,1]
## [1,] TRUE

You might be interested in the proportion of the park that falls within each tract for weighting purposes. For that you need to break up the park into four different polygons, each of which falls into a different tract. You can calculate that with st_intersection()!.

park_ints <- st_intersection(
  dc_parks %>% filter(NAME == "Marvin Gaye Park"),
  dc_tracts
)

ggplot(park_ints, mapping = aes(fill = GEOID)) +
  geom_sf()

So now we have four different Polygons representing the parts of the park in each of the 4 tracts it overlaps. Now let’s say we wanted to figure out the proportion of the area falling into each of the tracts. We would first need to figure out the area of each park in total (using st_area()), then intersect the parks with the tracts, calculate the area of the intersected parts, and create a proportion. Here’s what that would look like:

marvin_gaye_park_ints <- dc_parks %>%
  filter(NAME == "Marvin Gaye Park") %>%
  mutate(area_park = st_area(.)) %>%
  st_intersection(dc_tracts) %>%
  mutate(
    area_park_tract = st_area(.),
    prop_area_tract = area_park_tract / area_park
  )

ggplot(marvin_gaye_park_ints) +
  geom_sf(aes(fill = as.numeric(prop_area_tract))) +
  geom_sf(data = tracts_marvin_gaye, fill = "transparent", col = "black")

Exercise 2: Calculating proportion of all parks in all tracts

Step 1: Using the same steps as above, calculate the proportion of all parks in all census tracts. Store this as a dataframe called dc_park_ints.

Step 2: Filter dc_park_ints to the park named “Pope Branch Park” and call it town_center_park_ints. How many tracts does it fall into? What tract has the highest proportion of the park?

Step 3: Make a plot of the Pope Branch Park colored in by the GEOID of the Census Tract each part of the park is in.