st_join
to perform spatial joinsst_buffer()
to buffer points (or polygons!)group_by()
and summarize()
to aggregate spatial dataStep 1: Create a new directory called mapping
Step 2: Open RStudio. Click “Project: (None)” in the top right corner. Click “New Project” and create a project based on the existing mapping
directory.
Step 3: Submit install.packages("tidyverse")
to the Console.
Step 4: Submit install.packages("devtools")
to the Console.
Step 5: Submit install.packages("mapview")
to the Console.
Step 6: Submit remotes::install_github("UrbanInstitute/urbnmapr")
to the Console.
Step 7: Submit remotes::install_github("UrbanInstitute/urbnthemes")
to the Console.
Step 8: Write library(tidyverse)
, library(urbnmapr)
, and library(urbnthemes)
at the top of 01_intro-to-mapping.R
. With the cursor on the line of text, click Control-Enter.
Step 1: Open a .R
script with the button in the top left. Save the script as 04_more_spatial_operations.R
.
Step 2: Copy and paste the following to the top of 04_more_spatial_operations.R
. This loads in all the necessary libraries and data for today. With the cursor highlighting all the below text, click Control-Enter.
library(tidyverse)
library(sf)
library(urbnmapr)
library(urbnthemes)
library(tigris)
library(mapview)
set_urbn_defaults(style = "map")
dc_tracts <- tracts(
state = "DC",
year = 2017,
class = "sf",
cb = TRUE,
progress_bar = F
) %>%
st_transform(crs = 6487)
dc_parks <- st_read("https://maps2.dcgis.dc.gov/dcgis/rest/services/DCGIS_DATA/Recreation_WebMercator/MapServer/9/query?where=1%3D1&outFields=NAME,ADDRESS,DRINKFOUNT&outSR=4326&f=json", quiet = TRUE) %>%
st_transform(crs = 6487)
fire_stations <- st_read("https://maps2.dcgis.dc.gov/dcgis/rest/services/DCGIS_DATA/Public_Safety_WebMercator/MapServer/6/query?where=1%3D1&outFields=NAME,ADDRESS,TRUCK,AMBULANCE&outSR=4326&f=geojson", quiet = TRUE) %>%
st_transform(crs = 6487)
One great feature for R is support for quickly and easily creating interactive maps. While we do not recommend creating interactive maps for any briefs/products that will be published (mostly because getting the styles to align with the Urban Style guide is a little difficult), interactive maps can be a great tool for exploratory data analyses and helping you understand your data. Today we are going to cover a package called mapview
which will allow you to create interactive maps.
Once you have an sf
dataframe, you can create an interactive map with just one line: mapview(sf_dataframe)
. You can save this map by assigning it to an object. For example map <- mapview(sf_dataframe)
Below are some of the many ways you can change the appearance of your interactive map. For more details, please see: https://r-spatial.github.io/mapview/articles/articles/mapview_01-basics.html
If you want to color in the tracts by a particular variable (ie make a chloropleth map), use the zcol
argument and feed in a column name IN QUOTES. Note this is different from ggplot
where you could use a column name without quotes
If you want to change the variable text displayed on hover, use the label
argument and feed in a column name in QUOTES.
If you want to change the type of basemap being used in the background, use the map.types
argument and feed in some of the following: "CartoDB.Positron", "CartoDB.DarkMatter", "OpenStreetMap", "Esri.WorldImagery", "OpenTopoMap"
. For a full list of basemaps, see here.
If you want to change the title of the legend, you can use the layer.name
argument and feed in the "title"
of your choice.
If you want to change the opacity of the colored in regions of the map, use the alpha.regions
argument and choose a number from 0
to 1
(like 0.3
).
Step 1: Make an interactive map of dc_tracts
colored in by the "ALAND"
variable (the area of the land covered by that tract). Call this map dc_land
.
Step 2: Make an interactive map of dc_tracts
colored in by the "AWATER"
variable (the area of the land covered by that tract). Call this map dc_water
.
Step 3: Now make the Census tract names display on hover. They are contained in the "NAME"
column.
With mapview, it’s really easy to plot two sf
dataframes together on the same map. Just like ggplot, you use the +
operator to add two different maps on top of each other:
mapview(sf_dataframe) +
mapview(another_sf_dataframe)
You can also use the |
operator to create side by side slider maps!
mapview(sf_dataframe) |
mapview(another_sf_dataframe)
Step 1: Now plot the DC parks on top of the DC census tracts using the +
operator. What do you see?
Step 2: For the dc_tracts
mapview statement, set col.regions = "steelblue"
and for the dc_parks
mapview statement set col.regions = "green"
Step 3: Plot the dc_land
and dc_water
maps side by side using the |
operator.
Often, you will want to calculate the distance between two things - like the distance between two points, or a point and a polygon. There’s a handy function called st_distance()
that does this for you! But it spits out a matrix instead of the dataframes we’re used to seeing. This can feel scary but don’t worry, we can easily coerce them into the dataframes we’re use to working with. Say we wanted to get the distance between the first 5 fire stations and 1 park
ggplot() +
geom_sf(data = (dc_parks %>% slice(1)), fill = "green") +
geom_sf(data = (fire_stations %>% slice(1:5)), color = "red")
dists <- st_distance(dc_parks %>% slice(1), fire_stations %>% slice(1:5))
dists
## Units: [m]
## [,1] [,2] [,3] [,4] [,5]
## [1,] 11123.02 7310.232 9973.471 6946.382 8092.819
As you can see there are 5 rows and 1 column, and each column contains the distance between the park and the respective fire station. Keep in mind that the first argument in st_distance
determines the number of rows and the second argument determines the number of columns. Let’s say you wanted the shortest distance between this park and these 5 fire stations. That’s just the minimum of the first row! But because it’s a matrix, we need to do some wrangling to get to the answer.
min_dists <- dists %>%
# Conert matrix to tibble (ie a dataframe)
as_tibble() %>%
mutate(min_dist = pmap_dbl(., min)) %>% # This line looks complicated, but it just calculates the minimum of the row
select(min_dist)
min_dists
## # A tibble: 1 × 1
## min_dist
## <dbl>
## 1 6946.
Now say you wanted to do this for every park in the city. All we have to do is remove the slice
operation from the dc_parks
dataframe:
dists <- st_distance(dc_parks, fire_stations %>% slice(1:5))
min_dists <- dists %>%
as_tibble() %>%
mutate(min_dist = pmap_dbl(., min)) %>%
select(min_dist)
min_dists
## # A tibble: 256 × 1
## min_dist
## <dbl>
## 1 6946.
## 2 9399.
## 3 8297.
## 4 7283.
## 5 3329.
## 6 2918.
## 7 233.
## 8 4498.
## 9 4450.
## 10 1008.
## # … with 246 more rows
Now say you wanted to find the minimum distance between every park and its closest fire station: WARNING, the below code chunk may take a long time to run!
dists <- st_distance(dc_parks, fire_stations)
min_dists <- dists %>%
as_tibble() %>%
mutate(min_dist = pmap_dbl(., min)) %>%
select(min_dist)
min_dists
## # A tibble: 256 × 1
## min_dist
## <dbl>
## 1 1163.
## 2 503.
## 3 637.
## 4 723.
## 5 119.
## 6 248.
## 7 233.
## 8 653.
## 9 660.
## 10 186.
## # … with 246 more rows
st_distance
can also be used to measure the distance between points and polygons, or polygons and polygons. Bear in mind however that it will always calculate the distance to the closest edge of the polygon!
This returns the centroid of a polygon geometry. This is useful if you want to calculate the distance between say fire stations and the center of a census tract.
ggplot() +
geom_sf(data = dc_tracts) +
geom_sf(data = dc_tracts %>% st_centroid())
Step 1: Calculate the distance between the first fire station (use slice()
) and the first 5 DC tracts (use slice()
again).
Step 2: Calculate the minimum distance between each fire station and each DC tract. Be careful with the order of the sf
dataframe you input into st_distance
! What do these values look like and why?
Step 3: Calculate the minimum distance between each fire station and the centroid of each DC tract. Append this information to the fire_stations
dataframe as a column named closest_fire_station_to_centroid
Step 4:Make a chloropleth map of distance to fire stations for every census tract centroid in DC. Use ggplot() and map the colors (ie the fill=
variable) to closest_fire_station_to_centroid
.
Sometimes, you will need to combine multiple geometries together into one geometry. Let’s say for example you wanted the total boundary of DC as a single polygon, but all you had were the individual Census Tracts for all of DC. You can use st_union()
! But be careful because by default it returns a sfg
object not an sf dataframe, so you need to coerce it back to an sf dataframe
dc_boundary <- st_union(dc_tracts) %>% st_sf()
ggplot() +
geom_sf(data = dc_tracts) +
geom_sf(data = dc_boundary, color = palette_urbn_cyan[4], size = 2)
If you need to crop your data, you can use st_crop
You can crop an object into a rectangle by manually setting the coordinates of your rectangle. But remember the coordinates need to match the CRS of the sf dataframe! So if you want to crop using latitudes/longitudes, you need to set the CRS to a system that uses latitudes/longitudes
set_urbn_defaults(style = "print")
dc_tracts_cropped <- st_crop(dc_tracts %>% st_transform(4326),
xmin = -77.05, xmax = -76, ymin = 38.78, ymax = 38.85
)
ggplot() +
geom_sf(data = dc_tracts_cropped)
set_urbn_defaults(style = "map")
You can also crop an object using the bounds of another smaller object. This is very useful for example if you are working with county data, and have state boundaries, but want to limit the state boundaries to the area covered by your counties. The syntax is st_crop(x,y)
, where x is the larger geometry and y is the smaller geometry you are cropping x to.
one_tract <- dc_tracts %>% slice(1)
cropped_dc_tracts <- st_crop(dc_tracts, one_tract)
ggplot() +
geom_sf(data = cropped_dc_tracts)
A really useful list of all the spatial operations available in SF can be found here:
Last week, we covered spatial joins between points and polygons. But what about a polygon to polygon spatial join? Say we wanted to find out the Census tract(s) that DC parks are located in. We will still use st_join
but we need to think critically about what kind of join to use (ie the join=
argument inside st_join
. This can affect your results without you realizing it.
Step 1: How many rows are in the dc_parks
dataframe? Use the nrow()
function.
Step 2: Spatially join dc_tracts
to dc_parks
and set join=st_intersects
. Call the joined data parks_joined
. How many rows does parks_joined
have? How many rows have a NA value for NAME.y? What do these rows represent?
Step 3: Spatially join dc_tracts
to dc_parks
and set join=st_covered_by
. Call the joined data parks_covered
. How many rows does parks_joined
have? How many rows have a NA value for NAME.y? What do these rows represent?
Step 4: Using the results from above, use st_join
to find all tracts that are completely covered by a park. How many are there and what does this mean?
Spatial joins sometimes return more rows than were in the initial data. By default will never return fewer rows than were in the initial data because spatial joins are by default left joins. The unmatched columns just show up as NA’s after the join. You can turn this behavior off by setting left = FALSE
in the st_join()
function.
In Step 2, we got more rows that were in the initial data because some parks were located in many census tracts, and each park-tract combination is returned as a row. Let’s take the example of Marvin Gaye Park to see whats happening.
marvin_gaye_park_joined <- st_join(dc_parks %>% filter(NAME == "Marvin Gaye Park"),
dc_tracts,
join = st_intersects
)
tracts <- marvin_gaye_park_joined %>% pull(GEOID)
tracts_marvin_gaye <- dc_tracts %>% filter(GEOID %in% tracts)
ggplot() +
geom_sf(data = tracts_marvin_gaye) +
geom_sf(data = marvin_gaye_park_joined, fill = "green", col = "white")
With st_join
you retain the full spatial information of the left hand side variable. So if you were to look at the shape of each of the 4 rows in marvin_gaye_park_joined
, they would be the exact same. Let’s see that with the st_equals()
function which checks for equality of geometries.
st_equals(marvin_gaye_park_joined %>% slice(1),
marvin_gaye_park_joined %>% slice(2),
sparse = F
)
## [,1]
## [1,] TRUE
You might be interested in the proportion of the park that falls within each tract for weighting purposes. For that you need to break up the park into four different polygons, each of which falls into a different tract. You can calculate that with st_intersection()
!.
park_ints <- st_intersection(
dc_parks %>% filter(NAME == "Marvin Gaye Park"),
dc_tracts
)
ggplot(park_ints, mapping = aes(fill = GEOID)) +
geom_sf()
So now we have four different Polygons representing the parts of the park in each of the 4 tracts it overlaps. Now let’s say we wanted to figure out the proportion of the area falling into each of the tracts. We would first need to figure out the area of each park in total (using st_area()
), then intersect the parks with the tracts, calculate the area of the intersected parts, and create a proportion. Here’s what that would look like:
marvin_gaye_park_ints <- dc_parks %>%
filter(NAME == "Marvin Gaye Park") %>%
mutate(area_park = st_area(.)) %>%
st_intersection(dc_tracts) %>%
mutate(
area_park_tract = st_area(.),
prop_area_tract = area_park_tract / area_park
)
ggplot(marvin_gaye_park_ints) +
geom_sf(aes(fill = as.numeric(prop_area_tract))) +
geom_sf(data = tracts_marvin_gaye, fill = "transparent", col = "black")
Step 1: Using the same steps as above, calculate the proportion of all parks in all census tracts. Store this as a dataframe called dc_park_ints
.
Step 2: Filter dc_park_ints
to the park named “Pope Branch Park” and call it town_center_park_ints
. How many tracts does it fall into? What tract has the highest proportion of the park?
Step 3: Make a plot of the Pope Branch Park colored in by the GEOID
of the Census Tract each part of the park is in.