sf
package handles point and polygon spatial datalibrary(tidyverse)
library(sf)
library(urbnmapr)
library(urbnthemes)
library(tigris)
set_urbn_defaults(style = "map")
Step 1: Create a new directory called mapping
Step 2: Open RStudio. Click “Project: (None)” in the top right corner. Click “New Project” and create a project based on the existing mapping
directory.
Step 3: Submit install.packages("tidyverse")
to the Console.
Step 4: Submit install.packages("devtools")
to the Console.
Step 5: Submit remotes::install_github("UrbanInstitute/urbnmapr")
to the Console.
Step 6: Submit remotes::install_github("UrbanInstitute/urbnthemes")
to the Console.
Step 7: Write library(tidyverse)
, library(urbnmapr)
, and library(urbnthemes)
at the top of 01_intro-to-mapping.R
. With the cursor on the line of text, click Control-Enter.
Step 1: Open a .R
script with the button in the top left. Save the script as 02_import_export.R
.
Step 2: Submit install.packages("tigris")
, into the console and wait for the package to install.
Step 2: Copy and paste the following to the top of 02_import_export.R
. This loads in all the necessary libraries for today. With the cursor highlighting all the below text, click Control-Enter.
library(urbnmapr)
library(urbnthemes)
library(tidyverse)
library(sf)
library(tigris)
set_urbn_defaults(style = "map")
Out in the wild, spatial data comes in many formats but some of the common ones you will encounter are:
the sf
package has a function called st_read()
that makes reading in any of these types of files easy. The function takes in an argument called dsn
or data source name
For GeoJSONs and Shapefiles:
data <- st_read(dsn = "path/to/geojson/file.geojson")
data <- st_read(dsn = "path/to/shp/file.shp")
For CSV’s: You have two options!
1- Read in the file using st_read, but manipulate some GDAL options
data <- st_read(dsn = "path/to/csv/file.csv",
options = c("X_POSSIBLE_NAMES=lon", "Y_POSSIBLE_NAMES=lat")
)
2- Read in the file as a data frame and convert it to an sf object
data <- read_csv("path/to/csv/file.csv")
data <- st_as_sf(data, coords = c("lon", "lat"))
NOTE: X = Longitude & Y = Latitude. You will at some point mix these two up, but try to keep them straight.
st_read
can also accept URL’s!
data <- st_read("https://opendata.arcgis.com/datasets/287eaa2ecbff4d699762bbc6795ffdca_9.geojson")
Notice that it figured out what filetype it was and what kind of spatial data it was.
Step 1: Read in data on the locations of all fire stations in DC. Create a variable named fire_stations
and using st_read
, read in the following GeoJSON from this URL: https://maps2.dcgis.dc.gov/dcgis/rest/services/DCGIS_DATA/Public_Safety_WebMercator/MapServer/6/query?where=1%3D1&outFields=NAME,ADDRESS,TRUCK,AMBULANCE&outSR=4326&f=geojson
Step 2: Type in fire_stations
into your console and press enter. What does this SF dataframe look like?
Step 1: Copy and paste the following link (https://opendata.arcgis.com/datasets/0a73011064ae4580a4a8539de03060d1_14.csv) into your browser. This will download a CSV dataset of all the public wifi hostpots in DC.
Step 2: Create a folder called data
in your mapping101 folder, and move the downloaded CSV into that folder
Step 3: Find the filepath to the CSV (it should be something like ‘data/example.csv’) and read it into R as an sf dataframe named hotspots
. Note: You will need to open the CSV to see what the X and Y columns are labelled as. You can use any of the two methods described above
sf has a function called st_write
that can write out sf objects as GeoJSON’s, shapefiles, CSVS, Geopackages, or just about any other spatial format supported by GDAL. Shapefiles have been the defacto standard used by spatial developers and GIS professionals for the past 20+ years, but they have some serious problems:
We highly recommend saving your spatial files as GeoJSON’s for for a few reasons:
Here is how you would write out different file formats using sf
Writing out GeoJSONs and Shapefiles:
st_write(sf_data, "output/path/filename.geojson")
st_write(sf_data, "output/path/filename.shp")
Writing out CSV’s
There may be some cases when you need to write out your spatial data as a CSV to share with other folks. sf does let you do that, but the lat/lon columns will either be labelled X/Y or be a single WKT column.
X | Y |
---|---|
-76.9651 | 38.89204 |
WKT |
---|
POINT (-76.9651, 38.89204) |
Here’s how you write out CSV’s both ways using st_write
:
# Writing out X and Y columns to CSV
st_write(sf_data, "data/exparkslocations.csv",
layer_options = "GEOMETRY=AS_XY"
)
# Writing out WKT columns to CSV
st_write(sf_data, "data/exparkslocations.csv",
layer_options = "GEOMETRY=AS_WKT"
)
But sometimes, we want to write out CSVs with human readable Latitude and Longitude columns instead of X and Y columns as they are easy to mix up (quick - is X latitude or longitude?) . sf
doesn’t give you the ability to do this easily. So we need to extract the coordinates from the SF dataframe, append them as columns to our dataframe, then rename them latitude/longitude.
# getting lon/lat columns from dataframe
coords <- st_coordinates(data) %>%
as_tibble() %>%
dplyr::rename(lon = X, lat = Y)
# appending lon/lat columns to dataframe
data_to_write <- data %>%
st_set_geometry(NULL) %>%
bind_cols(coords)
# writing out dataframe as csv
write_csv(data_to_write, "path/to/file.csv")
Step 1: Let’s say we were only interested in relatively large fire stations. Use the filter()
function to limit the fire_stations
dataframe to stations that have more than 5 trucks (ie TRUCK > 5
). Call this filtered dataframe big_stations
Step 2: Write out the big_stations
dataframe as a GeoJSON into the data/
folder we created earlier using st_write()
. Call the file big_stations.geojson
.
Step 3: Write out the big_stations
dataframe as a CSV with lat/lon columns into the data/
folder we created earlier using st_write()
. Call the file big_stations.csv
.
The tigris
package in R is a great place to start looking. tigris
provides spatial data for:
It provides powerful functionality by allowing you to filter to specific states, counties, or tracts. Say for example I wanted data on all block groups in DC
dc_tracts <- tigris::block_groups(
state = "DC",
year = 2017,
class = "sf",
progress_bar = FALSE
)
ggplot() +
geom_sf(data = dc_tracts, mapping = aes()) +
theme_urbn_map()
There’s also a sister package called tidycensus
that provides easy access to census data in addition to spatial data. Say for example you wanted data on the population counts in Montgomery County census tracts from from the 1 year ACS:
# Note you need to sign up for a free Census API key here:
# http://api.census.gov/data/key_signup.html
api_key <- "12345678"
md_tracts <- get_acs(
geography = "tract", year = 2016,
endyear = 2016, state = "MD",
county = "Montgomery", variables = "B19013_001",
geometry = TRUE, key = api_key
)
ggplot() +
geom_sf(data = dc_tracts, mapping = aes()) +
theme_urbn_map()
Below are some other common places to find spatial data
Step 1: Use the school_districts()
function in tigris to download all the school districts in Oregon from the year 2015. Call the variable or_school_districts.
Make sure to set class = "sf"
!
Step 2: Use ggplot()
and geom_sf()
to make a plot of all the school districts in Oregon
Whenever you create a map you have to make assumptions about 1) the exact 3d shape of the earth and 2) how to project that 3d shape onto a 2d surface. That’s where Coordinate Reference Systems (CRS) come in! There are two kinds of Coordinate Reference Systems:
Below is a gif showing how the popular but Eurocentric Mercator projection works