Review

  • Making maps in R is really powerful and easy
  • The sf package handles point and polygon spatial data
  • sf objects are dataframes with an additional geometry column
library(tidyverse)
library(sf)
library(urbnmapr)
library(urbnthemes)
library(tigris)
set_urbn_defaults(style = "map")

Exercise 0: Create a Project (if you didn’t do this last week)

Step 1: Create a new directory called mapping

Step 2: Open RStudio. Click “Project: (None)” in the top right corner. Click “New Project” and create a project based on the existing mapping directory.

Step 3: Submit install.packages("tidyverse") to the Console.

Step 4: Submit install.packages("remotes") to the Console.

Step 5: Submit remotes::install_github("UrbanInstitute/urbnmapr") to the Console.

Step 6: Submit remotes::install_github("UrbanInstitute/urbnthemes") to the Console.

Step 7: Write library(tidyverse), library(urbnmapr), and library(urbnthemes) at the top of 01_intro-to-mapping.R. With the cursor on the line of text, press Control-Enter.

Exercise 0: Setup

Step 1: Open a new .R script with the button in the top left. Save the script as 02_import_export.R.

Step 2: Submit install.packages("tigris") into the Console and wait for the package to install.

Step 3: Copy and paste the following to the top of 02_import_export.R. This loads all the necessary libraries for today. With all of the text below highlighted, press Control-Enter.

library(urbnmapr)
library(urbnthemes)
library(tidyverse)
library(sf)
library(tigris)
set_urbn_defaults(style = "map")

Reading in spatial data

Out in the wild, spatial data comes in many formats, but some of the common ones you will encounter are:

  • CSVs with lon/lat columns
  • GeoJSONs (.geojson)
  • Shapefiles (.shp)

The sf package has a function called st_read() that makes reading in any of these file types easy. The function takes an argument called dsn, or data source name.

For GeoJSONs and Shapefiles:

data <- st_read(dsn = "path/to/geojson/file.geojson")
data <- st_read(dsn = "path/to/shp/file.shp")

For CSVs, you have two options!

1. Read in the file using st_read(), but set some GDAL options:

data <- st_read(dsn = "path/to/csv/file.csv",
  options = c("X_POSSIBLE_NAMES=lon", "Y_POSSIBLE_NAMES=lat")
)

2. Read in the file as a data frame and convert it to an sf object:


data <- read_csv("path/to/csv/file.csv")
data <- st_as_sf(data, coords = c("lon", "lat"))

NOTE: X = Longitude & Y = Latitude. You will at some point mix these two up, but try to keep them straight.
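If you use the second option, note that st_as_sf() leaves the CRS empty unless you pass one (more on CRSs below). A minimal sketch, assuming the coordinates are standard lon/lat (EPSG:4326) and the columns are named lon and lat:

# crs = 4326 declares that the coordinates are standard lon/lat;
# without it, the resulting sf object has no CRS at all
data <- read_csv("path/to/csv/file.csv")
data <- st_as_sf(data, coords = c("lon", "lat"), crs = 4326)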

st_read() can also accept URLs!

data <- st_read("https://opendata.arcgis.com/datasets/287eaa2ecbff4d699762bbc6795ffdca_9.geojson")

Notice that st_read() figured out both the file type and the kind of spatial data it contained.

Exercise 1: Read in a GeoJSON

Step 1: Read in data on the locations of all fire stations in DC. Create a variable named fire_stations and using st_read, read in the following GeoJSON from this URL: https://maps2.dcgis.dc.gov/dcgis/rest/services/DCGIS_DATA/Public_Safety_WebMercator/MapServer/6/query?where=1%3D1&outFields=NAME,ADDRESS,TRUCK,AMBULANCE&outSR=4326&f=geojson

Step 2: Type fire_stations into your console and press Enter. What does this sf dataframe look like?

Exercise 2: Read in a CSV

Step 1: Copy and paste the following link (https://opendata.arcgis.com/datasets/0a73011064ae4580a4a8539de03060d1_14.csv) into your browser. This will download a CSV dataset of all the public wifi hotspots in DC.

Step 2: Create a folder called data in your mapping folder, and move the downloaded CSV into that folder.

Step 3: Find the filepath to the CSV (it should be something like data/example.csv) and read it into R as an sf dataframe named hotspots. Note: You will need to open the CSV to see how the X and Y columns are labelled. You can use either of the two methods described above.

Writing out spatial data

sf has a function called st_write() that can write out sf objects as GeoJSONs, shapefiles, CSVs, GeoPackages, or just about any other spatial format supported by GDAL. Shapefiles have been the de facto standard used by spatial developers and GIS professionals for the past 20+ years, but they have some serious problems:

  1. A shapefile is composed of at least 3 files, making data transfer tricky
  2. Their file size is limited to 2 GB
  3. It’s a closed source, proprietary ESRI file format (though drivers to read/write shapefiles have been open sourced)
  4. Column names are limited to 10 characters

We highly recommend saving your spatial files as GeoJSONs for a few reasons:

  1. It’s an open source file format that can easily be read back into R, ArcGIS, or Python
  2. It uses standardized projections and naming conventions, which saves you a lot of headaches down the road
  3. It’s a single lightweight file, which makes it easy to transfer small to medium sized files
  4. It doesn’t have any file size limitations! Though if you have really large spatial files, you should look into GeoPackages (shown below)


Here is how you would write out different file formats using sf:

Writing out GeoJSONs and Shapefiles:

st_write(sf_data, "output/path/filename.geojson")
st_write(sf_data, "output/path/filename.shp")

Writing out CSVs

There may be some cases when you need to write out your spatial data as a CSV to share with other folks. sf does let you do that, but the lat/lon columns will either be labelled X/Y or be a single WKT column.

X          Y
-76.9651   38.89204

WKT
POINT (-76.9651 38.89204)

Here’s how you write out CSVs both ways using st_write():

# Writing out X and Y columns to CSV
st_write(sf_data, "data/exparkslocations.csv",
  layer_options = "GEOMETRY=AS_XY"
)

# Writing out WKT columns to CSV (delete_dsn = TRUE overwrites the
# file we just created above)
st_write(sf_data, "data/exparkslocations.csv",
  layer_options = "GEOMETRY=AS_WKT",
  delete_dsn = TRUE
)

But sometimes we want to write out CSVs with human readable latitude and longitude columns instead of X and Y columns, as they are easy to mix up (quick: is X latitude or longitude?). sf doesn’t give you an easy way to do this, so we need to extract the coordinates from the sf dataframe, append them as columns to our dataframe, then rename them lon/lat.

# getting lon/lat columns from dataframe
coords <- st_coordinates(data) %>%
  as_tibble() %>%
  dplyr::rename(lon = X, lat = Y)

# appending lon/lat columns to dataframe
data_to_write <- data %>%
  st_set_geometry(NULL) %>%
  bind_cols(coords)

# writing out dataframe as csv
write_csv(data_to_write, "path/to/file.csv")

Exercise 3: Writing out Spatial data

Step 1: Let’s say we were only interested in relatively large fire stations. Use the filter() function to limit the fire_stations dataframe to stations that have more than 5 trucks (ie TRUCK > 5). Call this filtered dataframe big_stations.

Step 2: Write out the big_stations dataframe as a GeoJSON into the data/ folder we created earlier using st_write(). Call the file big_stations.geojson.

Step 3: Write out the big_stations dataframe as a CSV with lat/lon columns into the data/ folder we created earlier using st_write(). Call the file big_stations.csv.

Where can I find spatial data?

The tigris package in R is a great place to start looking. tigris provides spatial data for:

  • Census tracts, blocks, and block groups
  • Counties, ZCTAs, PUMAs, places
  • Congressional districts, school districts
  • Roads, railways, native areas, military bases
  • Lots of other data sources!

It provides powerful functionality by allowing you to filter to specific states, counties, or tracts. Say, for example, you wanted data on all block groups in DC:

dc_block_groups <- tigris::block_groups(
  state = "DC",
  year = 2017,
  class = "sf",
  progress_bar = FALSE
)

ggplot() +
  geom_sf(data = dc_block_groups, mapping = aes()) +
  theme_urbn_map()

There’s also a sister package called tidycensus that provides easy access to census data in addition to spatial data. Say, for example, you wanted data on median household income in Montgomery County census tracts from the 5-year ACS:

# Note you need to sign up for a free Census API key here:
# http://api.census.gov/data/key_signup.html
library(tidycensus)

api_key <- "12345678"
md_tracts <- get_acs(
  geography = "tract", year = 2016,
  state = "MD", county = "Montgomery",
  variables = "B19013_001",
  geometry = TRUE, key = api_key
)


ggplot() +
  geom_sf(data = md_tracts, mapping = aes()) +
  theme_urbn_map()

Below are some other common places to find spatial data

Exercise 4: Using tigris

Step 1: Use the school_districts() function in tigris to download all the school districts in Oregon from the year 2015. Call the variable or_school_districts. Make sure to set class = "sf"!

Step 2: Use ggplot() and geom_sf() to make a plot of all the school districts in Oregon.

Coordinate Reference Systems

Whenever you create a map, you have to make assumptions about 1) the exact 3D shape of the earth and 2) how to project that 3D shape onto a 2D surface. That’s where Coordinate Reference Systems (CRS) come in! There are two kinds of coordinate reference systems:

  1. Geographic coordinate systems
  • Are a 3D representation of the earth
  • Use a spheroid/ellipsoid surface to approximate the shape of the earth
  • Usually use decimal degree units (ie latitude/longitude) to identify locations on earth

  2. Projected coordinate systems
  • Are 2D representations of the earth
  • Are a particular geographic coordinate system + a particular projection
    • projection: a mathematical formula used to convert a 3D coordinate system to a 2D flat coordinate system
    • There are many different kinds of projections, including Equal Area, Equidistant, and Conformal
    • All projections distort the true shape of the earth in some way, whether in terms of shape, area, or angle. Required xkcd comic
  • Usually use linear units (ie feet, meters) and are therefore useful for distance based spatial operations (ie creating buffers)

Below is a gif showing how the popular but Eurocentric Mercator projection works

Finding and Setting CRS

In sf there’s a function called st_crs() that will tell us what the CRS of an sf dataframe is. Let’s use it on our fire_stations dataframe:

st_crs(fire_stations)
## Coordinate Reference System:
##   User input: WGS 84 
##   wkt:
## GEOGCRS["WGS 84",
##     DATUM["World Geodetic System 1984",
##         ELLIPSOID["WGS 84",6378137,298.257223563,
##             LENGTHUNIT["metre",1]]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433]],
##     CS[ellipsoidal,2],
##         AXIS["geodetic latitude (Lat)",north,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433]],
##         AXIS["geodetic longitude (Lon)",east,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433]],
##     ID["EPSG",4326]]

As you can see, the output contains a WKT (well-known text) definition and an EPSG code, either of which can uniquely identify the CRS being used. EPSG codes are usually easier to work with and are strings of 4-6 numbers that represent CRS definitions. The acronym EPSG comes from the now defunct European Petroleum Survey Group. Good resources for learning more about an EPSG code (such as its default units and main areas of use) are spatialreference.org and epsg.io.
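You can also look up a CRS definition without leaving R by passing an EPSG code directly to st_crs():

# print the full definition of the CRS identified by EPSG code 4326
st_crs(4326)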

After you read spatial data into R, the first thing you should do is check that the CRS is specified. In most cases, sf will automatically detect the right CRS, but sometimes it fails and the CRS may be empty. In that case it’s up to you to provide the CRS (ie the EPSG code). You can set the CRS using st_set_crs(), and you can transform the CRS (if the data has a non-standard CRS you don’t want to use) using st_transform(); see the sketch after the list below. Some tips for finding the EPSG codes of a dataset:

  • For GeoJSONs, the EPSG code will always be 4326, as that is a requirement for all GeoJSON files.
  • For Shapefiles, you will need to open the .prj file and look for the CRS code.
  • For CSVs, assuming you have latitude and longitude points, it’s a good bet that the EPSG code is 4326. But you may have to do some digging into the metadata to find out exactly which CRS is used.
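Here is a minimal sketch of both functions, assuming a hypothetical sf dataframe my_data that was read in without a CRS:

# st_set_crs() declares a missing CRS -- it does NOT change the
# coordinate values, it just records what they already are
my_data <- st_set_crs(my_data, 4326)

# st_transform() reprojects to a different CRS and DOES recompute the
# coordinates (EPSG 26918 is NAD83 / UTM zone 18N, a meter-based CRS)
my_data_projected <- st_transform(my_data, 26918)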

Exercise 5: Working with CRSs

Step 1: Download the zipped shapefile of Vancouver’s zoning boundaries from here: ftp://webftp.vancouver.ca/OpenData/shape/zoning_districts_shp.zip

Step 2: Unzip the contents of the shapefile into the data folder of the mapping directory.

Step 3: Find the filepath to the zoning_districts.shp file in the data folder and read the shapefile into R. Name the object van_zones.

Step 4: Type van_zones into the console and take a look at the geometry column. Do these points look funky?

Step 5: Confirm that sf detected a CRS when it read in the shapefile by using st_crs() on the van_zones dataframe.

Step 6: Use epsg.io to find out what the units of this CRS are.

Step 7: The actual coordinates of this coordinate system may look foreign to people used to lat/lons. So transform the CRS to EPSG:4326 to get latitude/longitude points. Name the transformed dataframe van_zones_transformed.