1  Data Visualization

Author

Fay Walker

1.1 Review

Why Use R instead of Excel?

  • Reproducibility
  • Scalability
  • Flexibility
  • Iterative
  • Open Source

Remember

  • Environment/scripts/console
  • Comment your code as you go
  • Read your error messages
  • Where possible, type, don’t copy and paste - you will remember better!
  • The assignment operator
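
As a refresher on the last few reminders, here is a minimal sketch of commenting and assignment. The object names temps_f and temps_c and the values are made up for illustration.

```r
# Comments start with a hash; R ignores everything after it on the line.

# The assignment operator <- stores a value under a name.
temps_f <- c(67, 72, 74, 62)   # hypothetical temperatures in Fahrenheit

# Use the stored object in a later calculation.
temps_c <- (temps_f - 32) * 5 / 9

# Running a line that ends in a value prints it in the Console.
mean(temps_f)
```

After running these lines, temps_f and temps_c appear in the Environment pane.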

1.2 Exercise 0 (Set up a project and load packages)

Step 1: Open RStudio. File > New Project > New Directory > Select the location where you would like to create a new folder that houses your R Project. Call it urbn101

Step 2: Open an .R script with the button in the top left (sheet with a plus sign icon). Save the script as 01_data-visualization.R.

Step 3: Type install.packages("tidyverse") in the Console and press Enter to submit it.

Step 4: Write library(tidyverse) at the top of 01_data-visualization.R. With the cursor on that line, press Control and Enter at the same time to run it.

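Step 5: Repeat steps 3 & 4 with install.packages("ggplot2") and library(ggplot2), respectively. (ggplot2 is one of the core tidyverse packages, so it is already installed and loaded after steps 3 & 4; this step is extra practice.)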

  • tidyverse - a collection of libraries that use the same syntax/grammar (more on this on Monday)
  • ggplot2 - for making plots/graphs
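
After these steps, the top of the script might look roughly like the sketch below. It assumes the install has already been run once in the Console, so the install line is left as a comment.

```r
# 01_data-visualization.R
# One-time setup, run in the Console rather than saved as live code:
# install.packages("tidyverse")

# Load packages at the top of the script so every later line can use them.
library(tidyverse)  # attaches ggplot2, dplyr, readr, and the other core packages

# Check that ggplot2 was attached along with the tidyverse.
"package:ggplot2" %in% search()
```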

1.3 Exercise 1 (Make a plot)

Step 1: Submit data() to the console. We will use the airquality dataset.

Step 2: Type the following in your script:

ggplot(data=airquality)+
  geom_point(mapping=aes(x=Temp, y=Ozone))
  • ggplot() expects its data argument to be a data frame.

Step 3: Add a comment above the ggplot2 code that describes the plot we created.

Step 4: Add comments below the data visualization code that describe the argument or function corresponding to each of the first three components of the grammar of graphics.

Data are the values represented in the visualization.

Aesthetic mappings are directions for how data are mapped in a plot in a way that we can perceive. Aesthetic mappings include linking variables to the x-position, y-position, color, fill, shape, transparency, and size.

Geometric objects are representations of the data, including points, lines, and polygons.
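
One possible way to write the comments from Steps 3 and 4 is sketched below. The wording of the comments is only a suggestion, and the plot is stored in an object named p (a name chosen here) so it can be reprinted.

```r
library(ggplot2)

# Scatter plot of daily ozone readings against temperature.
p <- ggplot(data = airquality) +
  geom_point(mapping = aes(x = Temp, y = Ozone))
p

# Data: the airquality data frame, passed to the data argument of ggplot().
# Aesthetic mappings: aes() links Temp to the x-position and Ozone to the y-position.
# Geometric objects: geom_point() draws one point per row.
```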

1.4 Exercise 2 (Change some of the plot settings)

Step 1: Duplicate the code from your first chart. Inside aes(), add color = "red" after y = Ozone, separated by a comma.

Step 2: Move color = "red" from aes() to geom_point(). What changed?

Step 3: Remove color = "red" and add color = Month inside aes().

Step 4: This is a little cluttered. Add alpha = 0.2 inside geom_point().

Step 5: Add a plus sign to the end of the geom_point() line, then on the next line type labs(title="Air Quality Temperature and Ozone Readings").

Aesthetic mappings like x and y almost always vary with the data. Aesthetic mappings like color, fill, shape, transparency, and size can vary with the data. But those arguments can also be added as styles that don’t vary with the data. If you include those arguments in aes(), they will show up in the legend (which can be annoying!).
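
The difference between mapping an aesthetic and setting a style can be sketched with three versions of the same plot. The object names mapped, fixed, and pitfall are chosen here for illustration.

```r
library(ggplot2)

# Mapped: color varies with the Month column, so ggplot2 draws a legend.
mapped <- ggplot(data = airquality) +
  geom_point(mapping = aes(x = Temp, y = Ozone, color = Month))

# Set: color and alpha are fixed styles outside aes(), so no legend appears.
fixed <- ggplot(data = airquality) +
  geom_point(mapping = aes(x = Temp, y = Ozone), color = "red", alpha = 0.2)

# Pitfall: "red" inside aes() is treated as data (a variable with one value),
# so the points get a default palette color and a legend entry labeled "red".
pitfall <- ggplot(data = airquality) +
  geom_point(mapping = aes(x = Temp, y = Ozone, color = "red"))
```

Print each object in turn (for example, run fixed on its own line) to compare the three charts.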

1.5 Exercise 3 (Add a regression line/confidence interval)

Step 1: Rearrange the code so that the geom_point() parentheses are empty: move mapping=aes(x=Temp, y=Ozone) to the ggplot() line. Then add + to the end of the labs() line and add geom_smooth() on the next line.
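
The finished chart might look something like the sketch below, assuming the title from Exercise 2 is kept. With no arguments and a dataset this small, geom_smooth() fits a loess curve and shades a 95% confidence band around it.

```r
library(ggplot2)

# Mappings placed in ggplot() are inherited by every layer,
# so geom_point() and geom_smooth() can both be left empty.
p <- ggplot(data = airquality, mapping = aes(x = Temp, y = Ozone)) +
  geom_point() +
  labs(title = "Air Quality Temperature and Ozone Readings") +
  geom_smooth()  # smoothed trend line with a shaded confidence band
p
```

Adding method = "lm" inside geom_smooth() would draw a straight regression line instead of the default curve.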

1.6 Exercise 4 (Scale the axes)

Step 1: Create a new scatter plot using the msleep data set. Use bodywt on the x-axis and sleep_total on the y-axis.

Step 2: The y-axis doesn’t contain zero. Below geom_point(), add scale_y_continuous(limits = c(0, NA)). Hint: add + after geom_point().

Step 3: The points are clustered near zero on the x-axis. Add scale_x_log10() above scale_y_continuous(limits = c(0, NA)).

Scales turn data values, which are quantitative or categorical, into aesthetic values. This includes not only the x-axis and y-axis, but also the ranges of sizes, shapes, and colors of aesthetics.
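
Put together, the three steps could produce something like the following sketch. The object name p is chosen here for illustration.

```r
library(ggplot2)

p <- ggplot(data = msleep, mapping = aes(x = bodywt, y = sleep_total)) +
  geom_point() +
  scale_x_log10() +                        # spread out the many small body weights
  scale_y_continuous(limits = c(0, NA))    # start at zero; NA lets ggplot2 pick the top
p
```

Each scale_*() function controls one aesthetic, so the x and y scales are adjusted independently.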

1.7 Exercise 5 (Make it look nice!!)

Step 1: Add the following code to your script. Submit it!

ggplot(storms)+ 
geom_bar(mapping=aes(category))

Step 2: Run install.packages("remotes") and remotes::install_github("UrbanInstitute/urbnthemes") in the console.

Step 3: In the lines preceding the chart add and run the following code:

library(urbnthemes)
set_urbn_defaults(style = "print")

Step 4: Run the code to make the chart.

Step 5: Add scale_y_continuous(expand = expansion(mult = c(0, 0.1))) and rerun the code. In ggplot2 versions before 3.3.0, use expand_scale() in place of expansion().

Theme controls the visual style of a plot, including font types, font sizes, background colors, margins, and positioning.
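
To show what a theme changes without the GitHub install, the sketch below swaps in ggplot2's built-in theme_minimal() and theme() as a stand-in for urbnthemes. The specific theme settings are arbitrary choices for illustration, and storms is taken from the dplyr package.

```r
library(ggplot2)

p <- ggplot(data = dplyr::storms) +
  geom_bar(mapping = aes(x = category)) +
  # Remove the gap below the bars; keep 10% headroom above them.
  scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
  theme_minimal(base_size = 12) +            # base font size and plain background
  theme(
    panel.grid.major.x = element_blank(),    # drop vertical grid lines
    plot.margin = margin(10, 10, 10, 10)     # top, right, bottom, left, in points
  )
p
```

The data, mappings, and bars are unchanged; only the non-data styling differs from the default chart.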

1.8 Exercise 6 (Multiple little graphs/Faceting)

Step 1: Read in Zillow Observed Rent Index (ZORI) data and assign it to zillow using zillow <- read_csv('https://raw.githubusercontent.com/UI-Research/urbn101-intro-r/master/homework/zillow_clean.csv')

Step 2: Filter the Zillow data down to the most expensive cities using the code below:

# Sort from most to least expensive and keep the first 10 rows.
zillow_clean <- zillow %>%
  arrange(desc(Avg_price)) %>%
  slice(1:10)

Step 3: In ggplot, plot the zillow_clean data as a scatter plot with Year on the x-axis and Avg_price on the y-axis.

Step 4: Add facet_wrap(~RegionName) after the geom_point(mapping=aes(x=Year, y=Avg_price)) line.
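
The same faceting pattern can be tried offline. The sketch below uses the built-in airquality data as a stand-in for the Zillow data, with Month playing the role of RegionName.

```r
library(ggplot2)

# facet_wrap() draws one small panel per value of the faceting variable.
p <- ggplot(data = airquality) +
  geom_point(mapping = aes(x = Temp, y = Ozone)) +
  facet_wrap(~Month)   # Month takes the values 5 through 9, so five panels
p
```

All panels share the same axes by default, which makes it easy to compare groups at a glance.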

1.9 Exercise 7 (Mapping)

Step 1: Read in UFO sighting data (source: National UFO Reporting Center) with read_csv() and assign it to ufo. Link: https://raw.githubusercontent.com/UI-Research/urbn101-intro-r/master/homework/ufo_state.csv

Step 2: Install urbnmapr with remotes::install_github("UrbanInstitute/urbnmapr") (the remotes package was installed in Exercise 5), load the urbnmapr library, and switch the urbnthemes style to maps with set_urbn_defaults(style = "map").

Step 3: Pull a shapefile of US states using urbnmapr and join the UFO counts by state to the shapefile using the code below:

# Get state boundaries as an sf object.
states_sf <- get_urbn_map("states", sf = TRUE)

# Attach the UFO counts to each state by its abbreviation.
states_ufo <- states_sf %>%
  left_join(ufo, by = c("state_abbv" = "state"))

Step 4: In ggplot, plot the states_ufo dataset using geom_sf() instead of geom_point() or geom_bar(). Fill the states using the “count” column, add labels, and change the outline (colour).

Mapping Resources

1.10 Functions

  • ggplot(): Create a plot, pull in data
  • aes(): Map variables to aesthetics (x, y, color, fill, ...); mapped aesthetics show up in the legend
  • geom_*(): What kind of graph
    • geom_point()
    • geom_line()
    • geom_bar()
    • geom_col()
    • geom_sf()
  • scale_*(): How data values are translated to an axis or other aesthetic (discrete/continuous)
    • scale_y_continuous()
  • labs(): Labels
    • x/y/title/fill/colour

1.11 Theory

  1. Data
  2. Aesthetic mappings
  3. Geometric objects
  4. Scales
  5. Coordinate systems
  6. Facets
  7. Statistical transformations
  8. Theme
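
As a rough illustration, all eight components can appear in a single chart. The sketch below reuses the msleep data from Exercise 4; the particular choices (faceting by vore, a linear fit, theme_minimal()) are arbitrary and only meant to label each piece.

```r
library(ggplot2)

p <- ggplot(data = msleep,                                  # 1. data
            mapping = aes(x = bodywt, y = sleep_total)) +   # 2. aesthetic mappings
  geom_point(alpha = 0.5) +                                 # 3. geometric objects
  scale_x_log10() +                                         # 4. scales
  coord_cartesian(ylim = c(0, 20)) +                        # 5. coordinate system
  facet_wrap(~vore) +                                       # 6. facets
  geom_smooth(method = "lm") +                              # 7. statistical transformation
  theme_minimal()                                           # 8. theme
p
```

Most charts rely on defaults for several of these components, which is why the earlier exercises only needed data, mappings, and a geom.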

1.12 Resources