urbanschemes
Overview
urbanschemes
is a custom Stata scheme for creating Urban Institute-themed visualizations (see scheme manual). The scheme is built upon the popular s2color
scheme, with further customizations to align with the Urban Institute Data Visualization Style Guide.
This scheme is currently under development and is subject to change (current version: 0.2.0). You can provide feedback, suggest ideas, or propose improvements via GitHub Issues or by email (Jennifer Andre - jandre@urban.org). There are many tools available that make Urban-themed data visualization simpler; urbanschemes
is just one option for Stata users.
Background
Stata offers a lot of flexibility for data visualizations. This flexibility is great for developing complex, reproducibile visualizations - but it often requires long and potentially confusing syntax. The goal of this scheme is to include much of this syntax as default settings, so that the user can focus on the most important plotting decisions.
This scheme is not intended to cover all of the vast range of graphing capabilities and options. Refer to Stata documentation to learn more about graph commands. Currently, this scheme is optimized to produce report-ready static figures without titles, subtitles, citations, or notes. These must be included separately in your report.
Here are some helpful resources on schemes: * Stata graphs: Define your own color schemes, by Asjad Naqvi on Medium * Scheming your way to your favorite graph style, by Kristin MacDonald on The Stata Blog * Intro to schemes, Stata documentation
Setup
Install
By default, these instructions will install the scheme files into your “PLUS” ado directory (see ado details, see net details). Use command sysdir
to confirm this location.
net install urbanschemes, replace from("https://urbaninstitute.github.io/urbanschemes/")
Fonts
The Urban Institute uses Lato font for publications. Make sure Lato is installed before proceeding. The Lato font cannot be included in the scheme and must be set independently.
Getting Started
Set the scheme and font at the beginning of a .do file with the following commands:
set scheme urbanschemes
graph set window fontface "Lato"
Instead of globally setting the scheme, you may alternatively include scheme(urbanschemes)
as a graph command option.
These commands can be included in your profile.do
to automatically run on startup (see details).
Tips
This section covers a range of guidelines for using the scheme. While the scheme defines many default settings, users often must turn these on or off within a graph command code chunk by adding or modifying an option.
Note that you can only have one instance of each option for a given graph command. For example, if your plotting code already includes a ylab
option, you cannot add another separate one. Instead, you must combine the suboptions within one option. For example:
ylab(, suboption1 suboption2)
Italic Font
The Urban Institute Data Visualization Style Guide indicates that axes titles should be in italicized Lato font. While Lato is called using graph set window fontface "Lato"
, italics must be indicated independently within plot code chunks using Stata Markup Control Language (SMCL).
For example:
xtitle("{it:This is my x-axis title}")
y-axis Titles
The y-axis title should be horizontal aross the top left corner of the plot. This change is not automatically reflected in the scheme. Add the following options to your graph command to make this change:
subtitle("{it:This is my y-axis title"}) ytitle("")
Grid lines
Grid lines (dotted, thin, gray) are on by default. To turn these off, add the following line to your plot code:
ylab(, glcolor(white))
To change the grid lines to solid lines, add the following line to your plot code:
ylab(, glpattern(solid))
Depending on your plot, you may choose to further customize the glcolor
, glwidth
, or glpattern
suboptions within ylab
.
y-axis line and labels
For many plots, the Urban Institute Data Visualization Style Guide omits the solid y-axis line. To remove the line, turn it white by adding the following line to your plot code:
yscale(lcolor(white))
You may also wish to remove y-axis labels, especially if you are instead including labeled values within the plot itself. To remove these labels, add the following line to your plot code:
ylab(, nolab)
Ticks
Axis ticks are on by default for many plots. To remove these ticks, add the following line(s) to your plot code:
ylab(, noticks) xlab(, noticks)
Legend
By default, the scheme places the legend at the top of a chart. Based on the chart type, you might want to add some additional white space around it. To do so, add the following line to your plot code, where [X] is an integer value indicating the relative percentage of white space you want to add:
plotregion(margin(t = [X]))
Axis gap
For some plots, you may find an undesirable gap between the plot and an axis. To remove this gap from the x-axis, add the following line to your plot code:
plotregion(margin(b = 0))
To remove this gap from the y-axis, add the following line to your plot code:
plotregion(margin(l = 0))
Scheme Colors
For any plot, you can always customize the colors of bars, lines, markers, etc. Refer to the style guide for RGB color codes.
If you do not specify a color, the scheme will automatically utilize the following colors for various chart elements in this order:
Color | Name | RGB |
---|---|---|
1 | cyan 5 | “22 150 210” |
2 | yellow | “253 191 17” |
3 | black | “0 0 0” |
4 | magenta | “236 0 139” |
5 | gray | “210 210 210” |
6 | space gray | “92 88 89” |
7 | green | “85 183 72” |
8 | red | “219 43 39” |
9 | cyan 1 | “207 232 243” |
10 | cyan 2 | “162 212 236” |
11 | cyan 3 | “115 191 226” |
12 | cyan 4 | “70 171 219” |
13 | cyan 6 | “18 113 158” |
14 | cyan 7 | “10 76 106” |
15 | cyan 8 | “6 38 53” |
Exporting Plots
Your chosen file format may vary given your publication needs. For a print publication, a format like .svg
or .emf
may provide highest quality. Sample plots in this repository are included as .svg
files.
You can export a generated plot with the following command, where [FILE TYPE]
may be svg
, emf
, png
, etc.:
graph export "[PATH]\[PLOT NAME].[FILE TYPE]", replace
If exporting as .svg
, you may need to include an additional suboption to maintain the Lato fontface.
graph export "[PATH]\[PLOT NAME].svg", fontface(Lato) replace
Examples
The example plots in this section show how to utilize urbanschemes
when creating common plot types. These examples do not capture all possibilities, but cover some common plotting choices. Throughout this section, refer to inline code comments following //
for brief explanations.
These examples use datasets included with a Stata installation. After loading one of these datasets, use notes
to see some documentation, if available.
Note that there is some loss of quality in the embedded images below - refer to the image files included in this repository for better quality.
Bar/Column Plot
The following bar charts visualize population by US region using the census
dataset included with a Stata installation.
Example 1: This example displays population values (in millions) along the y-axis, although the y-axis line itself is removed. The y-axis title is placed along the top of the chart and the grid lines pattern is dotted.
sysuse census, clear
collapse (sum) pop, by(region)
gen pop_mill = pop / 1000000
graph bar pop_mil, over(region) /// // plot population (millions) by region
subtitle("{it:Population (millions)}") /// // subtitle = y-axis title
ytitle("") /// // remove y-axis title from side of plot
ylab(, format(%2.0f) noticks) /// // format y-axis labels to two digits and remove ticks
yscale(lcolor(white)) // remove y-axis line
Example 2: This example labels each bar with the corresponding population value, and no longer displays y-axis labels. The y-axis title is removed, so users should be sure to adequately describe the plot in the title included in any report or presentation materials. Grid lines are removed.
sysuse census, clear
collapse (sum) pop, by(region)
graph bar pop, over(region) /// // plot population by region
blabel(total, format(%12.0fc)) /// // label bars with total population, formatted with commas
ytitle("") /// // remove y-axis title from side of plot
ylab(, glcolor(white) noticks nolab) /// // remove grid lines, y-axis ticks, and y-axis labels
yscale(lcolor(white)) // remove y-axis line
Example 3: This example treats each region as a separate y-variable, allowing us to more easily control bar colors. Other options align with the previous example.
sysuse census, clear
collapse (sum) pop, by(region)
graph bar pop, over(region) /// // plot population by region
asyvars /// // plot region populations as separate variables (to easily control colors)
showyvars /// // show region labels on x-axis
blabel(total, format(%12.0fc)) /// // label bars with total population, formatted with commas
bargap(75) /// // increase space between bars
ytitle("") /// // remove y-axis title from side of plot
ylab(, glcolor(white) noticks nolab) /// // remove grid lines, y-axis ticks, and y-axis labels
yscale(lcolor(white)) /// // remove y-axis line
legend(off) // turn legend off
Grouped Bar/Column Plot
The following bar charts visualize January and July average temperature by US region using the citytemp
dataset included with a Stata installation.
Example 1: This example displays temperature values along the y-axis, although the y-axis line itself is removed. The y-axis title is placed along the top of the chart and the grid lines pattern is dotted. The legend is placed above the plot area.
sysuse citytemp, clear
graph bar tempjan tempjuly, over(region) /// // plot jan and june temp by region
subtitle("{it:Average temperature (f)}") /// // subtitle = y-axis title
ylab(, noticks) /// // remove y-axis ticks
yscale(lc(white)) /// // remove y-axis line
legend(label(1 "January") label(2 "July")) /// // relabel legend
plotregion(margin(t = 6)) // make space on top of plot for legend
Example 2: This example labels each bar with the corresponding temperature value, and no longer displays y-axis labels. The y-axis title is removed, so users should be sure to adequately describe the plot in the title included in any report or presentation materials. The legend is placed above the plot area. Grid lines are removed.
sysuse citytemp, clear
graph bar tempjan tempjuly, over(region) /// // plot jan and june temp by region
blabel(total, format("%2.0f")) /// // label bars with temperatures formatted to two digits
ylab(, glcolor(white) noticks nolab) /// // remove grid lines, y-axis ticks, and y-axis labels
yscale(lc(white)) /// // remove y-axis line
legend(label(1 "January") label(2 "July")) /// // relabel legend
plotregion(margin(t = 12)) // make space on top of plot for legend
Horizontal bar chart
This bar chart visualizes average car price by manufacturer using the auto
dataset included with a Stata installation.
With a horizontal bar chart, the y-axis is “flipped” to the horizontal position but must still be referred to with y-axis options. This example displays average car prices along the y-axis, although the y-axis line itself is removed. The y-axis title is placed along the top of the chart and the grid lines pattern is dotted.
sysuse auto, clear
split make, p(" ")
graph hbar (mean) price, ///
over(make1, sort(1) descending) /// // sort bars in descending order of 1st (only) variable
subtitle("{it:Average price (dollars)}") //// // subtitle = y-axis title
ytitle("") /// // remove y-axis title - moved to subtitle position
ylab(, noticks) /// // remove y-axis ticks
yscale(lc(white)) /// // remove y-axis line
plotregion(margin(b = 0 t = 0)) // remove gap at bottom and top of plot
Line Plot
This line plot compares average US life expectancy over time for white males and Black males using the uslifeexp
dataset included with a Stata installation.
This example displays age values along the y-axis, although the y-axis line itself is removed. The y-axis title is placed along the top of the chart and the grid lines pattern is dotted. The legend is placed above the plot area.
sysuse uslifeexp, clear
line le_wm le_bm year, /// // plot life expectancy over time by race
subtitle("{it:Life expectancy (years)}") /// // subtitle = y-axis title
ylab(0(10)80, noticks) /// // reset y-axis to begin at 0, remove y-axis ticks
yscale(lc(white)) /// // remove y-axis line
xtitle("") /// // remove unnecessary x-axis title ("Years")
legend(label(1 "White Males") label(2 "Black Males")) /// // relabel legend
plotregion(margin(b = 0 t = 6)) // remove gap at bottom of plot, make space on top of plot for legend
Scatter Plot with Best Fit Line
This scatter plot with best fit line explores the relationship between automobile weight and mileage using the auto
dataset included with a Stata installation. This example is intended to demonstrate a more complex twoway
plot and provides examples of customizing some urbanschemes
defaults.
A twoway
plot allows us to overlay multiple plots. The first scatter
plots all points, and we choose to customize the msize
. The lfit
line plots a predicted line of best fit, and we specify the color and width of this line. The plot also displays the correlation coefficient value.
sysuse auto, clear
corr mpg weight // store correlation coefficient
local rho = string(r(rho), "%03.2f")
di("`rho'")
twoway ///
(scatter mpg weight, msize(1.5)) || /// // scatter mpg and weight
(lfit mpg weight, lcolor("236 0 139") lwidth(.2)), /// // fit predicted line, change color and width
subtitle("{it:Mileage (mpg)}") //// // subtitle = y-axis title
xtitle("{it:Weight (lbs)}") /// // x-axis title
xlab(, noticks) /// // remove x-axis ticks
ylab(, noticks) /// // remove y-axis ticks
legend(off) /// // turn off legend
text(11 4450 `"Corr = `rho'"') // add correlation coefficient
Histogram
This histogram explores the distribution of S&P 500 opening prices using the sp500
dataset included with a Stata installation.
The y-axis title is placed along the top of the chart and the y-axis line is removed. Only y-axis ticks are removed and the grid lines pattern is dotted.
sysuse sp500, clear
histogram open, ///
subtitle("{it:Density}") //// // subtitle = y-axis title
ytitle("") /// // remove y-axis title
xtitle("{it:Open price}") /// // x-axis title
ylab(, noticks) // remove y-axis ticks
Kernel density plot
We can make a comparative kernel density plot comparing the distributions of MPG for foreign and domestic cars using the auto
dataset included with a Stata installation.
We first store the kernel density options in a local for use in the twoway plot command. The y-axis title is placed along the top of the chart and the y-axis line is removed. The legend is placed above the plot area. Only y-axis ticks are removed and the grid lines pattern is dotted.
sysuse auto, clear
local options kernel(biweight) bwidth(5) recast(area) boundary color(%50) // set kernel density options
graph two ///
(kdensity mpg if foreign, `options') /// // kernel density plot for foreign with options
(kdensity mpg if !foreign, `options'), /// // kernel density plot for domestic with options
subtitle("{it:Density}") //// // subtitle = y-axis title
ytitle("") /// // remove y-axis title
xtitle("{it:MPG}") /// // x-axis title
ylab(, noticks) /// // remove y-axis ticks
legend(label(1 "Foreign") label(2 "Domestic")) /// // relabel legend
plotregion(margin(b = 0)) // remove gap at bottom of plot
Box plot
We can compare blood pressure by age group before and after a treatment by using the bpwide
dataset included with a Stata installation.
The y-axis title is placed along the top of the chart and the y-axis line is removed. The legend is placed above the plot area. Axis ticks are removed and the grid lines pattern is dotted.
sysuse bpwide, clear
graph box bp_before bp_after, over(agegrp) /// // box plot over age groups
subtitle("{it:Blood pressure by age group}") /// // subtitle = y-axis title
ylab(, noticks) /// // remove y-axis ticks
plotregion(margin(t = 10)) // make space on top of plot for legend
Contact
Contact Jennifer Andre (jandre@urban.org) with questions or to provide feedback.