renv: How to save your collaboRators (and future you) grief

Will Curran-Groome & Erika Tyagi

Motivation

Two Principles

  1. Our work should be accurate and reproducible.
  2. We can’t prove accuracy without reproducibility.

Reproducibility should be the minimum standard for all computational social sciences.

Three Implications

  1. We should adopt code-first data analysis.
  2. We should organize our work in notebooks and/or packages.
  3. We should run code in environments that produce consistent results across different computers.

What is a virtual environment?

Virtual environments promote reproducibility by letting you specify project-specific versions of packages.

  • They make it easy to take a snapshot of the packages used in a project and restore the snapshot on other computers.
  • They also make it easy to switch between snapshots on a single computer as you switch between projects.

Why should I use them?

“But it works on my computer!”

  • You published an analysis based on 2021 data. A year later, you want to update the analysis with 2022 data, but your code no longer works.
  • Your coworker is running into errors running your code because of differences in package versions between your computers.
  • You got a new computer and don’t want to spend hours resolving errors to remember which R packages to install.
  • You need to use an older version of a package for one project and a newer version of the package for another project.
  • Your code takes a long time to run locally, so you want to leverage powerful computers in the cloud.

https://xkcd.com/1172

Why should I use renv?

The renv package helps you create reproducible environments for your R projects. Use renv to make your R projects more isolated, portable and reproducible.

  • Isolated: Installing a new or updated package for one project won’t break your other projects, and vice versa. That’s because renv gives each project its own private library.
  • Portable: Easily transport your projects from one computer to another, even across different platforms. renv makes it easy to install the packages your project depends on.
  • Reproducible: renv records the exact package versions you depend on, and ensures those exact versions are the ones that get installed wherever you go.
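
One way to see the isolation for yourself, as a small sketch: R looks for packages in the folders returned by `.libPaths()`. In a project where renv is active, the first entry should point inside the project's `renv/library` folder rather than your user or system library (the exact path varies by platform and R version).

```r
# List the folders R searches for installed packages, in order.
lib_paths <- .libPaths()
print(lib_paths)

# In a renv-activated project, the first entry is expected to sit under
# the project's renv/library folder. Outside such a project this is
# typically FALSE.
uses_project_library <- grepl("renv", lib_paths[1], fixed = TRUE)
print(uses_project_library)
```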

What can’t renv do?

  • renv is not a panacea for reproducibility.
  • renv is a tool that can help make projects reproducible by helping with one part of the overall problem: R packages.
  • There are other pieces that renv doesn’t currently provide much help with: R versions, Pandoc, operating system, versions of system libraries, compiler versions, etc.

https://xkcd.com/2347

What is renv?

A package you install.

A couple new files and folders in your project folder.

A set of functions you use to document your project’s dependencies.

The solution to all of your problems.

  • Me, after three hours without renv: Will, you’re an idiot–you’re using that one version of the Qualtrics API that’s broken.

  • Me, after .001 seconds with renv: Will, you’re an idiot–you need to install version 3.1.6 of qualtRics: renv::install("qualtRics@3.1.6")

… or at least a few of them.

How do I set up renv?

Set up GitHub.

Configure your .gitignore.

Add an RStudio Project to your new project folder.

  1. Open your .Rproj file, then your code

  2. Use relative paths: here::here("data", "raw-data", "input1.csv")

Refer to R for Data Science for more on project workflows: https://r4ds.had.co.nz/workflow-projects.html
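
A minimal sketch of the relative-path habit, assuming the `here` package is installed. The file name is the placeholder from the step above, so the read is skipped if no such file exists in your project.

```r
# Assumes the 'here' package is installed (e.g. via renv::install("here")).
library(here)

# Build a path relative to the project root instead of hard-coding
# something like "C:/Users/me/project/data/raw-data/input1.csv".
input_path <- here::here("data", "raw-data", "input1.csv")
print(input_path)

# Read the file only if it exists; input1.csv is a placeholder name.
if (file.exists(input_path)) {
  input1 <- read.csv(input_path)
}
```

Because the path is built from the project root, the same line works on a collaborator's computer without editing.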

Set up renv with renv::init().

This adds to your project the infrastructure needed for renv.

Shortcut?

Create some subfolders.

Because organization is good and saves you time.

Install some packages with renv::install().

“… you can continue to use familiar tools like install.packages(). But you can also use renv::install(): it’s a little less typing and can install packages from GitHub, Bioconductor, and more, not just CRAN.” - The docs.

  • Straight from GitHub?

  • Yes. renv::install("UrbanInstitute/urbnthemes")
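
A few common forms of `renv::install()`, as a sketch. These calls need an internet connection and install into the project library; `dplyr`, `here`, and `jsonlite` are illustrative package choices, not requirements of this workflow.

```r
# Latest available release of a package ("dplyr" is an illustrative choice).
renv::install("dplyr")

# A specific version, using the package@version form.
renv::install("qualtRics@3.1.6")

# Directly from a GitHub repository, using the owner/repo form.
renv::install("UrbanInstitute/urbnthemes")

# Several packages in one call.
renv::install(c("here", "jsonlite"))
```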

Update your lockfile with renv::snapshot().

This records your packages (and their sources) in the lockfile, creating a reference for future you (and current and future collaborators).

Wait, a lockfile you said?

renv.lock, to be exact. It’s JSON and you can open it in Notepad (or wherever).
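
Since the lockfile is plain JSON, you can also inspect it from R. The sketch below assumes the `jsonlite` package is installed and uses a tiny hand-written stand-in for renv.lock (the entries are illustrative, and real lockfiles carry more fields, such as hashes). In a real project you would point `jsonlite::fromJSON()` at `renv.lock` instead.

```r
library(jsonlite)

# A hand-written, abbreviated stand-in for renv.lock (illustrative values only).
lockfile_text <- '{
  "R": { "Version": "4.3.1" },
  "Packages": {
    "here":      { "Package": "here",      "Version": "1.0.1", "Source": "Repository" },
    "qualtRics": { "Package": "qualtRics", "Version": "3.1.6", "Source": "Repository" }
  }
}'
lock_path <- tempfile(fileext = ".lock")
writeLines(lockfile_text, lock_path)

# Parse the JSON and pull out one version string per recorded package.
lock <- jsonlite::fromJSON(lock_path)
versions <- vapply(lock$Packages, function(pkg) pkg$Version, character(1))
print(versions)
```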

Double check your .gitignore.

  • git status
  • .gitignore review.

Push to GitHub.

  1. git status

  2. git add -A

  3. git commit -m "a brief, descriptive message describing what you've done"

  4. git push -u origin main

Collaborate using renv::restore().

Cloning a repository? Pulling updates? renv::restore() installs the package versions recorded in the lockfile into your project library.

In review.

  1. Create GitHub repository.

  2. Create your .Rproj file.

  3. renv::init() to configure renv.

  4. renv::install() to install packages.

  5. renv::snapshot() to record your packages in the lockfile.

  6. Push to GitHub.

  7. renv::restore() to ensure your installed packages match the lockfile.
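
The renv steps above can be sketched as a short console session. This is an outline rather than a script to source in one go: it needs an internet connection, modifies the current project, and `here` is only an illustrative package choice.

```r
# Steps 3-5 and 7 from the review, run from the R console inside the project.
# install.packages("renv")   # one-time setup if renv itself is not installed

renv::init()               # step 3: create renv.lock, .Rprofile, and the renv/ folder
renv::install("here")      # step 4: "here" is an illustrative package choice
renv::snapshot()           # step 5: record the packages your project uses in renv.lock

# ... commit and push (step 6), then on a collaborator's machine after cloning:
renv::restore()            # step 7: install the versions recorded in renv.lock
```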

Gotchas

.gitignore and required renv files.

Make sure your .gitignore does not exclude these files; they need to be committed so collaborators can restore your environment:

  • renv.lock

  • .Rprofile

  • renv/settings.json

  • renv/activate.R

Happily, the renv/ folder by default contains its own .gitignore that will exclude unneeded files.
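
The four files listed above need to be committed for collaborators to restore your environment, so a quick check is that they exist and are not listed in your project-level .gitignore. The sketch below uses base R only and is deliberately simplified: it compares exact lines and does not handle gitignore wildcards or negation. Run it from the project root.

```r
# Files that should be committed alongside your code.
required <- c("renv.lock", ".Rprofile", "renv/settings.json", "renv/activate.R")

# Which required files are missing from the project folder?
missing_files <- required[!file.exists(required)]

# Simplified check: exact-match lines in the project-level .gitignore only.
# Real gitignore patterns (globs, negation) are NOT handled here.
ignore_lines <- if (file.exists(".gitignore")) {
  trimws(readLines(".gitignore", warn = FALSE))
} else {
  character()
}
ignored_files <- intersect(required, sub("^/", "", ignore_lines))

print(missing_files)  # files renv has not created yet
print(ignored_files)  # required files that .gitignore would exclude
```

Both results should be empty once renv::init() has run and your .gitignore is in good shape.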

Adding to (and updating) your lockfile.

Sometimes not everything you want is reflected in your lockfile.

  1. Ensure you’ve run renv::snapshot().

  2. Try renv::install().

  3. Run renv::dependencies() to check what packages renv is detecting.

  4. Slack for help!
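
To see what step 3 reports, here is a small sketch that runs `renv::dependencies()` on a throwaway script rather than a whole project. It assumes renv is installed, and the script contents are made up for illustration. renv scans code for calls such as `library()` and `pkg::fun()`, and by default `renv::snapshot()` only records packages it detects this way, so a package missing from this output is a likely reason it is missing from the lockfile.

```r
# Write a throwaway script that uses two packages (illustrative content;
# the packages do not need to be installed for the scan to find them).
script <- tempfile(fileext = ".R")
writeLines(c("library(dplyr)", "ggplot2::ggplot()"), script)

# Scan the script; the result is a data frame with one row per detected use.
deps <- renv::dependencies(path = script, quiet = TRUE)
detected <- sort(unique(deps$Package))
print(detected)
```

In a project, calling `renv::dependencies()` with no arguments scans the whole project folder, and `renv::status()` reports differences between the lockfile, the project library, and the detected dependencies.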

Virtual Desktops

The workflow for renv should be the same on virtual desktops.

  • You may run into challenges with installing certain packages without administrator privileges. Reach out to Helpdesk@urban.org for assistance; this is not specific to using renv.

Mid-project adoption of renv.

  • Process is the same as beginning-of-project setup.

  • Communicate with collaborators to ensure everyone’s aware of renv workflow.

  • If you’re feeling detail-oriented, set up renv, push to GitHub, then clone a fresh copy of the repository to check that everything’s working.

Next steps

Resources

  • Urban’s guide to using virtual environments
  • Urban’s #reproducible-research and #github Slack channels
  • The official renv documentation
  • E. David Aja’s talk titled “You should be using renv”
  • The R for Data Science chapter on project workflows
  • The “What They Forgot to Teach You About R” book