8 visit_sequence

library(tidyverse)
library(tidysynthesis)

We refer to the order of variables for the sequential synthesis as the visit sequence. The function visit_sequence() creates a visit_sequence S3 object. By default, roadmap() creates a visit sequence based on the ordering of columns in the confidential data.

Recommended Method for Creating visit_sequence

We strongly recommend building visit sequences through the Tidy API on a roadmap instance rather than creating a visit_sequence object directly.

8.1 visit_sequence Construction via Tidy API

To demonstrate how to best construct a visit sequence, we’ll continue our example with the ACS data. First, let us look at the default visit_sequence created by the roadmap constructor:

acs_roadmap_step1 <- roadmap(
  conf_data = acs_conf,
  start_data = acs_start
)
acs_roadmap_step1$visit_sequence
Visit Sequence
Method:Variable
default:hcovany default:empstat default:classwkr default:age default:famsize default:transit_time default:inctot 

Each variable in the sequence is labeled with the method that determined its placement. The default method labels variables whose placement has not yet been set by a user-specified method.

To update this visit_sequence, tidysynthesis provides three functions:

  • add_sequence_manual(): Specify a manual variable ordering.
  • add_sequence_factor(): Specify a data-driven ordering for categorical variables.
  • add_sequence_numeric(): Specify a data-driven ordering for numeric variables.

The add_sequence_manual() function allows you to add variables to the sequence by name, in the order they’re provided to the function. To demonstrate, we add the variable famsize to the sequence:

acs_roadmap_step2 <- acs_roadmap_step1 %>%
  add_sequence_manual(famsize)

acs_roadmap_step2$visit_sequence
Visit Sequence
Method:Variable
manual:famsize default:hcovany default:empstat default:classwkr default:age default:transit_time default:inctot 

The first variable, famsize, is now associated with the manual method, while the remaining variables keep their original relative order under the default method. This lets users combine multiple methods for constructing a visit sequence when variables have different structural and/or statistical relationships to one another. By design, the tidysynthesis visit sequence API lets users focus primarily on the order in which visit sequence methods are applied.
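In the step above, the manual placement moved famsize to the front of the sequence and left the other variables in their prior relative order. The base R sketch below mimics that single step on a plain character vector. It is only an illustration: move_to_front() is a hypothetical helper, not a tidysynthesis function, and it ignores the method labels that the package tracks alongside each variable.

```r
# Hypothetical helper (not part of tidysynthesis): put the chosen variables
# first, in the order given, and keep every other variable in its existing
# relative order. setdiff() preserves the order of its first argument.
move_to_front <- function(sequence, chosen) {
  stopifnot(all(chosen %in% sequence))
  c(chosen, setdiff(sequence, chosen))
}

# Default order, taken from the column order shown in the first printout.
default_seq <- c("hcovany", "empstat", "classwkr", "age",
                 "famsize", "transit_time", "inctot")

# famsize first, then the other six variables in their original order.
move_to_front(default_seq, "famsize")
```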

The add_sequence_factor() function adds categorical variables in order from least to greatest information entropy, measured on the confidential data. Its first argument can be any <tidyselect> expression (see the tidyselect package); for example, where(is.factor) selects every factor column:

acs_roadmap_step3 <- acs_roadmap_step2 %>%
  add_sequence_factor(where(is.factor))

acs_roadmap_step3$visit_sequence
Visit Sequence
Method:Variable
manual:famsize entropy:empstat entropy:hcovany entropy:classwkr default:age default:transit_time default:inctot 
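To make the entropy criterion concrete, the following base R sketch computes the Shannon entropy of each factor column in a small invented data frame and orders the columns from least to greatest. It illustrates the idea only: shannon_entropy() and order_by_entropy() are hypothetical helpers, and tidysynthesis may differ in details such as tie-breaking. The choice of log base does not change the resulting order.

```r
# Shannon entropy (in bits) of a categorical vector, computed from the
# observed category proportions. Lower entropy means more concentrated.
shannon_entropy <- function(x) {
  p <- table(x) / length(x)
  p <- p[p > 0]            # drop unused levels so log2(0) never appears
  -sum(p * log2(p))
}

# Order the factor columns of a data frame from least to greatest entropy.
order_by_entropy <- function(data) {
  factor_cols <- names(data)[vapply(data, is.factor, logical(1))]
  entropies <- vapply(data[factor_cols], shannon_entropy, numeric(1))
  names(sort(entropies))
}

# Invented toy data, not the ACS example.
toy <- data.frame(
  a = factor(c("x", "x", "x", "x")),  # one category: 0 bits
  b = factor(c("x", "y", "x", "y")),  # two balanced categories: 1 bit
  c = factor(c("x", "y", "z", "w")),  # four balanced categories: 2 bits
  n = c(1, 2, 3, 4)                   # numeric column, ignored
)

order_by_entropy(toy)  # "a" "b" "c"
```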

The add_sequence_numeric() function adds numeric variables using one of several orderings, chosen with the method argument:

  • "correlation": Ordered from greatest to least Pearson correlation with a specified variable, cor_var.
  • "proportion": Ordered from greatest to least proportion of non-zero values.
  • "weighted total": Ordered from greatest to least weighted total (using weight_var).
  • "absolute weighted total": Ordered from greatest to least absolute value weighted total (using weight_var).
  • "weighted absolute total": Ordered from greatest to least weighted sum of absolute values (using weight_var).

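The last two weighted options differ only in where the absolute value is taken: outside the weighted sum, or on each value before weighting. The base R sketch below computes the "proportion" metric and the three weighted-total metrics on invented toy data and ranks the columns by each. It assumes the weights are a separate vector w standing in for weight_var; rank_columns() and the metric functions are hypothetical illustrations, not tidysynthesis internals.

```r
# Invented toy data: three numeric columns plus a vector of weights.
toy <- data.frame(
  gain = c(-5,  0, 3, 0, 4),
  debt = c(-4, -4, 0, 0, 0),
  rare = c( 0,  0, 0, 0, 9)
)
w <- c(2, 1, 1, 3, 1)

# Illustrative versions of the metrics listed above.
metrics <- list(
  proportion              = function(x) mean(x != 0),      # share non-zero
  weighted_total          = function(x) sum(w * x),        # signed total
  absolute_weighted_total = function(x) abs(sum(w * x)),   # |sum(w * x)|
  weighted_absolute_total = function(x) sum(w * abs(x))   # sum(w * |x|)
)

# Rank columns from greatest to least value of a metric.
rank_columns <- function(data, metric) {
  scores <- vapply(data, metric, numeric(1))
  names(sort(scores, decreasing = TRUE))
}

lapply(metrics, function(m) rank_columns(toy, m))
# proportion:              gain, debt, rare  (0.6 > 0.4 > 0.2)
# weighted_total:          rare, gain, debt  (9 > -3 > -12)
# absolute_weighted_total: debt, rare, gain  (12 > 9 > 3)
# weighted_absolute_total: gain, debt, rare  (17 > 12 > 9)
```

On this toy data the three weighted options rank the same columns in three different orders, which is why the choice of method matters when values can be negative.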
acs_roadmap_step4 <- acs_roadmap_step3 %>%
  add_sequence_numeric(
    where(is.numeric),
    method = "correlation", 
    cor_var = "age",
    na.rm = TRUE
  )

acs_roadmap_step4$visit_sequence
Visit Sequence
Method:Variable
manual:famsize entropy:empstat entropy:hcovany entropy:classwkr correlation:age correlation:famsize correlation:transit_time correlation:inctot
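In the output, age comes first among the correlation-ordered variables, which is consistent with its correlation with itself being 1. The base R sketch below shows the same kind of ordering on invented toy data: numeric columns are ranked from greatest to least Pearson correlation with a reference column. It is an assumption-laden illustration, not the package's implementation: order_by_correlation() is a hypothetical helper, it ranks by the signed correlation (whether tidysynthesis uses signed or absolute values is not stated here), and use = "complete.obs" is only an analogue of the na.rm = TRUE argument above.

```r
# Hypothetical helper: order numeric columns from greatest to least
# Pearson correlation with the reference column `cor_var`.
order_by_correlation <- function(data, cor_var) {
  numeric_cols <- names(data)[vapply(data, is.numeric, logical(1))]
  cors <- vapply(
    data[numeric_cols],
    function(x) cor(x, data[[cor_var]], use = "complete.obs"),
    numeric(1)
  )
  names(sort(cors, decreasing = TRUE))
}

# Invented toy data, not the ACS example.
toy <- data.frame(
  age     = c(20, 30, 40, 50, 60),
  income  = c(10, 25, 28, 45, 50),  # strongly positive with age (about 0.98)
  commute = c(30, 10, 40, 20, 35),  # weakly positive with age (about 0.26)
  kids    = c( 3,  3,  2,  1,  0)   # negative with age
)

order_by_correlation(toy, "age")  # "age" "income" "commute" "kids"
```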