8 visit_sequence
We refer to the order of variables for the sequential synthesis as the visit sequence. The function visit_sequence() creates a visit_sequence S3 object. By default, roadmap() creates a visit sequence based on the ordering of columns in the confidential data.
We strongly recommend using the Tidy API to create visit_sequence objects from a roadmap instance instead of directly creating a visit_sequence instance.
8.1 visit_sequence Construction via Tidy API
To demonstrate how to best construct a visit sequence, we’ll continue our example with the ACS data. First, let us look at the default visit_sequence created by the roadmap constructor:
acs_roadmap_step1 <- roadmap(
  conf_data = acs_conf,
  start_data = acs_start
)
acs_roadmap_step1$visit_sequence
Visit Sequence
Method:Variable
default:hcovany default:empstat default:classwkr default:age default:famsize default:transit_time default:inctot
Each variable has a method associated with its visit sequence placement. The default method corresponds to variables that don’t have a specific visit_sequence placement determined by a user-specified method.
To update this visit_sequence, tidysynthesis has three API calls:
add_sequence_manual(): Specify a manual variable ordering.
add_sequence_factor(): Specify a data-driven ordering for categorical variables.
add_sequence_numeric(): Specify a data-driven ordering for numeric variables.
The add_sequence_manual() function allows you to add variables to the sequence by name, in the order they’re provided to the function. To demonstrate, we add the variable famsize to the sequence:
acs_roadmap_step2 <- acs_roadmap_step1 %>%
  add_sequence_manual(famsize)
acs_roadmap_step2$visit_sequence
Visit Sequence
Method:Variable
manual:famsize default:hcovany default:empstat default:classwkr default:age default:transit_time default:inctot
We see that the first variable is now associated with the manual method while the remaining variables retain their original positions. This allows users to dynamically combine multiple methods for constructing visit sequences when variables have different structural and/or statistical relationships to one another. By design, tidysynthesis’s visit sequence API allows users to focus primarily on the order in which visit sequence methods are applied.
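The effect of a manual step on the ordering can be sketched in a few lines of base R. This is an illustration of the behavior shown in the output above, not tidysynthesis's implementation; the helper move_to_front() is hypothetical and exists only for this sketch.

```r
# Hypothetical helper (not part of tidysynthesis): place the manually named
# variables first, then keep the remaining variables in their existing order.
move_to_front <- function(current_order, manual_vars) {
  stopifnot(all(manual_vars %in% current_order))
  c(manual_vars, setdiff(current_order, manual_vars))
}

# Default ordering from the ACS example above
default_order <- c("hcovany", "empstat", "classwkr", "age",
                   "famsize", "transit_time", "inctot")

new_order <- move_to_front(default_order, "famsize")
print(new_order)
```

Here famsize moves to the front and every other variable keeps its relative position, which mirrors the printed visit sequence for acs_roadmap_step2.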
The add_sequence_factor() function adds categorical variables in order from least to greatest information entropy measured on the confidential data. The first argument can be any <tidyselect> expression from library(tidyselect), for example:
acs_roadmap_step3 <- acs_roadmap_step2 %>%
  add_sequence_factor(where(is.factor))
acs_roadmap_step3$visit_sequence
Visit Sequence
Method:Variable
manual:famsize entropy:empstat entropy:hcovany entropy:classwkr default:age default:transit_time default:inctot
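To make the entropy ordering concrete, the following base R sketch computes Shannon entropy for a few factor columns and sorts them from least to greatest. It assumes the empirical distribution of each column and a base-2 logarithm; the package may use a different log base or treat missing values differently. The function shannon_entropy() and the toy data are made up for this illustration.

```r
# Shannon entropy (in bits) of a categorical vector's empirical distribution.
# Illustrative only; tidysynthesis may use a different log base or NA handling.
shannon_entropy <- function(x) {
  p <- prop.table(table(x))  # empirical category proportions
  p <- p[p > 0]              # drop empty levels so 0 * log(0) never occurs
  -sum(p * log2(p))
}

# Toy data (made up for this sketch, not the ACS example)
toy <- data.frame(
  a = factor(c("x", "x", "x", "x")),  # one category only
  b = factor(c("x", "y", "x", "y")),  # two equally likely categories
  c = factor(c("w", "x", "y", "z"))   # four equally likely categories
)

entropies <- vapply(toy, shannon_entropy, numeric(1))
entropy_order <- names(sort(entropies))  # least to greatest entropy
print(entropy_order)
```

Variables with fewer, more concentrated categories come first in this ordering, and variables spread evenly across many categories come last.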
The add_sequence_numeric() function adds numeric variables using one of several methods, specified by the method argument with the following options:
"correlation": Ordered from greatest to least Pearson correlation with a specified variable, cor_var.
"proportion": Ordered from greatest to least proportion of non-zero values.
"weighted total": Ordered from greatest to least weighted total (using weight_var).
"absolute weighted total": Ordered from greatest to least absolute value of the weighted total (using weight_var).
"weighted absolute total": Ordered from greatest to least weighted sum of absolute values (using weight_var).
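The three weighted-total options differ only in where the absolute value is applied. The base R sketch below follows the descriptions in the list, using made-up values and weights; it is a reading of those descriptions rather than the package's own code.

```r
# Toy values and weights, made up for this sketch
x <- c(-10, 4, 3)
w <- c(2, 1, 1)

# "weighted total": plain weighted sum, sign retained
weighted_total <- sum(w * x)

# "absolute weighted total": absolute value of the weighted sum
absolute_weighted_total <- abs(sum(w * x))

# "weighted absolute total": weighted sum of absolute values
weighted_absolute_total <- sum(w * abs(x))

print(c(weighted_total, absolute_weighted_total, weighted_absolute_total))
```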
acs_roadmap_step4 <- acs_roadmap_step3 %>%
  add_sequence_numeric(
    where(is.numeric),
    method = "correlation",
    cor_var = "age",
    na.rm = TRUE
  )
acs_roadmap_step4$visit_sequence
Visit Sequence
Method:Variable
manual:famsize entropy:empstat entropy:hcovany entropy:classwkr correlation:age correlation:famsize correlation:transit_time correlation:inctot
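As a rough illustration of the "correlation" and "proportion" criteria, the base R sketch below ranks a few numeric columns both ways. The toy data and column names are invented for this example, and it assumes that the correlation ordering uses the signed Pearson correlation rather than its absolute value; check the package documentation if that distinction matters for your data.

```r
# Toy data (made up for this sketch, not the ACS example)
toy <- data.frame(
  age    = c(20, 30, 40, 50, 60),
  income = c(10, 25, 28, 45, 70),
  hours  = c(40, 0, 0, 0, 38),
  debt   = c(5, 0, 0, 2, 1)
)
candidates <- c("income", "hours", "debt")

# "correlation": signed Pearson correlation with the reference variable,
# sorted from greatest to least
cors <- vapply(
  candidates,
  function(v) cor(toy[[v]], toy$age, method = "pearson", use = "complete.obs"),
  numeric(1)
)
correlation_order <- names(sort(cors, decreasing = TRUE))

# "proportion": share of non-zero values, sorted from greatest to least
prop_nonzero <- vapply(
  toy[candidates],
  function(col) mean(col != 0, na.rm = TRUE),
  numeric(1)
)
proportion_order <- names(sort(prop_nonzero, decreasing = TRUE))

print(correlation_order)
print(proportion_order)
```

The two criteria can disagree: in this toy data hours correlates more strongly with age than debt does, but debt has the larger share of non-zero values.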