5 synth_spec
The synth_spec object specifies the process for each conditional synthesis stage. The function synth_spec() creates a synth_spec S3 object that specifies the following kinds of elements:
models: parsnip::model_spec objects used to fit models based on the confidential data.
samplers: functions used to generate new synthetic values based on model output applied to the current starting data.
steps: functions that apply recipes::step_*() functions for feature engineering, data cleaning, and other preprocessing steps.
noises: tidysynthesis::noise S3 objects for incorporating additive noise into synthesized samples.
tuners: specially formatted tidysynthesis named lists for hyperparameter tuning in models.
extractors: parsnip::extract_ functions used to collect and return model information as part of postsynth.
6 synth_spec() Component Mapping Arguments
For each synthesized variable, a model and sampler must be explicitly specified. However, within the synth_spec object itself, all elements are optional to allow for flexible specifications. For example, default_regression_model and default_classification_model may not be necessary if a comprehensive custom_models list is provided. Whether the specification is complete is checked when creating a presynth instance.
Figure: synth_spec evaluation order with presynth.
By default, synth_spec() assigns NULL to each component, so users need only specify the components they require. Checks for whether synth_spec components are sufficiently specified occur only when creating presynth instances.
synth_spec allows users to specify how synthesized variables get mapped to different elements. While the synth_spec() constructor function has many arguments, they follow an explicit pattern:
default_regression_* arguments apply to all numeric variables unless otherwise specified. Used in:
    default_regression_model
    default_regression_steps
    default_regression_sampler
    default_regression_noise
    default_regression_tuner
default_classification_* arguments apply to all categorical variables unless otherwise specified. Used in:
    default_classification_model
    default_classification_steps
    default_classification_sampler
    default_classification_noise
    default_classification_tuner
default_* arguments apply to all variables unless otherwise specified (used in default_extractor).
custom_* arguments apply to specific variables in a named list format described below. Used in:
    custom_models
    custom_steps
    custom_samplers
    custom_noise
    custom_tuners
    custom_extractors
Each custom_* argument takes a list of named lists. Each named list has two elements: "vars", which names the variable or variables that receive the custom element, and the element value itself, stored under a name such as "model" or "sampler". Each set of arguments follows this pattern.
For example, here’s how one might specify different models for our running ACS example data:
# load packages used below
library(parsnip)
library(tidysynthesis)

# define parsnip models
# regression trees
rpart_mod <- decision_tree() |>
  set_engine(engine = "rpart") |>
  set_mode(mode = "regression")

# linear regression
lm_mod <- linear_reg() |>
  set_engine(engine = "lm") |>
  set_mode(mode = "regression")

# classification trees
rpart_class <- decision_tree() |>
  set_engine(engine = "rpart") |>
  set_mode(mode = "classification")

# create example synth_spec
example_synth_spec <- synth_spec(
  # for all variables except 'age', use CART
  default_regression_model = rpart_mod,
  default_classification_model = rpart_class,
  # for the 'age' variable, use linear regression
  custom_models = list(
    list("vars" = c("age"), "model" = lm_mod)
  ),
  # (same for samplers)
  default_regression_sampler = sample_rpart,
  default_classification_sampler = sample_rpart,
  custom_samplers = list(
    list("vars" = c("age"), "sampler" = sample_lm)
  )
)

print(example_synth_spec)
Synthesis specification with user-specified components:
* default_regression_model
* default_classification_model
* custom_models
* default_regression_sampler
* default_classification_sampler
* custom_samplers
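To make the default-versus-custom mapping concrete, here is a minimal base R sketch of how a component could be looked up for a single variable. The helper resolve_component() is hypothetical and is not part of tidysynthesis; the package resolves components internally when building a presynth, and its actual logic may differ. Placeholder strings stand in for model objects, and the variable name "income" is assumed purely for illustration.

```r
# Hypothetical helper (NOT a tidysynthesis function): return the element that
# would apply to one variable, checking custom entries before the default.
resolve_component <- function(var, custom, default, element) {
  for (entry in custom) {
    # each entry is a named list with "vars" and one element value
    if (var %in% entry[["vars"]]) {
      return(entry[[element]])
    }
  }
  # no custom entry names this variable, so fall back to the default
  default
}

# Placeholder strings stand in for parsnip model objects.
custom_models <- list(
  list("vars" = c("age"), "model" = "lm_mod")
)

# 'age' has a custom entry; 'income' (an assumed variable name) does not.
age_model <- resolve_component("age", custom_models, "rpart_mod", "model")
income_model <- resolve_component("income", custom_models, "rpart_mod", "model")

print(age_model)
print(income_model)
```

Under these assumptions, the lookup for "age" yields the custom placeholder and the lookup for "income" falls back to the default, mirroring the "unless otherwise specified" wording above.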
6.1 Using synth_spec with the Tidy API
Each of the elements of synth_spec can be added, updated, or removed in place. update_synth_spec() is used to add, update, or remove all non-custom_* arguments to synth_spec, including:
default_regression_model
default_classification_model
default_regression_steps
default_classification_steps
default_regression_sampler
default_classification_sampler
default_regression_noise
default_classification_noise
default_regression_tuner
default_classification_tuner
default_extractor
invert_transformations
enforce_na
Each custom_* argument has dedicated add_*(), update_*(), and remove_*() API calls.
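As a rough sketch of the update workflow, the snippet below swaps the default regression model and sampler on an existing specification. It assumes that update_synth_spec() takes the synth_spec object as its first argument and the components to change as named arguments matching the constructor; that calling convention is an assumption here, so check the package reference before relying on it. The dedicated custom_* helpers are not shown because their exact names are not given in this section.

```r
library(parsnip)
library(tidysynthesis)

# models reused from the earlier example
rpart_mod <- decision_tree() |>
  set_engine(engine = "rpart") |>
  set_mode(mode = "regression")

lm_mod <- linear_reg() |>
  set_engine(engine = "lm") |>
  set_mode(mode = "regression")

# start from a specification that uses regression trees by default
spec <- synth_spec(
  default_regression_model = rpart_mod,
  default_regression_sampler = sample_rpart
)

# ASSUMPTION: update_synth_spec(<synth_spec>, <name> = <value>, ...) returns
# the modified specification; here the defaults switch to linear regression
spec <- update_synth_spec(
  spec,
  default_regression_model = lm_mod,
  default_regression_sampler = sample_lm
)

print(spec)
```

Because R functions return modified copies, the result is reassigned to spec rather than relying on the original object changing.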