5 synth_spec
The synth_spec object specifies the process for each conditional synthesis stage. The function synth_spec() creates a synth_spec S3 object that specifies the following kinds of elements:
- models: parsnip::model_spec objects used to fit models based on the confidential data.
- samplers: functions used to generate new synthetic values based on model output applied to the current starting data.
- steps: functions that apply recipes::step_*() functions for feature engineering, data cleaning, and other preprocessing steps.
- noises: tidysynthesis::noise S3 objects for incorporating additive noise into synthesized samples.
- tuners: specially formatted tidysynthesis named lists for hyperparameter tuning in models.
- extractors: parsnip::extract_*() functions used to collect and return model information as part of postsynth.
6 synth_spec() Component Mapping Arguments
For each synthesized variable, a model and a sampler must be specified explicitly. Within the synth_spec object itself, however, every element is optional, which allows flexible specifications. For example, default_regression_model and default_classification_model may be unnecessary if custom_models covers every variable. Whether each variable has the components it needs is checked only when a presynth instance is created.
[Figure: synth_spec evaluation order with presynth.]
By default, synth_spec() assigns NULL to each component, so users specify only the components they need. Errors for insufficiently specified components are therefore raised only when presynth instances are created, not when synth_spec() is called.
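To make the NULL-default behaviour concrete, here is a minimal base-R sketch that models a synth_spec as a named list of components. toy_synth_spec() and user_specified() are hypothetical helpers written for illustration only. They are not part of tidysynthesis, and only three components are included. Construction never fails, and listing the non-NULL components mirrors the "user-specified components" that print() reports for the example later in this chapter.

```r
# Hypothetical stand-in for synth_spec(): every component defaults to NULL,
# so any subset of components can be supplied without an error
toy_synth_spec <- function(default_regression_model = NULL,
                           default_classification_model = NULL,
                           custom_models = NULL) {
  spec <- list(
    default_regression_model = default_regression_model,
    default_classification_model = default_classification_model,
    custom_models = custom_models
  )
  structure(spec, class = "toy_synth_spec")
}

# Hypothetical helper: names of the components the user actually supplied
user_specified <- function(spec) {
  supplied <- !vapply(spec, is.null, logical(1))
  names(spec)[supplied]
}

# a character string stands in for a parsnip model object
empty_spec <- toy_synth_spec()
partial_spec <- toy_synth_spec(default_regression_model = "rpart placeholder")

user_specified(empty_spec)    # character(0)
user_specified(partial_spec)  # "default_regression_model"
```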
synth_spec lets users control how synthesized variables are mapped to different components. While the synth_spec() constructor has many arguments, they follow an explicit naming pattern:
- default_regression_* arguments apply to all numeric variables unless otherwise specified. Used in: default_regression_model, default_regression_steps, default_regression_sampler, default_regression_noise, default_regression_tuner.
- default_classification_* arguments apply to all categorical variables unless otherwise specified. Used in: default_classification_model, default_classification_steps, default_classification_sampler, default_classification_noise, default_classification_tuner.
- default_* arguments apply to all variables unless otherwise specified. Used in: default_extractor.
- custom_* arguments apply to specific variables, in the named-list format described below. Used in: custom_models, custom_steps, custom_samplers, custom_noise, custom_tuners, custom_extractors.
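The naming pattern can be illustrated with a small base-R sketch that works out which default_* argument would govern each column of a data frame. default_arg_for() is a hypothetical helper. It assumes that numeric columns count as regression variables and that factor or character columns count as classification variables, and it ignores custom_* overrides. The toy data frame and its values are made up.

```r
# Hypothetical helper: name of the default_* argument that would apply to a
# column, given the component type ("model", "sampler", "steps", ...)
default_arg_for <- function(column, component) {
  family <- if (is.numeric(column)) {
    "regression"
  } else if (is.factor(column) || is.character(column)) {
    "classification"
  } else {
    stop("unsupported column type: ", class(column)[1], call. = FALSE)
  }
  paste0("default_", family, "_", component)
}

# made-up toy data standing in for the ACS example
toy_acs <- data.frame(
  age = c(34, 51, 27),
  sex = factor(c("female", "male", "female")),
  hours = c(40, 35, 20)
)

# age and hours map to default_regression_model,
# sex maps to default_classification_model
vapply(toy_acs, default_arg_for, character(1), component = "model")
```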
Each custom_* argument takes a list of named lists. Each inner list has two elements. The first is "vars", a character vector naming the variable(s) that receive the custom component. The second is the component itself, stored under a name such as "model" (in custom_models) or "sampler" (in custom_samplers). Every custom_* argument follows this pattern.
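The override logic can be sketched the same way. resolve_component() below is hypothetical. It assumes that a variable listed in a custom_* entry uses that entry's component and otherwise falls back to the matching default, and it does not reproduce any validation tidysynthesis performs. Character strings stand in for the parsnip model objects.

```r
# Hypothetical helper: pick the component for `var` from a custom_* list
# (entries of the form list(vars = ..., <element> = ...)), falling back to
# the supplied default when no entry names the variable
resolve_component <- function(var, custom, element, default) {
  for (entry in custom) {
    if (var %in% entry[["vars"]]) {
      return(entry[[element]])
    }
  }
  if (is.null(default)) {
    stop("no ", element, " specified for '", var, "'", call. = FALSE)
  }
  default
}

custom_models <- list(
  list(vars = c("age"), model = "linear regression")
)

resolve_component("age", custom_models, "model", default = "CART")  # "linear regression"
resolve_component("sex", custom_models, "model", default = "CART")  # "CART"

# with no default and no custom entry, the variable is unresolved
tryCatch(
  resolve_component("sex", custom_models, "model", default = NULL),
  error = conditionMessage
)
```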
For example, here’s how one might specify different models for our running ACS example data:
```r
library(parsnip)
library(tidysynthesis)

# define parsnip models
# regression trees
rpart_mod <- decision_tree() |>
  set_engine(engine = "rpart") |>
  set_mode(mode = "regression")

# linear regression
lm_mod <- linear_reg() |>
  set_engine(engine = "lm") |>
  set_mode(mode = "regression")

# classification trees
rpart_class <- decision_tree() |>
  set_engine(engine = "rpart") |>
  set_mode(mode = "classification")

# create example synth_spec
example_synth_spec <- synth_spec(
  # for all variables except 'age', use CART
  default_regression_model = rpart_mod,
  default_classification_model = rpart_class,
  # for the 'age' variable, use linear regression
  custom_models = list(
    list("vars" = c("age"), "model" = lm_mod)
  ),
  # (same for samplers)
  default_regression_sampler = sample_rpart,
  default_classification_sampler = sample_rpart,
  custom_samplers = list(
    list("vars" = c("age"), "sampler" = sample_lm)
  )
)

print(example_synth_spec)
#> Synthesis specification with user-specified components:
#> * default_regression_model
#> * default_classification_model
#> * custom_models
#> * default_regression_sampler
#> * default_classification_sampler
#> * custom_samplers
```
6.1 Using synth_spec with the Tidy API
Each element of a synth_spec can be added, updated, or removed after the object is created. update_synth_spec() adds, updates, or removes any of the non-custom_* arguments, including:
- default_regression_model
- default_classification_model
- default_regression_steps
- default_classification_steps
- default_regression_sampler
- default_classification_sampler
- default_regression_noise
- default_classification_noise
- default_regression_tuner
- default_classification_tuner
- default_extractor
- invert_transformations
- enforce_na
Each custom_* argument has dedicated add_*(), update_*(), and remove_*() API calls.
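The following minimal sketch illustrates add/update/remove semantics on a plain named list. toy_update() is hypothetical, and the sketch assumes, for illustration only, that passing NULL removes a component by resetting it to its unspecified default. Consult the update_synth_spec() documentation for its actual arguments. Because R uses copy-on-modify semantics, the function returns a modified copy and leaves the input untouched.

```r
# Hypothetical stand-in for update_synth_spec(): returns a modified copy
# of `spec`; assigning NULL resets a component to its unspecified default
toy_update <- function(spec, ...) {
  updates <- list(...)
  unknown <- setdiff(names(updates), names(spec))
  if (length(unknown) > 0) {
    stop("unknown component(s): ", paste(unknown, collapse = ", "), call. = FALSE)
  }
  for (nm in names(updates)) {
    # list(NULL) keeps the slot but sets its value to NULL
    spec[nm] <- list(updates[[nm]])
  }
  spec
}

# every component starts unspecified (NULL)
spec <- list(
  default_regression_model = NULL,
  default_classification_model = NULL,
  default_extractor = NULL
)

spec1 <- toy_update(spec, default_regression_model = "CART")                 # add
spec2 <- toy_update(spec1, default_regression_model = "linear regression")   # update
spec3 <- toy_update(spec2, default_regression_model = NULL)                  # remove
```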