5 synth_spec

The synth_spec object specifies the process for each conditional synthesis stage. The function synth_spec() creates a synth_spec S3 object that specifies, for each synthesized variable, the model, steps, sampler, noise, tuner, and extractor to use.

6 synth_spec() Component Mapping Arguments

For each synthesized variable, a model and a sampler must be explicitly specified. However, within the synth_spec object itself, all elements are optional to allow for flexible specifications. For example, default_regression_model and default_classification_model may be unnecessary if a comprehensive custom_models list is provided. Whether every variable has a model and a sampler is checked only when a presynth instance is created.

Figure: synth_spec evaluation order with presynth.

By default, synth_spec() assigns NULL to each component so that users can specify only the components they need. Checks that the components are sufficiently specified occur only when a presynth instance is created.
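The NULL-by-default design can be illustrated with a minimal base-R sketch. This is not the tidysynthesis implementation: make_spec_sketch() and spec_components are hypothetical names, and only a subset of the component names is used. The constructor accepts any subset of known components, fills the rest with NULL, and performs no completeness check, leaving that to a later stage (the role presynth plays in the package).

```r
# Hypothetical sketch, not tidysynthesis code: every component defaults to NULL.
spec_components <- c(
  "default_regression_model", "default_classification_model",
  "default_regression_sampler", "default_classification_sampler",
  "custom_models", "custom_samplers"
)

make_spec_sketch <- function(...) {
  args <- list(...)
  # Reject names that are not known components
  unknown <- setdiff(names(args), spec_components)
  if (length(unknown) > 0) {
    stop("Unknown component(s): ", paste(unknown, collapse = ", "))
  }
  # One NULL slot per component; vector("list", n) creates a list of NULLs
  spec <- setNames(vector("list", length(spec_components)), spec_components)
  for (nm in names(args)) {
    spec[nm] <- list(args[[nm]])  # list() wrapper keeps the slot even for NULL
  }
  structure(spec, class = "spec_sketch")
}

# Only one component is supplied; no error is raised at construction time.
s <- make_spec_sketch(default_regression_model = "rpart")
```

Deferring the checks keeps construction cheap and lets a partial specification be built up step by step before it is validated.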

synth_spec lets users specify how synthesized variables are mapped to different elements. Although the synth_spec() constructor has many arguments, they follow a consistent naming pattern:

  • default_regression_* arguments apply to all numeric variables unless otherwise specified. These are:
    • default_regression_model
    • default_regression_steps
    • default_regression_sampler
    • default_regression_noise
    • default_regression_tuner
  • default_classification_* arguments apply to all categorical variables unless otherwise specified. These are:
    • default_classification_model
    • default_classification_steps
    • default_classification_sampler
    • default_classification_noise
    • default_classification_tuner
  • default_* arguments apply to all variables unless otherwise specified. This pattern is used by default_extractor.
  • custom_* arguments apply to specific variables, using the named-list format described below. These are:
    • custom_models
    • custom_steps
    • custom_samplers
    • custom_noise
    • custom_tuners
    • custom_extractors
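The precedence implied by this pattern (a custom_* entry naming the variable overrides the defaults; otherwise numeric variables take the default_regression_* value and categorical variables take the default_classification_* value) can be sketched in base R. resolve_component(), toy_data, and the string placeholders standing in for parsnip models are all hypothetical; this shows the mapping rule as described, not how tidysynthesis resolves components internally.

```r
# Hypothetical sketch of the mapping rule, not tidysynthesis code.
resolve_component <- function(var, data, default_regression,
                              default_classification, custom = list()) {
  # 1. A custom_* entry that lists the variable takes precedence
  for (entry in custom) {
    if (var %in% entry[["vars"]]) {
      # The element is the entry's non-"vars" slot (e.g. "model" or "sampler")
      return(entry[[setdiff(names(entry), "vars")[1]]])
    }
  }
  # 2. Otherwise choose the default by variable type
  if (is.numeric(data[[var]])) default_regression else default_classification
}

# Hypothetical toy data: two numeric variables and one factor
toy_data <- data.frame(
  age = c(34, 51, 29),
  wages = c(41000, 0, 52000),
  sex = factor(c("female", "male", "female"))
)

custom_models <- list(list(vars = c("age"), model = "lm"))

resolved <- sapply(names(toy_data), resolve_component, data = toy_data,
                   default_regression = "rpart (regression)",
                   default_classification = "rpart (classification)",
                   custom = custom_models)
# resolved: age = "lm", wages = "rpart (regression)", sex = "rpart (classification)"
```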

Each custom_* argument takes a list of named lists. Each inner list has two elements: "vars", a character vector naming the variable(s) that receive the custom element, and the element value itself, named after the component (for example, "model" in custom_models and "sampler" in custom_samplers). The general pattern is:

custom_element = list(
  list(
    "vars" = c("var1", "var2"),
    "element" = element_value_1
  ),
  list(
    "vars" = c("var3"),
    "element" = element_value_2
  )
)
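To make the nested structure concrete, here is a small base-R sketch that flattens a custom_* list into a per-variable lookup. custom_to_lookup() is a hypothetical helper, not part of tidysynthesis, and its choice to raise an error when a variable appears in more than one entry is an assumption of this sketch rather than documented package behavior.

```r
# Hypothetical helper, not part of tidysynthesis: flatten a custom_* list
# into a named list mapping each variable to its element value.
custom_to_lookup <- function(custom) {
  lookup <- list()
  for (entry in custom) {
    # Each entry must be a two-element list with a "vars" slot
    stopifnot(is.list(entry), "vars" %in% names(entry), length(entry) == 2)
    value <- entry[[setdiff(names(entry), "vars")]]
    for (v in entry[["vars"]]) {
      # Assumption of this sketch: a variable may appear in only one entry
      if (v %in% names(lookup)) {
        stop("Variable '", v, "' appears in more than one entry")
      }
      lookup[v] <- list(value)
    }
  }
  lookup
}

custom_element <- list(
  list("vars" = c("var1", "var2"), "element" = "element_value_1"),
  list("vars" = c("var3"), "element" = "element_value_2")
)

lookup <- custom_to_lookup(custom_element)
# var1 and var2 map to "element_value_1"; var3 maps to "element_value_2"
```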

For example, here is how one might specify different models and samplers for our running ACS example data:

library(tidyverse)
library(tidymodels)
library(tidysynthesis)
# define parsnip models
# regression trees 
rpart_mod <- decision_tree() |>
  set_engine(engine = "rpart") |>
  set_mode(mode = "regression")

# linear regression
lm_mod <- linear_reg() |> 
  set_engine(engine = "lm") |> 
  set_mode(mode = "regression")

# classification trees
rpart_class <- decision_tree() |>
  set_engine(engine = "rpart") |>
  set_mode(mode = "classification")

# create example synth_spec
example_synth_spec <- synth_spec(
  # for all variables except 'age', use CART
  default_regression_model = rpart_mod,
  default_classification_model = rpart_class, 
  # for the 'age' variable, use linear regression
  custom_models = list(
    list("vars" = c("age"), "model" = lm_mod)
  ),
  # likewise for samplers: rpart sampling by default, lm sampling for 'age'
  default_regression_sampler = sample_rpart,
  default_classification_sampler = sample_rpart,
  custom_samplers = list(
    list("vars" = c("age"), "sampler" = sample_lm)
  )
)

print(example_synth_spec)
Synthesis specification with user-specified components: 
* default_regression_model
* default_classification_model
* custom_models
* default_regression_sampler
* default_classification_sampler
* custom_samplers
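The printed summary lists only the components the user supplied. Assuming those are exactly the components that are not NULL (an assumption about print(), consistent with the output above but not confirmed here), the same list can be derived from a plain R list with Filter(). The spec object below is a hypothetical stand-in using strings in place of parsnip models and sampler functions.

```r
# Hypothetical stand-in for a synth_spec: unset components are NULL.
# list() keeps NULL entries when they are written out explicitly.
spec <- list(
  default_regression_model = "rpart",
  default_classification_model = "rpart",
  default_regression_steps = NULL,
  custom_models = list(list(vars = "age", model = "lm"))
)

# Keep only non-NULL components, then take their names
user_specified <- names(Filter(Negate(is.null), spec))
writeLines(c("User-specified components:", paste("*", user_specified)))
```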

6.1 Using synth_spec with the Tidy API

Each element of a synth_spec can be added, updated, or removed after the object is created. update_synth_spec() adds, updates, or removes any of the non-custom_* arguments, including:

  • default_regression_model
  • default_classification_model
  • default_regression_steps
  • default_classification_steps
  • default_regression_sampler
  • default_classification_sampler
  • default_regression_noise
  • default_classification_noise
  • default_regression_tuner
  • default_classification_tuner
  • default_extractor
  • invert_transformations
  • enforce_na

Each custom_* argument has dedicated add_*(), update_*(), and remove_*() API calls.