7 Hyperparameter Tuning
The performance of many predictive models is shaped by the choice of hyperparameters. The process of selecting hyperparameters is called hyperparameter tuning. Here, we focus on empirical hyperparameter tuning using \(v\)-fold cross-validation to optimize a user-specified metric.
Tuning hyperparameters for each variable in a visit sequence is a high-dimensional task. This chapter demonstrates how library(tidysynthesis) can simplify the process of tuning the predictive models used to improve syntheses.
7.1 Specifying Tuners
construct_tuners() creates one hyperparameter tuning object for each variable in the visit sequence. The tuner expects several parameters:
- v is the number of folds to use in v-fold cross-validation.
- grid is the number of unique candidate parameter sets to create.
- metrics is a metric_set object from library(yardstick).
The construct_tuners() function follows a similar design to construct_recipes(), construct_algos(), and construct_samplers(). There are three approaches to creating tuners.
- Use the same tuner for all variables.
- Use a default tuner and manually override it for specific variables.
- Manually specify the tuner for each variable.
7.1.1 Approach 1
Only specify a default_tuner to use the same tuning grid for all variables in the visit sequence.
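A minimal sketch of Approach 1, assuming a roadmap object already exists (the grid settings here are illustrative values, not defaults):

```r
# Approach 1: one default tuner shared by every variable in the visit sequence
tuners1 <- construct_tuners(
  roadmap = roadmap,
  default_tuner = list(
    v = 3,                                            # 3-fold cross-validation
    grid = 5,                                         # 5 candidate parameter sets
    metrics = yardstick::metric_set(yardstick::rmse)  # optimize RMSE
  )
)
```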
7.1.2 Approach 2
This example uses a small tuning grid for bill_length_mm and flipper_length_mm, and a big tuning grid for body_mass_g and bill_depth_mm.
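A sketch of Approach 2, assuming the same roadmap: default_tuner supplies the small grid, and custom_tuners overrides it for the two variables that get the bigger grid (grid sizes are illustrative):

```r
# Approach 2: a default tuner with per-variable overrides
small_grid <- list(
  v = 3,
  grid = 5,
  metrics = yardstick::metric_set(yardstick::rmse)
)

big_grid <- list(
  v = 3,
  grid = 10,  # more candidate parameter sets than the default
  metrics = yardstick::metric_set(yardstick::rmse)
)

tuners2 <- construct_tuners(
  roadmap = roadmap,
  default_tuner = small_grid,
  custom_tuners = list(
    "body_mass_g" = big_grid,
    "bill_depth_mm" = big_grid
  )
)
```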
7.1.3 Approach 3
It is also possible to manually specify the tuner for each variable by setting default_tuner = NULL and listing every variable in custom_tuners. In the following example, bill_length_mm and flipper_length_mm use a small tuning grid, while body_mass_g and bill_depth_mm use a bigger tuning grid.
grid_values <- list(
  v = 3,
  grid = 5,
  metrics = yardstick::metric_set(yardstick::rmse)
)

# a bigger tuning grid; grid = 10 is an illustrative value
big_grid_values <- list(
  v = 3,
  grid = 10,
  metrics = yardstick::metric_set(yardstick::rmse)
)

tuners3 <- construct_tuners(
  roadmap = roadmap,
  default_tuner = NULL,
  custom_tuners = list(
    "bill_length_mm" = grid_values,
    "flipper_length_mm" = grid_values,
    "body_mass_g" = big_grid_values,
    "bill_depth_mm" = big_grid_values
  )
)
7.2 Mapping Tuners to Models
Only some algorithms use hyperparameters. Tuners need to be associated with specific hyperparameters in the tidymodels framework using library(tune). To use one of these algorithms with hyperparameter tuning, replace the hyperparameter value with tune::tune() when specifying an algorithm.
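For example, a decision tree specification from library(parsnip) can flag the cost-complexity parameter for tuning. This sketch mirrors the rpart_mod object used in the code below; the engine and mode are assumptions based on that example:

```r
# mark cost_complexity as a tunable hyperparameter rather than fixing a value
rpart_mod <- parsnip::decision_tree(cost_complexity = tune::tune()) |>
  parsnip::set_engine("rpart") |>
  parsnip::set_mode("regression")
```

During synthesis, the tune::tune() placeholder is replaced by the candidate values generated from the tuning grid.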
synthesize() will use cross-validation to pick the hyperparameters with the best cross-validated performance for each variable. Finally, it will train the model on all of the data using those optimal hyperparameters.
Now, we can create the synth_spec object and the presynth object, and then synthesize.
7.3 Extracting Hyperparameters
synth_spec includes an argument called extract. extract supports extraction functions from library(workflows) and can be used to pull information from fitted models.
Here, we run the last few steps with workflows::extract_fit_parsnip() and then review the optimal hyperparameters for each variable.
synth_spec <- synth_spec(
  roadmap = roadmap,
  synth_algorithms = rpart_mod,
  recipes = construct_recipes(roadmap = roadmap),
  predict_methods = sample_rpart,
  tuners = tuners3,
  extract = workflows::extract_fit_parsnip
)

# presynth
presynth <- presynth(
  roadmap = roadmap,
  synth_spec = synth_spec
)

set.seed(1)
synth <- synthesize(presynth)

# pull the tuned cost-complexity (cp) value from each fitted rpart model
purrr::map_dbl(synth[["extractions"]], ~.x[["fit"]][["control"]][["cp"]])

   bill_length_mm flipper_length_mm       body_mass_g     bill_depth_mm
     1.026118e-02      4.592906e-03      2.268017e-10      7.321297e-08
7.4 Conclusion
Hyperparameter tuning is important for predictive modeling, and therefore for generating synthetic data with predictive models. The impacts of hyperparameter tuning on utility and disclosure risks are not well documented. Tuning with cross-validation could reduce disclosure risk, but this should be evaluated on a case-by-case basis.