14  noise

In tidysynthesis, sampler elements dictate how new synthetic records are drawn from models. noise elements dictate how additional randomization can optionally be injected on top of the randomness that sampling new records already introduces.

14.1 noise components

The function noise() creates a noise S3 object with the following properties:

  • add_noise: required Boolean, either TRUE or FALSE
  • mode: required string, either "regression" or "classification"
  • noise_func: a function that applies the noise transformation.

Each noise_func has the following arguments:

  • model: a model fit object from parsnip
  • new_data: a data.frame containing the working synthetic data.
  • conf_model_data: a data.frame containing the confidential data used for modeling purposes.
  • outcome_var: a string giving the name of the new variable.
  • col_schema: a col_schema named list element matching the schema.
  • pred: a vector of predicted values from the sampling application.

Additional arguments provided to noise() will be passed to noise_func.
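
As a minimal sketch of the argument list above, a user-written noise function might look like the following. The name my_noise_func and the extra argument sd are hypothetical, and the sketch assumes that a noise_func returns the predictions with noise added. It uses only base R and placeholder inputs rather than a real parsnip fit.

```r
# Hypothetical custom noise function; argument names follow the list above.
# `sd` is an illustrative extra argument, not part of tidysynthesis.
my_noise_func <- function(model, new_data, conf_model_data,
                          outcome_var, col_schema, pred, sd = 1, ...) {
  # Assumption: the function returns the predictions with noise added.
  stopifnot(is.numeric(pred), length(pred) == nrow(new_data))
  pred + stats::rnorm(n = length(pred), mean = 0, sd = sd)
}

# Toy call with placeholder inputs; a real call would receive a fitted
# parsnip model and the actual confidential data.
set.seed(1)
toy_pred <- c(2, 4, 6, 8, 10)
noised <- my_noise_func(
  model = NULL,
  new_data = data.frame(x = 1:5),
  conf_model_data = data.frame(x = 1:5, y = toy_pred),
  outcome_var = "y",
  col_schema = list(),
  pred = toy_pred,
  sd = 0.5
)
```

If noise() accepts its properties as named arguments (an assumption based on the property list above), such a function could be supplied as noise_func, with sd passed along as an additional argument.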

14.2 noise_func examples provided by tidysynthesis

tidysynthesis provides the following example noise functions.

  • add_noise_gaussian: add independent Gaussian noise with mean zero and constant variance to numeric variables.
  • add_noise_laplace: add independent Laplace noise with mean zero and constant variance to numeric variables.
  • add_noise_cat_unif: add independent uniform noise from a categorical distribution to categorical variables.
  • add_noise_kde: add dependent Gaussian noise with mean zero and variance estimated from a kernel density estimate bandwidth for numeric variables.
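
To illustrate what "mean zero and constant variance" means for the Gaussian and Laplace cases, here is a base R sketch. It is not the tidysynthesis implementation; the helper names noise_gaussian and noise_laplace and the variance argument are hypothetical. The Laplace draw uses inverse-CDF sampling, since base R has no built-in Laplace generator.

```r
# Zero-mean Gaussian noise with constant variance (illustrative helper).
noise_gaussian <- function(pred, variance) {
  pred + stats::rnorm(length(pred), mean = 0, sd = sqrt(variance))
}

# Zero-mean Laplace noise with constant variance (illustrative helper).
# A Laplace distribution with scale b has variance 2 * b^2,
# so the scale for a target variance is b = sqrt(variance / 2).
noise_laplace <- function(pred, variance) {
  b <- sqrt(variance / 2)
  # Inverse-CDF sampling from a uniform draw on (-0.5, 0.5).
  u <- stats::runif(length(pred), min = -0.5, max = 0.5)
  pred - b * sign(u) * log(1 - 2 * abs(u))
}

set.seed(42)
pred <- c(10, 20, 30)
noise_gaussian(pred, variance = 1)
noise_laplace(pred, variance = 1)
```

For the same variance, Laplace noise has heavier tails than Gaussian noise, so large perturbations are more likely.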