6  start_method

We may want to randomly modify start_data as part of the synthesis process, for example to strengthen the privacy protection of the synthetic data. These modifications can come from resampling, noise injection, or other model-based and sampling-based procedures. The start_method argument in roadmap specifies an initial stage of the synthesis process dedicated to this randomization.

library(tidyverse)
library(tidysynthesis)

First, we must define a basic roadmap. We will continue to use our running ACS example:

# create example start_data and a roadmap
acs_start <- acs_conf_nw |>
  select(county, gq) 

acs_roadmap <- roadmap(
  conf_data = acs_conf_nw,
  start_data = acs_start
)

glimpse(acs_start)
Rows: 1,500
Columns: 2
$ county <fct> Other, Other, Other, Other, Douglas, Lancaster, Other, Sarpy, O…
$ gq     <fct> Household, Household, Household, Household, Household, Househol…

6.1 Example start_method functions

We specify start methods with the start_method S3 object. Its constructor has one required argument, start_func: a function that accepts a data.frame and returns a data.frame with the same number of columns. The number of rows in the result may differ from the input and can vary randomly.
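To make the contract for start_func concrete, here is a minimal sketch of a custom start function written in base R. The name `start_bootstrap` and its `n` argument are hypothetical and not part of tidysynthesis; the sketch also assumes the data is supplied as the first argument, as it is for start_resample().

```r
# Hypothetical custom start function: a simple bootstrap of the rows.
# `start_bootstrap` and `n` are illustrative names, not tidysynthesis API.
start_bootstrap <- function(start_data, n = nrow(start_data)) {
  stopifnot(is.data.frame(start_data), n >= 1)
  # sample row indices with replacement
  idx <- sample.int(nrow(start_data), size = n, replace = TRUE)
  out <- start_data[idx, , drop = FALSE]
  rownames(out) <- NULL
  out
}

# toy data standing in for start_data
toy_start <- data.frame(
  county = factor(c("Other", "Douglas", "Other", "Sarpy")),
  gq = factor(c("Household", "Household", "Institution", "Household"))
)

set.seed(1)
toy_new <- start_bootstrap(toy_start, n = 10)
```

The returned data.frame keeps the same columns as the input, while the number of rows is controlled by `n`.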

tidysynthesis provides the example function start_resample(), which resamples from an (optionally noisy) empirical distribution of frequencies for different observed values in the start_data. Below we demonstrate its functionality:

# original start data
table(acs_start)
           gq
county      Other GQ Household Institution
  Other           18       822          27
  Douglas         10       312          11
  Lancaster        9       163           4
  Sarpy            1       123           0

# resample new start_data
acs_start_new <- start_resample(
  start_data = acs_start, # data to resample
  n = 1000, # number of values to resample
  inv_noise_scale = 1.0, # adds geometric noise to the empirical frequencies
  support = "all" # include all combinations of factor levels
)

table(acs_start_new)
           gq
county      Other GQ Household Institution
  Other           11       523          13
  Douglas          5       229          12
  Lancaster        3       125           4
  Sarpy            5        70           0
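The idea of resampling from a noisy empirical distribution can be sketched in base R. This is an illustration only, not the tidysynthesis implementation: the function `noisy_resample` is hypothetical, and it assumes that "geometric noise" means two-sided geometric noise (the difference of two geometric draws) whose spread shrinks as `inv_noise_scale` grows.

```r
# Illustrative sketch only; NOT the tidysynthesis implementation.
noisy_resample <- function(data, n, inv_noise_scale = NULL) {
  # cell counts over every combination of factor levels (zeros included)
  cells <- as.data.frame(table(data))
  counts <- cells$Freq

  if (!is.null(inv_noise_scale)) {
    # assumed noise model: two-sided geometric noise,
    # built as the difference of two geometric draws
    p <- 1 - exp(-inv_noise_scale)
    noise <- rgeom(length(counts), p) - rgeom(length(counts), p)
    counts <- pmax(counts + noise, 0) # counts cannot be negative
  }
  stopifnot(sum(counts) > 0)

  # draw n cells with probability proportional to the (noisy) counts
  idx <- sample.int(nrow(cells), size = n, replace = TRUE, prob = counts)
  out <- cells[idx, names(data), drop = FALSE]
  rownames(out) <- NULL
  out
}

# toy data standing in for start_data
toy <- data.frame(
  county = factor(rep(c("Other", "Douglas", "Sarpy"), times = c(30, 15, 5))),
  gq = factor(rep(c("Household", "Institution"), times = c(45, 5)))
)

set.seed(20240101)
toy_new <- noisy_resample(toy, n = 8, inv_noise_scale = 1.0)
```

Because the counts are tabulated over all combinations of factor levels, a combination that is empty in the original data can receive positive weight after noise is added, which is the behavior one would expect from an "all" support.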

6.2 Using start_method with the tidysynthesis API

To use a start method in a synthesis, we must first create a start_method object. Every argument passed to the constructor other than start_func is forwarded to start_func as an optional argument. Here, we replicate the example from above:

my_start_method <- start_method(
  start_func = start_resample,
  # optional arguments passed to `start_func`
  n = 1000,
  inv_noise_scale = 1.0,
  support = "all" 
)

# display `start_method` object properties
my_start_method
Start Method: User-Specified 
Keyword Arguments: 
n: 1000
inv_noise_scale: 1
support: all
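The pattern of storing extra arguments and forwarding them later can be sketched with a toy constructor in base R. The functions `toy_start_method`, `run_toy_start_method`, and `take_head` are hypothetical stand-ins that only mimic the pattern; they are not how tidysynthesis is implemented.

```r
# Toy stand-in for the constructor; NOT the tidysynthesis implementation.
toy_start_method <- function(start_func, ...) {
  stopifnot(is.function(start_func))
  # keep the function and any extra named arguments for later use
  list(start_func = start_func, kwargs = list(...))
}

# Toy executor: call the stored function on the data plus the stored arguments.
run_toy_start_method <- function(method, start_data) {
  do.call(method$start_func, c(list(start_data), method$kwargs))
}

# hypothetical start function that keeps the first `n` rows
take_head <- function(start_data, n = 5) head(start_data, n)

m <- toy_start_method(start_func = take_head, n = 3)
res <- run_toy_start_method(m, data.frame(x = 1:10))
```

Separating construction from execution in this way lets the same stored method be applied later to whatever start_data the roadmap holds.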

There are a few ways to associate a start_method with a roadmap. One option is to pass the start_method directly to the roadmap constructor:

# Option 1: use the `roadmap` constructor
roadmap_w_start_data <- roadmap(
  conf_data = acs_conf_nw,
  start_data = acs_start,
  start_method = my_start_method
)

Another option is to use the API calls add_start_method() or update_start_method(). add_start_method() takes an existing start_method object, while update_start_method() takes start_func and its optional arguments directly:

# Option 2: use `add_start_method()`
roadmap_w_start_data <- acs_roadmap %>%
  add_start_method(my_start_method)

# Option 3: pass arguments directly to `update_start_method()`
roadmap_w_start_data <- acs_roadmap %>%
  update_start_method(
    start_func = start_resample,
    n = 1000,
    inv_noise_scale = 1.0,
    support = "all" 
  )