6 start_method
For various reasons, such as increasing the privacy protection of the synthetic data, we may want to randomly modify start_data as part of the synthesis process. We can make these modifications through resampling, noise injection, or other modeling- and sampling-based procedures. We can use the start_method argument in roadmap to specify an initial stage of the synthesis process dedicated to this randomization.
First, we must define a basic roadmap. We will continue to use our running ACS example:
# create example start_data and a roadmap
acs_start <- acs_conf_nw |>
  select(county, gq)
acs_roadmap <- roadmap(
  conf_data = acs_conf_nw,
  start_data = acs_start
)
glimpse(acs_start)
Rows: 1,500
Columns: 2
$ county <fct> Other, Other, Other, Other, Douglas, Lancaster, Other, Sarpy, O…
$ gq <fct> Household, Household, Household, Household, Household, Househol…
6.1 Example start_method functions
We specify start methods with the start_method S3 object. The object has one required argument, start_func, a function that accepts a data.frame and returns a data.frame with the same number of columns. The number of rows may vary randomly.
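To make this contract concrete, here is a minimal sketch of a custom start function in base R. The name start_bootstrap, its n argument, and the toy data are hypothetical and not part of tidysynthesis; the sketch also assumes the start data arrives as the first argument, named start_data as in start_resample().

```r
# Hypothetical custom start function (not part of tidysynthesis):
# bootstrap-resample the rows of the start data.
start_bootstrap <- function(start_data, n = nrow(start_data)) {
  stopifnot(is.data.frame(start_data), n >= 1)
  # draw row indices with replacement
  idx <- sample.int(nrow(start_data), size = n, replace = TRUE)
  resampled <- start_data[idx, , drop = FALSE]
  rownames(resampled) <- NULL
  resampled
}

# toy data standing in for the ACS start_data
set.seed(20240101)
toy_start <- data.frame(
  county = factor(c("Other", "Douglas", "Lancaster", "Sarpy", "Other")),
  gq = factor(c("Household", "Household", "Institution", "Household", "Other GQ"))
)

# columns are preserved; the number of rows is set by `n`
toy_new <- start_bootstrap(toy_start, n = 8)
dim(toy_new)
```

The output keeps the same columns and factor levels as the input, while the number of rows is free to differ, which is the only shape requirement stated above.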
tidysynthesis provides the example function start_resample(), which resamples from an (optionally noisy) empirical distribution of frequencies for different observed values in the start_data. Below we demonstrate its functionality:
# tabulate the original start_data
table(acs_start)
           gq
county      Other GQ Household Institution
  Other           18       822          27
  Douglas         10       312          11
  Lancaster        9       163           4
  Sarpy            1       123           0
# resample new start_data
acs_start_new <- start_resample(
  start_data = acs_start,  # data to resample
  n = 1000,                # number of values to resample
  inv_noise_scale = 1.0,   # adds geometric noise to the empirical frequencies
  support = "all"          # include all combinations of factor levels
)
table(acs_start_new)
           gq
county      Other GQ Household Institution
  Other           11       523          13
  Douglas          5       229          12
  Lancaster        3       125           4
  Sarpy            5        70           0
6.2 Using start_method with the tidysynthesis API
To use a start_method in a synthesis, we must create the start_method object. All constructor arguments other than start_func are passed on to start_func as optional arguments. Here, we replicate the example from above:
my_start_method <- start_method(
  start_func = start_resample,
  # optional arguments passed to `start_func`
  n = 1000,
  inv_noise_scale = 1.0,
  support = "all"
)
# display `start_method` object properties
my_start_method
Start Method: User-Specified
Keyword Arguments:
n: 1000
inv_noise_scale: 1
support: all
There are a few different ways to associate a start_method with a roadmap. One option is to pass a start_method directly into the roadmap constructor:
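A minimal sketch of this first option, assuming the start_method argument of roadmap() named at the start of this chapter, and continuing the session with the objects acs_conf_nw, acs_start, and my_start_method defined in the earlier code:

```r
# Option 1: pass the `start_method` object to the `roadmap()` constructor
# (assumes the constructor argument is named `start_method`, as described above)
roadmap_w_start_data <- roadmap(
  conf_data = acs_conf_nw,
  start_data = acs_start,
  start_method = my_start_method
)
```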
Another option is to use the API calls add_start_method or update_start_method to associate the start_method with the roadmap.
# Option 2: use `add_start_method()`
roadmap_w_start_data <- acs_roadmap %>%
  add_start_method(my_start_method)
# Option 3: pass arguments directly to `update_start_method()`
roadmap_w_start_data <- acs_roadmap %>%
  update_start_method(
    start_func = start_resample,
    n = 1000,
    inv_noise_scale = 1.0,
    support = "all"
  )