4  roadmap

As discussed in the minimal example, roadmap objects describe the input and output data, their properties, and the macroscopic strategies for synthesizing your data. This section outlines the functionality available in roadmap.

4.1 roadmap Arguments

All syntheses start with a roadmap object, which is a container with information about the order of operations for a synthesis. roadmap() creates a roadmap S3 object and accepts several arguments for modifying its behavior.

As a reminder, roadmap’s constructor requires two inputs:

  • conf_data: A data frame with the confidential data used to generate the synthetic data. The resulting synthetic data will have the same number of columns as conf_data.
  • start_data: A data frame with a strict subset of variables from conf_data, which is used to start the synthesis process. The resulting synthetic data will have the same number of rows as start_data.
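
The two required inputs can be sketched as follows. This is a minimal illustration, not taken from the package documentation: the toy data frame and its column names (region, age, income) are made up for the example, and the call to roadmap() runs only if tidysynthesis is installed.

```r
# Hypothetical confidential data: three variables, four records.
conf_data <- data.frame(
  region = factor(c("north", "south", "north", "south")),
  age    = c(34, 51, 27, 45),
  income = c(52000, 61000, 38000, 70000)
)

# start_data holds a strict subset of conf_data's variables (here only region).
start_data <- conf_data[, "region", drop = FALSE]

# Checks mirroring the requirements described above: every start_data
# variable appears in conf_data, and at least one variable is left to synthesize.
stopifnot(
  all(names(start_data) %in% names(conf_data)),
  ncol(start_data) < ncol(conf_data)
)

# Build the roadmap with its two required arguments, if the package is available.
if (requireNamespace("tidysynthesis", quietly = TRUE)) {
  rmap <- tidysynthesis::roadmap(conf_data = conf_data, start_data = start_data)
}
```

With these inputs, the resulting synthetic data would have four rows (matching start_data) and three columns (matching conf_data).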

roadmap also offers the following optional arguments:

  • start_method: An object that is executed prior to running a synthesis. start_method objects modify the start_data, typically randomly, to provide greater disclosure risk protections. By default, the start_data is used unmodified in the final synthetic data output.

  • schema: An object that handles data type information about each column in the confidential data. schema objects can also modify data types, missing data definitions, and factor level definitions. By default, the schema is inferred from conf_data.

  • visit_sequence: An object that specifies the order of synthesis for a sequential synthesis. visit_sequence objects can be specified manually or derived from the data. By default, the visit_sequence uses the order in which variables appear in conf_data.

  • replicates: An object that controls the strategy and frequency for producing multiple syntheses. tidysynthesis lets you generate multiple replicates of the start data, conditional syntheses, and/or the entire end-to-end synthesis process. By default, the replicates object only creates one synthetic dataset with the same number of rows as start_data and the same number of columns as conf_data.

  • constraints: An object that defines constraints for the synthetic data and strategies for enforcing constraints during the synthesis process. These constraints can limit numeric variables to specific ranges or define allowed or forbidden levels for factor variables. By default, no constraints are implemented.

4.2 roadmap Tidy API

The required arguments for roadmap, conf_data and start_data, cannot be modified without creating a new roadmap instance. However, the remaining arguments can be updated using the provided Tidy API.

Check Individual Pages for API Best Practices

We recommend reading the individual “roadmap Components” documentation pages to learn best practices for using the Tidy API.

The Tidy API provides the following functions:

  • add_*(roadmap, ...) functions can be used to add components to a roadmap:
    • add_start_method(roadmap, start_method)
    • add_schema(roadmap, schema)
    • add_visit_sequence(roadmap, visit_sequence)
    • add_replicates(roadmap, replicates)
    • add_constraints(roadmap, constraints)
  • update_*(roadmap, ...) functions can be used to modify component arguments in a roadmap:
    • update_start_method(roadmap, ...)
    • update_schema(roadmap, ...)
    • update_replicates(roadmap, ...)
    • update_constraints(roadmap, ...)
  • reset_*(roadmap) functions can be used to reset component arguments to their default values in a roadmap:
    • reset_start_method(roadmap)
    • reset_schema(roadmap)
    • reset_visit_sequence(roadmap)
    • reset_replicates(roadmap)
    • reset_constraints(roadmap)
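
The update-then-reset pattern listed above might look like the sketch below. It makes several assumptions: the toy data are made up, the argument name end_to_end_replicates is an assumption about the replicates component that is not confirmed on this page, and each Tidy API function is assumed to return the modified roadmap. The package calls run only if tidysynthesis is installed.

```r
# Hypothetical inputs, as in any roadmap: confidential data and a start subset.
conf_data <- data.frame(
  region = factor(c("north", "south", "north", "south")),
  age    = c(34, 51, 27, 45),
  income = c(52000, 61000, 38000, 70000)
)
start_data <- conf_data[, "region", drop = FALSE]

# Record whether the package is available so the sketch runs either way.
api_available <- requireNamespace("tidysynthesis", quietly = TRUE)

if (api_available) {
  library(tidysynthesis)

  # conf_data and start_data are fixed once the roadmap is created.
  rmap <- roadmap(conf_data = conf_data, start_data = start_data)

  # ASSUMPTION: end_to_end_replicates is an argument of the replicates
  # component; check the replicates documentation page for the actual names.
  rmap <- update_replicates(rmap, end_to_end_replicates = 3)

  # Return the replicates component to its default values.
  rmap <- reset_replicates(rmap)
}
```

Reassigning the result at each step keeps the example explicit about which roadmap is being modified; the same calls could also be chained with a pipe.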