2 The tidysynthesis philosophy

tidysynthesis is a “metapackage” for creating synthetic data sets for statistical disclosure control or limitation (SDC/SDL) that shares the underlying design philosophy, grammar, and data structures of the tidyverse and tidymodels.

2.1 For Users

tidysynthesis is designed around a few key principles that benefit users. tidysynthesis should:

  1. Be designed for humans.
  2. Give users the full predictive modeling toolkit available in tidymodels.
  3. Embrace small, clear functions over larger functions with many arguments.
  4. Encourage the reuse of objects so small changes in synthesis specifications require minimal code changes.
  5. Catch mistakes early, before computation begins, to avoid wasted time.
  6. Include robust and comprehensive documentation.

2.1.1 1. Human-centered design

Our main goal for tidysynthesis is simple. We want users to inform the design of tidysynthesis.

Human time is often more expensive than computing time. When faced with difficult design decisions, tidysynthesis prioritizes clarity and user-friendliness over computational speed and brevity. tidysynthesis’s modular design makes it easier for developers to incorporate user feedback, and easier for users to integrate the package into their own data science workflows and extend it.

2.1.2 2. tidymodels

tidysynthesis synthesizes data sets using a sequence of predictive models. The more predictive modeling tools available to users, the better the synthesis process can be.

tidymodels is a collection of packages for modeling and machine learning using tidyverse principles. It supports the full predictive modeling workflow and offers a common interface to a wide range of predictive modeling packages in R.

tidysynthesis aims to leverage the extensive work of the tidymodels developers. tidysynthesis contains regularized regression models because tidymodels contains regularized regression models. tidysynthesis contains feature and target engineering because tidymodels contains feature and target engineering. Simply put, tidysynthesis inherits the strengths of the tidymodels ecosystem, giving users many ways to synthesize data.

We highly recommend learning more about tidymodels from the tidymodels tutorial and Tidy Modeling with R.

2.1.3 3. Functional

R is a functional programming language. tidysynthesis uses small, clear functions to change the behavior of syntheses instead of YAML headers or configuration files. This design supports principles 4 and 5: small changes to a synthesis require minimal code changes, and errors can be caught earlier in the process.

2.1.4 4. Reuse objects

Users often want to test multiple approaches as they experiment with and fine-tune their syntheses. To support this, tidysynthesis offers a flexible API designed around reusing code across syntheses. Examples of this kind of reuse appear throughout the documentation.
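The reuse pattern can be illustrated outside R. The following is a minimal, language-agnostic sketch in Python, not the tidysynthesis API: it assumes a hypothetical immutable `SynthSpec` object and hypothetical helpers `with_model()` and `with_noise()`, and the model labels `"cart"` and `"lasso"` are placeholders. It shows how small, single-purpose functions return modified copies, so one base specification can be reused across experiments without being altered.

```python
from dataclasses import dataclass, replace
from typing import Tuple


# Hypothetical, immutable synthesis specification (NOT the tidysynthesis API).
@dataclass(frozen=True)
class SynthSpec:
    visit_sequence: Tuple[str, ...]  # order in which variables are synthesized
    model: str = "cart"              # placeholder label for the default model
    noise: bool = False              # whether to add noise to predictions


def with_model(spec: SynthSpec, model: str) -> SynthSpec:
    """Small, single-purpose function: return a copy with a different model."""
    return replace(spec, model=model)


def with_noise(spec: SynthSpec, noise: bool = True) -> SynthSpec:
    """Small, single-purpose function: return a copy with noise toggled."""
    return replace(spec, noise=noise)


# One base specification reused across several experiments.
base = SynthSpec(visit_sequence=("age", "income", "wealth"))
lasso_spec = with_model(base, "lasso")
noisy_lasso_spec = with_noise(lasso_spec)
```

Because each helper returns a new object, `base` is left unchanged and can seed any number of variations; trying a new idea costs one extra line rather than a rewritten configuration.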

2.1.5 5. Lazy computation

Users make mistakes. We all do; it is inevitable. We designed tidysynthesis to minimize the chance of a user making a mistake and to maximize the chance of catching a mistake early. Catching mistakes early matters because data synthesis can be computationally expensive.

tidysynthesis uses lazy evaluation, so no data-intensive computation happens until the user calls the synthesize() function. tidysynthesis functions perform robust input checks so that errors are caught before synthesize() is ever called.
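The general idea of cheap, eager validation combined with deferred, expensive computation can be sketched as follows. This is a minimal Python illustration under stated assumptions, not tidysynthesis code: the class `LazySynthesis`, its checks, and its `steps_run` counter are hypothetical, and the per-variable "model" is a trivial mean placeholder standing in for real predictive models.

```python
from typing import Dict, List


class LazySynthesis:
    """Hypothetical illustration of lazy evaluation with eager input checks.

    Constructing the object only validates and records the plan; no
    data-intensive work happens until synthesize() is called.
    """

    def __init__(self, data: Dict[str, List[float]], visit_sequence: List[str]):
        # Eager, cheap checks: fail before any expensive computation starts.
        missing = [v for v in visit_sequence if v not in data]
        if missing:
            raise ValueError(f"variables not found in data: {missing}")
        if len(set(visit_sequence)) != len(visit_sequence):
            raise ValueError("visit_sequence contains duplicate variables")
        self.data = data
        self.visit_sequence = visit_sequence
        self.steps_run = 0  # counts expensive steps actually executed

    def synthesize(self) -> Dict[str, List[float]]:
        # Deferred, expensive work: one placeholder "model" per variable.
        synthetic: Dict[str, List[float]] = {}
        for var in self.visit_sequence:
            self.steps_run += 1
            values = self.data[var]
            mean = sum(values) / len(values)
            synthetic[var] = [mean] * len(values)  # stand-in for a fitted model
        return synthetic
```

A typo in a variable name raises a `ValueError` at construction time, before any model is fit, which is the behavior the paragraph above describes.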

2.1.6 6. Contain robust documentation

This Quarto Book is an effort to clearly document tidysynthesis. We’ve included several examples and plan to add more.

If you have suggestions, please feel free to contribute to https://github.com/UI-Research/tidysynthesis-documentation.

2.2 For Developers

We want tidysynthesis to be a community-developed tool. We’ve embraced a few principles to make it easier for developers to contribute to the package.

  1. tidysynthesis’s modular design makes it more extensible. This design is inspired by the extension ecosystems of ggplot2 and tidymodels.
  2. tidysynthesis contains hundreds of tests to ensure that changes don’t break the package.
  3. Robust documentation of “the why” behind design decisions should help orient development toward keeping tidysynthesis interoperable with other libraries, packages, and development infrastructure.