2 The tidysynthesis philosophy
tidysynthesis is a “metapackage” for creating synthetic data sets for statistical disclosure control or limitation (SDC/SDL) that shares the underlying design philosophy, grammar, and data structures of the tidyverse and tidymodels.
2.1 For Users
tidysynthesis is designed around a few key principles that benefit users. tidysynthesis should:
- Be designed for humans.
- Give users the full predictive modeling toolkit available in tidymodels.
- Embrace small, clear functions over larger functions with many arguments.
- Encourage the reuse of objects so small changes in synthesis specifications require minimal code changes.
- Catch mistakes early, before computation begins, to avoid wasted time.
- Include robust and comprehensive documentation.
2.1.1 1. Human-centered design
Our main goal for tidysynthesis is simple. We want users to inform the design of tidysynthesis.
Human time is often more expensive than computing time. When faced with difficult design decisions, tidysynthesis prioritizes clarity and user-friendliness over computational speed and brevity. tidysynthesis’s modular design makes it easier for developers to incorporate user feedback and for users to incorporate their own data science workflows and extend the package.
2.1.2 2. tidymodels
tidysynthesis synthesizes data sets using a sequence of predictive models. The more predictive modeling tools available to users, the better the synthesis process can be.
tidymodels is a collection of packages for modeling and machine learning using tidyverse principles. It supports the full predictive modeling workflow and offers a common interface to a wide range of predictive modeling packages in R.
tidysynthesis aims to build on the extensive work of the tidymodels developers. tidysynthesis contains regularized regression models because tidymodels contains regularized regression models. tidysynthesis contains feature and target engineering because tidymodels contains feature and target engineering. Simply put, tidysynthesis inherits the strengths of the tidymodels ecosystem to empower users to synthesize data in several ways.
We highly recommend learning more about tidymodels from the tidymodels tutorial and Tidy Modeling with R.
2.1.3 3. Functional
R is a functional programming language. tidysynthesis uses small, clear functions to change the behavior of syntheses instead of YAML headers or configuration files. This approach supports principles 4 and 5: it keeps code changes minimal and allows errors to be caught earlier in the process.
2.1.4 4. Reuse objects
Users often want to test multiple approaches to experiment and fine-tune their syntheses. When they do, tidysynthesis offers a flexible API designed around reusing code across syntheses. We provide examples of this reuse throughout the documentation.
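The idea of small constructor functions and object reuse can be sketched in base R. This is only an illustration of the pattern, not the tidysynthesis API: `model_spec()`, `noise_spec()`, `synthesis_plan()`, and `with_noise()` are hypothetical names invented for this example.

```r
# Hypothetical helpers for illustration only; these are NOT tidysynthesis functions.

# Each small constructor builds and checks one piece of a specification.
model_spec <- function(type = c("linear", "tree")) {
  type <- match.arg(type)
  structure(list(type = type), class = "model_spec")
}

noise_spec <- function(sd = 0) {
  stopifnot(is.numeric(sd), length(sd) == 1, sd >= 0)
  structure(list(sd = sd), class = "noise_spec")
}

# Combine the pieces into one plan object.
synthesis_plan <- function(model, noise) {
  stopifnot(inherits(model, "model_spec"), inherits(noise, "noise_spec"))
  structure(list(model = model, noise = noise), class = "synthesis_plan")
}

# Return a copy of an existing plan with only the noise component replaced.
with_noise <- function(plan, noise) {
  stopifnot(inherits(plan, "synthesis_plan"), inherits(noise, "noise_spec"))
  plan$noise <- noise
  plan
}

# Baseline plan.
plan_a <- synthesis_plan(model = model_spec("linear"), noise = noise_spec(sd = 0))

# Variant: reuse the baseline and change a single component.
plan_b <- with_noise(plan_a, noise_spec(sd = 0.5))
```

Because `plan_b` is derived from `plan_a`, the two plans share the same model component, and the only code that differs between the two syntheses is the one line that swaps the noise specification.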
2.1.5 5. Lazy computation
Users make mistakes; we all do, and it is inevitable. We designed tidysynthesis to minimize the chance of a user making a mistake and maximize the chance of catching a mistake early. Catching mistakes early matters because data synthesis can be very computationally expensive.
tidysynthesis uses lazy evaluation, so no data-intensive computation happens until the user calls the synthesize() function. We designed tidysynthesis functions to perform robust checks for inputs, catching errors early before calling synthesize().
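A minimal base R sketch of this pattern follows: inputs are validated when a specification is built, and the expensive step runs only on request. It does not show how tidysynthesis is implemented. `lazy_spec()` and `run_synthesis()` are hypothetical names, and the independent resampling is only a stand-in for a real synthesis.

```r
# Hypothetical helpers for illustration only; NOT the tidysynthesis implementation.

# Build a specification: validate inputs immediately, but do no heavy work.
lazy_spec <- function(data, vars) {
  stopifnot(is.data.frame(data), is.character(vars))
  missing_vars <- setdiff(vars, names(data))
  if (length(missing_vars) > 0) {
    stop("Unknown variables: ", paste(missing_vars, collapse = ", "), call. = FALSE)
  }
  structure(list(data = data, vars = vars), class = "lazy_spec")
}

# The costly step runs only when explicitly requested.
run_synthesis <- function(spec) {
  stopifnot(inherits(spec, "lazy_spec"))
  # Stand-in for an expensive synthesis: resample each variable independently.
  out <- lapply(spec$data[spec$vars], function(x) sample(x, length(x), replace = TRUE))
  as.data.frame(out)
}

# Cheap: only checks run here.
spec <- lazy_spec(mtcars, c("mpg", "hp"))

# A misspelled variable name fails at specification time, before any computation.
bad <- tryCatch(
  lazy_spec(mtcars, c("mpg", "horsepower")),
  error = function(e) conditionMessage(e)
)

# The data-intensive work happens only at this call.
set.seed(1)
synth <- run_synthesis(spec)
```

The misspelled `horsepower` is rejected when the specification is created, so the mistake surfaces before any time is spent on the synthesis itself.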
2.1.6 6. Contain robust documentation
This Quarto Book is an effort to clearly document tidysynthesis. We’ve included several examples and hope to add more.
If you have suggestions, please feel free to contribute to https://github.com/UI-Research/tidysynthesis-documentation.
2.2 For Developers
We want tidysynthesis to be a community-developed tool. We’ve embraced a few principles to make it easier for developers to contribute to the package.
- tidysynthesis’s modular design makes it more extensible. This is inspired by ggplot2 extensions and extensions to tidymodels.
- tidysynthesis contains hundreds of tests to ensure that changes don’t break the package.
- Robust documentation about “the why” should hopefully orient development around ensuring tidysynthesis is sufficiently interoperable with other libraries, packages, and other development infrastructure.