Synthetic Data with tidysynthesis

Authors

Aaron R. Williams, Gabriel Morrison, Jeremy Seeman, and Madeline Pickens

Published

May 23, 2025

Welcome

This website documents the tidysynthesis R package, which generates synthetic data using a sequence of predictive models. The goal of tidysynthesis is to provide a process, built on the tidymodels framework, for generating synthetic data so that administrative data can be safely released for some types of research.

Warning

This entire set of documentation is a work-in-progress.

License

library(tidysynthesis) and this documentation are free to use and are licensed under the GNU AGPLv3.

Acknowledgements

tidysynthesis was funded by the Alfred P. Sloan Foundation [G-2022-17149] and the National Science Foundation National Center for Science and Engineering Statistics [49100422C0008].

Early versions of the package and its codebase were developed in collaboration with the following partners, whose input was instrumental in shaping its design:

  • Bureau of Economic Analysis
  • Bureau of Justice Statistics
  • Department of Human Services in Allegheny County, Pennsylvania
  • National Science Foundation National Center for Science and Engineering Statistics
  • Nebraska Statewide Workforce & Educational Reporting System
  • Statistics of Income Division of the IRS