Data Governance and Privacy Training Materials for Data Practitioners
Welcome
This Quarto book presents materials for teaching data governance and privacy concepts and principles. The Data Governance and Privacy Practice Area at the Urban Institute has delivered extended training courses on data governance principles and data access to several government entities and at conferences.
This Quarto book is a living document, which will be updated periodically to ensure the materials remain relevant.
All definitions we use are based on our expertise, experience, and perspectives on data governance. Because data governance, privacy, security, and ethics span multiple disciplines, there is no universally accepted taxonomy of terms or concepts, which often leads to confusion. For clarity and consistency, we establish standard definitions for these materials so that readers share a common language. However, when reviewing external materials or literature, you may encounter differing or conflicting terminology.
Statement of Purpose
The Data Governance and Privacy Team at the Urban Institute aims to ensure everyone is responsibly represented in data that are accessible, accurate, private, and usable. We aim for more people to have access to high-quality data for better evidence-based decisionmaking and advocacy, all without compromising people’s privacy.
These materials are designed to support working data professionals who are interested in learning more about data governance and privacy concepts and principles. We developed these training courses using Quarto, an open-source scientific and technical publishing system, and render them as websites (.html). Additional resources, including takeaway materials, reference guidelines, and related reports, are provided as open-access documents.
Topics Covered
This training introduces key concepts and practical tools for data governance and privacy. Specifically, it covers:
Part 1: Introduction
Chapter 1: Foundations of Data Governance and Privacy
- Dimensions of data governance and their privacy implications.
- Kinds of trade-offs associated with disclosure risk.
- Definitions and interpretations of data privacy.
Chapter 2: Privacy-Enhancing Technologies (PETs)
- Privacy-enhancing technologies (PETs), their components, and uses/misuses.
- Statistical data privacy (SDP) and the tools needed to evaluate output PETs.
Part 2: Synthetic Data
Chapter 3: Introduction to Synthetic Data
- Synthetic data concepts.
- Modeling and sampling design choices for synthetic data.
- Best practices and considerations for making synthetic data design decisions.
Chapter 4: Introduction to tidysynthesis
- Introduction to tidysynthesis, the Urban Institute’s software for synthetic data development.
Chapter 5: Utility Evaluation
- The different types of utility metrics used to evaluate synthetic data.
- How to apply these metrics through both conceptual questions and hands-on computational exercises.
- Key considerations and best practices when interpreting and using these metrics in real-world contexts.
Chapter 6: Disclosure Risk Metrics
- COMING SOON!
Future Content
- Synthetic data demo
- Synthetic data use cases
- Differential privacy and formal privacy
What the trainings do not cover
- Legal compliance frameworks (e.g., CCPA, GDPR, HIPAA).
- Organizational policy development.
- Advanced cryptography methods.
- Input privacy PETs (e.g., secure multiparty computation, federated learning).
- Vendor-specific software training.
Intended audience and recommended sectional reading
This document consolidates the team’s training materials into a single, accessible Quarto Book, organized around four distinct learning objectives: conceptual understanding, technical decisionmaking, mathematical foundations, and coding skills.
We define each learner group based on their relevant objectives and outline which sections of the document they should prioritize. While the entire Quarto Book is available to anyone interested, some sections may be less relevant to individuals with a specific learning focus.
These recommendations are informed by our experience working with diverse organizations, each with their own learning goals and motivations. As such, the depth and breadth of learning will vary.
Conceptual understanding
Designed for upper management, managers of technical teams, practitioners, analysts, and subject matter experts or researchers outside of data governance and privacy who want to learn the basic concepts of data governance and privacy. Roles may include managing teams that implement these concepts or overseeing projects where privacy principles matter.
We recommend that these individuals review the following sections of the document:
- Part 1
  - Chapter 1
  - Chapter 2
- Part 2 (conceptual portions only)
  - Chapter 3
  - Chapter 5
Technical decisionmaking
Geared toward technical managers, practitioners, analysts, and subject matter experts who need to understand how to make informed technical decisions about data governance and privacy but do not require deep mathematical or coding support. Roles may include managing technical teams or evaluating implementation strategies.
We recommend that these individuals review the following sections of the document:
- Part 1 – all sections
- Part 2 (skip detailed math/coding as needed)
  - Chapter 3
  - Chapter 5
Mathematical foundation
For learners interested in the mathematical details behind data governance and privacy methods but not necessarily in coding implementation. Roles may include technical managers or team members ensuring mathematical feasibility of solutions.
We recommend that these individuals review the following sections of the document:
- Part 1 – all sections
- Part 2 (skip coding parts)
  - Chapter 3
  - Chapter 5
Coding skills
Designed for those who want to implement the various methods in code, including technical practitioners and developers. Roles may range from managers reviewing code to individuals writing and executing it.
We recommend that these individuals review the following sections of the document:
- Part 1 – all sections
- Part 2 (skip math parts)
  - Chapter 3
  - Chapter 4
  - Chapter 5
Definitions, Awareness, Caution, and Decision Points
As we walk through these educational materials, we will highlight key concepts and considerations using the following callout boxes:
Key terminology will be highlighted in boxes like this one to support clarity and shared understanding.
These boxes will draw attention to information that is essential to keep in mind throughout the process.
These boxes will flag common pitfalls or risks that may arise if certain details are overlooked.
These boxes will highlight moments where decisions can shape the process and outcomes. While all decision points are flexible, “iteration opportunities” offer greater adaptability and are easier to revise.
License
This website is free to use and licensed under the GNU AGPLv3 license. More information about GNU AGPLv3 is available here.
Citation
If you use our materials, please use the following citation:
Bowen, Claire McKay, Rachel Lamb, Maddie Pickens, Jeremy Seeman, and Aaron R. Williams. 2026. Data Governance and Privacy Training Materials for Data Practitioners. Urban Institute. https://ui-research.github.io/dgp-trainings/
Acknowledgements
These updated training materials are funded by the Gates Foundation [Investment ID INV-071365]. Original materials were funded by the Bureau of Economic Analysis, National Center for Science and Engineering Statistics at NSF, and Statistics of Income Division of the IRS.
Early versions of our training materials were developed for the following partners and events:
- American Association for Public Opinion Research Conference (2024)
- Bureau of Economic Analysis (2021)
- California’s Cradle to Career (2025)
- Department of Human Services in Allegheny County, Pennsylvania (2022)
- International Conference on Establishment Statistics (2024)
- Joint Statistical Meetings (2023)
- Nebraska Statewide Workforce & Educational Reporting System (2024)
- Statistics of Income Division of the IRS (2023)
- Summer Institute on Privacy Enhancing Technologies for Education Data at the Massive Data Institute at Georgetown University (2025)
- Women in Statistics and Data Science Conference (2022)