Examples
Here are some examples of how to use the Private Evolution library.
Images
Using foundation models (diffusion models) as the APIs. These examples follow the experimental settings in the paper Differentially Private Synthetic Data via Foundation Model APIs 1: Images (ICLR 2024).
CIFAR10 dataset: This example shows how to generate differentially private synthetic images for the CIFAR10 dataset using the APIs from a pre-trained ImageNet diffusion model.
Camelyon17 dataset: This example shows how to generate differentially private synthetic images for the Camelyon17 dataset using the APIs from a pre-trained ImageNet diffusion model.
Cat dataset: This example shows how to generate differentially private synthetic images for the Cat dataset using the APIs from Stable Diffusion.
Using simulators as the APIs. These examples follow the experimental settings in the paper Differentially Private Synthetic Data via APIs 3: Using Simulators Instead of Foundation Models.
MNIST dataset: This example shows how to generate differentially private synthetic images for the MNIST dataset using a text renderer.
CelebA dataset (simulator-generated data): This example shows how to generate differentially private synthetic images for the CelebA dataset using data generated by a computer graphics-based face renderer.
CelebA dataset (weak simulator): This example shows how to generate differentially private synthetic images for the CelebA dataset using a rule-based avatar generator.
Text
These examples follow the experimental settings in the paper Differentially Private Synthetic Data via Foundation Model APIs 2: Text (ICML 2024 Spotlight).
Yelp dataset: These examples show how to generate differentially private synthetic text for the Yelp dataset using LLM APIs from:
OpenAI APIs: See example
Hugging Face models: See example
OpenReview dataset: These examples show how to generate differentially private synthetic text for the OpenReview dataset using LLM APIs from:
OpenAI APIs: See example
Hugging Face models: See example
PubMed dataset: These examples show how to generate differentially private synthetic text for the PubMed dataset using LLM APIs from:
OpenAI APIs: See example
Hugging Face models: See example
Checkpoint Operation
By default, the examples above save the generated synthetic data (e.g., images, text). They also save checkpoints that contain more complete information about the synthetic data, and these checkpoints can be further processed with the data and callback APIs. For example, in the Text examples, the CSV files of synthetic text contain both the text selected by the histogram and the generated variations of the selected text. However, the downstream evaluation in Differentially Private Synthetic Data via Foundation Model APIs 2: Text (ICML 2024 Spotlight) uses only the text selected by the histogram. The following code extracts the selected text from a checkpoint into a new CSV file:
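```python
from pe.data import Data
from pe.callback import SaveTextToCSV
from pe.constant.data import VARIATION_API_FOLD_ID_COLUMN_NAME

# Load the full synthetic data (selected samples and their variations) from a checkpoint.
data = Data()
data.load_checkpoint("<checkpoint path>")

# Keep only the samples selected by the histogram (variation fold ID -1).
data = data.filter({VARIATION_API_FOLD_ID_COLUMN_NAME: -1})

# Write the selected text to a CSV file under the "from_last" output folder.
SaveTextToCSV(output_folder="from_last")(data)
```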
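To make the filtering step concrete, here is a minimal, library-free sketch of what it does conceptually. It assumes a toy table in which fold ID -1 marks histogram-selected samples (as the filter above implies) and other fold IDs mark generated variations. The column name `fold_id` and the helpers `keep_selected` and `to_csv` are hypothetical illustrations, not part of the Private Evolution API.

```python
import csv
import io

# Hypothetical stand-in for the string value of VARIATION_API_FOLD_ID_COLUMN_NAME.
FOLD_ID_COLUMN = "fold_id"

# Toy checkpoint rows: fold ID -1 marks a histogram-selected sample; the other
# fold IDs (assumed here to be non-negative) mark generated variations.
rows = [
    {"text": "selected review", FOLD_ID_COLUMN: -1},
    {"text": "variation A", FOLD_ID_COLUMN: 0},
    {"text": "variation B", FOLD_ID_COLUMN: 1},
]


def keep_selected(records):
    """Return only the records whose fold ID is -1 (conceptually like data.filter)."""
    return [r for r in records if r[FOLD_ID_COLUMN] == -1]


def to_csv(records):
    """Serialize only the 'text' field to CSV (a stand-in for SaveTextToCSV)."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops the fold ID field from the output.
    writer = csv.DictWriter(buffer, fieldnames=["text"], extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


selected = keep_selected(rows)
print(to_csv(selected))
```

In practice, use `Data.filter` and `SaveTextToCSV` as shown in the checkpoint code; this sketch only illustrates which rows are kept.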