Examples

Here are some examples of how to use the Private Evolution library.

Images

Text

These examples follow the experimental settings in the paper Differentially Private Synthetic Data via Foundation Model APIs 2: Text (ICML 2024 Spotlight).

  • Yelp dataset: These examples show how to generate differentially private synthetic text for the Yelp dataset using LLM APIs from:

  • OpenReview dataset: These examples show how to generate differentially private synthetic text for the OpenReview dataset using LLM APIs from:

  • PubMed dataset: These examples show how to generate differentially private synthetic text for the PubMed dataset using LLM APIs from:

Checkpoint Operation

By default, the examples above save the generated synthetic data (e.g., images or text). They also save checkpoints that contain more complete information about the synthetic data, and we can use the data and callback APIs to process these checkpoints further. For example, in the Text examples, the CSV files of synthetic text contain both the text selected by the histogram and the generated variations of that selected text. However, the downstream evaluation in Differentially Private Synthetic Data via Foundation Model APIs 2: Text (ICML 2024 Spotlight) uses only the text selected by the histogram. The following code extracts the selected text from a checkpoint into a new CSV file:

from pe.data import Data
from pe.callback import SaveTextToCSV
from pe.constant.data import VARIATION_API_FOLD_ID_COLUMN_NAME

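# Load the checkpoint, which holds both the histogram-selected text and its generated variations.
data = Data()
data.load_checkpoint("<checkpoint path>")
# Keep only the samples selected by the histogram (fold ID -1), dropping the variations.
data = data.filter({VARIATION_API_FOLD_ID_COLUMN_NAME: -1})
# Write the selected text to a new CSV file in the "from_last" output folder.
SaveTextToCSV(output_folder="from_last")(data)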
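The filter call above keeps only the rows whose fold-ID column equals -1. The plain-Python sketch below mimics that selection on a few in-memory records, purely to illustrate the idea. It assumes, as the snippet above implies, that histogram-selected samples carry fold ID -1 and that generated variations carry other fold IDs. The column name `fold_id`, the helper `filter_rows`, and the toy records are hypothetical and do not reflect the library's actual checkpoint format or the implementation of `Data.filter`.

```python
from typing import Any, Dict, List

# Hypothetical column name; the library itself uses the constant
# VARIATION_API_FOLD_ID_COLUMN_NAME, whose actual value is not assumed here.
FOLD_ID = "fold_id"


def filter_rows(rows: List[Dict[str, Any]], conditions: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Hypothetical helper: return rows matching every (column, value) pair in `conditions`."""
    return [
        row for row in rows
        if all(row.get(column) == value for column, value in conditions.items())
    ]


# Toy records. Assumption: fold ID -1 marks a histogram-selected sample,
# and any other fold ID marks a generated variation of a selected sample.
rows = [
    {"text": "Great tacos, friendly staff.", FOLD_ID: -1},
    {"text": "Tasty tacos and welcoming staff.", FOLD_ID: 0},
    {"text": "Slow service but good coffee.", FOLD_ID: -1},
    {"text": "Coffee was good; service was slow.", FOLD_ID: 1},
]

# Keep only the selected samples, as data.filter({...: -1}) does above.
selected = filter_rows(rows, {FOLD_ID: -1})
for row in selected:
    print(row["text"])
```

With the real library, the `Data.filter` call shown above should be used instead; this sketch only shows why filtering on a single fold-ID value separates the selected samples from their variations.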