pe.callback package
- class pe.callback.Callback[source]
Bases: ABC
The abstract class that defines the callback for the synthetic data generation. These callbacks can be configured to be called after each PE iteration.
- abstract __call__(syn_data)[source]
This function is called after each PE iteration.
- Parameters:
syn_data (pe.data.Data) – The pe.data.Data object of the synthetic data
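A minimal sketch of a custom callback, assuming only the interface documented above (a callable that receives `syn_data`). To stay self-contained it defines a stand-in `Callback` base class instead of importing `pe.callback.Callback`. The `CountSamples` class and the plain lists used in place of `pe.data.Data` objects are hypothetical.

```python
from abc import ABC, abstractmethod


class Callback(ABC):
    """Stand-in that mirrors the documented pe.callback.Callback interface."""

    @abstractmethod
    def __call__(self, syn_data):
        """Called after each PE iteration with the synthetic data."""


class CountSamples(Callback):
    """Hypothetical callback that records the sample count of each iteration."""

    def __init__(self):
        self.history = []

    def __call__(self, syn_data):
        # A real callback would receive a pe.data.Data object; a plain list
        # stands in for it here.
        self.history.append(len(syn_data))


callback = CountSamples()
for syn_data in (["a", "b"], ["a", "b", "c"]):  # two simulated PE iterations
    callback(syn_data)
print(callback.history)
```

Because `__call__` is abstract, instantiating the base class directly raises `TypeError`; only subclasses that implement it can be used.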
- class pe.callback.ComputeFID(priv_data, embedding, filter_criterion=None)[source]
Bases: Callback
The callback that computes the Frechet Inception Distance (FID) between the private and synthetic data.
- __call__(syn_data)[source]
This function is called after each PE iteration and computes the FID between the private and synthetic data.
- Parameters:
syn_data (pe.data.Data) – The synthetic data
- Returns:
The FID between the private and synthetic data
- Return type:
- __init__(priv_data, embedding, filter_criterion=None)[source]
Constructor.
- Parameters:
priv_data (pe.data.Data) – The private data
embedding (pe.embedding.Embedding) – The embedding to compute the FID
filter_criterion (dict, optional) – Only computes the metric based on samples satisfying the criterion. None means no filtering. Defaults to None
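For reference, the Frechet distance between two Gaussians with means mu1, mu2 and covariances S1, S2 is ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^(1/2)); FID applies it to Gaussians fitted to the embeddings of the two datasets. The sketch below covers only the one-dimensional special case, where the formula reduces to (mu1 - mu2)^2 + (sigma1 - sigma2)^2. It is not the library's implementation; `frechet_distance_1d` and the sample values are hypothetical.

```python
import statistics


def frechet_distance_1d(xs, ys):
    """Frechet distance between Gaussians fitted to two 1-D samples."""
    mu_x, mu_y = statistics.fmean(xs), statistics.fmean(ys)
    # Sample standard deviations (n - 1 in the denominator).
    sigma_x, sigma_y = statistics.stdev(xs), statistics.stdev(ys)
    return (mu_x - mu_y) ** 2 + (sigma_x - sigma_y) ** 2


priv_embeddings = [0.0, 1.0, 2.0, 3.0]
syn_embeddings = [3.0, 4.0, 5.0, 6.0]  # same spread, mean shifted by 3
print(frechet_distance_1d(syn_embeddings, priv_embeddings))
```

Identical samples give a distance of zero, and a pure shift of the mean contributes the squared shift.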
- class pe.callback.ComputePrecisionRecall(priv_data, embedding, num_precision_neighbors=4, num_recall_neighbors=5, filter_criterion=None)[source]
Bases: Callback
The callback that computes precision and recall metrics (https://arxiv.org/abs/1904.06991) between the private and synthetic data.
- __call__(syn_data)[source]
This function is called after each PE iteration and computes the precision and recall between the private and synthetic data.
- Parameters:
syn_data (pe.data.Data) – The synthetic data
- Returns:
The precision and recall between the private and synthetic data
- Return type:
- __init__(priv_data, embedding, num_precision_neighbors=4, num_recall_neighbors=5, filter_criterion=None)[source]
Constructor.
- Parameters:
priv_data (pe.data.Data) – The private data
embedding (pe.embedding.Embedding) – The embedding to compute the precision and recall
num_precision_neighbors (int, optional) – The number of neighbors to use for computing precision, defaults to 4 following https://github.com/marcojira/fld/tree/main
num_recall_neighbors (int, optional) – The number of neighbors to use for computing recall, defaults to 5 following https://github.com/marcojira/fld/tree/main
filter_criterion (dict, optional) – Only computes the metric based on samples satisfying the criterion. None means no filtering. Defaults to None
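A rough sketch of the k-nearest-neighbor precision and recall idea from the cited paper: precision is the fraction of synthetic points that fall inside the k-NN ball of at least one private point, and recall swaps the two roles. This is a brute-force illustration on raw 2-D points, not the library's code; the helper names, the points, and the choice k=1 (picked only because the example is tiny) are hypothetical.

```python
import math


def kth_neighbor_radius(points, k):
    """Distance from each point to its k-th nearest neighbor in the same set."""
    radii = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        radii.append(dists[k - 1])
    return radii


def coverage(queries, references, k):
    """Fraction of queries inside the k-NN ball of at least one reference point."""
    radii = kth_neighbor_radius(references, k)
    hits = sum(
        any(math.dist(q, r) <= radius for r, radius in zip(references, radii))
        for q in queries
    )
    return hits / len(queries)


priv = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
syn = [(0.5, 0.5), (0.9, 0.1), (5.0, 5.0), (5.5, 5.0)]

precision = coverage(syn, priv, k=1)  # synthetic points covered by private balls
recall = coverage(priv, syn, k=1)     # private points covered by synthetic balls
print(precision, recall)
```

In the callback, the two neighbor counts are configured separately through `num_precision_neighbors` and `num_recall_neighbors`, and distances would be measured in the embedding space rather than on raw coordinates.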
- class pe.callback.ComputeTVD(priv_data, degree, num_bins=20, filter_criterion=None)[source]
Bases: Callback
The callback that computes the Total Variation Distance (TVD) between the private and synthetic data.
- __call__(syn_data)[source]
This function is called after each PE iteration and computes the TVD between the private and synthetic data.
- Parameters:
syn_data (pe.data.Data) – The synthetic data
- Returns:
The TVD between the private and synthetic data
- Return type:
- __init__(priv_data, degree, num_bins=20, filter_criterion=None)[source]
Constructor.
- Parameters:
priv_data (pe.data.Data) – The private data
degree (int) – The degree of the TVD (e.g., 2 for 2-way TVD)
num_bins (int, optional) – The number of bins to compute the TVD, defaults to 20
filter_criterion (dict, optional) – Only computes the metric based on samples satisfying the criterion. None means no filtering. Defaults to None
- _compute_tvd(syn_features_df, priv_features_df)[source]
Compute the TVD between the synthetic and private features.
- Parameters:
syn_features_df (pandas.DataFrame) – The synthetic features DataFrame
priv_features_df (pandas.DataFrame) – The private features DataFrame
- Returns:
The TVD
- Return type:
float
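To illustrate the quantity involved, here is a sketch of a 1-way TVD on a single numeric column: both samples are binned on a shared grid and the distance is half the L1 difference between the normalized histograms. How the library bins features or combines columns for higher degrees is not shown in this reference, so `tvd_1way` and its binning choices are assumptions.

```python
def tvd_1way(syn_values, priv_values, num_bins=20):
    """Total variation distance between histograms of two numeric samples."""
    lo = min(min(syn_values), min(priv_values))
    hi = max(max(syn_values), max(priv_values))
    width = (hi - lo) / num_bins or 1.0  # avoid zero width when all values are equal

    def histogram(values):
        counts = [0] * num_bins
        for v in values:
            # Clamp so the maximum value falls in the last bin.
            idx = min(int((v - lo) / width), num_bins - 1)
            counts[idx] += 1
        return [c / len(values) for c in counts]

    p, q = histogram(syn_values), histogram(priv_values)
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


print(tvd_1way([0, 0, 10, 10], [0, 10, 10, 10], num_bins=2))
```

The result ranges from 0 (identical histograms) to 1 (disjoint support). A k-way version would build joint histograms over k columns instead of one.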
- _get_features_df(data)[source]
Get the features DataFrame from the data.
- Parameters:
data (pe.data.Data) – The data
- Returns:
The features DataFrame
- Return type:
pandas.DataFrame
- class pe.callback.ComputeWSD(priv_data, degree, num_samples=None, seed=42, filter_criterion=None)[source]
Bases: Callback
The callback that computes the Wasserstein Distance (WSD) between the private and synthetic data.
- __call__(syn_data)[source]
This function is called after each PE iteration and computes the multiple-way WSD between the private and synthetic data.
- Parameters:
syn_data (pe.data.Data) – The synthetic data
- Returns:
The multiple-way WSD between the private and synthetic data
- Return type:
- __init__(priv_data, degree, num_samples=None, seed=42, filter_criterion=None)[source]
Constructor.
- Parameters:
priv_data (pe.data.Data) – The private data
degree (int) – The degree of the WSD (e.g., 2 for 2-way WSD)
num_samples (int, optional) – The number of samples drawn from both the private and the synthetic data when computing the WSD, which reduces the computation cost. If None, all samples are used. Defaults to None
seed (int, optional) – The seed to use for sampling the data, defaults to 42
filter_criterion (dict, optional) – Only computes the metric based on samples satisfying the criterion. None means no filtering. Defaults to None
- _compute_wsd(syn_features_df, priv_features_df)[source]
Compute the multiple-way WSD between the synthetic and private features.
- Parameters:
syn_features_df (pandas.DataFrame) – The synthetic features DataFrame
priv_features_df (pandas.DataFrame) – The private features DataFrame
- Returns:
The multiple-way WSD
- Return type:
float
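As an illustration of the underlying quantity, the sketch below computes the 1-Wasserstein distance between two equal-sized one-dimensional samples, which equals the mean absolute difference of the sorted values, together with a seeded subsampling step in the spirit of `num_samples` and `seed`. The library's multiple-way computation is not shown in this reference; `wasserstein_1d` and `subsample` are hypothetical helpers.

```python
import random


def wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two equal-sized 1-D samples."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def subsample(values, num_samples=None, seed=42):
    """Return all values if num_samples is None, else a seeded random subset."""
    values = list(values)
    if num_samples is None or num_samples >= len(values):
        return values
    return random.Random(seed).sample(values, num_samples)


priv = subsample(range(0, 100), num_samples=10)
syn = [v + 1 for v in priv]  # every value shifted by exactly 1
print(wasserstein_1d(syn, priv))
```

Using a dedicated `random.Random(seed)` instance keeps the subsample reproducible without touching the global random state.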
- _get_features_df(data)[source]
Get the features DataFrame from the data.
- Parameters:
data (pe.data.Data) – The data
- Returns:
The features DataFrame
- Return type:
pandas.DataFrame
- class pe.callback.DPImageBenchClassifyImages(model_name, test_data, val_data, batch_size=256, num_epochs=50, n_splits=1, lr=0.01, lr_scheduler_step_size=20, lr_scheduler_gamma=0.2, ema_rate=0.9999, **model_params)[source]
Bases: Callback
The callback that evaluates the classification accuracy of the synthetic data following DPImageBench (https://github.com/2019ChenGong/DPImageBench).
- __call__(syn_data)[source]
This function is called after each PE iteration and computes the downstream classification metrics.
- Parameters:
syn_data (pe.data.Data) – The synthetic data
- Returns:
The classification accuracy metrics
- Return type:
- __init__(model_name, test_data, val_data, batch_size=256, num_epochs=50, n_splits=1, lr=0.01, lr_scheduler_step_size=20, lr_scheduler_gamma=0.2, ema_rate=0.9999, **model_params)[source]
Constructor.
- Parameters:
model_name (str) – The name of the model to use (wrn, resnet, resnext)
test_data (pe.data.Data) – The test data
val_data (pe.data.Data) – The validation data
batch_size (int, optional) – The batch size, defaults to 256
num_epochs (int, optional) – The number of training epochs, defaults to 50
n_splits (int, optional) – The number of splits for gradient accumulation, defaults to 1
lr (float, optional) – The learning rate, defaults to 0.01
lr_scheduler_step_size (int, optional) – The step size for the learning rate scheduler, defaults to 20
lr_scheduler_gamma (float, optional) – The gamma for the learning rate scheduler, defaults to 0.2
ema_rate (float, optional) – The rate for the exponential moving average, defaults to 0.9999
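To make the scheduler and EMA parameters concrete, here is a small sketch under the assumption that the learning rate follows a step decay (multiplied by `lr_scheduler_gamma` every `lr_scheduler_step_size` epochs) and that the exponential moving average uses the usual update `ema = rate * ema + (1 - rate) * value`. The parameter names suggest this behavior but the reference does not confirm it; `step_lr` and `ema_update` are hypothetical helpers, not library functions.

```python
def step_lr(epoch, lr=0.01, step_size=20, gamma=0.2):
    """Learning rate under step decay: multiply by gamma every step_size epochs."""
    return lr * gamma ** (epoch // step_size)


def ema_update(ema_value, new_value, rate=0.9999):
    """One exponential-moving-average step for a single scalar parameter."""
    return rate * ema_value + (1.0 - rate) * new_value


# Learning rate for each of the default 50 epochs.
schedule = [step_lr(epoch) for epoch in range(50)]
print(schedule[0], schedule[20], schedule[40])
```

With the documented defaults, this schedule would hold the initial rate for epochs 0 to 19 and apply one decay at epoch 20 and another at epoch 40.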
- _get_data_loader(data)[source]
Gets the data loader.
- Parameters:
data (pe.data.Data) – The data object
- Returns:
The data loader
- Return type:
torch.utils.data.DataLoader
- _get_images_and_label_from_data(data)[source]
Gets the images and labels from the data.
- Parameters:
data (pe.data.Data) – The data object
- Returns:
The images and labels
- Return type:
tuple[np.ndarray, np.ndarray]
- _get_model()[source]
Gets the model.
- Raises:
ValueError – If the model name is unknown
- Returns:
The model
- Return type:
torch.nn.Module
- evaluate(model, ema, data_loader, criterion)[source]
Evaluates the model.
- Parameters:
model (torch.nn.Module) – The model
ema (pe.callback.image.dpimagebench_lib.ema.ExponentialMovingAverage) – The exponential moving average object
data_loader (torch.utils.data.DataLoader) – The data loader
criterion (torch.nn.Module) – The criterion
- Returns:
The accuracy and loss
- Return type:
tuple[float, float]
- class pe.callback.SampleImages(num_images_per_class=10)[source]
Bases: Callback
The callback that samples images from the synthetic data.
- __call__(syn_data)[source]
This function is called after each PE iteration and samples images from the synthetic data.
- Parameters:
syn_data (pe.data.Data) – The pe.data.Data object of the synthetic data
- Returns:
A metric item with the list of sampled images
- Return type:
- class pe.callback.SaveAllImages(output_folder, path_format='{iteration:09d}/{label_id}_{label_name}/{index}.png', tqdm_enabled=True)[source]
Bases: Callback
The callback that saves all images.
- __call__(syn_data)[source]
This function is called after each PE iteration and saves all images.
- Parameters:
syn_data (pe.data.Data) – The pe.data.Data object of the synthetic data
- __init__(output_folder, path_format='{iteration:09d}/{label_id}_{label_name}/{index}.png', tqdm_enabled=True)[source]
Constructor.
- Parameters:
output_folder (str) – The output folder that will be used to save the images
path_format (str, optional) – The format of the image paths, defaults to “{iteration:09d}/{label_id}_{label_name}/{index}.png”
tqdm_enabled (bool, optional) – Whether to show tqdm progress bar when saving the images, defaults to True
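The default `path_format` is an ordinary Python format string. The sketch below shows how such a template expands, assuming the callback fills in the four placeholders by name and joins the result to `output_folder`; the folder name and field values are made up for illustration.

```python
import posixpath

path_format = "{iteration:09d}/{label_id}_{label_name}/{index}.png"

# Hypothetical values for one image from iteration 3.
relative_path = path_format.format(iteration=3, label_id=0, label_name="cat", index=7)
full_path = posixpath.join("results/images", relative_path)
print(full_path)  # results/images/000000003/0_cat/7.png
```

The `09d` specifier zero-pads the iteration number to nine digits, so folders sort in iteration order when listed alphabetically.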
- class pe.callback.SaveCheckpoints(output_folder, iteration_format='09d')[source]
Bases: Callback
The callback that saves checkpoints of the synthetic data.
- __call__(syn_data)[source]
This function is called after each PE iteration and saves checkpoints of the synthetic data.
- Parameters:
syn_data (pe.data.Data) – The synthetic data
- class pe.callback.SaveTabToCSV(output_folder, iteration_format='09d')[source]
Bases: Callback
The callback that saves the synthetic tabular data to a CSV file.
- __call__(syn_data)[source]
This function is called after each PE iteration and saves the synthetic tabular data to a CSV file.
- Parameters:
syn_data (pe.data.Data) – The pe.data.Data object of the synthetic data
- class pe.callback.SaveTextToCSV(output_folder, iteration_format='09d')[source]
Bases: Callback
The callback that saves the synthetic text to a CSV file.
- __call__(syn_data)[source]
This function is called after each PE iteration and saves the synthetic text to a CSV file.
- Parameters:
syn_data (pe.data.Data) – The pe.data.Data object of the synthetic data
- class pe.callback.TabClassifier(test_data, model_name='xgboost', filter_criterion=None)[source]
Bases: Callback
Evaluate tabular classification accuracy using a tabular classifier.
- __call__(syn_data)[source]
Evaluate the tabular classifier on the test set.
- Parameters:
syn_data (pe.data.Data) – The synthetic training data
- Returns:
Classification accuracy metrics
- Return type:
- __init__(test_data, model_name='xgboost', filter_criterion=None)[source]
Constructor.
- Parameters:
test_data (pe.data.Data) – The test data
model_name (str, optional) – The classifier model to use, defaults to “xgboost”
filter_criterion (dict, optional) – Only computes the metric based on samples satisfying the criterion. None means no filtering. Defaults to None
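The exact semantics of `filter_criterion` are not spelled out in this reference. One plausible reading, sketched below purely as an assumption, is that the dict maps column names to required values and only rows matching every entry are kept. `filter_rows` and the example rows are hypothetical and do not reflect the library's actual code.

```python
def filter_rows(rows, filter_criterion=None):
    """Keep rows whose columns equal every value in filter_criterion."""
    if filter_criterion is None:
        return list(rows)  # None means no filtering
    return [
        row
        for row in rows
        if all(row.get(column) == value for column, value in filter_criterion.items())
    ]


rows = [
    {"label": 0, "split": "train"},
    {"label": 1, "split": "train"},
    {"label": 1, "split": "test"},
]
print(filter_rows(rows, {"label": 1, "split": "train"}))
```

Under this reading, a metric callback would apply the same criterion to the data it compares before computing the metric.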
- _encoding(syn_data)[source]
Encodes the categorical and numerical columns.
- Parameters:
syn_data (pe.data.Data) – The synthetic training data
- Returns:
The encoded synthetic training and test data
- Return type:
tuple[pe.data.Data, pe.data.Data]
Subpackages
- pe.callback.common package
- pe.callback.image package
- Subpackages
- Submodules
- pe.callback.tabular package
- pe.callback.text package