pe.callback package

class pe.callback.Callback[source]

Bases: ABC

The abstract class that defines the callback for the synthetic data generation. These callbacks can be configured to be called after each PE iteration.

abstract __call__(syn_data)[source]

This function is called after each PE iteration.

Parameters:

syn_data (pe.data.Data) – The pe.data.Data object of the synthetic data
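As a minimal sketch of the contract above, the following defines a stand-in abstract base class and one concrete subclass. The names `CallbackSketch` and `CountSamples`, and the use of plain lists in place of pe.data.Data, are assumptions made for illustration. In real code the subclass would derive from pe.callback.Callback.

```python
from abc import ABC, abstractmethod


class CallbackSketch(ABC):
    """Hypothetical stand-in for pe.callback.Callback."""

    @abstractmethod
    def __call__(self, syn_data):
        """Invoked once after each PE iteration."""


class CountSamples(CallbackSketch):
    """Hypothetical callback that records the number of samples per iteration."""

    def __init__(self):
        self.history = []

    def __call__(self, syn_data):
        self.history.append(len(syn_data))
        return self.history[-1]


# Simulate two PE iterations; plain lists stand in for pe.data.Data objects.
callback = CountSamples()
for fake_syn_data in (["a", "b"], ["a", "b", "c"]):
    callback(fake_syn_data)
```

Because `__call__` is abstract, the base class itself cannot be instantiated; only subclasses that implement it can be used as callbacks.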

class pe.callback.ComputeFID(priv_data, embedding, filter_criterion=None)[source]

Bases: Callback

The callback that computes the Frechet Inception Distance (FID) between the private and synthetic data.

__call__(syn_data)[source]

Called after each PE iteration to compute the FID between the private and synthetic data.

Parameters:

syn_data (pe.data.Data) – The synthetic data

Returns:

The FID between the private and synthetic data

Return type:

list[pe.metric_item.FloatMetricItem]
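For reference, the standard definition of FID fits a Gaussian to each set of embeddings and measures the Fréchet distance between the two Gaussians. The symbols are notation introduced here, not names from the library: \mu_p and \Sigma_p are the mean and covariance of the private embeddings, and \mu_s and \Sigma_s are those of the synthetic embeddings. Whether the implementation adds numerical safeguards on top of this formula is not stated in this reference.

```latex
\mathrm{FID} = \lVert \mu_p - \mu_s \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_p + \Sigma_s - 2 \left( \Sigma_p \Sigma_s \right)^{1/2} \right)
```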

__init__(priv_data, embedding, filter_criterion=None)[source]

Constructor.

Parameters:
  • priv_data (pe.data.Data) – The private data

  • embedding (pe.embedding.Embedding) – The embedding to compute the FID

  • filter_criterion (dict, optional) – Only computes the metric based on samples satisfying the criterion. None means no filtering. Defaults to None
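The reference does not spell out the structure of `filter_criterion` beyond it being a dict. The sketch below assumes one plausible reading, in which each key is a column name and each value is the required value for that column. The function name `filter_rows` and the list-of-dicts data representation are hypothetical.

```python
def filter_rows(rows, filter_criterion=None):
    """Keep rows whose columns equal every value in filter_criterion.

    Assumption: the criterion maps column name -> required value.
    None means no filtering, matching the documented default.
    """
    if filter_criterion is None:
        return list(rows)
    return [
        row
        for row in rows
        if all(row.get(col) == val for col, val in filter_criterion.items())
    ]


rows = [
    {"label": 0, "split": "a"},
    {"label": 1, "split": "a"},
    {"label": 1, "split": "b"},
]
only_label_1 = filter_rows(rows, {"label": 1})
```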

class pe.callback.ComputePrecisionRecall(priv_data, embedding, num_precision_neighbors=4, num_recall_neighbors=5, filter_criterion=None)[source]

Bases: Callback

The callback that computes precision and recall metrics (https://arxiv.org/abs/1904.06991) between the private and synthetic data.

__call__(syn_data)[source]

Called after each PE iteration to compute the precision and recall between the private and synthetic data.

Parameters:

syn_data (pe.data.Data) – The synthetic data

Returns:

The precision and recall between the private and synthetic data

Return type:

list[pe.metric_item.FloatMetricItem]

__init__(priv_data, embedding, num_precision_neighbors=4, num_recall_neighbors=5, filter_criterion=None)[source]

Constructor.

Parameters:
  • priv_data (pe.data.Data) – The private data

  • embedding (pe.embedding.Embedding) – The embedding to compute the precision and recall

  • num_precision_neighbors (int, optional) – The number of neighbors to use for computing precision, defaults to 4 following https://github.com/marcojira/fld/tree/main

  • num_recall_neighbors (int, optional) – The number of neighbors to use for computing recall, defaults to 5 following https://github.com/marcojira/fld/tree/main

  • filter_criterion (dict, optional) – Only computes the metric based on samples satisfying the criterion. None means no filtering. Defaults to None
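The following is a brute-force sketch of the k-nearest-neighbor estimator described in the cited paper. Each support point gets a ball whose radius is the distance to its k-th nearest neighbor, and a query point counts as covered if it falls inside at least one ball. The function names and the tuple-of-floats representation of embeddings are hypothetical, and it is an assumption that the library matches this in every detail (distance metric, tie handling, batching).

```python
import math


def kth_neighbor_radii(points, k):
    """Distance from each point to its k-th nearest neighbor, excluding itself."""
    radii = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        radii.append(dists[k - 1])
    return radii


def coverage(queries, support, k):
    """Fraction of queries that fall inside at least one k-NN ball of support."""
    radii = kth_neighbor_radii(support, k)
    hits = sum(
        any(math.dist(q, s) <= r for s, r in zip(support, radii))
        for q in queries
    )
    return hits / len(queries)


def precision_recall(priv, syn, num_precision_neighbors=4, num_recall_neighbors=5):
    # Precision: synthetic samples covered by the private-data manifold.
    precision = coverage(syn, priv, num_precision_neighbors)
    # Recall: private samples covered by the synthetic-data manifold.
    recall = coverage(priv, syn, num_recall_neighbors)
    return precision, recall


priv = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (0.2, 0.8)]
same = precision_recall(priv, list(priv))
far = precision_recall(priv, [(100.0 + i, 100.0) for i in range(6)])
```

Each set needs more points than the neighbor count used on it, since a point's k-th neighbor must exist.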

class pe.callback.ComputeTVD(priv_data, degree, num_bins=20, filter_criterion=None)[source]

Bases: Callback

The callback that computes the Total Variation Distance (TVD) between the private and synthetic data.

__call__(syn_data)[source]

Called after each PE iteration to compute the TVD between the private and synthetic data.

Parameters:

syn_data (pe.data.Data) – The synthetic data

Returns:

The TVD between the private and synthetic data

Return type:

list[pe.metric_item.FloatMetricItem]

__init__(priv_data, degree, num_bins=20, filter_criterion=None)[source]

Constructor.

Parameters:
  • priv_data (pe.data.Data) – The private data

  • degree (int) – The degree of the TVD (e.g., 2 for 2-way TVD)

  • num_bins (int, optional) – The number of bins to compute the TVD, defaults to 20

  • filter_criterion (dict, optional) – Only computes the metric based on samples satisfying the criterion. None means no filtering. Defaults to None

_compute_tvd(syn_features_df, priv_features_df)[source]

Compute the TVD between the synthetic and private features.

Parameters:
  • syn_features_df (pandas.DataFrame) – The synthetic features DataFrame

  • priv_features_df (pandas.DataFrame) – The private features DataFrame

Returns:

The TVD

Return type:

float
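To make the quantity concrete, here is a sketch of a 1-way TVD for a single numerical column: both samples are binned into the same `num_bins` equal-width bins, and the TVD is half the L1 distance between the two normalized histograms. The function names and the shared equal-width binning are assumptions. The sketch does not cover higher degrees (such as the 2-way TVD mentioned above) or categorical columns.

```python
def histogram(values, low, high, num_bins):
    """Normalized equal-width histogram of values over [low, high]."""
    counts = [0] * num_bins
    width = (high - low) / num_bins
    for v in values:
        # Clamp so that v == high lands in the last bin.
        idx = min(int((v - low) / width), num_bins - 1) if width > 0 else 0
        counts[idx] += 1
    return [c / len(values) for c in counts]


def tvd_1way(priv_values, syn_values, num_bins=20):
    """Half the L1 distance between the two binned distributions."""
    low = min(min(priv_values), min(syn_values))
    high = max(max(priv_values), max(syn_values))
    p = histogram(priv_values, low, high, num_bins)
    q = histogram(syn_values, low, high, num_bins)
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


identical = tvd_1way([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
disjoint = tvd_1way([0.0, 0.0, 0.0], [1.0, 1.0, 1.0])
```

The result lies in [0, 1]: 0 when the binned distributions coincide and 1 when they share no bin.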

_get_features_df(data)[source]

Get the features DataFrame from the data.

Parameters:

data (pe.data.Data) – The data

Returns:

The features DataFrame

Return type:

pandas.DataFrame

class pe.callback.ComputeWSD(priv_data, degree, num_samples=None, seed=42, filter_criterion=None)[source]

Bases: Callback

The callback that computes the Wasserstein Distance (WSD) between the private and synthetic data.

__call__(syn_data)[source]

Called after each PE iteration to compute the multiple-way WSD between the private and synthetic data.

Parameters:

syn_data (pe.data.Data) – The synthetic data

Returns:

The multiple-way WSD between the private and synthetic data

Return type:

list[pe.metric_item.FloatMetricItem]

__init__(priv_data, degree, num_samples=None, seed=42, filter_criterion=None)[source]

Constructor.

Parameters:
  • priv_data (pe.data.Data) – The private data

  • degree (int) – The degree of the WSD (e.g., 2 for 2-way WSD)

  • num_samples (int, optional) – The number of samples to draw from both the private and the synthetic data when computing the WSD, for computational efficiency. If None, all samples are used. Defaults to None

  • seed (int, optional) – The seed to use for sampling the data, defaults to 42

  • filter_criterion (dict, optional) – Only computes the metric based on samples satisfying the criterion. None means no filtering. Defaults to None

_compute_wsd(syn_features_df, priv_features_df)[source]

Compute the multiple-way WSD between the synthetic and private features.

Parameters:
  • syn_features_df (pandas.DataFrame) – The synthetic features DataFrame

  • priv_features_df (pandas.DataFrame) – The private features DataFrame

Returns:

The multiple-way WSD

Return type:

float
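As an illustration of the simplest case, the sketch below computes the 1-Wasserstein distance between two equal-size one-dimensional samples, which reduces to the mean absolute difference of the sorted values. It also shows how `num_samples` and `seed` could drive a reproducible subsample. The function names are hypothetical, and the sketch does not cover the multiple-way (degree greater than 1) case handled by the callback.

```python
import random


def subsample(values, num_samples, seed):
    """Return all values if num_samples is None, else a seeded random subset."""
    if num_samples is None or num_samples >= len(values):
        return list(values)
    return random.Random(seed).sample(list(values), num_samples)


def wsd_1d(priv_values, syn_values, num_samples=None, seed=42):
    """1-Wasserstein distance between two equal-size 1-D samples."""
    p = sorted(subsample(priv_values, num_samples, seed))
    s = sorted(subsample(syn_values, num_samples, seed))
    if len(p) != len(s):
        raise ValueError("this sketch only handles equal-size samples")
    # For equal-size samples, optimal transport matches sorted values in order.
    return sum(abs(a - b) for a, b in zip(p, s)) / len(p)


shifted = wsd_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0])
```

Fixing the seed makes the subsample, and therefore the reported metric, repeatable across runs.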

_get_features_df(data)[source]

Get the features DataFrame from the data.

Parameters:

data (pe.data.Data) – The data

Returns:

The features DataFrame

Return type:

pandas.DataFrame

class pe.callback.DPImageBenchClassifyImages(model_name, test_data, val_data, batch_size=256, num_epochs=50, n_splits=1, lr=0.01, lr_scheduler_step_size=20, lr_scheduler_gamma=0.2, ema_rate=0.9999, **model_params)[source]

Bases: Callback

The callback that evaluates the classification accuracy of the synthetic data following DPImageBench (https://github.com/2019ChenGong/DPImageBench).

__call__(syn_data)[source]

Called after each PE iteration to compute the downstream classification metrics.

Parameters:

syn_data (pe.data.Data) – The synthetic data

Returns:

The classification accuracy metrics

Return type:

list[pe.metric_item.FloatListMetricItem]

__init__(model_name, test_data, val_data, batch_size=256, num_epochs=50, n_splits=1, lr=0.01, lr_scheduler_step_size=20, lr_scheduler_gamma=0.2, ema_rate=0.9999, **model_params)[source]

Constructor.

Parameters:
  • model_name (str) – The name of the model to use (wrn, resnet, resnext)

  • test_data (pe.data.Data) – The test data

  • val_data (pe.data.Data) – The validation data

  • batch_size (int, optional) – The batch size, defaults to 256

  • num_epochs (int, optional) – The number of training epochs, defaults to 50

  • n_splits (int, optional) – The number of splits for gradient accumulation, defaults to 1

  • lr (float, optional) – The learning rate, defaults to 0.01

  • lr_scheduler_step_size (int, optional) – The step size for the learning rate scheduler, defaults to 20

  • lr_scheduler_gamma (float, optional) – The gamma for the learning rate scheduler, defaults to 0.2

  • ema_rate (float, optional) – The rate for the exponential moving average, defaults to 0.9999
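The sketch below shows how the scheduler and EMA parameters could interact with their default values. It assumes a step-decay schedule in the style of `torch.optim.lr_scheduler.StepLR` and the usual exponential-moving-average update. Neither choice is confirmed by this reference, and the function names are hypothetical.

```python
def step_lr(base_lr, epoch, step_size=20, gamma=0.2):
    """Learning rate after multiplying by gamma once every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)


def ema_update(ema_value, new_value, ema_rate=0.9999):
    """One exponential-moving-average step for a single scalar weight."""
    return ema_rate * ema_value + (1.0 - ema_rate) * new_value


# Learning rate for each of the default 50 epochs, starting from lr=0.01.
schedule = [step_lr(0.01, epoch) for epoch in range(50)]
```

Under these assumptions the learning rate is 0.01 for epochs 0 to 19, drops by a factor of 0.2 at epoch 20, and drops again at epoch 40. An `ema_rate` close to 1 means the averaged weights change slowly.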

_get_data_loader(data)[source]

Get the data loader.

Parameters:

data (pe.data.Data) – The data object

Returns:

The data loader

Return type:

torch.utils.data.DataLoader

_get_images_and_label_from_data(data)[source]

Get the images and labels from the data.

Parameters:

data (pe.data.Data) – The data object

Returns:

The images and labels

Return type:

tuple[np.ndarray, np.ndarray]

_get_model()[source]

Get the model.

Raises:

ValueError – If the model name is unknown

Returns:

The model

Return type:

torch.nn.Module

evaluate(model, ema, data_loader, criterion)[source]

Evaluate the model.

Returns:

The accuracy and loss

Return type:

tuple[float, float]

class pe.callback.SampleImages(num_images_per_class=10)[source]

Bases: Callback

The callback that samples images from the synthetic data.

__call__(syn_data)[source]

Called after each PE iteration to sample images from the synthetic data.

Parameters:

syn_data (pe.data.Data) – The pe.data.Data object of the synthetic data

Returns:

A metric item with the list of sampled images

Return type:

list[pe.metric_item.ImageListMetricItem]

__init__(num_images_per_class=10)[source]

Constructor.

Parameters:

num_images_per_class (int, optional) – The number of images to sample per class, defaults to 10
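A minimal sketch of per-class sampling follows. It groups sample indices by label and draws up to `num_images_per_class` from each group. The function name, the `seed` argument, and the choice of random sampling rather than, say, taking the first few images are all assumptions; the reference does not describe the selection strategy.

```python
import random
from collections import defaultdict


def sample_per_class(labels, num_images_per_class=10, seed=0):
    """Return, for each label, the indices of up to num_images_per_class samples."""
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)
    rng = random.Random(seed)
    sampled = {}
    for label, indices in sorted(by_label.items()):
        # A class with fewer samples than requested contributes all of them.
        count = min(num_images_per_class, len(indices))
        sampled[label] = sorted(rng.sample(indices, count))
    return sampled


labels = [0] * 25 + [1] * 4
picked = sample_per_class(labels, num_images_per_class=10)
```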

class pe.callback.SaveAllImages(output_folder, path_format='{iteration:09d}/{label_id}_{label_name}/{index}.png', tqdm_enabled=True)[source]

Bases: Callback

The callback that saves all images.

__call__(syn_data)[source]

Called after each PE iteration to save all images.

Parameters:

syn_data (pe.data.Data) – The pe.data.Data object of the synthetic data

__init__(output_folder, path_format='{iteration:09d}/{label_id}_{label_name}/{index}.png', tqdm_enabled=True)[source]

Constructor.

Parameters:
  • output_folder (str) – The output folder that will be used to save the images

  • path_format (str, optional) – The format of the image paths, defaults to “{iteration:09d}/{label_id}_{label_name}/{index}.png”

  • tqdm_enabled (bool, optional) – Whether to show tqdm progress bar when saving the images, defaults to True
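The default `path_format` is a Python format string with four placeholders. The sketch below shows how such a string expands, assuming it is filled with `str.format` and joined onto `output_folder`. The helper name `build_image_path` is hypothetical, and the library's internal call may differ.

```python
import os


def build_image_path(output_folder, path_format, iteration, label_id, label_name, index):
    """Expand the placeholders and join the result onto the output folder."""
    relative = path_format.format(
        iteration=iteration,
        label_id=label_id,
        label_name=label_name,
        index=index,
    )
    return os.path.join(output_folder, relative)


default_format = "{iteration:09d}/{label_id}_{label_name}/{index}.png"
path = build_image_path(
    "results", default_format, iteration=3, label_id=0, label_name="cat", index=7
)
```

With the default format, the `:09d` specifier zero-pads the iteration to nine digits, so iteration 3 expands to the relative path `000000003/0_cat/7.png`.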

_save_image(image, label_name, label_id, index, iteration)[source]

A helper function that saves an image.

class pe.callback.SaveCheckpoints(output_folder, iteration_format='09d')[source]

Bases: Callback

The callback that saves checkpoints of the synthetic data.

__call__(syn_data)[source]

Called after each PE iteration to save checkpoints of the synthetic data.

Parameters:

syn_data (pe.data.Data) – The synthetic data

__init__(output_folder, iteration_format='09d')[source]

Constructor.

Parameters:
  • output_folder (str) – The output folder that will be used to save the checkpoints

  • iteration_format (str, optional) – The format of the iteration number, defaults to “09d”

_get_checkpoint_path(iteration)[source]

Get the checkpoint path.

Parameters:

iteration (int) – The PE iteration number

Returns:

The checkpoint path

Return type:

str
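The `iteration_format` value is a standard Python format specification. The sketch below shows how it could turn an iteration number into a path component. The one-folder-per-iteration layout and the function name `checkpoint_path_sketch` are assumptions, since the reference does not state the exact checkpoint layout.

```python
import os


def checkpoint_path_sketch(output_folder, iteration, iteration_format="09d"):
    """Hypothetical layout: one sub-folder per iteration, zero-padded."""
    # format(5, "09d") zero-pads the number to nine digits.
    return os.path.join(output_folder, format(iteration, iteration_format))


path = checkpoint_path_sketch("checkpoints", 5)
```

Zero-padding keeps the lexicographic order of the paths consistent with the numeric order of the iterations, which makes it easy to locate the latest checkpoint by sorting.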

class pe.callback.SaveTabToCSV(output_folder, iteration_format='09d')[source]

Bases: Callback

The callback that saves the synthetic tabular data to a CSV file.

__call__(syn_data)[source]

Called after each PE iteration to save the synthetic tabular data to a CSV file.

Parameters:

syn_data (pe.data.Data) – The pe.data.Data object of the synthetic data

__init__(output_folder, iteration_format='09d')[source]

Constructor.

Parameters:
  • output_folder (str) – The output folder that will be used to save the CSV files

  • iteration_format (str, optional) – The format of the iteration part of the CSV paths, defaults to “09d”

_get_csv_path(iteration)[source]

Get the CSV path.

Parameters:

iteration (int) – The PE iteration number

Returns:

The CSV path

Return type:

str

class pe.callback.SaveTextToCSV(output_folder, iteration_format='09d')[source]

Bases: Callback

The callback that saves the synthetic text to a CSV file.

__call__(syn_data)[source]

Called after each PE iteration to save the synthetic text to a CSV file.

Parameters:

syn_data (pe.data.Data) – The pe.data.Data object of the synthetic data

__init__(output_folder, iteration_format='09d')[source]

Constructor.

Parameters:
  • output_folder (str) – The output folder that will be used to save the CSV files

  • iteration_format (str, optional) – The format of the iteration part of the CSV paths, defaults to “09d”

_get_csv_path(iteration)[source]

Get the CSV path.

Parameters:

iteration (int) – The PE iteration number

Returns:

The CSV path

Return type:

str

class pe.callback.TabClassifier(test_data, model_name='xgboost', filter_criterion=None)[source]

Bases: Callback

Evaluate tabular classification accuracy using a tabular classifier.

__call__(syn_data)[source]

Evaluate the tabular classifier on the test set.

Parameters:

syn_data (pe.data.Data) – The synthetic training data

Returns:

Classification accuracy metrics

Return type:

list[pe.metric_item.FloatListMetricItem]

__init__(test_data, model_name='xgboost', filter_criterion=None)[source]

Constructor.

Parameters:
  • test_data (pe.data.Data) – The test data

  • model_name (str, optional) – The classifier model to use, defaults to “xgboost”

  • filter_criterion (dict, optional) – Only computes the metric based on samples satisfying the criterion. None means no filtering. Defaults to None
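The evaluation pattern here is to train on the synthetic data and test on real data. The sketch below illustrates that pattern with a 1-nearest-neighbor classifier swapped in for the documented default "xgboost" model, so that it runs with the standard library only. All function names and the toy data are hypothetical.

```python
import math


def nearest_neighbor_predict(train_features, train_labels, query):
    """Predict the label of the closest training point (stand-in classifier)."""
    best = min(
        range(len(train_features)),
        key=lambda i: math.dist(train_features[i], query),
    )
    return train_labels[best]


def train_on_synthetic_test_on_real(syn_features, syn_labels, test_features, test_labels):
    """Accuracy on the real test set of a model fitted on synthetic data only."""
    correct = sum(
        nearest_neighbor_predict(syn_features, syn_labels, x) == y
        for x, y in zip(test_features, test_labels)
    )
    return correct / len(test_labels)


syn_x = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1)]
syn_y = [0, 0, 1, 1]
test_x = [(0.2, 0.1), (1.2, 0.9), (0.4, 0.3)]
test_y = [0, 1, 1]
accuracy = train_on_synthetic_test_on_real(syn_x, syn_y, test_x, test_y)
```

A high accuracy under this protocol suggests that the synthetic data preserves the feature-label relationship of the real data.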

_encoding(syn_data)[source]

Encode the categorical and numerical columns.

Parameters:

syn_data (pe.data.Data) – The synthetic training data

Returns:

The encoded synthetic training and test data

Return type:

tuple[pe.data.Data, pe.data.Data]

_get_model()[source]

Get the classifier model.
