pe.callback.tabular.compute_wsd module

class pe.callback.tabular.compute_wsd.ComputeWSD(priv_data, degree, num_samples=None, seed=42, filter_criterion=None)[source]

Bases: Callback

The callback that computes the Wasserstein Distance (WSD) between the private and synthetic data.

__call__(syn_data)[source]

This function is called after each PE iteration and computes the multiple-way WSD between the private and synthetic data.

Parameters:

syn_data (pe.data.Data) – The synthetic data

Returns:

The multiple-way WSD between the private and synthetic data

Return type:

list[pe.metric_item.FloatMetricItem]

__init__(priv_data, degree, num_samples=None, seed=42, filter_criterion=None)[source]

Constructor.

Parameters:
  • priv_data (pe.data.Data) – The private data

  • degree (int) – The degree of the WSD (e.g., 2 for 2-way WSD)

  • num_samples (int, optional) – The number of samples to draw from both the private and the synthetic data when computing the WSD, for computational efficiency. If None, all samples are used.

  • seed (int, optional) – The seed to use for sampling the data. Defaults to 42.

  • filter_criterion (dict, optional) – If given, the metric is computed only on samples satisfying the criterion. None means no filtering. Defaults to None.
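The interaction between num_samples and seed can be illustrated with a small standalone sketch. It does not use the pe package and is not the library's implementation: the helper name subsample_rows and the list-of-dicts row format are assumptions made for illustration only. It shows the behaviour described above, namely that None keeps every sample and that a fixed seed makes the draw reproducible.

```python
import random


def subsample_rows(rows, num_samples=None, seed=42):
    """Hypothetical helper: reproducibly pick at most num_samples rows.

    Mirrors the documented semantics only: None means "use all samples",
    and the seed fixes which rows are drawn.
    """
    rows = list(rows)
    if num_samples is None or num_samples >= len(rows):
        return rows
    # A dedicated generator keeps the draw independent of global random state.
    rng = random.Random(seed)
    return rng.sample(rows, num_samples)


# Toy "private" data; the column names are made up for this example.
priv_rows = [{"age": a, "income": 1000 * a} for a in range(20, 70)]

first = subsample_rows(priv_rows, num_samples=10, seed=42)
second = subsample_rows(priv_rows, num_samples=10, seed=42)
everything = subsample_rows(priv_rows)
```

Calling the helper twice with the same seed yields the same rows, so a metric computed on the subsample stays comparable across PE iterations.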

_compute_wsd(syn_features_df, priv_features_df)[source]

Compute the multiple-way WSD between the synthetic and private features.

Parameters:
  • syn_features_df (pandas.DataFrame) – The synthetic features DataFrame

  • priv_features_df (pandas.DataFrame) – The private features DataFrame

Returns:

The multiple-way WSD

Return type:

float
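To make the quantity concrete, here is a minimal sketch of the Wasserstein-1 distance in the simplest setting: a single numeric column with the same number of synthetic and private samples. In that case the distance reduces to the mean absolute difference between the sorted values. This is an illustration under those assumptions, not the code behind _compute_wsd; the function name wasserstein_1d is hypothetical, and how the library combines several columns for a given degree is not specified on this page.

```python
def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equal-sized 1-D empirical samples.

    With equal sample counts, the optimal transport plan matches the i-th
    smallest value of xs to the i-th smallest value of ys.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    if not xs:
        raise ValueError("samples must be non-empty")
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(x - y) for x, y in pairs) / len(xs)


# Toy values for one feature column; the numbers are made up.
syn_ages = [30, 40, 50, 60]
priv_ages = [32, 41, 49, 66]

distance = wasserstein_1d(syn_ages, priv_ages)
print(distance)  # 2.5
```

A distance of 0 means the two samples contain the same values, regardless of order; larger values mean the synthetic column is distributed further from the private one.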

_get_features_df(data)[source]

Get the features DataFrame from the data.

Parameters:

data (pe.data.Data) – The data

Returns:

The features DataFrame

Return type:

pandas.DataFrame