pe.runner package

Submodules

pe.runner.pe module

class pe.runner.pe.PE(priv_data, population, histogram, dp=None, loggers=[], callbacks=[])[source]

Bases: object

The class that runs the PE algorithm.

__init__(priv_data, population, histogram, dp=None, loggers=[], callbacks=[])[source]

Constructor.

Parameters:

_clean_up_loggers()[source]

Clean up loggers.

_get_num_samples_per_label_id(num_samples, fraction_per_label_id)[source]

Get the number of samples per label id, given the total number of samples.

Parameters:
  • num_samples (int) – The total number of samples

  • fraction_per_label_id (list[float], optional) – The fraction of samples for each label id. The fraction does not have to be normalized. When it is None, the fraction is assumed to be the same as the fraction of label ids in the private data. Defaults to None

Raises:
  • ValueError – If the length of fraction_per_label_id is not the same as the number of labels

  • ValueError – If the number of samples is so small that the number of samples for some label ids is zero

Returns:

The number of samples per label id

Return type:

np.ndarray
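The allocation described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's implementation: the function name `split_samples_per_label` and the `num_labels` argument are hypothetical, the largest-remainder rounding rule is one plausible choice that the documentation does not specify, and the sketch returns a list rather than an `np.ndarray`. It also takes the fractions explicitly instead of deriving them from the private data when `fraction_per_label_id` is None.

```python
def split_samples_per_label(num_samples, fraction_per_label_id, num_labels):
    """Hypothetical sketch: split num_samples across label ids by fraction."""
    if len(fraction_per_label_id) != num_labels:
        raise ValueError("fraction_per_label_id must have one entry per label")

    # The fractions need not be normalized, so divide by their sum first.
    total = sum(fraction_per_label_id)
    targets = [num_samples * f / total for f in fraction_per_label_id]

    # Round down, then hand the leftover samples to the label ids with the
    # largest fractional remainders (assumed rounding rule).
    counts = [int(t) for t in targets]
    leftover = num_samples - sum(counts)
    order = sorted(
        range(num_labels), key=lambda i: targets[i] - counts[i], reverse=True
    )
    for i in order[:leftover]:
        counts[i] += 1

    # Mirror the documented error: no label id may end up with zero samples.
    if any(c == 0 for c in counts):
        raise ValueError("num_samples is too small: a label id got zero samples")
    return counts


counts = split_samples_per_label(10, [1, 1, 2], num_labels=3)
```

With unnormalized fractions `[1, 1, 2]`, the counts always sum to `num_samples`, and a request that is too small for the number of labels raises `ValueError`, matching the two documented failure modes.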

_log_metrics(syn_data)[source]

Log metrics.

Parameters:

syn_data (pe.data.data.Data) – The synthetic data

load_checkpoint(checkpoint_path)[source]

Load a checkpoint.

Parameters:

checkpoint_path (str) – The path to the checkpoint

Returns:

The synthetic data

Return type:

pe.data.data.Data or None

run(num_samples_schedule, delta, epsilon=None, noise_multiplier=None, checkpoint_path=None, save_checkpoint=True, fraction_per_label_id=None)[source]

Run the PE algorithm.

Parameters:
  • num_samples_schedule (list[int]) – The schedule of the number of samples for each PE iteration. The first element is the number of samples for the initial data, and each remaining element is the number of samples for one PE iteration. The length of the list is therefore the number of PE iterations plus one

  • delta (float) – The delta value of DP

  • epsilon (float, optional) – The epsilon value of DP, defaults to None

  • noise_multiplier (float, optional) – The noise multiplier of the DP mechanism, defaults to None

  • checkpoint_path (str, optional) – The path to load and save the checkpoint, defaults to None

  • save_checkpoint (bool, optional) – Whether to save the checkpoint, defaults to True

  • fraction_per_label_id (list[float], optional) – The fraction of samples for each label id. The fraction does not have to be normalized. When it is None, the fraction is assumed to be the same as the fraction of label ids in the private data. Defaults to None

Returns:

The synthetic data

Return type:

pe.data.data.Data
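To make the layout of num_samples_schedule concrete, the short sketch below splits a schedule into its initial size and its per-iteration sizes. The helper `summarize_schedule` is hypothetical and not part of the package; it only restates the convention described above (first element for the initial data, one element per PE iteration after that).

```python
def summarize_schedule(num_samples_schedule):
    """Hypothetical helper: interpret a schedule as described for run()."""
    if len(num_samples_schedule) < 1:
        raise ValueError("the schedule needs at least the initial sample count")
    per_iteration = list(num_samples_schedule[1:])
    return {
        # First element: number of samples for the initial data.
        "initial_num_samples": num_samples_schedule[0],
        # Remaining elements: one entry per PE iteration.
        "num_iterations": len(per_iteration),
        "per_iteration_num_samples": per_iteration,
    }


# Eleven entries describe the initial data plus ten PE iterations.
schedule = [2000] * 11
summary = summarize_schedule(schedule)
```

A schedule built this way would then be passed as the num_samples_schedule argument of run(), together with delta and the other documented parameters.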