pe.runner package

class pe.runner.PE(priv_data, population, histogram, dp=None, loggers=[], callbacks=[])

Bases: object

The class that runs the PE algorithm.

__init__(priv_data, population, histogram, dp=None, loggers=[], callbacks=[])

Constructor.

Parameters:

priv_data (pe.data.Data) – The private data
population (pe.population.Population) – The population algorithm
histogram (pe.histogram.Histogram) – The histogram algorithm
dp (pe.dp.DP, optional) – The DP algorithm, defaults to None, in which case the Gaussian mechanism pe.dp.Gaussian is used
loggers (list[pe.logger.Logger], optional) – The list of loggers, defaults to []
callbacks (list[Callable or pe.callback.Callback], optional) – The list of callbacks, defaults to []

_clean_up_loggers()

Clean up loggers.

_get_num_samples_per_label_id(num_samples, fraction_per_label_id)

Get the number of samples per label id given the total number of samples.

Parameters:

num_samples (int) – The total number of samples
fraction_per_label_id (list[float], optional) – The fraction of samples for each label id. The fraction does not have to be normalized. When it is None, the fraction is assumed to be the same as the fraction of label ids in the private data. Defaults to None

Raises:

ValueError – If the length of fraction_per_label_id is not the same as the number of labels
ValueError – If the number of samples is so small that the number of samples for some label ids is zero

Returns:

The number of samples per label id

Return type:

np.ndarray
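The allocation described above can be sketched as follows. This is a minimal illustration and not the library's implementation: the helper name `split_samples_by_fraction`, the `num_labels` argument, and the largest-remainder rounding rule are all assumptions, and the sketch returns a plain list where the documented method returns an np.ndarray.

```python
import math


def split_samples_by_fraction(num_samples, fraction_per_label_id, num_labels):
    """Hypothetical helper: split num_samples across label ids by fraction."""
    if len(fraction_per_label_id) != num_labels:
        raise ValueError("fraction_per_label_id must have one entry per label")
    total = sum(fraction_per_label_id)
    if total <= 0:
        raise ValueError("fractions must sum to a positive value")
    # Normalize, so the fractions do not have to sum to one.
    exact = [num_samples * f / total for f in fraction_per_label_id]
    counts = [math.floor(x) for x in exact]
    # Assumed rounding rule: give the samples lost to flooring to the
    # label ids with the largest fractional remainders.
    remainder = num_samples - sum(counts)
    order = sorted(
        range(num_labels), key=lambda i: exact[i] - counts[i], reverse=True
    )
    for i in order[:remainder]:
        counts[i] += 1
    # Mirror the documented error for a num_samples that is too small.
    if any(c == 0 for c in counts):
        raise ValueError("num_samples is too small: some label ids get zero samples")
    return counts


counts = split_samples_by_fraction(10, [1, 1, 2], num_labels=3)
print(counts)  # → [3, 2, 5]
```

The unnormalized input `[1, 1, 2]` is treated the same as `[0.25, 0.25, 0.5]`, which matches the statement that the fractions do not have to be normalized.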

_log_metrics(syn_data)

Log metrics.

Parameters:

syn_data (pe.data.Data) – The synthetic data

evaluate(checkpoint_path)

Evaluate the synthetic data.

Parameters:

checkpoint_path (str) – The path to the checkpoint

load_checkpoint(checkpoint_path)

Load a checkpoint.

Parameters:

checkpoint_path (str) – The path to the checkpoint

Returns:

The synthetic data

Return type:

pe.data.Data or None

run(num_samples_schedule, delta, epsilon=None, noise_multiplier=None, checkpoint_path=None, save_checkpoint=True, fraction_per_label_id=None)

Run the PE algorithm.

Parameters:

num_samples_schedule (list[int]) – The schedule of the number of samples for each PE iteration. The first element is the number of samples for the initial data, and the remaining elements are the numbers of samples for each PE iteration. The length of the list is therefore the number of PE iterations plus one
delta (float) – The delta value of DP
epsilon (float, optional) – The epsilon value of DP, defaults to None
noise_multiplier (float, optional) – The noise multiplier of the DP mechanism, defaults to None
checkpoint_path (str, optional) – The path to load and save the checkpoint, defaults to None
save_checkpoint (bool, optional) – Whether to save the checkpoint, defaults to True
fraction_per_label_id (list[float], optional) – The fraction of samples for each label id. The fraction does not have to be normalized. When it is None, the fraction is assumed to be the same as the fraction of label ids in the private data. Defaults to None

Returns:

The synthetic data

Return type:

pe.data.Data
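The layout of num_samples_schedule can be illustrated with a short sketch. The helper `describe_schedule` and the sample counts are hypothetical and not part of the package; the sketch only shows how the list length relates to the number of PE iterations.

```python
def describe_schedule(num_samples_schedule):
    """Hypothetical helper: split a schedule into its two parts."""
    if len(num_samples_schedule) < 1:
        raise ValueError("the schedule needs at least the initial number of samples")
    # First element: number of samples for the initial data.
    initial = num_samples_schedule[0]
    # Remaining elements: one entry per PE iteration.
    per_iteration = list(num_samples_schedule[1:])
    return {
        "initial_num_samples": initial,
        "num_iterations": len(per_iteration),
        "per_iteration": per_iteration,
    }


# Example values only: 2000 samples initially and in each of 10 iterations.
schedule = [2000] * 11
info = describe_schedule(schedule)
print(info["num_iterations"])  # → 10
```

A list of this shape would be passed as the num_samples_schedule argument of run, so a schedule of length 11 corresponds to 10 PE iterations.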

Submodules

pe.runner.pe module
- PE