pe.runner package
Submodules
pe.runner.pe module
- class pe.runner.pe.PE(priv_data, population, histogram, dp=None, loggers=[], callbacks=[])
Bases: object
The class that runs the PE algorithm.
- __init__(priv_data, population, histogram, dp=None, loggers=[], callbacks=[])
Constructor.
- Parameters:
priv_data (pe.data.data.Data) – The private data
population (pe.population.population.Population) – The population algorithm
histogram (pe.histogram.histogram.Histogram) – The histogram algorithm
dp (pe.dp.dp.DP, optional) – The DP algorithm, defaults to None, in which case the Gaussian mechanism pe.dp.gaussian.Gaussian is used
loggers (list[pe.logger.logger.Logger], optional) – The list of loggers, defaults to []
callbacks (list[Callable or pe.callback.callback.Callback], optional) – The list of callbacks, defaults to []
- _get_num_samples_per_label_id(num_samples, fraction_per_label_id)
Get the number of samples per label id, given the total number of samples.
- Parameters:
num_samples (int) – The total number of samples
fraction_per_label_id (list[float], optional) – The fraction of samples for each label id. The fraction does not have to be normalized. When it is None, the fraction is assumed to be the same as the fraction of label ids in the private data. Defaults to None
- Raises:
ValueError – If the length of fraction_per_label_id is not the same as the number of labels
ValueError – If the number of samples is so small that the number of samples for some label ids is zero
- Returns:
The number of samples per label id
- Return type:
np.ndarray
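The allocation described above can be illustrated with a minimal sketch. This is not the library's implementation: the function name `allocate_samples_per_label`, the explicit `num_labels` argument, and the largest-remainder rounding rule are all assumptions made for illustration, and the sketch returns a plain list rather than an np.ndarray so that it needs only the standard library.

```python
import math


def allocate_samples_per_label(num_samples, fractions, num_labels):
    """Split num_samples across label ids in proportion to fractions.

    Hypothetical helper for illustration; not part of pe.runner.pe.PE.
    """
    if len(fractions) != num_labels:
        raise ValueError("fractions must have one entry per label id")

    # The fractions do not have to be normalized, so normalize them here.
    total = sum(fractions)
    targets = [num_samples * f / total for f in fractions]

    # Start from the rounded-down targets.
    counts = [math.floor(t) for t in targets]

    # Assumption: leftover samples go to the label ids with the largest
    # fractional remainders, so the counts sum to exactly num_samples.
    leftover = num_samples - sum(counts)
    order = sorted(
        range(num_labels),
        key=lambda i: targets[i] - counts[i],
        reverse=True,
    )
    for i in order[:leftover]:
        counts[i] += 1

    if any(c == 0 for c in counts):
        raise ValueError(
            "num_samples is too small: some label ids would get zero samples"
        )
    return counts
```

Under these assumptions, unnormalized fractions such as `[1, 2]` behave the same as `[1/3, 2/3]`, and a request for 2 samples across 3 equally weighted label ids raises ValueError, mirroring the second error condition documented above.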
- _log_metrics(syn_data)
Log metrics.
- Parameters:
syn_data (pe.data.data.Data) – The synthetic data
- load_checkpoint(checkpoint_path)
Load a checkpoint.
- Parameters:
checkpoint_path (str) – The path to the checkpoint
- Returns:
The synthetic data
- Return type:
pe.data.data.Data or None
- run(num_samples_schedule, delta, epsilon=None, noise_multiplier=None, checkpoint_path=None, save_checkpoint=True, fraction_per_label_id=None)
Run the PE algorithm.
- Parameters:
num_samples_schedule (list[int]) – The schedule of the number of samples for each PE iteration. The first element is the number of samples for the initial data, and the rest are the number of samples for each PE iteration. So the length of the list is the number of PE iterations plus one
delta (float) – The delta value of DP
epsilon (float, optional) – The epsilon value of DP, defaults to None
noise_multiplier (float, optional) – The noise multiplier of the DP mechanism, defaults to None
checkpoint_path (str, optional) – The path to load and save the checkpoint, defaults to None
save_checkpoint (bool, optional) – Whether to save the checkpoint, defaults to True
fraction_per_label_id (list[float], optional) – The fraction of samples for each label id. The fraction does not have to be normalized. When it is None, the fraction is assumed to be the same as the fraction of label ids in the private data. Defaults to None
- Returns:
The synthetic data
- Return type: