pe.data.data module

class pe.data.data.Data(data_frame=None, metadata={})

Bases: object

The class that holds the private data or synthetic data from PE.

__init__(data_frame=None, metadata={})

Constructor.

Parameters:

data_frame (pandas.DataFrame, optional) – A pandas DataFrame that holds the data. Defaults to None
metadata (dict, optional) – The metadata of the data. Defaults to {}
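
The signature shows a mutable default, metadata={}. In Python, a default dict is created once and reused on every call, so whether instances end up sharing it depends on how the constructor stores the argument, which this page does not show. The sketch below uses a hypothetical stand-in class (Holder, not part of pe) and a made-up metadata key to illustrate the general pitfall and why passing an explicit dict is the cautious choice.

```python
class Holder:
    """Hypothetical stand-in, NOT pe.data.Data: stores the argument as-is."""

    def __init__(self, metadata={}):
        # The default dict is created once, when the function is defined.
        self.metadata = metadata


a = Holder()
b = Holder()
a.metadata["name"] = "demo"  # "name" is a made-up key for illustration

# Both instances see the change because they share the default dict.
shared = b.metadata is a.metadata

# Passing an explicit dict gives each instance its own metadata.
c = Holder(metadata={"name": "demo"})
d = Holder(metadata={"name": "demo"})
independent = c.metadata is not d.metadata
```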

classmethod concat(data_list, metadata=None)

Concatenate the data frames of a list of data objects

Parameters:

data_list (list[pe.data.Data]) – The list of data objects to concatenate
metadata (dict, optional) – The metadata of the concatenated data. When None, all data objects in the list must have the same metadata, and that metadata is used. Defaults to None

Raises:

ValueError – If the data objects do not all have the same metadata

Returns:

The concatenated data object

Return type:

pe.data.Data
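
The documented behavior can be approximated with plain pandas. This is a minimal sketch, not the library's implementation: concat_frames is a hypothetical helper that works on (data_frame, metadata) pairs instead of Data objects, and the empty-list check is an assumption that the page does not cover.

```python
import pandas as pd


def concat_frames(items, metadata=None):
    """Concatenate (data_frame, metadata) pairs following the documented rules."""
    if not items:
        # Assumption: the documentation does not say what happens here.
        raise ValueError("items must not be empty")
    if metadata is None:
        first = items[0][1]
        # With no explicit metadata, every item must carry the same metadata.
        if any(meta != first for _, meta in items[1:]):
            raise ValueError("Metadata must be the same")
        metadata = first
    frame = pd.concat([frame for frame, _ in items])
    return frame, metadata


meta = {"name": "demo"}  # made-up metadata for illustration
part_a = pd.DataFrame({"text": ["a", "b"]})
part_b = pd.DataFrame({"text": ["c"]})
combined, combined_meta = concat_frames([(part_a, meta), (part_b, meta)])
```

Note that pd.concat keeps each frame's original index, so the combined frame above has repeated index values. Whether Data.concat behaves the same way is not stated on this page.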

filter(filter_criteria)

Filter the data object according to the given filter criteria

Parameters: filter_criteria (dict) – The filter criteria. None means no filter
Returns: The filtered data object
Return type: pe.data.Data
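
The page does not describe the structure of filter_criteria. The sketch below assumes it maps column names to required values and that all conditions must hold at once. Both points are assumptions, and filter_frame is a hypothetical pandas helper rather than the library's code.

```python
import pandas as pd


def filter_frame(frame, filter_criteria=None):
    """Keep rows whose columns equal the values given in filter_criteria."""
    if filter_criteria is None:
        return frame  # documented: None means no filter
    mask = pd.Series(True, index=frame.index)
    for column, value in filter_criteria.items():
        # Assumption: criteria are combined with a logical AND.
        mask &= frame[column] == value
    return frame[mask]


frame = pd.DataFrame({"label": [0, 1, 0, 1], "split": ["a", "a", "b", "b"]})
kept = filter_frame(frame, {"label": 0, "split": "b"})
```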

filter_label_id(label_id)

Filter the data frame according to a label id

Parameters: label_id (int) – The label id that is used to filter the data frame
Returns: pe.data.Data object with the filtered data frame
Return type: pe.data.Data

load_checkpoint(path)

Load data from a checkpoint

Parameters: path (str) – The folder that contains the checkpoint
Returns: Whether the checkpoint is loaded successfully
Return type: bool

merge(data)

Merge the data object with another data object

Parameters: data (pe.data.Data) – The data object to merge
Raises: ValueError – If the metadata of data is not the same as the metadata of the current object
Returns: The merged data object
Return type: pe.data.Data

random_split(num_samples_list, seed=0)

Randomly split the data frame into multiple data frames

Parameters:

num_samples_list (list[int]) – The list of numbers of samples for each data frame
seed (int, optional) – The seed for the random number generator, defaults to 0

Raises:

ValueError – If the sum of num_samples_list is not equal to the number of samples

Returns:

The list of pe.data.Data objects with the split data

Return type:

list[pe.data.Data]
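
A seeded random split with the documented size check can be sketched in plain pandas. random_split_frame is a hypothetical helper, and the shuffling method is an assumption: the library may draw its permutation differently, so identical seeds need not produce identical splits.

```python
import pandas as pd


def random_split_frame(frame, num_samples_list, seed=0):
    """Shuffle the rows with a fixed seed, then cut them into consecutive parts."""
    if sum(num_samples_list) != len(frame):
        raise ValueError("sum of num_samples_list must equal the number of samples")
    shuffled = frame.sample(frac=1, random_state=seed)
    parts, start = [], 0
    for count in num_samples_list:
        parts.append(shuffled.iloc[start:start + count])
        start += count
    return parts


frame = pd.DataFrame({"value": range(10)})
train, valid, held_out = random_split_frame(frame, [6, 2, 2], seed=0)
```

Because the sizes must sum to the number of samples, every row lands in exactly one part.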

random_truncate(num_samples)

Randomly truncate the data frame to a certain number of samples

Parameters: num_samples (int) – The number of samples to keep after random truncation
Returns: A new pe.data.Data object with the randomly truncated data frame
Return type: pe.data.Data

reset_index(**kwargs)

Reset the index of the data frame

Parameters: kwargs (dict) – The keyword arguments to pass to pandas DataFrame.reset_index
Returns: A new pe.data.Data object whose data frame has a reset index
Return type: pe.data.Data
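
Since the keyword arguments are documented as being passed to pandas, the usual pandas options should apply. The example below is plain pandas, not a call into pe, and shows the effect of the commonly used drop argument on an illustrative frame.

```python
import pandas as pd

frame = pd.DataFrame({"value": [10, 20, 30]}, index=[7, 3, 5])

# drop=True discards the old index instead of keeping it as a column.
dropped = frame.reset_index(drop=True)

# Without drop, the old index is kept as a new column named "index".
kept = frame.reset_index()
```

In both cases pandas returns a new frame and leaves the original untouched, which matches the documented "new pe.data.Data object" return value.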

save_checkpoint(path)

Save the data to a checkpoint.

Parameters:

path (str) – The folder in which to save the checkpoint

Raises:

ValueError – If the path is None
ValueError – If the data frame is empty
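
One way to combine save_checkpoint with load_checkpoint is a "load if present, otherwise create and save" pattern. The sketch below relies only on the documented method names and on the bool returned by load_checkpoint. load_or_create and FakeData are hypothetical: FakeData is an in-memory stand-in so the example runs without pe installed, and it does not reflect how the library stores checkpoints.

```python
def load_or_create(data, checkpoint_dir, create_fn):
    """Reuse a checkpoint if one loads, otherwise build the data and save it."""
    if data.load_checkpoint(checkpoint_dir):
        return data
    fresh = create_fn()
    fresh.save_checkpoint(checkpoint_dir)
    return fresh


class FakeData:
    """Hypothetical in-memory stand-in for pe.data.Data, for illustration only."""

    store = {}  # maps checkpoint folder -> saved rows

    def __init__(self, rows=None):
        self.rows = rows

    def load_checkpoint(self, path):
        if path not in FakeData.store:
            return False
        self.rows = FakeData.store[path]
        return True

    def save_checkpoint(self, path):
        # Mirrors the two documented ValueError cases.
        if path is None:
            raise ValueError("path is None")
        if not self.rows:
            raise ValueError("data is empty")
        FakeData.store[path] = self.rows


# First call: nothing saved yet, so the data is created and saved.
first = load_or_create(FakeData(), "ckpt", lambda: FakeData(rows=[1, 2, 3]))
# Second call: the checkpoint loads, so create_fn is never used.
second = load_or_create(FakeData(), "ckpt", lambda: FakeData(rows=[9]))
```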

set_label_id(label_id)

Set the label id for the data frame

Parameters: label_id (int) – The label id to set

split_by_client()

Split the data frame by client ID

Raises: ValueError – If the client ID column is not in the data frame
Returns: The list of data objects with the split data
Return type: list[pe.data.Data]
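
Splitting by client can be pictured as a pandas group-by on the client ID column. This is a rough sketch only: the column name client_id is hypothetical (the page does not name the column), split_frame_by_client is not part of pe, and the ordering of the resulting parts is an assumption.

```python
import pandas as pd

CLIENT_ID_COLUMN = "client_id"  # hypothetical column name, chosen for this sketch


def split_frame_by_client(frame, column=CLIENT_ID_COLUMN):
    """Return one data frame per distinct client ID."""
    if column not in frame.columns:
        raise ValueError(f"column {column!r} is not in the data frame")
    # groupby sorts the group keys by default; rows keep their order within a group.
    return [group for _, group in frame.groupby(column)]


frame = pd.DataFrame(
    {"client_id": [2, 1, 2, 1, 3], "text": ["a", "b", "c", "d", "e"]}
)
parts = split_frame_by_client(frame)
```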

split_by_index()

Split the data frame by index

Returns: The list of data objects with the split data
Return type: list[pe.data.Data]

truncate(num_samples)

Truncate the data frame to a certain number of samples

Parameters: num_samples (int) – The number of samples to keep after truncation
Returns: A new pe.data.Data object with the truncated data frame
Return type: pe.data.Data
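
The difference between truncate and random_truncate can be illustrated with plain pandas. The sketch assumes that truncate keeps the first num_samples rows and that random_truncate samples without replacement. Neither detail is stated on this page, and the code does not call into pe.

```python
import pandas as pd

frame = pd.DataFrame({"value": range(8)})
num_samples = 3

# Deterministic truncation: assumed here to keep the first num_samples rows.
head = frame.iloc[:num_samples]

# Random truncation: keep num_samples rows drawn without replacement.
# random_state is fixed only to make this sketch repeatable.
picked = frame.sample(n=num_samples, random_state=0)
```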