pe.data.data module
- class pe.data.data.Data(data_frame=None, metadata={})
Bases: object
The class that holds the private data or synthetic data from PE.
- __init__(data_frame=None, metadata={})
Constructor.
- Parameters:
data_frame (pandas.DataFrame, optional) – A pandas dataframe that holds the data, defaults to None
metadata (dict, optional) – The metadata of the data, defaults to {}
- classmethod concat(data_list, metadata=None)
Concatenate the data frames of a list of data objects
- Parameters:
data_list (list[pe.data.Data]) – The list of data objects to concatenate
metadata (dict, optional) – The metadata of the concatenated data. When None, all data objects in the list must have the same metadata, which is then used for the result. Defaults to None
- Raises:
ValueError – If the metadata of the data objects are not the same
- Returns:
The concatenated data object
- Return type:
pe.data.Data
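The metadata rule described for concat can be illustrated with plain pandas. This is a minimal sketch, not the library's implementation: `concat_with_metadata` is a hypothetical helper that works on (data frame, metadata) pairs instead of `pe.data.Data` objects, and it assumes the original row indices are kept as they are.

```python
import pandas as pd


def concat_with_metadata(items, metadata=None):
    """Concatenate (data_frame, metadata) pairs. Hypothetical helper, not part of pe."""
    frames = [frame for frame, _ in items]
    metadata_list = [meta for _, meta in items]
    if metadata is None:
        # Without an explicit override, every input must carry the same metadata.
        if any(meta != metadata_list[0] for meta in metadata_list):
            raise ValueError("The metadata of the data objects must be the same")
        metadata = metadata_list[0]
    # Assumption: row indices are preserved rather than renumbered.
    return pd.concat(frames), metadata


a = pd.DataFrame({"text": ["x", "y"]})
b = pd.DataFrame({"text": ["z"]})
merged, meta = concat_with_metadata(
    [(a, {"label_info": ["cat"]}), (b, {"label_info": ["cat"]})]
)
```

Passing an explicit `metadata` argument skips the equality check, which mirrors the documented behavior of the `metadata` parameter.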
- filter(filter_criteria)
Filter the data object according to the filter criteria
- Parameters:
filter_criteria (dict) – The filter criteria. None means no filter
- Returns:
The filtered data object
- Return type:
pe.data.Data
- filter_label_id(label_id)
Filter the data frame according to a label id
- Parameters:
label_id (int) – The label id that is used to filter the data frame
- Returns:
A pe.data.Data object with the filtered data frame
- Return type:
pe.data.Data
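In pandas terms, filtering by label id amounts to a boolean mask on the label column. The sketch below is illustrative only: the column name `label_id` is a placeholder chosen for this example (the column name the library actually uses is not stated here), and `filter_frame_by_label` is a hypothetical helper.

```python
import pandas as pd

LABEL_COLUMN = "label_id"  # placeholder name, assumed for this sketch

frame = pd.DataFrame(
    {"text": ["a", "b", "c", "d"], LABEL_COLUMN: [0, 1, 0, 1]}
)


def filter_frame_by_label(data_frame, label_id):
    """Return the rows whose label column equals label_id (hypothetical helper)."""
    mask = data_frame[LABEL_COLUMN] == label_id
    return data_frame[mask]


label_one = filter_frame_by_label(frame, 1)
```

The result is a new data frame, so the original `frame` still holds all four rows.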
- load_checkpoint(path)
Load data from a checkpoint
- Parameters:
path (str) – The folder that contains the checkpoint
- Returns:
Whether the checkpoint is loaded successfully
- Return type:
bool
- merge(data)
Merge the data object with another data object
- Parameters:
data (pe.data.Data) – The data object to merge
- Raises:
ValueError – If the metadata of data is not the same as the metadata of the current object
- Returns:
The merged data object
- Return type:
pe.data.Data
- random_split(num_samples_list, seed=0)
Randomly split the data frame into multiple data frames
- Parameters:
num_samples_list (list[int]) – The list of numbers of samples for each data frame
seed (int, optional) – The seed for the random number generator, defaults to 0
- Raises:
ValueError – If the sum of num_samples_list is not equal to the number of samples
- Returns:
The list of pe.data.Data objects with the split data
- Return type:
list[pe.data.Data]
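The contract of random_split (sizes must sum to the number of samples, and the split is reproducible for a given seed) can be sketched with NumPy and pandas. This is an assumption-laden illustration, not the library's code: `random_split_frame` is a hypothetical helper, and the choice of `numpy.random.default_rng` is ours, so the exact rows selected for a given seed will not necessarily match the library's.

```python
import numpy as np
import pandas as pd


def random_split_frame(data_frame, num_samples_list, seed=0):
    """Split a data frame into random, non-overlapping parts (hypothetical helper)."""
    if sum(num_samples_list) != len(data_frame):
        raise ValueError(
            "The sum of num_samples_list must equal the number of samples"
        )
    rng = np.random.default_rng(seed)
    # Shuffle row positions once, then cut the shuffled order at the cumulative sizes.
    permutation = rng.permutation(len(data_frame))
    boundaries = np.cumsum(num_samples_list)[:-1]
    return [data_frame.iloc[positions] for positions in np.split(permutation, boundaries)]


frame = pd.DataFrame({"value": range(10)})
parts = random_split_frame(frame, [3, 7], seed=0)
```

Because every row position appears exactly once in the permutation, the parts are disjoint and together cover the whole data frame.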
- random_truncate(num_samples)
Randomly truncate the data frame to a certain number of samples
- Parameters:
num_samples (int) – The number of samples to keep after the random truncation
- Returns:
A new pe.data.Data object with the randomly truncated data frame
- Return type:
pe.data.Data
- reset_index(**kwargs)
Reset the index of the data frame
- Parameters:
kwargs (dict) – The keyword arguments to pass to the pandas reset_index function
- Returns:
A new pe.data.Data object with the reset index data frame
- Return type:
pe.data.Data
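Since the keyword arguments are forwarded to the pandas reset_index function, its behavior can be previewed on a plain data frame. The sketch below uses only pandas; `drop=True` is a standard `DataFrame.reset_index` argument that discards the old index instead of keeping it as a column.

```python
import pandas as pd

frame = pd.DataFrame({"text": ["a", "b", "c", "d"]})

# Selecting rows leaves gaps in the index: here it becomes [1, 3].
subset = frame.iloc[[1, 3]]

# drop=True renumbers the rows from 0 without adding an "index" column.
renumbered = subset.reset_index(drop=True)

# Without drop=True, the old index is kept as a new column named "index".
kept = subset.reset_index()
```

This is typically useful after filtering or splitting, when the remaining rows no longer have consecutive index values.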
- save_checkpoint(path)
Save the data to a checkpoint.
- Parameters:
path (str) – The folder to save the checkpoint
- Raises:
ValueError – If the path is None
ValueError – If the data frame is empty
- set_label_id(label_id)
Set the label id for the data frame
- Parameters:
label_id (int) – The label id to set
- split_by_client()
Split the data frame by client ID
- Raises:
ValueError – If the client ID column is not in the data frame
- Returns:
The list of data objects with the split data
- Return type:
list[pe.data.Data]
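Splitting by client ID corresponds to a pandas group-by on the client column. The sketch below is a rough illustration under stated assumptions: the column name `client_id` is a placeholder for this example, `split_frame_by_client` is a hypothetical helper, and the ordering of the returned parts (sorted by client ID, the `groupby` default) may differ from what the library returns.

```python
import pandas as pd

CLIENT_COLUMN = "client_id"  # placeholder name, assumed for this sketch


def split_frame_by_client(data_frame):
    """Return one data frame per distinct client ID (hypothetical helper)."""
    if CLIENT_COLUMN not in data_frame.columns:
        raise ValueError(f"Column {CLIENT_COLUMN!r} is not in the data frame")
    # groupby yields (key, group) pairs; only the per-client frames are kept.
    return [group for _, group in data_frame.groupby(CLIENT_COLUMN)]


frame = pd.DataFrame(
    {"text": ["a", "b", "c", "d", "e"], CLIENT_COLUMN: [2, 1, 2, 1, 2]}
)
per_client = split_frame_by_client(frame)
```

Each returned frame keeps the original row indices, so rows can still be traced back to the unsplit data.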
- split_by_index()
Split the data frame by index
- Returns:
The list of data objects with the split data
- Return type:
list[pe.data.Data]
- truncate(num_samples)
Truncate the data frame to a certain number of samples
- Parameters:
num_samples (int) – The number of samples to keep after the truncation
- Returns:
A new pe.data.Data object with the truncated data frame
- Return type:
pe.data.Data
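The difference between truncate and random_truncate can be pictured with two pandas operations. This is a sketch under assumptions, not the library's implementation: both helpers are hypothetical, it assumes truncate keeps the first rows in order, and the `seed` parameter is added here only to make the example reproducible (the documented random_truncate signature does not take one).

```python
import pandas as pd


def truncate_frame(data_frame, num_samples):
    """Keep the first num_samples rows (hypothetical helper)."""
    return data_frame.iloc[:num_samples]


def random_truncate_frame(data_frame, num_samples, seed=0):
    """Keep num_samples rows chosen at random without replacement (hypothetical helper)."""
    return data_frame.sample(n=num_samples, random_state=seed)


frame = pd.DataFrame({"value": range(10)})
first_three = truncate_frame(frame, 3)
random_three = random_truncate_frame(frame, 3)
```

Both helpers return a new data frame and leave `frame` unchanged, in line with the documented behavior of returning a new object.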