Data Utils
- raimitigations.utils.data_utils.err_float_01(param, param_name)
Raises an error if param is not in the range [0, 1]. param_name is the name of the parameter whose value is passed in param, which makes it easier for the user to identify where the error occurred.
- Parameters
param – the numerical parameter being analyzed;
param_name – the internal name used for the parameter provided in param.
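The check described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the function name check_float_01, the use of ValueError, and the message wording are all assumptions, since the documentation only states that an error is raised.

```python
def check_float_01(param, param_name):
    """Hypothetical stand-in for err_float_01: ensure param lies in [0, 1]."""
    # Assumption: non-numeric values are rejected too. bool is excluded
    # explicitly because it is a subclass of int in Python.
    if isinstance(param, bool) or not isinstance(param, (int, float)):
        raise ValueError(
            f"'{param_name}' must be a number in the range [0, 1], got {param!r}."
        )
    # The chained comparison is False for NaN, so NaN is rejected as well.
    if not 0.0 <= param <= 1.0:
        raise ValueError(
            f"'{param_name}' must be in the range [0, 1], got {param}."
        )


# Usage: the parameter name appears in the message, so the caller can see
# which argument was invalid.
try:
    check_float_01(1.5, "test_size")
except ValueError as err:
    print(err)
```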
- raimitigations.utils.data_utils.freedman_diaconis(data: Series)
Computes the optimal number of bins for a set of data using the Freedman-Diaconis rule.
- Parameters
data – the data column used to compute the number of bins.
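For reference, the Freedman-Diaconis rule sets the bin width to 2 * IQR / n^(1/3), where IQR is the interquartile range and n is the number of observations; the number of bins is then the data range divided by that width. The sketch below illustrates the rule with NumPy. The function name freedman_diaconis_bins, the handling of missing values, and the fallback to a single bin when the IQR is zero are assumptions and may differ from what the library does.

```python
import math

import numpy as np


def freedman_diaconis_bins(data):
    """Hypothetical sketch: number of bins given by the Freedman-Diaconis rule."""
    values = np.asarray(data, dtype=float)
    values = values[~np.isnan(values)]  # assumption: missing values are ignored
    n = values.size
    if n < 2:
        return 1

    q75, q25 = np.percentile(values, [75, 25])
    iqr = q75 - q25
    if iqr == 0:
        # Assumption: fall back to one bin, since the width would be zero.
        return 1

    bin_width = 2.0 * iqr / n ** (1.0 / 3.0)
    data_range = values.max() - values.min()
    return max(1, math.ceil(data_range / bin_width))


# Usage: works with any array-like input, including a pandas Series.
n_bins = freedman_diaconis_bins(np.arange(1000))
```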
- raimitigations.utils.data_utils.get_cat_cols(df: DataFrame, subset: Optional[list] = None)
Returns a list of all categorical columns in the dataset df. If subset is not None, only the columns listed in subset are checked.
- Parameters
df – the dataset being analyzed;
subset – the list of columns that should be analyzed. If subset is None, then check all columns.
- Returns
a list with the name of all categorical columns.
- Return type
list
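The behavior described above can be approximated with plain pandas. The sketch below assumes that a column counts as categorical when its dtype is object, string, or category; the documentation does not state the exact criterion, so this rule and the function name find_categorical_columns are hypothetical.

```python
import pandas as pd
from pandas.api import types as pdt


def find_categorical_columns(df, subset=None):
    """Hypothetical sketch: names of the categorical columns in df."""
    # Check every column unless a subset was given.
    columns = list(df.columns) if subset is None else list(subset)
    cat_cols = []
    for col in columns:
        series = df[col]
        # Assumption: object, string and category dtypes are "categorical".
        if (
            isinstance(series.dtype, pd.CategoricalDtype)
            or pdt.is_object_dtype(series)
            or pdt.is_string_dtype(series)
        ):
            cat_cols.append(col)
    return cat_cols


# Usage
df = pd.DataFrame(
    {
        "age": [20, 30],
        "city": ["Lisbon", "Porto"],
        "grade": pd.Categorical(["a", "b"]),
    }
)
print(find_categorical_columns(df))                   # ['city', 'grade']
print(find_categorical_columns(df, subset=["age"]))   # []
```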
- raimitigations.utils.data_utils.ordinal_to_onehot(arr: list, n_class: int)
Converts a list of ordinal values in the range [0, n_class - 1] to a one-hot matrix with shape (len(arr), n_class).
- Parameters
arr – a list of labels;
n_class – the number of classes in arr.
- Returns
a list of lists (a matrix) of one-hot encodings, where each label in arr is one-hot encoded according to the maximum number of classes n_class.
- Return type
list of lists
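The conversion can be illustrated with a short pure-Python sketch. The function name to_onehot and the range check are assumptions made for this example; the library function may validate its input differently or not at all.

```python
def to_onehot(arr, n_class):
    """Hypothetical sketch: one-hot encode integer labels as a list of lists."""
    onehot = []
    for label in arr:
        # Each row has n_class entries, so valid labels are 0 .. n_class - 1.
        if not 0 <= label < n_class:
            raise ValueError(f"label {label} is outside [0, {n_class - 1}]")
        row = [0] * n_class
        row[label] = 1
        onehot.append(row)
    return onehot


# Usage: three labels and three classes give a 3 x 3 matrix.
print(to_onehot([0, 2, 1], n_class=3))
# [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
```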