Data Utils

raimitigations.utils.data_utils.err_float_01(param, param_name)

Raises an error if param is not in the range [0, 1]. param_name is the name of the parameter whose value is passed in param; it makes it easier for the user to identify where the error occurred.

Parameters
  • param – the numerical parameter being analyzed;

  • param_name – the internal name used for the parameter provided in param.
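
The check described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper name check_unit_interval, the use of ValueError, and the wording of the message are all assumptions.

```python
def check_unit_interval(param, param_name):
    """Hypothetical stand-in for err_float_01 (name, exception type and
    message are assumptions, not taken from the library)."""
    is_number = isinstance(param, (int, float)) and not isinstance(param, bool)
    if not is_number or not 0 <= param <= 1:
        # Naming the parameter tells the caller which argument was invalid.
        raise ValueError(
            f"Parameter '{param_name}' must be a number in [0, 1], got {param!r}."
        )


check_unit_interval(0.3, "test_size")  # a valid value passes silently
```

Calling check_unit_interval(1.5, "test_size") would raise an error whose message names test_size.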

raimitigations.utils.data_utils.freedman_diaconis(data: Series)

Computes the optimal number of bins for a set of data using the Freedman-Diaconis rule.

Parameters

data – the data column used to compute the number of bins.
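
For reference, the Freedman-Diaconis rule sets the bin width to 2 * IQR / n^(1/3), where IQR is the interquartile range and n is the number of observations; the number of bins is the data range divided by that width. The sketch below illustrates the rule using only the standard library and a plain list instead of a Series. The helper name freedman_diaconis_bins, the rounding with ceil, and the fallback to a single bin when the IQR is zero are assumptions, so the library function may return different values in edge cases.

```python
import math
import statistics


def freedman_diaconis_bins(values):
    """Hypothetical sketch of the Freedman-Diaconis rule for a list of numbers."""
    n = len(values)
    # Quartiles computed with linear interpolation over the observed data.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        return 1  # assumption: fall back to a single bin when there is no spread
    bin_width = 2 * iqr / n ** (1 / 3)
    # Assumption: round up so the bins cover the whole data range.
    return math.ceil((max(values) - min(values)) / bin_width)


n_bins = freedman_diaconis_bins([1, 2, 3, 4, 5, 6, 7, 10])
```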

raimitigations.utils.data_utils.get_cat_cols(df: DataFrame, subset: Optional[list] = None)

Returns a list of all categorical columns in the dataset df. If subset is not None, only the columns listed in subset are checked.

Parameters
  • df – the dataset being analyzed;

  • subset – the list of columns that should be analyzed. If subset is None, then check all columns.

Returns

a list with the names of all categorical columns.

Return type

list
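
The behavior of the subset parameter can be sketched as follows. This illustration assumes that a column counts as categorical when its dtype is not numeric; the criterion the library actually applies is not stated here, and the helper name find_categorical_columns is hypothetical.

```python
from typing import Optional

import pandas as pd
from pandas.api.types import is_numeric_dtype


def find_categorical_columns(df: pd.DataFrame, subset: Optional[list] = None) -> list:
    """Hypothetical sketch: treat every non-numeric column as categorical."""
    # With no subset, inspect every column of the dataset.
    columns = list(df.columns) if subset is None else subset
    return [col for col in columns if not is_numeric_dtype(df[col])]


df = pd.DataFrame({"age": [25, 32], "city": ["Lima", "Oslo"], "plan": ["a", "b"]})
all_cat = find_categorical_columns(df)
some_cat = find_categorical_columns(df, subset=["age", "city"])
```

Here all_cat holds the two text columns, while some_cat holds only city because plan is outside the subset.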

raimitigations.utils.data_utils.ordinal_to_onehot(arr: list, n_class: int)

Converts a list of ordinal values in the range [0, n_class - 1] to a one-hot matrix with shape (len(arr), n_class).

Parameters
  • arr – a list of labels;

  • n_class – the number of classes in arr.

Returns

a list of lists (a matrix) of one-hot encodings, where each label in arr is one-hot encoded according to the maximum number of classes n_class.

Return type

list of lists
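
The conversion can be sketched as follows, assuming every label is an integer between 0 and n_class - 1. The helper name to_onehot is hypothetical and the code is an illustration rather than the library's implementation.

```python
def to_onehot(labels, n_class):
    """Hypothetical sketch of a one-hot conversion for integer labels."""
    onehot = []
    for label in labels:
        row = [0] * n_class  # one column per class
        row[label] = 1       # mark the column that matches the label
        onehot.append(row)
    return onehot


matrix = to_onehot([0, 2, 1], n_class=3)
# matrix == [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
```

The result has one row per label and n_class columns, matching the shape (len(arr), n_class) described above.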