DataRobustScaler

class raimitigations.dataprocessing.DataRobustScaler(scaler_obj: Optional[RobustScaler] = None, df: Optional[Union[DataFrame, ndarray]] = None, exclude_cols: Optional[list] = None, include_cols: Optional[list] = None, transform_pipe: Optional[list] = None, verbose: bool = True)

Bases: DataScaler

Concrete class that applies the RobustScaler over a given dataset. This class uses sklearn’s implementation of the scaler (RobustScaler) at its root, but makes it simpler to apply to a dataset. For example, the user can provide a dataset with categorical columns, and the scaler will be applied only to the numerical columns. The user can also provide a pipeline of scalers, in which case all of the scalers in the pipeline are applied before the RobustScaler. The pipeline may also contain transformations from other non-scaler classes implemented in this library (feature selection, encoding, imputation, etc.). For more details on how the RobustScaler scales the data, check the official sklearn documentation.
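As a rough illustration of the behaviour described above, the sketch below scales only the numerical columns of a small table, using the default robust scaling rule (subtract the median, divide by the interquartile range). It uses only the Python standard library and is not the library's implementation; the helper names robust_scale and scale_numeric_columns are hypothetical.

```python
from statistics import median, quantiles


def robust_scale(values):
    """Scale values as (x - median) / IQR (25th to 75th percentile range)."""
    # method="inclusive" uses linear interpolation between data points.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    center = median(values)
    # Guard against a zero IQR (e.g. a constant column) to avoid dividing by zero.
    scale = iqr if iqr != 0 else 1.0
    return [(v - center) / scale for v in values]


def scale_numeric_columns(table):
    """Scale numeric columns of a {name: list} table; copy the others unchanged."""
    scaled = {}
    for name, column in table.items():
        is_numeric = all(isinstance(v, (int, float)) for v in column)
        scaled[name] = robust_scale(column) if is_numeric else list(column)
    return scaled


table = {
    "income": [10, 20, 30, 40, 1000],   # 1000 is an outlier
    "city": ["A", "B", "A", "C", "B"],  # categorical, left as is
}
result = scale_numeric_columns(table)
print(result["income"])  # [-1.0, -0.5, 0.0, 0.5, 48.5]
print(result["city"])    # ['A', 'B', 'A', 'C', 'B']
```

Because the median and the interquartile range are barely affected by the single outlier, the first four values stay within a small range while the outlier remains clearly separated.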

Parameters
  • scaler_obj – an object from the RobustScaler class. This sklearn scaler will be used to perform the scaling process. If None, a RobustScaler is created using default values;

  • df – pandas data frame that contains the columns to be scaled and/or transformed;

  • exclude_cols – a list of the column names or indexes that shouldn’t be transformed, that is, a list of columns to be ignored. This makes it possible to transform only certain columns and leave the others unmodified. It is useful if the dataset contains a set of binary columns that should be left as is, or a set of categorical columns (which can’t be scaled or transformed). Categorical columns that are not added to this list are automatically identified and appended to it. If None, this parameter is automatically set to a list of all categorical variables in the dataset;

  • include_cols – a list of the column names or indexes that should be transformed, that is, a list of columns to be included in the dataset being transformed. This parameter uses the inverse logic of exclude_cols, and thus these two parameters shouldn’t be used at the same time. The user must use either include_cols, or exclude_cols, or neither of them;

  • transform_pipe – a list of transformations to be used as a pre-processing pipeline. Each transformation in this list must be a valid subclass of the current library (EncoderOrdinal, BasicImputer, etc.). Some feature selection methods require a dataset with no categorical features or with no missing values (depending on the approach). If no transformations are provided, a set of default transformations will be used, which depends on the feature selection approach (subclass dependent). This parameter also accepts other scalers in the list. When this happens and the inverse_transform() method of self is called, the inverse_transform() method of all scaler objects that appear in the transform_pipe list after the last non-scaler object is called in reverse order. For example, if the scaler is instantiated with transform_pipe=[BasicImputer(), DataQuantileTransformer(), EncoderOHE(), DataPowerTransformer()], then, when calling fit() on the DataRobustScaler object, the dataset is first fitted and transformed using BasicImputer, followed by DataQuantileTransformer, EncoderOHE, and DataPowerTransformer, and only then is it fitted and transformed using the current DataRobustScaler. The transform() method works in a similar way, the difference being that it doesn’t call fit() for the data scalers in the transform_pipe. For the inverse_transform() method, the inverse transforms are applied in reverse order, but only for the scaler objects that appear after the last non-scaler object in the transform_pipe: first, the DataRobustScaler is inverted, followed by the DataPowerTransformer. The DataQuantileTransformer isn’t reversed because it appears between two non-scaler objects: BasicImputer and EncoderOHE;

  • verbose – indicates whether internal messages should be printed or not.
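
The inversion rule described for transform_pipe can be sketched in plain Python as follows. This is only a model of the documented ordering, not the library's code: the Step class and the steps_to_invert function are hypothetical stand-ins, and each pipeline object is reduced to a name plus a flag saying whether it is a scaler.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """Hypothetical stand-in for an object placed in transform_pipe."""
    name: str
    is_scaler: bool


def steps_to_invert(current, transform_pipe):
    """Return step names in the order inverse_transform() would undo them."""
    trailing = []
    # Walk the pipeline backwards and stop at the last non-scaler object.
    for step in reversed(transform_pipe):
        if not step.is_scaler:
            break
        trailing.append(step.name)
    # The current scaler was applied last, so it is inverted first.
    return [current.name] + trailing


pipe = [
    Step("BasicImputer", False),
    Step("DataQuantileTransformer", True),
    Step("EncoderOHE", False),
    Step("DataPowerTransformer", True),
]
order = steps_to_invert(Step("DataRobustScaler", True), pipe)
print(order)  # ['DataRobustScaler', 'DataPowerTransformer']
```

DataQuantileTransformer does not appear in the result because the backwards walk stops at EncoderOHE, the last non-scaler object in the list.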

Class Diagram

Inheritance diagram of raimitigations.dataprocessing.DataRobustScaler

Example