DataQuantileTransformer
- class raimitigations.dataprocessing.DataQuantileTransformer(scaler_obj: Optional[QuantileTransformer] = None, df: Optional[Union[DataFrame, ndarray]] = None, exclude_cols: Optional[list] = None, include_cols: Optional[list] = None, transform_pipe: Optional[list] = None, verbose: bool = True)
Bases: DataScaler
Concrete class that applies the QuantileTransformer scaler over a given dataset. This class uses sklearn's implementation of this scaler (QuantileTransformer) at its root, but also makes it simpler to apply to a dataset. For example, the user can pass a dataset with categorical columns, and the scaler will be applied only to the numerical columns. The user can also provide a pipeline of scalers, and all of the scalers in the pipeline will be applied before the QuantileTransformer scaler. This pipeline can also contain transformations from other, non-scaler classes implemented in this library (feature selection, encoding, imputation, etc.). For more details on how the QuantileTransformer changes the data, check the official sklearn documentation.
- Parameters
scaler_obj – an object from the QuantileTransformer class. This sklearn scaler will be used to perform the scaling process. If None, a QuantileTransformer is created using default values;
df – pandas data frame or NumPy array that contains the columns to be scaled and/or transformed;
exclude_cols – a list of the column names or indexes that shouldn't be transformed, that is, a list of columns to be ignored. This way, it is possible to transform only certain columns and leave the other columns unmodified. This is useful if the dataset contains a set of binary columns that should be left as is, or a set of categorical columns (which can't be scaled or transformed). If the categorical columns are not added to exclude_cols, they will be automatically identified and added to the exclude_cols list. If None, this parameter will be set automatically to a list of all categorical variables in the dataset;
include_cols – a list of the column names or indexes that should be transformed, that is, a list of columns to be included in the dataset being transformed. This parameter uses the inverse logic of exclude_cols, and thus these two parameters shouldn't be used at the same time. The user must use either include_cols, or exclude_cols, or neither of them;
transform_pipe – a list of transformations to be used as a pre-processing pipeline. Each transformation in this list must be a valid subclass of the current library (EncoderOrdinal, BasicImputer, etc.). Some feature selection methods require a dataset with no categorical features or with no missing values (depending on the approach). If no transformations are provided, a set of default transformations will be used, which depends on the feature selection approach (subclass dependent). This parameter also accepts other scalers in the list. When this happens and the inverse_transform() method of self is called, the inverse_transform() method of every scaler object that appears in the transform_pipe list after the last non-scaler object is called, in reverse order. For example, if a DataMinMaxScaler object is instantiated with transform_pipe=[BasicImputer(), DataQuantileTransformer(), EncoderOHE(), DataPowerTransformer()], then, when calling fit() on the DataMinMaxScaler object, the dataset will first be fitted and transformed using BasicImputer, followed by DataQuantileTransformer, EncoderOHE, and DataPowerTransformer, and only then will it be fitted and transformed using the current DataMinMaxScaler. The transform() method works in a similar way, the difference being that it doesn't call fit() for the data scalers in transform_pipe. For the inverse_transform() method, the inverse transforms are applied in reverse order, but only for the scaler objects that appear after the last non-scaler object in transform_pipe: first, the DataMinMaxScaler is inverted, followed by the DataPowerTransformer. The DataQuantileTransformer isn't inverted because it appears between two non-scaler objects: BasicImputer and EncoderOHE;
verbose – indicates whether internal messages should be printed or not.
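The core idea behind the QuantileTransformer's default uniform output can be illustrated in plain Python. This is a simplified sketch, not sklearn's implementation: sklearn subsamples the data down to at most n_quantiles reference points and treats ties differently. The helpers fit_quantiles and to_uniform are hypothetical names used only for this illustration. Each value is mapped to its relative position among the sorted training values, with linear interpolation between neighbours and clipping outside the training range.

```python
from bisect import bisect_left, bisect_right


def fit_quantiles(values: list[float]) -> list[float]:
    """'Fit' by storing the sorted training values as reference quantiles.

    Hypothetical helper: a simplified stand-in for sklearn's fitting step.
    """
    ref = sorted(values)
    if len(ref) < 2:
        raise ValueError("need at least two training values")
    return ref


def to_uniform(x: float, ref: list[float]) -> float:
    """Map x to [0, 1] according to its position among the reference values."""
    n = len(ref)
    # Clip values outside the training range to the ends of [0, 1].
    if x < ref[0]:
        return 0.0
    if x > ref[-1]:
        return 1.0
    lo = bisect_left(ref, x)   # first index with ref[i] >= x
    hi = bisect_right(ref, x)  # first index with ref[i] > x
    if lo < hi:
        # x matches one or more training values: use the middle of their ranks.
        return ((lo + hi - 1) / 2) / (n - 1)
    # x falls strictly between ref[lo - 1] and ref[lo]: interpolate linearly.
    left, right = ref[lo - 1], ref[lo]
    frac = (x - left) / (right - left)
    return (lo - 1 + frac) / (n - 1)


if __name__ == "__main__":
    ref = fit_quantiles([50, 10, 40, 20, 30])
    print([to_uniform(v, ref) for v in [5, 10, 25, 30, 60]])
```

With the training values 10 to 50, the inputs 5, 10, 25, 30 and 60 map to 0.0, 0.0, 0.375, 0.5 and 1.0: the median lands at 0.5 regardless of the original scale, which is what makes the transformation robust to outliers.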
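How exclude_cols and include_cols narrow down the scaled columns can be sketched as follows, assuming the dataset is a plain dict of column lists instead of a pandas DataFrame. The functions resolve_columns and is_categorical are hypothetical helpers, not part of raimitigations. Removing categorical columns from an include_cols selection is an assumption of this sketch; the parameter description only states the automatic exclusion for exclude_cols.

```python
from typing import Optional, Union

Column = Union[str, int]  # a column name or a positional index


def is_categorical(values: list) -> bool:
    """Treat a column as categorical if any non-missing value is not a number."""
    return any(v is not None and not isinstance(v, (int, float)) for v in values)


def resolve_columns(
    data: dict[str, list],
    exclude_cols: Optional[list[Column]] = None,
    include_cols: Optional[list[Column]] = None,
) -> list[str]:
    """Return the names of the columns that would be scaled, in dataset order.

    Hypothetical helper illustrating the documented parameter semantics.
    """
    if exclude_cols is not None and include_cols is not None:
        raise ValueError("use include_cols or exclude_cols, not both")
    names = list(data)

    def to_name(col: Column) -> str:
        # Columns may be given by name or by position.
        return names[col] if isinstance(col, int) else col

    if include_cols is not None:
        wanted = {to_name(c) for c in include_cols}
        excluded = {n for n in names if n not in wanted}
    else:
        excluded = {to_name(c) for c in (exclude_cols or [])}
    # Categorical columns are always skipped, even if the user did not list them.
    # (Applying this to include_cols as well is an assumption of this sketch.)
    excluded |= {n for n in names if is_categorical(data[n])}
    return [n for n in names if n not in excluded]


if __name__ == "__main__":
    data = {
        "age": [25, 32, 47],
        "city": ["A", "B", "A"],
        "income": [30.5, 42.0, 55.1],
        "is_member": [1, 0, 1],
    }
    print(resolve_columns(data))
    print(resolve_columns(data, exclude_cols=["is_member"]))
    print(resolve_columns(data, include_cols=[0, "city"]))
```

Note that the binary column is_member holds integers, so it is not detected as categorical; as the exclude_cols description suggests, it has to be listed explicitly to be left as is.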
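The ordering rules described for transform_pipe can be checked with a small, library-independent sketch. Step, fit_order and inverse_order are hypothetical names; each pipeline step is reduced to a name and an is_scaler flag, and no data is actually transformed.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """A pipeline step reduced to its name and whether it is a scaler."""
    name: str
    is_scaler: bool


def fit_order(pipe: list[Step], current: Step) -> list[str]:
    """fit() runs every step of transform_pipe in order, then the current scaler."""
    return [step.name for step in pipe] + [current.name]


def inverse_order(pipe: list[Step], current: Step) -> list[str]:
    """Scalers undone by inverse_transform(), in the order they are undone."""
    # Index just past the last non-scaler step (0 if every step is a scaler).
    start = 0
    for i, step in enumerate(pipe):
        if not step.is_scaler:
            start = i + 1
    trailing_scalers = [step.name for step in pipe[start:]]
    # The current scaler is inverted first, then the trailing scalers in reverse.
    return [current.name] + trailing_scalers[::-1]


if __name__ == "__main__":
    pipe = [
        Step("BasicImputer", is_scaler=False),
        Step("DataQuantileTransformer", is_scaler=True),
        Step("EncoderOHE", is_scaler=False),
        Step("DataPowerTransformer", is_scaler=True),
    ]
    current = Step("DataMinMaxScaler", is_scaler=True)
    print(fit_order(pipe, current))
    print(inverse_order(pipe, current))
```

On the example from the transform_pipe description, inverse_order returns ["DataMinMaxScaler", "DataPowerTransformer"]: DataQuantileTransformer is skipped because a non-scaler step (EncoderOHE) comes after it, so its output can no longer be inverted directly.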