BasicImputer
- class raimitigations.dataprocessing.BasicImputer(df: Optional[Union[DataFrame, ndarray]] = None, col_impute: Optional[list] = None, categorical: Optional[dict] = None, numerical: Optional[dict] = None, specific_col: Optional[dict] = None, verbose: bool = True)
Bases: DataImputer
Concrete class that imputes missing data in a dataset using a set of simple strategies. Missing values are filled with the mean, the median, a constant value, or the most frequent value; the mean and median strategies are only valid for numerical columns. This subclass uses the SimpleImputer class from sklearn in the background. Its main advantage is that it applies the simple imputation approach to several different columns at once, each with its own set of parameters. For more details, check the SimpleImputer documentation.
- Parameters
df – pandas data frame that contains the columns to be imputed;
col_impute – a list of the column names or indexes that will be imputed. If None, this parameter will be set automatically as being a list of all columns;
categorical – a dict with the SimpleImputer parameters used on all categorical columns not present in the specific_col param. The dict has the following structure:
{'missing_values': np.nan, 'strategy': 'constant', 'fill_value': 'NULL'}
where 'missing_values', 'strategy', and 'fill_value' are the parameters used by sklearn's SimpleImputer. If None, this dict will be auto-filled as the one above;
numerical – similar to categorical, but represents the SimpleImputer parameters used on all numerical columns not present in the specific_col param. If None, this dict will be auto-filled as follows:
{'missing_values': np.nan, 'strategy': 'mean', 'fill_value': None}
specific_col – a dict of dicts. Each key of the main dict must be a column name present in the col_impute param. Each key must be associated with a dict similar to the one in the categorical param, which indicates the parameters to be used by the SimpleImputer for that column. If a column in col_impute is not present in the main dict, its type is automatically identified as either numerical or categorical, and the numerical or categorical parameters are then used for that column. The dict structure is given by:
{COL_NAME1: {'missing_values': np.nan, 'strategy': 'constant', 'fill_value': 'NULL'}, COL_NAME2: {'missing_values': np.nan, 'strategy': 'constant', 'fill_value': 'NULL'}, etc.}
verbose – indicates whether internal messages should be printed or not.
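The way the categorical, numerical, and specific_col parameters combine can be sketched in plain Python. This is a minimal illustration only, not the library's implementation: it does not call raimitigations or sklearn, and the helper names (resolve_params, impute, is_missing, is_numeric_column, fill_value_for) and the sample data are hypothetical. It also assumes that None and NaN both count as missing, regardless of the 'missing_values' entry, and that math.nan stands in for np.nan.

```python
import math
from collections import Counter
from statistics import mean, median

# Defaults quoted from the parameter descriptions (math.nan stands in for np.nan).
DEFAULT_CATEGORICAL = {"missing_values": math.nan, "strategy": "constant", "fill_value": "NULL"}
DEFAULT_NUMERICAL = {"missing_values": math.nan, "strategy": "mean", "fill_value": None}


def is_missing(value):
    # Assumption of this sketch: None and NaN are both treated as missing.
    return value is None or (isinstance(value, float) and math.isnan(value))


def is_numeric_column(values):
    # A column is numeric if every non-missing value is an int or float.
    present = [v for v in values if not is_missing(v)]
    return bool(present) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )


def resolve_params(data, col_impute=None, categorical=None, numerical=None, specific_col=None):
    """Pick the parameter dict for each column: specific_col wins, otherwise
    the numerical or categorical dict is chosen from the detected column type."""
    col_impute = list(data) if col_impute is None else col_impute
    categorical = DEFAULT_CATEGORICAL if categorical is None else categorical
    numerical = DEFAULT_NUMERICAL if numerical is None else numerical
    specific_col = specific_col or {}
    resolved = {}
    for col in col_impute:
        if col in specific_col:
            resolved[col] = specific_col[col]
        elif is_numeric_column(data[col]):
            resolved[col] = numerical
        else:
            resolved[col] = categorical
    return resolved


def fill_value_for(values, params):
    # Compute the replacement value for one column from its strategy.
    present = [v for v in values if not is_missing(v)]
    strategy = params["strategy"]
    if strategy == "mean":
        return mean(present)
    if strategy == "median":
        return median(present)
    if strategy == "most_frequent":
        # Tie-breaking here may differ from sklearn's SimpleImputer.
        return Counter(present).most_common(1)[0][0]
    return params["fill_value"]  # "constant" strategy


def impute(data, **kwargs):
    # Return a copy of the data with missing entries filled column by column.
    resolved = resolve_params(data, **kwargs)
    out = {col: list(values) for col, values in data.items()}
    for col, params in resolved.items():
        fill = fill_value_for(data[col], params)
        out[col] = [fill if is_missing(v) else v for v in data[col]]
    return out


# Hypothetical sample data: columns stored as plain lists keyed by column name.
data = {
    "age": [20.0, math.nan, 40.0],
    "city": ["Oslo", None, "Oslo"],
    "score": [1.0, 2.0, math.nan],
}
result = impute(
    data,
    specific_col={"score": {"missing_values": math.nan, "strategy": "median", "fill_value": None}},
)
print(result)
```

In this sketch, "age" falls back to the numerical default (mean), "city" falls back to the categorical default (constant 'NULL'), and "score" uses the median because it has its own entry in specific_col.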
Class Diagram
