BasicImputer
- class raimitigations.dataprocessing.BasicImputer(df: Optional[Union[DataFrame, ndarray]] = None, col_impute: Optional[list] = None, categorical: Optional[dict] = None, numerical: Optional[dict] = None, specific_col: Optional[dict] = None, verbose: bool = True)
Bases: DataImputer
Concrete class that imputes missing data in a dataset using a set of simple strategies. Missing values are filled with the mean, the median, a constant value, or the most frequent value; mean and median are only valid for numerical columns. This subclass uses the SimpleImputer class from sklearn in the background. Its main advantage is that it applies the simple imputation approach to several columns at once, each with its own set of parameters. For more details, check the SimpleImputer documentation.
- Parameters
df – pandas data frame that contains the columns to be imputed;
col_impute – a list of the column names or indexes that will be imputed. If None, this parameter will be set automatically as being a list of all columns;
categorical – a dict with the SimpleImputer parameters applied to all categorical columns not listed in the specific_col param. The dict has the following structure:
    {'missing_values': np.nan, 'strategy': 'constant', 'fill_value': 'NULL'}
where 'missing_values', 'strategy', and 'fill_value' are the parameters used by sklearn's SimpleImputer. If None, this dict will be auto-filled as the one above;
numerical – similar to categorical, but represents the SimpleImputer parameters applied to all numerical columns not listed in the specific_col param. If None, this dict will be auto-filled as follows:
    {'missing_values': np.nan, 'strategy': 'mean', 'fill_value': None}
specific_col – a dict of dicts. Each key of the main dict must be a column name present in the col_impute param. Each key must be associated with a dict similar to the one in the categorical param, which indicates the parameters to be used by the SimpleImputer for the specified column (key). If one of the columns in col_impute is not present in the main dict, the type of this column is automatically identified as either numeric or categorical, and the categorical or numerical parameters are then used for that column. The dict structure is given by:
    {COL_NAME1: {'missing_values': np.nan, 'strategy': 'constant', 'fill_value': 'NULL'},
     COL_NAME2: {'missing_values': np.nan, 'strategy': 'constant', 'fill_value': 'NULL'},
     etc.}
verbose – indicates whether internal messages should be printed or not.
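The way the categorical, numerical, and specific_col parameters combine can be illustrated with a minimal standard-library sketch. It is not the library's implementation: the helper name resolve_params and the column_is_numeric mapping are hypothetical, float("nan") stands in for np.nan, and raising an error for an unknown specific_col key is an assumption of this sketch rather than documented behavior.

```python
# Defaults as documented above; float("nan") replaces np.nan so that the
# sketch depends only on the standard library.
NAN = float("nan")
DEFAULT_CATEGORICAL = {"missing_values": NAN, "strategy": "constant", "fill_value": "NULL"}
DEFAULT_NUMERICAL = {"missing_values": NAN, "strategy": "mean", "fill_value": None}


def resolve_params(column_is_numeric, col_impute=None, categorical=None,
                   numerical=None, specific_col=None):
    """Hypothetical helper: pick the SimpleImputer-style parameters per column.

    column_is_numeric maps each column name to True (numerical) or False
    (categorical). The real class infers the type from the data frame.
    """
    # If col_impute is None, all columns are imputed.
    if col_impute is None:
        col_impute = list(column_is_numeric)
    # If a parameter dict is None, fall back to the documented defaults.
    categorical = DEFAULT_CATEGORICAL if categorical is None else categorical
    numerical = DEFAULT_NUMERICAL if numerical is None else numerical
    specific_col = {} if specific_col is None else specific_col

    # Every specific_col key must name a column in col_impute. Raising here
    # is an assumption of this sketch, not confirmed library behavior.
    unknown = [col for col in specific_col if col not in col_impute]
    if unknown:
        raise ValueError(f"specific_col keys not in col_impute: {unknown}")

    resolved = {}
    for col in col_impute:
        if col in specific_col:
            # A column-specific entry takes precedence over the type defaults.
            resolved[col] = dict(specific_col[col])
        elif column_is_numeric[col]:
            resolved[col] = dict(numerical)
        else:
            resolved[col] = dict(categorical)
    return resolved


params = resolve_params(
    column_is_numeric={"age": True, "income": True, "city": False},
    specific_col={"income": {"missing_values": NAN, "strategy": "median", "fill_value": None}},
)
for col, col_params in params.items():
    print(col, col_params["strategy"], col_params["fill_value"])
```

In this sketch, "income" uses its own median strategy from specific_col, "age" falls back to the numerical default (mean), and "city" falls back to the categorical default (constant fill with 'NULL').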
Class Diagram