BasicImputer

class raimitigations.dataprocessing.BasicImputer(df: Optional[Union[DataFrame, ndarray]] = None, col_impute: Optional[list] = None, categorical: Optional[dict] = None, numerical: Optional[dict] = None, specific_col: Optional[dict] = None, verbose: bool = True)

Bases: DataImputer

Concrete class that imputes missing data in a dataset using a set of simple strategies. Missing values are filled with the mean, the median, a constant value, or the most frequent value; mean and median are valid only for numerical columns. This subclass uses the SimpleImputer class from sklearn in the background. Its main advantage is that it applies the simple imputation approach to several columns at once, each with its own set of parameters. For more details, check the SimpleImputer documentation.
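The four strategies named above can be illustrated with a minimal pure-Python sketch. This is not the library's code or sklearn's: the function names `fill_value_for` and `impute` are hypothetical, missing entries are assumed to be marked with `None` rather than `np.nan`, and ties in the most-frequent case are assumed to resolve to the smallest value.

```python
from collections import Counter
from statistics import mean, median


def fill_value_for(values, strategy, fill_value=None):
    """Compute the replacement value for one column (hypothetical helper)."""
    # Assumption: missing entries are represented by None in this sketch.
    observed = [v for v in values if v is not None]
    if strategy == "constant":
        return fill_value
    if strategy == "mean":
        # Only meaningful for numerical data; raises TypeError on strings.
        return mean(observed)
    if strategy == "median":
        return median(observed)
    if strategy == "most_frequent":
        counts = Counter(observed)
        top = max(counts.values())
        # Assumption: break ties by taking the smallest candidate.
        return min(v for v, c in counts.items() if c == top)
    raise ValueError(f"unknown strategy: {strategy}")


def impute(values, strategy, fill_value=None):
    """Return a copy of the column with None entries replaced."""
    replacement = fill_value_for(values, strategy, fill_value)
    return [replacement if v is None else v for v in values]


ages = [20, None, 40, 30, None]
colors = ["red", None, "blue", "red"]

ages_mean = impute(ages, "mean")
colors_frequent = impute(colors, "most_frequent")
colors_constant = impute(colors, "constant", fill_value="NULL")
print(ages_mean, colors_frequent, colors_constant)
```

The numerical column uses the mean, while the categorical column can only use the most frequent value or a constant, which mirrors the restriction stated above.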

Parameters
  • df – pandas data frame that contains the columns to be imputed;

  • col_impute – a list of the column names or indexes that will be imputed. If None, this parameter is automatically set to a list of all columns;

  • categorical

    a dict with the SimpleImputer parameters used for all categorical columns not listed in the specific_col param. The dict has the following structure:

    {
    'missing_values': np.nan,
    'strategy': 'constant',
    'fill_value': 'NULL'
    }

    where 'missing_values', 'strategy', and 'fill_value' are the parameters used by sklearn's SimpleImputer. If None, this dict is auto-filled with the values shown above;

  • numerical

    similar to categorical, but represents the SimpleImputer parameters used for all numerical columns not listed in the specific_col param. If None, this dict is auto-filled as follows:

    {
    'missing_values': np.nan,
    'strategy': 'mean',
    'fill_value': None
    }

  • specific_col

    a dict of dicts. Each key of the main dict must be a column name present in the col_impute param. Each key must be associated with a dict similar to the one in the categorical param, which indicates the parameters to be used by the SimpleImputer for that column. If a column in col_impute is not present in the main dict, its type is automatically identified as either numerical or categorical, and the numerical or categorical parameters are used for it. The dict structure is given by:

    {
    COL_NAME1: {
        'missing_values': np.nan,
        'strategy': 'constant',
        'fill_value': 'NULL'
    },
    COL_NAME2: {
        'missing_values': np.nan,
        'strategy': 'constant',
        'fill_value': 'NULL'
    },
    etc.
    }

  • verbose – indicates whether internal messages should be printed or not.
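How the categorical, numerical, and specific_col parameters combine can be sketched in plain Python. This only illustrates the precedence rules described above and is not the library's implementation: `resolve_params` and `is_numeric` are hypothetical names, the type check is a simplified stand-in for the automatic type identification, and missing entries are assumed to be `None`.

```python
# Defaults taken from the parameter descriptions above
# (float("nan") stands in for np.nan to keep the sketch dependency-free).
DEFAULT_CATEGORICAL = {
    "missing_values": float("nan"),
    "strategy": "constant",
    "fill_value": "NULL",
}
DEFAULT_NUMERICAL = {
    "missing_values": float("nan"),
    "strategy": "mean",
    "fill_value": None,
}


def is_numeric(values):
    """Hypothetical type check: all observed entries are int or float."""
    observed = [v for v in values if v is not None]
    return all(
        isinstance(v, (int, float)) and not isinstance(v, bool)
        for v in observed
    )


def resolve_params(columns, col_impute=None, categorical=None,
                   numerical=None, specific_col=None):
    """Map each imputed column to the parameter dict that applies to it.

    `columns` maps a column name to its list of values.
    """
    col_impute = list(columns) if col_impute is None else col_impute
    categorical = DEFAULT_CATEGORICAL if categorical is None else categorical
    numerical = DEFAULT_NUMERICAL if numerical is None else numerical
    specific_col = {} if specific_col is None else specific_col

    resolved = {}
    for name in col_impute:
        if name in specific_col:
            # A per-column entry takes precedence over the type defaults.
            resolved[name] = specific_col[name]
        elif is_numeric(columns[name]):
            resolved[name] = numerical
        else:
            resolved[name] = categorical
    return resolved


data = {
    "age": [20, None, 40],
    "income": [1000.0, None, 3000.0],
    "color": ["red", None, "blue"],
}
params = resolve_params(
    data,
    specific_col={
        "income": {
            "missing_values": float("nan"),
            "strategy": "median",
            "fill_value": None,
        }
    },
)
for name, cfg in params.items():
    print(name, cfg["strategy"])
```

Here "income" uses its specific_col entry, while "age" and "color" fall back to the numerical and categorical defaults, respectively.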

Class Diagram

Inheritance diagram of raimitigations.dataprocessing.BasicImputer

Example