IterativeDataImputer

class raimitigations.dataprocessing.IterativeDataImputer(df: Optional[Union[DataFrame, ndarray]] = None, col_impute: Optional[list] = None, enable_encoder: bool = False, iterative_params: Optional[dict] = None, sklearn_obj: Optional[object] = None, verbose: bool = True)

Bases: DataImputer

Concrete class that imputes the missing values of a feature using the other features. It follows a round-robin strategy in which each feature with missing values is modeled as a function of the other features. This subclass uses the IterativeImputer class from sklearn in the background (note that this sklearn class is still in an experimental stage). sklearn.impute.IterativeImputer can only handle numerical data; this subclass, however, also accepts categorical input by applying ordinal encoding before calling the sklearn class. To enable this behavior, set enable_encoder=True. Note that encoded columns are not guaranteed to reverse transform if they contain imputed values. If you’d like to use a different type of encoding before imputation, consider using the Pipeline class and calling your own encoder before calling this subclass for imputation. For more details see: https://scikit-learn.org/stable/modules/generated/sklearn.impute.IterativeImputer.html
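The encode-then-impute flow described above can be sketched with sklearn and pandas directly. This is only an illustration under stated assumptions, not the library's internal implementation: the toy data frame, its column names, and the round-and-clip step used to decode the imputed category codes are all hypothetical.

```python
import numpy as np
import pandas as pd
# IterativeImputer is experimental, so this import is required to unlock it.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Hypothetical toy data: two numerical columns and one categorical column.
df = pd.DataFrame({
    "age": [25.0, 32.0, np.nan, 41.0, 38.0, 29.0],
    "income": [40.0, 52.0, 48.0, np.nan, 61.0, 45.0],
    "city": ["A", "B", "A", np.nan, "B", "A"],
})

# Ordinal-encode the categorical column, keeping missing entries as NaN
# so that the imputer still sees them as missing.
cat = df["city"].astype("category")
encoded = df.copy()
encoded["city"] = cat.cat.codes.astype("float64").where(cat.notna())

# Round-robin imputation: each column with missing values is modeled
# as a function of the other columns.
imputer = IterativeImputer(max_iter=10, random_state=0)
imputed = pd.DataFrame(imputer.fit_transform(encoded), columns=encoded.columns)

# Imputed codes may be non-integer, which is why a reverse transform is not
# guaranteed. Rounding and clipping is one possible workaround (an assumption
# of this sketch, not something the class is documented to do).
n_categories = len(cat.cat.categories)
city_codes = imputed["city"].round().clip(0, n_categories - 1).astype(int)
imputed["city"] = pd.Categorical.from_codes(
    city_codes.to_numpy(), categories=cat.cat.categories
)
```

With enable_encoder=True, the class is documented to handle the encoding step for you; the manual decoding above only shows why imputed categorical values may not map cleanly back to the original labels.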

Parameters
  • df – pandas data frame (or numpy array) that contains the columns to be imputed;

  • col_impute – a list of the column names or indexes that will be imputed. If None, this parameter will be set automatically as being a list of all columns with any NaN value;

  • enable_encoder – a boolean flag that enables ordinal encoding of categorical data before the IterativeImputer is applied, since the latter only accepts numerical values;

  • iterative_params

    a dict indicating the parameters used by IterativeImputer. The dict has the following structure:

    {
    'estimator': BayesianRidge(),
    'missing_values': np.nan,
    'sample_posterior': False,
    'max_iter': 10,
    'tol': 1e-3,
    'n_nearest_features': None,
    'initial_strategy': 'mean',
    'imputation_order': 'ascending',
    'skip_complete': False,
    'min_value': -np.inf,
    'max_value': np.inf,
    'random_state': None
    }

    These are the parameters accepted by sklearn’s IterativeImputer. If None, the dict above is used as the default. Note: initial_strategy can take one of these values: ['mean', 'median', 'most_frequent', 'constant']

  • sklearn_obj – an sklearn.impute.IterativeImputer object to use directly. If this parameter is used, iterative_params will be overwritten.

  • verbose – indicates whether internal messages should be printed or not.
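As a hedged illustration of the iterative_params and sklearn_obj parameters, the sketch below builds the parameter dict, unpacks it into sklearn's IterativeImputer, and runs the result on a small array. The array X and the chosen values ('median', random_state=0) are hypothetical. The raimitigations calls appear only as comments, using the keyword names from the class signature, so that the snippet runs with sklearn alone.

```python
import numpy as np
# IterativeImputer is experimental, so this import is required to unlock it.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge

# Same keys as the documented default dict, with two values changed
# for illustration (initial_strategy and random_state).
iterative_params = {
    "estimator": BayesianRidge(),
    "missing_values": np.nan,
    "sample_posterior": False,
    "max_iter": 10,
    "tol": 1e-3,
    "n_nearest_features": None,
    "initial_strategy": "median",
    "imputation_order": "ascending",
    "skip_complete": False,
    "min_value": -np.inf,
    "max_value": np.inf,
    "random_state": 0,
}

# Every key is a keyword argument of sklearn's IterativeImputer,
# so the dict can be unpacked directly into its constructor.
sklearn_obj = IterativeImputer(**iterative_params)

# Hypothetical numerical data with two missing entries.
X = np.array([
    [1.0, 2.0],
    [3.0, np.nan],
    [5.0, 10.0],
    [np.nan, 8.0],
    [7.0, 14.0],
])
X_imputed = sklearn_obj.fit_transform(X)

# Either object could then be handed to the class documented here, e.g.:
#   IterativeDataImputer(iterative_params=iterative_params)
#   IterativeDataImputer(sklearn_obj=sklearn_obj)
```

Passing a pre-built sklearn_obj is useful when the imputer is already configured elsewhere; passing iterative_params keeps the configuration as plain data.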

Class Diagram

Inheritance diagram of raimitigations.dataprocessing.IterativeDataImputer

Example