EncoderOHE

class raimitigations.dataprocessing.EncoderOHE(df: Optional[Union[DataFrame, ndarray]] = None, col_encode: Optional[list] = None, drop: bool = True, unknown_err: bool = True, verbose: bool = True)

Bases: DataEncoding

Concrete class that applies one-hot encoding over a dataset. The categorical features are encoded using the one-hot encoding class from sklearn (OneHotEncoder). The main difference from using the sklearn implementation directly is that the transform method implemented here returns a data frame instead of a numpy array. This is useful when the data frame format must be kept without losing the column names. The new columns created by the one-hot encoding are named after the original dataset's column name and the value being one-hot encoded.
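To illustrate what "keeping the column names" means, the sketch below one-hot encodes a single column and names each new indicator column after the original column and the encoded value. It is a minimal, hypothetical sketch, not the library's implementation: plain Python dictionaries stand in for a pandas data frame so the example stays self-contained, and the "<column>_<value>" naming scheme (the separator in particular) is an assumption.

```python
# Hypothetical sketch: one-hot encode one column while keeping readable names.
# Assumption: new columns are named "<column>_<value>".
from typing import Any, Dict, List


def one_hot_encode(rows: List[Dict[str, Any]], col: str) -> List[Dict[str, Any]]:
    """Replace `col` in each row with one 0/1 indicator column per category."""
    # Sort the categories so the output column order is deterministic.
    categories = sorted({row[col] for row in rows})
    encoded = []
    for row in rows:
        # Copy every other column unchanged.
        new_row = {k: v for k, v in row.items() if k != col}
        for cat in categories:
            new_row[f"{col}_{cat}"] = 1 if row[col] == cat else 0
        encoded.append(new_row)
    return encoded


data = [
    {"age": 31, "color": "red"},
    {"age": 45, "color": "blue"},
    {"age": 27, "color": "red"},
]
for r in one_hot_encode(data, "color"):
    print(r)
# {'age': 31, 'color_blue': 0, 'color_red': 1}
# {'age': 45, 'color_blue': 1, 'color_red': 0}
# {'age': 27, 'color_blue': 0, 'color_red': 1}
```

The untouched column ("age") keeps its name, and each indicator column can be traced back to both its source column and its category, which is the property a bare numpy array would lose.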

Parameters
  • df – pandas data frame or numpy array that contains the columns to be encoded;

  • col_encode – a list of the column names or indexes that will be encoded. If None, this parameter is set automatically to a list of all categorical variables in the dataset;

  • drop – if True, drop the one-hot encoded column of the first category of each feature. This way, a feature with N different categories is encoded using N-1 one-hot encoded columns. This is useful for models that do not work properly with collinear columns: when all one-hot columns are kept, they always sum to 1, so each column equals 1 minus the sum of the others and the columns are collinear with a constant (intercept) term. Setting drop=True removes one of these columns and, with it, this collinearity. Note, however, that several models work fine even with collinear columns;

  • unknown_err – if True, an error is raised when an unknown category is encountered. If False, all encoded columns are set to zero when an unknown category is found. Note that unknown_err=False does not work with drop=True, since with drop=True an all-zero row already represents the dropped first category;

  • verbose – indicates whether internal messages should be printed or not.
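The interplay of drop and unknown_err can be sketched for a single column as follows. This is a hypothetical helper, not the raimitigations implementation: it assumes categories are ordered by sorting (so "first category" means the smallest value), and it simply rejects the drop=True / unknown_err=False combination, whereas how EncoderOHE itself reports that combination is not stated here.

```python
# Hypothetical sketch of the `drop` and `unknown_err` semantics for one column.
# Not the raimitigations implementation; categories are sorted (an assumption).
from typing import List, Sequence


class OneHotSketch:
    def __init__(self, drop: bool = True, unknown_err: bool = True) -> None:
        # With drop=True the first category is already encoded as all zeros,
        # so encoding unknown values as all zeros would be ambiguous.
        if drop and not unknown_err:
            raise ValueError("unknown_err=False cannot be combined with drop=True")
        self.drop = drop
        self.unknown_err = unknown_err
        self.categories: List[str] = []

    def fit(self, values: Sequence[str]) -> "OneHotSketch":
        self.categories = sorted(set(values))
        return self

    def kept_categories(self) -> List[str]:
        # drop=True omits the first category: N categories -> N-1 columns.
        return self.categories[1:] if self.drop else list(self.categories)

    def transform(self, values: Sequence[str]) -> List[List[int]]:
        rows = []
        for v in values:
            if v not in self.categories and self.unknown_err:
                raise ValueError(f"unknown category: {v!r}")
            # An unknown value (unknown_err=False) matches no column -> all zeros.
            rows.append([1 if v == c else 0 for c in self.kept_categories()])
        return rows


dropped = OneHotSketch(drop=True).fit(["a", "b", "c"])
print(dropped.transform(["a", "c"]))  # [[0, 0], [0, 1]]

full = OneHotSketch(drop=False, unknown_err=False).fit(["a", "b", "c"])
print(full.transform(["b", "z"]))  # [[0, 1, 0], [0, 0, 0]]

# With all columns kept, every known row sums to 1: any column equals
# 1 minus the sum of the others, i.e. the columns are collinear with an intercept.
print(all(sum(r) == 1 for r in full.transform(["a", "b", "c"])))  # True
```

The first print shows why the combination is problematic: with drop=True, category "a" is already encoded as [0, 0], so an unknown value encoded as all zeros could not be told apart from it.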

get_one_hot_columns()

Returns a list with the column names or column indices of the one-hot encoded columns. These are the columns created by the one-hot encoder to replace the original categorical columns.

Returns

a list with the column names or column indices of the one-hot encoded columns.

Return type

list
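For intuition about which names such a list would contain, the hypothetical helper below derives the indicator column names from a mapping of each encoded column to its fitted categories, honoring drop. It is a sketch under the same assumptions as a "<column>_<value>" naming scheme and sorted categories; it is not how get_one_hot_columns() is implemented.

```python
# Hypothetical helper: list the indicator columns generated for each encoded
# column. Assumes "<column>_<value>" names and categories in fitted order.
from typing import Dict, List


def one_hot_column_names(categories: Dict[str, List[str]], drop: bool) -> List[str]:
    """Return the names of the one-hot columns that replace the encoded columns."""
    names: List[str] = []
    for col, cats in categories.items():
        kept = cats[1:] if drop else cats  # drop=True omits the first category
        names.extend(f"{col}_{c}" for c in kept)
    return names


fitted = {"color": ["blue", "green", "red"], "size": ["L", "S"]}
print(one_hot_column_names(fitted, drop=True))
# ['color_green', 'color_red', 'size_S']
print(one_hot_column_names(fitted, drop=False))
# ['color_blue', 'color_green', 'color_red', 'size_L', 'size_S']
```

A list like this is handy for separating the generated indicator columns from the untouched ones, for example to apply a numeric transformation only to the columns that were not one-hot encoded.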

Class Diagram

Inheritance diagram of raimitigations.dataprocessing.EncoderOHE

Example