EncoderOHE
- class raimitigations.dataprocessing.EncoderOHE(df: Optional[Union[DataFrame, ndarray]] = None, col_encode: Optional[list] = None, drop: bool = True, unknown_err: bool = True, verbose: bool = True)
- Bases: DataEncoding
- Concrete class that applies one-hot encoding over a dataset. The categorical features are encoded using the one-hot encoding class from sklearn. The main difference from using the sklearn implementation directly is that the transform method implemented here returns a data frame instead of a numpy array. This is useful when it is important to keep the data frame format without losing the column names. The new columns created by the one-hot encoding are named after the original dataset's column name and the value that is one-hot encoded.
- Parameters
- df – pandas data frame that contains the columns to be encoded; 
- col_encode – a list of the column names or indexes that will be encoded. If None, this parameter will be set automatically as being a list of all categorical variables in the dataset; 
- drop – if True, drop the one-hot encoded column of the first category of a given feature. This way, a feature with N different categories will be encoded using N-1 one-hot encoded columns. This is useful when using models that do not work properly with collinear columns: when all one-hot columns are kept, the columns of a feature always sum to 1, so each of them can be expressed as a linear combination of the others (together with a constant term). By removing one of these columns using drop=True, we remove this collinearity. Note, however, that several models can work even with collinear columns; 
- unknown_err – if True, when an unknown category is encountered, an error is raised. If False, when an unknown category is found, all encoded columns will be set to zero. Note that unknown_err = False does not work with drop = True; 
- verbose – indicates whether internal messages should be printed or not. 
 
 - get_one_hot_columns()
- Returns a list with the column names or column indices of the one-hot encoded columns. These are the columns created by the one-hot encoder that replaced the original columns.
- Returns
- a list with the column names or column indices of the one-hot encoded columns. 
- Return type
- list 
 
 
