Encoders

This sub-module of the dataprocessing package contains all of the package's encoding transformers. Encoders convert categorical features into numerical features. Every encoder in the dataprocessing package is based on DataEncoding, the abstract class presented below.

class raimitigations.dataprocessing.DataEncoding(df: Optional[Union[DataFrame, ndarray]] = None, col_encode: Optional[list] = None, verbose: bool = True)

Bases: DataProcessing

Base class for all encoding subclasses. Implements basic functionality shared by all encoding approaches.

Parameters
  • df – pandas data frame or numpy array that contains the columns to be encoded;

  • col_encode – a list of the column names or indexes that will be encoded. If None, this parameter is automatically set to a list of all categorical columns in the dataset;

  • verbose – indicates whether internal messages should be printed or not.
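When col_encode is None, the encoder has to decide on its own which columns are categorical. The following is a minimal sketch of one plausible heuristic, assuming a column counts as categorical when all of its non-missing values are strings. The function names infer_categorical_columns and resolve_col_encode, and the dict-of-lists table used in place of a pandas DataFrame, are hypothetical; raimitigations may apply a different rule (for example, one based on pandas dtypes).

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical stand-in for a DataFrame: column name -> list of values.
Table = Dict[str, Sequence[object]]


def infer_categorical_columns(table: Table) -> List[str]:
    """Return the columns whose non-missing values are all strings.

    Assumption: "categorical" means string-valued; the real library
    may use another criterion.
    """
    categorical = []
    for name, values in table.items():
        present = [v for v in values if v is not None]
        if present and all(isinstance(v, str) for v in present):
            categorical.append(name)
    return categorical


def resolve_col_encode(table: Table, col_encode: Optional[List[str]] = None) -> List[str]:
    # Mirrors the documented default: None means "all categorical columns".
    if col_encode is None:
        return infer_categorical_columns(table)
    return list(col_encode)


data = {
    "age": [23, 41, 35],
    "city": ["Lisbon", "Porto", None],
    "plan": ["basic", "premium", "basic"],
}
print(resolve_col_encode(data))            # ['city', 'plan']
print(resolve_col_encode(data, ["plan"]))  # ['plan']
```

An explicit col_encode list always takes precedence; inference only runs when the parameter is left as None.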

fit(df: Optional[Union[DataFrame, ndarray]] = None, y: Optional[Union[Series, ndarray]] = None)

Default fit method for all encoders that inherit from the DataEncoding class. The following steps are executed: (i) set the dataset, (ii) set the list of columns that will be encoded, (iii) check for any invalid input, (iv) call the fit method of the child class.

Parameters
  • df – the full dataset;

  • y – ignored. This parameter exists only for compatibility with sklearn's Pipeline class.
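The four fit steps follow the template-method pattern: the base class performs the shared bookkeeping and then delegates to a hook that each child class overrides. Below is a minimal sketch of that structure, assuming hypothetical names (EncodingBase, _fit, CategoryLister) and a dict-of-lists table in place of a DataFrame. Returning self from fit is also an assumption borrowed from the sklearn convention; this is not the raimitigations implementation.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Sequence

Table = Dict[str, Sequence[object]]


class EncodingBase(ABC):
    """Hypothetical base class illustrating the documented fit steps."""

    def __init__(self, col_encode: Optional[List[str]] = None):
        self.col_encode = col_encode
        self.df: Optional[Table] = None

    def fit(self, df: Table, y=None):
        # (i) set the dataset
        self.df = df
        # (ii) set the columns to encode; assumption: default to string-valued columns
        if self.col_encode is None:
            self.col_encode = [
                c for c, v in df.items()
                if len(v) > 0 and all(isinstance(x, str) for x in v)
            ]
        # (iii) check for invalid input
        missing = [c for c in self.col_encode if c not in df]
        if missing:
            raise ValueError(f"columns not found in dataset: {missing}")
        # (iv) delegate to the child class; y is ignored, as documented
        self._fit()
        return self

    @abstractmethod
    def _fit(self) -> None:
        """Child-specific fitting logic."""


class CategoryLister(EncodingBase):
    """Toy child class: records the sorted categories of each encoded column."""

    def _fit(self) -> None:
        self.categories = {c: sorted(set(self.df[c])) for c in self.col_encode}


enc = CategoryLister().fit({"size": ["S", "M", "S"], "qty": [1, 2, 3]})
print(enc.col_encode)   # ['size']
print(enc.categories)   # {'size': ['M', 'S']}
```

Keeping steps (i) to (iii) in the base class means each child class only implements its own encoding logic, while validation behaves identically across all encoders.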

get_encoded_columns()

Returns a list with the column names or column indices of the encoded columns.

Returns

a list with the column names or column indices of the encoded columns.

Return type

list

transform(df: Union[DataFrame, ndarray])

Transforms a given dataset by encoding all columns specified by the col_encode parameter. Returns a dataset with the encoded columns.

Parameters

df – the full dataset with the columns to be encoded.

Returns

the transformed dataset.

Return type

pd.DataFrame or np.ndarray
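To illustrate how fit, transform, and get_encoded_columns work together, the sketch below uses a hypothetical ordinal-style encoder on a dict-of-lists table: fit learns one integer code per category, transform re-encodes only the columns listed in col_encode and copies the others unchanged, and get_encoded_columns reports which columns were encoded. The class name ToyOrdinalEncoder and the choice to map unseen categories to -1 are assumptions, not raimitigations behavior.

```python
from typing import Dict, List, Sequence

Table = Dict[str, Sequence[object]]


class ToyOrdinalEncoder:
    """Hypothetical encoder illustrating fit / transform / get_encoded_columns."""

    def __init__(self, col_encode: List[str]):
        self.col_encode = list(col_encode)
        self.mapping: Dict[str, Dict[object, int]] = {}

    def fit(self, df: Table, y=None) -> "ToyOrdinalEncoder":
        # Learn one integer code per category, in sorted category order.
        for col in self.col_encode:
            categories = sorted(set(df[col]))
            self.mapping[col] = {cat: i for i, cat in enumerate(categories)}
        return self

    def get_encoded_columns(self) -> List[str]:
        return list(self.col_encode)

    def transform(self, df: Table) -> Table:
        # Copy every column; re-encode only the ones listed in col_encode.
        out: Table = {}
        for col, values in df.items():
            if col in self.mapping:
                codes = self.mapping[col]
                # Assumption: categories not seen during fit become -1.
                out[col] = [codes.get(v, -1) for v in values]
            else:
                out[col] = list(values)
        return out


train = {"plan": ["basic", "premium", "basic"], "age": [23, 41, 35]}
enc = ToyOrdinalEncoder(col_encode=["plan"]).fit(train)
print(enc.get_encoded_columns())  # ['plan']
print(enc.transform({"plan": ["premium", "trial"], "age": [50, 19]}))
# {'plan': [1, -1], 'age': [50, 19]}
```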

The following is a list of all encoders implemented in this module. All of the classes below inherit from the DataEncoding class and therefore have access to all of the methods shown above.

Child Classes

Class Diagram

Inheritance diagram of raimitigations.dataprocessing.EncoderOHE, raimitigations.dataprocessing.EncoderOrdinal
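The two child classes in the diagram, EncoderOHE and EncoderOrdinal, correspond to two standard encoding schemes: one-hot encoding and ordinal encoding. The dependency-free sketch below contrasts them on a single column. Ordinal encoding maps each category to one integer, while one-hot encoding produces one binary indicator column per category. The helper functions and the "column=value" naming of the indicator columns are hypothetical and do not reproduce the exact output of the raimitigations classes.

```python
from typing import Dict, List, Sequence


def ordinal_encode(values: Sequence[str]) -> List[int]:
    # One integer per category, assigned in sorted category order.
    codes = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [codes[v] for v in values]


def one_hot_encode(name: str, values: Sequence[str]) -> Dict[str, List[int]]:
    # One 0/1 indicator column per category (hypothetical "name=category" labels).
    return {
        f"{name}={cat}": [int(v == cat) for v in values]
        for cat in sorted(set(values))
    }


colors = ["red", "green", "red", "blue"]
print(ordinal_encode(colors))  # [2, 1, 2, 0]
print(one_hot_encode("color", colors))
# {'color=blue': [0, 0, 0, 1], 'color=green': [0, 1, 0, 0], 'color=red': [1, 0, 1, 0]}
```

Ordinal codes imply an ordering among categories that may not exist in the data; one-hot encoding avoids this at the cost of adding one column per category.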

Examples