EncoderOrdinal

class raimitigations.dataprocessing.EncoderOrdinal(df: Optional[Union[DataFrame, ndarray]] = None, col_encode: Optional[list] = None, categories: Union[dict, str] = 'auto', unknown_err: bool = False, unknown_value: Union[int, float] = -1, verbose: bool = True)

Bases: DataEncoding

Concrete class that applies ordinal encoding over a dataset. The categorical features are encoded using the ordinal encoding class from sklearn. The main difference from using the sklearn implementation directly is that the user can pass a list of columns to be encoded when creating the object, instead of having to apply the sklearn class to each column individually.

Parameters
  • df – pandas data frame that contains the columns to be encoded;

  • col_encode – a list of the column names or indexes that will be encoded. If None, this parameter will be set automatically as being a list of all categorical variables in the dataset;

  • categories – can be a dict or a string:

    • dict: a dict that indicates the order of the values for each column in col_encode. That is, a dict of lists, where each key must be a valid column name and the value associated with key k is a list of size n, where n is the number of distinct values in column k. This list gives the order of the encoding for that column;

    • string: the only string value allowed is “auto”. When categories = “auto”, the categories dict used by sklearn is generated automatically;

  • unknown_err – if True, an error is raised when the transform method is called on a dataset that contains, in one of the encoded columns, a category that was not present in the training dataset (the one provided to the fit() method). If False, no error is raised in that situation. Instead, every unknown category in a given encoded column is replaced by the label unknown_value;

  • unknown_value – the value used when an unknown category is found in one of the encoded columns. This value must differ from the labels already used by the column(s) with unknown values. We recommend using negative values to avoid conflicts;

  • verbose – indicates whether internal messages should be printed or not.
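The interaction between categories, unknown_err and unknown_value can be illustrated with a small pure-Python sketch. This is not the raimitigations or sklearn implementation: the helper ordinal_encode, the column name "size" and its values are all hypothetical, and the sketch only mimics the semantics described in the parameter list above.

```python
def ordinal_encode(values, order, unknown_err=False, unknown_value=-1):
    """Encode each value as its position in `order`.

    Illustrative helper (hypothetical, not part of raimitigations): it
    mimics the documented unknown_err / unknown_value semantics.
    """
    label_of = {category: idx for idx, category in enumerate(order)}
    # unknown_value must not collide with a label that is already in use.
    if not unknown_err and unknown_value in label_of.values():
        raise ValueError("unknown_value collides with an existing label")
    encoded = []
    for value in values:
        if value in label_of:
            encoded.append(label_of[value])
        elif unknown_err:
            raise ValueError(f"unknown category: {value!r}")
        else:
            encoded.append(unknown_value)
    return encoded


# Hypothetical entry of a `categories` dict: column name -> ordered values.
categories = {"size": ["small", "medium", "large"]}
train_column = ["small", "large", "medium"]
test_column = ["medium", "huge"]  # "huge" was never seen during fitting

encoded_train = ordinal_encode(train_column, categories["size"])
encoded_test = ordinal_encode(test_column, categories["size"])
print(encoded_train)  # [0, 2, 1]
print(encoded_test)   # [1, -1]
```

With unknown_err=True the same call on test_column would raise instead of emitting -1, which is why a negative unknown_value is a convenient choice: it cannot clash with the non-negative positions used as labels in this sketch.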

get_mapping()

Returns a dictionary with all the information regarding the mapping performed by the ordinal encoder. The dictionary has the following structure:

  • One key for each column. Each key is associated with a secondary dictionary with the following keys:

    • “values”: the unique values encountered in the column;

    • “labels”: the labels assigned to each of the unique values. The list from the “values” key is aligned with this list, that is, mapping[column][“labels”][i] is the label assigned to the value mapping[column][“values”][i];

    • “n_labels”: the number of labels. If unknown_err is set to False, this will account for the label for unknown values.

Returns

a dictionary with all the information regarding the mapping performed by the ordinal encoder.

Return type

dict
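As a rough illustration of how the aligned “values” and “labels” lists can be used, the sketch below works on a hand-written dictionary with the structure described above. The column name, its contents and the helper functions label_for and value_for are hypothetical; a real mapping would come from calling get_mapping() on a fitted encoder.

```python
# Hand-written stand-in for the dictionary returned by get_mapping().
# Assumption: unknown_err=False, so "n_labels" counts one extra label
# for unknown values, as described for the "n_labels" key.
mapping = {
    "size": {
        "values": ["small", "medium", "large"],
        "labels": [0, 1, 2],
        "n_labels": 4,
    }
}


def label_for(mapping, column, value):
    """Return the label assigned to `value` in `column` (hypothetical helper)."""
    entry = mapping[column]
    idx = entry["values"].index(value)  # position i in "values"...
    return entry["labels"][idx]         # ...is aligned with position i in "labels"


def value_for(mapping, column, label):
    """Return the original value that was encoded as `label` (hypothetical helper)."""
    entry = mapping[column]
    idx = entry["labels"].index(label)
    return entry["values"][idx]


print(label_for(mapping, "size", "large"))  # 2
print(value_for(mapping, "size", 1))        # medium
```

Both helpers raise ValueError when the value or label is absent, so a label reserved for unknown categories would need separate handling.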

Class Diagram

Inheritance diagram of raimitigations.dataprocessing.EncoderOrdinal

Example