Decoupled Classifiers Case Study 1

For the first case study, we’ll highlight the benefits of using the decoupled classifiers over different cohorts of the data. This module implements techniques for searching and combining cohorts to optimize for different definitions of group fairness based on the approach presented in the paper Decoupled classifiers for group-fair and efficient machine learning.

The techniques implemented in this module work with the Cohort module of this library to fit an estimator over each cohort while leveraging transfer learning and other optimization techniques for minority cohorts when the data for such cohorts is not sufficient.

import pandas as pd
import numpy as np
import random

from sklearn.tree import DecisionTreeClassifier
from lightgbm import LGBMClassifier

from raimitigations.utils import split_data
import raimitigations.dataprocessing as dp
from raimitigations.cohort import DecoupledClass, CohortDefinition, CohortManager, fetch_cohort_results
from sklearn.pipeline import Pipeline
SEED = 100

Throughout this case study, we will recreate and use a synthetic dataset created as part of Cohort case study 1 to showcase the additional techniques this module can use to optimize fairness and performance over cohorts.

def _create_country_df(samples: int, sectors: dict, country_name: str):
    df = None
    for key in sectors.keys():
        size = int(samples * sectors[key]["prob_occur"])
        invest = np.random.uniform(low=sectors[key]["min"], high=sectors[key]["max"], size=size)
        min_invest = min(invest)
        max_invest = max(invest)
        range_invest = max_invest - min_invest
        bankrupt_th = sectors[key]["prob_success"] * range_invest
        inverted_behavior = sectors[key]["inverted_behavior"]
        # Assign the label of each instance based on the investment threshold
        bankrupt = []
        for i in range(invest.shape[0]):
            inst_class = 1
            if invest[i] > bankrupt_th:
                inst_class = 0
            if inverted_behavior:
                inst_class = int(not inst_class)
            bankrupt.append(inst_class)
        # Flip the label of 5% of the instances (label noise)
        noise_ind = np.random.choice(range(size), int(size*0.05), replace=False)
        for ind in noise_ind:
            bankrupt[ind] = int(not bankrupt[ind])
        # Replace 10% of the investment values with missing values
        noise_ind = np.random.choice(range(size), int(size*0.1), replace=False)
        for ind in noise_ind:
            invest[ind] = np.nan

        country_col = [country_name for _ in range(size)]
        sector_col = [key for _ in range(size)]
        df_sector = pd.DataFrame({
            "investment": invest,
            "sector": sector_col,
            "country": country_col,
            "bankrupt": bankrupt,
        })

        if df is None:
            df = df_sector
        else:
            df = pd.concat([df, df_sector], axis=0)
    return df

def create_df_multiple_distributions(samples: int):
    sectors_c1 = {
        "s1": {"prob_occur":0.5, "prob_success":0.99, "inverted_behavior":False, "min":2e6, "max":1e7},
        "s2": {"prob_occur":0.1, "prob_success":0.2, "inverted_behavior":False, "min":1e7, "max":1.5e9},
        "s3": {"prob_occur":0.1, "prob_success":0.9, "inverted_behavior":True, "min":1e9, "max":1e10},
        "s4": {"prob_occur":0.3, "prob_success":0.7, "inverted_behavior":False, "min":4e10, "max":9e13},
    }
    sectors_c2 = {
        "s1": {"prob_occur":0.1, "prob_success":0.6, "inverted_behavior":True, "min":1e3, "max":5e3},
        "s2": {"prob_occur":0.3, "prob_success":0.9, "inverted_behavior":False, "min":1e5, "max":1.5e6},
        "s3": {"prob_occur":0.5, "prob_success":0.3, "inverted_behavior":False, "min":5e4, "max":3e5},
        "s4": {"prob_occur":0.1, "prob_success":0.8, "inverted_behavior":False, "min":1e6, "max":1e7},
    }
    sectors_c3 = {
        "s1": {"prob_occur":0.3, "prob_success":0.9, "inverted_behavior":False, "min":3e2, "max":6e2},
        "s2": {"prob_occur":0.6, "prob_success":0.7, "inverted_behavior":False, "min":5e3, "max":9e3},
        "s3": {"prob_occur":0.07, "prob_success":0.6, "inverted_behavior":False, "min":4e3, "max":2e4},
        "s4": {"prob_occur":0.03, "prob_success":0.1, "inverted_behavior":True, "min":6e5, "max":1.3e6},
    }
    # Country "C" reuses sectors_c2 (sectors_c3 is defined but not used)
    countries = {
        "A":{"sectors":sectors_c1, "sample_rate":0.85},
        "B":{"sectors":sectors_c2, "sample_rate":0.05},
        "C":{"sectors":sectors_c2, "sample_rate":0.1}
    }
    df = None
    for key in countries.keys():
        n_sample = int(samples * countries[key]["sample_rate"])
        df_c = _create_country_df(n_sample, countries[key]["sectors"], key)
        if df is None:
            df = df_c
        else:
            df = pd.concat([df, df_c], axis=0)

    # Replace the repeated per-sector indices with a single sequential index
    idx = pd.Index([i for i in range(df.shape[0])])
    df = df.set_index(idx)
    return df

Note: this dataset details if a company has gone bankrupt (class 1) or hasn’t (class 0):

df = create_df_multiple_distributions(10000)
      investment    sector  country  bankrupt
0     7.405851e+06  s1      A        1
1     2.357697e+06  s1      A        1
2     4.746429e+06  s1      A        1
3     7.152158e+06  s1      A        1
4     NaN           s1      A        1
...   ...           ...     ...      ...
9995  4.226512e+06  s4      C        1
9996  3.566758e+06  s4      C        0
9997  9.281006e+06  s4      C        0
9998  5.770378e+06  s4      C        1
9999  3.661511e+06  s4      C        1

10000 rows × 4 columns

Split data into train and test sets:

X_train, X_test, y_train, y_test = split_data(df, label="bankrupt", test_size=0.3, random_state=SEED)
def get_model():
    #model = DecisionTreeClassifier(max_features="sqrt")
    model = LGBMClassifier(random_state=SEED)
    return model

Baseline Case

In order to demonstrate the additional benefits of the decoupled classifiers, we start with the CohortManager class as the baseline.

Now, let’s look at the metrics and performance of the “sector” and “country” cohorts:

# BASELINE: "sector"
cht_manager = CohortManager(
    ...
)
cht_manager.fit(X_train, y_train)
pred_cht = cht_manager.predict_proba(X_test)

pred_train = cht_manager.predict_proba(X_train)
metrics_train, th_dict = fetch_cohort_results(X_train, y_train, pred_train, cohort_col=["sector"], return_th_dict=True)
fetch_cohort_results(X_test, y_test, pred_cht, cohort_col=["sector"], fixed_th=th_dict)
   cohort    cht_query           roc       precision  recall    f1        accuracy  threshold  num_pos  %_pos     cht_size
0  all       all                 0.947905  0.929772   0.921173  0.924828  0.927667  0.535672   1829     0.609667  3000
1  cohort_0  (`sector` == "s1")  0.933652  0.863698   0.896230  0.876748  0.893650  0.783542   863      0.660291  1307
2  cohort_1  (`sector` == "s2")  0.942994  0.931929   0.916137  0.921523  0.924084  0.381770   147      0.384817  382
3  cohort_2  (`sector` == "s3")  0.860066  0.716087   0.787660  0.738242  0.817073  0.156317   133      0.270325  492
4  cohort_3  (`sector` == "s4")  0.925276  0.859874   0.885490  0.870424  0.886447  0.724742   536      0.654457  819

# BASELINE: "country"
cht_manager = CohortManager(
    ...
)
cht_manager.fit(X_train, y_train)
pred_cht = cht_manager.predict_proba(X_test)

pred_train = cht_manager.predict_proba(X_train)
metrics_train, th_dict = fetch_cohort_results(X_train, y_train, pred_train, cohort_col=["country"], return_th_dict=True)
fetch_cohort_results(X_test, y_test, pred_cht, cohort_col=["country"], fixed_th=th_dict)

   cohort    cht_query            roc       precision  recall    f1        accuracy  threshold  num_pos  %_pos     cht_size
0  all       all                  0.944930  0.924198   0.917583  0.920483  0.923333  0.618757   1814     0.604667  3000
1  cohort_0  (`country` == "A")   0.945985  0.927582   0.917092  0.921732  0.926562  0.664861   1628     0.635938  2560
2  cohort_1  (`country` == "B")   0.935107  0.869528   0.860549  0.864621  0.875817  0.500140   53       0.346405  153
3  cohort_2  (`country` == "C")   0.934542  0.925572   0.928491  0.926472  0.926829  0.340654   137      0.477352  287

The CohortManager class in this case creates and trains a cohort for each unique value of these columns regardless of label distribution and size of cohort.
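
As a rough illustration of what "one estimator per cohort" means, the sketch below groups rows by the value of a column, fits a deliberately trivial stand-in model for each group (the positive rate observed in that group), and routes each new row to the model of its cohort. The helper names `fit_per_cohort` and `predict_per_cohort` and the toy data are hypothetical and are not part of raimitigations; CohortManager fits a full pipeline per cohort rather than this placeholder.

```python
from collections import defaultdict


def fit_per_cohort(rows, labels, cohort_key):
    """Group training rows by rows[i][cohort_key] and fit one model per group.

    The "model" is just the positive rate of the cohort, used here as a
    stand-in for a real estimator.
    """
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[row[cohort_key]].append(label)
    return {cohort: sum(ys) / len(ys) for cohort, ys in grouped.items()}


def predict_per_cohort(models, rows, cohort_key):
    """Route each row to the model that was fitted for its cohort."""
    return [models[row[cohort_key]] for row in rows]


# Toy data: two cohorts defined by the "country" column
train_rows = [{"country": "A"}, {"country": "A"}, {"country": "B"}, {"country": "B"}]
train_labels = [1, 1, 0, 1]

models = fit_per_cohort(train_rows, train_labels, "country")
scores = predict_per_cohort(models, [{"country": "B"}, {"country": "A"}], "country")
print(models, scores)
```

Note that nothing in this scheme checks whether a cohort is large enough or has enough examples of each label, which is the gap the decoupled classifiers address.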

DecoupledClass Techniques

What if, instead, we were to use the DecoupledClass to look at the same columns, using the same pre-processing pipeline and estimator?

Let’s start with the “sector” column:

preprocessing = [dp.BasicImputer(verbose=False), dp.DataMinMaxScaler(verbose=False), dp.EncoderOHE(verbose=False)]

dec_class = DecoupledClass(
    ...
)
dec_class.fit(X_train, y_train)

        Size: 3093
                (`sector` == "s1")
        Value Counts:
                1: 2169 (70.13%)
                0: 924 (29.87%)
        Invalid: False

        Size: 918
                (`sector` == "s2")
        Value Counts:
                0: 519 (56.54%)
                1: 399 (43.46%)
        Invalid: False

        Size: 1108
                (`sector` == "s3")
        Value Counts:
                0: 889 (80.23%)
                1: 219 (19.77%)
        Invalid: False

        Size: 1881
                (`sector` == "s4")
        Value Counts:
                1: 1307 (69.48%)
                0: 574 (30.52%)
        Invalid: False

th_dict = dec_class.get_threasholds_dict()
pred = dec_class.predict_proba(X_test)
fetch_cohort_results(X_test, y_test, pred, cohort_def=dec_class, fixed_th=th_dict)

   cohort    cht_query           roc       precision  recall    f1        accuracy  threshold  num_pos  %_pos     cht_size
0  all       all                 0.947905  0.929772   0.921173  0.924828  0.927667  0.500000   1829     0.609667  3000
1  cohort_0  (`sector` == "s1")  0.933652  0.937394   0.886916  0.907627  0.928080  0.630641   994      0.760520  1307
2  cohort_1  (`sector` == "s2")  0.942994  0.931929   0.916137  0.921523  0.924084  0.381770   147      0.384817  382
3  cohort_2  (`sector` == "s3")  0.860066  0.898785   0.851521  0.872572  0.928862  0.310128   76       0.154472  492
4  cohort_3  (`sector` == "s4")  0.925276  0.941381   0.885875  0.907826  0.926740  0.474869   619      0.755800  819

We can see that the 4 cohorts (one for each unique value of the column) are the same as the cohorts created by the Cohort module. The reason is that none of the 4 cohorts is invalid. An invalid cohort is defined as a cohort that has a size < ``max(min_cohort_size, df.shape[0] * min_cohort_pct)`` or whose minority class (the label value with the fewest occurrences) has an occurrence rate < ``minority_min_rate``.
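
The two validity conditions can be written out as a small check. This is a minimal sketch of the definition above, not the library's implementation: the function name `is_invalid_cohort` and the default threshold values are hypothetical, chosen only for illustration, and treating a cohort with a single label value as invalid is an assumption of this sketch.

```python
from collections import Counter


def is_invalid_cohort(labels, dataset_size, min_cohort_size=50,
                      min_cohort_pct=0.1, minority_min_rate=0.1):
    """Return True if the cohort fails either validity condition.

    The default thresholds are illustrative, not the library's defaults.
    """
    size = len(labels)
    # Condition 1: the cohort is too small
    if size < max(min_cohort_size, dataset_size * min_cohort_pct):
        return True
    counts = Counter(labels)
    # Assumption of this sketch: a cohort with a single label value is invalid
    if len(counts) < 2:
        return True
    # Condition 2: the least frequent label is too rare
    minority_rate = min(counts.values()) / size
    return minority_rate < minority_min_rate


# Label counts taken from the "country" cohorts of the 7000-row training set
n_train = 7000
country_a = [1] * 3612 + [0] * 2328
country_b = [0] * 190 + [1] * 157

a_invalid = is_invalid_cohort(country_a, n_train)
b_invalid = is_invalid_cohort(country_b, n_train)
print(a_invalid, b_invalid)
```

With these illustrative thresholds, the (country=="B") cohort fails the size condition (347 rows against a minimum of 700), while (country=="A") passes both conditions.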

When some cohorts are invalid, the DecoupledClass fit method uses a few techniques to create valid cohorts from the invalid ones. For the remainder of this case study, we’ll use the cohorts of the “country” column to demonstrate these techniques.

Merging Invalid Cohorts

First, let’s look at merging invalid cohorts. This technique creates valid cohorts from invalid ones by choosing the smallest cohort different from the invalid cohort and merging the two.
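
The merging rule described above can be sketched as a greedy loop. This is only an illustration under stated assumptions: the function `merge_invalid_cohorts`, the order in which invalid cohorts are processed (smallest first), and the size-only validity rule with a hypothetical minimum of 800 rows are not taken from the library.

```python
def merge_invalid_cohorts(cohorts, is_invalid):
    """Greedily merge invalid cohorts with the smallest other cohort.

    cohorts: dict mapping a cohort name to its list of labels.
    is_invalid: callable that receives a list of labels and returns a bool.
    """
    cohorts = {name: list(labels) for name, labels in cohorts.items()}
    while len(cohorts) > 1:
        invalid = [name for name, ys in cohorts.items() if is_invalid(ys)]
        if not invalid:
            break
        # Assumption: handle the smallest invalid cohort first
        target = min(invalid, key=lambda name: len(cohorts[name]))
        # Merge it with the smallest cohort other than itself
        partner = min((name for name in cohorts if name != target),
                      key=lambda name: len(cohorts[name]))
        merged = cohorts.pop(target) + cohorts.pop(partner)
        cohorts[f"{target} or {partner}"] = merged
    return cohorts


# Label counts of the "country" cohorts in the training set
country_cohorts = {
    "A": [1] * 3612 + [0] * 2328,
    "B": [0] * 190 + [1] * 157,
    "C": [0] * 388 + [1] * 325,
}

# Hypothetical validity rule: a cohort needs at least 800 rows
result = merge_invalid_cohorts(country_cohorts, lambda ys: len(ys) < 800)
print({name: len(ys) for name, ys in result.items()})
```

Under this hypothetical rule, "B" is merged with "C" (the smallest other cohort), giving a cohort of 1060 rows with 482 positive labels, and "A" is left untouched.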

preprocessing = [dp.BasicImputer(verbose=False), dp.DataMinMaxScaler(verbose=False), dp.EncoderOHE(verbose=False)]

dec_class = DecoupledClass(
    ...
)
dec_class.fit(X_train, y_train)


        Size: 5940
                (`country` == "A")
        Value Counts:
                1: 3612 (60.81%)
                0: 2328 (39.19%)
        Invalid: False

        Size: 1060
                ((`country` == "B")) or ((`country` == "C"))
        Value Counts:
                0: 578 (54.53%)
                1: 482 (45.47%)
        Invalid: False

Using the “country” column above has created 2 cohorts. Initially, we might have expected 3, one for each unique value (“A”, “B”, “C”). However, the DecoupledClass found invalid cohorts and merged the (country=="B") and (country=="C") cohorts to create a single valid one. Now let’s see how this setup performs over the test set.

th_dict = dec_class.get_threasholds_dict()
pred = dec_class.predict_proba(X_test)
fetch_cohort_results(X_test, y_test, pred, cohort_def=dec_class, fixed_th=th_dict)

   cohort    cht_query                                       roc       precision  recall    f1        accuracy  threshold  num_pos  %_pos     cht_size
0  all       all                                             0.946290  0.927526   0.919864  0.923168  0.926000  0.500000   1822     0.607333  3000
1  cohort_0  (`country` == "A")                              0.945985  0.930449   0.916938  0.922738  0.927734  0.525770   1643     0.641797  2560
2  cohort_1  ((`country` == "B")) or ((`country` == "C"))    0.941379  0.919319   0.917429  0.918328  0.920455  0.499605   183      0.415909  440

Comparing the metrics of these merged cohorts to the baseline, merging the invalid cohort (country=="B") with the cohort (country=="C") appears to have a positive effect on (country=="B") in this case: the merged cohort has a slightly more balanced label distribution, and its test accuracy (0.920) is higher than the baseline accuracy of (country=="B") alone (0.876).

Transfer Learning

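Next, let’s explore the other technique that distinguishes the decoupled classifiers: transfer learning.

In this approach, when the fit() method encounters an invalid cohort, it trains that cohort’s estimator on the cohort’s own data plus data from other cohorts (out-data), giving the out-data instances a lower weight than the in-cohort instances. The invalid cohort is kept as a separate cohort rather than being merged.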

In order to use transfer learning with the DecoupledClass module, we simply need to pass a theta value. theta can be a fixed float, a list of floats (the best value in the list is found using cross-validation), or a boolean True to use a default list of floats optimized using cross-validation. If you’d like to learn more about how we select out-data and how to use different types of theta with transfer learning, see the tutorial notebook for the decoupled classifiers.
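
To make the role of theta concrete, the sketch below assigns a weight of 1.0 to instances of the target cohort and a weight of theta to out-data instances, then computes a weighted positive rate as a stand-in for a weighted model fit. That theta acts as a per-instance weight in exactly this way is an assumption of the sketch; the helper names and toy data are hypothetical and not part of the library.

```python
def transfer_weights(cohort_ids, target, theta):
    """Weight 1.0 for rows of the target cohort, theta for out-data rows."""
    return [1.0 if cohort == target else theta for cohort in cohort_ids]


def weighted_positive_rate(labels, weights):
    """Weighted mean of binary labels, standing in for a weighted model fit."""
    return sum(w * y for y, w in zip(labels, weights)) / sum(weights)


# Toy data: two rows from the small cohort "B" and four out-data rows from "A"
cohort_ids = ["B", "B", "A", "A", "A", "A"]
labels = [0, 1, 1, 1, 1, 1]

weights = transfer_weights(cohort_ids, target="B", theta=0.5)
rate = weighted_positive_rate(labels, weights)
print(weights, rate)
```

The weighted estimate (0.75) sits between the in-cohort rate (0.5) and the unweighted pooled rate (about 0.83): a smaller theta keeps the model closer to the cohort's own data, and a larger theta borrows more from the out-data. With a real estimator, weights like these would typically be supplied through a `sample_weight` argument of `fit`, for estimators that support it.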

Let’s take a look at how transfer learning handles the invalid cohorts in our case:

preprocessing = [dp.BasicImputer(verbose=False), dp.DataMinMaxScaler(verbose=False), dp.EncoderOHE(verbose=False)]

dec_class = DecoupledClass(
    ...
)
dec_class.fit(X_train, y_train)


        Size: 5940
                (`country` == "A")
        Value Counts:
                1: 3612 (60.81%)
                0: 2328 (39.19%)
        Invalid: False

        Size: 347
                (`country` == "B")
        Value Counts:
                0: 190 (54.76%)
                1: 157 (45.24%)
        Invalid: True
                Cohorts used as outside data: ['cohort_0', 'cohort_2']
                Theta = 0.4

        Size: 713
                (`country` == "C")
        Value Counts:
                0: 388 (54.42%)
                1: 325 (45.58%)
        Invalid: True
                Cohorts used as outside data: ['cohort_0', 'cohort_1']
                Theta = 0.6

th_dict = dec_class.get_threasholds_dict()
pred = dec_class.predict_proba(X_test)
fetch_cohort_results(X_test, y_test, pred, cohort_def=dec_class, fixed_th=th_dict)

   cohort    cht_query            roc       precision  recall    f1        accuracy  threshold  num_pos  %_pos     cht_size
0  all       all                  0.947784  0.930492   0.921226  0.925124  0.928000  0.500000   1834     0.611333  3000
1  cohort_0  (`country` == "A")   0.945985  0.930449   0.916938  0.922738  0.927734  0.525770   1643     0.641797  2560
2  cohort_1  (`country` == "B")   0.953976  0.911828   0.923049  0.916697  0.921569  0.475236   60       0.392157  153
3  cohort_2  (`country` == "C")   0.945566  0.922287   0.923322  0.922759  0.923345  0.594635   132      0.459930  287

Comparing these results to the metrics of the baseline valid/invalid cohorts, it seems that transfer learning might have made a positive but less noticeable difference for cohort (country=="B") in this case.

Optimizing Fairness Metrics

Lastly, DecoupledClass offers the option to optimize all models according to a fairness metric. The available fairness metrics are “balanced”, “num_parity” and “dem_parity”. For a more detailed look at these metrics and how to use them, see the tutorial notebook for the decoupled classifiers.

In this case, we’ll explore this feature over our cohorts using the “dem_parity” metric.
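
As background, demographic parity is commonly understood as requiring similar rates of positive predictions across groups. The sketch below measures that gap for a set of per-cohort thresholds. It illustrates the general notion only; the exact loss that the “dem_parity” option optimizes may be defined differently, and the function names and scores are hypothetical.

```python
def positive_rate(scores, threshold):
    """Fraction of scores classified as positive at the given threshold."""
    return sum(score >= threshold for score in scores) / len(scores)


def parity_gap(scores_by_cohort, thresholds):
    """Largest difference in positive prediction rate between any two cohorts."""
    rates = {cohort: positive_rate(scores, thresholds[cohort])
             for cohort, scores in scores_by_cohort.items()}
    return max(rates.values()) - min(rates.values()), rates


# Toy predicted probabilities for two cohorts
scores_by_cohort = {
    "A": [0.9, 0.8, 0.6, 0.2],
    "B": [0.7, 0.4, 0.3, 0.1],
}

# A shared threshold of 0.5 yields very different positive rates
gap_shared, rates_shared = parity_gap(scores_by_cohort, {"A": 0.5, "B": 0.5})

# Per-cohort thresholds can close the gap
gap_tuned, rates_tuned = parity_gap(scores_by_cohort, {"A": 0.7, "B": 0.35})
print(gap_shared, gap_tuned)
```

Moving the threshold of each cohort away from its accuracy-optimal value is what narrows the gap, which is why a drop in per-cohort accuracy is the usual price of this kind of optimization.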

preprocessing = [dp.BasicImputer(verbose=False), dp.DataMinMaxScaler(verbose=False), dp.EncoderOHE(verbose=False)]

dec_class = DecoupledClass(
    ...
)
dec_class.fit(X_train, y_train)

pred = dec_class.predict_proba(X_test)
fetch_cohort_results(X_test, y_test, pred, cohort_def=dec_class, fixed_th=True)

   cohort    cht_query                                       roc       precision  recall    f1        accuracy  threshold  num_pos  %_pos     cht_size
0  all       all                                             0.946290  0.927526   0.919864  0.923168  0.926000  0.500000   1822     0.607333  3000
1  cohort_0  (`country` == "A")                              0.945985  0.886854   0.902609  0.891498  0.894531  0.829787   1420     0.554688  2560
2  cohort_1  ((`country` == "B")) or ((`country` == "C"))    0.941379  0.845511   0.852320  0.838335  0.838636  0.151803   235      0.534091  440

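Using fairness optimization in addition to merging cohorts (country=="B") and (country=="C") shows an even better distribution of the labels: the share of positive predictions (%_pos) is now 0.555 for (country=="A") and 0.534 for the merged cohort, compared to 0.642 and 0.416 with merging alone. This comes at a cost in accuracy, which drops from 0.928 to 0.895 for (country=="A") and from 0.920 to 0.839 for the merged cohort. One should therefore pay attention to the trade-off between accuracy and label distribution, in particular when borrowing training data from other cohorts, as the merging capability of this tool does.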