Decoupled Classifiers Case Study 1
For the first case study, we’ll highlight the benefits of using decoupled classifiers over different cohorts of the data. This module implements techniques for searching and combining cohorts to optimize for different definitions of group fairness, based on the approach presented in the paper Decoupled classifiers for group-fair and efficient machine learning.
The techniques implemented in this module work with the Cohort module of this library to fit an estimator over each cohort, leveraging transfer learning and other optimization techniques for minority cohorts whose data would otherwise be insufficient.
[1]:
import pandas as pd
import numpy as np
import random
from sklearn.tree import DecisionTreeClassifier
from lightgbm import LGBMClassifier
from raimitigations.utils import split_data
import raimitigations.dataprocessing as dp
from raimitigations.cohort import DecoupledClass, CohortDefinition, CohortManager, fetch_cohort_results
from sklearn.pipeline import Pipeline
SEED = 100
Throughout this case study, we will recreate and use a synthetic dataset created as part of Cohort case study 1 to showcase the additional techniques this module can use to optimize fairness and performance over cohorts.
[2]:
def _create_country_df(samples: int, sectors: dict, country_name: str):
    df = None
    for key in sectors.keys():
        # Sample the investment values for this sector
        size = int(samples * sectors[key]["prob_occur"])
        invest = np.random.uniform(low=sectors[key]["min"], high=sectors[key]["max"], size=size)
        min_invest = min(invest)
        max_invest = max(invest)
        range_invest = max_invest - min_invest
        bankrupt_th = sectors[key]["prob_success"] * range_invest
        # Some sectors have the opposite relationship between
        # investment and bankruptcy
        inverted_behavior = sectors[key]["inverted_behavior"]
        bankrupt = []
        for i in range(invest.shape[0]):
            inst_class = 1
            if invest[i] > bankrupt_th:
                inst_class = 0
            if inverted_behavior:
                inst_class = int(not inst_class)
            bankrupt.append(inst_class)
        # Add label noise: flip the label of 5% of the instances
        noise_ind = np.random.choice(range(size), int(size*0.05), replace=False)
        for ind in noise_ind:
            bankrupt[ind] = int(not bankrupt[ind])
        # Add missing values to 10% of the investment values
        noise_ind = np.random.choice(range(size), int(size*0.1), replace=False)
        for ind in noise_ind:
            invest[ind] = np.nan

        country_col = [country_name for _ in range(size)]
        sector_col = [key for _ in range(size)]
        df_sector = pd.DataFrame({
            "investment": invest,
            "sector": sector_col,
            "country": country_col,
            "bankrupt": bankrupt
        })

        if df is None:
            df = df_sector
        else:
            df = pd.concat([df, df_sector], axis=0)
    return df
def create_df_multiple_distributions(samples: int):
    # Each country samples its sectors from a different distribution,
    # which creates distinct behaviors across the "country" cohorts
    sectors_c1 = {
        "s1": {"prob_occur":0.5, "prob_success":0.99, "inverted_behavior":False, "min":2e6, "max":1e7},
        "s2": {"prob_occur":0.1, "prob_success":0.2, "inverted_behavior":False, "min":1e7, "max":1.5e9},
        "s3": {"prob_occur":0.1, "prob_success":0.9, "inverted_behavior":True, "min":1e9, "max":1e10},
        "s4": {"prob_occur":0.3, "prob_success":0.7, "inverted_behavior":False, "min":4e10, "max":9e13},
    }
    sectors_c2 = {
        "s1": {"prob_occur":0.1, "prob_success":0.6, "inverted_behavior":True, "min":1e3, "max":5e3},
        "s2": {"prob_occur":0.3, "prob_success":0.9, "inverted_behavior":False, "min":1e5, "max":1.5e6},
        "s3": {"prob_occur":0.5, "prob_success":0.3, "inverted_behavior":False, "min":5e4, "max":3e5},
        "s4": {"prob_occur":0.1, "prob_success":0.8, "inverted_behavior":False, "min":1e6, "max":1e7},
    }
    sectors_c3 = {
        "s1": {"prob_occur":0.3, "prob_success":0.9, "inverted_behavior":False, "min":3e2, "max":6e2},
        "s2": {"prob_occur":0.6, "prob_success":0.7, "inverted_behavior":False, "min":5e3, "max":9e3},
        "s3": {"prob_occur":0.07, "prob_success":0.6, "inverted_behavior":False, "min":4e3, "max":2e4},
        "s4": {"prob_occur":0.03, "prob_success":0.1, "inverted_behavior":True, "min":6e5, "max":1.3e6},
    }
    # Note: sectors_c3 is defined but unused here; country "C" reuses sectors_c2.
    # Country "A" dominates the dataset, making "B" and "C" minority cohorts.
    countries = {
        "A":{"sectors":sectors_c1, "sample_rate":0.85},
        "B":{"sectors":sectors_c2, "sample_rate":0.05},
        "C":{"sectors":sectors_c2, "sample_rate":0.1}
    }
    df = None
    for key in countries.keys():
        n_sample = int(samples * countries[key]["sample_rate"])
        df_c = _create_country_df(n_sample, countries[key]["sectors"], key)
        if df is None:
            df = df_c
        else:
            df = pd.concat([df, df_c], axis=0)
    idx = pd.Index([i for i in range(df.shape[0])])
    df = df.set_index(idx)
    return df
Note: this dataset indicates whether a company has gone bankrupt (class 1) or not (class 0):
[3]:
np.random.seed(51)
df = create_df_multiple_distributions(10000)
df
[3]:
| | investment | sector | country | bankrupt |
|---|---|---|---|---|
| 0 | 7.405851e+06 | s1 | A | 1 |
| 1 | 2.357697e+06 | s1 | A | 1 |
| 2 | 4.746429e+06 | s1 | A | 1 |
| 3 | 7.152158e+06 | s1 | A | 1 |
| 4 | NaN | s1 | A | 1 |
| ... | ... | ... | ... | ... |
| 9995 | 4.226512e+06 | s4 | C | 1 |
| 9996 | 3.566758e+06 | s4 | C | 0 |
| 9997 | 9.281006e+06 | s4 | C | 0 |
| 9998 | 5.770378e+06 | s4 | C | 1 |
| 9999 | 3.661511e+06 | s4 | C | 1 |

10000 rows × 4 columns
Split data into train and test sets:
[3]:
X_train, X_test, y_train, y_test = split_data(df, label="bankrupt", test_size=0.3, random_state=SEED)
[4]:
def get_model():
    # model = DecisionTreeClassifier(max_features="sqrt")
    model = LGBMClassifier(random_state=SEED)
    return model
Baseline Case
To demonstrate the additional benefits of the decoupled classifiers, we start with the CohortManager class as the baseline. Let’s look at the metrics and performance of the “sector” and “country” cohorts:
[5]:
# BASELINE: "sector"
cht_manager = CohortManager(
transform_pipe=[
dp.BasicImputer(verbose=False),
dp.DataMinMaxScaler(verbose=False),
dp.EncoderOHE(verbose=False),
get_model()
],
cohort_col=["sector"]
)
cht_manager.fit(X_train, y_train)
pred_cht = cht_manager.predict_proba(X_test)
pred_train = cht_manager.predict_proba(X_train)
metrics_train, th_dict = fetch_cohort_results(X_train, y_train, pred_train, cohort_col=["sector"], return_th_dict=True)
fetch_cohort_results(X_test, y_test, pred_cht, cohort_col=["sector"], fixed_th=th_dict)
[5]:
| | cohort | cht_query | roc | precision | recall | f1 | accuracy | threshold | num_pos | %_pos | cht_size |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | all | all | 0.947905 | 0.929772 | 0.921173 | 0.924828 | 0.927667 | 0.535672 | 1829 | 0.609667 | 3000 |
| 1 | cohort_0 | (`sector` == "s1") | 0.933652 | 0.863698 | 0.896230 | 0.876748 | 0.893650 | 0.783542 | 863 | 0.660291 | 1307 |
| 2 | cohort_1 | (`sector` == "s2") | 0.942994 | 0.931929 | 0.916137 | 0.921523 | 0.924084 | 0.381770 | 147 | 0.384817 | 382 |
| 3 | cohort_2 | (`sector` == "s3") | 0.860066 | 0.716087 | 0.787660 | 0.738242 | 0.817073 | 0.156317 | 133 | 0.270325 | 492 |
| 4 | cohort_3 | (`sector` == "s4") | 0.925276 | 0.859874 | 0.885490 | 0.870424 | 0.886447 | 0.724742 | 536 | 0.654457 | 819 |
[6]:
# BASELINE: "country"
cht_manager = CohortManager(
transform_pipe=[
dp.BasicImputer(verbose=False),
dp.DataMinMaxScaler(verbose=False),
dp.EncoderOHE(verbose=False),
get_model()
],
cohort_col=["country"]
)
cht_manager.fit(X_train, y_train)
pred_cht = cht_manager.predict_proba(X_test)
pred_train = cht_manager.predict_proba(X_train)
metrics_train, th_dict = fetch_cohort_results(X_train, y_train, pred_train, cohort_col=["country"], return_th_dict=True)
fetch_cohort_results(X_test, y_test, pred_cht, cohort_col=["country"], fixed_th=th_dict)
[6]:
| | cohort | cht_query | roc | precision | recall | f1 | accuracy | threshold | num_pos | %_pos | cht_size |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | all | all | 0.944930 | 0.924198 | 0.917583 | 0.920483 | 0.923333 | 0.618757 | 1814 | 0.604667 | 3000 |
| 1 | cohort_0 | (`country` == "A") | 0.945985 | 0.927582 | 0.917092 | 0.921732 | 0.926562 | 0.664861 | 1628 | 0.635938 | 2560 |
| 2 | cohort_1 | (`country` == "B") | 0.935107 | 0.869528 | 0.860549 | 0.864621 | 0.875817 | 0.500140 | 53 | 0.346405 | 153 |
| 3 | cohort_2 | (`country` == "C") | 0.934542 | 0.925572 | 0.928491 | 0.926472 | 0.926829 | 0.340654 | 137 | 0.477352 | 287 |
In this case, the CohortManager class creates and trains a separate model for each unique value of these columns, regardless of each cohort’s label distribution or size; conceptually, the baseline amounts to the loop sketched below.
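As a rough illustration (not the CohortManager internals — plain scikit-learn components stand in for the dp transformers, and the per-cohort prediction routing is omitted):

from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

def make_cohort_pipeline():
    # Impute + scale the numeric column, one-hot encode the categorical one
    pre = ColumnTransformer([
        ("num", Pipeline([("imp", SimpleImputer()), ("scale", MinMaxScaler())]), ["investment"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["country"]),
    ])
    return Pipeline([("pre", pre), ("model", get_model())])

# One independent model per unique "sector" value, however small the cohort
cohort_models = {}
for sector_value in X_train["sector"].unique():
    mask = X_train["sector"] == sector_value
    cohort_models[sector_value] = make_cohort_pipeline().fit(X_train[mask], y_train[mask])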
DecoupledClass Techniques
Instead, what happens if we use the DecoupledClass over the same columns, with the same pre-processing pipeline and estimator?
Let’s start with the “sector” column:
[7]:
preprocessing = [dp.BasicImputer(verbose=False), dp.DataMinMaxScaler(verbose=False), dp.EncoderOHE(verbose=False)]
dec_class = DecoupledClass(
    cohort_col=['sector'],
    transform_pipe=preprocessing,
    estimator=get_model()
)
dec_class.fit(X_train, y_train)
dec_class.print_cohorts()
FINAL COHORTS
cohort_0:
    Size: 3093
    Query:
        (`sector` == "s1")
    Value Counts:
        1: 2169 (70.13%)
        0: 924 (29.87%)
    Invalid: False

cohort_1:
    Size: 918
    Query:
        (`sector` == "s2")
    Value Counts:
        0: 519 (56.54%)
        1: 399 (43.46%)
    Invalid: False

cohort_2:
    Size: 1108
    Query:
        (`sector` == "s3")
    Value Counts:
        0: 889 (80.23%)
        1: 219 (19.77%)
    Invalid: False

cohort_3:
    Size: 1881
    Query:
        (`sector` == "s4")
    Value Counts:
        1: 1307 (69.48%)
        0: 574 (30.52%)
    Invalid: False
[8]:
th_dict = dec_class.get_threasholds_dict()
pred = dec_class.predict_proba(X_test)
fetch_cohort_results(X_test, y_test, pred, cohort_def=dec_class, fixed_th=th_dict)
[8]:
| | cohort | cht_query | roc | precision | recall | f1 | accuracy | threshold | num_pos | %_pos | cht_size |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | all | all | 0.947905 | 0.929772 | 0.921173 | 0.924828 | 0.927667 | 0.500000 | 1829 | 0.609667 | 3000 |
| 1 | cohort_0 | (`sector` == "s1") | 0.933652 | 0.937394 | 0.886916 | 0.907627 | 0.928080 | 0.630641 | 994 | 0.760520 | 1307 |
| 2 | cohort_1 | (`sector` == "s2") | 0.942994 | 0.931929 | 0.916137 | 0.921523 | 0.924084 | 0.381770 | 147 | 0.384817 | 382 |
| 3 | cohort_2 | (`sector` == "s3") | 0.860066 | 0.898785 | 0.851521 | 0.872572 | 0.928862 | 0.310128 | 76 | 0.154472 | 492 |
| 4 | cohort_3 | (`sector` == "s4") | 0.925276 | 0.941381 | 0.885875 | 0.907826 | 0.926740 | 0.474869 | 619 | 0.755800 | 819 |
We can see that these 4 cohorts (one for each unique value of the column) are no different from the cohorts created by the CohortManager baseline; the reason is that none of the 4 cohorts was invalid. An invalid cohort is defined as a cohort whose size is < ``max(min_cohort_size, df.shape[0] * min_cohort_pct)``, or whose minority class (the label value with the fewest occurrences) has an occurrence rate < ``minority_min_rate``.
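In code terms, the validity rule amounts to the check below (a minimal sketch of the rule just stated, not the library’s internal implementation; the function name is hypothetical):

def is_cohort_valid(cohort_y, df, min_cohort_size, min_cohort_pct, minority_min_rate):
    # cohort_y: pandas Series with the cohort's labels; df: the full dataset
    # Rule 1: the cohort must be large enough
    if cohort_y.shape[0] < max(min_cohort_size, df.shape[0] * min_cohort_pct):
        return False
    # Rule 2: the minority class must occur often enough within the cohort
    minority_rate = cohort_y.value_counts(normalize=True).min()
    return minority_rate >= minority_min_rate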
When it encounters invalid cohorts, the DecoupledClass fit method uses a few techniques to create valid cohorts from the invalid ones. So how do we use the DecoupledClass to handle invalid cohorts? For the remainder of this case study, we’ll explore the cohorts of the “country” column to demonstrate these techniques.
Merging Invalid Cohorts
First, let’s look at merging invalid cohorts. This technique creates valid cohorts from invalid ones: each invalid cohort is merged with the smallest cohort other than itself, roughly as sketched below.
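The heuristic can be illustrated as follows (a simplified sketch; the actual merge logic inside DecoupledClass may differ in details such as tie-breaking and repeated merging):

def merge_invalid_cohorts(cohorts, is_valid):
    # cohorts: dict mapping cohort name -> DataFrame with that cohort's rows
    # is_valid: callable implementing the validity rule shown earlier
    merged = dict(cohorts)
    for name in list(merged):
        if name not in merged or is_valid(merged[name]):
            continue
        # Merge the invalid cohort into the smallest of the other cohorts
        partner = min((c for c in merged if c != name), key=lambda c: len(merged[c]))
        merged[partner] = pd.concat([merged[partner], merged[name]])
        del merged[name]
    return merged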
[9]:
preprocessing = [dp.BasicImputer(verbose=False), dp.DataMinMaxScaler(verbose=False), dp.EncoderOHE(verbose=False)]
dec_class = DecoupledClass(
    cohort_col=["country"],
    min_cohort_pct=0.15,
    minority_min_rate=0.15,
    transform_pipe=preprocessing,
    estimator=get_model()
)
dec_class.fit(X_train, y_train)
dec_class.print_cohorts()
FINAL COHORTS
cohort_0:
    Size: 5940
    Query:
        (`country` == "A")
    Value Counts:
        1: 3612 (60.81%)
        0: 2328 (39.19%)
    Invalid: False

cohort_1:
    Size: 1060
    Query:
        ((`country` == "B")) or ((`country` == "C"))
    Value Counts:
        0: 578 (54.53%)
        1: 482 (45.47%)
    Invalid: False
Using the “country” column above created 2 cohorts. Initially, we might have expected 3, one for each unique value (“A”, “B”, “C”); however, the DecoupledClass found invalid cohorts and merged the (country=="B") and (country=="C") cohorts into a single valid one. Now let’s see how this setup performs over the test set.
[11]:
th_dict = dec_class.get_threasholds_dict()
pred = dec_class.predict_proba(X_test)
fetch_cohort_results(X_test, y_test, pred, cohort_def=dec_class, fixed_th=th_dict)
[11]:
| | cohort | cht_query | roc | precision | recall | f1 | accuracy | threshold | num_pos | %_pos | cht_size |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | all | all | 0.946290 | 0.927526 | 0.919864 | 0.923168 | 0.926000 | 0.500000 | 1822 | 0.607333 | 3000 |
| 1 | cohort_0 | (`country` == "A") | 0.945985 | 0.930449 | 0.916938 | 0.922738 | 0.927734 | 0.525770 | 1643 | 0.641797 | 2560 |
| 2 | cohort_1 | ((`country` == "B")) or ((`country` == "C")) | 0.941379 | 0.919319 | 0.917429 | 0.918328 | 0.920455 | 0.499605 | 183 | 0.415909 | 440 |
Comparing the metrics of these merged cohorts to the baseline, merging the invalid cohort (country=="B") with the cohort (country=="C") appears to have a positive effect on (country=="B") in this case, slightly improving its label distribution and, with it, its accuracy.
Transfer Learning
Next, let’s explore the other technique that distinguishes the decoupled classifiers: transfer learning.
In this approach, when the fit() method encounters an invalid cohort, it borrows data from other cohorts (the “out-data”) and down-weights those borrowed instances to create a valid cohort.
To use transfer learning with the DecoupledClass module, we simply need to pass a theta value. theta can be a fixed float, a list of floats (the best value in the list is found using cross-validation), or the boolean True to use a default list of floats optimized through cross-validation. If you’d like to learn more about how the out-data is selected and how to use the different types of theta with transfer learning, see the tutorial notebook for the decoupled classifiers.
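The core idea behind theta can be sketched with instance weights (an illustrative simplification assuming an estimator that accepts sample_weight, such as LGBMClassifier; this is not the DecoupledClass internals, and the function name is hypothetical):

def fit_with_out_data(estimator, X_in, y_in, X_out, y_out, theta):
    # In-cohort instances keep full weight; out-data instances are
    # down-weighted by theta (0 < theta < 1)
    X_all = pd.concat([X_in, X_out])
    y_all = pd.concat([y_in, y_out])
    weights = np.concatenate([np.ones(len(X_in)), np.full(len(X_out), theta)])
    estimator.fit(X_all, y_all, sample_weight=weights)
    return estimator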
Let’s take a look at how transfer learning handles the invalid cohorts in our case:
[12]:
preprocessing = [dp.BasicImputer(verbose=False), dp.DataMinMaxScaler(verbose=False), dp.EncoderOHE(verbose=False)]
dec_class = DecoupledClass(
    cohort_col=["country"],
    theta=True,
    min_fold_size_theta=5,
    min_cohort_pct=0.2,
    minority_min_rate=0.15,
    transform_pipe=preprocessing,
    estimator=get_model()
)
dec_class.fit(X_train, y_train)
dec_class.print_cohorts()
FINAL COHORTS
cohort_0:
    Size: 5940
    Query:
        (`country` == "A")
    Value Counts:
        1: 3612 (60.81%)
        0: 2328 (39.19%)
    Invalid: False

cohort_1:
    Size: 347
    Query:
        (`country` == "B")
    Value Counts:
        0: 190 (54.76%)
        1: 157 (45.24%)
    Invalid: True
    Cohorts used as outside data: ['cohort_0', 'cohort_2']
    Theta = 0.4

cohort_2:
    Size: 713
    Query:
        (`country` == "C")
    Value Counts:
        0: 388 (54.42%)
        1: 325 (45.58%)
    Invalid: True
    Cohorts used as outside data: ['cohort_0', 'cohort_1']
    Theta = 0.6
[13]:
th_dict = dec_class.get_threasholds_dict()
pred = dec_class.predict_proba(X_test)
fetch_cohort_results(X_test, y_test, pred, cohort_def=dec_class, fixed_th=th_dict)
[13]:
| | cohort | cht_query | roc | precision | recall | f1 | accuracy | threshold | num_pos | %_pos | cht_size |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | all | all | 0.947784 | 0.930492 | 0.921226 | 0.925124 | 0.928000 | 0.500000 | 1834 | 0.611333 | 3000 |
| 1 | cohort_0 | (`country` == "A") | 0.945985 | 0.930449 | 0.916938 | 0.922738 | 0.927734 | 0.525770 | 1643 | 0.641797 | 2560 |
| 2 | cohort_1 | (`country` == "B") | 0.953976 | 0.911828 | 0.923049 | 0.916697 | 0.921569 | 0.475236 | 60 | 0.392157 | 153 |
| 3 | cohort_2 | (`country` == "C") | 0.945566 | 0.922287 | 0.923322 | 0.922759 | 0.923345 | 0.594635 | 132 | 0.459930 | 287 |
Comparing these results to the metrics of the baseline’s valid and invalid cohorts, transfer learning appears to have made a positive, though less pronounced, difference for cohort (country=="B") in this case.
Optimizing Fairness Metrics
Lastly, the DecoupledClass offers the option to optimize all models according to a fairness metric. The available fairness losses are “balanced”, “num_parity”, and “dem_parity”; for a more detailed look at these metrics and how to use them, see the tutorial notebook for the decoupled classifiers.
In this case, we’ll explore this feature over our cohorts using the “dem_parity” metric.
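The “dem_parity” loss is based on demographic parity, which pushes the positive-prediction rate to be similar across cohorts. One simple way to quantify a violation is the spread of those rates (an illustrative sketch, not the exact loss the library optimizes; the function name is hypothetical):

def dem_parity_gap(y_pred, cohort_labels):
    # y_pred: hard 0/1 predictions; cohort_labels: cohort of each instance.
    # Positive-prediction rate per cohort, aligned by position
    rates = pd.Series(y_pred).groupby(pd.Series(cohort_labels)).mean()
    # Gap between the most- and least-favored cohorts (0 = perfect parity)
    return rates.max() - rates.min()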
[14]:
preprocessing = [dp.BasicImputer(verbose=False), dp.DataMinMaxScaler(verbose=False), dp.EncoderOHE(verbose=False)]
dec_class = DecoupledClass(
    cohort_col=["country"],
    transform_pipe=preprocessing,
    estimator=get_model(),
    minority_min_rate=0.2,
    min_cohort_pct=0.15,
    theta=False,
    fairness_loss="dem_parity",
    lambda_coef=0.8,
    max_joint_loss_time=2000
)
dec_class.fit(X_train, y_train)
pred = dec_class.predict_proba(X_test)
fetch_cohort_results(X_test, y_test, pred, cohort_def=dec_class, fixed_th=True)
[14]:
| | cohort | cht_query | roc | precision | recall | f1 | accuracy | threshold | num_pos | %_pos | cht_size |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | all | all | 0.946290 | 0.927526 | 0.919864 | 0.923168 | 0.926000 | 0.500000 | 1822 | 0.607333 | 3000 |
| 1 | cohort_0 | (`country` == "A") | 0.945985 | 0.886854 | 0.902609 | 0.891498 | 0.894531 | 0.829787 | 1420 | 0.554688 | 2560 |
| 2 | cohort_1 | ((`country` == "B")) or ((`country` == "C")) | 0.941379 | 0.845511 | 0.852320 | 0.838335 | 0.838636 | 0.151803 | 235 | 0.534091 | 440 |
Using fairness optimization in addition to merging cohorts (country=="B") and (country=="C") yields a noticeably more even positive-prediction rate across cohorts (%_pos of 0.55 vs. 0.53, compared to 0.64 vs. 0.42 with merging alone). However, one should pay attention to the trade-off between accuracy and label distribution when borrowing training data from other cohorts, as the merging capability of this tool does: here, accuracy drops for both cohorts.