Decoupled Classifiers Case Study 3

This notebook follows an approach similar to the one used in the Decoupled Classifiers Case Study 2 notebook.

[16]:
import sys
sys.path.append('../../../../notebooks')

import pandas as pd
import numpy as np
import random

from sklearn.tree import DecisionTreeClassifier
from lightgbm import LGBMClassifier

from raimitigations.utils import split_data
import raimitigations.dataprocessing as dp
from raimitigations.cohort import DecoupledClass, CohortDefinition, CohortManager, fetch_cohort_results, plot_value_counts_cohort
from sklearn.pipeline import Pipeline
from download import download_datasets

SEED = 100

Load and split the data into train and test sets:

[17]:
data_dir = '../../../../datasets/'
download_datasets(data_dir)
df = pd.read_csv(data_dir + 'hr_promotion/train.csv')
df.drop(columns=['employee_id'], inplace=True)
label_col = 'is_promoted'

X_train, X_test, y_train, y_test = split_data(df, label_col, test_size=0.3, random_state=SEED)

df

[17]:
| | department | region | education | gender | recruitment_channel | no_of_trainings | age | previous_year_rating | length_of_service | KPIs_met >80% | awards_won? | avg_training_score | is_promoted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Sales & Marketing | region_7 | Master's & above | f | sourcing | 1 | 35 | 5.0 | 8 | 1 | 0 | 49 | 0 |
| 1 | Operations | region_22 | Bachelor's | m | other | 1 | 30 | 5.0 | 4 | 0 | 0 | 60 | 0 |
| 2 | Sales & Marketing | region_19 | Bachelor's | m | sourcing | 1 | 34 | 3.0 | 7 | 0 | 0 | 50 | 0 |
| 3 | Sales & Marketing | region_23 | Bachelor's | m | other | 2 | 39 | 1.0 | 10 | 0 | 0 | 50 | 0 |
| 4 | Technology | region_26 | Bachelor's | m | other | 1 | 45 | 3.0 | 2 | 0 | 0 | 73 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 54803 | Technology | region_14 | Bachelor's | m | sourcing | 1 | 48 | 3.0 | 17 | 0 | 0 | 78 | 0 |
| 54804 | Operations | region_27 | Master's & above | f | other | 1 | 37 | 2.0 | 6 | 0 | 0 | 56 | 0 |
| 54805 | Analytics | region_1 | Bachelor's | m | other | 1 | 27 | 5.0 | 3 | 1 | 0 | 79 | 0 |
| 54806 | Sales & Marketing | region_9 | NaN | m | sourcing | 1 | 29 | 1.0 | 2 | 0 | 0 | 45 | 0 |
| 54807 | HR | region_22 | Bachelor's | m | other | 1 | 27 | 1.0 | 5 | 0 | 0 | 49 | 0 |

54808 rows × 13 columns

[18]:
def get_model():
    # Alternative estimator kept here for quick experimentation:
    # model = DecisionTreeClassifier(max_features="sqrt")
    model = LGBMClassifier(random_state=SEED)
    return model

Baseline Pipeline

This dataset is the same one used in the Cohort Case Study 3 notebook, and as shown there, the department column is the feature for which we can identify some disparity between cohorts. Here, we'll once again treat this feature as a sensitive feature.
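Before building any pipeline, a quick way to see this disparity directly is to compute the positive-label rate of each department with pandas. A minimal sketch, using only the df and label_col variables defined above:

# Inspect the promotion rate and size of each department cohort
dept_stats = df.groupby("department")[label_col].agg(["mean", "count"])
print(dept_stats.sort_values("mean"))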

As our first experiment, let’s create a simple pipeline for our baseline:

[19]:
pipe = Pipeline([
    ("imputer", dp.BasicImputer(verbose=False)),
    ("scaler", dp.DataStandardScaler(verbose=False)),
    ("encoder", dp.EncoderOHE(verbose=False)),
    ("estimator", get_model())
])

pipe.fit(X_train, y_train)
pred = pipe.predict_proba(X_test)

pred_train = pipe.predict_proba(X_train)
_, th_dict = fetch_cohort_results(X_train, y_train, pred_train, cohort_col=["department"], return_th_dict=True)
fetch_cohort_results(X_test, y_test, pred, cohort_col=["department"], fixed_th=th_dict)
[19]:
| | cohort | cht_query | roc | precision | recall | f1 | accuracy | threshold | num_pos | %_pos | cht_size |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | all | all | 0.910026 | 0.633798 | 0.808795 | 0.658027 | 0.816092 | 0.122429 | 3864 | 0.234994 | 16443 |
| 1 | cohort_0 | (`department` == "Analytics") | 0.815071 | 0.615072 | 0.737173 | 0.633686 | 0.803662 | 0.144258 | 357 | 0.225379 | 1584 |
| 2 | cohort_1 | (`department` == "Finance") | 0.928899 | 0.642252 | 0.818997 | 0.660270 | 0.791501 | 0.099002 | 210 | 0.278884 | 753 |
| 3 | cohort_2 | (`department` == "HR") | 0.924777 | 0.602336 | 0.845885 | 0.610867 | 0.794034 | 0.096400 | 179 | 0.254261 | 704 |
| 4 | cohort_3 | (`department` == "Legal") | 0.909836 | 0.579581 | 0.763503 | 0.581716 | 0.787037 | 0.107423 | 78 | 0.240741 | 324 |
| 5 | cohort_4 | (`department` == "Operations") | 0.899921 | 0.621445 | 0.797078 | 0.632185 | 0.777155 | 0.107618 | 953 | 0.281287 | 3388 |
| 6 | cohort_5 | (`department` == "Procurement") | 0.914249 | 0.636048 | 0.823387 | 0.654546 | 0.796365 | 0.137644 | 576 | 0.268406 | 2146 |
| 7 | cohort_6 | (`department` == "R&D") | 0.721571 | 0.551094 | 0.633618 | 0.546714 | 0.768166 | 0.113184 | 66 | 0.228374 | 289 |
| 8 | cohort_7 | (`department` == "Sales & Marketing") | 0.945638 | 0.658135 | 0.868133 | 0.698209 | 0.860826 | 0.129820 | 999 | 0.193642 | 5159 |
| 9 | cohort_8 | (`department` == "Technology") | 0.879545 | 0.629866 | 0.778378 | 0.644261 | 0.780057 | 0.132835 | 582 | 0.277672 | 2096 |

Using the CohortManager

Let's now see if we can improve the results for each cohort by building a separate pipeline per cohort with the CohortManager class, similar to what we did in the Cohort Case Study 3 notebook:

[20]:
cht_manager = CohortManager(
    transform_pipe=[
        dp.BasicImputer(verbose=False),
        dp.DataStandardScaler(verbose=False),
        dp.EncoderOHE(drop=False, unknown_err=False, verbose=False),
        get_model()
    ],
    cohort_col=["department"]
)
cht_manager.fit(X_train, y_train)
pred_cht = cht_manager.predict_proba(X_test)

pred_train = cht_manager.predict_proba(X_train)
metrics_train, th_dict = fetch_cohort_results(X_train, y_train, pred_train, cohort_col=["department"], return_th_dict=True)
fetch_cohort_results(X_test, y_test, pred_cht, cohort_col=["department"], fixed_th=th_dict)
[20]:
| | cohort | cht_query | roc | precision | recall | f1 | accuracy | threshold | num_pos | %_pos | cht_size |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | all | all | 0.897907 | 0.666922 | 0.740960 | 0.693698 | 0.882807 | 0.178975 | 2123 | 0.129113 | 16443 |
| 1 | cohort_0 | (`department` == "Analytics") | 0.801956 | 0.610089 | 0.668797 | 0.628000 | 0.839015 | 0.169265 | 243 | 0.153409 | 1584 |
| 2 | cohort_1 | (`department` == "Finance") | 0.915831 | 0.853684 | 0.744484 | 0.786259 | 0.934927 | 0.500199 | 50 | 0.066401 | 753 |
| 3 | cohort_2 | (`department` == "HR") | 0.894188 | 0.788518 | 0.682923 | 0.721861 | 0.948864 | 0.420675 | 26 | 0.036932 | 704 |
| 4 | cohort_3 | (`department` == "Legal") | 0.911130 | 0.894654 | 0.629940 | 0.687961 | 0.953704 | 0.807405 | 6 | 0.018519 | 324 |
| 5 | cohort_4 | (`department` == "Operations") | 0.894509 | 0.666407 | 0.751623 | 0.695174 | 0.872491 | 0.162195 | 496 | 0.146399 | 3388 |
| 6 | cohort_5 | (`department` == "Procurement") | 0.898282 | 0.674369 | 0.738740 | 0.698804 | 0.881640 | 0.223448 | 279 | 0.130009 | 2146 |
| 7 | cohort_6 | (`department` == "R&D") | 0.588664 | 0.463415 | 0.496269 | 0.479279 | 0.920415 | 0.742253 | 2 | 0.006920 | 289 |
| 8 | cohort_7 | (`department` == "Sales & Marketing") | 0.938696 | 0.670041 | 0.787935 | 0.707151 | 0.893196 | 0.165917 | 674 | 0.130645 | 5159 |
| 9 | cohort_8 | (`department` == "Technology") | 0.877133 | 0.662127 | 0.727013 | 0.685259 | 0.857824 | 0.201117 | 325 | 0.155057 | 2096 |

We can see that simply training one model per cohort, with cohorts defined by the department column, already improves several metrics.
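To quantify this improvement, one option is to capture the DataFrames returned by fetch_cohort_results in the two previous cells and join them on the cohort column. A minimal sketch, assuming the test-set results were saved as df_baseline and df_cht instead of just being displayed:

# Hypothetical comparison of per-cohort F1 between the baseline pipeline and
# the CohortManager run (df_baseline and df_cht are assumed to hold the
# test-set DataFrames returned by fetch_cohort_results)
comparison = df_baseline[["cohort", "f1"]].merge(
    df_cht[["cohort", "f1"]], on="cohort", suffixes=("_baseline", "_cht")
)
comparison["f1_delta"] = comparison["f1_cht"] - comparison["f1_baseline"]
print(comparison)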

DecoupledClassifier - Transfer Learning

Let's see if we can improve the "department" cohorts using the DecoupledClass. We'll use transfer learning, but for now we won't use any fairness optimization. In the DecoupledClass, transfer learning is applied to cohorts considered too small or too imbalanced (controlled here by min_cohort_pct and minority_min_rate): those cohorts are also trained over the data from the other cohorts, with the outside instances weighted by \(\theta\). Since we pass a list of values to the theta parameter, the best value is selected via cross-validation.

[9]:
preprocessing = [
    dp.BasicImputer(verbose=False),
    dp.DataMinMaxScaler(verbose=False),
    dp.EncoderOHE(drop=False, unknown_err=False, verbose=False)
]

dec_class = DecoupledClass(
    cohort_col=["department"],
    transform_pipe=preprocessing,
    estimator=get_model(),
    minority_min_rate=0.03,
    min_cohort_pct=0.05,
    theta=[0.2, 0.5, 0.8],
)
dec_class.fit(X_train, y_train)

pred = dec_class.predict_proba(X_test)
fetch_cohort_results(X_test, y_test, pred, cohort_def=dec_class, fixed_th=True)
[9]:
| | cohort | cht_query | roc | precision | recall | f1 | accuracy | threshold | num_pos | %_pos | cht_size |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | all | all | 0.901364 | 0.885156 | 0.675796 | 0.733884 | 0.939001 | 0.500000 | 607 | 0.036915 | 16443 |
| 1 | cohort_0 | (`department` == "Analytics") | 0.801923 | 0.648600 | 0.611458 | 0.626030 | 0.888889 | 0.285810 | 108 | 0.068182 | 1584 |
| 2 | cohort_1 | (`department` == "Finance") | 0.928840 | 0.883305 | 0.760029 | 0.806662 | 0.941567 | 0.237477 | 49 | 0.065073 | 753 |
| 3 | cohort_2 | (`department` == "HR") | 0.910229 | 0.803801 | 0.649475 | 0.696304 | 0.948864 | 0.237990 | 20 | 0.028409 | 704 |
| 4 | cohort_3 | (`department` == "Legal") | 0.917688 | 0.733660 | 0.722088 | 0.727695 | 0.941358 | 0.227371 | 18 | 0.055556 | 324 |
| 5 | cohort_4 | (`department` == "Operations") | 0.894497 | 0.756851 | 0.721753 | 0.737574 | 0.919126 | 0.270537 | 262 | 0.077332 | 3388 |
| 6 | cohort_5 | (`department` == "Procurement") | 0.898306 | 0.752563 | 0.721805 | 0.735838 | 0.917987 | 0.342704 | 169 | 0.078751 | 2146 |
| 7 | cohort_6 | (`department` == "R&D") | 0.699893 | 0.545614 | 0.541578 | 0.543401 | 0.882353 | 0.226767 | 19 | 0.065744 | 289 |
| 8 | cohort_7 | (`department` == "Sales & Marketing") | 0.938689 | 0.777230 | 0.739770 | 0.756811 | 0.939135 | 0.272814 | 319 | 0.061834 | 5159 |
| 9 | cohort_8 | (`department` == "Technology") | 0.877133 | 0.683276 | 0.678080 | 0.680620 | 0.882156 | 0.299429 | 212 | 0.101145 | 2096 |
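Before moving on, it can be useful to check which cohorts actually resorted to transfer learning and which \(\theta\) value was chosen for each of them. A minimal sketch, assuming the print_cohorts() helper exposed by the raimitigations cohort classes:

# Print each cohort's query, size, and (when transfer learning was used)
# the theta value selected via cross-validation
dec_class.print_cohorts()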

DecoupledClassifier - Transfer Learning + Fairness Optimization

For our final experiment, we try adding fairness optimization to our DecoupledClass object. This time, we'll optimize Numerical Parity, which aims to make the number of positive predictions for each cohort as equal as possible. We'll set \(\lambda\) to a small value to give more weight to the fairness metric. This way, we expect the estimator to sacrifice a considerable amount of performance (precision, recall, F1, and accuracy) in favor of fairness (a more equal number of positive labels across cohorts).
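Concretely, assuming the joint loss takes the usual convex-combination form

\[L_{joint} = \lambda \cdot L_{performance} + (1 - \lambda) \cdot L_{fairness},\]

setting lambda_coef=0.2 places most of the weight (0.8) on the fairness term.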

[15]:
preprocessing = [
    dp.BasicImputer(verbose=False),
    dp.DataMinMaxScaler(verbose=False),
    dp.EncoderOHE(drop=False, unknown_err=False, verbose=False)
]

dec_class = DecoupledClass(
    cohort_col=["department"],
    transform_pipe=preprocessing,
    estimator=get_model(),
    minority_min_rate=0.03,
    min_cohort_pct=0.05,
    theta=[0.2, 0.5, 0.8],
    fairness_loss="num_parity",
    lambda_coef=0.2,
    max_joint_loss_time=600,
)
dec_class.fit(X_train, y_train)

pred = dec_class.predict_proba(X_test)
fetch_cohort_results(X_test, y_test, pred, cohort_def=dec_class, fixed_th=True)
[15]:
| | cohort | cht_query | roc | precision | recall | f1 | accuracy | threshold | num_pos | %_pos | cht_size |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | all | all | 0.901364 | 0.885156 | 0.675796 | 0.733884 | 0.939001 | 0.500000 | 607 | 0.036915 | 16443 |
| 1 | cohort_0 | (`department` == "Analytics") | 0.801923 | 0.659641 | 0.613547 | 0.630870 | 0.892677 | 0.295972 | 102 | 0.064394 | 1584 |
| 2 | cohort_1 | (`department` == "Finance") | 0.928840 | 0.665214 | 0.838909 | 0.695979 | 0.827357 | 0.087748 | 183 | 0.243028 | 753 |
| 3 | cohort_2 | (`department` == "HR") | 0.910229 | 0.591032 | 0.830024 | 0.585841 | 0.764205 | 0.064236 | 200 | 0.284091 | 704 |
| 4 | cohort_3 | (`department` == "Legal") | 0.917688 | 0.542793 | 0.667213 | 0.329452 | 0.373457 | 0.001128 | 222 | 0.685185 | 324 |
| 5 | cohort_4 | (`department` == "Operations") | 0.894497 | 0.881685 | 0.701623 | 0.757745 | 0.938312 | 0.436915 | 155 | 0.045750 | 3388 |
| 6 | cohort_5 | (`department` == "Procurement") | 0.898306 | 0.807227 | 0.719237 | 0.753797 | 0.930103 | 0.402534 | 135 | 0.062908 | 2146 |
| 7 | cohort_6 | (`department` == "R&D") | 0.699893 | 0.549763 | 0.645522 | 0.315951 | 0.342561 | 0.017962 | 211 | 0.730104 | 289 |
| 8 | cohort_7 | (`department` == "Sales & Marketing") | 0.938689 | 0.956582 | 0.707043 | 0.778220 | 0.956387 | 0.611425 | 162 | 0.031401 | 5159 |
| 9 | cohort_8 | (`department` == "Technology") | 0.877133 | 0.735181 | 0.650417 | 0.679839 | 0.901240 | 0.420690 | 134 | 0.063931 | 2096 |

As we can see in the previous results, there was a huge drop in the performance metrics of certain cohorts in order to make the estimator as fair as possible according to the Numerical Parity metric. This approach must be used with care: we should always consider the nature of the dataset and how important performance and fairness each are for the problem at hand. We could also try different values of \(\lambda\) to find a better balance between performance and numerical parity, as sketched below.
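As a starting point for that search, a simple sweep over candidate \(\lambda\) values could look like the following sketch, which refits the decoupled classifier for each value and collects the resulting per-cohort metrics:

# Hypothetical lambda sweep: refit the decoupled classifier for several
# lambda_coef values and store each run's per-cohort test metrics
results = {}
for lam in [0.1, 0.2, 0.5, 0.8]:
    dec = DecoupledClass(
        cohort_col=["department"],
        transform_pipe=[
            dp.BasicImputer(verbose=False),
            dp.DataMinMaxScaler(verbose=False),
            dp.EncoderOHE(drop=False, unknown_err=False, verbose=False),
        ],
        estimator=get_model(),
        minority_min_rate=0.03,
        min_cohort_pct=0.05,
        theta=[0.2, 0.5, 0.8],
        fairness_loss="num_parity",
        lambda_coef=lam,
        max_joint_loss_time=600,
    )
    dec.fit(X_train, y_train)
    pred = dec.predict_proba(X_test)
    results[lam] = fetch_cohort_results(
        X_test, y_test, pred, cohort_def=dec, fixed_th=True
    )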