Decoupled Classifiers Case Study 3

This notebook follows an approach similar to the one used in the Decoupled Classifiers Case Study 2 notebook.

[16]:
import sys
sys.path.append('../../../../notebooks')

import pandas as pd
import numpy as np
import random

from sklearn.tree import DecisionTreeClassifier
from lightgbm import LGBMClassifier

from raimitigations.utils import split_data
import raimitigations.dataprocessing as dp
from raimitigations.cohort import DecoupledClass, CohortDefinition, CohortManager, fetch_cohort_results, plot_value_counts_cohort
from sklearn.pipeline import Pipeline
from download import download_datasets

SEED = 100

Load and split the data into train and test sets:

[17]:
data_dir = '../../../../datasets/'
download_datasets(data_dir)
df = pd.read_csv(data_dir + 'hr_promotion/train.csv')
df.drop(columns=['employee_id'], inplace=True)
label_col = 'is_promoted'

X_train, X_test, y_train, y_test = split_data(df, label_col, test_size=0.3, random_state=SEED)

df

[17]:
| | department | region | education | gender | recruitment_channel | no_of_trainings | age | previous_year_rating | length_of_service | KPIs_met >80% | awards_won? | avg_training_score | is_promoted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Sales & Marketing | region_7 | Master's & above | f | sourcing | 1 | 35 | 5.0 | 8 | 1 | 0 | 49 | 0 |
| 1 | Operations | region_22 | Bachelor's | m | other | 1 | 30 | 5.0 | 4 | 0 | 0 | 60 | 0 |
| 2 | Sales & Marketing | region_19 | Bachelor's | m | sourcing | 1 | 34 | 3.0 | 7 | 0 | 0 | 50 | 0 |
| 3 | Sales & Marketing | region_23 | Bachelor's | m | other | 2 | 39 | 1.0 | 10 | 0 | 0 | 50 | 0 |
| 4 | Technology | region_26 | Bachelor's | m | other | 1 | 45 | 3.0 | 2 | 0 | 0 | 73 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 54803 | Technology | region_14 | Bachelor's | m | sourcing | 1 | 48 | 3.0 | 17 | 0 | 0 | 78 | 0 |
| 54804 | Operations | region_27 | Master's & above | f | other | 1 | 37 | 2.0 | 6 | 0 | 0 | 56 | 0 |
| 54805 | Analytics | region_1 | Bachelor's | m | other | 1 | 27 | 5.0 | 3 | 1 | 0 | 79 | 0 |
| 54806 | Sales & Marketing | region_9 | NaN | m | sourcing | 1 | 29 | 1.0 | 2 | 0 | 0 | 45 | 0 |
| 54807 | HR | region_22 | Bachelor's | m | other | 1 | 27 | 1.0 | 5 | 0 | 0 | 49 | 0 |

54808 rows × 13 columns

[18]:
def get_model():
    # Alternative estimator kept here for quick experimentation:
    # model = DecisionTreeClassifier(max_features="sqrt")
    model = LGBMClassifier(random_state=SEED)
    return model

Baseline Pipeline

This dataset is the same one used in the Cohort Case Study 3 notebook, and as shown there, the department column is the feature for which we can identify some disparity between cohorts. Here, we'll once again treat this feature as a sensitive feature.
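Before building any pipeline, a quick way to see this disparity directly is to compute the positive-label rate of each department with pandas. A minimal sketch, using only the df and label_col variables defined above:

# Inspect the promotion rate and size of each department cohort
dept_stats = df.groupby("department")[label_col].agg(["mean", "count"])
print(dept_stats.sort_values("mean"))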

As our first experiment, let’s create a simple pipeline for our baseline:

[19]:
pipe = Pipeline([
    ("imputer", dp.BasicImputer(verbose=False)),
    ("scaler", dp.DataStandardScaler(verbose=False)),
    ("encoder", dp.EncoderOHE(verbose=False)),
    ("estimator", get_model())
])

pipe.fit(X_train, y_train)
pred = pipe.predict_proba(X_test)

pred_train = pipe.predict_proba(X_train)
_, th_dict = fetch_cohort_results(X_train, y_train, pred_train, cohort_col=["department"], return_th_dict=True)
fetch_cohort_results(X_test, y_test, pred, cohort_col=["department"], fixed_th=th_dict)
[19]:
| | cohort | cht_query | roc | precision | recall | f1 | accuracy | threshold | num_pos | %_pos | cht_size |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | all | all | 0.910026 | 0.633798 | 0.808795 | 0.658027 | 0.816092 | 0.122429 | 3864 | 0.234994 | 16443 |
| 1 | cohort_0 | (`department` == "Analytics") | 0.815071 | 0.615072 | 0.737173 | 0.633686 | 0.803662 | 0.144258 | 357 | 0.225379 | 1584 |
| 2 | cohort_1 | (`department` == "Finance") | 0.928899 | 0.642252 | 0.818997 | 0.660270 | 0.791501 | 0.099002 | 210 | 0.278884 | 753 |
| 3 | cohort_2 | (`department` == "HR") | 0.924777 | 0.602336 | 0.845885 | 0.610867 | 0.794034 | 0.096400 | 179 | 0.254261 | 704 |
| 4 | cohort_3 | (`department` == "Legal") | 0.909836 | 0.579581 | 0.763503 | 0.581716 | 0.787037 | 0.107423 | 78 | 0.240741 | 324 |
| 5 | cohort_4 | (`department` == "Operations") | 0.899921 | 0.621445 | 0.797078 | 0.632185 | 0.777155 | 0.107618 | 953 | 0.281287 | 3388 |
| 6 | cohort_5 | (`department` == "Procurement") | 0.914249 | 0.636048 | 0.823387 | 0.654546 | 0.796365 | 0.137644 | 576 | 0.268406 | 2146 |
| 7 | cohort_6 | (`department` == "R&D") | 0.721571 | 0.551094 | 0.633618 | 0.546714 | 0.768166 | 0.113184 | 66 | 0.228374 | 289 |
| 8 | cohort_7 | (`department` == "Sales & Marketing") | 0.945638 | 0.658135 | 0.868133 | 0.698209 | 0.860826 | 0.129820 | 999 | 0.193642 | 5159 |
| 9 | cohort_8 | (`department` == "Technology") | 0.879545 | 0.629866 | 0.778378 | 0.644261 | 0.780057 | 0.132835 | 582 | 0.277672 | 2096 |

Using the CohortManager

Let's now see if we can improve the results for each cohort by building a separate pipeline per cohort with the CohortManager class, similar to what we did in the Cohort Case Study 3 notebook:

[20]:
cht_manager = CohortManager(
    transform_pipe=[
        dp.BasicImputer(verbose=False),
        dp.DataStandardScaler(verbose=False),
        dp.EncoderOHE(drop=False, unknown_err=False, verbose=False),
        get_model()
    ],
    cohort_col=["department"]
)
cht_manager.fit(X_train, y_train)
pred_cht = cht_manager.predict_proba(X_test)

pred_train = cht_manager.predict_proba(X_train)
metrics_train, th_dict = fetch_cohort_results(X_train, y_train, pred_train, cohort_col=["department"], return_th_dict=True)
fetch_cohort_results(X_test, y_test, pred_cht, cohort_col=["department"], fixed_th=th_dict)
[20]:
| | cohort | cht_query | roc | precision | recall | f1 | accuracy | threshold | num_pos | %_pos | cht_size |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | all | all | 0.897907 | 0.666922 | 0.740960 | 0.693698 | 0.882807 | 0.178975 | 2123 | 0.129113 | 16443 |
| 1 | cohort_0 | (`department` == "Analytics") | 0.801956 | 0.610089 | 0.668797 | 0.628000 | 0.839015 | 0.169265 | 243 | 0.153409 | 1584 |
| 2 | cohort_1 | (`department` == "Finance") | 0.915831 | 0.853684 | 0.744484 | 0.786259 | 0.934927 | 0.500199 | 50 | 0.066401 | 753 |
| 3 | cohort_2 | (`department` == "HR") | 0.894188 | 0.788518 | 0.682923 | 0.721861 | 0.948864 | 0.420675 | 26 | 0.036932 | 704 |
| 4 | cohort_3 | (`department` == "Legal") | 0.911130 | 0.894654 | 0.629940 | 0.687961 | 0.953704 | 0.807405 | 6 | 0.018519 | 324 |
| 5 | cohort_4 | (`department` == "Operations") | 0.894509 | 0.666407 | 0.751623 | 0.695174 | 0.872491 | 0.162195 | 496 | 0.146399 | 3388 |
| 6 | cohort_5 | (`department` == "Procurement") | 0.898282 | 0.674369 | 0.738740 | 0.698804 | 0.881640 | 0.223448 | 279 | 0.130009 | 2146 |
| 7 | cohort_6 | (`department` == "R&D") | 0.588664 | 0.463415 | 0.496269 | 0.479279 | 0.920415 | 0.742253 | 2 | 0.006920 | 289 |
| 8 | cohort_7 | (`department` == "Sales & Marketing") | 0.938696 | 0.670041 | 0.787935 | 0.707151 | 0.893196 | 0.165917 | 674 | 0.130645 | 5159 |
| 9 | cohort_8 | (`department` == "Technology") | 0.877133 | 0.662127 | 0.727013 | 0.685259 | 0.857824 | 0.201117 | 325 | 0.155057 | 2096 |

We can see that simply training one model per cohort, with cohorts defined by the department column, already improves several metrics.
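To quantify this improvement, one option is to capture the DataFrames returned by fetch_cohort_results in the two previous cells and join them on the cohort column. A minimal sketch, assuming the test-set results were saved as df_baseline and df_cht instead of just being displayed:

# Hypothetical comparison of per-cohort F1 between the baseline pipeline and
# the CohortManager run (df_baseline and df_cht are assumed to hold the
# test-set DataFrames returned by fetch_cohort_results)
comparison = df_baseline[["cohort", "f1"]].merge(
    df_cht[["cohort", "f1"]], on="cohort", suffixes=("_baseline", "_cht")
)
comparison["f1_delta"] = comparison["f1_cht"] - comparison["f1_baseline"]
print(comparison)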

DecoupledClassifier - Transfer Learning

Let's see if we can improve the "department" cohorts using the DecoupledClass. We'll use transfer learning, but for now we won't use any fairness optimization. In the DecoupledClass, transfer learning is applied to cohorts considered too small or too imbalanced (controlled here by min_cohort_pct and minority_min_rate): those cohorts are also trained over the data from the other cohorts, with the outside instances weighted by \(\theta\). Since we pass a list of values to the theta parameter, the best value is selected via cross-validation.

[9]:
preprocessing = [
    dp.BasicImputer(verbose=False),
    dp.DataMinMaxScaler(verbose=False),
    dp.EncoderOHE(drop=False, unknown_err=False, verbose=False)
]

dec_class = DecoupledClass(
    cohort_col=["department"],
    transform_pipe=preprocessing,
    estimator=get_model(),
    minority_min_rate=0.03,
    min_cohort_pct=0.05,
    theta=[0.2, 0.5, 0.8],
)
dec_class.fit(X_train, y_train)

pred = dec_class.predict_proba(X_test)
fetch_cohort_results(X_test, y_test, pred, cohort_def=dec_class, fixed_th=True)
[9]:
| | cohort | cht_query | roc | precision | recall | f1 | accuracy | threshold | num_pos | %_pos | cht_size |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | all | all | 0.901364 | 0.885156 | 0.675796 | 0.733884 | 0.939001 | 0.500000 | 607 | 0.036915 | 16443 |
| 1 | cohort_0 | (`department` == "Analytics") | 0.801923 | 0.648600 | 0.611458 | 0.626030 | 0.888889 | 0.285810 | 108 | 0.068182 | 1584 |
| 2 | cohort_1 | (`department` == "Finance") | 0.928840 | 0.883305 | 0.760029 | 0.806662 | 0.941567 | 0.237477 | 49 | 0.065073 | 753 |
| 3 | cohort_2 | (`department` == "HR") | 0.910229 | 0.803801 | 0.649475 | 0.696304 | 0.948864 | 0.237990 | 20 | 0.028409 | 704 |
| 4 | cohort_3 | (`department` == "Legal") | 0.917688 | 0.733660 | 0.722088 | 0.727695 | 0.941358 | 0.227371 | 18 | 0.055556 | 324 |
| 5 | cohort_4 | (`department` == "Operations") | 0.894497 | 0.756851 | 0.721753 | 0.737574 | 0.919126 | 0.270537 | 262 | 0.077332 | 3388 |
| 6 | cohort_5 | (`department` == "Procurement") | 0.898306 | 0.752563 | 0.721805 | 0.735838 | 0.917987 | 0.342704 | 169 | 0.078751 | 2146 |
| 7 | cohort_6 | (`department` == "R&D") | 0.699893 | 0.545614 | 0.541578 | 0.543401 | 0.882353 | 0.226767 | 19 | 0.065744 | 289 |
| 8 | cohort_7 | (`department` == "Sales & Marketing") | 0.938689 | 0.777230 | 0.739770 | 0.756811 | 0.939135 | 0.272814 | 319 | 0.061834 | 5159 |
| 9 | cohort_8 | (`department` == "Technology") | 0.877133 | 0.683276 | 0.678080 | 0.680620 | 0.882156 | 0.299429 | 212 | 0.101145 | 2096 |
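Before moving on, it can be useful to check which cohorts actually resorted to transfer learning and which \(\theta\) value was chosen for each of them. A minimal sketch, assuming the print_cohorts() helper exposed by the raimitigations cohort classes:

# Print each cohort's query, size, and (when transfer learning was used)
# the theta value selected via cross-validation
dec_class.print_cohorts()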

DecoupledClassifier - Transfer Learning + Fairness Optimization

For our final experiment, we try adding fairness optimization to our DecoupledClass object. This time, we'll optimize Numerical Parity, which aims to make the number of positive predictions for each cohort as equal as possible. We'll set \(\lambda\) to a small value to give more weight to the fairness metric. This way, we expect the estimator to sacrifice a considerable amount of performance (precision, recall, F1, and accuracy) in favor of fairness (a more equal number of positive labels across cohorts).
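Concretely, assuming the joint loss takes the usual convex-combination form

\[L_{joint} = \lambda \cdot L_{performance} + (1 - \lambda) \cdot L_{fairness},\]

setting lambda_coef=0.2 places most of the weight (0.8) on the fairness term.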

[15]:
preprocessing = [
    dp.BasicImputer(verbose=False),
    dp.DataMinMaxScaler(verbose=False),
    dp.EncoderOHE(drop=False, unknown_err=False, verbose=False)
]

dec_class = DecoupledClass(
    cohort_col=["department"],
    transform_pipe=preprocessing,
    estimator=get_model(),
    minority_min_rate=0.03,
    min_cohort_pct=0.05,
    theta=[0.2, 0.5, 0.8],
    fairness_loss="num_parity",
    lambda_coef=0.2,
    max_joint_loss_time=600,
)
dec_class.fit(X_train, y_train)

pred = dec_class.predict_proba(X_test)
fetch_cohort_results(X_test, y_test, pred, cohort_def=dec_class, fixed_th=True)
[15]:
| | cohort | cht_query | roc | precision | recall | f1 | accuracy | threshold | num_pos | %_pos | cht_size |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | all | all | 0.901364 | 0.885156 | 0.675796 | 0.733884 | 0.939001 | 0.500000 | 607 | 0.036915 | 16443 |
| 1 | cohort_0 | (`department` == "Analytics") | 0.801923 | 0.659641 | 0.613547 | 0.630870 | 0.892677 | 0.295972 | 102 | 0.064394 | 1584 |
| 2 | cohort_1 | (`department` == "Finance") | 0.928840 | 0.665214 | 0.838909 | 0.695979 | 0.827357 | 0.087748 | 183 | 0.243028 | 753 |
| 3 | cohort_2 | (`department` == "HR") | 0.910229 | 0.591032 | 0.830024 | 0.585841 | 0.764205 | 0.064236 | 200 | 0.284091 | 704 |
| 4 | cohort_3 | (`department` == "Legal") | 0.917688 | 0.542793 | 0.667213 | 0.329452 | 0.373457 | 0.001128 | 222 | 0.685185 | 324 |
| 5 | cohort_4 | (`department` == "Operations") | 0.894497 | 0.881685 | 0.701623 | 0.757745 | 0.938312 | 0.436915 | 155 | 0.045750 | 3388 |
| 6 | cohort_5 | (`department` == "Procurement") | 0.898306 | 0.807227 | 0.719237 | 0.753797 | 0.930103 | 0.402534 | 135 | 0.062908 | 2146 |
| 7 | cohort_6 | (`department` == "R&D") | 0.699893 | 0.549763 | 0.645522 | 0.315951 | 0.342561 | 0.017962 | 211 | 0.730104 | 289 |
| 8 | cohort_7 | (`department` == "Sales & Marketing") | 0.938689 | 0.956582 | 0.707043 | 0.778220 | 0.956387 | 0.611425 | 162 | 0.031401 | 5159 |
| 9 | cohort_8 | (`department` == "Technology") | 0.877133 | 0.735181 | 0.650417 | 0.679839 | 0.901240 | 0.420690 | 134 | 0.063931 | 2096 |

As we can see in the previous results, there was a huge drop in the performance metrics of certain cohorts in order to make the estimator as fair as possible according to the Numerical Parity metric. This approach must be used with care: we should always consider the nature of the dataset and how important performance and fairness each are for the problem at hand. We could also try different values of \(\lambda\) to find a better balance between performance and numerical parity, as sketched below.
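As a starting point for that search, a simple sweep over candidate \(\lambda\) values could look like the following sketch, which refits the decoupled classifier for each value and collects the resulting per-cohort metrics:

# Hypothetical lambda sweep: refit the decoupled classifier for several
# lambda_coef values and store each run's per-cohort test metrics
results = {}
for lam in [0.1, 0.2, 0.5, 0.8]:
    dec = DecoupledClass(
        cohort_col=["department"],
        transform_pipe=[
            dp.BasicImputer(verbose=False),
            dp.DataMinMaxScaler(verbose=False),
            dp.EncoderOHE(drop=False, unknown_err=False, verbose=False),
        ],
        estimator=get_model(),
        minority_min_rate=0.03,
        min_cohort_pct=0.05,
        theta=[0.2, 0.5, 0.8],
        fairness_loss="num_parity",
        lambda_coef=lam,
        max_joint_loss_time=600,
    )
    dec.fit(X_train, y_train)
    pred = dec.predict_proba(X_test)
    results[lam] = fetch_cohort_results(
        X_test, y_test, pred, cohort_def=dec, fixed_th=True
    )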