Decoupled Classifiers Case Study 3
This notebook follows an approach similar to the one used in the Decoupled Classifiers Case Study 2 notebook.
[16]:
import sys
sys.path.append('../../../../notebooks')
import pandas as pd
import numpy as np
import random
from sklearn.tree import DecisionTreeClassifier
from lightgbm import LGBMClassifier
from raimitigations.utils import split_data
import raimitigations.dataprocessing as dp
from raimitigations.cohort import DecoupledClass, CohortDefinition, CohortManager, fetch_cohort_results, plot_value_counts_cohort
from sklearn.pipeline import Pipeline
from download import download_datasets
SEED = 100
Load and split the data into train and test sets:
[17]:
data_dir = '../../../../datasets/'
download_datasets(data_dir)
df = pd.read_csv(data_dir + 'hr_promotion/train.csv')
df.drop(columns=['employee_id'], inplace=True)
label_col = 'is_promoted'
X_train, X_test, y_train, y_test = split_data(df, label_col, test_size=0.3, random_state=SEED)
df
[17]:
| | department | region | education | gender | recruitment_channel | no_of_trainings | age | previous_year_rating | length_of_service | KPIs_met >80% | awards_won? | avg_training_score | is_promoted |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Sales & Marketing | region_7 | Master's & above | f | sourcing | 1 | 35 | 5.0 | 8 | 1 | 0 | 49 | 0 |
1 | Operations | region_22 | Bachelor's | m | other | 1 | 30 | 5.0 | 4 | 0 | 0 | 60 | 0 |
2 | Sales & Marketing | region_19 | Bachelor's | m | sourcing | 1 | 34 | 3.0 | 7 | 0 | 0 | 50 | 0 |
3 | Sales & Marketing | region_23 | Bachelor's | m | other | 2 | 39 | 1.0 | 10 | 0 | 0 | 50 | 0 |
4 | Technology | region_26 | Bachelor's | m | other | 1 | 45 | 3.0 | 2 | 0 | 0 | 73 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
54803 | Technology | region_14 | Bachelor's | m | sourcing | 1 | 48 | 3.0 | 17 | 0 | 0 | 78 | 0 |
54804 | Operations | region_27 | Master's & above | f | other | 1 | 37 | 2.0 | 6 | 0 | 0 | 56 | 0 |
54805 | Analytics | region_1 | Bachelor's | m | other | 1 | 27 | 5.0 | 3 | 1 | 0 | 79 | 0 |
54806 | Sales & Marketing | region_9 | NaN | m | sourcing | 1 | 29 | 1.0 | 2 | 0 | 0 | 45 | 0 |
54807 | HR | region_22 | Bachelor's | m | other | 1 | 27 | 1.0 | 5 | 0 | 0 | 49 | 0 |
54808 rows × 13 columns
[18]:
def get_model():
    # model = DecisionTreeClassifier(max_features="sqrt")
    model = LGBMClassifier(random_state=SEED)
    return model
Baseline Pipeline
This dataset is the same one used in the Cohort Case Study 3 notebook, and as shown there, the department column is the feature for which we can identify some disparity between cohorts. Here, we’ll once again treat this column as a sensitive feature.
As our first experiment, let’s create a simple pipeline for our baseline:
[19]:
pipe = Pipeline([
    ("imputer", dp.BasicImputer(verbose=False)),
    ("scaler", dp.DataStandardScaler(verbose=False)),
    ("encoder", dp.EncoderOHE(verbose=False)),
    ("estimator", get_model())
])
pipe.fit(X_train, y_train)
pred = pipe.predict_proba(X_test)
pred_train = pipe.predict_proba(X_train)
# Compute a probability threshold for each cohort over the training set,
# then reuse those fixed thresholds (fixed_th) when evaluating the test set.
_, th_dict = fetch_cohort_results(X_train, y_train, pred_train, cohort_col=["department"], return_th_dict=True)
fetch_cohort_results(X_test, y_test, pred, cohort_col=["department"], fixed_th=th_dict)
[19]:
| | cohort | cht_query | roc | precision | recall | f1 | accuracy | threshold | num_pos | %_pos | cht_size |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | all | all | 0.910026 | 0.633798 | 0.808795 | 0.658027 | 0.816092 | 0.122429 | 3864 | 0.234994 | 16443 |
1 | cohort_0 | (`department` == "Analytics") | 0.815071 | 0.615072 | 0.737173 | 0.633686 | 0.803662 | 0.144258 | 357 | 0.225379 | 1584 |
2 | cohort_1 | (`department` == "Finance") | 0.928899 | 0.642252 | 0.818997 | 0.660270 | 0.791501 | 0.099002 | 210 | 0.278884 | 753 |
3 | cohort_2 | (`department` == "HR") | 0.924777 | 0.602336 | 0.845885 | 0.610867 | 0.794034 | 0.096400 | 179 | 0.254261 | 704 |
4 | cohort_3 | (`department` == "Legal") | 0.909836 | 0.579581 | 0.763503 | 0.581716 | 0.787037 | 0.107423 | 78 | 0.240741 | 324 |
5 | cohort_4 | (`department` == "Operations") | 0.899921 | 0.621445 | 0.797078 | 0.632185 | 0.777155 | 0.107618 | 953 | 0.281287 | 3388 |
6 | cohort_5 | (`department` == "Procurement") | 0.914249 | 0.636048 | 0.823387 | 0.654546 | 0.796365 | 0.137644 | 576 | 0.268406 | 2146 |
7 | cohort_6 | (`department` == "R&D") | 0.721571 | 0.551094 | 0.633618 | 0.546714 | 0.768166 | 0.113184 | 66 | 0.228374 | 289 |
8 | cohort_7 | (`department` == "Sales & Marketing") | 0.945638 | 0.658135 | 0.868133 | 0.698209 | 0.860826 | 0.129820 | 999 | 0.193642 | 5159 |
9 | cohort_8 | (`department` == "Technology") | 0.879545 | 0.629866 | 0.778378 | 0.644261 | 0.780057 | 0.132835 | 582 | 0.277672 | 2096 |
Using the CohortManager
Let’s now see if we can improve the results for each cohort by building a separate pipeline per cohort using the CohortManager class, similar to what we did in the Cohort Case Study 3 notebook:
[20]:
# One full pipeline (impute -> scale -> one-hot encode -> estimator)
# is created and fitted separately for each department cohort.
cht_manager = CohortManager(
    transform_pipe=[
        dp.BasicImputer(verbose=False),
        dp.DataStandardScaler(verbose=False),
        dp.EncoderOHE(drop=False, unknown_err=False, verbose=False),
        get_model()
    ],
    cohort_col=["department"]
)
cht_manager.fit(X_train, y_train)
pred_cht = cht_manager.predict_proba(X_test)
pred_train = cht_manager.predict_proba(X_train)
metrics_train, th_dict = fetch_cohort_results(X_train, y_train, pred_train, cohort_col=["department"], return_th_dict=True)
fetch_cohort_results(X_test, y_test, pred_cht, cohort_col=["department"], fixed_th=th_dict)
[20]:
| | cohort | cht_query | roc | precision | recall | f1 | accuracy | threshold | num_pos | %_pos | cht_size |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | all | all | 0.897907 | 0.666922 | 0.740960 | 0.693698 | 0.882807 | 0.178975 | 2123 | 0.129113 | 16443 |
1 | cohort_0 | (`department` == "Analytics") | 0.801956 | 0.610089 | 0.668797 | 0.628000 | 0.839015 | 0.169265 | 243 | 0.153409 | 1584 |
2 | cohort_1 | (`department` == "Finance") | 0.915831 | 0.853684 | 0.744484 | 0.786259 | 0.934927 | 0.500199 | 50 | 0.066401 | 753 |
3 | cohort_2 | (`department` == "HR") | 0.894188 | 0.788518 | 0.682923 | 0.721861 | 0.948864 | 0.420675 | 26 | 0.036932 | 704 |
4 | cohort_3 | (`department` == "Legal") | 0.911130 | 0.894654 | 0.629940 | 0.687961 | 0.953704 | 0.807405 | 6 | 0.018519 | 324 |
5 | cohort_4 | (`department` == "Operations") | 0.894509 | 0.666407 | 0.751623 | 0.695174 | 0.872491 | 0.162195 | 496 | 0.146399 | 3388 |
6 | cohort_5 | (`department` == "Procurement") | 0.898282 | 0.674369 | 0.738740 | 0.698804 | 0.881640 | 0.223448 | 279 | 0.130009 | 2146 |
7 | cohort_6 | (`department` == "R&D") | 0.588664 | 0.463415 | 0.496269 | 0.479279 | 0.920415 | 0.742253 | 2 | 0.006920 | 289 |
8 | cohort_7 | (`department` == "Sales & Marketing") | 0.938696 | 0.670041 | 0.787935 | 0.707151 | 0.893196 | 0.165917 | 674 | 0.130645 | 5159 |
9 | cohort_8 | (`department` == "Technology") | 0.877133 | 0.662127 | 0.727013 | 0.685259 | 0.857824 | 0.201117 | 325 | 0.155057 | 2096 |
We can see that simply training one model for each cohort based on the department column already yields an improvement over certain metrics.
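To make this comparison more concrete, we can join the two metric tables and look at the per-cohort deltas. Here’s a minimal sketch, assuming the two DataFrames returned by fetch_cohort_results above had been saved in the hypothetical variables baseline_df and cht_df:

# Hypothetical comparison: assumes the baseline and CohortManager outputs of
# fetch_cohort_results() were saved as baseline_df and cht_df (not done above).
comparison = baseline_df.merge(
    cht_df,
    on=["cohort", "cht_query"],
    suffixes=("_baseline", "_cht"),
)
# Per-cohort change in F1 and accuracy when moving to one model per cohort
comparison["f1_delta"] = comparison["f1_cht"] - comparison["f1_baseline"]
comparison["acc_delta"] = comparison["accuracy_cht"] - comparison["accuracy_baseline"]
print(comparison[["cohort", "f1_delta", "acc_delta"]])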
DecoupledClassifier - Transfer Learning
Let’s see if we can improve the “department” cohorts using the DecoupledClass. We’ll use transfer learning, but for now we won’t use any fairness optimization. With transfer learning, a cohort that is too small to train a reliable model on its own also learns from the data of the other cohorts, with out-of-cohort samples down-weighted by a factor \(\theta\); passing a list of values for theta (as in the cell below) gives DecoupledClass a set of candidates to choose from for each cohort.
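The weighting idea behind this form of transfer learning can be sketched in a few lines. This is only a conceptual illustration under the assumption stated above (out-of-cohort samples down-weighted by \(\theta\)), not the library’s internal code; transfer_weights is a hypothetical helper:

def transfer_weights(in_cohort_mask: np.ndarray, theta: float) -> np.ndarray:
    """Weight 1.0 for the cohort's own rows, theta for borrowed rows."""
    return np.where(in_cohort_mask, 1.0, theta)

# e.g., for the small "Legal" cohort with theta=0.2, borrowed rows would
# contribute 5x less than the cohort's own rows:
# mask = (X_train["department"] == "Legal").to_numpy()
# model.fit(X_encoded, y_train, sample_weight=transfer_weights(mask, 0.2))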
[9]:
preprocessing = [
    dp.BasicImputer(verbose=False),
    dp.DataMinMaxScaler(verbose=False),
    dp.EncoderOHE(drop=False, unknown_err=False, verbose=False)
]
dec_class = DecoupledClass(
    cohort_col=["department"],
    transform_pipe=preprocessing,
    estimator=get_model(),
    minority_min_rate=0.03,   # min fraction of the minority class a cohort must have
    min_cohort_pct=0.05,      # min cohort size, as a fraction of the dataset
    theta=[0.2, 0.5, 0.8],    # candidate out-of-cohort weights for transfer learning
)
dec_class.fit(X_train, y_train)
pred = dec_class.predict_proba(X_test)
fetch_cohort_results(X_test, y_test, pred, cohort_def=dec_class, fixed_th=True)
[9]:
| | cohort | cht_query | roc | precision | recall | f1 | accuracy | threshold | num_pos | %_pos | cht_size |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | all | all | 0.901364 | 0.885156 | 0.675796 | 0.733884 | 0.939001 | 0.500000 | 607 | 0.036915 | 16443 |
1 | cohort_0 | (`department` == "Analytics") | 0.801923 | 0.648600 | 0.611458 | 0.626030 | 0.888889 | 0.285810 | 108 | 0.068182 | 1584 |
2 | cohort_1 | (`department` == "Finance") | 0.928840 | 0.883305 | 0.760029 | 0.806662 | 0.941567 | 0.237477 | 49 | 0.065073 | 753 |
3 | cohort_2 | (`department` == "HR") | 0.910229 | 0.803801 | 0.649475 | 0.696304 | 0.948864 | 0.237990 | 20 | 0.028409 | 704 |
4 | cohort_3 | (`department` == "Legal") | 0.917688 | 0.733660 | 0.722088 | 0.727695 | 0.941358 | 0.227371 | 18 | 0.055556 | 324 |
5 | cohort_4 | (`department` == "Operations") | 0.894497 | 0.756851 | 0.721753 | 0.737574 | 0.919126 | 0.270537 | 262 | 0.077332 | 3388 |
6 | cohort_5 | (`department` == "Procurement") | 0.898306 | 0.752563 | 0.721805 | 0.735838 | 0.917987 | 0.342704 | 169 | 0.078751 | 2146 |
7 | cohort_6 | (`department` == "R&D") | 0.699893 | 0.545614 | 0.541578 | 0.543401 | 0.882353 | 0.226767 | 19 | 0.065744 | 289 |
8 | cohort_7 | (`department` == "Sales & Marketing") | 0.938689 | 0.777230 | 0.739770 | 0.756811 | 0.939135 | 0.272814 | 319 | 0.061834 | 5159 |
9 | cohort_8 | (`department` == "Technology") | 0.877133 | 0.683276 | 0.678080 | 0.680620 | 0.882156 | 0.299429 | 212 | 0.101145 | 2096 |
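Before adding fairness optimization, note in the %_pos column above that the rate of positive predictions still varies considerably across cohorts (from roughly 2.8% for HR to 10.1% for Technology). A quick way to quantify that spread, assuming the result DataFrame above had been saved in a hypothetical variable results_df:

# Spread of positive-prediction rates across cohorts (excluding the "all" row);
# assumes the fetch_cohort_results() output above was saved as results_df.
pos_rates = results_df.loc[results_df["cohort"] != "all", "%_pos"]
print(f"min={pos_rates.min():.3f}  max={pos_rates.max():.3f}  std={pos_rates.std():.3f}")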
DecoupledClassifier - Transfer Learning + Fairness Optimization
For our final experiment, we’ll try adding fairness optimization to our DecoupledClass object. This time, we’ll optimize for Numerical Parity, which aims to make the number of positive labels as equal as possible across cohorts. We’ll set \(\lambda\) to a small value to give more weight to the fairness term: the joint loss weights the performance loss by \(\lambda\) and the fairness loss by \(1-\lambda\). This way, we expect the estimator to sacrifice a considerable amount of performance (precision, recall, F1, and accuracy) in exchange for fairness (an equal number of positive labels).
[15]:
preprocessing = [
    dp.BasicImputer(verbose=False),
    dp.DataMinMaxScaler(verbose=False),
    dp.EncoderOHE(drop=False, unknown_err=False, verbose=False)
]
dec_class = DecoupledClass(
    cohort_col=["department"],
    transform_pipe=preprocessing,
    estimator=get_model(),
    minority_min_rate=0.03,
    min_cohort_pct=0.05,
    theta=[0.2, 0.5, 0.8],
    fairness_loss="num_parity",   # optimize for Numerical Parity
    lambda_coef=0.2,              # small lambda -> more weight on the fairness loss
    max_joint_loss_time=600,      # time budget for optimizing the joint loss
)
dec_class.fit(X_train, y_train)
pred = dec_class.predict_proba(X_test)
fetch_cohort_results(X_test, y_test, pred, cohort_def=dec_class, fixed_th=True)
[15]:
| | cohort | cht_query | roc | precision | recall | f1 | accuracy | threshold | num_pos | %_pos | cht_size |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | all | all | 0.901364 | 0.885156 | 0.675796 | 0.733884 | 0.939001 | 0.500000 | 607 | 0.036915 | 16443 |
1 | cohort_0 | (`department` == "Analytics") | 0.801923 | 0.659641 | 0.613547 | 0.630870 | 0.892677 | 0.295972 | 102 | 0.064394 | 1584 |
2 | cohort_1 | (`department` == "Finance") | 0.928840 | 0.665214 | 0.838909 | 0.695979 | 0.827357 | 0.087748 | 183 | 0.243028 | 753 |
3 | cohort_2 | (`department` == "HR") | 0.910229 | 0.591032 | 0.830024 | 0.585841 | 0.764205 | 0.064236 | 200 | 0.284091 | 704 |
4 | cohort_3 | (`department` == "Legal") | 0.917688 | 0.542793 | 0.667213 | 0.329452 | 0.373457 | 0.001128 | 222 | 0.685185 | 324 |
5 | cohort_4 | (`department` == "Operations") | 0.894497 | 0.881685 | 0.701623 | 0.757745 | 0.938312 | 0.436915 | 155 | 0.045750 | 3388 |
6 | cohort_5 | (`department` == "Procurement") | 0.898306 | 0.807227 | 0.719237 | 0.753797 | 0.930103 | 0.402534 | 135 | 0.062908 | 2146 |
7 | cohort_6 | (`department` == "R&D") | 0.699893 | 0.549763 | 0.645522 | 0.315951 | 0.342561 | 0.017962 | 211 | 0.730104 | 289 |
8 | cohort_7 | (`department` == "Sales & Marketing") | 0.938689 | 0.956582 | 0.707043 | 0.778220 | 0.956387 | 0.611425 | 162 | 0.031401 | 5159 |
9 | cohort_8 | (`department` == "Technology") | 0.877133 | 0.735181 | 0.650417 | 0.679839 | 0.901240 | 0.420690 | 134 | 0.063931 | 2096 |
As we can see in the previous results, there was a huge drop in the performance metrics of certain cohorts in order to make the estimator as fair as possible according to the Numerical Parity metric. This approach must be used with care: we should always consider the nature of the dataset and how important performance and fairness each are for the problem at hand. We could also try different values of \(\lambda\) to look for a better balance between performance and numerical parity.
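For instance, we could sweep a few values of lambda_coef and record both the overall F1 and the spread of positive predictions across cohorts. Here’s a sketch that reuses only the API from the cells above (the loop and the summary table are our own additions, and each iteration retrains the model, so it can take a while):

summary = []
for lam in [0.1, 0.2, 0.5, 0.8]:
    dc = DecoupledClass(
        cohort_col=["department"],
        transform_pipe=[
            dp.BasicImputer(verbose=False),
            dp.DataMinMaxScaler(verbose=False),
            dp.EncoderOHE(drop=False, unknown_err=False, verbose=False),
        ],
        estimator=get_model(),
        minority_min_rate=0.03,
        min_cohort_pct=0.05,
        theta=[0.2, 0.5, 0.8],
        fairness_loss="num_parity",
        lambda_coef=lam,
        max_joint_loss_time=600,
    )
    dc.fit(X_train, y_train)
    res = fetch_cohort_results(X_test, y_test, dc.predict_proba(X_test), cohort_def=dc, fixed_th=True)
    cohorts = res[res["cohort"] != "all"]
    summary.append({
        "lambda": lam,
        "overall_f1": res.loc[res["cohort"] == "all", "f1"].iloc[0],
        "num_pos_std": cohorts["num_pos"].std(),  # lower = closer to numerical parity
    })
print(pd.DataFrame(summary))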