FeatureBalanceMeasure
These measures assess whether each combination of sensitive features receives the positive outcome (true prediction) at balanced probabilities. Many of these metrics were influenced by the paper Measuring Model Biases in the Absence of Ground Truth (Osman Aka, Ken Burke, Alex Bäuerle, Christina Greer, Margaret Mitchell).
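Association Metric | Family | Description | Interpretation / Formula
---|---|---|---
Demographic Parity | Fairness | The proportion of each segment of a protected class should receive the positive outcome at equal rates. | Parity increases with proximity to 0.
Pointwise Mutual Information (PMI) | Entropy | The PMI of a pair of feature values quantifies the discrepancy between the probability of their coincidence given their joint distribution and given their individual distributions (assuming independence). | Range (normalized): [-1, 1]. -1 for no co-occurrences, 0 for co-occurrences at random, 1 for complete co-occurrence.
Sorensen-Dice Coefficient (SDC) | Intersection-over-Union | The SDC is used to gauge the similarity of two samples. | Equals twice the number of elements common to both sets divided by the sum of the number of elements in each set.
Jaccard Index | Intersection-over-Union | Similar to SDC, the Jaccard Index gauges the similarity and diversity of sample sets. | Equals the size of the intersection divided by the size of the union of the sample sets.
Kendall Rank Correlation | Correlation and Statistical Tests | This is used to measure the ordinal association between two measured quantities. | High when observations have a similar rank between the two variables and low when they have a dissimilar rank.
Log-Likelihood Ratio | Correlation and Statistical Tests | This metric calculates the degree to which the data supports one variable versus another. | If likelihoods are similar, the value should be close to 0.
t-test | Correlation and Statistical Tests | The t-test is used to compare the means of two groups (pairwise). | The value that is being assessed for statistical significance in the t-test.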
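The set-based and probability-based rows of the table can be made concrete with a small, library-independent sketch. It assumes demographic parity is taken as the difference in positive-outcome rates between two feature values, that SDC and the Jaccard index follow their textbook set definitions, and that normalized PMI divides PMI by -log p(x, y). The helper names (`positive_rate`, `demographic_parity_gap`, `sorensen_dice`, `jaccard_index`, `normalized_pmi`) are hypothetical and are not the functions registered in `FEATURE_METRICS`; the library's own formulas may differ in detail.

```python
import math
from typing import Hashable, Sequence, Set, Tuple


def positive_rate(rows: Sequence[Tuple[Hashable, int]], value: Hashable) -> float:
    """Estimate P(label = 1 | feature = value) from (feature_value, label) pairs."""
    labels = [label for feature, label in rows if feature == value]
    if not labels:
        raise ValueError(f"no rows with feature value {value!r}")
    return sum(labels) / len(labels)


def demographic_parity_gap(
    rows: Sequence[Tuple[Hashable, int]], class_a: Hashable, class_b: Hashable
) -> float:
    """Difference in positive-outcome rates; values near 0 indicate parity."""
    return positive_rate(rows, class_a) - positive_rate(rows, class_b)


def sorensen_dice(a: Set, b: Set) -> float:
    """Twice the number of shared elements over the sum of the set sizes."""
    if not a and not b:
        raise ValueError("both sets are empty")
    return 2 * len(a & b) / (len(a) + len(b))


def jaccard_index(a: Set, b: Set) -> float:
    """Size of the intersection over the size of the union."""
    if not a and not b:
        raise ValueError("both sets are empty")
    return len(a & b) / len(a | b)


def normalized_pmi(p_xy: float, p_x: float, p_y: float) -> float:
    """PMI divided by -log p(x, y), which bounds the result to [-1, 1]."""
    if p_xy == 0.0:
        return -1.0  # limiting value when the pair never co-occurs
    if p_xy == 1.0:
        raise ValueError("normalized PMI is undefined when p_xy == 1")
    return math.log(p_xy / (p_x * p_y)) / -math.log(p_xy)


# Toy data (made up for illustration): (sensitive feature value, binary label).
rows = [
    ("Male", 1), ("Male", 1), ("Male", 0), ("Male", 1),
    ("Female", 1), ("Female", 0), ("Female", 0), ("Female", 0),
]
gap = demographic_parity_gap(rows, "Male", "Female")

set_a, set_b = {1, 2, 3, 4}, {3, 4, 5, 6}
sdc = sorensen_dice(set_a, set_b)
ji = jaccard_index(set_a, set_b)

npmi_independent = normalized_pmi(0.25, 0.5, 0.5)  # p(x, y) == p(x) * p(y)
npmi_complete = normalized_pmi(0.5, 0.5, 0.5)      # x and y always co-occur

print(gap, sdc, ji, npmi_independent, npmi_complete)
```

In this toy data the positive rate is 3/4 for "Male" and 1/4 for "Female", so the gap is far from 0 and signals an imbalance; the two example sets share 2 of their 6 distinct elements.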
- class raimitigations.databalanceanalysis.feature_measures.FeatureBalanceMeasure(sensitive_cols: List[str], label_col: str)
Bases: BalanceMeasure
- CLASS_A = 'ClassA'
- CLASS_B = 'ClassB'
- FEATURE_METRICS: Dict[Measures, Callable[[float, float, float, float], float]] = {Measures.DEMOGRAPHIC_PARITY: <function get_demographic_parity>, Measures.JACCARD_INDEX: <function get_jaccard_index>, Measures.KR_CORRELATION: <function get_kr_correlation>, Measures.LOG_LIKELIHOOD: <function get_log_likelihood_ratio>, Measures.POINTWISE_MUTUAL_INFO: <function get_point_mutual>, Measures.SD_COEF: <function get_sorenson_dice>, Measures.TTEST: <function get_t_test_stat>}
- OVERALL_METRICS: Dict[Tuple[Measures, Measures], Callable[[float, int], float]] = {(<Measures.TTEST_PVALUE: 'ttest_pvalue'>, <Measures.TTEST: 't_test'>): <function get_t_test_p_value>}
- measures(df: DataFrame) -> DataFrame
The output is a Pandas dataframe with one row per combination of feature values for each sensitive feature. Each row contains the following:
  - A feature value within the sensitive feature.
  - Another feature value within the sensitive feature.
  - The following measures of the gaps between the two classes:
    - Demographic Parity - https://en.wikipedia.org/wiki/Fairness_(machine_learning)
    - Pointwise Mutual Information - https://en.wikipedia.org/wiki/Pointwise_mutual_information
    - Sorensen-Dice Coefficient - https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient
    - Jaccard Index - https://en.wikipedia.org/wiki/Jaccard_index
    - Kendall Rank Correlation - https://en.wikipedia.org/wiki/Kendall_rank_correlation_coefficient
    - Log-Likelihood Ratio - https://en.wikipedia.org/wiki/Likelihood_function#Likelihood_ratio
- Parameters
df (pd.DataFrame) – the dataframe on which to calculate all of the feature balance measures
- Returns
a dataframe that contains 4 columns: the first column is the sensitive feature’s name, the second column is one possible value of that sensitive feature, the third column is a different possible value of that feature, and the last column is a dictionary which indicates
- Return type
pd.DataFrame
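The row layout described for `measures` (one row per combination of feature values for each sensitive feature) can be sketched without the library. This is a minimal illustration, assuming that "combination" means an unordered pair of distinct values and that values are paired in sorted order; neither is confirmed by the text above. The `"ClassA"` and `"ClassB"` keys mirror the `CLASS_A` and `CLASS_B` constants, while the `"FeatureName"` key and the `enumerate_class_pairs` helper are hypothetical.

```python
from itertools import combinations
from typing import Any, Dict, List, Sequence


def enumerate_class_pairs(
    records: Sequence[Dict[str, Any]], sensitive_cols: Sequence[str]
) -> List[Dict[str, Any]]:
    """List one row per unordered pair of distinct values in each sensitive column."""
    rows: List[Dict[str, Any]] = []
    for col in sensitive_cols:
        # Distinct values observed in this column, sorted for a stable order.
        values = sorted({record[col] for record in records})
        for class_a, class_b in combinations(values, 2):
            rows.append({"FeatureName": col, "ClassA": class_a, "ClassB": class_b})
    return rows


# Toy records (made up for illustration).
records = [
    {"gender": "Female", "race": "A", "label": 1},
    {"gender": "Male", "race": "B", "label": 0},
    {"gender": "Male", "race": "C", "label": 1},
]

pairs = enumerate_class_pairs(records, ["gender", "race"])
for row in pairs:
    print(row)
```

Under these assumptions a sensitive feature with n distinct values contributes n(n-1)/2 rows, so the two-valued `gender` column yields one row and the three-valued `race` column yields three. In the real output each such row would additionally carry the gap measures for its pair of classes.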