DistributionBalanceMeasure
These metrics compare the data with a reference distribution (currently, only the uniform distribution is supported). They are calculated separately for each sensitive feature column and do not depend on the class label column.
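Before any measure can be computed, each sensitive column has to be turned into an observed distribution that can be compared with the reference. The pure-Python sketch below shows one way to do this. It assumes the uniform reference covers only the categories that actually appear in the column. The helper name `observed_and_uniform` is hypothetical and does not come from the library. Note that the class label is never read.

```python
from collections import Counter
from typing import List, Sequence, Tuple

def observed_and_uniform(values: Sequence[str]) -> Tuple[List[str], List[float], List[float]]:
    """Return the categories, their observed proportions, and a uniform reference."""
    counts = Counter(values)
    categories = sorted(counts)  # deterministic category order
    n = len(values)
    observed = [counts[c] / n for c in categories]
    # ASSUMPTION: the reference is uniform over the categories that appear in the column.
    uniform = [1 / len(categories)] * len(categories)
    return categories, observed, uniform

# Hypothetical sensitive column; the class label is not needed at all.
sex = ["F", "M", "M", "M", "F", "M", "M", "M"]
cats, obs, ref = observed_and_uniform(sex)
print(cats, obs, ref)  # ['F', 'M'] [0.25, 0.75] [0.5, 0.5]
```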
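| Measure | Description | Interpretation |
|---|---|---|
| Kullback–Leibler (KL) divergence | Measures how the observed distribution differs from the reference distribution. It is the information lost when the reference distribution is used to approximate the observed one. | Non-negative. 0 means the two distributions are identical. |
| Jensen–Shannon (JS) distance | Square root of the JS divergence, which is a symmetrized and smoothed version of the KL divergence. | Range [0, 1] when base-2 logarithms are used. 0 means the two distributions are identical. |
| Wasserstein distance | This distance is also known as the earth mover's distance. It is the minimum amount of "work" needed to transform one distribution into the other, where work is the probability mass moved multiplied by the distance it is moved. | Non-negative. 0 means the two distributions are identical. |
| Infinity norm distance | Also known as the Chebyshev distance. It is the greatest absolute difference between the two distributions along any single category. | Non-negative. 0 means the two distributions are identical. |
| Total variation distance | The total variation distance is equal to half the L1 (Manhattan) distance between the two distributions. | Non-negative. 0 means the two distributions are identical. |
| Chi-squared test | The chi-square test is used to test the null hypothesis that the categorical data has the expected frequencies given by the reference distribution. | The p-value gives evidence against the null hypothesis that the difference between the observed and expected frequencies is due to random chance. |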
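To make the table's definitions concrete, the pure-Python sketch below implements each measure on two proportion vectors. It illustrates the formulas only and is not the library's implementation. The following choices are assumptions. KL divergence uses the natural log and is computed as KL(observed || reference). The JS distance is converted to base 2 so that it falls in [0, 1]. For the Wasserstein distance, categories are placed at integer positions 0, 1, 2, ... in list order. Only the chi-squared statistic is computed here, not the p-value.

```python
import math
from typing import Sequence

def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    # KL(p || q) = sum p_i * ln(p_i / q_i); categories with p_i == 0 contribute 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_distance(p: Sequence[float], q: Sequence[float]) -> float:
    # Square root of the JS divergence, converted to bits (base 2) so the result is in [0, 1].
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    jsd_bits = (kl_divergence(p, m) + kl_divergence(q, m)) / (2 * math.log(2))
    return math.sqrt(max(jsd_bits, 0.0))  # guard against tiny negative rounding error

def wasserstein_distance(p: Sequence[float], q: Sequence[float]) -> float:
    # Earth mover's distance ASSUMING categories sit at positions 0, 1, 2, ...:
    # the sum of absolute differences between the two cumulative distributions.
    total = cdf_p = cdf_q = 0.0
    for pi, qi in zip(p, q):
        cdf_p += pi
        cdf_q += qi
        total += abs(cdf_p - cdf_q)
    return total

def infinity_norm_distance(p: Sequence[float], q: Sequence[float]) -> float:
    # Chebyshev distance: the largest absolute difference in any single category.
    return max(abs(pi - qi) for pi, qi in zip(p, q))

def total_variation_distance(p: Sequence[float], q: Sequence[float]) -> float:
    # Half of the L1 (Manhattan) distance between the two distributions.
    return sum(abs(pi - qi) for pi, qi in zip(p, q)) / 2

def chi_squared_statistic(counts: Sequence[int], expected_props: Sequence[float]) -> float:
    # Pearson's statistic on counts: sum of (observed - expected)^2 / expected.
    n = sum(counts)
    return sum((o - n * e) ** 2 / (n * e) for o, e in zip(counts, expected_props))

counts = [2, 6]  # e.g. 2 rows with "F" and 6 rows with "M"
observed = [c / sum(counts) for c in counts]
uniform = [1 / len(counts)] * len(counts)

print(total_variation_distance(observed, uniform))  # 0.25
print(infinity_norm_distance(observed, uniform))    # 0.25
print(wasserstein_distance(observed, uniform))      # 0.25
print(chi_squared_statistic(counts, uniform))       # 2.0
print(kl_divergence(observed, uniform), js_distance(observed, uniform))
```

For the p-value, `scipy.stats.chisquare(counts)` compares the counts against uniform expected frequencies by default and returns both the statistic and the p-value.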
- class raimitigations.databalanceanalysis.distribution_measures.DistributionBalanceMeasure(sensitive_cols: List[str])
- Bases: BalanceMeasure
- DISTRIBUTION_METRICS: Dict[Measures, Callable[[array, array], float]] = {Measures.CHISQ_PVALUE: <function get_chisq_pvalue>, Measures.CHISQ: <function get_chi_squared>, Measures.INF_NORM_DISTANCE: <function get_infinity_norm_distance>, Measures.JS_DISTANCE: <function get_js_distance>, Measures.KL_DIVERGENCE: <function get_kl_divergence>, Measures.TOTAL_VARIANCE_DISTANCE: <function get_total_variation_distance>, Measures.WS_DISTANCE: <function get_ws_distance>}
 - measures(df: DataFrame) → DataFrame
- The output is a dataframe that maps each sensitive column name to a dictionary.
- The dictionary for each sensitive column maps the name of each measure to its value. The measures are:
  - Kullback-Leibler Divergence - https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence
  - Jensen-Shannon Distance - https://en.wikipedia.org/wiki/Jensen%E2%80%93Shannon_divergence
  - Wasserstein Distance - https://en.wikipedia.org/wiki/Wasserstein_metric
  - Infinity Norm Distance - https://en.wikipedia.org/wiki/Chebyshev_distance
  - Total Variation Distance - https://en.wikipedia.org/wiki/Total_variation_distance_of_probability_measures
  - Chi-Squared Test - https://en.wikipedia.org/wiki/Chi-squared_test
 - There is one dictionary for each of the specified sensitive columns.
 - Parameters
- df (pd.DataFrame) – the dataframe on which to calculate all of the distribution measures
- Returns
- a dataframe with one column containing the sensitive column name and another column containing the dictionary that maps each measure name to its value for that sensitive feature.
- Return type
- pd.DataFrame
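With the library itself, the documented entry points are `DistributionBalanceMeasure(sensitive_cols=[...])` followed by `.measures(df)`. The self-contained sketch below imitates two aspects of that design: the `DISTRIBUTION_METRICS` dispatch table, which maps each measure to a function of type `(observed, reference) -> float`, and the one-result-per-sensitive-column output. Several parts are hypothetical and exist only for illustration: the `Measure` enum, its string values, the `feature_name`/`measures` keys, and the `distribution_measures` function. Plain lists and dicts stand in for pandas DataFrames, and only two of the measures are included.

```python
from collections import Counter
from enum import Enum
from typing import Any, Callable, Dict, List, Sequence

class Measure(Enum):
    # Hypothetical stand-in for the library's Measures enum (only two members shown).
    INF_NORM_DISTANCE = "inf_norm_distance"
    TOTAL_VARIATION_DISTANCE = "total_variation_distance"

# Each metric takes (observed, reference) proportions and returns a float,
# mirroring the Callable[[array, array], float] shape of DISTRIBUTION_METRICS.
Metric = Callable[[Sequence[float], Sequence[float]], float]

METRICS: Dict[Measure, Metric] = {
    Measure.INF_NORM_DISTANCE: lambda obs, ref: max(abs(o - r) for o, r in zip(obs, ref)),
    Measure.TOTAL_VARIATION_DISTANCE: lambda obs, ref: sum(abs(o - r) for o, r in zip(obs, ref)) / 2,
}

def distribution_measures(rows: List[Dict[str, Any]], sensitive_cols: Sequence[str]) -> List[Dict[str, Any]]:
    """Return one record per sensitive column: its name plus a {measure name: value} dict."""
    output = []
    for col in sensitive_cols:
        counts = Counter(row[col] for row in rows)  # the label field is never read
        categories = sorted(counts)
        n = sum(counts.values())
        observed = [counts[c] / n for c in categories]
        uniform = [1 / len(categories)] * len(categories)  # uniform reference distribution
        measures = {m.value: fn(observed, uniform) for m, fn in METRICS.items()}
        output.append({"feature_name": col, "measures": measures})
    return output

rows = [
    {"race": "A", "sex": "F", "label": 1},
    {"race": "A", "sex": "M", "label": 0},
    {"race": "B", "sex": "F", "label": 1},
    {"race": "A", "sex": "M", "label": 0},
]
for record in distribution_measures(rows, ["race", "sex"]):
    print(record)
# {'feature_name': 'race', 'measures': {'inf_norm_distance': 0.25, 'total_variation_distance': 0.25}}
# {'feature_name': 'sex', 'measures': {'inf_norm_distance': 0.0, 'total_variation_distance': 0.0}}
```

Keeping the metrics in a dictionary means that adding a measure only requires registering one more function, and every sensitive column automatically picks it up.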