AggregateBalanceMeasure

These measures quantify how evenly records are distributed across all value combinations of the sensitive feature columns. For example, if sex and race are specified as sensitive features, the API quantifies imbalance across all combinations of the specified features (e.g., [Male, Black], [Female, White], [Male, Asian Pacific Islander]).
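As a rough illustration of what "distribution of records across combinations" means, the sketch below counts records per combination with pandas and converts the counts to proportions. The toy data and column names are made up for the example, and this is not necessarily how the library computes the distribution internally.

```python
import pandas as pd

# Hypothetical toy data; the column names and values are illustrative only.
df = pd.DataFrame(
    {
        "sex": ["Male", "Female", "Male", "Male", "Female", "Male"],
        "race": ["Black", "White", "Asian", "Black", "White", "White"],
    }
)

sensitive_cols = ["sex", "race"]

# Number of records for each observed (sex, race) combination.
counts = df.groupby(sensitive_cols).size()

# Share of the dataset that falls into each combination.
proportions = counts / len(df)

print(proportions)
```

A perfectly balanced dataset would give every combination the same proportion; the aggregate measures summarize how far the observed proportions are from that state.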

Measure: Atkinson index

Description: The Atkinson index presents the percentage of total income that a given society would have to forgo in order to have more equal shares of income among its citizens. This measure depends on the degree of societal aversion to inequality (a theoretical parameter decided by the researcher), where a higher value entails greater social utility or willingness by individuals to accept smaller incomes in exchange for a more equal distribution.

An important feature of the Atkinson index is that it can be decomposed into within-group and between-group inequality.

Interpretation: Range [0, 1]. 0 = perfect equality; 1 = maximum inequality. In this case, the quantity that plays the role of income is the proportion of records for each combination of sensitive column values.

Measure: Theil T index

Description: Theil T is the generalized entropy index GE(1), which is more sensitive to differences at the top of the distribution. The Theil index is a statistic used to measure economic inequality. It measures the entropic “distance” between the population and the “ideal” egalitarian state in which everyone has the same income.

If everyone has the same income, then T_T equals 0.

If one person has all the income, then T_T equals ln(N), where N is the population size.

Interpretation: 0 means equal income, and larger values mean a higher level of disproportion.

Measure: Theil L index

Description: Theil L is the generalized entropy index GE(0), which is more sensitive to differences at the lower end of the distribution. Theil L is the average, over all the incomes included in the summation, of the logarithm of (mean income)/(income i). It is also referred to as the mean log deviation measure. Because a transfer from a larger income to a smaller one will change the smaller income’s ratio more than it changes the larger income’s ratio, the transfer principle is satisfied by this index.

Interpretation: Same interpretation as the Theil T index.
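The three indices described above can be sketched in NumPy using their standard textbook definitions. This is a minimal illustration, not the library's own code: the function names are hypothetical, the default inequality-aversion parameter `epsilon=1.0` is an assumption, and all inputs are assumed to be strictly positive (as proportions of observed combinations are).

```python
import numpy as np


def atkinson_index(x, epsilon=1.0):
    """Atkinson index; epsilon is the inequality-aversion parameter (assumed default 1)."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    if epsilon == 1.0:
        # For epsilon = 1: one minus (geometric mean / arithmetic mean).
        return float(1.0 - np.exp(np.mean(np.log(x))) / mean)
    # General case: one minus (power mean of order 1 - epsilon) / arithmetic mean.
    power_mean = np.mean(x ** (1.0 - epsilon)) ** (1.0 / (1.0 - epsilon))
    return float(1.0 - power_mean / mean)


def theil_t_index(x):
    """Theil T, GE(1): mean of (x_i / mean) * ln(x_i / mean)."""
    x = np.asarray(x, dtype=float)
    ratio = x / x.mean()
    return float(np.mean(ratio * np.log(ratio)))


def theil_l_index(x):
    """Theil L, GE(0), mean log deviation: mean of ln(mean / x_i)."""
    x = np.asarray(x, dtype=float)
    return float(np.mean(np.log(x.mean() / x)))


# Proportions of records per sensitive-feature combination (illustrative values).
balanced = [0.25, 0.25, 0.25, 0.25]
imbalanced = [1 / 3, 1 / 3, 1 / 6, 1 / 6]

for name, values in [("balanced", balanced), ("imbalanced", imbalanced)]:
    print(name, atkinson_index(values), theil_t_index(values), theil_l_index(values))
```

With equal proportions all three functions return values at or numerically near 0, and any imbalance pushes them above 0. For `epsilon = 1`, the Atkinson index and Theil L are linked by the identity A = 1 - exp(-L), which makes a convenient sanity check.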

class raimitigations.databalanceanalysis.aggregate_measures.AggregateBalanceMeasure(sensitive_cols: List[str])

Bases: BalanceMeasure

AGGREGATE_METRICS: Dict[Measures, Callable[[array], float]] = {Measures.ATKINSON_INDEX: <function get_atkinson_index>, Measures.THEIL_L_INDEX: <function get_theil_l_index>, Measures.THEIL_T_INDEX: <function get_theil_t_index>}
measures(df: DataFrame) → DataFrame
The output is a DataFrame that maps the name of each aggregate measure to its value.

The following measures are computed: Atkinson index, Theil L index, and Theil T index.

Parameters

df (pd.DataFrame) – the DataFrame on which to calculate the aggregate measures

Returns

A DataFrame in which one column holds the name of each aggregate measure and a second column holds the corresponding value for each metric of interest.

Return type

pd.DataFrame
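A minimal usage sketch, based only on the constructor and `measures` signatures documented above. The helper name `compute_aggregate_measures` is hypothetical, and the import is done inside the function so that the snippet can be loaded even where `raimitigations` is not installed.

```python
from typing import List

import pandas as pd


def compute_aggregate_measures(df: pd.DataFrame, sensitive_cols: List[str]) -> pd.DataFrame:
    """Hypothetical helper: build an AggregateBalanceMeasure and run it on df."""
    # Lazy import: requires the raimitigations package at call time only.
    from raimitigations.databalanceanalysis.aggregate_measures import (
        AggregateBalanceMeasure,
    )

    # sensitive_cols lists the column names whose value combinations are compared.
    measure = AggregateBalanceMeasure(sensitive_cols=sensitive_cols)

    # Returns a DataFrame with the Atkinson, Theil L, and Theil T values.
    return measure.measures(df)
```

Calling `compute_aggregate_measures(df, ["sex", "race"])` on a DataFrame that contains those two columns would return the aggregate measures for all observed sex and race combinations.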