vivainsights.create_lorenz

Calculate the Gini coefficient and plot the Lorenz curve for a given metric.

vivainsights.create_lorenz.get_value_proportion(df, population_share)

Look up the cumulative value share for a given population share.

Parameters:
  • df (pandas.DataFrame) – DataFrame containing cum_population and cum_values_prop columns, as produced by create_lorenz with return_type="table".

  • population_share (float) – Cumulative population share, between 0 and 1.

Returns:

The cumulative value proportion at the given population share.

Return type:

float

Raises:

ValueError – If population_share is not between 0 and 1.

Examples

>>> import vivainsights as vi
>>> pq_data = vi.load_pq_data()
>>> lorenz_table = vi.create_lorenz(data=pq_data, metric="Emails_sent", return_type="table")
>>> vi.get_value_proportion(lorenz_table, population_share=0.5)
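
The lookup described above can be sketched roughly as follows. This is only an illustration under stated assumptions: value_share_at is a hypothetical helper, not the library function, and the nearest-row strategy is assumed (the actual implementation may interpolate or match differently). Only the column names cum_population and cum_values_prop come from the documentation.

```python
import pandas as pd


def value_share_at(df, population_share):
    """Hypothetical helper: nearest-row lookup of the cumulative value share."""
    if not 0 <= population_share <= 1:
        raise ValueError("population_share must be between 0 and 1")
    # Assumption: pick the row whose cum_population is closest to the request.
    nearest = (df["cum_population"] - population_share).abs().idxmin()
    return float(df.loc[nearest, "cum_values_prop"])


# Small hand-built table standing in for a create_lorenz(..., return_type="table") result.
demo = pd.DataFrame(
    {
        "cum_population": [0.25, 0.5, 0.75, 1.0],
        "cum_values_prop": [0.1, 0.3, 0.6, 1.0],
    }
)
share = value_share_at(demo, 0.5)
```

In this toy table, the bottom half of the population accounts for 30% of the total value.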

vivainsights.create_lorenz.compute_gini(x)

Compute the Gini coefficient for a numeric vector.

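The Gini coefficient is a measure of statistical dispersion used to represent inequality in a distribution. For non-negative values it ranges from 0 (every value identical) towards 1 (a single observation holds the entire total).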

Parameters:

x (list, numpy.ndarray, or pandas.Series) – Numeric values (e.g. hours, emails sent).

Returns:

The Gini coefficient.

Return type:

float

Raises:

ValueError – If x is not a numeric vector.

Examples

>>> import vivainsights as vi
>>> pq_data = vi.load_pq_data()
>>> vi.compute_gini(pq_data["Emails_sent"])
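
For intuition, one common way to compute a Gini coefficient is the sorted-rank formula, sketched below in plain Python. This is a minimal sketch, not necessarily the formula compute_gini uses internally; gini_sketch is a hypothetical name introduced for illustration.

```python
def gini_sketch(values):
    """Hypothetical helper: Gini coefficient via the sorted-rank formula."""
    xs = sorted(float(v) for v in values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one value and a non-zero sum")
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with ranks i = 1..n
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


equal = gini_sketch([1, 1, 1, 1])         # all values identical
spread = gini_sketch([1, 2, 3, 4])        # moderate inequality
concentrated = gini_sketch([0, 0, 0, 10]) # one observation holds everything
```

With this estimator the maximum for n observations is (n - 1) / n rather than exactly 1, so the fully concentrated four-value example yields 0.75.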

vivainsights.create_lorenz.create_lorenz(data, metric, return_type='plot', figsize=None)

Calculate the Lorenz curve and Gini coefficient for a metric.

Parameters:
  • data (pandas.DataFrame) – DataFrame containing the data to analyse.

  • metric (str) – Column name of the numeric values to analyse.

  • return_type (str, default "plot") – "plot" to display a Lorenz curve, "gini" to return the Gini coefficient, or "table" for a DataFrame of cumulative shares.

  • figsize (tuple of float or None, default None) – Figure size (width, height) in inches. If None, (8, 6) is used.

Returns:

Depending on return_type: the Lorenz curve figure, the Gini coefficient, or a table of cumulative population and value shares.

Return type:

matplotlib.figure.Figure, float, or pandas.DataFrame

Raises:

ValueError – If metric is not in the DataFrame or return_type is invalid.

Examples

Compute the Gini coefficient:

>>> import vivainsights as vi
>>> vi.create_lorenz(data=vi.load_pq_data(), metric="Emails_sent", return_type="gini")

Display the Lorenz curve plot:

>>> vi.create_lorenz(data=vi.load_pq_data(), metric="Emails_sent", return_type="plot")

Return a table of cumulative population and value shares:

>>> vi.create_lorenz(data=vi.load_pq_data(), metric="Emails_sent", return_type="table")

Customize the figure size:

>>> vi.create_lorenz(
...     data=vi.load_pq_data(),
...     metric="Collaboration_hours",
...     return_type="plot",
...     figsize=(10, 8),
... )
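
To show what a table of cumulative shares contains, the sketch below builds one by hand: sort the values in ascending order, then take running shares of the population and of the total. It is an illustration under assumptions, not the library's code. lorenz_table_sketch is a hypothetical name, and the real table may include further columns or a different row layout; only the two column names are taken from the documentation above.

```python
import numpy as np
import pandas as pd


def lorenz_table_sketch(values):
    """Hypothetical helper: cumulative population and value shares."""
    x = np.sort(np.asarray(values, dtype=float))  # ascending order
    n = x.size
    return pd.DataFrame(
        {
            # Share of the population covered up to and including each row.
            "cum_population": np.arange(1, n + 1) / n,
            # Share of the total value held by that part of the population.
            "cum_values_prop": np.cumsum(x) / x.sum(),
        }
    )


table = lorenz_table_sketch([4, 1, 3, 2])
```

Plotting cum_values_prop against cum_population gives the Lorenz curve; the further it bows below the diagonal, the more unequal the distribution.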