vivainsights.create_IV

Calculate Information Value (IV) and Weight of Evidence (WOE) for predictors.

vivainsights.create_IV.p_test(data, outcome, behavior)[source]

Perform statistical tests between predictor variables and a binary outcome.

Automatically selects the appropriate test based on variable type:

  • Mann-Whitney U test for numeric variables

  • Chi-square test for categorical variables

Note: The test compares two independent groups (outcome=0 vs outcome=1), so the Mann-Whitney U test is always used for numeric variables as these are inherently unpaired/independent samples.

For categorical variables with low expected frequencies (< 5 in any cell), the function follows Cochran’s guideline (1954) for Chi-square test validity:

  • 2x2 contingency tables: Fisher’s exact test is used instead

  • Larger tables: Chi-square test is used with a warning about reliability

If a statistical test fails for a variable (e.g., constant values, insufficient data), the p-value will be NaN and a warning will be issued.
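The rule for categorical variables described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the helper names `expected_counts` and `choose_categorical_test` and the returned labels are hypothetical, and the contingency tables are made-up counts.

```python
def expected_counts(table):
    """Expected cell counts under independence for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def choose_categorical_test(table):
    """Pick a test label following the low-expected-frequency rule (hypothetical helper)."""
    low_expected = any(e < 5 for row in expected_counts(table) for e in row)
    is_2x2 = len(table) == 2 and all(len(row) == 2 for row in table)
    if low_expected and is_2x2:
        return "fisher_exact"
    if low_expected:
        return "chi_square_with_warning"
    return "chi_square"


# Rows are categories; columns are counts for outcome = 0 and outcome = 1.
small = [[1, 2], [2, 0]]           # 2x2 table with tiny counts
large = [[30, 20], [25, 35]]       # 2x2 table with ample counts
wide = [[3, 1], [2, 2], [40, 50]]  # 3x2 table with some low expected counts

choices = [choose_categorical_test(t) for t in (small, large, wide)]
```

The check is made on expected counts (row total × column total / grand total), not on the observed counts, which is why a table can contain a zero cell and still qualify for the Chi-square test.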

Parameters:
  • data (pd.DataFrame) – A Pandas DataFrame.

  • outcome (str) – Name of the outcome variable.

  • behavior (list) – List of behavior variables to test.

Returns:

A DataFrame with variables and corresponding p-values.

Return type:

pd.DataFrame

Examples

Test p-values for numeric predictors:

>>> import vivainsights as vi
>>> import pandas as pd
>>> data = pd.DataFrame({
...     'outcome': [1, 0, 1, 0, 1],
...     'behavior1': [10, 20, 30, 40, 50],
...     'behavior2': [5, 15, 25, 35, 45]
... })
>>> outcome = 'outcome'
>>> behavior = ['behavior1', 'behavior2']
>>> vi.p_test(data, outcome, behavior)

Include a categorical predictor:

>>> data['department'] = ['HR', 'Eng', 'HR', 'Eng', 'HR']
>>> vi.p_test(data, outcome, ['behavior1', 'department'])

vivainsights.create_IV.calculate_IV(data, outcome, predictor, bins)[source]

Calculate Information Value (IV) between a single predictor and the outcome.

For numeric variables, uses quantile-based binning. For categorical variables, uses each category as a bin.
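To make the binning and the IV arithmetic concrete, here is a minimal standard-library sketch. It uses the common textbook definitions, WOE = ln(% of events / % of non-events) per bin and IV = Σ (% of events − % of non-events) × WOE. Whether the library uses this sign convention, how it treats empty bins, and how it places quantile cut points are assumptions here; `quantile_bins` and `woe_iv` are hypothetical helper names and the data is made up.

```python
import bisect
import math
import statistics
from collections import Counter


def quantile_bins(values, n_bins):
    """Assign each numeric value a bin index using quantile cut points."""
    cuts = statistics.quantiles(values, n=n_bins)  # yields n_bins - 1 cut points
    return [bisect.bisect_left(cuts, v) for v in values]


def woe_iv(bins, outcomes):
    """Return ({bin: (woe, iv_contribution)}, total_iv) for a 0/1 outcome."""
    events = Counter(b for b, y in zip(bins, outcomes) if y == 1)
    non_events = Counter(b for b, y in zip(bins, outcomes) if y == 0)
    total_events = sum(events.values())
    total_non_events = sum(non_events.values())
    table = {}
    for b in sorted(set(bins)):
        pct_event = events[b] / total_events
        pct_non_event = non_events[b] / total_non_events
        # Assumption: bins lacking events or non-events are skipped,
        # because the logarithm is undefined there.
        if pct_event == 0 or pct_non_event == 0:
            continue
        woe = math.log(pct_event / pct_non_event)
        table[b] = (woe, (pct_event - pct_non_event) * woe)
    total_iv = sum(iv for _, iv in table.values())
    return table, total_iv


outcome = [1, 1, 1, 0, 1, 0, 0, 0]

# Numeric predictor: bin by quantiles first.
hours = [10, 20, 30, 40, 50, 60, 70, 80]
numeric_table, numeric_iv = woe_iv(quantile_bins(hours, 2), outcome)

# Categorical predictor: each category is its own bin.
dept = ["HR", "HR", "HR", "HR", "Eng", "Eng", "Eng", "Eng"]
categorical_table, categorical_iv = woe_iv(dept, outcome)
```

In this toy data both predictors split the rows the same way (three events and one non-event in one bin, the reverse in the other), so both produce the same total IV.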

Parameters:
  • data (pd.DataFrame) – A DataFrame containing the data.

  • outcome (str) – Name of the outcome variable.

  • predictor (str) – Name of the predictor variable.

  • bins (int) – Number of bins for binning the predictor variable (only used for numeric variables).

Returns:

A DataFrame with IV calculations for the predictor variable.

Return type:

pd.DataFrame

Raises:

ValueError – If the outcome variable contains missing values in the input DataFrame.

Notes

Missing values (NaN) in the predictor variable are automatically dropped before processing. A warning is issued if any missing values are found.

Examples

Calculate IV for a numeric predictor:

>>> import vivainsights as vi
>>> import pandas as pd
>>> data = pd.DataFrame({
...     'outcome': [1, 0, 1, 0, 1],
...     'predictor': [10, 20, 30, 40, 50]
... })
>>> outcome = 'outcome'
>>> predictor = 'predictor'
>>> bins = 5
>>> vi.calculate_IV(data, outcome, predictor, bins)

Calculate IV for a categorical predictor:

>>> data['dept'] = ['HR', 'Eng', 'HR', 'Eng', 'HR']
>>> vi.calculate_IV(data, 'outcome', 'dept', bins=5)

vivainsights.create_IV.map_IV(data, outcome, predictors=None, bins=5)[source]

Map Information Value (IV) calculations across multiple predictors.

Calls calculate_IV() for every predictor–outcome pair.

Parameters:
  • data (pandas.DataFrame) – DataFrame containing the data.

  • outcome (str) – Name of the outcome variable.

  • predictors (list of str, optional) – Predictor variables. If None, all numeric columns except outcome are used.

  • bins (int) – Number of bins for numeric predictors.

Returns:

Dictionary with keys "Tables" (per-predictor IV DataFrames) and "Summary" (aggregate IV DataFrame sorted descending).

Return type:

dict
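The shape of the returned dictionary can be pictured with a small stand-in built from plain Python objects. This is only a sketch of the structure described above: the real values are pandas DataFrames, the per-bin key names `"bin"` and `"IV"` are assumptions, `summarise_iv` is a hypothetical helper, and the numbers are made up for illustration.

```python
def summarise_iv(tables):
    """Total the per-bin IV for each predictor and sort descending."""
    totals = {name: sum(row["IV"] for row in rows) for name, rows in tables.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Stand-in for the per-predictor IV tables (illustrative numbers only).
tables = {
    "hours": [{"bin": "low", "IV": 0.02}, {"bin": "high", "IV": 0.03}],
    "emails": [{"bin": "low", "IV": 0.20}, {"bin": "high", "IV": 0.25}],
}

result = {"Tables": tables, "Summary": summarise_iv(tables)}

# The strongest predictor comes first in the summary.
top_predictor, top_iv = result["Summary"][0]
```

Reading `result["Summary"]` first and then drilling into `result["Tables"][top_predictor]` mirrors the usual workflow of ranking predictors before inspecting their bins.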

Examples

Map IV across all numeric predictors:

>>> import vivainsights as vi
>>> import pandas as pd
>>> data = pd.DataFrame({
...     'outcome': [1, 0, 1, 0, 1, 0, 1, 0],
...     'hours': [10, 20, 30, 40, 15, 25, 35, 45],
...     'emails': [5, 15, 25, 35, 10, 20, 30, 40],
... })
>>> iv_result = vi.map_IV(data, outcome='outcome', predictors=['hours', 'emails'], bins=3)
>>> iv_result['Summary']  # aggregated IV for each predictor

Let predictors default to all numeric columns:

>>> iv_result = vi.map_IV(data, outcome='outcome', bins=3)

vivainsights.create_IV.plot_WOE(IV, predictor, figsize=None)[source]

Plot Weight of Evidence (WOE) for a predictor variable.

Parameters:
  • IV (dict) – Dictionary returned by map_IV().

  • predictor (str) – Name of the predictor variable to plot.

  • figsize (tuple, optional) – Figure size as (width, height) in inches. Defaults to (8, 6).

Returns:

This function doesn’t return a value; it plots the WOE.

Return type:

None

Examples

Plot WOE for a predictor:

>>> import vivainsights as vi
>>> import pandas as pd
>>> data = pd.DataFrame({
...     'outcome': [1, 0, 1, 0, 1],
...     'predictor': [10, 20, 30, 40, 50]
... })
>>> outcome = 'outcome'
>>> predictor = 'predictor'
>>> bins = 5
>>> IV = vi.map_IV(data, outcome, [predictor], bins)
>>> vi.plot_WOE(IV, predictor)

Customize the figure size:

>>> vi.plot_WOE(IV, predictor, figsize=(10, 6))

vivainsights.create_IV.create_IV(data=<class 'pandas.core.frame.DataFrame'>, predictors=None, outcome=None, bins=5, siglevel=0.05, exc_sig=False, figsize=None, return_type='plot')[source]

Create an Information Value (IV) analysis for predictor variables.

Parameters:
  • data (pandas.DataFrame) – DataFrame containing the data.

  • predictors (list of str, optional) – Predictor variables.

  • outcome (str) – Name of the binary outcome variable.

  • bins (int, optional) – Number of bins for numeric predictors. Defaults to 5.

  • siglevel (float, optional) – Significance level for filtering predictors. Defaults to 0.05.

  • exc_sig (bool, optional) – If True, exclude predictors with p-value above siglevel. Defaults to False.

  • figsize (tuple, optional) – Figure size as (width, height) in inches.

  • return_type (str, optional) –

    Type of output:

    • "plot" (default): bar chart of IV values.

    • "summary": IV summary DataFrame.

    • "list": dict of per-predictor IV tables with ODDS/PROB.

    • "plot-WOE": list of WOE plot Figures.

    • "IV": tuple of (output_list, IV_summary, lnodds).

Returns:

Depends on return_type.

Return type:

matplotlib.figure.Figure, pandas.DataFrame, dict, list, or tuple

Notes

When return_type is "list", the output is a dictionary of per-predictor tables; use a for-loop to access keys and values.
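The effect of exc_sig and siglevel on the predictor set can be sketched as follows. This is a hedged illustration of the documented behavior (predictors with a p-value above siglevel are excluded when exc_sig is True), not the library's code: `filter_predictors` is a hypothetical helper, the p-values are invented, and dropping predictors whose p-value is NaN is an assumption.

```python
def filter_predictors(p_values, siglevel=0.05, exc_sig=False):
    """Return the predictor names kept for the IV analysis."""
    if not exc_sig:
        # No filtering: every predictor is kept regardless of its p-value.
        return list(p_values)
    # Assumption: NaN p-values (failed tests) compare False and are dropped.
    return [name for name, p in p_values.items() if p <= siglevel]


# Invented p-values for illustration only.
p_values = {
    "Email_hours": 0.001,
    "Meeting_hours": 0.20,
    "Chat_hours": float("nan"),
}

kept_all = filter_predictors(p_values)
kept_significant = filter_predictors(p_values, siglevel=0.05, exc_sig=True)
```

With the default exc_sig=False all three names are kept; with exc_sig=True only the predictor at or below the significance level remains.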

Examples

>>> import numpy as np
>>> import vivainsights as vi
>>> pq_data = vi.load_pq_data()
>>> pred_vars = ["Email_hours", "Meeting_hours", "Chat_hours"]
>>> pq_data["outcome_sim"] = np.where(pq_data["Internal_network_size"] > 40, 1, 0)
>>>
>>> # Full IV output: tuple of (output_list, IV_summary, lnodds)
>>> vi.create_IV(pq_data, predictors=pred_vars, outcome="outcome_sim", return_type="IV")
>>>
>>> # Exclude non-significant predictors and return summary
>>> vi.create_IV(pq_data, predictors=pred_vars, outcome="outcome_sim", exc_sig=True, return_type="summary")
>>>
>>> # IV bar chart
>>> vi.create_IV(pq_data, predictors=pred_vars, outcome="outcome_sim", return_type="plot")