vivainsights.create_IV
Calculate Information Value (IV) and Weight of Evidence (WOE) for predictors.
- vivainsights.create_IV.p_test(data, outcome, behavior)
Perform statistical tests between predictor variables and a binary outcome.
Automatically selects the appropriate test based on variable type:
  - Mann-Whitney U test for numeric variables
  - Chi-square test for categorical variables
Note: The test compares two independent groups (outcome=0 vs outcome=1), so the Mann-Whitney U test is always used for numeric variables as these are inherently unpaired/independent samples.
For categorical variables with low expected frequencies (< 5 in any cell), following Cochran’s guideline (1954) for Chi-square test validity:
  - 2x2 contingency tables: Fisher’s exact test is used instead
  - Larger tables: Chi-square test is used with a warning about reliability
If a statistical test fails for a variable (e.g., constant values, insufficient data), the p-value will be NaN and a warning will be issued.
- Parameters:
data (pd.DataFrame) – A Pandas DataFrame.
outcome (str) – Name of the outcome variable.
behavior (list) – List of behavior variables to test.
- Returns:
A DataFrame with variables and corresponding p-values.
- Return type:
pd.DataFrame
Examples
Test p-values for numeric predictors:
>>> import vivainsights as vi
>>> import pandas as pd
>>> data = pd.DataFrame({
...     'outcome': [1, 0, 1, 0, 1],
...     'behavior1': [10, 20, 30, 40, 50],
...     'behavior2': [5, 15, 25, 35, 45]
... })
>>> outcome = 'outcome'
>>> behavior = ['behavior1', 'behavior2']
>>> vi.p_test(data, outcome, behavior)
Include a categorical predictor:
>>> data['department'] = ['HR', 'Eng', 'HR', 'Eng', 'HR']
>>> vi.p_test(data, outcome, ['behavior1', 'department'])
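The test-selection rules described for p_test can be sketched with SciPy. This is a minimal illustration under stated assumptions, not the library's actual implementation: the helper name `choose_test`, its return shape, and the warning text are hypothetical, and the outcome is assumed to be coded 0/1.

```python
import warnings

import pandas as pd
from scipy import stats


def choose_test(data, outcome, predictor):
    """Return (test_name, p_value) for one predictor against a 0/1 outcome.

    Hypothetical helper for illustration; not part of vivainsights.
    """
    x = data[predictor]

    if pd.api.types.is_numeric_dtype(x):
        # Numeric predictor: compare the outcome=0 and outcome=1 groups
        # as two independent samples.
        group0 = x[data[outcome] == 0]
        group1 = x[data[outcome] == 1]
        result = stats.mannwhitneyu(group0, group1, alternative="two-sided")
        return "mann-whitney", result.pvalue

    # Categorical predictor: cross-tabulate categories against the outcome.
    table = pd.crosstab(x, data[outcome])
    _, p_chi2, _, expected = stats.chi2_contingency(table)

    if (expected < 5).any():
        if table.shape == (2, 2):
            # Low expected counts in a 2x2 table: use Fisher's exact test.
            _, p_fisher = stats.fisher_exact(table.to_numpy())
            return "fisher", p_fisher
        # Larger tables: keep the chi-square result but flag it.
        warnings.warn(
            f"Low expected frequencies for '{predictor}'; "
            "the chi-square p-value may be unreliable."
        )
    return "chi-square", p_chi2


data = pd.DataFrame({
    "outcome": [1, 0, 1, 0, 1, 0],
    "hours": [10, 20, 30, 40, 50, 60],
    "department": ["HR", "Eng", "HR", "Eng", "HR", "Eng"],
})
numeric_result = choose_test(data, "outcome", "hours")
categorical_result = choose_test(data, "outcome", "department")
```

With only six rows, the department table is 2x2 with expected counts below 5, so the sketch takes the Fisher branch for that column.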
- vivainsights.create_IV.calculate_IV(data, outcome, predictor, bins)
Calculate Information Value (IV) between a single predictor and the outcome.
For numeric variables, uses quantile-based binning. For categorical variables, uses each category as a bin.
- Parameters:
data (pd.DataFrame) – A DataFrame containing the data.
outcome (str) – Name of the outcome variable.
predictor (str) – Name of the predictor variable.
bins (int) – Number of bins for binning the predictor variable (only used for numeric variables).
- Returns:
A DataFrame with IV calculations for the predictor variable.
- Return type:
pd.DataFrame
- Raises:
ValueError – If the outcome variable has missing values in the input training data frame.
Notes
Missing values (NaN) in the predictor variable are automatically dropped before processing. A warning is issued if any missing values are found.
Examples
Calculate IV for a numeric predictor:
>>> import vivainsights as vi
>>> import pandas as pd
>>> data = pd.DataFrame({
...     'outcome': [1, 0, 1, 0, 1],
...     'predictor': [10, 20, 30, 40, 50]
... })
>>> outcome = 'outcome'
>>> predictor = 'predictor'
>>> bins = 5
>>> vi.calculate_IV(data, outcome, predictor, bins)
Calculate IV for a categorical predictor:
>>> data['dept'] = ['HR', 'Eng', 'HR', 'Eng', 'HR']
>>> vi.calculate_IV(data, 'outcome', 'dept', bins=5)
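To make the binning and the IV arithmetic concrete, here is a minimal sketch of a per-bin WOE/IV table. It assumes the common convention WOE = ln(% events / % non-events) and IV = sum over bins of (% events - % non-events) * WOE. The function name `iv_table`, its column names, and the absence of any smoothing for empty cells are assumptions made for illustration; calculate_IV may differ in sign convention, column layout, and edge-case handling.

```python
import numpy as np
import pandas as pd


def iv_table(data, outcome, predictor, bins=5):
    """Per-bin WOE and IV for one predictor against a 0/1 outcome.

    Hypothetical helper for illustration; not part of vivainsights.
    """
    if data[outcome].isna().any():
        raise ValueError("The outcome variable has missing values.")

    # Drop rows where the predictor is missing before binning.
    df = data[[outcome, predictor]].dropna(subset=[predictor])

    if pd.api.types.is_numeric_dtype(df[predictor]):
        # Quantile-based bins; duplicates="drop" merges repeated bin edges.
        groups = pd.qcut(df[predictor], q=bins, duplicates="drop")
    else:
        # Each category is its own bin.
        groups = df[predictor]

    table = df.groupby(groups, observed=True)[outcome].agg(n="count", events="sum")
    table["non_events"] = table["n"] - table["events"]

    pct_events = table["events"] / table["events"].sum()
    pct_non_events = table["non_events"] / table["non_events"].sum()

    # Assumed convention: WOE = ln(% events / % non-events).
    # A bin with zero events or zero non-events yields an infinite WOE.
    table["WOE"] = np.log(pct_events / pct_non_events)
    table["IV"] = (pct_events - pct_non_events) * table["WOE"]
    return table


data = pd.DataFrame({
    "outcome": [1, 1, 1, 0, 1, 0, 0, 0],
    "hours": [10, 20, 30, 40, 50, 60, 70, 80],
    "dept": ["A", "A", "A", "A", "B", "B", "B", "B"],
})
numeric_iv = iv_table(data, "outcome", "hours", bins=2)["IV"].sum()
categorical_iv = iv_table(data, "outcome", "dept")["IV"].sum()
```

In this toy data both predictors split the rows into the same two groups (3 events and 1 non-event, then 1 event and 3 non-events), so the two totals agree.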
- vivainsights.create_IV.map_IV(data, outcome, predictors=None, bins=5)
Map Information Value (IV) calculations across multiple predictors.
Calls calculate_IV() for every predictor–outcome pair.
- Parameters:
data (pandas.DataFrame) – DataFrame containing the data.
outcome (str) – Name of the outcome variable.
predictors (list of str, optional) – Predictor variables. If None, all numeric columns except outcome are used.
bins (int) – Number of bins for numeric predictors.
- Returns:
Dictionary with keys "Tables" (per-predictor IV DataFrames) and "Summary" (aggregate IV DataFrame sorted descending).
- Return type:
dict
Examples
Map IV across an explicit list of predictors:
>>> import vivainsights as vi
>>> import pandas as pd
>>> data = pd.DataFrame({
...     'outcome': [1, 0, 1, 0, 1, 0, 1, 0],
...     'hours': [10, 20, 30, 40, 15, 25, 35, 45],
...     'emails': [5, 15, 25, 35, 10, 20, 30, 40],
... })
>>> iv_result = vi.map_IV(data, outcome='outcome', predictors=['hours', 'emails'], bins=3)
>>> iv_result['Summary']  # aggregated IV for each predictor
Let predictors default to all numeric columns:
>>> iv_result = vi.map_IV(data, outcome='outcome', bins=3)
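The mapping step described for map_IV (default to numeric columns, compute one table per predictor, then build a summary sorted by IV) can be sketched as follows. The names `_iv_table` and `map_iv_sketch`, and the summary column names "Variable" and "IV", are hypothetical; only the "Tables" and "Summary" keys come from the documentation above.

```python
import numpy as np
import pandas as pd


def _iv_table(data, outcome, predictor, bins):
    """Hypothetical per-predictor helper: per-bin WOE and IV."""
    df = data[[outcome, predictor]].dropna(subset=[predictor])
    if pd.api.types.is_numeric_dtype(df[predictor]):
        groups = pd.qcut(df[predictor], q=bins, duplicates="drop")
    else:
        groups = df[predictor]
    table = df.groupby(groups, observed=True)[outcome].agg(n="count", events="sum")
    non_events = table["n"] - table["events"]
    pct_events = table["events"] / table["events"].sum()
    pct_non_events = non_events / non_events.sum()
    # Assumed convention: WOE = ln(% events / % non-events).
    table["WOE"] = np.log(pct_events / pct_non_events)
    table["IV"] = (pct_events - pct_non_events) * table["WOE"]
    return table


def map_iv_sketch(data, outcome, predictors=None, bins=5):
    """Apply the per-predictor helper to each predictor and summarise."""
    if predictors is None:
        # Default: every numeric column except the outcome itself.
        numeric_cols = data.select_dtypes(include="number").columns
        predictors = [c for c in numeric_cols if c != outcome]

    tables = {p: _iv_table(data, outcome, p, bins) for p in predictors}

    # One row per predictor, strongest predictor first.
    summary = (
        pd.DataFrame({
            "Variable": list(tables),
            "IV": [t["IV"].sum() for t in tables.values()],
        })
        .sort_values("IV", ascending=False)
        .reset_index(drop=True)
    )
    return {"Tables": tables, "Summary": summary}


data = pd.DataFrame({
    "outcome": [1, 1, 1, 0, 1, 0, 0, 0],
    "weak": [1, 2, 5, 3, 6, 4, 7, 8],
    "strong": [1, 2, 3, 4, 5, 6, 7, 8],
    "team": ["x", "y", "x", "y", "x", "y", "x", "y"],
})
result = map_iv_sketch(data, outcome="outcome", bins=2)
```

Because `predictors` is left as None, the string column `team` is skipped and the summary lists `strong` ahead of `weak`.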
- vivainsights.create_IV.plot_WOE(IV, predictor, figsize=None)
Plot Weight of Evidence (WOE) for a predictor variable.
- Parameters:
IV (dict) – Dictionary returned by map_IV().
predictor (str) – Name of the predictor variable to plot.
figsize (tuple, optional) – Figure size as (width, height) in inches. Defaults to (8, 6).
- Returns:
This function doesn’t return a value; it plots the WOE.
- Return type:
None
Examples
Plot WOE for a predictor:
>>> import vivainsights as vi
>>> import pandas as pd
>>> data = pd.DataFrame({
...     'outcome': [1, 0, 1, 0, 1],
...     'predictor': [10, 20, 30, 40, 50]
... })
>>> outcome = 'outcome'
>>> predictor = 'predictor'
>>> bins = 5
>>> IV = vi.map_IV(data, outcome, [predictor], bins)
>>> vi.plot_WOE(IV, predictor)
Customize the figure size:
>>> vi.plot_WOE(IV, predictor, figsize=(10, 6))
- vivainsights.create_IV.create_IV(data=<class 'pandas.core.frame.DataFrame'>, predictors=None, outcome=None, bins=5, siglevel=0.05, exc_sig=False, figsize=None, return_type='plot')
Create an Information Value (IV) analysis for predictor variables.
- Parameters:
data (pandas.DataFrame) – DataFrame containing the data.
predictors (list of str, optional) – Predictor variables.
outcome (str) – Name of the binary outcome variable.
bins (int, optional) – Number of bins for numeric predictors. Defaults to 5.
siglevel (float, optional) – Significance level for filtering predictors. Defaults to 0.05.
exc_sig (bool, optional) – If True, exclude predictors with p-value above siglevel. Defaults to False.
figsize (tuple, optional) – Figure size as (width, height) in inches.
return_type (str, optional) – Type of output:
  - "plot" (default): bar chart of IV values.
  - "summary": IV summary DataFrame.
  - "list": dict of per-predictor IV tables with ODDS/PROB.
  - "plot-WOE": list of WOE plot Figures.
  - "IV": tuple of (output_list, IV_summary, lnodds).
- Returns:
Depends on return_type.
- Return type:
matplotlib.figure.Figure, pandas.DataFrame, dict, list, or tuple
Notes
When return_type is "list", the output is a dictionary; use a for-loop to access its keys and values. With "summary", the output is a single DataFrame, as listed under return_type.
Examples
>>> import numpy as np
>>> import vivainsights as vi
>>> pq_data = vi.load_pq_data()
>>> pred_vars = ["Email_hours", "Meeting_hours", "Chat_hours"]
>>> pq_data["outcome_sim"] = np.where(pq_data["Internal_network_size"] > 40, 1, 0)
>>>
>>> # IV tables for all predictors
>>> vi.create_IV(pq_data, predictors=pred_vars, outcome="outcome_sim", return_type="IV")
>>>
>>> # Exclude non-significant predictors and return summary
>>> vi.create_IV(pq_data, predictors=pred_vars, outcome="outcome_sim", exc_sig=True, return_type="summary")
>>>
>>> # IV bar chart
>>> vi.create_IV(pq_data, predictors=pred_vars, outcome="outcome_sim", return_type="plot")
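The effect of exc_sig=True together with siglevel can be pictured as a filter over a p-value table. The sketch below is only an illustration: the helper `filter_by_significance` and the column names "Variable", "IV", and "pval" are hypothetical, and the numbers are made up for the example rather than taken from any dataset.

```python
import pandas as pd


def filter_by_significance(iv_summary, p_values, siglevel=0.05):
    """Keep predictors whose p-value is at or below siglevel.

    Hypothetical helper for illustration; not part of vivainsights.
    """
    merged = iv_summary.merge(p_values, on="Variable", how="left")
    # NaN p-values (failed tests) compare as False and are dropped too.
    keep = merged["pval"] <= siglevel
    return merged.loc[keep].reset_index(drop=True)


# Made-up values, for illustration only.
iv_summary = pd.DataFrame({
    "Variable": ["Email_hours", "Meeting_hours", "Chat_hours"],
    "IV": [0.40, 0.25, 0.02],
})
p_values = pd.DataFrame({
    "Variable": ["Email_hours", "Meeting_hours", "Chat_hours"],
    "pval": [0.001, 0.03, 0.60],
})
significant = filter_by_significance(iv_summary, p_values, siglevel=0.05)
```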