vivainsights.create_survival
create_survival: Parameterized Kaplan-Meier survival workflow (calc + viz + wrapper).
Design goals
General-purpose: works with any HR attribute column (segments, org, region, etc.).
Uses lifelines.KaplanMeierFitter if available; falls back to a NumPy implementation.
Reuses the figure header styling used in other vivainsights visuals.
Returns either a plot (a matplotlib Figure) or a table (a long-format DataFrame), selected via return_type.
The typical workflow starts with create_survival_prep() to convert panel data into the person-level format expected here.
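For readers unfamiliar with the estimator behind these goals, the following is a minimal pure-Python sketch of the Kaplan-Meier product-limit calculation. It is not the package's fallback code; the function name kaplan_meier_sketch and the toy durations are assumptions made for illustration only.

```python
from collections import Counter


def kaplan_meier_sketch(durations, events):
    """Return (time, at_risk, events, survival) rows, one per distinct time.

    Hypothetical helper for illustration; not part of vivainsights.
    """
    deaths = Counter()   # events observed at each time
    leaving = Counter()  # subjects leaving the risk set (event or censored)
    for t, e in zip(durations, events):
        leaving[t] += 1
        if e:
            deaths[t] += 1

    at_risk = len(durations)
    survival = 1.0
    rows = []
    for t in sorted(leaving):
        d = deaths[t]
        # Product-limit step: S(t) = S(previous) * (1 - d / n_at_risk)
        survival *= 1.0 - d / at_risk
        rows.append((t, at_risk, d, survival))
        # Everyone observed at t (event or censored) leaves the risk set.
        at_risk -= leaving[t]
    return rows


# Toy data: five subjects, event=1 means the event occurred, 0 means censored.
rows = kaplan_meier_sketch(
    durations=[1, 2, 2, 3, 4],
    events=[1, 1, 0, 1, 0],
)
for row in rows:
    print(row)
```

Censored subjects reduce the at-risk count without lowering the survival estimate, which is why the last row keeps the survival value from the previous step.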
Example
Single overall curve (no grouping):
>>> import vivainsights as vi
>>> from vivainsights.create_survival import create_survival
>>> from vivainsights.create_survival_prep import create_survival_prep
>>>
>>> pq_data = vi.load_pq_data()
>>> surv_data = create_survival_prep(
...     data=pq_data,
...     metric="Copilot_actions_taken_in_Teams",
... )
>>> fig = create_survival(
...     data=surv_data,
...     time_col="time",
...     event_col="event",
... )
Grouped by HR attribute:
>>> fig = create_survival(
...     data=surv_data,
...     time_col="time",
...     event_col="event",
...     hrvar="Organization",
... )
Table output:
>>> tbl = create_survival(
...     data=surv_data,
...     time_col="time",
...     event_col="event",
...     hrvar="Organization",
...     return_type="table",
... )
- vivainsights.create_survival.create_survival_calc(data, time_col, event_col, hrvar=None, id_col='PersonId', mingroup=5, timeline=None, dropna=True, use_lifelines=True)
Name
create_survival_calc
Description
Compute Kaplan-Meier survival curves per group (segment, org, etc.). Uses lifelines.KaplanMeierFitter when available (and use_lifelines=True), otherwise falls back to a simple NumPy implementation.
The event_col is coerced to integer 0/1 via _coerce_event, which accepts numeric (>0 = event), boolean, or string tokens (“true”/“yes”/“1”).
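The coercion rules described above can be sketched in plain Python as follows. This is an illustration of the stated rules, not the source of _coerce_event; the name coerce_event_sketch, the case-insensitive matching, and the treatment of unrecognized values as 0 are all assumptions.

```python
# Tokens the documentation lists as meaning "event occurred".
TRUE_TOKENS = {"true", "yes", "1"}


def coerce_event_sketch(value):
    """Map one raw event value to 0 or 1 (hypothetical helper)."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return int(value)
    if isinstance(value, (int, float)):
        return int(value > 0)  # numeric: > 0 means event
    if isinstance(value, str):
        # Assumption: matching ignores case and surrounding whitespace.
        return int(value.strip().lower() in TRUE_TOKENS)
    # Assumption: anything else (e.g. None) counts as "no event".
    return 0


raw = [True, 0, 3, "Yes", "no", "1", "0"]
coerced = [coerce_event_sketch(v) for v in raw]
print(coerced)  # [1, 0, 1, 1, 0, 1, 0]
```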
- param data:
Person-level data frame (one row per subject), as produced by create_survival_prep(), containing time_col, event_col, and optionally hrvar.
- type data:
pd.DataFrame
- param time_col:
Column containing durations to event or censoring (numeric, e.g., weeks).
- type time_col:
str
- param event_col:
Event indicator column. Accepts numeric (>0 = event), boolean, or string tokens (“true”/“yes”/“1”, “false”/“no”/“0”).
- type event_col:
str
- param hrvar:
HR attribute column for grouping. If None, a single overall curve is returned.
- type hrvar:
str or None, default None
- param id_col:
Unique subject identifier used for mingroup counting. If None or not present, the row count per group is used instead.
- type id_col:
str or None, default “PersonId”
- param mingroup:
Minimum unique subjects required per group; groups with fewer are dropped.
- type mingroup:
int, default 5
- param timeline:
Common set of times at which to report survival. If None, per-group unique times are used.
- type timeline:
sequence of float, optional
- param dropna:
Drop rows with NA in required columns before computing curves.
- type dropna:
bool, default True
- param use_lifelines:
If True and lifelines is available, use KaplanMeierFitter; otherwise, use NumPy.
- type use_lifelines:
bool, default True
- returns:
survival_long (pd.DataFrame) – Long-format table with columns [hrvar (or “group” when ungrouped), “time”, “survival”, “at_risk”, “events”].
counts (pd.Series) – Number of unique subjects per group (after filtering).
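Two of the parameters above, mingroup and timeline, can be illustrated with a short pure-Python sketch. Both helpers are hypothetical and only mirror the documented behavior: groups with fewer unique subjects than mingroup are dropped, and a common timeline reads each step curve at fixed times. Treating survival as 1.0 before the first observed time is an assumption of this sketch.

```python
from bisect import bisect_right
from collections import defaultdict


def groups_meeting_mingroup(rows, mingroup=5):
    """rows: iterable of (group, person_id) pairs.

    Returns {group: unique subject count} for groups that meet the threshold.
    """
    ids_by_group = defaultdict(set)
    for group, person_id in rows:
        ids_by_group[group].add(person_id)
    return {
        g: len(ids) for g, ids in ids_by_group.items() if len(ids) >= mingroup
    }


def survival_at(timeline, curve_times, curve_survival):
    """Evaluate a right-continuous step curve at each time in timeline.

    curve_times must be sorted ascending and aligned with curve_survival.
    """
    out = []
    for t in timeline:
        # Index of the last curve time <= t; -1 means no step has occurred yet.
        i = bisect_right(curve_times, t) - 1
        out.append(1.0 if i < 0 else curve_survival[i])
    return out


# Group "A" has two unique subjects (id 2 appears twice), "B" has one.
kept = groups_meeting_mingroup(
    [("A", 1), ("A", 2), ("A", 2), ("B", 3)], mingroup=2
)
print(kept)  # {'A': 2}

values = survival_at([0, 1, 2.5, 10], [1, 2, 3], [0.8, 0.6, 0.3])
print(values)  # [1.0, 0.8, 0.6, 0.3]
```

Reporting every group on the same timeline makes the rows of the long table directly comparable across groups, at the cost of repeating values between steps.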
- vivainsights.create_survival.create_survival_viz(data, hrvar, figsize=(8, 6), title=None, subtitle=None, caption=None, linewidth=2.0)
Name
create_survival_viz
Description
Render Kaplan-Meier survival step curves for each group in data.
- param data:
Output of create_survival_calc, with at least [hrvar, “time”, “survival”].
- type data:
pd.DataFrame
- param hrvar:
Column name identifying the groups to plot.
- type hrvar:
str
- param figsize:
Matplotlib figure size in inches (width, height).
- type figsize:
tuple of float, default (8, 6)
- param title:
Figure-level title.
- type title:
str, optional
- param subtitle:
Smaller line beneath the title.
- type subtitle:
str, optional
- param caption:
Small text near the bottom of the figure (e.g., date range).
- type caption:
str, optional
- param linewidth:
Line width for the step curves.
- type linewidth:
float, default 2.0
- returns:
fig – The constructed matplotlib Figure.
- rtype:
matplotlib.figure.Figure
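As a rough picture of what a step-curve rendering involves, the sketch below draws one step line per group with plain matplotlib. It assumes matplotlib is installed, uses made-up group names and survival values, and does not reproduce the vivainsights figure header styling or the internals of create_survival_viz.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Hypothetical long-format rows: (group, time, survival)
long_rows = [
    ("Finance", 1, 0.9), ("Finance", 2, 0.7), ("Finance", 4, 0.5),
    ("Sales", 1, 0.8), ("Sales", 3, 0.6), ("Sales", 4, 0.4),
]

fig, ax = plt.subplots(figsize=(8, 6))
for group in sorted({g for g, _, _ in long_rows}):
    # Prepend (0, 1.0) so each curve starts at full survival.
    times = [0] + [t for g, t, _ in long_rows if g == group]
    surv = [1.0] + [s for g, _, s in long_rows if g == group]
    # where="post" holds each value until the next time point,
    # which gives the usual Kaplan-Meier staircase shape.
    ax.step(times, surv, where="post", linewidth=2.0, label=group)

ax.set_ylim(0, 1.05)
ax.set_xlabel("time")
ax.set_ylabel("survival")
ax.legend()
fig.suptitle("Survival by group")
```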
- vivainsights.create_survival.create_survival(data, time_col, event_col, hrvar=None, id_col='PersonId', mingroup=5, timeline=None, dropna=True, use_lifelines=True, return_type='plot', figsize=(8, 6), title=None, subtitle=None, caption=None)
Name
create_survival
Description
High-level convenience wrapper to compute Kaplan-Meier curves and either:
- return the long survival table (return_type=“table”), or
- render the survival plot (return_type=“plot”).
The input data should be a person-level data frame (one row per person) as produced by create_survival_prep().
- param data:
Person-level data frame (one row per person), as produced by create_survival_prep(), containing at least time_col and event_col.
- type data:
pd.DataFrame
- param time_col:
Duration-to-event column.
- type time_col:
str
- param event_col:
Event indicator column. Accepts numeric (>0 = event), boolean, or string tokens (“true”/“yes”/“1”, “false”/“no”/“0”).
- type event_col:
str
- param hrvar:
HR attribute column for separate survival curves. If None, a single overall curve is returned.
- type hrvar:
str or None, default None
- param id_col:
Unique subject identifier for mingroup counting.
- type id_col:
str, default “PersonId”
- param mingroup:
Minimum number of unique subjects per group.
- type mingroup:
int, default 5
- param timeline:
Times at which to report survival.
- type timeline:
sequence of float, optional
- param dropna:
Drop rows with NAs in required columns prior to calculation.
- type dropna:
bool, default True
- param use_lifelines:
Use lifelines.KaplanMeierFitter when available.
- type use_lifelines:
bool, default True
- param return_type:
“plot”: return a matplotlib Figure.
“table”: return the survival-long DataFrame.
- type return_type:
{“plot”, “table”}, default “plot”
- param figsize:
Figure size in inches (only used when return_type=“plot”).
- type figsize:
tuple of float, default (8, 6)
- param title:
Plot title. If None, a default is used.
- type title:
str, optional
- param subtitle:
Optional subtitle beneath the title.
- type subtitle:
str, optional
- param caption:
Caption text shown at the bottom of the figure. Note: the typical input (output of create_survival_prep) contains no date column, so date ranges cannot be extracted automatically. Pass the date range string manually if needed, e.g. via vi.extract_date_range(raw_data).
- type caption:
str, optional
- returns:
If return_type=“plot”: a Figure containing the survival curves.
If return_type=“table”: the long survival table.
- rtype:
matplotlib.figure.Figure or pd.DataFrame
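One common use of the table output is reading a median survival time per group, that is, the first time at which the estimated survival falls to 0.5 or below. The sketch below works on plain (time, survival) pairs rather than a DataFrame; the helper name and the sample values are hypothetical, and create_survival itself is not documented here as reporting medians.

```python
def median_survival_time(rows):
    """rows: (time, survival) pairs for one group.

    Returns the first time with survival <= 0.5, or None if the curve
    never drops that far (the median is then not reached).
    """
    for time, survival in sorted(rows):
        if survival <= 0.5:
            return time
    return None


# Hypothetical per-group slices of a long survival table.
table = {
    "Finance": [(1, 0.9), (2, 0.7), (4, 0.5)],
    "Sales": [(1, 0.8), (3, 0.6), (4, 0.6)],
}
medians = {group: median_survival_time(rows) for group, rows in table.items()}
print(medians)  # {'Finance': 4, 'Sales': None}
```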