vivainsights.create_survival

create_survival: Parameterized Kaplan-Meier survival workflow (calc + viz + wrapper).

Design goals

General-purpose: works with any HR attribute column (segments, org, region, etc.).
Uses lifelines.KaplanMeierFitter if available; falls back to a NumPy implementation.
Reuses the figure header styling used in other vivainsights visuals.
Returns either a plot (a matplotlib Figure) or a long-format table (a pandas DataFrame), selected with return_type.
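The optional-dependency behavior in the design goals can be illustrated with a small backend check. This is a minimal sketch, not the module's actual code: pick_km_backend is a hypothetical helper introduced here for illustration only.

```python
import importlib.util


def pick_km_backend(use_lifelines=True):
    """Return "lifelines" if requested and installed, else "numpy" (hypothetical helper)."""
    # find_spec returns None when the package cannot be found.
    has_lifelines = importlib.util.find_spec("lifelines") is not None
    return "lifelines" if (use_lifelines and has_lifelines) else "numpy"


backend = pick_km_backend()
```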

The typical workflow starts with create_survival_prep() to convert panel data into the person-level format expected here.

Example

Single overall curve (no grouping):

>>> import vivainsights as vi
>>> from vivainsights.create_survival import create_survival
>>> from vivainsights.create_survival_prep import create_survival_prep
>>>
>>> pq_data = vi.load_pq_data()
>>> surv_data = create_survival_prep(
...     data=pq_data,
...     metric="Copilot_actions_taken_in_Teams",
... )
>>> fig = create_survival(
...     data=surv_data,
...     time_col="time",
...     event_col="event",
... )

Grouped by HR attribute:

>>> fig = create_survival(
...     data=surv_data,
...     time_col="time",
...     event_col="event",
...     hrvar="Organization",
... )

Table output:

>>> tbl = create_survival(
...     data=surv_data,
...     time_col="time",
...     event_col="event",
...     hrvar="Organization",
...     return_type="table",
... )

vivainsights.create_survival.create_survival_calc(data, time_col, event_col, hrvar=None, id_col='PersonId', mingroup=5, timeline=None, dropna=True, use_lifelines=True)

Name

create_survival_calc

Description

Compute Kaplan-Meier survival curves per group (segment, org, etc.). Uses lifelines.KaplanMeierFitter when available (and use_lifelines=True), otherwise falls back to a simple NumPy implementation.
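For readers unfamiliar with the estimator, the product-limit calculation can be written in a few lines of NumPy. The sketch below is illustrative and is not claimed to match the package's fallback implementation; km_numpy and its return layout are assumptions made for this example.

```python
import numpy as np


def km_numpy(times, events):
    """Kaplan-Meier estimate: returns (unique_times, survival, at_risk, n_events)."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    uniq = np.unique(times)  # sorted distinct durations
    # Subjects still under observation just before each time point.
    at_risk = np.array([(times >= t).sum() for t in uniq])
    # Events (not censorings) observed exactly at each time point.
    n_events = np.array([((times == t) & (events == 1)).sum() for t in uniq])
    # Product-limit estimator: S(t) = product over t_i <= t of (1 - d_i / n_i).
    survival = np.cumprod(1.0 - n_events / at_risk)
    return uniq, survival, at_risk, n_events


# Five subjects; event=0 marks a censored observation.
t, s, n, d = km_numpy([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

Censored subjects still count toward at_risk up to their censoring time, but never toward n_events, which is why the curve stays flat at the last time point in this example.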

The event_col is coerced to integer 0/1 via _coerce_event, which accepts numeric (>0 = event), boolean, or string tokens ("true"/"yes"/"1").
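The coercion rules above might be approximated as follows. This is a sketch under assumptions, not the source of _coerce_event: coerce_event is a hypothetical stand-in, and treating unrecognized strings and missing values as 0 is a choice made here for illustration.

```python
import pandas as pd

TRUE_TOKENS = {"true", "yes", "1"}


def coerce_event(series):
    """Map an event indicator column to integer 0/1 (hypothetical helper)."""
    if pd.api.types.is_bool_dtype(series):
        return series.astype(int)
    if pd.api.types.is_numeric_dtype(series):
        # Any positive value counts as an event.
        return (series > 0).astype(int)
    # Strings: compare case-insensitively against the accepted tokens.
    # Assumption: anything outside TRUE_TOKENS is treated as 0 (no event).
    tokens = series.astype(str).str.strip().str.lower()
    return tokens.isin(TRUE_TOKENS).astype(int)


numeric = coerce_event(pd.Series([0, 2, 1]))
text = coerce_event(pd.Series(["Yes", "no", "TRUE", "0"]))
```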

param data:

Person-level data frame (one row per subject), as produced by create_survival_prep(), containing time_col, event_col, and optionally hrvar.

type data:

pd.DataFrame

param time_col:

Column containing durations to event or censoring (numeric, e.g., weeks).

type time_col:

str

param event_col:

Event indicator column. Accepts numeric (>0 = event), boolean, or string tokens ("true"/"yes"/"1", "false"/"no"/"0").

type event_col:

str

param hrvar:

HR attribute column for grouping. If None, a single overall curve is returned.

type hrvar:

str or None, default None

param id_col:

Unique subject identifier used for mingroup counting. If None or not present, the row count per group is used instead.

type id_col:

str or None, default "PersonId"

param mingroup:

Minimum unique subjects required per group; groups with fewer are dropped.

type mingroup:

int, default 5
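The interaction between id_col and mingroup can be sketched with plain pandas. The helper name filter_small_groups and the demo data are hypothetical; the logic only mirrors the documented behavior (count unique subjects, fall back to row counts, drop groups below the threshold).

```python
import pandas as pd


def filter_small_groups(df, hrvar, id_col="PersonId", mingroup=5):
    """Keep only groups with at least `mingroup` unique subjects (hypothetical helper)."""
    if id_col is not None and id_col in df.columns:
        counts = df.groupby(hrvar)[id_col].nunique()
    else:
        # Fall back to row counts when no identifier column is usable.
        counts = df.groupby(hrvar).size()
    counts = counts[counts >= mingroup]
    return df[df[hrvar].isin(counts.index)], counts


demo = pd.DataFrame({
    "PersonId": list("abcdefgh"),
    "Organization": ["Sales"] * 6 + ["HR"] * 2,
})
kept, counts = filter_small_groups(demo, "Organization", mingroup=5)
```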

param timeline:

Common set of times at which to report survival. If None, per-group unique times are used.

type timeline:

sequence of float, optional
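Reporting survival on a common timeline amounts to evaluating each group's step function at the requested times. The following is one way to do that with np.searchsorted; survival_at is a hypothetical helper, and the package may handle this differently.

```python
import numpy as np


def survival_at(timeline, event_times, survival):
    """Evaluate a right-continuous KM step function on a common timeline."""
    timeline = np.asarray(timeline, dtype=float)
    event_times = np.asarray(event_times, dtype=float)  # must be sorted ascending
    survival = np.asarray(survival, dtype=float)
    # Index of the last observed time <= each requested time (-1 if none).
    idx = np.searchsorted(event_times, timeline, side="right") - 1
    # Before the first observed time, survival is 1.0 by definition.
    return np.where(idx >= 0, survival[np.clip(idx, 0, None)], 1.0)


on_grid = survival_at([0, 1, 2.5, 10], event_times=[1, 2, 3], survival=[0.8, 0.6, 0.3])
```

A shared timeline makes curves from different groups directly comparable row by row, which per-group unique times do not guarantee.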

param dropna:

Drop rows with NA in required columns before computing curves.

type dropna:

bool, default True

param use_lifelines:

If True and lifelines is available, use KaplanMeierFitter; otherwise, use NumPy.

type use_lifelines:

bool, default True

returns:

survival_long (pd.DataFrame) – Long-format table with columns [hrvar (or "group" when ungrouped), "time", "survival", "at_risk", "events"].
counts (pd.Series) – Number of unique subjects per group (after filtering).
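As an illustration of working with the long-format table, the sketch below derives a median survival time per group from the documented "time" and "survival" columns. The helper median_survival and the demo values are made up for this example and are not part of vivainsights.

```python
import pandas as pd


def median_survival(survival_long, hrvar="group"):
    """First time at which each group's survival drops to 0.5 or below."""
    below = survival_long[survival_long["survival"] <= 0.5]
    # Groups that never reach 0.5 are absent from `below`; reindex adds NaN.
    groups = survival_long[hrvar].unique()
    return below.groupby(hrvar)["time"].min().reindex(groups)


demo = pd.DataFrame({
    "Organization": ["A", "A", "A", "B", "B"],
    "time": [1, 2, 3, 1, 2],
    "survival": [0.9, 0.5, 0.2, 0.95, 0.8],
})
medians = median_survival(demo, hrvar="Organization")
```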

vivainsights.create_survival.create_survival_viz(data, hrvar, figsize=(8, 6), title=None, subtitle=None, caption=None, linewidth=2.0)

Name

create_survival_viz

Description

Render Kaplan-Meier survival step curves for each group in data.

param data:

Output of create_survival_calc, with at least [hrvar, "time", "survival"].

type data:

pd.DataFrame

param hrvar:

Column name identifying the groups to plot.

type hrvar:

str

param figsize:

Matplotlib figure size in inches (width, height).

type figsize:

tuple of float, default (8, 6)

param title:

Figure-level title.

type title:

str, optional

param subtitle:

Smaller line beneath the title.

type subtitle:

str, optional

param caption:

Small text near the bottom of the figure (e.g., date range).

type caption:

str, optional

param linewidth:

Line width for the step curves.

type linewidth:

float, default 2.0

returns:

fig – The constructed matplotlib Figure.

rtype:

matplotlib.figure.Figure
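To show what "step curves" means in practice, here is a bare-bones matplotlib sketch that draws one step line per group from a long-format table. It omits the vivainsights header styling, and plot_km_steps, the axis labels, and the demo data are assumptions for illustration rather than the behavior of create_survival_viz.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd


def plot_km_steps(survival_long, hrvar, figsize=(8, 6), linewidth=2.0):
    """Draw one survival step curve per group (hypothetical helper)."""
    fig, ax = plt.subplots(figsize=figsize)
    for name, grp in survival_long.groupby(hrvar):
        grp = grp.sort_values("time")
        # where="post": survival holds its value until the next event time.
        ax.step(grp["time"], grp["survival"], where="post",
                linewidth=linewidth, label=str(name))
    ax.set_ylim(0, 1.05)
    ax.set_xlabel("Time")
    ax.set_ylabel("Survival probability")
    ax.legend(title=hrvar)
    return fig


demo = pd.DataFrame({
    "Organization": ["A", "A", "A", "B", "B"],
    "time": [1, 2, 3, 1, 2],
    "survival": [0.9, 0.5, 0.2, 0.95, 0.8],
})
fig = plot_km_steps(demo, hrvar="Organization")
```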

vivainsights.create_survival.create_survival(data, time_col, event_col, hrvar=None, id_col='PersonId', mingroup=5, timeline=None, dropna=True, use_lifelines=True, return_type='plot', figsize=(8, 6), title=None, subtitle=None, caption=None)

Name

create_survival

Description

High-level convenience wrapper to compute Kaplan-Meier curves and either:

return the long survival table (return_type="table"), or
render the survival plot (return_type="plot").

The input data should be a person-level data frame (one row per person) as produced by create_survival_prep().

param data:

Person-level data frame (one row per person), as produced by create_survival_prep(), containing at least time_col and event_col.

type data:

pd.DataFrame

param time_col:

Duration-to-event column.

type time_col:

str

param event_col:

Event indicator column. Accepts numeric (>0 = event), boolean, or string tokens ("true"/"yes"/"1", "false"/"no"/"0").

type event_col:

str

param hrvar:

HR attribute column for separate survival curves. If None, a single overall curve is returned.

type hrvar:

str or None, default None

param id_col:

Unique subject identifier for mingroup counting.

type id_col:

str, default "PersonId"

param mingroup:

Minimum number of unique subjects per group.

type mingroup:

int, default 5

param timeline:

Times at which to report survival.

type timeline:

sequence of float, optional

param dropna:

Drop rows with NAs in required columns prior to calculation.

type dropna:

bool, default True

param use_lifelines:

Use lifelines.KaplanMeierFitter when available.

type use_lifelines:

bool, default True

param return_type:

"plot": return a matplotlib Figure.
"table": return the survival-long DataFrame.

type return_type:

{"plot", "table"}, default "plot"

param figsize:

Figure size in inches (only used when return_type="plot").

type figsize:

tuple of float, default (8, 6)

param title:

Plot title. If None, a default is used.

type title:

str, optional

param subtitle:

Optional subtitle beneath the title.

type subtitle:

str, optional

param caption:

Caption text shown at the bottom of the figure. Note: the typical input (output of create_survival_prep) contains no date column, so date ranges cannot be extracted automatically. Pass the date range string manually if needed, e.g. via vi.extract_date_range(raw_data).

type caption:

str, optional
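Because the person-level input carries no dates, a caption string has to be built from the original panel data. One possible approach, assuming the raw panel has a date column named "MetricDate" (an assumption here, as are the helper name and the caption wording), is:

```python
import pandas as pd


def date_range_caption(panel, date_col="MetricDate"):
    """Build a caption string from the min and max dates in the raw panel (hypothetical helper)."""
    dates = pd.to_datetime(panel[date_col])
    return f"Data from {dates.min():%Y-%m-%d} to {dates.max():%Y-%m-%d}"


raw = pd.DataFrame({"MetricDate": ["2024-01-07", "2024-03-31", "2024-02-04"]})
caption = date_range_caption(raw)
```

The resulting string can then be passed as caption=caption when calling create_survival.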

returns:

If return_type="plot": a Figure containing the survival curves.
If return_type="table": the long survival table.

rtype:

matplotlib.figure.Figure or pd.DataFrame