vivainsights.create_survival

create_survival: Parameterized Kaplan-Meier survival workflow (calc + viz + wrapper).

Design goals

  • General-purpose: works with any HR attribute column (segments, org, region, etc.).

  • Uses lifelines.KaplanMeierFitter if available; falls back to a NumPy implementation.

  • Reuses the figure header styling used in other vivainsights visuals.

  • Returns either a plot (matplotlib Figure) or a long-format table (pd.DataFrame), selected via return_type.

The typical workflow starts with create_survival_prep() to convert panel data into the person-level format expected here.

Example

Single overall curve (no grouping):

>>> import vivainsights as vi
>>> from vivainsights.create_survival import create_survival
>>> from vivainsights.create_survival_prep import create_survival_prep
>>>
>>> pq_data = vi.load_pq_data()
>>> surv_data = create_survival_prep(
...     data=pq_data,
...     metric="Copilot_actions_taken_in_Teams",
... )
>>> fig = create_survival(
...     data=surv_data,
...     time_col="time",
...     event_col="event",
... )

Grouped by HR attribute:

>>> fig = create_survival(
...     data=surv_data,
...     time_col="time",
...     event_col="event",
...     hrvar="Organization",
... )

Table output:

>>> tbl = create_survival(
...     data=surv_data,
...     time_col="time",
...     event_col="event",
...     hrvar="Organization",
...     return_type="table",
... )
vivainsights.create_survival.create_survival_calc(data, time_col, event_col, hrvar=None, id_col='PersonId', mingroup=5, timeline=None, dropna=True, use_lifelines=True)[source]

Name

create_survival_calc

Description

Compute Kaplan-Meier survival curves per group (segment, org, etc.). Uses lifelines.KaplanMeierFitter when available (and use_lifelines=True), otherwise falls back to a simple NumPy implementation.
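
The NumPy fallback itself is not reproduced on this page. As a rough illustration of what a Kaplan-Meier product-limit computation can look like in plain NumPy, here is a minimal sketch. The helper name km_numpy and its return layout are assumptions made for this example and are not the library's internal code.

```python
import numpy as np


def km_numpy(durations, events):
    """Kaplan-Meier product-limit estimate (hypothetical helper for illustration)."""
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(events, dtype=int)

    # Sorted distinct observation times (event or censoring).
    times = np.unique(durations)
    # Subjects still at risk just before each time: duration >= t.
    at_risk = np.array([(durations >= t).sum() for t in times])
    # Events observed exactly at each time (censored rows do not count).
    n_events = np.array([((durations == t) & (events == 1)).sum() for t in times])
    # S(t) = product over t_i <= t of (1 - d_i / n_i).
    survival = np.cumprod(1.0 - n_events / at_risk)
    return times, survival, at_risk, n_events


# Toy input: five subjects, two of them censored (event = 0).
times, surv, at_risk, n_events = km_numpy([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

Censored subjects lower the at-risk count at later times without contributing an event, which is why the curve stays flat at the last (censored) time point.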

The event_col is coerced to integer 0/1 via _coerce_event, which accepts numeric (>0 = event), boolean, or string tokens (“true”/“yes”/“1”).
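
The coercion rules described above can be sketched roughly as follows. This is not the source of _coerce_event; the function name coerce_event_sketch is hypothetical, and the case-insensitive matching and the mapping of unrecognized strings to 0 are assumptions of this sketch.

```python
import pandas as pd

# Truthy string tokens named in the documentation.
TRUE_TOKENS = {"true", "yes", "1"}


def coerce_event_sketch(series):
    """Map an event column to integer 0/1 (hypothetical stand-in for illustration)."""
    # Booleans: True -> 1, False -> 0. Checked first because bool also counts as numeric.
    if pd.api.types.is_bool_dtype(series):
        return series.astype(int)
    # Numerics: any value greater than zero is treated as an event.
    if pd.api.types.is_numeric_dtype(series):
        return (series > 0).astype(int)
    # Strings: normalize whitespace and case (an assumption), then match truthy tokens.
    tokens = series.astype(str).str.strip().str.lower()
    return tokens.isin(TRUE_TOKENS).astype(int)
```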

param data:

Person-level data frame (one row per subject), as produced by create_survival_prep(), containing time_col, event_col, and optionally hrvar.

type data:

pd.DataFrame

param time_col:

Column containing durations to event or censoring (numeric, e.g., weeks).

type time_col:

str

param event_col:

Event indicator column. Accepts numeric (>0 = event), boolean, or string tokens (“true”/“yes”/“1”, “false”/“no”/“0”).

type event_col:

str

param hrvar:

HR attribute column for grouping. If None, a single overall curve is returned.

type hrvar:

str or None, default None

param id_col:

Unique subject identifier used for mingroup counting. If None or not present, the row count per group is used instead.

type id_col:

str or None, default “PersonId”

param mingroup:

Minimum unique subjects required per group; groups with fewer are dropped.

type mingroup:

int, default 5

param timeline:

Common set of times at which to report survival. If None, per-group unique times are used.

type timeline:

sequence of float, optional

param dropna:

Drop rows with NA in required columns before computing curves.

type dropna:

bool, default True

param use_lifelines:

If True and lifelines is available, use KaplanMeierFitter; otherwise, use NumPy.

type use_lifelines:

bool, default True

returns:
  • survival_long (pd.DataFrame) – Long-format table with columns [hrvar (or "group" when ungrouped), "time", "survival", "at_risk", "events"].

  • counts (pd.Series) – Number of unique subjects per group (after filtering).
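
Two of the parameters above, mingroup and timeline, can be illustrated with a small sketch. Both helper names (filter_small_groups, survival_on_timeline) are hypothetical, and the treatment of times before the first observation (survival = 1.0) is an assumption of this sketch, not a statement about the library's behavior.

```python
import numpy as np
import pandas as pd


def filter_small_groups(df, hrvar, id_col="PersonId", mingroup=5):
    """Keep groups with at least `mingroup` unique subjects (illustrative helper)."""
    counts = df.groupby(hrvar)[id_col].nunique()
    keep = counts[counts >= mingroup].index
    return df[df[hrvar].isin(keep)], counts.loc[keep]


def survival_on_timeline(times, survival, timeline):
    """Evaluate a right-continuous survival step function at the requested times."""
    times = np.asarray(times, dtype=float)
    survival = np.asarray(survival, dtype=float)
    # Index of the last observed time <= each requested time; -1 means "before the first".
    idx = np.searchsorted(times, timeline, side="right") - 1
    # Before the first observed time nobody has had the event yet, so survival is 1.0.
    return np.where(idx >= 0, survival[np.clip(idx, 0, None)], 1.0)


# Toy data: group "A" has five subjects, group "B" only two.
toy = pd.DataFrame({
    "PersonId": list("abcdefg"),
    "Organization": ["A"] * 5 + ["B"] * 2,
})
kept, counts = filter_small_groups(toy, "Organization", mingroup=5)

# Report a three-step curve on a common timeline.
aligned = survival_on_timeline([1, 2, 3], [0.8, 0.6, 0.3], [0, 1, 2.5, 10])
```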

vivainsights.create_survival.create_survival_viz(data, hrvar, figsize=(8, 6), title=None, subtitle=None, caption=None, linewidth=2.0)[source]

Name

create_survival_viz

Description

Render Kaplan-Meier survival step curves for each group in data.

param data:

Output of create_survival_calc, with at least [hrvar, “time”, “survival”].

type data:

pd.DataFrame

param hrvar:

Column name identifying the groups to plot.

type hrvar:

str

param figsize:

Matplotlib figure size in inches (width, height).

type figsize:

tuple of float, default (8, 6)

param title:

Figure-level title.

type title:

str, optional

param subtitle:

Smaller line beneath the title.

type subtitle:

str, optional

param caption:

Small text near the bottom of the figure (e.g., date range).

type caption:

str, optional

param linewidth:

Line width for the step curves.

type linewidth:

float, default 2.0

returns:

fig – The constructed matplotlib Figure.

rtype:

matplotlib.figure.Figure
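
For readers who want to see how step curves of this kind can be drawn, here is a minimal matplotlib sketch. It does not reproduce the vivainsights header styling; the function name plot_survival_sketch, the axis labels, and the toy table are all assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_survival_sketch(data, hrvar, figsize=(8, 6), linewidth=2.0):
    """Draw one survival step curve per group (illustrative, unstyled)."""
    fig, ax = plt.subplots(figsize=figsize)
    for name, grp in data.groupby(hrvar):
        grp = grp.sort_values("time")
        # where="post" holds each survival value until the next observed time.
        ax.step(grp["time"], grp["survival"], where="post",
                linewidth=linewidth, label=str(name))
    ax.set_xlabel("Time")
    ax.set_ylabel("Survival probability")
    ax.set_ylim(0, 1.05)
    ax.legend(title=hrvar)
    return fig


# Made-up table with the documented minimum columns [hrvar, "time", "survival"].
toy = pd.DataFrame({
    "Organization": ["A", "A", "A", "B", "B"],
    "time": [1, 2, 3, 1, 2],
    "survival": [0.8, 0.5, 0.2, 0.9, 0.7],
})
fig = plot_survival_sketch(toy, "Organization")
```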

vivainsights.create_survival.create_survival(data, time_col, event_col, hrvar=None, id_col='PersonId', mingroup=5, timeline=None, dropna=True, use_lifelines=True, return_type='plot', figsize=(8, 6), title=None, subtitle=None, caption=None)[source]

Name

create_survival

Description

High-level convenience wrapper to compute Kaplan-Meier curves and either:
  1. return the long survival table (return_type=“table”), or

  2. render the survival plot (return_type=“plot”).

The input data should be a person-level data frame (one row per person) as produced by create_survival_prep().

param data:

Person-level data frame (one row per person), as produced by create_survival_prep(), containing at least time_col and event_col.

type data:

pd.DataFrame

param time_col:

Duration-to-event column.

type time_col:

str

param event_col:

Event indicator column. Accepts numeric (>0 = event), boolean, or string tokens (“true”/“yes”/“1”, “false”/“no”/“0”).

type event_col:

str

param hrvar:

HR attribute column used to draw separate survival curves. If None, a single overall curve is produced.

type hrvar:

str or None, default None

param id_col:

Unique subject identifier for mingroup counting.

type id_col:

str, default “PersonId”

param mingroup:

Minimum number of unique subjects per group.

type mingroup:

int, default 5

param timeline:

Times at which to report survival.

type timeline:

sequence of float, optional

param dropna:

Drop rows with NAs in required columns prior to calculation.

type dropna:

bool, default True

param use_lifelines:

Use lifelines.KaplanMeierFitter when available.

type use_lifelines:

bool, default True

param return_type:
  • “plot”: return a matplotlib Figure.

  • “table”: return the survival-long DataFrame.

type return_type:

{“plot”, “table”}, default “plot”

param figsize:

Figure size in inches (only used when return_type=“plot”).

type figsize:

tuple of float, default (8, 6)

param title:

Plot title. If None, a default is used.

type title:

str, optional

param subtitle:

Optional subtitle beneath the title.

type subtitle:

str, optional

param caption:

Caption text shown at the bottom of the figure. Note: the typical input (output of create_survival_prep) contains no date column, so date ranges cannot be extracted automatically. Pass the date range string manually if needed, e.g. via vi.extract_date_range(raw_data).

type caption:

str, optional

returns:
  • If return_type=“plot”: a Figure containing the survival curves.

  • If return_type=“table”: the long survival table.

rtype:

matplotlib.figure.Figure or pd.DataFrame
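
One possible use of the table output is to read off a median survival time per group, that is, the first time at which survival falls to 0.5 or below. The sketch below assumes a table with the documented columns [hrvar, "time", "survival"]; the helper name median_survival is hypothetical, and the toy table is made up for illustration rather than taken from real output.

```python
import pandas as pd


def median_survival(tbl, hrvar):
    """First time at which each group's survival is <= 0.5 (NaN if never reached)."""
    out = {}
    for name, grp in tbl.groupby(hrvar):
        grp = grp.sort_values("time")
        crossed = grp.loc[grp["survival"] <= 0.5, "time"]
        out[name] = crossed.iloc[0] if len(crossed) else float("nan")
    return pd.Series(out, name="median_time")


# Made-up long-format table; group "B" never drops to 0.5.
toy_tbl = pd.DataFrame({
    "Organization": ["A", "A", "A", "B", "B"],
    "time": [1, 2, 3, 1, 2],
    "survival": [0.8, 0.5, 0.2, 0.9, 0.7],
})
medians = median_survival(toy_tbl, "Organization")
```

A NaN result means the curve never reached 0.5 within the observed window, which is common when most subjects are censored.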