Reference - Functions¶

Submodules¶

vivainsights.check_inputs module¶

The function check_inputs checks if the required variables are present in the given data and raises an error if any of them are missing.

vivainsights.check_inputs.check_inputs(data: DataFrame, requirements: str)[source]¶

Name¶

check_inputs

Description¶

The function check_inputs checks if each variable in the requirements list is present as a column in the data object. If any variable is missing, it raises an error.

param data:: The data parameter is expected to be a pandas DataFrame object that contains the data to be checked.
type data:: pandas dataframe
param requirements:: The requirements parameter is a list of variables that are required to be present in the data object.
type requirements:: list of str

Example

>>> check_inputs(iris, ["Sepal.Length", "Sepal.Width", "RandomVariable"])
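The check described above can be sketched in plain Python, without pandas. This is an illustration of the logic only, not the library's source: `find_missing` and `check_columns` are hypothetical helper names, and raising `ValueError` is an assumption, since the reference does not state the exception type.

```python
def find_missing(columns, requirements):
    """Return the required names that are absent from `columns`, in order."""
    available = set(columns)
    return [name for name in requirements if name not in available]


def check_columns(columns, requirements):
    """Raise an error listing every required column that is missing."""
    missing = find_missing(columns, requirements)
    if missing:
        # ValueError is an assumption; the library may raise a different type.
        raise ValueError("Missing required columns: " + ", ".join(missing))


iris_columns = ["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species"]
print(find_missing(iris_columns, ["Sepal.Length", "Sepal.Width", "RandomVariable"]))
# prints ['RandomVariable']
```

With a pandas DataFrame, the same idea would apply to `data.columns`.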

vivainsights.color_codes module¶

This module defines an Enum class for colors and creates two color palettes using the defined colors.

class vivainsights.color_codes.Colors(value)[source]¶

Bases: Enum

An enumeration.

HIGHLIGHT_NEGATIVE = '#fe7f4f'¶

HIGHLIGHT_POSITIVE = '#34b1e2'¶

NEGATIVE_ALT_1 = '#fcf0eb'¶

NEGATIVE_ALT_2 = '#fbdacd'¶

NEGATIVE_ALT_3 = '#facebc'¶

NON_HIGHLIGHT = '#e1e1e1'¶

POSITIVE_ALT_1 = '#bfe5ee'¶

POSITIVE_ALT_2 = '#b4d5dd'¶

POSITIVE_ALT_3 = '#adc0cb'¶

PRIMARY = '#1d627e'¶
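As a rough illustration of how an Enum of colors can feed a palette, the sketch below redefines a subset of the members listed above and builds a list of hex codes from them. The name `example_palette` and its ordering are hypothetical; the names and contents of the two palettes defined by the module are not documented here.

```python
from enum import Enum


class Colors(Enum):
    # Values copied from the reference above (subset shown for brevity).
    PRIMARY = "#1d627e"
    HIGHLIGHT_POSITIVE = "#34b1e2"
    HIGHLIGHT_NEGATIVE = "#fe7f4f"
    NON_HIGHLIGHT = "#e1e1e1"


# Hypothetical palette: a plain list of hex strings taken from the Enum members.
example_palette = [
    Colors.PRIMARY.value,
    Colors.HIGHLIGHT_POSITIVE.value,
    Colors.NON_HIGHLIGHT.value,
    Colors.HIGHLIGHT_NEGATIVE.value,
]

# Enum members can be looked up by name or by value.
same_member = Colors["PRIMARY"] is Colors("#1d627e")
```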

vivainsights.create_bar module¶

The code defines a function create_bar that calculates and visualizes the mean of a selected metric, grouped by a selected HR variable.

The metrics are first aggregated at a user-level prior to being aggregated at the level of the HR variable. The function create_bar returns either a plot object or a table, depending on the value passed to return_type.

vivainsights.create_bar.create_bar(data: DataFrame, metric: str, hrvar: str, mingroup: int = 5, percent: bool = False, return_type: str = 'plot', plot_title=None, plot_subtitle=None)[source]¶

Name¶

create_bar

Description¶

The function create_bar calculates and visualizes the mean of a selected metric, grouped by a selected HR variable. The metrics are first aggregated at a user-level prior to being aggregated at the level of the HR variable. create_bar returns either a plot object or a table, depending on the value passed to return_type. Internally, create_bar calls create_bar_viz() and create_bar_calc() to create the plot and calculate the mean of the selected metric, respectively.

param data:: Person query data.
type data:: pd.DataFrame
param metric:: Name of the metric to be analysed.
type metric:: str
param hrvar:: Name of the organizational attribute to be used for grouping.
type hrvar:: str
param mingroup:: Minimum group size. Defaults to 5.
type mingroup:: int, optional
param percent:: Whether to display values as percentages. Defaults to False.
type percent:: bool, optional
param return_type:: The type of output to return. Can be “plot” or “table”. Defaults to “plot”.
type return_type:: str, optional
param plot_title:: Title of the plot. Defaults to None.
type plot_title:: str, optional
param plot_subtitle:: Subtitle of the plot. Defaults to None.
type plot_subtitle:: str, optional
returns:: The output, either a plot or a table, depending on the value passed to return_type.
rtype:: Various

Example

>>> create_bar(pq_data, metric = "Collaboration_hours", hrvar = "LevelDesignation")
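The two-stage aggregation described above (person-level means first, then group-level means) can be sketched in plain Python. This is not the library's code: `two_stage_mean` is a hypothetical name, rows are represented as dictionaries with a `PersonId` key, and keeping groups with at least `mingroup` persons is an assumption about how the minimum group size is applied.

```python
from collections import defaultdict
from statistics import mean


def two_stage_mean(rows, metric, hrvar, mingroup=5):
    """Average a metric per person, then average those person means per group."""
    # Stage 1: collect each person's values within their group.
    per_person = defaultdict(list)
    for row in rows:
        per_person[(row["PersonId"], row[hrvar])].append(row[metric])

    # Stage 2: average the person-level means within each group.
    per_group = defaultdict(list)
    for (_, group), values in per_person.items():
        per_group[group].append(mean(values))

    # Drop groups with fewer than `mingroup` persons (assumed privacy rule).
    return {
        group: {"mean": mean(person_means), "n": len(person_means)}
        for group, person_means in per_group.items()
        if len(person_means) >= mingroup
    }


rows = [
    {"PersonId": "a", "LevelDesignation": "Junior", "Collaboration_hours": 10},
    {"PersonId": "a", "LevelDesignation": "Junior", "Collaboration_hours": 20},
    {"PersonId": "b", "LevelDesignation": "Junior", "Collaboration_hours": 30},
    {"PersonId": "c", "LevelDesignation": "Senior", "Collaboration_hours": 5},
]
print(two_stage_mean(rows, "Collaboration_hours", "LevelDesignation", mingroup=2))
# prints {'Junior': {'mean': 22.5, 'n': 2}}
```

Averaging per person first prevents people with more rows from dominating the group mean.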

vivainsights.create_bar.create_bar_calc(data: DataFrame, metric: str, hrvar: str, mingroup=5, stats=False)[source]¶: Calculate the mean of a selected metric, grouped by a selected HR variable.

vivainsights.create_bar.create_bar_viz(data: DataFrame, metric: str, hrvar: str, mingroup=5, percent: bool = False, plot_title=None, plot_subtitle=None)[source]¶: Visualise the mean of a selected metric, grouped by a selected HR variable.

vivainsights.create_boxplot module¶

The function create_boxplot creates a boxplot visualization and summary table for a given metric and grouping variable in a dataset.

vivainsights.create_boxplot.create_boxplot(data: DataFrame, metric: str, hrvar: str = 'Organization', mingroup=5, return_type: str = 'plot')[source]¶

Name¶

create_boxplot

Description¶

This function creates a boxplot visualization and summary table for a given metric and HR variable in a pandas DataFrame.

param data:: A pandas DataFrame containing the data for analysis.
type data:: pandas dataframe
param metric:: The metric parameter is a string that represents the variable or metric for which you want to create the boxplot visualization and summary table. This variable should be present in the input data DataFrame.
type metric:: str
param hrvar:: The hrvar parameter is the HR variable that you want to use for grouping the data. By default, it is set to “Organization”, but you can pass a different HR variable if needed.
type hrvar:: str, optional
param mingroup:: The mingroup parameter is an optional parameter that specifies the minimum number of observations required in each group for the boxplot to be created. If a group has fewer observations than the mingroup value, it will be excluded from the boxplot. The default value is 5.
type mingroup:: int, optional
param return_type:: The return_type parameter determines the type of output that the function will return. It can take one of three values:
type return_type:: str, optional
rtype:: The function create_boxplot returns different outputs based on the value of the return_type parameter

Example

>>> import vivainsights as vi
>>> pq_data = vi.load_pq_data()
>>> vi.create_boxplot(pq_data, metric = "Collaboration_hours", hrvar = "Organization", return_type = "plot")

vivainsights.create_boxplot.create_boxplot_calc(data: DataFrame, metric, hrvar, mingroup)[source]¶

vivainsights.create_boxplot.create_boxplot_summary(data: DataFrame, metric, hrvar, mingroup)[source]¶

vivainsights.create_boxplot.create_boxplot_viz(data: DataFrame, metric, hrvar, mingroup)[source]¶

vivainsights.create_inc module¶

This module creates an incidence analysis reflecting the proportion of the population scoring above or below a specified threshold for a metric.

vivainsights.create_inc.create_inc(data: DataFrame, metric: str, hrvar: List, mingroup: int = 5, threshold: float | None = None, position: str | None = None, return_type: str = 'plot')[source]¶

Name¶

create_inc

Description¶

Create an incidence analysis in which each value reflects the proportion of the population scoring above or below a threshold for a specified metric. Pass a single hrvar to generate a bar plot, or two hrvar values to generate an incidence table (heatmap).

param data:

A Standard Person Query dataset in the form of a Pandas DataFrame.

type data:

pandas dataframe

param metric:

Name of the metric, e.g. “Collaboration_hours”.

type metric:

str

param hrvar:

Name(s) of the HR Variable(s) by which to split metrics.

type hrvar:

str or list

param mingroup:

Privacy threshold / minimum group size. Defaults to 5.

type mingroup:

int

param threshold:

Threshold value to split the data based on the position argument. Defaults to None.

type threshold:

float

param position:

One of the following values:
- “above”: show incidence of those equal to or above the threshold
- “below”: show incidence of those equal to or below the threshold

type position:

str

param return_type:

What to return. This must be one of the following strings:
- “plot”
- “table”

type return_type:

str

returns:

Output is returned depending on the value passed to the return_type argument
- “plot” (Matplotlib or Seaborn plot object)
- “table” (Pandas DataFrame)

raises ValueError:

If hrvar is not a string or a list of at most length 2.

Example

>>> import vivainsights as vi
>>> pq_data = vi.load_pq_data()
>>> vi.create_inc(
...     pq_data,
...     metric = 'Collaboration_hours',
...     hrvar = 'LevelDesignation',
...     mingroup = 5,
...     threshold = 10,
...     position = 'above',
...     return_type = 'plot'
... )
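The incidence calculation itself can be sketched as the share of values at or above (or at or below) a threshold, following the definitions of `position` given above. The function name `incidence` is hypothetical, and the sketch ignores grouping by hrvar and the mingroup filter.

```python
def incidence(values, threshold, position):
    """Share of `values` at or above (or at or below) `threshold`."""
    if not values:
        raise ValueError("values must not be empty")
    if position == "above":
        hits = sum(1 for v in values if v >= threshold)
    elif position == "below":
        hits = sum(1 for v in values if v <= threshold)
    else:
        raise ValueError("position must be 'above' or 'below'")
    return hits / len(values)


person_means = [5, 10, 15, 20]
print(incidence(person_means, threshold=10, position="above"))  # prints 0.75
print(incidence(person_means, threshold=10, position="below"))  # prints 0.5
```

Because both positions include the threshold itself, the two proportions do not necessarily sum to 1.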

vivainsights.create_inc.create_inc_bar(data: DataFrame, metric: str, hrvar: str, mingroup: int = 5, threshold: float | None = None, position: str | None = None, return_type: str = 'plot')[source]¶

Name¶

create_inc_bar

Description¶

Run create_inc with a single hrvar, returning a bar chart.

param data:

A Standard Person Query dataset in the form of a Pandas DataFrame.

type data:

pandas dataframe

param metric:

Name of the metric, e.g. “Collaboration_hours”.

type metric:

str

param hrvar:

Name of the HR Variable by which to split metrics.

type hrvar:

str

param mingroup:

Privacy threshold / minimum group size. Defaults to 5.

type mingroup:

int

param threshold:

Threshold value to split the data based on the position argument. Defaults to None.

type threshold:

float

param position:

One of the following values:
- “above”: show incidence of those equal to or above the threshold
- “below”: show incidence of those equal to or below the threshold

type position:

str

param return_type:

What to return. This must be one of the following strings:
- “plot”
- “table”

type return_type:

str

returns:

Output is returned depending on the value passed to the return_type argument
- “plot” (Matplotlib or Seaborn plot object)
- “table” (Pandas DataFrame)

raises ValueError:

If hrvar is not a string.

Example

>>> create_inc_bar(data = pq_data, metric = "Collaboration_hours", hrvar = "LevelDesignation", threshold = 20, position = "below", return_type = "plot")

vivainsights.create_inc.create_inc_grid(data: DataFrame, metric: str, hrvar: List, mingroup: int = 5, threshold: float | None = None, position: str | None = None, return_type: str = 'plot')[source]¶

Name¶

create_inc_grid

Description¶

Run create_inc with two hrvar values, returning a heatmap.

param data:

A Standard Person Query dataset in the form of a Pandas DataFrame.

type data:

pandas dataframe

param metric:

Name of the metric, e.g. “Collaboration_hours”.

type metric:

str

param hrvar:

Names of the HR Variables by which to split metrics.

type hrvar:

list

param mingroup:

Privacy threshold / minimum group size. Defaults to 5.

type mingroup:

int

param threshold:

Threshold value to split the data based on the position argument. Defaults to None.

type threshold:

float

param position:

One of the following values:
- “above”: show incidence of those equal to or above the threshold
- “below”: show incidence of those equal to or below the threshold

type position:

str

param return_type:

What to return. This must be one of the following strings:
- “plot”
- “table”

type return_type:

str

returns:

Output is returned depending on the value passed to the return_type argument
- “plot” (Matplotlib or Seaborn plot object)
- “table” (Pandas DataFrame)

raises ValueError:

If hrvar is not a list of length 2.

vivainsights.create_IV module¶

vivainsights.create_IV.calculate_IV(data: DataFrame, outcome: str, predictor: str, bins: int)[source]¶

Name¶

calculate_IV

Description¶

Calculates Information Value (IV) between a single predictor variable and the outcome variable.

param data:: A DataFrame containing the data.
type data:: pd.DataFrame
param outcome:: Name of the outcome variable.
type outcome:: str
param predictor:: Name of the predictor variable.
type predictor:: str
param bins:: Number of bins for binning the predictor variable.
type bins:: int
returns:: A DataFrame with IV calculations for the predictor variable.
rtype:: pd.DataFrame
raises ValueError:: If the outcome variable has missing values in the input training data frame.

Examples

>>> import vivainsights as vi
>>> import pandas as pd
>>> data = pd.DataFrame({
...     'outcome': [1, 0, 1, 0, 1],
...     'predictor': [10, 20, 30, 40, 50]
... })
>>> outcome = 'outcome'
>>> predictor = 'predictor'
>>> bins = 5
>>> vi.calculate_IV(data, outcome, predictor, bins)
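For readers unfamiliar with Information Value, the sketch below uses the common textbook definition: per bin, Weight of Evidence is the log ratio of the share of events to the share of non-events, and IV sums the share difference times WOE. The function name `information_value` is hypothetical, and the binning, sign convention, and zero-count handling in `calculate_IV` may differ.

```python
import math


def information_value(events, non_events):
    """IV from per-bin counts of outcome == 1 (events) and outcome == 0 (non-events)."""
    if len(events) != len(non_events) or not events:
        raise ValueError("both lists need one count per bin")
    if min(events) <= 0 or min(non_events) <= 0:
        # log(0) is undefined; real implementations usually smooth or merge bins.
        raise ValueError("every bin needs at least one event and one non-event")
    total_events, total_non_events = sum(events), sum(non_events)
    iv = 0.0
    for e, n in zip(events, non_events):
        p = e / total_events        # share of all events falling in this bin
        q = n / total_non_events    # share of all non-events falling in this bin
        woe = math.log(p / q)       # Weight of Evidence for this bin
        iv += (p - q) * woe         # each term is non-negative
    return iv


# Two bins with mirrored distributions: IV equals ln(3).
print(information_value([10, 30], [30, 10]))
```

When events and non-events are distributed identically across bins, every WOE is zero and so is the IV.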

vivainsights.create_IV.create_IV(data=<class 'pandas.core.frame.DataFrame'>, predictors=None, outcome: str | None = None, bins: int = 5, siglevel=0.05, exc_sig: bool = False, return_type='plot')[source]¶

Name¶

create_IV

Description¶

Creates Information Value (IV) analysis for predictor variables.

param data:: DataFrame containing the data.
type data:: pd.DataFrame
param predictors:: List of predictor variables.
type predictors:: list, optional
param outcome:: Name of the outcome variable.
type outcome:: str
param bins:: Number of bins for binning the predictor variables. Defaults to 5.
type bins:: int, optional
param siglevel:: Significance level. Defaults to 0.05.
type siglevel:: float, optional
param exc_sig:: Boolean indicating if non-significant predictors should be excluded. Defaults to False.
type exc_sig:: bool, optional
param return_type:: Type of output to return (“plot”, “summary”, “list”, “plot-WOE”, “IV”). Defaults to “plot”.
type return_type:: str, optional
returns:: The output, whose form depends on the value passed to return_type (“plot”, “summary”, “list”, “plot-WOE”, or “IV”).
rtype:: Various

Note

The 'list' and 'summary' return types produce a dictionary; use a for loop to access its keys and values.
The 'IV' return type produces a tuple; its 'output_list' element is a dictionary, so use a for loop to access its keys and values as well.

Example

>>> import numpy as np
>>> df["X"] = np.where(df["Internal_network_size"] > 40, 1, 0)
>>> predictors = ["Email_hours", "Meeting_hours", "Chat_hours"]

>>> # 1. Return the IV output
>>> result = create_IV(df, predictors=predictors, outcome="X", exc_sig=False, return_type="IV")

>>> # 2. Return the summary
>>> result = create_IV(df, predictors=predictors, outcome="X", exc_sig=False, return_type="summary")

>>> # 3. Return the plot
>>> result = create_IV(df, predictors=predictors, outcome="X", exc_sig=False, return_type="plot")

vivainsights.create_IV.map_IV(data: DataFrame, outcome: str, predictors=None, bins: int = 5)[source]¶

Name¶

map_IV

Description¶

Maps Information Value (IV) calculations for multiple predictor variables. Calls calculate_IV() for every predictor-outcome variable pair.

param data:: DataFrame containing the data.
type data:: pd.DataFrame
param outcome:: Name of the outcome variable.
type outcome:: str
param predictors:: List of predictor variables (if None, all numeric variables except outcome are used).
type predictors:: list, optional
param bins:: Number of bins for binning the predictor variables.
type bins:: int
returns:: Dictionary containing IV calculations for each predictor variable and a summary DataFrame.
rtype:: dict

vivainsights.create_IV.p_test(data: DataFrame, outcome: str, behavior: list, paired=False)[source]¶

Name¶

p_test

Description¶

Performs a Wilcoxon signed-rank test (paired) or rank-sum test (unpaired) between two groups.

param data:: A Pandas DataFrame.
type data:: pd.DataFrame
param outcome:: Name of the outcome variable.
type outcome:: str
param behavior:: List of behavior variables to test.
type behavior:: list
param paired:: Boolean indicating if the test should be paired or not. Default is False.
type paired:: bool, optional
returns:: A DataFrame with variables and corresponding p-values.
rtype:: pd.DataFrame

Examples

>>> import vivainsights as vi
>>> import pandas as pd
>>> data = pd.DataFrame({
...     'outcome': [1, 0, 1, 0, 1],
...     'behavior1': [10, 20, 30, 40, 50],
...     'behavior2': [5, 15, 25, 35, 45]
... })
>>> outcome = 'outcome'
>>> behavior = ['behavior1', 'behavior2']
>>> vi.p_test(data, outcome, behavior)

vivainsights.create_IV.plot_WOE(IV, predictor)[source]¶

Name¶

plot_WOE

Description¶

Plots Weight of Evidence (WOE) for a predictor variable.

param IV:: Dictionary containing IV calculations for each predictor variable.
type IV:: dict
param predictor:: Name of the predictor variable.
type predictor:: str
returns:: This function doesn’t return a value; it plots the WOE.
rtype:: None

Examples

>>> import pandas as pd
>>> data = pd.DataFrame({
...     'outcome': [1, 0, 1, 0, 1],
...     'predictor': [10, 20, 30, 40, 50]
... })
>>> outcome = 'outcome'
>>> predictor = 'predictor'
>>> bins = 5
>>> IV = map_IV(data, outcome, [predictor], bins)
>>> plot_WOE(IV, predictor)

vivainsights.create_line module¶

This module visualizes the average of a metric by sub-population over time. By default, it returns a line plot showing the average of a selected metric; a summary table can be returned instead.

vivainsights.create_line.create_line(data: DataFrame, metric: str, hrvar: str, mingroup=5, return_type: str = 'plot')[source]¶

Name¶

create_line

Description¶

Provides a week by week view of a selected metric, visualised as line charts.

param data:: person query data
type data:: pandas dataframe
param metric:: name of the metric to be analysed
type metric:: str
param hrvar:: name of the organizational attribute to be used for grouping
type hrvar:: str
param mingroup:: Numeric value setting the privacy threshold / minimum group size, by default 5
type mingroup:: int, optional
param return_type:: type of output to return. Defaults to “plot”.
type return_type:: str, optional
returns:: The output, either a plot or a table, depending on the value passed to return_type.
rtype:: Various

Example

>>> import vivainsights as vi
>>> pq_data = vi.load_pq_data()
>>> vi.create_line(pq_data, metric = "Collaboration_hours", hrvar = "LevelDesignation")

vivainsights.create_line.create_line_calc(data: DataFrame, metric: str, hrvar: str, mingroup=5)[source]¶

vivainsights.create_line.create_line_viz(data: DataFrame, metric: str, hrvar: str, mingroup=5)[source]¶

vivainsights.create_lorenz module¶

This module calculates the Gini coefficient and plots the Lorenz curve for a given metric.

vivainsights.create_lorenz.compute_gini(x)[source]¶

Compute the Gini coefficient, a measure of statistical dispersion to represent inequality.

Parameters: x (list, np.ndarray, pd.Series): A numeric vector representing values (e.g., income, emails sent).

Returns: float: The Gini coefficient for the given vector of values.

Raises: ValueError: If input is not a numeric vector.
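One common way to compute a Gini coefficient is the sorted-rank formula shown below. This is a plain-Python sketch under that assumption; `gini` is a hypothetical name, and `compute_gini` may use a different but equivalent estimator or apply different input validation.

```python
def gini(values):
    """Gini coefficient of non-negative numbers via the sorted-rank formula."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one value and a non-zero total")
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with ranks i starting at 1
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


print(gini([1, 1, 1, 1]))  # prints 0.0 (perfect equality)
print(gini([0, 0, 0, 1]))  # prints 0.75 (one person holds everything)
```

With this estimator, the maximum for n values is (n - 1) / n, so it approaches 1 only for large populations.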

vivainsights.create_lorenz.create_lorenz(data, metric, return_type='plot')[source]¶

Name¶

create_lorenz

Description¶

Calculate and return the Lorenz curve and Gini coefficient for a given metric.

param data:: DataFrame containing the data to analyze.
type data:: pd.DataFrame
param metric:: The column name in the DataFrame representing the values to analyze.
type metric:: str
param return_type:: The type of output to return:
- “gini”: returns the Gini coefficient.
- “table”: returns a DataFrame of cumulative population and value shares.
- “plot” (default): displays a Lorenz curve plot with the Gini coefficient.
type return_type:: str
returns:: Depending on return_type:
- “gini”: returns the Gini coefficient.
- “table”: returns a DataFrame of population and value shares.
- “plot”: displays the Lorenz curve plot.
rtype:: float/pd.DataFrame/None

raises ValueError:

If the metric is not found in the DataFrame, or if an invalid return_type is specified.

Examples

Using pq_data from vi.load_pq_data(), which returns a DataFrame with an “Emails_sent” column.

>>> import vivainsights as vi
>>> # Compute the Gini coefficient:
>>> vi.create_lorenz(data=vi.load_pq_data(), metric="Emails_sent", return_type="gini")

>>> # Compute the underlying table for the Lorenz curve:
>>> vi.create_lorenz(data=vi.load_pq_data(), metric="Emails_sent", return_type="table")

>>> # Plot the Lorenz curve
>>> vi.create_lorenz(data=vi.load_pq_data(), metric="Emails_sent", return_type="plot")

vivainsights.create_lorenz.get_value_proportion(df, population_share)[source]¶

Calculate the proportion of total values (e.g., income, emails sent) that corresponds to a given cumulative share of the population.

Parameters: df (pd.DataFrame): DataFrame containing cumulative population and value proportions. population_share (float): The cumulative share of the population (between 0 and 1).

Returns: float: The proportion of total values corresponding to the given population share.

Raises: ValueError: If population_share is not between 0 and 1.
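The relationship between a Lorenz table and a population-share lookup can be sketched as follows. Both function names are hypothetical, the table is represented as a list of (population share, value share) tuples instead of a DataFrame, and linear interpolation between points is an assumption; `get_value_proportion` may select the nearest row instead.

```python
def lorenz_points(values):
    """Cumulative (population share, value share) pairs, starting at (0, 0)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one value and a non-zero total")
    points, running = [(0.0, 0.0)], 0
    for i, x in enumerate(xs, start=1):
        running += x
        points.append((i / n, running / total))
    return points


def value_share_at(points, population_share):
    """Linearly interpolate the value share at a given population share."""
    if not 0 <= population_share <= 1:
        raise ValueError("population_share must be between 0 and 1")
    for (p0, v0), (p1, v1) in zip(points, points[1:]):
        if p0 <= population_share <= p1:
            return v0 + (population_share - p0) / (p1 - p0) * (v1 - v0)
    return points[-1][1]


points = lorenz_points([4, 1, 2, 1])
print(value_share_at(points, 0.5))  # prints 0.25
```

In this example, the bottom half of the population accounts for a quarter of the total.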

vivainsights.create_rank module¶

This module performs a rank operation on all groups across HR attributes for a selected Viva Insights metric.

vivainsights.create_rank.create_rank(data: DataFrame, metric: str, hrvar: str, mingroup=5, return_type: str = 'plot')[source]¶

Name¶

create_rank

Description¶

This function performs a rank operation on all groups across HR attributes for a specified metric.

param data:: person query data
type data:: pandas dataframe
param metric:: name of the metric to be analysed
type metric:: str
param hrvar:: name(s) of the organizational attribute(s) to be used for grouping
type hrvar:: str
param return_type:: type of output to return. Defaults to “plot”.
type return_type:: str, optional
rtype:: A plot or a table depending on the return_type argument.

Example

>>> import vivainsights as vi
>>> pq_data = vi.load_pq_data()
>>> vi.create_rank(data = pq_data, hrvar = "FunctionType", metric = "Emails_sent", return_type = "plot")

vivainsights.create_rank.create_rank_calc(data: DataFrame, metric: str, hrvar=['Organization', 'FunctionType'], mingroup=5, stats=False)[source]¶

vivainsights.create_rank.create_rank_viz(data: DataFrame, metric, hrvar=['Organization', 'FunctionType', 'LevelDesignation', 'SupervisorIndicator'], mingroup=5)[source]¶

vivainsights.create_sankey module¶

vivainsights.create_sankey.create_sankey(data, var1, var2, count='n')[source]¶

Name¶

create_sankey

Description¶

Create a ‘networkD3’ style sankey chart based on a long count table with two variables. The input data should have three columns, where each row is a unique group:
1. Variable 1
2. Variable 2
3. Count

param data:: Data frame of the long count table.
type data:: dataframe
param var1:: String containing the name of the variable to be shown on the left.
type var1:: str
param var2:: String containing the name of the variable to be shown on the right.
type var2:: str
param count:: String containing the name of the count variable.
type count:: str
rtype:: A ‘sankeyNetwork’ and ‘htmlwidget’ object containing a two-tier sankey plot. The output can be saved locally with htmlwidgets::saveWidget().

Example

>>> create_sankey(data = pq_data, var1 = "Organization", var2 = "FunctionType")
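A long count table of the shape described above (two variables plus a count, one row per unique pair) can be built from raw rows as sketched below. `long_count_table` is a hypothetical helper, rows are plain dictionaries, and counting rows instead of distinct persons is an assumption made for brevity.

```python
from collections import Counter


def long_count_table(rows, var1, var2, count="n"):
    """Count rows per (var1, var2) pair and return one dict per unique pair."""
    pairs = Counter((row[var1], row[var2]) for row in rows)
    return [
        {var1: left, var2: right, count: n}
        for (left, right), n in sorted(pairs.items())
    ]


rows = [
    {"Organization": "Sales", "FunctionType": "Operations"},
    {"Organization": "Sales", "FunctionType": "Operations"},
    {"Organization": "HR", "FunctionType": "Operations"},
]
for record in long_count_table(rows, "Organization", "FunctionType"):
    print(record)
```

Each resulting record corresponds to one flow in a two-tier sankey chart, with the count determining its width.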

vivainsights.create_trend module¶

The create_trend function provides a week by week view of a selected Viva Insights metric. By default, it returns a week by week heatmap bar plot highlighting the points in time with the most activity; a summary table can be returned instead.

vivainsights.create_trend.create_trend(data: DataFrame, metric: str, palette=['#0c3c44', '#1d627e', '#34b1e2', '#bfe5ee', '#fcf0eb', '#fbdacd', '#facebc', '#fe7f4f'], hrvar: str = 'Organization', mingroup=5, return_type: str = 'plot', legend_title: str = 'Hours', date_column: str = 'MetricDate', date_format: str = '%Y-%m-%d')[source]¶

Name¶

create_trend

Description¶

This function provides a week by week view of a selected Viva Insights metric. By default, it returns a week by week heatmap bar plot highlighting the points in time with the most activity. A summary table can be returned instead.

param data:: The input data as a pandas DataFrame.
type data:: pandas dataframe
param metric:: The metric parameter is a string that represents the column name in the data DataFrame that contains the values to be plotted or analyzed. This could be any numerical metric such as sales, revenue, or number of hours worked.
type metric:: str
param palette:: The palette parameter is a list of colors that will be used to represent different groups in the trend plot. Each color in the list corresponds to a different group. By default, the palette includes 8 colors, but you can modify it to include more or fewer colors if needed.
type palette:: list
param hrvar:: hrvar is a string parameter that represents the variable used for grouping the data. Defaults to “Organization”.
type hrvar:: str
param mingroup:: The mingroup parameter is used to specify the minimum number of groups that should be present in the data for the trend analysis. If the number of unique values in the hrvar column is less than mingroup, the function will raise an error. Defaults to 5.
type mingroup:: int
param return_type:: The return_type parameter determines whether the function returns a plot or a table. Defaults to “plot”.
type return_type:: str
param legend_title:: The title for the legend in the plot. It is used to label the different categories or groups in the data. Defaults to Hours
type legend_title:: str
param date_column:: The name of the column in the DataFrame that contains the dates for the trend analysis. Defaults to MetricDate
type date_column:: str
param date_format:: The date_format parameter is used to specify the format of the dates in the date_column of the input data. It should be a string that follows the syntax of the Python datetime module’s strftime function. Defaults to %Y-%m-%d.
type date_format:: str
rtype:: The function create_trend returns either a table or a plot, depending on the value of the return_type parameter.

Example

>>> import vivainsights as vi
>>> pq_data = vi.load_pq_data()
>>> vi.create_trend(data = pq_data, metric = "Collaboration_hours", hrvar = "LevelDesignation")

vivainsights.create_trend.create_trend_calc(data, metric, hrvar, mingroup, date_column, date_format)[source]¶

Name¶

create_trend_calc

Description¶

This function creates a trend calculation by grouping data by a specified variable and calculating the mean of a specified metric over time.

vivainsights.create_trend.create_trend_viz(data: DataFrame, metric: str, palette, hrvar: str, mingroup, legend_title: str, date_column: str, date_format: str)[source]¶

Name¶

create_trend_viz

Description¶

This function creates a heatmap visualization of trends in a given metric by a specified variable over time.

vivainsights.export module¶

This module accepts a data frame or matplotlib figure object and exports it using the specified method/format. By default, a data frame is copied to the clipboard, and matplotlib objects are saved as PNG files.

vivainsights.export.export(x, file_format='clipboard', path='insights export', timestamp=True)[source]¶

Name¶

export

Description¶

Exports the data to the specified file format and saves it to the specified filename. A general use function to export ‘vivainsights’ outputs to CSV, clipboard, or save as images. By default, export() copies a data frame to the clipboard.

param x:

The object to export, which can be a data frame or a matplotlib figure object.

type x:

dataframe or matplotlib figure object

param file_format:

Character string specifying the method of export.

type file_format:

csv/png/svg/jpeg/pdf/clipboard

param path:

If exporting a file, enter the path and the desired file name. Defaults to “insights export”.

type path:

str, optional

param timestamp:

Boolean specifying whether to include a timestamp in the file name. Defaults to True.

type timestamp:

bool, optional

returns:

A different output is returned depending on the value passed to the file_format argument:
- “clipboard” (no return value; the data frame is copied to the clipboard.)
- “csv” (CSV file containing the data frame is saved to the specified path.)
- “png” (PNG file containing the matplotlib figure object is saved to the specified path.)
- “svg” (SVG file containing the matplotlib figure object is saved to the specified path.)
- “jpeg” (JPEG file containing the matplotlib figure object is saved to the specified path.)
- “pdf” (PDF file containing the matplotlib figure object is saved to the specified path.)

vivainsights.extract_date_range module¶

vivainsights.extract_date_range.extract_date_range(data: DataFrame, return_type: str = 'table')[source]¶

Name¶

extract_date_range

Description¶

The function extract_date_range extracts the date range from a dataframe and returns it either as a table or as a text string.

param data:

The data parameter is a pandas DataFrame that contains the data from which you want to extract the date range. It should have at least one column that represents the date.

type data:

pandas dataframe

param return_type:

The return_type parameter is a string that specifies whether the date range is returned as a table or as a text string. Defaults to “table”.

type return_type:

str

returns:

The function extract_date_range returns either a pandas DataFrame or a string, depending on the value of the return_type parameter.
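The text form of a date range can be sketched as the earliest and latest parsed dates joined into a sentence. `date_range_text` is a hypothetical name, the input is a plain list of date strings instead of a DataFrame column, and the exact wording of the string returned by `extract_date_range` is not documented here.

```python
from datetime import datetime


def date_range_text(date_strings, date_format="%Y-%m-%d"):
    """Return 'Data from <earliest> to <latest>' for a list of date strings."""
    if not date_strings:
        raise ValueError("date_strings must not be empty")
    dates = [datetime.strptime(s, date_format).date() for s in date_strings]
    return f"Data from {min(dates).isoformat()} to {max(dates).isoformat()}"


print(date_range_text(["2024-05-12", "2024-04-28", "2024-05-05"]))
# prints Data from 2024-04-28 to 2024-05-12
```

Parsing before comparing matters when the date format does not sort correctly as text, such as `%d/%m/%Y`.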

vivainsights.extract_hr module¶

This module extracts HR attributes (organizational data) through a combination of detecting variable class, number of unique values, regular expressions. There is an option to return either just a list of the variable names or a DataFrame containing only the variables themselves.

vivainsights.extract_hr.extract_hr(data: DataFrame, max_unique: int = 50, exclude_constants: bool = True, return_type: str = 'names')[source]¶

Name¶

extract_hr

Description¶

The function extract_hr extracts HR attributes (organizational data) through a combination of detecting variable class, number of unique values, and regular expressions.

param data:: The data from which to extract HR variables (organizational attributes).
type data:: pandas dataframe
param max_unique:: The maximum number of unique values a column can have to be included in the output, defaults to 50 (optional)
type max_unique:: int
param exclude_constants:: A boolean value (True/False) indicating whether to exclude columns with constant values or not. If True, columns with constant values will be excluded. If False, all columns will be included regardless of whether they have constant values or not, defaults to True (optional)
type exclude_constants:: boolean
param return_type:: The type of output to be returned, either “names” or “vars”. If “names”, the function will return the names of the columns that meet the specified criteria. If “vars”, the function will return the actual columns that meet the specified criteria, defaults to names (optional)
type return_type:: str
rtype:: The function is not returning anything. It is printing the column names of the object columns in the filtered dataframe.

Example

>>> vi.extract_hr(
...     data = pq_data
... )

>>> vi.extract_hr(
...     data = pq_data,
...     return_type = "vars"
... )
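The selection criteria described above (variable class, number of unique values, constant columns) can be sketched in plain Python on a dictionary of columns. `candidate_hr_columns` is a hypothetical name, the regular-expression step mentioned in the module description is omitted, and the exact rules used by `extract_hr` may differ.

```python
def candidate_hr_columns(table, max_unique=50, exclude_constants=True):
    """Pick text columns whose number of distinct values suggests a grouping attribute."""
    names = []
    for name, values in table.items():
        if not all(isinstance(v, str) for v in values):
            continue  # keep only text-like columns
        n_unique = len(set(values))
        if n_unique > max_unique:
            continue  # too many distinct values, e.g. an identifier
        if exclude_constants and n_unique == 1:
            continue  # a constant column cannot split the population
        names.append(name)
    return names


table = {
    "PersonId": ["a", "b", "c"],
    "Organization": ["Sales", "Sales", "HR"],
    "Region": ["EU", "EU", "EU"],
    "Emails_sent": [1, 2, 3],
}
print(candidate_hr_columns(table, max_unique=2))  # prints ['Organization']
```

Here `PersonId` is dropped for having too many distinct values, `Region` for being constant, and `Emails_sent` for being numeric.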

vivainsights.g2g_data module¶

This module returns a data frame containing a group-to-group query.

vivainsights.g2g_data.load_g2g_data()[source]¶

vivainsights.hrvar_count module¶

This module generates a count of the distinct persons in the data population. Returns a bar plot of the counts by default, with an option to return a summary table.

vivainsights.hrvar_count.hrvar_count(data: DataFrame, hrvar: str = 'Organization', return_type: str = 'plot')[source]¶

Name¶

hrvar_count

Description¶

This function generates a count of the distinct persons in the data population, grouped by a selected HR variable.

param data:: person query data
type data:: pandas dataframe
param hrvar:: name of the organizational attribute to be used for grouping
type hrvar:: str
param return_type:: type of output to return. Defaults to “plot”.
type return_type:: str, optional

Example

>>> hrvar_count(pq_data, hrvar = "LevelDesignation")
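Counting distinct persons per group, as opposed to counting rows, can be sketched with sets. `distinct_person_count` is a hypothetical name, rows are plain dictionaries keyed by `PersonId`, and sorting the largest group first is an assumption about presentation.

```python
from collections import defaultdict


def distinct_person_count(rows, hrvar="Organization"):
    """Number of distinct PersonId values per group, largest group first."""
    people = defaultdict(set)
    for row in rows:
        people[row[hrvar]].add(row["PersonId"])
    counts = {group: len(ids) for group, ids in people.items()}
    return dict(sorted(counts.items(), key=lambda item: item[1], reverse=True))


rows = [
    {"PersonId": "a", "Organization": "HR"},
    {"PersonId": "b", "Organization": "Sales"},
    {"PersonId": "b", "Organization": "Sales"},  # second week for the same person
    {"PersonId": "c", "Organization": "Sales"},
]
print(distinct_person_count(rows))  # prints {'Sales': 2, 'HR': 1}
```

Person query data typically has one row per person per period, so a plain row count would overstate group sizes.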

vivainsights.hrvar_count.hrvar_count_calc(data: DataFrame, hrvar: str)[source]¶: Calculate the number of distinct persons in the data population, grouped by a selected HR variable.

vivainsights.hrvar_count.hrvar_count_viz(data: DataFrame, hrvar: str)[source]¶: Visualise the number of distinct persons in the data population, grouped by a selected HR variable.

vivainsights.identify_churn module¶

This module identifies and counts the number of employees who have churned from the dataset. This is done by measuring whether an employee who is present in the first n (n1) weeks of the data, is also present in the last n (n2) weeks of the data. An additional use case of this function is the ability to identify “new-joiners” by using the argument flip.

vivainsights.identify_churn.identify_churn(data: DataFrame, n1=6, n2=6, return_type: str = 'message', flip=False, date_column: str = 'MetricDate', date_format='%Y-%m-%d')[source]¶

Name¶

identify_churn

Description¶

This module identifies and counts the number of employees who have churned from the dataset.

param data:

The dataframe to check for churned employees

type data:

pandas dataframe

param n1:

First n weeks of data to check for the person’s presence

type n1:

int

param n2:

Last n weeks of data to check for the person’s presence

type n2:

int

param return_type:

Type of return expected

type return_type:

str

param flip:

Flag to switch between identifying churned users vs new users

type flip:

boolean

param date_column:

DateTime column based on which churn is calculated, defaults to MetricDate for Nova

type date_column:

str

param date_format:

DateTime format in the input file, defaults to “%Y-%m-%d” (YYYY-MM-DD)

type date_format:

str

returns:

A different output is returned depending on the value passed to the return_type argument
- `"message"` (Message on console. A diagnostic message.)
- `"text"` (String. A diagnostic message.)
- `"data"` (Character vector containing the PersonId of employees who have been identified as churned.)
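The presence check described above can be sketched with sets. This is a minimal illustration on plain Python rows, assuming one row per person per week and positive `n1` and `n2`; the helper name `churned_persons` and the sample data are hypothetical and do not reflect the library's internals.

```python
from datetime import datetime

def churned_persons(rows, n1=6, n2=6, flip=False,
                    date_column="MetricDate", date_format="%Y-%m-%d"):
    """Return PersonIds present in the first n1 weeks but absent from the last n2.

    With flip=True the comparison is reversed, which surfaces new joiners.
    """
    def parse(row):
        return datetime.strptime(row[date_column], date_format)

    weeks = sorted({parse(r) for r in rows})
    first_weeks, last_weeks = set(weeks[:n1]), set(weeks[-n2:])
    in_first = {r["PersonId"] for r in rows if parse(r) in first_weeks}
    in_last = {r["PersonId"] for r in rows if parse(r) in last_weeks}
    return sorted(in_last - in_first) if flip else sorted(in_first - in_last)

# Hypothetical data: "a" is always present, "b" leaves, "c" joins late
weeks = ["2022-05-01", "2022-05-08", "2022-05-15", "2022-05-22"]
presence = {"a": weeks, "b": weeks[:2], "c": weeks[2:]}
rows = [{"PersonId": p, "MetricDate": w} for p, ws in presence.items() for w in ws]

print(churned_persons(rows, n1=2, n2=2))             # ['b']
print(churned_persons(rows, n1=2, n2=2, flip=True))  # ['c']
```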

vivainsights.identify_daterange module¶

Takes a vector of dates and identifies whether the frequency is ‘daily’, ‘weekly’, or ‘monthly’. The primary use case for this function is to provide an accurate description of the query type used and to raise errors should a wrong date grouping be used in the data input.

vivainsights.identify_daterange.identify_datefreq(x)[source]¶
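The detection method is not documented here. One simple way to make this kind of decision, shown purely as an assumption and not as what `identify_datefreq` actually does, is to look at the typical gap between consecutive distinct dates. The helper name `guess_date_frequency` is hypothetical.

```python
from datetime import date, timedelta
from statistics import median

def guess_date_frequency(dates):
    """Classify a collection of dates by the median gap between distinct dates."""
    unique = sorted(set(dates))
    if len(unique) < 2:
        raise ValueError("need at least two distinct dates")
    gaps = [(later - earlier).days for earlier, later in zip(unique, unique[1:])]
    typical_gap = median(gaps)
    if typical_gap == 1:
        return "daily"
    if typical_gap == 7:
        return "weekly"
    if 28 <= typical_gap <= 31:
        return "monthly"
    return "unknown"

weekly = [date(2022, 5, 1) + timedelta(weeks=i) for i in range(6)]
monthly = [date(2022, month, 1) for month in range(1, 7)]
print(guess_date_frequency(weekly))   # weekly
print(guess_date_frequency(monthly))  # monthly
```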

vivainsights.identify_holidayweeks module¶

This function scans a standard query output for weeks where collaboration hours are far outside the mean. Returns a list of weeks that appear to be holiday weeks and optionally an edited dataframe with outliers removed. By default, missing values are excluded.

vivainsights.identify_holidayweeks.identify_holidayweeks(data: DataFrame, sd=1, return_type='text')[source]¶

Name¶

identify_holidayweeks

Description¶

Identify holiday weeks based on outliers. This function scans a standard query output for weeks where collaboration hours are far outside the mean. Returns a list of weeks that appear to be holiday weeks and optionally an edited dataframe with outliers removed. By default, missing values are excluded.

As best practice, run this function prior to any analysis to remove atypical collaboration weeks from your dataset.

param data:

A Standard Person Query dataset in the form of a data frame.

type data:

pandas dataframe

param sd:

The standard deviation below the mean for collaboration hours that should define an outlier week. Enter a positive number. Default is 1 standard deviation.

type sd:

int or float

param return_type:

String specifying what to return. This must be one of the following strings:
- “text” (default)
- “labelled_data” or “dirty_data” or “data_dirty”
- “cleaned_data” or “data_cleaned”
- “holidayweeks_data”
- “plot”

type return_type:

str

returns:

A different output is returned depending on the value passed to return_type
text (str) – A message is printed identifying holiday weeks.
data_cleaned / cleaned_data (pandas dataframe) – A dataset with outlier weeks removed is returned.
data_dirty / dirty_data / labelled_data (pandas dataframe) – A dataset with only outlier weeks is returned.
holidayweeks_data (pandas dataframe) – A dataset with only outlier weeks is returned.
plot (matplotlib plot) – A line plot of Collaboration Hours with holiday weeks highlighted.

Examples

>>> identify_holidayweeks(pq_data, sd = .75, return_type = "text")
"The weeks where collaboration was 0.75 standard deviations below the mean (18.7) are: `05/22/2022`"

>>> identify_holidayweeks(pq_data, sd = .75, return_type = "plot")

>>> identify_holidayweeks(pq_data, sd = .75, return_type = "cleaned_data")

>>> identify_holidayweeks(pq_data, sd = .75, return_type = "holidayweeks_data")

vivainsights.identify_inactiveweeks module¶

The function identify_inactiveweeks identifies weeks where collaboration hours are more than a specified number of standard deviations below the mean and returns the result in the specified format.

vivainsights.identify_inactiveweeks.identify_inactiveweeks(data: DataFrame, sd=2, return_type='text')[source]¶

Name¶

identify_inactiveweeks

Description¶

The function identify_inactiveweeks identifies weeks where collaboration hours are more than a specified number of standard deviations below the mean and returns the result in the specified format.

param data:

The data parameter is a pandas DataFrame that contains the following columns:

type data:

pandas dataframe

param sd:

The sd parameter stands for the number of standard deviations below the mean that is considered as inactive. In this code, it is used to identify weeks where the collaboration hours are more than sd standard deviations below the mean, defaults to 2 (optional)

type sd:

int

param return_type:

The return_type parameter determines the type of output that the function will return.

It can have the following values (optional, defaults to ‘text’):

‘text’: Returns a string with the number of inactive weeks.
‘data_dirty’ or ‘dirty_data’: Returns a Pandas DataFrame with the rows that are inactive.
‘data_cleaned’ or ‘cleaned_data’: Returns a Pandas DataFrame with the rows that are not inactive.
‘plot’: Returns a plot showing the number of inactive weeks for each user.
‘data’: Returns a Pandas DataFrame with the number of inactive weeks for each user.

The default value is ‘text’.

type return_type:

str

rtype:

The function identify_inactiveweeks returns different outputs based on the value of the return_type parameter.

vivainsights.identify_nkw module¶

vivainsights.identify_nkw.identify_nkw(data: DataFrame, collab_threshold=5, return_type='data_summary')[source]¶

Name¶

identify_nkw

Description¶

Identifies non-knowledge workers based on their average collaboration hours. This function groups the input data by ‘PersonId’ and ‘Organization’, calculates the mean collaboration hours for each group, and flags those with average collaboration hours below a specified threshold as non-knowledge workers. It then calculates the proportion of non-knowledge workers in each organization.

param data:: The input data. Must contain the columns ‘PersonId’, ‘Organization’, and ‘Collaboration_hours’.
type data:: pd.DataFrame
param collab_threshold:: The threshold for average collaboration hours below which a person is considered a non-knowledge worker. Defaults to 5.
type collab_threshold:: int, optional
param return_type:: Specifies the type of data to return. If ‘data_with_flag’, returns the input data with an additional ‘flag_nkw’ column indicating whether each person is a non-knowledge worker. If ‘data_summary’, returns a summary of the number and proportion of non-knowledge workers in each organization. If ‘text’, returns a text summary of the number and proportion of non-knowledge workers in each organization. Defaults to ‘data_summary’.
type return_type:: str, optional
returns:: The output data, as specified by the ‘return_type’ parameter.
rtype:: pd.DataFrame

Example

>>> vi.identify_nkw(
        data = pq_data,
        collab_threshold=15,
        return_type = 'text'
    )
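The steps in the description (group by person and organization, average, flag, then summarise per organization) can be sketched in plain Python. This is an illustration under assumptions rather than the library's code: the helper names, the summary keys (`n_nkw`, `n_total`, `prop_nkw`) and the sample rows are invented for the example.

```python
from collections import defaultdict
from statistics import mean

def flag_nkw(rows, collab_threshold=5):
    """Map (PersonId, Organization) to True when mean collaboration hours < threshold."""
    hours = defaultdict(list)
    for row in rows:
        hours[(row["PersonId"], row["Organization"])].append(row["Collaboration_hours"])
    return {key: mean(values) < collab_threshold for key, values in hours.items()}

def summarise_nkw(flags):
    """Count flagged persons and their share within each organization."""
    totals, flagged = defaultdict(int), defaultdict(int)
    for (_person, org), is_nkw in flags.items():
        totals[org] += 1
        flagged[org] += int(is_nkw)
    return {org: {"n_nkw": flagged[org],
                  "n_total": totals[org],
                  "prop_nkw": flagged[org] / totals[org]}
            for org in totals}

# Hypothetical weekly rows
rows = [
    {"PersonId": "a", "Organization": "Finance", "Collaboration_hours": 2},
    {"PersonId": "a", "Organization": "Finance", "Collaboration_hours": 4},
    {"PersonId": "b", "Organization": "Finance", "Collaboration_hours": 10},
    {"PersonId": "b", "Organization": "Finance", "Collaboration_hours": 20},
    {"PersonId": "c", "Organization": "HR", "Collaboration_hours": 1},
]
summary = summarise_nkw(flag_nkw(rows, collab_threshold=5))
print(summary["Finance"])  # {'n_nkw': 1, 'n_total': 2, 'prop_nkw': 0.5}
```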

vivainsights.identify_outlier module¶

This function takes in a selected metric and uses the z-score (number of standard deviations) to identify outliers across time. There are applications in this for identifying weeks with abnormally low collaboration activity, e.g. holidays. Time as a grouping variable can be overridden with the group_var argument.

vivainsights.identify_outlier.identify_outlier(data: DataFrame, group_var='MetricDate', metric='Collaboration_hours')[source]¶

Name¶

identify_outlier

Description¶

param data:

A Standard Person Query dataset in the form of a pandas dataframe.

type data:

pandas dataframe

param group_var:

A string with the name of the grouping variable. Default: MetricDate.

type group_var:

str

param metric:

A string containing the name of the metric (e.g., “Collaboration_hours”)

type metric:

str

returns:

A dataframe with MetricDate (if grouping variable is not set),
the metric, and the corresponding z-score.

Example

>>> identify_outlier(data, group_var = "MetricDate", metric = "Collaboration_hours")
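To make the z-score idea concrete, the following sketch averages the metric within each group and then expresses each group mean as a number of standard deviations from the mean of all group means. It is a simplified illustration: the use of the sample standard deviation, the helper name `zscores_by_group`, and the sample rows are assumptions, not details taken from the library.

```python
from collections import defaultdict
from statistics import mean, stdev

def zscores_by_group(rows, group_var="MetricDate", metric="Collaboration_hours"):
    """Return (group, group mean, z-score) tuples, sorted by group."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[group_var]].append(row[metric])
    group_means = {group: mean(values) for group, values in grouped.items()}
    centre = mean(list(group_means.values()))
    spread = stdev(list(group_means.values()))  # needs >= 2 groups and non-zero spread
    return [(group, value, (value - centre) / spread)
            for group, value in sorted(group_means.items())]

# Hypothetical data: two persons per week, the last week is unusually quiet
weekly_hours = {"2022-05-01": (18, 22), "2022-05-08": (19, 21),
                "2022-05-15": (20, 20), "2022-05-22": (6, 10)}
rows = [{"MetricDate": week, "Collaboration_hours": hours}
        for week, pair in weekly_hours.items() for hours in pair]

for group, value, z in zscores_by_group(rows):
    print(group, value, round(z, 2))
```

In this sample the first three weeks score 0.5 and the final week scores -1.5, so a cut-off of one standard deviation below the mean would single out the last week.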

vivainsights.identify_tenure module¶

The identify_tenure function calculates and summarizes employee tenure based on hire and metric dates, and provides various options for returning the results.

vivainsights.identify_tenure.identify_tenure(data: DataFrame, beg_date='HireDate', end_date='MetricDate', maxten=40, return_type='message', date_format='%Y-%m-%d')[source]¶

Name¶

identify_tenure

Description¶

The function identify_tenure calculates and summarizes employee tenure based on hire and metric dates, and provides various options for returning the results.
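A minimal sketch of the tenure calculation follows, assuming tenure is the elapsed time between the two dates expressed in years of 365.25 days. That divisor, the `TenureYear` key, and the helper names are assumptions made for illustration; the parameter names mirror the signature above.

```python
from datetime import datetime

def tenure_years(hire_date, metric_date, date_format="%Y-%m-%d"):
    """Years elapsed between two date strings (365.25-day years, an assumption)."""
    beg = datetime.strptime(hire_date, date_format)
    end = datetime.strptime(metric_date, date_format)
    return (end - beg).days / 365.25

def split_by_tenure(rows, beg_date="HireDate", end_date="MetricDate",
                    maxten=40, date_format="%Y-%m-%d"):
    """Split rows into (clean, odd), where odd rows have tenure >= maxten years."""
    clean, odd = [], []
    for row in rows:
        tenure = tenure_years(row[beg_date], row[end_date], date_format)
        target = odd if tenure >= maxten else clean
        target.append({**row, "TenureYear": tenure})  # hypothetical column name
    return clean, odd

rows = [
    {"PersonId": "a", "HireDate": "2012-05-22", "MetricDate": "2022-05-22"},
    {"PersonId": "b", "HireDate": "1975-01-01", "MetricDate": "2022-05-22"},
]
clean, odd = split_by_tenure(rows, maxten=40)
print([r["PersonId"] for r in clean], [r["PersonId"] for r in odd])  # ['a'] ['b']
```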

param data:

The data parameter is a pandas DataFrame that contains the employee data. It should have columns for the hire date (beg_date) and the metric date (end_date).

type data:

pandas dataframe

param beg_date:

The beg_date parameter is the name of the column in the DataFrame that represents the start date of employment for each employee. By default, it is set to “HireDate”.

type beg_date:

str, optional

param end_date:

The end_date parameter is the name of the column in the data DataFrame that represents the end date of the tenure period for each employee.

type end_date:

str, optional

param maxten:

The maxten parameter is used to specify the maximum tenure in years. Employees with a tenure greater than or equal to maxten will be considered as “odd” employees.

type maxten:

int, optional

param return_type:

The return_type parameter determines the type of output that the function will return. It can have the following values:
- “message” (default)
- “plot”
- “data_cleaned”
- “data_dirty”
- “data”
- “text”

type return_type:

str, optional

param date_format:

The date_format parameter is used to specify the format of the date strings in the beg_date and end_date columns of the input DataFrame. It is set to “%Y-%m-%d” by default, which represents the format “YYYY-MM-DD”.

type date_format:

str, optional

returns:

The function identify_tenure returns different outputs based on the value of the return_type
parameter (see the values listed under return_type above).

vivainsights.import_query module¶

This function imports a Viva Insights Query from a .csv file and optimizes the variable classifications for other functions in the package. The function takes in a file path (x) and an optional encoding parameter (default is ‘utf-8’). It checks if the file is a .csv file, reads in the file using pandas, cleans the column names by removing spaces and special characters, and returns the resulting data as a pandas dataframe. If there is an error reading the file, the function prints an error message.

vivainsights.import_query.import_query(x, encoding: str = 'utf-8')[source]¶

Name¶

import_query

Description¶

The function import_query reads a CSV file, removes leading and trailing spaces from column names, and replaces spaces and special characters with underscores in column names.

param x:: The parameter x is the input file name or path. It should be a string representing the file name or path of the CSV file you want to import.
type x:: str
param encoding:: The encoding parameter specifies the character encoding to be used when reading the CSV file. The default value is ‘utf-8’, which is a widely used encoding for text files. However, you can specify a different encoding if needed.
type encoding:: str, optional
rtype:: The variable data if the input file is a valid CSV file. If the input file is not a valid CSV file, the function will print an error message and return None.
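The column-name cleaning described above can be sketched with the standard library. The exact character set that the library treats as "special" is not stated here, so the regular expression below (anything other than letters, digits, and underscores) is an assumption, as are the helper names and the sample CSV text.

```python
import csv
import io
import re

def clean_column_name(name):
    """Strip surrounding spaces, then replace spaces and special characters with '_'."""
    return re.sub(r"[^0-9A-Za-z_]", "_", name.strip())

def read_query(text):
    """Parse CSV text into a list of dicts keyed by cleaned column names."""
    reader = csv.reader(io.StringIO(text))
    header = [clean_column_name(column) for column in next(reader)]
    return [dict(zip(header, row)) for row in reader]

sample = " Person Id ,Collaboration hours,Meeting-hours (total)\np1,12.5,4\n"
records = read_query(sample)
print(list(records[0]))
# ['Person_Id', 'Collaboration_hours', 'Meeting_hours__total_']
```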

vivainsights.mt_data module¶

This module returns a data frame containing a meeting query.

vivainsights.mt_data.load_mt_data()[source]¶

vivainsights.network_g2g module¶

This module returns a network plot given a data frame containing a group-to-group query.

vivainsights.network_g2g.network_g2g(data, primary=None, secondary=None, metric='Group_collaboration_time_invested', algorithm='fr', node_colour='lightblue', exc_threshold=0.1, org_count=None, node_scale=1, edge_scale=10, subtitle='Collaboration Across Organizations', return_type='plot')[source]¶

Name¶

network_g2g

Description¶

This function returns a network plot given a data frame containing a group-to-group query.

param data:

Data frame containing a group-to-group query.

type data:

data frame

param primary:

String containing the variable name for the Primary Collaborator column.

type primary:

str

param secondary:

String containing the variable name for the Secondary Collaborator column.

type secondary:

str

param metric:

String containing the variable name for metric. Defaults to “Group_collaboration_time_invested”.

type metric:

str

param algorithm:

String to specify the node placement algorithm to be used.
- Defaults to “fr” for the force-directed algorithm of Fruchterman and Reingold.
- See <https://rdrr.io/cran/ggraph/man/layout_tbl_graph_igraph.html> for a full list of options.

type algorithm:

str

param node_colour:

String or dictionary to specify the colour to be used for displaying nodes.
- Defaults to “lightblue”.
- If “vary” is supplied, a different colour is shown for each node at random.
- If a dictionary is supplied, the keys must match the values of the variable provided for the primary and secondary columns.
- See example section for details.

type node_colour:

str or dictionary

param exc_threshold:

Numeric value between 0 and 1 specifying the exclusion threshold to apply. Defaults to 0.1, which means that the plot will only display collaboration above 10% of a node’s total collaboration.
This argument has no impact on “data” or “table” return.

type exc_threshold:

float

param org_count:

Optional data frame to provide the size of each organization in the secondary attribute. The data frame should contain only two columns:
- Name of the secondary attribute excluding any prefixes, e.g. “Organization”. Must be of character or factor type.
- “n”. Must be of numeric type.
Defaults to None, where node sizes will be fixed.

type org_count:

optional

param node_scale:

Numeric value controlling the size of the nodes. 1 keeps the size of the nodes as is.

type node_scale:

int or float

param edge_scale:

Numeric value controlling the width of the edges. 1 keeps the size of the edges as is. Defaults to 10.

type edge_scale:

int or float

param subtitle:

String to override default plot subtitle.

type subtitle:

str

param return_type:

String specifying what to return. This must be one of the following strings:
- “plot” (default)
- “table”
- “network”
- “data”

type return_type:

str

returns:

A different output is returned depending on the value passed to the return_type argument
- `"plot"` (‘ggplot’ object. A group-to-group network plot.)
- `"table"` (data frame. An interaction matrix of the network.)
- `"network"` (‘igraph’ object used for creating the network plot.)
- `"data"` (data frame. A long table of the underlying data.)

Example

>>> network_g2g(data = vi.load_g2g_data(), metric = "Group_meeting_count")
# Return a network visual

>>> network_g2g(data = vi.load_g2g_data(), return_type = "table")
# Return the interaction matrix

>>> network_g2g(data = vi.load_g2g_data(), exc_threshold = 0)
# Return a network visual with no exclusion threshold
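The effect of exc_threshold can be illustrated on a small edge list. The sketch below assumes that each edge is compared against the total collaboration of its primary group, which is one reading of "a node's total collaboration"; the helper name `apply_exclusion_threshold` and the sample edges are hypothetical.

```python
from collections import defaultdict

def apply_exclusion_threshold(edges, exc_threshold=0.1):
    """Keep edges whose share of the primary group's total exceeds the threshold.

    `edges` is a list of (primary, secondary, value) tuples.
    """
    totals = defaultdict(float)
    for primary, _secondary, value in edges:
        totals[primary] += value
    return [(primary, secondary, value)
            for primary, secondary, value in edges
            if totals[primary] > 0 and value / totals[primary] > exc_threshold]

edges = [
    ("Finance", "HR", 5), ("Finance", "IT", 90), ("Finance", "Sales", 5),
    ("HR", "Finance", 50), ("HR", "IT", 50),
]
kept = apply_exclusion_threshold(edges, exc_threshold=0.1)
print(kept)
# [('Finance', 'IT', 90), ('HR', 'Finance', 50), ('HR', 'IT', 50)]
```

Setting the threshold to 0 keeps every edge with a positive value, which corresponds to the "no exclusion threshold" example above.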

vivainsights.network_g2g.setColor(node_colour, org)[source]¶

vivainsights.network_p2p module¶

This module performs network analysis with a person-to-person query

vivainsights.network_p2p.network_p2p(data, hrvar='Organization', return_type='plot', centrality=None, community=None, weight=None, comm_args=None, layout='mds', path='', style='igraph', bg_fill='#FFFFFF', font_col='grey20', legend_pos='best', palette='rainbow', node_alpha=0.7, edge_alpha=1, edge_col='#777777', node_sizes=[1, 20], node_scale=1, seed=1, legend_ncols=0)[source]¶

Name¶

network_p2p

Description¶

This function returns a network plot given a data frame containing a person-to-person query.

param data:: Data frame containing a person-to-person query.
type data:: dataframe
param hrvar:: String containing the label for the HR attribute.
type hrvar:: str
param return_type:: A different output is returned depending on the value passed to the return_type argument: - ‘plot’ (default) - ‘plot-pdf’ - ‘sankey’ - ‘table’ - ‘data’ - ‘network’
type return_type:: str
param centrality:: String determining which centrality measure is used to scale the size of the nodes. All centrality measures are automatically calculated when it is set to one of the below values, and reflected in the ‘network’ and ‘data’ outputs. Measures include: - betweenness - closeness - degree - eigenvector - pagerank When centrality is set to None, no centrality is calculated in the outputs and all the nodes have the same size.
type centrality:: str
param community:: String determining which community detection algorithms to apply. Valid values include: - None (default): compute analysis or visuals without computing communities. - “multilevel” (a version of louvain) - “leiden” - “edge_betweenness” - “fastgreedy” - “infomap” - “label_propagation” - “leading_eigenvector” - “optimal_modularity” - “spinglass” - “walk_trap”
type community:: str
param weight:: String to specify which column to use as weights for the network. To create a graph without weights, supply None to this argument.
type weight:: str
param comm_args:: Dictionary containing the arguments to be passed through to igraph’s clustering algorithms. Arguments must be named. See examples section on how to supply arguments in a dictionary.
type comm_args:: dict
param layout:: String to specify the node placement algorithm to be used. Defaults to “mds” for the deterministic multi-dimensional scaling of nodes. See <https://rdrr.io/cran/ggraph/man/layout_tbl_graph_igraph.html> for a full list of options.
type layout:: str
param path:: File path for saving the PDF output. Defaults to a timestamped path based on current parameters.
type path:: str (file path)
param bg_fill:: String to specify background fill color.
type bg_fill:: str
param font_col:: String to specify font color.
type font_col:: str
param legend_pos:: String to specify position of legend. Valid values include: - “best” - “upper right” - “upper left” - “lower left” - “right” - “center left” - “center right” - “lower center” - “upper center” - “center”
type legend_pos:: str
param palette:: String specifying the function to generate a color palette with a single argument n. Uses “rainbow” by default.
type palette:: str
param node_alpha:: A numeric value between 0 and 1 to specify the transparency of the nodes. Defaults to 0.7.
type node_alpha:: float

param edge_alpha:: A numeric value between 0 and 1 to specify the transparency of the edges (only for ‘ggraph’ mode). Defaults to 1.
type edge_alpha:: float
param edge_col:: String to specify edge link color.
type edge_col:: str
param node_sizes:: Numeric vector of length two to specify the range of node sizes to rescale to, when centrality is set to a non-null value.
type node_sizes:: list of int
param node_scale:: A numeric value to multiply or divide the size of the nodes. This is applied to the ‘node_size’ attribute in the graph to increase or decrease the size of the nodes.
type node_scale:: int

param seed:

Seed for the random number generator, set when the louvain or leiden community detection algorithm is used, to ensure consistency. Only applicable when community is set to one of the valid non-null values.

type seed:

int

param legend_ncols:

Either 0 or 1. Switches the orientation of the legend in the plot between horizontal and vertical.

type legend_ncols:

int

returns:

A different output is returned depending on the value passed to the return_type argument
- `'plot'` (return a network plot, displayed interactively.)
- `'plot-pdf'` (save a network plot as PDF. This option is recommended when the graph is large, which may take a long time to run if return_type = 'plot' is selected. Use this together with path to control the save location.)
- `'sankey'` (return a sankey plot combining communities and HR attribute. This is only valid if a community detection method is selected at community.)
- `'table'` (return a vertex summary table with counts in communities and HR attribute. When centrality is not None, the average centrality values are calculated per group.)
- `'data'` (return a vertex data file that matches vertices with communities and HR attributes.)
- `'network'` (return ‘igraph’ object.)

Examples

>>> vi.network_p2p(data = p2p_data, return_type = "plot")
# Return a network visual

>>> vi.network_p2p(data = p2p_data, community = "leiden", comm_args = {"resolution": 0.01}, return_type = "table")
# Return the vertex table with counts in communities and HR attribute
# Resolution is set to a low value to yield fewer communities

>>> vi.network_p2p(data = p2p_data, centrality = "betweenness", return_type = "table")
# Return the vertex table with centrality calculations

>>> vi.network_p2p(
    data = p2p_data, # or wherever your query is stored
    node_scale = 50, # adjust this parameter to make nodes bigger/smaller
    return_type = "plot"
    )

>>> vi.network_p2p(
    data=p2p_data, # or wherever your query is stored
    return_type = "sankey", # another return type for visualization
    centrality = "betweenness", # centrality can be set as per requirement
    community = "leiden" # Adjust community
    )
# Return the sankey output based on centrality and community

>>> vi.network_p2p(
    data=p2p_data, # or wherever your query is stored
    return_type = "plot",
    font_col = "grey20", # Color change option for fonts in chart
    legend_pos = "upper left", # Adjust the legend position using this parameter
    legend_ncols = 1 # Adjust this parameter to 0 or 1 to change legend orientation from vertical to horizontal
    )
# Return the plot output based on different color scheme, legend orientation and position, font color change
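The interaction between centrality, node_sizes, and node_scale can be pictured as a rescale followed by a multiplier. The sketch below assumes a linear min-max rescale into the node_sizes range; the library may use a different mapping, and the helper name `rescale_node_sizes` and the sample centrality values are hypothetical.

```python
def rescale_node_sizes(centrality, node_sizes=(1, 20), node_scale=1):
    """Map centrality values linearly onto node_sizes, then multiply by node_scale."""
    smallest, largest = node_sizes
    low, high = min(centrality.values()), max(centrality.values())
    if high == low:
        # All nodes equally central: give every node the smallest size
        return {node: smallest * node_scale for node in centrality}
    return {node: (smallest + (value - low) / (high - low) * (largest - smallest)) * node_scale
            for node, value in centrality.items()}

degree = {"a": 1, "b": 3, "c": 5}  # hypothetical degree centrality per person
print(rescale_node_sizes(degree))                 # {'a': 1.0, 'b': 10.5, 'c': 20.0}
print(rescale_node_sizes(degree, node_scale=2))   # {'a': 2.0, 'b': 21.0, 'c': 40.0}
```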

vivainsights.network_summary module¶

This module summarises node centrality statistics with an igraph object

vivainsights.network_summary.network_summary(graph, hrvar=None, return_type='table')[source]¶

Name¶

network_summary

Description¶

This function summarises node centrality statistics with an igraph object.

param graph:

‘igraph’ object that can be returned from network_g2g() or network_p2p() when the return_type argument is set to “network”.

type graph:

igraph object

param hrvar:

String containing the name of the HR Variable by which to split metrics. Defaults to None.

type hrvar:

str

param return_type:

String specifying what output to return. Valid inputs include:
- “table”
- “network”
- “plot”

type return_type:

str

returns:

By default, a data frame containing centrality statistics. Available statistics include
- `betweenness` (number of shortest paths going through a node.)
- `closeness` (number of steps required to access every other node from a given node.)
- `degree` (number of connections linked to a node.)
- `eigenvector` (a measure of the influence a node has on a network.)
- `pagerank` (calculates the PageRank for the specified vertices.)

Examples

>>> graph = network_g2g(data = vi.load_g2g_data(), return_type = "network")
>>> network_summary(graph, hrvar = "Organization", return_type = "table")
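To give a feel for two of the statistics listed above, the sketch below computes degree and closeness on a tiny undirected graph in plain Python. It is independent of igraph and of this function; closeness uses the common definition (number of reachable nodes divided by the sum of shortest-path lengths), which is an assumption about the exact variant reported. The helper names are hypothetical.

```python
from collections import deque

def build_adjacency(edges):
    """Build an undirected adjacency mapping from (node, node) pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency

def degree(adjacency):
    """Number of connections linked to each node."""
    return {node: len(neighbours) for node, neighbours in adjacency.items()}

def closeness(adjacency):
    """Reachable nodes divided by total shortest-path steps, per node."""
    scores = {}
    for source in adjacency:
        distance = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search gives shortest paths in unweighted graphs
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in distance:
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
        total_steps = sum(distance.values())
        scores[source] = (len(distance) - 1) / total_steps if total_steps else 0.0
    return scores

adjacency = build_adjacency([("a", "b"), ("b", "c")])  # a path: a - b - c
print(degree(adjacency))     # {'a': 1, 'b': 2, 'c': 1}
print(closeness(adjacency))  # 'b' scores 1.0, the two end nodes score 2/3
```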

vivainsights.p2g_data module¶

This module returns a data frame containing a person-to-group query.

vivainsights.p2g_data.load_p2g_data()[source]¶

vivainsights.p2p_data module¶

This module returns a data frame containing a person-to-person query.

vivainsights.p2p_data.load_p2p_data()[source]¶

vivainsights.p2p_data_sim module¶

Generate a person-to-person query / edgelist based on a graph built according to the Watts-Strogatz small-world network model. Organizational data fields are also simulated for Organization, LevelDesignation, and City. The result is a data frame with the same column structure as a person-to-person flexible query. This has an edgelist structure and can be used directly as an input to network_p2p().

vivainsights.p2p_data_sim.p2p_data_sim(dim=1, size=300, nei=5, p=0.05)[source]¶
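For readers unfamiliar with the Watts-Strogatz model, the following is a simplified one-dimensional sketch: start from a ring where each node is linked to its `nei` nearest neighbours on each side, then rewire each edge with probability `p`. It is not the generator used by this function, it omits the simulated organizational fields, and the helper name `watts_strogatz_edges` and the `seed` argument are assumptions added for the example.

```python
import random

def watts_strogatz_edges(size=300, nei=5, p=0.05, seed=1):
    """Return a sorted edge list for a simplified 1-D Watts-Strogatz graph."""
    rng = random.Random(seed)

    def ordered(a, b):
        return (a, b) if a < b else (b, a)

    # Ring lattice: connect each node to its `nei` nearest neighbours on each side
    lattice = {ordered(i, (i + k) % size)
               for i in range(size) for k in range(1, nei + 1)}

    rewired = set()
    for a, b in sorted(lattice):
        if rng.random() < p:
            # Move the far end of the edge, avoiding self-loops and duplicate edges
            candidates = [c for c in range(size)
                          if c != a
                          and ordered(a, c) not in lattice
                          and ordered(a, c) not in rewired]
            if candidates:
                b = rng.choice(candidates)
        rewired.add(ordered(a, b))
    return sorted(rewired)

edges = watts_strogatz_edges(size=20, nei=2, p=0.2, seed=1)
print(len(edges))  # 40: rewiring moves edges but does not add or remove them
```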

vivainsights.pq_data module¶

This module returns a data frame containing a person query.

vivainsights.pq_data.load_pq_data()[source]¶

vivainsights.totals_col module¶

The function totals_col adds a new column with a specified total value to a given pandas DataFrame.

vivainsights.totals_col.totals_col(data: DataFrame, total_value: str = 'Total')[source]¶

Name¶

totals_col

Description¶

The function totals_col adds a new column with a specified total value to a given pandas DataFrame.

param data:: A pandas DataFrame that represents the data you want to add a totals column to.
type data:: pandas dataframe
param total_value:: The total_value parameter is a string that represents the name of the new column that will be added to the DataFrame. By default, it is set to ‘Total’.
type total_value:: str, optional
rtype:: The function totals_col returns the modified DataFrame data with a new column added.

vivainsights.us_to_space module¶

The function replaces underscores with spaces in a given string.

param string:: A string that may contain underscores that need to be replaced with spaces
return:: The function us_to_space takes a string as input and replaces all underscores with spaces using the replace method. It then returns the modified string with spaces instead of underscores.

vivainsights.us_to_space.us_to_space(string)[source]¶
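Based on the description above, the behaviour amounts to a single `str.replace` call. The snippet below is an equivalent stand-alone sketch rather than the package source; the sample metric name is only an example input.

```python
def us_to_space(string):
    """Replace every underscore in `string` with a space."""
    return string.replace("_", " ")

label = us_to_space("Collaboration_hours")
print(label)  # Collaboration hours
```

This kind of helper is typically useful for turning metric column names into readable plot labels.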