This function takes a selected metric and uses the z-score (the number of standard deviations from the mean) to identify outliers across time. It is useful for identifying weeks with abnormally low collaboration activity, e.g. holidays. Time is the default grouping variable; this can be overridden with the group_var argument.
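
To make the z-score idea concrete, here is a minimal base R sketch of one way such a score could be computed per group. It is an illustration under stated assumptions, not necessarily how identify_outlier() is implemented: it assumes the metric is first averaged within each group and that the group means are then standardised with the sample standard deviation. The toy data frame and the object names toy and weekly are hypothetical.

```r
# Hypothetical toy data: two people observed over three weeks
toy <- data.frame(
  MetricDate = as.Date(rep(c("2022-05-01", "2022-05-08", "2022-05-15"), each = 2)),
  Collaboration_hours = c(18, 20, 10, 12, 19, 21)
)

# Step 1 (assumed): average the metric within each group, here each week
weekly <- aggregate(Collaboration_hours ~ MetricDate, data = toy, FUN = mean)

# Step 2 (assumed): standardise the group means using the sample standard deviation
weekly$zscore <- (weekly$Collaboration_hours - mean(weekly$Collaboration_hours)) /
  sd(weekly$Collaboration_hours)

weekly
```

In this toy data the second week has a much lower average than the other two, so it receives the lowest (negative) z-score.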

Usage

identify_outlier(
  data,
  group_var = "MetricDate",
  metric = "Collaboration_hours"
)

Arguments

data

A Standard Person Query dataset in the form of a data frame.

group_var

A string with the name of the grouping variable. Defaults to "MetricDate".

metric

Character string containing the name of the metric, e.g. "Collaboration_hours".

Value

Returns a data frame with the grouping variable (MetricDate, unless group_var is set to something else), the metric, and the corresponding z-score.

Examples

identify_outlier(pq_data, metric = "Collaboration_hours")
#> # A tibble: 10 × 3
#>    MetricDate Collaboration_hours   zscore
#>    <date>                   <dbl>    <dbl>
#>  1 2022-05-01                19.4  0.601  
#>  2 2022-05-08                18.0 -0.621  
#>  3 2022-05-15                19.9  1.04   
#>  4 2022-05-22                17.6 -0.997  
#>  5 2022-05-29                18.0 -0.645  
#>  6 2022-06-05                18.7 -0.00241
#>  7 2022-06-12                21.1  2.20   
#>  8 2022-06-19                17.9 -0.736  
#>  9 2022-06-26                18.1 -0.541  
#> 10 2022-07-03                18.4 -0.299
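
The returned z-scores can then be filtered to flag candidate outliers. The sketch below rebuilds the printed output above as a plain data frame so that it runs on its own, and keeps rows whose absolute z-score exceeds 2. The cut-off of 2 and the names res, threshold and outliers are assumptions made for illustration; the function itself does not define a threshold.

```r
# Values copied from the printed example output above
res <- data.frame(
  MetricDate = as.Date(c(
    "2022-05-01", "2022-05-08", "2022-05-15", "2022-05-22", "2022-05-29",
    "2022-06-05", "2022-06-12", "2022-06-19", "2022-06-26", "2022-07-03"
  )),
  Collaboration_hours = c(19.4, 18.0, 19.9, 17.6, 18.0, 18.7, 21.1, 17.9, 18.1, 18.4),
  zscore = c(0.601, -0.621, 1.04, -0.997, -0.645, -0.00241, 2.20, -0.736, -0.541, -0.299)
)

# Assumed cut-off: flag groups more than 2 standard deviations from the mean
threshold <- 2
outliers <- res[abs(res$zscore) > threshold, ]

outliers
```

With this cut-off, only the week of 2022-06-12 is flagged, and it is flagged for unusually high rather than low collaboration hours.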