This function takes a selected metric and uses the z-score (the number of standard deviations from the mean) to identify outliers across time. One application is identifying weeks with abnormally low collaboration activity, e.g. holiday weeks. Time is the default grouping variable; it can be overridden with the group_var argument.

identify_outlier(data, group_var = "Date", metric = "Collaboration_hours")

Arguments

data

A Standard Person Query dataset in the form of a data frame.

group_var

A string with the name of the grouping variable. Defaults to "Date".

metric

Character string containing the name of the metric, e.g. "Collaboration_hours".

Value

Returns a data frame with the grouping variable (Date, unless group_var is set), the metric, and the corresponding z-score in a column named zscore.
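
To make the zscore column concrete, here is a minimal base R sketch of one way such a value could be computed. It assumes the metric is first averaged within each group and the group means are then standardised against their own mean and standard deviation; that two-step logic is an assumption for illustration, not a statement of the function's internals. The toy data frame is hypothetical.

```r
# Hypothetical toy data; the column names mirror the function's defaults.
toy <- data.frame(
  Date = rep(c("1/5/2020", "1/12/2020", "1/19/2020"), each = 2),
  Collaboration_hours = c(10, 12, 20, 22, 30, 32)
)

# Assumed step 1: average the metric within each group (here, each Date).
by_group <- aggregate(Collaboration_hours ~ Date, data = toy, FUN = mean)

# Assumed step 2: express each group mean as a number of standard
# deviations away from the mean of all group means.
centre <- mean(by_group$Collaboration_hours)
spread <- sd(by_group$Collaboration_hours)
by_group$zscore <- (by_group$Collaboration_hours - centre) / spread

print(by_group)
```

Under this reading, a z-score near 0 means the group is typical, while a large negative value marks a group with unusually low activity.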

Examples

identify_outlier(sq_data, metric = "Collaboration_hours")
#> # A tibble: 7 × 3
#>   Date       Collaboration_hours  zscore
#>   <chr>                    <dbl>   <dbl>
#> 1 1/12/2020                 22.6  1.04  
#> 2 1/19/2020                 23.1  1.32  
#> 3 1/26/2020                 20.3 -0.205 
#> 4 1/5/2020                  17.6 -1.66  
#> 5 12/15/2019                20.8  0.0627
#> 6 12/22/2019                20.8  0.0705
#> 7 12/29/2019                19.5 -0.628
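
The function returns z-scores rather than a yes/no flag, so picking out outliers is left to the caller. The sketch below rebuilds the output shown above as a plain data frame and keeps rows whose absolute z-score exceeds a cut-off. The cut-off of 1.5 is an arbitrary assumption chosen for illustration, and the name `threshold` is hypothetical.

```r
# Values copied from the example output above.
res <- data.frame(
  Date = c("1/12/2020", "1/19/2020", "1/26/2020", "1/5/2020",
           "12/15/2019", "12/22/2019", "12/29/2019"),
  Collaboration_hours = c(22.6, 23.1, 20.3, 17.6, 20.8, 20.8, 19.5),
  zscore = c(1.04, 1.32, -0.205, -1.66, 0.0627, 0.0705, -0.628)
)

# Assumed cut-off; adjust to how strict the outlier definition should be.
threshold <- 1.5
flagged <- res[abs(res$zscore) > threshold, ]
print(flagged)

# Date is a character column in the output, so rows sort alphabetically.
# Parsing it as a date gives chronological order instead.
res$Date_parsed <- as.Date(res$Date, format = "%m/%d/%Y")
res <- res[order(res$Date_parsed), ]
```

With this cut-off, only the week of 1/5/2020 (z-score -1.66) is flagged, which fits the holiday-week use case described above.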