This function takes a selected metric and uses its z-score (the number of standard deviations from the mean) to identify outliers across time. One application is identifying weeks with abnormally low collaboration activity, e.g. holiday weeks. Time is the default grouping variable and can be overridden with the group_var argument.

identify_outlier(data, group_var = "Date", metric = "Collaboration_hours")

Arguments

data

A Standard Person Query dataset in the form of a data frame.

group_var

A string with the name of the grouping variable. Defaults to "Date".

metric

A string with the name of the metric, e.g. "Collaboration_hours".

Value

Returns a data frame with one row per group and three columns: the grouping variable (Date, unless group_var is set to something else), the metric, and the corresponding z-score (zscore).
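
The z-score column can be reproduced by hand. The following is a minimal base R sketch, assuming the metric is first averaged within each group and then standardised across the group averages. The toy data frame and its values are hypothetical, and this is not necessarily the package's exact implementation.

```r
# Hypothetical toy data: two people observed over three weeks.
toy <- data.frame(
  PersonId = rep(c("A", "B"), times = 3),
  Date = rep(c("1/5/2020", "1/12/2020", "1/19/2020"), each = 2),
  Collaboration_hours = c(10, 14, 20, 24, 21, 25)
)

# Step 1 (assumed): average the metric within each group.
by_week <- aggregate(Collaboration_hours ~ Date, data = toy, FUN = mean)

# Step 2: z-score = (group mean - mean of group means) / sd of group means.
m <- by_week$Collaboration_hours
by_week$zscore <- (m - mean(m)) / sd(m)

by_week
```

By construction, the z-scores have a mean of 0 and a standard deviation of 1, so a value such as -2 means the group average lies two standard deviations below the typical group.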

Examples

identify_outlier(sq_data, metric = "Collaboration_hours")
#> # A tibble: 13 × 3
#>    Date       Collaboration_hours  zscore
#>    <chr>                    <dbl>   <dbl>
#>  1 1/12/2020                 22.6  1.02  
#>  2 1/19/2020                 22.8  1.09  
#>  3 1/26/2020                 20.0  0.0377
#>  4 1/5/2020                  17.5 -0.932 
#>  5 11/10/2019                21.9  0.766 
#>  6 11/17/2019                20.2  0.112 
#>  7 11/24/2019                19.6 -0.127 
#>  8 11/3/2019                 22.3  0.917 
#>  9 12/1/2019                 12.8 -2.69  
#> 10 12/15/2019                20.3  0.143 
#> 11 12/22/2019                20.5  0.214 
#> 12 12/29/2019                19.4 -0.190 
#> 13 12/8/2019                 19.0 -0.359
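
To act on the result, the rows can be filtered by the size of the z-score. The sketch below rebuilds the printed output above as a plain data frame (using the rounded values shown) and keeps the weeks beyond a cut-off. The cut-off of 2 is a common rule of thumb chosen here for illustration, not a package default.

```r
# Values copied from the printed example output above (rounded as shown).
res <- data.frame(
  Date = c("1/12/2020", "1/19/2020", "1/26/2020", "1/5/2020",
           "11/10/2019", "11/17/2019", "11/24/2019", "11/3/2019",
           "12/1/2019", "12/15/2019", "12/22/2019", "12/29/2019",
           "12/8/2019"),
  Collaboration_hours = c(22.6, 22.8, 20.0, 17.5, 21.9, 20.2, 19.6,
                          22.3, 12.8, 20.3, 20.5, 19.4, 19.0),
  zscore = c(1.02, 1.09, 0.0377, -0.932, 0.766, 0.112, -0.127,
             0.917, -2.69, 0.143, 0.214, -0.190, -0.359)
)

# Hypothetical cut-off: flag weeks more than 2 standard deviations away.
threshold <- 2
outliers <- res[abs(res$zscore) > threshold, ]
outliers  # only the week of 12/1/2019 exceeds the cut-off

# Date is a character column, so the rows above sort alphabetically.
# Converting to Date class gives chronological order.
res$Date <- as.Date(res$Date, format = "%m/%d/%Y")
res <- res[order(res$Date), ]
```

In this output only the week of 12/1/2019 (12.8 hours, z-score -2.69) is flagged.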