This function takes a selected metric and uses the z-score
(the number of standard deviations from the mean) to identify
outliers across time. One application is identifying weeks with
abnormally low collaboration activity, e.g. holiday weeks.
Time is the default grouping variable and can be overridden with
the group_var argument.
identify_outlier(data, group_var = "Date", metric = "Collaboration_hours")
Returns a data frame with the grouping variable (Date, unless group_var is set),
the metric, and the corresponding z-score.
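To make the z-score column concrete, here is a minimal base R sketch of the calculation. It assumes the metric is first averaged within each group and then standardised with the sample standard deviation; the example output on this page is consistent with that, but the actual implementation may differ. The `weekly` data frame and its values are hypothetical.

```r
# Hypothetical person-week data: two people observed over three weeks.
weekly <- data.frame(
  Date = rep(c("2020-01-05", "2020-01-12", "2020-01-19"), each = 2),
  Collaboration_hours = c(10, 14, 20, 24, 18, 22)
)

# Step 1 (assumed): average the metric within each group (here, each week).
by_group <- aggregate(Collaboration_hours ~ Date, data = weekly, FUN = mean)

# Step 2: z-score = (group value - mean of group values) / sample sd.
m <- mean(by_group$Collaboration_hours)
s <- sd(by_group$Collaboration_hours)
by_group$zscore <- (by_group$Collaboration_hours - m) / s

print(by_group)
```

By construction the z-scores have mean 0 and standard deviation 1, so a value such as -2 reads as "two standard deviations below the typical group".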
Other Data Validation:
check_query(),
extract_hr(),
flag_ch_ratio(),
flag_em_ratio(),
flag_extreme(),
flag_outlooktime(),
hr_trend(),
hrvar_count(),
hrvar_count_all(),
hrvar_trend(),
identify_churn(),
identify_holidayweeks(),
identify_inactiveweeks(),
identify_nkw(),
identify_privacythreshold(),
identify_query(),
identify_shifts(),
identify_shifts_wp(),
identify_tenure(),
remove_outliers(),
standardise_pq(),
subject_validate(),
subject_validate_report(),
track_HR_change(),
validation_report()
identify_outlier(sq_data, metric = "Collaboration_hours")
#> # A tibble: 7 × 3
#>   Date       Collaboration_hours  zscore
#>   <chr>                    <dbl>   <dbl>
#> 1 1/12/2020                 22.6  1.04  
#> 2 1/19/2020                 23.1  1.32  
#> 3 1/26/2020                 20.3 -0.205 
#> 4 1/5/2020                  17.6 -1.66  
#> 5 12/15/2019                20.8  0.0627
#> 6 12/22/2019                20.8  0.0705
#> 7 12/29/2019                19.5 -0.628
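The function returns z-scores rather than a yes/no outlier flag, so choosing a cut-off is left to the analyst. The sketch below rebuilds the printed result as a plain data frame and filters it with a hypothetical threshold of 1.5; `res`, `threshold`, and `flagged` are illustrative names, not part of the package.

```r
# Values copied from the printed output above.
res <- data.frame(
  Date = c("1/12/2020", "1/19/2020", "1/26/2020", "1/5/2020",
           "12/15/2019", "12/22/2019", "12/29/2019"),
  Collaboration_hours = c(22.6, 23.1, 20.3, 17.6, 20.8, 20.8, 19.5),
  zscore = c(1.04, 1.32, -0.205, -1.66, 0.0627, 0.0705, -0.628)
)

# Date is a character column, so rows sort as text; parse it to order by time.
res <- res[order(as.Date(res$Date, format = "%m/%d/%Y")), ]

# Hypothetical cut-off: keep rows more than 1.5 standard deviations from the mean.
threshold <- 1.5
flagged <- res[abs(res$zscore) > threshold, ]

print(flagged)
```

With this threshold only the week of 1/5/2020 (z-score -1.66) is flagged. A stricter cut-off such as 2 or 3 would flag nothing in this short sample.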