This function takes a selected metric and uses z-scores
(the number of standard deviations from the mean) to identify
outliers across time. This is useful for identifying weeks with
abnormally low collaboration activity, e.g. holidays.
Time is the default grouping variable and can be overridden with
the group_var argument.
identify_outlier(data, group_var = "Date", metric = "Collaboration_hours")
data: A Standard Person Query dataset in the form of a data frame.
group_var: A string with the name of the grouping variable. Defaults to Date.
metric: Character string containing the name of the metric, e.g. "Collaboration_hours".
Returns a data frame with Date (if grouping variable is not set), the metric, and the corresponding z-score.
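The calculation described above can be sketched in base R. This is a minimal illustration, not the package's own code: it assumes the metric is first averaged within each group and that the z-score is then taken across those group averages. The toy data frame and the names toy and weekly are hypothetical.

```r
# Hypothetical toy data: two person-rows per week (not taken from sq_data).
toy <- data.frame(
  Date = rep(c("11/3/2019", "11/10/2019", "11/17/2019", "12/1/2019"), each = 2),
  Collaboration_hours = c(22, 24, 21, 23, 20, 22, 12, 14)
)

# Step 1 (assumed): average the metric within each group, here each week.
weekly <- aggregate(Collaboration_hours ~ Date, data = toy, FUN = mean)

# Step 2: z-score = (group value - mean of group values) / sd of group values.
weekly$zscore <- (weekly$Collaboration_hours - mean(weekly$Collaboration_hours)) /
  sd(weekly$Collaboration_hours)

weekly
```

In this toy data the week of 12/1/2019 has the lowest average and therefore the most negative z-score.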
Other Data Validation:
check_query(), extract_hr(), flag_ch_ratio(), flag_em_ratio(),
flag_extreme(), flag_outlooktime(), hr_trend(), hrvar_count_all(),
hrvar_count(), hrvar_trend(), identify_churn(),
identify_holidayweeks(), identify_inactiveweeks(), identify_nkw(),
identify_privacythreshold(), identify_query(), identify_shifts_wp(),
identify_shifts(), identify_tenure(), remove_outliers(),
standardise_pq(), subject_validate_report(), subject_validate(),
track_HR_change(), validation_report()
identify_outlier(sq_data, metric = "Collaboration_hours")
#> # A tibble: 13 × 3
#>    Date       Collaboration_hours  zscore
#>    <chr>                    <dbl>   <dbl>
#>  1 1/12/2020                 22.6  1.02
#>  2 1/19/2020                 22.8  1.09
#>  3 1/26/2020                 20.0  0.0377
#>  4 1/5/2020                  17.5 -0.932
#>  5 11/10/2019                21.9  0.766
#>  6 11/17/2019                20.2  0.112
#>  7 11/24/2019                19.6 -0.127
#>  8 11/3/2019                 22.3  0.917
#>  9 12/1/2019                 12.8 -2.69
#> 10 12/15/2019                20.3  0.143
#> 11 12/22/2019                20.5  0.214
#> 12 12/29/2019                19.4 -0.190
#> 13 12/8/2019                 19.0 -0.359
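The returned z-scores can then be filtered to pick out unusual weeks. The sketch below copies three rows from the output above and applies a cutoff of 2 standard deviations; the cutoff and the names res, threshold and outlier_weeks are assumptions for illustration, not something the function sets.

```r
# Three rows copied from the example output above.
res <- data.frame(
  Date = c("11/24/2019", "12/1/2019", "12/8/2019"),
  Collaboration_hours = c(19.6, 12.8, 19.0),
  zscore = c(-0.127, -2.69, -0.359)
)

# Assumed cutoff: flag groups more than 2 standard deviations from the mean.
threshold <- 2
outlier_weeks <- res[abs(res$zscore) > threshold, ]

outlier_weeks$Date
#> [1] "12/1/2019"
```

Because Date is stored as a character column in the output, rows are ordered as strings rather than chronologically, so filtering on zscore is more reliable than reading the rows in sequence.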