[Experimental]

Apply hierarchical clustering to emails sent by hour of day, or to the other collaboration signals selected through the signals argument. The clustering uses cosine distance and the ward.D method of agglomeration.

workpatterns_hclust(
  data,
  k = 4,
  return = "plot",
  values = "percent",
  signals = "email",
  start_hour = "0900",
  end_hour = "1700"
)

Arguments

data

A data frame containing data from the Hourly Collaboration query.

k

Numeric value specifying k, the number of clusters to cut the tree into.

return

String specifying what to return. This must be one of the following strings:

  • "plot"

  • "data"

  • "table"

  • "plot-area"

  • "hclust"

  • "dist"

See Value for more information.

values

Character vector to specify whether to return percentages or absolute values in "data" and "plot". Valid values are:

  • "percent": percentage of signals divided by total signals (default)

  • "abs": absolute count of signals

signals

Character vector to specify which collaboration metrics to use:

  • "email" (default) for emails only

  • "IM" for Teams messages only

  • "unscheduled_calls" for Unscheduled Calls only

  • "meetings" for Meetings only

  • or a combination of signals, such as c("email", "IM")

start_hour

A character vector specifying starting hours, e.g. "0900"

end_hour

A character vector specifying ending hours, e.g. "1700"

Value

A different output is returned depending on the value passed to the return argument:

  • "plot": ggplot object of a bar plot (default)

  • "data": data frame containing raw data with the clusters

  • "table": data frame containing a summary table. Percentages of signals are shown, e.g. x% of signals are sent by y hour of the day.

  • "plot-area": ggplot object of an overlapping area plot

  • "hclust": hclust object for the hierarchical model

  • "dist": distance matrix used to build the clustering model
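Because the "hclust" option returns an hclust object, base R tools such as stats::cutree() should work on it, for instance to compare several values of k without refitting the model. The sketch below illustrates this on a stand-in object: toy and toy_fit are hypothetical names, the data are random, and Euclidean distance is used only to keep the example short. It does not call workpatterns_hclust() itself.

```r
# Stand-in for the object returned by return = "hclust".
# The data are random and purely illustrative.
set.seed(1)
toy <- matrix(runif(40), nrow = 10)

# Euclidean distance is an assumption made for brevity here;
# the function documented on this page uses cosine distance.
toy_fit <- stats::hclust(stats::dist(toy), method = "ward.D")

# Cut the same tree at several values of k in one call.
# The result is a matrix with one row per observation and one column per k.
cuts <- stats::cutree(toy_fit, k = 2:4)
```

Each column of cuts holds the cluster membership for one value of k, which makes it easy to see how groups split as k grows.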

Details

The hierarchical clustering is applied at the person-average volume-based (pav) level. In other words, the clustering is applied to a dataset where the collaboration hours are averaged by person and expressed as a percentage of total daily collaboration.
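The steps described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's own implementation: the counts matrix, its column names, and the hand-rolled cosine distance are all hypothetical, and only functions from the stats package are used.

```r
# Toy input: one row per person, one column per hour, values are signal counts.
# These numbers are made up for illustration only.
counts <- matrix(
  c(8, 6, 1, 0,
    7, 5, 2, 0,
    0, 1, 6, 9,
    1, 0, 5, 8,
    4, 4, 4, 4),
  ncol = 4, byrow = TRUE,
  dimnames = list(paste0("person_", 1:5),
                  c("h09_10", "h10_11", "h17_18", "h18_19"))
)

# Express each person's hourly volume as a share of their own total.
shares <- counts / rowSums(counts)

# Cosine distance: 1 minus the cosine similarity between row vectors.
norms <- sqrt(rowSums(shares^2))
cos_sim <- (shares %*% t(shares)) / (norms %o% norms)
cos_dist <- stats::as.dist(1 - cos_sim)

# Agglomerative clustering with ward.D, then cut the tree into k groups.
fit <- stats::hclust(cos_dist, method = "ward.D")
clusters <- stats::cutree(fit, k = 2)
```

In this toy data the two morning-heavy people and the two evening-heavy people end up in separate clusters, which is the kind of pattern the clustering is meant to surface.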

Examples

# Run clusters, returning plot
workpatterns_hclust(em_data, k = 5, return = "plot")


# Run clusters, return raw data
workpatterns_hclust(em_data, k = 4, return = "data") %>% head()
#>                                                           PersonId
#> 1 0248F8070D89F3CCDCA7274006516ECC173437A839531652A0BF1F41087A224E
#> 2 024BDDF8FB7C74EBC74C2A97AF92B3285E2DB672A62361E08B6D1892BE9BACCE
#> 3 0250022146CE0A7AC68A3B0C9DFE91E29234ECBB8609F752CC2EE1C3ADBEFF90
#> 4 0358DEF2813F6911B73AEE141837040511768BA66E3B0EA5BF3347C93D319213
#> 5 041A56AAD54F0690E37F0A35AFF71AD59DD46634D15B192EB9BAC36E70132EEC
#> 6 0421E39EEB90A61A3FC504FC8DF33B15A47AD5228713C9AFBA8DB575C654C145
#>   Emails_sent_00_01 Emails_sent_01_02 Emails_sent_02_03 Emails_sent_03_04
#> 1       0.007575758       0.003787879       0.011363636        0.01136364
#> 2       0.000000000       0.000000000       0.000000000        0.00000000
#> 3       0.000000000       0.000000000       0.000000000        0.00000000
#> 4       0.000000000       0.000000000       0.000000000        0.00000000
#> 5       0.031620553       0.027667984       0.003952569        0.01976285
#> 6       0.037267081       0.049689441       0.049689441        0.03105590
#>   Emails_sent_04_05 Emails_sent_05_06 Emails_sent_06_07 Emails_sent_07_08
#> 1       0.003787879       0.018939394       0.003787879       0.011363636
#> 2       0.000000000       0.000000000       0.000000000       0.034246575
#> 3       0.000000000       0.000000000       0.000000000       0.009803922
#> 4       0.000000000       0.000000000       0.000000000       0.020618557
#> 5       0.015810277       0.007905138       0.015810277       0.023715415
#> 6       0.018633540       0.012422360       0.037267081       0.037267081
#>   Emails_sent_08_09 Emails_sent_09_10 Emails_sent_10_11 Emails_sent_11_12
#> 1        0.08333333        0.06060606        0.10227273        0.07954545
#> 2        0.08219178        0.10273973        0.08219178        0.11643836
#> 3        0.11764706        0.08823529        0.09803922        0.10294118
#> 4        0.10309278        0.08247423        0.09793814        0.10824742
#> 5        0.04347826        0.03557312        0.05533597        0.11462451
#> 6        0.08074534        0.05590062        0.09316770        0.11801242
#>   Emails_sent_12_13 Emails_sent_13_14 Emails_sent_14_15 Emails_sent_15_16
#> 1        0.10984848        0.09469697        0.09090909        0.11363636
#> 2        0.11643836        0.06164384        0.11643836        0.11643836
#> 3        0.12254902        0.07352941        0.10784314        0.09803922
#> 4        0.11855670        0.08247423        0.04639175        0.12371134
#> 5        0.11067194        0.08695652        0.09090909        0.10671937
#> 6        0.08695652        0.05590062        0.06832298        0.05590062
#>   Emails_sent_16_17 Emails_sent_17_18 Emails_sent_18_19 Emails_sent_19_20
#> 1        0.12500000        0.06818182                 0                 0
#> 2        0.11643836        0.05479452                 0                 0
#> 3        0.10784314        0.07352941                 0                 0
#> 4        0.08762887        0.12886598                 0                 0
#> 5        0.07509881        0.12648221                 0                 0
#> 6        0.08074534        0.02484472                 0                 0
#>   Emails_sent_20_21 Emails_sent_21_22 Emails_sent_22_23 Emails_sent_23_24
#> 1                 0                 0                 0       0.000000000
#> 2                 0                 0                 0       0.000000000
#> 3                 0                 0                 0       0.000000000
#> 4                 0                 0                 0       0.000000000
#> 5                 0                 0                 0       0.007905138
#> 6                 0                 0                 0       0.006211180
#>   cluster
#> 1       1
#> 2       2
#> 3       2
#> 4       3
#> 5       1
#> 6       4


# Run clusters for instant messages only, return hclust object
workpatterns_hclust(em_data, k = 4, return = "hclust", signals = c("IM"))
#> 
#> Call:
#> stats::hclust(d = ., method = "ward.D")
#> 
#> Cluster method   : ward.D 
#> Distance         : cosine 
#> Number of objects: 500 
#>