Count top words in subject lines grouped by a custom attribute

This function generates a matrix of the most frequently occurring words in meeting subject lines, grouped by a specified attribute such as an organisational attribute, day of the week, or hour of the day.

subject_scan(
  data,
  hrvar,
  mode = NULL,
  top_n = 10,
  token = "words",
  return = "plot",
  weight = NULL,
  stopwords = NULL,
  ...
)

tm_scan(
  data,
  hrvar,
  mode = NULL,
  top_n = 10,
  token = "words",
  return = "plot",
  weight = NULL,
  stopwords = NULL,
  ...
)

Arguments

data

A Meeting Query dataset in the form of a data frame.

hrvar

String containing the name of the HR Variable by which to split metrics. Note that the prefix 'Organizer_' or equivalent will be required.

mode

String specifying what variable to use for grouping subject words. Valid values include:

"hours"
"days"
NULL (defaults to hrvar)

When the value passed to mode is not NULL, the value passed to hrvar is discarded and overwritten by the setting specified in mode.
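
As a rough illustration of what grouping by "hours" or "days" involves, the base R sketch below derives an hour-of-day and a day-of-week key from a meeting start time. The `StartTime` column and its values are hypothetical, and this page does not say which column the function actually reads, so treat this as a sketch of the idea only.

```r
# Toy meeting data; the StartTime column is a hypothetical name.
meetings <- data.frame(
  Subject   = c("weekly update", "project review"),
  StartTime = as.POSIXct(
    c("2023-05-01 09:30:00", "2023-05-02 14:00:00"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# Break the timestamps into components.
parts <- as.POSIXlt(meetings$StartTime)

# Hour of the day (0-23) and day of the week (0 = Sunday, ..., 6 = Saturday).
meetings$hour <- parts$hour
meetings$day  <- parts$wday

print(meetings[, c("Subject", "hour", "day")])
```

Either derived column could then play the role that hrvar plays in the default mode.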

top_n

Numeric value specifying the top number of words to show.

token

A character vector accepting either "words" or "ngrams", which determines the type of tokenisation to return.

return

String specifying what to return. This must be one of the following strings:

"plot"
"table"
"data"

See Value for more information.

weight

String specifying the column name of a numeric variable for weighting data, such as "Invitees". The column must contain positive integers. Defaults to NULL, where no weighting is applied.
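
One common way to apply an integer weight to text data is to repeat each row as many times as its weight before counting words, which would be consistent with the positive-integer requirement. The base R sketch below illustrates that idea on toy data. The `Subject` and `Invitees` values are made up, and this is an assumption about the general technique, not a description of how `subject_scan()` is implemented.

```r
# Toy meeting data; values are hypothetical.
meetings <- data.frame(
  Subject  = c("weekly update", "project review"),
  Invitees = c(3L, 1L),
  stringsAsFactors = FALSE
)

# Repeat each row index according to its weight, then subset.
row_index <- rep(seq_len(nrow(meetings)), times = meetings$Invitees)
weighted  <- meetings[row_index, ]

# Count words on the expanded data, so that words from the
# three-invitee meeting are counted three times.
words       <- unlist(strsplit(weighted$Subject, " ", fixed = TRUE))
word_counts <- table(words)

print(word_counts)
```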

stopwords

A character vector OR a single-column data frame labelled 'word' containing custom stopwords to remove.
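
The two accepted shapes can be built as shown in this minimal sketch. The words are arbitrary examples, and the code only constructs the inputs; either object would then be supplied as the `stopwords` argument.

```r
# Option 1: a plain character vector of words to remove.
stop_vec <- c("weekly", "update", "meeting")

# Option 2: a single-column data frame whose column is named 'word'.
stop_df <- data.frame(word = stop_vec, stringsAsFactors = FALSE)

# Both forms carry the same set of words.
same_words <- identical(stop_df$word, stop_vec)
print(same_words)
```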

...

Additional parameters to pass to tm_clean().

Value

A different output is returned depending on the value passed to the return argument:

"plot": 'ggplot' object. A heatmapped grid.
"table": data frame. A summary table for the metric.
"data": data frame.

Examples

# return a heatmap table for words
mt_data %>% subject_scan(hrvar = "Organizer_Organization")


# return a heatmap table for ngrams
mt_data %>%
  subject_scan(
    hrvar = "Organizer_Organization",
    token = "ngrams",
    n = 2)


# return raw table format
mt_data %>% subject_scan(hrvar = "Organizer_Organization", return = "table")
#> # A tibble: 10 × 16
#>    Biz D…¹ CEO   Custo…² Facil…³ Finan…⁴ Finan…⁵ Finan…⁶ Finan…⁷ Finan…⁸ G&A C…⁹
#>    <chr>   <chr> <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>  
#>  1 update  annu… updated update  weekly  weekly  weekly  volome… weekly  volome…
#>  2 project board update  updated updated transi… meeting weekly  interv… update 
#>  3 review  disc… meeting interv… meeting visit   updated update  updated discus…
#>  4 weekly  netw… interv… team    interv… review  volome… interv… messag… enterp…
#>  5 meeting nick  plan    volo    volome… update  plan    project staff   report 
#>  6 plan    plan  volome… weekly  project updates test    confer… update  updated
#>  7 status  pred… todd    recurr… update  interv… update  discus… volome… weekly 
#>  8 updated repo… transi… review  review  lunch   visit   product review  product
#>  9 visit   spar… visit   sales   traini… messag… volo    traini… direct  board  
#> 10 volome… stra… apple   testing chris   service market… visit   extrac… confer…
#> # … with 6 more variables: `G&A East` <chr>, `G&A South` <chr>,
#> #   `Human Resources` <chr>, `IT-Corporate` <chr>, `IT-East` <chr>,
#> #   `Inventory Management` <chr>, and abbreviated variable names ¹`Biz Dev`,
#> #   ²`Customer Service`, ³Facilities, ⁴`Finance-Corporate`, ⁵`Finance-East`,
#> #   ⁶`Finance-South`, ⁷`Finance-West`, ⁸`Financial Planning`, ⁹`G&A Central`

# grouped by hours
mt_data %>% subject_scan(mode = "hours")


# grouped by days
mt_data %>% subject_scan(mode = "days")
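
To make the shape of the "table" output above more concrete, the following base R sketch builds a small top-words matrix by hand: one column per group, one row per rank. The toy data, the simple regular-expression tokeniser, and the tie-breaking behaviour are all assumptions for illustration; the package's own cleaning and tokenisation steps will differ.

```r
# Toy data; values are hypothetical.
meetings <- data.frame(
  Organizer_Organization = c("Finance", "Finance", "Sales", "Sales"),
  Subject = c("weekly update", "weekly review", "sales plan", "sales visit"),
  stringsAsFactors = FALSE
)

top_n <- 2

# For each group: split subjects into lower-case words, count them,
# and keep the names of the top_n most frequent words.
top_words <- sapply(
  split(meetings$Subject, meetings$Organizer_Organization),
  function(subjects) {
    words  <- unlist(strsplit(tolower(subjects), "[^a-z]+"))
    words  <- words[nzchar(words)]
    counts <- sort(table(words), decreasing = TRUE)
    names(counts)[seq_len(top_n)]
  }
)

# Character matrix: columns are groups, rows are ranks.
print(top_words)
```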