This function processes the Subject column in a Meeting Query by applying tokenisation using tidytext::unnest_tokens(), and removing any stopwords supplied in a data frame (using the argument stopwords). This is a sub-function that feeds into tm_freq(), tm_cooc(), and tm_wordcloud().
The default is to return a data frame with tokenised counts of words or
ngrams.
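As a rough illustration of the two steps described above, the sketch below tokenises a toy Subject column and then removes custom stopwords with dplyr::anti_join(). It is not the function's actual source: the meetings data frame and custom_stopwords are made-up stand-ins, and only the call to tidytext::unnest_tokens() is taken from the description.

```r
library(dplyr)
library(tidytext)

# Toy stand-in for a Meeting Query (hypothetical data); only Subject is used.
meetings <- data.frame(
  Subject = c("Planning for the core team", "Agile officer sync"),
  stringsAsFactors = FALSE
)

# Hypothetical custom stopwords: a single-column data frame labelled 'word'.
custom_stopwords <- data.frame(
  word = c("for", "the"),
  stringsAsFactors = FALSE
)

tokens <- meetings %>%
  mutate(line = row_number()) %>%            # one id per subject line
  select(line, Subject) %>%
  unnest_tokens(output = word,               # one row per lower-cased word
                input = Subject,
                token = "words") %>%
  anti_join(custom_stopwords, by = "word")   # drop rows matching a stopword

tokens
```

The result has the same two-column shape (line, word) as the output of tm_clean(), with the stopword rows removed.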
tm_clean(data, token = "words", stopwords = NULL, ...)
data: A Meeting Query dataset in the form of a data frame.
token: A character vector accepting either "words" or "ngrams", determining the type of tokenisation to return.
stopwords: A character vector OR a single-column data frame labelled 'word' containing custom stopwords to remove.
...: Additional parameters to pass to tidytext::unnest_tokens().
A data frame with two columns: line and word.
Other Text-mining: meeting_tm_report(), pairwise_count(), subject_validate_report(), subject_validate(), tm_cooc(), tm_freq(), tm_wordcloud()
# words
tm_clean(mt_data)
#> # A tibble: 6,520 × 2
#>     line word
#>    <int> <chr>
#>  1     1 planning
#>  2     1 core
#>  3     2 agile
#>  4     2 officer
#>  5     3 setup
#>  6     3 performance
#>  7     3 ryan
#>  8     3 friday
#>  9     3 consumer
#> 10     4 volometrix
#> # … with 6,510 more rows
# ngrams
tm_clean(mt_data, token = "ngrams")
#> # A tibble: 7,688 × 2
#>     line word
#>    <int> <chr>
#>  1     1 planning will and
#>  2     1 will and core
#>  3     1 and core r
#>  4     1 core r d
#>  5     1 r d from
#>  6     2 the agile officer
#>  7     3 setup performance and
#>  8     3 performance and the
#>  9     3 and the ryan
#> 10     3 the ryan for
#> # … with 7,678 more rows
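Because additional arguments are passed on to tidytext::unnest_tokens(), the n-gram size can presumably be changed in the same way. The sketch below calls unnest_tokens() directly on a made-up data frame and uses its n argument to request bigrams instead of the three-word n-grams shown above. Whether tm_clean(mt_data, token = "ngrams", n = 2) behaves identically is an assumption that has not been checked here.

```r
library(tidytext)

# Toy stand-in for tokenisation input (hypothetical data).
meetings <- data.frame(
  line = 1:2,
  Subject = c("Planning for the core team", "Agile officer sync"),
  stringsAsFactors = FALSE
)

# n = 2 is forwarded to the n-gram tokeniser, giving two-word tokens.
bigrams <- unnest_tokens(meetings,
                         output = word,
                         input = Subject,
                         token = "ngrams",
                         n = 2)

bigrams$word
```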