This function processes the Subject
column in a Meeting Query by applying
tokenisation usingtidytext::unnest_tokens()
, and removing any stopwords
supplied in a data frame (using the argument stopwords
). This is a
sub-function that feeds into tm_freq()
, tm_cooc()
, and tm_wordcloud()
.
The default is to return a data frame with tokenised counts of words or
ngrams.
Arguments
- data
A Meeting Query dataset in the form of a data frame.
- token
A character vector accepting either
"words"
or"ngrams"
, determining type of tokenisation to return.- stopwords
A character vector OR a single-column data frame labelled
'word'
containing custom stopwords to remove.- ...
Additional parameters to pass to
tidytext::unnest_tokens()
.
See also
Other Text-mining:
meeting_tm_report()
,
pairwise_count()
,
tm_cooc()
,
tm_freq()
,
tm_wordcloud()
Examples
# words
tm_clean(mt_data)
#> # A tibble: 1,039 × 2
#> line word
#> <int> <chr>
#> 1 1 focus
#> 2 1 time
#> 3 2 focus
#> 4 2 time
#> 5 3 focus
#> 6 3 time
#> 7 4 focus
#> 8 4 time
#> 9 5 focus
#> 10 5 time
#> # ℹ 1,029 more rows
# ngrams
tm_clean(mt_data, token = "ngrams")
#> # A tibble: 692 × 2
#> line word
#> <int> <chr>
#> 1 1 NA
#> 2 2 NA
#> 3 3 NA
#> 4 4 NA
#> 5 5 NA
#> 6 6 NA
#> 7 7 NA
#> 8 8 NA
#> 9 9 NA
#> 10 10 NA
#> # ℹ 682 more rows