datamations mutation API

{datamations} supports a limited form of mutation in a pipeline. It can show a single mutation, which may involve multiple variables, in a scatterplot or grid layout.

We first add two new variables to the data for use in the example mutations.

New data with added variables

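library(dplyr)
library(datamations)

# Add two randomly generated variables to small_salary
small_salary <- dplyr::mutate(
  small_salary,
  # Uniformly distributed supplemental income between 60 and 110
  supplementalIncome = runif(nrow(small_salary), min = 60, max = 110),
  # Log-normal values (standard normal on the natural-log scale)
  logNorm = rlnorm(nrow(small_salary), meanlog = 0, sdlog = 1)
)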
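
Because runif() and rlnorm() draw random values, the new columns differ from run to run. The following is a minimal sketch, assuming reproducible examples are wanted, of fixing the seed before the draws. The seed value 123 is arbitrary, and n is a hypothetical stand-in for nrow(small_salary).

```r
# Fix the random seed so that repeated runs give the same draws
n <- 100  # hypothetical row count, standing in for nrow(small_salary)

set.seed(123)  # arbitrary seed value
first_draw <- runif(n, min = 60, max = 110)

set.seed(123)  # resetting the seed reproduces the same sequence
second_draw <- runif(n, min = 60, max = 110)

identical(first_draw, second_draw)
#> [1] TRUE
```

Calling set.seed() once before the dplyr::mutate() step above would have the same effect on supplementalIncome and logNorm.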

{datamations} can visualize mutations to help one understand mathematical distributions, scales, and relationships.

Log-normal mutation


"small_salary %>%
  mutate(logged = log10(logNorm)) %>%
  group_by(Degree) %>%
  summarize(mean = mean(logged))" %>%
  datamation_sanddance()
#> Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
#> dplyr 1.1.0.
#> ℹ Please use `reframe()` instead.
#> ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
#>   always returns an ungrouped data frame and adjust accordingly.
#> ℹ The deprecated feature was likely used in the datamations package.
#>   Please report the issue to the authors.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
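
The log10() mutation above is a useful illustration because the logarithm of a log-normal variable is normally distributed. The following small sketch in base R uses simulated draws rather than small_salary. The sample size and seed are arbitrary assumptions.

```r
set.seed(42)  # arbitrary seed for reproducibility

# Same distribution parameters as the logNorm variable defined earlier
draws <- rlnorm(10000, meanlog = 0, sdlog = 1)

# The same transformation as in the mutate() step
logged <- log10(draws)

# On the natural-log scale the values are standard normal, so on the
# log10 scale the mean should be near 0 and the standard deviation
# near 1 / log(10), which is roughly 0.434.
mean(logged)
sd(logged)
1 / log(10)
```

A histogram of logged, for example via hist(logged), should look roughly bell-shaped, whereas the raw draws are strongly right-skewed.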

Mathematical notation


"small_salary %>%
  mutate(salarySquared = Salary^2) %>%
  group_by(Degree) %>%
  summarize(mean = mean(salarySquared))" %>%
  datamation_sanddance()

"small_salary %>%
  mutate(inverseSalary = 1 / Salary) %>%
  group_by(Degree) %>%
  summarize(mean = mean(inverseSalary))" %>%
  datamation_sanddance()

Multivariate mutates

{datamations} can also showcase the relationship between more than one variable in your data pipelines. Below, we combine the Salary variable with a new variable in a mutation, and then use the result in grouping, filtering, and summarization.


"small_salary %>%
  mutate(totalIncome = Salary + supplementalIncome) %>%
  group_by(Degree) %>%
  summarize(mean = mean(totalIncome))" %>%
  datamation_sanddance()

"small_salary %>%
  mutate(incomePer = Salary / supplementalIncome) %>%
  group_by(Degree) %>%
  summarize(mean = mean(incomePer))" %>%
  datamation_sanddance()

Two variable mutates

{datamations} currently visualizes only a single mutation per pipeline. A mutate statement that defines two variables, as in the example below, is not supported and results in an error.

"small_salary %>%
  mutate(totalIncome = Salary + supplementalIncome, squaredIncome = Salary^2) %>%
  group_by(Degree) %>%
  summarize(mean = mean(totalIncome))" %>%
  datamation_sanddance()
#> Error in generate_mapping(data_states, tidy_function_args, plot_mapping): Datamations currently only supports a single mutation call for visualization. Edit your pipeline to only include a single mutation necessary for the visualization.
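
One way to work around this restriction is to compute any extra variable ahead of time, as was done for supplementalIncome at the start of this article, so that the pipeline to be visualized holds a single mutation. The sketch below uses plain dplyr on a hypothetical toy data frame rather than {datamations} itself. The toy_salary data and its values are invented for illustration.

```r
library(dplyr)

# Hypothetical stand-in for small_salary with the columns used above
toy_salary <- data.frame(
  Degree = c("Masters", "Masters", "PhD", "PhD"),
  Salary = c(80, 90, 85, 95),
  supplementalIncome = c(60, 70, 100, 110)
)

# Compute the second variable before the pipeline that will be visualized
toy_salary <- mutate(toy_salary, squaredIncome = Salary^2)

# The pipeline itself now contains only a single mutation
single_mutation <- toy_salary %>%
  mutate(totalIncome = Salary + supplementalIncome) %>%
  group_by(Degree) %>%
  summarize(mean = mean(totalIncome))

single_mutation
```

Passing the same single-mutation pipeline as a string to datamation_sanddance() should then avoid the error, although that has not been verified here.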