datamations constructs one or more frames for each step of a pipeline. For example, in the following pipeline:

library(datamations)
library(dplyr)

"small_salary %>% 
  group_by(Degree) %>%
  summarize(mean = mean(Salary))" %>%
  datamation_sanddance()

there are three steps:

  1. The initial data (small_salary)

    An information grid is shown, laying out the number of points in the data set.

  2. The grouped data (grouped by Degree)

    The data is separated into groups, retaining the information grid structure.

  3. The summarized data (mean of Salary)

    The distribution of Salary within the groups is shown, then the summary function (mean) is applied. Error bars are added to the mean and the final frame zooms in on the data.
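
The three steps correspond to the intermediate results of the pipeline. As a rough sketch of what each step operates on, the same pipeline can be run stage by stage with plain dplyr. The `salaries` data frame below is a hypothetical stand-in for small_salary with made-up values, and the `step_*` names are only for illustration.

```r
library(dplyr)

# Hypothetical stand-in for small_salary; the values are made up.
salaries <- data.frame(
  Degree = c("Masters", "Masters", "PhD", "PhD"),
  Salary = c(80, 90, 85, 95)
)

# Step 1: the initial data
step_1 <- salaries

# Step 2: the grouped data (same rows, now grouped by Degree)
step_2 <- step_1 %>% group_by(Degree)

# Step 3: the summarized data (one row per group)
step_3 <- step_2 %>% summarize(mean = mean(Salary))
```

Grouping does not change the number of rows, which is why the second step can keep the grid layout. Only the third step collapses each group to a single value.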

group_by() frames

datamations supports up to three grouping variables, showing one frame per variable. The placement of the variables is as follows:

  • One variable: On the x-axis
  • Two variables: The first variable in column facets, the second on the x-axis
  • Three variables: The first variable in column facets, the second in row facets, the third on the x-axis
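
As an illustration of the two-variable case, the pipeline below groups by a second column. It assumes small_salary has a categorical column named `Work`, which is not shown in the text above; substitute any other categorical column if it does not. The `two_groups` name is only for illustration.

```r
library(datamations)
library(dplyr)

# Assumption: small_salary has a second categorical column called Work.
two_groups <- "small_salary %>%
  group_by(Degree, Work) %>%
  summarize(mean = mean(Salary))"

# Following the placement rules above, Degree would go in column facets
# and Work on the x-axis.
two_groups %>%
  datamation_sanddance()
```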

summarize() frames

datamations supports summarizing one variable. The summarize() section of a pipeline will have the following frames:

  1. Distribution of the variable to be summarized
  2. Summarized variable
  3. Summarized variable with standard error (only if summary function is mean)
  4. Zoomed version of summarized variable
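
The quantities behind frames 2 and 3 can be sketched with plain dplyr. This example assumes the conventional standard error of the mean, sd / sqrt(n); how datamations computes its error bars internally is not stated here. The `salaries` data frame is hypothetical, with made-up values.

```r
library(dplyr)

# Hypothetical data; the values are made up for illustration.
salaries <- data.frame(
  Degree = c("Masters", "Masters", "Masters", "PhD", "PhD", "PhD"),
  Salary = c(80, 90, 100, 85, 95, 105)
)

summary_with_se <- salaries %>%
  group_by(Degree) %>%
  summarize(
    mean = mean(Salary),          # value shown in the summarized frame
    se = sd(Salary) / sqrt(n())   # assumed definition of the error bar
  )
```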

count() frames

datamations treats count() as equivalent to group_by() followed by summarize(n = n()). It supports up to three “grouping” variables.
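
The equivalence can be checked in plain dplyr. This is a minimal sketch on a hypothetical `salaries` data frame with made-up values, not the package's own data.

```r
library(dplyr)

# Hypothetical data; the values are made up for illustration.
salaries <- data.frame(
  Degree = c("Masters", "Masters", "PhD"),
  Salary = c(80, 90, 85)
)

# Counting directly ...
counted <- salaries %>% count(Degree)

# ... and the group_by() + summarize() form of the same operation.
grouped_counts <- salaries %>%
  group_by(Degree) %>%
  summarize(n = n())
```

Both results have one row per Degree and the same `n` column, so a count() pipeline can be read as a group_by() step followed by a summarize() step.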

filter() frames

datamations supports filter() at any point in the pipeline, whether it comes after the initial data, while the data is grouped, or after it has been summarized.
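
The three placements might look like the following. The threshold of 90 is arbitrary and only for illustration, and the `filter_*` variable names are hypothetical.

```r
library(datamations)
library(dplyr)

# filter() right after the initial data
filter_first <- "small_salary %>%
  filter(Salary > 90) %>%
  group_by(Degree) %>%
  summarize(mean = mean(Salary))"

# filter() while the data is grouped
filter_grouped <- "small_salary %>%
  group_by(Degree) %>%
  filter(Salary > 90) %>%
  summarize(mean = mean(Salary))"

# filter() after the data has been summarized
filter_summarized <- "small_salary %>%
  group_by(Degree) %>%
  summarize(mean = mean(Salary)) %>%
  filter(mean > 90)"

# Any of the three can be passed on in the same way.
filter_first %>%
  datamation_sanddance()
```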