details.Rmd
datamations constructs one or more frames for each step of a pipeline. For example, in the following pipeline:
library(datamations)
library(dplyr)
"small_salary %>%
  group_by(Degree) %>%
  summarize(mean = mean(Salary))" %>%
  datamation_sanddance()
#> Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
#> dplyr 1.1.0.
#> ℹ Please use `reframe()` instead.
#> ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
#> always returns an ungrouped data frame and adjust accordingly.
#> ℹ The deprecated feature was likely used in the datamations package.
#> Please report the issue to the authors.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
there are three steps:
1. The initial data (small_salary)
   An information grid is shown, laying out the number of points in the data set.
2. The grouped data (grouped by Degree)
   The data is separated into groups, retaining the information grid structure.
3. The summarized data (mean of Salary)
   The distribution of Salary within the groups is shown, then the summary function (mean) is applied. Error bars are added to the mean, and the final frame zooms in on the data.
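The data behind each of these three steps can be sketched with plain dplyr. This is only an illustration of what each stage of the pipeline holds, not of how datamations builds its frames internally; the `salaries` data frame is a hypothetical stand-in for small_salary with invented values.

```r
library(dplyr)

# Hypothetical stand-in for small_salary; the values are invented for illustration
salaries <- data.frame(
  Degree = c("Masters", "Masters", "PhD", "PhD"),
  Salary = c(80, 90, 85, 95)
)

# Step 1: the initial data, one point per row
n_points <- nrow(salaries)

# Step 2: the grouped data, one group per value of Degree
grouped <- salaries %>% group_by(Degree)
group_sizes <- group_size(grouped)

# Step 3: the summarized data, one mean per group
summarized <- grouped %>% summarize(mean = mean(Salary))
```

Inspecting `n_points`, `group_sizes`, and `summarized` in turn mirrors the progression from the information grid, to the separated groups, to the per-group means.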
group_by() frames
datamations supports up to three grouping variables, showing one frame per variable. The placement of the variables is as follows:
summarize() frames
datamations supports summarizing one variable. The summarize() section of a pipeline will have the following frames:
count() frames
datamations treats count() equivalently to group_by() + summarize(n = n()). It supports up to three “grouping” variables.
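The equivalence can be checked directly in dplyr. The sketch below uses a hypothetical `salaries` data frame with invented values rather than the package's own data, and compares the two forms after converting both to plain data frames.

```r
library(dplyr)

# Hypothetical data; the values are invented for illustration
salaries <- data.frame(
  Degree = c("Masters", "Masters", "PhD"),
  Salary = c(80, 90, 85)
)

# Short form: count the rows for each value of Degree
counted <- salaries %>% count(Degree)

# Long form: group, then summarize with n()
grouped_n <- salaries %>%
  group_by(Degree) %>%
  summarize(n = n())

# Compare as plain data frames, since summarize() returns a tibble
same <- isTRUE(all.equal(as.data.frame(counted), as.data.frame(grouped_n)))
```

Both forms yield one row per Degree with the count in a column named n, which is why a count() pipeline can reuse the group_by() and summarize() frames.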
filter() frames
datamations supports filter() at any point in the pipeline, whether it comes after the initial data, while the data is grouped, or after it has been summarized.
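The three placements can be sketched as ordinary dplyr pipelines. The `salaries` data frame and the thresholds below are hypothetical and chosen only for illustration; to animate any of these, the pipeline would be passed as a string to datamation_sanddance(), as in the example at the top of this page.

```r
library(dplyr)

# Hypothetical stand-in for small_salary; the values are invented for illustration
salaries <- data.frame(
  Degree = c("Masters", "Masters", "PhD", "PhD"),
  Salary = c(80, 90, 85, 95)
)

# filter() directly after the initial data
after_initial <- salaries %>%
  filter(Salary > 82) %>%
  group_by(Degree) %>%
  summarize(mean = mean(Salary))

# filter() while the data is grouped
while_grouped <- salaries %>%
  group_by(Degree) %>%
  filter(Salary > 82) %>%
  summarize(mean = mean(Salary))

# filter() after the data has been summarized
after_summary <- salaries %>%
  group_by(Degree) %>%
  summarize(mean = mean(Salary)) %>%
  filter(mean > 86)
```

The first two variants drop individual rows before the mean is computed, while the third drops whole groups based on the computed mean.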