R is an open-source statistical programming language and one of the most popular toolkits for data analysis and data science. There are four key reasons why a Workplace Analytics analyst might choose to use R instead of other alternatives:
install.packages()
. Why is that?Since 6th April 2021, install.packages()
should work with wpa as it is made available on CRAN on that date. If you continue to experience issues, please create an issue at https://github.com/microsoft/wpa/issues/, and install with the development version in the mean time:
remotes::install_github(repo = "microsoft/wpa", upgrade = "never")
If the above does not work, please also try:
remotes::install_git(url = "https://github.com/microsoft/wpa.git", upgrade = "never")
For more information regarding installation, please see our Getting Started page.
You may see the following message during installation:
WARNING: Rtools is required to build R packages, but is not currently installed.
This message appears when R is trying to install or update packages that require compilation from binaries, which requires the installation of Rtools. To get around this, we recommend selecting No
when prompted the question Do you want to install from sources the packages which needs compilation?
when updated the packages. Alternatively, you can also choose to install Rtools.
The reason why you may be prompted the install from sources question is because one of the packages that wpa is dependent on has updated recently on CRAN, but the binary is not available for your operating system (yet). If you choose No
, you will not get the most recent version, which in most cases will not be a problem. If you choose Yes
, the package will be built from source locally, and you will need Rtools to do that. 1
The most common cause for package installation failures is when users try to install the package when some of the dependent packages are loaded. As a best practice, you should always install or update packages in a fresh R Session (no packages loaded and a clear workspace). In RStudio, you can refresh your session with the shortcut Ctrl + Shift + F10
. You can try the installation command again in a new R Session.
Make sure you follow the recommend installation steps listed on our Getting Started page. If installation problems persist, please create an issue at https://github.com/microsoft/wpa/issues/ and describe the error message that you see on your console.
You are recommended to use the latest stable version of R. You can find the latest version on https://www.r-project.org/.
If you wish to install a version of the package from any branch other than the main branch, you can run the following code:
devtools::install_git(url = "https://github.com/microsoft/wpa.git",
branch = "<BRANCH-NAME>", # Replace
build_vignettes = TRUE)
You should replace the <BRANCH-NAME>
with the name of your target branch, such as "feature/network-functions"
.
No, you should not. You should never install the default directory for R to OneDrive, as R is a program and you are likely to experience significant cloud syncing issues if you do so. If you use Windows and have experienced this issue, please uninstall and follow the below steps for re-installation:
C:\Program Files\R\R-4.X.X\library\base\R\Rprofile
C:
drive instead to prevent this.File Explorer
-> C:
drive -> Create New Folder
-> Name it "R"
C:\Program Files\R\R-4.X.X\library\base\R\Rprofile
C:\Program Files\R\R-4.X.X\library\base\
Security
tab.Edit
button.Permissions for users
pane, click on the Full Control
checkbox under Allow
.Rprofile
file:C:\Program Files\R\R-4.X.X\library\base\R\
Rprofile
file with your preferred text editor."RProfileBackup.txt"
.Rprofile
file with your preferred text editor.Sys.getenv("R_USER")
with the string "C:/R"
Sys.getenv("R_USER")
and NOT the string Sys.getenv("R_LIBS_USER")
"C:/R"
should align with the directory where you created the folder in step 3aRProfileBackup.txt
as a backup if OneDrive uploads the package files after installationYou should then be able to install the wpa library by opening RStudio and running the following code:
# Check if remotes is installed, if not then install it
if(!"remotes" %in% installed.packages()){
install.packages("remotes")
}
remotes::install_github(repo = "microsoft/wpa", upgrade = "never")
You can then restart RStudio with Ctrl
+ Shift
+ F10
in Windows.
Currently, you can use the following flexible queries with the R package: - Standard Person Query (includes Ways of Working Assessment Query) - Standard Meeting Query - Hourly Collaboration Query - Group-to-Group Query
You can explore functions in wpa with demo datasets.
The export()
function allows you to export the outputs of your analysis to Excel in two ways.
1. By setting method
to "clipboard"
and passing a data frame through to export()
, the results will be copied to your clipboard and you can paste it through to Excel. 2. By setting method
to "csv"
, a CSV file will be saved in the specified path relative to your current working directory. The CSV file can then be opened in Excel.
If you would like to export a list of data frames to Excel where (i) each data frame in the list corresponds to a Worksheet and (ii) the name of each list member corresponds to the Sheet name, you can use writexl::write_xlsx()
. wpa does not depend on functions from writexl, so you may need to load and install it separately.
We recommend using dplyr (which is loaded in as part of tidyverse) for data manipulation. To filter down to a specific date range in a Person Query, you can run:
library(wpa)
library(tidyverse) # Or load dplyr
# Read Standard Person Query from local directory
# Assign it to `raw_spq`
raw_spq <- import_wpa("data/standard person query.csv")
# Assign filtered data frame to `clean_spq`
clean_spq <-
raw_spq %>%
filter(Date >= as.Date("08/30/2020", "%m/%d/%Y")) %>%
filter(Date <= as.Date("12/19/2020", "%m/%d/%Y"))
The above example filters the date range from the week commencing 30th of August 2020 to the week ending 19th of November 2020 inclusive. Note that the first date is a Sunday (beginning of the week) and the second date is a Saturday (end of the week). If you query is run by day, you should specify the after filter as the exact last day, rather than the Saturday of the week.
In some scenarios, you may also want to exclude a particular week from the data. You can use a similar approach:
The above line of code excludes the week of 22nd of November 2020, using the operator !=
to denote not equal to. Conversely, you can isolate that single week by replacing !=
with ==
.
The most common way to create a ‘custom HR variable’ is to create a categorical variable from one or more numeric variables. You may want to do this if you are trying to bin a numeric variable or create a custom rule-based segmentation with Workplace Analytics metrics.
Here is an example of how to create a categorical variable (N_DirectReports_NET
) with a numeric variable representing the number of direct reports, using dplyr::mutate()
and dplyr::case_when()
.
library(wpa)
library(tidyverse) # Or load dplyr
clean_spq_with_new_var <-
clean_spq %>% # Standard Person Query
mutate(N_DirectReports_NET =
case_when(NumberofDirectReports == 0 ~ "0",
NumberofDirectReports == 1 ~ "1",
NumberofDirectReports <= 5 ~ "2 to 5",
NumberofDirectReports <= 10 ~ "6 to 10",
NumberofDirectReports <= 20 ~ "Up to 20",
NumberofDirectReports >= 21 ~ "21 +",
TRUE ~ "Not classified"))
dplyr::mutate()
creates a new column, whereas dplyr::case_when()
runs an if-else operation to classify numeric ranges to the right hand side of the ~
symbol. When the expression on the left hand side evaluates to TRUE
, the value on the right hand side is assigned to the new column. At the end of the code, you will see that anything that doesn’t get classified gets an ‘error handling’ value. If a value ends up as “Not classified”, you should check whether there may be gaps in your dplyr::case_when()
chunk that is not capturing all the values.
Once you have created this new variable and checked that the classifications are correct, you can further your analysis by using it as an HR attribute, such as:
clean_spq_with_new_var %>%
keymetrics_scan(hrvar = "N_DirectReports_NET")
With a few exceptions, most plot outputs returned from wpa functions are ggplot
objects. What this means is that you can edit or add layers to the outputs. See here for an example of how you can make customizations.
Sometimes you may wish to create an equivalent bar chart of email_sum()
, but for a custom metric or another Workplace Analytics metric that does not form part of the *_sum()
family. This is where flexible functions are helpful, where you can use create_bar()
to produce the same visualization by supplying the custom metric name to the metric
argument. For instance, a bar chart for "Generate_workload_email_hours"
could be run with:
create_bar(sq_data, metric = "Generate_workload_email_hours")
The same person-average computation is used and the same minimum group threshold argument is also available with create_bar()
. The same is equivalent for the other visualizations:
email_line()
–> create_line()
email_trend()
–> create_trend()
email_dist()
–> create_dist()
email_fizz()
–> create_fizz()
email_rank()
–> create_rank()
Some functions also act as wrappers around ggplot2 code where the data is directly plotted without additional person average computation. Examples include create_bar_asis()
and create_line_asis()
.
For analysis inspirations, we recommend having a look through our R playbook identifying useful analyses or functions for exploring a particular area of Workplace Analytics.
If you have an idea for an analysis and would like this to be documented in the Playbook, you are very welcome to create an issue at https://github.com/microsoft/wpa/issues/ where we can add your idea and we will credit you as a contributor. You can also use the issue to invite discussion on your analysis idea, although we ask to please keep the discussion relevant to analysis via the R package.
Some functions, such as create_bar()
, offer an argument (rank
) for ordering the categorical variable, i.e. the HR attribute when creating the visualisation. Other functions by default perform no ordering, so they show up in the same order as the data is presented.
For instance, the following code would yield a stacked bar plot with no ordering applied:
library(wpa)
library(tidyverse)
#### NO REORDERING ####
sq_data %>%
create_dist(hrvar = "LevelDesignation",
metric = "Email_hours")
To apply an ordering manually, the best way is to convert the HR attribute into a factor variable, where you can specify the levels. You can understand levels as a way to specify the order in which the values of your HR attribute should be ranked. The following code will yield an ordered stacked bar plot:
#### ORDERING APPLIED ####
ld_order <-
c( # Order levels
"Executive",
"Director",
"Manager",
"Senior IC",
"Junior IC",
"Support") %>%
rev() # Reverse if necessary
sq_data %>%
mutate(LevelDesignation = factor(
LevelDesignation,
levels = ld_order # Specify order
)) %>%
create_dist(hrvar = "LevelDesignation",
metric = "Email_hours")
Note that the levels
argument within factor()
accepts a character vector which matches the values in your HR attribute.
Note: this is not a Power BI documentation and therefore will not contain the most recent information on Power BI’s capabilities. This information is accurate to the extent of the author’s knowledge at the time of writing.
To use visuals from the wpa R library within Power BI, there are two pre-requisites beyond having R and the package installed:
Only visuals generated from the package of the ggplot
object type can be embedded within Power BI. In most cases, what this means is that the visual will only display if the value passed to the return
argument in your function is "plot"
. For instance, the following is code returns a valid object:
sq_data %>% email_dist(return = "plot")
Note that there are some exceptions to the "plot"
rule, as some functions return a HTML widget, such as create_sankey()
. You will also not be able to return data frames, HTML reports, or DataTables objects (the output of create_dt()
).
The documentation here provides more details on the limitations of the R visuals.
Here are a series of steps, slightly abridged, taken from the Power BI documentation, last updated July 6th 2021:
Enable script visuals
window that appears, select Enable
.Values
section of the Visualization
pane, drag fields from the Fields
pane that you want to consume in your R script, just as you would with any other Power BI Desktop visual.Fields
pane.Values
section are available to your R script.Values
section while working on your R script in the R script editor. Power BI Desktop automatically detects which fields you’ve added or removed.dataset
, and you access selected columns by their respective names.PersonId
, Date
, any relevant metrics, and HR attributes included as required as inputs to the R function."# Paste or type your script code here:,"
paste or type: library(wpa)
= "plot"
. Otherwise, there will be an error in the R visual placeholder on the report canvas when you run the script.Run script
icon in the script editor.A minimal example that you could try on your R script editor is:
You can create an issue at https://github.com/microsoft/wpa/issues/ if you have questions about using the R package. There are no silly questions: chances are that other users have the same question as you, and creating an issue helps us make the package documentation more user-friendly and help other users such as yourself.
You can create a bug report at https://github.com/microsoft/wpa/issues/ if you think you have found a bug with our package. We really appreciate user feedback and we will credit all contributions on our package website.
If you have an idea in which you would like to collaborate, you can create an issue at https://github.com/microsoft/wpa/issues/. If there is code that is ready to be reviewed, you can also fork our repository and create a pull request. We’d like to ask you to read our Contributor Guide and Developer Guide prior to contributing. We will credit your contributions accordingly when your contribution is successfully merged with the package.