Introduction

This report allows you to review the quality of the Workplace Analytics data available and highlights specific issues that may require your attention before starting analysis. This report is structured in three sections:

  1. Workplace Analytics Settings
  2. Organizational Data Quality
  3. M365 Data Quality

The Microsoft Team behind Workplace Analytics has developed a series of data checks for each section. For the areas that have issues, we also provide you with suggestions to further clean up the data before performing additional analysis.

This report will automatically conduct certain data quality tests. Results will be indicated as a [Pass] or [Warning] throughout the report. [Warning] messages will direct you to items that need your attention and potential action.

For additional information about Workplace Analytics, including metric definitions, please visit our official documentation.


Data Available

Query Check

There are 69 employees in this dataset.

Date ranges from 2019-11-03 to 2020-01-26.

There are 11 (estimated) HR attributes in the data: PersonId, Domain, FunctionType, LevelDesignation, Region, Organization, zId, attainment, TimeZone, IsInternal, IsActive

All 69 employees in the dataset are active.

1. Workplace Analytics Settings

1.1 Outlook Settings

Workplace Analytics uses the working days and hours settings from each measured Microsoft 365 Exchange mailbox to calculate collaboration metrics. This data allows the system to distinguish between collaboration activity (email, meetings, and Teams calls & IMs) that takes place during and outside of working hours.

The most frequent working hours set in Outlook in this dataset are the following:

Abnormal Outlook settings (e.g. a significant number of users defining very short or very long working days) may skew analysis results, making after-hours collaboration look unusually high or low.

[Warning] 94.9% (832) of the person-date rows in the data have extreme Outlook settings. 0% (0) have an Outlook workday shorter than 4 hours, while 94.9% (832) have a workday longer than 15 hours.

[Pass] The ratio of after-hours collaboration to total collaboration hours is outside the expected threshold for 0 employees (0% of the total).

  • 0 employees (0%) have unusually high after-hours collaboration (relative to weekly collaboration hours)
  • 0 employees (0%) have unusually low after-hours collaboration

If you believe abnormal Outlook settings are distorting your results, consider standardizing working hours across your analysis population. You can override Outlook settings by setting a global parameter in the Dependencies section of the Person query.
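The extreme-settings check in this section can be illustrated with a minimal Python sketch. It is not the report's own implementation: the 4-hour and 15-hour thresholds come from the warning above, while the "HH:MM" input format, the function names, and the sample rows are assumptions made for illustration.

```python
from datetime import datetime

SHORT_HOURS = 4   # workdays shorter than this are flagged (threshold quoted in the report)
LONG_HOURS = 15   # workdays longer than this are flagged (threshold quoted in the report)


def workday_hours(start: str, end: str) -> float:
    """Length of a workday in hours, given 'HH:MM' start and end times (assumed format)."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 3600


def classify_workday(start: str, end: str) -> str:
    """Label a workday setting as 'short', 'long', or 'normal'."""
    hours = workday_hours(start, end)
    if hours < SHORT_HOURS:
        return "short"
    if hours > LONG_HOURS:
        return "long"
    return "normal"


# Hypothetical person-date rows: (workday start, workday end)
rows = [("09:00", "17:00"), ("00:00", "23:59"), ("10:00", "13:00")]
flags = [classify_workday(start, end) for start, end in rows]

# Share of rows with an extreme (short or long) setting
extreme_share = sum(flag != "normal" for flag in flags) / len(flags)
```

A setting such as 00:00 to 23:59 is classified as "long", which is the pattern that would produce a warning like the one reported in this section.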

1.2 Meeting Exclusion Rules

Workplace Analytics uses email and calendar activities that are stored in a person's Office 365 account to reveal internal and external collaboration trends. However, a person's calendar and email can contain a diverse set of activities (such as personal meetings or appointments, social activities, all-day training meetings, and so forth) that are not relevant to work-related collaboration, and, if included in the metrics, would skew query results.

This section analyses the subject lines from the supplied meeting query to identify whether common exclusion terms (e.g. happy hour, yoga class, team dinner) are present in your data. For more information, please see the documentation on meeting exclusion rules.

[Warning] 41 meetings (2.1% of 2000) require your attention as they contain common exclusion terms.

If you believe that your meeting data requires further cleanup, please consider defining a new meeting exclusion rule under Settings in Workplace Analytics and re-running your queries. If you want to further investigate this issue, you can flag these meetings in your dataset using subject_validate(data, return = "data"). You can also generate a more detailed report using subject_validate_report().
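As a rough illustration of the kind of check described in this section, the Python sketch below flags subject lines that contain an exclusion term. It is a simplified stand-in, not the logic of subject_validate(): the term list contains only the three examples named above, and the case-insensitive substring matching rule and the sample subjects are assumptions.

```python
# Example terms taken from this section; a real rule set would be longer.
EXCLUSION_TERMS = ("happy hour", "yoga class", "team dinner")


def has_exclusion_term(subject: str, terms=EXCLUSION_TERMS) -> bool:
    """Return True if the subject contains any exclusion term (case-insensitive)."""
    lowered = subject.lower()
    return any(term in lowered for term in terms)


# Hypothetical meeting subject lines
subjects = [
    "Q1 budget review",
    "Team Dinner at 7",
    "Friday Happy Hour",
    "Sprint planning",
]

flagged = [s for s in subjects if has_exclusion_term(s)]
share_pct = 100 * len(flagged) / len(subjects)
```

The flagged share is reported in the same way as the warning above: the number of matching meetings divided by the total number of meetings in the query.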


2. Organizational Data Quality

Organizational data is descriptive information about the employees in your organization, such as the employee's organization, job function, level, etc. This data has been uploaded by your organization's Workplace Analytics Administrator. The quality of this information is important as it enables Workplace Analytics to attribute Office 365 data to specific groups, and slice the collaboration data in different ways to uncover relevant trends for your organization.

2.1 Attributes Available

The table below shows the organizational attributes available in this dataset. Use this table to understand the data's quality and completeness.

  • Be mindful of attributes that have many unique values, as this may limit data aggregation and filtering (for example, if a job function or code is too narrowly defined, it might not give you a useful view of the overall group).
  • Additionally, review missing values as some attributes may only be partially available for the population in this sample:

2.2 Groups Under Privacy Threshold

To minimize privacy risk, queries from Workplace Analytics are anonymized. We also recommend that, during analysis, collaboration patterns from teams or departments are reported in an aggregated way, respecting a minimum-group-size privacy threshold (your Workplace Analytics Administrator has already defined a minimum-group-size rule that affects Explore charts and Plans within Workplace Analytics).

The default minimum-group setting in this report is five, but this setting can be changed according to the privacy requirements of your organization. Re-run this report using the mingroup parameter to use a custom minimum group size setting.

[Warning] There are 8 groups under the minimum group size privacy threshold of 5.
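The privacy-threshold check amounts to counting the members of each group and listing the groups that fall below the minimum size. The Python sketch below shows that idea under stated assumptions: the function name, the input shape (a mapping of person to group), and the sample data are hypothetical, and only the default threshold of five comes from this report.

```python
from collections import Counter


def groups_below_threshold(person_groups: dict, mingroup: int = 5) -> dict:
    """Return {group: size} for every group with fewer than `mingroup` members."""
    sizes = Counter(person_groups.values())
    return {group: n for group, n in sizes.items() if n < mingroup}


# Hypothetical data: six people in Sales, two in Legal
person_groups = {f"p{i}": "Sales" for i in range(6)}
person_groups.update({"p6": "Legal", "p7": "Legal"})

small_groups = groups_below_threshold(person_groups, mingroup=5)
```

Groups returned by such a check would typically be merged into a larger category or suppressed before results are shared.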

The following groups are available in this dataset:

2.3 Distribution of Employees in Key Attributes

This section can help you understand the population scope for your analysis and validate the size of your selected grouping based on your business knowledge. This report uses Organization as the default grouping, but you can specify another attribute of your choosing as the hrvar input of the function validation_report(). A minimum threshold has not been applied to this section, so that the full list of attribute values is available for your review.

2.4 Updates to Organizational Data

Workplace Analytics administrators are advised to keep the organizational data that is uploaded into Workplace Analytics up to date. These updates help guarantee that relevant changes in the organizational structure are reflected in the system, that the collaboration data of new joiners is captured, and that all Office 365 data flows are attributed to the right teams and departments (even when some employees change roles or are promoted internally).

The following chart shows the observed mobility of employees between teams in your organization. Lack of changes could indicate that the organizational data has not been frequently updated during the period under consideration.

2.5 Quality of Tenure Data

When the employee's HireDate is available as an organizational data attribute, it can be used to calculate the tenure of employees. This section runs a quality check on the tenure field, which is calculated as the employee's last weekly collaboration date in your query minus the HireDate. The plot below shows the tenure distribution of your analysis population.

The mean tenure is 24.7 years. The maximum tenure is 49 years. There are 10 employees with a tenure greater than 40 years.
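The tenure calculation described in this section (last weekly collaboration date minus HireDate) can be sketched in Python as follows. The conversion to years by dividing by 365.25, the function name, and the sample employees are assumptions for illustration; only the 40-year review threshold is taken from the summary above.

```python
from datetime import date

LONG_TENURE_YEARS = 40  # tenures above this are listed for review (from the summary above)


def tenure_years(hire_date: date, last_collab_date: date) -> float:
    """Tenure in years: last collaboration date minus hire date.

    Dividing by 365.25 is an assumed approximation for converting days to years.
    """
    return (last_collab_date - hire_date).days / 365.25


# Hypothetical employees: id -> (HireDate, last weekly collaboration date)
employees = {
    "A": (date(2010, 1, 26), date(2020, 1, 26)),
    "B": (date(1975, 6, 1), date(2020, 1, 26)),
}

tenures = {pid: tenure_years(hire, last) for pid, (hire, last) in employees.items()}
long_tenure = sorted(pid for pid, years in tenures.items() if years > LONG_TENURE_YEARS)
```

Very long tenures are not necessarily errors, but a cluster of them can point to placeholder or default hire dates in the organizational data.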

3. M365 Data Quality

This section evaluates the quality of the collaboration data available in your tenant, which is calculated from email, meeting, call, and IM flows within your company (all gathered from M365 Exchange and Teams). In general, collaboration data provides a very accurate description of the digital habits of employees and their daily experience as they interact with peers and other teams. However, data may not be available for all employees, or may only partially capture their day-to-day experience (for example, in teams that use other communication platforms or that interact face-to-face without planning meetings in advance).

3.1 Population Over Time

This section provides a view of the licensed population available in the query over time. Note that the values seen on this plot could differ from the actual licensed population due to filters on Activeness, holiday weeks, and non-knowledge workers.

3.2 Non-knowledge Workers

Non-knowledge workers refer to persons with unusually low average collaboration hours. These may represent individuals who are not required to collaborate via Outlook and Teams as part of their role or shift or may be part-time staff. Workplace Analytics data may not be representative of these individuals' workday experience.

For this reason, we suggest excluding non-knowledge workers from your analysis. You can easily remove them from your dataset by using the function identify_nkw(return = "data_clean").

[Warning] Out of a population of 69, there are 5 employees who may be non-knowledge workers (average collaboration hours below 5 hours).
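The rule behind this warning, average collaboration hours below a threshold, can be illustrated with a short Python sketch. It is a simplified stand-in rather than the logic of identify_nkw(): the function name, the (person, hours) row shape, and the sample values are hypothetical, and only the 5-hour threshold comes from the warning above.

```python
from collections import defaultdict
from statistics import mean


def find_low_collaborators(rows, threshold=5.0):
    """Return the sorted ids of persons whose average collaboration hours are below `threshold`.

    `rows` is an iterable of (person_id, weekly_collaboration_hours) pairs.
    """
    by_person = defaultdict(list)
    for person_id, hours in rows:
        by_person[person_id].append(hours)
    return sorted(p for p, hours in by_person.items() if mean(hours) < threshold)


# Hypothetical person-week rows
rows = [
    ("A", 20.0), ("A", 18.0),  # average 19.0
    ("B", 2.0), ("B", 4.0),    # average 3.0, below the threshold
    ("C", 6.0), ("C", 5.0),    # average 5.5
]

low_collaborators = find_low_collaborators(rows, threshold=5.0)
```

Because the rule uses each person's average across all weeks, a single quiet week does not by itself cause someone to be flagged.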

3.3 Company Holiday Weeks

Holiday weeks are weeks in the data where the collaboration hours of the sample are unusually low. These are typically removed from analysis as they represent public holidays where the patterns of collaboration are not representative of the norm. Note that this applies to whole weeks, i.e. the data for the week is removed for all employees in the sample.

The weeks where collaboration was 1 standard deviation below the mean (18.3) are: 2019-12-01
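The Python sketch below illustrates one way such a week-level check can work: average the collaboration hours per week, then flag weeks whose average lies more than a given number of standard deviations below the mean of the weekly averages. It is not the implementation of identify_holidayweeks(); the function name, the use of the sample standard deviation, and the sample data are assumptions.

```python
from collections import defaultdict
from statistics import mean, stdev


def find_holiday_weeks(rows, sd=1.0):
    """Return weeks whose mean collaboration hours are more than `sd`
    standard deviations below the mean of all weekly means.

    `rows` is an iterable of (week, collaboration_hours) pairs, one per person-week.
    """
    by_week = defaultdict(list)
    for week, hours in rows:
        by_week[week].append(hours)

    weekly_means = {week: mean(hours) for week, hours in by_week.items()}
    values = list(weekly_means.values())
    if len(values) < 2:
        return []  # a standard deviation needs at least two weeks

    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []  # all weeks identical, nothing stands out
    return sorted(w for w, m in weekly_means.items() if (m - mu) / sigma < -sd)


# Hypothetical person-week rows for two people over four weeks
rows = [
    ("2019-11-24", 18.0), ("2019-11-24", 20.0),
    ("2019-12-01", 9.0), ("2019-12-01", 11.0),
    ("2019-12-08", 19.0), ("2019-12-08", 21.0),
    ("2019-12-15", 20.0), ("2019-12-15", 22.0),
]

holiday_weeks = find_holiday_weeks(rows, sd=1.0)
```

In this made-up sample, only the week of 2019-12-01 falls far enough below the other weeks to be flagged.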

You can easily remove holiday weeks from your dataset by using the function identify_holidayweeks(return = "data_cleaned").

3.4 Inactive Weeks

Inactive weeks are person-weeks in the data where an individual's collaboration hours are unusually low. These are typically removed as they represent individual holidays where the patterns of collaboration are not representative of the norm. Note that this applies to person-weeks, i.e. the data is removed for an individual for a given week only if collaboration is low for that employee.

There are 14 rows of data with weekly collaboration hours more than 2 standard deviations below the mean (17.9).

You can easily remove inactive weeks from your dataset by using the function identify_inactiveweeks(return = "data_cleaned").
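A person-week check of this kind can be sketched in Python as shown below. The sketch compares each week with that person's own mean and standard deviation, which is an assumption based on the description above and may differ from what identify_inactiveweeks() actually does; the function name and sample data are likewise hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def find_inactive_weeks(rows, sd=2.0):
    """Return (person_id, week) pairs where that person's collaboration hours are
    more than `sd` standard deviations below the person's own mean.

    `rows` is an iterable of (person_id, week, collaboration_hours) tuples.
    """
    by_person = defaultdict(list)
    for person_id, week, hours in rows:
        by_person[person_id].append((week, hours))

    inactive = []
    for person_id, weeks in by_person.items():
        values = [hours for _, hours in weeks]
        if len(values) < 2:
            continue  # not enough data for a standard deviation
        mu, sigma = mean(values), stdev(values)
        if sigma == 0:
            continue  # constant hours, nothing stands out
        inactive.extend(
            (person_id, week) for week, hours in weeks if (hours - mu) / sigma < -sd
        )
    return sorted(inactive)


# Hypothetical data: person A has one near-empty week, person B is steady
hours_a = [20, 21, 19, 20, 22, 20, 21, 1]
hours_b = [10, 11, 10, 9, 10, 11, 10, 9]
rows = [("A", f"w{i + 1}", float(h)) for i, h in enumerate(hours_a)]
rows += [("B", f"w{i + 1}", float(h)) for i, h in enumerate(hours_b)]

inactive_weeks = find_inactive_weeks(rows, sd=2.0)
```

Unlike the holiday-week check, only the affected person's row is flagged; other employees' data for the same week is left untouched.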

3.5 Extreme Values

This section runs checks against the core collaboration metrics (Email, Meeting, Teams Call, and Teams Instant Message hours) to flag any extreme values. If a significant number of extremely high or low values is identified, we recommend that the analyst investigate the cause before proceeding with the analysis.

3.5.1 Extreme values: Email

[Pass] There are no persons whose average Email hours exceed 80.

[Pass] There are no rows where Email hours exceed 80.

3.5.2 Extreme values: Meeting

[Pass] There are no persons whose average Meeting hours exceed 80.

[Pass] There are no rows where Meeting hours exceed 80.

3.5.3 Extreme values: Calls

[Pass] There are no persons whose average Call hours exceed 40.

[Pass] There are no rows where Call hours exceed 40.

3.5.4 Extreme values: IM

[Pass] There are no persons whose average Instant Message hours exceed 40.

[Pass] There are no rows where Instant Message hours exceed 40.

3.5.5 Extreme values: Conflicting Meetings

[Pass] There are no persons whose average Conflicting meeting hours exceed 70.

[Pass] There are no rows where Conflicting meeting hours exceed 70.
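The two checks applied to each metric in this section (person-level averages and individual rows against a fixed threshold) can be illustrated with the Python sketch below. The thresholds are the ones quoted in the subsections above; the dictionary keys used as column names, the function name, and the sample rows are assumptions, and this is not the report's own implementation.

```python
from collections import defaultdict
from statistics import mean

# Thresholds quoted in this section; the key names are assumed column names.
THRESHOLDS = {
    "Email_hours": 80,
    "Meeting_hours": 80,
    "Call_hours": 40,
    "Instant_Message_hours": 40,
    "Conflicting_meeting_hours": 70,
}


def check_extremes(rows, metric, threshold):
    """Return (persons_over, rows_over) for one metric.

    persons_over: sorted ids of persons whose average value exceeds the threshold.
    rows_over: number of individual rows whose value exceeds the threshold.
    """
    by_person = defaultdict(list)
    for row in rows:
        by_person[row["PersonId"]].append(row[metric])
    persons_over = sorted(p for p, v in by_person.items() if mean(v) > threshold)
    rows_over = sum(1 for row in rows if row[metric] > threshold)
    return persons_over, rows_over


# Hypothetical person-week rows
rows = [
    {"PersonId": "A", "Email_hours": 30.0},
    {"PersonId": "A", "Email_hours": 90.0},
    {"PersonId": "B", "Email_hours": 10.0},
    {"PersonId": "B", "Email_hours": 12.0},
]

persons_over, rows_over = check_extremes(rows, "Email_hours", THRESHOLDS["Email_hours"])
status = "[Pass]" if not persons_over and rows_over == 0 else "[Warning]"
```

In this made-up sample, no person's average exceeds the threshold, but one row does, so the combined status would be a warning.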