Causal Inference: Experimentation Guide for Copilot Analytics

This is a comprehensive guide on how to run a causal inference (or treatment effect estimation) analysis with Copilot Analytics.

What is causal inference?

Causal inference is a statistical methodology that helps us determine whether one event actually causes another, rather than just observing that they happen together. In the context of organizational data, it allows us to distinguish between correlation and causation by controlling for confounding variables and establishing the direction of causality.

Unlike simple comparisons that might show “users with Copilot are more productive,” causal inference answers the more precise question: “how much more productive would users become if they were given Copilot access?” This distinction is crucial for making informed business decisions about technology investments.

Why run causal inference for Copilot Analytics?

Causal inference is essential for Copilot Analytics because:

  1. Investment Justification: Provide robust evidence of Copilot’s ROI by isolating its true impact from other factors like training, team composition, or seasonal trends.

  2. Targeted Deployment: Identify which employee segments (e.g., senior developers, specific functions, regions) benefit most from Copilot, enabling strategic rollout decisions.

  3. Policy Optimization: Understand whether observed productivity gains come from Copilot licensing itself, training programs, or their interaction, informing enablement strategies.

  4. Confounding Control: Account for selection bias where high-performing teams might be more likely to adopt new tools, ensuring accurate impact measurement.

  5. Temporal Dynamics: Distinguish between immediate adoption effects and sustained productivity improvements over time.

Outcomes of the causal inference analysis

At the end of the causal inference analysis, we will be able to understand the true impact of Copilot usage on the chosen outcome metric, how that impact evolves over time, and which employee segments benefit most.

Experiment design

In causal inference, it is important to identify three sets of variables as part of the design:

The outcome variable

In our analysis, we will be using external collaboration hours as the outcome variable. This variable is selected because it is a good proxy for success among external-facing or sales-focused employees, and it is available as a native Viva Insights metric that requires no further importing.

External collaboration hours measures the time employees spend in meetings, emails, and other collaborative activities with people outside their immediate organization. This metric is particularly valuable because:

For non-sales focused scenarios, other outcome variables such as number of tickets closed, total hours, projects closed, coding productivity, or meeting efficiency may be used instead.

The treatment variable

The treatment variable represents Copilot usage intensity, measured through continuous usage metrics such as:

The full list of these M365 Copilot metrics can be found in the advanced analysis metric descriptions.

We model treatment as continuous rather than binary (user vs. non-user) because:

Confounder variables and their importance

Confounders are variables that influence both who gets treated (Copilot access/usage) and the outcome (external collaboration hours). Controlling for confounders is critical because without them, we might incorrectly attribute changes to Copilot when they’re actually due to other factors.

Key confounding categories include:

Without proper confounder control, we risk measuring “who gets Copilot” effects rather than “what Copilot does” effects.
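To see why this matters, here is a minimal simulation (all numbers hypothetical) in which seniority drives both Copilot adoption and external collaboration hours. The naive treated-vs-untreated comparison overstates the effect; stratifying by the confounder recovers it:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical confounder: senior employees both adopt Copilot more often
# and already collaborate externally more.
senior = rng.binomial(1, 0.5, n)
copilot = rng.binomial(1, 0.2 + 0.5 * senior)                  # adoption depends on seniority
hours = 5 + 3 * senior + 1.0 * copilot + rng.normal(0, 1, n)   # true effect = 1.0

# Naive comparison mixes the seniority effect into the Copilot effect.
naive = hours[copilot == 1].mean() - hours[copilot == 0].mean()

# Stratifying by the confounder and averaging the within-stratum
# differences recovers an estimate close to the true effect of 1.0.
adjusted = np.mean([
    hours[(copilot == 1) & (senior == s)].mean()
    - hours[(copilot == 0) & (senior == s)].mean()
    for s in (0, 1)
])

print(f"naive: {naive:.2f}, adjusted: {adjusted:.2f}")
```

The naive estimate comes out well above 1.0 because the treated group contains disproportionately many senior employees: a "who gets Copilot" effect, not a "what Copilot does" effect.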

Choosing the right organizational attributes

Selecting appropriate organizational attributes for the analysis is crucial for obtaining valid causal estimates. The choice of variables should be guided by:

Importance of validating the data

Before running any causal analysis, it’s essential to validate that your data meets the necessary requirements:

Techniques

There are many different types of causal inference techniques. In this analysis, we will be focusing on:

  1. Difference in differences (DiD)
  2. Interrupted time-series analysis (ITSA)
  3. Double machine learning causal forest (CausalForestDML)

Why these techniques?

These three methods complement each other and provide increasing levels of sophistication:

DiD (Difference-in-Differences)

What it is

DiD compares changes in outcomes between treatment and control groups over time, controlling for both group-specific and time-specific factors.

How it works

The method estimates the treatment effect as: (Treatment Group Post - Treatment Group Pre) - (Control Group Post - Control Group Pre).
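That double difference equals the coefficient on the group-by-period interaction in a regression. A minimal sketch with statsmodels, using synthetic data (the column names and the true effect of 2.0 are assumptions for illustration):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 2_000

# Synthetic panel: 'treated' marks the Copilot group, 'post' the period
# after license assignment; the true DiD effect here is 2.0 hours.
df = pd.DataFrame({
    "treated": rng.binomial(1, 0.5, n),
    "post": rng.binomial(1, 0.5, n),
})
df["hours"] = (
    5 + 1.5 * df["treated"] + 0.5 * df["post"]
    + 2.0 * df["treated"] * df["post"] + rng.normal(0, 1, n)
)

# The interaction coefficient is the DiD estimate:
# (Treatment Post - Treatment Pre) - (Control Post - Control Pre).
model = smf.ols("hours ~ treated * post", data=df).fit()
print(model.params["treated:post"])
```

The group dummy absorbs fixed differences between the two groups, and the period dummy absorbs common time trends; only the interaction carries the treatment effect.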

Key assumptions

Why use DiD for Copilot

ITSA (Interrupted Time Series Analysis)

What it is

ITSA models the outcome trend before intervention and detects changes in level and slope after intervention, using control series for comparison.

How it works

ITSA fits separate regression lines to the pre- and post-intervention periods, testing for:

Key assumptions

Why use ITSA for Copilot

CausalForestDML (Double Machine Learning + Causal Forest)

What it is

A machine learning approach that estimates heterogeneous treatment effects using random forests while maintaining statistical rigor through double machine learning.

How it works

  1. Double ML: Uses machine learning to model both outcome and treatment, removing bias from model misspecification
  2. Causal Forest: Builds an ensemble of trees to estimate treatment effects that vary by individual characteristics
  3. Tree Interpretation: Applies decision tree algorithms to discover which subgroups have high vs. low treatment effects
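The full estimator (CausalForestDML) is provided by the econml package; the sketch below illustrates only step 1, the Double ML residualization idea, using scikit-learn alone. All data and the true effect of 1.5 are synthetic assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(3)
n = 4_000

# Synthetic data: X confounds both treatment (Copilot usage intensity)
# and outcome (collaboration hours); the true effect of T on Y is 1.5.
X = rng.normal(size=(n, 3))
T = X[:, 0] + rng.normal(size=n)
Y = 1.5 * T + 2.0 * X[:, 0] + X[:, 1] + rng.normal(size=n)

# Double ML step: residualize outcome and treatment on the confounders
# with cross-fitted ML models, then regress residual on residual.
rf = lambda: RandomForestRegressor(n_estimators=50, random_state=0)
y_res = Y - cross_val_predict(rf(), X, Y, cv=3)
t_res = T - cross_val_predict(rf(), X, T, cv=3)
effect = (t_res @ y_res) / (t_res @ t_res)
print(effect)
```

Because both nuisance models are fit out-of-fold, errors in either one enter the final estimate only through their product, which is what makes the estimate robust to model misspecification. The causal forest then replaces the single residual-on-residual coefficient with effects that vary across individuals.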

Key assumptions

Why use CausalForestDML for Copilot

We will use CausalForestDML to estimate Conditional Average Treatment Effects (CATEs), enabling us to detect heterogeneous impacts of Copilot usage at the individual level. To support interpretability and subgroup discovery, we will also apply the SingleTreeCateInterpreter algorithm, which extracts representative cohorts based on HR attributes and/or collaboration metrics. This combined approach allows us to identify where Copilot usage has the most meaningful effect and supports targeted enablement strategies.

Running the analysis

Pre-requisites

Required Software

Python Packages

The analysis requires several specialized packages. Install them using:

pip install -r requirements.txt

Key dependencies include:

What is a Jupyter Notebook?

Jupyter notebooks are interactive documents that combine code, visualizations, and explanatory text. They’re ideal for data analysis because you can:

Jupyter notebooks have the extension .ipynb.

Data requirements

This section outlines the data requirements for a quasi-experimental analysis using the CausalForestDML methodology to estimate heterogeneous effects of Copilot usage on External Collaboration Hours.

All relevant confounding variables (covariates) must be collected to control for other factors that could influence the outcome. This data will be measured on a weekly basis where applicable. The required data fall into three categories:

  1. HR attributes
  2. Collaboration metrics (Viva Insights)
  3. Copilot training information

💡 Note: Each HR attribute is a potential confounder to account for differences in workforce composition (e.g. seniority, region) between groups. The Viva Insights (VI) metrics are weekly collaboration measures that capture work patterns (network sizes, after-hours work, focus time, etc.) including Copilot usage activity. The Copilot training data indicates which users received Copilot training and how many hours, providing context on enablement efforts.
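Before running the notebook, it is worth checking that all three categories are actually present in the export. A minimal sketch of such a check, where the column names are assumptions to be adjusted to your own data:

```python
import pandas as pd

# Hypothetical column names for each category -- adjust to match your export.
REQUIRED = {
    "hr": ["LevelDesignation", "Tenure", "Region", "Function", "Org",
           "SupervisorIndicator"],
    "viva_insights": ["External Collaboration Hours",
                      "After-hours Meeting Hours", "Internal Network Size"],
    "copilot_training": ["Copilot Training Participation",
                         "Copilot Training Duration"],
}

def validate_columns(df: pd.DataFrame) -> list:
    """Return the required columns missing from the export."""
    expected = [col for cols in REQUIRED.values() for col in cols]
    return [col for col in expected if col not in df.columns]

# Demo: an export carrying only two HR attributes fails the check.
demo = pd.DataFrame(columns=["Tenure", "Region"])
print(validate_columns(demo))
```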

How to run

Step 1: Download the Person Query from Viva Insights

  1. Navigate to Viva Insights Person Query
    • Access your Viva Insights workspace
    • Go to Analyze > Query designer > Person query
  2. Set the time range
    • Select 1 month before the intervention (pre-intervention baseline) and 3 months after the intervention
    • The intervention date should be when Copilot licenses were first assigned to users
    • This 4-month window provides sufficient data for before/after comparison
  3. Select required metrics by category
    Click “Add metrics” and choose from these specific groupings:

    • Collaboration network: Internal Network Size; External Network Size; Strong Ties; Diverse Ties
    • After hours collaboration: After-hours Meeting Hours; After-hours Email Hours; Available-to-focus Hours
    • Collaboration by day of week: Weekend Collaboration Hours
    • Learning time: Calendared Learning Time
    • Collaboration activity: Active Connected Hours
    • External collaboration: External 1:1 Meeting Hours
    • Focus metrics: Uninterrupted Hours
    • Microsoft 365 Copilot: Select all metrics (this captures all Copilot usage data)
  4. Configure analysis attributes
    • Set IsActive = True to include only active employees
    • Select the following employee attributes:
    • HR Attributes: Level Designation; Tenure; Region; Function; Org; SupervisorIndicator (Manager vs. Individual Contributor)
    • Additional Data (Copilot training): Copilot Training Participation (Yes/No); Copilot Training Duration (hours per week, if available)
  5. Export and save the data as a CSV file

Step 2: Update the Jupyter notebook with new file paths

  1. Open the provided .ipynb in VS Code or Jupyter
  2. Locate the data loading cell (typically near the top):
    data = pd.read_csv("data/synthetic_employees_data_v32.csv")
    
  3. Replace the file path with your exported Viva Insights data:
    data = pd.read_csv("path/to/your/viva_insights_export.csv")
    
  4. Update any outcome or treatment variable names to match your data column names
  5. Run all cells in sequence to perform the analysis
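Step 4's renaming can be handled with a small mapping near the top of the notebook. A sketch using a demo frame in place of the real export; the column names on both sides of the mapping are hypothetical and must be matched to your data:

```python
import pandas as pd

# Demo frame standing in for your Viva Insights export.
export = pd.DataFrame({
    "External 1:1 Meeting Hours": [2.5, 4.0],
    "Total Copilot actions taken": [12, 30],
})

# Map your export's column names to the names the notebook expects.
RENAME_MAP = {
    "External 1:1 Meeting Hours": "external_collaboration_hours",  # outcome
    "Total Copilot actions taken": "copilot_usage",                # treatment
}

data = export.rename(columns=RENAME_MAP)
missing = [c for c in RENAME_MAP.values() if c not in data.columns]
if missing:
    raise ValueError(f"Columns missing after rename: {missing}")
print(list(data.columns))
```

Failing fast on missing columns here is cheaper than debugging a cryptic KeyError several cells into the analysis.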

Alternative: Using Command-Line Tools

For more advanced users, you can use the specialized command-line tools in the vi_ate_cate folder:

# For Average Treatment Effect analysis
python main.py ate --data-file "your_data.csv" --treatment-var "Teams Copilot Usage"

# For subgroup analysis (CATE)  
python main.py cate --data-file "your_data.csv" --treatment-var "Teams Copilot Usage"

Evaluating the outputs

The analysis generates several types of outputs across the different methodologies. Understanding these outputs is crucial for making informed decisions about Copilot deployment and optimization.

Average Treatment Effect (ATE)

What is it?

The Average Treatment Effect represents the expected change in outcome (external collaboration hours) for a randomly selected individual if they were to increase their Copilot usage from a baseline level (typically 0) to a specific treatment level.

Key output files:

How to interpret:

Individual vs. Average vs. Conditional Effects:

DiD Analysis Outputs

The DiD analysis provides several specifications with increasing levels of control:

Key metrics to examine:

ITSA Analysis Outputs

ITSA reveals the temporal dynamics of treatment effects:

Interpretation guidelines:

CATE Analysis Outputs

CATE analysis identifies heterogeneous treatment effects across subgroups:

Key output files:

Subgroup interpretation:

Best practices

Statistical Rigor

Domain Validation

Implementation Considerations