OpenRCA: Can Large Language Models Locate the Root Cause of Software Failures?

OpenRCA includes 335 failures from three enterprise software systems, along with over 68 GB of telemetry data (logs, metrics, and traces). Given a failure case and its associated telemetry, the LLM is tasked with identifying the root cause of the failure. Doing so requires understanding software dependencies and reasoning over heterogeneous, long-context telemetry data.
News

2025/1/23 Our paper has been accepted by ICLR 2025.

2025/1/23 Released OpenRCA dataset with 335 failure cases.

Leaderboard
| Method Name          | Model              | Org.      | Correct | Partial | Date      |
|----------------------|--------------------|-----------|---------|---------|-----------|
| RCA-Agent            | Claude 3.5 Sonnet  | Microsoft | 11.34%  | 17.31%  | 2025/1/23 |
| RCA-Agent            | GPT-4o             | Microsoft | 8.96%   | 17.91%  | 2025/1/23 |
| Prompting (Oracle)   | Gemini 1.5 Pro     | Microsoft | 7.16%   | 23.58%  | 2025/1/23 |
| Prompting (Balanced) | Gemini 1.5 Pro     | Microsoft | 6.27%   | 24.18%  | 2025/1/23 |
| Prompting (Oracle)   | GPT-4o             | Microsoft | 6.27%   | 15.82%  | 2025/1/23 |
| Prompting (Oracle)   | Claude 3.5 Sonnet  | Microsoft | 5.37%   | 17.61%  | 2025/1/23 |
| Prompting (Oracle)   | Command R+         | Microsoft | 4.78%   | 7.46%   | 2025/1/23 |
| Prompting (Oracle)   | Mistral Large 2    | Microsoft | 4.48%   | 10.45%  | 2025/1/23 |
| Prompting (Balanced) | Command R+         | Microsoft | 4.18%   | 8.96%   | 2025/1/23 |
| Prompting (Balanced) | Claude 3.5 Sonnet  | Microsoft | 3.88%   | 18.81%  | 2025/1/23 |
| Prompting (Oracle)   | Llama 3.1 Instruct | Microsoft | 3.88%   | 14.93%  | 2025/1/23 |
| Prompting (Balanced) | Mistral Large 2    | Microsoft | 3.58%   | 6.40%   | 2025/1/23 |
| Prompting (Balanced) | GPT-4o             | Microsoft | 3.28%   | 14.33%  | 2025/1/23 |
| RCA-Agent            | Llama 3.1 Instruct | Microsoft | 3.28%   | 5.67%   | 2025/1/23 |
| Prompting (Balanced) | Llama 3.1 Instruct | Microsoft | 2.99%   | 14.63%  | 2025/1/23 |
| RCA-Agent            | Gemini 1.5 Pro     | Microsoft | 2.69%   | 6.87%   | 2025/1/23 |
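
To put the leaderboard percentages in perspective, the short sketch below converts a reported percentage into an approximate number of failure cases. It assumes each percentage is a fraction of the 335 failure cases in the dataset; the page does not state the denominator explicitly, so treat that as an assumption. The helper names `implied_case_count` and `roundtrip_percent` are illustrative and not part of any OpenRCA tooling.

```python
TOTAL_CASES = 335  # dataset size stated on this page


def implied_case_count(percent: float, total: int = TOTAL_CASES) -> int:
    """Return the whole number of cases closest to `percent` of `total`.

    Assumption: the leaderboard percentage is computed over all `total` cases.
    """
    return round(percent / 100 * total)


def roundtrip_percent(count: int, total: int = TOTAL_CASES) -> float:
    """Convert a case count back to a percentage rounded to two decimals."""
    return round(count / total * 100, 2)


# "Correct" values copied from the top three leaderboard rows.
top_rows = {
    "RCA-Agent / Claude 3.5 Sonnet": 11.34,
    "RCA-Agent / GPT-4o": 8.96,
    "Prompting (Oracle) / Gemini 1.5 Pro": 7.16,
}

for name, pct in top_rows.items():
    count = implied_case_count(pct)
    # The round trip checks that the count is consistent with the reported value.
    consistent = roundtrip_percent(count) == pct
    print(f"{name}: {pct}% is about {count} of {TOTAL_CASES} cases (consistent: {consistent})")
```

Under this assumption, even the best entry resolves only a few dozen of the 335 cases fully, which is a useful reminder of how much headroom the benchmark leaves.
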
Is your model or agent up to the challenge? Submit your results here!
Submission Guidelines

To have your results included on the leaderboard, please provide the following in your email:

Name of your method

Inference results in valid format (see GitHub repository)

Accuracy of your method tested in your own environment

(Optional) Link to your repository

(Optional) Execution trajectory of your method

(Optional) Reproduction guidelines of your method

(Optional) Docker image of your method and environment

Note: Inclusion in the leaderboard will be attempted on a best-effort basis. We cannot guarantee the timely processing of requests.

What is the task in OpenRCA?
Identify the root cause of the failure!

Each OpenRCA task is based on a real-world failure case from a software system and its associated telemetry data. Given the failure case and its telemetry, the task is to identify the root cause of the failure, which requires understanding software dependencies and reasoning over heterogeneous, long-context telemetry data.
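
As a rough illustration of the task's shape (failure description and telemetry in, root cause out), here is a minimal sketch. Everything in it is hypothetical: the `FailureCase` container, its field layout, and the `naive_log_baseline` heuristic are made up for this example and do not reflect the dataset's actual schema, the benchmark's scoring, or any method on the leaderboard.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class FailureCase:
    """Hypothetical container for one task; not the dataset's real schema."""
    description: str
    logs: List[Tuple[str, str]] = field(default_factory=list)     # (component, message)
    metrics: Dict[str, List[float]] = field(default_factory=dict)  # metric name -> samples
    traces: List[List[str]] = field(default_factory=list)          # call paths between components


def naive_log_baseline(case: FailureCase) -> Optional[str]:
    """Toy heuristic: blame the component with the most log lines mentioning 'error'.

    It ignores metrics and traces entirely, so it only shows the input/output
    shape of the task and is not a serious root cause analysis method.
    """
    counts = Counter(
        component for component, message in case.logs if "error" in message.lower()
    )
    return counts.most_common(1)[0][0] if counts else None


# Made-up example data, for illustration only.
case = FailureCase(
    description="Checkout requests started timing out.",
    logs=[
        ("db", "ERROR connection pool exhausted"),
        ("db", "error: query timeout"),
        ("cart", "request served"),
        ("web", "ERROR upstream timeout"),
    ],
)
print(naive_log_baseline(case))
```

A heuristic like this breaks down quickly on real systems, where a symptom in one component often originates in a dependency, which is why the task calls for reasoning across logs, metrics, and traces together.
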

Check out our paper for more details!
OpenRCA: Can Large Language Models Locate the Root Cause of Software Failures?

Junjielong Xu (1,2), Qinan Zhang (1), Zhiqing Zhong (1), Shilin He (2), Chaoyun Zhang (2), Qingwei Lin (2), Dan Pei (3), Pinjia He (1), Dongmei Zhang (2), Qi Zhang (2)

(1) School of Data Science, The Chinese University of Hong Kong, Shenzhen; (2) Microsoft; (3) Tsinghua University

If you have any remaining questions, please feel free to contact us at openrcanon@gmail.com

Citing this work

If you use this benchmark, please cite:

@inproceedings{xu2025openrca,
title={OpenRCA: Can Large Language Models Locate the Root Cause of Software Failures?},
author={Junjielong Xu and Qinan Zhang and Zhiqing Zhong and Shilin He and Chaoyun Zhang and Qingwei Lin and
Dan Pei and Pinjia He and Dongmei Zhang and Qi Zhang},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=M4qNIzQYpd}
}