CodeXGLUE stands for General Language Understanding Evaluation benchmark for CODE. It includes 14 datasets for 10 diversified programming language tasks covering code-code (clone detection, defect detection, cloze test, code completion, code refinement and code-to-code translation), text-code (natural language code search, text-to-code generation), code-text (code summarization) and text-text (documentation translation) scenarios. We provide three baseline models to support these tasks: a BERT-style pre-trained model (CodeBERT) for understanding problems, a GPT-style pre-trained model (CodeGPT) for completion and generation problems, and an Encoder-Decoder framework for sequence-to-sequence generation problems.
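
For illustration, here is a minimal sketch of loading the CodeBERT baseline with the Hugging Face transformers library and encoding a natural language / code pair. It assumes the publicly released `microsoft/codebert-base` checkpoint; it is not the task-specific fine-tuning code from the CodeXGLUE repositories, and the example strings are illustrative.

```python
# Minimal sketch: load the CodeBERT baseline and embed a (NL, code) pair.
# Assumes the public "microsoft/codebert-base" checkpoint on Hugging Face;
# the task-specific fine-tuning code lives in the CodeXGLUE repositories.
from transformers import AutoTokenizer, AutoModel
import torch

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base")

nl = "return the maximum value in a list"
code = "def max_value(xs):\n    return max(xs)"

# CodeBERT is trained on (natural language, code) pairs joined by separator tokens.
inputs = tokenizer(nl, code, return_tensors="pt", truncation=True)
with torch.no_grad():
    outputs = model(**inputs)

# Use the first-token vector as a pooled representation of the pair.
pair_embedding = outputs.last_hidden_state[:, 0, :]
print(pair_embedding.shape)  # torch.Size([1, 768])
```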

Overall Leaderboard

Leaderboard columns: Rank, Model, Organization, Date, a score for each task (Code-Code: clone detection, defect detection, cloze test, code completion, code refinement, code translation; Text-Code: natural language code search, code generation; Code-Text: code summarization; Text-Text: documentation translation), and the aggregate CodeXGLUE Score.

Clone Detection (Code-Code)

Leaderboard columns: Rank, Model, Organization, Date; metrics: Precision, Recall and F1 on BigCloneBench, and MAP@R (%) on POJ-104.
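
For reference, MAP@R on POJ-104 is a retrieval metric: each program is used as a query against the rest of the corpus, R is the number of other programs in the same problem class, and the average precision over the top-R retrieved results is averaged across all queries. The sketch below shows one common way to compute it from a precomputed ranking, following the metric-learning literature; the official evaluator in the task repository is authoritative, and the helper names here are illustrative.

```python
# Illustrative MAP@R computation (not the official CodeXGLUE evaluator).
# `retrieved[q]` is the list of labels of programs ranked by similarity to
# query q (the query itself excluded); `labels[q]` is the query's own label.

def average_precision_at_r(query_label, ranked_labels, r):
    """AP@R: precision is only credited at ranks where a relevant item appears."""
    hits, ap = 0, 0.0
    for i, label in enumerate(ranked_labels[:r], start=1):
        if label == query_label:
            hits += 1
            ap += hits / i
    return ap / r if r > 0 else 0.0

def map_at_r(labels, retrieved):
    scores = []
    for q, query_label in enumerate(labels):
        # R = number of other items sharing the query's class
        r = sum(1 for l in labels if l == query_label) - 1
        scores.append(average_precision_at_r(query_label, retrieved[q], r))
    return sum(scores) / len(scores)

# Tiny example: two classes, perfect retrieval gives MAP@R = 1.0
labels = ["A", "A", "B", "B"]
retrieved = [["A", "B", "B"], ["A", "B", "B"], ["B", "A", "A"], ["B", "A", "A"]]
print(map_at_r(labels, retrieved))  # 1.0
```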

Defect Detection (Code-Code)

Leaderboard columns: Rank, Model, Organization, Date, Accuracy.

Cloze Test (Code-Code)

Leaderboard columns: Rank, Model, Organization, Date; accuracy on ClozeTest-all and ClozeTest-maxmin, each reported overall (All) and per language (Ruby, JavaScript, Go, Python, Java, PHP).

Code Completion (Code-Code)

Leaderboard columns: Rank, Model, Organization, Date; CodeCompletion-line metrics: exact match (EM) and edit similarity (ES) on py150 and java; CodeCompletion-token metric: accuracy (Acc) on py150 and java.
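
For line-level completion, exact match (EM) checks whether the predicted line is identical to the ground truth, and edit similarity (ES) measures character-level similarity between the two strings. A minimal sketch is below; it uses difflib's SequenceMatcher ratio as a stand-in for the similarity measure, while the official evaluator in the task repository defines the exact metrics.

```python
# Illustrative exact match / edit similarity scoring for line completion
# (a sketch; the official CodeXGLUE evaluator defines the exact metrics).
from difflib import SequenceMatcher

def exact_match(pred: str, gold: str) -> float:
    return float(pred.strip() == gold.strip())

def edit_similarity(pred: str, gold: str) -> float:
    # Stand-in similarity measure; reported on a 0-100 scale.
    return 100.0 * SequenceMatcher(None, pred.strip(), gold.strip()).ratio()

preds = ["return max(xs)", "for i in range(n):"]
golds = ["return max(xs)", "for i in range(len(xs)):"]

em = sum(exact_match(p, g) for p, g in zip(preds, golds)) / len(golds)
es = sum(edit_similarity(p, g) for p, g in zip(preds, golds)) / len(golds)
print(f"EM: {em:.2f}  ES: {es:.2f}")
```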

Code Refinement (Code-Code)

Leaderboard columns: Rank, Model, Organization, Date; metrics: BLEU, Acc (%) and CodeBLEU on the small and medium test sets.

Code Translation (Code-Code)

Leaderboard columns: Rank, Model, Organization, Date; metrics: BLEU, Acc (%) and CodeBLEU for Java to C# and C# to Java.

Type Prediction (Code-Code)

Leaderboard columns: Rank, Model, Organization, Date; metrics: Precision, Recall, F1 and Accuracy, reported for Top 100 and Overall.

Natural Language Code Search (Text-Code)

Leaderboard columns: Rank, Model, Organization, Date, Adv Test (MRR), WebQuery Test (Accuracy).
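
Mean reciprocal rank (MRR) on the Adv Test rewards placing the correct code snippet high in the ranked list for each query: each query contributes the reciprocal of the rank of its gold snippet, averaged over all queries. A minimal sketch with an illustrative input format (a list of ranked candidate ids per query plus the gold id) is below; the official evaluator in the task repository is authoritative.

```python
# Illustrative MRR computation (not the official CodeXGLUE evaluator).
# `ranked_ids[q]` is the candidate code ids sorted by model score for query q;
# `gold_ids[q]` is the id of the correct snippet for that query.

def mean_reciprocal_rank(ranked_ids, gold_ids):
    total = 0.0
    for ranked, gold in zip(ranked_ids, gold_ids):
        # Reciprocal rank is 0 if the gold snippet is not retrieved at all.
        rr = 0.0
        if gold in ranked:
            rr = 1.0 / (ranked.index(gold) + 1)
        total += rr
    return total / len(gold_ids)

ranked_ids = [["c3", "c1", "c7"], ["c5", "c2", "c9"]]
gold_ids = ["c1", "c5"]
print(mean_reciprocal_rank(ranked_ids, gold_ids))  # (1/2 + 1/1) / 2 = 0.75
```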

Code Generation (Text-Code)

Leaderboard columns: Rank, Model, Organization, Date; Text2Code Generation metrics: exact match (EM), BLEU and CodeBLEU.

Code Summarization (Code-Text)

Leaderboard columns: Rank, Model, Organization, Date; smoothed BLEU-4, reported overall (All) and per language (Ruby, JavaScript, Go, Python, Java, PHP).
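
Code summarization is scored with a smoothed BLEU between generated and reference docstrings. The sketch below uses NLTK's sentence-level BLEU with smoothing as an approximation; the official evaluator script in the task repository is the reference implementation, and the example strings are illustrative.

```python
# Illustrative smoothed BLEU-4 scoring for code summarization
# (an approximation using NLTK; the official evaluator script is authoritative).
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def smoothed_bleu4(reference: str, hypothesis: str) -> float:
    smooth = SmoothingFunction().method4
    return sentence_bleu(
        [reference.split()], hypothesis.split(),
        weights=(0.25, 0.25, 0.25, 0.25), smoothing_function=smooth,
    )

refs = ["returns the maximum value in the list"]
hyps = ["return the maximum element of a list"]
score = sum(smoothed_bleu4(r, h) for r, h in zip(refs, hyps)) / len(refs)
print(f"smoothed BLEU-4: {100 * score:.2f}")
```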

Documentation Translation (Text-Text)

Leaderboard columns: Rank, Model, Organization, Date; BLEU for each translation direction: EN->DA, EN->LV, EN->NO, EN->ZH, DA->EN, LV->EN, NO->EN, ZH->EN.

CodeXGLUE Submission Instructions

Once you have built a model that meets your expectations on evaluation with the dev set, you can submit your test results to get official evaluation on the test set. To ensure the integrity of the official test results, we do not release the correct answers for the test set to the public. To submit your model for official evaluation on the test set, follow the steps below:

  1. Generate your prediction output for the dev set.
  2. Run the official evaluation methodologies found in the task-specific git repo and verify that your system runs as expected.
  3. Generate your prediction output for the test set.
  4. Submit the following information by email to codexglue@microsoft.com.

Your email should include:

  1. Prediction results on the test set. [Required]
  2. Prediction results on the dev set. [Recommended]
  3. Individual/Team Name: Name of the individual or team to appear on the leaderboard. [Required]
  4. Individual/Team Institution: Name of the institution of the individual or team to appear on the leaderboard. [Optional]
  5. Model code: Training code for the model. [Recommended]
  6. Model information: Name of the model/technique to appear on the leaderboard. [Required]
  7. Paper Information: Name, citation and URL of the paper, if the model is from published work, to appear on the leaderboard. [Optional]

To avoid "p-hacking", we discourage frequent submissions from the same group within a short period of time.

How to cite

To cite CodeXGLUE:

@article{DBLP:journals/corr/abs-2102-04664,
  author = {Shuai Lu and
    Daya Guo and
    Shuo Ren and
    Junjie Huang and
    Alexey Svyatkovskiy and
    Ambrosio Blanco and
    Colin B. Clement and
    Dawn Drain and
    Daxin Jiang and
    Duyu Tang and
    Ge Li and
    Lidong Zhou and
    Linjun Shou and
    Long Zhou and
    Michele Tufano and
    Ming Gong and
    Ming Zhou and
    Nan Duan and
    Neel Sundaresan and
    Shao Kun Deng and
    Shengyu Fu and
    Shujie Liu},
  title = {CodeXGLUE: {A} Machine Learning Benchmark Dataset for Code Understanding
    and Generation},
  journal = {CoRR},
  volume = {abs/2102.04664},
  year = {2021}
}

If you only use part of the CodeXGLUE dataset, please also cite the corresponding individual datasets.