CodeXGLUE stands for General Language Understanding Evaluation benchmark for CODE. It includes 14 datasets for 10 diversified programming language tasks covering code-code (clone detection, defect detection, cloze test, code completion, code refinement and code-to-code translation), text-code (natural language code search, text-to-code generation), code-text (code summarization) and text-text (documentation translation) scenarios. We provide three baseline models to support these tasks: a BERT-style pre-trained model (CodeBERT) for understanding problems, a GPT-style pre-trained model (CodeGPT) for completion and generation problems, and an Encoder-Decoder framework for sequence-to-sequence generation problems.
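
For illustration, here is a minimal sketch of loading the CodeBERT baseline with the Hugging Face transformers library and encoding a natural language / code pair. It assumes the publicly released `microsoft/codebert-base` checkpoint; it is not the task-specific fine-tuning code from the CodeXGLUE repositories, and the example strings are illustrative.

```python
# Minimal sketch: load the CodeBERT baseline and embed a (NL, code) pair.
# Assumes the public "microsoft/codebert-base" checkpoint on Hugging Face;
# the task-specific fine-tuning code lives in the CodeXGLUE repositories.
from transformers import AutoTokenizer, AutoModel
import torch

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base")

nl = "return the maximum value in a list"
code = "def max_value(xs):\n    return max(xs)"

# CodeBERT is trained on (natural language, code) pairs joined by separator tokens.
inputs = tokenizer(nl, code, return_tensors="pt", truncation=True)
with torch.no_grad():
    outputs = model(**inputs)

# Use the first-token vector as a pooled representation of the pair.
pair_embedding = outputs.last_hidden_state[:, 0, :]
print(pair_embedding.shape)  # torch.Size([1, 768])
```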

Overall Leaderboard

Leaderboard columns: Rank, Model, Organization, Date, a score for each task (Code-Code: clone detection, defect detection, cloze test, code completion, code refinement, code translation; Text-Code: natural language code search, code generation; Code-Text: code summarization; Text-Text: documentation translation), and the aggregate CodeXGLUE Score.

Clone Detection (Code-Code)

Leaderboard columns: Rank, Model, Organization, Date; metrics: Precision, Recall and F1 on BigCloneBench, and MAP@R (%) on POJ-104.
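
For reference, MAP@R on POJ-104 is a retrieval metric: each program is used as a query against the rest of the corpus, R is the number of other programs in the same problem class, and the average precision over the top-R retrieved results is averaged across all queries. The sketch below shows one common way to compute it from a precomputed ranking, following the metric-learning literature; the official evaluator in the task repository is authoritative, and the helper names here are illustrative.

```python
# Illustrative MAP@R computation (not the official CodeXGLUE evaluator).
# `retrieved[q]` is the list of labels of programs ranked by similarity to
# query q (the query itself excluded); `labels[q]` is the query's own label.

def average_precision_at_r(query_label, ranked_labels, r):
    """AP@R: precision is only credited at ranks where a relevant item appears."""
    hits, ap = 0, 0.0
    for i, label in enumerate(ranked_labels[:r], start=1):
        if label == query_label:
            hits += 1
            ap += hits / i
    return ap / r if r > 0 else 0.0

def map_at_r(labels, retrieved):
    scores = []
    for q, query_label in enumerate(labels):
        # R = number of other items sharing the query's class
        r = sum(1 for l in labels if l == query_label) - 1
        scores.append(average_precision_at_r(query_label, retrieved[q], r))
    return sum(scores) / len(scores)

# Tiny example: two classes, perfect retrieval gives MAP@R = 1.0
labels = ["A", "A", "B", "B"]
retrieved = [["A", "B", "B"], ["A", "B", "B"], ["B", "A", "A"], ["B", "A", "A"]]
print(map_at_r(labels, retrieved))  # 1.0
```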

Defect Detection (Code-Code)

Leaderboard columns: Rank, Model, Organization, Date, Accuracy.

Cloze Test (Code-Code)

Leaderboard columns: Rank, Model, Organization, Date; accuracy on ClozeTest-all and ClozeTest-maxmin, each reported overall (All) and per language (Ruby, JavaScript, Go, Python, Java, PHP).

Code Completion (Code-Code)

Leaderboard columns: Rank, Model, Organization, Date; CodeCompletion-line metrics: exact match (EM) and edit similarity (ES) on py150 and java; CodeCompletion-token metric: accuracy (Acc) on py150 and java.
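
For line-level completion, exact match (EM) checks whether the predicted line is identical to the ground truth, and edit similarity (ES) measures character-level similarity between the two strings. A minimal sketch is below; it uses difflib's SequenceMatcher ratio as a stand-in for the similarity measure, while the official evaluator in the task repository defines the exact metrics.

```python
# Illustrative exact match / edit similarity scoring for line completion
# (a sketch; the official CodeXGLUE evaluator defines the exact metrics).
from difflib import SequenceMatcher

def exact_match(pred: str, gold: str) -> float:
    return float(pred.strip() == gold.strip())

def edit_similarity(pred: str, gold: str) -> float:
    # Stand-in similarity measure; reported on a 0-100 scale.
    return 100.0 * SequenceMatcher(None, pred.strip(), gold.strip()).ratio()

preds = ["return max(xs)", "for i in range(n):"]
golds = ["return max(xs)", "for i in range(len(xs)):"]

em = sum(exact_match(p, g) for p, g in zip(preds, golds)) / len(golds)
es = sum(edit_similarity(p, g) for p, g in zip(preds, golds)) / len(golds)
print(f"EM: {em:.2f}  ES: {es:.2f}")
```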

Code Refinement (Code-Code)

Leaderboard columns: Rank, Model, Organization, Date; metrics: BLEU, Acc (%) and CodeBLEU on the small and medium test sets.

Code Translation (Code-Code)

Leaderboard columns: Rank, Model, Organization, Date; metrics: BLEU, Acc (%) and CodeBLEU for Java to C# and C# to Java.

Type Prediction (Code-Code)

Leaderboard columns: Rank, Model, Organization, Date; metrics: Precision, Recall, F1 and Accuracy, reported for Top 100 and Overall.

Natural Language Code Search (Text-Code)

Leaderboard columns: Rank, Model, Organization, Date, Adv Test (MRR), WebQuery Test (Accuracy).
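
Mean reciprocal rank (MRR) on the Adv Test rewards placing the correct code snippet high in the ranked list for each query: each query contributes the reciprocal of the rank of its gold snippet, averaged over all queries. A minimal sketch with an illustrative input format (a list of ranked candidate ids per query plus the gold id) is below; the official evaluator in the task repository is authoritative.

```python
# Illustrative MRR computation (not the official CodeXGLUE evaluator).
# `ranked_ids[q]` is the candidate code ids sorted by model score for query q;
# `gold_ids[q]` is the id of the correct snippet for that query.

def mean_reciprocal_rank(ranked_ids, gold_ids):
    total = 0.0
    for ranked, gold in zip(ranked_ids, gold_ids):
        # Reciprocal rank is 0 if the gold snippet is not retrieved at all.
        rr = 0.0
        if gold in ranked:
            rr = 1.0 / (ranked.index(gold) + 1)
        total += rr
    return total / len(gold_ids)

ranked_ids = [["c3", "c1", "c7"], ["c5", "c2", "c9"]]
gold_ids = ["c1", "c5"]
print(mean_reciprocal_rank(ranked_ids, gold_ids))  # (1/2 + 1/1) / 2 = 0.75
```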

Code Generation (Text-Code)

Leaderboard columns: Rank, Model, Organization, Date; Text2Code Generation metrics: exact match (EM), BLEU and CodeBLEU.

Code Summarization (Code-Text)

Leaderboard columns: Rank, Model, Organization, Date; smoothed BLEU-4, reported overall (All) and per language (Ruby, JavaScript, Go, Python, Java, PHP).
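
Code summarization is scored with a smoothed BLEU between generated and reference docstrings. The sketch below uses NLTK's sentence-level BLEU with smoothing as an approximation; the official evaluator script in the task repository is the reference implementation, and the example strings are illustrative.

```python
# Illustrative smoothed BLEU-4 scoring for code summarization
# (an approximation using NLTK; the official evaluator script is authoritative).
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def smoothed_bleu4(reference: str, hypothesis: str) -> float:
    smooth = SmoothingFunction().method4
    return sentence_bleu(
        [reference.split()], hypothesis.split(),
        weights=(0.25, 0.25, 0.25, 0.25), smoothing_function=smooth,
    )

refs = ["returns the maximum value in the list"]
hyps = ["return the maximum element of a list"]
score = sum(smoothed_bleu4(r, h) for r, h in zip(refs, hyps)) / len(refs)
print(f"smoothed BLEU-4: {100 * score:.2f}")
```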

Documentation Translation (Text-Text)

Leaderboard columns: Rank, Model, Organization, Date; BLEU for each translation direction: EN->DA, EN->LV, EN->NO, EN->ZH, DA->EN, LV->EN, NO->EN, ZH->EN.

CodeXGLUE Submission Instructions

Once you have built a model that meets your expectations on evaluation with the dev set, you can submit your test results to get official evaluation on the test set. To ensure the integrity of the official test results, we do not release the correct answers for the test set to the public. To submit your model for official evaluation on the test set, follow the steps below:

  1. Generate your prediction output for the dev set.
  2. Run the official evaluation methodologies found in the task-specific git repo and verify that your system runs as expected.
  3. Generate your prediction output for the test set.
  4. Submit the following information by email to codexglue@microsoft.com.

Your email should include:

  1. Prediction results on the test set. [Required]
  2. Prediction results on the dev set. [Recommended]
  3. Individual/Team Name: Name of the individual or team to appear on the leaderboard. [Required]
  4. Individual/Team Institution: Name of the institution of the individual or team to appear on the leaderboard. [Optional]
  5. Model code: Training code for the model. [Recommended]
  6. Model information: Name of the model/technique to appear on the leaderboard. [Required]
  7. Paper Information: Name, citation and URL of the paper, if the model is from published work, to appear on the leaderboard. [Optional]

To avoid "p-hacking", we discourage frequent submissions from the same group within a short period of time.

How to cite

To cite CodeXGLUE:

@article{DBLP:journals/corr/abs-2102-04664,
  author = {Shuai Lu and
    Daya Guo and
    Shuo Ren and
    Junjie Huang and
    Alexey Svyatkovskiy and
    Ambrosio Blanco and
    Colin B. Clement and
    Dawn Drain and
    Daxin Jiang and
    Duyu Tang and
    Ge Li and
    Lidong Zhou and
    Linjun Shou and
    Long Zhou and
    Michele Tufano and
    Ming Gong and
    Ming Zhou and
    Nan Duan and
    Neel Sundaresan and
    Shao Kun Deng and
    Shengyu Fu and
    Shujie Liu},
  title = {CodeXGLUE: {A} Machine Learning Benchmark Dataset for Code Understanding
    and Generation},
  journal = {CoRR},
  volume = {abs/2102.04664},
  year = {2021}
}

If you only use part of the CodeXGLUE dataset, please also cite the corresponding individual datasets.