GLUECoS

What is GLUECoS?

GLUECoS is an evaluation benchmark for code-switched NLP. The current version of the benchmark has eleven datasets, spanning six tasks and two language pairs (English-Hindi and English-Spanish). The tasks included in the benchmark are:

Language Identification (LID)
POS Tagging (POS)
Named Entity Recognition (NER)
Sentiment Analysis (SA)
Question Answering (QA)
Natural Language Inference (NLI)
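
As a rough illustration of how the benchmark is organized, the sketch below lists the datasets per task and checks that the totals match the numbers quoted above. The task-to-dataset mapping is an assumption: it is inferred from the column headers of the leaderboard tables, taken in the same order as the task list, and the names GLUECOS_DATASETS, count_datasets and language_pairs are hypothetical and not part of any released code.

```python
# Assumed layout: the task -> dataset mapping is inferred from the
# leaderboard column headers, read in the same order as the task list.
GLUECOS_DATASETS = {
    "LID": ["EN-ES", "EN-HI"],
    "POS": ["EN-ES", "EN-HI FG", "EN-HI UD"],
    "NER": ["EN-ES", "EN-HI"],
    "SA": ["EN-ES", "EN-HI"],
    "QA": ["EN-HI"],
    "NLI": ["EN-HI"],
}


def count_datasets(datasets):
    """Return the total number of (task, dataset) entries."""
    return sum(len(names) for names in datasets.values())


def language_pairs(datasets):
    """Return the set of language pairs, ignoring suffixes such as 'FG' or 'UD'."""
    return {name.split()[0] for names in datasets.values() for name in names}


print(len(GLUECOS_DATASETS), "tasks,", count_datasets(GLUECOS_DATASETS), "datasets")
print(sorted(language_pairs(GLUECOS_DATASETS)))
```

Under this assumed mapping the counts come out to six tasks, eleven datasets and two language pairs, which agrees with the description of the current version.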

GLUECoS is an ongoing effort. We hope to make it an ever-growing test bed for code-mixed language understanding, and we plan to include more tasks and more diverse language pairs in future versions.

Getting Started

Scripts for downloading all the datasets, preprocessing them, and evaluating models on the benchmark are available on our GitHub page.

To be included in the leaderboard, follow the submission instructions in the README on our GitHub page.
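
To give a feel for what evaluating a model on a tagging task involves, here is a minimal sketch of a token-level accuracy computation. It is an illustration only: the metric used by the official scripts for each task is not stated on this page, the function name token_accuracy is hypothetical, and the labels below are made up rather than taken from any benchmark dataset.

```python
from typing import List


def token_accuracy(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Percentage of tokens whose predicted label matches the gold label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of sentences")
    correct = 0
    total = 0
    for g_sent, p_sent in zip(gold, pred):
        if len(g_sent) != len(p_sent):
            raise ValueError("each sentence needs one predicted label per token")
        correct += sum(g == p for g, p in zip(g_sent, p_sent))
        total += len(g_sent)
    # An empty input is scored as 0.0 rather than dividing by zero.
    return 100.0 * correct / total if total else 0.0


# Hypothetical language-ID labels for two code-mixed sentences (not real data).
gold = [["en", "hi", "hi"], ["hi", "en"]]
pred = [["en", "hi", "en"], ["hi", "en"]]
print(f"{token_accuracy(gold, pred):.2f}")
```

For real submissions, rely on the evaluation scripts in the repository rather than on a re-implementation like this one, so that scores stay comparable with the leaderboard.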

Leaderboard

Rank	Date	Team	Model	Average
1	July 18, 2020	Microsoft Research	mBERT	72.80

Language Identification (LID)
Rank	Date	Team	Model	EN-ES	EN-HI
1	July 18, 2020	Microsoft Research	mBERT	96.44	95.18

POS Tagging (POS)
Rank	Date	Team	Model	EN-ES	EN-HI FG	EN-HI UD
1	July 18, 2020	Microsoft Research	mBERT	93.98	64.14	87.68

Named Entity Recognition (NER)
Rank	Date	Team	Model	EN-ES	EN-HI
1	July 18, 2020	Microsoft Research	mBERT	60.66	76.95

Sentiment Analysis (SA)
Rank	Date	Team	Model	EN-ES	EN-HI
1	July 18, 2020	Microsoft Research	mBERT	63.02	57.51

Question Answering (QA)
Rank	Date	Team	Model	EN-HI
1	July 18, 2020	Microsoft Research	mBERT	62.23

Natural Language Inference (NLI)
Rank	Date	Team	Model	EN-HI
1	July 18, 2020	Microsoft Research	mBERT	57.74

Acknowledgments

Simran Khanuja, Microsoft Research India (work done during internship at MSRI)
Anirudh Srinivasan, Microsoft Research India
Sandipan Dandapat, Microsoft IDC
Sunayana Sitaram, Microsoft Research India
Monojit Choudhury, Microsoft Research India
Tanuja Ganu, Microsoft Research India
Kalika Bali, Microsoft Research India

Contact us: If you have any questions, or have worked with code-mixed datasets that could be part of this benchmark, email us at sunayana.sitaram@microsoft.com