Starting with a paper released at NIPS 2016, MS MARCO is a collection of datasets focused on deep learning in search.
The first dataset was a question answering dataset featuring 100,000 real Bing questions with a human-generated answer for each. Since then we have released a 1,000,000-question dataset, a natural language generation dataset, a passage ranking dataset, a keyphrase extraction dataset, a crawling dataset, and a conversational search dataset.
The NLGEN and QnA leaderboards will close on 10/23/2020; see DataSet Retirement for details. If you would like to evaluate a model, please submit before then.
The MS MARCO datasets are intended for non-commercial research purposes only, to promote advancement in the field of artificial intelligence and related areas, and are made available free of charge without extending any license or other intellectual property rights. The dataset is provided “as is” without warranty, and usage of the data carries risks since we may not own the underlying rights in the documents. We are not liable for any damages related to use of the dataset. Feedback is voluntarily given and can be used as we see fit. Upon violation of any of these terms, your rights to use the dataset will end automatically.
Please contact us at ms-marco@microsoft.com if you own any of the documents made available but do not want them in this dataset. We will remove the data accordingly. If you have questions about use of the dataset or any research outputs in your products or services, we encourage you to undertake your own independent legal review. For other questions, please feel free to contact us.
Based on the questions in the Question Answering Dataset and the documents that answered them, a document ranking task was formulated. There are 3.2 million documents, and the goal is to rank them by relevance.
Relevance labels are derived from which passages were marked as containing the answer in the QnA dataset, making this one of the largest relevance datasets ever assembled.
This dataset is the focus of the 2019 and 2020 TREC Deep Learning Tracks and has been used as a teaching aid for the ACM SIGIR/SIGKDD AFIRM Summer School on Machine Learning for Data Mining and Search.
In 2020 we released a set of cleaned and formatted clicks for all documents in the collection. This collection of 20 million clicks is called ORCAS.
Based on the passages and questions in the Question Answering Dataset, a passage ranking task was formulated. There are 8.8 million passages, and the goal is to rank them by relevance.
Relevance labels are derived from which passages were marked as containing the answer in the QnA dataset, making this one of the largest relevance datasets ever assembled.
This dataset is the focus of the 2019 and 2020 TREC Deep Learning Tracks and has been used as a teaching aid for the ACM SIGIR/SIGKDD AFIRM Summer School on Machine Learning for Data Mining and Search.
In 2020 we released a set of cleaned and formatted clicks for all documents in the collection. This collection of 20 million clicks is called ORCAS.
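The leaderboard below reports Mean Reciprocal Rank at cutoff 10 (MRR@10): each query contributes the reciprocal of the rank at which the first relevant passage appears, or zero if none appears in the top 10, averaged over all queries. A minimal sketch of the metric (the function name and input format here are illustrative, not the official evaluation script):

```python
def mrr_at_10(first_relevant_rank):
    """Mean Reciprocal Rank, cut off at rank 10.

    `first_relevant_rank` maps each query id to the 1-based rank of the
    first relevant passage, or None if no relevant passage was retrieved
    in the top 10.
    """
    total = 0.0
    for rank in first_relevant_rank.values():
        if rank is not None and rank <= 10:
            total += 1.0 / rank
    return total / len(first_relevant_rank)

# Three queries: a hit at rank 1, a hit at rank 4, and a miss.
print(mrr_at_10({"q1": 1, "q2": 4, "q3": None}))  # (1 + 0.25 + 0) / 3
```

Systems in the "Full Ranking" setting retrieve from the whole collection, while "ReRanking" systems reorder a provided candidate list; both are scored with the same metric.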
Model | Ranking Style | Submission Date | MRR@10 On Eval | MRR@10 On Dev
---|---|---|---|---
RocketQA + ERNIE Baidu NLP - [Qu et al.] | Full Ranking | September 18th, 2020 | 0.426 | 0.439 | |
UED-Large Anonymous | Full Ranking | August 12th, 2020 | 0.424 | 0.436 | |
DR-BERT X.W. S of Meituan-Dianping NLP-KG Group | Full Ranking | May 20th, 2020 | 0.419 | 0.420 | |
expando-mono-duo-T5 Ronak Pradeep, Rodrigo Nogueira, Zhiying Jiang, and Jimmy Lin - University of Waterloo | Full Ranking | May 19th, 2020 | 0.408 | 0.420 | |
DeepCT + TF-Ranking Ensemble of BERT, ROBERTA and ELECTRA (1) Shuguang Han, (2) Zhuyun Dai, (1) Xuanhui Wang, (1) Michael Bendersky and (1) Marc Najork - (1) Google Research, (2) Carnegie Mellon - Paper and Code | Full Ranking | June 2nd, 2020 | 0.407 | 0.421 | |
UED Anonymous | Full Ranking | May 5th, 2020 | 0.405 | 0.414 | |
UED-Large Anonymous | Full Ranking | August 11th, 2020 | 0.405 | ||
TABLE Model X.W. S of Meituan-Dianping NLP-KG Group | Full Ranking | May 11th, 2020 | 0.401 | 0.412 | |
TABLE Model X.W. S of Meituan-Dianping NLP-KG Group | Full Ranking | January 21st, 2020 | 0.400 | 0.401 | |
TABLE Model X.W. S of Meituan-Dianping NLP-KG Group | Full Ranking | May 8th, 2020 | 0.400 | 0.401 | |
Knowledge Distilled Student + Teacher Ensemble Sebastian Hofstätter, Sophia Althammer, Michael Schröder, Mete Sertkan, Allan Hanbury of TU Wien - [Hofstätter et al., '20] and [Code] | Full Ranking | November 30th, 2020 | 0.399 | 0.407 | |
DeepCT Retrieval + TF-Ranking BERT Ensemble (1) Shuguang Han, (2) Zhuyun Dai, (1) Xuanhui Wang, (1) Michael Bendersky and (1) Marc Najork - (1) Google Research, (2) Carnegie Mellon University - Paper [Han, et al. '20] and Code | Full Ranking | April 10th, 2020 | 0.395 | 0.405 | |
DeepCT + Bart Binsheng Liu - RMIT University | Full Ranking | May 6th, 2020 | 0.394 | 0.408 | |
Enriched BERT base + AOA index + CAS Ming Yan of Alibaba Damo NLP | Full Ranking | August 20th, 2019 | 0.393 | 0.408 | |
TF-Ranking Ensemble of BERT, ROBERTA and ELECTRA (1) Shuguang Han, (2) Zhuyun Dai, (1) Xuanhui Wang, (1) Michael Bendersky and (1) Marc Najork - (1) Google Research, (2) Carnegie Mellon - Paper and Code | ReRanking | June 2nd, 2020 | 0.391 | 0.405 | |
BM25 + Bert-C sookienlane | Full Ranking | February 21st,2019 | 0.388 | 0.394 | |
W-Index retrieval + BERT-F re-rank Zhuyun Dai of Carnegie Mellon University | Full Ranking | September 12th,2019 | 0.388 | 0.394 | |
BM25 + monoT5-3B Ronak Pradeep, Rodrigo Nogueira, Zhiying Jiang, and Jimmy Lin of University of Waterloo | Full Ranking | October 2nd,2020 | 0.388 | 0.398 | |
Enriched BERT base + AOA index V1 Ming Yan of Alibaba Damo NLP | Full Ranking | May 13th, 2019 | 0.383 | 0.397 | |
BERTter pretraining (1)Rodrigo Nogueira, (2)Wei Yang, (3)Jimmy Lin, (4)Kyunghyun Cho - New York University(1,4), University of Waterloo(2,3), Facebook AI Research(4) | Full Ranking | May 21st, 2019 | 0.383 | 0.395 | |
R-Index and R-BERT X.W. S | Full Ranking | January 14th, 2020 | 0.382 | 0.429 | |
Enriched BERT base + AOA index V2 Ming Yan of Alibaba Damo NLP | Full Ranking | May 13th, 2019 | 0.380 | 0.389 | |
BM25 + monoBERT + duoBERT + TCP (1)Rodrigo Nogueira, (2)Wei Yang, (3)Jimmy Lin, (4)Kyunghyun Cho - New York University(1,4), University of Waterloo(2,3), Facebook AI Research(4) [Nogueira, et al. '19] and Code | Full Ranking | June 26th, 2019 | 0.379 | 0.390 | |
BM25 + Electra Large OpenMatch - THU-MSR - [Code] | ReRanking | August 13th, 2020 | 0.376 | 0.388 | |
BERT^2 (1)Rodrigo Nogueira, (2)Wei Yang, (3)Jimmy Lin, (4)Kyunghyun Cho - New York University(1,4), University of Waterloo(2,3), Facebook AI Research(4) | Full Ranking | May 13th, 2019 | 0.375 | 0.386 | |
TF-Ranking + BERT (Ensemble of pointwise, pairwise and listwise losses) TF-Ranking team (Shuguang Han, Xuanhui Wang, Michael Bendersky and Marc Najork) of Google Research - Paper [Han, et al. '20] and [Code] | ReRanking | March 30th, 2020 | 0.375 | 0.388 | |
BM25 + Roberta Large OpenMatch - THU-MSR - [Code] | ReRanking | August 13th, 2020 | 0.375 | 0.386 | |
Enriched BERT base + AOA index Ming Yan of Alibaba Damo NLP | Full Ranking | May 6th, 2019 | 0.373 | 0.387 | |
BM25 + monoBERT + duoBERT (1)Rodrigo Nogueira, (2)Wei Yang, (3)Jimmy Lin, (4)Kyunghyun Cho - New York University(1,4), University of Waterloo(2,3), Facebook AI Research(4) [Nogueira, et al. '19] | Full Ranking | June 26th, 2019 | 0.370 | 0.382 | |
ReinforcedQGen+BERTRank Rajarshee Mitra of Microsoft STCI | Full Ranking | August 5th, 2019 | 0.369 | - | |
BERTter Indexing (1)Rodrigo Nogueira, (2)Wei Yang, (3)Jimmy Lin, (4)Kyunghyun Cho - New York University(1,4), University of Waterloo(2,3), Facebook AI Research(4) [Nogueira et al. '19] and [Code] | Full Ranking | April 8th, 2019 | 0.368 | 0.375 | |
Enriched BERT base + AOA index Ming Yan of Alibaba Damo NLP | ReRanking | May 6th, 2019 | 0.368 | 0.373 | |
ELECTRA-Large Jheng-Hong Yang(1), Sheng-Chieh Lin(1), Rodrigo Nogueira(2), Jimmy Lin(2) - Academia Sinica(1), University of Waterloo(2) | ReRanking | March 23rd, 2020 | 0.367 | 0.376 | |
TF-Ranking + BERT (Softmax Loss, List size 6, 200k steps) TF-Ranking team (Shuguang Han, Xuanhui Wang, Michael Bendersky and Marc Najork) of Google Research - Paper [Han, et al. '20] and [Code] | ReRanking | March 16th, 2020 | 0.366 | 0.378 | |
BM25 + monoBERT (1)Rodrigo Nogueira, (2)Wei Yang, (3)Jimmy Lin, (4)Kyunghyun Cho - New York University(1,4), University of Waterloo(2,3), Facebook AI Research(4) [Nogueira, et al. '19] | Full Ranking | June 26th, 2019 | 0.365 | 0.372 | |
BERT base + attention ranking anonymous | ReRanking | August 26th, 2019 | 0.364 | 0.377 | |
BERT + Small Training Rodrigo Nogueira(1) and Kyunghyun Cho(2) - New York University(1,2), Facebook AI Research(2) [Nogueira, et al. '19] and [Code] | ReRanking | January 7th, 2019 | 0.359 | 0.365 | |
SAN + BERT base Yu Wang, Xiaodong Liu, Jianfeng Gao - Deep Learning Group, Microsoft Research AI [Xiaodong, et al. '18] | ReRanking | January 22nd, 2019 | 0.359 | 0.370 | |
BERT + Projected Matching Yifan Qiao(1), Chenyan Xiong(2), Zhenghao Liu(3), Zhiyuan Liu(4) - Tsinghua University(1,3,4), Microsoft Research(2) [Qiao et al. '19] | ReRanking | February 7th, 2019 | 0.356 | - | |
BERT base + L2R Ming Yan of Alibaba Damo NLP | ReRanking | March 16th,2019 | 0.356 | 0.364 | |
LBERT-base anonymous | ReRanking | March 1st, 2019 | 0.349 | - | |
BERT-Base Jingtao Zhan, Jiaxin Mao, Yiqun Liu, Min Zhang, Shaoping Ma of the Information Retrieval Group, Tsinghua University | ReRanking | April 8th, 2020 | 0.349 | 0.358 | |
BERT base + attention ranking anonymous | ReRanking | March 1st, 2019 | 0.347 | 0.317 | |
BERT + Small Training Xue-He Wang, Chia-Hung Yuan, Bing-Han Chiang, Dong-Ze Wu, Lu-Dan Ruan, Shan-Hung Wu of National Tsing Hua University | ReRanking | June 20th, 2019 | 0.347 | 0.361 | |
WAND (BM25) retrieval (text only), re-ranking 1K with ColBERT (bert medium, dim=32, cosine) using Vespa.ai Jo Kristian Bergum - Vespa.ai - [Code] | Full Ranking | January 13th, 2021 | 0.347 | 0.354 | |
BERT-base +ranking loss + horovod Milk&Cereal | ReRanking | May 6th, 2019 | 0.346 | 0.352 | |
BERT-base fine-tune ICT-NLU | ReRanking | May 23rd, 2019 | 0.346 | 0.349 | |
BERT, Roberta, Electra, Anserini, DeepCT retrieval models (ensembled) Leonid Pugachev, DeepPavlov- Moscow Institute of Physics and Technology | ReRanking | July 20th, 2020 | 0.346 | 0.394 | |
BM25 + BERT Base OpenMatch - THU-MSR - [Code] | ReRanking | August 7th, 2020 | 0.345 | 0.349 | |
BERT base + attention ranking anonymous | ReRanking | March 11th, 2019 | 0.344 | - | |
BM25 + Electra Base OpenMatch - THU-MSR - [Code] | ReRanking | August 7th, 2020 | 0.344 | 0.352 | |
BERT base + attention ranking anonymous | ReRanking | March 4th, 2019 | 0.343 | - | |
Bert-base + hinge ranking loss Milk&Cereal | ReRanking | April 24th, 2019 | 0.342 | 0.345 | |
BERT + L2R ICT-NLU | ReRanking | June 11th, 2019 | 0.342 | 0.348 | |
BERT-LLR Jingtao Zhan, Jiaxin Mao, Yiqun Liu, Min Zhang, Shaoping Ma of the Information Retrieval Group, Tsinghua University | ReRanking | April 6th, 2020 | 0.342 | 0.352 | |
BERT-RI Jingtao Zhan, Jiaxin Mao, Yiqun Liu, Min Zhang, Shaoping Ma of the Information Retrieval Group, Tsinghua University | ReRanking | April 7th, 2020 | 0.340 | 0.352 | |
BERT+ENA Di Zhao, Hui Fang, UD Infolab | ReRanking | August 11th, 2019 | 0.339 | - | |
BERT Base + Highway + Cross Entropy Loss + Axioms Di Zhao, Hui Fang, UD Infolab | ReRanking | August 9th, 2019 | 0.336 | 0.340 | |
BERT Base + Highway+Cross Entropy Loss + Axioms Di Zhao, Hui Fang, UD Infolab | ReRanking | August 11th, 2019 | 0.336 | - | |
BERT Base OpenMatch of THU-MSR - Code | ReRanking | July 28th, 2020 | 0.336 | - | |
ME-Hybrid Google Research | Full Ranking | August 18th, 2020 | 0.336 | 0.343 | |
BERT base + attention ranking anonymous | ReRanking | March 2nd, 2019 | 0.335 | - | |
BERT Base Finetuned 400k steps Chaitanya Sai Alaparthi of IIIT-Hyderabad | ReRanking | February 13th, 2020 | 0.335 | - | |
BERT + CNN Chia-Hung Yuan, Bing-Han Chiang, Xue-He Wang, Dong-Ze Wu, Lu-Dan Ruan, Shan-Hung Wu of National Tsing Hua University | ReRanking | June 15th, 2019 | 0.333 | 0.346 | |
BERT + Multilayer Interaction Yifan Qiao(1), Chenyan Xiong(2), Zhenghao Liu(3), Zhiyuan Liu(4) - Tsinghua University(1,3,4), Microsoft Research(2) [Qiao et al. '19] | ReRanking | February 19th, 2019 | 0.329 | 0.311 | |
BERT base + ranking Yifan Qiao(1), Chenyan Xiong(2), Zhenghao Liu(3), Zhiyuan Liu(4) - Tsinghua University(1,3,4), Microsoft Research(2) [Qiao et al. '19] | ReRanking | February 8th, 2019 | 0.326 | 0.316 | |
BERT Base + Highway + Ranking Loss Di Zhao, Hui Fang, UD Infolab | ReRanking | August 9th, 2019 | 0.323 | - | |
ME-BERT Google Research | Full Ranking | August 10th, 2020 | 0.323 | 0.334 | |
RDBERT-Embedding Jingtao Zhan, Jiaxin Mao, Yiqun Liu, Min Zhang, Shaoping Ma - Information Retrieval Group, Tsinghua University | ReRanking | May 13th, 2020 | 0.313 | 0.320 | |
FastText + Conv-KNRM (Ensemble) Sebastian Hofstätter (1), Navid Rekabsaz (2), Carsten Eickhoff (3), and Allan Hanbury (1) - TU Wien(1), Idiap Research Institute(2), Brown University(3) [ Hofstätter et al. '19] and [Code] | ReRanking | May 8th, 2019 | 0.309 | 0.318 | |
biLSTM + Co-attention on n-grams + query-based scorer Chaitanya Sai Alaparthi-IIIT-Hyderabad | ReRanking | June 16th,2020 | 0.309 | 0.319 | |
DE-Hybrid Google Research | Full Ranking | September 18th, 2020 | 0.306 | 0.309 | |
BiLSTM + Co-attention on n-grams Chaitanya Sai Alaparthi of IIIT-Hyderabad | ReRanking | May 14th, 2020 | 0.299 | 0.310 | |
n-gram co-attention Yon | ReRanking | July 23rd, 2020 | 0.299 | 0.303 | |
DE-BERT Google Research | Full Ranking | July 31st, 2020 | 0.295 | 0.302 | |
RepBERT Jingtao Zhan, Jiaxin Mao, Yiqun Liu, Min Zhang, Shaoping Ma - IR Group, Tsinghua University - [Zhan et al '20] and Code | Full Ranking | June 23rd, 2020 | 0.294 | 0.304 | |
BiLSTM + Co-Attention + self attention based document scorer Chaitanya Sai Alaparthi - IIIT-Hyderabad - [Alaparthi et al '19] | ReRanking | April 29th, 2020 | 0.291 | 0.298 | |
docTTTTTquery + T5QLM IELab - The University of Queensland | Full Ranking | September 3rd, 2020 | 0.289 | 0.300 | |
BiLSTM + CoAttention Chaitanya Sai Alaparthi - IIIT-Hyderabad [Alaparthi et al '19] | ReRanking | April 13th, 2020 | 0.286 | 0.288 | |
IRNet (Deep CNN/IR Hybrid Network) Dave DeBarr, Navendu Jain, Robert Sim, Justin Wang, Nirupama Chandrasekaran – Microsoft | ReRanking | January 2nd, 2019 | 0.281 | 0.278 | |
FastText + Conv-KNRM (Single) Sebastian Hofstätter (1), Navid Rekabsaz (2), Carsten Eickhoff (3), and Allan Hanbury (1) - TU Wien(1), Idiap Research Institute(2), Brown University(3) [ Hofstätter et al. '19] and [Code] | ReRanking | May 8th, 2019 | 0.277 | 0.290 | |
docTTTTTquery Rodrigo Nogueira (Epistemic AI), Jimmy Lin (University of Waterloo) [Paper] and [Code] | Full Ranking | November 27th, 2019 | 0.272 | 0.277 | |
Neural Kernel Match IR (Conv-KNRM) (Ensembled) (1)Yifan Qiao, (2)Chenyan Xiong, (3)Zhenghao Liu, (4)Zhiyuan Liu - Tsinghua University(1, 3, 4); Microsoft Research AI(2) [Dai et al. '18] | ReRanking | November 28th, 2018 | 0.271 | 0.290 | |
Axiom-Regularized Conv-KNRM Corby Rosset, Bhaskar Mitra, Chenyan Xiong, Nick Craswell, Xia Song, Saurabh Tiwary - Microsoft AI & Research[Rosset et al. '19] | ReRanking | February 19, 2019 | 0.263 | 0.262 | |
α-SVS NTT Media Intelligence Laboratories | Full Ranking | June 29th, 2020 | 0.262 | 0.259 | |
Encoder-Decoder model with attention + multi loss Youngjin Jang | ReRanking | June 3rd, 2020 | 0.261 | 0.273 | |
R3D anonymous | ReRanking | March 4th, 2020 | 0.260 | 0.243 | |
BERT, Roberta, Electra, Anserini, DeepCT retrieval models (ensembled) Leonid Pugachev, DeepPavlov- Moscow Institute of Physics and Technology | ReRanking | June 23rd, 2020 | 0.259 | 0.263 | |
[Official Baseline] Duet V2 (Ensembled) Bhaskar Mitra, Fernando Diaz, Nick Craswell - Microsoft AI & Research [Mitra et al. '19] and [Code] | ReRanking | February 19, 2019 | 0.253 | 0.252 | |
Duet with query term independence assumption (Single) Bhaskar Mitra (1, 2), Corby Rosset (1), David Hawking (1), Nick Craswell (1), Fernando Diaz (1), and Emine Yilmaz (2) of (1) Microsoft & (2) UCL Paper | ReRanking | March 14th, 2019 | 0.252 | 0.254 | |
Neural Kernel Match IR (Conv-KNRM) (Single)(1)Yifan Qiao, (2)Chenyan Xiong, (3)Zhenghao Liu, (4)Zhiyuan Liu-Tsinghua University(1, 3, 4); Microsoft Research AI(2) [Dai et al. '18] | ReRanking | February 19, 2019 | 0.247 | 0.247 | |
BM25 (Anserini) + ALBERT Bi-encoder for First-stage Ranking Marco Wrzalik of the LAVIS Group at RheinMain University of Applied Sciences | Full Ranking | April 24th, 2020 | 0.247 | 0.249 | |
[Official Baseline] Duet V2 (Single) Bhaskar Mitra, Fernando Diaz, Nick Craswell - Microsoft AI & Research [Mitra et al. '19] and [Code] | ReRanking | February 20, 2019 | 0.245 | 0.243 | |
DW Index + BM25 anonymous | Full Ranking | April 29th, 2019 | 0.239 | 0.243 | |
BERT Base + Highway + Cross Entropy Loss + Axioms Di Zhao, Hui Fang, UD Infolab | ReRanking | August 5th, 2019 | 0.223 | 0.340 | |
BERT Base + Highway + Ranking Loss Di Zhao, Hui Fang, UD Infolab | ReRanking | August 5th, 2019 | 0.222 | 0.340 | |
BM25 (Anserini) + doc2query (1)Rodrigo Nogueira, (2)Wei Yang, (3)Jimmy Lin, (4)Kyunghyun Cho - New York University(1,4), University of Waterloo(2,3), Facebook AI Research(4) [Nogueira et al. '19] and [Code] | Full Ranking | April 10th, 2019 | 0.218 | 0.215 | |
Neural Kernel Match IR (Conv-KNRM) (Ensembled) (1)Yifan Qiao, (2)Chenyan Xiong, (3)Zhenghao Liu, (4)Zhiyuan Liu - Tsinghua University(1, 3, 4); Microsoft Research AI(2) [Dai et al. '18] | ReRanking | November 26th, 2018 | 0.199 | 0.199 | |
Neural Kernel Match IR (KNRM) (1)Yifan Qiao, (2)Chenyan Xiong, (3)Zhenghao Liu, (4)Zhiyuan Liu - Tsinghua University(1, 3, 4); Microsoft Research AI(2) [Xiong et al. '17] | ReRanking | December 10th, 2018 | 0.198 | 0.218 | |
Feature-based LeToR: simple-feature based RankSVM(1)Yifan Qiao, (2)Chenyan Xiong, (3)Zhenghao Liu, (4)Zhiyuan Liu-Tsinghua University(1, 3, 4); Microsoft Research AI(2) | ReRanking | December 10th, 2018 | 0.191 | 0.195 | |
BM25 (Lucene8, tuned) (1)Rodrigo Nogueira, (2)Wei Yang, (3)Jimmy Lin, (4)Kyunghyun Cho - New York University(1,4), University of Waterloo(2,3), Facebook AI Research(4) [Nogueira, et al. '19] | Full Ranking | June 26th, 2019 | 0.190 | 0.187 | |
BM25 (Anserini) (1)Rodrigo Nogueira, (2)Wei Yang, (3)Jimmy Lin, (4)Kyunghyun Cho - New York University(1,4), University of Waterloo(2,3), Facebook AI Research(4) [Nogueira et al. '19] and [Code] | Full Ranking | April 10th, 2019 | 0.186 | 0.184 | |
Unnamed Hongyin Zhu | ReRanking | June 26th, 2019 | 0.174 | - | |
[Official Baseline] BM25 Stephen E. Robertson; Steve Walker; Susan Jones; Micheline Hancock-Beaulieu & Mike Gatford (Implemented by MSMARCO Team) [Robertson et al. '94] | Full Ranking | November 1st, 2018 | 0.165 | 0.167 | |
FastMatch Anonymous | ReRanking | August 17th, 2020 | 0.154 | 0.329 | |
BERT Representation Yifan Qiao(1), Chenyan Xiong(2), Zhenghao Liu(3), Zhiyuan Liu(4) - Tsinghua University(1,3,4), Microsoft Research(2) [Qiao et al. '19] | ReRanking | February 19th, 2019 | 0.015 | 0.043 |
Keyphrase extraction on open-domain documents is an emerging area that can support many NLP tasks such as document ranking and topic clustering. To enable the research community to build performant keyphrase extraction systems, we have built OpenKP, a human-annotated collection of keyphrases over a wide variety of documents.
The dataset features 148,124 real-world web documents along with human annotations indicating the 1-3 most relevant keyphrases for each. More information about the dataset and our initial experiments can be found in the paper Open Domain Web Keyphrase Extraction Beyond Language Modeling, which was an oral presentation at EMNLP-IJCNLP 2019. It is part of the MSMARCO dataset family, and research projects like this power the core document understanding pipeline that Bing uses.
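The leaderboard below reports F1 at cutoffs 1, 3, and 5, comparing the top-k predicted keyphrases against the human-annotated set. A minimal exact-match sketch (illustrative only; the official evaluation may normalize phrases, e.g. lowercasing or stemming, before matching):

```python
def f1_at_k(predicted, gold, k):
    """F1 between the top-k predicted keyphrases and the gold set,
    using exact string matching."""
    topk = predicted[:k]
    hits = len(set(topk) & set(gold))
    if hits == 0:
        return 0.0
    precision = hits / len(topk)
    recall = hits / len(gold)
    return 2 * precision * recall / (precision + recall)

# Hypothetical predictions ranked by confidence, against two gold phrases.
preds = ["keyphrase extraction", "bert", "open domain", "ranking", "nlp"]
gold = ["keyphrase extraction", "open domain"]
print(f1_at_k(preds, gold, 3))
```

The @1/@3/@5 scores are each averaged over all documents in the evaluation set.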
Model | Submission Date | F1 @1,@3,@5
---|---|---
ETC-large anonymous | May 31st, 2020 | 0.393, 0.420, 0.360 | |
RoBERTa-JointKPE (Base) Si Sun(1), Chenyan Xiong(2), Zhenghao Liu(3), Zhiyuan Liu(4), Jie Bao(5) - Tsinghua University(1,3,4,5), MSR AI(2)- [Sun et al '20] and [Code] | February 6th, 2020 | 0.364, 0.391, 0.338 | |
RoBERTa-RankKPE (Base) Si Sun(1), Chenyan Xiong(2), Zhenghao Liu(3), Zhiyuan Liu(4), Jie Bao(5) - Tsinghua University(1,3,4,5), MSR AI(2)- [Sun et al '20] and [Code] | February 6th, 2020 | 0.361, 0.390, 0.337 | |
SpanBERT-JointKPE (Base) Si Sun(1), Chenyan Xiong(2), Zhenghao Liu(3), Zhiyuan Liu(4), Jie Bao(5) - Tsinghua University(1,3,4,5), MSR AI(2)- [Sun et al '20] and [Code] | February 6th, 2020 | 0.359, 0.385, 0.335 | |
RoBERTa-TagKPE (Base) Si Sun(1), Chenyan Xiong(2), Zhenghao Liu(3), Zhiyuan Liu(4), Jie Bao(5) - Tsinghua University(1,3,4,5), MSR AI(2)- [Sun et al '20] and [Code] | February 6th, 2020 | 0.356, 0.381, 0.332 | |
SpanBERT-RankKPE (Base) Si Sun(1), Chenyan Xiong(2), Zhenghao Liu(3), Zhiyuan Liu(4), Jie Bao(5) - Tsinghua University(1,3,4,5), MSR AI(2)- [Sun et al '20] and [Code] | February 6th, 2020 | 0.355, 0.380, 0.331 | |
BERT-JointKPE (Base) Si Sun(1), Chenyan Xiong(2), Zhenghao Liu(3), Zhiyuan Liu(4), Jie Bao(5) - Tsinghua University(1,3,4,5), MSR AI(2)- [Sun et al '20] and [Code] | February 6th, 2020 | 0.349, 0.376, 0.325 | |
SpanBERT-TagKPE (Base) Si Sun(1), Chenyan Xiong(2), Zhenghao Liu(3), Zhiyuan Liu(4), Jie Bao(5) - Tsinghua University(1,3,4,5), MSR AI(2)- [Sun et al '20] and [Code] | February 6th, 2020 | 0.351, 0.374, 0.325 | |
BERT-RankKPE (Base) Si Sun(1), Chenyan Xiong(2), Zhenghao Liu(3), Zhiyuan Liu(4), Jie Bao(5) - Tsinghua University(1,3,4,5), MSR AI(2)- [Sun et al '20] and [Code] | February 6th, 2020 | 0.342, 0.374, 0.325 | |
RoBERTa-ChunkKPE (Base) Si Sun(1), Chenyan Xiong(2), Zhenghao Liu(3), Zhiyuan Liu(4), Jie Bao(5) - Tsinghua University(1,3,4,5), MSR AI(2)- [Sun et al '20] and [Code] | February 6th, 2020 | 0.355, 0.373, 0.324 | |
SpanBERT-ChunkKPE (Base) Si Sun(1), Chenyan Xiong(2), Zhenghao Liu(3), Zhiyuan Liu(4), Jie Bao(5) - Tsinghua University(1,3,4,5), MSR AI(2)- [Sun et al '20] and [Code] | February 6th, 2020 | 0.348, 0.372, 0.324 | |
BERT-TagKPE (Base) Si Sun(1), Chenyan Xiong(2), Zhenghao Liu(3), Zhiyuan Liu(4), Jie Bao(5) - Tsinghua University(1,3,4,5), MSR AI(2)- [Sun et al '20] and [Code] | February 6th, 2020 | 0.343, 0.364, 0.318 | |
BERT (Base) Sequence Tagging Baseline Si Sun (Tsinghua University), Chenyan Xiong (MSR AI), Zhiyuan Liu (Tsinghua University) [Code] | November 5th, 2019 | 0.321, 0.361, 0.314 | |
BERT-ChunkKPE (base) Si Sun(1), Chenyan Xiong(2), Zhenghao Liu(3), Zhiyuan Liu(4), Jie Bao(5) - Tsinghua University(1,3,4,5), MSR AI(2)- [Sun et al '20] and [Code] | February 6th, 2020 | 0.340, 0.355, 0.311 | |
SpanBERT-SpanKPE (base)Si Sun(1), Chenyan Xiong(2), Zhenghao Liu(3), Zhiyuan Liu(4), Jie Bao(5) - Tsinghua University(1,3,4,5), MSR AI(2)- [Sun et al '20] and [Code] | February 6th, 2020 | 0.329, 0.351, 0.304 | |
RoBERTa-SpanKPE (base) Si Sun(1), Chenyan Xiong(2), Zhenghao Liu(3), Zhiyuan Liu(4), Jie Bao(5) - Tsinghua University(1,3,4,5), MSR AI(2)- [Sun et al '20] and [Code] | February 6th, 2020 | 0.330, 0.350, 0.305 | |
LLbeBack Rodrigo Nogueira (Epistemic AI), Jimmy Lin (University of Waterloo) | November 19th, 2019 | 0.349, 0.341, 0.246 | |
BERT-SpanKPE (base) Si Sun(1), Chenyan Xiong(2), Zhenghao Liu(3), Zhiyuan Liu(4), Jie Bao(5) - Tsinghua University(1,3,4,5), MSR AI(2)- [Sun et al '20] and [Code] | February 6th, 2020 | 0.317, 0.332, 0.289 | |
Baseline finetuned on Bing Queries MSMARCO Team [Xiong, et al. '19] | October 19th, 2019 | 0.267, 0.292, 0.209 | |
Baseline MSMARCO Team [Xiong, et al. '19] | October 19th, 2019 | 0.244, 0.277, 0.198 |
The original focus of MSMARCO was to provide a corpus for training and testing systems that, given a real user query, provide the most likely candidate answer in language that is natural and conversational.
This data comes in three tasks/forms: the original QnA dataset (v1.1), Question Answering (v2.1), and Natural Language Generation (v2.1). The original question answering dataset featured 100,000 examples and was released in 2016. Its leaderboard is now closed, but the data is available below.
The current competitive tasks are Question Answering and Natural Language Generation. Question Answering features over 1,000,000 queries and is much like the original QnA dataset, but bigger and with higher quality. The Natural Language Generation dataset features 180,000 examples and builds upon the QnA dataset to deliver answers that could be spoken by a smart speaker.
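The leaderboards below score generated answers against the human-written references with ROUGE-L and Bleu-1. ROUGE-L is an F-measure over the longest common subsequence (LCS) between candidate and reference. A minimal single-reference sketch, assuming whitespace tokenization (the official evaluation script handles tokenization and multiple references differently):

```python
def lcs_len(a, b):
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[len(a)][len(b)]

def rouge_l(candidate, reference, beta=1.2):
    """ROUGE-L F-measure over whitespace tokens (single reference)."""
    c, r = candidate.split(), reference.split()
    lcs = lcs_len(c, r)
    if lcs == 0:
        return 0.0
    prec, rec = lcs / len(c), lcs / len(r)
    return (1 + beta**2) * prec * rec / (rec + beta**2 * prec)

print(rouge_l("the cat sat on the mat", "the cat is on the mat"))
```

Bleu-1 is the complementary precision-oriented measure over unigrams, with a brevity penalty for candidates shorter than the reference.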
Model | Submission Date | Rouge-L | Bleu-1
---|---|---|---
Multi-doc Enriched BERT Ming Yan of Alibaba Damo NLP | June 20th, 2019 | 0.540 | 0.565 | |
Human Performance | April 23rd, 2018 | 0.539 | 0.485 | |
BERT Encoded T-Net Y. Zhang, C. Wang, X.L. Chen | August 5th, 2019 | 0.526 | 0.539 | |
Selector+Combine-Content-Generator QA Model Shengjie Qian of Caiyun xiaoyi AI and BUPT | March 19th, 2019 | 0.525 | 0.544 | |
LM+Generator Alibaba Damo NLP | November 25th,2019 | 0.522 | 0.516 | |
Masque Q&A Style NTT Media Intelligence Laboratories [Nishida et al. '19] | January 3rd, 2019 | 0.522 | 0.437 | |
Deep Cascade QA Ming Yan of Alibaba Damo NLP [Yan et al. '18] | December 12th, 2018 | 0.520 | 0.546 | |
Unnamed anonymous | December 9th,2019 | 0.518 | 0.507 | |
PALM Alibaba Damo NLP | December 9th,2019 | 0.518 | 0.507 | |
VNET Baidu NLP [Wang et al. '18] | November 8th, 2018 | 0.516 | 0.543 | |
LNET S.L. Liu of NEUKG | April 8th, 2020 | 0.514 | 0.553 | |
MultiLM QnA Model anonymous | December 2nd, 2019 | 0.514 | 0.498 | |
LNET S.L. Liu of NEUKG | March 23rd, 2020 | 0.506 | 0.542 | |
BERT Encoded T-NET Y. Zhang, C. Wang, X.L. Chen | July 12th, 2019 | 0.506 | 0.525 | |
MultiLM QnA Model anonymous | December 5th, 2019 | 0.499 | 0.430 | |
BERT+ Multi-Pointer-Generator Tongjun Li of the ColorfulClouds Tech and BUPT | June 11th, 2019 | 0.498 | 0.525 | |
Selector+Combine-Content-Generator NL Model Shengjie Qian of Caiyun xiaoyi AI and BUPT | March 11th, 2019 | 0.496 | 0.535 | |
REAG Anonymous | March 27th, 2020 | 0.495 | 0.500 | |
CompLM Alibaba Damo NLP | December 2nd, 2019 | 0.495 | 0.516 | |
LM+Generator anonymous | November 21st,2019 | 0.494 | 0.529 | |
PALM Alibaba Damo NLP | December 9th,2019 | 0.492 | 0.510 | |
anonymous anonymous | December 16th,2019 | 0.492 | 0.499 | |
LNET S.L. Liu of the NEUKG | November 19th, 2019 | 0.491 | 0.530 | |
BERT+ Multi-Pointer-Generator Tongjun Li of the ColorfulClouds Tech and BUPT | May 21st, 2019 | 0.491 | 0.520 | |
MUSST-NLG Anonymous | May 15th, 2020 | 0.490 | 0.516 | |
CompLM Alibaba Damo NLP | December 3rd, 2019 | 0.490 | 0.502 | |
Masque NLGEN Style NTT Media Intelligence Laboratories [Nishida et al. '19] | January 3rd, 2019 | 0.489 | 0.488 | |
roberta_T_tlcd_18k Anonymous | May 14th, 2020 | 0.483 | 0.516 | |
Communicating BERT Xuan Liang of RIDLL from the University of Technology Sydney | October 4th, 2019 | 0.483 | 0.506 | |
MDC-Generator Ssk-nlp | April 23rd, 2020 | 0.482 | 0.516 | |
MultiLM NLGen Model anonymous | December 2nd, 2019 | 0.482 | 0.514 | |
LM+Generator anonymous | November 19th,2019 | 0.478 | 0.481 | |
MultiLM NLGen Model anonymous | December 5th, 2019 | 0.475 | 0.479 | |
BERT + Transfer anonymous | October 16th, 2019 | 0.474 | 0.499 | |
Bert Based Multi-task ZhangY & WangC | June 26th, 2019 | 0.471 | 0.512 | |
T-RoBERTa-wf-BERTbaseA-120k Anonymous | February 13th, 2020 | 0.471 | 0.483 | |
BERT-SS-K1-100k Anonymous | January 26th, 2020 | 0.470 | 0.493 | |
T-RoBERTa-wf-BERTbaseA-80k Anonymous | February 21st, 2020 | 0.468 | 0.500 | |
Multi-passage QA Model SudaNLP | October 21st, 2020 | 0.466 | 0.508 | |
BERT-SS-K1-100k Anonymous | February 2nd, 2020 | 0.464 | 0.485 | |
BERT-RGLM Anonymous | April 22nd, 2020 | 0.457 | 0.479 | |
REAG Anonymous | May 28th, 2020 | 0.456 | 0.449 | |
SNET + CES2S Bo Shao of SYSU University | July 24th, 2018 | 0.450 | 0.464 | |
ranking+nlg anonymous | October 9th, 2019 | 0.449 | 0.468 | |
ranker-reader RCZoo of UCAS | May 15th, 2019 | 0.441 | 0.371 | |
Extraction-net zlsh80826 | October 20th, 2018 | 0.437 | 0.444 | |
SNET JY Zhao | August 30th, 2018 | 0.436 | 0.463 | |
BIDAF+ELMo+SofterMax Wang Changbao | November 16th, 2018 | 0.436 | 0.459 | |
ranking+nlg anonymous | August 12th, 2019 | 0.434 | 0.411 | |
DNET QA Geeks | August 1st, 2018 | 0.432 | 0.479 | |
T-RoBERTa-wf-BERTbaseA-120k Anonymous | February 13th, 2020 | 0.431 | 0.424 | |
KIGN-QA Chenliang Li | April 22nd, 2019 | 0.429 | 0.404 | |
MaRCo-da-GAAMA IBM Research AI Multilingual NLP Group | April 7th, 2020 | 0.426 | 0.462 | |
Reader-Writer Microsoft Business Applications Group AI Research | September 16th, 2018 | 0.421 | 0.436 | |
Masque2 (single / NLG Style) NTT Media Intelligence Laboratories | October 22nd, 2020 | 0.419 | 0.469 | |
BERT+Multi-Loss S.L. Liu of NEUKG | November 4th, 2019 | 0.413 | 0.422 | |
REAG (based on PALM) anonymous | June 1st, 2020 | 0.410 | 0.430 | |
RGLM anonymous | May 5th, 2020 | 0.406 | 0.455 | |
SNET+seq2seq Yihan Ni of the CAS Key Lab of Web Data Science and Technology, ICT, CAS | June 1st, 2018 | 0.398 | 0.423 | |
SSK3+BERTBaseAnswerGenerator anonymous | January 21st, 2020 | 0.391 | 0.413 | |
MP-MRC BERT H.Y. Zhang | August 27th, 2020 | 0.389 | 0.410 | |
MP-MRC BERT-base H.Y. Zhang | September 4th, 2020 | 0.388 | 0.411 | |
MUSST anonymous | March 31st, 2020 | 0.376 | 0.405 | |
Anonymous anonymous | October 12th, 2020 | 0.359 | 0.409 | |
fj-net(single) yzm nlp group | August 3rd, 2020 | 0.343 | 0.409 | |
MNet-Base(Single) NLGEN fuii of iDW | July 8th, 2020 | 0.337 | 0.405 | |
fj-reader(single) yzm nlp group | July 28th, 2020 | 0.336 | 0.404 | |
Generation with latent retrieval per answer anonymous | May 11th, 2020 | 0.335 | 0.290 | |
MDCG-Base ssk-nlp | June 8th, 2020 | 0.334 | 0.398 | |
MUSST-NLG Anonymous | June 2nd, 2020 | 0.334 | 0.388 | |
MDCC-Base ssk-nlp | June 10th, 2020 | 0.333 | 0.400 | |
Generation with latent retrieval Baseline 2 anonymous | May 11th, 2020 | 0.331 | 0.307 | |
MDCC ssk-nlp | June 10th, 2020 | 0.328 | 0.391 | |
Generation with latent retrieval Baseline 1 anonymous | May 11th, 2020 | 0.305 | 0.275 | |
MultiTask+DataAug+Unlikelihood UvA | June 3rd, 2020 | 0.300 | 0.332 | |
MUSST-QA Anonymous | June 1st, 2020 | 0.298 | 0.354 | |
lightNLP+BiDAF Enliple AI | February 1st, 2019 | 0.298 | 0.156 | |
Pretrained seq2seq model BDEG | September 10th, 2020 | 0.290 | 0.331 | |
roberta_T_tlx_90k Anonymous | July 29th, 2020 | 0.286 | 0.327 | |
BIDAF+seq2seq Yihan Ni of the CAS Key Lab of Web Data Science and Technology, ICT, CAS | May 29th, 2018 | 0.276 | 0.288 | |
BiDaF Baseline (Implemented By MSMARCO Team) Allen Institute for AI & University of Washington [Seo et al. '16] | April 23rd, 2018 | 0.240 | 0.106 | |
TrioNLP + BiDAF Trio.AI of the CCNU | September 23rd, 2018 | 0.205 | 0.232 | |
BiDAF + LSTM Meefly | January 15th,2019 | 0.153 | 0.120 |
Model | Submission Date | Rouge-L | Bleu-1
---|---|---|---
Human Performance | April 23rd, 2018 | 0.632 | 0.530 | |
PALM Alibaba Damo NLP | December 16th, 2019 | 0.498 | 0.499 | |
REAG Anonymous | March 27th, 2020 | 0.498 | 0.497 | |
Masque NLGEN Style NTT Media Intelligence Laboratories [Nishida et al. '19] | January 3rd, 2019 | 0.496 | 0.501 | |
CompLM Alibaba Damo NLP | December 3rd, 2019 | 0.496 | 0.489 | |
PALM Alibaba Damo NLP | December 9th, 2019 | 0.496 | 0.484 | |
BERT+ Multi-Pointer-Generator Tongjun Li of the ColorfulClouds Tech and BUPT | June 11th, 2019 | 0.495 | 0.476 | |
CompLM Alibaba Damo NLP | November 19th, 2019 | 0.495 | 0.470 | |
CompLM Alibaba Damo NLP | December 2nd, 2019 | 0.493 | 0.475 | |
BERT+ Multi-Pointer-Generator Tongjun Li of the ColorfulClouds Tech and BUPT | May 21st, 2019 | 0.491 | 0.474 | |
CompLM Alibaba Damo NLP | November 19th, 2019 | 0.488 | 0.485 | |
roberta_T_tlcd_18k Anonymous | May 14th, 2020 | 0.487 | 0.468 | |
BERT+ Multi-Pointer-Generator Tongjun Li of the ColorfulClouds Tech and BUPT | March 26th, 2019 | 0.487 | 0.465 | |
Selector+Combine-Content-Generator NLGEN Model Shengjie Qian of Caiyun xiaoyi AI and BUPT | March 11th, 2019 | 0.487 | 0.449 | |
VNET Baidu NLP [Wang et al. '18] | November 8th, 2018 | 0.484 | 0.468 | |
BERT+ Multi-Pointer-Generator (Single) Tongjun Li of the ColorfulClouds Tech and BUPT | March 19th, 2019 | 0.484 | 0.459 | |
Communicating BERT Xuan Liang of RIDLL from the University of Technology Sydney | October 4th, 2019 | 0.483 | 0.472 | |
MultiLM NLGen Model anonymous | December 2nd, 2019 | 0.483 | 0.461 | |
ranking+nlg anonymous | October 9th, 2019 | 0.481 | 0.468 | |
MUSST-NLG Anonymous | May 15th, 2020 | 0.480 | 0.458 | |
MultiLM NLGen Model anonymous | December 5th, 2019 | 0.478 | 0.481 | |
BERT-RGLM Anonymous | April 22nd, 2020 | 0.470 | 0.452 | |
BERT-SS-K1-100k anonymous | January 26th, 2020 | 0.470 | 0.437 | |
MDC-Generator Ssk-nlp | April 23rd, 2020 | 0.466 | 0.446 | |
BERT-SS-K1-100k anonymous | February 2nd, 2020 | 0.465 | 0.427 | |
T-RoBERTa-wf-BERTbaseA-120k Anonymous | February 17th, 2020 | 0.464 | 0.420 | |
T-RoBERTa-wf-BERTbaseA-80k Anonymous | February 21st, 2020 | 0.463 | 0.438 | |
ranking+nlg anonymous | October 9th, 2019 | 0.462 | 0.451 | |
PM-MUG-1 anonymous | May 20th, 2020 | 0.453 | 0.441 | |
PM-MUG-2 anonymous | May 20th, 2020 | 0.452 | 0.449 | |
SNET + CES2S Bo Shao of SYSU University | July 24th, 2018 | 0.450 | 0.406 | |
MaRCo-da-GAAMA IBM Research AI Multilingual NLP Group | April 7th, 2020 | 0.448 | 0.402 | |
REAG (based on PALM) anonymous | June 1st, 2020 | 0.447 | 0.444 | |
Masque2 (single / NLG Style) NTT Media Intelligence Laboratories | October 22nd, 2020 | 0.445 | 0.423 | |
KIGN-QA Chenliang Li | April 22nd, 2019 | 0.441 | 0.462 | |
Reader-Writer Microsoft Business Applications Group AI Research | September 16th, 2018 | 0.439 | 0.426 | |
ranking+nlg anonymous | August 12th, 2019 | 0.439 | 0.411 | |
RGLM anonymous | May 5th, 2020 | 0.435 | 0.413 | |
T-RoBERTa-wf-BERTbaseA-120k Anonymous | February 13th, 2020 | 0.427 | 0.364 | |
ConZNet Samsung Research [Indurthi et al. '18] | July 16th, 2018 | 0.421 | 0.386 | |
Anonymous Anonymous | November 21st, 2019 | 0.412 | 0.410 | |
Bayes QA Bin Bi of Alibaba NLP | June 14th, 2018 | 0.411 | 0.435 | |
Generation with latent retrieval per answer anonymous | May 11th, 2020 | 0.408 | 0.442 | |
Generation with latent retrieval Baseline 2 anonymous | May 11th, 2020 | 0.401 | 0.415 | |
SNET+seq2seq Yihan Ni of the CAS Key Lab of Web Data Science and Technology, ICT, CAS | June 1st, 2018 | 0.401 | 0.375 | |
MUSST anonymous | March 31, 2020 | 0.392 | 0.359 | |
SSK3+BERTBaseAnswerGenerator Anonymous | Jan 21st, 2020 | 0.384 | 0.356 | |
Generation with latent retrieval Baseline 1 anonymous | May 11th, 2020 | 0.382 | 0.416 | |
BPG-NET Zhijie Sang of the Center for Intelligence Science and Technology Research (CIST) of the Beijing University of Posts and Telecommunications (BUPT) | August 1st, 2018 | 0.382 | 0.347 | |
GUM anonymous from anonymous | September 4th, 2019 | 0.375 | 0.438 | |
MDCC-Base ssk-nlp | June 10th, 2020 | 0.358 | 0.362 | |
MDCG-Base ssk-nlp | June 8th, 2020 | 0.358 | 0.359 | |
fj-net(single) yzm nlp group | August 3rd, 2020 | 0.353 | 0.363 | |
Deep Cascade QA Ming Yan of Alibaba Damo NLP | October 25th, 2018 | 0.351 | 0.374 | |
MNet-Base(Single) NLGEN fuii of iDW | July 8th, 2020 | 0.350 | 0.354 | |
fj-reader(single) yzm nlp group | July 28th, 2020 | 0.350 | 0.350 | |
MDCC ssk-nlp | June 10th, 2020 | 0.349 | 0.350 | |
MUSST-NLG Anonymous | June 2nd, 2020 | 0.340 | 0.358 | |
AE + ReRanking + Bert Based Multi-task ZhangY & WangC | July 12th, 2019 | 0.331 | 0.376 | |
BERT Encoded T-Net Y. Zhang, C. Wang, X.L. Chen | August 5th, 2019 | 0.329 | 0.373 | |
MultiTask+DataAug+Unlikelihood UvA | June 3rd, 2020 | 0.327 | 0.347 | |
Multi-doc Enriched BERT Ming Yan of Alibaba Damo NLP | June 20th, 2019 | 0.325 | 0.377 | |
BIDAF+seq2seq Yihan Ni of the CAS Key Lab of Web Data Science and Technology, ICT, CAS | May 29th, 2018 | 0.322 | 0.283 | |
BERT Encoded T-Net Y. Zhang, C. Wang, X.L. Chen | July 12th, 2019 | 0.320 | 0.361 | |
Unnamed Anonymous | December 9th, 2019 | 0.318 | 0.384 | |
roberta_T_tlx_90k Anonymous | July 29th, 2020 | 0.303 | 0.298 | |
Pretrained seq2seq model BDEG | September 10th, 2020 | 0.302 | 0.294 | |
LM+Generator anonymous | November 25th, 2019 | 0.299 | 0.372 | |
LNET S.L. Liu of NEUKG | April 8th, 2020 | 0.294 | 0.352 | |
LNET S.L. Liu of NEUKG | March 23rd, 2020 | 0.293 | 0.347 | |
Masque Q&A Style NTT Media Intelligence Laboratories [Nishida et al. '19] | January 3rd, 2019 | 0.285 | 0.399 | |
Bert Based Multi-task ZhangY & WangC | June 26th, 2019 | 0.284 | 0.349 | |
Selector+Combine-Content-Generator QA Model Shengjie Qian of Caiyun xiaoyi AI and BUPT | March 11th, 2019 | 0.281 | 0.337 | |
DNET QA Geeks | August 1st, 2018 | 0.275 | 0.332 | |
ranker-reader RCZoo of UCAS | May 15th, 2019 | 0.271 | 0.382 | |
BIDAF+ELMo+SofterMax Wang Changbao | November 16th, 2018 | 0.268 | 0.346 | |
BERT+Multi-Loss S.L. Liu of NEUKG | November 4th, 2019 | 0.266 | 0.422 | |
LNET S.L. Liu of the NEUKG | Nov 19th, 2019 | 0.266 | 0.339 | |
MultiLM QnA Model anonymous | December 2nd, 2019 | 0.266 | 0.340 | |
MultiLM NLGen Model anonymous | December 5th, 2019 | 0.257 | 0.360 | |
REAG Anonymous | May 28th, 2020 | 0.247 | 0.328 | |
Multi-passage QA Model SudaNLP | October 21st, 2020 | 0.247 | 0.323 | |
Extraction-net zlsh80826 | August 14th, 2018 | 0.247 | 0.321 | |
SNET JY Zhao | May 29th, 2018 | 0.247 | 0.308 | |
MP-MRC BERT H.Y. Zhang | Aug 27th, 2020 | 0.211 | 0.258 | |
MP-MRC BERT-base H.Y. Zhang | Sep 4th, 2020 | 0.211 | 0.258 | |
lightNLP+BiDAF Enliple AI | February 1st, 2019 | 0.210 | 0.108 | |
Anonymous anonymous | October 12th, 2020 | 0.195 | 0.280 | |
MUSST-QA Anonymous | June 1st, 2020 | 0.187 | 0.285 | |
BiDaF Baseline (Implemented By MSMARCO Team) Allen Institute for AI & University of Washington [Seo et al. '16] | April 23rd, 2018 | 0.169 | 0.093 | |
TrioNLP + BiDAF Trio.AI of the CCNU | September 23rd, 2018 | 0.142 | 0.160 | |
BiDAF + LSTM Meefly | January 15th, 2019 | 0.119 | 0.173 |
Rank | Model | Submission Date | Rouge-L | Bleu-1 |
---|---|---|---|---|
MARS YUANFUDAO research NLP | March 26th, 2018 | 0.497 | 0.480 | |
Human Performance | December 2016 | 0.470 | 0.460 | |
V-Net Baidu NLP [Wang et al '18] | February 15th, 2018 | 0.462 | 0.445 | |
S-Net Microsoft AI and Research [Tan et al. '17] | June 2017 | 0.452 | 0.438 | |
R-Net Microsoft AI and Research [Wei et al. '16] | May 2017 | 0.429 | 0.422 | |
HieAttnNet Akaitsuki | March 26th, 2018 | 0.423 | 0.448 | |
BiAttentionFlow+ ShanghaiTech University GeekPie_HPC team | March 11th, 2018 | 0.415 | 0.381 | |
ReasoNet Microsoft AI and Research [Shen et al. '16] | April 28th, 2017 | 0.388 | 0.399 | |
Prediction Singapore Management University [Wang et al. '16] | March 2017 | 0.373 | 0.407 | |
FastQA_Ext DFKI German Research Center for AI [Weissenborn et al. '17] | March 2017 | 0.337 | 0.339 | |
FastQA DFKI German Research Center for AI [Weissenborn et al. '17] | March 2017 | 0.321 | 0.340 | |
Flypaper Model ZhengZhou University | March 14th, 2018 | 0.317 | 0.342 | |
DCNMarcoNet Flying Riddlers @ Carnegie Mellon University | March 31st, 2018 | 0.313 | 0.238 | |
BiDaF Baseline for V2 (Implemented By MSMARCO Team) | April 23rd, 2018 | 0.268 | 0.129 | |
ReasoNet Baseline trained on SQuAD, Microsoft AI & Research [Shen et al. '16] | December 2016 | 0.192 | 0.148 |
Data associated with the WebConf 2020 paper Leading Conversational Search by Suggesting Useful Questions.
Truly conversational search is the next logical step in the journey toward intelligent and useful AI. To understand what this may mean, researchers have voiced a continuing desire to study how people currently converse with search engines. As a result, we have released a large corpus of anonymized user search sessions.
We hope the community can use this corpus to explore what conversations with search engines look like.
The dataset used for Optimal Freshness Crawl Under Politeness Constraints and Staying up to Date with Online Content Changes Using Reinforcement Learning for Scheduling, both of which focus on computing an optimal crawling schedule for a search engine given the constantly changing nature of the internet.
There is currently no public task associated with this dataset.
TLDR: We are closing the MS MARCO QnA and NLGEN leaderboards. Last submissions: 10/23.

Dear NLP community and question answering enthusiasts,

When we released MS MARCO v2 back in March of 2018, we did not expect how much love this dataset would receive from the community. Needless to say, we have been humbled not only by the number of submissions to the leaderboard but also by all the remarkable research that incorporated this dataset as part of its benchmarking efforts. While we originally envisioned that this dataset would be useful to the NLP and QnA communities, we were again humbled by how it was adopted and evolved by the IR community for document and passage retrieval tasks.

However, as you may have guessed, maintaining a public resource like the MS MARCO leaderboard takes significant time and effort, and we are grateful to the small but dedicated team of volunteers that maintains this website. Looking to the future, we believe that given the small size of this team and its limited resources, it is time to refocus our energy on the scenarios where MS MARCO can provide the most value to the research community. Towards that goal, we have made the hard (but, we believe, right) decision to retire the question answering and language generation leaderboards. Neither task has made large leaps in quality in the last year, and we want to refocus our efforts on the document and passage retrieval tasks, where engagement with the research community is actively growing. As a result, the last day for submissions to the MS MARCO Question Answering, Natural Language Generation, and KeyPhrase Extraction leaderboards is October 23, 2020. Submissions to the document and passage retrieval leaderboards will continue as usual. We will continue to host all the datasets (including those for the retired tasks), as we believe they can still serve as valuable resources for future research.

We want to again thank all the participants for their submissions and support of MS MARCO, and we hope to see the community around the IR tasks continue to grow. We are always listening for feedback, so please keep sending us your suggestions and requests.

Sincerely,
The MS MARCO Team
10.23.2020: Task Retirement
1. Retired QnA V2 Task
2. Retired NLGEN V2 Task
3. Retired OpenKP Task

08.11.2020: New Task
1. Released Document Ranking task and 3 baselines.

07.30.2020: New Data
1. Released ORCAS Click data

02.11.2020: New Data
1. Released Usefulness Data

10.22.2019: New Datasets
1. Released OpenKP Keyphrase Extraction dataset!
2. Released Optimal Crawling Dataset!

05.06.2019: Fixed encoding issues with Ranking Dataset
1. Updated various encoding issues in ranking dataset.

04.23.2019: We have released a conversational search dataset
1. Brand new conversational search Dataset

10.26.2018: We have released a new ranking dataset based on the v2.1 dataset
1. Brand new Ranking Dataset

04.23.2018: We have released an update to the dataset. V2.1 includes the following:
1. Over 1 million queries
2. ~182k Well Formed Answers
3. Query type is now included for every query.
4. Bias in evaluation set fixed (a small portion of answers for the V2.0 evaluation set could be found in the v1.1 set and the v2.0 well-formed sets; these have been removed from eval and added to train).
5. Utilities and Readme now available.
03.01.2018: We have released an update to the dataset. V2.0 includes the following:
1. ~900,000 unique queries
2. ~160k Well Formed Answers
01.30.2017: We have released an update to the dataset! V1.1 contains the following:
1. Improvements to dataset and evaluation scripts
12.01.2016: We have released our dataset! V1.0 contains the following:
1. 100,000 unique query answer pairs
Once you have built a model that meets your expectations on evaluation with the dev set, you can submit your test results to get an official evaluation on the test set. To ensure the integrity of the official test results, we do not release the correct answers for the test set to the public.
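The leaderboards above score submissions with Rouge-L and Bleu-1, so it is worth sanity-checking your dev-set output locally before submitting. Below is a minimal sketch of sentence-level ROUGE-L; the whitespace tokenization and `beta=1.2` weighting are assumptions for illustration, and the official evaluation scripts remain authoritative.

```python
def lcs_len(a, b):
    """Length of the longest common subsequence of two token lists (classic DP)."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l(candidate, reference, beta=1.2):
    """F-measure ROUGE-L between a candidate answer and one reference answer."""
    c, r = candidate.lower().split(), reference.lower().split()
    lcs = lcs_len(c, r)
    if lcs == 0:
        return 0.0
    prec, rec = lcs / len(c), lcs / len(r)
    return (1 + beta ** 2) * prec * rec / (rec + beta ** 2 * prec)

# Example: a partial match scores between 0 and 1.
print(rouge_l("paris is the capital", "paris is the capital of france"))
```

A perfect match scores 1.0 and disjoint answers score 0.0, which makes this useful as a quick regression check on dev predictions.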
To submit your model for official evaluation on the test set for the document ranking task, follow the instructions here.
To submit your model for official evaluation on the test set for other tasks, follow the steps below:
Your email should include
To avoid "p-hacking," we discourage frequent submissions from the same group within a short period of time.
Microsoft Machine Reading Comprehension (MS MARCO) is a collection of large-scale datasets for deep learning related to search. In MS MARCO, all questions are sampled from real, anonymized Bing user queries. The context passages, from which answers in the dataset are derived, are extracted from real web documents using the most advanced version of the Bing search engine. The answers are human generated: editors composed an answer whenever the retrieved passages allowed them to summarize one.
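To illustrate the shape of a QnA record, here is an invented example; the field names mirror the published v2.1 layout (`query_id`, `query`, `query_type`, `passages` with `is_selected`, `answers`), but check the dataset README for the exact file organization, which may be column-oriented JSON rather than one record per line.

```python
import json

# Invented example record; field names follow the MS MARCO QnA v2.1 layout.
raw = '''{
  "query_id": 19699,
  "query_type": "LOCATION",
  "query": "what is the capital of france",
  "passages": [
    {"is_selected": 1, "url": "https://example.com/paris",
     "passage_text": "Paris is the capital and largest city of France."},
    {"is_selected": 0, "url": "https://example.com/lyon",
     "passage_text": "Lyon is a major city in France."}
  ],
  "answers": ["Paris"]
}'''

record = json.loads(raw)
# Passages with is_selected == 1 are the ones the editor used to compose the answer.
selected = [p["passage_text"] for p in record["passages"] if p["is_selected"]]
print(record["query"], "->", record["answers"][0])
```

The `is_selected` flags are what the passage ranking task turns into relevance labels: selected passages become positives for the corresponding query.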