msmarco

Submit to Document and Passage Ranking Leaderboards

Datasets

Please go to https://microsoft.github.io/msmarco/Datasets for all dataset descriptions and pointers.

File format

For both tasks, please prepare the test results file in the following TAB-separated (TSV) format: qid<TAB>pid<TAB>rank.

1124703    8766037    1
1124703    8021997    2
1124703    7816201    3
1124703    8296123    4
1124703    8790898    5
1124703    5451590    6
1124703    8021999    7
1124703    8388210    8
1124703    8702520    9
1124703    8790903    10
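Below is a minimal Python sketch for producing a file in this format. It assumes your system assigns a score to each (qid, pid) pair, where a higher score means more relevant. The function name `write_run` and its input layout are hypothetical illustrations, not part of any official MS MARCO tooling. The function sorts each query's candidates by descending score, assigns ranks starting at 1, and writes at most `depth` rows per query (10 by default).

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def write_run(
    scores: Dict[str, List[Tuple[str, float]]],
    out: TextIO,
    depth: int = 10,
) -> None:
    """Write qid<TAB>pid<TAB>rank lines, best-scoring passage first.

    Hypothetical helper: `scores` maps a query id to (passage id, score)
    pairs, where a higher score means more relevant. Only the top `depth`
    candidates per query are written.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for qid, candidates in scores.items():
        # Sort by descending score; break ties by pid so output is deterministic.
        ranked = sorted(candidates, key=lambda c: (-c[1], c[0]))
        for rank, (pid, _score) in enumerate(ranked[:depth], start=1):
            writer.writerow([qid, pid, rank])


if __name__ == "__main__":
    buf = io.StringIO()
    write_run({"1124703": [("8021997", 0.7), ("8766037", 0.9)]}, buf)
    print(buf.getvalue(), end="")
```

For a real submission you would pass an open file instead of a `StringIO`, for example `open("run.tsv", "w", newline="")`.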

We report MRR@10 for both tasks. Therefore, to minimize the size of your test results file, please feel free to include only the top 10 results per query.
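MRR@10 averages a per-query score over all queries. If the first relevant passage appears at rank r ≤ 10, the query scores 1/r; otherwise it scores 0. For this reason, results past rank 10 cannot change the score. The following is a minimal Python sketch of the metric. It assumes the run and the relevance judgments (qrels) are held in in-memory dictionaries, and `mrr_at_k` is a hypothetical helper. The official evaluation script remains the authoritative reference for exact conventions, such as which queries are counted in the denominator.

```python
from typing import Dict, List, Set


def mrr_at_k(run: Dict[str, List[str]], qrels: Dict[str, Set[str]], k: int = 10) -> float:
    """Mean reciprocal rank of the first relevant result within the top k.

    run:   qid -> list of pids, best first (rank 1 at index 0)
    qrels: qid -> set of relevant pids
    Assumption: every query in qrels is counted; queries with no relevant
    pid in the top k (or missing from the run) contribute 0.
    """
    if not qrels:
        return 0.0
    total = 0.0
    for qid, relevant in qrels.items():
        for rank, pid in enumerate(run.get(qid, [])[:k], start=1):
            if pid in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(qrels)


if __name__ == "__main__":
    run = {"q1": ["a", "b", "c"], "q2": ["x", "y"]}
    qrels = {"q1": {"b"}, "q2": {"z"}}
    print(mrr_at_k(run, qrels))  # 0.25: q1 scores 1/2, q2 scores 0
```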

Evaluation script

The official evaluation scripts for the two tasks are available at the following locations:

Submission process

Once you have built a model that meets your expectations on evaluation with the dev set, you can submit your test results to get official evaluation on the test set. To ensure the integrity of the official test results, we do not release the correct answers for the test set to the public.

To submit your model for official evaluation on the test set, follow the steps corresponding to the appropriate task:

Document ranking

For the document ranking task, we follow a GitHub pull request based submission process. Please find the submission guidelines for the document ranking task here: https://microsoft.github.io/MSMARCO-Document-Ranking-Submissions/.

Passage ranking

For the passage ranking task, we follow a GitHub pull request based submission process. Please find the submission guidelines for the passage ranking task here: https://microsoft.github.io/MSMARCO-Passage-Ranking-Submissions/.

Terms and Conditions

The MS MARCO and ORCAS datasets are intended for non-commercial research purposes only to promote advancement in the field of artificial intelligence and related areas, and are made available free of charge without extending any license or other intellectual property rights. The datasets are provided “as is” without warranty and usage of the data has risks since we may not own the underlying rights in the documents. We are not liable for any damages related to use of the dataset. Feedback is voluntarily given and can be used as we see fit. By using any of these datasets you are automatically agreeing to abide by these terms and conditions. Upon violation of any of these terms, your rights to use the dataset will end automatically.

Please contact us at ms-marco@microsoft.com if you own any of the documents made available but do not want them in this dataset. We will remove the data accordingly. If you have questions about use of the dataset or any research outputs in your products or services, we encourage you to undertake your own independent legal review. For other questions, please feel free to contact us.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Microsoft and any contributors grant you a license to the Microsoft documentation and other content in this repository under the Creative Commons Attribution 4.0 International Public License, see the LICENSE file, and grant you a license to any code in the repository under the MIT License, see the LICENSE-CODE file.

Microsoft licenses the MS MARCO Mark “as-is” and makes no express or implied representations or warranties regarding non-infringement. You must remove all uses of the Mark immediately upon request from Microsoft.

Microsoft, Windows, Microsoft Azure and/or other Microsoft products and services referenced in the documentation may be either trademarks or registered trademarks of Microsoft in the United States and/or other countries. The licenses for this project do not grant you rights to use any Microsoft names, logos, or trademarks. Microsoft’s general trademark guidelines can be found at http://go.microsoft.com/fwlink/?LinkID=254653.

Privacy information can be found at https://privacy.microsoft.com/en-us/.

Microsoft and any contributors reserve all other rights, whether under their respective copyrights, patents, or trademarks, whether by implication, estoppel or otherwise.