Datasets

We provide instructions and sample scripts for downloading and preprocessing BLURB’s constituent datasets. You can download them from here.

Submission Instructions

Once you have built a model that meets your expectations, generate your test set predictions on any or all of the BLURB datasets. Follow these steps to submit your model’s performance for the leaderboard:

  1. For each task:
    1. Generate your model’s predictions.
    2. Save your test predictions to a JSON file.
    3. Verify that your JSON file matches the Universal Schema or the appropriate task-specific schema.
  2. Zip the JSON files that you would like to submit for evaluation.
  3. Email us at blurb@microsoft.com with the following information:
    • (Required) Your zip file, as an attachment.
    • (Required) Team Name. Comma-separated (e.g. “PubMedBERT Team”, “Ada Lovelace, Charles Babbage” or “MIT, Microsoft Research”)
    • (Required) Model Name. (e.g. “SuperBERT”)
    • (Optional) Model Repository URL. (e.g. “https://www.github.com/…”)
    • (Optional) Publication URL. (e.g. “https://arxiv.org/…”)
  4. If you publish your work, please cite the BLURB publication. We also recommend adding this bibtex file, which contains an entry for BLURB and individual entries for BLURB’s constituent datasets.
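
Steps 1 and 2 above (saving each task's test predictions to a JSON file, then zipping the files) can be sketched in Python as follows. This is a minimal illustration using only the standard library, not an official BLURB script. The task names, the record fields (`id`, `label`), the file naming, and the helper names `write_predictions` and `zip_predictions` are all hypothetical placeholders; the actual layout of each JSON file must follow the Universal Schema or the relevant task-specific schema.

```python
import json
import zipfile
from pathlib import Path


def write_predictions(predictions_by_task, out_dir):
    """Write one JSON file per task and return the paths, sorted by task name.

    `predictions_by_task` maps a task name to any JSON-serializable object.
    The structure of that object is a placeholder here; it must match the
    benchmark's schema in a real submission.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for task, predictions in sorted(predictions_by_task.items()):
        path = out_dir / f"{task}.json"  # hypothetical naming convention
        with path.open("w", encoding="utf-8") as f:
            json.dump(predictions, f, ensure_ascii=False, indent=2)
        paths.append(path)
    return paths


def zip_predictions(json_paths, zip_path):
    """Bundle the JSON files into a single flat zip archive."""
    zip_path = Path(zip_path)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in json_paths:
            # Store only the file name so the archive has no nested folders.
            zf.write(path, arcname=Path(path).name)
    return zip_path


if __name__ == "__main__":
    # Hypothetical predictions for two made-up tasks.
    example = {
        "task_a": [{"id": "1", "label": "positive"}],
        "task_b": [{"id": "7", "label": "negative"}],
    }
    files = write_predictions(example, "predictions")
    archive = zip_predictions(files, "submission.zip")
    print(f"Wrote {len(files)} JSON files to {archive}")
```

Reloading each file with `json.load` before zipping is a cheap way to confirm it is at least well-formed JSON, although that does not replace checking it against the schema.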

Questions? Contact us at blurb@microsoft.com

Disclaimer

By making a submission to the Biomedical Language Understanding and Reasoning Benchmark (BLURB), you agree to have your submission scored and published on the leaderboard. We will retain your submission data for as long as we publish this leaderboard. We may contact you at the contact email provided to resolve any issues as we update the leaderboard. Please contact us at blurb@microsoft.com for any questions or if you want your submission removed.

Terms of Use

You also agree that your participation in this challenge is governed by the Microsoft Terms of Use and the additional terms set out below (collectively, the “Terms of Use”). Your submission and participation indicate your acceptance of the Terms of Use and the Microsoft Privacy Statement.

The primary BLURB tasks are built on existing datasets. Access to these datasets is intended for non-commercial research purposes only, to promote advancement in the field of artificial intelligence and related areas. You are responsible for complying with any terms and conditions required by the independent databases that store these datasets. These databases may present you with a privacy policy or require you to accept their terms before acquiring, using, requesting, or downloading any content. Any third-party terms are in addition to and do not modify any of these Terms of Use. You are responsible for your dealings with third parties. Microsoft is not responsible or liable to you or others for information or services provided by any third parties. This Submission Form is being made available “as is” and without any warranties. Microsoft is not responsible for and is not liable to you for any damages related to your use of the datasets, or for the performance, accuracy or results listed in the leaderboard.