The overall BLURB score is calculated as the macro-average of performance over all tasks. Details can be found in our publication.
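As a sanity check, the macro average can be reproduced directly from the per-task scores in the leaderboard. A minimal sketch in Python, using the top-ranked model's per-task scores (NER, PICO, RE, SS, Class., QA) from the table:

```python
# Macro-average: the unweighted mean of the per-task scores.
# Per-task scores for BioELECTRA-Base, taken from the table below.
task_scores = [86.67, 74.13, 81.44, 92.76, 84.20, 76.38]

blurb_score = sum(task_scores) / len(task_scores)
print(f"{blurb_score:.2f}")  # 82.60 -- matches the reported BLURB score
```

Note that the macro average weights every task equally regardless of dataset size, unlike the micro average, which is why the two columns in the table differ.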

| Rank | Model | BLURB Score (Macro Avg.) | Micro Avg. | NER | PICO | RE | SS | Class. | QA |
|------|-------|--------------------------|------------|-----|------|-----|------|-----|------|
| 1 | BioELECTRA-Base — Saama Research | 82.60 | 83.19 | 86.67 | 74.13 | 81.44 | 92.76 | 84.20 | 76.38 |
| 2 | PubMedBERT (uncased; abstracts + full text) — Microsoft Research | 81.50 | 82.18 | 86.13 | 73.72 | 80.59 | 92.31 | 82.62 | 73.61 |
| 3 | PubMedBERT (uncased; abstracts) — Microsoft Research | 81.16 | 81.95 | 86.08 | 73.38 | 81.19 | 92.30 | 82.32 | 71.70 |
| 4 | BioBERT (cased) | 80.34 | 81.31 | 85.81 | 73.18 | 79.79 | 89.52 | 81.54 | 72.19 |
| 5 | SciBERT (uncased) | 78.86 | 80.16 | 85.43 | 73.12 | 79.56 | 86.25 | 80.66 | 68.12 |
| 6 | SciBERT (cased) | 78.14 | 79.38 | 85.47 | 73.06 | 79.19 | 87.15 | 81.16 | 62.81 |
| 7 | ClinicalBERT (cased) | 77.29 | 77.87 | 83.99 | 72.06 | 76.91 | 91.23 | 80.74 | 58.79 |
| 8 | RoBERTa (cased) | 76.46 | 77.74 | 83.09 | 73.02 | 77.71 | 81.25 | 79.66 | 64.02 |
| 9 | BlueBERT (cased) | 76.27 | 77.42 | 84.50 | 72.54 | 76.13 | 85.38 | 80.48 | 58.57 |
| 10 | BERT base (uncased) | 76.11 | 77.27 | 82.99 | 72.34 | 77.44 | 82.68 | 80.20 | 60.99 |
| 11 | BERT base (cased) | 75.86 | 77.13 | 82.90 | 71.70 | 76.83 | 81.40 | 80.12 | 62.20 |
| 12 | Unicoder base (multilingual) | 73.60 | 76.45 | 83.99 | 73.28 | 76.88 | 75.97 | 68.96 | 62.52 |