The overall BLURB score is the macro-average of performance across the six task categories: named entity recognition (NER), PICO extraction, relation extraction (RE), sentence similarity (SS), document classification (Class.), and question answering (QA). Details can be found in our publication.
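Because the overall score is a plain macro-average, it can be reproduced directly from the per-category columns of the table below. Here is a minimal sketch (not the official evaluation code); the six category averages are copied from the BioLinkBERT-Large row:

```python
# Minimal sketch of how the overall BLURB score relates to the
# per-category columns in the leaderboard table: the score is the
# plain mean of the six task-category averages. Values are copied
# from the BioLinkBERT-Large row of the table below.

category_scores = {
    "NER": 86.89,
    "PICO": 74.19,
    "RE": 82.74,
    "SS": 93.63,
    "Class.": 84.88,
    "QA": 83.50,
}

blurb_score = sum(category_scores.values()) / len(category_scores)
print(f"BLURB score (macro avg.): {blurb_score:.2f}")
# Prints approximately 84.30, matching the BLURB Score column.
```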

| Rank | Model | BLURB Score (Macro Avg.) | Micro Avg. | NER | PICO | RE | SS | Class. | QA |
|------|-------|--------------------------|------------|-----|------|----|----|--------|----|
| 1 | BioLinkBERT-Large — Stanford | 84.30 | 84.80 | 86.89 | 74.19 | 82.74 | 93.63 | 84.88 | 83.50 |
| 2 | BioM-ELECTRA-Large — University of Delaware | 83.81 | 84.67 | 86.88 | 73.67 | 83.17 | 91.09 | 84.03 | 84.00 |
| 3 | BioLinkBERT-Base — Stanford | 83.39 | 83.84 | 86.39 | 73.97 | 81.56 | 93.27 | 84.35 | 80.81 |
| 4 | PubMedBERT-Large (fine-tuning stabilization; uncased; abstracts) — Microsoft Research | 82.91 | 83.58 | 86.28 | 73.61 | 81.77 | 92.73 | 82.70 | 80.37 |
| 5 | PubMedBERT (fine-tuning stabilization; uncased; abstracts) — Microsoft Research | 82.75 | 83.24 | 86.17 | 73.45 | 81.53 | 94.49 | 83.02 | 77.86 |
| 6 | BioELECTRA-Base — Saama Research | 82.60 | 83.19 | 86.67 | 74.13 | 81.44 | 92.76 | 84.20 | 76.38 |
| 7 | PubMedBERT (uncased; abstracts + full text) — Microsoft Research | 81.50 | 82.18 | 86.13 | 73.72 | 80.59 | 92.31 | 82.62 | 73.61 |
| 8 | PubMedBERT (uncased; abstracts) — Microsoft Research | 81.16 | 81.95 | 86.08 | 73.38 | 81.19 | 92.30 | 82.32 | 71.70 |
| 9 | BioBERT (cased) | 80.34 | 81.31 | 85.81 | 73.18 | 79.79 | 89.52 | 81.54 | 72.19 |
| 10 | SciBERT (uncased) | 78.86 | 80.16 | 85.43 | 73.12 | 79.56 | 86.25 | 80.66 | 68.12 |
| 11 | SciBERT (cased) | 78.14 | 79.38 | 85.47 | 73.06 | 79.19 | 87.15 | 81.16 | 62.81 |
| 12 | ClinicalBERT (cased) | 77.29 | 77.87 | 83.99 | 72.06 | 76.91 | 91.23 | 80.74 | 58.79 |
| 13 | RoBERTa (cased) | 76.46 | 77.74 | 83.09 | 73.02 | 77.71 | 81.25 | 79.66 | 64.02 |
| 14 | BlueBERT (cased) | 76.27 | 77.42 | 84.50 | 72.54 | 76.13 | 85.38 | 80.48 | 58.57 |
| 15 | BERT base (uncased) | 76.11 | 77.27 | 82.99 | 72.34 | 77.44 | 82.68 | 80.20 | 60.99 |
| 16 | BERT base (cased) | 75.86 | 77.13 | 82.90 | 71.70 | 76.83 | 81.40 | 80.12 | 62.20 |
| 17 | Unicoder base (multilingual) | 73.60 | 76.45 | 83.99 | 73.28 | 76.88 | 75.97 | 68.96 | 62.52 |