BiomedParse

A biomedical foundation model for image parsing of everything everywhere all at once

¹Microsoft Research, ²Providence Genomics, ³Paul G. Allen School of Computer Science and Engineering, University of Washington
* Equal Contribution Main technical contribution Project lead Corresponding authors § Lead contact

Everything

BiomedParse segments organs, abnormalities, and cells, accurately following user prompts. Without any image-specific guidance such as bounding boxes or points, BiomedParse outperforms state-of-the-art bounding-box methods using text prompts alone, across 9 biomedical imaging modalities.

BiomedParse Demo
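
The example prompts listed on this page under "One model, 9 imaging modalities" appear to follow a "target in site modality" pattern, such as "Optic disc in retinal Fundus". As a minimal sketch of composing prompts in that style, the helper below simply joins the three parts. The function name `build_prompt` and its parameters are hypothetical and are not part of any BiomedParse API.

```python
def build_prompt(target: str, site: str, modality: str) -> str:
    """Compose a text prompt in the '<target> in <site> <modality>' style.

    Hypothetical helper for illustration; it only formats a string.
    """
    return f"{target} in {site} {modality}"


# A few (target, anatomical site, modality) combinations taken from this page.
EXAMPLES = [
    ("COVID-19 infection", "chest", "CT"),
    ("Melanoma", "skin", "Dermoscopy"),
    ("Optic disc", "retinal", "Fundus"),
]

prompts = [build_prompt(*example) for example in EXAMPLES]

if __name__ == "__main__":
    for prompt in prompts:
        print(prompt)
```

Keeping the target, site, and modality as separate fields makes it easy to generate prompts for many combinations without retyping them.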

Everywhere

BiomedParse detects the specific object of interest and locates it with pixel-level precision, even for objects with irregular shapes. By identifying text prompts that describe objects not present in the image, BiomedParse can perform object detection end to end.

Advanced Detection Demo
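
To illustrate the idea of rejecting a prompt whose object is absent, here is a simplified sketch that operates on a per-pixel probability map. It is an assumption for illustration, not the procedure BiomedParse uses: the function name `mask_or_reject`, the thresholds, and the mean-confidence rule are all hypothetical.

```python
from typing import List, Optional

ProbMap = List[List[float]]  # per-pixel foreground probabilities in [0, 1]


def mask_or_reject(
    prob_map: ProbMap,
    pixel_threshold: float = 0.5,      # assumed cutoff for a foreground pixel
    min_mean_confidence: float = 0.7,  # assumed cutoff for accepting the prompt
    min_pixels: int = 1,               # assumed minimum object size
) -> Optional[List[List[int]]]:
    """Return a binary mask, or None if the prompt looks invalid.

    A prompt is treated as invalid when too few pixels pass the threshold
    or when the pixels that do pass have low average confidence.
    """
    selected = [p for row in prob_map for p in row if p >= pixel_threshold]
    if len(selected) < min_pixels:
        return None
    if sum(selected) / len(selected) < min_mean_confidence:
        return None
    return [[1 if p >= pixel_threshold else 0 for p in row] for row in prob_map]


if __name__ == "__main__":
    present = [[0.9, 0.95], [0.1, 0.05]]  # confident foreground region
    absent = [[0.1, 0.2], [0.3, 0.55]]    # weak, scattered responses
    print(mask_or_reject(present))
    print(mask_or_reject(absent))
```

Returning `None` rather than an empty mask lets a caller distinguish "object not found" from "object found but very small".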

All at Once

Tired of typing a prompt for every object? BiomedParse can recognize objects all at once. Having learned 82 object types, BiomedParse can automatically identify all objects in a given image along with their semantic types, and simultaneously segment and label all biomedical objects of interest.

BiomedParse Demo
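
One way to picture labeling everything at once is to query a set of candidate object types and assign each pixel to the most confident type. The sketch below does that with a stand-in predictor. It is not the model's actual recognition procedure: `label_all`, the `predict` callable, and the threshold are hypothetical names and values chosen for illustration.

```python
from typing import Callable, List, Optional

ProbMap = List[List[float]]  # per-pixel foreground probabilities in [0, 1]


def label_all(
    predict: Callable[[str], ProbMap],  # hypothetical: object type -> probability map
    object_types: List[str],
    threshold: float = 0.5,             # assumed cutoff for a foreground pixel
) -> List[List[Optional[str]]]:
    """Build a label map where each pixel holds its most confident object type.

    Pixels where no object type reaches the threshold are left as None.
    """
    label_map: List[List[Optional[str]]] = []
    best: List[List[float]] = []
    for name in object_types:
        probs = predict(name)
        if not label_map:
            # Initialize the outputs from the shape of the first probability map.
            label_map = [[None] * len(row) for row in probs]
            best = [[0.0] * len(row) for row in probs]
        for r, row in enumerate(probs):
            for c, p in enumerate(row):
                if p >= threshold and p > best[r][c]:
                    best[r][c] = p
                    label_map[r][c] = name
    return label_map


if __name__ == "__main__":
    # Toy predictor standing in for a real model.
    toy = {
        "tumor": [[0.9, 0.2], [0.6, 0.1]],
        "cyst": [[0.7, 0.8], [0.1, 0.2]],
    }
    for row in label_all(toy.__getitem__, ["tumor", "cyst"]):
        print(row)
```

Resolving overlaps by highest probability is only one possible choice; other rules, such as giving priority to smaller objects, would fit the same structure.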

One model, 9 imaging modalities


"COVID-19 infection in chest CT"

"Glandular structure in colon Pathology"

"Neoplastic polyp in colon Endoscope"

"Lower-grade glioma in brain MRI"

"Malignant tumor in breast Ultrasound"

"Melanoma in skin Dermoscopy"

"COVID-19 infection in chest X-Ray"

"Optic disc in retinal Fundus"

"Cystoid macular edema in retinal OCT"

Abstract

Biomedical image analysis is fundamental for biomedical discovery in cell biology, pathology, radiology, and many other biomedical domains. Holistic image analysis comprises interdependent subtasks such as segmentation, detection, and recognition of relevant objects. Traditionally, these tasks are tackled separately. For example, many works focus on segmentation alone, completely ignoring key semantic information in the downstream tasks of detection and recognition. In contrast, image parsing is a unifying framework that jointly pursues these tasks by leveraging their interdependencies, such as the semantic label of a segmented object. Here, we propose BiomedParse, a biomedical foundation model for image parsing that can jointly conduct segmentation, detection, and recognition for 82 object types across 9 imaging modalities. Through joint learning, we can improve accuracy for individual tasks and enable novel applications such as segmenting all relevant objects in an image through a text prompt, rather than requiring users to laboriously specify the bounding box for each object. Interestingly, we can train BiomedParse using no more than standard segmentation datasets. The key is to leverage readily available natural-language labels or descriptions accompanying those datasets and use GPT-4 to harmonize the noisy, unstructured text information with established biomedical object ontologies. We created a large dataset comprising over six million triples of image, segmentation mask, and textual description. On image segmentation, we showed that BiomedParse is broadly applicable, outperforming state-of-the-art methods on 102,855 test image-mask-label triples across 9 imaging modalities (everything). BiomedParse is also able to identify invalid user inputs describing objects that do not exist in the image.
On object detection, which aims to locate a specific object of interest, BiomedParse again attained state-of-the-art performance, especially on objects with irregular shapes (everywhere). On object recognition, which aims to identify all objects in a given image along with their semantic types, we showed that BiomedParse can simultaneously segment and label all biomedical objects in an image (all at once). In summary, BiomedParse is an all-in-one tool for biomedical image analysis by jointly solving segmentation, detection, and recognition. It is broadly applicable to all major biomedical image modalities, paving the path for efficient and accurate image-based biomedical discovery.