In this paper, we present BiomedJourney, a novel method for counterfactual medical image generation by instruction-learning from multimodal patient journeys. Given a patient with two medical images taken at different time points, we use GPT-4 to process the corresponding imaging reports and generate a natural language description of disease progression. The resulting triples (prior image, progression description, new image) are then used to train a latent diffusion model for counterfactual medical image generation. Given the relative scarcity of image time-series data, we introduce a two-stage curriculum that first pretrains the denoising network using the much more abundant single image-report pairs (with a dummy prior image), and then continues training using the counterfactual triples. Experiments on the standard MIMIC-CXR dataset demonstrate the promise of our method.
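The two-stage curriculum above can be sketched as simple dataset construction: stage one reuses abundant single image-report pairs by substituting a dummy prior image, and stage two uses the counterfactual triples. The `Example` class, the `DUMMY_PRIOR` sentinel, and the builder functions below are illustrative names, not the actual BiomedJourney implementation (which operates on pixel tensors inside a latent diffusion training loop).

```python
from dataclasses import dataclass
from typing import List, Tuple

# Sentinel standing in for the dummy prior image used in stage-1 pretraining.
# In the real pipeline this would be a blank or placeholder image tensor.
DUMMY_PRIOR = "<dummy_prior>"

@dataclass
class Example:
    """One training example: edit `prior_image` per `instruction` to match `target_image`."""
    prior_image: str
    instruction: str
    target_image: str

def build_stage1(pairs: List[Tuple[str, str]]) -> List[Example]:
    """Stage 1: single (image, report) pairs, with a dummy prior image,
    so the denoising network first learns report-conditioned generation."""
    return [Example(DUMMY_PRIOR, report, image) for image, report in pairs]

def build_stage2(triples: List[Tuple[str, str, str]]) -> List[Example]:
    """Stage 2: counterfactual triples (prior image, GPT-4-derived
    progression description, new image)."""
    return [Example(prior, description, new) for prior, description, new in triples]

# Toy usage with placeholder file names:
stage1 = build_stage1([("cxr_001.png", "Findings: small left pleural effusion.")])
stage2 = build_stage2([("cxr_001.png", "resolved pleural effusion", "cxr_002.png")])
```

The point of the curriculum is that stage 1 needs no longitudinal data at all, so the far larger pool of single-study reports shapes the generator before the scarcer paired studies teach it to condition on a real prior image.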
In a comprehensive battery of tests on counterfactual medical image generation, BiomedJourney substantially outperforms prior state-of-the-art methods in instruction-guided image editing (e.g., InstructPix2Pix) and medical image generation (e.g., RoentGen). To facilitate future study in counterfactual medical image generation, we plan to release our instruction-learning code and pretrained models.
Starting from a patient's real chest X-ray, our BiomedJourney model precisely follows progression instructions to generate the outcome image.
The patient's prior image and the BiomedJourney-generated image are shown side by side.
"resolved pleural effusion"
" enlarged cardiac silhouette"
"right upper lobe airspace opacities"
@article{gu2023biomedjourney,
title={BiomedJourney: Counterfactual Biomedical Image Generation by Instruction-Learning from Multimodal Patient Journeys},
author={Yu Gu and Jianwei Yang and Naoto Usuyama and Chunyuan Li and Sheng Zhang and Matthew P. Lungren and Jianfeng Gao and Hoifung Poon},
journal={arXiv preprint arXiv:2310.10765},
year={2023},
}