In this paper, we present BiomedJourney, a novel method for counterfactual medical image generation by instruction-learning from multimodal patient journeys. Given a patient with two medical images taken at different time points, we use GPT-4 to process the corresponding imaging reports and generate a natural language description of disease progression. The resulting triples (prior image, progression description, new image) are then used to train a latent diffusion model for counterfactual medical image generation. Given the relative scarcity of image time-series data, we introduce a two-stage curriculum that first pretrains the denoising network using the much more abundant single image-report pairs (with a dummy prior image), and then continues training using the counterfactual triples. Experiments on the standard MIMIC-CXR dataset demonstrate the promise of our method.
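The two-stage curriculum above can be sketched as simple dataset construction: stage one reuses abundant single image-report pairs by substituting a dummy prior image, and stage two uses the counterfactual triples. The `Example` class, the `DUMMY_PRIOR` sentinel, and the builder functions below are illustrative names, not the actual BiomedJourney implementation (which operates on pixel tensors inside a latent diffusion training loop).

```python
from dataclasses import dataclass
from typing import List, Tuple

# Sentinel standing in for the dummy prior image used in stage-1 pretraining.
# In the real pipeline this would be a blank or placeholder image tensor.
DUMMY_PRIOR = "<dummy_prior>"

@dataclass
class Example:
    """One training example: edit `prior_image` per `instruction` to match `target_image`."""
    prior_image: str
    instruction: str
    target_image: str

def build_stage1(pairs: List[Tuple[str, str]]) -> List[Example]:
    """Stage 1: single (image, report) pairs, with a dummy prior image,
    so the denoising network first learns report-conditioned generation."""
    return [Example(DUMMY_PRIOR, report, image) for image, report in pairs]

def build_stage2(triples: List[Tuple[str, str, str]]) -> List[Example]:
    """Stage 2: counterfactual triples (prior image, GPT-4-derived
    progression description, new image)."""
    return [Example(prior, description, new) for prior, description, new in triples]

# Toy usage with placeholder file names:
stage1 = build_stage1([("cxr_001.png", "Findings: small left pleural effusion.")])
stage2 = build_stage2([("cxr_001.png", "resolved pleural effusion", "cxr_002.png")])
```

The point of the curriculum is that stage 1 needs no longitudinal data at all, so the far larger pool of single-study reports shapes the generator before the scarcer paired studies teach it to condition on a real prior image.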
In a comprehensive battery of tests on counterfactual medical image generation, BiomedJourney substantially outperforms prior state-of-the-art methods in instruction-guided image editing (e.g., InstructPix2Pix) and medical image generation (e.g., RoentGen). To facilitate future study in counterfactual medical image generation, we plan to release our instruction-learning code and pretrained models.
Starting from a patient's real chest X-ray, our BiomedJourney model precisely follows progression instructions to generate the outcome image.
The patient's prior image and the BiomedJourney-generated image are shown side by side.
"resolved pleural effusion"
" enlarged cardiac silhouette"
"right upper lobe airspace opacities"
@article{gu2023biomedjourney,
title={BiomedJourney: Counterfactual Biomedical Image Generation by Instruction-Learning from Multimodal Patient Journeys},
author={Yu Gu and Jianwei Yang and Naoto Usuyama and Chunyuan Li and Sheng Zhang and Matthew P. Lungren and Jianfeng Gao and Hoifung Poon},
journal={arXiv preprint arXiv:2310.10765},
year={2023},
}