Gla-AI4BioMed at RRG24: Visual Instruction-tuned Adaptation for Radiology Report Generation

Aug 1, 2024
Xi Zhang, Zaiqiao Meng, Jake Lever, Edmond S.L. Ho
Abstract
This paper introduces a radiology-focused visual language model designed to generate radiology reports from chest X-rays. Building on previous findings that large language models can acquire multimodal capabilities when aligned with pretrained vision encoders, we demonstrate similar potential with chest X-ray images. The model combines an image encoder (CLIP) with a fine-tuned large language model (LLM) based on the Vicuna-7B architecture. The training process involves a two-stage approach: initial alignment of chest X-ray features with the LLM, followed by fine-tuning for radiology report generation. The study highlights the importance of generating both FINDINGS and IMPRESSIONS sections in radiology reports and evaluates the model's performance using various metrics, achieving notable accuracy in generating high-quality medical reports. The research also addresses the need for domain-specific fine-tuning to capture the intricate details necessary for accurate medical interpretations and reports.
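The two-stage schedule described in the abstract can be sketched roughly as follows. This is an illustrative outline only, not the authors' code: the component names, the `projector` module, and the choice of which parts are frozen in each stage are assumptions borrowed from common visual instruction-tuning recipes, since the abstract does not specify them.

```python
from dataclasses import dataclass

# Hypothetical component names; the abstract only names CLIP and Vicuna-7B.
COMPONENTS = ("vision_encoder", "projector", "llm")


@dataclass(frozen=True)
class Stage:
    name: str
    trainable: frozenset  # components whose weights are updated in this stage
    objective: str


# Assumption: a LLaVA-style schedule. The abstract states that there are two
# stages (alignment, then report fine-tuning) but not which weights are frozen.
STAGES = (
    Stage(
        name="alignment",
        trainable=frozenset({"projector"}),
        objective="align chest X-ray features with the LLM",
    ),
    Stage(
        name="report_finetuning",
        trainable=frozenset({"projector", "llm"}),
        objective="generate FINDINGS and IMPRESSIONS sections",
    ),
)


def frozen_components(stage: Stage) -> list:
    """Return the components that stay frozen during the given stage."""
    return [c for c in COMPONENTS if c not in stage.trainable]


if __name__ == "__main__":
    for stage in STAGES:
        print(f"{stage.name}: train {sorted(stage.trainable)}, "
              f"freeze {frozen_components(stage)} ({stage.objective})")
```

Under these assumptions the image encoder stays frozen throughout, and only the second stage updates the language model.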
Type
Publication
Proceedings of the 23rd Workshop on Biomedical Natural Language Processing
Authors
Xi Zhang
CS PhD Candidate @ AI4BioMed Lab
Interested in AI4Health & AI Agent, specialising in the development of steerable radiology vision-language systems.