Truth-Anchored Evidence-Sensitive Training for Multimodal Radiology LLMs via Dual-Extractor Disagreement and Deterministic Counterfactual Constraints

Xiong Luo^*
Department of Information Technology, Uppsala University, Uppsala, Sweden
* Corresponding Author: Xiong Luo. Email: email

Computers, Materials & Continua https://doi.org/10.32604/cmc.2026.081416

Received 02 March 2026; Accepted 13 April 2026; Published online 14 May 2026

Abstract

Large multimodal models (LMMs) can produce fluent radiology reports, yet two clinically important error modes remain common: unsupported assertions and missed findings. Optimizing both under open supervision remains difficult because many pipelines still rely on overlapping parser families during training and evaluation. This paper introduces Truth-Anchored Dual-Extractor Counterfactual-Constrained Training (TA-DECT), which combines an ontology-derived atomic finding interface with four coupled objectives: structured prediction, dual-extractor minimax consistency on generated reports, deterministic counterfactual selectivity under evidence removal, and label-anchored completeness. In matched-path internal comparisons across chest radiographs (CheXpert, MIMIC-CXR, MIMIC-CXR-JPG) and chest computed tomography (CT; CT-RATE), TA-DECT improves truth-anchored F1 while reducing both missed-finding and unsupported-assertion rates, with concurrent gains in calibration and selectivity. On held-out region-of-interest (ROI) datasets (MS-CXR, VinDr-CXR), it also improves coarse evidence linkage and intervention-targeted confidence responses under occlusion. In this revision, the strongest claims are kept explicitly anchored to structured labels and ROI references, counterfactual evidence-sensitivity summaries are interpreted with bootstrap uncertainty, and parser-derived report metrics are retained only as supplementary diagnostics.

Keywords

Multimodal radiology; large language models; report generation; counterfactual training; evidence grounding; structured labels
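
To make the two error modes named in the abstract concrete, the sketch below scores one study by comparing a set of predicted atomic findings against a set of reference findings derived from structured labels. It is illustrative only: the function name `finding_error_rates`, the finding strings, and the exact rate definitions (missed findings normalized by the reference set, unsupported assertions normalized by the predicted set) are assumptions made here and are not taken from the paper, which may define these quantities differently.

```python
from typing import Iterable, Set, Tuple


def finding_error_rates(
    predicted: Iterable[str], reference: Iterable[str]
) -> Tuple[float, float, float]:
    """Return (f1, missed_rate, unsupported_rate) for a single study.

    Assumed definitions (hypothetical, not the paper's):
      missed_rate      = |reference - predicted| / |reference|
      unsupported_rate = |predicted - reference| / |predicted|
      f1               = 2 * |predicted & reference| / (|predicted| + |reference|)
    """
    pred: Set[str] = set(predicted)
    ref: Set[str] = set(reference)

    true_positives = len(pred & ref)

    # A finding in the reference labels that the report never asserts.
    missed_rate = len(ref - pred) / len(ref) if ref else 0.0
    # A finding the report asserts that the reference labels do not support.
    unsupported_rate = len(pred - ref) / len(pred) if pred else 0.0

    denom = len(pred) + len(ref)
    # Convention chosen here: two empty sets count as perfect agreement.
    f1 = 2 * true_positives / denom if denom else 1.0
    return f1, missed_rate, unsupported_rate


if __name__ == "__main__":
    predicted = {"pleural_effusion", "cardiomegaly", "pneumothorax"}
    reference = {"pleural_effusion", "cardiomegaly", "atelectasis", "edema"}
    f1, missed, unsupported = finding_error_rates(predicted, reference)
    print(f"F1={f1:.3f} missed={missed:.3f} unsupported={unsupported:.3f}")
```

Under these assumed definitions the two rates are the complements of recall and precision over atomic findings, so a method can lower one while raising the other. That is why the abstract reports both alongside F1.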
