Open Access

REVIEW

Transforming Healthcare with State-of-the-Art Medical-LLMs: A Comprehensive Evaluation of Current Advances Using Benchmarking Framework

Himadri Nath Saha¹, Dipanwita Chakraborty Bhattacharya²,*, Sancharita Dutta³, Arnab Bera³, Srutorshi Basuray⁴, Satyasaran Changdar⁵, Saptarshi Banerjee⁶, Jon Turdiev⁷
1 Department of Computer Science, SNEC, University of Calcutta, Kolkata, 700073, India
2 Department of Computer Science, PRTGC, West Bengal State University, Barasat, 700126, India
3 Department of Computer Science & Engineering, The Neotia University, Kolkata, 743368, India
4 Department of Computer Science & Engineering, University College of Science and Technology, University of Calcutta, Kolkata, 700009, India
5 Department of Food Science, University of Copenhagen, Copenhagen, 1165, Denmark
6 Department of Computer Science, Illinois Institute of Technology, 10 West 35th Street, Chicago, IL 60616, USA
7 Department of Computer Science, San Francisco State University, 1600 Holloway Avenue, San Francisco, CA 94132, USA
* Corresponding Author: Dipanwita Chakraborty Bhattacharya. Email: dcb.wbes@gmail.com

Computers, Materials & Continua https://doi.org/10.32604/cmc.2025.070507

Received 17 July 2025; Accepted 16 September 2025; Published online 25 November 2025

Abstract

The emergence of Medical Large Language Models (Med-LLMs) has significantly transformed healthcare. Med-LLMs serve as tools that enhance clinical practice through applications in decision support, documentation, and diagnostics. This review evaluates the performance of leading Med-LLMs, including GPT-4Med, Med-PaLM, MEDITRON, PubMedGPT, and MedAlpaca, across diverse medical datasets, and provides graphical comparisons of their effectiveness in distinct healthcare domains. The study introduces a domain-specific categorization system that matches these models to their best-suited applications in clinical decision-making, documentation, drug discovery, research, patient interaction, and public health. The paper addresses the challenges of deploying Med-LLMs, emphasizing trustworthiness and explainability as essential requirements for healthcare AI. It presents current evaluation techniques that improve model transparency in high-stakes medical contexts, analyzes regulatory frameworks, and examines benchmarking datasets such as MedQA, MedMCQA, PubMedQA, and MIMIC. By identifying ongoing challenges in bias mitigation, reliability, and ethical compliance, this work serves as a resource for selecting appropriate Med-LLMs and outlines future directions in the field. The analysis offers a roadmap for developing Med-LLMs that balance technological innovation with the trust and transparency required for clinical integration, a perspective often overlooked in existing literature.

Keywords

Medical large language models (Med-LLM); AI in healthcare; natural language processing (NLP) in medicine; fine-tuning medical LLMs; retrieval-augmented generation (RAG) in medicine; multi-modal learning in healthcare; explainability and transparency in medical AI; FDA regulations for AI in medicine; evaluation and benchmarking of medical large language models