Open Access
ARTICLE
Image Enhancement Combined with LLM Collaboration for Low-Contrast Image Character Recognition
1 School of Intelligent Manufacturing and Control Engineering, Shanghai Polytechnic University, Shanghai, 201209, China
2 School of Electrical Engineering and Telecommunications, UNSW Sydney, Sydney, NSW 2052, Australia
* Corresponding Author: Xuan Jiang. Email:
Computers, Materials & Continua 2025, 85(3), 4849-4867. https://doi.org/10.32604/cmc.2025.067919
Received 16 May 2025; Accepted 21 July 2025; Issue published 23 October 2025
Abstract
Industrial character recognition on cast steel is often compromised by corrosion, surface defects, and low contrast, all of which hinder the extraction of reliable visual information. The problem is compounded by the scarcity of large-scale annotated datasets and by complex noise patterns in real-world factory environments, which together make conventional optical character recognition (OCR) techniques and standard deep learning models unreliable. To address these limitations, this study proposes a unified framework that integrates adaptive image preprocessing with collaborative reasoning among large language models (LLMs). A Biorthogonal 4.4 (bior4.4) wavelet transform is adaptively tuned using differential evolution (DE) to enhance character edge clarity, suppress background noise, and retain morphological structure, thereby improving input quality for subsequent recognition. A structured three-round debate mechanism is further introduced within a multi-agent architecture, employing GPT-4o and Gemini-2.0-flash as role-specialized agents that perform complementary inference and reach consensus. The proposed system is evaluated on a proprietary dataset of 48 high-resolution images collected under diverse industrial conditions. Experimental results show that the combination of DE-based enhancement and multi-agent collaboration consistently outperforms traditional baselines and ablated models, achieving an F1-score of 94.93% and a longest common subsequence (LCS) accuracy of 93.30%. These results demonstrate the effectiveness of integrating signal processing with multi-agent LLM reasoning to achieve robust and interpretable OCR in visually complex and data-scarce industrial environments.
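The abstract reports an LCS accuracy without stating how it is computed. As an illustration only, the sketch below assumes one common definition: the length of the longest common subsequence between the recognized string and the ground-truth string, divided by the ground-truth length. The function names `lcs_length` and `lcs_accuracy` and the sample strings are hypothetical and are not taken from the article, whose exact formula may differ.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-wise DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]  # column 0: LCS with the empty prefix of b is 0
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_accuracy(predicted: str, reference: str) -> float:
    """ASSUMED definition: LCS length divided by the reference length."""
    if not reference:
        # Edge case chosen for this sketch: empty vs. empty counts as perfect.
        return 1.0 if not predicted else 0.0
    return lcs_length(predicted, reference) / len(reference)


if __name__ == "__main__":
    # Hypothetical strings: one character of a six-character stamp is misread.
    print(lcs_accuracy("A8C123", "ABC123"))
```

Under this assumed definition, a single substituted character still earns credit for every other character that appears in the correct order, which is why an LCS-style score can be more forgiving than exact-match accuracy on partially corroded markings.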
Copyright © 2025 The Author(s). Published by Tech Science Press. This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.