Open Access
ARTICLE
A Hybrid Approach for Query-Based Data Extraction Using Ensemble BERT Model with Walrus Optimization Algorithm
1 Department of Computer Science and Engineering (Data Science), Vignan’s Institute of Management and Technology for Women, Hyderabad, India
2 Department of Computer Science and Engineering, Prasad V Potluri Siddhartha Institute of Technology, Kanuru, India
3 Department of Computer Science and Engineering, AVN Institute of Engineering and Technology, Hyderabad, India
4 Department of Computer Science and Engineering (Data Science), AVN Institute of Engineering and Technology, Hyderabad, India
5 Department of Computer Science and Engineering (AI & ML), BVRIT Hyderabad College of Engineering for Women, Hyderabad, India
6 Department of Computer Science & Information Technology, Koneru Lakshmaiah Education Foundation Deemed to be University, Green Fields, Vaddeswaram, India
* Corresponding Author: Uddagiri Sirisha. Email:
Computers, Materials & Continua 2026, 88(2), 66 https://doi.org/10.32604/cmc.2026.078511
Received 01 January 2026; Accepted 20 April 2026; Issue published 15 June 2026
Abstract
The growing volume of digital text complicates the extraction of relevant information from unstructured data. Transformer models such as BERT, ALBERT, and RoBERTa are powerful, but they can be difficult to tune and to adapt to new domains. To address these issues, a hybrid ensemble BERT model is proposed, optimized using the Walrus Optimization Algorithm (WaOA). The framework combines PCA-based dimensionality reduction, ontology normalization, and K-means clustering to improve semantic comprehension. Experimental results on the SQuAD 2.0 and MS MARCO datasets show that the proposed model outperforms the baseline models. WaOA can improve convergence, reduce training time, and enhance prediction accuracy. The model also improves the semantic relevance of the extracted information, and attention maps visualize its focus on relevant query terms. The method improves efficiency, reduces redundancy, and generalizes better across different query types. The framework performs consistently and reliably across different data conditions, including varying input formats and noise levels, and it can be generalized to multilingual and domain-specific applications. Overall, the framework provides a scalable and reliable solution for real-world information extraction.
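The dimensionality-reduction and clustering steps named in the abstract can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything below is an assumption: the function names `pca` and `kmeans` are hypothetical, the toy vectors stand in for real BERT embeddings, PCA is computed by plain power iteration, and K-means uses a simple farthest-point initialization. Ontology normalization and the WaOA optimizer are not shown.

```python
import math
import random


def pca(rows, n_components, iters=200):
    """Project rows onto their top principal components (power iteration)."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(0)  # fixed seed so the sketch is repeatable
    components = []
    for _ in range(n_components):
        v = [rng.gauss(0, 1) for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the converged direction.
        eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                     for i in range(d))
        components.append(v)
        # Deflate so the next pass converges to the next component.
        cov = [[cov[i][j] - eigval * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return [[sum(c[j] * comp[j] for j in range(d)) for comp in components]
            for c in centered]


def kmeans(points, k, iters=50):
    """Lloyd's algorithm with deterministic farthest-point initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Made-up 4-dimensional vectors standing in for sentence embeddings.
embeddings = [
    [0.0, 0.1, 0.0, 0.2],
    [0.1, 0.0, 0.2, 0.0],
    [0.2, 0.1, 0.1, 0.1],
    [10.0, 10.1, 9.9, 10.2],
    [10.1, 9.9, 10.0, 10.0],
    [9.9, 10.0, 10.2, 10.1],
]
reduced = pca(embeddings, n_components=2)  # 4 dimensions -> 2
labels = kmeans(reduced, k=2)              # group the reduced vectors
print(labels)
```

In practice, library implementations such as scikit-learn's `PCA` and `KMeans` would usually replace these hand-written versions; the pure-Python form is used here only to keep the sketch dependency-free.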
Copyright © 2026 The Author(s). Published by Tech Science Press. This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

