TY  - EJOU
AU  - Ammu, Sai Venkata Akhil 
AU  - Sehra, Sukhjit Singh 
AU  - Sehra, Sumeet Kaur 
AU  - Singh, Jaiteg 

TI  - Amalgamation of Classical and Large Language Models for Duplicate Bug Detection: A Comparative Study
T2  - Computers, Materials & Continua

PY  - 2025
VL  - 83
IS  - 1
SN  - 1546-2226

AB  - Duplicate bug reporting is a critical problem in mining software repositories. Duplicate bug reports can lead to redundant effort, wasted resources, and delayed software releases, so identifying them accurately is essential for streamlining the bug triage process. Several researchers have explored classical information retrieval, natural language processing, text and data mining, and machine learning approaches. The emergence of large language models (LLMs) (ChatGPT and Hugging Face) has introduced a new line of models for semantic textual similarity (STS). Although LLMs have shown remarkable advancements, there remains a need for longitudinal studies to determine whether performance improvements are due to the scale of the models or to the unique embeddings they produce compared with classical encoding models. This study systematically investigates this issue by comparing classical word embedding techniques against LLM-based embeddings for duplicate bug detection. We propose an amalgamation of models that detects duplicate bug reports using both textual and non-textual information about the reports. The empirical evaluation was performed on open-source datasets using established metrics: mean reciprocal rank (MRR), mean average precision (MAP), and recall rate. The experimental results show that combined LLMs can outperform other individual models for duplicate bug detection (recall-rate@k = 68%–74%). These findings highlight the effectiveness of amalgamating multiple techniques in improving the accuracy of duplicate bug report detection.
KW  - Duplicate bug detection; large language models; information retrieval

DO  - 10.32604/cmc.2025.057792