Open Access
ARTICLE
Head-Body Guided Deep Learning Framework for Dog Breed Recognition
1 Department of Computer Science, Yonsei University, Seoul, 03722, Republic of Korea
2 School of Information Technology, Murdoch University, Perth, WA 6150, Australia
3 Research Department, Chung-Ang University, Seoul, 06974, Republic of Korea
4 Department of Design Innovation, Sejong University, Seoul, 05006, Republic of Korea
* Corresponding Authors: Mi Young Lee. Email: ; Jakyoung Min. Email:
Computers, Materials & Continua 2025, 85(2), 2935-2958. https://doi.org/10.32604/cmc.2025.069058
Received 13 June 2025; Accepted 05 August 2025; Issue published 23 September 2025
Abstract
Fine-grained dog breed classification presents significant challenges due to subtle inter-class differences, pose variations, and intra-class diversity. To address these complexities and the limitations of traditional handcrafted approaches, a novel and efficient two-stage Deep Learning (DL) framework tailored for robust fine-grained classification is proposed. In the first stage, a lightweight object detector, YOLOv8n (You Only Look Once version 8 Nano), is fine-tuned to localize both the head and the full body of the dog in each image. In the second stage, a dual-stream Vision Transformer (ViT) architecture independently processes the detected head and body regions, enabling the extraction of region-specific, complementary features. This dual-path approach improves feature discriminability by capturing localized cues that are vital for distinguishing visually similar breeds. The proposed framework introduces several key innovations: (1) a modular and lightweight head–body detection pipeline that balances accuracy with computational efficiency, (2) a region-aware ViT model that leverages spatial attention for enhanced fine-grained recognition, and (3) a training scheme incorporating advanced augmentations and structured supervision to maximize generalization. These contributions collectively enhance model performance while maintaining deployment efficiency. Extensive experiments conducted on the Tsinghua Dogs dataset validate the effectiveness of the approach. The model achieves an accuracy of 90.04%, outperforming existing State-of-the-Art (SOTA) methods across all key evaluation metrics. Furthermore, statistical significance testing confirms the robustness of the observed improvements over multiple baselines. The proposed method presents an effective solution for breed recognition tasks and shows strong potential for broader applications, including pet surveillance, veterinary diagnostics, and cross-species classification. Notably, it achieves an accuracy of 96.85% on the Oxford-IIIT Pet dataset, demonstrating its robustness across different species and breeds.
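For concreteness, the following minimal sketch shows one way the two-stage pipeline described above could be wired together. It is not the paper's implementation: the fine-tuned checkpoint name (`yolov8n_dog_headbody.pt`), the class mapping (0 = head, 1 = body), the choice of timm ViT-Base backbones for both streams, and concatenation-based feature fusion are all illustrative assumptions.

```python
# Hypothetical sketch of the head-body guided two-stage pipeline.
# Assumptions (not taken from the paper): a YOLOv8n detector fine-tuned
# with two classes (0 = head, 1 = body), two timm ViT-Base streams,
# and fusion of the pooled features by simple concatenation.
import torch
import torch.nn as nn
import timm
from ultralytics import YOLO
from PIL import Image
from torchvision import transforms

NUM_BREEDS = 130  # the Tsinghua Dogs dataset covers 130 breed categories

# Hypothetical fine-tuned head/body detection weights.
detector = YOLO("yolov8n_dog_headbody.pt")

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5] * 3, std=[0.5] * 3),
])

class DualStreamViT(nn.Module):
    """Two ViT streams (head crop, body crop) fused by concatenation."""
    def __init__(self, num_classes: int):
        super().__init__()
        self.head_vit = timm.create_model("vit_base_patch16_224",
                                          pretrained=True, num_classes=0)
        self.body_vit = timm.create_model("vit_base_patch16_224",
                                          pretrained=True, num_classes=0)
        fused_dim = self.head_vit.num_features + self.body_vit.num_features
        self.classifier = nn.Linear(fused_dim, num_classes)

    def forward(self, head_img: torch.Tensor, body_img: torch.Tensor):
        fused = torch.cat([self.head_vit(head_img),
                           self.body_vit(body_img)], dim=1)
        return self.classifier(fused)

def classify(image_path: str, model: DualStreamViT) -> torch.Tensor:
    """Stage 1: detect head/body crops. Stage 2: dual-stream breed scores."""
    image = Image.open(image_path).convert("RGB")
    det = detector(image_path)[0]
    crops = {}
    for box, cls in zip(det.boxes.xyxy, det.boxes.cls):
        x1, y1, x2, y2 = box.int().tolist()
        crops[int(cls)] = image.crop((x1, y1, x2, y2))  # last box per class
    # Fall back to the full image if a region was not detected.
    head = preprocess(crops.get(0, image)).unsqueeze(0)
    body = preprocess(crops.get(1, image)).unsqueeze(0)
    with torch.no_grad():
        return model(head, body).softmax(dim=1)

model = DualStreamViT(NUM_BREEDS).eval()
```

Concatenating the two pooled feature vectors is only the simplest fusion choice; the paper's region-aware spatial attention mechanism may combine the streams differently.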
Copyright © 2025 The Author(s). Published by Tech Science Press. This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.