Open Access

ARTICLE


CLIP-ASN: A Multi-Model Deep Learning Approach to Recognize Dog Breeds

Asif Nawaz1,*, Rana Saud Shoukat2, Mohammad Shehab1, Khalil El Hindi3, Zohair Ahmed4

1 College of Information Technology, Amman Arab University, Amman, 11953, Jordan
2 University Institute of Information Technology, PMAS-Arid Agriculture University Rawalpindi, Rawalpindi, 46000, Pakistan
3 Department of Computer Science, College of Computer & Information Sciences, King Saud University, Riyadh, 11543, Saudi Arabia
4 School of Computer Science and Engineering, Central South University, Changsha, 410083, China

* Corresponding Author: Asif Nawaz. Email: email

Computers, Materials & Continua 2025, 85(3), 4777-4793. https://doi.org/10.32604/cmc.2025.064088

Abstract

The kingdom Animalia encompasses multicellular, eukaryotic organisms known as animals. Currently, there are approximately 1.5 million identified species of living animals, including over 195 distinct breeds of dogs. Each breed possesses unique characteristics that can be challenging to distinguish. Various computer-based methods, including machine learning, deep learning, transfer learning, and robotics, are employed to identify dog breeds, focusing mainly on image or voice data. Voice-based techniques often face challenges such as noise, distortion, and changes in frequency or pitch, which can impair a model’s performance. Conversely, image-based methods may fail when dealing with blurred images, which can result from poor camera quality or photos taken from a distance. This research presents a hybrid model combining voice and image data for dog breed identification. The proposed method, Contrastive Language-Image Pre-Training-Audio Stacked Network (CLIP-ASN), improves robustness by compensating when one data type is compromised by noise or poor quality. By integrating diverse data types, the model can more effectively identify unique breed characteristics, making it superior to methods relying on a single data type. The key steps of the proposed model are data collection; feature extraction, using Contrastive Language-Image Pre-Training for image-based features and an audio stacked network for voice-based features; co-attention-based classification; and federated learning-based training and distribution. The experimental evaluation shows that the proposed work achieves an accuracy of 89.75%, far exceeding the existing benchmark methods.
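To make the fusion step concrete, the sketch below shows one simple way two modality embeddings (an image vector and a voice vector) can be gated against each other before classification. This is an illustrative toy, not the authors' CLIP-ASN architecture: the 512-dimensional size, the sigmoid gate, and the concatenation scheme are all assumptions standing in for the paper's co-attention module.

```python
import numpy as np

def co_attention_fuse(img_feat, aud_feat):
    """Toy cross-modal gating: score the agreement between the two
    modality vectors, then re-weight each by that agreement before
    concatenating them for a downstream classifier head.
    (Illustrative sketch only; not the authors' implementation.)"""
    # scaled dot-product agreement between the two modalities
    score = img_feat @ aud_feat / np.sqrt(img_feat.size)
    w = 1.0 / (1.0 + np.exp(-score))  # sigmoid gate in (0, 1)
    # a noisy modality with low agreement contributes less to the fusion
    return np.concatenate([w * img_feat, (1.0 - w) * aud_feat])

rng = np.random.default_rng(0)
img = rng.standard_normal(512)  # stand-in for a CLIP-style image embedding
aud = rng.standard_normal(512)  # stand-in for stacked audio-network features
z = co_attention_fuse(img, aud)
print(z.shape)  # (1024,)
```

The gate is what gives a multi-modal model its robustness claim: when one modality is degraded (blurred image, noisy audio), its contribution to the fused vector shrinks rather than corrupting the prediction outright.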

Keywords

Machine learning; ensemble methods; image detection; voice detection; animal breeds

Cite This Article

APA Style
Nawaz, A., Shoukat, R.S., Shehab, M., Hindi, K.E., Ahmed, Z. (2025). CLIP-ASN: A Multi-Model Deep Learning Approach to Recognize Dog Breeds. Computers, Materials & Continua, 85(3), 4777–4793. https://doi.org/10.32604/cmc.2025.064088
Vancouver Style
Nawaz A, Shoukat RS, Shehab M, Hindi KE, Ahmed Z. CLIP-ASN: A Multi-Model Deep Learning Approach to Recognize Dog Breeds. Comput Mater Contin. 2025;85(3):4777–4793. https://doi.org/10.32604/cmc.2025.064088
IEEE Style
A. Nawaz, R. S. Shoukat, M. Shehab, K. E. Hindi, and Z. Ahmed, “CLIP-ASN: A Multi-Model Deep Learning Approach to Recognize Dog Breeds,” Comput. Mater. Contin., vol. 85, no. 3, pp. 4777–4793, 2025. https://doi.org/10.32604/cmc.2025.064088



cc Copyright © 2025 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.