Open Access
ARTICLE
Bird Species Classification Using Image Background Removal for Data Augmentation
Department of Computer Science and Information Engineering, National Quemoy University, Kinmen, 892009, Taiwan
* Corresponding Author: Yu-Xiang Zhao. Email:
(This article belongs to the Special Issue: Computer Vision and Image Processing: Feature Selection, Image Enhancement and Recognition)
Computers, Materials & Continua 2025, 84(1), 791-810. https://doi.org/10.32604/cmc.2025.065048
Received 02 March 2025; Accepted 13 May 2025; Issue published 09 June 2025
Abstract
Bird species classification is not only a challenging topic in artificial intelligence but also a domain closely related to environmental protection and ecological research. In addition, running small neural networks for edge computing on low-power devices is an important research direction. In this paper, we use the EfficientNetV2B0 model for bird species classification, applying transfer learning on a dataset of 525 bird species. We also employ the BiRefNet model to remove the backgrounds from images in the training set; the resulting background-removed images are mixed with the original training set as a form of data augmentation. We expect these background-removed images to help the model focus on key features, and by combining this augmentation with transfer learning, we train a highly accurate and efficient bird species classification model. Training is divided into a transfer learning stage and a fine-tuning stage. In the transfer learning stage, only the newly added custom layers are trained, while in the fine-tuning stage, all pre-trained layers except the batch normalization layers are fine-tuned. According to the experimental results, the proposed model not only has a size advantage over other models but also outperforms them on various metrics, achieving an accuracy of 99.54% and a precision of 99.62% and thus demonstrating both lightweight design and high accuracy. To confirm the credibility of these results, we use heatmaps to interpret the model; the heatmaps show that it clearly highlights the feature regions of an image. We also perform 10-fold cross-validation to further verify its reliability. In summary, this paper proposes a model with low training cost and high accuracy, making it suitable for deployment on edge computing devices to provide lighter and more convenient services.
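The two-stage scheme described in the abstract (freeze the pretrained backbone and train only the new head, then fine-tune everything except the batch normalization layers) can be sketched in TensorFlow/Keras as follows. The input size, dropout rate, and optimizer settings here are illustrative assumptions, not the paper's exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 525  # bird species in the dataset

# Stage 1: transfer learning -- freeze the ImageNet-pretrained backbone
# and train only the newly added custom layers.
base = tf.keras.applications.EfficientNetV2B0(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3))
base.trainable = False

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.2),  # illustrative custom head, not the paper's exact layers
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, epochs=...)  # train only the new head

# Stage 2: fine-tuning -- unfreeze all pretrained layers
# except the BatchNormalization layers.
base.trainable = True
for layer in base.layers:
    if isinstance(layer, layers.BatchNormalization):
        layer.trainable = False
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),  # low LR for fine-tuning
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, epochs=...)  # fine-tune the whole network
```

Keeping batch normalization layers frozen during fine-tuning preserves the statistics learned on ImageNet, which is a common way to stabilize fine-tuning with small batch sizes.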
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.