Deep convolutional neural networks keep getting deeper, but the resulting model complexity makes them prone to overfitting during training. Dropout, one of the crucial tricks, prevents units from co-adapting too much by randomly dropping neurons during training. It effectively improves the performance of deep networks but ignores the differences between neurons. To address this issue, this paper presents a new dropout method called guided dropout, which selects the neurons to switch off according to the differences between convolution kernels and preserves the informative neurons. It uses an unsupervised clustering algorithm to cluster similar neurons in each hidden layer and applies dropout with a certain probability within each cluster. This preserves hidden layer neurons with different roles while maintaining the model’s sparsity and generalization, which improves the ability of the hidden layer neurons to learn features. We evaluate our approach against two standard dropout networks on three well-established public object detection datasets. Experimental results on multiple datasets show that the proposed method improves false positives, the precision-recall curve, and average precision without increasing the amount of computation. The improved performance of guided dropout can be attributed to the guidance of shallow learning in the networks. The concept of guided dropout may also benefit other vision tasks.
In recent years, deep learning techniques [
Why, then, should dropout switch off neurons at random? In practice, hidden layer neurons with different attributes play different roles, while some neurons have similar attributes. The roles of neurons should therefore serve as prior knowledge when deciding which neurons to switch off, rather than all neurons being treated equally. We propose a method called “guided dropout”, whose main idea is to cluster similar neurons in each hidden layer and apply dropout with a certain probability within each cluster. This preserves hidden layer neurons with different roles while maintaining the sparsity and generalization of the model, and at the same time increases the drop probability for similar neurons in the hidden layers. In this sense, the deep model is guided by a shallow model.
Avoiding handcrafted feature engineering has both advantages and shortcomings. Appropriate guidance plays a vital role in deep learning, as it does in daily life. Using shallow learning for guidance helps deep learning converge faster and improves the accuracy of the trained models. This paper focuses on using a shallow learning model in dropout to set some neurons to zero with a selective probability according to the differences between neurons.
In this paper, the proposed guided dropout uses the results obtained from shallow learning to reduce the drop probability of the hidden layer neurons that contribute differently to the object detection task [
This paper is organized as follows. Section 2 outlines related work on dropout methods. Section 3 describes guided dropout in detail. Sections 4 and 5 present the experimental results and conclusions.
Recently some other alternative dropout methods have been proposed. These include a method of adaptive dropout [
Dropout has also been used innovatively by Wan et al. to create a new method called DropConnect [
More than ten years ago, Dalal et al. [
In the aspect of deep models, Ren Shaoqing and He Kaiming et al. improved Faster R-CNN based on the deep model and a series of subsequent improvements [
During a training epoch, the standard dropout method switches off neurons in the hidden layer with a certain probability, which increases the sparseness of the network. Each training sample provides gradients for a different, randomly sampled architecture, so the final neural network efficiently represents a large ensemble of neural networks with good generalization capability. In the dropout method, a thinned network is sampled from the complete set of possible networks with a certain probability for each mini-batch. Gradient descent is then applied to the thinned network. This is an embodiment of the ideas of the “ensemble model” [
In fact, many hidden layer neurons in a deep network have similar attributes. In other words, hidden layer neurons with similar attributes can be approximated from one another by some type of transformation. Hidden layer neurons with different attributes play more critical roles. For example, in image recognition it is generally necessary for convolution kernels to describe types of features such as edges [
Therefore, randomly setting all neurons to zero with the same probability is not optimal. Neurons should instead be set to zero selectively according to the differences among the hidden layer’s neurons. In other words, hidden layer neurons with different roles should be set to zero with a smaller probability than hidden layer neurons with similar effects.
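The contrast between uniform and selective dropping can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in this paper: the helper name `dropout_mask`, the probabilities 0.5 and 0.2, the choice of which neurons count as "distinct", and the rescaling of kept neurons by 1/(1-p) (so-called inverted dropout) are all hypothetical.

```python
import random


def dropout_mask(drop_probs, rng):
    """Return one multiplier per neuron: 0.0 if dropped, else 1/(1-p).

    drop_probs[i] is the probability that neuron i is switched off.
    Rescaling kept neurons by 1/(1-p) is an assumption of this sketch;
    it keeps the expected activation unchanged.
    """
    mask = []
    for p in drop_probs:
        if not 0.0 <= p < 1.0:
            raise ValueError("drop probability must be in [0, 1)")
        mask.append(0.0 if rng.random() < p else 1.0 / (1.0 - p))
    return mask


rng = random.Random(0)
activations = [0.8, 0.1, 0.5, 0.9]

# Standard dropout: every neuron shares the same drop probability.
uniform = dropout_mask([0.5] * 4, rng)

# Selective dropout: neurons 0 and 3 are assumed (hypothetically) to play
# distinct roles, so they receive a smaller drop probability.
selective = dropout_mask([0.2, 0.5, 0.5, 0.2], rng)

# Apply the selective mask to the layer's activations.
thinned = [a * m for a, m in zip(activations, selective)]
```

The only difference between the two calls is the per-neuron probability vector, which is the quantity that a guided scheme would derive from the structure of the hidden layer.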
This paper proposes an alternative dropout method based on unsupervised shallow learning. The main idea is to first use an unsupervised clustering algorithm to cluster similar neurons in each hidden layer and then apply dropout with a certain probability within each cluster. This method preserves hidden layer neurons with different roles while maintaining the sparsity and generalization of the model.
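The idea of clustering neurons and then dropping within each cluster can be sketched as follows. This is a simplified illustration, not Algorithm 1 of this paper. Several choices are assumptions made only for the example: plain Lloyd's k-means with random initialization and a fixed number of clusters stands in for the clustering procedure, each neuron is represented by its flattened kernel weights, a neuron that forms a cluster on its own is never dropped, and at least one neuron per cluster is always kept. The names `kmeans` and `guided_dropout_mask` are hypothetical.

```python
import random


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns a cluster label for each point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; an empty cluster
        # keeps its previous center.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def guided_dropout_mask(kernels, k, p, rng):
    """Return (labels, mask); mask[i] is 1 if neuron i is kept, else 0."""
    labels = kmeans(kernels, k, rng)
    mask = [1] * len(kernels)
    for c in set(labels):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if len(members) == 1:
            continue  # assumption: a neuron with no similar peers is kept
        dropped = [i for i in members if rng.random() < p]
        if len(dropped) == len(members):
            # assumption: keep one representative of every cluster
            dropped.remove(rng.choice(dropped))
        for i in dropped:
            mask[i] = 0
    return labels, mask


# Toy layer: three nearly identical kernels and one distinct kernel.
kernels = [
    [1.0, 0.0, -1.0],
    [0.9, 0.1, -1.0],
    [1.1, 0.0, -0.9],
    [0.0, 1.0, 0.0],
]
labels, mask = guided_dropout_mask(kernels, k=2, p=0.5, rng=random.Random(0))
```

In this toy layer the three similar kernels fall into one cluster and compete for survival, while the distinct kernel is always preserved, which mirrors the intent described above.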
Consider a neural network with L hidden layers. Let
Different from the standard dropout, as shown in
The detailed algorithm is described in
Algorithm 1 guided dropout |
---|
Notes: [a] The k-means++ [
[b] The definition of S_Dbw [
The experiments were run on a workstation with an Intel Core i7-6900K 3.6 GHz processor, 64 GB of random access memory (RAM), and an NVIDIA Titan Xp 12 GB. We implemented the proposed method in Python 2.7.12 with Compute Unified Device Architecture (CUDA) 10, the CUDA Deep Neural Network library (cuDNN) 7, and our modification of the Caffe library (
We apply guided dropout to Faster R-CNN and call the result guided Faster R-CNN. For convenience, the guided Faster R-CNN proposed in this paper uses the ZF (Zeiler & Fergus) model [
The compared approaches are Faster R-CNN with standard dropout and LDCF.
To evaluate our guided dropout method comprehensively and objectively, we conducted experiments on three well-established public object detection datasets, namely the INRIA person dataset, the Eidgenössische Technische Hochschule Zürich (ETH) pedestrian dataset (Setup 1 (chariot Mk I)) [
The miss rate-false positives per image (FPPI) curves on the INRIA test set, the ETH Setup1 test set, and the Caltech test set are shown in
The average precision (AP) for the different methods on the INRIA test set, ETH Setup1 test set, and Caltech test set are shown in
Methods | AP (%) on INRIA | AP (%) on ETH | AP (%) on Caltech |
---|---|---|---|
Guided Faster R-CNN | 89.2 | 89.0 | 88.6 |
LDCF | 88.8 | 88.4 | 87.8 |
Faster R-CNN | 88.7 | 88.3 | 88.3 |
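For readers who want to reproduce a number of this kind, the sketch below shows one common way to compute AP from a ranked list of detections. It is an assumption of this example, not a statement of the exact protocol behind the table: it uses the non-interpolated definition (the mean of the precision values at each true positive, divided by the number of ground-truth objects), and the function name `average_precision` and the toy inputs are hypothetical. The code assumes Python 3 division semantics.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """Non-interpolated AP over detections ranked by descending score."""
    ranked = sorted(zip(scores, is_true_positive),
                    key=lambda pair: pair[0], reverse=True)
    true_positives = 0
    precision_sum = 0.0
    for rank, (_, hit) in enumerate(ranked, start=1):
        if hit:
            true_positives += 1
            # Precision measured at the rank of this true positive.
            precision_sum += true_positives / rank
    return precision_sum / num_ground_truth if num_ground_truth else 0.0


# Toy example: four detections, three of them correct, four ground-truth objects.
ap = average_precision(
    scores=[0.9, 0.8, 0.7, 0.6],
    is_true_positive=[True, False, True, True],
    num_ground_truth=4,
)
```

Interpolated variants of AP give slightly different values, so comparisons between methods are only meaningful under a single shared definition.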
Examples of the detection of partial images on the INRIA test set, the ETH dataset Setup1 test set and the Caltech test set are shown in
In terms of miss rate, Faster R-CNN is better than LDCF on ETH and Caltech, but LDCF is better than Faster R-CNN on INRIA. In terms of average precision, LDCF is slightly better than Faster R-CNN on INRIA and ETH, but Faster R-CNN is better than LDCF on Caltech. This indicates that, on the whole, the gap between the shallow model and the deep model is greater in miss rate than in average precision. When the human body is large, the miss rate of the shallow learning model is lower than that of the deep model. When the human body is small, the average precision of the deep model is higher than that of the shallow model. Guided Faster R-CNN achieves the best results, which indicates that it combines the advantages of the deep model and the shallow model.
The experimental results above show that the proposed guided dropout achieves the best detection results on all three datasets. The proposed method still generalizes well when the background is complex and changeable. The miss rate-FPPI curves, precision-recall curves, and AP show that guided dropout outperforms standard dropout, thanks to the guidance of shallow learning in the networks.
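The two quantities on the miss rate-FPPI curve can be made concrete with a short sketch. It assumes the usual definitions (miss rate is the fraction of ground-truth objects left undetected, and FPPI is the number of false positives divided by the number of images) evaluated at a single score threshold. The function name `miss_rate_fppi`, the threshold, and the toy inputs are hypothetical, and the matching of detections to ground truth is taken as given.

```python
def miss_rate_fppi(scores, is_true_positive, num_ground_truth, num_images,
                   threshold):
    """Return (miss rate, false positives per image) at one score threshold."""
    # Keep only detections whose confidence reaches the threshold.
    kept = [hit for score, hit in zip(scores, is_true_positive)
            if score >= threshold]
    true_positives = sum(kept)
    false_positives = len(kept) - true_positives
    miss_rate = 1.0 - true_positives / num_ground_truth
    fppi = false_positives / num_images
    return miss_rate, fppi


# Toy example: four detections over two images containing four people in total.
miss_rate, fppi = miss_rate_fppi(
    scores=[0.9, 0.8, 0.7, 0.6],
    is_true_positive=[True, False, True, True],
    num_ground_truth=4,
    num_images=2,
    threshold=0.65,
)
```

Sweeping the threshold from high to low traces out the full curve: each lower threshold admits more detections, which can only lower the miss rate and raise the FPPI.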
This paper proposes an effective dropout method that takes advantage of shallow learning and explores some new ideas for the study of deep learning. Its core contribution is to drop neurons with a guided, reasonable probability based on an unsupervised clustering algorithm, which leads to better results than the standard dropout approach. The proposed method was evaluated on three challenging datasets and achieved the best experimental results. On this basis, we conclude that a proper combination of deep learning and shallow learning may achieve better results. Another conclusion is that the gap between the shallow model and the deep model is greater in miss rate than in average precision. In future work, we will study task-aware guided dropout and simplify the training process. In addition, we will try to apply the concept of guided dropout to improve the performance of deep networks in other fields such as telemedicine [
This work is supported by the
The authors declare that they have no conflicts of interest to report regarding the present study.