Current adversarial attacks against deep learning models achieve high success rates in the white-box scenario. However, they often exhibit weak transferability in the black-box scenario, especially when attacking models equipped with defense mechanisms. In this work, we propose a new transfer-based black-box attack called the channel decomposition attack method (CDAM). It can attack multiple black-box models by enhancing the transferability of the adversarial examples. On the one hand, it tunes the gradient and stabilizes the update direction by decomposing the channels of the input example and calculating the aggregate gradient. On the other hand, it helps to escape from local optima by initializing the data point with random noise. Moreover, it can be flexibly combined with other transfer-based attacks. Extensive experiments on the standard ImageNet dataset show that our method significantly improves the transferability of adversarial attacks. Compared with the state-of-the-art method, our approach improves the average success rate from 88.2% to 96.6% when attacking three adversarially trained black-box models, demonstrating the remaining shortcomings of existing deep learning models.
With the rapid development and remarkable success of deep neural networks (DNNs) in various tasks [
Adversarial attacks can be divided into white-box and black-box attacks. In the white-box scenario, the attacker has full knowledge of the target model, including its architecture, parameters, and trainable weights. If the adversarial examples are crafted by accessing the gradient of the target model, the attacks are called gradient-based attacks [
In this work, we propose a new transfer-based black-box attack, called the Channel Decomposition Attack Method (CDAM), to improve the transferability of adversarial examples. Specifically, it decomposes the three channels of the original red-green-blue (RGB) image and uses zero-value padding, so that each channel individually constitutes a three-channel image; together with the original image, these form a set of images for gradient calculation. CDAM tunes the current gradient by aggregating the gradients over this set to stabilize the update direction. To escape from local optima, CDAM adds or subtracts random noise drawn from the standard normal distribution to initialize the data point at each iteration. Besides, CDAM can be combined with other transfer-based black-box attacks [

We propose a new transfer-based black-box attack method. Different from existing methods, it considers each channel of the input separately when attacking the substitute model, and calculates the aggregated gradient to tune the gradient direction and stabilize the gradient update. CDAM initializes the data point with random noise to escape from local optima, which reduces the dependence on substitute models and generates adversarial examples that can attack multiple black-box models. Besides, it can be combined with other transfer-based black-box attack methods, which further enhances the transferability of the crafted adversarial examples. Compared to the state-of-the-art method, CDAM obtains the highest average attack success rates. Under the ensemble-model setting, our integrated method achieves an average success rate of 96.6% on three adversarially trained black-box models, which is higher than the 85.6% of the current best method.

The rest of the paper is organized as follows: Section 2 summarizes the related work; Section 3 introduces the implementation details of CDAM; Section 4 presents the experimental results and evaluates the performance of CDAM; and Section 5 concludes the work and describes future directions.
Given a substitute model
In this work, we use the
Transfer-based attacks transfer adversarial perturbations crafted on a substitute model to target models. However, due to their high dependence on the substitute models, they easily fall into local optima, which is also called 'overfitting' the substitute model.
The process of generating adversarial examples can also be treated as the training process of a neural network. From this perspective, DIM, SIM, and AAM can all be viewed as ways to improve transferability through data augmentation.
However, to the best of our knowledge, existing attacks treat the RGB channels as a whole at each iteration to improve transferability, without considering the impact of each channel individually. During each iteration of the attack, we consider the impact of each channel of the RGB input. The RGB input channels are decomposed and padded with zero-value matrices, with each channel forming a separate three-channel input. We feed these inputs into the model to calculate the gradients, and then compute the aggregated gradient to tune the gradient update direction. We formulate this as
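To make the decomposition concrete, the following is a minimal PyTorch sketch, assuming batched inputs of shape (N, 3, H, W); the helper name `decompose_channels` is ours, not the paper's:

```python
import torch

def decompose_channels(x):
    """Split an RGB batch into zero-padded single-channel variants.

    x: tensor of shape (N, 3, H, W). For each colour channel c, the other
    two channels are replaced with zeros, so every variant remains a valid
    three-channel input. Returns the original image plus the three variants.
    """
    variants = [x]
    for c in range(3):
        masked = torch.zeros_like(x)   # zero-value padding for the other channels
        masked[:, c] = x[:, c]         # keep only channel c
        variants.append(masked)
    return variants
```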
PGD improves the attack success rate by adding random noise, which essentially perturbs the pixel values of the images. From this, we can infer that
Based on the above analysis, we propose CDAM. It calculates the aggregated gradient to tune the direction of the gradient update. Its formula can be expressed as
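As an illustration, the hedged sketch below gives one plausible reading of the aggregation: average the loss gradients over the original input and its three zero-padded variants, reusing `decompose_channels` from the sketch above. The equal weighting is our assumption:

```python
import torch
import torch.nn.functional as F

def aggregated_gradient(model, x, y):
    """Average the loss gradients over the original input and its three
    zero-padded single-channel variants (equal weighting is our assumption)."""
    grads = []
    for v in decompose_channels(x):
        v = v.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(v), y)       # classification loss on each variant
        grads.append(torch.autograd.grad(loss, v)[0])
    return torch.stack(grads).mean(dim=0)         # aggregated gradient
```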
We add or subtract random noise drawn from the standard normal distribution, thus initializing the position of the input to escape from local optima. The gradient can be formulated as
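A hedged sketch of this initialization step; the noise scale `kappa` and the coin flip between adding and subtracting the noise are illustrative assumptions, not the paper's exact settings:

```python
import torch

def noisy_init(x_adv, kappa=0.05):
    """Jitter the current adversarial example with standard-normal noise
    before computing gradients, to help escape local optima."""
    noise = kappa * torch.randn_like(x_adv)            # standard normal noise
    sign = 1.0 if torch.rand(1).item() < 0.5 else -1.0 # add or subtract at random
    return (x_adv + sign * noise).clamp(0.0, 1.0)      # keep pixels valid
```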
In summary, at each iteration CDAM first initializes the input with random noise, then computes the aggregated gradient over the channel-decomposed set of images, and finally uses this gradient to update the adversarial example.
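Putting the pieces together, here is a hedged end-to-end sketch of a CDAM-style attack loop. It reuses the helpers above and adopts an MI-FGSM-style momentum update, since the experiments below follow the MI-FGSM attack setting; the step size, iteration count, and `kappa` are illustrative values rather than the authors' exact settings:

```python
import torch

def cdam_attack(model, x, y, eps=16 / 255, steps=10, mu=1.0, kappa=0.05):
    """Hedged sketch of a CDAM-style attack on a substitute model."""
    alpha = eps / steps                      # fixed per-step size
    g = torch.zeros_like(x)                  # accumulated momentum
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_init = noisy_init(x_adv, kappa)    # 1) random-noise initialization
        grad = aggregated_gradient(model, x_init, y)  # 2) channel-aggregated gradient
        # 3) normalize, accumulate momentum, and take a sign step (MI-FGSM style)
        g = mu * g + grad / grad.abs().mean(dim=(1, 2, 3), keepdim=True)
        x_adv = x_adv + alpha * g.sign()
        # 4) project back into the eps-ball around x and the valid pixel range
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0.0, 1.0)
    return x_adv.detach()
```

When CDAM is combined with transfer-based attacks such as TIM or DIM (the CCDAM setting evaluated below), the corresponding input transformations would be applied inside the same loop.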
ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2012 dataset [
We compare against the state-of-the-art methods AAM [
We validate the effectiveness of CDAM on four popular normally trained models, namely Inception-v3 (M1) [
We use the attack success rate to compare the performance of different methods. The success rate, an important metric in adversarial attacks, is the number of misclassified adversarial examples divided by the total number of images.
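For clarity, a minimal sketch of this metric (the helper name `success_rate` is ours):

```python
import torch

@torch.no_grad()
def success_rate(model, x_adv, y_true):
    """Fraction of adversarial examples the model misclassifies
    (the metric described above; higher means a stronger attack)."""
    preds = model(x_adv).argmax(dim=1)
    return (preds != y_true).float().mean().item()
```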
We follow the attack setting in MI-FGSM: the maximum perturbation of
We first perform two adversarial attacks, i.e., AAM and our proposed CDAM, on a single neural network. We craft the adversarial examples on four normally trained neural networks and test them on seven neural networks.
| Model | Attack | M1 | M2 | M3 | M4 | Ma | Mb | Mc |
|---|---|---|---|---|---|---|---|---|
| M1 | AAM | 100.0* | 82.9 | 80.5 | 73.6 | 40.8 | 38.2 | 20.9 |
| M1 | CDAM | 100.0* | | | | | | |
| M2 | AAM | 87.0 | 99.8* | 83.8 | 76.2 | 51.2 | 48.6 | 31.7 |
| M2 | CDAM | | | | | | | |
| M3 | AAM | 89.7 | 85.7 | 99.0* | 81.7 | 62.7 | 55.5 | 47.4 |
| M3 | CDAM | | | | | | | |
| M4 | AAM | 83.0 | 77.3 | 76.9 | 100.0* | 48.6 | 42.4 | 29.7 |
| M4 | CDAM | | | | | | | |

* indicates the white-box attack, where the target model is the substitute model itself.
We can observe that CDAM outperforms AAM on both the normally trained and the adversarially trained black-box models, while maintaining comparable success rates in the white-box scenario. In particular, when attacking the adversarially trained black-box models, CDAM outperforms AAM by a large margin, which is more practical for realistic scenarios.
For instance, when the substitute model is M1, both AAM and CDAM achieve a 100.0% success rate in the white-box scenario. However, in the black-box scenario, CDAM achieves an average success rate of 87.8% on the other three normally trained black-box models, which is 8.8% higher than AAM.
We also validate the attack effectiveness of our proposed CDAM combined with other transfer-based attacks, such as TIM, DIM, and SIM. Since SIM is a special case of both AAM and CDAM, we validate the effect of combining TIM and DIM with AAM and with CDAM, denoted CAAM and CCDAM, respectively. The results are shown in the following table.
| Model | Attack | M1 | M2 | M3 | M4 | Ma | Mb | Mc |
|---|---|---|---|---|---|---|---|---|
| M1 | CAAM | | 90.4 | 86.4 | 82.6 | 71.7 | 68.1 | 50.7 |
| M1 | CCDAM | 99.2* | | | | | | |
| M2 | CAAM | 91.5 | 98.9* | 88.8 | 82.3 | 77.0 | 72.8 | 63.0 |
| M2 | CCDAM | | | | | | | |
| M3 | CAAM | 91.4 | 89.5 | | 87.1 | 82.6 | 80.1 | 77.4 |
| M3 | CCDAM | | | 98.1* | | | | |
| M4 | CAAM | 88.5 | 85.2 | 87.6 | | 79.2 | 74.8 | 65.0 |
| M4 | CCDAM | | | | 99.0* | | | |
In general, the attacks combined with CDAM achieve better transferability than those combined with AAM. Taking M1 as the substitute model, for example, the success rates of CCDAM on the three adversarially trained black-box models outperform those of CAAM by a clear margin of 13.2%∼20.0%. Such remarkable improvements demonstrate the effectiveness of the proposed method.
An adversarial example is more likely to successfully attack another black-box model if it can fool multiple models at the same time. MI-FGSM has shown that attacking multiple models can effectively improve the transferability of adversarial examples. Therefore, to fully validate the effectiveness of CDAM, we use the ensemble-model attack proposed in [
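As a sketch of this setting, the logit-fusion formulation popularized by MI-FGSM can be expressed as follows; the equal weights and the helper name `ensemble_logits` are our assumptions:

```python
import torch

def ensemble_logits(models, x, weights=None):
    """Fuse the logits of several white-box models (equal weights assumed
    here), so the crafted example must fool all of them simultaneously."""
    if weights is None:
        weights = [1.0 / len(models)] * len(models)  # illustrative equal weighting
    return sum(w * m(x) for w, m in zip(weights, models))
```

The fused logits would then replace the single model's output inside the aggregated-gradient computation sketched earlier.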
Since adversarial examples are crafted on four normally trained models, all attacks have very similar success rates on these four models. Therefore, we only report the attack success rates on three adversarially trained black-box models.
As shown in the following table, under the ensemble-model setting, CCDAM achieves a higher average success rate than CAAM on the three adversarially trained black-box models.
| Attack | Ma | Mb | Mc | Average |
|---|---|---|---|---|
| CAAM | 91.6 | 89.4 | 86.4 | 89.1 |
| CCDAM | | | | 96.6 |
For the number of input examples
Through the above analysis, the change of
In this work, we propose a new transfer-based black-box attack, called CDAM, to improve transferability. Specifically, CDAM decomposes the input channels and pads them with zero-value matrices to generate a set of images for tuning the gradient direction and stabilizing the gradient update. During each iteration, it initializes the data point with random noise to escape from local optima, further improving the transferability of the adversarial attacks. Extensive experiments show that the proposed CDAM significantly improves the transferability of adversarial attacks in the black-box scenario. In future work, we plan to reduce the memory and time overhead of CDAM and increase the speed of generating adversarial examples.
This work was supported by
The authors declare that they have no conflicts of interest to report regarding the present study.