Mutual Adversarial Training: Learning Together is Better Than Going Alone

Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review

14 Scopus Citations

Author(s)

Detail(s)

Original language: English
Pages (from-to): 2364-2377
Number of pages: 14
Journal / Publication: IEEE Transactions on Information Forensics and Security
Volume: 17
Online published: 17 Jun 2022
Publication status: Published - 2022
Externally published: Yes

Abstract

Recent studies have shown that robustness to adversarial attacks can be transferred across deep neural networks. In other words, we can make a weak model more robust with the help of a strong teacher model. In this paper, we ask whether models can 'learn together' and 'teach each other' to achieve better robustness, instead of learning from a static teacher. We study how interactions among models enhance robustness via knowledge distillation. We propose mutual adversarial training (MAT), in which multiple models are trained together and share the knowledge of adversarial examples to achieve improved robustness. MAT allows robust models to explore a larger space of adversarial samples and find more robust feature spaces and decision boundaries. Through extensive experiments on the CIFAR-10, CIFAR-100, and mini-ImageNet datasets, we demonstrate that MAT can effectively improve model robustness and outperform state-of-the-art methods under white-box attacks. In addition, we show that MAT can also mitigate the robustness trade-off among different perturbation types. Specifically, we train specialist models that each learn to defend against a specific perturbation type, and a generalist model that learns to defend against multiple perturbation types by learning from the specialists, which brings as much as a 13.4% accuracy gain over adversarial training (AT) baselines against the union of l∞, l2, and l1 attacks. Our results show the effectiveness of the proposed method and demonstrate that collaborative learning is an effective strategy for designing robust models.
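The mutual-training scheme the abstract describes can be illustrated with a deliberately tiny, library-free sketch. Two logistic-regression "models" stand in for the paper's deep networks, single-step FGSM stands in for its stronger attacks, and a simple "match the peer's probabilities" term stands in for its knowledge-distillation loss. All names and hyperparameters here are illustrative assumptions, not the authors' implementation; the point is only the structure: each model crafts adversarial examples, both models train on the union of those examples, and each is softly pulled toward the other's predictions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

class LinearClassifier:
    """Logistic regression standing in for one of the deep networks."""
    def __init__(self, dim):
        self.w = rng.normal(scale=0.1, size=dim)
        self.b = 0.0

    def prob(self, X):
        return sigmoid(X @ self.w + self.b)

    def fgsm(self, X, y, eps):
        # FGSM: move each input along the sign of the input gradient of
        # the binary cross-entropy loss; d(BCE)/dx = (p - y) * w per sample.
        grad = np.outer(self.prob(X) - y, self.w)
        return X + eps * np.sign(grad)

    def step(self, X, y, peer_prob, lr=0.5, alpha=0.5):
        # Blend the hard-label loss gradient with a soft "match the peer"
        # term (a crude stand-in for the paper's distillation loss).
        p = self.prob(X)
        err = (1 - alpha) * (p - y) + alpha * (p - peer_prob)
        self.w -= lr * (X.T @ err) / len(y)
        self.b -= lr * err.mean()

# Toy linearly separable task: label = 1 iff x0 + x1 > 0.
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

eps = 0.3
m1, m2 = LinearClassifier(2), LinearClassifier(2)
for _ in range(200):
    # Each model crafts its own adversarial examples; both then train on
    # the union, so each sees attacks it would not have found alone.
    X_adv = np.vstack([X, m1.fgsm(X, y, eps), m2.fgsm(X, y, eps)])
    y_adv = np.tile(y, 3)
    p1, p2 = m1.prob(X_adv), m2.prob(X_adv)  # peer targets, pre-update
    m1.step(X_adv, y_adv, peer_prob=p2)
    m2.step(X_adv, y_adv, peer_prob=p1)

clean_acc = ((m1.prob(X) > 0.5) == y).mean()
robust_acc = ((m1.prob(m1.fgsm(X, y, eps)) > 0.5) == y).mean()
print(f"clean {clean_acc:.2f}, robust (FGSM, eps={eps}) {robust_acc:.2f}")
```

On this toy problem both models converge to a robust decision boundary; the paper's actual method additionally uses iterative attacks, KL-based distillation, and deep networks on CIFAR/mini-ImageNet.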

Research Area(s)

  • adversarial defense, adversarial robustness, image classification, knowledge distillation

Bibliographic Note

Publisher Copyright: © 2005-2012 IEEE.
