Dual manifold adversarial robustness: Defense against Lp and non-Lp adversarial attacks
Research output: Chapters, Conference Papers, Creative and Literary Works › RGC 32 - Refereed conference paper (with host publication) › peer-review
Author(s)
Wei-An Lin, Chun Pong Lau, Alexander Levine et al.
Detail(s)
Original language | English |
---|---|
Title of host publication | 34th Conference on Neural Information Processing Systems (NeurIPS 2020) |
Editors | H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, H. Lin |
Publisher | Neural Information Processing Systems Foundation, Inc. |
Pages | 3487-3498 |
Volume | 5 |
ISBN (print) | 9781713829546 |
Publication status | Published - Dec 2020 |
Externally published | Yes |
Publication series
Name | Advances in Neural Information Processing Systems |
---|---|
Volume | 33 |
ISSN (Print) | 1049-5258 |
Conference
Title | 34th Conference on Neural Information Processing Systems (NeurIPS 2020) |
---|---|
Location | Virtual |
Place | Canada |
City | Vancouver |
Period | 6 - 12 December 2020 |
Link(s)
Link to Scopus | https://www.scopus.com/record/display.uri?eid=2-s2.0-85101803955&origin=recordpage |
---|---|
Permanent Link | https://scholars.cityu.edu.hk/en/publications/publication(5e5d767b-9be0-43ec-9915-d218185181af).html |
Abstract
Adversarial training is a popular defense strategy against attack threat models with bounded Lp norms. However, it often degrades performance on normal images and, more importantly, does not generalize well to novel attacks. Given the success of deep generative models such as GANs and VAEs in approximately characterizing the underlying manifold of images, we investigate whether these deficiencies of adversarial training can be remedied by exploiting the underlying manifold information. To partially answer this question, we consider the scenario in which the manifold information of the underlying data is available. We use a subset of ImageNet natural images, for which an approximate underlying manifold is learned using StyleGAN. We also construct an “On-Manifold ImageNet” (OM-ImageNet) dataset by projecting the ImageNet samples onto the learned manifold; for this dataset, the underlying manifold information is exact. Using OM-ImageNet, we first show that adversarial training in the latent space of images (i.e., on-manifold adversarial training) improves both standard accuracy and robustness to on-manifold attacks. However, since no out-of-manifold perturbations are realized, this defense can be broken by Lp adversarial attacks. We therefore propose Dual Manifold Adversarial Training (DMAT), in which adversarial perturbations in both the latent and image spaces are used to robustify the model. DMAT improves performance on normal images and achieves robustness against Lp attacks comparable to that of standard adversarial training. In addition, we observe that models defended by DMAT achieve improved robustness against novel attacks that manipulate images by global color shifts or various types of image filtering. Interestingly, similar improvements are also achieved when the defended models are tested on (out-of-manifold) natural images. These results demonstrate the potential benefits of using manifold information, exact or approximate, in enhancing the robustness of deep learning models against various types of novel adversarial attacks. Code and models will be available at this link.
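The abstract outlines the core training recipe: craft one adversary in the generator's latent space (on-manifold) and one in pixel space (off-manifold), then train on both. Purely as an illustration of that recipe, the sketch below shows how one such training step could look in PyTorch, assuming a frozen pretrained generator `G` (e.g. a StyleGAN mapping latent codes to images in [0, 1]) and a classifier `model`; every name, perturbation budget, and step size here is a hypothetical placeholder, not the authors' released implementation.

```python
# A minimal sketch of the dual-manifold idea: one adversary in latent
# space (rendered through a frozen generator G) and one in pixel space,
# with the model trained on both. Budgets and step sizes are illustrative.
import torch
import torch.nn.functional as F

def linf_pgd(loss_fn, x, eps, alpha, steps):
    """Generic L-infinity PGD ascent on a tensor x (pixels or latents)."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss_fn(x + delta).backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()  # gradient-sign ascent step
            delta.clamp_(-eps, eps)             # project onto the eps-ball
        delta.grad.zero_()
    return (x + delta).detach()

def dmat_step(model, G, w, y, optimizer,
              eps_img=8 / 255, eps_lat=0.02, steps=10):
    """One hypothetical dual-manifold update. G is a frozen, pretrained
    generator mapping latent codes w to images in [0, 1]."""
    w = w.detach()
    x = G(w).detach()  # on-manifold "clean" image rendered from w

    model.eval()  # freeze batch-norm statistics while crafting attacks
    # (1) On-manifold attack: perturb the latent code, render through G.
    w_adv = linf_pgd(lambda v: F.cross_entropy(model(G(v)), y),
                     w, eps_lat, 2.5 * eps_lat / steps, steps)
    # (2) Off-manifold attack: standard Linf PGD in pixel space.
    x_adv = linf_pgd(lambda v: F.cross_entropy(model(v.clamp(0, 1)), y),
                     x, eps_img, 2.5 * eps_img / steps, steps)

    # Train on both adversaries so neither attack family is neglected.
    model.train()
    optimizer.zero_grad()
    loss = (F.cross_entropy(model(G(w_adv)), y)
            + F.cross_entropy(model(x_adv.clamp(0, 1)), y))
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this reading, the latent-space adversary supplies perturbations that stay on the image manifold, while the pixel-space adversary covers the off-manifold directions that on-manifold training alone leaves exposed, matching the abstract's observation that purely on-manifold training can be broken by Lp attacks.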
Citation Format(s)
Dual manifold adversarial robustness: Defense against Lp and non-Lp adversarial attacks. / Lin, Wei-An; Lau, Chun Pong; Levine, Alexander et al.
34th Conference on Neural Information Processing Systems (NeurIPS 2020). ed. / H. Larochelle; M. Ranzato; R. Hadsell; M.F. Balcan; H. Lin. Vol. 5. Neural Information Processing Systems Foundation, Inc., 2020. p. 3487-3498 (Advances in Neural Information Processing Systems; Vol. 33).