
Improved Adversarial Robustness by Hardened Prediction

Qihang Liang, Chung Chan*

*Corresponding author for this work

Research output: RGC 32 - Refereed conference paper (with host publication), peer-reviewed

Abstract

We present a way to harden the decisions of a neural network. Combining this hardening effect with an existing adversarial training method further improves adversarial robustness. By suppressing the logit of the class in which the model is most confident during training, the model is encouraged to make harder predictions. This significantly improves the model's robustness against gradient-based adversarial attacks. The simplicity of our method makes it easy to deploy on top of existing adversarial training schemes with almost no computational overhead. Experimental results show that a model trained with TRADES benefits from hardening: it exhibits greatly improved robustness against the PGD attack while retaining similar performance against decision-based attacks. How the hardening effect defends models against gradient-based attacks so effectively merits further investigation.
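The abstract's core idea, suppressing the logit of the class the model is currently most confident about, can be sketched as a penalty added to the training loss. The following is a minimal NumPy sketch, not the paper's implementation: the function name `hardened_loss` and the weighting hyperparameter `alpha` are assumptions for illustration, and the exact form of the suppression term in the paper may differ.

```python
import numpy as np

def hardened_loss(logits, labels, alpha=0.1):
    """Cross-entropy plus a hypothetical hardening penalty.

    The penalty term suppresses the largest logit per example, i.e.
    the class the model is most confident about, which is the spirit
    of the hardening idea described in the abstract. `alpha` is an
    assumed weighting hyperparameter, not taken from the paper.
    """
    # Numerically stabilised softmax probabilities.
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    # Standard cross-entropy on the true labels.
    ce = -np.log(probs[np.arange(len(labels)), labels] + 1e-12).mean()
    # Hardening term: penalise the highest-confidence logit.
    penalty = logits.max(axis=1).mean()
    return ce + alpha * penalty
```

With `alpha=0` this reduces to plain cross-entropy; increasing `alpha` discourages any single logit from dominating, which the paper argues encourages harder predictions and hinders gradient-based attacks.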

Original language: English
Title of host publication: 2022 IEEE International Symposium on Information Theory (ISIT)
Publisher: IEEE
Pages: 2952-2956
ISBN (Electronic): 978-1-6654-2159-1
ISBN (Print): 978-1-6654-2160-7
DOIs
Publication status: Published - 2022
Event: 2022 IEEE International Symposium on Information Theory, ISIT 2022 - University in Espoo, Espoo, Finland
Duration: 26 Jun 2022 - 1 Jul 2022

Publication series

Name: IEEE International Symposium on Information Theory - Proceedings
ISSN (Print): 2157-8095
ISSN (Electronic): 2157-8117

Conference

Conference: 2022 IEEE International Symposium on Information Theory, ISIT 2022
Place: Finland
City: Espoo
Period: 26/06/22 - 1/07/22

