Defending against Adversarial Examples in Deep Learning: New Regularization and Training Methods

Project: Research



Deep neural networks (DNNs) have become a standard method in machine learning and related areas, such as computer vision, due to their rich model capacity and training heuristics that allow learning of complex models. However, one challenge to DNNs is adversarial examples: inputs with imperceptible modifications that cause the DNN to make a wrong prediction. Adversarial examples are a serious security threat to DNNs, especially in “black-box” attacks, which do not have direct access to the DNN's internal mechanisms. The ability to force a DNN to make a wrong prediction without a human observer noticing could lead to malicious behavior, especially as DNNs become more prevalent in automated systems, such as self-driving cars. Thus, defense against adversarial examples becomes increasingly important to ensure that DNNs and systems based on them are secure against bad actors.

Typical defenses against adversarial examples apply denoising to the input image or feature map, or use data augmentation. However, these approaches only address the “symptom” of adversarial examples. The existence of adversarial examples suggests a serious underlying flaw in the training of DNNs: a growing body of evidence suggests that adversarial examples are a result of generalization error, i.e., although test accuracy is excellent, DNNs still suffer from overfitting. Thus, the most principled approach to defending against adversarial examples is to improve the generalization of DNNs, that is, to prevent overfitting.

In this project, we propose to defend against adversarial examples by improving regularization and training methods to prevent DNNs from overfitting. First, we will theoretically investigate why weight regularization, which should control model complexity, fails in DNNs, and propose new regularizers to address these shortcomings. Second, we will research training with adversarial examples to better suppress non-robust features and thereby prevent overfitting. Third, we will theoretically investigate why the Gaussian prior used in Bayesian neural networks (BNNs), which should be less susceptible to overfitting, is ineffective, and propose new priors that fix these problems. Fourth, we propose new approximate inference methods for BNNs that allow the posterior ensemble to be resampled dynamically at test time, which improves adversarial robustness compared to fixed ensembles.

This project will address the root cause of adversarial examples by preventing overfitting of DNNs, and contribute to a better understanding of current issues with DNNs. Improving our understanding of DNNs and making them secure against adversarial attacks is important as DNNs become more prevalent in the automated systems of our daily lives.
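To make the notion of an adversarial example in the description above concrete, the following is a minimal sketch and not the project's method. It assumes a tiny logistic model as a stand-in for a DNN and uses a single signed-gradient step (the widely known fast gradient sign method, which the description does not name). All weights, inputs, and the budget `eps` are hypothetical values chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """Probability of class 1 under a logistic model (stand-in for a DNN)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def sign(v):
    return (v > 0) - (v < 0)

def fgsm_perturb(w, b, x, y, eps):
    """One signed-gradient step that increases the cross-entropy loss.

    For a logistic model, the gradient of the loss with respect to the
    input is (p - y) * w, so no automatic differentiation is needed.
    """
    p = predict_proba(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

# Hypothetical toy setup: 100 "pixel" features with values near 0.5.
d = 100
w = [0.5 if i % 2 == 0 else -0.5 for i in range(d)]
b = 0.0
x = [0.5 + 0.02 * sign(wi) for wi in w]  # clean input, true label 1
y = 1
eps = 0.05  # maximum change allowed per feature (assumed budget)

x_adv = fgsm_perturb(w, b, x, y, eps)
clean_label = int(predict_proba(w, b, x) > 0.5)
adv_label = int(predict_proba(w, b, x_adv) > 0.5)
max_change = max(abs(a - c) for a, c in zip(x_adv, x))
print(clean_label, adv_label, round(max_change, 3))
```

In this toy setting no feature moves by more than `eps`, yet the many small changes add up across 100 dimensions and flip the predicted label from 1 to 0. That accumulation is the effect the description refers to as an imperceptible modification causing a wrong prediction.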


Project number: 9042992
Grant type: GRF
Effective start/end date: 1/01/21 → …