Harnessing Adversarial Attack for Social Good: Imperceptible Watermarking of Images by Adversarial Attack

Project: Research


Description

Generative image (GenIm) models (e.g., Midjourney) can produce realistic synthetic images, which can be used as "deepfake" images to deceive people. Methods for verifying the provenance of GenIm content are therefore needed to improve user trust in online images. Meanwhile, online social media platforms where artists market their artwork are flourishing, and protecting the intellectual property of artwork through claims of ownership is equally important. A solution to both problems is image watermarking, in which an image is imperceptibly modified to embed a watermark, preserving its aesthetics. First, images produced by GenIm models can be watermarked, and consumers can then check for the watermark to identify an image as synthetic. Second, an artist can watermark their images before uploading them, and later extract the watermark from derivative works to claim ownership.

Most existing watermarking methods are either traditional, offering theoretical guarantees on detector performance but weaker security because they use known linear embedding functions, or based on deep learning, using non-linear embedding functions that improve performance and security but provide no guarantees. In this project, we combine the advantages of traditional methods and deep learning by proposing a new framework, Adversarial Attack for Watermarking (AA4W), which offers both theoretical guarantees and better secrecy.

First, we propose a statistical framework for training a deep neural network (DNN) as a secret key network (SKN), and then embedding a watermark via an adversarial attack on the SKN. The SKN is used to detect the watermark and to extract an optional message. We derive hypothesis tests for the presence of the watermark, as well as guarantees on the detector's false negative rate. Second, we investigate using AA4W for steganography by embedding a unit real vector as the message, in contrast to previous works that embed bit strings. We study the information capacity of this message space, and leverage the vector embedding spaces of recent image/text models to encode message contents. Finally, we investigate the threats posed to AA4W when it is adopted by multiple parties, including preventing collisions when multiple watermarks are embedded by different SKNs, and we develop an asymmetric-key version with improved usability.

Our proposed work is in line with recent research on applying adversarial attacks for social good, rather than to attack or break DNN systems. Our approach can benefit artists by protecting their image content, and GenAI companies and users by providing a means of determining provenance.
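To make the framework concrete, the sketch below illustrates one way the core AA4W idea could be realized: a secret key network maps an image to a unit vector, embedding perturbs the image adversarially so that the SKN output aligns with a secret unit-vector message, and detection thresholds the cosine similarity between the SKN output and that message. Everything here (the toy SKN architecture, the PGD-style loop in embed_watermark, the budget eps, the threshold tau) is an illustrative assumption for exposition, not the project's actual design.

```python
# Minimal sketch of the AA4W idea, under the assumptions stated above.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SKN(nn.Module):
    """Toy secret key network: image -> unit vector in R^d (hypothetical)."""
    def __init__(self, d=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, d),
        )

    def forward(self, x):
        # Project the output onto the unit sphere, matching the
        # unit-real-vector message space described in the abstract.
        return F.normalize(self.net(x), dim=-1)

def embed_watermark(skn, image, message, eps=2/255, steps=50, lr=1e-2):
    """Adversarial-attack-style embedding: find a small perturbation delta
    (||delta||_inf <= eps, for imperceptibility) that maximizes
    cos(SKN(image + delta), message)."""
    delta = torch.zeros_like(image, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        loss = -F.cosine_similarity(skn(image + delta), message, dim=-1).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)                          # budget constraint
            delta.copy_((image + delta).clamp(0, 1) - image)  # valid pixels
    return (image + delta).detach()

def detect(skn, image, message, tau=0.5):
    """Hypothesis-test-style detection: declare "watermarked" when the cosine
    similarity exceeds tau. If SKN outputs of unwatermarked images are roughly
    uniform on the sphere, their similarity to the message concentrates near 0,
    so tau trades off the false positive and false negative rates."""
    score = F.cosine_similarity(skn(image), message, dim=-1).item()
    return score > tau, score

if __name__ == "__main__":
    torch.manual_seed(0)
    skn = SKN(d=128)
    key = F.normalize(torch.randn(1, 128), dim=-1)  # secret unit-vector message
    img = torch.rand(1, 3, 64, 64)
    wm_img = embed_watermark(skn, img, key)
    print("clean:      ", detect(skn, img, key))
    print("watermarked:", detect(skn, wm_img, key))
```

In this sketch the SKN is untrained and the parameters are arbitrary, so the scores are only indicative; the statistical framework proposed in the project would govern how the SKN is trained and how the detection threshold is set to guarantee the false negative rate.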

Detail(s)

Project number: 9043694
Grant type: GRF
Status: Active
Effective start/end date: 1/07/24 → …