Adversarial ML · Robustness · Security

Adversarial Attacks on Deepfake Detectors

2023-12-05 · 7 min read

Just as AI can be used to generate deepfakes, it can also be used to evade detection. Adversarial attacks add carefully crafted, imperceptible noise to an audio file to flip the detector's decision.

Types of Attacks

  1. White-box attacks: The attacker has full access to the detector's model weights.
  2. Black-box attacks: The attacker can only query the detector and observe the output.
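In the white-box setting, the classic starting point is a single gradient-sign step (FGSM). The sketch below illustrates the idea against a generic differentiable detector; the function name, the `eps` value, and the detector itself are hypothetical, not a specific system from this post.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps=0.001):
    """White-box FGSM sketch: perturb input x along the sign of the
    loss gradient so the detector's loss on the true label y increases.
    `model` is any differentiable detector returning logits."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Each sample moves at most eps per element, keeping the noise small.
    return (x_adv + eps * x_adv.grad.sign()).detach()
```

Black-box attacks instead estimate this direction from repeated queries, e.g. by finite differences or evolutionary search, since the gradient is not available.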

Vulnerability

Our experiments show that even state-of-the-art detectors like RawNet2 can be fooled by adding adversarial noise at a signal-to-noise ratio of 40 dB, which is virtually inaudible to the human ear.
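To make the 40 dB figure concrete: at that SNR the perturbation's power is 10,000 times below the signal's. A minimal sketch (the function name is ours, not from the experiments) of scaling a noise waveform to a target SNR:

```python
import numpy as np

def scale_noise_to_snr(signal, noise, target_snr_db=40.0):
    """Rescale `noise` so that 10*log10(P_signal / P_noise) == target_snr_db.
    At 40 dB the noise power is 1/10,000 of the signal power."""
    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    target_p_noise = p_signal / (10 ** (target_snr_db / 10))
    return noise * np.sqrt(target_p_noise / p_noise)
```

Mixing the scaled noise into the original audio yields a file that sounds identical to a listener but can cross the detector's decision boundary.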

Robust Training

To defend against these attacks, we propose training detectors with adversarial examples included in the training set (adversarial training).
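A single adversarial-training step can be sketched as follows: craft perturbed examples on the fly, then update the detector on clean and adversarial inputs together. This is an illustrative outline (the detector, optimizer, and `eps` are placeholders), not the exact training recipe used here.

```python
import torch
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, eps=0.001):
    """One adversarial-training step: generate FGSM examples from the
    current model, then train on clean + adversarial batches jointly."""
    # Craft adversarial examples with a single gradient-sign step.
    x_pert = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x_pert), y).backward()
    x_adv = (x_pert + eps * x_pert.grad.sign()).detach()

    # Update the detector on the union of clean and adversarial inputs.
    optimizer.zero_grad()
    loss = F.cross_entropy(model(torch.cat([x, x_adv])),
                           torch.cat([y, y]))
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the adversarial examples are regenerated against the current weights each step, the detector is continually exposed to the strongest perturbations it is currently vulnerable to.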