Adversarial ML · Robustness · Security
Adversarial Attacks on Deepfake Detectors
2023-12-05 · 7 min read
Just as AI can be used to generate deepfakes, it can also be used to evade their detection. Adversarial attacks add carefully crafted, imperceptible noise to an audio file in order to flip the detector's decision.
Types of Attacks
- White-box attacks: The attacker has full access to the detector's model weights.
- Black-box attacks: The attacker can only query the detector and observe the output.
Vulnerability
Our experiments show that even state-of-the-art detectors like RawNet2 can be fooled by adversarial noise at a signal-to-noise ratio (SNR) of 40 dB, which is virtually inaudible to the human ear.
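To see how small a 40 dB perturbation is, one can scale a noise vector so that the signal-to-noise ratio hits a target value. The snippet below is a minimal sketch using random placeholder arrays rather than real audio; the helper name `scale_to_snr` is our own.

```python
import numpy as np

rng = np.random.default_rng(1)
signal = rng.normal(size=16000)   # placeholder: one second at 16 kHz
noise = rng.normal(size=16000)    # placeholder adversarial perturbation

def scale_to_snr(signal, noise, snr_db):
    # Scale the noise so that 10 * log10(P_signal / P_noise) == snr_db.
    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    target_p_noise = p_signal / (10 ** (snr_db / 10))
    return noise * np.sqrt(target_p_noise / p_noise)

scaled = scale_to_snr(signal, noise, 40.0)
snr_db = 10 * np.log10(np.mean(signal ** 2) / np.mean(scaled ** 2))
# At 40 dB, the perturbation's RMS is only 1% of the signal's RMS.
```

This is why such attacks are hard to hear: at 40 dB SNR the noise carries one ten-thousandth of the signal's power.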
Robust Training
To defend against these attacks, we propose training detectors with adversarial examples included in the training set (adversarial training).
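Adversarial training alternates two steps: craft worst-case perturbations against the current model, then update the model on a mix of clean and perturbed examples. The loop below is an illustrative sketch on a toy logistic-regression "detector" with synthetic data; a real pipeline would attack a deep model such as RawNet2, typically with a stronger multi-step attack like PGD.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data: label 1 ("fake") iff the first feature is positive.
n, d = 256, 50
X = rng.normal(size=(n, d))
y = (X[:, 0] > 0).astype(float)

w = np.zeros(d)  # toy linear detector's weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(200):
    # 1) Craft FGSM perturbations against the current weights.
    #    d(logistic loss)/d(x) = (s - y) * w for each example.
    s = sigmoid(X @ w)
    grad_x = (s - y)[:, None] * w[None, :]
    X_adv = X + 0.1 * np.sign(grad_x)   # loss-increasing inputs

    # 2) Update on clean and adversarial examples together.
    X_mix = np.vstack([X, X_adv])
    y_mix = np.concatenate([y, y])
    s_mix = sigmoid(X_mix @ w)
    grad_w = X_mix.T @ (s_mix - y_mix) / len(y_mix)
    w -= 0.5 * grad_w

clean_acc = np.mean((sigmoid(X @ w) > 0.5) == (y == 1))
```

The key design choice is that the perturbations are regenerated every step against the *current* model, so the detector keeps training against its own latest blind spots rather than a fixed set of pre-computed attacks.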