Certified and Forensic Defenses against Poisoning and Backdoor Attacks
Zayd Hammoudeh
Committee: Daniel Lowd (chair), Thien Nguyen, Humphrey Shi, Luca Mazzucato
Dissertation Defense (October 2023)
Keywords: Adversarial Robustness, Backdoor Attack, Certified Robustness, Data Poisoning, Evasion Attack, Training Data Attribution

Data poisoning and backdoor attacks manipulate model predictions by inserting malicious instances into the training set. Most existing defenses against these attacks are empirical and easily evaded by an adaptive attacker; moreover, they provide, at best, minimal insight into an attacker's identity, goals, and methods. In contrast, this work proposes two classes of poisoning and backdoor defenses: (1) certified defenses, which provide provable guarantees on their robustness, and (2) forensic defenses, which provide actionable, human-interpretable insight into an attack's goals so that the attack can be stopped via intervention outside the ML system. Our certified defenses target regression, where the model predicts a continuous value, and sparse (L0) attacks, where the adversary controls an unknown subset of the training and test features. Our forensic defense identifies the target of a poisoning or backdoor attack while simultaneously mitigating the attack; we validate it on a wide range of data modalities, including speech, text, and vision.
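To make the notion of a certified poisoning defense concrete, below is a minimal Python sketch of the partition-and-aggregate idea that underlies many certified defenses, adapted to regression via median aggregation. It is illustrative only: the partitioning scheme, the Ridge base model, and the interval certificate are assumptions for this sketch, not the dissertation's exact construction.

import numpy as np
from sklearn.linear_model import Ridge

def certified_median_regression(X, y, x_test, n_partitions=50, n_poison=5, seed=0):
    """Illustrative partition-and-aggregate certified regression.

    Train one regressor per disjoint training partition and predict the
    median of the per-partition predictions. An attacker who inserts at
    most `n_poison` training points can corrupt at most `n_poison`
    partitions, so the clean median provably lies between two order
    statistics of the observed predictions: a certified interval.
    (A real certificate requires deterministic, e.g., hash-based,
    partitioning so each inserted point affects only one partition.)
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    preds = []
    for part in np.array_split(idx, n_partitions):
        model = Ridge().fit(X[part], y[part])
        preds.append(float(model.predict(x_test.reshape(1, -1))[0]))
    preds = np.sort(preds)

    m = n_partitions // 2  # median index
    # Arbitrarily changing <= n_poison of the predictions shifts the
    # median by at most n_poison order statistics in either direction.
    lo = preds[max(m - n_poison, 0)]
    hi = preds[min(m + n_poison, n_partitions - 1)]
    return preds[m], (lo, hi)

# Toy usage on synthetic data: the returned interval bounds the clean
# prediction under any attack inserting at most n_poison training points.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 8))
y = X @ rng.normal(size=8) + 0.1 * rng.normal(size=1000)
pred, (lo, hi) = certified_median_regression(X, y, X[0])
print(f"prediction {pred:.3f}, certified interval [{lo:.3f}, {hi:.3f}]")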

This dissertation includes previously published and unpublished coauthored material.