Securing federated sensitive topic classification against poisoning attacks
arXiv preprint arXiv:2201.13086, 2022
We present a Federated Learning (FL) based solution for building a distributed classifier capable of detecting URLs containing GDPR-sensitive content related to categories such as health, sexual preference, political beliefs, etc. Although such a classifier addresses the limitations of previous offline/centralised classifiers, it is still vulnerable to poisoning attacks from malicious users who may attempt to reduce the accuracy for benign users by disseminating faulty model updates. To guard against this, we develop a robust aggregation scheme based on subjective logic and residual-based attack detection. Employing a combination of theoretical analysis, trace-driven simulation, and experimental validation with a prototype and real users, we show that our classifier can detect sensitive content with high accuracy, learn new labels quickly, and remain robust in the face of poisoning attacks from malicious users, as well as imperfect input from non-malicious ones.
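The abstract names two ingredients, residual-based attack detection and subjective-logic reputation, without giving their details. The following is a minimal illustrative sketch of how such ingredients could be combined in an aggregation step; it is not the paper's algorithm. The function names, the median-based residual, the threshold of 3.0, and the rule of zeroing flagged updates are all assumptions made for illustration. Only the mapping from evidence counts to a reputation score follows the standard subjective-logic binomial opinion (belief plus base rate times uncertainty, with prior weight 2).

```python
from statistics import median


def residual_flags(updates, threshold=3.0):
    """Flag clients whose update lies far from the coordinate-wise median.

    Assumption: the residual is the mean absolute deviation from the median
    update, scaled by the median residual across clients.
    """
    dims = range(len(updates[0]))
    med = [median(u[d] for u in updates) for d in dims]
    residuals = [sum(abs(u[d] - med[d]) for d in dims) / len(med) for u in updates]
    scale = median(residuals) or 1e-12  # avoid division by zero
    return [r / scale > threshold for r in residuals]


def reputation(positive, negative, prior_weight=2.0, base_rate=0.5):
    """Projected probability of a subjective-logic binomial opinion."""
    total = positive + negative + prior_weight
    belief = positive / total
    uncertainty = prior_weight / total
    return belief + base_rate * uncertainty


def aggregate(updates, evidence):
    """Reputation-weighted average that ignores updates flagged this round.

    `evidence` holds one [positive, negative] count pair per client and is
    updated in place so that reputation carries over between rounds.
    """
    flags = residual_flags(updates)
    for counts, flagged in zip(evidence, flags):
        counts[1 if flagged else 0] += 1
    weights = [0.0 if flagged else reputation(p, n)
               for (p, n), flagged in zip(evidence, flags)]
    total = sum(weights)
    dims = range(len(updates[0]))
    return [sum(w * u[d] for w, u in zip(weights, updates)) / total for d in dims]


# Hypothetical round: three benign clients and one poisoned update.
updates = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [50.0, -50.0]]
evidence = [[0, 0] for _ in updates]
global_update = aggregate(updates, evidence)
```

In this toy round the fourth update is flagged and excluded, and its negative evidence count lowers its weight in later rounds even if it is not flagged again.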