By Daniel Peterson, Pallika Kanani, and Virendra J. Marathe
A single user or a customer in the enterprise setting often does not have enough labeled data to train a reliable machine learning model. They might benefit greatly by pooling their data with other parties interested in solving essentially the same task, but might not be comfortable sharing their data.
Federated Learning (FL) enables multiple parties to jointly train a shared model without sharing data. The data remains decentralized, and each party only shares the gradients with a centralized server. All users can now get access to a more accurate model than the one they could have built using their individual data. Here’s what a typical FL setup looks like:
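To make the setup concrete, here is a minimal sketch of one federated-averaging round for a linear model. It is illustrative only: the function names (`local_gradient`, `fedavg_round`), the squared-error objective, and the learning rate are our assumptions, not details from the paper. Each user computes a gradient on their own data, and only gradients reach the server, which averages them into a shared update.

```python
import numpy as np

def local_gradient(w, X, y):
    """Gradient of mean squared error for a linear model on one user's data."""
    residual = X @ w - y
    return 2 * X.T @ residual / len(y)

def fedavg_round(w, user_data, lr=0.1):
    """One round: users share gradients (not data); the server averages them."""
    grads = [local_gradient(w, X, y) for X, y in user_data]
    return w - lr * np.mean(grads, axis=0)

# Five hypothetical users, each with 50 local examples that never leave them.
rng = np.random.default_rng(0)
users = [(rng.normal(size=(50, 3)), rng.normal(size=50)) for _ in range(5)]

w = np.zeros(3)
for _ in range(100):
    w = fedavg_round(w, users)
```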
It is now recognized that Federated Learning alone is not sufficient to guarantee data privacy. The gradients shared by individual parties might leak information about their data distribution.
Differentially Private Federated Learning provides an additional layer of privacy. Informally, differential privacy aims to provide a bound, ε, on the variation in the model’s output based on the inclusion or exclusion of a single data point. Introducing “noise” in the training process (inputs, parameters, or outputs) makes it difficult to infer whether any particular data point was used to train the model. While this noise ensures ε-differential privacy for the data point, it can degrade the accuracy of model predictions.
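One common way to add this noise, in the style of DP-SGD, is to clip each per-example gradient so that no single data point can move the model too far, and then add Gaussian noise scaled to that clipping bound. The sketch below is an assumption about the mechanism, not the paper's exact procedure; `clip_norm` and `sigma` are illustrative parameters.

```python
import numpy as np

def privatize_gradient(per_example_grads, clip_norm=1.0, sigma=0.5, rng=None):
    """Clip each example's gradient to bound its influence, then add noise."""
    rng = rng or np.random.default_rng(0)
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Scale down any gradient whose norm exceeds clip_norm.
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    mean_grad = np.mean(clipped, axis=0)
    # Gaussian noise calibrated to the per-example sensitivity (clip_norm).
    noise = rng.normal(0.0, sigma * clip_norm / len(clipped), size=mean_grad.shape)
    return mean_grad + noise
```

Larger `sigma` gives stronger privacy (smaller ε) but noisier updates, which is exactly the accuracy/privacy trade-off the rest of this post is about.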
The central question we address in our work is: How can we maintain the accuracy for individual users along with the privacy offered by Differentially Private Federated Learning?
Customize Machine Learning Models with Domain Adaptation
One interesting fact we noticed about these Federated Learning settings is that participating users often come from different domains: even though they come together to solve the same task, each user’s data comes from a slightly different distribution.
Consider the example of building an email spam classifier. Say we have 15 users, each with only 50 emails labeled “spam” or “not spam” for training. There are clear privacy concerns letting others read your email, but equally clear benefits to pooling the data, since spam is usually spam across the board. However, there is also room for personalization, since “good” email might vary by person.
Domain Adaptation (DA) provides a way for each user to customize or fine-tune a Machine Learning model to their own unique data distribution; it often leads to improved model accuracy for that user.
The core idea we propose in our paper is to combine Differentially Private Federated Learning with Domain Adaptation to help maintain the accuracy for individual users while respecting their privacy.
Improve Accuracy by Using Domain Experts
In this setup, we maintain two models: a collaboratively learned general model and a privately learned domain adapted model. The general model is learned using Differentially Private Federated Learning, and has the advantage of training data from many users, but is less accurate due to added noise. The private model is adapted to the individual user’s data, and has the advantage of noise-free updates.
Each user then combines the outputs of these two models using a Mixture of Experts (MoE) to make their final prediction. The two “experts” in the mixture are the general FL model and the domain-tuned private model, so we refer to our system as federated learning with domain experts (FL+DE). Using an MoE architecture allows the general and private models to influence predictions differently on individual data points. All the parameters for this model, including the gating function for MoE are learned using Stochastic Gradient Descent. Here’s our proposed architecture:
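The prediction rule above can be sketched as follows. A per-user gating function α(x) mixes the output of the (frozen, noisily trained) general model with the output of the private domain expert, and the private parameters are updated locally with SGD. Here both experts and the gate are linear, and all names (`moe_predict`, `sgd_step`) are illustrative; the paper's actual models may differ.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def moe_predict(x, w_general, w_private, w_gate):
    """Mixture of two experts: alpha(x) decides how much to trust each."""
    alpha = sigmoid(x @ w_gate)  # weight on the private domain expert
    return alpha * (x @ w_private) + (1 - alpha) * (x @ w_general)

def sgd_step(x, y, w_general, w_private, w_gate, lr=0.01):
    """One SGD step on squared error, updating only the private parameters.

    The general model is trained through the (noisy) federated process,
    so it is treated as fixed here; the expert and gate are noise-free.
    """
    alpha = sigmoid(x @ w_gate)
    p_priv, p_gen = x @ w_private, x @ w_general
    err = alpha * p_priv + (1 - alpha) * p_gen - y
    w_private = w_private - lr * err * alpha * x
    w_gate = w_gate - lr * err * (p_priv - p_gen) * alpha * (1 - alpha) * x
    return w_private, w_gate
```

Because α depends on x, the gate can route easy, generic examples to the general model and distribution-specific examples to the private expert, which is what makes the mixture more flexible than a fixed weighted average.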
We tested our approach on both synthetic and real-world datasets. In the synthetic regression experiment, two users attempt to fit a linear model to a non-linear function, with each user’s input examples sampled from a distinct Gaussian distribution.
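A minimal version of this setup can be generated as follows. The specific non-linear target (`sin`), the Gaussian means, and the scales are placeholders we chose for illustration; the paper's exact choices may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    """The non-linear function both users are trying to fit."""
    return np.sin(x)

def user_data(mean, n=50):
    """Each user's inputs come from their own Gaussian; labels are shared."""
    x = rng.normal(mean, 0.5, size=n)
    return x, target(x)

def fit_linear(x, y):
    """Ordinary least squares for slope and intercept."""
    A = np.stack([x, np.ones_like(x)], axis=1)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

x1, y1 = user_data(-1.0)  # User 1's region of the input space
x2, y2 = user_data(2.0)   # User 2's region
w1, w2 = fit_linear(x1, y1), fit_linear(x2, y2)
```

Because the target is non-linear, the best linear fit differs by region, which is why each user's gating function learns to trust its own domain expert where the shared model fits poorly.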
We see that the gating function learned by User 1 prefers its domain expert model in the darker region, while User 2’s gating function relies on the shared model in a different region than User 1. For comparison, we use a domain-only baseline system, which trains a separate model for each user on that user’s data alone. Traditional FL and our system of FL with domain experts (FL+DE) are tested with various noise levels, σ, for differential privacy.
The RMSE for these two users shows that our system outperforms Differentially Private Federated Learning. We see a similar outcome on the real-world email spam classification problem mentioned before.
In the low-noise setting, the Differentially Private FL system’s accuracy degrades by 11.5%, and on average it performs worse than the baseline system in which users did not collaborate, whereas FL+DE accuracy does not degrade at all.
In the high-noise setting, the Differentially Private FL system’s accuracy degrades by 13.9%, while FL+DE accuracy degrades by only 0.8%.
This shows that by introducing a private, per-user domain expert, we’re able to increase the accuracy for individual users; this is especially beneficial when privacy guarantees begin to diminish the utility of the collaborative general model.
You can read the full paper here.