By Swetasudha Panda, Senior Member of Technical Staff, Oracle Labs - EAST
Machine learning models are increasingly being used to make critical decisions in regulated domains such as hiring and credit or lending. However, on multiple occasions, such models have been reported to exhibit discriminatory behavior with respect to legally recognized protected groups, such as those defined by age, gender, or ethnicity.
Ideally, the models should comply with the standards enforced by the relevant regulatory bodies. For example, in the case of hiring, the US Equal Employment Opportunity Commission and the Department of Labor define the Four-Fifths Rule, which requires that the selection rate of any protected group be at least four-fifths of that of the group with the highest rate.
While this rule has been widely used to audit for disparate impact (a form of discriminatory practice recognized by law), it fails to handle small data. Specifically, consider the case where there are only two applicants for a job, a male and a female, and one of them gets hired. This outcome violates the rule outright, even though two applicants hardly constitute enough evidence of discrimination.
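To make the rule and its small-data failure concrete, here is a minimal sketch of a Four-Fifths Rule audit. The function name and the group labels are illustrative, not from the paper:

```python
def four_fifths_check(selected, total):
    """Audit the Four-Fifths Rule from per-group counts.

    `selected` and `total` map group name -> number selected and
    number of applicants. Returns (passes, impact_ratios), where each
    impact ratio is the group's selection rate divided by the highest
    group's selection rate; the rule requires every ratio >= 0.8.
    """
    rates = {g: selected[g] / total[g] for g in total}
    top = max(rates.values())
    ratios = {g: (r / top if top > 0 else 1.0) for g, r in rates.items()}
    return all(r >= 0.8 for r in ratios.values()), ratios

# The two-applicant example from the text: one male and one female
# apply, and the male is hired. The rule is violated outright
# (the female impact ratio is 0.0), even though a single hiring
# decision is far too little data to evidence discrimination.
ok, ratios = four_fifths_check({"male": 1, "female": 0},
                               {"male": 1, "female": 1})
# ok -> False, ratios["female"] -> 0.0
```

The same code flags the same 0.0 ratio whether the counts are 1-of-1 versus 0-of-1 or 1000-of-1000 versus 0-of-1000, which is exactly the insensitivity to sample size that motivates the Bayes factor approach below.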
Recently, there has been an abundance of work in the machine learning community on quantifying notions of fairness and on algorithmic interventions that mitigate discriminatory effects in ML models.
Even with this variety of fairness techniques, it is difficult to ensure that an overall application meets a particular fairness objective. Models that satisfy a given fairness objective in isolation do not necessarily compose into a system that is fair overall. Moreover, certain fairness interventions applied during training do not subsequently generalize. Many intervention approaches target specific ML models, and extending them to other ML algorithms is often non-trivial. There is extensive work on enforcing fairness criteria in binary classification, but very limited research on ranking models. Since an application commonly comprises several ML models (some of which may already include fairness interventions), managing and diagnosing the application as a whole from a fairness perspective becomes difficult.
In this paper, featured at NeurIPS, we propose the use of Bayes factors: a hypothesis-testing approach that accounts for the uncertainty in small data and quantifies the amount of evidence for discrimination. The Bayes factor also acts as a continuous measure of a given fairness objective and directs the aggressiveness of a post-training mitigation control system, so as to keep the fairness objective within appropriate limits for the overall application.
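To illustrate how a Bayes factor weighs evidence where the Four-Fifths Rule cannot, here is a standard Beta-Binomial model comparison for two groups' selection rates. This is a textbook sketch with uniform priors, not the exact test from the paper:

```python
from math import exp, lgamma


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def bayes_factor(s1, n1, s2, n2):
    """Bayes factor comparing 'each group has its own selection rate'
    against 'both groups share one rate', for s1 selections out of n1
    applicants in group 1 and s2 out of n2 in group 2, with Beta(1, 1)
    priors on the rates. Values near 1 mean the data are inconclusive;
    large values mean strong evidence for differing rates.
    """
    log_bf = (log_beta(s1 + 1, n1 - s1 + 1)
              + log_beta(s2 + 1, n2 - s2 + 1)
              - log_beta(s1 + s2 + 1, (n1 - s1) + (n2 - s2) + 1))
    return exp(log_bf)


# Two applicants, one hired: the rule is violated, but the Bayes
# factor is only 1.5, essentially no evidence for discrimination.
small = bayes_factor(1, 1, 0, 1)  # -> 1.5

# The same 100%-vs-0% disparity over 200 applicants: the evidence
# for differing rates is now overwhelming.
large = bayes_factor(100, 100, 0, 100)
```

Because the Bayes factor grows continuously with accumulating evidence rather than flipping at a threshold, it can serve both as an audit statistic and as the signal that modulates how aggressively a downstream mitigation step intervenes.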
We design this system in the context of ranking applications. The Bayes factor tests operate on an aggregate of rankings, and the mitigation step runs in an online setting, applying re-rankings that improve the fairness objective while minimizing deviation from the original ranking. Experiments on both real-world and synthetic datasets demonstrate that the Bayes factor is a useful measure of a given fairness objective, both for auditing the application and for driving the control system itself.
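The control loop can be pictured with a toy post-processing step. The sketch below is not the paper's algorithm: it uses a simple top-k share as the fairness signal and adjacent swaps as the minimal re-ranking move, with an `aggressiveness` cap standing in for how the Bayes factor would throttle the controller:

```python
def rerank_step(ranking, is_protected, k, min_share, aggressiveness=1):
    """One online mitigation step (toy illustration, not the paper's
    method). If the protected group's share of the top-k falls below
    `min_share`, promote its best-ranked members into the top-k by
    swapping with the lowest-ranked unprotected top-k items, changing
    as little of the original ranking as possible. `aggressiveness`
    caps the number of swaps per step.
    """
    ranking = list(ranking)
    for _ in range(aggressiveness):
        topk = ranking[:k]
        if sum(is_protected[x] for x in topk) / k >= min_share:
            break  # fairness objective already within limits
        # Best-ranked protected item currently outside the top-k.
        promote = next((x for x in ranking[k:] if is_protected[x]), None)
        # Lowest-ranked unprotected item inside the top-k.
        demote = next((x for x in reversed(topk) if not is_protected[x]), None)
        if promote is None or demote is None:
            break  # no feasible swap
        i, j = ranking.index(demote), ranking.index(promote)
        ranking[i], ranking[j] = ranking[j], ranking[i]
    return ranking


# Hypothetical items a-f; d and e belong to the protected group.
flags = {"a": False, "b": False, "c": False,
         "d": True, "e": True, "f": False}
result = rerank_step(["a", "b", "c", "d", "e", "f"], flags,
                     k=3, min_share=1/3)
# -> ["a", "b", "d", "c", "e", "f"]: one adjacent swap restores
#    the top-3 share while leaving the rest of the ranking intact.
```

In the paper's setting, the role played here by the fixed `min_share` threshold and swap cap would instead be driven by the Bayes factor computed over an aggregate of rankings.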
You can read the full paper here.