
Fairness-Driven Private Collaborative Machine Learning

Published: 22 February 2024

Abstract

The performance of machine learning algorithms can be considerably improved when trained over larger datasets. In many domains, such as medicine and finance, larger datasets can be obtained if several parties, each having access to limited amounts of data, collaborate and share their data. However, such data sharing introduces significant privacy challenges. While multiple recent studies have investigated methods for private collaborative machine learning, the fairness of such collaborative algorithms has been overlooked. In this work, we suggest a feasible privacy-preserving pre-process mechanism for enhancing fairness of collaborative machine learning algorithms. An extensive evaluation of the proposed method shows that it is able to enhance fairness considerably with only a minor compromise in accuracy.

1 Introduction

Machine Learning (ML) is one of the most prominent technological developments in recent years, rapidly becoming a central part of our life. Applications of ML are all around us, ranging from traffic prediction and virtual personal assistants to automated radiology and autonomous vehicles.
The performance of ML models inherently depends on the availability of a large quantity of useful training data. In neural network learning, for example, recent studies have shown that accuracy can be improved considerably by having access to large datasets [1]. In many applications, however, data is scattered and held by multiple different parties that may be reluctant to share their information for multiple reasons such as commercial competition, privacy concerns, and in some domains even legal constraints. For example, the Health Insurance Portability and Accountability Act (HIPAA) in the United States places strict constraints on the ability of health care providers to share patient data [2].
As a result, a variety of methods have been recently proposed to allow multiple parties to collaboratively train ML models while preventing the disclosure of private information. Hereinafter, we refer to this setting as Private Collaborative Machine Learning (PCML). While these methods address a very similar problem, they are often associated with different (although overlapping) research domains including privacy-preserving data mining [3, 4], privacy-preserving ML [5], collaborative ML [6, 7], and federated ML [8].
These methods take two main approaches to preserving privacy. The first approach is based on perturbation [4], which incorporates noise into the training data to obscure sensitive information. Differential privacy [9] is perhaps the most prominent perturbation technique. The main limitation of this approach is that incorporating noise into the data may yield an inferior model. The second approach is based on secure Multi-Party Computation (MPC) [3]. MPC achieves privacy protection by applying cryptographic techniques that enable several parties to perform a joint computation on private data that is distributed among them, where only the outcome of the computation is revealed to the parties and no other information is exposed [10]. In contrast to perturbation, MPC does not change the data, and therefore the output issued by such algorithms is identical to the output of their non-privacy-preserving counterparts. However, since in many cases a generic and perfectly secure MPC solution is infeasible [11], lower security requirements are typically accepted for the sake of higher efficiency [12].
Despite the growing number of studies proposing algorithms for PCML, the fairness of such algorithms has been overlooked. Since many automated decisions (including which individuals will receive jobs, loans, medication, bail, or parole) can significantly impact people's lives, there is great importance in assessing and improving the fairness of the decisions made by such algorithms. Indeed, in recent years, the concern for algorithmic fairness has made headlines. One of the most prominent examples was in the field of criminal justice, where recent revelations have shown that an algorithm used by the U.S. criminal justice system falsely predicted future criminality among African-Americans at twice the rate of its parallel predictions for White people [13]. These lines of evidence and concerns about algorithmic fairness have led to growing interest in the literature on defining, evaluating, and improving fairness in ML algorithms (see a recent review in other works [14, 15]). All of these studies, however, focused on centralized settings.
In this article, we consider a learning setting similar to the PCML setting described previously, where data is scattered among several parties, who wish to engage in a joint ML procedure, without disclosing their private information. Our setting, however, also adds a fairness requirement, mandating that the learned model satisfies a certain level of fairness.
To address the new fairness requirement, we suggest a privacy-preserving pre-process mechanism for enhancing fairness of collaborative ML algorithms. Similarly to the pre-process fairness mechanism suggested in the work of Feldman et al. [16], our method improves fairness through decreasing distances between the distributions of attributes of the privileged and unprivileged groups. Our approach is not tailored to a specific algorithm and therefore can be used with any PCML algorithm. In contrast to Feldman et al. [16], our method was designed to allow privacy-preserving enhancements, which are obtained through MPC techniques.
An extensive evaluation that we conducted over real-world datasets shows that the proposed method is able to improve fairness considerably, with almost no compromise in accuracy. Furthermore, we show that the runtime of the proposed method is feasible, especially considering that it is executed once as a pre-process procedure.
The article is organized as follows. We begin with a review of related work in Section 2. In Section 3, we define the problem that we study and present relevant notations that we shall be using. In Section 4, we describe our solution to this problem—a privacy-preserving fairness-enhancing mechanism. We present our experimental evaluation in Section 5 and conclude in Section 6.

2 Related Work

We organize the survey of relevant related work as follows. In Section 2.1, we survey related studies that deal with algorithmic fairness in centralized settings. In Section 2.2, we review literature on PCML. We conclude, in Section 2.3, with a discussion of recent research efforts on private and fair ML. We note that all of the latter studies focused on either a centralized setting or a distributed setting that is non-collaborative, whereas we consider a distributed collaborative setting in this work.

2.1 Algorithmic Fairness in Centralized Settings

Fairness Definitions and Measures.

The law makes a distinction between two types of discrimination: (i) disparate treatment [17, 18], which is intentionally treating an individual differently based on his/her membership in a protected class/unprivileged group (direct discrimination), and (ii) disparate impact [18, 19], which is negatively affecting members of a protected class/unprivileged group more than others, even if by a seemingly neutral policy (indirect discrimination).
Put in our context, it is important to note that algorithms trained with data that do not include sensitive attributes (i.e., attributes that explicitly identify the privileged and unprivileged groups) are unlikely to produce disparate treatment but may still induce unintentional discrimination in the form of disparate impact [20].
In the algorithmic fairness literature, multiple measures were suggested. (The reader is referred to other works [14, 15, 21] for a comprehensive review of fairness definitions and measures.) The most prominent measures include demographic parity [22, 23] and equalized odds [24]. Demographic parity ensures that the proportion of the positive predictions is similar across groups. For example, if a positive prediction represents acceptance for a job, then the demographic parity condition requires the proportion of accepted applicants to be similar across groups. One disadvantage of this measure is that a fully accurate classifier may be considered unfair, when the base rates (i.e., the proportion of actual positive outcomes) of the various groups are considerably different. Moreover, to satisfy demographic parity, two similar individuals may be treated differently since they belong to two different groups, and in some cases such treatment is prohibited by law.
In this article, we focus on a variation of the equalized odds measure. This measure was devised by Hardt et al. [24] to overcome the disadvantages of measures such as demographic parity. It was designed to assess the difference between the two groups by measuring the difference between the False-Positive Rates (FPR) in the two groups, as well as the difference between the False-Negative Rates (FNR) in the two groups.
In contrast to the demographic parity measure, a fully accurate classifier will necessarily satisfy the two equalized odds constraints. Nevertheless, since equalized odds relies on the actual ground truth, it assumes that the base rates of the two groups are representative and were not obtained in a biased manner.
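To make the measure concrete, the two equalized odds gaps can be computed from group-wise error rates. The following minimal sketch (plain Python/NumPy; the function name and array-based inputs are our own illustration, not code from any of the cited works) computes the FPR and FNR differences between the two groups:

```python
import numpy as np

def equalized_odds_gaps(y_true, y_pred, group):
    """Return (|DFPR|, |DFNR|): the absolute FPR and FNR gaps between the
    privileged group 'V' and the unprivileged group 'U'.
    y_true, y_pred: binary NumPy arrays; group: array of 'U'/'V' labels."""
    rates = {}
    for g in ('U', 'V'):
        t, p = y_true[group == g], y_pred[group == g]
        fp = np.sum((t == 0) & (p == 1))
        tn = np.sum((t == 0) & (p == 0))
        fn = np.sum((t == 1) & (p == 0))
        tp = np.sum((t == 1) & (p == 1))
        rates[g] = (fp / max(fp + tn, 1),   # false-positive rate in group g
                    fn / max(fn + tp, 1))   # false-negative rate in group g
    dfpr = abs(rates['U'][0] - rates['V'][0])
    dfnr = abs(rates['U'][1] - rates['V'][1])
    return dfpr, dfnr
```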
It is important to note that there is an inherent tradeoff between accuracy and fairness: as we pursue a higher degree of fairness, we may compromise accuracy (see, e.g., [20]).

Mechanisms for Enhancing Fairness

Fairness-enhancing mechanisms are broadly categorized into three types: pre-process, in-process, and post-process. Pre-process mechanisms involve changing the training data before feeding it into the ML algorithm [16, 25, 26, 27, 28, 29]. For example, Feldman et al. [16] suggested modifying the attributes in the dataset so that the distributions for the privileged and unprivileged groups become closer, thereby making it more difficult for the algorithm to differentiate between the two groups. In-process mechanisms involve modifying ML algorithms to account for fairness during training time [22, 27, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39]. For example, Kamishima et al. [30] suggested adding a regularization term to the objective function that penalizes the mutual information between the sensitive attribute and the model’s predictions. Post-process mechanisms perform post-processing on the output scores of the model to make decisions fairer [24, 40, 41, 42]. For example, Hardt et al. [24] proposed a technique for flipping some decisions of the model to enhance equalized odds. Similarly, Corbett-Davies et al. [40] suggested selecting a threshold for each group separately, in a manner that maximizes accuracy and minimizes demographic parity.
The different mechanism types are associated with the following advantages and disadvantages [14, 15]. Pre-process mechanisms can be advantageous since they can be used with any ML algorithm. However, since they are not tailored to a specific algorithm, there is no guarantee on the level of accuracy that such a mechanism may yield at the end of the process. Similar to pre-process mechanisms, post-process mechanisms may be used with any ML algorithm. However, due to the relatively late stage in the learning process in which they are applied, post-process mechanisms typically obtain inferior results [31]. Another danger of post-process mechanisms is that they may treat entirely differently two individuals who are very similar across all attributes except for the group to which they belong. In-process mechanisms are beneficial since they can explicitly impose the required tradeoff between accuracy and fairness in the objective function. However, such mechanisms are tightly coupled with the ML algorithm.
Note that the works mentioned previously focused on a centralized setting, in which all information is held by one party, and therefore did not have to deal with privacy considerations.
2.2 Private Collaborative Machine Learning

The Setting

We consider a setting in which several parties wish to collaboratively train ML models on data that is distributed among them while preventing the disclosure of private information. We refer to this setting as PCML. While many methods were proposed in the literature to address this same (or very similar) setting, they are often associated with different (although overlapping) research domains including privacy-preserving data mining [3, 4], privacy-preserving ML [5], collaborative ML [6, 7], and federated ML [8].
The manner in which data is distributed among the collaborating parties is typically categorized as either vertical, horizontal, or mixed. In a vertical distribution, each party holds all records with a different subset of attributes, whereas in a horizontal distribution each party holds a subset of the records with all attributes [11]. The mixed scenario refers to an arbitrary partition of the data among the collaborating parties.
When analyzing the privacy preservation of the protocol, it is common to distinguish between two types of adversaries: semi-honest and malicious. A semi-honest adversary is a party that follows the protocol correctly but tries to infer sensitive information of other parties from intermediate messages that are received during the protocol. A malicious adversary, however, may deviate from the prescribed protocol in an attempt to learn sensitive information on others. Protocols in the semi-honest model offer weaker security than protocols in the malicious model; they prevent inadvertent leakage of information between the collaborating parties and are thus useful for application scenarios in which such leakage is the realistic threat [43]. Protocols in the semi-honest model are also viewed as a first step toward achieving higher levels of security. However, achieving security against malicious parties entails significantly higher costs than achieving security against semi-honest parties. Hence, most of the protocols in the malicious model are designed for basic functionalities (e.g., Oblivious Polynomial Evaluation or Private Set Intersection), whereas most studies on applied privacy-preserving data mining or ML (e.g., [44, 45, 46, 47, 48, 49, 50]) adopt the semi-honest model.
The two main approaches for achieving privacy in the aforementioned setting are perturbation and secure MPC, as we proceed to describe.

Perturbation

The idea underlying the perturbation approach (e.g., [4]) is to modify the values of attributes randomly so that privacy is maintained, whereas the results of the desired query or analysis on the perturbed data remain close to the original ones. This technique is at the base of the paradigm called differential privacy [9]. An algorithm is considered differentially private if the addition or removal of a single record from the dataset does not significantly affect the output of the algorithm. Typically, differential privacy is obtained by incorporating Laplacian noise into the results of the data analysis [11].
Application examples of PCML methods that are based on perturbation include classification [5, 51, 52, 53, 54, 55], collaborative filtering [56, 57], stochastic gradient descent [58, 59], and deep learning [6, 60, 61, 62].
It is important to note that methods that are based on incorporating noise and randomness into the data may perform worse than their non-private counterparts. This problem becomes even more acute in distributed collaborative settings, since in such settings each party must add noise to the data it holds independently. Such distributed incorporation of noise may result in adding excessive amounts of noise during the training process, which may yield a considerably inferior model [5, 6].

Secure Multi-Party Computation

In the general setting of secure MPC [10], L mutually distrustful parties, \(P_1, \ldots ,P_L\), that hold private inputs, \(X_1, \ldots , X_L\), wish to compute some joint function on their inputs (i.e., \(f(X_1, \ldots , X_L)\)). Ideally, no party should gain during the computation process any information on other parties’ inputs, beyond what can be inferred from its own input and the joint function output.
Theoretical results show that such perfect privacy is achievable for any problem of MPC by invoking generic solutions such as the garbled circuit construction of Yao [10]. In that approach, one represents the function f as a circuit, either a Boolean one (consisting of XOR and AND gates) or an arithmetic one (consisting of addition and multiplication gates). The input wires to the circuit bring in the inputs that the parties hold, whereas the output wires convey the desired output of the computation. Subsequently, the parties run a protocol that emulates the circuit’s operation in a secure manner. The circuit-based approach is general and can be used to compute any function on the distributed inputs.
However, as the computational costs are proportional to the size of the circuit, such generic solutions are practical only for rather simple functions. When dealing with more involved functions, such as the ones encountered in ML, such generic solutions become impractical, and more specialized solutions, tailored to the function of interest, should be developed. For the sake of higher efficiency, such specialized solutions typically relax the notion of perfect privacy, but they do so in a manner such that the leaked information is deemed benign.
Application examples of PCML methods that are based on MPC include clustering [63, 64, 65], association rule mining [46, 66, 67, 68], classification [3, 5, 6, 69, 70, 71], and collaborative filtering [49, 72, 73].

2.3 Privacy and Fairness

While many recent studies have investigated algorithms for PCML, the fairness of such algorithms has been overlooked. There were, however, several recent studies that incorporated both privacy and fairness considerations to ML algorithms in non-collaborative settings. Most of these studies have investigated the case of a centralized setting, in which all of the dataset is held by a single party [74, 75, 76, 77, 78, 79]. The goal of those studies was to train an ML model over the centralized dataset, making sure that the released model and its future outputs are fair, as well as private, in the sense that one cannot infer from them (meaningful) information about individual data records of the dataset.
A few other studies have investigated a distributed setting that is non-collaborative in the following sense. Their distributed setting includes a “main” party that wishes to train a fair ML model over the data it holds, and a third party to which the sensitive attributes are outsourced. The motivation behind this setting is that while the sensitive attributes should not be exposed to the main party, they should still be used in the training process of the ML model (in a privacy-preserving manner) to ensure that the resulting model is fair. To obtain this goal, these studies used either random projections [80, 81] or MPC techniques [82].
There are a few other studies that combine fairness with federated learning. These studies dealt with in-process fairness mechanisms in horizontal federated learning with a single server, in which additional fairness constraints are introduced into ML training optimization procedures [83, 84, 85, 86, 87, 88]. These methods are limited in their applicability to only certain ML algorithms, mainly those with non-convex loss functions [83, 84]. Moreover, some of these methods can leak private information due to the need to share excessive information with the server that coordinates the computation [85, 86, 87]. An exception in this line of work is the study of Liu et al. [89], which focuses on vertical federated learning. They propose a method to train a fair learning model in the vertical setting in a manner that balances between fairness, accuracy, and privacy.
In this article, we propose a collaborative private pre-process fairness-enhancing mechanism based on MPC that does not require any third party or server. As a pre-process mechanism, our method is not limited to a specific algorithm, and therefore it can be used with any collaborative ML algorithm.
A Concluding Remark. The term fairness is used in the literature also in another meaning, which inherently differs from algorithmic fairness that we consider herein. In algorithmic fairness, the goal is to mitigate unintentional biases in the computed ML model for the sake of future individuals on which the learned model will be applied. In contrast, the other meaning of fairness is concerned with the interests of the data owners who contribute data to the collaborative learning process and what they expect to get from that collaboration. In cryptography, an MPC protocol is considered fair if it ensures that either all parties receive their designated outputs, or none of them does [11]. In federated learning, it relates to what the data owners, who contribute data of varying quantity and quality and have different computational and communication resources, expect to receive in return from the data federation that they join [90].

3 Problem Statement and Preliminaries

Let W be a population consisting of n individuals, and let A be a set of attributes or features that relate to each of those individuals. We consider tabular datasets D that describe the individuals in W in the following manner: each row in D relates to a single individual in W, whereas each column in D relates to a single attribute in A. The \((i,j)\)-entry in D equals the value of the jth attribute/feature of the ith individual.
We are interested in the case where D is distributed horizontally among L parties, \(P_\ell\), \(\ell \in [L]:=\lbrace 1, \ldots , L\rbrace\). Namely, party \(P_\ell\) holds a subset of the rows in D, which we denote by \(D_\ell\), and the subsets are disjoint (in the sense that each individual record appears in exactly one of those partial datasets). They wish to engage in a fair collaborative ML classification process over the entire dataset \(D=\biguplus_{\ell \in [L]} D_\ell\) without disclosing to the other parties information on the confidential data (\(D_\ell\), \(\ell \in [L]\)) that they hold, beyond what can be naturally deduced from the computed classifier.
In this study, we focus on pre-process mechanisms for enhancing fairness in such a distributed setting (as opposed to in-process or post-process, as described in Section 2.1). We consider two adversarial models: one in which the parties are semi-honest, and another in which the parties are malicious (see Section 2.2).
We proceed to introduce the basic notations that will be used throughout the article. We distinguish between three types of attributes (columns):
S represents a sensitive attribute (e.g., race or gender). In this work, we focus on a binary attribute S that attains one of two possible values, \(S \in \lbrace U,V\rbrace\), where U means unprivileged and V means privileged. (By abuse of notation, we are using S to denote the attribute, as well as its values in different rows of the dataset.)
X stands for the collection of all non-sensitive attributes. To simplify our discussion, we shall assume that X consists of a single attribute. When X consists of several non-sensitive attributes, we will apply the same pre-process mechanism that we describe in the following on each such attribute, independently. We shall also assume hereinafter that X is a numerical attribute.
Y is the binary class that needs to be predicted (e.g., “hire/no hire”).
The set of rows, or W, is split in two different manners. The first split is as induced by the sensitive attribute S. Namely, \(W = W^U \uplus W^V\), where \(W^U\) is the subset of all individuals in W that are unprivileged (i.e., \(S=U\) for them) and \(W^V = W \setminus W^U\) is the complementary set of privileged individuals. For each \(S \in \lbrace U,V\rbrace ,\) we let \(n^S:=|W^S|\).
The other manner in which W is split is according to the distribution of the records of D among the L parties. Namely, \(W = \biguplus_{\ell \in [L]} W_\ell\), where \(W_\ell\) is the subset of individuals whose information is held by the party \(P_\ell\), \(\ell \in [L]\). For each \(\ell \in [L],\) we let \(n_\ell :=|W_\ell |\).
Finally, we let \(n_\ell ^S\) denote the size of \(W_\ell ^S:=W^S \bigcap W_\ell\), namely the number of individuals in \(W^S\) whose records are held by \(P_\ell\), \(S \in \lbrace U,V\rbrace\), \(\ell \in [L]\). Then
\begin{equation} n = \sum _{S \in \lbrace U,V\rbrace } n^S = \sum _{\ell \in [L]} n_\ell = \sum _{ S \in \lbrace U,V\rbrace } \sum _{\ell \in [L]} n^S_\ell \,. \end{equation}
(1)
We shall adopt these superscript and subscript conventions hereinafter. Namely, a superscript S will denote a restriction to the subset \(W^S\), \(S \in \lbrace U,V\rbrace\); a subscript \(\ell\) will denote a restriction to the subset \(W_\ell\), \(\ell \in [L]\); a combination of the two will denote a restriction to \(W_\ell ^S\); and no superscript S or subscript \(\ell\) means that we relate to the entire population W.
In addition, we let \(D(X)\) denote the collection of all values appearing in the X-column of D. Similarly, \(D^S(X)\), \(D_\ell (X)\) and \(D_\ell ^S(X)\) denote the collection of all values appearing in the X-column of D, restricted to the rows in \(W^S\), \(W_\ell\), or \(W_\ell ^S\), respectively. We assume that \(D^S_\ell (X)\), for any S and \(\ell\), are multisets; namely, they may contain repeated values.

4 The Proposed Method

In this section, we propose a private pre-process fairness-enhancing mechanism based on MPC for the problem described in Section 3. In Section 4.1, we introduce our pre-process fairness-enhancing mechanism. We do that for a centralized setting, in which all information is held by one party (i.e., \(L=1\)); in such a setting, privacy is irrelevant. Then, in Section 4.2, we show how to implement that fairness mechanism in the (horizontally) distributed setting, \(L\gt 1\), in a manner that offers privacy to the interacting parties \(P_\ell\), \(\ell \in [L]\). Our protocol invokes a general-purpose MPC sub-protocol for computing the kth-ranked element in distributed databases which we describe in Section 4.3. In Section 4.4, we discuss the privacy guarantees of our fairness-enhancing mechanism, and we conclude with a communication and computational cost analysis in Section 4.5.

4.1 A Pre-Process Fairness-Enhancing Mechanism

Inspired by Feldman et al. [16], we devise a methodology that improves fairness through decreasing distances between the distributions of attributes of the privileged and unprivileged groups, in a pre-process stage. The goal in performing such a repair is to reduce the dependency of the ML model on the group to which an individual belongs, even when that dependency is not a direct one but rather an indirect one through proxy variables. In other words, we obfuscate the ability to differentiate between groups using the presumably legitimate non-sensitive variables. In contrast to Feldman et al. [16], our method is designed specifically to be suitable for privacy-preserving enhancements, as we do in Section 4.2.
To illustrate the intuition behind the proposed method, consider the example depicted in Figure 1. The figure illustrates a case in which SAT scores of individuals are used to predict their success (or failure) in a given job. The plot on the left shows the distribution of SAT scores, within the two sub-populations in the dataset: the privileged group and the unprivileged group. The plot on the right shows the job success rate as a function of the SAT score, within each of those two sub-populations.
Fig. 1. The distribution of SAT scores within each sub-population (left), and the success rate as a function of the SAT score, within each sub-population (right).
In this example, SAT scores may be used to predict the success (or failure) of candidates in a given job, since the higher the SAT score is, the higher is the probability of success. However, relying solely on the SAT scores, while ignoring the group to which the candidates belong, may create an undesired bias. More specifically, as can be seen from the figure, unprivileged candidates with SAT scores of approximately 1100 perform just as well as privileged candidates with SAT scores of 1200 (e.g., since they may have encountered harder challenges along their way to achieving their scores). Therefore, if SAT scores were used for hiring, say by simply placing a threshold, unprivileged candidates with high potential would be excluded, whereas lower-potential candidates from the privileged group would be hired instead. The goal of the pre-process mechanism that we suggest herein is to rectify this bias.
For each group \(S \in \lbrace U,V\rbrace\), we first partition the multiset of values in \(D^S(X)\) into (nearly) equal-sized bins of sequential values. Namely, if we let B denote the number of bins, then we partition \(D^S(X)\) into \(D^S(X) = \biguplus_{i=1}^B b_{(i)}^S\), where (i) for all \(i \in [B-1]\), \(y \in b^S_{(i)}\), and \(y^{\prime } \in b^S_{(i+1)}\), we have \(y \le y^{\prime }\), and (ii) the bins are of (nearly) equal sizes, in the sense that the sizes of the bins (viewed as multisets) are either \(\lfloor n^S/B \rfloor\) or \(\lceil n^S/B \rceil\). These two conditions simply state that the resulting bins are the B-quantiles of \(D^S(X)\). In the following, we provide the precise manner in which those bins/quantiles are computed.
After completing the binning process, we proceed to compute the minimal values in each of the B bins, for each group \(S \in \lbrace U,V\rbrace\),
\begin{equation} m^S_{(i)} := \min \lbrace b^S_{(i)} \rbrace \,, \quad S \in \lbrace U,V\rbrace \,, ~~ i \in [B]\,, \end{equation}
(2)
and also the maximal value in the last bin,
\begin{equation} m^S_{(B+1)} := \max \lbrace b^S_{(B)} \rbrace \,, \quad S \in \lbrace U,V\rbrace \,. \end{equation}
(3)
Those values enable us to perform a repair of \(D^V(X)\), as follows: for every bin \(b^V_{(i)}\) in \(D^V(X)\), \(i \in [B]\), we shift all values in it “toward” the values in the corresponding bin in \(D^U(X)\), according to the repair rule described in Equation (4):
\begin{equation} \bar{x}=\left(1-\lambda \right)\cdot x+\lambda \cdot \left(m_{(i)}^U+\left(\frac{x-m_{(i)}^V}{m_{(i+1)}^V-m_{(i)}^V}\right)\cdot \left(m_{(i+1)}^U-m_{(i)}^U \right) \right), ~~~ i \in [B], x\in b_{(i)}^V\,. \end{equation}
(4)
Here, x is an original value extracted from the bin \(b_{(i)}^V\), whereas \(\bar{x}\) is its repaired value. This repair procedure represents a linear mapping that performs min-max scaling of the values in each bin of the privileged group to the range of values in the corresponding bin in the unprivileged group. The computation brings the distributions of both groups closer.
The repair tuning parameter \(\lambda \in [0,1]\) controls the strength of the repair. If \(\lambda =0,\) then we perform no repair, since then \(\bar{x}=x\). If \(\lambda =1,\) then we get a full repair, since then all values in \(b_{(i)}^V\) are replaced with values in the range of the corresponding bin \(b_{(i)}^U\), while keeping a distribution similar to the original one in \(b_{(i)}^V\). Note that if all elements of \(b_{(i)}^V\) are equal, then both the numerator and denominator in the fraction on the right-hand side of Equation (4) are zero. In such a case, we interpret that fraction as \(1/2\). This setting implies that all the (equal) values in \(b_{(i)}^V\) will be mapped to the same value in the middle of the range of the corresponding bin \(b_{(i)}^U\), namely \((m_{(i+1)}^U+m_{(i)}^U)/2\).
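For illustration, the following sketch (plain Python; the function and variable names are ours and not taken from the paper's implementation) applies the repair rule of Equation (4) to the privileged-group values of a single attribute, given precomputed bin boundaries for the two groups. Bin membership is determined here by value range, which coincides with the rank-based binning described above whenever boundary values are not repeated:

```python
def repair_privileged_values(x_priv, m_U, m_V, lam):
    """Repair the privileged-group values according to Equation (4).
    x_priv : values of D^V(X); m_U, m_V : lists of length B+1 holding the
    bin boundaries m^U_(i), m^V_(i); lam : the tuning parameter lambda."""
    B = len(m_V) - 1
    repaired = []
    for x in x_priv:
        # locate the bin b^V_(i) containing x (approximate, by value range)
        i = next(j for j in range(B) if x <= m_V[j + 1] or j == B - 1)
        lo_V, hi_V = m_V[i], m_V[i + 1]
        lo_U, hi_U = m_U[i], m_U[i + 1]
        # degenerate bin: interpret the fraction in Equation (4) as 1/2
        frac = 0.5 if hi_V == lo_V else (x - lo_V) / (hi_V - lo_V)
        repaired.append((1 - lam) * x + lam * (lo_U + frac * (hi_U - lo_U)))
    return repaired
```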
At the completion of the repair pre-process procedure, an ML model is trained with the repaired dataset. The steps of the method described previously are illustrated in Figure 2.
Fig. 2. General steps of the proposed method.
To illustrate the effects of the two main parameters that our mechanism uses (i.e., \(\lambda\) and B), recall the example presented in Figure 1. Figure 3 shows the effect of varying \(\lambda\) values. The higher the value of \(\lambda\) is (with a fixed number of bins, \(B=3\)), the closer the distributions are.
Fig. 3. The effect of the repair tuning parameter \(\lambda\).
Figure 4 shows the effect of varying B values. The figure shows that a higher value of B (with a fixed value of \(\lambda =0.9\)) yields closer distributions.
Fig. 4. The effect of the number of bins \(B\).
In both cases, making the distributions closer means that the two groups will be treated more similarly by the ML model, hence making it harder for the model to distinguish between the two groups. In other words, when using higher values of \(\lambda\) and B, the model will be less likely to make decisions that depend on the group through other presumably legitimate non-sensitive attributes.
To conclude this section, we proceed to provide the formal details of the binning scheme described previously. Let us first order all values in \(D^S(X)\), the size of which is \(n^S\), in a non-decreasing manner, as follows:
\begin{equation} D^S(X)= (x_1, \ldots , x_{n^S})\,, \quad \mbox{where}~~ x_1 \le x_2 \le \cdots \le x_{n^S}\,. \end{equation}
(5)
Next, we define a set of indices in the ordered multiset \(D^S(X)\) in the following manner:
\begin{equation} K^S :=\lbrace k_{(0)}:=0\lt k_{(1)} \lt \cdots \lt k_{(B-1)} \lt k_{(B)}:=n^S\rbrace . \end{equation}
(6)
Assume next that \(n^S=q^S\cdot B+r^S\), where \(r^S \in [0,B)\) (i.e., \(q^S\) and \(r^S\) are the quotient and remainder, respectively, when dividing \(n^S\) by B). Then the sequence of indices \(k_{(i)}\), \(i \in [B]\), is defined by the following equation:
\begin{equation} k_{(0)} = 0 \,; \quad k_{(i)}=k_{(i-1)}+\Delta \,, ~~ \mbox{where} ~~\Delta =\left\lbrace \begin{matrix} q^S+1 & \mbox{if} & i\le r^S \\ q^S & \mbox{if} & i \gt r^S \end{matrix}\right. \,, ~~~i \in [B]\,. \end{equation}
(7)
It is easy to verify that \(k_{(B)}=n^S\). Finally, the bins in \(D^S(X)\), Equation (5), are defined by \(K^S\) as follows:
\begin{equation} b^S_{(i)}:=\lbrace x_{j}: k_{(i-1)}+1 \le j \le k_{(i)}\rbrace \,, \quad i \in [B]\,. \end{equation}
(8)
We see, in view of Equation (8), that the size of the first \(r^S\) bins is \(q^S+1\), whereas all other bins are of size \(q^S\). Moreover, \(m^S_{(i)} = \min \lbrace b^S_{(i)} \rbrace\) (Equation (2)) equals the \((k_{(i-1)}+1)\)-th ranked element in \(D^S(X)\), for all \(i \in [B]\) and \(S\in \lbrace U,V\rbrace\), whereas \(m^S_{(B+1)} := \max \lbrace b^S_{(B)} \rbrace\) (Equation (3)) equals the maximal element in \(D^S(X)\), \(S\in \lbrace U,V\rbrace\).
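The binning scheme of Equations (5)–(8), together with the boundary values of Equations (2) and (3), can be sketched as follows (plain Python; our own illustration of the computation for a single group in a centralized setting):

```python
def bin_boundaries(values, B):
    """Return the boundaries m_(1), ..., m_(B+1) of Equations (2)-(3) for one
    group, using the index scheme of Equations (6)-(8).
    values : the multiset D^S(X) as a list; B : the number of bins."""
    xs = sorted(values)          # Equation (5): non-decreasing order
    n = len(xs)
    q, r = divmod(n, B)          # n = q*B + r
    ks = [0]                     # the indices k_(0), ..., k_(B) of Equation (7)
    for i in range(1, B + 1):
        ks.append(ks[-1] + (q + 1 if i <= r else q))
    assert ks[-1] == n
    # m_(i) is the (k_(i-1)+1)-th ranked element; m_(B+1) is the maximum
    return [xs[ks[i - 1]] for i in range(1, B + 1)] + [xs[-1]]
```

The two resulting boundary lists (one per group) are exactly what the repair rule of Equation (4) consumes.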

4.2 Secure Multi-Party Algorithm for Enhancing Fairness

In Section 4.1, we described our fairness-enhancing mechanism in a centralized setting (i.e., when all of D is held by a single party). We now revisit that algorithm and devise a secure implementation of it, given that the dataset D is horizontally distributed among L parties, as described in Section 3.
To that end, let us go back to Figure 2. The step that poses a challenge when privacy is of concern is the first one: dividing the non-sensitive attribute into equal-sized bins for each group \(S \in \lbrace U,V\rbrace\) (i.e., \(D^S(X) = \biguplus_{i=1}^B b_{(i)}^S\)) and then computing the boundaries of those bins, \(m^S_{(i)} := \min \lbrace b^S_{(i)} \rbrace\) and \(m^S_{(B+1)} := \max \lbrace b^S_{(B)} \rbrace\), for \(S \in \lbrace U,V\rbrace\) and \(i \in [B]\) (see Equations (2) and (3)). Performing those computations poses a challenge in the distributed setting, since they depend on data that is distributed among the L parties and cannot be shared due to privacy concerns.
The second step in Figure 2 poses no problem since it can be carried out by each party locally, independently of others, once the bin boundaries \(m^S_{(i)}\), \(i \in [B+1]\), \(S \in \lbrace U,V\rbrace\), are computed. Even though in that step the parties do not collaborate, they must agree upfront on \(\lambda\) (the repair tuning parameter) (see Equation (4)). Note that at this step, every \(P_\ell\), \(\ell \in [L]\), repairs the values of \(D^V_\ell (X)\) that it possesses, but it does not share the repaired values with anyone else.
As for the last step in Figure 2, there the parties need to learn a classifier on distributed data, without sharing that data. Since this is, again, a computational problem that involves all of the distributed dataset, privacy issues kick in. However, for such problems of privacy-preserving distributed ML classification, there are existing MPC-based solutions (e.g., [3, 69, 70, 71, 91]).
Protocol 1 provides pseudo-code that summarizes our mechanism for fairness-driven collaborative ML. It begins with the L parties computing in a secure manner the sizes of the two distributed datasets, \(D^U\) and \(D^V\) (Line 1). They do so without revealing information on the sizes of the partial datasets that each party holds; we explain how this computation is carried out in Section 4.3. Then, they agree on the two parameters that control the repair procedure: B and \(\lambda\) (Line 2). Afterward, they repair the non-sensitive attributes in each of the two groups (Lines 3–8). The main challenge here is to compute the bin boundaries (Line 5). That computation relies on an MPC sub-protocol for computing the kth-ranked element in a distributed dataset. Such a sub-protocol is described in Section 4.3. Finally, the parties perform the desired distributed ML computation on the repaired version of the unified dataset (Line 9).
Therefore, we focus in the next section on the problem of privacy-preserving binning of a distributed dataset. That computation consists of two MPC protocols: a simple one for computing the number of entries in a distributed dataset (as done in Line 1 of Protocol 1) and a more involved one for computing the kth-ranked element in such a dataset (as done \((B+1)\) times in Line 5 of Protocol 1). In our discussion of those two MPC sub-protocols, we omit the superscript that denotes the group, since the computation is carried out in the same manner for each of the two groups.
We conclude by noting that as cryptographic protocols usually operate over finite fields, it is essential to convert all real data into integers before subjecting them to such computations. To that end, we apply the following simple procedure to convert real values into integers (with some precision loss), prior to executing the protocol. If we are interested in preserving a precision of d digits after the decimal point, we multiply each real value by \(10^d\) and round the resulting value to the nearest integer. After executing our mechanism, we divide the values by \(10^d\).
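The fixed-point conversion can be sketched as follows (our illustration, where d is the number of decimal digits to preserve):

```python
def to_fixed_point(values, d):
    """Scale real values by 10^d and round to the nearest integer."""
    return [round(v * 10 ** d) for v in values]

def from_fixed_point(values, d):
    """Undo the scaling after the secure computation is over."""
    return [v / 10 ** d for v in values]

# For example, with d = 2: [3.141, 2.718] -> [314, 272] -> [3.14, 2.72]
```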

4.3 A Protocol for Computing the kth-Ranked Element in Distributed Datasets

4.3.1 Overview.

In our setting, there are L parties, \(P_\ell\), \(\ell \in [L]\), each one holding a private dataset \(D_\ell\) that consists of \(n_\ell\) scalars. Let \(n = \sum _{\ell \in [L]} n_\ell\), and let \(k \in [n]\). The parties wish to compute in a secure manner the kth-ranked element in \(D = \bigcup _{\ell \in [L]} D_\ell\), namely a value \(x \in D\) such that at most \(k-1\) elements in D are smaller than x and at most \(n-k\) elements in D are greater than x. Protocol 2, which is described in the following, solves that problem. It is the protocol that is called \(B+1\) times in Line 5 of Protocol 1 to compute the kth-ranked elements that constitute the boundaries of the B bins.
The \(B+1\) relevant values of k are described in Section 4.1 (see the last paragraph there and Equation (7)). As we can see, those values of k depend on B, as well as on \(n = \sum _{\ell \in [L]} n_\ell\). We, however, assume that the sizes of the private datasets, \(n_\ell\), \(\ell \in [L],\) are kept secret. In the following, we clarify how the parties securely compute n, without leaking any further information on \(n_\ell\), \(\ell \in [L]\).
Protocol 2, which we describe in the following, is based on the protocol template that was offered by Aggarwal et al. [92]. That protocol template left some MPC components unspecified, and we suggest in Protocol 2 specific implementations of those components. Moreover, the protocol in the work of Aggarwal et al. [92] assumed that the sizes of the private datasets, \(n_\ell\), \(\ell \in [L]\), are public. Our Protocol 2 keeps those values private.
The main cryptographic shield on which Protocol 2 bases its security is secret sharing [93]. Specifically, each party \(P_\ell\), \(\ell \in [L]\), secret shares its private information (which are counts of records in its own private dataset \(D_\ell\)) among all parties using Shamir’s secret sharing scheme with the threshold set to
\begin{equation} t = \lfloor (L+1)/2 \rfloor \,. \end{equation}
(9)
All subsequent computations are carried out on those secret shares by invoking suitable MPC sub-protocols. With such a setting of the threshold, the protocol is secure under the assumption of an honest majority; namely, if some of the parties collude in an attempt to recover the secret-shared information, then their number is less than \(L/2\).
Protocol 2 is designed to deal with both semi-honest and malicious parties (see Section 2.2 for the definition of those notions). Specifically, Protocol 2 in its entirety solves the problem in a secure manner for the malicious case, whereas if we remove from it Lines 3, 10, 15, and 18 that are marked by [Mal.], the protocol solves the problem securely in the semi-honest case.

4.3.2 In Preparation for Running the Protocol.

Before running Protocol 2, each party \(P_\ell\) secret shares \(n_\ell\) among all parties using t-out-of-L Shamir’s threshold secret sharing scheme, with t as in Equation (9). Then, each of the parties adds up all shares that it got in \(n_\ell\) for all \(\ell \in [L]\). Owing to the linearity of secret sharing, the parties obtain that way a t-out-of-L secret sharing of \(n = \sum _{\ell \in [L]} n_\ell\). Now, they can publicly use any t of those shares to reconstruct n. Hence, \(n_\ell\), \(\ell \in [L]\), are kept secret, but n is computed and made public. That is the computation that the parties perform in Line 1 of Protocol 1.
In addition, the parties agree on the number of bins B. Once n and B are known, the sequence of \(B+1\) ranks k that determine the bin boundaries can be computed. (See Line 2 of Protocol 1.)
Finally, the parties jointly determine a priori lower and upper bounds, \(\alpha\) and \(\beta\), on all elements of D. Now all is ready for the execution of Protocol 2 (which is executed in Line 5 of Protocol 1).
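A minimal sketch of this preparatory step is given below: Shamir t-out-of-L sharing of the local counts \(n_\ell\) followed by a share-wise sum and a public reconstruction of n. This is a plain-Python illustration under assumed parameters (it omits secure channels and any share verification), not the authors' implementation:

```python
import random

P = 2_147_483_647                      # a prime p > 2n (assumed field Z_p)

def shamir_share(secret, t, L):
    """Split `secret` into L shares with threshold t (Shamir over Z_p)."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    # party j's share is the polynomial evaluated at x = j, for j = 1..L
    return [sum(c * pow(j, e, P) for e, c in enumerate(coeffs)) % P
            for j in range(1, L + 1)]

def reconstruct(points):
    """Lagrange interpolation at x = 0 from t points (x, y)."""
    secret = 0
    for xi, yi in points:
        num = den = 1
        for xj, _ in points:
            if xj != xi:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret

L, t = 5, (5 + 1) // 2                 # threshold t = floor((L+1)/2), Eq. (9)
local_counts = [120, 340, 95, 210, 60] # hypothetical values of n_ell
shares = [shamir_share(n_ell, t, L) for n_ell in local_counts]
# by linearity, each party's sum of received shares is a share of n
summed = [sum(col) % P for col in zip(*shares)]
n = reconstruct(list(zip(range(1, L + 1), summed))[:t])
assert n == sum(local_counts)          # n is public; the n_ell stay hidden
```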

4.3.3 Semi-Honest Parties.

We begin by describing the protocol for the semi-honest case; namely, we ignore for now the lines marked by [Mal.].
The protocol computes the kth-ranked element by implementing a privacy-preserving binary search. The lower and upper bounds of the search interval, denoted a and b, are initialized in Line 1. A Boolean flag called found is set to false in Line 2; it will be set to true once the kth-ranked element is found and then the search will stop.
The main loop takes place in Lines 4 through 20. First, m is publicly set by all parties to the middle of the current search interval (Line 5). Then, each party counts how many elements in its own private dataset are smaller than m and how many are larger than m; those two private counters are denoted \(x_\ell\) and \(y_\ell\), respectively (Line 7). Subsequently, each party \(P_\ell\) performs secret sharing of those two private counters among all parties (Line 8); as stated earlier, the secret sharing scheme is Shamir’s t-out-of-L scheme with t as in Equation (9).
Then (Line 11), the parties engage in an MPC sub-protocol to verify two inequalities. We defer to a later stage the details of that sub-protocol. If both inequalities are verified then the current value of m is the sought-after kth-ranked element. Therefore, we set the value of the Boolean flag found to true so that the repeat loop will stop and issue the current value of m as the output.
Otherwise, the parties securely verify the inequality in Line 13. If it is verified, then the overall number of elements in the unified dataset D that are smaller than m is at least k. In that case, we can safely deduce that the value of the kth-ranked element is smaller than m. Hence, we modify the value of the upper bound on the kth-ranked element (Line 14) and stay in the repeat loop for another iteration. Similarly, if the inequality in Line 16 is verified, then the parties update the lower bound in the binary search (Line 17). The search thus proceeds until convergence.
It is easy to see that if all parties are semi-honest, the search will converge to the correct value of the kth-ranked element. Furthermore, given that the MPC sub-protocol that is invoked in Lines 11, 13, and 16 is perfectly secure (namely, that it only outputs a single bit that indicates whether the inequality holds or not, and nothing beyond that), the parties only learn the sequence of updated lower and upper bounds, a and b. However, such information is implied by the computed output and the a priori bounds \(\alpha\) and \(\beta\). Hence, the protocol does not leak any information to the interacting parties apart from the computed output.
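The search logic of the semi-honest variant can be emulated in the clear as follows. In this sketch (ours, not the paper's pseudo-code) the secure inequality checks of Lines 11, 13, and 16 are replaced by plain comparisons on the aggregated counts, so it captures only the behavior of the binary search, not its privacy guarantees; the exact interval-update steps are also our own choice:

```python
def kth_ranked_element(datasets, k, alpha, beta):
    """Clear-text emulation of the binary search for the kth-ranked element.
    datasets : the parties' private integer lists D_ell
    k        : the target rank (1-based) in the union D
    alpha, beta : agreed a priori lower and upper bounds on the elements."""
    n = sum(len(d) for d in datasets)
    a, b = alpha, beta
    while True:
        m = (a + b) // 2
        x = sum(sum(1 for u in d if u < m) for d in datasets)  # elements < m
        y = sum(sum(1 for u in d if u > m) for d in datasets)  # elements > m
        if x <= k - 1 and y <= n - k:
            return m            # m is the kth-ranked element
        if x >= k:              # the kth-ranked element lies below m
            b = m - 1
        else:                   # y >= n - k + 1: it lies above m
            a = m + 1
```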

4.3.4 Malicious Parties.

We now turn our attention to the enhancements in Protocol 2 for the sake of addressing potential malicious conduct. A malicious party \(P_\ell\), \(\ell \in [L]\), could sabotage the search by submitting inconsistent values of \(x_\ell\) or \(y_\ell\). Such a sabotage could prevent the convergence of the protocol, or divert it to a wrong output. The added lines in the protocol offer a layer of protection against such malicious conduct. Those enhancements to the protocol introduce two new counters that are secret shared among all parties: \(X_\ell\), which equals, in every stage of the protocol’s run, the number of elements in \(D_\ell\) that are smaller than a, and \(Y_\ell\), which equals the number of elements in \(D_\ell\) that are greater than b, \(\ell \in [L]\).
Those two counters are initialized to zero (Line 3) since initially a and b are lower and upper bounds on all elements in \(D_\ell\) for all \(\ell \in [L]\). When the search interval is shrunk to its lower half (Lines 13–15), only \(Y_\ell\) has to be updated. Its new value is set to \(n_\ell - x_\ell\), since \(x_\ell\) equals the number of elements in \(D_\ell\) that are less than or equal to the new upper bound b. Similarly, if the search interval is shrunk to its upper half (Lines 16–18), \(X_\ell\) is updated to equal \(n_\ell - y_\ell\).
The three verifications that are carried out in Line 10 check that all information provided by \(P_\ell\) is consistent. Indeed, \(x_\ell +y_\ell\) is supposed to be the number of elements in \(D_\ell\) that are strictly smaller or strictly larger than m, and hence it should be at most \(n_\ell\), which is the size of \(D_\ell\) that \(P_\ell\) reported in preparation for running this protocol through the secret shares that it distributed (see Section 4.3.2). Since \(a \le m\), we have \(\lbrace u \in D_\ell : u\lt a\rbrace \subseteq \lbrace u \in D_\ell : u\lt m\rbrace\), and therefore we should have \(x_\ell \ge X_\ell\). The same argument applies to the third inequality that is verified in Line 10, namely \(y_\ell \ge Y_\ell\). It can be shown that the verification of those three inequalities suffices. Namely, if the protocol ends successfully and outputs some value m, then the values provided by \(P_\ell\), for any \(\ell \in [L]\), during the protocol's run are consistent with some dataset \(D_\ell\) of size \(n_\ell\) such that m is the kth-ranked element in \(D = \bigcup _{\ell \in [L]} D_\ell\) (see [92, Theorem 5]).

4.3.5 An MPC Sub-Protocol for Verifying Inequalities.

Each of the four inequalities in Lines 11, 13, and 16 is equivalent to an inequality of the form \(v\gt 0\), where v is an integer in which the parties hold t-out-of-L secret shares. Furthermore, in all of those inequalities, v is in the range \([-n,n]\). Let us assume that the field \({\mathbb {Z}}_p\) that underlies the secret sharing scheme is sufficiently large, and in particular that \(p\gt 2n\).
Lemma 4.1.
Let \(n\gt 0\) and v be two integers so that \(v \in [-n,n]\), and let \(p\gt 2n\) be a prime. Then \(v\gt 0\) if and only if the LSB of \((-2v \mod {p})\) is 1.
Proof.
Assume that \(v\gt 0\), namely that \(v \in (0,n]\). Hence, \(-2v \in [-2n,0)\). Therefore, as \(2n\lt p\), \((-2v \mod {p}) = -2v+p\). As that number is odd, its LSB is 1. If, however, \(v \le 0\), then \(v \in [-n,0]\). Hence, \(-2v \in [0,2n] \subset [0,p-1]\). Therefore, \((-2v \mod {p}) = -2v\). As that number is even, its LSB is 0. □
Hence, to check that \(v\gt 0\), the parties need to check that the LSB of \((-2v \mod {p})\) is 1. To that end, Nishide and Ohta [94] suggested an MPC protocol for computing the LSB of a secret shared value in a secure manner.
To illustrate the preceding course of action, let us consider the inequality in Line 11, \(\sum _{\ell \in [L]} x_\ell \le k-1\). It is equivalent to \(v\gt 0,\) where \(v = k-\sum _{\ell \in [L]} x_\ell\). As v is a difference between two integers in the range \([0,n],\) we infer that \(v \in [-n,n]\). Since each party \(P_j\) holds a share \(x_\ell [j]\) in \(x_\ell\), for all \(j,\ell \in [L]\), then \(P_j\) can compute \(v[j] = k- \sum _{\ell \in [L]} x_\ell [j]\) and that would be its share in v. Therefore, \(-2v[j]\) would be \(P_j\)’s share in \(-2v\), \(j \in [L]\) (all computations are done in \({\mathbb {Z}}_p\)). Those will be the shares that the parties will input to the MPC sub-protocol for computing the LSB of \(-2v\), which will indicate whether \(v\gt 0\).
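A quick numeric sanity check of Lemma 4.1 and of the resulting comparison rule (a plain-Python illustration with hypothetical numbers; in the protocol itself the LSB is extracted from secret shares via the sub-protocol of Nishide and Ohta [94]):

```python
def is_positive(v, p):
    """Return True iff v > 0, decided only from the LSB of (-2*v mod p),
    as in Lemma 4.1 (assumes v in [-n, n] and a prime p > 2n)."""
    return ((-2 * v) % p) & 1 == 1

# Exhaustive check of the lemma for a small example.
p, n = 101, 50                       # any odd prime p > 2n works
assert all(is_positive(v, p) == (v > 0) for v in range(-n, n + 1))

# The inequality of Line 11, sum(x_ell) <= k - 1, becomes v > 0 with
# v = k - sum(x_ell); in the protocol only shares of v are ever held.
x_counts, k = [7, 12, 4], 30         # hypothetical per-party counts
v = k - sum(x_counts)
assert is_positive(v, p) == (sum(x_counts) <= k - 1)
```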
However, in the malicious case, we cannot use the preceding solution since it relies on the assumption that all secret-shared values are smaller than or equal to n. Indeed, malicious parties may distribute in Line 8 secret shares of values \(x_\ell\) and \(y_\ell\) where \(x_\ell\), \(y_\ell\), or \(x_\ell +y_\ell\) are greater than n, and such malicious conduct would sabotage the inequality verification method suggested previously. Hence, in the malicious case, we suggest using another method for verifying inequalities, also suggested in the work of Nishide and Ohta [94]. If u and v are two elements in \({\mathbb {Z}}_p\) that are secret shared among \(P_1,\ldots ,P_L\) using Shamir's t-out-of-L threshold scheme, then that method enables the parties to verify the inequality \(u\lt v\) (when u and v are viewed as non-negative integers) without learning any further information on those two numbers. That method does not rely on the assumption that \(u,v \in [0,n]\). However, it involves three invocations of the LSB sub-protocol (instead of just one, as described earlier), and hence it is more costly. We omit further details of that computation; interested readers are referred to the work of Nishide and Ohta [94].

4.4 Privacy Analysis

Our mechanism, which is summarized in Protocol 1, begins with each party \(P_\ell\), \(\ell \in [L]\), holding its share of the dataset, \(D_\ell\), and ends with \(P_\ell\) holding the repaired version of \(D_\ell\). The information that needs to be protected is the original set of records \(D_\ell\).
The only places during Protocol 1 in which the parties exchange messages relating to their private datasets are Lines 1 and 5. In Line 1, the parties perform a simple t-out-of-L secret sharing of their private counters \(n_\ell\) (see Section 4.3.2). As the value of the threshold t is set in Equation (9) to be greater than or equal to \(L/2\), such secret sharing offers perfect security under the assumption of an honest majority.
As for Protocol 2 that is invoked in Line 5 of Protocol 1, both variants of that protocol, for the semi-honest and for the malicious case, are perfectly secure, as argued in Sections 4.3.3 and 4.3.4. Namely, the parties are unable to extract from their views during the execution of that protocol anything beyond the computed output and what can be inferred from that output and their own input on the private data of other parties. Hence, the only leakage of information in Protocol 1 is the \(B+1\) bin boundaries for each of the two groups, \(S \in \lbrace U,V\rbrace\).
The bin boundaries allow the parties to infer some excessive information on the private data of their peers. To illustrate the type of such inferences, let us consider one party \(P_\ell\) and bin number \(i \in [B]\), in which all values are within the interval \(I:=[m_{(i)},m_{(i+1)}]\). The size of the bin is also publicly known, and it equals \(\lfloor n/B \rfloor\) or \(\lceil n/B \rceil\). Hence, since \(P_\ell\) knows how many entries in its own dataset are within the interval I, it may infer the number of elements in \(D \setminus D_\ell\) in I. However, \(P_\ell\) cannot make any specific inferences about any of the other parties, as it cannot tell how those elements are spread between the other private datasets, nor can it link those records to specific individuals. Hence, such an information leakage is arguably benign.
We observe that the leaked information increases as B increases. When \(B=1\), only the minimal and maximal values in \(D(X)\) are exposed to the collaborating parties, whereas if \(B=n\), all of the values in \(D(X)\) are exposed (but still, their allocation to parties or individuals is kept secret). In Section 5, where we evaluate our method, we show that using a small number of bins (e.g., \(B=3\)) enhances fairness considerably, whereas the marginal contribution of higher values of B to fairness enhancement is insignificant. Hence, it appears that using a small number of bins, such as \(B=3\), is the preferred choice: it achieves a considerable enhancement of fairness, it entails only the benign information leakage discussed earlier, and it reduces communication and computational costs, as it reduces the number of invocations of Protocol 2.

4.5 Computational and Communication Costs

The parameters that determine the computational and communication costs of Protocol 1 are as follows:
\(L:\) The number of parties.
\(N_X:\) The number of non-sensitive attributes.
\(M:\) A uniform bound on the range of possible values in each of the non-sensitive attributes. (It is determined by the minimal and maximal values, \(\alpha\) and \(\beta\), in each of the non-sensitive attributes.)
\(p:\) The size of the underlying finite field in which all secret sharing takes place. Note that p must be greater than L as well as greater than \(n = |D|\). However, for the sake of performing the secure comparison sub-protocol, we must select p to be greater than \(2n\) (see Lemma 4.1).
\(B:\) The selected number of bins.
The main operations in Protocol 1 are as follows:
(1)
Two secure summation protocols to compute \(n^U\) and \(n^V\). Each such protocol involves adding L secret numbers in \({\mathbb {Z}}_p\) (Line 1).
(2)
\((B+1)N_X\) invocations of Protocol 2 (Line 5).
The secure summation protocol, for adding L secret numbers in \({\mathbb {Z}}_p\), involves the following costs per party: L evaluations of a polynomial of degree \(t-1\) over \({\mathbb {Z}}_p\) and communicating messages of \(O(L \log p)\) bits. Finally, it is necessary to perform a single Lagrange interpolation to recover the required sum. As those computations are done just once (in Line 1), their effect on the overall cost is negligible.
We therefore turn our attention to analyzing Protocol 2. The number of iterations in the repeat loop is bounded by \(\log M\), as it is a binary search over the range \([\alpha ,\beta ]\), the size of which is bounded by M. In each iteration, there are \(2L\) operations of secret sharing (Line 8) followed by two to four inequality verifications in the semi-honest case (Lines 11, 13, and 16) or five to seven inequality verifications in the malicious case (Lines 10, 11, 13, and 16).
Hence, the main bottleneck is in the inequality verifications. As explained in Section 4.3.5, such verifications can be done by performing a single LSB computation over shared values in the semi-honest case, or three such LSB computations in the malicious case. A single LSB computation of a shared value in \({\mathbb {Z}}_p\) involves performing \(93 \log p +1\) multiplications of two secret shared field elements [94]. Finally, a single multiplication of two secret shared values requires performing two secret sharings, each of which has a communication cost of \(O(L \log p)\) bits per party. The preceding analysis implies that the overall communication cost is bounded by \(O(BN_X L \log M \log ^2 p)\). (The computational cost is analyzed similarly, but we omit further details as it would require a detailed discussion of the LSB protocol and that is beyond the scope of the present work. Interested readers may refer to the work of Nishide and Ohta [94] for those details.)
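To get a feel for the bound \(O(BN_X L \log M \log ^2 p)\), the following back-of-the-envelope calculation plugs in illustrative parameter values of our own choosing (these are not measurements reported in the paper) for the semi-honest variant:

```python
import math

# Illustrative parameters (assumptions, not values from the paper).
L, N_X, B = 5, 10, 3            # parties, non-sensitive attributes, bins
M = 10 ** 6                     # bound on the attribute range
log_p = 61                      # bits of the prime p (must satisfy p > 2n)

# Per inequality check (semi-honest case): one LSB computation, i.e. roughly
# 93*log p + 1 secure multiplications, each requiring two secret sharings of
# O(L log p) bits per party.
bits_per_mult = 2 * L * log_p
mults_per_check = 93 * log_p + 1
iters = math.log2(M)            # binary-search iterations per invocation
checks = (B + 1) * N_X * iters * 4   # at most 4 checks per iteration
total_bits = checks * mults_per_check * bits_per_mult
print(f"~{total_bits / 8 / 2**20:.0f} MiB of communication per party")
```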

4.6 A Concluding Remark

The fairness-enhancing mechanism of Feldman et al. [16] is in fact a special case of our mechanism, where the number of bins is set to \(B=\min \lbrace n^U,n^V\rbrace\). Namely, it equals the size of the smaller of the two groups: the unprivileged group, \(W^U\), and the privileged one, \(W^V\). Using binning this way totally exposes all records in the smaller group and reveals too much information on the larger group. Hence, such a course of action cannot be taken when privacy is of concern. Moreover, the computational and communication costs of the mechanism depend linearly on B, as the number of invocations of the costly MPC sub-protocol (Section 4.3) depends linearly on B (see Section 4.5). Finally, as we show in our experimental evaluation (see Section 5), using \(B=3\) bins is sufficient for mitigating unfairness; using larger numbers of bins barely contributes to the decrease in unfairness.
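For intuition on the underlying repair scheme, the following non-private, centralized Python sketch illustrates a quantile-based repair in the spirit of Feldman et al. [16], parameterized by the number of bins B and the repair tuning parameter \(\lambda\). It is a simplified illustration under our own assumptions and does not reproduce the exact binning rules or the MPC protocol of Section 4.

import numpy as np

def repair_attribute(x_U, x_V, B=3, lam=1.0):
    # Each value is moved a fraction `lam` of the way toward the corresponding quantile of a
    # 'middle' distribution of the two groups (simplified sketch; for illustration only).
    qs = np.linspace(0, 1, B + 1)                  # B quantile-based bins per group
    q_U = np.quantile(x_U, qs)
    q_V = np.quantile(x_V, qs)
    q_mid = (q_U + q_V) / 2                        # target (median-like) bin boundaries

    def repair(values, q_own):
        ranks = np.interp(values, q_own, qs)       # position of each value within its own group
        target = np.interp(ranks, qs, q_mid)       # corresponding value in the target distribution
        return (1 - lam) * values + lam * target   # partial repair controlled by lambda

    return repair(np.asarray(x_U, float), q_U), repair(np.asarray(x_V, float), q_V)

# Example: with lam=1 the repaired distributions of the two groups approximately coincide.
rng = np.random.default_rng(0)
u, v = rng.normal(0, 1, 500), rng.normal(2, 1, 500)
repaired_u, repaired_v = repair_attribute(u, v, B=3, lam=1.0)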

5 Evaluation

5.1 Experimental Setting

In this section, we describe the experimental setting: the datasets, the measures we used, and the parameter space. We also provide several implementation details.

Datasets

We evaluated the proposed method using several real-world datasets: the publicly available ProPublica Recidivism dataset, the ProPublica Violent Recidivism dataset, and the Bank Marketing dataset.

ProPublica Recidivism Dataset.

This dataset includes data from the COMPAS risk assessment system (see [13, 95]). It contains 6,167 individual records with 10 attributes, such as the number of previous felonies, charge degree, age, race, and gender. The target attribute indicates whether an individual recidivated (namely, was arrested again) within 2 years. The sensitive attribute in this dataset is race: Caucasian individuals constitute the privileged group, whereas individuals of all other races constitute the unprivileged group. We use the pre-processed files provided by Friedler et al. [96].

ProPublica Violent Recidivism Dataset.

This dataset (see [13]) is similar to the ProPublica Recidivism dataset mentioned previously. It has the same set of 10 attributes, and the sensitive attribute is race. However, this dataset contains 4,010 individual records, and the target attribute indicates the recidivism of a violent crime within 2 years. Here as well, we use the pre-processed files provided by Friedler et al. [96].

The Bank Marketing Dataset.

This dataset contains information on subscriptions to term deposits in a Portuguese banking institution (see [97]). It includes 41,188 individual records with 20 attributes; in our evaluation, we used the 10 attributes that are numeric. The target attribute indicates whether an individual has subscribed to a term deposit. The sensitive attribute in this case is age: individuals younger than 25 or older than 60 years are considered the unprivileged group (as in the work of Zafar et al. [35]). This dataset is available at the UCI repository [98].

Measures

As mentioned, there is an inherent tradeoff between prediction performance and fairness. We wish to evaluate how this tradeoff is reflected in the application of our method. To that end, consider the confusion matrix shown in Table 1.
Table 1. Confusion Matrix

Accuracy.

Similarly to many studies on algorithmic fairness, we use accuracy as a measure of prediction performance. Accuracy is measured by the proportion of correct classifications (i.e., \((TN+TP)/(N+P)\)).

Fairness.

For measuring fairness, we consider a measure based on equalized odds. As mentioned in Section 2.1, one advantage of this measure (in contrast, e.g., to demographic parity) is that a perfectly accurate classifier will be considered fair. More specifically, we define the unfairness measure \(\bar{\varphi }:=|DFNR|+|DFPR|\), where \(|DFNR|\) (respectively, \(|DFPR|\)) is the absolute difference between the FNR (respectively, FPR) values of the two groups, and FNR and FPR are defined as follows:
\begin{equation*} FNR=\frac{FN}{P}=\frac{FN}{FN+TP}\,, \quad FPR=\frac{FP}{N}=\frac{FP}{FP+TN}\,. \end{equation*}
While the equalized odds criterion dictates two separate measures, we use the sum of their absolute values to obtain a single combined measure. Such a combined measure allows for an easier examination of the fairness-accuracy tradeoff in the analysis that follows, where higher values of \(\bar{\varphi }\) indicate lower levels of fairness (i.e., higher levels of unfairness).
Note that in the two ProPublica datasets, the value of “recidivated” is considered as the “positive” value, whereas in the case of the Bank Marketing dataset, a “positive” value indicates that the corresponding individual has not subscribed to a term deposit service.
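For reference, the following Python sketch computes \(\bar{\varphi }\) from a model's predictions; the array layout and the group labels 'U' (unprivileged) and 'V' (privileged) are illustrative assumptions.

import numpy as np

def unfairness(y_true, y_pred, group):
    # phi_bar = |DFNR| + |DFPR| between the unprivileged ('U') and privileged ('V') groups.
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))

    def rates(mask):
        yt, yp = y_true[mask], y_pred[mask]
        fn = np.sum((yt == 1) & (yp == 0))
        tp = np.sum((yt == 1) & (yp == 1))
        fp = np.sum((yt == 0) & (yp == 1))
        tn = np.sum((yt == 0) & (yp == 0))
        fnr = fn / (fn + tp) if (fn + tp) else 0.0   # FNR = FN / P
        fpr = fp / (fp + tn) if (fp + tn) else 0.0   # FPR = FP / N
        return fnr, fpr

    fnr_u, fpr_u = rates(group == 'U')
    fnr_v, fpr_v = rates(group == 'V')
    return abs(fnr_u - fnr_v) + abs(fpr_u - fpr_v)

# Example usage: phi_bar = unfairness(y_test, clf.predict(X_test), group_test)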

Distance.

To better understand the repairing mechanism of our method, we also measure the distances between the distributions of attributes within the two groups. We do so by computing the EMD (earth mover’s distance) [99], divided by M, the size of the range of possible values of the attribute (see Section 4.5).
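Such a normalized EMD can be computed, for example, with SciPy's one-dimensional Wasserstein distance (which coincides with the EMD for one-dimensional distributions); the sketch below is illustrative and assumes that the attribute values of the two groups are given as plain arrays.

from scipy.stats import wasserstein_distance

def normalized_emd(values_U, values_V, alpha, beta):
    # EMD between the attribute's distributions in the two groups, normalized by
    # M = beta - alpha + 1, the size of the attribute's value range (see Section 4.5).
    return wasserstein_distance(values_U, values_V) / (beta - alpha + 1)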

Parameter Space

For our experiments, we examined varying values of parameters, as follows:
Number of bins: \(B \in \lbrace 1,2,3,4,6,8,10\rbrace\).
Repair tuning parameter: \(\lambda \in \lbrace 0.1,0.2,\ldots ,0.9,1\rbrace\).
Number of parties: In the majority of our experiments, we applied a procedure based on three parties (\(L=3\)); only for the sake of measuring runtimes, we used higher numbers of parties (\(L \in \lbrace 3,4,5,6,7\rbrace\)). Records were distributed among the L parties randomly.3
The ML classifier in our experiments was Logistic Regression (for experiments with additional classifiers, we refer the reader to Appendix B). We split each dataset into a train set (66.7%) and a test set (33.3%); this random split was repeated 10 times, and the reported results are averages over the 10 repetitions.
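The following scikit-learn-based Python sketch mirrors this evaluation loop (repeated random splits, Logistic Regression, and averaged accuracy and \(\bar{\varphi }\)). It is illustrative only: it omits the repair step and the distributed setting, and it reuses the hypothetical unfairness helper sketched in the Measures subsection.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def evaluate(X, y, group, n_repeats=10, test_size=1/3, seed=0):
    # Average accuracy and unfairness (phi_bar) over repeated random train/test splits.
    accs, unfs = [], []
    for r in range(n_repeats):
        X_tr, X_te, y_tr, y_te, _, g_te = train_test_split(
            X, y, group, test_size=test_size, random_state=seed + r)
        clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        y_hat = clf.predict(X_te)
        accs.append(np.mean(y_hat == y_te))
        unfs.append(unfairness(y_te, y_hat, g_te))   # see the sketch in the Measures subsection
    return float(np.mean(accs)), float(np.mean(unfs))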

Implementation Details

Our method was implemented in Python, using the Virtual Ideal Functionality Framework (VIFF) library for secure MPC (see [100]). As noted earlier, our method was applied separately on each of the non-sensitive attributes of the considered dataset, prior to the training of an ML model. For our purpose herein—to examine the effects of our pre-process method—we used a simple non-distributed, non-private implementation of the ML algorithm.
All experiments were executed on a server running Windows Server 2008 R2, having two six-core CPU processors (12 virtual CPUs) with a clock speed of 1.9 GHz, and 128 GB of RAM.

5.2 Results

We first evaluated the effect of our proposed method on unfairness and accuracy. To do so, we executed the method using varying values of \(\lambda\) and B, as mentioned in the previous section, and measured the resulting unfairness and accuracy values. To better understand the repairing mechanism of our method, we also measured the resulting distance between the distributions of attributes within the two groups. Recall that accuracy was measured by the proportion of correct classifications, unfairness by \(\bar{\varphi }\), and distance by EMD.
Figure 5 shows the effect of the repair tuning parameter \(\lambda\) on the three considered measures for each of the three considered datasets. Each row of charts represents a different dataset: ProPublica Recidivism (top), ProPublica Violent Recidivism (center), and Bank Marketing (bottom), whereas each column of charts represents a different measure: distance (left), unfairness (center), and accuracy (right). In each chart, the X axis represents \(\lambda\), whereas the Y axis represents the value of the corresponding measure. Colors and line thickness represent the value of B, where thicker lines represent higher values of B. Note that the distances are calculated for each attribute separately and are then averaged over the set of attributes in each dataset. For all three measures, the reported results represent an average over 10 train-test splits.
Fig. 5. The effect of the repair tuning parameter \(\lambda\) on distance, accuracy, and unfairness. Increasing the value of \(\lambda\) yields a considerable decrease of distance and, consequently, also a decrease of unfairness, with only a minor compromise in accuracy.
As can be seen from the figure, by using the proposed method with higher values of \(\lambda\), it is possible to improve fairness considerably with only a minor compromise in accuracy. The considerable reduction in unfairness is a result of reducing the distances between the distributions of attributes within the two groups, as shown in the left column of charts in the figure. For example, in the case of the ProPublica Recidivism dataset (top row charts), unfairness (top-middle chart) is reduced from 0.29 (\(\lambda =0.0\)) to 0.08 (\(\lambda =1.0\))—a reduction by 72%; this is achieved with almost no compromise in accuracy (top-right chart) that is reduced by only 1%. Similar behavior can be seen with the other datasets.
Figure 6 presents a similar analysis to the one presented in Figure 5 to assess the effect of the number of bins B on the three considered measures. Here, the X axis of each chart represents the value of B, whereas the Y axis represents the value of the corresponding measure. Colors and line thickness represent the value of \(\lambda\), where thicker lines represent higher values of \(\lambda\).
Fig. 6. The effect of the number of bins B on distance, accuracy, and unfairness. Increasing the value of B leads to improvement in fairness with only a minor compromise in accuracy. However, increasing B has a diminishing marginal effect, namely a large number of bins barely contributes to the decrease in unfairness.
As can be seen from the figure, when increasing B, unfairness is reduced with only a minor compromise in accuracy. However, increasing B beyond \(B=3\) has almost no effect on all three measures. In particular, it barely contributes to the decrease in unfairness. For example, in the case of the ProPublica Recidivism dataset (top row of charts), using 10 bins with \(\lambda \in \lbrace 0.9,1\rbrace\) obtains about the same results as with only 3 bins.
The latter analysis indicates that it is preferable to use a small number of bins, around \(B=3\). Further increasing the number of bins does not contribute toward enhancing fairness, but it does entail higher computational and communication costs, as well as increased leakage of information, as discussed in Sections 4.4 and 4.5.
We then turned to evaluate the efficiency of our method, as reflected by its runtimes and communication costs. Figure 7 shows the effect of B on the runtime of our method, when considering semi-honest parties and setting their number L to 3. Specifically, we report the runtime for the secure distributed computation of bin boundaries, for both of the groups. Again, each row of charts represents a dataset. In each chart, the X axis represents B, whereas the Y axis represents runtime in minutes. Line colors represent the attributes that were repaired in each dataset. In the legend of each chart, we indicate next to each such attribute its range of values. The figure shows that, in accord with the analysis in Section 4.5, the runtime depends linearly on B.
Fig. 7. The effect of the number of bins B on runtime.
In the next experiment (Figure 8), we tested the effect of L, the number of parties, on the runtime of our method, in both cases of semi-honest (left column of charts) and malicious parties (right column of charts). In each chart, the X axis represents L, whereas the Y axis represents runtime in minutes. Line colors represent the number of bins. For clarity, we present the runtime for repairing one non-sensitive attribute in each of the datasets. For the ProPublica Recidivism and ProPublica Violent Recidivism datasets, we selected the attribute “prior count,” whereas for the Bank Marketing dataset, we selected the attribute “duration.” The figure shows that, as indicated by the analysis in Section 4.5, the runtime depends linearly on L.
Fig. 8. The effect of the number of parties L on runtime in the case of semi-honest parties (left) and malicious parties (right).
Finally, we measured the communication costs and report them in Figure 9, which is structured similarly to Figure 8. Here, too, the measured communication costs show linear dependence on L, as analyzed in Section 4.5.
Fig. 9. The effect of the number of parties L on communication costs in the case of semi-honest parties (left) and malicious parties (right).
In summary, the runtimes and communication costs of the proposed method, as demonstrated in Figures 7 through 9, are practical, especially considering that the method is applied only once, as a pre-process procedure.
To conclude, the preceding experiments showed that our method improves fairness considerably with only a minor compromise in accuracy, despite the inherent tradeoff between the two. They further showed that privacy is largely preserved and that the information leakage is minimal, given the diminishing marginal effect of the number of bins. Finally, they showed that the runtime of the proposed method is feasible for a one-time pre-process procedure.

6 Conclusion

In this article, we proposed a privacy-preserving pre-process mechanism for enhancing the fairness of collaborative ML algorithms. In particular, our method improves fairness by decreasing the distances between the distributions of attributes of the privileged and unprivileged groups. We use a binning approach that enables a privacy-preserving implementation by means of MPC. Being a pre-process mechanism, our method is not limited to a specific algorithm, and therefore it can be used with any collaborative ML algorithm.
An extensive evaluation conducted using three real-world datasets revealed that the proposed method is able to improve fairness considerably, with only a minor compromise in accuracy. We also showed that using a small number of bins (e.g., \(B=3\)), it is possible to achieve that considerable improvement of fairness, with very minor and benign leakage of information. Finally, we demonstrated that the runtime of the proposed method is practical, especially considering that it is executed once as a pre-process procedure.
Possible future research directions are as follows:
Data distribution: We assumed a horizontal data distribution setting, where each party holds full records of a subset of the population. Forthcoming work may investigate a vertical distribution scenario, where each party holds a subset of the information about all individuals, or a mixed scenario that combines both horizontal and vertical distributions.
Sensitive attribute: We assumed a single sensitive attribute that attains two possible values. While most studies in the algorithmic fairness literature make the same assumptions, future research should consider the case of multiple sensitive attributes that may attain more than just two values.
Fairness mechanism: We proposed a private version of a pre-process fairness enhancing mechanism, inspired by Feldman et al. [16]. While it is clear qualitatively how our fairness mechanism’s parameters, B and \(\lambda\), affect fairness, we do not offer theoretical bounds on their effect. Establishing such theoretical bounds could be a promising direction for future research. Moreover, future efforts should be invested in developing private versions of additional fairness enhancing mechanisms, including in-process and post-process mechanisms (e.g., [24, 30, 40]).

Footnotes

1. \(⊍\) denotes a disjoint union.
2. We use subscripts in parentheses for any indexing purpose that does not relate to the distribution between the L parties.
3. Recall that our privacy-enhancing mechanism ensures that the results of executing the fairness-enhancing mechanism are identical to the case of a single non-distributed dataset; therefore, the manner in which the dataset is distributed does not matter.

A Summary of Notations

D: The horizontally distributed dataset.
L: The number of parties.
\(P_\ell\): A party, \(\ell \in [L]:=\lbrace 1,\ldots ,L\rbrace\).
W: A given population of individuals.
A: A set of attributes that relate to each of the individuals.
S: A sensitive attribute (e.g., race or gender). \(S \in \lbrace U,V\rbrace\), where U means unprivileged and V means privileged.
X: A non-sensitive attribute (our method handles each non-sensitive attribute separately).
Y: The binary class attribute that needs to be predicted (e.g., “hire/no hire”).
\(W^S\): The subset of individuals in W that are associated with group S, \(S \in \lbrace U,V\rbrace\).
\(W_\ell\): The subset of individuals in W whose information is held by the party \(P_\ell\), \(\ell \in [L]\).
\(W_\ell ^S\): \(W^S \bigcap W_\ell\).
n: \(|W|\).
\(n^S\): \(|W^S|\), for each \(S \in \lbrace U,V\rbrace\).
\(n_\ell\): \(|W_\ell |\), for each \(\ell \in [L]\).
\(n_\ell ^S\): \(|W_\ell ^S|\), for each \(S \in \lbrace U,V\rbrace\), \(\ell \in [L]\).
\(D(X)\): The multiset of values appearing in the X-column of D.
\(D^S(X)\): The multiset of values appearing in the X-column of D, restricted to the rows in \(W^S\), for each \(S \in \lbrace U,V\rbrace\).
\(D_\ell (X)\): The multiset of values appearing in the X-column of D, restricted to the rows in \(W_\ell\), for each \(\ell \in [L]\).
\(D_\ell ^S(X)\): The multiset of values appearing in the X-column of D, restricted to the rows in \(W_\ell ^S\), for each \(S \in \lbrace U,V\rbrace\), \(\ell \in [L]\).
B: The number of bins.
\(b^S_{(i)}\): The ith bin in a sorted quantile-based binning scheme, dividing \(D^S(X)\) into nearly equal-sized bins.
x: An original value of an individual.
\(\bar{x}\): The repaired value of x.
\(\lambda\): The repair tuning parameter (\(\lambda \in [0,1]\)).
\(K^S\): The list of locations of boundaries for all bins in \(D^S(X)\), for each \(S\in \lbrace U,V\rbrace\).
\(q^S\) and \(r^S\): The quotient and remainder, respectively, when dividing \(n^S\) by B, for each \(S\in \lbrace U,V\rbrace\).
\(m^S_{(i)}\): \(\min \lbrace b^S_{(i)} \rbrace\), for each \(S \in \lbrace U,V\rbrace\), \(i \in [B]\).
\(m^S_{(B+1)}\): \(\max \lbrace b^S_{(B)} \rbrace\), for each \(S \in \lbrace U,V\rbrace\).
d: The precision of digits after the decimal point.
\([\alpha ,\beta ]\): The range of possible values in \(D(X)\) (after they were multiplied by \(10^d\), for some d on which the parties agreed upfront, and rounded to the nearest integer).
M: The number of possible values in \(D(X)\) (i.e., \(\beta - \alpha +1\)).
\(⊍\): A disjoint union.

B Experiments with Additional ML Classifiers

In Section 5.2, we presented experiments based on a Logistic Regression classifier. Here, we present results with additional classifiers: Random Forest and Neural Network (Figures 10 and 11).
Fig. 10. The effect of the repair tuning parameter \(\lambda\) on distance, accuracy, and unfairness (based on the ProPublica dataset). Increasing the value of \(\lambda\) yields a considerable decrease of distance and, consequently, also a decrease of unfairness, with only a minor compromise in accuracy.
Fig. 11. The effect of the number of bins B on distance, accuracy, and unfairness (based on the ProPublica dataset). Increasing the value of B leads to improvement in fairness with only a minor compromise in accuracy. However, increasing B has a diminishing marginal effect, namely a large number of bins barely contributes to the decrease in unfairness.

References

[1]
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems. 1097–1105.
[2]
U.S. Department of Health and Human Services. 1996. Summary of the HIPAA Security Rule. Retrieved January 9, 2024 from https://www.hhs.gov/hipaa/for-professionals/security/laws-regulations/index.html
[3]
Yehuda Lindell and Benny Pinkas. 2000. Privacy preserving data mining. In Proceedings of the Annual International Cryptology Conference. 36–54.
[4]
Rakesh Agrawal and Ramakrishnan Srikant. 2000. Privacy-preserving data mining. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data. 439–450.
[5]
Bargav Jayaraman, Lingxiao Wang, David Evans, and Quanquan Gu. 2018. Distributed learning without distress: Privacy-preserving empirical risk minimization. In Advances in Neural Information Processing Systems. 6343–6354.
[6]
Melissa Chase, Ran Gilad-Bachrach, Kim Laine, Kristin E. Lauter, and Peter Rindal. 2017. Private collaborative neural network learning. IACR Cryptology ePrint Archive 2017 (2017), 762.
[7]
Yaochen Hu, Di Niu, Jianming Yang, and Shengping Zhou. 2019. FDML: A collaborative machine learning framework for distributed features. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2232–2240.
[8]
Qiang Yang, Yang Liu, Tianjian Chen, and Yongxin Tong. 2019. Federated machine learning: Concept and applications. ACM Transactions on Intelligent Systems and Technology 10 (2019), 1–19.
[9]
Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. 2006. Calibrating noise to sensitivity in private data analysis. In Proceedings of the Theory of Cryptography Conference. 265–284.
[10]
Andrew C. Yao. 1982. Protocols for secure computations. In Proceedings of the 23rd Annual Symposium on Foundations of Computer Science (SFCS’82). IEEE, 160–164.
[11]
Josep Domingo-Ferrer and Alberto Blanco-Justicia. 2020. Privacy-preserving technologies. In The Ethics of Cybersecurity. Springer, Cham, 279–297.
[12]
Wenliang Du, Yunghsiang S. Han, and Shigang Chen. 2004. Privacy-preserving multivariate statistical analysis: Linear regression and classification. In Proceedings of the 2004 SIAM International Conference on Data Mining. 222–233.
[13]
Julia Angwin. 2016. Machine bias. ProPublica (May 2016). Retrieved January 9, 2024 from https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
[14]
Dana Pessach and Erez Shmueli. 2020. Algorithmic fairness. arXiv preprint arXiv:2001.09784 (2020).
[15]
Dana Pessach and Erez Shmueli. 2022. A review on fairness in machine learning. ACM Computing Surveys 55, 3 (Feb. 2022), Article 51, 44 pages.
[16]
Michael Feldman, Sorelle A. Friedler, John Moeller, Carlos Scheidegger, and Suresh Venkatasubramanian. 2015. Certifying and removing disparate impact. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 259–268.
[17]
Michael J. Zimmer. 1995. Emerging uniform structure of disparate treatment discrimination litigation. Georgia Law Review 30 (1995), 563.
[18]
Solon Barocas and Andrew D. Selbst. 2016. Big data’s disparate impact. California Law Review 104 (2016), 671.
[19]
George Rutherglen. 1987. Disparate impact under Title VII: An objective theory of discrimination. Virginia Law Review 73 (1987), 1297.
[20]
Jon Kleinberg, Sendhil Mullainathan, and Manish Raghavan. 2017. Inherent trade-offs in the fair determination of risk scores. In Proceedings of the 8th Innovations in Theoretical Computer Science Conference (ITCS’17).
[21]
Sahil Verma and Julia Rubin. 2018. Fairness definitions explained. In Proceedings of the 2018 IEEE/ACM International Workshop on Software Fairness (FairWare’18). IEEE, 1–7.
[22]
Toon Calders and Sicco Verwer. 2010. Three naive Bayes approaches for discrimination-free classification. Data Mining and Knowledge Discovery 21 (2010), 277–292.
[23]
Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. 2012. Fairness through awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference. ACM, 214–226.
[24]
Moritz Hardt, Eric Price, and Nati Srebro. 2016. Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems. 3315–3323.
[25]
Faisal Kamiran and Toon Calders. 2012. Data preprocessing techniques for classification without discrimination. Knowledge and Information Systems 33 (2012), 1–33.
[26]
Binh Thanh Luong, Salvatore Ruggieri, and Franco Turini. 2011. k-NN as an implementation of situation testing for discrimination discovery and prevention. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 502–510.
[27]
Christos Louizos, Kevin Swersky, Yujia Li, Max Welling, and Richard Zemel. 2017. The variational fair autoencoder. arXiv preprint arXiv:1511.00830 (2017).
[28]
Flavio Calmon, Dennis Wei, Bhanukiran Vinzamuri, Karthikeyan Natesan Ramamurthy, and Kush R. Varshney. 2017. Optimized pre-processing for discrimination prevention. In Advances in Neural Information Processing Systems. 3992–4001.
[29]
Rich Zemel, Yu Wu, Kevin Swersky, Toni Pitassi, and Cynthia Dwork. 2013. Learning fair representations. In Proceedings of the International Conference on Machine Learning. 325–333.
[30]
Toshihiro Kamishima, Shotaro Akaho, Hideki Asoh, and Jun Sakuma. 2012. Fairness-aware classifier with prejudice remover regularizer. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases. 35–50.
[31]
Blake Woodworth, Suriya Gunasekar, Mesrob I. Ohannessian, and Nathan Srebro. 2017. Learning non-discriminatory predictors. In Proceedings of the Conference on Learning Theory. 1920–1953.
[32]
Yahav Bechavod and Katrina Ligett. 2017. Learning fair classifiers: A regularization-inspired approach. arXiv preprint arXiv:1707.00044 (2017), 1733–1782.
[33]
Yahav Bechavod and Katrina Ligett. 2017. Penalizing unfairness in binary classification. arXiv preprint arXiv:1707.00044 (2017).
[34]
Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez Rodriguez, and Krishna P. Gummadi. 2017. Fairness beyond disparate treatment & disparate impact: Learning classification without disparate mistreatment. In Proceedings of the 26th International Conference on World Wide Web. 1171–1180.
[35]
Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez Rodriguez, and Krishna P. Gummadi. 2017. Fairness constraints: Mechanisms for fair classification. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics. 962–970.
[36]
Faisal Kamiran, Toon Calders, and Mykola Pechenizkiy. 2010. Discrimination aware decision tree learning. In Proceedings of the 2010 IEEE International Conference on Data Mining. IEEE, 869–874.
[37]
Novi Quadrianto and Viktoriia Sharmanska. 2017. Recycling privileged learning and distribution matching for fairness. In Advances in Neural Information Processing Systems. 677–688.
[38]
Gabriel Goh, Andrew Cotter, Maya Gupta, and Michael P. Friedlander. 2016. Satisfying real-world goals with dataset constraints. In Advances in Neural Information Processing Systems. 2415–2423.
[39]
Alekh Agarwal, Alina Beygelzimer, Miroslav Dudik, John Langford, and Hanna Wallach. 2018. A reductions approach to fair classification. In Proceedings of the International Conference on Machine Learning. 60–69.
[40]
Sam Corbett-Davies, Emma Pierson, Avi Feller, Sharad Goel, and Aziz Huq. 2017. Algorithmic decision making and the cost of fairness. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 797–806.
[41]
Cynthia Dwork, Nicole Immorlica, Adam Tauman Kalai, and Max Leiserson. 2018. Decoupled classifiers for group-fair and efficient machine learning. In Proceedings of the Conference on Fairness, Accountability, and Transparency. 119–133.
[42]
Aditya Krishna Menon and Robert C. Williamson. 2018. The cost of fairness in binary classification. In Proceedings of the Conference on Fairness, Accountability, and Transparency. 107–118.
[43]
Yehuda Lindell and Benny Pinkas. 2008. Secure multiparty computation for privacy-preserving data mining. IACR Cryptology ePrint Archive 2008 (2008), 197.
[44]
Gilad Asharov, Francesco Bonchi, David García-Soriano, and Tamir Tassa. 2017. Secure centrality computation over multiple networks. In Proceedings of the 26th International Conference on World Wide Web. 957–966.
[45]
Tal Grinshpoun and Tamir Tassa. 2016. P-SyncBB: A privacy preserving branch and bound DCOP algorithm. Journal of Artificial Intelligence Research 57 (2016), 621–660.
[46]
Murat Kantarcioglu and Chris Clifton. 2004. Privacy-preserving distributed mining of association rules on horizontally partitioned data. IEEE Transactions on Knowledge and Data Engineering 16 (2004), 1026–1037.
[47]
Payman Mohassel and Yupeng Zhang. 2017. SecureML: A system for scalable privacy-preserving machine learning. In Proceedings of the IEEE Symposium on Security and Privacy. 19–38.
[48]
Le Trieu Phong, Yoshinori Aono, Takuya Hayashi, Lihua Wang, and Shiho Moriai. 2018. Privacy-preserving deep learning via additively homomorphic encryption. IEEE Transactions on Information Forensics and Security 13 (2018), 1333–1345.
[49]
Erez Shmueli and Tamir Tassa. 2020. Mediated secure multi-party protocols for collaborative filtering. ACM Transactions on Intelligent Systems and Technology 11 (2020), 1–25.
[50]
Sheng Zhong, Zhiqiang Yang, and Rebecca N. Wright. 2005. Privacy-enhancing k-anonymization of customer data. In Proceedings of the 24th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. 139–147.
[51]
Keke Chen and Ling Liu. 2005. Privacy preserving data classification with rotation perturbation. In Proceedings of the 5th IEEE International Conference on Data Mining (ICDM’05). IEEE.
[52]
Olvi L. Mangasarian, Edward W. Wild, and Glenn M. Fung. 2008. Privacy-preserving classification of vertically partitioned data via random kernels. ACM Transactions on Knowledge Discovery from Data 2 (2008), 1–16.
[53]
Jingjing Qiang, Bing Yang, Qian Li, and Ling Jing. 2011. Privacy-preserving SVM of horizontally partitioned data for linear classification. In Proceedings of the 2011 4th International Congress on Image and Signal Processing, Vol. 5. IEEE, 2771–2775.
[54]
Kamalika Chaudhuri and Claire Monteleoni. 2009. Privacy-preserving logistic regression. In Advances in Neural Information Processing systems. 289–296.
[55]
Manas Pathak, Shantanu Rane, and Bhiksha Raj. 2010. Multiparty differential privacy via aggregation of locally trained classifiers. In Advances in Neural Information Processing Systems. 1876–1884.
[56]
Anirban Basu, Jaideep Vaidya, and Hiroaki Kikuchi. 2012. Perturbation based privacy preserving slope one predictors for collaborative filtering. In Proceedings of the IFIP International Conference on Trust Management. 17–35.
[57]
Arjan Jeckmans, Qiang Tang, and Pieter Hartel. 2012. Privacy-preserving collaborative filtering based on horizontally partitioned dataset. In Proceedings of the 2012 International Conference on Collaboration Technologies and Systems (CTS’12). IEEE, 439–446.
[58]
Shuang Song, Kamalika Chaudhuri, and Anand D. Sarwate. 2013. Stochastic gradient descent with differentially private updates. In Proceedings of the 2013 IEEE Global Conference on Signal and Information Processing. IEEE, 245–248.
[59]
Arun Rajkumar and Shivani Agarwal. 2012. A differentially private stochastic gradient descent algorithm for multiparty classification. In Proceedings of the 15th International Conference on Artificial Intelligence and Statistics. 933–941.
[60]
Martin Abadi, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. 2016. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. 308–318.
[61]
H. Brendan McMahan, Daniel Ramage, Kunal Talwar, and Li Zhang. 2018. Learning differentially private recurrent language models. In Proceedings of the International Conference on Learning Representations.
[62]
Robin C. Geyer, Tassilo Klein, and Moin Nabi. 2017. Differentially private federated learning: A client level perspective. arXiv preprint arXiv:1712.07557 (2017).
[63]
Zekeriya Erkin, Thijs Veugen, Tomas Toft, and Reginald L. Lagendijk. 2013. Privacy-preserving distributed clustering. EURASIP Journal on Information Security 2013 (2013), 4.
[64]
Somesh Jha, Luis Kruger, and Patrick McDaniel. 2005. Privacy preserving clustering. In Proceedings of the European Symposium on Research in Computer Security. 397–417.
[65]
Xiaodong Lin, Chris Clifton, and Michael Zhu. 2005. Privacy-preserving clustering with distributed EM mixture modeling. Knowledge and Information Systems 8 (2005), 68–81.
[66]
Tamir Tassa. 2013. Secure mining of association rules in horizontally distributed databases. IEEE Transactions on Knowledge and Data Engineering 26 (2013), 970–983.
[67]
Jaideep Vaidya and Chris Clifton. 2002. Privacy preserving association rule mining in vertically partitioned data. In Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 639–644.
[68]
Justin Zhan, Stan Matwin, and LiWu Chang. 2005. Privacy-preserving collaborative association rule mining. In Proceedings of the IFIP Annual Conference on Data and Applications Security and Privacy. 153–165.
[69]
Aleksandra B. Slavkovic, Yuval Nardi, and Matthew M. Tibbits. 2007. “Secure” logistic regression of horizontally and vertically partitioned distributed databases. In Proceedings of the 7th IEEE International Conference on Data Mining Workshops (ICDMW’07). IEEE, 723–728.
[70]
Saeed Samet. 2015. Privacy-preserving logistic regression. Journal of Advances in Information Technology 6, 3 (2015), 88–95.
[71]
Stephen E. Fienberg, William J. Fulp, Aleksandra B. Slavkovic, and Tracey A. Wrobel. 2006. “Secure” log-linear and logistic regression analysis of distributed databases. In Proceedings of the International Conference on Privacy in Statistical Databases. 277–290.
[72]
Huseyin Polat and Wenliang Du. 2005. Privacy-preserving collaborative filtering on vertically partitioned data. In Proceedings of the European Conference on Principles of Data Mining and Knowledge Discovery. 651–658.
[73]
Ibrahim Yakut and Huseyin Polat. 2012. Arbitrarily distributed data-based recommendations with privacy. Data & Knowledge Engineering 72 (2012), 239–256.
[74]
Matthew Jagielski, Michael Kearns, Jieming Mao, Alina Oprea, Aaron Roth, Saeed Sharifi-Malvajerdi, and Jonathan Ullman. 2019. Differentially private fair learning. In Proceedings of the International Conference on Machine Learning. 3000–3008.
[75]
Depeng Xu, Shuhan Yuan, and Xintao Wu. 2019. Achieving differential privacy and fairness in logistic regression. In Companion Proceedings of the 2019 World Wide Web Conference. 594–599.
[76]
Hussein Mozannar, Mesrob I. Ohannessian, and Nathan Srebro. 2020. Fair learning with private demographic data. arXiv preprint arXiv:2002.11651 (2020).
[77]
Eugene Bagdasaryan, Omid Poursaeed, and Vitaly Shmatikov. 2019. Differential privacy has disparate impact on model accuracy. In Advances in Neural Information Processing Systems. 15479–15488.
[78]
Chong Huang, Xiao Chen, Peter Kairouz, Lalitha Sankar, and Ram Rajagopal. 2018. Generative adversarial models for learning private and fair representations. Under Review.
[79]
Rachel Cummings, Varun Gupta, Dhamma Kimpara, and Jamie Morgenstern. 2019. On the compatibility of privacy and fairness. In Adjunct Publication of the 27th Conference on User Modeling, Adaptation, and Personalization. 309–315.
[80]
Hui Hu, Yijun Liu, Zhen Wang, and Chao Lan. 2019. A distributed fair machine learning framework with private demographic data protection. In Proceedings of the 2019 IEEE International Conference on Data Mining (ICDM’19). IEEE, 1102–1107.
[81]
James Fantin. 2020. A Distributed Fair Random Forest (Doctoral dissertation). University of Wyoming.
[82]
Niki Kilbertus, Adrià Gascón, Matt J. Kusner, Michael Veale, Krishna P. Gummadi, and Adrian Weller. 2018. Blind justice: Fairness with encrypted sensitive attributes. In Proceedings of the 35th International Conference on Machine Learning (ICML’18), Vol. 80. 2635–2644.
[83]
Shengyuan Hu, Zhiwei Steven Wu, and Virginia Smith. 2022. Provably fair federated learning via bounded group loss. arXiv preprint arXiv:2203.10190 (2022).
[84]
Afroditi Papadaki, Natalia Martinez, Martin Bertran, Guillermo Sapiro, and Miguel Rodrigues. 2022. Minimax demographic group fairness in federated learning. arXiv preprint arXiv:2201.08304 (2022).
[85]
Wei Du, Depeng Xu, Xintao Wu, and Hanghang Tong. 2021. Fairness-aware agnostic federated learning. In Proceedings of the 2021 SIAM International Conference on Data Mining (SDM’21). 181–189.
[86]
Yuchen Zeng, Hongxu Chen, and Kangwook Lee. 2021. Improving fairness via federated learning. arXiv preprint arXiv:2110.15545 (2021).
[87]
Lingyang Chu, Lanjun Wang, Yanjie Dong, Jian Pei, Zirui Zhou, and Yong Zhang. 2021. FedFair: Training fair models in cross-silo federated learning. arXiv preprint arXiv:2109.05662 (2021).
[88]
Mehryar Mohri, Gary Sivek, and Ananda Theertha Suresh. 2019. Agnostic federated learning. In Proceedings of the International Conference on Machine Learning. 4615–4625.
[89]
Changxin Liu, Zirui Zhou, Yang Shi, Jian Pei, Lingyang Chu, and Yong Zhang. 2021. Achieving model fairness in vertical federated learning. arXiv preprint arXiv:2109.08344 (2021).
[90]
Yuxin Shi, Han Yu, and Cyril Leung. 2021. A survey of fairness-aware federated learning. CoRR abs/2111.01872 (2021). https://arxiv.org/abs/2111.01872
[91]
Hiroaki Kikuchi, Hideo Yasunaga, Hiroki Matsui, and Chun-I. Fan. 2016. Efficient privacy-preserving logistic regression with iteratively re-weighted least squares. In Proceedings of the 2016 11th Asia Joint Conference on Information Security (AsiaJCIS’16). IEEE, 48–54.
[92]
Gagan Aggarwal, Nina Mishra, and Benny Pinkas. 2010. Secure computation of the median (and other elements of specified ranks). Journal of Cryptology 23 (2010), 373–401.
[93]
Adi Shamir. 1979. How to share a secret. Communications of the ACM 22 (1979), 612–613.
[94]
Takashi Nishide and Kazuo Ohta. 2007. Multiparty computation for interval, equality, and comparison without bit-decomposition protocol. In Public Key Cryptography—PKC 2007. Lecture Notes in Computer Science, Vol. 4450. Springer, 343–360.
[95]
Jeff Larson, Surya Mattu, Lauren Kirchner, and Julia Angwin. 2016. How we analyzed the COMPAS recidivism algorithm. ProPublica (May 2016). Retrieved January 9, 2024 from https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm
[96]
Sorelle A. Friedler, Carlos Scheidegger, Suresh Venkatasubramanian, Sonam Choudhary, Evan P. Hamilton, and Derek Roth. 2019. A comparative study of fairness-enhancing interventions in machine learning. In Proceedings of the Conference on Fairness, Accountability, and Transparency. ACM, 329–338.
[97]
Sérgio Moro, Paulo Cortez, and Paulo Rita. 2014. A data-driven approach to predict the success of bank telemarketing. Decision Support Systems 62 (2014), 22–31.
[98]
Dheeru Dua and Casey Graff. 2017. UCI Machine Learning Repository. Retrieved January 8, 2024 from http://archive.ics.uci.edu/ml
[99]
Yossi Rubner, Carlo Tomasi, and Leonidas J. Guibas. 1998. A metric for distributions with applications to image databases. In Proceedings of the 6th International Conference on Computer Vision. IEEE, 59–66.
[100]
Ivan Damgård, Martin Geisler, Mikkel Krøigaard, and Jesper Buus Nielsen. 2009. Asynchronous multiparty computation: Theory and implementation. In Proceedings of the International Workshop on Public Key Cryptography. 160–179.
