Bayes Security: A Not So Average Metric

Security system designers favor worst-case security metrics, such as those derived from differential privacy (DP), because of the strong guarantees they provide. On the downside, these guarantees come at a high penalty on the system's performance. In this paper, we study Bayes security, a security metric inspired by the cryptographic advantage. Similarly to DP, Bayes security i) is independent of an adversary's prior knowledge, ii) captures the worst-case scenario for the two most vulnerable secrets (e.g., data records), and iii) is easy to compose, facilitating security analyses. Additionally, Bayes security iv) can be consistently estimated in a black-box manner, contrary to DP, which is useful when a formal analysis is not feasible, and v) provides a better utility-security trade-off in high-security regimes, because it quantifies the risk for a specific threat model, as opposed to threat-agnostic metrics such as DP. We formulate a theory around Bayes security, and we provide a thorough comparison with well-known metrics, identifying the scenarios where Bayes security is advantageous for designers.


I. INTRODUCTION
Quantifying the level of protection given by security and privacy-preserving mechanisms is a fundamental process in secure system engineering. To perform a quantitative analysis, one needs to define appropriate metrics that capture the adversary's gain and, ultimately, the risks for the system's users.
A common way of evaluating threats in security and privacy applications is to quantify the probability that an adversary guesses some secret information; this metric is referred to as the success rate or accuracy of an attacker. For example, membership inference attacks against machine learning (ML) models [31], where the attacker aims at guessing whether a data record was used for training the model, have long been evaluated w.r.t. the attacker's accuracy.
Average-case metrics. The success rate (or accuracy) has a very clear interpretation: it measures the probability that an adversary succeeds in the attack. An important special case, the Bayes vulnerability (e.g., [32]), is the accuracy of the (Bayes) optimal adversary, who has maximal information about the underlying uncertainty. Both Bayes vulnerability and accuracy rely on the prior probability of the secret information that the attacker is trying to guess; unfortunately, this can result in misleading conclusions about an attack's strength [32]. In the membership inference example, if the prior probability that a data record is a member is low (say, 0.1), a strawman attack that always guesses "non-member" will achieve 90% accuracy; yet, this is a rather weak attack. This shows that the accuracy metric does not characterize well the risk of this attack.
An alternative criterion, used to evaluate cryptographic primitives, is the advantage (e.g., [5]). The advantage fixes the prior probability over the secrets to be uniform, by construction, and it relates this prior probability to the probability that the adversary succeeds after having access to the model (accuracy). Intuitively, this metric disregards the contribution of the prior, and it quantifies the information leakage of the algorithm itself; however, to the best of our knowledge, no known result shows that this metric is prior-independent.
Both the cryptographic advantage and Bayes vulnerability are threat-specific, i.e., they are connected to the threat model under which security is quantified. This gives a precise interpretation of what attacks they protect against. However, they are rarely used to study complex real-world systems, such as ML training algorithms. The main reason is that, due to complexity, one often needs to evaluate the security of individual parts of the algorithm and then compose them; this is not known to be possible with these metrics.
Worst-case metrics. At the other end of the spectrum, differential privacy (DP) has become the gold standard in privacy analysis [14]. In DP, a parameter ε bounds the probability that an algorithm's output leaks any information. There are several reasons why DP is generally preferred over other metrics: 1) DP is easy to compose analytically; e.g., if two algorithms are respectively ε1-DP and ε2-DP, their cascade composition is (ε1 + ε2)-DP. 2) DP is prior-independent: it measures the risk of releasing a secret via the algorithm, independently of the secret's prior probability. 3) DP protects against virtually any threat model: its guarantees hold whether the adversary wishes to learn an entire data record or just one bit of information; we refer to this property as being threat-agnostic. 4) DP considers the worst-case scenario over the outputs, ensuring robustness against any threat: it bounds the best gain an adversary can have, even if their maximum gain is achieved with negligible probability.
DP, however, also comes with disadvantages. First, DP is often too strict a requirement: in many security settings, such as traffic analysis, side channel protection, and privacy-preserving ML (PPML), DP mechanisms that provide high protection levels incur severe utility loss. This mostly comes from the fact that DP is threat-agnostic. Second, it is theoretically hard to estimate DP's ε parameter in a black-box manner, which limits its applicability when a formal analysis of the mechanism is not feasible.

II. PRELIMINARIES

We consider a mechanism that takes a secret input s ∈ S and returns an observable output o ∈ O. The prior distribution over the secrets is denoted by π ∈ D(S), with π_s def= P(s). The channel is a matrix defining the posterior probability of observing an output o ∈ O given an input s ∈ S: C_{s,o} def= P(o | s). We denote by C_s ∈ D(O) the s-th row of C (which is a distribution over O), and by C_S the set of all rows of C. Table I summarizes our notation.

Adversarial Goal. We consider a passive adversary A who, given an output o, aims at inferring which secret s was input to the mechanism. We model this adversary with the following indistinguishability game, which we call IND-BAY. We consider an optimal adversary that has perfect knowledge of the channel C and of the prior distribution π over the secret inputs (line 1). A challenger samples a secret s according to the prior π (line 2), and inputs it to the channel C to obtain an observable output o (line 3). For simplicity, the game considers an individual observation, but we note that sequences of observations (e.g., representing multiple uses of the same channel to hide one secret, or the simultaneous use of two channels with the same secret) can be accounted for by redefining o to be a vector. Upon observing the output o, the adversary produces a prediction s′ (line 4). The adversary wins if they guess the secret correctly: s = s′ (line 5). We evaluate the adversary A with respect to their expected prediction error according to the 0-1 loss function: R_A def= P(s ≠ A(o)) = P(s ≠ s′). Extensions to further loss functions are possible, but out of the scope of this paper.
This formulation differs from typical cryptographic games for the following reasons. First, we assume an optimal adversary: instead of providing them with knowledge of the cryptographic algorithm except for the key, and letting them query the primitive to learn its statistical behavior, we assume that the adversary has perfect knowledge of the probabilistic behavior of the channel. Second, we compute the advantage with respect to the adversary's error, while cryptographic games compute the adversary's probability of success. Third, this game captures an eavesdropping adversary that cannot influence the secret used by the challenger to produce the observable output. This is considered a weak adversary in cryptography, where the adversary is typically allowed to provide inputs to the algorithm under attack. However, it corresponds to many security and privacy problems where the adversary cannot influence the secret and only observes channel outputs: website fingerprinting [21], [37], privacy-preserving distribution estimation [18], [29], [30], side channel attacks [25], [26], [33], or pseudorandom number generation.

Adversarial Models. In this paper, we consider the Bayes adversary, an idealized adversary who knows both the prior π and the channel matrix C, and guesses according to the Bayes rule: s′ = arg max_{s∈S} π_s C_{s,o}. The expected error of the Bayes adversary (Bayes risk) is: R*(π, C) = 1 − Σ_o max_{s∈S} π_s C_{s,o}. When this adversary is confronted with a perfect channel, whose outputs leak nothing about the inputs, their best strategy is to guess according to the prior: s′ = arg max_{s∈S} π_s. The expected error of this strategy is the random guessing error: G(π) = 1 − max_{s∈S} π_s.

Multiplicative Bayes risk leakage. We study the properties of a metric, β, defined as [9]: β(π, C) = R*(π, C)/G(π), where it is assumed that G(π) > 0; we refer to β as the multiplicative Bayes risk leakage. Inspired by the cryptographic advantage (Section V-A), β captures how much better than random guessing an adversary can do. It takes values in [0, 1]; β = 1 when the system is perfectly secure (i.e., it exhibits no leakage), and β = 0 when the adversary always guesses the secret correctly. In the next section, we define the Bayes security metric β*(C) as the minimizer (i.e., least secure configuration) of β(π, C) over all priors π. We then study its properties, which we argue make it suitable for analyzing the security of complex real-world algorithms.

We note that β is closely related to the multiplicative Bayes vulnerability leakage [6], which is defined as: L×(π, C) = (1 − R*(π, C))/(1 − G(π)). Differently from β, L× is defined in terms of the adversary's probability of success (i.e., Bayes vulnerability) rather than failure, and it takes values in [0, n]. Despite their similar definitions, these two metrics behave very differently (Section V). An important result for L× is that it takes its worst value (i.e., least secure configuration) on a uniform prior over the secrets [6]. Therefore, even when the real priors are unknown, a security analyst can easily compute a bound on the security of the system. In the next section, we derive the counterpart result for β: it reaches its least secure configuration when setting a uniform prior on the two most vulnerable secrets, and a prior probability of 0 elsewhere; the proof is substantially more involved than the one for L×. We further discuss the relation between the Bayes security metric and multiplicative leakage in Section V-E.
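To make the quantities above concrete, the following minimal Python sketch computes R*(π, C), G(π), and β(π, C) exactly, and simulates the IND-BAY game with a Bayes-optimal adversary; the channel and prior are made-up toy values, not from the paper, and the empirical game error converges to the analytical Bayes risk.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy 3x4 channel: rows are P(o | s) for each secret s (invented numbers).
C = np.array([[0.7, 0.1, 0.1, 0.1],
              [0.1, 0.7, 0.1, 0.1],
              [0.3, 0.3, 0.2, 0.2]])
pi = np.array([0.5, 0.3, 0.2])   # prior over the secrets

def bayes_risk(pi, C):
    # R*(pi, C) = 1 - sum_o max_s pi_s C_{s,o}
    return 1.0 - (pi[:, None] * C).max(axis=0).sum()

def guessing_error(pi):
    # G(pi) = 1 - max_s pi_s
    return 1.0 - pi.max()

def beta(pi, C):
    return bayes_risk(pi, C) / guessing_error(pi)

def ind_bay_error(pi, C, trials=100_000):
    # Monte Carlo version of the IND-BAY game with the Bayes-optimal adversary.
    wins = 0
    for _ in range(trials):
        s = rng.choice(len(pi), p=pi)        # challenger samples s ~ pi (line 2)
        o = rng.choice(C.shape[1], p=C[s])   # o ~ C_s (line 3)
        s_guess = np.argmax(pi * C[:, o])    # Bayes rule (line 4)
        wins += (s_guess == s)               # win condition (line 5)
    return 1.0 - wins / trials               # empirical R*

print(beta(pi, C), bayes_risk(pi, C), ind_bay_error(pi, C))
```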

III. THE BAYES SECURITY METRIC
Leakage notions based on the Bayes risk generally depend on the prior distribution over the secrets. This makes them unsuitable for measuring security in real-world applications where the true priors are unknown, e.g., traffic analysis [17], [38] or membership inference attacks [31], and results in an overestimation of a mechanism's security whenever the real prior implies more leakage than the prior considered in the analysis.

TABLE I: Notation.

C_s : The s-th row of a channel matrix C; it corresponds to the probability distribution P(· | s) over the outputs.
π : A vector of prior probabilities over the secret space; the i-th entry of the vector is π_i.
π_ij : A prior vector with exactly 2 non-zero entries, in positions i and j, with i ≠ j.
υ = (1/n, ..., 1/n) : Uniform prior for a secret space of size |S| = n.
R*(π, C) (abbr. R*) : The Bayes risk of a channel.
G(π) (abbr. G) : The random guessing error (the error when only knowledge of the prior is available).
β*(C) (abbr. β*) : Min Bayes security of a channel.
Given the similarity between the multiplicative risk leakage and the multiplicative vulnerability leakage, one could expect that the uniform prior also represents the worst case for the former [6]. Unfortunately, this is not the case: Theorem 7 (Appendix A) shows that, for secret spaces |S| > 2, there exists a prior π for which β is smaller than the one achieved for a uniform prior.

Prior-independence for β. In this section, we show that the multiplicative Bayes risk leakage β(π, C) of a channel C is minimized when the prior π assigns equal weight to the two secrets that are maximally distant (according to the posterior distributions), and 0 to all other secrets. We refer to this minimizer, representing the highest risk for the channel w.r.t. the adversary's prior knowledge, as the Bayes security metric; we denote it by β*(C) (omitting the argument if no confusion arises). This result makes the Bayes security metric prior-independent: for any prior knowledge the attacker may have in practice, β* bounds their success.
For simplicity, we present our result in the one-try attack scenario, as formalized by the IND-BAY game: the adversary observes just one output of the system before guessing the secret input. In Section IV we extend this result to cases where the adversary can collect more observations.

Theorem 1. Consider a channel C on a secret space with |S| ≥ 2. There exists a prior vector π* ∈ D(S) of the form π* = π_ij (i.e., uniform over two secrets i ≠ j, and 0 elsewhere) such that β(π*, C) = min_{π∈D(S)} β(π, C).

In the following, we provide an intuition of the concepts involved in this proof.
We denote by U^(k) ⊂ D(S), for k = 1, ..., |S|, the set of distributions whose support has cardinality k and that are uniform over their non-zero components. For example, if n = 3, then: U^(1) = {(1, 0, 0), ..., (0, 0, 1)} and U^(2) = {(1/2, 1/2, 0), (1/2, 0, 1/2), (0, 1/2, 1/2)}.

For a fixed channel C, the proof of Theorem 1 is based on demonstrating the following two steps: 1) the function β(π, C) = R*(π, C)/G(π) has its minimum in the set U = U^(1) ∪ ... ∪ U^(|S|); the elements of U are known in the literature as the corner points of G(π); 2) the minimizing prior π* of β(π, C) has support of cardinality 2; that is, π* ∈ U^(2). The proof of the first step comes from the observation that the function β is the ratio between a concave function, R*, and a function G that is convexly generated by U. Lemma 2 (Appendix B) shows that the minima of this ratio exist, and that they must come from the set of corner points of G (i.e., the set U). This determines the form of the minimizing priors.
For the second step, under the constraints given by Lemma 2, the Bayes risk R*(π, C) decreases more quickly than G(π) as we increase the number of 0s in π ∈ U. By excluding the solution π* ∈ U^(1), which would force the denominator G to be 0, it follows that the minimizer π* of β(π, C) has exactly 2 non-zero elements; that is, π* ∈ U^(2).

Discussion. Theorem 1 has several consequences. First, the fact that β*(C) does not depend on a prior means that it captures the actual leakage of the channel, excluding any prior knowledge that the adversary may have. Conveniently, after Bayes security is computed for the channel, one can recover the success rate of the Bayes-optimal adversary for desired levels of attacker's knowledge (Section V-E). Second, the fact that Bayes security represents the risk for the two leakiest secrets means that:
• If the two leakiest secrets can be determined a priori, the security analysis becomes straightforward (Section VI);
• If the two leakiest secrets cannot be determined a priori, one only needs O(n²) computations (instead of O(n!)) to recover them.
Finally, Theorem 1 suggests that Bayes security can be interpreted as a middle way between worst-case and average-case security metrics: it represents the expected (i.e., average) risk for the two most vulnerable (i.e., worst-case) secrets. We argue that this, paired with the fact that Bayes security is threat-specific, favors the interpretability of this metric. In the next sections, we prove properties of Bayes security that make it suitable for studying complex mechanisms.

Bayes security and total variation. We now introduce an important result for β* which helps analyzing mechanisms in practice: Bayes security is the complement of the total variation of the two maximally distant rows of the channel.

Theorem 2. For any channel C, it holds that β*(C) = 1 − max_{a,b∈S} tv(C_a, C_b).

This result gives a clear interpretation of what β* represents: it measures the maximal distance between the pairwise posterior distributions of the outputs w.r.t. the secret inputs (Figure 1). Further, thanks to this result, it is easy to analyze mechanisms both analytically (Section VI) and via estimation techniques (Section VII), by exploiting the plethora of results surrounding the total variation distance between distributions.

IV. THE BAYES SECURITY METRIC UNDER COMPOSITION

Some of the properties that made DP so popular for studying complex algorithms are its compositionality rules: given DP-compliant mechanisms, it is very easy to determine the privacy of a mechanism that combines them (e.g., by chaining them). Further, compositionality enables studying complex threat scenarios. For example, while so far we have only considered an adversary who observes the channel's output once, it is common that a real adversary observes more than one channel at a time; e.g., observing obfuscated locations at different layers [35] or combining side channels [33]. Moreover, they can observe the output of two sequential channels, e.g., users' privacy-preserving interactions with a database through anonymous communication channels [34]; or they can observe more than one output from one channel, e.g., by gathering several side channel measurements from hardware running cryptographic routines [25], or observing more than one visit to a website through an anonymous communication channel [24], [37].
In this section, we uncover compositionality rules for Bayes security, which enable it to tackle the above examples.

A. Parallel composition
We first consider an adversary who has access to the outputs of two channels that take the same secret as input [33], [35], or who observes multiple outputs of one channel for the same secret [24], [25], [37].
Given two channels C1 : S → O1 and C2 : S → O2 with the same input space, their parallel composition is the channel C1||C2 : S → O1 × O2 defined by (C1||C2)_{s,(o1,o2)} = (C1)_{s,o1} · (C2)_{s,o2}.

Theorem 3. For all channels C1, C2, it holds that β*(C1||C2) ≥ β*(C1) · β*(C2).

In layman's terms, the composition of two channels that are respectively β*1-secure and β*2-secure leads to a β*1β*2-secure channel. This bound is tight.
Note that the security of this new channel is not necessarily minimized by the secrets that minimize the composing channels C1, C2, not even when a channel is composed with itself:

Proposition 1. Let C be a channel for which β is minimized for the secrets (s1, s2). Then the composition channel C′ := C||C is not necessarily minimized by the secrets (s1, s2).
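As a quick numeric illustration (the channels below are toy values invented for this sketch), the following Python snippet computes β* via Theorem 2 and checks the parallel-composition bound of Theorem 3:

```python
import numpy as np
from itertools import combinations

def tv(p, q):
    # Total variation = half the L1 distance between two distributions.
    return 0.5 * np.abs(p - q).sum()

def beta_star(C):
    # beta*(C) = 1 - max over row pairs of tv (Theorem 2).
    return 1.0 - max(tv(C[a], C[b]) for a, b in combinations(range(len(C)), 2))

def parallel(C1, C2):
    # (C1 || C2)_{s,(o1,o2)} = C1_{s,o1} * C2_{s,o2}
    return np.array([np.outer(C1[s], C2[s]).ravel() for s in range(C1.shape[0])])

C1 = np.array([[0.8, 0.2], [0.4, 0.6], [0.5, 0.5]])
C2 = np.array([[0.6, 0.4], [0.5, 0.5], [0.2, 0.8]])
b1, b2, b12 = beta_star(C1), beta_star(C2), beta_star(parallel(C1, C2))
print(b1, b2, b12, b12 >= b1 * b2)   # Theorem 3: b12 >= b1 * b2 (here 0.6 >= 0.36)
```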

B. Chaining mechanisms
Another typical configuration, used to strengthen the security of a system, is to put in place a cascade of security mechanisms (defense in depth). More formally, consider two channels C1 : S1 → S2 and C2 : S2 → O. Their cascade composition is the channel C1C2 in which the secret is input to C1 and this channel's output is post-processed by C2.
It is well understood that post-processing cannot decrease the security of a mechanism. Therefore, C1C2 should be at least as secure as C1. Indeed, based on the concavity of R*, it is easy to show that R*(π, C1C2) ≥ R*(π, C1) for all priors π.

Understanding the effect of C1 on C2 is less straightforward. The composition C1C2 can be seen as a pre-processing of C2, which is not necessarily a safe operation. Note that C2 receives as input the output of C1, which is not necessarily the same as S1. Hence, the prior π on the input secret in S1 is meaningless for C2. Remarkably, as β* does not depend on the prior, it allows comparing C1C2 and C2 despite the different input spaces. From Theorem 2 we know that β*(C1C2) is given by the maximum ℓ1 distance between the rows of C1C2. The key observation is that the rows of C1C2 are convex combinations of the rows of C2; but convex combinations cannot increase distances, which brings us to the following result.

Theorem 4. For all channels C1, C2, it holds that β*(C1C2) ≥ max{β*(C1), β*(C2)}.

This means that neither pre-processing nor post-processing decreases the Bayes security provided by a mechanism.
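Since a channel is a row-stochastic matrix, the cascade C1C2 is just a matrix product; a minimal sketch (toy channels, invented values) checking Theorem 4 numerically:

```python
import numpy as np
from itertools import combinations

def beta_star(C):
    # beta*(C) = 1 - max pairwise total variation between rows (Theorem 2).
    return 1 - max(0.5 * np.abs(C[a] - C[b]).sum()
                   for a, b in combinations(range(len(C)), 2))

C1 = np.array([[0.9, 0.1], [0.2, 0.8]])            # channel S1 -> S2
C2 = np.array([[0.7, 0.2, 0.1], [0.3, 0.3, 0.4]])  # channel S2 -> O
cascade = C1 @ C2                                   # (C1 C2)_{s,o} = sum_t C1_{s,t} C2_{t,o}
print(beta_star(cascade) >= max(beta_star(C1), beta_star(C2)))  # True
```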

V. RELATION WITH OTHER NOTIONS
In the previous sections, we presented Bayes security, discussed its properties and showed how to compute it in an efficient manner.In this section, we compare it with three well-known security notions: cryptographic advantage, a mainstream threat-specific metric in the security community; DP, the paradigmatic worst-case metric; and multiplicative Bayes vulnerability leakage, which is closely related to β but comes with different properties.

A. Cryptographic advantage
In cryptography, the advantage Adv of an adversary A is defined assuming that there are two secrets (|S| = 2), given as input to a channel C with a uniform prior. Formally (e.g., [39]): Adv(A) = 2 · P(s′ = s) − 1, where s is the challenger's secret and s′ = A(o) is the adversary's guess. The factor 2 serves to scale Adv within the interval [0, 1].
Denoting by Adv(C) the advantage of the optimal (Bayes) adversary and considering a uniform prior π = υ, we derive:

β(υ, C) = R*(υ, C)/G(υ) = 2 R*(υ, C) = 2 (1 − P(s′ = s)) = 1 − Adv(C).   (1)

Hence, the Bayes security metric can be seen as a generalization of 1 − Adv in which the secret space S may contain more than two secrets and the prior is not necessarily uniform.

Bayes security as IND-CPA security. In Section II, we introduced the IND-BAY game to formalize the adversarial setting captured by β(π, C). When considering this game in light of the minimizer β*(C) and our main result in Section III (β is minimized on the two leakiest secrets), the IND-BAY game becomes a version of the traditional IND-CPA cryptographic game that we call IND-MINBAY (Figure 2, left). First, recall that the adversary has perfect knowledge of the prior π and of the channel C (line 1). Then, as opposed to the IND-BAY game, where the adversary cannot influence the input, we allow A to select the secrets and provide them to the challenger (line 2). This is analogous to classical IND-CPA, and it allows capturing the worst-case inputs. Then the challenger selects one of the two secrets according to the prior π (line 3), and returns to the adversary an obfuscated version according to the channel probability matrix (line 4). The adversary guesses one of the two secrets (line 5), and wins the game if the guess is the secret selected by the challenger. The advantage of this adversary is equivalent to that of a CPA adversary guessing which message was encrypted by the challenger.
This equivalence of games reinforces that the Bayes security metric sits in the middle between average-case metrics (measuring the expected risk) and worst-case metrics (measuring the worst-case risk across the secrets). In the next part of this section we explore this relation further.
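Before moving on, a small sanity check of Equation 1 on a toy 2-secret channel (values invented for this sketch):

```python
import numpy as np

C = np.array([[0.85, 0.15], [0.35, 0.65]])    # toy 2x2 channel
v = np.array([0.5, 0.5])                      # uniform prior

success = (v[:, None] * C).max(axis=0).sum()  # Bayes adversary's success probability
adv = 2 * success - 1                         # cryptographic advantage
bayes_risk = 1 - success
beta = bayes_risk / (1 - v.max())             # beta(v, C) = R*/G, with G = 1/2
print(np.isclose(adv, 1 - beta))              # True: Equation 1
```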

B. Local Differential Privacy
We investigate the relation between the privacy guarantees induced by Bayes security (and, more generally, by β) and those induced by DP metrics.
For a parameter ε ≥ 0, we say that a mechanism (channel) C is ε-LDP (local DP) [13] if, for every pair of secrets i, h and every output j:

C_{i,j} ≤ exp(ε) · C_{h,j}.   (2)

LDP is a worst-case metric, whereas β has the characteristics of an average-case metric. Therefore, we expect that LDP implies a lower bound on β, but not vice versa. The rest of this section is dedicated to analyzing this implication.

A game for LDP. We first illustrate the difference between the threat model of Bayes security and the one considered by local differential privacy using security games. Figure 2, right, represents the game for local differential privacy (IND-LDP).² A first remarkable difference with respect to typical security games is that, in addition to selecting the secrets as in the IND-MINBAY game, the adversary also chooses the observation (line 3). This captures a worst case in which the adversary not only picks the most vulnerable inputs, but also the output that makes them easiest to distinguish. Upon receiving the secrets and the observation, the challenger selects one of the secrets according to the probability that it caused the observation (line 4). A second difference with respect to typical games, and to IND-MINBAY, is that the challenger does not show the chosen value to the adversary; otherwise, the game would be a trivial win. The adversary guesses a secret (line 5), and wins if this is the secret the challenger chose (line 6). Note that, because the adversary has much greater freedom in their choices, their chances of winning are considerably greater than in IND-MINBAY or traditional games. Therefore, the LDP game captures a stronger attacker than most cryptographic games, but it is much harder to map it to a realistic threat scenario.

LDP induces a lower bound on Bayes security. In general, if there are no restrictions on the channel matrix, the lowest possible value of β is 0; this is achieved when the adversary can identify the value of the secret from every observable with probability 1. Assuming that |S| ≥ 2 and that π is not concentrated on one single secret,³ β is zero if and only if the channel contains at most one non-zero value in each column.
If the matrix is ε-LDP, however, then the ratio between two values in the same column is at most exp(ε). Intuitively, under this restriction, β cannot be 0 anymore: the best case for the adversary is when the ratio is as large as possible, i.e., when it is exactly exp(ε). In particular, in a 2 × 2 channel, we expect that the minimum β is achieved by a matrix that has the values exp(ε)/(1 + exp(ε)) on the diagonal, and 1/(1 + exp(ε)) in the other positions (or vice versa). The next theorem confirms this intuition, and extends it to the general n × m case.

² We use this game for a qualitative comparison between the metrics. However, we observe that the parameter ε of LDP can be recovered from this game by ensuring a uniform prior when sampling from P(s | o) = P(o | s)/(2P(o)) (i.e., P(s1) = P(s2) = 1/2), and by evaluating the game with the following success metric: ε = ln(V*/(1 − V*)), where V* = max_s P(s | o) is the probability that a Bayes-optimal adversary guesses the secret correctly.

³ If the probability mass of π is concentrated on one secret, then G(π) = 0 and β(π, C) is undefined; note that in this case also R*(π, C) = 0.
Theorem 5. Let C be a channel with |S| = n secrets and m outputs.
1) If C is ε-LDP, then for every π we have β(π, C) ≥ 2/(1 + exp(ε)).
2) For every n, m ≥ 2 there exists an n × m ε-LDP channel C* such that β*(C*) = 2/(1 + exp(ε)).
Examples of such C*'s are illustrated in Figure 3.
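The following sketch (toy parameters) verifies numerically that the extremal 2 × 2 channel described above attains the bound of Theorem 5, while another ε-LDP channel, the randomized response mechanism analyzed later in Section VI, stays above it:

```python
import numpy as np
from itertools import combinations

def beta_star(C):
    # beta*(C) = 1 - max pairwise total variation between rows (Theorem 2).
    return 1 - max(0.5 * np.abs(C[a] - C[b]).sum()
                   for a, b in combinations(range(len(C)), 2))

eps = 1.0
lower = 2 / (1 + np.exp(eps))                 # Theorem 5's lower bound on beta

# 2x2 extremal channel: every column's entry ratio is exactly e^eps.
p = np.exp(eps) / (1 + np.exp(eps))
C_star = np.array([[p, 1 - p], [1 - p, p]])
print(np.isclose(beta_star(C_star), lower))   # True: the bound is attained

# Randomized response over n = 10 secrets is eps-LDP; its beta* is above the bound.
n = 10
RR = np.full((n, n), 1 / (np.exp(eps) + n - 1))
np.fill_diagonal(RR, np.exp(eps) / (np.exp(eps) + n - 1))
print(beta_star(RR) >= lower)                 # True (0.853 >= 0.538)
```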

Bayes security does not induce a lower bound on LDP.
Theorem 5 shows that ε-LDP induces a bound on Bayes security, and that we can express a strict bound that depends only on ε. The other direction does not hold. The main reason is that if a column contains both a 0 and a positive element, then ε-LDP cannot hold, independently of the value of β.

C. Approximate Differential Privacy
One may also consider (ε, δ)-LDP [15], a variant of LDP in which small violations of Equation 2 are tolerated. Precisely, a mechanism is (ε, δ)-LDP if for every s_i, s_j ∈ S and every O ⊆ O:

P(o ∈ O | s_i) ≤ exp(ε) · P(o ∈ O | s_j) + δ.   (3)

With (ε, δ)-LDP, a column may contain 0 and non-0 values, as long as the latter are smaller than δ. Similarly to pure DP, approximate DP is threat-agnostic; this makes it harder to match (ε, δ) values to the risk of an attack occurring.
Surprisingly, we observe a direct relation between Bayes security and the special case of (0, δ)-LDP:

Proposition 2. A channel C is (0, δ)-LDP if and only if β*(C) ≥ 1 − δ.

This comes from the fact that, for ε = 0, the LHS of Equation 3 becomes max_{O⊆O} (P(o ∈ O | s_i) − P(o ∈ O | s_j)), which corresponds to the total variation between C_{s_i} and C_{s_j}. Applying the equivalence between β* and total variation (Theorem 2) concludes the argument.
The special case of (0, δ)-LDP mechanisms is not commonly studied. Intuitively, it corresponds to a mechanism that is completely vulnerable, but only with probability δ. We hope that the direct correspondence between β*, δ, and the cryptographic advantage can give further insights into the choice of parameters for approximate DP.

D. Differential privacy [16]
Differential privacy is similar to LDP, except that it involves the notion of adjacent databases.Two databases x, x ′ are adjacent, denoted as x ∼ x ′ , if x is obtained from x ′ by removing or adding one record.
The definition of ε-differential privacy (ε-DP), in the discrete case, is as follows. A mechanism K is ε-DP if for every x, x′ such that x ∼ x′, and every output y, we have

P(K(x) = y) ≤ exp(ε) · P(K(x′) = y).

A relation between Bayes security and DP follows from an analogous result in [1] for the multiplicative Bayes leakage L×(π, K), and the correspondence between the latter and Bayes security (cf. Section V-E), which is given by

β(π, K) = (1 − (1 − G(π)) · L×(π, K)) / G(π).   (4)

The following result, proven by Alvim et al. [1], states that ε-DP induces a bound on the multiplicative vulnerability leakage, where the set of secrets is the set of all possible databases. The theorem is given for the bounded-DP case, where we assume that the number of records in the database is at most a certain number n, and that the set of values for the records includes a special value ⊥ representing the absence of a record. The adjacency relation is modified accordingly: x ∼ x′ means that x and x′ differ in the value of exactly one record. We also assume that the cardinality v of the set of values is finite. Hence the number of secrets (i.e., the possible databases) is also finite.
Theorem 6 (from [1], Theorem 15). If K is ε-DP, then, for every π, L×(π, K) is bounded from above as

L×(π, K) ≤ (v · exp(ε) / (v − 1 + exp(ε)))^n,

and this bound is tight when π is uniform.
From Theorem 6 and Equation 4 we immediately obtain a bound for Bayes security as well:

Corollary 1. If K is ε-DP, then, for every π,

β(π, K) ≥ (1 − (1 − G(π)) · (v · exp(ε)/(v − 1 + exp(ε)))^n) / G(π),

and this bound is tight when π is the uniform distribution υ, which assigns 1/v^n to every database, in which case it can be rewritten as

β(υ, K) ≥ (1 − (exp(ε)/(v − 1 + exp(ε)))^n) / (1 − 1/v^n).

Alvim et al. [1] show that the reverse of Theorem 6 does not hold, and as a consequence the reverse of Corollary 1 does not hold either. The reason is analogous to the case of LDP: a 0 in a position of a non-0 column implies that the mechanism cannot be DP, independently of the value of β.

Membership inference. In Corollary 1 the secrets are whole databases. Often, however, in DP we assume that the attacker is not interested in discovering the whole database, but only in whether a certain record belongs to the database or not. We can model this case by isolating a generic pair of adjacent databases x and x′, and then restricting the space of secrets to be just {x, x′}. On this space, the mechanism can be represented by a stochastic channel C_{x,x′} that has only the two inputs x and x′, and as outputs the (obfuscated) answers to the query. It is immediate to see that K is ε-DP iff C_{x,x′} is ε-LDP for every pair of adjacent databases x and x′. Hence, the relations we proved between Bayes security and LDP also hold for DP. In particular, the following is an immediate consequence of Theorem 5.
Corollary 2. If K is ε-DP, then for every pair of adjacent databases x and x′ and every prior π we have β(π, C_{x,x′}) ≥ 2/(1 + exp(ε)),
and this bound is strict.
A similar investigation was done by Yeom et al. [39], who studied the privacy of C_{x,x′} in terms of the advantage, defined in the context of membership inference attacks (MIA). The authors established that, if a mechanism is ε-DP, then the following upper bound holds for Adv(C_{x,x′}), for any adjacent databases x and x′:

Adv(C_{x,x′}) ≤ exp(ε) − 1.

By using the relation between the advantage and the Bayes security metric (Equation 1), we derive the following bound:

β(υ, C_{x,x′}) ≥ 2 − exp(ε),

where υ is the uniform distribution. By exploiting the equivalence between Bayes security and the advantage, we conclude that the bound by Yeom et al. [39] is loose; from Corollary 2 and Equation 1, we derive the following (strict) bound for the advantage of an ε-DP mechanism:

Adv(C_{x,x′}) ≤ 1 − 2/(1 + exp(ε)) = (exp(ε) − 1)/(exp(ε) + 1),

which is much tighter than their bound (see Figure 4). Concurrent work by Humphries et al. [22] proved a similar bound for (ε, δ)-DP in the context of membership inference attacks. Their bound is more general than ours, since it captures approximate DP; however, we prove tightness for our bound. Whether tightness can be proven for the bound by Humphries et al. is, to our knowledge, an open problem.
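To see how the two bounds compare numerically, a minimal sketch:

```python
import numpy as np

eps = np.array([0.1, 0.5, 1.0, 2.0])
yeom = np.exp(eps) - 1                         # Yeom et al.'s bound on Adv
ours = (np.exp(eps) - 1) / (np.exp(eps) + 1)   # 1 - 2/(1+e^eps), from Corollary 2
print(np.round(yeom, 3))   # [0.105 0.649 1.718 6.389] -- vacuous once above 1
print(np.round(ours, 3))   # [0.05  0.245 0.462 0.762]
```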

E. Leakage notions from Quantitative Information Flow
We discuss the multiplicative risk leakage (β) and its minimizer (β*) from the point of view of quantitative information flow (QIF), and compare them with similar metrics stemming from that field. QIF measures the information leakage of a system by comparing its vulnerability before and after observing its output. It starts with a vulnerability metric V(π), expressing how vulnerable the system is when the adversary has knowledge π about the secret. The posterior vulnerability is defined as V(π, C) = Σ_o P(o) V(δ_o), where δ_o is the posterior distribution on S produced by the observation o; intuitively, it expresses how vulnerable the system is, on average, after observing the system's output. Leakage is defined by comparing the two, either multiplicatively or additively:

L×(π, C) = V(π, C)/V(π),    L+(π, C) = V(π, C) − V(π).

One of the most widely used vulnerability metrics is the Bayes vulnerability [32], defined as V(π) = max_s π_s = 1 − G(π); it expresses the adversary's probability of guessing the secret correctly in one try. For the posterior version, it holds that V(π, C) = 1 − R*(π, C). The multiplicative risk leakage follows the same core idea: G(π) can be thought of as a prior version of R*; indeed, it holds that R*(π, C) = Σ_o P(o) G(δ_o), where the δ_o are the posteriors of the channel. Hence, β can be considered a variant of the multiplicative vulnerability leakage, using the Bayes risk instead of the Bayes vulnerability.
Since the two are closely related, one would expect to be able to directly translate results about L×(π, C) into similar results about β(π, C). This would be the case for additive leakage, since G(π) − R*(π, C) = V(π, C) − V(π); but in the multiplicative case, the "one minus" on both sides of the fraction completely changes the behavior of the function.

Capacity vs. β*. One should first note that, while β takes lower values to indicate a worse level of security, L× takes higher values. In both cases, a natural question is to find the prior π that yields the worst level of security; in the case of leakage, the maximum value is known as the channel capacity, denoted by ML×(C). It is known that ML×(C) is attained by the uniform prior [6], and that ML×(C) = Σ_o max_s C_{s,o}. Our main result (Theorem 1) shows that β(π, C) is instead minimized on a uniform prior over 2 secrets. Hence, despite the similarity between Bayes vulnerability and Bayes risk, the corresponding leakage and security metrics behave very differently. Note that this difference makes ML×(C) easier to compute for arbitrary channels: it is linear in both |S| and |O|, while β* is quadratic in |S|. We discuss fast ways of computing (or estimating) Bayes security in Section VII.
Channel composition. Despite their difference w.r.t. the prior that realizes each notion, ML× and β* behave similarly w.r.t. parallel and cascade composition. It was shown that [19]:

ML×(C1||C2) ≤ ML×(C1) · ML×(C2)   and   ML×(C1C2) ≤ min{ML×(C1), ML×(C2)}.

The same kinds of bounds are given in Section IV for β*. Note, however, that the proofs for β* are completely different and cannot be directly obtained from those for ML×.

Bounds on Bayes risk. The goal of a security analyst is to quantify how much information is leaked by a mechanism in the worst case. This is captured by both ML× and β*, which focus on the prior that produces the highest leakage instead of the true prior. The user, however, is mostly interested in the actual threat that they are facing: how likely it is that the adversary guesses their secret, given a particular prior π that captures the user's behavior. In this sense, the Bayes risk R*(π, C) has a clear operational interpretation for the user.
Fortunately, having computed either ML× or β*, we can obtain direct bounds for the prior π of interest:

R*(π, C) ≥ G(π) · β*(C)   and   R*(π, C) ≥ 1 − (1 − G(π)) · ML×(C).

The goodness of either bound depends on the application. Intuitively, how good a bound is depends on how close π is to the prior achieving ML× or β*. Concretely, since the former implies uniform priors, and the latter a vector with only 2 non-zero (uniform) entries, the tightness of these bounds depends on the sparsity of the real prior vector π.
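A minimal sketch of these two bounds on a toy channel with a sparse prior (all values invented for illustration); note that for this prior the β* bound is informative, while the ML× bound is vacuous (negative):

```python
import numpy as np
from itertools import combinations

def beta_star(C):
    return 1 - max(0.5 * np.abs(C[a] - C[b]).sum()
                   for a, b in combinations(range(len(C)), 2))

C = np.array([[0.7, 0.2, 0.1], [0.2, 0.6, 0.2], [0.1, 0.2, 0.7]])
pi = np.array([0.6, 0.4, 0.0])                 # a sparse prior
G = 1 - pi.max()                               # guessing error G(pi)
R = 1 - (pi[:, None] * C).max(axis=0).sum()    # exact Bayes risk R*(pi, C)
ML = C.max(axis=0).sum()                       # capacity ML_x(C)
print(R, G * beta_star(C), 1 - ML * (1 - G))   # 0.26 >= 0.16 and >= -0.2
```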
We study empirically how tight these bounds are. We consider two channels with |S| = 10 inputs and |O| = 1K outputs. The first channel (hereby referred to as the random channel) is obtained by sampling its conditional probability distribution P(o | s) at random; the second one (geometric channel) has a geometric distribution, as used by Cherubin et al. [10], with noise parameter ν = 0.1. To evaluate the effect of sparsity, we set a sparsity level σ ∈ {0, 1, ..., n − 2}, and we sample a prior that is σ-sparse uniformly at random. We compute the values of L× and β, and measure their absolute distances from ML× and β*, respectively. The experiment is repeated 1K times for each sparsity level.
Figure 5 shows the results. As expected, the multiplicative vulnerability leakage bound is tighter for vectors that are less sparse, and the Bayes security one for higher sparsity levels. However, we observe that the Bayes security bound is loose for high values of sparsity in the case of the geometric channel, but not for the random one. The reason is that if the real prior has maximum sparsity (i.e., only 2 non-zero entries), then it is more likely that the secrets on which β is minimized are not the same 2 secrets on which the prior is non-zero.

As a consequence of this analysis, we suggest that ML× is better suited to analyzing deterministic mechanisms with a large number of secrets distributed close to uniformly (see [32]). For deterministic programs, β* is always 0 unless the program is non-interfering; the bound obtained via β* is then trivial, while ML× can provide meaningful bounds if the number of outputs is limited. On the other hand, β* is advantageous if π is very sparse (e.g., website fingerprinting, where the user may be visiting only a small number of websites): since π is very different from the uniform prior, and more similar to the one achieving β*, the latter will provide much better bounds. We discuss application examples for Bayes security in Section VIII.

TABLE II: Security metrics comparison: Local Differential Privacy (LDP), multiplicative leakage capacity, and Bayes security (β*). Note that the cryptographic advantage is a special case of β*, and therefore not included in this table. "Consistent black-box estimation" refers to the existence of a statistically consistent estimator for the security metric (e.g., [10]).
Miracle theorem. Both Bayes vulnerability and Bayes risk can be thought of as instantiations of a general family of metrics parameterized by a gain function g (for vulnerability) or a loss function ℓ (for risk). For generic choices of g and ℓ, we can define the g-leakage L×_g(π, C) and the corresponding security notion β_ℓ(π, C) in a natural way (detailed in Appendix F).
A result by Alvim et al. [2], known as the "miracle" theorem due to its arguably surprising nature, states that

L×_g(π, C) ≤ ML×(C)

for all priors π and all non-negative gain functions g. This gives a direct bound for a very general family of leakage metrics. For β_ℓ, however, we know that a corresponding result does not hold in general, even if we restrict ourselves to the family of [0, 1] loss functions. Identifying families of loss functions that provide similar bounds is left as future work.

VI. MECHANISMS
We now exploit the various properties we proved about Bayes security to study well-known mechanisms: randomized response, and the Laplace and Gaussian mechanisms. These are often used as building blocks for more complex ones.

A. Randomized response

For a parameter ε ≥ 0 and a secret space of size n, randomized response (RR) outputs the true secret with probability exp(ε)/(exp(ε) + n − 1), and each of the other n − 1 values with probability 1/(exp(ε) + n − 1); this mechanism is ε-LDP.
We can derive β* easily for RR because it obfuscates each secret according to the same distribution. For any two secrets s_i and s_j, the rows of the channel matrix are identical except at positions i and j, where their values are swapped; therefore, the Bayes security is the same between any two secrets, and thus all are equally vulnerable. Using the results in Section VII-A, we just need to look at any two rows, e.g., the first two. Let R_ab indicate the sub-channel matrix containing only the first two rows, and let υ = (1/2, 1/2). The corresponding Bayes risk is R*(υ, R_ab) = n/(2(exp(ε) + n − 1)); hence the Bayes security is:

β* = 2 R*(υ, R_ab) = n/(exp(ε) + n − 1),   (7)

where n is the number of secrets and observables.

Discussion. Equation 7 captures the risk that an optimal adversary can distinguish between any two data records (secrets) from the RR output; by Theorem 1, these are also the easiest two records to distinguish, so Equation 7 implies the Bayes security for any other subset of the secret space.
We use this equation to relate Bayes security and ε w.r.t. the number of data records (secrets) in Figure 6 (a). We observe that the number of data records is essential for security. For a rather loose DP parameter of ε = 10, having a dataset of 1M records gives β* ≈ 0.978; assuming the two data records have the same prior, the probability that the adversary guesses correctly is 0.511. With the same ε, a dataset of 10M records ensures a practically perfect Bayes security of 0.998 (Bayes vulnerability for a uniform prior: 0.501). Overall, this shows that, if we are interested in a specific threat model, then a threat-specific metric such as Bayes security can reassure us about the security of the mechanism even when DP suggests it is not secure.
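These numbers can be reproduced directly from Equation 7; a minimal sketch:

```python
import numpy as np

def beta_star_rr(eps, n):
    # Equation 7: beta* = n / (exp(eps) + n - 1)
    return n / (np.exp(eps) + n - 1)

for n in (10**6, 10**7):
    b = beta_star_rr(10, n)
    # Success of the Bayes adversary on the two leakiest secrets: 1 - beta*/2.
    print(n, round(b, 3), round(1 - b / 2, 3))
# 1000000 0.978 0.511
# 10000000 0.998 0.501
```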
In the appendix (Section H), we include an empirical study of RR on the Census1990 dataset. We observe that, in this specific case, a good utility (95%) is only achieved for a rather large ε = 3.3; in principle, one would disregard the mechanism as unsafe in this particular instance. Yet, Bayes security (β* = 0.99999 for ε = 3.3, and β* = 0.99995 for ε = 4.8) reassures us about its security within this threat model.

B. Laplace mechanism
For a parameter λ, we define the Laplace mechanism as L : s ↦ s + Λ(0, λ), where Λ(µ, λ) is a µ-centered Laplace distribution with scale λ.

Proposition 3. L is β*-secure with β* = exp(−α/(2λ)), where α = max_{s_i,s_j∈S} |s_i − s_j|.

Discussion. We can use this analysis to compare β* with DP. Let f : D → R be a real-valued function with sensitivity ∆f def= max_{x,y∈D} |f(x) − f(y)|. The mechanism f(x) + Λ(0, λ) with scale parameter λ = ∆f/ε is ε-DP. By the last result, the Bayes security of this mechanism is β* = exp(−ε/2). Now, suppose we care about the probability that an adversary distinguishes the two maximally distant points s_1, s_2 = arg max_{x,y∈D} |f(x) − f(y)|. For a relatively strong DP level of ε = 0.1, we get β* ≈ 0.95; this implies a non-negligible advantage for the adversary; e.g., assuming the two points have identical prior 1/2, the probability that the optimal adversary distinguishes between them is roughly 0.525. Figure 6 (b) shows the overall behavior.
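Proposition 3 can be sanity-checked numerically by integrating the total variation between two shifted Laplace densities on a grid (grid bounds and step size are arbitrary choices for this sketch):

```python
import numpy as np

eps, d = 0.1, 1.0         # DP parameter and sensitivity (toy values)
lam = d / eps             # Laplace scale calibrated for eps-DP

dx = 0.001
x = np.arange(-200.0, 201.0, dx)               # fine grid; tails are negligible
p = np.exp(-np.abs(x) / lam) / (2 * lam)       # Laplace density centered at 0
q = np.exp(-np.abs(x - d) / lam) / (2 * lam)   # Laplace density centered at d
tv = 0.5 * np.abs(p - q).sum() * dx            # numerical total variation
print(1 - tv, np.exp(-eps / 2))                # both ~0.951 (Proposition 3)
```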

C. Gaussian mechanism
For a parameter σ, the Gaussian mechanism adds noise to a secret s from a Gaussian distribution: G : s ↦ s + N(0, σ²).

Proposition 4. The Gaussian mechanism is β*-secure with β* = 2Φ(−α/(2σ)), where Φ is the CDF of N(0, 1), and α = max_{s_i,s_j∈S} |s_i − s_j|.

Discussion. Because the Gaussian mechanism does not satisfy pure DP, we compare Bayes security with approximate DP. For a function f with sensitivity ∆f, and for ε < 1, the following mechanism satisfies (ε, δ)-DP: f(x) + N(0, 2 ln(1.25/δ)(∆f)²/ε²). By applying Proposition 4 with α = ∆f and σ = √(2 ln(1.25/δ)) ∆f/ε, we obtain β* = 2Φ(−ε/(2√(2 ln(1.25/δ)))). As desired, security does not depend on the sensitivity of the function.

We observe behavior similar to what we observed for the Laplace mechanism (Figure 6 (c)). Consider a dataset containing N = 1K records, for which an appropriate choice of δ according to the literature is δ = 1/N². For a relatively secure setting (ε = 1), we have β* = 0.925. As before, an interpretation of this value is that an optimal attacker will distinguish the two most vulnerable secrets with probability 0.538; this is clearly non-negligible. We note that only a stricter value such as ε = 0.1 ensures a strong guarantee against the attack (β* = 0.992).
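These values follow from the closed form above; a minimal sketch using only the standard library:

```python
from math import erf, log, sqrt

def phi(z):
    # Standard normal CDF.
    return 0.5 * (1 + erf(z / sqrt(2)))

def beta_star_gauss(eps, delta):
    # Gaussian mechanism calibrated for (eps, delta)-DP, so alpha/sigma cancels
    # the sensitivity: beta* = 2 * Phi(-eps / (2 * sqrt(2 ln(1.25/delta)))).
    return 2 * phi(-eps / (2 * sqrt(2 * log(1.25 / delta))))

delta = 1e-6                                   # delta = 1/N^2 for N = 1000
print(round(beta_star_gauss(1.0, delta), 3))   # 0.925
print(round(beta_star_gauss(0.1, delta), 3))   # 0.992
```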
Overall, Bayes security enabled us to interpret the privacy guarantees of various mechanisms, by matching them back to the probability of success of an optimal attacker under a specific threat model.

VII. COMPUTATIONAL ESTIMATION OF β *
Suppose that, differently from the cases we just analyzed (Section VI), a simple closed-form expression for the mechanism does not exist: how can we determine its Bayes security? Theorem 1 shows that to quantify Bayes security, the minimizer of the multiplicative risk leakage β, we just need to estimate β for all pairs of secrets; this requires O(n²) measurements. A measurement for a pair of secrets is obtained by estimating the Bayes risk of the mechanism for those two secrets; we can do this analytically, if we have white-box knowledge of the mechanism, or in a black-box manner. In either case, if the mechanism is complex enough (e.g., it has a large input or output space), each measurement may require a non-negligible computational time, from seconds to tens of minutes.
In this section, we investigate techniques for improving the search time. Since the bottleneck is the time it takes to measure β(π, C) for one prior π, we seek to reduce the number of such measurements. We first assume white-box knowledge of the system (Sections VII-A to VII-C), and then study the black-box case (Section VII-D).
Initial observations. Denote by π_ab the sparse prior vector (0, ..., 0, 1/2, 0, ..., 0, 1/2, 0, ..., 0) whose two non-zero elements of value 1/2 are in positions a and b, with a ≠ b. Given a channel C, from the definition of β we get that

β(π_ab, C) = R*(π_ab, C)/G(π_ab) = 2 R*(π_ab, C).

The crucial observation (shown in the proof of Theorem 2) is that the above quantity is equal to the complement of the total variation distance tv(C_a, C_b) between the rows C_a and C_b of the channel. The total variation distance of two discrete distributions is 1/2 of their L1 distance (seen as vectors); hence:

β(π_ab, C) = 1 − tv(C_a, C_b) = 1 − (1/2) ∥C_a − C_b∥₁.

Then, from Theorem 1, we get that minimizing β is equivalent to finding the rows of the channel that are maximally distant with respect to L1. This is the well-known diameter problem (for L1): given the set of vectors C_S, find the two that are maximally distant (i.e., find the diameter of the set).
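A brute-force solution of this diameter problem, and hence of β*, takes O(n²m) time; a minimal sketch (toy channel, invented values):

```python
import numpy as np

def beta_star_diameter(C):
    # Solve the L1 diameter problem over the channel rows by brute force.
    n = len(C)
    best, pair = 0.0, None
    for a in range(n):
        for b in range(a + 1, n):                 # O(n^2) row pairs
            d = 0.5 * np.abs(C[a] - C[b]).sum()   # tv(C_a, C_b)
            if d > best:
                best, pair = d, (a, b)
    return 1.0 - best, pair                       # beta*, the two leakiest secrets

C = np.array([[0.7, 0.2, 0.1], [0.2, 0.6, 0.2], [0.1, 0.2, 0.7]])
print(beta_star_diameter(C))                      # (0.4, (0, 2))
```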

A. Computing β * with domain knowledge
In practical applications, domain knowledge may enable identifying the two leakiest secrets a priori; for example, the smallest and largest webpages a user can visit in website fingerprinting (Section VIII), or the smallest and largest exponents in timing side channels against exponentiation algorithms [10]. There are also applications where all the secrets are equally vulnerable, and hence β* is obtained for any pair of distinct secrets; for instance, when the mechanism operates in such a way that all secrets enjoy the same protection (e.g., the randomized response mechanism, Section VI).
More generally, if one does not know the exact minimizing secrets, but knows that they belong to a set S ′ ⊂ S, then to determine β * it suffices measuring β for all s 1 , s 2 ∈ S ′ .

B. Computing β* in time linear in n
The geometric characterization given by Theorem 2 implies that obtaining β * requires computing the diameter of a set of n = |S| vectors of dimension m = |O|.The direct approach is to compute the distance between every pair of vectors, i.e., perform O(n 2 m) operations.This quadratic dependence on n can be prohibitive when the number of secrets grows.
We first show that the problem can be solved in time linear in n by using an isometric embedding ϕ of L1^m into L∞^(2^m), which has one component for every bitstring b of length m, defined by ϕ(x)_b = Σ_{i=1}^m x_i (−1)^{b_i}. Note that the equivalence ∥ϕ(x) − ϕ(x′)∥∞ = ∥x − x′∥₁ holds for all x, x′ ∈ R^m. The L∞ diameter problem can be solved in linear time: we only need to find the maximum and minimum value of each component.
This computation is linear in |S| but exponential in |O|. It outperforms the direct approach when the number of observations is small, but the problem becomes harder as the number of observations grows. When m = Θ(n) there is no sub-quadratic algorithm for the L_p-diameter problem for any p ≥ 0 [12]. This suggests that there may not be any sub-quadratic algorithm for computing β* either.
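For intuition, a minimal sketch of the embedding on a toy channel (practical only for very small m, since the embedding has 2^m components):

```python
from itertools import product
import numpy as np

def embed(x):
    # phi(x)_b = sum_i x_i * (-1)^{b_i}, one coordinate per bitstring b of length m.
    m = len(x)
    return np.array([sum(xi * (-1) ** bi for xi, bi in zip(x, b))
                     for b in product((0, 1), repeat=m)])

def linf_diameter(vectors):
    # Max over components of (max - min): linear in the number of vectors.
    E = np.array([embed(v) for v in vectors])
    return (E.max(axis=0) - E.min(axis=0)).max()

C = np.array([[0.7, 0.2, 0.1], [0.2, 0.6, 0.2], [0.1, 0.2, 0.7]])
print(linf_diameter(C))             # 1.2 = max pairwise L1 distance between rows
print(1 - 0.5 * linf_diameter(C))   # beta* = 0.4
```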

C. An efficient approximation of β *
We present an approximation of β* that can be obtained in O(nm) time. One selects an arbitrary distribution q ∈ D(O) and computes the maximal distance d between any channel row and q. The diameter of C_S is at most 2d, giving a lower bound on β*. Furthermore, if q lies within the convex hull of C_S (denoted by ch(C_S)), then the diameter is at least d, giving also an upper bound:

Proposition 5. Let q ∈ D(O) and d = max_{s∈S} tv(C_s, q). Then β*(C) ≥ 1 − 2d; moreover, if q ∈ ch(C_S), then β*(C) ≤ 1 − d.

Good choices for q are distributions that are likely to lie "in-between" the two maximally distant rows, for instance the centroid of C_S (the mean of all rows).
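A minimal sketch of the centroid-based approximation (toy channel; the centroid is one choice of q among the ones suggested above):

```python
import numpy as np

def beta_star_bounds(C):
    # One pass over the rows: q = centroid lies in ch(C_S), so with
    # d = max_s tv(C_s, q), Proposition 5 gives 1 - 2d <= beta* <= 1 - d.
    q = C.mean(axis=0)
    d = max(0.5 * np.abs(row - q).sum() for row in C)
    return 1 - 2 * d, 1 - d

C = np.array([[0.7, 0.2, 0.1], [0.2, 0.6, 0.2], [0.1, 0.2, 0.7]])
print(beta_star_bounds(C))   # ~(0.267, 0.633), bracketing the exact beta* = 0.4
```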
Several advanced approximation algorithms exist for the L2 diameter problem [23]; these could be employed using some embedding of L1 into L2. The trivial embedding has distortion √m (since ∥x∥₂ ≤ ∥x∥₁ ≤ √m ∥x∥₂), hence the approximation factor may be too loose as |O| grows. Low-distortion embeddings of L1 into L2 exist [3], but it is unclear whether they can be applied to the diameter problem. In Appendix I, we conduct an empirical study of these approximations.

D. Black-box estimation of β *
The previous sections assume full knowledge of the channel C. In practice, this assumption may fail: systems may be too complex to analyze, or their behavior may be unknown. In such cases, we can estimate the Bayes risk, and therefore β, using black-box estimation tools (i.e., tools that only observe the system's inputs and outputs), such as F-BLEAU [10]. As in the white-box case, we need to reduce the number of priors π for which we estimate β(π, C).
Bounds. A first approach is to use the bounds given by Proposition 5, which can be computed in a black-box setting by interacting with the system to obtain observations for q. For instance, one can observe: the mean row, by drawing observations from the channel with secrets chosen uniformly at random; any single row of the channel, by drawing observations for an arbitrarily chosen secret; or a row with arbitrary distribution, e.g., by sampling q uniformly at random from D(O).

Building upon R* black-box estimators [10]. If domain constraints do not enable identifying the pair of leakiest secrets (Section VII-A), we can try to reduce the search space. For instance, we can exploit the triangle inequality on the total variation distance to discard some candidate pairs before estimating them. E.g., given the Bayes security for the priors π_ac and π_bc, it holds that:

β(π_ab, C) ≥ β(π_ac, C) + β(π_bc, C) − 1.

Thus, if this lower bound on β(π_ab, C) is larger than some already-known β(π_ij, C), there is no need to compute β(π_ab, C). Conversely, if β(π_ab, C) is upper bounded by a small quantity, we can compute it earlier, aiming at discarding other combinations.
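A minimal sketch of this pruning rule (illustrative numbers only): since 1 − β is a total variation distance, the triangle inequality gives the lower bound used below, and a pair whose lower bound already exceeds the smallest β found so far cannot be the minimizer.

```python
def should_skip(beta_ac, beta_bc, best_so_far):
    # Triangle inequality: beta(pi_ab) >= beta(pi_ac) + beta(pi_bc) - 1.
    # If this lower bound exceeds the current minimum, skip the costly
    # black-box estimation of beta(pi_ab).
    return beta_ac + beta_bc - 1 > best_so_far

print(should_skip(0.9, 0.95, 0.6))   # True: pair (a, b) cannot beat 0.6
```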

VIII. DISCUSSION AND CONCLUSIONS
This paper provides building blocks for studying complex algorithms on the basis of Bayes security, a metric that generalizes the cryptographic advantage. Bayes security inherits benefits from both average-case metrics, such as the advantage and the Bayes risk, and worst-case metrics, such as DP. Similarly to the advantage, Bayes security is threat-specific: it captures the risk for the users in a specified threat model (e.g., what is the probability that a user's data record is leaked). Like DP, Bayes security is easily composable, and it reflects the worst case for the two most vulnerable secrets (e.g., data records). Yet, Bayes security is a weaker worst-case notion than DP, which may enable utility gains in high-security regimes (Section VI).

Applications. The above characteristics make Bayes security suitable for a broad range of security and privacy settings. Below, we discuss some particularly fitting examples.

Website fingerprinting. In website fingerprinting (WF), an adversary with access to an encrypted network tunnel (e.g., VPN or Tor) aims to infer the websites being visited by a user. The success rate (or accuracy) of an attacker has been used for years as a way of evaluating an attack's effectiveness. However, this metric suffers from some drawbacks [24], [36]. First, comparing success rates across studies is meaningless, as the number of websites the user can visit strongly affects them: the attack is much easier if the user is only allowed to visit 2 websites as opposed to 100. Second, the prior probability of each website being visited highly skews the success rate: if a website is easy to distinguish from the others and is very likely to be visited, then the attacker's accuracy will be largely inflated. The use of mutual information was suggested as an alternative metric [28]. However, Smith showed that this metric does not capture the standard threat model used in WF, and it may be misleading if we are ultimately interested in learning about an attacker's success probability [32].
β was introduced for WF evaluation [11], although without any theoretical justification. In this work, we developed a theory for β, and we showed that its minimizer, the Bayes security metric, is particularly suited for WF: i) it is prior-independent; ii) it measures the risk for the two leakiest secrets (i.e., the two websites that are the easiest to tell apart); iii) as shown in Figure 5, it captures particularly well the case of a sparse prior (in WF, the prior over websites is highly sparse). Overall, this suggests Bayes security is an appropriate choice for evaluating the user's risks against WF and, similarly, the information leakage of WF defenses. Future work may study whether Bayes security implies bounds w.r.t. other metrics of interest, such as true/false positives or precision and recall (Section V).

PPML. We suspect privacy-preserving ML (PPML) algorithms can be easily studied by using Bayes security. Its strengths for this kind of analysis are: i) it is easy to derive analytically (e.g., as the total variation of the posteriors for the two leakiest secrets) (Section III); ii) for a large secret space (e.g., data records in a dataset), it characterizes the risk for the most vulnerable ones; this, we argue, gives an easy interpretation of its guarantees; iii) its prior independence helps to study mechanisms irrespective of the adversary's prior knowledge, and, once the attacker's prior is known, it can be plugged in to better capture the risk (Section V); iv) where an analytical study is not possible, Bayes security can be easily estimated in a black-box manner (Section VII). Overall, we expect future work can provide Bayes security-style guarantees for complex ML training pipelines. For example, by exploiting our results on the Gaussian mechanism (Section VI), it may be possible to study the security of DP-SGD against common attacks such as membership inference [31], attribute inference [20], and reconstruction [4], [7]. This would enable bypassing bounds relating ε and the advantage [22], [40], by computing the advantage (or Bayes security) directly. One immediate implication of Theorem 1 is that evaluating membership inference attacks via the cryptographic advantage (which, in this case, matches Bayes security) gives guarantees for any prior probability that "members" may have.

Data release mechanisms. Our analysis in Section VI suggests that, when defending large datasets, Bayes security may help achieve better utility than DP in high-privacy regimes.

Fairness. Bayes security captures the risk for the most vulnerable pair of users (Theorem 1). We suspect this characteristic can be adapted for evaluating privacy fairness (e.g., whether some population subgroups enjoy better privacy than others).

Further extensions. In this paper, we discussed various extensions that may further improve Bayes security's suitability for tackling complex algorithms. For example, proving a form of the miracle theorem (Section V-E) would give analysts even more flexibility when defining threat models for real-world attacks. Moreover, given the equivalence between Bayes security and total variation (Theorem 2), it may be possible to exploit research on total variation estimation to improve black-box leakage estimation techniques.
In conclusion, Bayes security opens up a new point in the space of security metrics, offering designers the opportunity to obtain trade-offs different from those of previous metrics. As we showed in Section VI, these trade-offs enable the choice of security parameters that provide strong protection, potentially with less utility impact, under the threat model one chooses.
APPENDIX

A. Proof of Theorem 7

Proof. Consider the k-dimensional simplex Simp determined by the k non-zero components of π. Since π′ has at most the same k non-zero components, it is an element of Simp. Consider imaginary lines from π′ to each of the vertices of Simp. A vertex of Simp is a vector of the form (0, ..., 0, 1, 0, ..., 0), i.e., one component is 1 and all the others are 0; furthermore, the 1 must be in correspondence with a non-zero component of π. These lines determine a partition of Simp into convex subspaces, and π must belong to one of them. Hence π can be expressed as a convex combination of π′ and some vertices of Simp, say π_1, ..., π_h; namely, π = cπ′ + c_1π_1 + ... + c_hπ_h for suitable convex coefficients c, c_1, ..., c_h. Furthermore, since π has k non-zero components, it is an internal point of Simp, and therefore c must be non-zero. Hence, we have:

0 = R*(π, C) = R*(cπ′ + c_1π_1 + ... + c_hπ_h, C) ≥ c R*(π′, C) + Σ_j c_j R*(π_j, C) = c R*(π′, C),

where the third step comes from the concavity of R*, and the last one from the fact that R*(π_j, C) = 0 for all j, since each π_j is a vertex. Therefore, since c is not 0, R*(π′, C) must be 0.
The second claim of the theorem states the existence of a channel C′ for which equality is reached. We define C′ so that it coincides with C in the rows corresponding to the non-zero components of π, and we define all the other rows to be identical to one of the previous ones (it does not matter which ones are chosen). A direct computation then shows that equality is reached for C′, proving the second part of the theorem.

B. Proof of Theorem 1
Let U^(k), for k = 1, ..., n, be the set of priors with exactly k non-zero components, such that the distribution on those components is uniform. In the following we denote by U the set U = U^(1) ∪ U^(2) ∪ ... ∪ U^(n).
We start by recalling the following definition from [8] (Definition 3.2, simplified).

Definition 1. Let S be a subset of a vector space, let g : S → R, and let S′ ⊆ S. We say that g is convexly generated by S′ if for all v ∈ S there exist S″ ⊆ S′ and convex coefficients {c_u}_{u∈S″} (i.e., satisfying Σ_u c_u = 1 and c_u ≥ 0 for all u ∈ S″) such that:

v = Σ_{u∈S″} c_u u   and   g(v) = Σ_{u∈S″} c_u g(u).

The following result was also proven in the same reference (Proposition 3.9 in [8]).

Proposition 6. G is convexly generated by U.
The elements of U are called the corner points of G, and the elements of each set U^(k) are the corner points of order k.
We now prove that if a function is defined as the ratio of a concave function and a convexly generated one, then its minimum is attained at one of the corner points generating the denominator function. This will be important for characterizing the minimum of the Bayes security metric, which is indeed defined as the ratio of the Bayes risk and the guessing error.
Lemma 2. Let S be a subset of a vector space, let f : S → R≥0 be a concave function, and let g : S → R≥0 be a function that is convexly generated by a finite S′ ⊆ S and that is positive on at least some elements of S. Then there exists u ∈ S′ such that u ∈ arg min_{v : g(v)>0} f(v)/g(v).
Proof. Assume by contradiction that there exists v ∈ S such that g(v) > 0 and, for all u ∈ S′ with g(u) > 0,

    f(v)/g(v) < f(u)/g(u) .    (8)

Since g is convexly generated by S′ and g(v) > 0, there exists S′′ ⊆ S′ such that v = Σ_{u∈S′′} c_u u and g(v) = Σ_{u∈S′′} c_u g(u), where the c_u are suitable convex coefficients and g(u) > 0 for all u ∈ S′′. Therefore:

    f(v) ≥ Σ_{u∈S′′} c_u f(u)                    (by concavity of f)
         > (f(v)/g(v)) Σ_{u∈S′′} c_u g(u)        (by Equation 8)
         = f(v) ,

which is impossible. Furthermore, S′ is finite, hence the set {f(u)/g(u) | u ∈ S′, g(u) > 0} has a minimum. Finally, applying Lemma 2 with f the Bayes risk and g the guessing error G, note that because G takes value 0 on its corner points of order 1, the corner point of G on which β is minimized must have order k ≥ 2.
We can now prove Theorem 1; it remains to show that the corner points on which β is minimized have order k = 2.

Proof. We show this result by induction over n, the cardinality of S, where we assume n ≥ 2. The base case, n = 2, is immediate: every corner point of order k ≥ 2 has, in this case, order exactly 2.
Inductive step. Assuming we have proved the result for n, we prove it for n + 1. By Corollary 3, it is sufficient to show that Equation (9) holds. Consider the (n + 1) × m channel matrix C. For each row i, we define p_i as the sum of the elements of the row that are the maximum in their column (ties are broken arbitrarily), i.e.,

    p_i = Σ_{o∈O} C_{i,o} · I(C_{i,o} = max_s C_{s,o}) ,

where I(·) is the indicator function, i.e., the function that gives 1 if its argument is true, and 0 otherwise. Similarly, we define q_i as the sum of the elements that are the second maximum in the columns whose maximum lies in row i. More precisely, let smax(A) be the function returning the second maximum of a set A; for instance, if a₁ ≥ a₂ ≥ a₃ ≥ ···, then smax({a_i}) = a₂. Again, ties are broken arbitrarily. Then:

    q_i = Σ_{o∈O} smax({C_{s,o} | s ∈ S}) · I(C_{i,o} = max_s C_{s,o}) .

Note that the elements that compose q_i are in rows different from i, and possibly different from each other.
Without loss of generality, assume that Equation (10) holds. We further denote by r_o, for o = 1, ..., k, the elements of the (n + 1)-th row that are not components of p_{n+1}. The following observation is immediate:

Fact 1. For all i ∈ {1, ..., n + 1}, we have q_i ≥ r_i.
Observe that both sides of Equation (9) can be expressed in terms of the quantities p_i and q_i defined above; therefore, to prove Equation (9), it suffices to demonstrate an inequality between these quantities which, after simplifying and rearranging, follows from the assumption in Equation (10). This concludes the inductive step.

C. Proofs of Section VII

Theorem 2. For any channel C, it holds that

    β*(C) = 1 − max_{a,b∈S} tv(C_a, C_b) .    (11)

Proof. Denote by π_ab the prior assigning probability 1/2 to a, b ∈ S, a ≠ b. We show that

    β(π_ab, C) = 1 − tv(C_a, C_b) .

Hence, we have that

    min_{a≠b} β(π_ab, C) = 1 − max_{a,b∈S} tv(C_a, C_b) ,

and (11) follows directly. We conclude by Theorem 1.
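As a concrete illustration of Theorem 2, the following minimal sketch computes β* of a finite channel directly from its matrix, as one minus the largest total variation distance between any two rows. The helper name and the toy channel are illustrative, not artifacts of the paper.

```python
import numpy as np
from itertools import combinations

def bayes_security(C: np.ndarray) -> float:
    """Theorem 2: beta*(C) = 1 - max_{a,b} tv(C_a, C_b), where the rows
    of C are the output distributions induced by each secret."""
    return 1.0 - max(
        0.5 * np.abs(C[a] - C[b]).sum()          # total variation of two rows
        for a, b in combinations(range(C.shape[0]), 2)
    )

# Toy 3-secret channel; each row is a distribution over 3 observations.
C = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.2, 0.2, 0.6]])
print(bayes_security(C))  # security of the two most distinguishable secrets
```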
We now show that taking convex combinations of vectors cannot increase the diameter of a set; this will be useful for both Proposition 5 and Theorem 4. Denote by diam(S) and ch(S) the diameter and the convex hull of S, respectively.
Proof. Let s, s′ ∈ S. From the triangle inequality we have that

    ∥C_s − C_{s′}∥₁ ≤ ∥C_s − q∥₁ + ∥q − C_{s′}∥₁ ≤ 2d ,

hence diam(C_S) ≤ 2d; Theorem 2 implies the lower bound. Moreover, assume that q ∈ ch(C_S). Since C_s ∈ ch(C_S) for all s ∈ S, it holds that diam(ch(C_S)) ≥ max_{s∈S} ∥C_s − q∥₁ = d.

Proof. Recall that π_ab ∈ D(S) denotes the prior that assigns probability 1/2 to both a, b ∈ S, a ≠ b. We will use the fact that, for such priors, β(π_ab, C) can be written as

    β(π_ab, C) = 2 − Σ_{o∈O} max{C_{a,o}, C_{b,o}} .

This comes from the definition of β and the fact that the rows C_a and C_b are probability distributions, hence

    Σ_{o∈O} ( max{C_{a,o}, C_{b,o}} + min{C_{a,o}, C_{b,o}} ) = 2 .

We also use the basic fact that, for non-negative {q_i, r_i}_i,

    (Σ_i q_i) / (Σ_i r_i) ≥ min_i q_i / r_i .

The proof then proceeds by a direct computation combining these two facts; the resulting bound is tight.

Theorem 4. For all channels C₁, C₂ it holds that β*(C₁C₂) ≥ max{β*(C₁), β*(C₂)}.

Proof. The β*(C₁C₂) ≥ β*(C₁) part is easy, and comes from the fact that R*(π, C₁C₂) ≥ R*(π, C₁) for all priors π, hence also for the one achieving β*(C₁C₂).
The more interesting part is to show that β*(C₁C₂) ≥ β*(C₂). The key observation is that the rows of C₁C₂ are convex combinations of those of C₂. Denoting by C_S = {C_s | s ∈ S} the set of a channel C's rows, we have that (C₁C₂)_S ⊆ ch((C₂)_S); hence, by Lemma 3, diam((C₁C₂)_S) ≤ diam((C₂)_S), and the claim follows from Theorem 2.

From the expressions for a and b in terms of x and y, we can compute the Bayes risk of C in π* as a function f(x, y). In order to find the minimum of f(x, y), we compute its partial derivatives; hence the bound is a minimum. Figure 3 shows two examples of such matrices.
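Returning to Theorem 4, a quick numeric sanity check of the cascading bound is possible. This is a minimal, self-contained sketch under assumed, randomly drawn channels; the helper names are illustrative.

```python
import numpy as np
from itertools import combinations

def bayes_security(C):
    """beta* = 1 - max pairwise total variation between rows (Theorem 2)."""
    return 1 - max(0.5 * np.abs(C[a] - C[b]).sum()
                   for a, b in combinations(range(len(C)), 2))

rng = np.random.default_rng(0)

def random_channel(n_in, n_out):
    """A random channel: each row is a probability distribution."""
    M = rng.random((n_in, n_out))
    return M / M.sum(axis=1, keepdims=True)

C1, C2 = random_channel(4, 3), random_channel(3, 5)
cascade = C1 @ C2   # rows of C1C2 are convex combinations of C2's rows

# Theorem 4: cascading cannot decrease Bayes security.
assert bayes_security(cascade) >= max(bayes_security(C1), bayes_security(C2)) - 1e-9
print(bayes_security(C1), bayes_security(C2), bayes_security(cascade))
```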

F. Generalization using gain/loss functions
We describe here the generalizations of L^× and β, parameterized by a gain function g (for vulnerability) or a loss function ℓ (for risk), as discussed in Section V.
Let W be the set of guesses the adversary can make about the secret; a natural choice is W = S, but other choices model a variety of adversaries (e.g., guessing a part or a property of the secret, or making an approximate guess). A gain function g(w, s) models the adversary's gain when guessing w ∈ W while the actual secret is s ∈ S. Prior and posterior g-vulnerability [2] are the expected gain of an optimal guess:

    V_g(π) = max_w Σ_s π_s g(w, s) ,    V_g(π, C) = Σ_o p(o) V_g(δ^o) ,

where δ^o is the posterior distribution on S produced by the observation o. Then g-leakage expresses how much the vulnerability increases due to the channel: L^×_g(π, C) = V_g(π, C)/V_g(π).
Similarly, we use a loss function ℓ(w, s), modelling the adversary's loss when guessing w while the secret is s. Prior and posterior ℓ-risk are the expected loss of an optimal guess:

    R_ℓ(π) = min_w Σ_s π_s ℓ(w, s) ,    R_ℓ(π, C) = Σ_o p(o) R_ℓ(δ^o) .
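To make these definitions concrete, the sketch below evaluates prior and posterior g-vulnerability, and hence multiplicative g-leakage, for a toy channel. The prior, channel, and gain matrix are illustrative assumptions; the ℓ-risk versions are obtained dually.

```python
import numpy as np

def posteriors(pi, C):
    """Output distribution p(o) and posteriors delta^o (as columns)."""
    joint = pi[:, None] * C          # p(s, o) = pi_s * C[s, o]
    p_o = joint.sum(axis=0)
    return p_o, joint / p_o

def V_g(pi, G):
    """Prior g-vulnerability; G[w, s] is the gain of guess w on secret s."""
    return (G @ pi).max()

def posterior_V_g(pi, C, G):
    """Posterior g-vulnerability: expected gain of an optimal guess."""
    p_o, delta = posteriors(pi, C)
    return sum(p * V_g(delta[:, o], G) for o, p in enumerate(p_o))

pi = np.array([0.5, 0.3, 0.2])            # assumed prior
C = np.array([[0.8, 0.2],                 # assumed channel: 3 secrets, 2 obs.
              [0.4, 0.6],
              [0.1, 0.9]])
G = np.eye(3)                             # identity gain: Bayes vulnerability
print(posterior_V_g(pi, C, G) / V_g(pi, G))   # multiplicative g-leakage
# The l-risk versions are dual: swap max for min and the gain for a loss matrix.
```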

G. Proofs of Section VI
We first prove results on the Bayes security of Gaussian mechanisms; these serve as a template for the security derivation for the Laplace distribution.
We apply the above result to compute the total variation between Λ(µ_p/λ, 1) and Λ(µ_q/λ, 1) (which is equal to the total variation between Λ(µ_p, λ) and Λ(µ_q, λ)), and use the equivalence between total variation and the Bayes security metric (Theorem 2) to conclude the proof.
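As a numeric illustration of this route, the sketch below estimates β* = 1 − tv for a pair of location-shifted Gaussian and Laplace output distributions by integrating |p − q|/2. The means and scales are placeholder values, not parameters from the paper.

```python
from scipy import stats
from scipy.integrate import quad

def beta_star(pdf_p, pdf_q, lo=-60.0, hi=60.0):
    """beta* = 1 - tv(p, q), with tv computed by numeric integration."""
    tv, _ = quad(lambda x: 0.5 * abs(pdf_p(x) - pdf_q(x)), lo, hi)
    return 1.0 - tv

mu_p, mu_q, sigma, lam = 0.0, 1.0, 2.0, 2.0    # placeholder parameters

# Gaussian mechanism: the two leakiest secrets yield N(mu_p, s^2), N(mu_q, s^2).
print(beta_star(stats.norm(mu_p, sigma).pdf, stats.norm(mu_q, sigma).pdf))

# Laplace mechanism: Lap(mu_p, lam) vs Lap(mu_q, lam); as in the proof above,
# tv is unchanged if both means and the scale are divided by lam.
print(beta_star(stats.laplace(mu_p, lam).pdf, stats.laplace(mu_q, lam).pdf))
```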

H. Empirical evaluation of Randomized Response
Dataset. The US 1990 Census dataset (Census1990) comprises 2,458,285 records, each with a number of attributes. Following Murakami and Kawamoto [30], we reduced the attributes to those we judged potentially sensitive: age (8 values), income (5 values), marital status (5 values), and sex (2 values). Overall, the number of values that can be taken by the vectors describing an individual is 8 × 5 × 5 × 2 = 400; this is the size of both the secret and output spaces for RR.

Methodology. First, we use RR to obfuscate the Census1990 dataset so as to guarantee various levels of ε-LDP, and we measure the resulting utility by computing the total variation distance between the empirical estimate p̂ and the true distribution p.
Second, we compute the Bayes security of RR, both analytically, using Equation 7 for different numbers of secrets n = |S|, and empirically, using fbleau [10]. Because the Bayes security between any pair of secrets obfuscated by RR is identical, one computation for an arbitrary pair suffices to obtain β*.

Results. Figure 7 shows the security of the empirical estimate p̂ (Bayes security, both the empirical and the analytical estimates, in green) and the utility after applying RR to obtain ε-LDP, for ε ∈ [0.01, 10] on the x-axis. We observe that utility is low for values ε < 2; concretely, utility reaches 95% for ε = 3.3. While this is weak protection in terms of differential privacy, the fbleau estimate is β* = 0.96 (β* = 0.99999 from Equation 7), which means that the adversary's probability of success remains small. Even for ε = 4.8, which yields a utility of 99%, the estimated β* stays above 0.9 (the analytic value is β* = 0.99995).
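The sketch below mimics this methodology on synthetic data. The Dirichlet-drawn distribution stands in for the Census1990 marginals, and the k-ary randomized response construction and frequency estimator are the standard ones, which we assume match the RR variant evaluated here; exact utility figures depend on the underlying distribution.

```python
import numpy as np

rng = np.random.default_rng(1)
n, eps = 400, 3.3                               # domain size and budget from the text
p_stay = np.exp(eps) / (np.exp(eps) + n - 1)    # k-RR: keep the true value...
p_move = (1 - p_stay) / (n - 1)                 # ...or move to any other value

def k_rr(records):
    """k-ary randomized response (standard construction, assumed here)."""
    flip = rng.random(len(records)) >= p_stay
    other = rng.integers(0, n - 1, size=len(records))
    other += other >= records                   # uniform over the n-1 other values
    return np.where(flip, other, records)

true_p = rng.dirichlet(np.ones(n))              # stand-in for the Census marginals
data = rng.choice(n, size=2_458_285, p=true_p)  # same size as Census1990
raw = np.bincount(k_rr(data), minlength=n) / len(data)

p_hat = (raw - p_move) / (p_stay - p_move)      # unbiased inversion of RR noise
print(1 - 0.5 * np.abs(p_hat - true_p).sum())   # tv-based utility, as in the text
```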

I. Bayes security approximation via Proposition 5
In Figure 8, we show β* and various lower bounds for the Randomized Response (RR) mechanism (Section VI). Two of the bounds are obtained via Proposition 5, by setting q to be the mean row (the uniform distribution, for RR) or an arbitrary row of C (for RR, by symmetry, any row yields the same bound). The third bound is obtained via the L₂-diameter, using the trivial embedding. The mean-row bound is the most accurate in this case.
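Reading Proposition 5's lower bound as β* ≥ 1 − max_s ∥C_s − q∥₁, the following sketch compares the exact β* of an RR channel with the mean-row and single-row choices of q. The parameters, and the standard k-RR construction, are illustrative assumptions.

```python
import numpy as np
from itertools import combinations

def rr_channel(n, eps):
    """k-ary randomized response channel matrix (standard construction)."""
    C = np.full((n, n), 1 / (np.exp(eps) + n - 1))
    np.fill_diagonal(C, np.exp(eps) / (np.exp(eps) + n - 1))
    return C

C = rr_channel(10, 2.0)
exact = 1 - max(0.5 * np.abs(C[a] - C[b]).sum()      # Theorem 2
                for a, b in combinations(range(len(C)), 2))

# Proposition 5 lower bound: beta* >= 1 - max_s ||C_s - q||_1, for any q.
mean_row = 1 - np.abs(C - C.mean(axis=0)).sum(axis=1).max()   # q = mean row
single_row = 1 - np.abs(C - C[0]).sum(axis=1).max()           # q = a row of C
print(exact, mean_row, single_row)   # the mean-row bound is tighter here
```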

Fig. 1: Posterior probability distributions for 5 secrets obfuscated with a two-dimensional Laplace mechanism. The Bayes security metric is the complement of the total variation distance between the posteriors of the two most distinguishable secrets (shown in red).

Fig. 4: The blue line illustrates the lower bound on β implied by ε-DP, expressed by Corollary 2. The orange line represents the lower bound on β derived from the bound proved by Yeom et al. [40] for the advantage of a membership inference adversary.

Fig. 5: Tightness of the bounds on Bayes security and multiplicative leakage with respect to sparsity. Note that, because the two metrics have different scales, these plots are useful for comparing their behavior, not their actual values.

A. Randomized Response

Randomized Response (RR) is a simple obfuscation protocol that guarantees ε-LDP. It randomly maps a data record to a new data record from the same range. Assuming S = O, RR is represented as a channel matrix R : S → O with entries R_{s,o} = P(o | s).

Theorem 7. Let n = |S|, and let υ denote the uniform prior on S. For any prior π with k non-zero components, if β(π, C) = 0 then

    β(υ, C) ≤ (1 − k/n) / (1 − 1/n) .

Moreover, there exists a channel C for which equality is reached.

Proof. Let m = |O|, and let S′ be the set of the non-zero components of π. It is sufficient to note that

    1 − R*(υ, C) = Σ_{o∈O} max_{s∈S} C_{s,o} υ(s) ≥ (1/n) Σ_{o∈O} max_{s∈S′} C_{s,o} = (C_{s₁,o₁} + ··· + C_{s_m,o_m})/n = k/n ,

where s_i = arg max_{s∈S′} C_{s,o_i}. The second-to-last equality is due to the fact that, by definition of the s_i, Σ_o max_{s∈S′} C_{s,o} = C_{s₁,o₁} + ··· + C_{s_m,o_m}; the last equality follows because β(π, C) = 0 forces the rows in S′ to have pairwise disjoint supports, so these terms sum to the total mass of the k rows, namely k. Hence R*(υ, C) ≤ 1 − k/n, and dividing by R(υ) = 1 − 1/n gives the claim.
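A channel attaining the equality case can be checked numerically. The sketch below builds a hypothetical channel with k perfectly distinguishable secrets (so that β(π, C) = 0 for the uniform prior on those k secrets) whose remaining rows duplicate the first, and verifies that β(υ, C) matches the bound.

```python
import numpy as np

n, k = 5, 3
C = np.zeros((n, k))
C[np.arange(k), np.arange(k)] = 1.0     # k perfectly distinguishable secrets
C[k:] = C[0]                            # remaining n - k rows duplicate row 1

upsilon = np.full(n, 1.0 / n)           # uniform prior
bayes_risk = 1 - (upsilon[:, None] * C).max(axis=0).sum()   # R*(upsilon, C)
guess_error = 1 - upsilon.max()                             # R(upsilon)
print(bayes_risk / guess_error, (1 - k / n) / (1 - 1 / n))  # both equal 0.5
```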
Denote C↑,o = max_{s∈{a,b}} C_{s,o} and C↓,o = min_{s∈{a,b}} C_{s,o}. The fact that β(π_ab, C) = 2 − Σ_o C↑,o comes directly from the definition of β. Since C_a and C_b are probability distributions, it holds that Σ_o (C↑,o + C↓,o) = 2.

Lemma 3. For any S ⊆ R^n, it holds that diam(ch(S)) = diam(S), where distances are measured w.r.t. any norm ∥·∥.

Proof. Let d = diam(S). Since S ⊆ ch(S), we clearly have d ≤ diam(ch(S)); the non-trivial part is to show that d ≥ diam(ch(S)). We first show that

    ∀a ∈ S, b ∈ ch(S) : ∥a − b∥ ≤ d .    (12)

Let a ∈ S, b ∈ ch(S), and denote by B_d[a] the closed ball of radius d centered at a. The diameter of S is d, hence B_d[a] ⊇ S, and since balls are convex, B_d[a] = ch(B_d[a]) ⊇ ch(S), which implies ∥a − b∥ ≤ d. Finally, we show that ∀b, b′ ∈ ch(S) : ∥b − b′∥ ≤ d. Let b, b′ ∈ ch(S); from (12) we know that B_d[b] ⊇ S, and since balls are convex we have that B_d[b] ⊇ ch(S), which implies ∥b − b′∥ ≤ d.

Proposition 5. Let C be a channel, q ∈ D(O), and d = max_{s∈S} ∥C_s − q∥₁. Then β*(C) ≥ 1 − d; moreover, if q ∈ ch(C_S), then β*(C) ≤ 1 − d/2.

Proof. The triangle inequality gives diam(C_S) ≤ 2d, and Theorem 2 turns this into the lower bound. For the second part, from Lemma 3 we get that diam(C_S) = diam(ch(C_S)) ≥ d, which gives the upper bound via Theorem 2.

D. Proofs of Section IV

Theorem 3. For all channels C₁, C₂ it holds that:

β*(C*) = 2 / (1 + exp(ε)). Examples of such C*'s are illustrated in Figure 3.

Proof. From Theorem 1 we know that for every C there exists a π* whose support contains only two secrets, uniformly distributed on them, such that β(π*, C) = min_π β(π, C). Given C, let us assume, without loss of generality, that the two secrets are s₁ and s₂, and that for each o among the first k columns we have C_{s₁,o} ≥ C_{s₂,o}, while in the last m − k columns we have C_{s₁,o} < C_{s₂,o}. Then, if we define

    a = Σ_{j≤k} C_{s₁,o_j}  and  b = Σ_{j>k} C_{s₂,o_j} ,    (15)

we have 1 − b ≤ a and 1 − a < b. From the constraints (2), we also know that a ≤ exp(ε)(1 − b) and b ≤ exp(ε)(1 − a). Hence there exist x, y with 1 ≤ x ≤ exp(ε) and 1 < y ≤ exp(ε) such that a = x(1 − b) and b = y(1 − a). The schematic below illustrates the situation in the first two rows of the matrix:

    s₁:  a = x(1 − b)    1 − a
    s₂:  1 − b           b = y(1 − a)

From a = x(1 − b) and b = y(1 − a) we derive

    a = (xy − x)/(xy − 1)  and  b = (xy − y)/(xy − 1) .
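Assuming the minimizing channel specializes, in the binary case, to the symmetric ε-DP channel with a = b = exp(ε)/(1 + exp(ε)), which is consistent with the derivation above, the following check confirms β* = 2/(1 + exp(ε)).

```python
import numpy as np

eps = 1.0
a = np.exp(eps) / (1 + np.exp(eps))     # symmetric binary eps-DP channel
C_star = np.array([[a, 1 - a],
                   [1 - a, a]])

# eps-DP check: within each column, probabilities differ by at most e^eps.
assert (C_star.max(axis=0) <= np.exp(eps) * C_star.min(axis=0) + 1e-12).all()

tv = 0.5 * np.abs(C_star[0] - C_star[1]).sum()
print(1 - tv, 2 / (1 + np.exp(eps)))    # both equal beta* of the channel
```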

TABLE I: Notation.

C : a channel matrix, where C_{s,o} = P(o | s) for s ∈ S, o ∈ O.