A program logic for union bounds

We propose a probabilistic Hoare logic aHL based on the union bound, a tool from basic probability theory. While the union bound is simple, it is an extremely common tool for analyzing randomized algorithms. In formal verification terms, the union bound allows flexible and compositional reasoning over possible ways an algorithm may go wrong. It also enables a clean separation between reasoning about probabilities and reasoning about events, which are expressed as standard first-order formulas in our logic. Notably, assertions in our logic are non-probabilistic, even though we can conclude probabilistic facts from the judgments. Our logic can also prove accuracy properties for interactive programs, where the program must produce intermediate outputs as soon as pieces of the input arrive, rather than accessing the entire input at once. This setting also enables adaptivity, where later inputs may depend on earlier intermediate outputs. We show how to prove accuracy for several examples from the differential privacy literature, both interactive and non-interactive.


Introduction
Probabilistic computations arise naturally in many areas of computer science. For instance, they are widely used in cryptography, privacy, and security for achieving goals that lie beyond the reach of deterministic programs. However, the correctness of probabilistic programs can be quite subtle, often relying on complex reasoning about probabilistic events. Accordingly, probabilistic computations present an attractive target for formal verification. A long line of research, spanning more than four decades, has focused on expressive formalisms for reasoning about general probabilistic properties both for purely probabilistic programs and for programs that combine probabilistic and non-deterministic choice (see, e.g., [29,34,35]).
More recent research investigates specialized formalisms that work with more restricted assertions and proof techniques, aiming to simplify formal verification. As perhaps the purest examples of this approach, some program logics prove probabilistic properties by working purely with non-probabilistic assertions; we call such systems lightweight logics. Examples include probabilistic relational Hoare logic [3] for proving the reductionist security of cryptographic constructions, and the related approximate probabilistic relational Hoare logic [4] for reasoning about differential privacy. These logics rely on the powerful abstraction of probabilistic couplings to derive probabilistic facts from non-probabilistic assertions [7].

Lightweight logics are appealing because they can leverage ideas for verifying deterministic programs, a rich and well-studied area of formal verification. However, existing lightweight logics apply only to relational verification: properties about the relation between two programs. In this paper, we propose a non-relational, lightweight logic based on the union bound, a simple tool from probability theory. For arbitrary properties $E_1, \dots, E_n$, the union bound states that $\Pr[\bigcup_{i=1}^{n} E_i] \le \sum_{i=1}^{n} \Pr[E_i]$. Typically, we think of the events $E_i$ as bad events, describing different ways that the program may fail to satisfy some target property. Bad events can be viewed as propositions on single program states, so they can be represented as non-probabilistic assertions. For example, the formula x > 10 defines a bad event for x a program variable. If x stores the result from a random sample, this bad event models when the sample is bigger than 10. The union bound states that no bad events happen, except with probability at most the sum of the probabilities of each bad event.
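As a small sanity check of the union bound, the following sketch computes exact probabilities over a finite distribution. The 12-sided die and the two bad events are illustrative assumptions, not examples from the paper; the first event reuses the formula x > 10 mentioned above.

```python
from fractions import Fraction

# Illustrative assumption: x is the outcome of a fair 12-sided die.
mu = {x: Fraction(1, 12) for x in range(1, 13)}

def pr(event):
    """Exact probability of an event, given as a predicate on outcomes."""
    return sum((p for x, p in mu.items() if event(x)), Fraction(0))

# Two bad events; they overlap on the outcome x = 12.
bad_events = [lambda x: x > 10, lambda x: x % 4 == 0]

# Probability that at least one bad event happens.
pr_any = pr(lambda x: any(bad(x) for bad in bad_events))
# Sum of the individual probabilities, as used by the union bound.
sum_each = sum((pr(bad) for bad in bad_events), Fraction(0))
```

The bound is not tight here because the two events overlap; it would be tight for disjoint events.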
The union bound is a ubiquitous tool in pen-and-paper proofs due to its flexible and compositional nature: to bound the probability of a collection of failures, consider each failure in isolation. This compositional style is also a natural fit for formal verification. To demonstrate this, we formalize a Hoare logic aHL based on the union bound for a probabilistic imperative language. The assertions in our logic are non-probabilistic, but judgments carry a numeric index for tracking the failure probability. Concretely, the aHL judgment $\vdash_\beta c : \Phi \Longrightarrow \Psi$ states that every execution of a program c starting from an initial state satisfying Φ yields a distribution in which Ψ holds except with probability at most β. We define a proof system for the logic and show its soundness. We also define a sound embedding of aHL into standard Hoare logic, by instrumenting the program with ghost code that tracks the index β in a special program variable. This is a useful reduction that also applies to other lightweight logics [5].
Moreover, our logic applies both to standard algorithms and to interactive algorithms, a richer class of algorithms that is commonly studied in contexts such as online learning (algorithms which make predictions about the future input) and streaming (algorithms which operate on datasets that are too large to fit into memory by processing the input in linear passes). Informally, interactive algorithms receive their input in a sequence of chunks, and must produce intermediate outputs as soon as each chunk arrives. In some cases the input can be adaptive: later inputs may depend on earlier outputs. Besides enabling new classes of algorithms, interactivity allows more modularity. We can decompose programs into interacting parts, analyze each part in isolation, and reuse the components.
We demonstrate aHL on several algorithms satisfying differential privacy [15], a statistical notion of privacy which trades off between the privacy of inputs and the accuracy of outputs. Prior work on verifying private algorithms focuses on the privacy property for non-interactive algorithms (see, e.g., [4,18,37]). We provide the first verification of accuracy for both non-interactive and interactive algorithms. We note however that aHL, like the union bound, can be applied to a wide range of probabilistic programs beyond differential privacy.

into Hoare logic. The semantics of the language and the proof of soundness are deferred to the appendix.

Language
We will work with a core imperative language with a command for random sampling from distributions, and procedure calls. The set of commands is defined as follows: $C ::= \mathsf{skip} \mid X \leftarrow E \mid X \xleftarrow{\$} D(E) \mid C; C \mid \mathsf{if}\ E\ \mathsf{then}\ C\ \mathsf{else}\ C \mid \mathsf{while}\ E\ \mathsf{do}\ C \mid X \leftarrow F(E) \mid X \leftarrow A(E)$. Here, X is a set of variables, E is a set of expressions, and D is a set of distribution constructors, which can be parameterized by standard expressions. Variables and expressions are typed, ranging over booleans, integers, lists, etc. The expression grammar is entirely standard, and we omit it.
We distinguish two kinds of procedure calls: A is a set of external procedure names, and F is a set of internal procedure names. We assume we have access to the code of internal procedures, but not the code of external procedures. We think of external procedures as controlled by some external adversary, who can select the next input in an interactive algorithm. Accordingly, external procedures run in an external memory separate from the main program memory, which is shared by all internal procedures.
For simplicity, procedures take a single argument, do not have local variables, and are not mutually recursive. A program consists of a sequence of procedure definitions, each of the following form: proc f(arg_f) {c; return r;}. Here, f is a procedure name, arg_f ∈ Vars is the formal argument of f, c is the function body and r is its return value. We assume that distinct procedure definitions do not bind the same procedure name and that the program variable arg_f can only appear in the body of f. Before we define the program semantics, we first need to introduce a few definitions from probability theory.
Definition 1. A discrete sub-distribution over a set A is defined by a mass function µ : A → [0, 1] such that: the support supp(µ) of µ, defined as $\{x \in A \mid \mu(x) \neq 0\}$, is countable; and the weight wt(µ) of µ, defined as $\sum_{x \in A} \mu(x)$, satisfies wt(µ) ≤ 1.
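For finitely supported mass functions, the notions in Definition 1 can be sketched directly. The dictionary representation and the function names below are assumptions made for illustration; exact rationals avoid floating-point rounding when summing masses.

```python
from fractions import Fraction

def support(mu):
    """Elements carrying non-zero mass."""
    return {x for x, p in mu.items() if p != 0}

def weight(mu):
    """Total mass of the (finitely supported) mass function."""
    return sum(mu.values(), Fraction(0))

def is_subdistribution(mu):
    """Every mass lies in [0, 1] and the total weight is at most 1."""
    return all(0 <= p <= 1 for p in mu.values()) and weight(mu) <= 1

def is_distribution(mu):
    """A sub-distribution whose weight is exactly 1."""
    return is_subdistribution(mu) and weight(mu) == 1

# Hypothetical example: one quarter of the mass is missing.
mu = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(0)}
```

Here "c" is a key of the dictionary but not part of the support, since its mass is zero.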
A distribution is a sub-distribution with weight 1. The probability of an event P w.r.t. µ, written $\Pr_\mu[P]$ (or $\Pr[P]$ when µ is clear from the context), is defined as $\sum_{x \in A \mid P(x)} \mu(x)$. When Φ is an assertion (assuming that A ≡ State), we write Pr

Logic
Now that we have seen the programs, let us turn to the program logic. Our judgments are similar to standard Hoare logic with an additional numeric index representing the probability of failure. Concretely, the judgments are of the following form: $\vdash_\beta c : \Phi \Longrightarrow \Psi$, where Φ and Ψ are first-order formulas over the program variables representing the pre- and post-condition, respectively. We stress that Φ and Ψ are non-probabilistic assertions: they do not mention the probabilities of specific events, and will be interpreted as properties of individual memories rather than distributions over memories. This is reflected by the validity relation for assertions: m ⊨ Φ states that Φ is valid in the single memory m, rather than in a distribution over memories. Similarly, ⊨ Φ states that Φ is valid in all (single) memories. By separating the assertions from the probabilistic features of our language, the assertions are simpler and easier to manipulate. The index β is a non-negative real number (typically, from the unit interval [0, 1]). Now, we can define semantic validity for our judgments. In short, the index β will be an upper bound on the probability that the postcondition Ψ does not hold on the output distribution, assuming the precondition Φ holds on the initial memory.

Definition 2 (Validity).
A judgment $\vdash_\beta c : \Phi \Longrightarrow \Psi$ is valid if for every memory m such that m ⊨ Φ, we have: $\Pr_{[\![c]\!](m)}[\neg \Psi] \le \beta$.
We present the main proof rules of our logic in Figure 1. The rule for random sampling [Rand] allows us to assume a proposition Ψ about the random sample provided that Ψ fails with probability at most β. This is a semantic condition which we introduce as an axiom for each primitive distribution.
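The reading of the index β as a bound on the probability that the postcondition fails can be checked by brute force on a finite toy program. The sketch below assumes that a program is given as a Python function from a memory (a dictionary) to an explicit list of (memory, probability) pairs; the sampling command and all names are hypothetical.

```python
from fractions import Fraction

def failure_probability(out_dist, post):
    """Mass of the output memories that violate the postcondition."""
    return sum((p for m, p in out_dist if not post(m)), Fraction(0))

def is_valid(program, memories, pre, post, beta):
    """Check the judgment on an explicit finite set of initial memories."""
    return all(failure_probability(program(m), post) <= beta
               for m in memories if pre(m))

def sample_x(m):
    """Toy command: x is drawn uniformly from {1, ..., 12}."""
    return [({**m, "x": v}, Fraction(1, 12)) for v in range(1, 13)]

memories = [{"x": 0}]
pre = lambda m: True
post = lambda m: m["x"] <= 10  # negation of the bad event x > 10
```

With these definitions the judgment holds for index 1/6, since the bad event x > 10 has probability 2/12, and it fails for any smaller index such as 1/10.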
The remaining rules are similar to the standard Hoare logic rules, with special handling for the index. The sequence rule [Seq] states that the failure probabilities of the two commands add together; this is simply the union bound internalized in our logic. The conditional rule [If] assumes that the indices for the two branch judgments are equal (which can always be achieved via weakening), keeping the same index for the conditional. Roughly, this is because only one branch of the conditional is executed. The loop rule [While] simply accumulates the failure probability β throughout the iterations; the side conditions ensure that the loop terminates in at most k iterations except with probability k · β. To reason about procedure calls, standard (internal) procedure calls use the rule [Call], which substitutes the argument and return variables in the pre- and post-condition, respectively. External procedure calls use the rule [Ext]. We do not have access to the implementation of the procedure; we know just the type of the return value.
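The way the index propagates through these rules can be summarized as plain arithmetic. The following sketch only tracks indices and says nothing about assertions or side conditions; the function names and the sample numbers are assumptions chosen for illustration.

```python
from fractions import Fraction

def seq_index(beta_1, beta_2):
    """[Seq]: failure probabilities of the two commands add up."""
    return beta_1 + beta_2

def if_index(beta_then, beta_else):
    """[If]: weaken both branches to a common index, here the larger one."""
    return max(beta_then, beta_else)

def while_index(beta_body, k):
    """[While]: at most k iterations, each failing with probability beta_body."""
    return k * beta_body

# A loop of 5 iterations with index 1/100 per iteration,
# followed by a command with index 1/50.
loop = while_index(Fraction(1, 100), 5)
total = seq_index(loop, Fraction(1, 50))
```

Taking the maximum in if_index is one way to realize the weakening mentioned in the text; any common upper bound of the two branch indices would do.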
The structural rules are also similar to the typical Hoare logic rules. The weakening rule [Weak] allows strengthening the precondition and weakening the postcondition as usual, but also allows increasing the index; this corresponds to allowing a possibly higher probability of failure. We can show that our proof system is sound with respect to the semantics; the proof is deferred to the appendix.

Theorem 3 (Soundness). All derivable judgments $\vdash_\beta c : \Phi \Longrightarrow \Psi$ are valid.
In addition, we can define a sound embedding into Hoare logic in the style of Barthe et al. [5]. Assuming a fresh program variable $x_\beta$ of type ℝ, we can transform a command c such that $\vdash_\beta c : \Phi \Longrightarrow \Psi$ into a new, instrumented command together with a proof of a standard Hoare logic judgment for it. The instrumented command is obtained from c by replacing every probabilistic sampling $x \xleftarrow{\$} d(e)$ with a call to an abstract, non-probabilistic procedure x ← Sample(d(e)), whose specification models the postcondition of [Rand].
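The idea of the embedding can be illustrated with ghost code. The sketch below is only an approximation under stated assumptions: the exact specification of Sample is not reproduced here, so we assume that each call adds its index β to the ghost variable and returns some value satisfying the postcondition of the sampling rule. The toy program noisy_sum and the caller-supplied witnesses are hypothetical.

```python
from fractions import Fraction

class Ghost:
    """Holds the ghost variable x_beta that accumulates failure probability."""
    def __init__(self):
        self.x_beta = Fraction(0)

def sample(ghost, beta, post, witness):
    """Assumed stand-in for x <- Sample(d(e)).

    Charges beta to the ghost variable and returns a value satisfying the
    postcondition `post`. The value is supplied by the caller because the
    procedure is abstract and non-probabilistic."""
    assert post(witness), "Sample may only return values satisfying post"
    ghost.x_beta += beta
    return witness

def noisy_sum(ghost, k, beta, bound, witnesses):
    """Instrumented toy program: add k samples, each within `bound` of 0."""
    total = 0
    for i in range(k):
        total += sample(ghost, beta, lambda v: abs(v) <= bound, witnesses[i])
    return total

ghost = Ghost()
total = noisy_sum(ghost, 3, Fraction(1, 10), 2, [2, -1, 0])
```

After the run, the ghost variable holds 3 · 1/10, which is the index one would expect for three sampling steps composed sequentially, and the ordinary assertion |total| ≤ 6 holds without any probabilistic reasoning.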

Accuracy for differentially private programs
Now that we have presented our logic aHL, we proceed to verify several examples. Though our system applies to programs from many domains, we will focus on programs satisfying differential privacy, a statistical notion of privacy proposed by Dwork et al. [15]. At a very high level, these programs take private data as input and add random noise to protect privacy. (Interested readers should consult a textbook [14] for a more detailed presentation.) In contrast to existing formal verification work, which verifies the privacy property, we will verify accuracy. This is just as important as privacy: the constant function is perfectly private but not very useful. All of our example programs take samples from the Laplace distribution.
Definition 4. The (discrete) Laplace distribution $L_\epsilon(e)$ is parameterized by a scale parameter ε > 0 and a mean e. The distribution ranges over the real numbers {ν = k + e} for k an integer, releasing ν with probability proportional to $\exp(-\epsilon \cdot |\nu - e|)$.
This distribution satisfies a basic accuracy property, Lemma 5: for β ∈ (0, 1) and a sample ν from the distribution $L_\epsilon(e)$, the lemma bounds the distance |ν − e| except with probability β. Accordingly, a corresponding sampling rule, [LapAcc], is sound for our system for every β ∈ (0, 1).
Before presenting the examples, we will set some common notations and terminology. First, we consider a set db of databases, a set query of queries, and primitive functions on them (such as evalQ, which evaluates a query on a database), together with axioms relating them. Concretely, one can identify query with the functions db → ℝ and obtain an easy realization of these functions and axioms.
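To get a feel for the discrete Laplace distribution, the sketch below computes its normalized mass and exact tail probabilities. It assumes that the mass of ν is proportional to exp(−ε · |ν − e|) with integer offsets k = ν − e; the closed forms follow from summing a geometric series. It does not reproduce the specific bound of Lemma 5, and the function names are hypothetical.

```python
import math

def laplace_mass(k, eps):
    """Mass of the integer offset k = nu - e, normalized to sum to 1."""
    a = math.exp(-eps)
    return (1 - a) / (1 + a) * a ** abs(k)

def tail(t, eps):
    """Exact Pr[|nu - e| > t] for a real threshold t >= 0."""
    a = math.exp(-eps)
    # Offsets with |k| > t are exactly those with |k| >= floor(t) + 1.
    return 2 * a ** (math.floor(t) + 1) / (1 + a)

def smallest_radius(beta, eps):
    """Smallest integer t with Pr[|nu - e| > t] <= beta (linear search)."""
    t = 0
    while tail(t, eps) > beta:
        t += 1
    return t
```

For a fixed ε, the radius returned by smallest_radius grows as β shrinks, which matches the usual trade-off between error and failure probability.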
In some situations, we may need additional structure on the queries to prove the accuracy guarantees. In particular, a query q is linear if for every two databases d, d′, we have q(d + d′) = q(d) + q(d′) for a commutative and associative operator + on databases; and for the database $d_0$ that is the identity of +, we have $q(d_0) = 0$.
Concretely, we can identify db with the set of multisets, + with multiset union, and $d_0$ with the empty multiset.

Report-noisy-max
Our first example is the Report-noisy-max algorithm (see, e.g., Dwork and Roth [14]). Report-noisy-max is a variant of the exponential mechanism [32], which provides the standard way to achieve differential privacy for computations whose outputs lie in a finite (perhaps non-numeric) set R. Both algorithms perform the same computations, except that the exponential mechanism adds one-sided Laplace noise whereas Report-noisy-max adds regular Laplace noise. Thus, accuracy for both algorithms is verified in essentially the same way. We focus on Report-noisy-max to avoid defining one-sided Laplace.
Report-noisy-max finds an element of a finite set R that approximately maximizes some quality score function qscore, which takes as input an element r ∈ R and a database d. Operationally, Report-noisy-max computes the quality score for each element of R, adds Laplace noise, and returns the element with the highest (noisy) value. We can implement this algorithm with the following code, using syntactic sugar for arrays: The scale ε/2 of the Laplace distribution ensures an appropriate level of differential privacy under certain assumptions; we will not discuss privacy in the remainder. Theorem 6. Let β ∈ (0, 1), and let res ∈ R be the output of Report-noisy-max on input d and quality score qscore. Then, we have the following judgment: where |R| denotes the size of R. This corresponds to the existing accuracy guarantee for Report-noisy-max (see, e.g., Dwork and Roth [14]).
Roughly, this theorem states that while the result res may not be the element with the absolute highest quality score, its quality score is not far below the quality score of any other element. For a brief sanity check, note that the guarantee weakens as we increase the range R, or decrease the failure probability β.
The proof of accuracy is based on an instantiation of the rule [LapAcc] with e set to qscore(r, d), β set to β/|R|, and ε set to ε/2. First, we can show
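The operational description of Report-noisy-max can be sketched as follows. This is not the paper's code: it is a minimal Python rendering that, as an assumption, uses continuous Laplace noise (the difference of two exponential samples) in place of the discrete Laplace distribution, and takes an optional `noise` callback so that the control flow can be tested deterministically. All names are hypothetical.

```python
import random

def laplace_noise(rng, eps):
    """Continuous Laplace sample with density proportional to exp(-eps * |x|).

    The difference of two independent exponentials with rate eps has
    exactly this distribution."""
    return rng.expovariate(eps) - rng.expovariate(eps)

def report_noisy_max(candidates, qscore, d, eps, noise=None, rng=None):
    """Return the candidate whose noisy quality score is largest."""
    rng = rng or random.Random()
    noise = noise or (lambda: laplace_noise(rng, eps / 2))  # scale eps/2
    best, best_score = None, float("-inf")
    for r in candidates:
        noisy = qscore(r, d) + noise()
        if noisy > best_score:  # keep the current maximum
            best, best_score = r, noisy
    return best

# Hypothetical quality score: how often a candidate occurs in the database.
candidates = ["a", "b", "c"]
database = ["a", "b", "b", "c"]
count = lambda r, d: d.count(r)
```

With the noise switched off, the procedure returns the true maximizer; with noise, it returns some candidate whose score is close to the maximum with high probability, which is the kind of statement the accuracy theorem makes precise.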

In order to prove this judgment, the loop invariant quantifies over all previously seen r ∈ R.
Combined with a straightforward invariant showing that $r^*$ stores the index of the current maximum (noisy) score, the above judgment suffices to prove the accuracy guarantee for Report-noisy-max (Theorem 6).

Sparse Vector algorithm
Our second example is the Sparse Vector algorithm, which indicates which numeric queries take value (approximately) above some threshold value (see, e.g., Dwork and Roth [14]).
Simpler approaches can accomplish this task by releasing the noisy answer to all queries and then comparing with the threshold, but the resulting error then grows linearly with the total number of queries. Sparse Vector does not release the noisy answers, but the resulting error grows only logarithmically with the total number of queries, a substantial improvement.
The differential privacy property of Sparse Vector was recently formally verified [8]; here, we consider the accuracy property.
In the non-interactive setting, the algorithm takes as input a list of queries $q_1, q_2, \dots$, a database d, and a numeric threshold t ∈ ℝ. First, we add Laplace noise to the threshold t to calculate the noisy threshold T. Then, we evaluate each query $q_i$ on d, add Laplace noise, and check if the noisy value exceeds T. If so, we output ⊤; if not, we output ⊥.
Sparse Vector also works in the interactive setting. Here, the algorithm is fed one query at a time, and must process this query (producing ⊥ or ⊤) before seeing the next query. The input may be adaptive: future queries may depend on the answers to earlier queries.
We focus on the interactive version; the non-interactive version can be handled similarly to Report-noisy-max. We break the code into two pieces. The first piece initializes variables and computes the noisy threshold, while the second piece accepts a single new query and returns the answer.
proc SV.Init(T_in, ε_in) :
proc SV.Step(q) :
  a $← L_{ε/4}(evalQ(q, d));
  if (a < T) then {z ← ⊥;} else {z ← ⊤;}
  return z;
The main procedure performs initialization, and then enters into an interactive loop between the external procedure A, which supplies the queries, and the Sparse Vector procedure SV.Step:
  Step(q[u]); return ans;
Sparse Vector satisfies the following accuracy guarantee.
To prove this theorem, we first specify the procedures SV.Init and SV.Step, with one judgment for initialization and one for the interactive step. Combining these two judgments, we can prove accuracy for SV.main (Theorem 7).
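The interactive structure of Sparse Vector can be sketched in Python as follows. The step method follows the SV.Step pseudocode in the text (noise at scale ε/4, then a comparison with the noisy threshold). The body of SV.Init is not available here, so the noise scale ε/2 used for the threshold is an assumption, as are the class name, the `noise` callback, and the adversary interface.

```python
import random

BOT, TOP = "bot", "top"  # stand-ins for the two possible answers

def laplace(rng, eps):
    """Continuous Laplace noise with density proportional to exp(-eps * |x|)."""
    return rng.expovariate(eps) - rng.expovariate(eps)

class SparseVector:
    def __init__(self, d, eval_q, threshold, eps, noise):
        self.d, self.eval_q, self.eps, self.noise = d, eval_q, eps, noise
        # ASSUMPTION: the threshold is perturbed at scale eps / 2.
        self.T = threshold + noise(eps / 2)

    def step(self, q):
        """Answer one query: BOT if the noisy value is below T, else TOP."""
        a = self.eval_q(q, self.d) + self.noise(self.eps / 4)
        return BOT if a < self.T else TOP

def run(sv, adversary, rounds):
    """Interactive loop: the adversary sees all earlier answers."""
    answers = []
    for _ in range(rounds):
        q = adversary(answers)  # adaptive choice of the next query
        answers.append(sv.step(q))
    return answers

# Hypothetical setup: queries are functions from databases to numbers.
d = [1, 2, 3, 4]
eval_q = lambda q, d: q(d)
count_large = lambda d: sum(1 for v in d if v > 2)
adversary = lambda answers: len if answers else count_large
```

In a real run one would pass `noise=lambda eps: laplace(random.Random(), eps)` or similar; passing a zero-noise callback makes the adaptive control flow easy to inspect.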

Online Multiplicative Weights
Our final example demonstrates how we can use the union bound to analyze a complex combination of several interactive algorithms, yielding sophisticated accuracy proofs. We will verify the Online Multiplicative Weights (OMW) algorithm first proposed by Hardt and Rothblum [21] and later refined by Gupta et al. [20]. Like Sparse Vector, this interactive algorithm can handle adaptive queries while guaranteeing error logarithmic in the number of queries. Unlike Sparse Vector, OMW produces approximate answers to the queries instead of just a bit representing above or below threshold.
At a high level, OMW iteratively constructs a synthetic version of the true database. The user can present various linear queries to the algorithm, which applies the Sparse Vector algorithm to check whether the error of the synthetic database on this query is smaller than some threshold. If so, the algorithm simply returns the approximate answer. Otherwise, it updates the synthetic database using the multiplicative weights update rule to better model the true database, and answers the query by adding Laplace noise to the true answer. An inductive argument shows that after enough updates, the synthetic database must be similar to the true database on all queries. At this point, we can answer all subsequent queries using the synthetic database alone.

ICALP 2016 107:10 A program logic for union bounds
In code, the following procedure implements the Online Multiplicative Weights algorithm.
The main routine depends on the multiplicative weights subroutine (MW), which maintains and updates the synthetic database. Roughly, MW takes as input the current synthetic database and a query where the synthetic database gives an answer that is far from the true answer. Then, MW improves the synthetic database to better model the true database. Our implementation of MW consists of two subroutines: MW.init initializes the synthetic database, and MW.step updates the current database with a query that has high error. The code for these subroutines is somewhat technical, and we will not present it here.
Instead, we will present their specifications, which are given in terms of an expression Ψ(x, d) where x is the current synthetic database and d is the true database.We omit the definition of Ψ and focus on its three key properties: Ψ(x, d) ≥ 0; Ψ(x, d) is initially bounded for the initial synthetic database; and Ψ(x, d) decreases each time we update the synthetic database.
Functions satisfying these properties are often called potential functions.
The first property follows from the definition of Ψ, while the second and third properties are reflected by the specifications of the MW procedures. Concretely, we can bound the initial value of Ψ with the following specification for MW.init: We can also show that Ψ decreases with the following specification for MW.step: We make two remarks. First, these specifications crucially rely on the fact that q is a linear query. Second, both procedures are deterministic. For such procedures, the fragment of aHL with index β = 0 corresponds precisely to standard Hoare logic. Now, let us briefly consider the key points in proving the main specification (Theorem 8). First, the key part of the invariant for the main loop is Roughly, Ψ is initially at most log X by the specification for MW.init, and every time we call MW.step we decrease Ψ by at least $\alpha^2/4n^2$ if the update query up has error at least α. Since Ψ is always non-negative, we can find at most c queries with high error: after c updates, the synthetic database mwdb must give accurate answers on all queries.
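The counting argument behind the bound on the number of updates can be made concrete with a small calculation. The sketch below assumes only what the text states: the potential starts below some known bound (passed in as a plain number standing for log X), never becomes negative, and drops by at least α²/4n² on each update. The function names and the sample values are hypothetical.

```python
import math

def max_updates(initial_potential, alpha, n):
    """Largest number of updates compatible with a non-negative potential,
    if each update lowers the potential by at least alpha**2 / (4 * n**2)."""
    decrease = alpha ** 2 / (4 * n ** 2)
    return math.floor(initial_potential / decrease)

def simulate(initial_potential, alpha, n):
    """Count worst-case updates by subtracting the minimal decrease each time."""
    decrease = alpha ** 2 / (4 * n ** 2)
    potential, updates = initial_potential, 0
    while potential - decrease >= 0:
        potential -= decrease
        updates += 1
    return updates
```

Both functions agree on inputs where the division is exact; the closed form is the one that a bound such as c in the text would be derived from.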
Prior to making c updates, there are two cases for each query. If at least one of the Sparse Vector calls returns above threshold, we set the update query up to be q[u] if the approximate answer is too high, otherwise we set up to be the negated query neqQ(q[u]) if the approximate answer is too low. With this choice of update query, we can show that evalQ(up, mwdb) − evalQ(up, d) ≥ α so Ψ decreases by at least $\alpha^2/4n^2$. Then, we answer the original query q[u] by adding Laplace noise, so our answer is also within α of the true answer. Otherwise, if both Sparse Vector calls return below threshold, then the query q[u] is answered well by our approximation mwdb and there is no need to update mwdb or access the real database d.
The above reasoning assumes that Sparse Vector and the Laplace mechanisms are sufficiently accurate. To guarantee the former, notice that the Sparse Vector subroutine will process at most 2Q queries, so we assume that α is larger than the error $\alpha_{sv}$ guaranteed by Theorem 7 for 2Q queries and failure probability β/2. To guarantee the latter, notice that we sample Laplace noise at most c times, once for each update step, so we assume that α is larger than the error $\alpha_{lap}$ guaranteed by [LapAcc] for failure probability $\beta/(2c)$; by a union bound, all Laplace noises are accurate except with probability β/2. Taking $\alpha \ge \max(\alpha_{sv}, \alpha_{lap})$, both accuracy guarantees hold except with probability at most β, and we have the desired proof of accuracy for OMW (Theorem 8).
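The failure budget in this argument can be written out as a short calculation. The display below is a sketch of the arithmetic only, assuming that exactly the two sources of failure named in the text are charged: Sparse Vector with probability β/2, and each of the at most c Laplace samples with probability β/(2c).

```latex
\[
  \underbrace{\frac{\beta}{2}}_{\text{Sparse Vector}}
  \;+\;
  \underbrace{\sum_{j=1}^{c} \frac{\beta}{2c}}_{\text{Laplace samples}}
  \;=\; \frac{\beta}{2} + c \cdot \frac{\beta}{2c}
  \;=\; \frac{\beta}{2} + \frac{\beta}{2}
  \;=\; \beta .
\]
```

Splitting the budget evenly is a convenient choice rather than a necessity; any split whose parts sum to β would support the same union bound argument, with correspondingly different error terms.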

Related work
The semantics of probabilistic programming languages has been studied extensively since the late 70s. Kozen's seminal paper [28] studies two semantics for a core probabilistic imperative language. Other important work investigates using monads to structure the semantics of probabilistic languages; e.g., Jones and Plotkin [24]. More recent works study the semantics of probabilistic programs for applications like statistical computations [9], probabilistic inference for machine learning [10], probabilistic modeling for software defined networks [17], and more.
Likewise, deductive techniques for verifying probabilistic programs have a long history. Ramshaw [35] proposes a program logic with basic assertions of the form Pr[E] = p. Hart et al. [22], Sharir et al. [39] propose a method using intermediate assertions and invariants for proving general properties of probabilistic programs. Kozen [29] introduces PPDL, a logic that can reason about expected values of general measurable functions. Morgan et al. [34] (see McIver and Morgan [31] for an extended account) propose a verification method based on computing greatest pre-expectations, a probabilistic analogue of Dijkstra's weakest pre-conditions. Hurd et al. [23] formalize their approach using the HOL theorem prover. Other approaches based on interactive theorem provers include the work of Audebaud and Paulin-Mohring [1], who axiomatize (discrete) probability theory and verify some examples of randomized algorithms using the Coq proof assistant. Gretz et al. [19] extend the work of Morgan et al. [34] with a formal treatment of conditioning. More recently, Rand and Zdancewic [36] formalize another Hoare logic for probabilistic programs using the Coq proof assistant. Barthe et al. [6] implement a general-purpose logic in the EasyCrypt framework, and verify a representative set of randomized algorithms. Kaminski et al. [25] develop a weakest precondition logic to reason about expected run-time of probabilistic programs.
Most of these works support general probabilistic reasoning and additional features like non-determinism, so they most likely could formalize the examples that we consider. However, our logic aHL aims at a sweet spot in the design space, combining expressivity with simplicity of the assertion language. The design of aHL is inspired by existing relational program logics, such as pRHL [3] and apRHL [4]. These logics support rich proofs about probabilistic properties with purely non-probabilistic assertions, using a powerful coupling abstraction from probability theory [7] rather than the union bound.
Finally, there are many algorithmic techniques for verifying probabilistic programs. Probabilistic model-checking is a successful line of research that has delivered mature and practical tools and addressed a broad range of case studies; Baier and Katoen [2], Katoen [26], Kwiatkowska et al. [30] cover some of the most interesting developments in the field. Abstract interpretation of probabilistic programs is another rich source of techniques; see, e.g., Cousot and Monerau [13], Monniaux [33]. Katoen et al. [27] infer linear invariants for the pGCL language of Morgan et al. [34]. There are several approaches based on martingales for reasoning about probabilistic loops; Chakarov and Sankaranarayanan [11,12] use martingales for inferring expectation invariants, while Ferrer Fioriti and Hermanns [16] use martingales for analyzing probabilistic termination. Sampson et al. [38] use a mix of static and dynamic analyses to check probabilistic assertions for probabilistic programs.

Conclusion and perspective
We propose aHL, a lightweight probabilistic Hoare logic based on the union bound. Our logic can prove properties about bad events in cryptography and accuracy of differentially private mechanisms. Of course, there are examples that we cannot verify. For instance, reasoning involving independence of random variables, a common tool when analyzing randomized algorithms, is not supported. Accordingly, a natural next step is to explore logical methods for reasoning about independence, or to embed aHL into a more general system like pGCL.
This appendix details the definitions missing from the main body of the paper, along with the proof of soundness of the presented logic.

A A bit more on discrete distributions
We start by defining some standard sub-distributions that are needed for giving the denotational semantics of our language.
Definition 9. Let T be some set and x ∈ T. We denote by $1^T_x \in \mathrm{Distr}(T)$ (resp. $0^T \in \mathrm{Distr}(T)$) the Dirac distribution over T centered on x (resp. the null sub-distribution over T): $1^T_x(y) = 1$ if y = x and $1^T_x(y) = 0$ otherwise, and $0^T(y) = 0$ for every y ∈ T. We write $1_x$ and 0, stripping T, when it is clear from the context. Let T and U be two sets. We denote by x←µ E(x), where µ is a sub-distribution over T and E : T → Distr(U), the sub-distribution with mass function $\lambda v.\ \sum_{x \in T} E(x)(v)\, \mu(x)$.
It is convenient to introduce the notion of restriction of a distribution.
Definition 10 (Restriction of a sub-distribution). Let µ be a sub-distribution over T, and let P be a predicate over T. Then, the restriction of µ to P is defined as $\mu_{|P}(x) = \mu(x)$ if P(x) holds and $\mu_{|P}(x) = 0$ otherwise. From the definition, it is clear that $\Pr_{\mu_{|P}}$
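For finitely supported sub-distributions, the constructions of this section can be sketched with dictionaries. The representation, the function names, and the reading of restriction as "keep the mass where the predicate holds and drop it elsewhere" are assumptions made for illustration.

```python
from fractions import Fraction

def dirac(x):
    """Dirac distribution centered on x."""
    return {x: Fraction(1)}

def null():
    """Null sub-distribution: no element carries any mass."""
    return {}

def bind(mu, kernel):
    """Mass function v -> sum over x of kernel(x)(v) * mu(x)."""
    out = {}
    for x, p in mu.items():
        for v, q in kernel(x).items():
            out[v] = out.get(v, Fraction(0)) + q * p
    return out

def restrict(mu, pred):
    """Keep the mass of elements satisfying pred; the rest gets mass 0."""
    return {x: p for x, p in mu.items() if pred(x)}

# Hypothetical example: flip a fair coin x, then move to x or x + 1 uniformly.
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}
step = lambda x: {x: Fraction(1, 2), x + 1: Fraction(1, 2)}
```

Binding the coin with the step kernel yields mass 1/4, 1/2, 1/4 on 0, 1, 2, and binding a Dirac distribution with a kernel simply applies the kernel.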

B Denotational semantics
We now give the denotational semantics of our language. We start by interpreting the expressions and distribution expressions, and then move to the interpretation of commands.

B.1 Types, expressions and distribution expressions
We fix a set T = {τ, σ, ...} of types. We assume that T contains at least the unit type (unit), along with the types for booleans (bool) and integers (int). We now move to the interpretation of types, expressions and distribution expressions.

B.4 Denotational semantics
A memory m is any map of type $\Pi(x : \mathrm{Vars} \cup \{a\}).\ [\![\tau_x]\!]$, where a is a special variable dedicated to the storage of the (shared) state of the external procedures, with an abstract type $\tau_a = \mathbb{A}$ associated to it. Note that memories can be considered as valuations, simply forgetting the binding for a. For any external procedure A taking a parameter of type τ and returning a value of type σ, we assume given an interpretation $[\![A]\!] : \mathbb{A} \times [\![\tau]\!] \to \mathrm{Distr}(\mathbb{A} \times [\![\sigma]\!])$.
Finally, if f is an internal procedure, we denote by f.arg (resp. f.body, f.res) the argument name (resp. the body, the return expression) of f.
Definition 12 (Denotational Semantics). The denotational semantics of a command maps a memory to a sub-distribution over memories and is given in Figure 2, where abort is an extra command that never returns, and (if e then c else ) n ⊥ is inductively defined by:

µ[Φ] for $\Pr_\mu[\lambda m.\ m \models \Phi]$. Likewise, when v ∈ A, we write $\Pr_\mu[v]$ for $\Pr_\mu[\lambda x.\ x = v]$. Commands are interpreted as functions from memories to sub-distributions over memories, where memories are finite maps from program and external variables to values. More formally, if State is the set of memories then the interpretation of c, written $[\![c]\!]$, is a function from State to Distr(State), where Distr(T) denotes the set of discrete sub-distributions over T. The definition of $[\![c]\!]$ enforces the separation between the internal and external states: only commands performing external procedure calls can act on the external memory. The interpretation of external procedure calls is parameterized by functions, one for each external procedure, of type $\mathrm{State}_{|A} \to \mathrm{Distr}(\mathrm{State}_{|A})$, where $\mathrm{State}_{|A}$ is the set of memories restricted to the external variables. Thus, external procedures can only access the external memory.

Figure 1 Selected proof rules.
For a variable x ∈ Vars, we denote the type associated to x by $\tau_x$. Moreover, for τ ∈ T, we write $\mathrm{Vars}_\tau$ for the subset $\{x \in \mathrm{Vars} \mid \tau_x = \tau\}$ of Vars, and require that it is infinite. We also assume given a set O of operators and a set $O_D$ of distribution operators. To each operator o ∈ O is associated an arity $o : [\tau_i]_{i \le n} \to \tau$, where $[\tau_i]_i$ is the domain of o and τ its codomain. Likewise, to each distribution operator $d \in O_D$ is associated an arity d : τ → σ, meaning that d is a distribution over σ parameterized by a value of type τ. We can now give the syntax of expressions and distribution expressions:
Definition 11 (Expressions & distribution expressions). The set of expressions of type τ, written $E_\tau$, is defined by: $E_\tau ::= x \in \mathrm{Vars}_\tau \mid o(e_1, \dots, e_n)$ with $o : [\tau_i]_i \to \tau$ and $\forall i.\ e_i \in E_{\tau_i}$. Likewise, the set of distribution expressions over τ is defined by: $D_\tau ::= d(e)$ with d : σ → τ and $e \in E_\sigma$.

Figure 2 Denotational Semantics
$\beta_I$ coming from $\vdash_{\beta_I} c : I \Longrightarrow I$, whereas the comparison to 0 is a direct consequence of $m \models I \wedge e \wedge e_v = k \to e_v < k$, obtained by instantiation of the logical premises.
[Ext] We have c ≡ x ← A(e), Φ ≡ ∀v. Ψ[v/x] and β = 0. Let m ⊨ Φ. Then,
For any operator o with arity $[\tau_i]_i \to \tau$, we assume given $[\![o]\!] : \times_i [\![\tau_i]\!] \to [\![\tau]\!]$. The interpretation of an expression e w.r.t. a typed valuation $\rho : \Pi(x : \mathrm{Vars}).\ [\![\tau_x]\!]$ (i.e. w.r.t. a function that associates a value in $[\![\tau_x]\!]$ to any variable x ∈ Vars) is defined as usual: $[\![x]\!]_\rho = \rho(x)$ and $[\![o(e_1, \dots, e_n)]\!]_\rho = [\![o]\!]([\![e_1]\!]_\rho, \dots, [\![e_n]\!]_\rho)$. If $e \in E_\tau$, we have that $[\![e]\!]_\rho \in [\![\tau]\!]$. Likewise, to any distribution operator d : σ → τ is associated a function $[\![d]\!]$ from $[\![\sigma]\!]$ to $\mathrm{Distr}([\![\tau]\!])$. The interpretation of a distribution expression d(e) w.r.t. a valuation ρ, written $[\![d(e)]\!]_\rho$, is defined by $[\![d(e)]\!]_\rho = [\![d]\!]([\![e]\!]_\rho)$.
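The interpretation of expressions and distribution expressions is a plain structural recursion, which the following sketch spells out. The encoding is an assumption made for illustration: a variable is a Python string, an operator application is a tuple whose first component names the operator, and the operator tables OPS and DIST_OPS are hypothetical.

```python
from fractions import Fraction
import operator

# Hypothetical operator interpretations.
OPS = {"add": operator.add, "mul": operator.mul, "lt": operator.lt}
# Hypothetical distribution operator: uniform over {0, ..., n - 1}.
DIST_OPS = {"uniform_below": lambda n: {i: Fraction(1, n) for i in range(n)}}

def interp(e, rho):
    """Interpret an expression under the valuation rho."""
    if isinstance(e, str):          # variable: look it up in the valuation
        return rho[e]
    op, *args = e                   # operator applied to sub-expressions
    return OPS[op](*(interp(a, rho) for a in args))

def interp_dist(de, rho):
    """Interpret d(e): evaluate e, then apply the interpretation of d."""
    d, e = de
    return DIST_OPS[d](interp(e, rho))

rho = {"x": 3, "y": 4}
```

For instance, the expression ("lt", ("add", "x", "y"), ("mul", "x", "x")) compares x + y with x · x under the valuation, and ("uniform_below", ("add", "x", "y")) denotes the uniform distribution over seven values.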