Information Leakage of Non-Terminating Processes

In recent years, quantitative security techniques have been providing effective measures of the security of a system against an attacker. Such techniques usually assume that the system produces a finite amount of observations based on a finite amount of secret bits and terminates, and the attack is based on these observations. By modeling systems with Markov chains, we are able to measure the effectiveness of attacks on non-terminating systems. Such systems do not necessarily produce a finite amount of output and are not necessarily based on a finite amount of secret bits. We provide characterizations and algorithms to define meaningful measures of security for non-terminating systems, and to compute them when possible. We also study the bounded versions of the problems, and show examples of non-terminating programs and how their effectiveness in protecting their secret can be measured.


Introduction
Information-theoretical quantitative security techniques evaluate the effectiveness of a system in protecting a secret it depends on. Given a known finite size of a secret in bits, they quantify how many bits of the secret can be inferred by an attacker able to observe the system's output. This value is referred to as information leakage, or just leakage. Leakage quantification techniques have been successfully applied to security problems, including proving the effectiveness of bug fixes to the Linux kernel [10], quantifying anonymity [6], and analyzing side channel attacks to the cache of a processor [11].
The theory behind these techniques commonly assumes that the program under analysis terminates at some point, and the computed leakage corresponds to the amount of information that the attacker gains at program termination time. When the secret does not change during computation, a program can be modeled as a channel matrix, assigning the conditional probability of each possible output for each possible input. The size of the channel matrix is usually exponential in both secret and output size. We have previously proposed the use of Markovian models instead to overcome this problem [4].
Markovian models can also be conveniently used to model non-terminating processes, something finite-size channel matrices cannot do. This allows us to study leakage properties of systems like webservices, server modules and operating system daemons.
In this paper we provide techniques and algorithms to quantify the Shannon leakage and leakage rate of non-terminating processes. Shannon leakage has a clear operational significance related to the number of attempts that an attacker has to try to guess a secret [13]. Other measures exist, modeling different security properties (e.g. [16]). Our contributions are:
We characterize program-attacker scenarios according to the finiteness of the system's secret and the finiteness of the attacker's observation. We show how this characterization influences the finiteness of the information leakage in the scenario.
We provide a method to compute information leakage of a scenario with either an infinite secret or an infinite observation. Such scenarios cannot even be modeled with a finite size channel matrix. We demonstrate this method with an example.
We provide a method to compute the rate of information leakage per time unit, when the leakage itself is infinite. This is the case when a scenario has an infinite secret and an infinite observation, as is common e.g. in webservices. We demonstrate this method using a mix node as an example.
We provide an algorithm to compute how much information is leaked from a given time to another given time.
We show that determining the exact time at which a given amount of information is leaked is hard, by reduction from Skolem's problem, whose decidability is a long-standing open question.

We distinguish four possible scenarios, according to whether the observation by the attacker is finite or infinite and whether the secret itself is finite or infinite. The cases are summarized in Fig. 1. The case with finite observation over a program depending on a finite secret is the terminating case we considered previously [4], while the others will be considered in this paper. When only one of observation or secret is finite the leakage is finite but cannot be computed using the method we introduced previously [4], thus we provide a new technique in Section 3. When both observation and secret are infinite, the leakage is potentially infinite. In this case we compute the rate of leakage, i.e. the amount of information leaked for each time unit. Intuitively, this quantifies the average amount of information the attacker infers for each time unit over an infinite time. This is presented in Section 4. In Section 5 we analyze how much information is leaked in a given time frame and how much time it takes to leak a given amount of information. Section 6 concludes the paper and discusses related work.

Background
We refer to literature [8] for the definitions of sample space S, probability of event P(E) and so on. X is a discrete stochastic process if it is an indexed infinite sequence of discrete random variables $(X_1, X_2, \ldots)$. A Markov chain on a sample space S can also be defined as follows:
◮ Definition 1. A tuple $C = (S, s_0, P)$ is a Markov Chain (MC), if S is a finite set of states, $s_0 \in S$ is the initial state and P is an $|S| \times |S|$ probability transition matrix, so $\forall s, t \in S.\ P_{s,t} \geq 0$ and $\forall s \in S.\ \sum_{t \in S} P_{s,t} = 1$.
The probability of transitioning from any state s to a state t in k steps can be found as the entry of index (s, t) in $P^k$ [8]. We write $\pi^{(k)}$ for the probability distribution vector over S at time k and $\pi^{(k)}_s$ for the probability of visiting the state s at time k; note that $\pi^{(k)} = \pi^{(0)} P^k$, where $\pi^{(0)}_s$ is 1 if $s = s_0$ and 0 otherwise. A probability distribution $\pi$ over the states of the chain is stationary if $\pi = \pi P$. Given an initial distribution $\pi^{(0)}$ we compute the unique stationary limit distribution $\mu$ as $\mu = \lim_{k \to \infty} \pi^{(0)} P^k$. We write $\xi_s$ for the expected residence time of state $s \in S$: $\xi_s = \sum_{k=0}^{\infty} P^k_{s_0,s}$. A state $s \in S$ is absorbing if $P_{s,s} = 1$. In the figures we do not draw the looping transition of the absorbing states, to reduce clutter.
We will enrich our Markovian models with a finite set V of natural-valued variables, and for simplicity we assume that there is a very large finite bit-size M such that a variable is at most M bits long. We define an assignment function $A : S \to [0, 2^M - 1]^{|V|}$ assigning to each state the values of the variables in that state. We write v(s) to denote the value of the variable $v \in V$ in the state $s \in S$. Consider a stochastic process representing the value of a variable v over time, derived from the behavior of a Markov chain labeled with valuations of this variable. We will call this process the marginal process, or just marginal, $C_{|v}$ on v, formally:
◮ Definition 2. Let $C = (S, s_0, P)$ be a Markov chain and $v \in V$ a variable. Then we define the marginal process $C_{|v}$ of C on v as a stochastic process $(v_1, v_2, \ldots)$ where, for every n, $v_n$ is the value of v in the state visited by C at time n.
We will use v to denote the marginal process when it is clear from the context that we refer to it. Note that $C_{|v}$ is not necessarily a Markov chain. When it is, it can be drawn like in Fig. 3bcd. In the paper we will allow assignments of sets of values to variables and marginals on sets of variables; such extensions are straightforward, since multiple variables can be seen as a single variable on their product space. Assume that the system modeled by C has a single secret variable h and a single observable variable o. Then the distributions over the marginal processes $C_{|h}$ and $C_{|o}$ model the behavior of the secret and observable variable respectively at each time step, and their correlation quantifies the amount of information about the secret that can be inferred by observing the observable variable.
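Entropy is a measure of the uncertainty of a probability distribution. The following definitions are standard:
◮ Definition 3 ([8]). Let X and Y be two random variables with probability mass functions p(x) and p(y) respectively and joint probability mass function p(x, y). Then we define the following non-negative real-valued functions:
Entropy: $H(X) = -\sum_{x} p(x) \log_2 p(x)$
Joint entropy: $H(X, Y) = -\sum_{x,y} p(x, y) \log_2 p(x, y)$
Conditional entropy (chain rule): $H(X | Y) = H(X, Y) - H(Y)$
Mutual information: $I(X; Y) = H(X) + H(Y) - H(X, Y)$
Mutual information can be generalized to two vectors of random variables $\bar{X}, \bar{Y}$ as $I(\bar{X}; \bar{Y}) = H(\bar{X}) + H(\bar{Y}) - H(\bar{X}, \bar{Y})$.
◮ Definition 4 ([8]). Let $X = (X_1, X_2, \ldots)$ and $Y = (Y_1, Y_2, \ldots)$ be two stochastic processes. Then we define the following non-negative real-valued functions:
Entropy: $H(X) = \lim_{k \to \infty} H(X_1, \ldots, X_k)$
Mutual information: $I(X; Y) = \lim_{k \to \infty} I(X_1, \ldots, X_k; Y_1, \ldots, Y_k)$
Entropy rate: $\bar{H}(X) = \lim_{k \to \infty} \frac{1}{k} H(X_1, \ldots, X_k)$ when the limit exists
Mutual information rate: $\bar{I}(X; Y) = \lim_{k \to \infty} \frac{1}{k} I(X_1, \ldots, X_k; Y_1, \ldots, Y_k)$ when the limit exists
Entropy and mutual information of stochastic processes always exist, as shown in Section 3. Entropy rate and mutual information rate may not exist in general, but exist when the stochastic processes are Markov chains [8]; we discuss them in Section 4.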
Since every state s in a MC $C = (S, s_0, P)$ has a discrete probability distribution over the successor states we can calculate the entropy of this distribution, the local entropy:
◮ Definition 5. Let $C = (S, s_0, P)$ be a Markov chain. Then for each state $s \in S$ we define the local entropy of s as $L(s) = -\sum_{t \in S} P_{s,t} \log_2 P_{s,t}$.
Note that $L(s) \leq \log_2(|S|)$ [5]. If a stochastic process is a Markov chain C, its entropy H(C) can be computed by considering the local entropy L(s) as the expected reward of a state s and then computing the expected total reward of the chain [5]: $H(C) = \sum_{s \in S} L(s) \xi_s$. It is also known that the entropy rate can be computed similarly by summing the local entropies of each state weighted by the state's probability in the limiting distribution [8]: $\bar{H}(C) = \sum_{s \in S} L(s) \mu_s$.
In this paper we use information theory to compute the amount of bits of a secret variable h that can be inferred by an attacker able to observe the value of an observable variable o at any moment in time. We call this amount Shannon leakage or just leakage, and it corresponds to the mutual information $I(C_{|o}; C_{|h})$ between the marginal on the secret and the marginal on the observable variable.
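The expected-total-reward computation $H(C) = \sum_{s} L(s) \xi_s$ can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the chain is terminating (every recurrent state is absorbing), the residence times are accumulated by iterating $\pi^{(k)} = \pi^{(0)} P^k$ rather than by an exact method, and the 7-state matrix below is a hypothetical encoding of a chain that first picks a fair bit h and then a bit that is 0 with probability 0.75.

```python
from math import log2

def local_entropy(row):
    """L(s): entropy in bits of the successor distribution of one state."""
    return -sum(p * log2(p) for p in row if p > 0)

def chain_entropy(P, s0, tol=1e-12, max_steps=100_000):
    """H(C) = sum_s L(s) * xi_s for a chain whose recurrent states are absorbing.

    xi_s is accumulated as sum_k pi^(k)_s over the non-absorbing states only;
    absorbing states have L(s) = 0 and contribute nothing.
    """
    n = len(P)
    live_states = [s for s in range(n) if P[s][s] < 1.0]
    xi = [0.0] * n
    pi = [1.0 if s == s0 else 0.0 for s in range(n)]
    for _ in range(max_steps):
        if sum(pi[s] for s in live_states) < tol:
            break
        for s in live_states:
            xi[s] += pi[s]
        pi = [sum(pi[s] * P[s][t] for s in range(n)) for t in range(n)]
    else:
        raise ValueError("probability mass not absorbed within max_steps")
    return sum(local_entropy(P[s]) * xi[s] for s in live_states)

# Hypothetical chain: state 0 picks h, states 1-2 pick the second bit,
# states 3-6 are absorbing final states.
P = [
    [0, 0.5, 0.5, 0, 0, 0, 0],
    [0, 0, 0, 0.75, 0.25, 0, 0],
    [0, 0, 0, 0, 0, 0.75, 0.25],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 1],
]
H = chain_entropy(P, s0=0)
print(round(H, 4))
```

Here the initial state is visited once and each intermediate state with probability 0.5, so the result is 1 bit for the fair choice plus the entropy of a (0.75, 0.25) choice.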
Operationally, Shannon leakage is related to the number of attempts that an attacker has to make to guess the value of the secret. Other leakage measures exist, but Shannon leakage is the only one for which the chain rule of Definition 3 holds; since the chain rule is used in many results in this work, we do not expect such results to extend to other leakage measures.
The modeling of a process as a Markov chain in our context starts by dividing the variables into private and public variables. Private variables, including the secret variable h, are the ones whose value is not defined at compilation time. In each state of the Markov chain a set of allowed values is assigned to each private variable. Public variables, including the observable variable o and the program counter pc, are variables whose value is known to the analyst. In each state a given value is assigned to each public variable.
Given the source code and a prior distribution over the private variables, we have enough information to build a Markov chain representing the semantics, since for each state we can determine its successor states and the corresponding transition probabilities. We show a simple example, and refer to [4] for the complete semantics. The source code for the example is shown in Fig. 2 and the corresponding Markov chain semantics in Fig. 3a.
Let h be a secret bit, o an observable bit and r a random bit being assigned the value 0 with probability 0.75 and 1 otherwise. We assign to o the result of the exclusive OR between h and r and terminate. We want to quantify the amount of information about h that can be inferred by knowing the value of o.
To compute the leakage we need to compute three marginals from the Markov chain semantics:
Joint marginal: The joint marginal process $C_{|(o,h)}$ models the joint behavior of the secret and observable variables. It is shown in Fig. 3b.
Secret's marginal: The secret's marginal process $C_{|h}$ models the behavior of the secret variable. It is shown in Fig. 3c.
Observer's marginal: The observer's marginal process $C_{|o}$ models the behavior of the observable variable. It is shown in Fig. 3d.
Finally we compute the mutual information between the secret and observable variable using the formula $I(o; h) = H(C_{|o}) + H(C_{|h}) - H(C_{|(o,h)}) \approx 0.1887$ bits, proving that the program leaks ≈ 0.1887 bits, or 18.87% of the secret.
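As a quick check of the figure above, the three entropies can be computed directly from the joint distribution of (o, h). The sketch below assumes a uniform prior on h, which is consistent with 0.1887 bits being 18.87% of a one-bit secret; the helper name `entropy` is ours and not part of the paper.

```python
from math import log2

def entropy(dist):
    """Shannon entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

# Assumption: uniform prior on the secret bit h.
prior_h = {0: 0.5, 1: 0.5}
# From the text: r is 0 with probability 0.75 and 1 otherwise.
dist_r = {0: 0.75, 1: 0.25}

# Joint distribution of (o, h) with o = h XOR r.
joint = {}
for h, ph in prior_h.items():
    for r, pr in dist_r.items():
        o = h ^ r
        joint[(o, h)] = joint.get((o, h), 0.0) + ph * pr

# Marginals on o and on h.
marg_o, marg_h = {}, {}
for (o, h), p in joint.items():
    marg_o[o] = marg_o.get(o, 0.0) + p
    marg_h[h] = marg_h.get(h, 0.0) + p

leakage = entropy(marg_o) + entropy(marg_h) - entropy(joint)
print(round(leakage, 4))  # 0.1887
```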

Non-terminating Processes with Finite Leakage
The Markov chain semantics of the system describes the joint behavior of all variables. To compute information leakage we are only interested in the secret and the observable variables, so we can restrict to them only for simplicity. We assume that the system has a single secret variable h with uniform prior distribution and a single observable variable o, but the procedure does not change for multiple secret or observable variables. We remark that, even though the attacker can perform multiple observations, we do not model the case in which the attacker actually interacts with the system. In such case directed information would have to be used as the leakage metric instead of mutual information. We refer to Alvim et al. [2] for the details.
The behavior of the secret variable h is modeled by the marginal $C_{|h}$, and similarly the behavior of o is modeled by $C_{|o}$ and the joint behavior of the two variables by $C_{|o,h}$. The following theorem shows the existence of the entropy values of such marginals and a sufficient condition for the finiteness of their mutual information:
◮ Theorem 6. Let $C = (S, s_0, P)$ be a Markov chain with secrets and observations and $C_{|o}$ and $C_{|h}$ its marginals on the observable and secret variables, respectively. Then the entropies $H(C_{|o})$ and $H(C_{|h})$ exist, possibly with value ∞, and if at least one of them is finite then the mutual information $I(C_{|o}; C_{|h})$ is finite.
If $H(C_{|o}) < \infty$ and $H(C_{|h}) = \infty$ then there is an infinite number of secret bits but only a finite amount of observations we can analyze. In the opposite case where $H(C_{|o}) = \infty$ and $H(C_{|h}) < \infty$ we can analyze an infinite number of observations, but there is only a finite amount of secret bits to be discovered.
It follows that if either the observation or the secret is infinite but not both, the formula $I(C_{|o}; C_{|h}) = H(C_{|o}) + H(C_{|h}) - H(C_{|o,h})$ will produce an indeterminate form ∞ − ∞ and thus cannot be directly used to compute the leakage. Nonetheless, in both cases leakage has a finite value by Theorem 6.
If any marginal is a Markov chain, it is possible to compute its entropy in polynomial time in the size of the chain [5]. Otherwise, consider that the entropies of the marginals are limit computations, since $H(C_{|v}) = \lim_{k \to \infty} H(v_1, \ldots, v_k)$. This allows us to compute mutual information using the limit of the entropies of the marginal processes: $I(C_{|o}; C_{|h}) = \lim_{k \to \infty} \left( H(o_1, \ldots, o_k) + H(h_1, \ldots, h_k) - H(o_1, h_1, \ldots, o_k, h_k) \right)$. The limit above computes information leakage in any case, but is not always the most efficient option available. When it is known that the secret (resp. observation) is finite, it is more efficient to use the formula $I(C_{|o}; C_{|h}) = H(C_{|h}) - \lim_{k \to \infty} H(h_1, \ldots, h_k | o_1, \ldots, o_k)$ (resp. $H(C_{|o}) - \lim_{k \to \infty} H(o_1, \ldots, o_k | h_1, \ldots, h_k)$). Remember that when both secret and observation are finite the process terminates, so the procedure we proposed in [4] can be used with some additional assumptions.

Example: A Non-terminating Program on a Finite Secret
We now solve a case in which the secret is finite and Markovian and the observation infinite. Consider a program (Figure 4) with a secret bit h. If h is 0 the program produces an infinite string of zeroes and ones with the same probability 0.5, starting with a zero. If h is 1 the program also produces a string of zeroes and ones starting with a zero, but the probability that it will produce a zero is 0.75. Note that this program cannot be encoded as a finite channel matrix, as it has an infinite amount of possible outputs.
An attacker may be able to observe this infinite string and infer information about the secret by studying the frequencies of zeroes and ones. The attacker starts with no knowledge of the secret, which is encoded as an initial uniform distribution over the secret bit h. Reasonably, an attacker observing the output for an infinite time would be able to decide whether the frequency of zeroes is 0.5 or 0.75 and infer the value of h consequently. The Markov chain semantics for it is shown in Fig. 4a on the right.
Since $h_1 = h_2 = h_3 = \ldots$ we will just call it h. The behavior of h is modeled by the Markov chain in Fig. 4b on the right, and its entropy is $H(h) = 1$ bit. We compute $H(h | o_1, \ldots, o_k)$ for $k \to \infty$. Note that at time 1 o is always 0, then it changes randomly depending on the value of h. We will write down the joint distribution of h and o as a function of k and use it to compute the marginal over o and finally the conditional entropy.
The joint distribution of h and o is shown in the Appendix due to space constraints. Now let $w_k \in \{0, 1\}^k$ be a sequence of k bits. Consider the formula for conditional entropy: $H(h | o_1, \ldots, o_k) = \sum_{w_k} P(o_1 \ldots o_k = w_k) \, H(h | o_1 \ldots o_k = w_k)$. In our case it holds that $\lim_{k \to \infty} H(h | o_1, \ldots, o_k) = 0$, so $I(o; h) = H(h) - \lim_{k \to \infty} H(h | o_1, \ldots, o_k) = 1$ bit. The leakage of the program in Fig. 4 is 1 bit, proving that an attacker able to analyze the bit streams produced by the system will eventually learn the value of the secret h with an arbitrary confidence. Note that this considers an attacker able to observe the system for an infinite time.
More importantly, note that the marginal process of o is not a Markov chain. This depends on the fact that the joint distribution depends also on the information that the attacker has about h, so while the attacker gathers information about o and h the joint distribution changes and thus the marginal distribution of o changes also. Nonetheless, the marginal process can be represented in a closed form like the one in Fig. 6d.
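The convergence of the leakage of this example to 1 bit can be observed numerically. The sketch below is not the derivation from the Appendix: it relies on the observation that, after the fixed first zero, the attacker's posterior on h depends only on the number of zeroes among the remaining k − 1 output bits, so sequences can be grouped by that count. The function names are ours, and the uniform prior on h is the one stated in the text.

```python
from math import comb, log2

def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) distribution."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def leakage_after(k):
    """I(h; o_1..o_k) in bits, computed as H(h) - H(h | o_1..o_k) with H(h) = 1."""
    n = k - 1  # o_1 is always 0, so only k - 1 bits carry information
    cond_entropy = 0.0
    for z in range(n + 1):  # z = number of zeroes among the informative bits
        p_h0 = 0.5 * comb(n, z) * 0.5 ** n                    # P(h = 0, z zeroes)
        p_h1 = 0.5 * comb(n, z) * 0.75 ** z * 0.25 ** (n - z)  # P(h = 1, z zeroes)
        p_z = p_h0 + p_h1
        cond_entropy += p_z * binary_entropy(p_h0 / p_z)
    return 1.0 - cond_entropy

for k in (1, 2, 10, 50, 200):
    print(k, leakage_after(k))
```

The printed values start at 0 for k = 1, since the first output bit is independent of h, and increase towards 1 as k grows.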

Leakage Rate of a Markov Chain
In the case in which $H(C_{|o}) = \infty$ and $H(C_{|h}) = \infty$, i.e. when the secret is an infinite number of bits and the observer can observe it for an infinite time, then the leakage I(o, h) can be infinite. In this case it is more interesting to compute how much information the process leaks for each time step. This quantity is known as leakage rate, and corresponds to the mutual information rate of the secret and observable.
Note that the computation of leakage as a rate over time assumes that the attacker is able to keep track of the discrete time, so in this section we will assume that every constant-time operation takes 1 time step. This can be equivalently stated as saying that all transitions between states of the Markov chain semantics represent observable steps.
To compute leakage rate, we encode the process-attacker scenario with a Markov chain as shown in Section 2 and compute the joint, secret and attacker's marginals, which may not be Markovian. Then we can use the marginals to compute leakage rate by applying the following definition:
◮ Definition 7. Let $C = (S, s_0, P)$ be a Markov chain and $C_{|o,h}$, $C_{|o}$ and $C_{|h}$ its marginals on (o, h), o and h respectively. Then the leakage rate $\bar{I}$ is defined as $\bar{I} = \bar{H}(C_{|o}) + \bar{H}(C_{|h}) - \bar{H}(C_{|o,h})$.
Leakage rate can also be computed as a limit, since $\bar{I} = \lim_{k \to \infty} \frac{1}{k} I(o_1, \ldots, o_k; h_1, \ldots, h_k)$ when the limit exists.
Generally both entropy and leakage rate could be infinite; for instance, for a program that leaks 1 bit at time 1, 2 bits at time 2, and so on, the leakage rate would be infinite. Since we postulated that there exists a very large but finite maximum size M for the variables declared in the system, it is impossible to declare an unbounded amount of secret or observable bits on each step of the program execution. We do not think that this restriction limits significantly the programs that can be analyzed, while guaranteeing that the entropy and leakage rate do not diverge to positive infinity is a significantly useful result. Entropy rate and leakage rate may still oscillate, even though since they are defined in terms of Cesàro limits this happens only in pathological cases. We do not expect these cases to be common in normal secret-dependent systems, and leave finding a meaningful measure of leakage for these cases an open problem. This reflects similar issues in related definitions of leakage rate [7,12].

Algorithm 1: Compute the limit distribution of a Markov chain.
Data: A Markov Chain $C = (S, s_0, P)$ and its initial probability distribution $\pi^{(0)}$.
Result: The limit probability distribution $\mu$ of the chain.
(Listing not recoverable; its final steps solve, for each end component $R_i$, a system of linear equations $E_i$ to obtain $\mu_r$ for each $r \in R_i$.)
A case in which both entropy and leakage rate exist is when the marginal processes modeling the behavior of the observable and secret variables are both Markovian. Intuitively, this happens when the secret gets periodically replaced with a new one, and thus the information the attacker has on it is reset to the prior information. We will show this with an example in Section 4.1.
When any of the marginals is a Markov chain it is possible to compute its entropy rate efficiently as $\bar{H}(C) = \sum_{s \in S} L(s) \mu_s$, where $\mu$ is the limit distribution of the chain. The entropy rate of a Markov chain with a given initial probability distribution $\pi^{(0)}$ exists and is unique [8].
Computing the limit distribution can be accomplished on irreducible Markov chains by solving a system of linear equations, but the Markov chains we consider are usually reducible, so a new algorithm is required. Algorithm 1 computes the limit distribution of any Markov Chain $C = (S, s_0, P)$. The algorithm uses the well-established concept of end components of a MC; each end component $R_i$ behaves as an irreducible MC, so it is sufficient to compute the probability $\pi^{(\infty)}_{R_i}$ of eventually visiting $R_i$ and redistribute $\pi^{(\infty)}_{R_i}$ among the states of $R_i$ by solving a system of linear equations.

◮ Theorem 8. Algorithm 1 terminates in polynomial time in |S| and when it does it returns the limit distribution µ.
Due to space constraints we refer to the Appendix for the proof of this theorem and a full explanation of the steps of Algorithm 1.

Since computing the local entropy of each state in time $O(|S|^2)$ is trivial, the formula $\bar{H}(C) = \sum_{s \in S} L(s) \mu_s$ can be used to compute the entropy rate of a Markov chain in polynomial time. Note that this is a particular case of the computation of an expected infinite-horizon reward rate of a reward function defined on the transitions of a Markov chain.
Having computed the entropy rates of the joint, secret and attacker's marginals we can apply Definition 7 to obtain the leakage rate of the system.
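The entropy rate formula $\bar{H}(C) = \sum_{s} L(s) \mu_s$ can be illustrated with a short Python sketch. Note that this is not Algorithm 1: instead of decomposing the chain into end components and solving linear systems, it approximates the limit distribution numerically by Cesàro averaging of $\pi^{(0)}, \pi^{(1)}, \ldots$, which is simpler to write but only approximate. The 4-state reducible chain is a hypothetical example.

```python
from math import log2

def local_entropy(row):
    """L(s): entropy in bits of the successor distribution of one state."""
    return -sum(p * log2(p) for p in row if p > 0)

def step(pi, P):
    """One step of the chain: pi * P."""
    n = len(P)
    return [sum(pi[s] * P[s][t] for s in range(n)) for t in range(n)]

def cesaro_limit(P, pi0, steps=10_000):
    """Approximate mu as the average of pi^(0), ..., pi^(steps-1)."""
    total = [0.0] * len(P)
    pi = list(pi0)
    for _ in range(steps):
        total = [a + b for a, b in zip(total, pi)]
        pi = step(pi, P)
    return [a / steps for a in total]

def entropy_rate(P, pi0, steps=10_000):
    """Approximate sum_s L(s) * mu_s using the averaged distribution."""
    mu = cesaro_limit(P, pi0, steps)
    return sum(local_entropy(P[s]) * mu[s] for s in range(len(P)))

# Hypothetical reducible chain: state 0 is transient and leads to one of
# two end components, {1} (absorbing) and {2, 3} (a fair coin forever).
P = [
    [0.0, 0.3, 0.7, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.5, 0.5],
]
pi0 = [1.0, 0.0, 0.0, 0.0]
mu = cesaro_limit(P, pi0)
rate = entropy_rate(P, pi0)
print([round(m, 3) for m in mu], round(rate, 3))
```

In this example the end component {2, 3} is reached with probability 0.7 and produces one bit of entropy per step, so the rate is close to 0.7 bits per time unit. The averaging error shrinks roughly as 1/steps, which is why an exact method such as Algorithm 1 is preferable in practice.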

Example: Leaking Mix Node Implementation
We show an example of a program leaking an infinite amount of information and we compute its leakage rate using the method described above. A mix node [9] is a program meant to scramble the order in which packages are routed through a network, to increase the anonymity of the sender. Even if the packages are encrypted, some information about the sender could be inferred by observing the order in which they are forwarded. A mix node changes this order to a random one, thus making it harder for an attacker to connect each package to its sender.
A mix node waits until it has accumulated a fixed amount of packages, and then forwards them in a random order. If the exit order of the packages is independent from the entrance order, then no information about the latter can be inferred by observing the former. We will present an implementation of a mix node where the entrance and exit order are not independent and compute the rate of the information leakage.
The implementation of the mix node is shown in Fig. 5. This particular node waits until it has accumulated 3 packages and then sends them in a random order. Naming the packages A, B and C there are 6 possible entrance orders: ABC, ACB, BAC, BCA, CAB, and CBA. We will number them from 0 to 5.
In line 5 of the code a random number from 0 to 5 is assigned to the secret variable inorder, modeling the secret entrance order. Then in line 6 a random value uniformly distributed from 0 to 5 is assigned to the variable rand. Finally, in line 7 the bitwise exclusive OR modulo 6 of the variables inorder and rand is assigned to the observable variable outorder, which represents the order in which the packages exit from the mix node and is observable to the attacker. After producing an exit order the mix node receives three more packages in a new entrance order, scrambles them the same way and forwards them, and so on forever.
The leakage rate for each time unit is ≈ 0.09564 bits. Since the entropy rate of the secret is ≈ 0.86165 bits, we can conclude that this implementation of a mix node has a rate of leakage 0.09564/0.86165 ≈ 11.1% of each of its infinite secrets.

Algorithm 2: Compute the leakage of a MC from a time $t_1$ to a time $t_2$.
Data: A Markov Chain $C = (S, s_0, P)$ with the variables o and h, two integers $t_1$ and $t_2$ satisfying $t_1 \leq t_2$.
Result: The leakage from time $t_1$ to time $t_2$, $I^{(t_1,t_2)}(o, h)$.
(Listing not recoverable; its surviving steps compute the marginal $C_{|x}$, let $\pi^{(t_1)}_{|x}$ be the probability distribution over its states at time $t_1$, and finally return $I^{(t_1,t_2)}$.)
Note that in this simple case the loop always takes 3 time units to complete, so it would have been possible to just compute the leakage of one loop and divide it by 3, but in general loops do not compute for a fixed number of time units, e.g. if they contain multiple return statements.

Bounded Time/Leakage Analysis
We consider two similar bounded approaches to the leakage problem: computing the leakage of a Markov chain within a given time frame, or computing how long it takes for the Markov chain to leak a given amount of information.

Bounded Time
We want to compute the leakage for an attacker that is able to observe the behavior of the program for t < ∞ time units. We abstract time by considering each time unit as a step in the evolution of the Markov chain modeling the system. The definition of mutual information from a time $t_1$ to a time $t_2 > t_1$ is as follows:
◮ Definition 9. Let X and Y be two stochastic processes. Then the mutual information between X and Y from time $t_1$ to time $t_2$ is $I(X_{t_1}, \ldots, X_{t_2}; Y_{t_1}, \ldots, Y_{t_2})$. We will refer to it as $I^{(t_1,t_2)}(X; Y)$ for simplicity.
Consider as usual the Markov chain semantics $C = (S, s_0, P)$ modeling the behavior of the system. We present an iterative algorithm to compute $I^{(t_1,t_2)}(o, h)$ in time $O(t_2 |S|^2)$: the algorithm first computes the distribution at time $t_1$ and then computes the behavior of the chain until time $t_2$ while keeping track of the amount of leakage accumulated. In the algorithm let $S_{|x}$ be the state space of the marginal, $\pi^{(i)}_{|x}$ the distribution on the marginal at time i, and $P^{(i,j)}_{|x}$ the probability of transitioning from i to j in the marginal.
Note that Algorithm 2 is pseudopolynomial, as it depends not only on the size of the chain but also on the parameter $t_2$. Also note that due to the Markov property it holds that $I^{(t_1,t_2)} = I$ … whenever $\pi_{|o,h}$ and …

Bounded Leakage
We want to determine how many time units it takes for the system to leak a given amount c of bits of information. The problem is more complex than the one analyzed in the previous section, since leakage is a complex function of the behavior of the system in time and finding a way to bound or reverse it is not obvious. We start by considering the qualitative version of the problem: does there exist a time t such that $I^{t}(C_{|o}; C_{|h}) \geq c$? To answer we note that the sequence of leakages is monotonic non-decreasing over time, so if the answer is yes then the leakage will remain at least c for each time $t' \geq t$. This allows us to answer the qualitative question by computing the leakage on the infinite time horizon as shown in Section 3; let it be $l_\infty$. If $l_\infty < c$ then there is no time t such that the leakage is c, while if $l_\infty > c$ then such time exists. If $l_\infty = c$ then the system leaks c bits on the infinite time horizon but we have no guarantee that this amount will be reached in finite time.
In the case in which $l_\infty \geq c$ we can ask the quantitative question, i.e. at what time t the system will have leaked at least c bits. If $l_\infty > c$ we know that such time t exists, while if $l_\infty = c$ it may not. We will define the bounded leakage problem as follows: given a Markov chain $C = (S, s_0, P)$ labeled with secrets and observations and a positive real number c, determine if there exists a finite time t such that the information leakage of the chain at time t is exactly c.
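The monotonicity argument above suggests a simple bounded search for the first time at which the leakage reaches a threshold. The sketch below is only a heuristic within a user-chosen horizon `t_max` (a hypothetical parameter): it assumes a function `leak(t)` returning the leakage up to time t is available, for instance from a bounded-time computation, and it says nothing about the case $l_\infty = c$ or about the "exactly c" formulation of the bounded leakage problem. The leakage function used in the example is a toy stand-in, not derived from any program in the paper.

```python
def first_time_reaching(leak, c, t_max):
    """Smallest t in [0, t_max] with leak(t) >= c, or None if there is none.

    Relies on leak being monotonic non-decreasing in t, which allows a
    binary search instead of a linear scan.
    """
    if leak(t_max) < c:
        return None
    lo, hi = 0, t_max  # invariant: leak(hi) >= c
    while lo < hi:
        mid = (lo + hi) // 2
        if leak(mid) >= c:
            hi = mid
        else:
            lo = mid + 1
    return lo

# Toy monotone leakage curve converging to 1 bit (an assumption for illustration).
toy_leak = lambda t: 1.0 - 2.0 ** (-t)

t_reach = first_time_reaching(toy_leak, 0.9, 1000)
t_never = first_time_reaching(toy_leak, 1.5, 1000)
print(t_reach, t_never)
```

With this toy curve the threshold 0.9 is first reached at t = 4, since the leakage is 0.875 at t = 3 and 0.9375 at t = 4, while a threshold above the limit is never reached within the horizon.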
The problem is harder than it seems. For deterministic programs, it has been shown by Terauchi that it is not a k-safety property for any k [17]. The problem has also been addressed computationally by Heusser and Malacaria [10]. For randomized programs, we will show that Skolem's problem [14] can be reduced to it. While smaller instances have been shown to be decidable, the full decidability of Skolem's problem is still an open question [15]. Akshay et al. [1] show that Skolem's problem is equivalent to the following: given a Markov chain $C = (S, s_0, P)$, a state s and a probability r, determine whether there is a time t such that $\pi^{(t)}_s = r$. We will call this Skolem's Markov chain reachability problem. Intuitively, information leakage is a harder problem than reachability, as formally stated by the following theorem:
◮ Theorem 11. Let A be an algorithm deciding the bounded leakage problem. Then A decides Skolem's Markov chain reachability problem.

Conclusions and Related Work
We have shown how to provide meaningful measures of the effectiveness of a secret-dependent non-terminating program in protecting its secret, by computing Shannon leakage when its value is finite and Shannon leakage rate otherwise. Operationally, Shannon leakage is related to the expected number of guesses it will take for the attacker to find out the secret's value, so leakage and leakage rate can be used to understand the amount of time that the attacker will require to infer the system's secret [13]. To the same aim, we provided an algorithm that computes the amount of leakage from a program in a given time frame. Finally, we have shown that a precise quantification of the time required to leak a given amount is a hard problem, by showing that an algorithm deciding it would also decide Skolem's problem.
The quantification of leakage for an infinite observation and a finite or infinite secret has been recently considered by Chothia et al. [7]. Their framework is different from ours as they study probabilistic "point to point" leakage; also they provide no algorithms to compute leakage. The authors do not consider the case in which the observation is finite and the secret infinite. Also, in the infinite secret and observation case they explicitly do not consider the leakage rate per time unit, preferring to compute the leakage for each occurrence of the secret command in their framework.
Alvim et al. [2] study the setting of interactive systems, where secrets and observables can alternate during the computation and influence each other. They show that in this case mutual information is only an upper bound on leakage and "directed information" is a more precise leakage measure. This work is related to ours in that it investigates multi-stage processes but it presents significant differences as it does not investigate infinite leakage nor Markovian processes.
Recently Backes et al. also present a method for leakage rate computation based on stationary distribution of Markov chains, which they compute using PageRank [3]. We expect that Algorithm 1 would be a useful addition to their approach.

Appendix
Fig. 6 depicts the joint distribution of variables o and h for the Markov chain semantics in Fig. 4. For compactness we do not represent the cases with probability 0 in Fig. 6cd, i.e. all the cases in which $o_1 = 1$.

Proof of Theorem 8. Let $C = (S, s_0, P)$ be a Markov chain and $\pi^{(0)}$ its initial distribution. We want to compute its limiting distribution $\mu$. We call a state t reachable from a state s if $\exists k.\ P^k_{s,t} > 0$. We write s ↔ t if s is reachable from t and vice versa. A subset of states R ⊆ S is an end component if for each pair of states s, t ∈ R it holds that s ↔ t and no state in the component has transitions with nonzero probability to states not in the component, i.e. $\sum_{r \in R} P_{r,t} = 0$ for $t \notin R$. We call a state s recurrent if $\xi_s = \infty$, transient otherwise. It is known that the states in the end components are all and only the recurrent states of the chain. Thus, the limit probability distribution assigns probability 0 to every transient state, and each end component $R_i$ can be solved independently under the condition that $\sum_{r \in R_i} \mu_r = \pi^{(\infty)}_{R_i}$ by solving the system of equations $\forall r \in R_i.\ \mu_r = \sum_{r' \in R_i} \mu_{r'} P_{r',r}$. Solving such a system of equations provides the limit probability of each state in an end component, and doing it for each end component provides the limit probability of each state in the chain.
For the time complexity, the loop in lines 1-10 runs in time $O(|S|^3)$. For the system of equations in lines 11-15 we have to solve O(|S|) systems of O(|S|) equations. The Coppersmith-Winograd complexity for solving systems of n linear equations is $\sim O(n^{2.3})$, thus the final complexity is $\sim O(|S|^{3.3})$.

◭
Proof of Theorem 10. The algorithm uses a chain rule for mutual information between two time-indexed Markovian stochastic processes X and Y. The algorithm starts by computing the mutual information at time $t_1$, then updates it step by step until time $t_2$. The correctness of an update step comes from considering that due to the Markov property it holds that $H(X_i | X_0, \ldots, X_{i-1}) = H(X_i | X_{i-1}) = \sum_{s \in S} \pi^{(i-1)}_s L(s)$.
The algorithm clearly terminates for a finite $t_2$ and |S|. The computation of the marginals can be solved in $O(|S|^2)$, while the probability distributions at time $t_1$ require time $O(t_1 |S|^2)$, dominating the cost of the other operations before the for cycle. Inside the cycle the operations have cost $O(|S|^2)$ and the cycle gets repeated $t_2 - t_1$ times, bringing the total cost to $O(t_2 |S|^2)$. ◭

Figure 4: Non-terminating leaking program example. On the left: program code. On the right: Markov chain semantics: a) joint marginal $C_{|o,h}$; b) secret's marginal $C_{|h}$.

Figure 5: A leaking implementation of a mix node.

In Algorithm 2, $\pi^{(i)}_{|x}$ denotes the distribution on the marginal at time i, and $P^{(i,j)}_{|x}$ the probability of transitioning from i to j in the marginal.
◮ Theorem 10. Algorithm 2 terminates in time $O(t_2 |S|^2)$ and when it does it outputs $I^{(t_1,t_2)}(o, h)$.

Figure 6: Non-terminating example: joint distribution of the secret h and observables $o_k$: a) for 1 step; b) for 2 steps; c) for 3 steps; d) for k steps.