State-Recovery Attacks on Modified Ketje Jr

Abstract. In this article we study the security of the authenticated encryption algorithm Ketje against divide-and-conquer attacks. Ketje is a third-round candidate in the ongoing CAESAR competition, which shares most of its design principles with the SHA-3 hash function. Several versions of Ketje have been submitted, with different sizes for its internal state. We describe several state-recovery attacks on the smaller variant, called Ketje Jr. We show that if one increases the amount of keystream output after each round from 16 bits to 40 bits, Ketje Jr becomes vulnerable to divide-and-conquer attacks with time complexities $2^{71.5}$ for the original version and $2^{82.3}$ for the current tweaked version, both with a key of 96 bits. We also propose a similar attack when considering rates of 32 bits for the non-tweaked version. Our findings do not threaten the security of Ketje, but should be taken as a warning against potential future modifications that would aim at increasing the performance of the algorithm.


Introduction
Authenticated encryption algorithms aim at providing both integrity and confidentiality with only one cryptographic primitive. For example, AES-GCM is widely used but is often considered not robust enough. Indeed, when overly long messages are encrypted under the same key, or if an initialization value is reused, its security collapses and the integrity key leaks. These features impose heavy conditions on the use of AES-GCM. Hence the CAESAR competition was launched in 2014. The purpose of this competition is to provide a portfolio of robust algorithms with better performance for different environments. These algorithms aim at providing confidentiality and integrity to a message, along with integrity of the so-called (unencrypted) associated data. Such mechanisms are called Authenticated Encryption with Associated Data (AEAD) algorithms.
Ketje [BDP + df] is a family of authenticated encryption algorithms that was submitted to this competition, and that is one of the 15 candidates that were selected for its (currently ongoing) third round. It was designed by Bertoni, Daemen, Peeters, Van Assche and Van Keer and reuses some of the internal components of Keccak [BDPA13], the winner of the SHA-3 competition. Ketje was initially submitted to the competition as a set of two lightweight AEAD algorithms, denoted Ketje Jr and Ketje Sr [BDP + df]. These algorithms rely on a version of the internal permutation of Keccak, used in a specific mode of operation called the MonkeyWrap construction [BDPA12], which is derived from the sponge construction [BDPA08]. Ketje Jr and Ketje Sr act on internal states of respective sizes 200 and 400 bits. At the beginning of the third round of the CAESAR competition, the designers proposed a new version of Ketje [BDP + df]. This new version includes two new variants called Ketje Minor and Ketje Major, with larger internal states of sizes 800 bits and 1600 bits, as well as one modification of the previous variants.

Related work.
Since the SHA-3 competition, the Keccak-p permutation (described in Section 2.3) has been widely analyzed [BCC11, DGPW12, JN15]. In the literature, reduced versions of Ketje have been cryptanalyzed with cube attacks [DLWQ17] and linearization techniques [GLS16]. However, all previously available cryptanalysis focuses on the initialization phase of Ketje. As our technique aims at recovering the internal state during the message processing phase, it might open a new direction of analysis of interest for the crypto community and the CAESAR competition.

Organization of the paper.
In Section 2, we give a brief description of Ketje Jr. Then, in Section 3, we describe generic divide-and-conquer algorithms and show how to apply them to recover the internal state from 3 consecutive output blocks of Ketje Jr. We show in Section 4 how to improve our algorithm and apply it to 4 consecutive output blocks of Ketje Jr v1 with an increased rate of 40 bits. In Section 5 we propose an attack on a version with 32 bits of rate, and we conclude in Section 6.

Description of Ketje Jr
The Ketje family of AEAD algorithms is a third-round CAESAR candidate designed by Bertoni et al. [BDP + df]. It relies on a round-reduced version of the Keccak permutation, together with the MonkeyWrap mode of operation. In this section we give a short description of the variant Ketje Jr, on which the analysis in this paper is centered, and focus on those of its components that our attacks exploit.

The MonkeyWrap mode of operation
The MonkeyWrap mode of operation [BDPA12] is a mode for authenticated encryption with associated data built upon the sponge construction [BDPA08]. It allows for the authenticated encryption of sequences of plaintexts, each one with optional associated data. In the following, we focus w.l.o.g. on the encryption of one plaintext P, without associated data. As our attack targets the internal state during the processing of the plaintext, it does not depend on the number of plaintext and associated data fields. The encryption of P consists of the following operations.
Initialization. A b-bit internal state S is initialized with a key K of variable length (of bit length k, with 96 bits recommended and a maximum of 182) and a variable-length nonce N. S is divided into two parts R and C of respective lengths r and c. The initial value of S is $\mathrm{enc}(k/8) \| K \| \mathrm{pad}_K \| N \| \mathrm{pad}$, where enc(k/8) is a specific encoding of the integer k/8, and $\mathrm{pad}_K$ and pad are constant padding strings. Then, 12 rounds of the Keccak-p permutation are applied to S.

Plaintext processing.
The plaintext is padded and divided into a sequence of r-bit blocks $P_0, \ldots, P_{n-1}$, which are processed iteratively as follows. First, the state S is divided into $S_r \| S_c$, where $S_r$ consists of its first r bits. One then computes the i-th ciphertext block $C_i = P_i + S_r$, and replaces S with $C_i \| S_c$. Then, one round of the Keccak-p permutation is performed on S.
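The plaintext-processing loop can be sketched as follows. This is only a minimal illustration under stated assumptions, not the reference implementation: the state is modelled as a 200-bit integer whose low r bits play the role of $S_r$ (an assumption about bit numbering), padding is omitted, and `toy_round` is a hypothetical stand-in for one round of Keccak-p.

```python
B = 200  # state size in bits (Ketje Jr)

def toy_round(state):
    """Hypothetical stand-in for one Keccak-p round (NOT the real permutation)."""
    rotated = ((state << 7) | (state >> (B - 7))) & ((1 << B) - 1)
    return rotated ^ 0x5A

def process_plaintext(state, blocks, r, round_fn=toy_round):
    """Encrypt r-bit blocks; return (ciphertext blocks, keystream blocks, final state)."""
    mask = (1 << r) - 1
    ciphertext, keystream = [], []
    for p in blocks:
        s_r = state & mask            # outer part S_r of the state
        c = p ^ s_r                   # C_i = P_i + S_r
        keystream.append(s_r)
        ciphertext.append(c)
        state = (state & ~mask) | c   # replace S with C_i || S_c
        state = round_fn(state)       # one round of the permutation
    return ciphertext, keystream, state
```

In a known-plaintext setting, each keystream block is recovered as $C_i + P_i$, which is the information the state-recovery attacks of this paper start from.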

Tag extraction.
The final step is the extraction of the authentication tag. After the plaintext processing, 6 rounds are performed on S before the tag extraction. Then, the tag is computed as a sequence of r-bit blocks, which are defined as the $S_r$ part of the state. The state is updated by one round between consecutive extractions.

The Ketje Jr state
We briefly recall here that the claimed security of Ketje Jr is roughly given by min(96, k). The description of the Keccak-p permutation rounds that are used in Ketje Jr relies on a specific representation of the b = 200-bit internal state. The state of Ketje Jr is a three-dimensional array of elements of $\mathbb{F}_2$ of size 5 × 5 × 8. For simplicity, we use the same vocabulary as the authors, namely if A denotes the state, then:
• $A_{x,y,z}$ denotes the bit at position (x, y, z).
• A row is a set of 5 bits with constant y and z coordinates ($A_{*,y,z}$).
• A column is a set of 5 bits with constant x and z coordinates ($A_{x,*,z}$).
• A lane is a set of 8 bits with constant x and y coordinates ($A_{x,y,*}$).
• A sheet is a set of 40 bits with constant x coordinate ($A_{x,*,*}$).
• A plane is a set of 40 bits with constant y coordinate ($A_{*,y,*}$).
• A slice is a set of 25 bits with constant z coordinate ($A_{*,*,z}$).

The 200 bits of the state are ordered from 0 to 199 according to their position (x, y, z), as 40y + 8x + z, so that the first 40 bits form the plane y = 0. We adopt the same representation of the state as the designers. Each slice is represented by a 5 × 5 array, with index (x, y) = (0, 0) at the center.

The Keccak permutation
Ketje Jr is an iterated authenticated encryption mode where the round function is derived from the Keccak-p permutation. The Keccak-p round function consists of five steps: $R = \iota \circ \chi \circ \pi \circ \rho \circ \theta$. We now give more details about each of these operations.

Specification of θ
θ is a linear transform that works as represented in Figure 1. This operation provides linear diffusion, which is achieved by relying on the sum of all bits of each column of the state, $P(A)_{x,z} = \sum_{y=0}^{4} A_{x,y,z}$. We refer to these sums as the parity bits of the state. Each bit (at position (x, y, z)) is XORed with the two parity bits $P(A)_{x-1,z}$ and $P(A)_{x+1,z-1}$, where indexes are taken modulo their maximal value (i.e. 5 for the x-coordinate and 8 for the z-coordinate). The output of θ is given by
$$A_{x,y,z} \leftarrow A_{x,y,z} + P(A)_{x-1,z} + P(A)_{x+1,z-1}.$$
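As an illustration, θ can be written directly from the formula above. The sketch below assumes a state stored as nested lists `A[x][y][z]` of bits; the helper `zero_state` and this storage layout are our own choices for the example, not part of the Ketje specification.

```python
W = 8  # lane length for the 200-bit state

def zero_state():
    """All-zero 5 x 5 x W state, indexed as A[x][y][z]."""
    return [[[0] * W for _ in range(5)] for _ in range(5)]

def theta(A):
    """Apply theta: XOR into each bit the parities of two neighbouring columns."""
    # Column parities P[x][z] = XOR over y of A[x][y][z]
    P = [[0] * W for _ in range(5)]
    for x in range(5):
        for z in range(W):
            for y in range(5):
                P[x][z] ^= A[x][y][z]
    return [[[A[x][y][z] ^ P[(x - 1) % 5][z] ^ P[(x + 1) % 5][(z - 1) % W]
              for z in range(W)] for y in range(5)] for x in range(5)]
```

A single active bit is spread to 11 bits (itself and two full columns), whereas a state whose columns all have even parity is left unchanged. This is why knowing a few parity bits makes θ "transparent" in the attacks described later.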

Specification of ρ
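ρ is linear and provides diffusion between the slices of the state. It consists of a different circular rotation applied to each lane. Its output is given by $A_{x,y,z} \leftarrow A_{x,y,z-(t+1)(t+2)/2}$, with t satisfying $0 \le t < 24$ and
$$\begin{pmatrix} 0 & 1 \\ 2 & 3 \end{pmatrix}^{t} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} x \\ y \end{pmatrix} \text{ over } \mathrm{GF}(5),$$
or t = −1 if x = y = 0. More concretely, each lane is rotated by a fixed number of positions, taken modulo the lane length 8.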
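The rotation amounts can be recomputed from this definition. The following sketch walks through the 24 values of t, starting from (x, y) = (1, 0) and multiplying by the matrix with rows (0, 1) and (2, 3) modulo 5, as in the Keccak reference; the function name `rho_offsets` is ours.

```python
def rho_offsets(w=8):
    """Rotation offset of each lane (x, y), reduced modulo the lane length w."""
    offsets = [[0] * 5 for _ in range(5)]   # lane (0, 0) is not rotated (t = -1)
    x, y = 1, 0                             # t = 0 corresponds to (x, y) = (1, 0)
    for t in range(24):
        offsets[x][y] = ((t + 1) * (t + 2) // 2) % w
        x, y = y, (2 * x + 3 * y) % 5       # multiply (x, y) by [[0, 1], [2, 3]]
    return offsets

for row in rho_offsets(8):
    print(row)
```

With w = 64 this gives the usual Keccak-f[1600] offsets; for Ketje Jr the same offsets are simply taken modulo 8.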

Specification of π
π is also linear and shuffles the positions of the lanes. It is therefore applied slice by slice, according to the pattern represented in Figure 2. Its output is given by
$$A_{y,\,2x+3y,\,z} \leftarrow A_{x,y,z},$$
where the indexes are taken modulo 5.

Figure 2: Representation of π on each slice of the state
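A short sketch of π follows, assuming the Keccak reference convention in which the lane at position (x, y) moves to position (y, 2x + 3y mod 5) and the z coordinate is untouched. The slice is represented here as a 5 × 5 nested list `A[x][y]`, which is a choice made for this example only.

```python
def pi(A):
    """Move the entry at (x, y) to (y, 2x + 3y mod 5); works on lanes or on one slice."""
    out = [[None] * 5 for _ in range(5)]
    for x in range(5):
        for y in range(5):
            out[y][(2 * x + 3 * y) % 5] = A[x][y]
    return out
```

Under this convention the five lanes of the plane y = 0 after π come from the diagonal x = y before π, which is the reason why the twisted permutation of Ketje Jr v2 outputs diagonal lanes.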

Specification of χ
χ is the nonlinear layer of the permutation used in Ketje Jr. It can be interpreted as a layer of 5-bit to 5-bit Sboxes of algebraic degree 2, computed on the rows of the state. Moreover, each output bit depends on only 3 input bits, according to the formula:
$$A_{x,y,z} \leftarrow A_{x,y,z} + (A_{x+1,y,z} + 1) \cdot A_{x+2,y,z},$$
where the x-indexes are taken modulo 5.
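For illustration, the Sbox can be written on a single row, assuming the standard Keccak definition of χ (each bit is XORed with the product of the complemented next bit and the bit after it). The function name `chi_row` and the list-of-bits representation are ours.

```python
def chi_row(row):
    """Apply chi to one 5-bit row given as a list of bits [a0, a1, a2, a3, a4]."""
    return [row[x] ^ ((row[(x + 1) % 5] ^ 1) & row[(x + 2) % 5]) for x in range(5)]

# The Sbox is a bijection on 5-bit rows, so a fully known output row can be inverted
# by a simple table lookup.
inverse_table = {}
for value in range(32):
    bits = [(value >> i) & 1 for i in range(5)]
    inverse_table[tuple(chi_row(bits))] = bits
```

The existence of `inverse_table` with 32 distinct keys reflects the fact used in the attacks: χ can be inverted row by row as soon as all 5 output bits of a row are known.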

Specification of ι
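ι is an addition of a round constant. The round constants are of the following form:
$$RC[i_r]_{0,0,2^j-1} = rc[j + 7 i_r] \quad \text{for all } 0 \le j \le \ell,$$
where $2^{\ell} = 8$ is the lane length; the other values are zero, and the values $rc[t] \in \mathbb{F}_2$ are defined as the output of a binary linear feedback shift register (LFSR):
$$rc[t] = \left(x^t \bmod x^8 + x^6 + x^5 + x^4 + 1\right) \bmod x \quad \text{in } \mathrm{GF}(2)[x].$$
Finally, the Ketje Jr permutation is defined by the application of $n_r$ rounds R, indexed with $i_r$ from $18 - n_r$ to 17.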
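The round constants can be regenerated with a few lines of code. The sketch below assumes the standard Keccak definition, in which rc[t] is the constant term of $x^t$ modulo $x^8 + x^6 + x^5 + x^4 + 1$ and bit $2^j - 1$ of lane (0, 0) receives rc[j + 7 i_r]; the function names are ours.

```python
def rc(t):
    """Bit rc[t] = (x^t mod x^8 + x^6 + x^5 + x^4 + 1) mod x, via an 8-bit LFSR."""
    R = 1
    for _ in range(t % 255):
        R <<= 1
        if R & 0x100:
            R ^= 0x171   # reduce modulo x^8 + x^6 + x^5 + x^4 + 1
    return R & 1

def round_constant(i_r, ell=3):
    """Constant added to lane (0, 0) in round i_r, for lanes of 2**ell bits."""
    value = 0
    for j in range(ell + 1):
        if rc(j + 7 * i_r):
            value |= 1 << ((1 << j) - 1)
    return value
```

With `ell=6` the function reproduces the first 64-bit Keccak constants (0x1, 0x8082, ...); with the default `ell=3` it gives their truncation to the 8-bit lanes of the 200-bit state.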

Keystream extraction
The MonkeyWrap construction with rate r states that the first r bits of the state are extracted as keystream. As the bits of a lane occupy consecutive positions in the ordering of the state, keystream bits are extracted lane by lane. For Ketje Jr v1, as long as the rate does not exceed 40 bits, keystream bits are concentrated on the plane y = 0, on its first $\lceil r/8 \rceil$ lanes. Please note that if the rate is 40 bits, the full plane y = 0 is output as keystream. However, in Ketje Jr v2, the authors replace the direct use of the Keccak-p permutation by a variant of it, called the twisted permutation. The twisted permutation Keccak-p* is defined as:
$$\text{Keccak-}p^{\star} = \pi \circ \text{Keccak-}p \circ \pi^{-1}.$$
It is also important to notice that applying Keccak-p* instead of Keccak-p is equivalent to outputting different lanes, not belonging to the same plane, because all intermediate $\pi$ and $\pi^{-1}$ cancel out. The keystream bits are still concentrated on lanes, now on the diagonal of the state with equation x = y, starting from x = y = 0 and covering the first $\lceil r/8 \rceil$ lanes of this diagonal.

Divide-and-Conquer attack using 3 output blocks on Ketje Jr
Throughout the paper, we describe different attacks on Ketje Jr v1 or v2. These attacks rely on the following fundamental idea: separate the state into 2 parts, then construct 2 lists independently that cover each part, and eventually merge both lists using linear or nonlinear sieving relations.
In this section we describe a state-recovery attack that works on both versions of Ketje Jr (i.e., with the initial or the twisted permutation). This attack enables the adversary to recover the state of Ketje Jr during one encryption, under the hypothesis that he knows three consecutive keystream blocks. We first show our attack on Ketje Jr v1, and then show how to adapt it to the new version with the twisted permutation. The strategy is rather generic and applies for any rate. We point out that its complexity decreases as the encryption rate increases, and is in any case too high to contradict the security claims made by the designers. The complexity of the attacks in this section cannot be smaller than $2^{200-3r}$ for a rate r, as this is the number of possible solutions that we recover, and have to test in order to find the correct one. For a rate of r = 40 the best achievable complexity is therefore $2^{80}$. Since $2^{200-3r}$ falls below the $2^{96}$ cost of an exhaustive search on a 96-bit key only when r ≥ 35, we have to consider a rate of at least r = 40 bits in order to have an attack better than generic ones.

A basic attack against Ketje Jr v1 with rate 40 bits
The attack strategy against Ketje Jr v1 is rather straightforward. In this subsection we suppose that the rate of Ketje Jr is 40 bits. In other words, a full plane is output after each round.
Divide-and-conquer framework. Our attack is based on a divide-and-conquer technique. A generic formulation of the problem we aim at solving is the following: given two sets $\mathcal{U}$ and $\mathcal{V}$, two functions $f : \mathcal{U} \to \mathrm{GF}(2)^c$ and $g : \mathcal{V} \to \mathrm{GF}(2)^c$, and one element $t \in \mathrm{GF}(2)^c$, find all $u \in \mathcal{U}$ and $v \in \mathcal{V}$ such that $f(u) + g(v) = t$.
Solving this problem is folklore in cryptography (many examples can be found for instance in meet-in-the-middle attacks [BR10, DSP07, Sas11] or rebound attacks [LMR + 09, KNRS10]). One computes f(u) for all $u \in \mathcal{U}$ and stores (u, f(u)) in a table, sorted according to the value of f(u) + t. Then, one computes g(v) for all $v \in \mathcal{V}$ and searches the table for a match. For all u such that f(u) + t = g(v), (u, v) is one solution to the problem.
The memory complexity is $|\mathcal{U}|(\log|\mathcal{U}| + c)$ bits, where $|\mathcal{U}|$ is the number of elements in $\mathcal{U}$, assuming they can be represented on $\log|\mathcal{U}|$ bits. The time required to mount the attack is $|\mathcal{U}| + |\mathcal{V}|$ computations of functions f and g, and $|\mathcal{U}| \times |\mathcal{V}| \times 2^{-c}$ operations to enumerate the set of solutions.
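The generic procedure can be sketched as follows. This is a toy illustration, not the attack itself: c-bit vectors are encoded as Python integers, addition in $\mathrm{GF}(2)^c$ is the XOR, and a hash table replaces the sorted table of the description (which gives the same matches). The function name and its parameters are ours.

```python
from collections import defaultdict

def divide_and_conquer(U, V, f, g, t):
    """Return all (u, v) in U x V with f(u) ^ g(v) == t."""
    table = defaultdict(list)
    for u in U:
        table[f(u) ^ t].append(u)       # index u by the value g(v) has to take
    solutions = []
    for v in V:
        for u in table.get(g(v), []):   # every hit is a solution, no false positive
            solutions.append((u, v))
    return solutions

# Toy usage with c = 8 and two arbitrary functions
f = lambda u: (37 * u + 11) & 0xFF
g = lambda v: (91 * v) & 0xFF
sols = divide_and_conquer(range(64), range(64), f, g, 0x5A)
```

The two loops cost $|\mathcal{U}| + |\mathcal{V}|$ function evaluations, and the inner loop only runs on actual solutions, in line with the complexity stated above.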
Divide-and-conquer against Ketje Jr. This technique can be applied to 3 consecutive output blocks of Ketje Jr, covering two Keccak rounds. Let us introduce the following notation: $A^0$ is the state containing the first known output, $B^0$ the value of the state after applying θ to $A^0$, and $C^0$ the state after ρ and π. Then, χ is applied and a new output block is computed. We also denote by $A^1$, $B^1$ and $C^1$ the same values one round later, and by $A^2$ the value after the last χ layer. These notations are displayed in Figure 3.
More precisely, we define $A_u$ (resp. $C_u$) as the first four slices (with index 0 ≤ z ≤ 3) of $A^1$ (resp. $C^1$).
We can now describe the core of our attack. We guess the front half state $A_u$. For each slice, the 5 bits at position y = 0 are already known, which leaves 20 bits to guess per slice, and a total of 80 bits to guess from $A_u$. We denote our guess by $u \in \{0,1\}^{80}$. Then, we can compute the resulting value of $C_u$ by applying $\chi^{-1}$. We then define f(u) as the contribution of this guess to the target t given below. Similarly, by guessing 80 bits of the back half of the state $A_v$, we can compute the corresponding contribution g(v). Our attack then consists in applying the divide-and-conquer strategy described above to $A_u$, $A_v$ and $t = (C^1_{*,0,*}, A^0_{*,0,*})$. We have $|\mathcal{U}| = |\mathcal{V}| = 2^{80}$ and c = 80, therefore the time complexity of our attack is $2^{80}$ evaluations of f and g for the divide-and-conquer phase. Then, the remaining $2^{80} \times 2^{80} \times 2^{-80} = 2^{80}$ candidates can be searched exhaustively, for a cost of $2^{80}$.
The general idea is summarized in Figure 4.
Figure 4 (legend): known bits; bits derived from u; bits derived from v; bits computed as f(u) + g(v); known bits used to sieve.

An attack against Ketje Jr v2 with rate 40 bits

Sieving with nonlinear relations
We now show how to modify the previous attack to apply it to Ketje Jr with the twisted permutation. By studying every step of the attack, we can notice that the only point that prevents it from being directly applied is that we can no longer compute 40 bits of $C^1$ from known bits of $A^2$, as the output bit positions no longer cover outputs of the same Sbox.
We overcome this problem by sieving using $A^2$ instead of $C^1$ in our divide-and-conquer attack.
To this end, we need to modify our divide-and-conquer algorithm. Indeed, our previous attack consists in expressing known bits as the sum of two values computed from independent guesses u and v. One limitation of this strategy is that it cannot be applied as soon as such an expression involves a nonlinear combination of u and v. We now demonstrate how to modify our attack to encompass more cases without a significant increase in the time complexity of the algorithm. After that, we show how to use our new algorithm to improve the attack against Ketje Jr.

Divide-and-conquer with nonlinear interactions.
We now modify the divide-and-conquer framework (that was previously limited to linearly independent expressions in u and v) to encompass cases that will help attacking Ketje Jr with the twisted permutation.
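We define the following problem as a system of equations built from three subsystems with respective sizes α, β and γ. We consider the following boolean functions:
$$f_i : \mathcal{U} \to \mathbb{F}_2 \text{ and } g_i : \mathcal{V} \to \mathbb{F}_2 \text{ for } 0 \le i < \alpha+\beta+\gamma,$$
$$g'_i : \mathcal{V} \to \mathbb{F}_2 \text{ for } \alpha \le i < \alpha+\beta, \qquad f'_i : \mathcal{U} \to \mathbb{F}_2 \text{ for } \alpha+\beta \le i < \alpha+\beta+\gamma.$$
The problem we try to solve then consists in finding all $(u, v) \in \mathcal{U} \times \mathcal{V}$ such that:
$$f_i(u) + g_i(v) = 0 \quad \text{for } 0 \le i < \alpha, \qquad (1)$$
$$f_i(u)\, g'_i(v) + g_i(v) = 0 \quad \text{for } \alpha \le i < \alpha+\beta, \qquad (2)$$
$$f'_i(u)\, g_i(v) + f_i(u) = 0 \quad \text{for } \alpha+\beta \le i < \alpha+\beta+\gamma. \qquad (3)$$
In summary, we consider a system of c = α + β + γ equations and partially remove the condition on linearly independent contributions from u and v to all equations. More precisely, we can deal with equations involving one product of u-dependent and v-dependent values, provided there is only one other term, involving either u or v (but not both). A similar problem was treated in [LN15].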
Link with Ketje. The nonlinear layer χ of Ketje involves only one product, namely (a, b, c) → a + bc. We focus on a very specific case of divide-and-conquer attacks, involving a χ-layer in which:
• Every input bit can be fully computed by guessing either u or v;
• Some output bits are known to the adversary.
Depending on which half of the state input bits a, b and c are computed from, the knowledge of the output bit a + bc can be expressed as an equation of type 1, 2 or 3.

Our modified divide-and-conquer strategy.
As in the case of the initial problem of Section 3.1, our solution consists in building two (sorted) lists of values $L_U$ and $L_V$ such that each solution (u, v) to our problem leads to a match between both lists. We also want to be able to recover u and v from the matching elements, and require that our algorithm avoid false positives, i.e., matches between lists that do not lead to values u and v that verify the system of equations.
We now describe how we build the list $L_U$. Each element of this list is a pair (F, u), where F is a c-bit value. The bit at each position i of F is meant to be compared with the same bit of elements (G, v) of $L_V$ to verify the equation involving $f_i$, $f'_i$, $g_i$, $g'_i$. We append u to the elements of the list so as to recover a solution (u, v) from a match.

Dealing with equations independently.
We now describe how we deal with each equation. For each value u and v and for each equation, we aim at defining sets $R_i(u)$ and $S_i(v)$ such that there is a collision between $R_i(u)$ and $S_i(v)$ if and only if (u, v) is a solution to the i-th equation. We now study the three possible cases.
1. Case of eq. 1. This is the usual case of divide-and-conquer attacks, with independent contributions from u and v. The equation is satisfied if $f_i(u) = g_i(v)$, and therefore we define $R_i(u) = \{f_i(u)\}$ and $S_i(v) = \{g_i(v)\}$.
2. Case of eq. 2. Our goal is to verify such an equation by comparing $f_i(u)$ with some v-dependent values. Therefore, we define $R_i(u) = \{f_i(u)\}$. We handle v in a more complicated manner. $S_i(v)$ is built from the values $g_i(v)$ and $g'_i(v)$ only, but different cases occur. First, if $g'_i(v) = 1$, one computes $f_i(u) = g_i(v)$, and therefore we define $S_i(v) = \{g_i(v)\}$. Second, if $g'_i(v) = 0$ and $g_i(v) = 1$, the i-th equation cannot be satisfied. Therefore $S_i(v) = \emptyset$. Finally, if $g'_i(v) = g_i(v) = 0$, the i-th equation holds independently of u. We need to have a match between any $R_i(u)$ and $S_i(v)$, therefore we define $S_i(v) = \{0, 1\}$.
3. Case of eq. 3. Equation 3 is equivalent to equation 2 by exchanging the roles of u and v. We therefore handle these equations as in case 2, exchanging u and v.
In each case, one can check that there is a collision between R i (u) and S i (v) if and only if (u, v) is a solution to the i-th equation.
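The case analysis can be checked exhaustively on single bits. The sketch below assumes the following shapes for the three equation types, chosen to match the case analysis above: $f + g = 0$, $f \cdot g' + g = 0$ and $f' \cdot g + f = 0$. The function names are ours.

```python
def sets_eq1(f, g):
    """Equation f + g = 0."""
    return {f}, {g}

def sets_eq2(f, g, g_prime):
    """Equation f * g_prime + g = 0: u contributes f, v contributes (g, g_prime)."""
    if g_prime == 1:
        S = {g}          # the equation reduces to f = g
    elif g == 1:
        S = set()        # 0 * f + 1 = 0 has no solution
    else:
        S = {0, 1}       # the equation holds for every f
    return {f}, S

def sets_eq3(f, f_prime, g):
    """Equation f_prime * g + f = 0: equation 2 with the roles of u and v exchanged."""
    S, R = sets_eq2(g, f, f_prime)
    return R, S

def collide(R, S):
    """True when the two sets share at least one element."""
    return bool(R & S)
```

Enumerating the eight possible inputs of each function confirms that a collision between R and S occurs exactly when the corresponding equation holds, which is the property needed to avoid false positives.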

Solving all equations simultaneously.
Let us now consider the cartesian products $R(u) = \prod_i R_i(u)$ and $S(v) = \prod_i S_i(v)$. If (u, v) is a solution to all the α + β + γ equations, there is a collision between elements of the sets $R_i(u)$ and $S_i(v)$ for all i, and thus there is a collision between elements of the cartesian products R(u) and S(v). Conversely, if there is a collision between elements of R(u) and S(v), each coordinate of this collision gives a collision between $R_i(u)$ and $S_i(v)$, and thus (u, v) is a solution of all the equations.

Solving the nonlinear divide-and-conquer problem.
Our algorithm then consists in building $L_U$ by enumerating all the elements (F, u) for all elements F in R(u) and all u in $\mathcal{U}$. Elements are sorted according to the value of F (in lexicographic order for instance). A similar operation is done to compute $L_V$. Then, the solutions (u, v) to our problem can be retrieved by searching collisions in the lists. We proceed in two steps to build $L_U$. First, we show in Algorithm 1 how to build the γ bits of all values in R(u) that correspond to the equations of type 3, for any guess u, and store them in a list denoted $L_u$. A list $L_v$ can be computed similarly by replacing loops from α + β to α + β + γ − 1 by loops from α to α + β − 1. Then, in Algorithm 2, we show how to build the full list $L_U$, by completing all γ-bit elements of each list $L_u$ to a c-bit value and merging all these lists. Again, the list $L_V$ can be computed similarly.

Size of the lists.
Now, we have to evaluate the size of the lists $L_U$ and $L_V$ that are merged during our algorithm. To do this, we compute the expected size of $L_u$, which is the expected number of entries that are added to the list $L_U$ for each value of u. We focus on the joint values of $(f_i(u), f'_i(u))$ for all α + β ≤ i < α + β + γ. First, according to Algorithm 1, entries are added to the list only if none of these values is (1, 0), which happens with probability $(3/4)^{\gamma}$. We denote by ω this event, and by N(u) the number of entries added to the list. We have $N(u) = 2^j$, where j is the number of positions such that $(f_i(u), f'_i(u)) = (0, 0)$. We can now compute:
$$E[N(u) \mid \omega] = \sum_{j=0}^{\gamma} \binom{\gamma}{j} \left(\frac{1}{3}\right)^{j} \left(\frac{2}{3}\right)^{\gamma-j} 2^{j} = \left(\frac{4}{3}\right)^{\gamma}.$$
The expected value of N(u) is then:
$$E[N(u)] = \Pr[\omega] \cdot E[N(u) \mid \omega] = \left(\frac{3}{4}\right)^{\gamma} \left(\frac{4}{3}\right)^{\gamma} = 1.$$
The average size of the list resulting from the first step of our algorithm is then $|\mathcal{U}|$. Similarly, the number of values tried during the second step is $|\mathcal{V}|$.

Algorithm 1 Computing the list $L_u$ of elements Z such that equations α + β to α + β + γ − 1 are satisfied for any (u, v) such that $Z = (g_{\alpha+\beta}(v), \ldots, g_{\alpha+\beta+\gamma-1}(v))$.
$L_u \leftarrow (\,)$ // Initialize $L_u$ with the empty string
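Since only the first line of Algorithm 1 is reproduced above, the following sketch gives one possible reading of it. It is our own reconstruction from the case analysis, not the authors' pseudocode, and it assumes that equations of type 3 have the shape $f'_i(u)\,g_i(v) + f_i(u) = 0$. It also checks the expected list size by averaging over all patterns of $(f_i(u), f'_i(u))$, assumed uniform and independent.

```python
from itertools import product

def build_Lu(pairs):
    """pairs: list of (f_i(u), f'_i(u)) for the gamma equations of type 3.
    Returns every gamma-bit tuple Z with f'_i * Z_i + f_i = 0 for all i."""
    L_u = [()]                           # start from the empty string
    for f, f_prime in pairs:
        if f_prime == 1:
            choices = [f]                # g_i(v) must be equal to f_i(u)
        elif f == 1:
            return []                    # (f, f') = (1, 0): no v satisfies the equation
        else:
            choices = [0, 1]             # (f, f') = (0, 0): any g_i(v) is fine
        L_u = [z + (c,) for z in L_u for c in choices]
    return L_u

def average_list_size(gamma):
    """Average |L_u| over all 4**gamma patterns of (f_i, f'_i)."""
    patterns = list(product([(0, 0), (0, 1), (1, 0), (1, 1)], repeat=gamma))
    return sum(len(build_Lu(list(p))) for p in patterns) / len(patterns)
```

For every γ the average is exactly 1, in agreement with E[N(u)] = 1: the lists that double in size are compensated by the guesses that are discarded.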

Complexity analysis.
For the sake of simplicity, we did not try to completely optimize the algorithm we use to build the lists $L_U$ and $L_V$. One of the reasons for this is that, for our attack against Ketje Jr, we will precompute these lists, and the time required for that will not be the bottleneck of the attack. For each value of u, we need to compute at most α + β + 2γ functions $f_i$ or $f'_i$, and for each of the last γ equations, we need two comparisons to determine which values need to be added to the list. Each value added to the list also requires at most γ modifications on the list $L_u$, and one insertion in the global list $L_U$. The total complexity is therefore bounded by $2c(|\mathcal{U}| + |\mathcal{V}|)$ computations and comparisons and $c(|\mathcal{U}| + |\mathcal{V}|)$ insertions in lists.
In the sieving step, we search for matches between two lists of respective (average) sizes $|\mathcal{U}|$ and $|\mathcal{V}|$, which can be achieved in $|\mathcal{U}| + |\mathcal{V}|$ operations. This is equivalent to the complexity of the divide-and-conquer algorithm without nonlinear sieving.

An improved attack against Ketje Jr v2
Adapting guesses on parts of the state. We can now come back to Ketje Jr.
The state $A^1$ is divided in two halves $A_u$ and $A_v$, increasing the size of the lists by adding parity bits of 5 columns. More precisely, $A_u$ is defined as the first four slices (0 ≤ z ≤ 3) together with the 5 parity bits of the columns $A^1_{i,*,7}$, for i ∈ {0, 1, 2, 3, 4}, and $A_v$ corresponds to the last four slices (4 ≤ z ≤ 7) together with the 5 parity bits of the columns $A^1_{i,*,3}$, for i ∈ {0, 1, 2, 3, 4}. By adding those 5 parity bits, the application of θ becomes transparent: $B_u$ and $B_v$ (first four slices and last four slices of $B^1$) are immediately derived from $A_u$ and $A_v$ independently. For guessing the first half augmented state $A_u$, we know that for each slice the 5 bits at position y = 0 are already known (from $A^1_{*,0,*}$), but also there are 5 known bits of information from $B^1$, which leaves a total of 4 × 5 × 5 + 5 − 4 × 5 − 4 × 5 = 65 bits. As the 5 additional bits from each list correspond to parity bits from the other list, there exist 10 extra linear equations (in addition to the 40 bit conditions given by the known values from $A^0$) that can sieve the number of possible combinations between the two lists (factor $2^{-10-40}$ when considering all the linear relations). In other words, these 10 relations can be written using two linear functions $f_1$ and $g_1 : \{0,1\}^{65} \to \{0,1\}^{10}$ of $A_u$ and $A_v$ respectively. In the same way as in Section 3.1, the 40 conditions from $A^0$ can be written using two functions $f_2$ and $g_2$.
Removing the condition on the rate. Our first attack from Section 3.1 could only work if the keystream blocks extracted from the state cover full Sboxes, so that one can invert the Sbox layer to prepare the divide-and-conquer attack. As we no longer need to invert a χ layer, this condition disappears. We can then apply our strategy to both versions of Ketje Jr even if the rate is smaller than 40 bits. The complexity of our attack however highly depends on the rate.
Saving time through pre-computation. Our algorithm involves the construction of lists $L_U$ and $L_V$, which implies a time complexity linear in the number of equations, due to iterations of Algorithm 1. However, in the case of Ketje Jr, we can improve our algorithm by computing $L_u$ for all possible values of u in a pre-computation step. Indeed, let us take a deeper look at $C^1$ and $A^2$ and look for nonlinear sieving relations (see Equations 2 and 3), i.e., known bits of $A^2$ whose expression involves the product of two bits of $C^1$, one computed from $A_u$ and the other from $A_v$. We can count that for Ketje Jr v1, there are at most 20 such relations, leading to β ≤ 10 and γ ≤ 10. Overall, (at most) 20 bits of $C^1$ depending on u are involved in the γ case 3 equations, and similarly (at most) 20 bits depending on v are involved in the β case 2 relations. As a consequence, all the $2^{20}$ possible lists $L_u$ (and similarly $L_v$) can be precomputed before starting Algorithm 2, for a (negligible) complexity of about $4 \times 10 \times 2^{20}$, and a memory of about $10 \times 2^{20}$ bits per list.

Figure 5: Summary of our advanced divide-and-conquer attack (legend: deduce; nonlinear sieving; known bits; known bits used to sieve; bits derived from v; bits derived from u).
Please note that we consider Ketje Jr v1 here because if the rate is below 40, one can no longer invert the χ layer between $C^1$ and $A^2$ and thus, one uses $A^2$ to sieve nonlinearly. For Ketje Jr v2, we can compute that β ≤ 9 and γ ≤ 9, leading to even lower complexities.
Therefore, the partial lists $L_u$ and $L_v$ can be pre-computed using Algorithm 1 and stored, as only (at most) $2^{20}$ cases can occur for both u and v. When computing $L_U$ (resp. $L_V$), one does not need to run Algorithm 1 and to recompute $L_u$ (resp. $L_v$) for each value of u (resp. v), but only to look it up in a precomputed list and insert the corresponding values in $L_U$ (resp. $L_V$).
Complexity analysis. Each guess u or v consists of 4 slices of 25 bits and 5 parity bits from $A^1$, but r/2 of these bits are known. Therefore, the complexity required to generate each list is
$$2^{4 \times 25 + 5 - r/2} = 2^{105 - r/2}.$$
The number of solutions given by our divide-and-conquer algorithm depends on the number of sieving relations. We have r sieving relations from the value of $A^2$, r sieving relations from the value of $A^0$, and 10 sieving relations from the parity bits. Therefore, the number of remaining values for the full state $A^1$ is
$$2^{2(105 - r/2) - 2r - 10} = 2^{200 - 3r}.$$
When the rate is less than 40, the cost of the exhaustive search on the remaining solutions dominates the complexity of the attack. When the rate is 40, the complexity mainly comes from the computations of the values of u and v. We can however improve this complexity by noticing that some bits of $A^2$ might fully be computed from u or v, thus reducing the size of the lists (but not the number of solutions left after the divide-and-conquer part). By looking carefully at the details of ρ and π, we find that in the case of Ketje Jr v2, 4 bits can be recovered from u and 4 bits from v. Thus, the complexity of searching all the values of u and v drops from $2^{85}$ to $2^{81}$, leading to an overall complexity of $2 \times 2^{81} + 2^{80} \approx 2^{82.3}$.
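The arithmetic of this paragraph can be replayed numerically. The sketch below only restates the counts given above (105 bits per half, r/2 of them known, 2r + 10 sieving relations); the function names are ours and the rates other than 40 are shown purely for comparison.

```python
from math import log2

def log2_list_size(r):
    """log2 of the number of guesses per half: 4*25 + 5 bits, r/2 of them known."""
    return 105 - r / 2

def log2_remaining(r):
    """log2 of the candidates left after sieving with 2r + 10 relations."""
    return 2 * log2_list_size(r) - (2 * r + 10)

for r in (16, 32, 40):
    print(r, log2_list_size(r), log2_remaining(r))

# Ketje Jr v2 with r = 40: two lists of 2^81 guesses, then 2^80 candidates to test
total_v2 = log2(2 * 2**81 + 2**80)
```

For r = 40 the two functions give 85 and 80, and `total_v2` evaluates to about 82.3, the figure quoted for the tweaked version.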
The case of Ketje Jr v1 is studied in Section 3.3. We show that the list of $2^{80}$ possible values of the internal state can be recovered with a complexity of $2^{66}$ operations, therefore the exhaustive search dominates the time complexity of the attack in that case.
Our results are summarized in

Application to the initial Ketje Jr with rate 40 bits.
We now study the specific case of the initial Ketje Jr permutation with an increased rate of 40 bits. In that case, the adversary can compute backwards the partial χ layer between states $A^2$ and $C^1$, and therefore knows 40 bits of state $C^1$. As π and ρ only shuffle bit positions, the adversary knows 40 bits of $B^1$. Moreover, one can easily notice that these bits are located on 5 lanes. Therefore, the adversary knows 5 bits on each slice.
In our advanced attack represented in Figure 5, we can see that the adversary can deduce the value of 4 full slices of $B^1$ from his guess (both in the green phase and in the red phase), as they are computed linearly (through a θ layer) from the guessed bits. Putting it together, he gets 20 linear relations on the 85 guessed bits in each phase of the attack. Taking these relations into account, he only needs to guess 65 bits. This case is represented in Figure 6. One can also notice that nonlinear sieving relations are not used in that specific case.
Figure 6 (legend): known bits; known bits used to sieve; bits derived from v; bits derived from u; linear relations on guessed bits.

Unsurprisingly, the number of remaining candidates after the divide-and-conquer phase of the attack is left unchanged. The number of pairs before sieving is divided by $(2^{20})^2 = 2^{40}$, however the 40 known bits of $A^2$ have already been used and do not provide useful information for the sieving phase.

An attack using 4 output blocks with a rate of 40 bits on Ketje Jr
We describe here a more efficient attack that uses 4 consecutive output blocks (i.e. 3 rounds) of the non-twisted version of Ketje Jr, and we consider a rate of 40 bits. Hence, this produces a smaller number of solutions, namely $2^{200-4r} = 2^{40}$. The principle of the attack is similar, but we have in addition a last nonlinear sieving step using the keystream extracted from $A^3$. This is the most complicated part of the attack, and we describe in detail in the next sections how to perform it efficiently. In fact, we aim at sieving with $B^2$ instead of $A^3$, because χ can be inverted on the full plane $A^3_{*,0,*}$ and the ρπ application can also be inverted, so that we move the known parts to the lanes $B^2_{i,i,*}$. For this we propose two different methods. The first performs a few initial guesses on some bits in the state $C^1$ in order to be able to partially compute through the nonlinear relations (that are coming from the χ-layer between $C^1$ and $A^2$) and sieve linearly. The second uses the merging lists algorithm from [Nay11], refined in [CNV13] with the ideas from [DDKS12]: the instant matching algorithm and the parallel matching without memory.
Our attack considers 4 consecutive output blocks of Ketje Jr v1, thus covering three Keccak rounds. We use the same notations as in 3.1 and in Figure 7. The idea again relies on guessing separately both halves of the state that complete the known part of $A^1$, and on merging these lists using the sieve given by the information from $A^0$, $A^2$ and $A^3$. As in 3.1, we can compute the planes $C^2_{*,0,*}$, $C^1_{*,0,*}$ and $C^0_{*,0,*}$ through the inverse of $\chi$, as the corresponding rows are fully known. Moreover, we also apply $(\rho\pi)^{-1}$ and get the full lanes $B^0_{i,i,*}$, $B^1_{i,i,*}$ and $B^2_{i,i,*}$ for $i \in \{0, 1, 2, 3, 4\}$, as shown in Figure 7.

As previously described in section 3.2.2, the state $A^1$ is divided in two halves $A_u$ and $A_v$, and the size of the lists is increased by adding the parity bits of 5 columns. Each list has a size of $2^{4\times5\times5+5-4\times5-4\times5} = 2^{65}$ elements. There exist 10 extra linear equations (in addition to the 40 bit conditions given by the known values from $A^0$) that sieve the possible combinations between the two lists. In other words, there exist $f_1$ and $g_1 : \{0,1\}^{65} \to \{0,1\}^{10}$ such that these 10 equations relate $f_1(A_u)$ to $g_1(A_v)$. In the same way as in 3.1, there exist two functions $f_2$ and $g_2$ such that the 40 conditions coming from $A^0$ relate $f_2(A_u)$ to $g_2(A_v)$.

4.1 First method for sieving with $B^2$: preliminary guessing

As we guess slices independently, we focus on the action of $\rho\pi$ on the slices. More precisely, $B$ denotes the state before $\rho\pi$; hence each bit of the lane $B_{0,4,*}$ stays in the same slice, each bit of $B_{1,4,*}$ is shifted by 2, and so on. Those shift values are called the $\rho$-shifts and are displayed in Figure 8. We then look at where the bits of $A_u$ and $A_v$ are found in the state $C^1$ after the $\rho\pi$ application. However, one needs to pass from $C^1$ to $A^2$ through $\chi$ and then to $B^2$ through $\theta$, in order to sieve with the known bits of $B^2$ (colored in blue in Figure 7). To do so, we fix bits of the state such that some rows in $C^1$ are fully determined by the value of $A_u$ (resp. $A_v$). The bits that are guessed this way are given in Figure 9. The attack then has to be repeated $2^{\ell}$ times, where $\ell$ denotes the number of guessed bits. But guessing those bits allows us to compute entire rows of the state $A^2$ with only bits from $A_u$ (resp. $A_v$). The application of $\theta$ after $A^2$ still remains, so the guesses have to be chosen carefully in order to be able to sieve with the known part of the state $B^2$. To do so, we fix bits such that we can compute at least two consecutive slices of $A^2$, which yields a linear sieving. The details of the state $C^1$ and the choices of bits to guess are given in Figure 9.

Figure 9: $C^1$: the green part is known, the marked bits correspond to the half state $A_u$, the other ones to $A_v$. Bits colored in yellow, grey or blue correspond to the best choice of guesses such that an entire slice is known after $\chi$.
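The reason why fully known rows are so useful can be illustrated on the $\chi$ step itself. The following Python sketch implements the standard Keccak $\chi$ on one 5-bit row and tabulates its inverse. The encoding of a row as an integer (bit $x$ of the integer is the bit at position $x$) and the names `chi_row`, `CHI_INVERSE` and `chi_row_inverse` are our own illustrative choices.

```python
def chi_row(row: int) -> int:
    """Apply chi to one 5-bit row: a'[x] = a[x] ^ (~a[x+1] & a[x+2])."""
    out = 0
    for x in range(5):
        a0 = (row >> x) & 1
        a1 = (row >> ((x + 1) % 5)) & 1
        a2 = (row >> ((x + 2) % 5)) & 1
        out |= (a0 ^ ((a1 ^ 1) & a2)) << x
    return out


# chi is a bijection on 5-bit rows, so a 32-entry table inverts it.
CHI_INVERSE = {chi_row(row): row for row in range(32)}


def chi_row_inverse(row: int) -> int:
    """Recover the input row of chi from a fully known output row."""
    return CHI_INVERSE[row]


if __name__ == "__main__":
    for row in range(32):
        assert chi_row_inverse(chi_row(row)) == row
    print("chi is invertible on every fully known row")
```

Each output bit contains the product of two neighbouring input bits. A row of $C^1$ whose bits all come from the same half of the state therefore gives bits of $A^2$ that depend on one list only, whereas a row mixing both halves produces quadratic terms between the two lists.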
Moreover, guessing bits decreases the size of the two lists and allows us to sieve on more bits, by using the known information from $B^2$ (which is immediately obtained from $A^3_{0,*,0}$). On the other hand, each guessed bit increases the time complexity.
This choice of guesses is the best one with respect to the total time complexity of the attack, which is $2^{73}$. The other reasonable choices of guesses are described in Table 2.
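The requirement of knowing two consecutive slices of $A^2$ comes from the structure of $\theta$. The Python sketch below implements the standard Keccak $\theta$ for the 8-bit lanes of Ketje Jr and lists the input bits on which one output bit depends. The indexing `state[x][y][z]` and the helper names are assumptions made for this illustration and may differ from the coordinates used in the figures.

```python
W = 8  # lane length for the 200-bit state of Ketje Jr


def theta(state):
    """Standard Keccak theta on a state indexed as state[x][y][z]."""
    parity = [[0] * W for _ in range(5)]
    for x in range(5):
        for y in range(5):
            for z in range(W):
                parity[x][z] ^= state[x][y][z]
    out = [[[0] * W for _ in range(5)] for _ in range(5)]
    for x in range(5):
        for y in range(5):
            for z in range(W):
                d = parity[(x - 1) % 5][z] ^ parity[(x + 1) % 5][(z - 1) % W]
                out[x][y][z] = state[x][y][z] ^ d
    return out


def dependencies(x, y, z):
    """Input positions whose value influences output bit (x, y, z) of theta.

    theta is linear, so it is enough to test every single-bit input.
    """
    state = [[[0] * W for _ in range(5)] for _ in range(5)]
    deps = []
    for i in range(5):
        for j in range(5):
            for k in range(W):
                state[i][j][k] = 1
                if theta(state)[x][y][z]:
                    deps.append((i, j, k))
                state[i][j][k] = 0
    return deps


if __name__ == "__main__":
    deps = dependencies(4, 4, 4)
    print(len(deps), "input bits:", deps)
```

Every output bit is the XOR of 11 input bits: the bit itself, one full column in the same slice and one full column in the previous slice. A bit of $B^2$ can thus be computed from one list only when the relevant columns of two consecutive slices of $A^2$ are known from that list.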

4.2 Second method for sieving with $B^2$: list merging
Here we have two lists $L_1$ (which corresponds to $A_u$) and $L_2$ (which corresponds to $A_v$) of size $2^{65}$ each, as explained at the beginning of section 4. We also have 50 linear relations, coming from the 40 known bits of $A^0$ and the 10 parity bits of columns, that must be satisfied by any candidate pair of elements. Each list can therefore be partitioned into $2^{50}$ sublists of average size $2^{15}$, where all the elements of a sublist share the same value for the part of the 50 linear relations that depends on their half of the state. In order to split each list into $2^{50}$ sublists, we compute and store the values $(f_1(A_u), f_2(A_u))$ for the elements of $L_1$ and $(g_1(A_v), g_2(A_v))$ for the elements of $L_2$. Each possible value of $(f_1(A_u), f_2(A_u))$ defines a sublist of $L_1$. Because of the linear relations, only one value of $(g_1(A_v), g_2(A_v))$ is a match for it, and this value defines a sublist of $L_2$. Each sublist associated to a value of $(f_1(A_u), f_2(A_u))$ contains $2^{65-50} = 2^{15}$ elements from $L_1$, and the same goes for the sublists of $L_2$ associated to each value of $(g_1(A_v), g_2(A_v))$. Hence, for each one of the $2^{50}$ possible values of the linear relations, we merge the 2 associated sublists (one from each list), of average size $2^{15}$, that agree on this value.
We then propose an algorithm that, for each one of the $2^{50}$ sublists associated to a value of $(f_1(A_u), f_2(A_u))$, considers the only matching sublist given by $(g_1(A_v), g_2(A_v))$, and then efficiently merges these two sublists of size $2^{15}$. The final cost of the algorithm is $2^{50}$ times the cost of merging two lists of size $2^{15}$. In the following, these two sublists are denoted by $L_1$ and $L_2$.
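The partition into sublists can be sketched on a toy example. In the Python code below, the maps `f` and `g` are hypothetical stand-ins for $(f_1, f_2)$ and $(g_1, g_2)$, and we assume for simplicity that the linear relations are written as $f(u) = g(v)$; the lists are tiny (6-bit elements, 3 relations) so that the result can be checked by hand.

```python
from collections import defaultdict


def matching_sublists(list_u, list_v, f, g):
    """Group both lists by the value of their linear relations and return,
    for every value reached by f, the pair of sublists that may match.

    Assumption: the relations between both halves read f(u) == g(v).
    """
    sublists_u = defaultdict(list)
    for u in list_u:
        sublists_u[f(u)].append(u)
    sublists_v = defaultdict(list)
    for v in list_v:
        sublists_v[g(v)].append(v)
    return [(key, us, sublists_v.get(key, [])) for key, us in sublists_u.items()]


if __name__ == "__main__":
    # Toy parameters: 2^6 elements per list and 3 linear relations.
    list_u = list(range(64))
    list_v = list(range(64))
    f = lambda u: u & 0b111           # hypothetical linear map on the first half
    g = lambda v: (v ^ 0b101) & 0b111  # hypothetical affine map on the second half
    for key, us, vs in matching_sublists(list_u, list_v, f, g):
        print(f"relation value {key}: {len(us)} x {len(vs)} pairs left to merge")
```

With 3 relations on lists of size $2^6$, each of the $2^3$ values leaves two sublists of size $2^{6-3} = 2^3$, which mirrors the $2^{50}$ pairs of sublists of size $2^{15}$ discussed above.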
Let us point out that, for each element of the sublists of size $2^{15}$, we are able to compute the yellow (resp. purple) bits of $C^1$ depicted in Figure 10 by computing through $\rho$ and $\pi$. We can also compute the yellow and purple bits in $A^2$, as all the corresponding inputs belong to the same list. We can also deduce from each list the values that, XORed with a value given by the other half, determine the bits marked with L (L means that the associated bits depend on linear relations between both lists).
In order to reduce the cost of this merge below the trivial $2^{30}$ (obtained by trying all the elements of one sublist with all the elements of the other), we have to consider the relations imposed by the known output in $A^3$, which we can trace back to $B^2$.
Merging the two lists $L_1$ and $L_2$ of size $2^{15}$ through parallel matching without memory.
We recall below how this algorithm, detailed in [CNV13], works, but first we have to determine the relations that we consider for the sieving. Figure 11 gives some information about the equations satisfied by some of the known bits of $B^2$. More precisely, we focus on the known bits $e_0, b_1, d_1, a_2, d_2, e_2, b_5, d_5, a_6, d_6, e_6$ and $c_7$. As can be seen in equations (16) to (25), in those bits (or in linear combinations of them) only a small number of known variables intervene in a non-linear way.
For the sake of readability, $x_{ijk}$ denotes the bit $C^1_{i,j,k}$ if it belongs to $L_1$, and $y_{ijk}$ if it belongs to $L_2$. In the following, we explain as an example how we get the equation of $b_5$. Through $\theta$, $b_5$ is expressed as a XOR of bits of $A^2$. However, in this expression only $A^2_{4,4,4}$ involves a non-linear combination of one variable from each list. In other words, there exist two linear functions $\ell_1$ and $\ell_2$, gathering the contributions that depend linearly on $L_1$ and on $L_2$ respectively, so that a single quadratic term mixing both lists remains. We can then define two linear functions $x_{b_5}$ and $y_{b_5}$ accordingly. By doing the same for the other chosen bits of $B^2$, we get the equations used for the sieving.
Figure 10: Representation of the 3-round attack. Each 5 × 5 square represents a slice, each small square is a bit, and each line of squares represents the full state (the 8 slices) at a certain instant. The first state that outputs a value is the one on the top, and the last one is at the bottom. The bits colored in grey, green, red and blue are known bits from the outputs. The bits colored in yellow are known bits from $A_u$ ($L_1$). The bits colored in purple are known bits from $A_v$ ($L_2$). Bits with an L can be computed as a linear combination of values obtained from $A_u$ and from $A_v$. The ones in white represent a quadratic combination of both lists.

Figure 11: Scheme of the exploited quadratic relations between the last output and the unknown bits for the final sieving.

By looking at the equations, we see that many variables appear several times in different equations. For instance, the equations (5), (6) and (7) only involve 14 variables, 7 from $L_1$ and 7 from $L_2$. Moreover, we can linearly combine the equations. Finally, we can also factorize the terms $y_{212}$ and $x_{216}$ in the equations (10) and (12), which gives a system of 10 equations involving in total 21 variables of $L_1$ and 21 variables of $L_2$. As the remaining relations involve more variables, with 2 quadratic terms, the complexity would only increase if we considered more of them. We will explain later that the complexity we obtain is optimal as long as no additional relation depending on one variable exists.
General intuition of the algorithm.
Given two lists $L_1$ and $L_2$, the idea of the algorithm is to find all the pairs of elements, one from each list, that satisfy a certain number $(n_{aux} + n_{rem})$ of bit relations, by testing in parallel $n_{aux}$ relations and the $n_{rem}$ remaining ones. The target complexity is better than the naive one, which tests each element against all the elements of the other list. As the considered relations are not linear, if for each element we enumerated all the possible matches with respect to a certain number of relations, the complexity might quickly exceed the naive bound. This is why we first consider $n_{aux}$ relations, and classify the first list according to the values of the $v_{aux1}$ variables involved. For each such value, we consider all the values of the $v_{aux2}$ involved variables of the second list that match through the $n_{aux}$ relations. With all the elements of the second list that take these values on the $v_{aux2}$ variables, we build an auxiliary list, smaller than the second one, that we sort by the values of the $v_{rem2}$ variables involved in the $n_{rem}$ remaining relations. This has to be done for each value of the $v_{aux1}$ variables.

We can now go back to the sublist of the first list made of the elements associated to a given value of the $v_{aux1}$ variables. We know that these elements and the ones from the auxiliary list already satisfy the $n_{aux}$ first relations. For each element, we go through the values of the $v_{rem2}$ variables that are compatible, through the $n_{rem}$ relations, with its values of the $v_{rem1}$ variables, and check whether they appear in the auxiliary list. Each match gives a pair of elements that satisfies both the $n_{aux}$ relations and the $n_{rem}$ relations. The intuition of the gain can be explained by imagining that the numbers of relations and of variables in each group are balanced: the overhead of trying all the possible matches for one element in the other list is then reduced by a square root (for similar relations). In the particular case we are dealing with here, $v_{aux1} = v_{aux2} = v_{aux}$ and $v_{rem1} = v_{rem2} = v_{rem}$. Note that the final complexity cannot be better than $|L_1| \times |L_2| \times 2^{-n_{aux}-n_{rem}}$, as this is the number of solutions obtained.
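The intuition above can be made concrete with a small Python sketch. It is a toy model written for this explanation, not the implementation of [CNV13]: elements are tuples `(aux_bits, rem_bits, label)`, the two groups of relations are given as hypothetical predicates `aux_ok` and `rem_ok`, and the example relation `bilinear_ok` is an arbitrary quadratic relation chosen only to have something non-linear to match on.

```python
from collections import defaultdict


def parallel_match(list1, list2, aux_ok, rem_ok, v_aux, v_rem):
    """Return all pairs (e1, e2) satisfying both groups of relations.

    aux_ok(x_aux, y_aux) and rem_ok(x_rem, y_rem) tell whether the values of
    the involved variables are compatible with the relations of each group.
    """
    groups1 = defaultdict(list)
    for e1 in list1:
        groups1[e1[0]].append(e1)
    groups2 = defaultdict(list)
    for e2 in list2:
        groups2[e2[0]].append(e2)

    pairs = []
    for x_aux, sublist1 in groups1.items():
        # Auxiliary list: elements of list2 already matching the aux
        # relations for this x_aux, indexed by their rem variables.
        auxiliary = defaultdict(list)
        for y_aux in range(1 << v_aux):
            if aux_ok(x_aux, y_aux):
                for e2 in groups2.get(y_aux, ()):
                    auxiliary[e2[1]].append(e2)
        # Check the remaining relations by looking up compatible values.
        for e1 in sublist1:
            for y_rem in range(1 << v_rem):
                if rem_ok(e1[1], y_rem):
                    for e2 in auxiliary.get(y_rem, ()):
                        pairs.append((e1, e2))
    return pairs


def bilinear_ok(x: int, y: int) -> bool:
    """Toy quadratic relation: the inner product of x and y over GF(2) is 0."""
    return bin(x & y).count("1") % 2 == 0


if __name__ == "__main__":
    list1 = [(i % 8, (3 * i) % 8, i) for i in range(40)]
    list2 = [((5 * i) % 8, i % 8, i) for i in range(40)]
    found = parallel_match(list1, list2, bilinear_ok, bilinear_ok, 3, 3)
    print(len(found), "pairs satisfy both groups of relations")
```

The auxiliary list is rebuilt for every value of the aux variables of the first list and discarded afterwards, so that only one small list is kept at any time.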
We want to merge $L_1$ and $L_2$. In order to apply the parallel matching algorithm, we have to separate the relations in two groups. A number $n_{aux}$ of relations (involving $v_{aux}$ variables from each list) is used for building the auxiliary lists of elements from $L_2$. For each value of the $v_{aux}$ variables of the elements in $L_1$, this auxiliary list contains only the elements from $L_2$ satisfying these $n_{aux}$ relations. The remaining $n_{rem}$ relations, involving $v_{rem}$ variables from each list, are checked later. For each one of the $2^{v_{aux}}$ values of the $v_{aux}$ variables associated to the first $n_{aux}$ relations in $L_1$ (each value defines a sublist of $L_1$), the parallel matching algorithm performs the following:

1. Use the $n_{aux}$ relations to build the auxiliary list with the elements from $L_2$ that are already a match with respect to these relations. About $2^{v_{aux}-n_{aux}}$ values for the

5 Improved attack using 4 output blocks with a rate of 32 bits

In this section we describe an improved version of the divide-and-conquer attacks of section 4, which allows us to build an attack when considering a rate of $r = 32$ bits. The number of expected remaining candidates is equal to $2^{200-4\times r} = 2^{72}$, which is still lower than $2^{96}$. Moreover, techniques similar to the ones used in section 4 can be applied to mount an attack of complexity $2^{92}$.
New ideas used for the attack. Our attack on the initial Ketje Jr with rate 32 relies on two new ideas. First, notice that the 32 bits known to the adversary from each state $A^0, \ldots, A^3$ are 4 of the 5 output bits of the 8 Sboxes of the sheet $x = 0$. We show that we can partially invert such Sbox layers and derive useful information on the state $B^2$ from the known bits of $A^3$. Our second idea is that we can reduce the number of nonlinear interactions between the guesses $u$ and $v$ in $B^2$ by guessing some bits of $C^1$ before applying the divide-and-conquer algorithm.
5.1 Using known information from $A^0$, $A^1$ and $A^2$.
If we consider the red bits of Figure 10, which correspond to the known bits of $A^1$, nothing changes with respect to the previous attack with a rate $r = 40$, except that the size of both lists increases by a factor $2^4$: half of the 8 bits that are now unknown are located in each of the parts $A_u$ and $A_v$ of the state, and need to be guessed while building our lists. The 32 known bits from $A^0$ can be computed as linear relations between the bits of the two lists. Hence there are no longer 40 linear relations sieving in the list-merging algorithm, but only 32.
In our attacks with a rate of 40, we need to recover 40 bits of $B^1$ from the 40 known bits of $A^2$. To transpose this to a rate of 32, we guess the 8 missing bits before starting our attack and apply the same strategy, so that the green part sieves the same information on the 2 lists. This increases the complexity of the attack by a factor $2^8$, but reduces the size of both lists by $2^4$.
Guessing 4 more bits. We have now guessed 28 bits in total. The size of the lists becomes $2^{100} \times 2^{-16} \times 2^{5} \times 2^{-20} \times 2^{-10} = 2^{59}$, where $2^{100}$ corresponds to half of the state, $2^{-16}$ to the known bits in $A^1$, $2^{5}$ to the parity bits of the columns, $2^{-20}$ to the known bits in $A^2$ (8 of which are guessed) and $2^{-10}$ to half of the bits we guess to get linear equations. The final cost of the algorithm is then $(2^{28} \times 2^{59}) + 2^{28} \times 2^{2\times59} \times 2^{-32} \times 2^{-10} \times 2^{-12}$, which is $2^{87} + 2^{92}$. Both lists are of size $2^{59}$ and we have 54 linear relations, hence it is of interest to guess a few more bits so that both terms become equal.
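The exponents in the cost estimate above are easy to get wrong by hand, so the following Python sketch recomputes them. The function names are ours, and the grouping of the 54 relations into 32, 10 and 12 simply follows the exponents appearing in the formula.

```python
import math


def log2_sum(*exponents: float) -> float:
    """log2 of a sum of powers of two, given the exponents of the terms."""
    top = max(exponents)
    return top + math.log2(sum(2.0 ** (e - top) for e in exponents))


def rate32_cost():
    """Return (log2 list size, log2 list-building cost, log2 merging cost)."""
    guessed = 28
    # half state, known bits of A^1, column parities, known bits of A^2,
    # half of the bits guessed to get linear equations
    list_size = 100 - 16 + 5 - 20 - 10
    build = guessed + list_size
    merge = guessed + 2 * list_size - 32 - 10 - 12
    return list_size, build, merge


if __name__ == "__main__":
    list_size, build, merge = rate32_cost()
    print(f"list size 2^{list_size}, building 2^{build}, merging 2^{merge}")
    print(f"total about 2^{log2_sum(build, merge):.2f}")
```

The merging term dominates by a factor $2^5$, which is consistent with the remark that a few additional guesses could balance the two terms.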

Conclusion
These attacks do not pose a threat to Ketje Jr instantiated with the recommended parameters. In particular, the attacks against 4 consecutive output blocks do not work with the twisted permutation, and therefore the tweak seems to make the primitive more resistant to divide-and-conquer attacks. However, our attacks provide a new non-trivial limit on the rate that can be output.

Figure 1: Computation of one output bit of θ

Figure 4: Summary of our basic state-recovery attack

Figure 6: Divide-and-conquer attack against initial Ketje Jr with rate 40 bits with guessing of parity bits.

Figure 7: Representation of 3 rounds of Ketje. Each colored part corresponds to lanes that can be directly computed from the 4 known output blocks.

Figure 8: ρ-shift offsets, that is the positions in each lane where bits of slice z = 0 before the application of ρπ are relocated.

Table 1: Complexities of our divide-and-conquer attack with nonlinear sieving

Table 2: Attack complexity for different choices of guessed slices. Guess is the number of bits the adversary has to guess and Rel. is the number of new sieving relations he gets. $T_{search}$ is the complexity of the exhaustive search on the remaining state values after the sieving.