Boolean functions with restricted input and their robustness; application to the FLIP cipher

We study the main cryptographic features of Boolean functions (balancedness, nonlinearity, algebraic immunity) when, for a given number $n$ of variables, the input to these functions is restricted to some subset $E$ of $\mathbb{F}_2^n$. We study in particular the case where $E$ equals the set of vectors of fixed Hamming weight, which plays a role in the FLIP stream cipher, and we study the robustness of the Boolean function in this cipher.


Introduction
In a cryptographic framework, Boolean functions are classically studied with an input ranging over the whole vector space $\mathbb{F}_2^n$ of binary vectors of some length $n$. This is the case when Boolean functions are used as the (main) nonlinear components of a stream cipher, in the so-called combiner and filter models of pseudo-random generators. However, it can happen that the function is in fact restricted to a subset (say $E$) of $\mathbb{F}_2^n$. A recent example of such a situation is given by the cipher FLIP (see [MJSC16]).

FLIP: filtering a constant Hamming weight register
The cipher FLIP is a recently proposed encryption scheme. It is specifically designed to be combined with a homomorphic encryption scheme, in order to improve the efficiency of somewhat homomorphic encryption frameworks. As for Kreyvium [CCF+16] and LowMC [ARS+15], the goal of the cipher is to offer a decryption algorithm whose homomorphic evaluation is as insignificant as possible in terms of homomorphic error growth. This homomorphic-friendly design requires drastically reducing the multiplicative depth of the decryption circuit and, in the case of FLIP, it led to the use of a non-generic construction: the filter permutator. This symmetric primitive consists in updating a key register only by wire-cross permutations and then in filtering it with a Boolean function which takes the whole register as input to generate the keystream. At each clock cycle, the wire-cross permutation used to shuffle the secret key is given by the output of a PRNG. Since the PRNG seed acts as an IV, at each clock cycle the input to the filtering function is only a reordering of the secret key bits.
This specificity produces an unusual situation for stream ciphers: the Hamming weight of the key register is invariant, equal to the Hamming weight of the secret key. For the instances of FLIP, the Hamming weight of the secret key is set to $n/2$, where $n$ is the size of the secret key, which is much larger than usual key sizes (> 500 and > 1000 bits for security parameters 80 and 128). These sizes prevent an exhaustive search, but the input of the filtering function is still restricted to the vectors of $\mathbb{F}_2^n$ of Hamming weight $n/2$. This raises a very natural question, which was not addressed in [MJSC16] but which is mandatory when evaluating the security: does the filtering function maintain good behavior on this restricted input with respect to classical attacks? The Boolean criteria commonly used to study the robustness of a filtering function are always considered on the whole space $\mathbb{F}_2^n$; hence they do not apply to restricted inputs. Therefore, in this work we study Boolean functions on restricted inputs, focusing on criteria adapted to restricted sets.

Boolean criteria on restricted sets
Let us begin with a preliminary remark: for the FLIP family of stream ciphers, the divide-and-conquer technique introduced by Siegenthaler [Sie84, Sie85] does not seem to apply. Siegenthaler's attack applies to a combination of several generators filtered by a Boolean function, when there is a correlation between the output of the function and some of its input coordinates; this allows an exhaustive search reduced to the outputs of the corresponding generators, without needing to consider the outputs of the other generators. To withstand the attack, the function needs in such a framework to have a large resilience order. However, in the FLIP family of stream ciphers, a permutation is applied at each clock cycle to only one register. It then seems very difficult to find a bias between the output of the function and a fixed set of input variables. More generally, it seems very difficult to apply Siegenthaler's attack to ciphers in which a filter function applies to a restricted input, because the principle of the correlation attack, as explained above, is to make an exhaustive search on some part of the initial state without having any restriction on the rest of the state; the restriction (like fixing the Hamming weight) imposes a dependence between the two parts. Consequently we do not study the resilience of "restricted Boolean functions". But all the other classical features of Boolean functions (namely balancedness, algebraic immunity and nonlinearity) continue to play a direct role with respect to attacks in this new framework. However, their behavior changes because of the restriction on the input.

Balancedness
A first commonly accepted requirement on cryptographic Boolean functions is to be balanced, or at least almost balanced, since otherwise, if there is a fairly big bias in the output distribution of the function, then the attacker could detect the resulting statistical bias between the plaintext and the ciphertext, allowing to distinguish when two texts of the same length have high probability to be a plaintext and the corresponding ciphertext. We shall then focus on those functions which are balanced on the input set $E$. But since $E$ may change in the process (this is not the case in FLIP but it could be in a variant), we are interested in Boolean functions whose restrictions to all sets $E$ in some family $\mathcal{E}$ are balanced. Even if $E$ does not change, we may wish to have a Boolean function which is balanced on a family of sets $\mathcal{E}$, so that it can be used in a variety of situations. Given some family $\mathcal{E}$ of subsets of $\mathbb{F}_2^n$, we shall say that a Boolean function $f$ is perfectly balanced over $\mathcal{E}$ if its restriction to any set $E \in \mathcal{E}$ of even size is balanced. We shall be in particular interested in the case of $\mathcal{E} = \{E_{n,1}, \dots, E_{n,n-1}\}$, where $E_{n,k} = \{x \in \mathbb{F}_2^n;\ w_H(x) = k\}$, $w_H$ denoting the Hamming weight. We shall then call such functions weightwise perfectly balanced.
Notation 1. We denote by $w_H(f)_k$ the Hamming weight of the evaluation vector of the function $f$ on all the entries of fixed Hamming weight $k$: $w_H(f)_k = |\{x \in \mathbb{F}_2^n,\ w_H(x) = k,\ f(x) = 1\}|$, where $w_H$ denotes the Hamming weight. We accordingly denote $\overline{w}_H(f)_k = |\{x,\ w_H(x) = k,\ f(x) = 0\}| = \binom{n}{k} - w_H(f)_k$. We denote by $E_{n,k}$ the set of such entries: $E_{n,k} = \{x \in \mathbb{F}_2^n;\ w_H(x) = k\}$.
Definition 1. Let $f$ be a Boolean function defined over $\mathbb{F}_2^n$. It will be called weightwise perfectly balanced (WPB) if, for every $k \in \{1, \dots, n-1\}$, the restriction of $f$ to $E_{n,k}$ is balanced, that is, $w_H(f)_k = \binom{n}{k}/2$. To make the function balanced on its whole domain $\mathbb{F}_2^n$, we shall additionally impose that $f(0, \dots, 0) \neq f(1, \dots, 1)$ and more precisely that $f(0, \dots, 0) = 0$ and $f(1, \dots, 1) = 1$. This last constraint does not reduce the generality (when $f(0, \dots, 0) \neq f(1, \dots, 1)$), up to the addition of the constant 1 to $f$, and it makes some constructions clearer.
Note that weightwise perfectly balanced Boolean functions exist only if, for every $k \in [1, n-1]$, $\binom{n}{k}$ is even, and this property is satisfied if and only if $n$ is a power of 2. Writing $n = 2^\ell$, note that $w_H(f)_k = \binom{n}{k}/2$ is then even for $k \in [1, \dots, 2^{\ell-1}-1] \cup [2^{\ell-1}+1, \dots, 2^\ell - 1]$ and odd for $k = 2^{\ell-1} = n/2$. To be able to address the case where $n$ is not a power of 2, we introduce:
Definition 2. Let $f$ be a Boolean function defined over $\mathbb{F}_2^n$. It will be called weightwise almost perfectly balanced (WAPB) if, for every $k \in [1, n-1]$, $w_H(f)_k = \binom{n}{k}/2$ when $\binom{n}{k}$ is even and $w_H(f)_k = \left(\binom{n}{k} \pm 1\right)/2$ when $\binom{n}{k}$ is odd.
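The definitions above can be checked by exhaustive enumeration for small $n$. The following is a minimal Python sketch; the helper names and the two example functions (in particular the 4-variable function `f4`) are illustrative assumptions and are not taken from the paper.

```python
from itertools import product
from math import comb

def weightwise_counts(f, n):
    """Return [w_H(f)_0, ..., w_H(f)_n]: number of weight-k inputs x with f(x) = 1."""
    counts = [0] * (n + 1)
    for x in product((0, 1), repeat=n):
        counts[sum(x)] += f(x)
    return counts

def is_wpb(f, n):
    """Definition 1: w_H(f)_k = C(n,k)/2 for every k in 1..n-1."""
    c = weightwise_counts(f, n)
    return all(2 * c[k] == comb(n, k) for k in range(1, n))

def is_wapb(f, n):
    """Definition 2: w_H(f)_k = C(n,k)/2 or (C(n,k) +/- 1)/2 for every k in 1..n-1."""
    c = weightwise_counts(f, n)
    return all(abs(2 * c[k] - comb(n, k)) <= 1 for k in range(1, n))

# Hypothetical examples (x is a tuple of bits, x[0] standing for x_1).
f4 = lambda x: x[0] ^ x[2] ^ (x[0] & x[1])   # x1 + x3 + x1*x2 on 4 variables
sigma1 = lambda x: sum(x) & 1                # constant on every slice E_{n,k}

print(weightwise_counts(f4, 4), is_wpb(f4, 4), is_wpb(sigma1, 4))
```

The enumeration costs $2^n$ evaluations, so it is only meant for sanity checks on small instances.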

Nonlinearity
A second parameter, which plays an important role for quantifying the contribution of the function to the resistance against attacks by affine approximations, like the fast correlation attack [MS88b], is the minimum of the Hamming distance $d_H(f, h) = |\{x \in \mathbb{F}_2^n;\ f(x) \neq h(x)\}|$ between $f(x)$ and affine functions $h(x) = a \cdot x + \varepsilon$, $a \in \mathbb{F}_2^n$, $\varepsilon \in \mathbb{F}_2$ (where "$\cdot$" is some inner product in $\mathbb{F}_2^n$; any choice of an inner product will give the same definition). This parameter is called the nonlinearity of the function, and we shall denote it by $NL(f)$ when there is no restriction on the input to $f$, and $NL_E(f)$ when the input to $f$ is taken from a set $E$.
Let $E$ be any subset of $\mathbb{F}_2^n$ and $f$ any Boolean function defined over $E$ (i.e. any function from $E$ to $\mathbb{F}_2$). Let $\ell(x) = a \cdot x + \varepsilon$ be any affine function. Denoting by $f_a(x)$ the sum (in $\mathbb{F}_2$) of $f(x)$ and $a \cdot x$, we have: $\sum_{x \in E} (-1)^{f_a(x)} = |E| - 2\,|\{x \in E;\ f_a(x) = 1\}|$, and the Hamming distance between $f$ and $a \cdot x$ on inputs ranging over $E$ equals $\frac{|E|}{2} - \frac{1}{2}\sum_{x \in E} (-1)^{f(x) + a \cdot x}$. Hence, the Hamming distance between $f$ and $\ell$ over $E$ equals: $\frac{|E|}{2} - \frac{(-1)^{\varepsilon}}{2}\sum_{x \in E} (-1)^{f(x) + a \cdot x}$.
Definition 3. Let $E$ be any subset of $\mathbb{F}_2^n$ and $f$ any Boolean function defined over $E$. We call nonlinearity of $f$ over $E$, and denote by $NL_E(f)$, the minimum Hamming distance between $f$ and the restrictions to $E$ of affine functions over $\mathbb{F}_2^n$.
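As a small illustration of Definition 3, the sketch below computes $NL_E(f)$ in two ways: through the character sums $\sum_{x \in E}(-1)^{f(x)+a\cdot x}$ and directly as a minimum Hamming distance to affine functions. It is a brute-force sketch under our own naming; the test function $x_1x_2 + x_3x_4$ is an illustrative choice, not an example from the paper.

```python
from itertools import product

def dot(a, x):
    """Usual inner product a.x in F_2."""
    return sum(ai & xi for ai, xi in zip(a, x)) & 1

def hamming_slice(n, k):
    """E_{n,k}: all binary vectors of length n and Hamming weight k."""
    return [x for x in product((0, 1), repeat=n) if sum(x) == k]

def nl_restricted(f, n, E):
    """|E|/2 - (1/2) max_a |sum_{x in E} (-1)^(f(x) + a.x)|."""
    best = max(abs(sum((-1) ** (f(x) ^ dot(a, x)) for x in E))
               for a in product((0, 1), repeat=n))
    return (len(E) - best) // 2

def nl_by_distance(f, n, E):
    """Minimum over (a, eps) of the Hamming distance between f and a.x + eps on E."""
    return min(sum(f(x) != (dot(a, x) ^ eps) for x in E)
               for a in product((0, 1), repeat=n) for eps in (0, 1))

f = lambda x: (x[0] & x[1]) ^ (x[2] & x[3])   # illustrative bent function on 4 variables
full_space = list(product((0, 1), repeat=4))
print(nl_restricted(f, 4, full_space))
print([nl_restricted(f, 4, hamming_slice(4, k)) for k in range(5)])
```

Both functions agree because the distance to $a \cdot x + \varepsilon$ only depends on the sign and magnitude of the corresponding character sum.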

Algebraic immunity
A third parameter plays a role for quantifying the contribution of the function to the resistance against algebraic attacks, giving the degree of the algebraic system obtained by the Courtois-Meier method [CM03] (which needs to be solved for recovering the initialization of the register). It is called the algebraic immunity of the function; we shall denote it by $AI(f)$ when there is no restriction on the input to $f$, and $AI_E(f)$ when the input to $f$ is taken from a set $E$.
Let $E$ be any subset of $\mathbb{F}_2^n$ and $f$ any Boolean function defined over $E$. The principle of the algebraic attack is to use the existence of Boolean functions $g$ and $h$ over $\mathbb{F}_2^n$, such that $h$ and $gf$ coincide over $E$, while $g$ is not identically null on $E$. In the case of the standard attack, both functions $g$ and $h$ must have low algebraic degree, and in the case of the fast algebraic attack, $g$ must have low algebraic degree and $h$ must have an algebraic degree reasonably low (say, not much larger than $n/2$). The algebraic immunity of $f$ is then defined as $AI(f) = \min\{\max(\deg(g), \deg(fg));\ g \neq 0\}$ and equals $\min\{\deg(g);\ fg = 0 \text{ or } (f+1)g = 0;\ g \neq 0\}$, because if $fg = h$ then $f(g+h) = 0$. This allows us to define the algebraic immunity over a restricted set:
Definition 4. We call algebraic immunity of a function $f$ over a set $E$ the number: $\min\{\deg(g);\ g \text{ annihilator of } f \text{ or of } f+1 \text{ over } E \text{ and } g \text{ not identically null over } E\}$.
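Definition 4 can be evaluated by linear algebra over $\mathbb{F}_2$ for small $n$: a function $g$ of degree at most $d$ that annihilates $f$ on $E$ without being null on $E$ exists exactly when the monomials of degree at most $d$ have a smaller rank on $E \cap \{f = 1\}$ than on $E$. The sketch below follows this observation; the helper names are ours, and the 3-variable majority function is used only as an illustration.

```python
from itertools import combinations, product

def gf2_rank(rows):
    """Rank over F_2 of a list of integers seen as bit vectors."""
    basis = {}  # leading-bit position -> reduced row with that leading bit
    for r in rows:
        while r:
            top = r.bit_length() - 1
            if top not in basis:
                basis[top] = r
                break
            r ^= basis[top]
    return len(basis)

def monomial_row(x, n, d):
    """Bit vector of the values at x of all monomials of degree <= d."""
    row = 0
    for deg in range(d + 1):
        for I in combinations(range(n), deg):
            row = (row << 1) | int(all(x[i] for i in I))
    return row

def ai_restricted(f, n, E):
    """Smallest d such that some g of degree <= d, not null on E,
    annihilates f or f + 1 on E (E is assumed non-empty)."""
    for d in range(n + 1):
        full = gf2_rank([monomial_row(x, n, d) for x in E])
        for bit in (1, 0):  # bit = 1: annihilators of f; bit = 0: annihilators of f + 1
            if gf2_rank([monomial_row(x, n, d) for x in E if f(x) == bit]) < full:
                return d
    return n

maj3 = lambda x: int(sum(x) >= 2)
space = list(product((0, 1), repeat=3))
print(ai_restricted(maj3, 3, space))
print([ai_restricted(maj3, 3, [x for x in space if sum(x) == k]) for k in range(4)])
```

On the whole space the majority function reaches degree 2, while on each set $E_{3,k}$ it is constant and the constant function 1 already annihilates $f$ or $f+1$.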

Previous works
The study of the robustness of Boolean functions through these criteria has been widely applied to the security analysis of stream ciphers, and the corresponding attacks are regarded as the standard attacks to consider for any stream cipher. In that sense, many works consider the three Boolean criteria presented above for the particular case of a design they introduce or of the cryptanalysis they develop. More specifically, some papers have already considered Boolean functions whose inputs are restricted.
The bias of a stream cipher output in the presence of Hamming weight leakage is considered in [JD06]. Precisely, it is shown that knowing the Hamming weight of a register, when the updating function is an LFSR in a particular representation, makes it possible to distinguish the keystream from a random binary stream, and the authors also describe a correlation attack in this setting. This result can be deduced from an application of a subpart of our results: they use the balancedness flaw of a function on the sets $\{x \mid w_H(x) = k\}$ (the fact that this function is not weightwise perfectly balanced), and combine it with other equations to mount a correlation attack on these LFSRs.
Concerning the algebraic immunity criterion, the work [CFGR12] carries out a theoretical study of the algebraic phase of the so-called algebraic side-channel attacks on block ciphers. The authors modify the notion of algebraic immunity to include the information (on Hamming weight or Hamming distance) obtained by exploiting the leakage, and are able to obtain enough equations of degree one to solve the algebraic system with Gröbner methods. In the present paper, our modification of the definition of algebraic immunity is also related to Hamming weight but is of a different nature, being related to the fact that the input is restricted. Another major difference is that we focus on functions with one bit of output and not on S-boxes.
A study has been made by Yuval Filmus et al. of the restrictions of Boolean functions to sets of inputs of fixed Hamming weight (which they call "slices") [Fil16a, Fil16b, FKMW16, FM16]; this study is asymptotic and does not really fit our cryptographic framework; the results from these papers have no overlap with ours.
Finally, the nonlinearity of Boolean functions under non-uniform input distribution has been recently studied in [GGPS17], but the chosen distribution is binomial and there is no overlap with our work in this case either.

Our contributions
We carry out the first study of balancedness, nonlinearity and algebraic immunity of Boolean functions with restricted inputs, centred on the case of inputs of fixed Hamming weight. In this case we study the degradation of the parameters for functions that are optimal over the whole space and commonly used in basic cryptographic constructions. More surprisingly, we determine bent functions which are linear for every restricted Hamming weight and hence have null $NL_{E_{n,k}}$ for every $k$; and, relatively to algebraic immunity, we prove counter-intuitive results for direct sums (used in the design of the FLIP Boolean function).
Then, for each criterion, we compare its behavior in this constrained framework to the properties well known in $\mathbb{F}_2^n$, and we consider the functions with the highest criterion on $E_{n,k} = \{x \mid w_H(x) = k\}$ that we can construct. More precisely, for the balancedness criterion, we prove necessary conditions on the Algebraic Normal Form of $f$ to be weightwise perfectly balanced and we give a primary construction (i.e. we exhibit a class) of such functions and a secondary construction for designing them. Since weightwise perfectly balanced functions can exist only in numbers of variables which are powers of 2, we also give a construction of weightwise almost perfectly balanced functions for all $n$ and present a relation between balancedness on fixed weight inputs and a transform similar to the Walsh transform involving symmetric functions. For the nonlinearity criterion, we give for every subset $E$ of $\mathbb{F}_2^n$ an upper bound for those functions restricted to $E$ and show that, contrary to the case of $\mathbb{F}_2^n$, this bound (related to bent functions) cannot be reached for most $E_{n,k}$. We use an error correcting code perspective to construct functions with non-null $NL_{E_{n,k}}$ for all $k \in [1, n]$. For the algebraic immunity criterion, we generalise the upper bound to every subset $E$ of $\mathbb{F}_2^n$ and give precise results in the constant Hamming weight case, showing how the general algebraic immunity can decrease on $E_{n,k}$.
We complement this study with a cryptanalytic aspect by analyzing the 4 instances of the cipher FLIP. For these functions we prove bounds for the three main criteria, also considering possible attack improvements with guess-and-determine attacks. We provide a new security analysis of this cipher, based on a filtering function with fixed Hamming weight input.

Paper organisation
In Section 2 we show how much the restriction to inputs of fixed Hamming weight can influence the cryptographic criteria. Then Section 3 concerns the behavior of the criteria of balancedness, nonlinearity and algebraic immunity on restricted inputs, and finally Section 4 presents the security analysis of FLIP with fixed Hamming weight input.

Fixed Hamming weight inputs and criteria degradations
In this section we show how the restriction to inputs of fixed Hamming weight can affect the cryptographic criteria. Because of this restriction, some functions which are known to have optimally good cryptographic properties when they are defined over the whole space totally lose these properties when their input becomes restricted. This is trivially the case of so-called symmetric functions (whose output depends only on the Hamming weight of the input) when the input weight becomes restricted, like the majority function (which has optimal algebraic immunity over $\mathbb{F}_2^n$). But there are other examples. In what follows we exhibit some functions that are highly degraded by a weightwise restriction.

Balancedness degradation in fixed input weight framework
First we consider the behavior of balanced functions, or highly resilient functions, compared to weightwise perfectly balanced functions as defined in Section 1.2.1. Weightwise perfect balancedness implies balancedness over the whole of $\mathbb{F}_2^n$ whereas the converse is false, as illustrated by the next remark on highly resilient functions.
Remark 1. For all $n \geq 2$, there exists an $(n-1)$-resilient function (i.e. a balanced Boolean function which remains balanced when at most $n-1$ of its variables are arbitrarily fixed) which is unbalanced for every weight $k \in [1, n-1]$. Indeed, the first elementary symmetric Boolean function $\sigma_1(x) = \sum_{i=1}^{n} x_i = w_H(x) \pmod 2$ is $(n-1)$-resilient and is constant on all inputs of fixed weight; its weightwise restrictions are as unbalanced as possible.

Nonlinearity degradation in fixed input weight framework
Fixing the input Hamming weight may deteriorate in an extreme way the nonlinearity of a Boolean function.
Proposition 1. For every even $n$, there exist $n$-variable bent functions $f$ such that, for every $k = 0, \dots, n$, $NL_{E_{n,k}}(f) = 0$.
Proof: This is for instance the case of the function $f(x) = \binom{w_H(x)}{2} \pmod 2 = \sum_{1 \leq i < j \leq n} x_i x_j$. This function is, up to the addition of an affine function, the only bent symmetric function (see e.g. [Car10]). Since it is symmetric, fixing the Hamming weight of its input makes it constant and therefore with null nonlinearity.
More generally, it would be interesting to characterize those bent functions whose restrictions to $E_{n,k}$ have null nonlinearity (i.e. are affine), for every $k$. This task seems very difficult but we are able to achieve it in the particular case of quadratic functions. We begin with an observation:
Remark 2. A Boolean function satisfies $NL_{E_{n,k}}(f) = 0$ for every $k$, i.e. has all its restrictions to $E_{n,k}$ affine, if and only if there exist symmetric Boolean functions $\varphi_0, \varphi_1, \dots, \varphi_n$ such that $f(x) = \varphi_0(x) + \sum_{i=1}^{n} \varphi_i(x)\, x_i$. Any symmetric Boolean function $\varphi(x)$ can be written in the form $\ell \circ \Sigma(x)$ where $\ell$ is affine and $\Sigma$ is the vectorial $(n, n)$-function whose $i$th coordinate function is the elementary symmetric function $\sigma_i(x) = \sum_{1 \leq j_1 < \dots < j_i \leq n} \prod_{l=1}^{i} x_{j_l}$. We deduce that $f$ satisfies $NL_{E_{n,k}}(f) = 0$ for every $k$ if and only if it has the form $f(x) = \ell_0 \circ \Sigma(x) + \sum_{i=1}^{n} \ell_i \circ \Sigma(x)\, x_i$, where the $\ell_i$'s are affine. In other words (after gathering all the terms in this expression which involve each elementary symmetric function $\sigma_i$): $f(x) = \ell'_0(x) + \sum_{i=1}^{n} \sigma_i(x)\, \ell'_i(x)$, where the $\ell'_i$'s are all affine.
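The statement can be checked numerically for small even $n$. The sketch below (our own helper names; the flat Walsh spectrum $|W_f(a)| = 2^{n/2}$ is the standard characterization of bentness) verifies that $\sigma_2$ is bent for $n = 4$ and $n = 6$ and constant on every set $E_{n,k}$.

```python
from itertools import product

def sigma2(x):
    """Symmetric quadratic function: binom(w_H(x), 2) mod 2 = sum_{i<j} x_i x_j."""
    w = sum(x)
    return (w * (w - 1) // 2) % 2

def walsh(f, n, a):
    """Walsh transform W_f(a) = sum_x (-1)^(f(x) + a.x) over the whole space."""
    return sum((-1) ** (f(x) ^ (sum(ai & xi for ai, xi in zip(a, x)) & 1))
               for x in product((0, 1), repeat=n))

def is_bent(f, n):
    """Bent: every Walsh coefficient has absolute value 2^(n/2)."""
    return all(abs(walsh(f, n, a)) == 2 ** (n // 2)
               for a in product((0, 1), repeat=n))

def constant_on_slices(f, n):
    """True if f takes a single value on each E_{n,k}."""
    values = {}
    for x in product((0, 1), repeat=n):
        values.setdefault(sum(x), set()).add(f(x))
    return all(len(v) == 1 for v in values.values())

print(is_bent(sigma2, 4), is_bent(sigma2, 6), constant_on_slices(sigma2, 6))
```

A function that is constant on $E_{n,k}$ coincides there with a constant (hence affine) function, which gives $NL_{E_{n,k}}(f) = 0$.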
Proof: According to Remark 2, a quadratic function satisfies NL E n,k (f ) = 0 for every k if and only if, up to the addition of an affine function, it has the form: where is linear and ∈ F 2 .The symplectic form associated (x, y) → f (x + y) + f (x) + f (y) + f (0) (see e.g.[Car10]) equals: of this symplectic form is the vector space of equations: x j + j =i x j = 0, where i ranges from 1 to n.The sum L i + L i of two of these equations equals x j = 0 otherwise.Hence, denoting we have that, if = 1, then all the coordinates of indices i ∈ I of an element of E are equal to some bit η and all those such that i ∈ I c are equal to η + n j=1 x j , and if = 0, there is no condition on x ∈ E if I = ∅ or I = {1, . . ., n} and if I = ∅, {1, . . ., n}, the condition is n j=1 x j = 0. We then have two cases: then either all x i 's are null, in which case (L i ) is satisfied, or all are equal to 1, in which case (L i ) becomes (since n is even) (1, . . ., 1) = 1; hence, if this latter equality is true (i.e. if I has odd cardinality), E = {0}; if = 0 then all equations L i are equal to (x) = 0; then E = {0} unless the hyperplane ker has a trivial intersection with the hyperplane of equation n j=1 x j = 0, which is possible only if n = 2; the case = 0 is then compatible with f bent only for n = 2; we shall not consider it anymore.
• if x ∈ E is such that n j=1 x j = 1 then if = 1, all x i 's such that i ∈ I are equal to η and those x i 's such that i ∈ I c are equal to η + 1, which implies η|I| + (η + 1)|I c | = |I c | = 1 (mod 2); hence I has odd cardinality and we have seen that E = {0} in such case.
Two $n$-variable Boolean functions $f$ and $g$ are called EA-equivalent if there exist an affine automorphism $L$ over $\mathbb{F}_2^n$ and an affine $n$-variable function $\ell$ such that $f = g \circ L + \ell$. All the functions above are EA-equivalent to each other, since all quadratic bent functions are EA-equivalent to each other, but EA-equivalence does not preserve the Hamming weight, so the nonlinearity degradation under weightwise restriction cannot be studied equivalence class by equivalence class.

Algebraic immunity degradation in fixed input weight framework
The majority function is well known for its optimal algebraic immunity and, like all symmetric functions, is constant on all inputs of the same Hamming weight. Therefore it is a trivial example where the algebraic immunity collapses in our context. To go further we investigate the algebraic immunity of direct sums of functions.
The so-called direct sum is a well-known secondary construction of Boolean functions which, on the entire space $\mathbb{F}_2^n$, guarantees some algebraic immunity for a function built from two functions in a smaller number of variables. We prove here that it behaves differently when the inputs are restricted to a fixed Hamming weight.
Definition 5 (Direct Sum). Let $f$ be a Boolean function of $n$ variables and $g$ a Boolean function of $m$ variables, $f$ and $g$ depending on distinct variables. The direct sum $h$ of $f$ and $g$ is defined by: $h(x, y) = f(x) + g(y)$, where $x \in \mathbb{F}_2^n$ and $y \in \mathbb{F}_2^m$.
Theorem 1 (Link between $AI_k$ and $AI$ in direct sum). Let $F$ be the direct sum of $f$ and $g$ with $n$ and $m$ variables respectively. Let $k$ be such that $n \leq k \leq m$. Then the following relation holds: $AI_k(F) \geq AI(f) - \deg(g)$.
Proof: Let $h(x, y)$ be a non-null annihilator of $F$ over $E_{n+m,k}$. Let $(a, b) \in \mathbb{F}_2^{n+m}$ have Hamming weight $k$ and be such that $h(a, b) = 1$. Since $(a, b)$ has Hamming weight $k$, we may, up to changing the order of the coordinates of $b$ (and without loss of generality), assume that for every $j = 1, \dots, n$, we have $b_j = a_j + 1$ and for every $j = n+1, \dots, k$, we have $b_j = 1$ (so that for every $j = k+1, \dots, m$, we have $b_j = 0$). We define the following affine function over $\mathbb{F}_2^n$: $L(x) = (x_1 + 1, \dots, x_n + 1, 1, \dots, 1, 0, \dots, 0)$, where the length of the part "$1, \dots, 1$" equals $k - n$. We have $L(a) = b$, and $(x, L(x))$ has Hamming weight $k$ for every $x \in \mathbb{F}_2^n$, so that $(f(x) + g(L(x)))\, h(x, L(x)) = 0$ for every $x$. If $g(b) = 0$, then the function $h(x, L(x))\,(g(L(x)) + 1)$ is a non-zero annihilator of $f$ and has algebraic degree at most $\deg(h) + \deg(g)$; then we have $\deg(h) + \deg(g) \geq AI(f)$. If $g(b) = 1$, then by applying the same reasoning to $f + 1$ instead of $f$ and $g + 1$ instead of $g$, we have $\deg(h) + \deg(g) \geq AI(f)$. If $h(x, y)$ is a non-null annihilator of $F + 1$ over $E_{n+m,k}$, we have the same conclusion by replacing $f$ by $f + 1$ or $g$ by $g + 1$. This completes the proof.
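The key step of the proof is that an affine map of the form $x \mapsto (x_1 + 1, \dots, x_n + 1, 1, \dots, 1, 0, \dots, 0)$, with $k - n$ ones, sends every $x \in \mathbb{F}_2^n$ to a vector $y \in \mathbb{F}_2^m$ such that $(x, y)$ has Hamming weight exactly $k$. The following sketch checks this property for small parameters; the function name `make_L` and the chosen values of $n$, $m$, $k$ are assumptions made for illustration.

```python
from itertools import product

def make_L(n, m, k):
    """Affine map F_2^n -> F_2^m: complement x, then append k-n ones and m-k zeros.
    Requires n <= k <= m, as in the hypothesis of Theorem 1."""
    assert n <= k <= m
    def L(x):
        return tuple(xi ^ 1 for xi in x) + (1,) * (k - n) + (0,) * (m - k)
    return L

n, m, k = 3, 7, 5
L = make_L(n, m, k)

# (x, L(x)) always lies in E_{n+m,k}: w_H(x) + (n - w_H(x)) + (k - n) = k.
weights = {sum(x) + sum(L(x)) for x in product((0, 1), repeat=n)}
print(weights)
```

Since each coordinate of $L$ has degree at most 1, composing $h$ or $g$ with $L$ does not increase their algebraic degree, which is what the degree bound $\deg(h) + \deg(g)$ relies on.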
This bound proves in particular that, if $k \geq n$, then adding $m \geq k$ virtual variables to a function (taking $g = 0$) does not lower the algebraic immunity with inputs of Hamming weight $k$ with respect to the (global) original algebraic immunity. This was already true (with no condition on $n$, $k$, $m$) when dealing with functions with no restriction on the input and it was completely straightforward to prove it, while here it was less obvious. Note that the bound of Theorem 1 is tight when $\deg(g) = 0$: take for $f$ a function whose algebraic immunity equals its algebraic degree; we have then that $AI_k(F)$ equals $AI(f) = \deg(f)$, since it cannot be larger than the algebraic degree of $f$ over $E_{n,k}$ (formally proved in Corollary 5), which is at most equal to $\deg(f)$; the three parameters $AI_k(F)$, the algebraic degree of $f$ over $E_{n,k}$ and $\deg(f)$ are then equal.
Nevertheless, the bound of Theorem 1 also suggests that making the direct sum with a non-constant Boolean function $g$ may lower the algebraic immunity over inputs of Hamming weight $k$ with respect to the (global) original algebraic immunity. This may seem rather counter-intuitive, but it is true. Let us give an example: take , and $x_2$ being an annihilator of $f(x_1, x_2, x_3) + g(x_4, \dots, x_{10})$ over inputs of Hamming weight 5, because $x_2\,(f(x_1, x_2, x_3) + g(x_4, \dots, x_{10})) = x_2\,(1 + \sum_{i=1}^{10} x_i)$ vanishes when the input has weight 5, we have $AI_5(f(x_1, x_2, x_3) + g(x_4, \dots, x_{10})) = 1$; the bound is then tight here. In fact, making the direct sum with a non-constant Boolean function $g$ may decrease drastically the algebraic immunity over inputs of Hamming weight $k$: take $n$ odd, $f(x) = 1 + \mathrm{maj}(x)$ where $\mathrm{maj}$ is the majority function over $n$ variables (which has optimal algebraic immunity $\frac{n+1}{2}$) and $g(y) = \mathrm{maj}(y)$ over $n$ variables as well. Then $f(x) + g(y)$ is null at every input $(x, y)$ of Hamming weight $n$, since either $w_H(x) \leq \frac{n-1}{2}$ and $w_H(y) \geq \frac{n+1}{2}$, or $w_H(x) \geq \frac{n+1}{2}$ and $w_H(y) \leq \frac{n-1}{2}$. We then fall down to a null algebraic immunity with input weight $n$ (however, the bound is not tight here because the algebraic degree of $\mathrm{maj}$ is in general strictly larger than its algebraic immunity).
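Both phenomena can be observed by direct enumeration. In the sketch below, the pair $f(x_1, x_2, x_3) = x_1 + x_2x_3$, $g(x_4, \dots, x_{10}) = x_4 + \dots + x_{10}$ is a hypothetical choice of ours that satisfies the identity $x_2(f + g) = x_2(1 + \sum_{i=1}^{10} x_i)$; it is not claimed to be the pair used in the original example. The majority example is taken with $n = 3$.

```python
from itertools import product

def hamming_slice(n, k):
    """E_{n,k}: all binary vectors of length n and Hamming weight k."""
    return [x for x in product((0, 1), repeat=n) if sum(x) == k]

# Hypothetical pair (f, g) compatible with the identity stated in the text.
f = lambda x: x[0] ^ (x[1] & x[2])        # x1 + x2*x3
g = lambda y: sum(y) & 1                  # x4 + ... + x10
F = lambda z: f(z[:3]) ^ g(z[3:])

E = hamming_slice(10, 5)
x2_annihilates = all((z[1] & F(z)) == 0 for z in E)   # x2 * F vanishes on E_{10,5}
x2_not_null = any(z[1] for z in E)                    # x2 is not null on E_{10,5}
F_not_constant = len({F(z) for z in E}) == 2          # so no degree-0 annihilator

# Majority example with n = 3: 1 + maj(x) + maj(y) vanishes on E_{6,3}.
maj = lambda x: int(2 * sum(x) > len(x))
G = lambda z: 1 ^ maj(z[:3]) ^ maj(z[3:])
G_null_on_slice = all(G(z) == 0 for z in hamming_slice(6, 3))

print(x2_annihilates, x2_not_null, F_not_constant, G_null_on_slice)
```

For the hypothetical pair, these three checks together give a restricted algebraic immunity of exactly 1 over inputs of weight 5.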

General study of restricted inputs criteria, and constructions

Balancedness
In this part we study the criterion of balancedness with weightwise consideration; we first determine the necessary conditions for a function to be weightwise perfectly balanced, then we construct such functions and finally we describe a new transform adapted to weightwise balancedness.

Relation with ANF
Recall that any Boolean function over $\mathbb{F}_2^n$ has a unique algebraic normal form (ANF) $f(x) = \sum_{I \subseteq \{1, \dots, n\}} a_I \prod_{i \in I} x_i$, where $a_I \in \mathbb{F}_2$. Any term $\prod_{i \in I} x_i$ in such an ANF is called a monomial and its degree equals $|I|$. The algebraic degree of $f$ equals the global degree $\max_{I;\ a_I = 1} |I|$ of its ANF. The function $f$ is affine if and only if its algebraic degree is at most 1. In the following, we give more insights on necessary conditions on the ANF of WPB or WAPB functions (recall the definitions in Section 1.2.1).
Remark 3. For every even $n$ and $\varepsilon = 0$ or $1$, the function $\ell_\varepsilon(x) = \varepsilon + \sum_{i=1}^{n/2} x_i$ is balanced on all words of fixed odd Hamming weight, since for such a word, either $w_H(x_1, \dots, x_{n/2})$ is odd and $\ell_\varepsilon(x_1, x_2, \dots, x_n) = \varepsilon + 1$, or $w_H(x_{n/2+1}, \dots, x_n)$ is odd and $\ell_\varepsilon(x_1, x_2, \dots, x_n) = \varepsilon$, and the words of the former kind are the shifts by $n/2$ positions of the words of the latter kind and are then no more and no less numerous. Conversely, any affine function balanced on words of Hamming weight 1 has, up to a permutation of the variables, the form $\ell_\varepsilon$, where $\varepsilon = 0$ or $1$. Any weightwise perfectly balanced Boolean function has then, up to a permutation of the variables, the following form: $f(x) = \ell_\varepsilon(x) + g(x)$, where $g$ is non null and is the sum of monomials of degrees at least 2, since all monomials of degree at least 2 vanish at inputs of Hamming weight 1.
More precisely, we can derive necessary conditions on the ANF of $f$ for weightwise perfect balancedness:
Proposition 3. If $f$ is a weightwise (almost) perfectly balanced Boolean function of $n$ variables then the ANF of $f$ contains $\lfloor n/2 \rceil$ monomials of degree 1 and at least $\lfloor n/4 \rfloor$ monomials of degree 2, where $\lfloor n/2 \rceil$ equals $n/2$ if $n$ is even and $(n \pm 1)/2$ if $n$ is odd.
Proof: In the particular case where f is linear, w H (f ) k is exactly the number of entries of weight k for which an odd number of the monomials of f are set to 1. Therefore denoting by d the number of (degree 1) monomials in the ANF of f , we have: For any function f , as w H (f ) k is only determined by the monomials of f of degree at most k, let us partition f into f , q f and f , respectively made of the monomials of degree 1, 2 and strictly larger than 2 in the ANF of f .For k = 1, we have: where 2 for n odd.We have: Therefore, if f is (almost) balanced for fixed weights 1 and 2, then, for n even, w We have seen that f being perfectly balanced implies that n = 2 and therefore w H (f and odd for k = 2 −1 = n/2.This enables to determine the parity of the number of monomials of each degree of f smaller than or equal to 2 −1 = n/2.Concretely, f has an even number of monomials of degree d for 1 ≤ d ≤ n/2 − 1 (by induction at weight k = d this number has to be even due to w H (m k ) k = 1 ) and an odd number of monomials of degree n/2, finishing the proof.
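The monomial counts of Proposition 3 can be read off the ANF, which is obtained from the truth table by the binary Möbius transform. The sketch below is a standard implementation of that transform under our own indexing convention (bit $i-1$ of the index $u$ stands for $x_i$); the function `f4` is an illustrative 4-variable WPB function chosen by us, not one taken from the paper.

```python
def truth_table(f, n):
    """Values of f, indexed by u = sum_i x_i 2^(i-1)."""
    return [f(tuple(u >> i & 1 for i in range(n))) for u in range(1 << n)]

def anf(table, n):
    """Binary Moebius transform: a[u] = 1 iff the monomial prod_{i in bits(u)} x_i
    appears in the ANF."""
    a = list(table)
    for i in range(n):
        for u in range(1 << n):
            if u >> i & 1:
                a[u] ^= a[u ^ (1 << i)]
    return a

def monomials_by_degree(f, n):
    """Return [number of ANF monomials of degree d for d = 0..n]."""
    counts = [0] * (n + 1)
    for u, coeff in enumerate(anf(truth_table(f, n), n)):
        if coeff:
            counts[bin(u).count("1")] += 1
    return counts

f4 = lambda x: x[0] ^ x[2] ^ (x[0] & x[1])   # x1 + x3 + x1*x2
coeffs = anf(truth_table(f4, 4), 4)
print([u for u in range(16) if coeffs[u]], monomials_by_degree(f4, 4))
```

For this example the ANF has $n/2 = 2$ monomials of degree 1 and one monomial of degree 2, in line with the counts of the proposition.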

Constructions
The direct sum construction (see Definition 5) can be a starting point to build weightwise perfectly balanced functions. However, this secondary construction does not build a weightwise perfectly balanced function from two weightwise perfectly balanced functions, as we can see from the next Lemma and Corollary.
Proof: Let f be a Boolean function such that f is a direct sum of two Boolean functions g 1 and g 2 of n 1 and n 2 variables.As f is a direct sum of g 1 and g 2 , we can link the value of First we do a partition of the entries of f of Hamming weight k depending on the Hamming weight of the entries of g 1 and g 2 , this gives a partition of k + 1 sets where g 1 is evaluated on E n1,i and g 2 is evaluated on E n2,k−i .Then f (x 1 , . . ., x n ) = 1 is equivalent to g 1 (x 1 , . . ., x n1 ) = g 2 (x n1+1 , . . ., x n ), so we can link w H (f ) k to the number of entries where g 1 gives 1 and g 2 gives 0 plus the number of entries where g 1 gives 0 and g 2 gives 1. Finally we obtain: Now we suppose that f is weightwise perfectly balanced and we use that ≡ 1 mod 2 and developing: Moreover, we know that, as n 2 is also a power of 2, then for each i is even.To conclude, if f is weightwise perfectly balanced, then we have the following relation: Then we need that g 1 (0 . . .
The corollary below is a direct consequence: if $g_1(x_1, \dots, x_{n/2})$ and $g_2(x_{n/2+1}, \dots, x_n)$ are two weightwise perfectly balanced functions, then the Boolean function defined by the direct sum of $g_1$ and $g_2$ cannot be weightwise perfectly balanced.
Hence, the direct sum, when applied to weightwise perfectly balanced functions, does not lead to a weightwise perfectly balanced function; nevertheless we can derive such a construction from weightwise perfectly balanced functions by applying the direct sum after modifying one of the functions: if $f$ and $g$ are two $n$-variable weightwise perfectly balanced functions, then $h(x, y) = f(x) + \prod_{i=1}^{n} x_i + g(y)$ is a $2n$-variable weightwise perfectly balanced function. In fact, this result is a particular case of a more general construction, inspired by the so-called indirect sum, which builds a Boolean function from four Boolean functions as follows: $h(x, y) = f(x) + g(y) + (f(x) + f'(x))(g(y) + g'(y))$, and which allowed to construct bent and correlation immune functions:
Theorem 2. Let $f$, $f'$ and $g$ be three weightwise perfectly balanced $n$-variable functions and let $g'$ be any $n$-variable Boolean function; then $h(x, y) = f(x) + \prod_{i=1}^{n} x_i + g(y) + (f(x) + f'(x))\, g'(y)$, where $x, y \in \mathbb{F}_2^n$, is a weightwise perfectly balanced $2n$-variable function.
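The construction can be tested exhaustively on small instances. The sketch below assumes the normalization $f(0, \dots, 0) = 0$, $f(1, \dots, 1) = 1$ of Definition 1 for all WPB inputs and reads the modifying term as the monomial $\prod_{i=1}^{n} x_i$; the helper `lift` and the starting function $x_1$ in two variables are illustrative choices of ours.

```python
from itertools import product
from math import comb

def is_wpb(f, n):
    """w_H(f)_k = C(n,k)/2 for every k in 1..n-1."""
    counts = [0] * (n + 1)
    for x in product((0, 1), repeat=n):
        counts[sum(x)] += f(x)
    return all(2 * counts[k] == comb(n, k) for k in range(1, n))

def lift(f, g, n, fp=None, gp=None):
    """h(x,y) = f(x) + prod_i x_i + g(y) + (f(x) + f'(x)) g'(y).
    With the defaults f' = f and g' = 0 this is the modified direct sum."""
    fp = fp or f
    gp = gp or (lambda y: 0)
    def h(z):
        x, y = z[:n], z[n:]
        return f(x) ^ int(all(x)) ^ g(y) ^ ((f(x) ^ fp(x)) & gp(y))
    return h

f2 = lambda x: x[0]                 # WPB in 2 variables, f2(0,0) = 0, f2(1,1) = 1
h4 = lift(f2, f2, 2)                # 4 variables
h8 = lift(h4, h4, 4)                # 8 variables

# Theorem 2 with f = x1, f' = x2, g = y1 and every possible g' in 2 variables.
all_gp_ok = all(
    is_wpb(lift(f2, f2, 2, fp=lambda x: x[1],
                gp=lambda y, t=t: t[2 * y[0] + y[1]]), 4)
    for t in product((0, 1), repeat=4))

print(is_wpb(h4, 4), is_wpb(h8, 8), all_gp_ok)
```

Iterating `lift` in this way doubles the number of variables at each step, which matches the restriction of WPB functions to numbers of variables that are powers of 2.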
We can extend the previous example to get weightwise almost perfectly balanced Boolean function on n variables for all n.
Proposition 5. The function $f_n$ in $n \geq 2$ variables, recursively defined by $f_2(x_1, x_2) = x_1$ and for $n \geq 3$: is a weightwise almost perfectly balanced Boolean function of degree $2^{d-1}$, where $2^d \leq n < 2^{d+1}$, and with $n - 1$ monomials in its ANF if $n$ is even and $n - 2$ monomials if $n$ is odd. Note that this function can be written as a direct sum for all $n \geq 2$.

Proof:
The degree and number of monomials of f n are easily checked by induction on n for n ≥ 2. We prove the weightwise almost perfect balance property by induction on n as well: We now assume that n ≥ 3 and that, for every 2 ≤ i ≤ n − 1, f i is WAPB.We prove under this induction hypothesis that f n is WAPB.
• for n odd: As n − 1 is even, at least one of the coefficients n−1 k , n−1 k−1 is even (as n − 1 is even and k or k − 1 is odd therefore one of those written in binary has a digit equal to 1 where the corresponding one of n is 0 which characterize the even parity of this binomial coefficient), therefore w Hence, f n is WAPB.
• for n = 2 d ; d > 1, we can view f n as the following direct sum: As f 2 d−1 is WPB by hypothesis, we can apply Theorem 2 with g = 0, giving that f n is WPB.
• n = p • 2 d ; 1 < p odd ; we decompose f n in a direct sum and use techniques of Theorem 2's proof: x n−i we get: Equation 2 comes from g being WPB of 2 d variables, therefore w H (g To conclude for n ≥ 2, f n is weightwise (almost) perfectly balanced.

A Walsh-like transform involving symmetric functions and handling balance with fixed input weight.
For $i \in \{1, \ldots, n\}$, let us recall that $\sigma_i$ denotes the $i$th elementary symmetric Boolean function: $\sigma_i(x) = \sum_{1 \leq j_1 < \cdots < j_i \leq n} \prod_{l=1}^{i} x_{j_l}$ (sum performed in $\mathbb{F}_2$) and $\Sigma$ the vectorial $(n, n)$-function whose $i$th coordinate function is $\sigma_i$.
Lemma 2. For $k \in \{1, \ldots, n\}$, we have $w_H(x) = k$ if and only if, for every $i = 1, \ldots, n$, we have $\sigma_i(x) = \binom{k}{i} \pmod 2$. Indeed, the $\sigma_i$'s generate by linear combinations all those symmetric Boolean functions which are null at input 0, and we know that two vectors $x, y$ have the same nonzero Hamming weight if and only if every symmetric Boolean function null at input 0 takes the same value at inputs $x$ and $y$ (indeed, the indicator of the set of those vectors of some nonzero Hamming weight $k$ is a symmetric function null at input 0). We have then: Indeed, for every $x \in \mathbb{F}_2^n$, we have: With the same notation, we have $w_H(x) = w_H(y)$ if and only if $\Sigma(x) = \Sigma(y)$, and we have then: Hence, the quadratic mean of the sequence $k \mapsto \sum_{w_H(x)=k} (-1)^{f(x)}$ equals $\frac{1}{\sqrt{n+1}}$ times the quadratic mean of the sequence: (5) where $g$ ranges over the set of all symmetric Boolean functions null at input 0. Expression (5) corresponds to a transformation similar to the Walsh transform where the linear functions $a \cdot x$ are replaced by the symmetric functions null at input 0.
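The fact that $\Sigma(x)$ depends only on, and determines, the Hamming weight of $x$ can be checked by enumeration. The following sketch is illustrative; it relies on the standard identity $\sigma_i(x) = \binom{w_H(x)}{i} \bmod 2$, and the names `sigma` and `Sigma` are chosen here for convenience.

```python
from itertools import combinations, product
from math import comb

def sigma(i, x):
    """i-th elementary symmetric Boolean function (sum taken in F_2)."""
    return sum(all(x[j] for j in idx) for idx in combinations(range(len(x)), i)) % 2

def Sigma(x):
    """Vectorial function whose i-th coordinate is sigma_i, for i = 1..n."""
    return tuple(sigma(i, x) for i in range(1, len(x) + 1))

n = 5
by_weight = {}
for x in product((0, 1), repeat=n):
    by_weight.setdefault(sum(x), set()).add(Sigma(x))

for k, images in sorted(by_weight.items()):
    print(k, images)
```

Each weight is mapped to a single image, and distinct weights give distinct images, so $w_H(x) = w_H(y)$ exactly when $\Sigma(x) = \Sigma(y)$.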

Nonlinearity
In this part we study the criterion of nonlinearity on restricted inputs. First we study the bound on the maximal nonlinearity reachable by a function on a restricted set, then we investigate the behavior of this bound for the fixed Hamming weight case. We give an error correcting code perspective on these investigations, which makes it possible to construct functions with a guaranteed amount of nonlinearity when the Hamming weight is fixed, and finally we show how direct sums can provide some nonlinearity in this setting.

Nonlinearity upper bound for all restricted sets
From the definition of nonlinearity over a set of Section 1.2.2 we deduce: Proposition 6. For every $n$-variable Boolean function $f$ over $\mathbb{F}_2^n$ and every subset $E$ of $\mathbb{F}_2^n$, we have: This obvious observation will be useful below.
We have: Equation 6 is obtained by changing the order of the two summations and applying the classical equality $\left(\sum_{i \in I} a_i\right)^2 = \sum_{i,j \in I} a_i a_j$ expressing the square of a summation. The second sum being not null only when $x + y = 0$, we get Equation 7. As the maximum of a sequence of numbers is always bounded below by the arithmetic mean, we deduce: Proposition 7. For every subset $E$ of $\mathbb{F}_2^n$ and every Boolean function $f$ defined over $E$, we have: $NL_E(f) \leq \frac{|E|}{2} - \frac{\sqrt{|E|}}{2}$. This bound, when applied with $E = \mathbb{F}_2^n$, is called the covering radius bound, and the functions achieving it with equality are called bent and are characterized by the balancedness of their derivatives $D_a f(x) = f(x) + f(x + a)$, for $a \neq 0$.
We show that this bound can be improved for some $E$ and in particular when $E$ is the set of vectors of fixed Hamming weight: Proposition 8. Let $E$ be any subset of $\mathbb{F}_2^n$, $f$ a Boolean function over $E$, and $F$ a vector space for which there exists $v$ in $\mathbb{F}_2^n$ such that $v \cdot (x + y) = 1$ for all $(x, y) \in E^2$ such that $0 \neq x + y \in F^\perp$. Then we have: Proof: Let $F$ be any vector subspace of $\mathbb{F}_2^n$. Then we have: and Let us assume that there exists $v$ in $\mathbb{F}_2^n$ such that, for all $(x, y) \in E^2$ such that $0 \neq x + y \in F^\perp$, we have $v \cdot (x + y) = 1$. Suppose that: Then $\lambda$ may be without loss of generality assumed to be positive. Indeed, if $\lambda$ is negative, then let $v$ be as above, and let $f'(x) = f(x) + v \cdot x$; we have: Therefore we deduce the bound of the Proposition.
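The bound can be confronted with an exhaustive search on a very small restricted set. The sketch below assumes that the bound of Proposition 7 reads $NL_E(f) \leq |E|/2 - \sqrt{|E|}/2$ (which is what the averaging argument yields) and that $NL_E(f)$ is the minimum Hamming distance, over $E$, to the restrictions of affine functions; the function names are hypothetical.

```python
from itertools import combinations, product
from math import sqrt

def fixed_weight_set(n, k):
    """All vectors of F_2^n of Hamming weight k, as tuples."""
    return [tuple(1 if i in s else 0 for i in range(n))
            for s in combinations(range(n), k)]

def nl_restricted(values, E):
    """Minimum distance on E between `values` and restrictions of affine functions."""
    n = len(E[0])
    best = len(E)
    for a in product((0, 1), repeat=n):
        d = sum(v != (sum(ai * xi for ai, xi in zip(a, x)) % 2)
                for v, x in zip(values, E))
        best = min(best, d, len(E) - d)   # a.x and a.x + 1
    return best

E = fixed_weight_set(4, 2)                       # |E| = 6
bound = len(E) / 2 - sqrt(len(E)) / 2            # about 1.78
max_nl = max(nl_restricted(v, E) for v in product((0, 1), repeat=len(E)))

# A function that is bent over F_2^4 may collapse on a fixed weight:
bent = [(x[0] * x[1] + x[2] * x[3]) % 2 for x in E]
print(max_nl, bound, nl_restricted(bent, E))
```

On $E_{4,2}$ the best achievable value is 1, below the real-valued bound, and the quadratic function $x_1x_2 + x_3x_4$ coincides with $1 + x_1 + x_2$ on this set, so its restricted nonlinearity is 0.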
Moreover, we can also take a family $\mathcal{F}$ of vector spaces, and the proposition above then leads to the corollary below.
Corollary 2. Let $E$ be any subset of $\mathbb{F}_2^n$, $f$ a Boolean function over $E$, and $\mathcal{F}$ a family of vector spaces $F$ for each of which there exists $v$ in $\mathbb{F}_2^n$ such that $v \cdot (x + y) = 1$ for all $(x, y) \in E^2$ such that $0 \neq x + y \in F^\perp$. Then we have: In particular, taking for $\mathcal{F}$ the family of all linear hyperplanes of $\mathbb{F}_2^n$ (for which such $v$ always exists since $F^\perp$ has dimension 1), we have: Corollary 3. Let $E$ be any subset of $\mathbb{F}_2^n$ and $f$ a Boolean function over $E$. Then: Remark 4. Note that this result applied for $E = \mathbb{F}_2^n$ proves again that the derivatives of bent functions are all balanced.

Nonlinearity upper bound for fixed Hamming weight input
Let us now consider the case of $E = E_{n,k}$ for $k = 0, \ldots, n$, where $E_{n,k}$ is the set of vectors of Hamming weight $k$ in $\mathbb{F}_2^n$. We have: $NL_{E_{n,k}}(f) \leq \frac{1}{2}\binom{n}{k} - \frac{1}{2}\sqrt{\binom{n}{k}}$. Note that this bound could be tight only if $\binom{n}{k}$ is a square, but we shall see that even in that case, it is not. Of course, we have and it seems difficult to determine for which values of $n$ this latter bound is tight. Let us denote by $i$ the Hamming weight of $a$. If $i$ is odd then $\{(x, y) \in E_{n,k}^2 \,;\, x + y = a\}$ is empty, and if $i$ is even, then $|\{(x, y) \in E_{n,k}^2 \,;\, x + y = a\}|$ equals the number of possible choices (for building the support of $x$) of $\frac{i}{2}$ indices in the support of $a$ and of $k - \frac{i}{2}$ indices outside the support of $a$, that is, $\binom{i}{i/2}\binom{n-i}{k-i/2}$. Clearly, since the sum is invariant when swapping $x$ and $y$, if $\binom{i}{i/2}\binom{n-i}{k-i/2}$ is not divisible by 4, then $\lambda$ equals twice the sum of an odd number of integers equal to $\pm 1$, where $\lambda$ is defined as in Corollary 2; it is then nonzero. For instance, for $k = 2$ and $i = 4$, and for $n \geq 4$, the sum cannot be null. We deduce: Corollary 4. For all $n$ and $k \in \{1, \ldots, n - 1\}$, Bound (3.2.2) is never tight, except maybe for two particular pairs $(n, k)$: $(50, 3)$ and $(50, 47)$.
Indeed, the bound can only be tight when $|E_{n,k}| = \binom{n}{k}$ is a square. Erdős showed the following theorem.
• $k = 0$ (or $k = n$): Proposition 7 gives $NL_{E_{n,0}}(f) \leq 0$, which is tight because for all $n$ and for every Boolean function $f$, the restriction of $f$ to $E_{n,k}$ when $k = 0$ (or $k = n$) is constant.
• $k = 1$ (or $k = n - 1$): $|E_{n,1}|$ is a square if and only if $n$ is a square; using Proposition 6, every function restricted to its entries of Hamming weight 1 (or $n - 1$) is linear, therefore $NL_{E_{n,1}}(f) = 0$ whereas the bound tells $NL_{E_{n,1}}(f)$ and if $n$ is odd, the sum $\sum_{(x,y) \in E^2,\, x+y=a} (-1)^{f(x)+f(y)}$ cannot be null, and for $i = 4$, $i$
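The two exceptional pairs of Corollary 4 can be recovered numerically. The sketch below simply lists the pairs $(n, k)$ with $3 \leq k \leq n - 3$ for which $\binom{n}{k}$ is a perfect square, up to a chosen limit; the cut-off `k_min=3` is an assumption made here to leave aside the small values of $k$ treated separately, and the function name is hypothetical.

```python
from math import comb, isqrt

def square_binomials(max_n, k_min=3):
    """Pairs (n, k), k_min <= k <= n - k_min, such that C(n, k) is a perfect square."""
    hits = []
    for n in range(2 * k_min, max_n + 1):
        for k in range(k_min, n - k_min + 1):
            c = comb(n, k)
            if isqrt(c) ** 2 == c:
                hits.append((n, k))
    return hits

print(square_binomials(60))
print(comb(50, 3), 140 ** 2)
```

For $k = 2$ squares do occur (for instance $\binom{9}{2} = 36$), which is why that case needs the divisibility argument rather than the square criterion.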

Error correcting codes perspective
Reed-Muller codes $RM(r, n)$ are binary codes of length $2^n$ whose codewords are the evaluations of all Boolean functions of algebraic degree at most $r$ in $n$ variables on their $2^n$ entries. Fixing the Hamming weight of the entries gives particular punctured Reed-Muller codes whose characteristics are directly linked to Boolean functions with fixed weight entries. As Reed-Muller codes have been intensively studied in other contexts, we do not describe fundamentally new results in this part; we rather use another perspective to give interesting constructions and to link our problem to a quite well known topic. Definition 6. For all $n \in \mathbb{N}^*$ and $r, k \in [0, n]$, we denote by $RM(r, n)_k$ the punctured Reed-Muller code of length $\binom{n}{k}$ obtained by puncturing $RM(r, n)$ on all entries of Hamming weight different from $k$.
Remark 5. $RM(1, n)_k$ corresponds to the evaluation of all affine functions in $n$ variables on entries of Hamming weight $k$; therefore, for every Boolean function $f$, $NL_{E_{n,k}}(f)$ is the distance between the truth table of $f$ restricted to Hamming weight $k$ entries and $RM(1, n)_k$. The maximal value of $NL_{E_{n,k}}(f)$ when $f$ ranges over the set of all Boolean functions equals the covering radius of $RM(1, n)_k$.
In the next remark we exhibit the parameters of the code $RM(1, n)_k$; this provides a lower bound on the maximal value of $NL_{E_{n,k}}$.
Let $l(x) = \sum_{i \in I} x_i$ be any linear Boolean function whose restriction to the entries of Hamming weight $k$ is non constant, and let $|I| = \ell$. We have $\ell \in \{1, \ldots, n - 1\}$. The number of entries $x$ of Hamming weight $k$ such that $|supp(x) \cap I| = i$ equals $\binom{\ell}{i}\binom{n-\ell}{k-i}$. We deduce that the minimum distance of $RM(1, n)_k$ equals: min In other words, writing $P[X^k]$ for the coefficient of $X^k$ in a polynomial $P(X)$, the minimum distance of $RM(1, n)_k$ equals: is invariant when changing $\ell$ into $n - \ell$ (by changing $i$ into $k - i$); we can then replace $\max_{0 < \ell < n}$ by $\max_{0 < \ell \leq n/2}$.
Dumer and Kapralova studied this punctured Reed-Muller code of order 1 in 2013, and more recently they also studied the general case of the punctured Reed-Muller codes of order $r$; see [DK17, DK13]. Note that the maximal value of $NL_{E_{n,k}}(f)$ when $f$ ranges over the set of all Boolean functions (i.e. the covering radius of $RM(1, n)_k$) is bounded from below by $\frac{d}{2}$, where $d$ is the minimum distance. It is then nonzero except for particular values of $k$, and this error correcting code perspective makes it possible to directly build functions reaching this minimal bound for all $k$.
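The counting argument above translates directly into a computation of the minimum distance. The sketch below is an illustration under an assumption: since the displayed formula is not reproduced here, it takes the minimum distance to be the minimum, over $0 < \ell < n$, of the number of weight-$k$ entries meeting $I$ in an odd (respectively even) number of positions, and compares it with a brute-force search over all affine functions. The function names are hypothetical.

```python
from itertools import combinations, product
from math import comb

def min_distance_formula(n, k):
    """Assumed closed form: min over l of the odd and even intersection counts."""
    best = comb(n, k)
    for l in range(1, n):
        odd = sum(comb(l, i) * comb(n - l, k - i)
                  for i in range(1, min(l, k) + 1, 2))
        even = comb(n, k) - odd          # Vandermonde: odd + even = C(n, k)
        best = min(best, odd, even)
    return best

def min_distance_bruteforce(n, k):
    """Smallest nonzero weight among restrictions of affine functions to E_{n,k}."""
    E = [set(s) for s in combinations(range(n), k)]
    best = len(E)
    for a in product((0, 1), repeat=n):
        I = {i for i in range(n) if a[i]}
        w = sum(len(I & s) % 2 for s in E)       # weight of a.x restricted to E
        for weight in (w, len(E) - w):           # a.x and a.x + 1
            if weight > 0:
                best = min(best, weight)
    return best

print(min_distance_formula(4, 2), min_distance_bruteforce(4, 2))
```

For $n = 4$ and $k = 2$ both computations give 2, attained for $\ell = 2$ by the even count.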

Direct sum and NL En,k
Let $N$ be any positive integer and $k \in \{1, \ldots, N\}$. We recall that the nonlinearity $NL_{E_{N,k}}(F)$ of an $N$-variable function $F$ over a set $E_{N,k}$ equals the minimum Hamming distance between the restriction of $F$ to $E_{N,k}$ and the restrictions of affine functions to $E_{N,k}$.
Lemma 3 (Direct sum and $NL_{E_{N,k}}$). Let $F$ be the direct sum of $f$ and $g$; we have: We have: Although this inequality does not provide a tight bound, it makes it possible to guarantee some nonlinearity on fixed Hamming weight input for a function built from two simpler functions with high nonlinearity in this context.

Algebraic Immunity
In this part we study the criterion of algebraic immunity on restricted inputs; first we study the bound on the maximal algebraic immunity reachable by a function on a restricted set, then we investigate the behavior of this bound for the fixed Hamming weight case and give more detailed results in relation with this particular case.Finally we show how direct sums can provide some algebraic immunity in this setting.

Algebraic immunity upper bound for all restricted sets
In the case of $E = \mathbb{F}_2^n$, Courtois and Meier [CM03] have shown that, for all non-negative integers $d$ and $e$ such that $d + e \geq n$, there exist a nonzero Boolean function $g$ of algebraic degree at most $e$ and a Boolean function $h$ of algebraic degree at most $d$ such that $h = gf$. For $e = d = \lceil n/2 \rceil$, this proved that the so-called algebraic immunity of $f$ (see Definition 4 in Section 1.2.3) is at most $\lceil n/2 \rceil$. We revisit these results for functions defined over a subset of $\mathbb{F}_2^n$.
Proposition In other words, the algebraic degree of any Boolean function over E is bounded above by the least value of d such that rank(M d,E ) = |E|.Indeed, in Proposition 9, we have g = 1 and gf = h on E where h has algebraic degree at most d.
Taking $d = 0$, we have $rank(M_{d,E}) = 1$ and, calling annihilator of $f$ on $E$ any Boolean function $g$ over $E$ whose product with $f$ vanishes: Corollary 6. Let $E$ be any non-empty subset of $\mathbb{F}_2^n$ and $f$ any non-constant Boolean function defined over $E$. Let $n$ and $e$ be such that $rank(M_{e,E}) = |E|$; then there exists a nonzero annihilator of $f$ of algebraic degree at most $e$ over $E$. Indeed, in Proposition 9, we have $h$ constant and since $gf = 1$ on $E$ is impossible, we have then $h = 0$. Note that this shows that the algebraic immunity of a function (see Definition 4 in Section 1.2.3), which for a random Boolean function over $\mathbb{F}_2^n$ lies not far from $n/2$ as shown by F. Didier in [Did06], can tumble down when the input is restricted to a set $E$. Notice also that Corollary 6 can be viewed as a corollary of Corollary 5, since we can take $f + 1$ for annihilator. This being observed, we have in fact a stronger result when taking $e = d$; we have: Corollary 7. Let $E$ be any non-empty subset of $\mathbb{F}_2^n$ and $f$ any Boolean function defined over $E$. Let $n$ and $e$ be such that $rank(M_{e,E}) > \frac{|E|}{2}$; then there exists $g$ of algebraic degree at most $e$, whose product with $f$ or $f + 1$ is null on $E$, and whose restriction to $E$ is nonzero.
Indeed, using a classical idea of Meier et al. [MPC04], either the functions $g$ and $h$ of Proposition 9 coincide on $E$, and we have then $gf + h = g(f + 1) = 0$ on $E$, where $g$ has algebraic degree at most $e$ and nonzero restriction to $E$, or they do not and we have, after multiplication of the equality $h = gf$ by $f$, that $(g + h)f = 0$, where $g + h$ has algebraic degree at most $e$ and nonzero restriction to $E$.
The situation is then similar to that described by Meier et al. and

Algebraic immunity upper bound for fixed Hamming weight input
In this section, we focus on the particular case where the input is restricted to the words of fixed Hamming weight: $E_{n,k}$ for some $k \in [1, n - 1]$. Note that $M_{n,k,d}$ is a generator matrix of the code $RM(d, n)_k$ from Definition 6. To be able to evaluate efficiently the algebraic immunity in this situation by using Proposition 9 and its corollaries, it remains to calculate the rank of the matrix $M_{n,k,d}$ for each $d$ and $k$: Proof: The principle of this proof is to find a recurrence relation on the rank of $M_{n,k,d}$. To this aim, we use a construction which resembles the well-known $(u, u + v)$ construction of Reed-Muller codes: every Boolean function $f$ of algebraic degree at most $d$ can be written in the form: where $g$ has algebraic degree at most $d$ and $h$ has algebraic degree at most $d - 1$. In the sequel of the proof, we shall use the notations: Let $\psi_{n,k,d}$ be the following linear map, mapping every Boolean function in $n$ variables (defined by its ANF) of algebraic degree at most $d$ to the restriction of its truth table to the elements in $E_{n,k}$: where $u \preceq x$ means $u_i \leq x_i$ for every $i$. This map $\psi_{n,k,d}$ is linear and moreover, the rank of $M_{n,k,d}$ is exactly the rank of $\psi_{n,k,d}$.
Denoting by $m_u$ the monomial $x^u$, the rank of $\psi_{n,k,d}$ is the rank of the family of the following vectors: We also split the vectors $x$ of $\mathbb{F}_2^n$ of Hamming weight $k$ into: Notice that for every $u \in F_2$ and every $x \in E_1$, we have $m_u(x) = 0$. The matrix representing the linear map $\psi_{n,k,d}$ then has the form given in Figure 1.
$A_{n,k,d}$ takes its entries on the set of monomials in which $x_n$ does not occur and of degree at most $d$. The output of the linear map defined by $A_{n,k,d}$ is the truth table of Boolean functions on those inputs of Hamming weight $k$ where the value of $x_n$ is set to 0. Then, as $x_n$ does not occur in the entries and is fixed to 0 in the output, $A_{n,k,d}$ defines exactly the linear map which gives the truth table on words of weight $k$ of all Boolean functions with $n - 1$ variables ($x_n$ is fixed) and of degree at most $d$.
$B_{n,k,d}$ defines the linear map which gives the truth table on words of weight $k - 1$ (because $x_n$ is fixed to 1 and not 0 anymore) of all Boolean functions with $n - 1$ variables ($x_n$ being fixed) and of degree at most $d - 1$. Hence, $A_{n,k,d}$ defines the linear map Moreover, let us prove that the rank of this matrix is equal to the rank of $A_{n,k,d}$ plus the rank of $B_{n,k,d}$ (i.e. $M_{n-1,k-1,d}$ does not play any role in the rank of the whole matrix). Indeed, if we have a vector of length $\binom{n}{k}$ which is a linear combination of the rows such that the last $\binom{n-1}{k}$ coordinates of the resulting vector are null (i.e. we are in the kernel of $A_{n,k,d}$), then this vector is linearly dependent on the vectors defined by $B_{n,k,d}$. Viewing this in terms of Boolean functions, we prove that if $f$ is a Boolean function in the linear span of $F_1$ such that $\forall x \in E_1$, $f(x) = 0$ (i.e. in the kernel of $A_{n,k,d}$) then $f$ is in the linear span of $F_2$; indeed: The Boolean function $f$ is of degree less than $d$, then $h$ is of degree less than $d$ and $g$ is of degree less than $d - 1$. But for all $x \in E_1$, we have $f(x) = 0$, which means that $h(x_1, \ldots, x_{n-1}) = 0$; then $f$ is in the linear span of $F_2$. Then we deduce the following recurrence relation: In fact, the monomials of degree exactly $k$ correspond to the canonical basis of the Boolean functions defined over $E_{n,k}$ (represented by their truth table). For $d \geq n - k$, we can choose the Boolean functions defined by Remark 7. For $r > 0$, we have: , then we have a fortiori $\frac{k-r+2}{n-k+2} > 2^{-1/r}, \ldots, \frac{k}{n-k+r} > 2^{-1/r}$, and we have then . Hence, the best possible algebraic immunity of a function with constrained input Hamming weight is lower than for unconstrained functions.
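For small parameters, the rank of $M_{n,k,d}$ can also be obtained by plain Gaussian elimination over $\mathbb{F}_2$. The sketch below assumes that $M_{n,k,d}$ is the matrix whose rows are the evaluations on $E_{n,k}$ of all monomials of degree at most $d$ (consistent with its role as a generator matrix of $RM(d, n)_k$); rows are stored as integer bitmasks and the helper names are hypothetical.

```python
from itertools import combinations

def gf2_rank(rows):
    """Rank over F_2 of a list of rows given as integer bitmasks."""
    pivots = {}                       # highest set bit -> reduced row
    for row in rows:
        while row:
            top = row.bit_length() - 1
            if top not in pivots:
                pivots[top] = row
                break
            row ^= pivots[top]
    return len(pivots)

def monomial_matrix(n, k, d):
    """Evaluations on E_{n,k} of all monomials of degree <= d (one bitmask per monomial)."""
    E = [sum(1 << i for i in s) for s in combinations(range(n), k)]
    rows = []
    for deg in range(d + 1):
        for u in combinations(range(n), deg):
            mask = sum(1 << i for i in u)
            # the monomial x^u equals 1 at x exactly when supp(u) is included in supp(x)
            rows.append(sum(1 << j for j, x in enumerate(E) if x & mask == mask))
    return rows

def rank_M(n, k, d):
    return gf2_rank(monomial_matrix(n, k, d))

for d in range(4):
    print(d, rank_M(6, 3, d))
```

With $d = k$ the rank reaches $\binom{n}{k}$, since the monomials of degree exactly $k$ are the indicators of the elements of $E_{n,k}$.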
With Theorem 4, we have the dimension of the image of $\psi_{n,k,d}$ and then of its kernel. Although it has no direct application to the other sections, we can exhibit more properties on the basis of this kernel.
Proposition 10. Let $k$, $r$ and $n$ be such that $k \leq \frac{n}{2}$ and let Then any Boolean function defined as: is null on the set $E_{n,k}$ of all binary vectors of size $n$ with Hamming weight equal to $k$. More generally, for every $j < k$ and $s$, and any $u$ of Hamming weight equal to $j$, the function defined as: Proof: Without loss of generality, we take $x_{i_l}$ is an elementary symmetric Boolean function on $n - j$ variables of degree $s - j$; it is then constant when the Hamming weight of the entry is fixed (which is the case when $x^u = 1$ here) and its value is The sum involved in this definition is an elementary symmetric Boolean function but defined on a smaller set of variables.
is the set of all the monomials of degree k.
is the set of all the Boolean functions of the form Proof: See the end of the proof of theorem 4

Direct sum and AI k
One of the main purposes of this paper is to discuss (see Section 4) the robustness of the filter function in FLIP [MJSC16]. This function being built as a direct sum, we need to study the algebraic immunity of direct sums. Complementary to the results of Section 2.3 linking classic algebraic immunity and algebraic immunity on fixed Hamming weight input, we investigate here the behavior of $AI_k$ for a direct sum construction. As for the nonlinearity case, it makes it possible to build functions with a guaranteed algebraic immunity from functions with a smaller number of variables.
For every Boolean function, say $F$, we will denote by $F_k$ the restriction of $F$ to its inputs of Hamming weight $k$.
Lemma 4 (Direct sum and $AI_k$). Let $F$ be the direct sum of $f$ and $g$ with $n$ and $m$ variables respectively. Then for all $k \leq \min(n, m)$, $AI_k(F)$ follows the bound: Proof: Suppose that $A(x)$ is a nonzero annihilator of $F_k$. We will show that $A$ is a nonzero annihilator of $f_\ell$ or $1 + f_\ell$ and of $g_{k-\ell}$ or $1 + g_{k-\ell}$ for some $\ell$. For all $X \in E_{n+m,k}$, $A(X)F(X) = 0$. Moreover, $A$ is non-null on $E_{n+m,k}$, so there exists $\tilde{X} \in E_{n+m,k}$ such that $A(\tilde{X}) = 1$. We write $\tilde{X}$ as $(\tilde{x}, \tilde{y})$ where $\tilde{x} \in \mathbb{F}_2^n$, and we define $\ell$ as $w_H(\tilde{x})$; then the weight of $\tilde{y} \in \mathbb{F}_2^m$ is $k - \ell$. Finally, for this $\ell$, we fix for $X \in E_{n+m,k}$ the $x$ part to the value $\tilde{x}$ and we consider all possible values for $y \in \mathbb{F}_2^m$ of Hamming weight $k - \ell$. By doing so, it appears that $A_k$ is an annihilator of $g_{k-\ell}$ or $1 + g_{k-\ell}$, and is non-null by construction. We can also fix $y$ to $\tilde{y}$ and consider all possible values for $x$ of Hamming weight $\ell$, and find that $A_k$ is also a nonzero annihilator of $f_\ell$ or $1 + f_\ell$. Therefore Recalling that $\ell = w_H(\tilde{x})$ and then $0 \leq \ell \leq k$ finishes the proof.

Open Questions
Considering Boolean functions on restricted input from a cryptographic point of view is quite new. Our theoretical study focusing on fixed Hamming weight input tries to address most of the natural questions in this context. In this part we highlight some other questions, of varying interest and presumed difficulty, which are not answered yet.
WPB function with minimal number of monomials. We proved in Section 3.1.1 that a WPB function has an ANF containing at least $\frac{3n}{4} + 1$ monomials (for $n > 4$) and we exhibited a construction with $n - 1$ monomials. It remains then to determine whether this number of monomials in the ANF is the smallest number of monomials needed to obtain a WPB Boolean function. Determining the minimal number of monomials necessary to fulfill a Boolean criterion could lead to low cost functions usable in the FHE or MPC context.
Tightness of $NL_{E_{N,k}}$ bound. In Section 3.2.2 we proved the upper bound: As this bound is unreachable for almost all values of $n$ and $k$, it would be nice to investigate whether the floor of this value is a tight upper bound. Moreover, this quantity being the covering radius of a punctured Reed-Muller code, various approaches could help to determine its value more precisely or give intuition on weightwise bent functions.
Exact behavior of algebraic immunity. We proved in Section 3.3.2 that the algebraic immunity $e$ when the input is restricted to the Hamming weight $k$ is bounded with the relation: It would be nice to determine the smallest integer $e$ satisfying this relation, and the asymptotic behavior of $e$ relative to the standard algebraic immunity upper bound $n/2$ for meaningful values of $k$ (around $n/2$). A common function considered in cryptography for its optimal AI is the majority function, which is constant when the input weight is fixed; therefore optimal functions for restricted weight algebraic immunity could lead to very different constructions.
Tightness of $AI_k$ bound. Back to Section 3.3.2, we linked the algebraic immunity upper bound to the rank of the generator matrix of a punctured Reed-Muller code $RM(d, n)_k$. This matrix can also be used to compute the exact $AI_k$ of a given function, by partitioning the columns depending on the value of $f$ on the column entry and determining when the rank of one of these two matrices is strictly smaller than the rank of the global one. For matrices whose number of columns is at least twice the rank $r$, proving the existence of a partition of the columns into two rank-$r$ matrices would prove the tightness of the $AI_k$ upper bound $e$.
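The column-partition procedure described above can be sketched as follows. This is an illustrative implementation under assumptions: $AI_k(f)$ is taken to be the least degree of a function that is nonzero on $E_{n,k}$ and annihilates $f$ or $f + 1$ on $E_{n,k}$, inputs are encoded as integer bitmasks, and the helper names are hypothetical. An annihilator of degree at most $d$ exists exactly when the columns where the function equals 1 have a smaller rank than the whole matrix.

```python
from itertools import combinations

def gf2_rank(rows):
    """Rank over F_2 of rows given as integer bitmasks."""
    pivots = {}
    for row in rows:
        while row:
            top = row.bit_length() - 1
            if top not in pivots:
                pivots[top] = row
                break
            row ^= pivots[top]
    return len(pivots)

def eval_rows(points, n, d):
    """Evaluations of all monomials of degree <= d on the given points."""
    rows = []
    for deg in range(d + 1):
        for u in combinations(range(n), deg):
            mask = sum(1 << i for i in u)
            rows.append(sum(1 << j for j, x in enumerate(points) if x & mask == mask))
    return rows

def ai_k(f, n, k):
    """Least d such that f or f + 1 has a degree <= d annihilator nonzero on E_{n,k}."""
    E = [sum(1 << i for i in s) for s in combinations(range(n), k)]
    for d in range(k + 1):
        full = gf2_rank(eval_rows(E, n, d))
        for target in (1, 0):                      # annihilators of f, then of f + 1
            part = [x for x in E if f(x) == target]
            if gf2_rank(eval_rows(part, n, d)) < full:
                return d
    return k

majority = lambda x: int(bin(x).count("1") >= 3)   # 5-variable majority
first_var = lambda x: x & 1                        # f(x) = x_1
print(ai_k(majority, 5, 3), ai_k(first_var, 4, 2))
```

The majority function is constant on $E_{5,3}$, so the constant function 1 already annihilates its complement and the sketch returns 0, in line with the remark on the majority function above.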

Case study on FLIP
The stream cipher FLIP [MJSC16] has a non-standard design (i.e. the filter permutator) where the updating process of the internal state consists in permuting its coordinates. Therefore, the Hamming weight of the internal state is constant during the whole encryption. In the four proposed instances, the Hamming weight of this register is forced to $\frac{n}{2}$, where $n$ is the size of the register ($n$ is larger than the security parameter $\lambda$, enough to ensure that $\binom{n}{n/2} \geq 2^{\lambda}$).
For classical filtered pseudo-random generators (for example filtered Linear Feedback Shift Registers), when the next-state function reaches all elements in $\mathbb{F}_2^n$ or $\mathbb{F}_2^n \setminus \{0\}$, the three main criteria (for functions defined over $\mathbb{F}_2^n$) are relevant. Indeed, as the input is the whole space, designers can ensure that there are no extra relations on the filtering function inputs. However, if not all inputs of the filtering function are reachable by the next-state function, as in this case, then the security analysis does not rely on the classical criteria defined for Boolean functions over all of $\mathbb{F}_2^n$, because the internal state itself does not reach all possible values. The security analysis must then be done in the adequate model: the stream cipher function is only evaluated on entries from $E_{n,n/2}$, and the security should be studied relative to the robustness of Boolean functions over $E_{n,n/2}$. The purpose of this section is first to assess the fixed input weight criteria of the proposed filtering functions, and then to analyze the security of this stream cipher by adapting standard cryptanalysis over $\mathbb{F}_2^n$ to fixed weight entries. In an article published at CRYPTO 2016 [DLR16], Duval et al. gave a guess and determine attack on preliminary instances of FLIP, leading the authors of FLIP to add more triangular functions in the filtering function. Therefore we also study the case of guess and determine attacks, combined with algebraic-like attacks and with correlation-like attacks, when the Hamming weight is fixed.
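As a quick numerical illustration of the constraint $\binom{n}{n/2} \geq 2^{\lambda}$, the sketch below computes the smallest even register size satisfying it. This is only the necessary condition stated above, not a parameter selection rule of FLIP, and the function name is hypothetical.

```python
from math import comb

def min_even_register_size(lam):
    """Smallest even n such that C(n, n/2) >= 2**lam."""
    n = 2
    while comb(n, n // 2) < 2 ** lam:
        n += 2
    return n

for lam in (80, 128):
    print(lam, min_even_register_size(lam))
```

The thresholds exceed $\lambda$ only by a few bits; the actual instances use much larger registers.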

FLIP instances
We recall the 4 filtering functions proposed in FLIP in Table 1, each one defined by 4 parameters $n_1$, $n_2$, $nb$ and $h$, where the filtering function is the direct sum of a linear function of $n_1$ variables, a quadratic function of $n_2$ variables (obtained by direct sum) and $nb$ triangular functions of degree $h$. A triangular function of degree $h$ is a direct sum of one monomial of each degree from 1 to $h$. Notice that in the ANF of $f$, each variable is used once, and $f$ can therefore be expressed as a direct sum in various ways, which is decisive for bounding its parameters on fixed Hamming weight inputs.
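The shape of such a filtering function can be made concrete with a small evaluator. The sketch below is an assumption-laden toy: the quadratic part is taken to be a direct sum of $n_2/2$ products of two variables, the variable ordering is arbitrary, and the parameters used in the example are deliberately tiny and hypothetical, not those of Table 1.

```python
def flip_style_filter(n1, n2, nb, h):
    """Return (N, f): a direct sum of a linear part, a quadratic part, nb triangular parts."""
    tri_size = h * (h + 1) // 2            # variables used by one triangular function
    N = n1 + n2 + nb * tri_size

    def f(x):
        assert len(x) == N
        out = sum(x[:n1]) % 2              # linear part on the first n1 variables
        pos = n1
        for j in range(n2 // 2):           # quadratic part: disjoint products of pairs
            out ^= x[pos + 2 * j] & x[pos + 2 * j + 1]
        pos += n2
        for _ in range(nb):                # triangular parts: one monomial per degree 1..h
            for deg in range(1, h + 1):
                out ^= int(all(x[pos:pos + deg]))
                pos += deg
        return out

    return N, f

N, f = flip_style_filter(n1=2, n2=4, nb=1, h=3)   # toy parameters (hypothetical)
print(N, f([0] * N), f([1] * N))
```

Each variable appears in exactly one monomial, which is what allows the function to be split into a direct sum in many different ways.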

Balancedness of FLIP and distinguishing attack
Distinguisher: For any stream cipher, it is important to guarantee that the keystream has good statistical properties (i.e. looks like a random sequence), to avoid the possibility for an attacker to distinguish the keystream from a random sequence. This is a reason why Boolean functions used in cryptography should be balanced and therefore, in a model where the Hamming weight is known, Boolean functions should be weightwise perfectly balanced functions. Indeed, let us denote Then if there exists $k$ such that $\varepsilon_k \neq 0$, there exists a distinguisher on the function $f$ for this weight. The amount of data needed to detect the bias $\varepsilon_k$ is equal to $\frac{1}{\varepsilon_k^2}$; if we consider all entries of $f$, we scale by the probability of having a word of weight $k$, so the amount of data needed for our distinguisher to detect a bias is therefore: As the FLIP cipher always applies the filtering function on entries of constant Hamming weight $k$, there is no need to scale the probability; $k$ being fixed to $\frac{N}{2}$, we care about $\varepsilon_{N/2}$ and balancedness over $E_{N,N/2}$. Balancedness for all weights is still important in this setting: a guess and determine technique consists in fixing some entries, which modifies the weight. Therefore we study the balancedness criterion for all Hamming weights of the FLIP function instances.
The number of variables in the functions of the FLIP instances is never a power of 2. Moreover, the filtering function is defined with a direct sum, and has a small degree compared to the number of variables. In this sense, there is no way that the filtering function of FLIP could be weightwise perfectly balanced or weightwise almost perfectly balanced. We calculated the bias for each weight on toy versions of FLIP, and it appears that for all non-extreme $k$ (that is, $k$ not close to 0 or $N$) the Boolean function is not balanced. Nevertheless, the calculated biases are not exploitable to distinguish the output from a random sequence, given that we cannot have more than $2^{80}$ or $2^{128}$ bits of keystream.
Using the direct sum structure of the FLIP functions in the proof of Lemma 1, we can exactly compute the values $|\{x \mid f(x) = 1, w_H(x) = k\}|$ for all $k$ for the four filtering functions, and the bias from a random sequence. In Table 2 we summarize our computation results, providing for which weights $k$ the bias is smaller than $2^{-\lambda/2}$. As the impact of guess and determine attacks on the balancedness on restricted inputs is not known, for the 4 instances we study this criterion for three particular guesses. The first guess consists in canceling (forcing to be 0) $\lambda/2$ linear variables, the second one $\lambda/2$ variables of the quadratic part and the third one $\lambda/2$ variables of the highest degree monomials. These three strategies represent the best deterioration that an attacker can obtain on one part of a FLIP filtering function. We assume that if none of these strategies reveals a sufficient bias for a concrete cryptanalysis, then no hybrid approach will lead to an efficient attack.
Interpreting the results of Table 2, as $k = \frac{N}{2}$ is in the ranges of the $v_0$ column for the 4 instances of $f$, we conclude that we cannot apply our distinguisher based on the balancedness criterion, as it would require more than $2^{\lambda}$ keystream bits. Even considering a combination with a guess and determine attack, the biases on the simpler functions obtained are insufficient to mount a concrete attack.
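The weightwise counts of a direct sum can be obtained from those of its components by a convolution over the weights, which is what makes the computation feasible for hundreds of variables. The sketch below illustrates this on a toy triangular function and checks it against exhaustive enumeration; the bias is taken here to be $|P[f(x) = 1 \mid w_H(x) = k] - \frac{1}{2}|$, which is an assumption since the definition of $\varepsilon_k$ is not reproduced above, and all names are hypothetical.

```python
from functools import reduce
from itertools import product
from math import comb

def monomial_profile(d):
    """ones[k] = number of weight-k inputs (on d variables) where x_1...x_d equals 1."""
    return [1 if k == d else 0 for k in range(d + 1)]

def direct_sum_profile(ones_f, ones_g):
    """Weightwise number of ones of f(x) + g(y) from the profiles of f and g."""
    n, m = len(ones_f) - 1, len(ones_g) - 1
    out = [0] * (n + m + 1)
    for i, of in enumerate(ones_f):
        zf = comb(n, i) - of
        for j, og in enumerate(ones_g):
            zg = comb(m, j) - og
            out[i + j] += of * zg + zf * og     # exactly one of f, g equals 1
    return out

def triangular_profile(h):
    return reduce(direct_sum_profile, (monomial_profile(d) for d in range(1, h + 1)))

def brute_force_profile(f, n):
    ones = [0] * (n + 1)
    for x in product((0, 1), repeat=n):
        ones[sum(x)] += f(x)
    return ones

def bias(ones, n, k):
    """Assumed definition of the bias on weight k."""
    return abs(ones[k] / comb(n, k) - 0.5)

triangular = lambda x: (x[0] + x[1] * x[2] + x[3] * x[4] * x[5]) % 2
profile = triangular_profile(3)
print(profile, [round(bias(profile, 6, k), 3) for k in range(7)])
```

The same folding applies to a linear part (a direct sum of degree-1 monomials) and to a quadratic part, so a full profile costs a number of operations polynomial in the number of variables.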

Fast correlation attack:
Table 2: Balancedness (with constant weight inputs) bias on FLIP filtering function instances, and modified instances. $N$ is the number of variables of the instance, $\ell$ is the number of variables guessed to be 0. $v_i$ stands for the range of weights with bias $< 2^{-\ell}$ for the various strategies of guesses $i$ and $N_i$ the number of variables of the resulting function, with $v_0$ the attack without guesses of the $N$-variable function.
In our model we target only the function $f$ generating the keystream. In this sense, correlation attacks cannot work because there is no way to apply a divide-and-conquer technique as in Siegenthaler's attack [Sie84]; in this case, only a fast correlation attack [MS88a] could have a smaller complexity than an exhaustive search in our model. The attacker first computes the linear approximations $l_k$ of $f_k$ where $NL_{E_{N,k}}(f)$ is small. Approximating the keystream equations by their linear approximation, she builds a linear system and relates it to a decoding problem. When no particular code structure is used, this system can be seen by the attacker as an instance of the Learning Parity with Noise problem, where the noise parameter is . Standard algorithms can be used to solve this instance, such as the BKW [BKW03] or LF [LF06] algorithms, giving an attack with data complexity of $O(2^h \varepsilon_k^{-2(x+1)})$, where the parameters $h$ and $x$ depend on the algorithm used and the number of variables used in $f$.
The high number of variables of the FLIP instances makes it impossible to compute exactly the nonlinearity on constant weight inputs. Nevertheless, using the lower bound on the $NL_{E_{N,k}}$ of a direct sum (see Section 3.2.4), we can give a lower bound on the nonlinearity on constant weight inputs for the 4 instances. To do so we recursively compute the $NL_{E_{N,k}}$ for a direct sum with lower bounds for $NL_{E_{n,i}}(g)$ and the exact value of $NL_{E_{m,k-i}}(h)$. In Table 3 we summarize the results; the exact $NL_{E_{n,k}}$ values are computed on functions with $n \leq 22$ for the quadratic functions (Dickson functions), $n \leq 15$ for the triangular functions and $n \leq 17$ for sums of two monomials. Note that the $NL_{E_{n,k}}$ of a function in $n$ variables consisting of only one degree $n$ monomial is null for all $k$. The results are obtained by first combining the quadratic functions, then the triangular part and finally the linear (and higher than degree 9) part.
The results in Table 3 show that for $k = \frac{N}{2}$, $NL_{E_{N,k}}(f)$ is high enough to ensure that, considering the system of equations described in attack scenario 2, the LPN system will be unsolvable with data complexity smaller than $2^{\lambda}$, as the error is considered as coming from a Bernoulli distribution with mean $0.499 \leq p \leq 0.5$. Even with a guess and determine attack (with simulated results from the right part of the table), the $NL_{E_{N,k}}$ of the various functions and the number of variables are too big to lead to a concrete cryptanalysis with an attack scenario based on $NL_{E_{N,k}}$.

AI k of FLIP and algebraic attack
Algebraic attack: Assuming that an attacker does not want to distinguish the keystream but instead to recover some internal state of the cipher, the attacker could try to improve the so-called algebraic attacks. Algebraic attacks (and fast algebraic attacks [Cou03]) were first introduced by Courtois and Meier in [CM03] and applied to the stream cipher Toyocrypt. Their main idea is to build an over-defined system of equations with the initial state of the stream cipher as unknown, and to solve this system with Gaussian elimination. More precisely, by using a nonzero function $g$ such that both $g$ and $h = gf$ have low algebraic degree, an adversary is able to obtain $T$ equations with monomials of degree at most $AI(f)$. Function $g$ can be taken equal to an annihilator of $f$ or of $f \oplus 1$, i.e. such that $gf = 0$ or $g(f \oplus 1) = 0$. After a linearization step, the adversary obtains a system of $T$ equations in $D = \sum_{i=0}^{AI(f)} \binom{N}{i}$ variables. Therefore, the time complexity of the algebraic attack is $O(D^{\omega})$, that is, $O(N^{\omega AI(f)})$, where $\omega$ is the exponent in the complexity of solving a linear system ($\omega = 3$ for the naive approach; we consider $\omega = \log_2 7$ for a realistic attack).
The fast version consists in finding a function $g$ with low degree and a function $h$ with degree slightly higher than $AI(f)$ which are solutions of the equation $h = gf$, providing an easier algebraic system to solve. In our context of fixed Hamming weight, the data complexity will drop to $D = \sum_{i=0}^{AI_k(f)} \binom{N}{i}$, the number of independent equations needed to mount a solvable algebraic system.
For the FLIP family of stream ciphers, the filtering function is defined by a direct sum, but we cannot conclude on the exact algebraic immunity of the FLIP instances (regarding the restricted input) with the corresponding bound (given in Section 3.3.3). However, we have shown with Theorem 1 (in Section 2.3) that defining a Boolean function with a direct sum can lower the algebraic immunity regarding the degree when the input has a fixed Hamming weight. Therefore Theorem 1 can be useful to find a lower bound on the algebraic immunity of FLIP. To apply it, we define each filtering function of the four instances of FLIP using the following form: where $f$ has $n$ variables and a high algebraic immunity and $g$ has $m$ variables with the smallest possible degree. The inputs in FLIP are the words of Hamming weight $\frac{N}{2}$ where $N = n + m$. Then to apply Theorem 1, $n$ and $m$ have to satisfy the condition $n \leq \frac{N}{2} \leq m$. To guarantee that $g$ has the smallest possible degree (to get the bound of Theorem 1 as meaningful as possible), we set the function $g$ to be the first sum of all monomials of degree less than or equal to $k$ which has more than $\frac{N}{2}$ variables. Then we need to know the algebraic immunity of the other function $f$; due to the particular shape of $f$, we need more results on the algebraic immunity of functions with few monomials. We begin with a property linking the algebraic immunity of two functions.
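The size of the linearized system and the resulting time estimate are simple to evaluate. The sketch below uses the expressions above with $\omega = \log_2 7$; the numerical parameters in the example are arbitrary illustrations, not values claimed for any FLIP instance, and the function names are hypothetical.

```python
from math import comb, log2

def monomial_count(N, ai):
    """D = sum_{i=0}^{ai} C(N, i): number of monomials of degree at most ai."""
    return sum(comb(N, i) for i in range(ai + 1))

def log2_time(N, ai, omega=log2(7)):
    """log2 of D**omega, the estimated cost of solving the linearized system."""
    return omega * log2(monomial_count(N, ai))

print(monomial_count(10, 2))           # 1 + 10 + 45
print(round(log2_time(500, 10), 1))    # illustrative parameters only
```

Replacing $AI(f)$ by a smaller $AI_k(f)$ lowers $D$, which is why the restricted algebraic immunity is the relevant quantity in the fixed-weight model.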
Proposition 11. Let f be a Boolean function in n variables such that there exist two variables ($x_1$ and $x_2$ without loss of generality) satisfying:
Proof: Formally we prove that if there exists a nonzero function g in n variables of degree ≤ d such that fg = 0 (respectively (f + 1)g = 0) over all $\mathbb{F}_2^n$ then there exists We conclude that for all f with this property relative to two variables there exists a function G with the described property, and therefore if AI(f) ≤ d then AI(F) ≤ d.
The contrapositive of this Proposition gives that, for such functions, AI(F) > d implies AI(f) > d. Therefore, from a lower bound on the algebraic immunity of a function F we can derive a lower bound on the algebraic immunity of a function f in more variables.
We use this Proposition on the specific functions we obtain by partitioning in two the instances of FLIP; we want to determine the algebraic immunity of a function being the direct sum of high degree monomials.
Proof (of Proposition 12): First, property 1 implies that AI(f) ≤ d: each one of the d monomials can be annihilated by the degree one function $1 - x_i$, with $x_i$ one variable of the monomial; therefore the product of these d functions, a function of degree d, annihilates f. We then prove that AI(f) ≥ d using Proposition 11. The first property guarantees that each variable is used only once; therefore, for two variables in the same monomial, the contribution to f is the same. Indeed, without loss of generality calling $x_1$ and $x_2$ these two variables, parts of a monomial of degree $\ell$, and reordering the variables, we can write f as: $f(x_1, \dots, x_n) = x_1 x_2 \cdots x_\ell \oplus h(x_{\ell+1}, \dots, x_n)$, where h is a direct sum of d − 1 monomials in $n - \ell$ variables. Denoting $x_3, \dots, x_\ell, x_{\ell+1}, \dots, x_n$ by x, we can then verify that f(0, 0, x) = f(1, 0, x) = f(0, 1, x). It implies that for each pair of variables in the same monomial we can apply Proposition 11. Therefore, we can reduce the number of variables of f, one by one, contracting a product of two variables into a new one at each step, keeping the property that f and the newly obtained function F share the same AI.
The second property guarantees that f can lead to a triangular function of degree d: as a monomial of each degree from k to d is already in f, there are k − 1 other monomials, all of degree ≥ k. These monomials are recursively taken and contracted to become the monomials of degree k − 1 down to 1 of the triangular function, by applying Proposition 11 several times.
As the triangular function of degree d has an algebraic immunity of d, we get AI(f) = d.
The partitioning of the instances into a direct sum of high degree monomials and a low degree function makes the first part a direct sum of degree d with more than d monomials of high degree. Since the algebraic immunity of a direct sum of two functions is at least as high as the highest of the two, we can apply Proposition 12 to the d monomials of highest degree of this function, and we conclude that the algebraic immunity of the direct sum is d, its degree.
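For very small numbers of variables, the statement AI(f) = d for such direct sums can be checked by exhaustive linear algebra over F_2. The sketch below is a brute-force check, not the method of the proof: it searches, degree by degree, for a nonzero annihilator of f or of f + 1. All function names are illustrative, and the approach is only practical for a handful of variables.

```python
from itertools import combinations, product

def gf2_rank(rows):
    """Rank over GF(2); each row is an int whose bits are the matrix entries."""
    pivots = {}  # leading-bit position -> reduced row
    for row in rows:
        while row:
            lead = row.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = row
                break
            row ^= pivots[lead]
    return len(pivots)

def has_annihilator(points, n, d):
    """True if some nonzero g of degree <= d vanishes on every given point."""
    monomials = [c for r in range(d + 1) for c in combinations(range(n), r)]
    rows = []
    for x in points:
        row = 0
        for j, mono in enumerate(monomials):
            if all(x[i] for i in mono):  # the monomial evaluates to 1 at x
                row |= 1 << j
        rows.append(row)
    # A nontrivial kernel vector gives the coefficients of such a g.
    return gf2_rank(rows) < len(monomials)

def algebraic_immunity(f, n):
    """Smallest d such that f or f + 1 has a nonzero annihilator of degree <= d."""
    points = list(product((0, 1), repeat=n))
    support = [x for x in points if f(x)]
    cosupport = [x for x in points if not f(x)]
    for d in range(n + 1):
        if has_annihilator(support, n, d) or has_annihilator(cosupport, n, d):
            return d

def direct_sum_of_monomials(degrees):
    """Return (f, n) for a direct sum of monomials with the given degrees."""
    blocks, start = [], 0
    for deg in degrees:
        blocks.append(range(start, start + deg))
        start += deg
    def f(x):
        return sum(all(x[i] for i in block) for block in blocks) % 2
    return f, start

if __name__ == "__main__":
    for degrees in ([1, 2], [2, 2], [1, 2, 3]):
        f, n = direct_sum_of_monomials(degrees)
        print(degrees, algebraic_immunity(f, n))
```

The degree lists [1, 2] and [1, 2, 3] are the triangular functions of degree 2 and 3, and [2, 2] is a direct sum of d = 2 monomials with k = 2, so each satisfies the two properties of Proposition 12.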
To sum up, for the four instances we can write the function as the direct sum of f and g, minimizing the degree of g. We can determine exactly the algebraic immunity of f and apply the lower bound of Theorem 1 for the $AI_k$ of the four instances.
With the lower bounds on the four instances of FLIP, we can then calculate a lower bound on the complexity of an algebraic attack on FLIP, which is $D^{\log_2 7}$. The results and the bounds giving us the complexity are shown in Table 4.
Table 4: Lower bounds on $AI_k$ of FLIP instances and complexity of the algebraic attack. N refers to the number of variables, AI(f) is equal to the degree of f, n is the number of variables in f, and the lower bound is given by Theorem 1. f is the Boolean function defined by the direct sum of all the monomials of degree strictly greater than deg(g), and g takes all monomials of degree less than or equal to deg(g). D refers to the number of variables after linearization and $D^{\log_2 7}$ to the corresponding attack complexity.
It is important to notice that the lower bounds here may not be tight, and the algebraic immunity (on constant Hamming weight inputs) of FLIP instances could be as great as AI(F). It then remains to determine the exact $AI_{n/2}$ of FLIP instances, which we could not compute due to the high number of variables of these functions.

Moreover, it remains unclear whether or not a guess and determine attack targeting the algebraic immunity could apply. Considering all possible guesses leads to functions where the lower bound decreases enough to contemplate an attack, but several aspects prevent us from exhibiting one. Firstly, the guessing technique leads to cancelling or lowering the degree of some monomials; when a monomial is cancelled, some unguessed variables inside it could be equal to one, and therefore the considered function is evaluated on an input whose Hamming weight we cannot know exactly. Secondly, the probability of obtaining a targeted weight on a targeted simpler function enough times (i.e. having enough keystream bits with a coherent set of permutations) decreases quickly. Finally, the time complexity of the attack exhibited with the lower bound (Table 4) is out of reach when no guesses are made; computational trials make us believe that the additional complexity cost of fixing variables of the filtering function (an additional factor of $2^{\ell}$ for $\ell$ guessed variables) sufficiently compensates for the potential decrease of $AI_k$ of the weaker function obtained.
An interesting point is that the high number of triangular functions in FLIP was chosen to prevent the guess and determine attack combined with the fast algebraic attack. Nevertheless, it appears here that if the number of triangular functions is smaller, then the lower bound on the algebraic immunity is increased. Supposing that we cannot compute the exact $AI_k$ of FLIP filtering functions, there is then for now a compromise to find on the number of triangular functions: if there are few triangular functions, then the guess and determine attack consisting in cancelling monomials of high degree is more efficient, but if there are too many triangular functions, then the lower bound on the algebraic immunity of FLIP decreases. Consequently, this attack cannot serve to determine the optimal number and size of triangular functions.

Conclusions and cautionary note
As a preliminary conclusion, the lower bounds exhibited for balancedness, nonlinearity and algebraic immunity do not reveal any concrete attack on the four instances of the FLIP family of stream ciphers. In this section the security analysis focusing on the Boolean function is made in the right model (i.e. we have taken into account that the inputs of the filtering function range only over $E_{n,n/2}$ and not over $\mathbb{F}_2^n$). On the one hand, we get only lower bounds on the two criteria of nonlinearity and algebraic immunity, leading to lower bounds on the attack complexity rather than practical bounds. On the other hand, if the bounds we proved for $AI_k$ are tight, we cannot rule out that a variant of the guess and determine attack on FLIP could be efficient in that way.
which are of degree less than d and also form the canonical basis of the Boolean functions defined over $E_{n,n-k}$. Then, as we found a recurrence relation between $\dim((\psi_{n,k,d}))$, $\dim((\psi_{n-1,k-1,d-1}))$ and $\dim((\psi_{n-1,k,d}))$, at some point there will be k = d or n − k = d. Then the initialization step (d = k or d = n − k or d = 0) of the recurrence relation is true. We then deduce by induction that $\dim((\psi_{n,k,d})) = \binom{n}{\min(d, k, n-k)}$.
From Corollary 7 and Theorem 4, we deduce:
Corollary 9. Let k be any positive integer such that k ≤ n/2. The algebraic immunity of the restriction of F to $E_{n,k}$ is bounded above by min e;

Proposition 12. Let $f(x_1, \dots, x_n)$ be a Boolean function in n variables satisfying the two following properties:
1. f is a direct sum of d monomials,
2. ∀i ∈ [k, d], f has a monomial of degree i, where k is the smallest degree over all monomials of f.
Then AI(f) = d.
Proposition 9. Let E be any non-empty subset of $\mathbb{F}_2^n$ and f any Boolean function defined over E. Let d and e be two non-negative integers. Let $M_{d,E}$ be the matrix whose term at the row indexed by $u \in \mathbb{F}_2^n$ such that $w_H(u) \le d$, and at the column indexed by x ∈ E, equals $x^u := \prod_{i=1}^{n} x_i^{u_i}$. Assume that the ranks of matrices $M_{d,E}$ and $M_{e,E}$ are such that $\mathrm{rank}(M_{d,E}) + \mathrm{rank}(M_{e,E}) > |E|$; then there exists a Boolean function g of algebraic degree at most e over $\mathbb{F}_2^n$, whose restriction to E is not identically null, and a Boolean function h of algebraic degree at most d on $\mathbb{F}_2^n$, such that functions gf and h coincide on E.
Proof: Let $F_d$ (resp. $F_e$) be a maximum size free family of restrictions to E of monomials $x^u$ of algebraic degree $w_H(u)$ at most d (resp. at most e). By definition of the rank of a matrix, the size of $F_d$ equals $\mathrm{rank}(M_{d,E})$ and that of $F_e$ equals $\mathrm{rank}(M_{e,E})$. Let us consider now the family, that we shall denote by $F_e f$, whose elements (with possible repetitions) are the products of the elements of $F_e$ by the function f. By hypothesis, $|F_d| + |F_e f|$ is strictly larger than the dimension of the $\mathbb{F}_2$-vector space of all Boolean functions over E, that is, |E| (indeed, the number of Boolean functions over E equals $2^{|E|}$). There exists then a linear combination of the elements of $F_d$ and of those of $F_e f$ which equals the zero function and whose coefficients are not all null. Gathering the part of this linear combination dealing with the elements of $F_d$ and the part dealing with $F_e f$, we obtain respectively functions h and g such that h and gf coincide over E, and the restrictions of g and h to E are not both null (since both families $F_e$ and $F_d$ are free), that is, the restriction of g to E is nonzero.
Taking e = 0 in Proposition 9, we have $\mathrm{rank}(M_{e,E}) = 1$, since the constant function 1 does not vanish over E, and:
Corollary 5. Let E be any non-empty subset of $\mathbb{F}_2^n$ and f any Boolean function defined over E. Let n and d be such that $\mathrm{rank}(M_{d,E}) = |E|$; then there exists a Boolean function over $\mathbb{F}_2^n$ of algebraic degree at most d which coincides with f on E.
This explains our definition of restricted algebraic immunity (see Definition 4 in Section 1.2.3) and leads to the following property: The algebraic immunity of any Boolean function over E is bounded above by $\min\{e;\ \mathrm{rank}(M_{e,E}) > |E|/2\}$.
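The rank condition above is easy to experiment with when E is the set of vectors of fixed Hamming weight and n is small. The following Python sketch builds the matrix with rows indexed by u of weight at most d and columns indexed by x in E, computes its rank over F_2, and derives the resulting upper bound on the restricted algebraic immunity. The helper names are illustrative assumptions, and the code assumes 0 ≤ k ≤ n.

```python
from itertools import combinations
from math import comb

def gf2_rank(rows):
    """Rank over GF(2); each row is an int whose bits are the matrix entries."""
    pivots = {}  # leading-bit position -> reduced row
    for row in rows:
        while row:
            lead = row.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = row
                break
            row ^= pivots[lead]
    return len(pivots)

def rank_M(n, k, d):
    """Rank over GF(2) of M_{d,E} for E = vectors of length n and weight k."""
    E = [frozenset(c) for c in combinations(range(n), k)]  # supports of x in E
    rows = []
    for r in range(d + 1):
        for u in combinations(range(n), r):
            u = frozenset(u)
            row = 0
            for j, x in enumerate(E):
                if u <= x:  # x^u = 1 iff supp(u) is contained in supp(x)
                    row |= 1 << j
            rows.append(row)
    return gf2_rank(rows)

def ai_upper_bound(n, k):
    """min{e : rank(M_{e,E}) > |E|/2} for E of fixed weight k (0 <= k <= n)."""
    size = comb(n, k)
    e = 0
    while 2 * rank_M(n, k, e) <= size:
        e += 1
    return e

if __name__ == "__main__":
    n, k = 4, 2
    print([rank_M(n, k, d) for d in range(k + 1)], ai_upper_bound(n, k))
```

For small n, k and d, the computed ranks can be compared with the closed form in min(d, k, n − k) derived by induction in the text.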

Table 1: FLIP filtering function instances. N is the total number of variables, $n_1$ is the number of variables of the linear part, $n_2$ is the number of variables of the quadratic part, nb is the number of triangular functions, h is the degree of the triangular functions and λ is the security parameter.

Table 3: Lower bound on $NL_{E_{N,k}}$ of FLIP instances. N refers to the number of variables, $v_0$ the range of weight k for which quadratic part by the guess strategy, $N_2$ and $v_2$ are the corresponding number of variables and range of weights.