Power noise filtration in DREM

The problem of estimation in the linear regression model is studied under the hypothesis that the noise is sufficiently small compared to the regressor. A solution is then sought for a new regression containing the powers of the unknown parameters and of the disturbance, where the influence of the latter is attenuated. It is shown that the power transformation of a regressor can preserve the excitation under mild assumptions. The possibilities of evaluating the powers of the parameters are investigated using the dynamic regression extension and mixing (DREM) method. The performance of the estimators is demonstrated in numerical experiments.


I. INTRODUCTION
We consider the problem of parameter estimation in a noisy linear regression:
$$\tilde{y}(t) = \varphi^\top(t)\,\theta + d(t), \tag{1}$$
where ỹ(t) ∈ R is the measured output, φ(t) ∈ R^p is the measured vector regressor, θ ∈ R^p is the unknown constant vector of parameters to be estimated, and d(t) ∈ R is a bounded measurement noise. There are many methods for solving this standard estimation problem, see [1]-[5] (to mention a few handbooks); they provide different performance in evaluating θ and depend crucially on the excitation properties of the regressor. Interval excitation is sufficient for identification of θ in the noise-free setting, while in the presence of the disturbance d the persistence of excitation requirement is commonly imposed [6], [7], which, roughly speaking, implies that the estimation precision can be improved by continuing the observation and collecting more measurements. The latter condition can be restrictive, and there are many works aimed at its relaxation [8]-[13].
Among the quality requirements for identification algorithms, it is worth highlighting the time of convergence of the estimate to its ideal value and the robustness of the estimation error with respect to the perturbation d. To deal with the latter issue, different filtering algorithms, often linear, are applied to both sides of (1) in order to improve the estimation accuracy [14]-[16]. Usually, this requires information about the frequency spectrum of d or its stochastic characteristics, and ideally the spectrum separation (independence) between the regressor φ and the perturbation d is used. Meeting these constraints on d can be rather difficult in applications; hence, the development of any other filtering approach is of great importance.
In this work we propose a nonlinear filtering framework that is based on power operations and can be applied when the amplitude of the noise is small in comparison with the excitation of the regressor; it does not use any frequency or stochastic properties of d. The noise reduction is obtained at the price of increasing the size of the vector of unknown parameters and, hence, extending the regressor. Under mild assumptions, we will show that the excitation characteristics of φ in (1) can be preserved for the new regressor, while application of the dynamic regression extension and mixing (DREM) method [9] allows us to avoid the issues related to the increased dimension of the vector of unknown parameters. The DREM method itself has good robustness properties, provides estimates in fixed time [17], and supports independent estimation of any component of the vector of unknown parameters (this is the property we will use).
The outline of this paper is as follows. The technical formulation of the considered problem is given in Section II. Two approaches to the power transformation of the linear regression (which provides a nonlinear filtration) are introduced in Section III. The estimation problem in the transformed regression is solved via the DREM method with a delay-based extension in Section IV. Illustrative simulation results are shown in Section V.

Notation
• Let R and N denote the real and natural numbers, respectively. For convenience, we also denote ρ_0 = ρ_n = 1.

II. PROBLEM STATEMENT
Consider the parameter estimation problem in a scalar noisy linear regression:
$$y(t) = \phi(t)\,\theta + d(t), \tag{2}$$
where y(t) ∈ R is the measured output, ϕ(t) ∈ R is the measured regressor signal, θ ∈ R is the unknown constant parameter that we would like to find, and d(t) ∈ R is the measurement noise.
Remark 1: The scalar case is considered without loss of generality, since it can always be obtained from (1) using the DREM approach [9].
As shown in a recent work [17], if the regressor in (2) is persistently exciting (PE) [1], [7], the DREM method is by itself an identification approach providing a simple solution to the problem, and the main remaining technical difficulty is to apply a noise filter to the obtained estimate. It was observed in the same work that nonlinear filtering algorithms demonstrate remarkable performance compared with conventional low-pass filtering. In the same vein, in the present note we propose an approach for introducing nonlinear filtration in (2), followed by the DREM procedures to estimate the resulting extended vector of parameters containing the powers of θ.
To this end, we consider a particular scenario in which
• the regressor ϕ is PE;
• the noise is sufficiently small, i.e., ∥d∥∞ ≪ 1.
Formally, the latter condition can always be ensured by scaling the measured output y(t) and the regressor ϕ(t).
Moreover, as shown in [18], if the scalar model (2) is generated from (1) using the DREM procedure with a proper choice of the dynamic extension method, then φ being PE implies that ϕ is strictly separated from zero after an initial information accumulation interval, i.e., we may assume ϕ(t) ≥ µ > 0 for all t ≥ t_0 for some t_0 ∈ R+.

III. EXTENSION TO POWER LINEAR REGRESSION
Our main idea is based on the following simple observation. Since the noise is small, ∥d∥∞ ≪ 1, if a suitable transformation of (2) yields a new linear regression with the term d^n(t) for some n ∈ N with n > 1, then the noise influence will be attenuated. There are several approaches to obtaining such a new linear regression, each with its own price to pay, which we present and discuss below.

A. Power method
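One straightforward way is to take the nth power of both sides of d(t) = y(t) − ϕ(t)θ, which follows from (2):
$$\big(y(t) - \phi(t)\theta\big)^n = d^n(t),$$
which results in the new linear regression
$$\psi(t) = \varphi^\top(t)\,\Theta + v(t), \tag{3}$$
where
$$\psi(t) = y^n(t), \quad v(t) = d^n(t), \quad \Theta = [\theta^n,\ \theta^{n-1},\ \dots,\ \theta]^\top, \tag{4}$$
and the coefficients of φ(t) ∈ R^n are defined respectively as functions of powers of ϕ(t), y(t) and their products:
$$\varphi(t) = \big[(-1)^{n+1}\rho_0\,\phi^n(t),\ (-1)^{n}\rho_1\,\phi^{n-1}(t)\,y(t),\ \dots,\ \rho_{n-1}\,\phi(t)\,y^{n-1}(t)\big]^\top, \tag{5}$$
where ρ_i for i = 1, …, n − 1 are the binomial coefficients, and ρ_0 = 1.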
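As a worked instance of the power method, the expansion can be written out for n = 2 and n = 3. The sketch below assumes that the nth power is applied to d(t) = y(t) − ϕ(t)θ and that y^n(t) is kept on the left-hand side; this sign and ordering convention is an illustrative assumption and may differ from the one used in (4) and (5).

```latex
% n = 2: two unknowns, \theta^2 and \theta
\begin{align*}
(y - \phi\theta)^2 = d^2
\;&\Longrightarrow\;
y^2 = -\phi^2\,\theta^2 + 2\phi y\,\theta + d^2 .
\\[4pt]
% n = 3: three unknowns, \theta^3, \theta^2 and \theta
(y - \phi\theta)^3 = d^3
\;&\Longrightarrow\;
y^3 = \phi^3\,\theta^3 - 3\phi^2 y\,\theta^2 + 3\phi y^2\,\theta + d^3 .
\end{align*}
```

In both cases every regressor entry carries the factor ϕ(t), and the disturbance enters only as d^n(t), which is smaller than d(t) in magnitude whenever |d(t)| < 1.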
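Note that all elements of φ(t) have the common multiplier ϕ(t) by construction. Under certain conditions, this new extended regressor inherits the PE property of ϕ(t). Indeed, the following simple result can be derived in discrete time:
Lemma 1: Let n = 2, the separation property (6) be true for some µ > 0, and there exist L ∈ N and κ > 0 such that the signal ϕ(t)d(t) satisfies the corresponding excitation restriction; then the regressor φ is PE.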
The requirement (6) implies PE of ϕ, but the converse is obviously not true. However, it was proven in [18], [19] that if (1) has a PE vector regressor, then there is a choice of filters in the DREM method providing this separation property for ϕ. Note also that a PE signal ϕ cannot be identically zero for all t ∈ N*; therefore, by skipping the time instants t′ ∈ N* where ϕ(t′) = 0 (i.e., y(t′) = d(t′)), we can construct a subsequence of time instants on which ϕ is separated from zero. Hence, the property (6) can be assumed for an excited regression (2) without much loss of generality (we just exclude the case µ = 0 from consideration). Then the following simple and natural estimate can be derived in the non-transformed regression (2) [17]:
$$\hat{\theta}(t) = \frac{y(t)}{\phi(t)} \tag{7}$$
and
$$\hat{\theta}(t) = \theta + \delta(t),$$
where δ(t) = d(t)/ϕ(t) is the related estimation error. We will use these quantities later for comparison.
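The direct estimate and its error can be checked numerically. The following is a minimal sketch, assuming the scalar regression y = ϕθ + d with a regressor separated from zero; the values of `theta`, `mu`, `a`, `omega` and `d_bar` are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

theta = 2.0                    # true parameter (illustrative value)
mu, a, omega = 1.0, 0.5, 0.7   # hypothetical regressor constants, phi >= mu > 0
d_bar = 0.05                   # hypothetical noise amplitude, |d(t)| <= d_bar

t = np.arange(200)
phi = mu + a * np.sin(omega * t) ** 2
d = rng.uniform(-d_bar, d_bar, t.size)
y = phi * theta + d            # scalar noisy regression

theta_hat = y / phi            # direct estimate
delta = d / phi                # its estimation error

max_err = np.max(np.abs(theta_hat - theta))
print(f"max |theta_hat - theta| = {max_err:.4f}, bound d_bar/mu = {d_bar / mu:.4f}")
```

Since |ϕ(t)| ≥ µ, the error of the direct estimate never exceeds d_bar/µ, but it is not reduced by any averaging; this is the baseline against which the filtered estimates are compared.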
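Proof of Lemma 1: By definition, the signal φ is PE if there are L > 0 and ϱ > 0 such that
$$\sum_{k=t}^{t+L} \varphi(k)\,\varphi^\top(k) \ge \varrho\, I_n$$
for all t ∈ N*. Direct computations show that the claim follows from the conditions of the lemma.
The restriction imposed on the signal ϕ(t)d(t) in this lemma can be verified if the variables ϕ and d are independent and d is zero-mean; then the PE property may be preserved in (3).
Lemma 2: Let n = 3, (6) be true, and there exist L ∈ N and κ > 0 such that the signal ϕ(t)d(t) satisfies the corresponding excitation restriction; then the regressor φ is PE.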
Proof: For n = 3, the result follows by repeating the same computations.
For n = 3 the restriction on ϕ and d has a similar spirit: if the excitation of the signal ϕ(t)d(t) is sufficiently rich, then PE is kept in (3). The results of these lemmas can be generalized to any n ∈ N:
Theorem 1: Let (6) be true and there exist L ∈ N and κ > 0 such that there are i_1, …, i_n ∈ {t, …, t and then the regressor φ is PE.
The proof of the theorem is omitted due to lack of space. Note that the same conclusions can be obtained for interval excitation.
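The role of the signal ϕ(t)d(t) in preserving excitation can be illustrated numerically for n = 2. The sketch below assumes the regressor rows [−ϕ², 2ϕy] multiplying [θ², θ] (an illustrative sign convention) and uses hypothetical signal parameters; it evaluates the smallest eigenvalue of the windowed sum of φ(k)φᵀ(k), which is the quantity bounded from below in the PE definition.

```python
import numpy as np

def min_window_eig(reg, L):
    """Smallest eigenvalue of sum_{k=t}^{t+L} reg[k] reg[k]^T, minimized over t."""
    values = []
    for t in range(reg.shape[0] - L):
        window = reg[t:t + L + 1]
        gram = window.T @ window
        values.append(np.linalg.eigvalsh(gram)[0])  # eigenvalues are ascending
    return min(values)

def power_regressor_n2(y, phi):
    # Assumed convention: y^2 = [-phi^2, 2*phi*y] @ [theta^2, theta] + d^2
    return np.column_stack([-phi ** 2, 2.0 * phi * y])

rng = np.random.default_rng(0)
theta, L = 2.0, 10                        # illustrative values
t = np.arange(100)
phi = 1.0 + 0.5 * np.sin(0.7 * t) ** 2    # hypothetical regressor, phi >= 1
d = rng.uniform(-0.1, 0.1, t.size)        # hypothetical small noise

lam_noisy = min_window_eig(power_regressor_n2(phi * theta + d, phi), L)
lam_clean = min_window_eig(power_regressor_n2(phi * theta, phi), L)
print(f"with noise: {lam_noisy:.3e}, without noise: {lam_clean:.3e}")
```

Without noise every row equals ϕ²(t)[−1, 2θ], so all rows are collinear and the smallest eigenvalue vanishes up to rounding; with noise, the additional term 2ϕ(t)d(t) in the second entry supplies the missing direction.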
Remark 2: It is straightforward to see that Lemma 1 and Lemma 2 define conditions equivalent to those of Theorem 1. E.g., consider the condition of Lemma 1 and the condition of Theorem 1 for n = 2 and L = 2, stated for some κ_1, κ_2. Obviously, if one of these conditions is satisfied, then there exists a pair κ_1, κ_2 such that the other condition is satisfied as well.

B. Approximation of exponent
Another approach is based on exponentiation of y(t), followed by approximation of the exponential by a power series.
1) Continuous time: Denote ξ(t) = e^{y(t)} = e^{ϕ(t)θ} e^{d(t)}. Using the representation of the exponential function by an infinite power series of its argument, the expression for this new variable can be rewritten through a truncated series, where N ∈ N defines the order of truncation of the series and ϵ(t) is an approximation error, which should be of order d^{N+1}(t) and sufficiently small for ∥d∥∞ ≪ 1. Taking the derivative of ξ(t) or applying any linear filter, we get a relation (10) whose remainder contains all terms proportional to the uncertain variables ϵ(t), ε(t) and ḋ(t), which we assume to be negligible. Finally, using ξ̇(t) = ẏ(t)ξ(t), substituting d(t) = y(t) − ϕ(t)θ from (2) in the left-hand side of (10) and expanding the brackets, we obtain (3) for n = N + 1, the respective Θ, and for ψ(t) and φ(t) dependent on y(t), ϕ(t), ẏ(t), and ϕ̇(t).
Remark 3: To implement the proposed continuous-time approximation, the derivatives of y and ϕ should be available. This shortcoming can be alleviated, e.g., by applying an LTI filter to the signals y and ϕ of the original scalar equation (2) and using the filtered signals and their derivatives (available as internal variables of the filters).
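The claim that the truncation error is of order d^{N+1} can be checked directly for the exponential series. This is a minimal sketch, assuming that the truncated series is the Taylor polynomial of e^d of order N; the amplitude `d_bar` is a hypothetical value, and the bound used is the standard Lagrange remainder estimate.

```python
import math

def exp_truncation_error(d, N):
    """Difference between exp(d) and its Taylor polynomial of order N."""
    partial = sum(d ** k / math.factorial(k) for k in range(N + 1))
    return math.exp(d) - partial

d_bar = 0.1  # hypothetical noise amplitude, |d| <= d_bar
results = []
for N in (1, 2, 3):
    # The worst case over |d| <= d_bar is checked at the two end points.
    err = max(abs(exp_truncation_error(s * d_bar, N)) for s in (-1.0, 1.0))
    # Lagrange remainder bound: |eps| <= exp(|d|) * |d|^(N+1) / (N+1)!
    bound = math.exp(d_bar) * d_bar ** (N + 1) / math.factorial(N + 1)
    results.append((N, err, bound))
    print(f"N={N}: error={err:.2e}, bound={bound:.2e}")
```

Each increase of N reduces the error by roughly a factor of d_bar/(N + 1), which is the attenuation exploited when ∥d∥∞ ≪ 1.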
2) Discrete time: In this case differentiation is not possible, so we use the series directly, where N_1, N_2 ∈ N are the orders of approximation of the related exponents by finite-length power series and ϵ_1(t), ϵ_2(t) ∈ R are the corresponding approximation errors. A straightforward manipulation gives a relation with a new noise term, which we assume to be small since ϵ_1(t) and ϵ_2(t) are so. Again substituting d(t) = y(t) − ϕ(t)θ from (2), we obtain a regression of the form (3).
Let us now consider how to estimate the new vector of unknown parameters in (3), and compare the obtained results with the conventional solution (7).

IV. ESTIMATION IN (3)
As we can see, the linear regression (3) can be derived using different approaches: exactly, with v(t) = d^n(t), or with some additional approximation terms in v(t) (in the latter case, additional hypotheses are needed that the neglected terms are sufficiently small). In both cases the disturbance v(t) is smaller than in (2), but the vector of unknown parameters Θ has n elements containing the different powers θ^i, i = 1, …, n, of the unknown scalar parameter of (2). Although, as demonstrated above, the excitation of φ(t) can be preserved in this extended vector regression under reasonable assumptions, it can be weak, and since it depends on the properties of d(t), it is difficult to predict (conversely, a proper noise d(t) may also enforce the excitation of the regressor). However, when developing estimation algorithms for (3), we can exploit the fact that all elements of Θ are interrelated.
For example, the DREM approach can be applied to (3), allowing each component of Θ to be estimated separately. The noise d(t) is then filtered several times: first, by taking its power d^n(t); second, by the filter of the extension step of the DREM method.
Moreover, such nonlinear filtering appears even if delays are used instead of conventional convolution filters (delays by themselves do not seem to provide any filtering ability). We demonstrate this property next by considering (3) obtained by the power method.
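Applying delays in the DREM extension step, we get an extended version of (3):
$$\Psi(t) = \Phi(t)\,\Theta + V(t), \tag{11}$$
where
$$\Psi(t) = \begin{bmatrix} \psi(t) \\ \psi(t-\tau) \\ \vdots \\ \psi(t-(n-1)\tau) \end{bmatrix}, \quad \Phi(t) = \begin{bmatrix} \varphi^\top(t) \\ \varphi^\top(t-\tau) \\ \vdots \\ \varphi^\top(t-(n-1)\tau) \end{bmatrix}, \quad V(t) = \begin{bmatrix} v(t) \\ v(t-\tau) \\ \vdots \\ v(t-(n-1)\tau) \end{bmatrix},$$
where ψ(t) ∈ R and v(t) ∈ R are defined in (4), φ(t) ∈ R^n is defined in (5), and τ ∈ R+ in continuous time (or τ ∈ N in the discrete-time case) is the elementary delay. If the new extended regressor Φ is nonsingular, then direct estimates of Θ can be obtained from (11):
$$\hat{\Theta}(t) = \Theta + \mathcal{V}(t), \tag{12}$$
where $\hat{\Theta}(t) = \Phi^{-1}(t)\Psi(t)$ is the implementable estimate that can be derived from (11) and $\mathcal{V}(t) = \Phi^{-1}(t)V(t)$ is the new noise proportional to d^n.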
The following proposition shows that the first element of Θ, corresponding to θ^n, will be estimated as the geometric mean of the scalar evaluations θ̂(t) from (7) defined for a nonzero regressor ϕ.
Proposition 1: Consider the linear regression (2), where ϕ satisfies (6), and the related extended linear regression (11) for some n ≥ 2 with a nonsingular matrix Φ(t). Then for the first component of Θ̂ in (12) it holds
$$\hat{\Theta}_1(t) = \prod_{i=0}^{n-1} \hat{\theta}(t - i\tau),$$
where θ̂ are defined by (7).
The proof of Proposition 1 is omitted due to lack of space.
Note that the last element of Θ (which corresponds to θ) receives an estimate as the arithmetic mean of the same quantities:
$$\hat{\Theta}_n(t) = \frac{1}{n}\sum_{i=0}^{n-1} \hat{\theta}(t - i\tau).$$
The intermediate elements of Θ will be derived as the respective products of θ̂(t − iτ) of given orders, i.e., for θ^2 it will be an averaged sum of θ̂(t − iτ)θ̂(t − kτ) with i ≠ k = 0, …, n − 1. The elements of the noise vector $\mathcal{V}(t)$ will have the corresponding dependence on δ(t) = d(t)/ϕ(t), i.e., for the last element
$$\mathcal{V}_n(t) = \frac{1}{n}\sum_{i=0}^{n-1} \delta(t - i\tau).$$
Clearly, the geometric or arithmetic mean estimates, as well as others, can be derived directly, but it is interesting that they can also be obtained as particular solutions (for a special choice of delays as extension filters) of the DREM method applied to (3). Roughly speaking, this demonstrates what kind of nonlinear filtering can be derived by the proposed approach. Other filtering schemes can also be obtained, and their investigation is a direction of future research.
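The relation between the delay-based solution and the means of the scalar estimates can be verified numerically. The sketch below assumes the power-method regression with y^n on the left-hand side and Θ ordered as [θ^n, …, θ] (an illustrative convention); `power_row` is a hypothetical helper, and all numerical values are chosen only for illustration.

```python
import numpy as np
from math import comb

def power_row(y, phi, n):
    """One sample of the power regression: y**n = row @ Theta + d**n,
    with Theta = [theta**n, ..., theta] (assumed ordering)."""
    row = [-comb(n, i) * (-phi) ** i * y ** (n - i) for i in range(n, 0, -1)]
    return y ** n, np.array(row)

rng = np.random.default_rng(0)
theta, n, tau = 2.0, 3, 1                 # illustrative values; tau in samples
t = np.arange(30)
phi = 1.0 + 0.5 * np.sin(0.7 * t) ** 2    # hypothetical regressor, phi >= 1
d = rng.uniform(-0.2, 0.2, t.size)        # hypothetical noise
y = phi * theta + d
theta_dir = y / phi                       # scalar estimates y/phi

k = 20                                    # current time index
idx = [k - i * tau for i in range(n)]     # current and delayed samples
samples = [power_row(y[j], phi[j], n) for j in idx]
Psi = np.array([s[0] for s in samples])
Phi = np.vstack([s[1] for s in samples])
Theta_hat = np.linalg.solve(Phi, Psi)     # requires a nonsingular Phi

print("theta^n estimate:", Theta_hat[0], "product:", np.prod(theta_dir[idx]))
print("theta estimate:  ", Theta_hat[-1], "mean:   ", np.mean(theta_dir[idx]))
```

The first component coincides with the product of the n scalar estimates (the nth power of their geometric mean) and the last one with their arithmetic mean; the matrix Φ is nonsingular here because the noisy scalar estimates at the n time instants are distinct.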

V. EXAMPLE
Consider the linear regression (2), where ϕ(t) = µ + a sin²(ωt) for some known positive constants µ, a, and ω. This scalar regressor is PE and satisfies (6). The noise d is uniformly distributed in the interval from −d̄ to d̄ for some positive constant d̄.
The transients are shown in Figure 1. The direct estimate θ_dir provides an estimate after the first step; however, it has the worst noise sensitivity. The delay-based estimate converges after nτ = 0.3 seconds and shows good noise attenuation; as shown in Proposition 1, such an estimate corresponds to the (nonlinear) geometric mean filtration. Finally, the DREM-based estimate has the slowest convergence and the best noise attenuation; note that the dynamic extension behaves as a low-pass filter.
For numerical comparison of the estimates, the mean absolute error (MAE) and the mean square error (MSE) are computed. As we increase the order n, the DREM procedure requires extra tuning of α and β; otherwise, the value of ∆ remains small. In contrast, the delay-based estimation does not require any extra tuning as the order n increases.
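The error metrics can be computed as in the following sketch, which compares the direct estimate with a geometric mean of n delayed direct estimates. All numerical values (`theta`, `mu`, `a`, `omega`, `d_bar`, `n`, `tau`) are hypothetical, and the geometric-mean filter is only one possible reading of the delay-based estimate, so the numbers it prints are not the results reported in the paper.

```python
import numpy as np

def mae(est, theta):
    """Mean absolute error of an estimate sequence."""
    return float(np.mean(np.abs(est - theta)))

def mse(est, theta):
    """Mean square error of an estimate sequence."""
    return float(np.mean((est - theta) ** 2))

rng = np.random.default_rng(2)
theta = 2.0                                  # illustrative true parameter
mu, a, omega, d_bar = 1.0, 0.5, 0.7, 0.05    # hypothetical example constants
n, tau = 3, 1                                # order and delay (in samples)

t = np.arange(2000)
phi = mu + a * np.sin(omega * t) ** 2
y = phi * theta + rng.uniform(-d_bar, d_bar, t.size)

theta_dir = y / phi                          # direct estimate
start = (n - 1) * tau                        # first index with all delays available
prod = np.ones(t.size - start)
for i in range(n):
    prod *= theta_dir[start - i * tau : t.size - i * tau]
theta_geo = prod ** (1.0 / n)                # geometric mean of n delayed estimates

mae_dir, mse_dir = mae(theta_dir[start:], theta), mse(theta_dir[start:], theta)
mae_geo, mse_geo = mae(theta_geo, theta), mse(theta_geo, theta)
print(f"direct:         MAE={mae_dir:.4e}, MSE={mse_dir:.4e}")
print(f"geometric mean: MAE={mae_geo:.4e}, MSE={mse_geo:.4e}")
```

The geometric mean is well defined here because all direct estimates are positive for the chosen values; for a parameter of unknown sign the product itself, which estimates θ^n, would be used instead.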

VI. CONCLUSION
The problem of noise attenuation in the linear regression estimation model was studied. Avoiding restrictions on the stochastic or frequency properties of the measurement perturbations, it is only assumed that the noise is sufficiently small compared to the regressor (the regressor is sufficiently excited in relation to the noise amplitude). Two methods of nonlinear transformation of the linear regression were proposed. The obtained transformed linear regression has an augmented regressor and a vector of unknown parameters containing the powers of θ and of the disturbance, where the influence of the latter is attenuated. It was proven that the presented power transformation of a regressor keeps the excitation under mild restrictions. The DREM method was applied to estimate the unknown parameters. The performance of the estimators was demonstrated and compared in numerical experiments.

(3): in such a case $\lim_{L\to+\infty}\sum_{k=t}^{t+L}\phi(k)d(k) = 0$ as the mean value of an unbiased signal, and $\sum_{k=t}^{t+L}\phi^2(k)\,\sum_{k=t}^{t+L}d^2(k)$ is positive, ensuring the existence of κ.
Lemma 2: Let n = 3, (6) be true, and there exist L ∈ N and κ > 0 such that
• Let |x| be the Euclidean norm for a vector x ∈ R^n or the absolute value of a real number x ∈ R.
• For a Lebesgue measurable function of time d : R+ → R^m define the norm ∥d∥∞ = ess sup_{t∈R+} |d(t)|, and the space of d with ∥d∥∞ < +∞ we further denote as L^m_∞.
• For a sequence of vectors d : N* → R^m define the norm ∥d∥∞ = sup_{t∈N*} |d(t)|, and the space of d with ∥d∥∞ < +∞ we further denote as L^m_∞.
• The identity matrix of dimension n × n is denoted by I_n.
• The binomial coefficients ρ_i for i = 1, …, n − 1 are the coefficients of the polynomial (a + b)^n.