Non‐parametric measure approximations for constrained multi‐objective optimisation under uncertainty

In this article, we propose non‐parametric estimations of robustness and reliability measures approximation error, employed in the context of constrained multi‐objective optimisation under uncertainty (OUU). These approximations with tunable accuracy permit to capture the Pareto front in a parsimonious way, and can be exploited within an adaptive refinement strategy. First, we illustrate an efficient approach for obtaining joint representations of the robustness and reliability measures, allowing sharper discrimination of Pareto‐optimal designs. A specific surrogate model of these objectives and constraints is then proposed to accelerate the optimisation process. Secondly, we propose an adaptive refinement strategy, using these tunable accuracy approximations to drive the computational effort towards the computation of the optimal area. To this extent, an adapted Pareto dominance rule and Pareto optimal probability computation are formulated. The performance of the proposed strategy is assessed on several analytical test‐cases against classical approaches. We also illustrate the method on an engineering application, performing shape OUU of an Organic Rankine Cycle turbine.


INTRODUCTION
There has been a growing interest in optimisation with uncertainty in recent years. 1 We focus here specifically on cost-efficient and derivative-free strategies for tackling multiple objectives and constraints in this context.Optimisation under uncertainty (OUU) can be roughly classified under two categories, reliability-based design optimisation (RBDO), which deals with probabilistic and worst-case feasibility constraints, and Robust Design Optimisation (RDO), where the deterministic objectives are replaced with averaged or worst-case ones, possibly in a multi-objective context (i.e., Taguchi optimisation).
In the context of multi-objective optimisation, 3,4 a popular derivative-free optimisation strategy for constrained multi-objective problems are evolutionary algorithms (EA).A comparison of multi-objective EA strategies is proposed in, 5 in the bi-or tri-objective setting.In this work, we focus specifically on performing optimisation in the presence of uncertainties.Such a setting has received significant attention from the optimisation community, as can be seen in several reviews. 6,7Notably, Jin 6 proposed four types of uncertainties: (i) noise, which is an intrinsic bias on the evaluation of the fitness function, coming, for example, from imperfect sensors; (ii) robustness, which refers to specific parameters that may vary from their nominal value between the optimisation procedure and the real-world application; (iii) fitness approximation, when the true fitness value is not computable or very expensive, and less accurate but cheaper representation is used instead; (iv) time-varying fitness function, meaning that the optimum changes through time and that the optimisation algorithm should tackle this variability.
Much work has been devoted to tackling noise within the optimisation process.This field is usually referred to as noisy optimisation (NO).In this context, some parsimonious techniques rely on the assumption that there exist either different levels of accuracy for computing the Quantities of Interest.Namely, multi-level techniques 8,9 usually involve several mesh refinements for numerical simulation, and multi-fidelity approaches [10][11][12] use different modelling strategies.Recent works have also proposed adaptive approaches for tackling problems where accuracy can be tuned online.This approach is particularly adapted when the fitness function is the limit of intermediate approximations of increasing fidelities, such as iterative solvers or Monte Carlo estimations.This concept has been proposed in Reference 13 with online computation time allocation in a Bayesian optimisation (BO) context.In this work, a similar online refinement strategy is employed to achieve high parsimony.
In a multi-objective setting, the presence of noise and cohabitation of different approximation accuracies raises the need for extending the classical Pareto dominance to an uncertain environment.The EA community has proposed techniques for problems featuring uniform 14 or Gaussian noises. 15Interval and worst-case approaches are tackled in References 16,17 within the classical NSGA-II and MOEA frameworks.Recent developments 18,19 extends these computations to non-parametric noise distributions through the construction of histogram-based approximate densities, which permits fast computations of the probability of dominance.A straightforward extension of the Pareto dominance rule to interval uncertainty in the presence of constraints is proposed in Reference 20 under the name of Bounding-Boxes.This extension has been exploited in several papers 21,22 within an adaptive strategy for performing multi-objective optimisation on objective functions computed with tunable accuracy.
The concept of multi-objective optimisation in the presence of noise/uncertainty has been widely studied in a BO context, tackling mean and variance measures, 23 high-dimensional inputs 24 and outputs, 25 min/max formulations, 26 failure probability constraints, 27 Bayes risk, 28 CVaR measures, 29 large dataset setting. 30n particular, in Reference 23, they proposed a spectral representation of robustness measures based on the GP's spectral representation, treating mainly mean and variance measures and focusing on input uncertainties, where design and uncertain variables are identical.In Reference 26, the authors show that standard GP optimisation algorithms do not exhibit the desired robustness properties and provide a novel confidence-bound-based algorithm in this perspective.Azzimonti et al. 31 dealt with the problem of estimating and quantifying uncertainties on the excursion set of a function under a limited evaluation budget under a Bayesian approach.In Reference 28, they proposed a robust Gaussian Process model to infer the Bayes risk criterion to quantify robustness, and they developed a two-stage BO process to search for a robust Pareto frontier, again in the context of input uncertainties.In Reference 29, two BO algorithms with theoretical performance guarantees are presented to maximise the conditional value-at-risk (CVaR) of a black-box function.All these works are specific to a BO context.
In this paper, the core idea of the approach is to tune the accuracy of statistical objectives and constraints approximations to focus computational power on the most promising designs.This approach to performing cost-efficient OUU with some of the most used robustness and reliability measures is agnostic with respect to the optimiser to be used, in contrast to the previous state-of-the-art.It can be used with any classical technique for multi-objective problems, such as BO and genetic algorithms (explicitly used in this article).
The first contribution is the computation of a non-parametric representation of the robustness and reliability measures error, permitting to estimate the Pareto front in a parsimonious way.Specifically, we aim to avoid the restriction to independent uniform (used in Reference 22) or Gaussian [32][33][34][35] distributions, which often rely on the so-called projected process technique.We propose to use empirical sampling-based distribution approximations that should allow for more representative error estimations given the current information and hopefully quicker convergence.An original approach is introduced for constructing these measure samples, which relies on a coupled space Gaussian Process (GP) surrogate model for drawing joint realisations of objectives and constraints over the whole design space.These approximated measures are treated as a random vector and are compared and ranked using the concept of probabilistic Pareto dominance. 18,36,37The approach proposed in this article for drawing joint realisation is to be contrasted with other works concerning the construction of Sparse GP. [38][39][40] This article aims to employ sparse approximations specifically for estimating model error estimation of statistical measures from an OUU perspective.What is proposed permits a fast evaluation, and a simple quality indicator is used for judging the approximation quality.
The second main contribution of this work is an adaptive refinement strategy, where the accuracy is improved only on the most promising individuals.We assess the proposed approach on several (mono and bi-objective) optimisation problems to estimate its computational cost and accuracy regarding the average distance to the exact Pareto front.Finally, we tackle the problem of an Organic Rankine turbine shape optimisation with random operating conditions.Also, the proposed framework heavily relies on the work in Reference 22 It differs in most of its specific methodologies.Non-parametric measure approximations, the associated specific surrogate-assisting (SA) models, and probabilistic Pareto dominance rules are new in this work and heavily improve its applicative performance.
Section 2 illustrates the formulation of multi-objective optimisation problems under uncertainties treated here within a probabilistic setting.In Section 3, we introduce the numerical ingredients permitting to compute non-parametric measure approximations.Then, Section 4 illustrates the proposed general framework to solve the OUU problem.In Section 5, we analyse the performance of the proposed approach on some algebraic test cases by comparing it with known techniques in the literature.Finally, Section 6 is devoted to showing the applicability of the framework to complex engineering test cases with the shape optimisation of Organic Rankine cycles turbines.

UNCERTAINTY-BASED OPTIMISATION PROBLEM
In this section, the context of this work is made explicit.The general formulation of the tackled uncertainty-based optimisation problem is given, and classical choices in terms of robustness and reliability measures are introduced.The proposed probabilistic setting is then given, which allows for comparing aleatory values in a constrained multi-objective context.

Optimisation
A deterministic constrained multi-objective optimisation problem can be written as follows: minimise: f(x), satisfying: g(x) ≤ 0, by changing: with objectives f(x) ∈ R m 1 and constraints g(x) ∈ R m 2 , and where the design vector x lives in  ⊂ R n .Note that g(x) ≤ 0 stands for ∀i, g i (x) ≤ 0. The computation of f and g can also depend on a multitude of uncontrollable parameters (e.g.environmental, material, geometrical, … ), here denoted by  ∈ Ξ.
The most straightforward setting is to consider these uncertain parameters as aleatoric.This work is limited to this scope and does not deal with epistemic uncertainties.In this context, , f(x, ) and g(x, ) are random variables and Equation ( 1) is not suitable anymore.
It is classical to reformulate this problem into a measure-based OUU problem, where specific statistics of the random outputs f(x, ) and g(x, ) are plugged as objective and constraint functions.

Robustness and reliability measures
Writing  f and  g some statistical measures on the objective function f and constraint g, the OUU problem can be formulated as follows: minimise:  f (x), As raised in the introduction, using statistical measures as objectives is usually referred to as robust design optimisation (RDO), and these statistics ( f ) are called robustness measures.Similarly, Reliability-Based Design Optimisation (RBDO) deals with so-called reliability measures  g as constraints.
In the following, the original functions f and g are gathered in a vector q ∈ R m 1 +m 2 , that refers to Quantities of Interest (QoI).Namely,
Note that any combination of these may be used, and one has to choose the most relevant for a given application.
In practice, robust objectives usually are either expectation or worst-case measures.The latest can also be relaxed to quantile or superquantile measures.
As for reliability measures, they can impose hard constraint, by using worst-case measures, but are usually relaxed into probabilistic constraints, that ensures g(x, ) ≤ 0 with a given probability p.This can be readily obtained by using the p-quantile as a reliability measure.
Given the distribution of the random input variable , and specific measures , problem (2) can be solved with classical techniques.However, if the unitary evaluation of q is expensive,  becomes very tedious to compute.A compromise between the accuracy and computational cost of  must be chosen.
Specifically, we propose to consider these estimated measures as random variables, with a given probability density function (PDF) representing the associated inaccuracy.Next section presents this probabilistic setting in details.

Probabilistic setting
In this work, we aim to tackle multi-objective optimisation problems (≤ 3 objectives) where the statistical measures  used as objectives and constraint functions are not exactly known and regarded as random fields P over .In this context, the classical Pareto dominance does not hold and a probabilistic one must be defined.Several propositions exist in the literature, 18,36,37 notably differing by the assumed distribution shape.
The classical Pareto dominance is described in the case of minimisation.
where  is the admissible set, where all constraints are satisfied, and  is the failure set, where at least one constraint is not verified.Using this definition, two individuals that both violate the constraints will mutually dominate each other.This allows to discard inefficient individuals even when no satisfying individual has yet been found.
Here, a probabilistic Pareto dominance between two designs x 1 and x 2 hence consists in computing the probability P P [P(x 1 ) ≻ c P(x 2 )] of a design to dominate the other.Recall that P is a random field, meaning that P(x 1 ) and P(x 2 ) are random variables.For better generality and readability, P(x 1 ) and P(x 2 ) are denoted A and B in the following, respectively.
Formally, the unconstrained domination probability is simply probability for A to be lower than B in all dimensions, or equivalently the probability of the difference between A and B to be negative on all dimensions.This writes where  A−B is the PDF of A − B and Φ A−B is the associated cumulative density function (CDF).Formulation ( 7) is short and straightforward, by dealing with the distribution of the difference between A and B. An equivalent formula can be written using the joint PDF  A,B of A and B: As for the constrained case, definition (6) shows that for a to dominate b, either b must be in the failure domain, or a must be in the admissible domain and dominate b according to the classical Pareto dominance rule.The integral in (8)  can be split up depending on b belonging to the failure or admissible domain, and an additional dominance constraint can be added in the latest case.Hence, the domination probability reads: Note that this formulation naturally tackles the existence of dependency between the random variables A and B, which greatly differs from the assumptions made in References 18,37.
We finally propose a probabilistic constrained Pareto dominance rule between two random vectors: This dominance rule generalises the constrained boxed Pareto dominance introduced in Reference 22.However, in the context of random variables with infinite support (e.g.Gaussian variable), the dominance probability cannot reach exactly 1.It has been proposed in Reference37 to relax this threshold.The -relaxed probabilistic constrained Pareto dominance finally reads: The above formulation gives a general setting for comparing aleatory vectors, regardless of their distribution shape.In practise,  is usually taken at 10 −2 .Next section proposes a sampling-based technique to get an accurate representation of the random field P over the design space .

NON-PARAMETRIC MEASURE APPROXIMATION
We aim to make as few assumptions as possible on the distribution of P(x), for x in .To this extent, we only assume that realisations P (i) (x) can be drawn and exploit this empirical set of samples to characterise P(x).
More precisely, because P is a random field over the design space , we propose to draw realisations P (i) of the whole random field.This consists in jointly drawing P (i) (x) on a set of locations x.
In this setting, dominance must only be verified between these joint samples.With a number N samples of realisations, the probability of dominance between two designs can be empirically estimated as follows: In practise, we tend to take N samples between 10 3 and 10 4 .The major task is to obtain realisations P (i) (x) of the robustness and reliability measures.To this extent, we assume that a probabilistic surrogate can be constructed on the original Quantities of Interest q and draw realisations P (i) through measure estimations on functional realisations q (i) (x, ⋅), simultaneously for every studied location x ∈ .In this work, we use Gaussian process surrogate models, that are built using the GPy python package. 41e first recall the main characteristics and formulas of Gaussian Process surrogate models.Then, an approximated approach for drawing these realisations of P from a GP surrogate of q in the coupled space  × Ξ is proposed.

Gaussian processes
This section gives a quick reminder of the GP setting, its interpretability and the main formulas that will be used in the following.Gaussian process surrogate modelling, also called Kriging, is a Bayesian kernel-based regression method. 42he kernel setting, with the famous so-called kernel trick, allows to transpose most of the classical parametric (e.g.linear) approaches to the non-parametric world.In practise, basing on data points location x i , parametric features can be incorporated by replacing scalar product between the different parameters x ⊺ 1 x 2 by a scalar product between chosen features (x 1 ) ⊺ (x 2 ).Then, the kernel trick consists in choosing a kernel function k, and to use k(x 1 , x 2 ) in place of the previous scalar product.In this way, the features are not explicitly defined and can be of infinite number.Some kernels are called universal and engender a space of functions that is dense in the space of continuous function.
The Bayesian setting, in addition, tracks the distribution of all parameters (of infinite number with the kernel trick), and returns not only a predictive model of the Quantity of Interest, but its whole distribution at each location x.In the context of Gaussian process, this distribution is fully determined by a predictive mean and a predictive variance for the QoI.
In practise, Gaussian processes can also simply be viewed as an infinite dimensional Gaussian distribution, which reduction to a finite set of points is also Gaussian.In this setting, the kernel plays the role of covariance function between points and the predictive mean and variance are the conditional mean and variances at each point x.Given the training data {x train , y train } where y train = f (x train ) +  and  is a random Gaussian noise, they read: where K is the covariance matrix between training points x train , K = k(x train , x train ), k * is the covariance k(x train , x) between the new points x and the training points and k * * is the covariance between the new points, k(x, x).Matrix Δ refers to the so-called input noise or nugget effect.Usually, it regularises the problem with the same noise on each training point, Δ =  2 I N , with N the number of training data.However, in the heteroscedastic case, one can readily assign a different noise to each training point, meaning ) .In this article, hyperparameters of the kernel function are optimised by Maximum Likelihood.Because Automatic Relevance Determination (ARD) is exploited, a lengthscale parameter is assigned to each dimension.This heavily limits the use of these models when the problem features more than tens of dimensions.
Finally, one can draw functional realisations of a GP surrogate model, discretised on a finite set of points.A GP is a Gaussian random vector, so a realisation f (i) ∼  ( GP , Σ GP ) is drawn as follows: with L GP referring to the Cholesky decomposition of the covariance matrix, so that

Direct measure sampling in coupled space
We assume that a GP surrogate model can be built on q in the coupled design/uncertainty space.We here simply build a scalar GP for each output dimension.Based on training data { q(x i ,  i ) } i , the surrogate q is constructed and yields an estimation of q at any location (x, ) in the coupled space  × Ξ.
This surrogate model can be exploited to obtain an estimation of the robustness and reliability measure through Monte Carlo integration.This integration is directly performed on the predictive mean q of the GP.For example, with the expectation measure (x) = E  [q(x, )], one can use the estimator: More interestingly, one can draw realisations of the GP built on q, which directly translates into realisations of P by integrating over the uncertain dimension.
For example, in the case of the expectation measure (x) = E  [q(x, )], by evaluating q with N  locations in the uncertain space at N x different designs, the joint realisations of P over all these designs write as follows: drawn simultaneously on a realisation q (i) at all designs.The same can be carried out with empirical estimators of other robustness and reliability measures.However, one can notice that q (i) is drawn at N x × N  locations, which can quickly reach a value of more than 10 5 .Looking back at Equation ( 14), it reveals that Cholesky decomposition should hence be performed on a matrix of size 10 5 × 10 5 .The computational cost associated with such a task is prohibitive.This problem is quite common in the literature, and several approximations techniques have been proposed, such as a circulant embedding of the covariance matrix, 43 regularisation 44 or the use of a Gibbs sampler for Gaussian Markov Random Field. 45However, some of these approaches require equispaced points, and the others remain too expensive to allow substantial sampling of the q random field.
As raised in Reference 46, the complexity is particularly high when the domain is much larger than the characteristic correlation length of the random field.Intuitively, when a value is drawn, it gives information in its surrounding at a distance proportional to the correlation length.
Based on this idea of local information, an inducing point strategy is proposed in the next section for drawing realisations q (i) at an extremely high number of locations simultaneously.Note that the proposed approximation method corresponds to constructing a Sparse GP with the subset-of-regressors mindset, only for the sampling step.Note that structure-exploiting methods, notably using Matheron's Rule 25,47 would be a relevant alternative to Sparse GP in this context.We however chose to use the latter for their simplicity and versatility.
The need for efficient GP sampling strategies is paramount in the BO context.In Reference 38, the authors proposed the first inducing point allocation strategy explicitly designed for BO by exploiting the quality-diversity decomposition of determinantal point processes.In Reference 31, the authors optimise the inducing points positions specifically for excursion set estimation.In Reference 39, a variational formulation for sparse approximations is introduced that jointly infers the inducing inputs and the kernel hyperparameters by maximising a lower bound of the true log marginal likelihood.In addition, Wilson and Nickisch 40 illustrated a new structured kernel interpolation (SKI) framework, which generalises and unifies inducing point methods for scalable Gaussian processes (GPs).In contrast to this work, we aim to employ sparse approximations specifically for estimating model error estimation of statistical measures.We propose here a method permitting a fast evaluation and a simple quality indicator for judging the approximation quality.Finally, this indicator used within an adaptive strategy ensures quick sparse estimation for obtaining measures error estimations.
Contrarily to most of the recent works, the aim is not to optimise the location of inducing points, 31,38,39 as inducing points are to be built many times in the proposed framework.An indicator-based approach is preferred for quickly sampling realisations of the posterior GP distribution so as to check the quality of randomly sampled inducing points.The overall strategy is presented in Section 3.2.1, and the quantitative quality indicator is proposed in Section 3.2.2 for use in an iterative strategy in Section 3.2.3.

3.2.1
Inducing points strategy The central idea is to draw realisations q (i) of the Gaussian random field on few well-spread points, from which a smooth functional realisation can be extended to the whole input space.These well-chosen points will be referred to as inducing points.
In the following, for conciseness, coordinates (x, ) in the coupled space are denoted with z.With a set of training designs z train = (x train ,  train ) and a set of inducing points z ind = (x ind ,  ind ), we denote z tot = z train ∪ z ind .
We assume that a GP has been constructed on z train , as plotted in Figure 1, on values q train and with an optimised kernel k train .Its predictive mean and covariance are written  train and Σ train .This surrogate model is the current estimation of q over the whole coupled space.
We aim to generate random realisations of this Gaussian field on an extensive set of points z draw .Direct Gaussian sampling would perform Cholesky decomposition on the matrix K draw = Σ train (z draw ) of size N 2 draw , which quickly becomes unfeasible.
Following a subset-of-regressors approach to sparse GP approximations, we propose to only draw realisations on the inducing points: with L ind the Cholesky decomposition of matrix K ind = Σ train (z ind ) of much smaller size N 2 ind .Figure 2 gives an example of such drawn values at the inducing points.
Similarly to the locations z tot , we gather training values q train and drawn ones q (i) ind in q (i) tot .The realisations drawn at the inducing points are then extended to all locations z draw by constructing a new GP on the (z tot , q (i) tot ) database.Note that the new GP kernel is not optimised as k train is directly used.Final realisations estimates q (i) draw are obtained as follows: where k * = k train (z tot , z draw ) and K −1 tot only need to be computed once as they do not depend on i.These realisations are represented in Figure 3, and the empirical 2- band of the random field is computed on 10 5 samples.One can note that they fit well the real confidence band of the underlying GP.
Intuitively, this approach corresponds to parametrising the random field as a sum of kernels located at all training and inducing points.This is noticeable in the above Equation (18).By denoting K −1 tot q (i) tot as a vector  (i) of coefficients, the proposed realisations q (i) draw are linear combinations of the kernel function k train located at each training and inducing point.This is where the name subset-of-regressors come from.Coefficients  (i) manage the randomness and kernel functions the spatial characteristic.The functional realisations write k train (z tot , ⋅) ⊺  (i) .
In practise, the proposed approach has a lot of similarities with using a Karhunen-Loève Expansion (KLE) of a Gaussian random field with Nyström approximation of the basis functions.Computational details are given in A.

Quality indicator
To attain a good accuracy for this random field approximation, a key element is the number of inducing points.We propose in this section an indicator to assess the approximation quality.Intuitively, if the density of inducing points is high enough with respect to the lengthscales, the predictive means q (i) draw (Equation ( 18)) will be able to represent all scales of the random process.In such cases, the associated predictive variance remains very low over the whole design space .
The GP covariance reads: with K * * = k train (z draw , z draw ).The predictive variance vector  2 tot (z draw ) is the diagonal of Σ tot (z draw ).We propose to base the quality indicator on the mean predictive variance over all points z draw .More precisely, as K −1 tot is positive definite, k we propose the following normalised indicator: where  ∈ [0, 1] and  ≃ 1 denotes a low predictive variance, meaning a good approximation quality.
Note that for most of the usual kernel functions, the diagonal of K * * is filled with  2 , the variance hyperparameter of the chosen kernel.This is notably the case with the exponential, squared exponential and Matern kernels.In these cases, using  2 N draw = Tr (K * * ) and linearity of the trace, the indicator reads: One may recognise a Frobenius norm at the numerator of the proposed indicator: where L tot refers again to the Cholesky decomposition of K tot .The proposed indicator is pictured in Figure 4 with nine and three inducing points, respectively.The approach gives an empirical 2- band very close to the real one in Figure 4A associated with a value of the quality indicator of 0.995.On the contrary, the approximated random field in Figure 4B is quite different from the underlying GP, and the quality indicator only has a value of 0.825.
For deeper understanding of this indicator, we study the behaviour of  on an analytical example.With D the dimension of the input parameters z, we study the following quantity of interest: To vary the complexity of q, we regard parameters D and L are random variables, which yields a random function q D,L .Here, we take D ∼  (⟦1, 4⟧) and L ∼  ([3, 20]) D .
For each sampled function q D (k) ,L (k) , a GP is built on 10 training data q D (k) ,L (k) (z train ).A set of points z draw is drawn uniformly within the input space.The predictive covariance K on these points is given by Equation (13).A set of well spread inducing points of size N ind ∼  (⟦10, 60⟧) is also drawn (e.g., with a maximin Latin Hypercube Sampling (LHS) strategy) and 10 5 realisations q (i) D (k) ,L (k) (z draw ) are computed at 10 3 locations z draw .The empirical covariance between all these realisations gives an approximation K of K and the indicator  from Equation ( 20) is computed.
All this process is repeated 300 times, and we plot in Figure 5 the evolution of the normalised Frobenius distance between K and K with respect to .
Figure 5 shows that a strong correlation exists between the proposed indicator  and the accuracy of the empirical covariance.A value of the indicator very close to 1 seems to ensure an accurate representation of the random field.Notably, the zoomed window permits to see that when  is greater than 0.99 or 0.995, the worst recorded empirical covariances only yield 5 to 10% relative error.
Note that several appropriate norms for covariance matrices exist.Namely, the logarithmic, Cholesky-based, and Riemannian distances between covariance matrices allow to preserve positive definiteness and have many desirable properties. 48However, because we draw very badly conditioned empirical covariances, very small negative eigenvalues may arise, that prohibit the use of such distances.For its simplicity and interpretability, the Frobenius distance is the most reliable choice.

Adaptive strategy
To take advantage of the proposed quality indicator, we propose an iterative algorithm ensuring that  is sufficiently close to 1 before returning the approximated realisations.The number of inducing points is hence increased as long as a prescribed value for  has not been reached.At each iteration, these points are spread in the input space using the classical maximin criterion on a Latin Hypercube Sampling (LHS).The algorithm is given in Algorithm 1.
In the following, the threshold  * will take values of 0.99 or 0.995, basing on Figure 5.The simple approach proposed here is very intuitive and reveals very cost-efficient.
F I G U R E 5 Accuracy of the empirical covariance K approximated with inducing points with respect to the proposed indicator.
Algorithm 1. Algorithm for choosing the number of inducing points

APPLICATION TO OUU
In this section, we give a general technique, heavily inspired from the Surrogate-Assisted Bounding-Box approach introduced in References 22,49, with the aim of making use of the non-parametric approximations presented in the previous section in an OUU context.A general formulation of the proposed approach is presented in Section 4.1, and its application with non-parametric measure approximation is made explicit in Section 4.2.

General formulation
The proposed approach integrates the concept of tunable accuracy within the OUU process.This concept has notably been presented in Reference 13 and exploited in References 21,22,49, with the aim of tuning the accuracy of each computation of  on the fly in order to focus computational power on the most promising designs.
In practise, robustness and reliability measures are computed using some evaluations of q, and their approximation error can be estimated.We propose to treat these approximated measures as random vectors, that can be compared under the -relaxed probabilistic constrained Pareto dominance paradigm, proposed in Section 2.3, in order to discard all dominated approximations.
Following the developments from Rivier and Congedo 22 , we further propose to rank all remaining measures with the POP min metric, which is an approximation of the Pareto Optimal Probability (POP) with desirable properties.Using the notations from Section 3, the metric writes as follows: The accuracy is then adaptively tuned by improving these approximations only on the most promising individuals.We refer to this approach as measure approximation with tunable accuracy (MATA), as different levels of refinement coexist, depending on the estimated performances.
In addition, we construct a surrogate model on these measure approximations in the design space.This model is referred to in the following as the SA model.It directly yields predictions of the robustness and reliability measures at any design x, so as to bypass  estimations when the SA model is sufficiently accurate.In essence, it permits to further lower the required number of function evaluations when the measures  are easier to model in  than q in the coupled  × Ξ space.
These two methods (SA and MATA) couple very efficiently to lower the cost of an uncertainty-based optimisation process.Figure 6

SA strategy
SA strategies are common practise for accelerating the optimisation process.In the proposed approach, a surrogate model is constructed and updated throughout the optimisation process directly on the robustness and reliability measures  in the design space.For each visited design x i , an aleatoric approximation P(x i ) is computed.The SA model then builds on the training set {(x i , P(x i ))} i .When enough information is gathered at a new design x new , the SA model bypasses the computation of P(x new ) and returns the current prediction  SA (x new ).One can expect this approach to be extensively exploited at the end of the optimisation process, when a lot of training points are available and most new designs are gathered in the optimal area.The SA model takes as training data aleatoric approximations of , and must yield an accuracy metric at any location.We propose to write the accuracy of the SA model at a new design x new as a vector  SA (x new ).The SA prediction  SA (x new ) is then directly returned to the optimisation process whenever  SA (x new ) ≤ s 1 , where s 1 is a user-defined threshold chosen beforehand..
Similarly to Rivier and Congedo, 22 we propose to carry out the whole optimisation process several times with decreasing values of s 1 .All training data {(x i , P(x i ))} i are transfered from one iteration to another.This procedure allows to get several intermediate results, that compromise between computational cost and accuracy.With high s 1 value, the number of evaluations of q is kept quite low as the SA model prediction  SA is quickly exploited.On the contrary, with a small s 1 value, many training data are required before employing the SA model, thus ensuring a better accuracy of  SA .
The structure of this approach is depicted in Algorithm 2. The computation and refinement of the probabilistic approximations P(x) (lines 4 and 12 of the algorithm) are described in the following Section 4.1.2.Update the SA model on 

8:
Get new designs  new to visit 9: ∀x ∈  SA , return  SA (x) to the optimiser 12: ∀x ∈  c , compute P(x) and return ρ(x) to the optimiser ⊳ Algorithm 3 Throughout the optimisation process, we denote  c the set of designs at which an approximation P has been computed.At each iteration, among the new designs  new given by the optimisation algorithm, all designs which computation is not bypassed by the SA strategy are added to  c and first approximations P and  are computed.Within this set, we denote by  r the set of designs that must be further refined.These designs satisfy two criteria: (i) they belong to the current Pareto front (P( c )) and (ii) the user-defined accuracy threshold s 2 is not reached.Formally: where (x) is a chosen measure of the variability of P(x), here its standard deviation, and  corresponds to the non-dominated measures under the -relaxed probabilistic constrained Pareto dominance paradigm.
Given this set of designs to refine, we propose to use the POP ranking (Eq.24) to select the most promising design within  r .The variability of the associated approximation P(x) is then reduced, at the cost of some evaluations of q.We denote x * the design maximising the POP min metric.
The MATA refinement technique, for a given set of designs  c and a fixed threshold s 2 , is detailed in Algorithm 3. Refine approximation P(x * ) with some evaluations of q 8:

Algorithm 3. Algorithm of the MATA technique
Update (P( c )) and  r 9: end while 10: return P(x) for all x in  c Similarly to the SA strategy, we propose to iterate over decreasing values of s 2 in order to obtain intermediate approximations of the Pareto front.

Application to non-parametric measure approximations
The above SAMATA strategy is presented in its most general form.In practise, specific choices must be made on the computation of approximations P(x), metrics ρ(x) and (x), and SA predictions  SA (x) and  SA (x).
The inducing point strategy presented in Section 3.2.1 permits to generate realisations of q on an extensive set of points z draw , to empirically estimate realisations of the measures P, for example, using Equation ( 16).In short, the obtained empirical distribution of P comes from the Gaussian assumption of the GP surrogate model built on q, composed with the specific statistical estimator.
The mean of this approximated distribution is returned to the optimisation process as ρ, and three times the standard deviation is used as accuracy metric .Concerning the refinements, we use the criterion proposed in Reference 22, corresponding to a linear combination of measure-specific criteria.
Finally, the SA model is constructed in a specific way to best exploit all the available information.SA-based measure approximations should take the form of set of samples and be jointly drawn with the realisations obtained in (16).To this extent, we propose to define the SA-based realisations as random draws of a GP surrogate model conditioned on the computed joint realisations P (i) c .In practise, on any set of design x SA , SA-based measure realisations P (i) SA (x SA ) are computed as such: Realisations of the SA model with the conditioning samples.
where the predictive mean writes and L SA results from a Cholesky decomposition performed on the predictive covariance Σ GP (x SA ) defined in Equation ( 13).In the above, P (i) c refers to realisations drawn from Equation ( 16), on the set designs x c .K c corresponds to the autocovariance matrix k(x c , x c ) and k * to the covariance k(x c , x SA ).Note that hyperparameters are only optimised once for numerical efficiency.
This SA model construction is illustrated on an example set of samples in Figure 7, where realisations of P are jointly drawn at ten locations in ascending order.Two bimodal sets of samples are notably drawn at x = 0.15 and x = 0.45 in order to emphasise the need for non-parametric reconstruction.Some SA-based functional realisations P (i) SA form Equation ( 27) are represented.The two bi-modal densities are naturally well captured using the proposed approach.All computed P (i) c and SA-based P (i) SA realisations are jointly redrawn at each computed measure refinement, in order to use the joint dominance probability from Equation (12).
These formulations can be readily incorporated within the SAMATA strategy, presented in Section 4.1 and pictured in Figure 6.The specific algorithm is given in Algorithm 4.
Finally, specific numerical choices such as the optimisation algorithm and the threshold values s 1 and s 2 will be made case by case.

ANALYTIC COMPARISON
In this section, the proposed SAMATA approach depicted in Algorithm 4 is quantitatively compared to SABBa from Rivier and Congedo 22 with Coupled-Space surrogate models, which was shown very cost efficient.As a reference, we also compare these techniques to an A priori metamodel strategy, where a Coupled-Space surrogate model is constructed in the coupled  × Ξ space on a given number of data points and is readily employed as is for measure estimation and optimisation.
The three approaches are compared in terms of the expected modified Hausdorff distance between the approximated Pareto optimal designs and the true ones.Such a metric was proposed in Reference 22 in the context of Bounding-Box approach.Here, by construction, joint realisations P (i) (x) of the aleatory measure approximations can be drawn simultaneously on a set of designs x using ( 16) and (27).This readily permits to employ the aforementioned metric, which formula is recalled here: where d ′ H is the modified Hausdorff distance from, 50   is the Pareto front pre-image, and  is an aleatory variable corresponding to the pre-image of the currently approximated aleatory Pareto front.This expected value is empirically Algorithm 4. SAMATA algorithm with joint measure approximations 1: Launch N init evaluations q(x i ,  i ) 2: Generate a first coupled space surrogate model q 3: Loop over values of s 1 and s 2 4: while optimisation running do 5: Get new designs  new 6: Initialise  c empty 7: Compute P SA for all x ∈  new from ( Compute Deduce  0 r from  0 c using (25)   10: Compute ∀x ∈  new , POP min (x) w.r.t.all designs using ( 24) and ( 12) 13: Find x * from  k r using (26)   14: Find  * that maximises the refinement criterion from (22)   18: Launch evaluation of q(x * ,  * ) 19: Update the coupled space surrogate model q 20: Update all approximations P (i) c (x) for x ∈  c using empirical estimators like ( Update Compute  k+1 c ,  k+1 SA and  k+1 r 23: end while 25: Return E to the optimiser for all x ∈  new 26: end while 27: Return all measure approximations P(x) and their associated POP min score estimated, exploiting the capability of drawing joint realisations of P. Note that in a normalised input space, a value Q  = 1 corresponds to a bad score while Q  = 10 −2 reveals a very accurate approximation of the Pareto front.
The NSGA-II optimisation algorithm is used in the following test-cases.The number of individuals per generation is set to 32.Both the optimisation and refinement processes are sequential, and the coupled-space surrogate q is initialised with a Latin Hypercube Sampling (LHS) of 30 points.
Each technique is run ten times to get the mean and standard deviation of the convergence curves.

Test-case 1: Unconstrained Taguchi optimisation
This problem is a bi-objective robust optimisation proposed in Reference 21 There are two design variables x and one uncertain parameter .The problem reads: ) where: F I G U R E 9 Test-case 1: Q  metric for the A Priori Metamodel (solid blue), SABBa (dashed orange) and the proposed non-parametric SAMATA (dotted green) approaches.
The Pareto front associated with this problem is discontinuous, and the optimal set in the design space consists in segment and a point (see Figure 8).
The performance of the three compared strategies is depicted in Figure 9, showing the mean convergence over ten runs and the Bollinger Band corresponding to the associated standard deviation.Both the Bounding-Box and the proposed non-parametric SAMATA approaches show a steeper convergence than the A Priori Metamodel technique.In addition, the non-parametric approach reveals significantly more parsimonious than the other two techniques, reaching a Q  metric of 10 −2 around 60 function evaluations, corresponding to a gain of nearly one order of magnitude over the reference methods.

Test-case 2: Quantile-constrained mean optimisation
This second test-case, proposed in Reference 22, assesses the performance of the proposed non-parametric approach on a mono-objective optimisation problem with a reliability based constraint, in a higher dimensional context.The objective is a robustness measure derived from a simplified Rosenbrock function and the constraint a quantile formulation of the F I G U R E 10 Test-case 2: Q  metric for the A Priori Metamodel (solid blue), SABBa (dashed orange) and the proposed non-parametric SAMATA (dotted green) approaches.
Six-Hump Camel function.It features four design variables and three uncertain parameters, and is stated as follows: minimise: satisfying:  g (x) = q 95% (x) ≤ 1 where: ) The optimum design is x * ≈ (0.7033, 0.7035, 0.6212, 0.3859) with q 95% (x * ) = 1 and (x * ) ≈ 0.4981.The convergence curves are depicted in Figure 10.Similarly to Figure 9, the A Priori Metamodel is the less efficient strategy, and the proposed non-parametric SAMATA approach is the most parsimonious.The standard deviation over the ten runs are greater in this test-case, but the proposed method permits to reach a metric of 10 −1 for less than 70 function evaluations, against 110 for the Bounding-Box approach and more than 200 with the A Priori Metamodel.

Benefit of sampling-based approximations
The so-called projected processes techniques [32][33][34][35] permit to solve OUU problems in a BO setting (also called Efficient Global Optimisation, EGO).They however rely on independent Gaussian approximations of the robustness and reliability measures.Such Gaussian approximations are cheap to estimate and particularly adapted for expectation-like measures, but might fail to accurately capture more complex or correlated distributions.For deeper insights on this method, see Baudoui. 33e run ten optimisations of the first test-case using the proposed approach, but replacing the jointly drawn realisations of P with samples from a Gaussian distribution which mean and variance correspond to the empirical moments from the F I G U R E 11 Convergence curves of the non-parametric SAMATA strategy (solid blue) against its Gaussianised equivalent (dashed orange).Variability represented with the transluscent ± band.initial samples.In this manner, we aim to investigate solely the impact of using a different density estimation, with either the proposed sample-based strategy or a simple moment-based Gaussian fit.
Figure 11 shows the two convergence curves, with their Bollinger bands.Although the convergence seems to plateau around the same metric value, the Gaussianised approach shows slightly inferior accuracy compared to the original non-parametric strategy along the whole study.Intuitively, we argue that accurate distribution shapes and dependencies are lost when Gaussian density is forced, which may deteriorate the intermediate approximations of the Pareto optimal set.

ORC APPLICATION
We perform here a robust shape optimisation of a typical converging-diverging turbine nozzle for ORC applications.It was extensively studied in the context of deterministic [51][52][53] and multi-point optimisation. 54Some recent works tackled such design under epistemic uncertainties 55 due to turbulence modelling, and investigated the robustness of the blade design under aleatoric uncertainties in the operating conditions, thermodynamic model parameters and geometric tolerances. 56e propose here to minimise the mean of the performance function while the mean mass-flow rate is constrained to lie within a prescribed range centred on the nominal value.The Biere is a reference two-dimensional benchmark geometry to test the design of devices operating with the siloxane fluid MDM (Octamethyltrisiloxane, C 8 H 24 O 2 Si 3 .This blade profile is meant to obtain a convergent-divergent cascade passage which serves to accelerate the fluid up to supersonic speed.Compressibility effects play a crucial role and a typical fish-tail shock pattern is generated downstream the trailing edge.Strong shocks may cause significant losses, design of the trailing edge region is thus critical for the turbine efficiency.The Mach field is depicted in Figure 12 with the baseline blade profile.

Problem setting
The objective function ΔP is defined as the standard deviation of the azimuthal distribution of static pressure half an axial chord downstream of the blade trailing edge (TE).Minimising ΔP should yield a significant reduction of the shock strength.As mentioned above, a constraint is imposed on the mass-flow rate per unit span ṁ, normalised with respect to the nominal value.Uncertainties on the operating conditions are considered both at the inlet and outlet of the turbine, through changes of boundary conditions.Following, Pini et al. 54    The optimisation problem follows a multi-objective Taguchi formulation:

TA B L E 1
where Ξ = [7.95,8.05] × [541.15,549.15]× [1, 2].The design space  allows to modify the shape of the blade as much as possible while ensuring convergence of the numerical simulation.In practise, it consists of 9 of the 30 Control Points (CP) of a B-spline curve of degree 3, that can be displaced in the direction normal to the baseline geometry.Such parametrisation was proposed in 58 and is illustrated in Figure 13.A mono-objective version of (32) is also treated for comparison purposes.Results will also be compared with two deterministic optima obtained at the Nominal and Mean operating conditions from Table 1.
Simulation is performed using the the non-ideal compressible-fluid dynamics solver included in the SU2 [59][60][61][62] suite on unstructured grids, generated using an in-house meshing tool.A dedicated mesh deformation tool based on radial basis functions (RBF) allows high flexibility and robustness while maintaining the grid connectivity throughout the optimisation steps.

Mono-objective results
Because nominal conditions are not centred within the uncertain range, we conduct the reference deterministic optimisations both at Nominal and Mean conditions.Similarly to Razaaly, 63 a state-of-the-art BO strategy is employed, with an expected improvement (EI) criterion multiplied by the probability of feasibility (PF). 64Convergence is reached after roughly 100 function evaluations.
The robust optimisation problem is solved using the proposed non-parametric SAMATA strategy.The NSGA-II optimisation procedure is again exploited, and constraints are handled through penalisation.The initial DoE consists of 30 evaluations drawn with a maximin Latin Hypercube Sampling (LHS) in the coupled space.Each population contains 32 individuals, and measure refinements are performed sequentially.We iteratively take threshold values of 10%, 5%, 3%, 2% and 1%.
Convergence values are gathered in Table 2, with a final optimum centred on  f = 8087, with a standard deviation of the approximation error of 44.26.This optimum satisfies the 1% accuracy imposed by the normalised thresholds and only required 145 simulation calls, which is not significantly higher than the deterministic optimisation.The proposed approach permits to reduce by around one order of magnitude the number of required function evaluations with respect to 63 that employed a more classical nested approach.
The point-wise ( 0 and   ) and mean values of the objective function are computed at the Nominal x * 0 , Mean x *  and Robust x * OUU operating conditions.Results are gathered in Table 3 and the full distributions of ΔP over  at these given designs are plotted in Figure 14.These distributions are obtained with an extensive Monte Carlo on a GP built over Ξ at a given x, with 50 training points.
As expected, the nominal optimum yields very high variability in Figure 14A but very low nominal output.The Mean and Robust optima are quite similar, Figure 14B,C.However, the Robust optimal blade manages to reach a lower value for the expectation of ΔP, with a slightly modified distribution shape.

Bi-objective results
We then tackle the complete Taguchi formulation (32) of the OUU of the ORC turbine blade.This formulation aims to find the Pareto front between good expected performance and low variability.The proposed non-parametric SAMATA algorithm is applied, with again the NSGA-II optimisation method and normalised thresholds of 10%, 5%, 2% and 1%.The Pareto fronts associated with these threshold values are depicted in Figure 15, and reveal great improvements in the Pareto front accuracy throughout the iterations.The total computational cost for attaining the chosen thresholds is of 72, 122, 276 and 419 simulation calls respectively.Each cloud of samples associated with a given design has its transparency driven by its POP value (1: fully visible, 0: transparent).The mono-objective optimum found in the preceding section is represented as a red dot and lies on the Pareto front, as expected.

CONCLUSIONS
This paper describes an efficient strategy for solving multi-objective and constrained OUU problems.It consists in coupling two primary techniques: (i) a MATA approach that only refines promising individuals up to a given threshold, (ii) a SA strategy that constructs a surrogate model of the objective and constraint functions in the design space to bypass further evaluations when it is sufficiently accurate.
To apply this framework in practise, we propose non-parametric approximations of the error distributions using a sampling-based technique.The robustness and reliability measures for a given design are represented as a set of realisations that may exhibit complex shapes and dependencies.These realisations usually reveal more representative and can be jointly drawn between several designs.Issues associated with drawing such joint realisations are highlighted, and an adaptive algorithm is proposed to keep the associated cost manageable.This strategy show similarities with Nyström approximation for Karhunen-Loève expansion (KLE) but allows for intuitive control of the overall accuracy and adaptive sampling.An SA model that takes advantage of these joint realisations is introduced to apply the SAMATA strategy.Quantitative comparisons with state-of-the-art approaches on several algebraic test cases reveal a drastic increase in the overall parsimony.
Finally, the proposed approach is applied to an engineering problem dealing with the geometric optimisation of the blade profile in an Organic Rankine Cycle turbine.Convergence is attained for a cost of 100-150 evaluations in a mono-objective setting, which is one order of magnitude less than the state-of-the-art approach.The Pareto front of a bi-objective formulation is accurately recovered at the cost of 100-400 evaluations.
Several perspectives can be drawn.Regarding overall parsimony, cost improvement could be achieved by improving the uncertainty propagation (UP) techniques associated with specific statistical measures.For example, extreme quantile estimation and refinement should be performed with appropriate techniques such as Reference 65 or 66.Such approaches can be readily incorporated into the proposed framework.Similarly, we only performed sequential refinement in this work, while techniques such as Kriging believer can be exploited to return a set of new locations.Multi-point refinement is critical to take advantage of nowadays' high parallelisation capacities of computational clusters.Structure-exploiting methods for sampling from GP posterior distributions 25,47 could also be a powerful alternative to the proposed inducing-points sparse GP approximations, that would deserve further exploration.Finally, the proposed non-parametric error estimation relies on coupled-space surrogate models that may become unfeasible in high dimensional uncertain space or when the simulator is intrinsically stochastic.Specific developments for using SAMATA in this context would effectively widen its range of application.ORCID M. Rivier https://orcid.org/0000-0002-7828-7326N. Razaaly https://orcid.org/0000-0002-9330-8439P.M. Congedo https://orcid.org/0000-0003-3266-3549

5 )
For two vectors a, b ∈ R m , a ≻ b (a dominates b) ⟺ ∀j ∈ ⟦1, m⟧, a j ≤ b j and ∃j ∈ ⟦1, m⟧, a j < b j , a ≻≻ b (a strictly dominates b) ⟺ ∀j ∈ ⟦1, m⟧, a j < b j , a ∼ b (a is indifferent to b) ⟺ a ⊁ b and b ⊁ a.(In the presence of constraints, we follow the constrained Pareto dominance proposed in Reference 22.For two vectors a, b ∈ R m , with a f and b f the objective values and a g and b g the constraint values,

F I G U R E 1
Gaussian process model built on the training data.

F I G U R E 2 F I G U R E 3
Realisations drawn at the inducing points.Functional realisations from draws in Figure2.Approximated 2- band of the stochastic process in dashed red.

)F I G U R E 4
Approximated random field with (a) nine inducing points,  = 0.995 and (b) three inducing points,  = 0.825.
gives a flowchart of the global strategy, called SAMATA.The specific notations are presented in the following sections 4.1.1 and 4.1.2.

F I G U R E 6
SAMATA strategy flowchart.

Algorithm 2 .} i empty 3 :
Algorithm of the SA strategy 1: Set initial thresholds s 1 2: Initialise the training set  ={(x train i , P(x train i ))for k from 0 to N threshold do

F I G U R E 8
Test-case 1: a) Discretisation of the design space in red and Pareto optimal sets in black.b) Image of the discretised points in the objective (,  2 ) space.

F
I G U R E 12 Mach contours at nominal conditions for the baseline profile and computational grid of 36k cells.

F
I G U R E 13 B-splines parametrisation.Fixed CP in black, moving CP in red.

F I G U R E 14
Distribution of ΔP at (a) x * 0 , (b) x *  and (c) x * OUU .ΔP(x,  0 ) represented as a dashed black line, ΔP(x,   ) as a dotted black line and E  [ΔP(x, )] as a plain red line.TA B L E 3 Nominal values and statistics for different blade profiles.Blade profile P(x,  0 ) (Pa) P(x,   ) (Pa) E  [P(x, )] (Pa) Nominal optimum x = x
: Choose initial number of inducing points N init and added per iteration N add 3: Build a GP on the training data 4: N ind = N init 5: while True do 1: Choose threshold  * ∈ [0, 1] 2 57 model the operational variability as independent and uniform uncertain variables, gathered in  = [P t in , T t in , P s out ].The associated ranges are reported in Table1, alongside Nominal and Mean conditions.Uncertainties on the parameters of the thermodynamical model are neglected since previous studies57provided evidence of their limited impact.
Operating conditions: nominal, mean and random.

𝝆 f (Pa) 𝝆 g (kg.s −1 ) N eval Mean Std. dev. Mean Std. dev.
Optimal outputs associated to the different threshold values.