Resource-Constrained Scheduling Algorithms for Stochastic Independent Tasks With Unknown Probability Distribution

This work introduces scheduling algorithms to maximize the expected number of independent tasks that can be executed on a parallel platform within a given budget and under a deadline constraint. The main motivation for this problem comes from imprecise computations, where each job has a mandatory part and an optional part, and the objective is to maximize the number of optional parts that are successfully executed, in order to improve the accuracy of the results. The optional parts of the jobs represent the independent tasks of our problem. Task execution times are not known before execution; instead, the only information available to the scheduler is that they obey some (unknown) probability distribution. The scheduler needs to acquire some information before deciding on a cutting threshold: instead of allowing all tasks to run until completion, one may want to interrupt long-running tasks at some point. In addition, the cutting threshold may be re-evaluated as new information is acquired when the execution progresses further. This work presents several algorithms to determine a good cutting threshold, and to decide when to re-evaluate it. In particular, we use the Kaplan-Meier estimator to account for tasks that are still running when making a decision. The efficiency of our algorithms is assessed through an extensive set of simulations with various budget and deadline values, and ranging over 13 probability distributions. In particular, the AutoPerSurvival(40%,0.005) strategy is shown to achieve 77% of the upper bound even in the worst case, which demonstrates the robustness of our strategy.


Introduction
This paper focuses on the design of scheduling strategies to maximize the expected number of successfully executed tasks on a parallel platform composed of identical processors. The execution time of the tasks is not known before execution. The only information known by the scheduler is that these execution times are independent and identically distributed (IID) random variables obeying the same probability distribution, but this distribution is unknown. The scheduler has both a deadline constraint d and a budget constraint b. At any time, and on each enrolled processor, the scheduler can decide whether to interrupt a long-running task T to start a new task T′, with the hope that T′ will have an execution time shorter than the remaining execution time of T. However, there is a big risk involved with such a decision because: (i) the time and budget spent to execute T until its interruption are completely lost; and (ii) T′ may well happen to have an execution time longer than the remaining execution time of T.
In this non-clairvoyant setting, what is the optimal strategy? Intuitively, the scheduler must first decide how many processors to enroll. Then, the scheduler needs to acquire some information about task execution times by letting several tasks run until completion on each processor. At some point, the scheduler synthesizes the information acquired so far and decides on a scheduling policy. This policy could be either to allow all tasks to run until completion, or to define a cutting threshold τ after which every long-running task should be interrupted. The cutting threshold τ can be recomputed dynamically as the execution progresses until the deadline d is reached or the budget b is exhausted, whichever comes first. Each of the above decisions involves a complicated trade-off. Indeed, deciding on the threshold early avoids consuming a significant fraction of the deadline and of the budget before interrupting any task. But this can lead to an imprecise estimation because the threshold value is based on little information. Conversely, deciding on the threshold later during the execution leads to a more accurate decision, at the risk of having wasted resources unduly. This work introduces several strategies to determine a good threshold at the right moment in the execution.
This scheduling problem has the (somewhat non-standard) objective of maximizing the expected number of successful tasks with a given budget and deadline. Not all tasks will be successfully executed in the end: some tasks will be interrupted, and some tasks will never be launched. This problem is very closely related to imprecise computations [2,9,23]. In imprecise computations, it is not necessary for all tasks to be completely processed to obtain a meaningful result. Tasks are divided into a mandatory and an optional part: while the execution of all mandatory parts is necessary, the execution of optional parts is decided by the user. Often the user has neither the time nor the budget to execute all optional parts, and they must select which ones to execute. Our work corresponds exactly to the optimization of the processing of the optional parts.
Among domains where tasks may have optional parts (or where some tasks may be entirely optional), one can cite recognition and mining applications [26], robotic systems [17], and speech processing [11]; the authors of [21] also cite multimedia processing, planning and artificial intelligence, and database systems. In these applications, the processing times of the optional parts are of similar nature but are heavily data-dependent, hence it is very natural to model them via a probability distribution D. However, this probability distribution D is unknown before processing, and can only be determined through sampling many tasks. Unfortunately, in our scheduling problem, letting the scheduler sample many tasks without interruption to learn, say, the mean and standard deviation of the distribution, can prove very costly: it will consume a significant part of the budget and will prove suboptimal for any distribution requiring a small cutting threshold τ, such as lognormal distributions (see below).
This paper builds upon our previous work [6] where we tackle the dramatically simpler problem where the distribution D is known. In that case, we proposed an analytical method to compute the optimal threshold τ. Section 5.1 provides background material on this method. For some distributions, the optimal strategy is to never interrupt any task (τ = +∞), while for some others, such as some lognormal distributions, there is an optimal cutting threshold.
Regardless, when the distribution D is known, the approach in [6] provides an asymptotically optimal solution. The main focus of this paper is to investigate efficient strategies when the distribution D is unknown. To the best of our knowledge, this work constitutes the first attempt to address this challenging problem.
The major contributions of this work are the following:
• We provide a detailed study of bimodal exponential distributions, for which we exhibit a wide range of cutting thresholds. Small changes in the parameters of the distribution have a big impact on the cutting threshold, which nicely illustrates the difficulty of the scheduling problem with unknown distributions.
• We design a set of scheduling heuristics that use different estimators of the cutting threshold τ, and that refine this estimation periodically as the execution progresses.
• We show how to use the Kaplan-Meier estimator [20] to account for long-running tasks when estimating the threshold τ.
• We introduce several methods for deciding when to compute and recompute the threshold.
• We report a comprehensive set of simulation results that compare the heuristics for various budget and deadline values, using up to 13 different probability distributions.
The rest of the paper is organized as follows. Section 2 surveys related work. Section 3 provides an introduction to the Kaplan-Meier estimator. We detail the framework and objective in Section 4. In Section 5, we provide background on prior strategies for interrupting tasks when the distribution is known (Section 5.1), together with a set of new results for exponential distributions (Section 5.2). We provide new scheduling heuristics when the distribution is unknown in Sections 6, 7 and 8: Section 6 is devoted to methods for computing the cutting threshold accurately, while Sections 7 and 8 focus on different heuristics for deciding when to recompute it. In Sections 7 and 8, we also assess the performance of our heuristics through an extensive set of simulations. Finally, we provide concluding remarks and directions for future work in Section 9.

Related work
We first overview studies dealing with bags of tasks in Section 2.1. Then we survey work related to non-clairvoyant scheduling in Section 2.2. Next we discuss related task models such as imprecise computations in Section 2.3.

Bags of tasks
A bag of tasks is an application composed of a set of independent tasks sharing some common characteristics: either all tasks have the same execution time or they are instances sampled from the same probability distribution. The survey [34] deals with resource optimization for bag-of-tasks applications, but does not consider the non-clairvoyant case. Several works devoted to bag-of-tasks processing explicitly target cloud computing [4,7,15,29]. These works consider the classical clairvoyant model, in which we know the exact execution times of the tasks, or their probability distribution, or maybe only their range and standard deviation. Vecchiola et al. [36] consider a single application comprising independent tasks with deadlines but without any budget constraints. In their model, tasks are supposed to have different execution times but they only consider the average execution time of tasks rather than its probability distribution (this is left for future work). Moreover, they do not report on the number of deadline violations. Mao et al. [25] consider both deadline and budget constrained provisioning and assume they know the task execution times up to some small variation (the largest standard deviation of a task execution time is at most 20% of its expected execution time). Hence, this work is more related to scheduling under uncertainties than to non-clairvoyant scheduling.

Non-clairvoyant scheduling
The work surveyed in Section 2.1 assumes a fully or semi-clairvoyant set of task execution times, which is not realistic in many application scenarios. In contrast, our model considers a fully non-clairvoyant case, in which we have no information in advance about the execution times of our bag of tasks. Although this topic has received less attention, we can still find several references. For instance, Im et al. [18] and Pawan et al. [33] both worked on online algorithms. They assume that the size of arriving tasks is not known before completing them. In [18], a unified model is designed for several different scheduling problems, while [33] aims at minimizing flow-time and energy. In the work of Li [22], task execution times are unknown, and the objective is to minimize the makespan while using one or several multicore processors. A group of authors [27][28][29] has published several studies focusing on budget-constrained makespan minimization. They do not assume that the distribution of execution times is known but try to learn it on the fly [27,28]. This work differs from ours as these authors do not consider deadlines. For instance, in [29], the objective is to try to complete all tasks, possibly using replication on faster processors, and, in case the proposed solution fails to achieve this goal, to complete as many tasks as possible. The implied assumption is that all tasks can be completed within the budget. We implicitly assume the opposite: there are too many tasks to complete all of them by the deadline, and therefore we attempt to complete as many as possible; we avoid replication, which would be a waste of resources in our framework.

Task models
As mentioned in Section 1, our model assumes that some tasks may not be executed, which is very closely related to imprecise computations [2,9,23]. Furthermore, this task model also corresponds to the overload case of [3] where jobs can be skipped or aborted. Another related model is that of anytime tasks [19], where a task can be interrupted at any time, with the assumption that the longer it runs, the higher the quality of its output. Such a model requires a function relating the time spent to a notion of reward. Finally, we note that the general problem related to interrupting tasks falls into the scope of optimal stopping, the theory that consists in selecting a date to take an action, in order to optimize a reward [12].

The Kaplan-Meier estimator
In medical research, biostatisticians have to answer questions like: "What is the probability that a patient will still be alive 5 years after receiving a cancer diagnosis?" To answer such a question, biostatisticians analyse the data of many individual patients. Some of these data will be complete: they will have both the time of diagnosis and the time of death of the patient. However, at the time of the analysis, some patients enrolled in the dataset will still be alive. The status of some other patients may be unknown because contact with them has been lost (e.g., they have moved away). In both cases observations are incomplete. One only knows the time of diagnosis and the last time the patient was known to be alive. Hence, one only knows a lower bound on the time the patient has survived after the diagnosis. These incomplete "lower-bound" data are called right-censored data and the question addressed by biostatisticians is that of survival analysis with right-censored data. This problem is exactly our scheduling problem, only the vocabulary changes:
• instead of survival times, we have execution times;
• instead of diagnosis times, we have start times;
• at the time of analysis, instead of patients still alive, we have tasks still running;
• at the time of analysis, instead of patients with unknown whereabouts, we have tasks that have been terminated by the scheduler before completion.
We can therefore use the well-known tool to solve survival analysis with right-censored data, namely the Kaplan-Meier estimator [1,20]. We provide a detailed example in Section 6.2.
We refer the interested reader to [1] for a thorough overview of survival and event history analysis. Survival analysis with the Kaplan-Meier estimator is widely used in biostatistics [8,16,37], and in a variety of other domains such as engineering [31], economics [24], etc. To the best of our knowledge, this work is the first to use the Kaplan-Meier estimator for scheduling bags of tasks, let alone for any problem involving scheduling decisions. In core computer science, the Kaplan-Meier estimator seems to have been used only for the analysis of software projects (see [30,32] and the references therein).
Furthermore, the present study appears to be unique because it uses a fully non-clairvoyant framework and assumes an overall deadline in addition to a budget constraint. Our previous works [5,14] had the same setting under homogeneous [5] or heterogeneous [14] platforms. But in these works, we assumed that the distribution of execution times was known in advance, while the key problem studied in the current paper is to learn the distribution of task execution times on the fly and to decide when to interrupt unfinished tasks.

Problem definition
We consider a parallel platform composed of M identical processors. The execution time of a task on a processor obeys an unknown probability distribution D. Without loss of generality, we assume that it costs one budget unit to execute a task for one second on any processor, and we have a total budget b and an overall deadline d. Thus the budget can also refer to the available machine time.
At any time, and on each enrolled processor, the scheduler can decide whether to interrupt a long-running task to start a new task, with the hope that the new one will have an execution time shorter than the remaining execution time of the old one.However, we assume that execution is non-preemptive: if the execution of a task is interrupted, it cannot be restarted, and all the work done (and the budget spent) so far for that task is lost.
Our objective is to maximize the total number of tasks successfully completed under the budget and deadline limits. To drive the design of our scheduling policies, we use an instantaneous version of this objective, namely the yield, which is defined as the expected number of tasks completed per unit of budget spent. Because one unit of budget corresponds to one second of execution, the yield is also equal to the expected number of tasks completed per second.
All scheduling policies are required to have polynomial complexity. Since a solution to the problem is the list of the tasks that are executed, either partially or successfully (for each of these tasks, the scheduler made a decision), the size of the problem is proportional to that number of tasks. This number in turn is proportional to the budget (or deadline), divided by the expectation µ(D) of the (unknown) probability distribution D, since the average execution time until completion of a task is µ(D). Furthermore, the scheduling policies will make decisions and compute a cutting threshold several times during the whole execution; we require that the number of such decisions be constant, and they will typically be taken each time a prescribed percentage of the budget is spent. The motivation here is to cap the overhead incurred by the scheduler by forbidding the recomputation of a threshold each time a new task is executed.

Optimal strategies for known distributions
In Section 5.1, we recall previous results for known distributions, namely an asymptotically optimal policy for discrete distributions and its extension to continuous distributions [5,6]. Then, in Section 5.2, we study the case where the distribution of task execution times is defined by a bimodal exponential distribution. This study shows how small changes in distribution parameters can lead to drastically different optimal scheduling policies.

Background on previous approaches
A scheduling policy has to decide whether all tasks should be allowed to run until completion, or whether some tasks should be interrupted and, in the latter case, which tasks and when. In [5,6] we provided answers to this key question. We review the approach to determine a cutting threshold, first for discrete distributions of task execution times (Section 5.1.1), and then for continuous distributions (Section 5.1.2).
In addition to determining a cutting threshold, the scheduler should decide how many processors to enroll. With a budget of b and a deadline of d, we enroll $\lceil b/d \rceil$ processors. The rationale is that this is the minimum number of processors required to exhaust the budget. Because the policy on each processor is asymptotically optimal (see below), enrolling more processors will not be beneficial for large budgets, and could lead to waste due to budget fragmentation for smaller budgets.

Discrete distributions
We consider a discrete distribution D under which there are k possible task execution times, $w_1 < w_2 < \dots < w_k$. A task has an execution time $w_i$ with probability $p_i$, with $0 \le p_i \le 1$ and $\sum_{j=1}^{k} p_j = 1$. The simplest policies that interrupt task executions are the fixed-threshold strategies. A fixed-threshold strategy interrupts every not-yet-completed task at a predefined threshold τ, i.e., when the task has been executing for a time τ without completing. The yield of the fixed-threshold strategy of threshold τ is computed as follows:
\[ Y(\tau) = \frac{\sum_{j=1}^{I(\tau)} p_j}{\sum_{j=1}^{I(\tau)} p_j w_j + \left(1 - \sum_{j=1}^{I(\tau)} p_j\right) \tau} \tag{1} \]
where $I(\tau)$ is the index of the largest task execution time smaller than or equal to τ: $I(\tau) = k$ if $\tau \ge w_k$, and $w_{I(\tau)} \le \tau < w_{I(\tau)+1}$ otherwise. This complicated formula has an intuitive explanation: the probability of success with cutting threshold τ is $\sum_{j=1}^{I(\tau)} p_j$, and the execution time is averaged as follows: some tasks have (successfully) executed in $w_j$ seconds, with probability $p_j$, for each $j \le I(\tau)$, and the remaining tasks have been interrupted after τ seconds (with the remaining probability $1 - \sum_{j=1}^{I(\tau)} p_j$). The following theorem states that the best fixed-threshold strategy is asymptotically optimal when the platform includes a single processor (M = 1) [5,6].
Theorem 1 Let $\tau_{opt} = \arg\max_{\tau \in \{w_1, \dots, w_k\}} Y(\tau)$. If the platform includes a single processor, the fixed-threshold strategy of threshold $\tau_{opt}$ is asymptotically optimal among all possible strategies when the budget tends to infinity (the deadline being equal to the budget).
With several processors available, we enroll $\lceil b/d \rceil$ processors and execute on each of them the fixed-threshold strategy of threshold $\tau_{opt}$.
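As an illustration, the yield of Equation 1 and the search for the best fixed threshold can be sketched as follows. This is a minimal sketch, not the implementation used in the paper: the function names and the three-point distribution are hypothetical and chosen only to make the computation concrete.

```python
def fixed_threshold_yield(w, p, tau):
    """Yield of Equation 1 for execution times w, probabilities p, threshold tau."""
    success = 0.0  # probability that a task completes within tau
    work = 0.0     # expected time spent on tasks that complete
    for wj, pj in zip(w, p):
        if wj <= tau:
            success += pj
            work += pj * wj
    # Tasks that do not complete are interrupted after tau seconds.
    spent = work + (1.0 - success) * tau
    return success / spent if spent > 0 else 0.0


def best_threshold(w, p):
    """Search the candidate set {w_1, ..., w_k}, as in Theorem 1."""
    return max(w, key=lambda tau: fixed_threshold_yield(w, p, tau))


# Hypothetical distribution: mostly short tasks and a few very long ones.
w = [1.0, 2.0, 50.0]
p = [0.4, 0.4, 0.2]
tau_opt = best_threshold(w, p)
```

On this hypothetical distribution, cutting at 2 gives a yield of 0.5 task per second, whereas never interrupting (threshold 50) gives 1/11.2, i.e., about 0.089.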

Continuous distributions
We now consider a continuous distribution D of task execution times whose cumulative distribution function is F(x) and whose probability density function is f(x). The execution time of a task is thus defined by a random variable X which follows D. With these notations, the probability that the execution is no longer than a duration t is: P(X ≤ t) = F(t). Then, the equation of the yield of the fixed-threshold strategy of threshold τ is easily extrapolated from that for discrete distributions (Equation 1):
\[ Y(\tau) = \frac{F(\tau)}{\int_0^{\tau} x f(x)\, dx + \tau \left(1 - F(\tau)\right)} \tag{2} \]
The optimal threshold is then, as previously:
\[ \tau_{opt} = \arg\max_{\tau} Y(\tau). \]

New results for exponential distributions
To illustrate the fact that small differences in the distribution of task execution times can lead to dramatically different optimal policies, we study the case where task execution times follow exponential distributions. We assume that the distribution is either unimodal or bimodal, but we formally express it as a bimodal case: the unimodal case will appear as a special case where both modes coincide. Task execution times are thus defined by a bimodal exponential distribution of parameters λ and µ, chosen with respective weights p and 1 − p, where 0 ≤ p ≤ 1. In other words, each time we need to generate a new task execution time, with probability p we generate an execution time using an exponential distribution of parameter λ and with probability 1 − p we generate an execution time using an exponential distribution of parameter µ. Without loss of generality, we assume that µ ≥ λ. A potential problem with exponential distributions is that task execution times can be arbitrarily small. This seems unrealistic: independently of the task size, the system requires some time to load (part of) the code of the task and prepare for execution. Furthermore, the possibility of arbitrarily small execution times can lead to pathological situations (for instance, see Theorem 2 below). Therefore, one may want to add a positive constant time δ to each generated execution time (one can always set δ = 0 if one does not want to add such a constant). In this context, δ can be seen as the lower bound on any task execution time. Altogether, this leads to the following density function for the distribution of probability:
\[ f(x) = \begin{cases} p \lambda e^{-\lambda (x - \delta)} + (1 - p) \mu e^{-\mu (x - \delta)} & \text{if } x \ge \delta, \\ 0 & \text{otherwise.} \end{cases} \tag{3} \]
Theorem 2 defines the optimal cutting threshold for a fixed-threshold strategy for the distribution of task execution times whose density obeys Equation 3. Its proof can be found in the appendix.
Theorem 2 When task execution times are defined by a bimodal exponential distribution plus a nonnegative constant, the optimal cutting threshold $\tau_{opt}$ and the optimal yield $Y_{opt}$ are as follows:
• If the constant is null (δ = 0):
- If there is a single mode, any value for the threshold is optimal and $Y_{opt} = \lambda$.
- If the two modes are distinct, $\tau_{opt} = 0$ (tasks should be interrupted as soon as possible), and $Y_{opt} =$ […].
• If the constant is positive (δ > 0):
- If […] ≥ 0, then $\tau_{opt} = +\infty$ (tasks should never be interrupted), and $Y_{opt} =$ […].
- Otherwise, $\tau_{opt}$ is the solution of […]. This equation should be solved numerically and its solution should be injected in Equation 2 to obtain the value of $Y_{opt}$.
Certainly the most striking (and counter-intuitive) part of this theorem is the case where δ = 0 (tasks can be arbitrarily small) and the distribution is truly bimodal (λ ≠ µ and p(1 − p) ≠ 0). The result states that $\tau_{opt} = 0$. This means that the lower the threshold, the better. But, obviously, a task must be launched to have any chance of completing. This result means that each task should be interrupted as soon as possible if it has not yet completed. When the constant δ is not null, however small it is, results are drastically different: depending on the relationships between the parameters (p, λ, µ, and δ), either no task should be interrupted or there is a single optimal cutting threshold (and it is not trivial: $0 < \tau_{opt} < +\infty$).
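The case δ = 0 can be checked by hand. The derivation below is a sketch written for this purpose, not the proof given in the appendix; it only uses the definition of the yield as the probability of completing before τ divided by the expected time spent per task, the latter being rewritten as $\int_0^{\tau} (1 - F(x))\,dx$ through an integration by parts.

```latex
% Sketch for delta = 0, with F(x) = p(1 - e^{-\lambda x}) + (1-p)(1 - e^{-\mu x}).
\begin{align*}
Y(\tau) &= \frac{F(\tau)}{\int_0^{\tau} \bigl(1 - F(x)\bigr)\,dx}
         = \frac{p\,(1 - e^{-\lambda\tau}) + (1-p)\,(1 - e^{-\mu\tau})}
                {\dfrac{p}{\lambda}\,(1 - e^{-\lambda\tau})
                 + \dfrac{1-p}{\mu}\,(1 - e^{-\mu\tau})}, \\
\lim_{\tau \to 0} Y(\tau) &= p\lambda + (1-p)\mu
  \qquad \text{(using } 1 - e^{-a\tau} \sim a\tau \text{)}, \\
\lim_{\tau \to +\infty} Y(\tau) &= \frac{1}{\dfrac{p}{\lambda} + \dfrac{1-p}{\mu}}.
\end{align*}
```

With a single mode (λ = µ), the ratio equals λ for every τ, which matches the statement that any threshold is optimal. With two distinct modes, the limit at 0 is the weighted arithmetic mean of λ and µ while the limit at +∞ is their weighted harmonic mean, which is strictly smaller; this is consistent with very short thresholds outperforming the never-interrupt policy.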
One may wonder whether Theorem 2 really matters, that is, whether the yield significantly varies with the cutting threshold. Consider two equiprobable modes (p = 0.5) with a constant δ = 0.001, with λ = 1, and µ = 50. If we never interrupt tasks, the yield is approximately 1.96. If we interrupt them with a cutting threshold of 0.01, the yield is 20.35, more than 10 times larger! There are distributions for which using the optimal cutting threshold has a dramatic impact on the performance of the system.
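This order of magnitude can be checked with a few lines of code. The sketch below evaluates the yield in closed form for the shifted bimodal exponential distribution; the function names are hypothetical, and the convention that τ is measured from the start of the task (so that it includes δ) is an assumption of this sketch, which slightly changes the last digits of the yield but not the conclusion.

```python
import math


def bimodal_yield(tau, p, lam, mu, delta):
    """Yield of a fixed threshold tau (assumed to include the constant delta)."""
    if tau <= delta:
        return 0.0  # no task can complete before delta
    t = tau - delta
    a = 1.0 - math.exp(-lam * t)
    b = 1.0 - math.exp(-mu * t)
    completed = p * a + (1.0 - p) * b  # F(tau)
    # Expected time per task: delta plus the integral of the survival function.
    spent = delta + p * a / lam + (1.0 - p) * b / mu
    return completed / spent


def never_interrupt_yield(p, lam, mu, delta):
    """Inverse of the mean execution time: every task runs to completion."""
    return 1.0 / (delta + p / lam + (1.0 - p) / mu)


params = dict(p=0.5, lam=1.0, mu=50.0, delta=0.001)
baseline = never_interrupt_yield(**params)   # close to 1.96
short_cut = bimodal_yield(0.01, **params)    # more than 10 times the baseline
```

Scanning `bimodal_yield` over a grid of thresholds is one simple way to locate the optimal threshold numerically when no closed form is available.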

Threshold estimation for unknown distributions
As stated in Section 5, when the distribution of task execution times is known, the optimal policy is a fixed-threshold strategy that interrupts tasks, and the choice of the cutting threshold can have a very significant impact on the system performance. Now the question is: how do we find the optimal cutting threshold when the distribution is unknown?
In order to acquire information on the distribution of task execution times, the only option is to execute some tasks and record their execution times. We consider the problem of deciding how many tasks to execute in Sections 7 and 8. For the sake of the argument, let us assume for now that we have already launched the execution of several tasks, that some executions have already completed, some are still running, and some were interrupted. For instance, in the toy example presented in Figure 1 we have two processors, four tasks, and we want to make a decision at time 20. One task has executed for 5 seconds, one for 16; two tasks have not yet completed (the tasks in red), having run, respectively, for 15 and 4 seconds so far. How do we estimate the distribution of task execution times based on this data? There are two types of approaches. In the first type, we would try to guess some characteristics of the distribution. For instance, we could claim that "task execution times likely follow an exponential distribution". Then, we would look for the exponential distribution that best fits the data, for instance using a maximum likelihood estimation. If our initial guess was lucky, we should end up with a good result. However, the underlying distribution may be either a lognormal distribution, or a multimodal one, or may not even resemble any of the most used probability distributions. Rather than relying on potentially unlucky guesses, we aim at designing a robust approach delivering high-quality results regardless of the underlying distribution. Therefore, our approach belongs to the second type of approaches, sometimes called "nonparametric" statistics. We are not going to make any assumption on the underlying distribution.
In Section 6.1, we start from a naive approach that only considers the execution times of tasks that have completed. This approach has the advantage of simplicity. However, as exemplified by the toy example in Figure 1, it can ignore a significant share of the data, and in particular long-running tasks. As discussed in Section 3, the question of how to take into account tasks that have not yet completed has been thoroughly researched in the field of medical research. In Section 6.2, we show how to use the Kaplan-Meier estimator to solve our scheduling problem more accurately.

The empirical distribution function
The naive approach only considers the execution times of completed tasks and uses the associated empirical distribution function [35], along with Equation 1. Consider an example where there are k different task execution times, $w_1 < w_2 < \dots < w_k$, and where $n_i$ tasks have the execution time $w_i$. Then, using the empirical distribution function, a task has an execution time $w_i$ with probability $p_i = n_i / \sum_{j=1}^{k} n_j$. Using these probabilities, we search in the set $\{w_1, \dots, w_k\}$ the value maximizing the yield, using Equation 1.
The main advantage of this approach is its simplicity. The toy example in Figure 1 illustrates its main drawback: there may be many tasks whose information is ignored, namely the tasks that have not yet completed or that have already been interrupted. Ignoring these interrupted or long-running tasks biases the estimation.
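The naive approach can be sketched as follows on the two completed tasks of the toy example (execution times 5 and 16). This is an illustrative sketch under the definitions above; the function names are hypothetical.

```python
from collections import Counter


def empirical_distribution(completed_times):
    """Distinct completed execution times w_i and probabilities p_i = n_i / sum_j n_j."""
    counts = Counter(completed_times)
    total = len(completed_times)
    w = sorted(counts)
    p = [counts[x] / total for x in w]
    return w, p


def yield_at(w, p, tau):
    """Equation 1 evaluated with the empirical probabilities."""
    success = sum(pj for wj, pj in zip(w, p) if wj <= tau)
    work = sum(pj * wj for wj, pj in zip(w, p) if wj <= tau)
    spent = work + (1.0 - success) * tau
    return success / spent if spent > 0 else 0.0


def empirical_threshold(completed_times):
    """Best threshold among the observed completion times."""
    w, p = empirical_distribution(completed_times)
    return max(w, key=lambda tau: yield_at(w, p, tau))


# Only the completed tasks are used; the tasks that ran 15 s and 4 s are ignored.
toy_completed = [5, 16]
tau_empirical = empirical_threshold(toy_completed)
```

On this data, the yield is 0.1 for a threshold of 5 and 1/10.5 for a threshold of 16, so the naive approach selects 5.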

Using the Kaplan-Meier estimator for survival analysis
This section shows how to use the Kaplan-Meier estimator initially designed for survival analysis with right-censored data [1,20]. Consider an example where there are k different task execution times, $w_1 < w_2 < \dots < w_k$. Here, execution times can be the execution times of tasks that have completed, like the values 5 and 16 in the example of Figure 1. They can also be censored execution times, like the values 4 and 15 in that example. Let $d_i$ be the number of tasks that die at time $w_i$, that is, the number of completed tasks whose execution time is exactly $w_i$. Let $r_i$ be the number of individuals at risk just prior to time $w_i$, that is, the number of tasks (completed or censored) whose execution time is greater than or equal to $w_i$. The survival function, S(t), is the probability that life is longer than t: S(t) = Pr(X > t). The Kaplan-Meier estimator gives us:
\[ S(t) = \prod_{i:\, w_i \le t} \left(1 - \frac{d_i}{r_i}\right) \tag{4} \]
Using this estimator, we can then rewrite Equation 1 as:
\[ Y(t) = \frac{1 - S(w_{I(t)})}{\sum_{j=1}^{I(t)} \left(S(w_{j-1}) - S(w_j)\right) w_j + S(w_{I(t)})\, t} \]
where $I(t)$ is the index of the largest task execution time smaller than or equal to t: $I(t) = k$ if $t \ge w_k$, and $w_{I(t)} \le t < w_{I(t)+1}$ otherwise (with $w_0 = 0$ and $S(w_0) = 1$).
We illustrate this estimator with the toy example of Figure 1:

$w_i$    $d_i$    $r_i$    $1 - d_i/r_i$    $S(w_i)$
4        0        4        1                1
5        1        3        2/3              2/3
15       0        2        1                2/3
16       1        1        0                0

The resulting function is presented in red on the left-hand side of Figure 2, alongside the probabilities associated with the empirical distribution function (in blue). Red ticks indicate the presence of censored data. For the empirical distribution function, the probability that the execution time of a task exceeds 5 seconds is 50%, while it is 66.6% for the Kaplan-Meier estimator. When we plug these different probability functions into Equation 1, we obtain the yields depicted on the right-hand side of Figure 2. In this toy example, the empirical distribution function claims that the optimal cutting threshold is 5, whereas the survival analysis claims that it is 16.
Note that, in the product of Equation 4, only the times corresponding to actual (non-censored) execution times matter. Execution times that only correspond to censored times each contribute a value of 1 in the product (see the table above). Note also that if there is no censored data, we have $r_{i-1} - r_i = d_{i-1}$ and S(t) simplifies into
\[ S(t) = \frac{r_j}{r_1} \]
where j is the smallest index such that $w_j > t$. In other words, when there is no censored data, the empirical distribution function and the Kaplan-Meier estimator coincide. We can use the survival function to compute the mean and variance of the execution times. Recall that S(t) = Pr(X > t), hence $Pr(X = w_j) = Pr(X \in\, ]w_{j-1}, w_j]) = S(w_{j-1}) - S(w_j)$ for $1 \le j \le k$ (with $w_0 = 0$ and $S(w_0) = 1$, as stated above). We derive that:
\[ \mu = \sum_{j=1}^{k} \left(S(w_{j-1}) - S(w_j)\right) w_j \qquad \text{and} \qquad \sigma^2 = \sum_{j=1}^{k} \left(S(w_{j-1}) - S(w_j)\right) w_j^2 - \mu^2. \]
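The computations of this section can be reproduced on the toy example with a short script. This is a minimal sketch under the definitions above, not the code used for the simulations: the representation of observations as (duration, completed) pairs and the function names are assumptions made for the illustration.

```python
def kaplan_meier(observations):
    """Return the distinct times w_i and the estimates S(w_i).

    observations: list of (duration, completed) pairs, where completed is
    False for censored data (tasks still running or interrupted).
    """
    times = sorted({t for t, _ in observations})
    survival, s = [], 1.0
    for w in times:
        d = sum(1 for t, done in observations if done and t == w)  # completions at w
        r = sum(1 for t, _ in observations if t >= w)              # tasks at risk at w
        s *= 1.0 - d / r
        survival.append(s)
    return times, survival


def km_yield(times, survival, tau):
    """Equation 1 rewritten with Pr(X = w_j) = S(w_{j-1}) - S(w_j)."""
    prev, work, s_tau = 1.0, 0.0, 1.0
    for w, s in zip(times, survival):
        if w > tau:
            break
        work += (prev - s) * w
        prev = s_tau = s
    spent = work + s_tau * tau
    return (1.0 - s_tau) / spent if spent > 0 else 0.0


# Toy example: two completed tasks (5 s and 16 s), two censored ones (15 s and 4 s).
toy = [(5, True), (16, True), (15, False), (4, False)]
times, survival = kaplan_meier(toy)
tau_km = max(times, key=lambda tau: km_yield(times, survival, tau))
```

The estimated yields are 1/15 for a threshold of 5 and 3/37 for a threshold of 16, so the selected threshold is 16, in agreement with the discussion above.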

Static algorithm
In Section 6, we have shown how to use available data from task execution times to define the best cutting threshold. In this section, we focus on how and when to acquire the data needed to compute a cutting threshold, possibly many different times as the execution progresses.
In order to acquire information on the distribution of task execution times, the only solution is to execute some tasks and to record their execution times. In this process, we have to make a classical trade-off. On the one hand, we should execute a sufficiently large number of tasks until completion, in order to be sure that the set of observed execution times is indeed representative of the underlying distribution. On the other hand, we should execute as few tasks as possible before making a decision, to avoid wasting a significant share of the budget on running tasks until completion if the optimal threshold is a "short" one.
We start by designing policies that try to guess the good trade-off before launching any task, and we assess their performance through simulation.In Section 8, we refine the approach and present a policy that tries to automatically infer the cutting trade-off.

One-size-fits-all heuristics
The simplest strategies try to guess, without interrupting any task, the "right" trade-off. Consider a strategy that spends 10% of the overall budget running tasks up to completion before computing the optimal threshold: one can still hope to achieve a 90% overall efficiency if the threshold has been accurately evaluated. This is the basis of the first two strategies:
1. pick a priori a percentage p;
2. run tasks on processors until having spent the fraction p × b of the overall budget;
3. compute the cutting threshold, using either the empirical distribution function (strategy Empirical) or survival analysis (strategy Survival);
4. apply the cutting threshold to all tasks until the budget is exhausted.
For step 2, recall that we enroll b/d processors; hence, up to rounding artefacts, the fraction pb of the whole budget is spent when the fraction pd of the deadline is reached on each processor.
When the average task execution time is large and the observation budget pb is small, it may happen that no task has completed when the observation budget is exhausted. In such a case, we delay the computation of the threshold until having spent 2pb, and so on if this extended budget is still too small.
These first two strategies have an obvious limitation: once a threshold is computed, it is applied until the end. However, in the meantime, new tasks complete and some are interrupted, and we gather more information on the distribution. We should take this new information into account. We propose to recompute the threshold periodically, each time we have spent another fraction pb of the budget, by accounting for all the available data. This leads to two new strategies, PerEmpirical and PerSurvival, which we assess below.

Performance evaluation
This section assesses the performance of the one-size-fits-all heuristics introduced in Section 7.1. The experimental settings are detailed in Section 7.2.1, and results are presented in Section 7.2.2. All heuristics were implemented in R. The corresponding source code, and all the data, are publicly available in [13].

Experimental methodology
The default settings are as follows. The deadline d can take the values 5, 10, 50, and 100. The parallel platform is composed of M = 10 identical processors, each with a unitary cost. As discussed in Section 5.1, recall that a typical configuration enrolls b/d processors. We therefore define the budget b as b = Md. Then, the budget b is evenly shared among the processors, which all execute tasks until the deadline d.
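As a quick check of these settings, the following sketch (with helper names of our own) lists the budget associated with each deadline and the duration of the observation phase on each processor for an illustrative percentage p = 10%.

```python
M = 10                         # identical processors, unit cost each
DEADLINES = [5, 10, 50, 100]   # values of d in the default settings

def budget(m, d):
    """Budget b = M d, so that b / d = M processors run until the deadline."""
    return m * d

def observation_time(p, d):
    """Time per processor after which a fraction p of the budget is spent."""
    return p * d

for d in DEADLINES:
    b = budget(M, d)
    print(f"d={d:>3}  b={b:>4}  observation phase for p=10%: "
          f"{observation_time(0.10, d):.1f} time units per processor")
```

The four deadlines thus correspond to the budgets b = 50, 100, 500, and 1000.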
We use different standard probability distribution functions to generate task execution times, namely uniform, exponential, log-normal, half-normal, truncated normal (truncated on [0, +∞)), gamma, inverse-gamma, and Weibull distributions. In addition, multimodal distributions have been advocated to model jobs, file and object sizes [10]. Therefore, we also consider two types of bimodal distributions, either based on truncated normal distributions or on exponential distributions. For all the bimodal distributions, the two modes are equiprobable. For three of the distributions, we consider two different sets of parameters to illustrate different potential behaviors associated with the same type of distribution. These distributions are log-normal, bimodal exponential, and bimodal truncated normal. To enable a direct comparison between all the distributions, we choose their parameters so that all distributions achieve a mean equal to 1. Following the discussion in Section 5.2 about avoiding arbitrarily small task execution times, we add a constant δ = 0.05 to all randomly generated task execution times. Therefore, for all the distributions under study, execution times always have an average value of 1.05. The detailed parameters of the distributions are presented in Table 1. Due to space limitations, we will sometimes only report here the performance of 6 of these 13 distributions. The performance of the other distributions can be found in the appendix.
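The generation of execution times can be sketched as follows for three of the distributions. This is an illustration in Python rather than the R code used for the simulations. The conversion from the mean and standard deviation of a log-normal distribution to the parameters of the underlying normal distribution is standard; the function names and the sample size are our own choices.

```python
import math
import random

DELTA = 0.05   # constant added to every generated execution time

def lognormal_params(mean, sd):
    """Parameters (mu, sigma) of the underlying normal distribution."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    return math.log(mean) - sigma2 / 2.0, math.sqrt(sigma2)

def draw_lnorm(rng, mean=1.0, sd=0.5):
    mu, sigma = lognormal_params(mean, sd)
    return rng.lognormvariate(mu, sigma) + DELTA

def draw_unif(rng):
    return rng.uniform(0.0, 2.0) + DELTA     # unif(0, 2) has mean 1

def draw_exp(rng):
    return rng.expovariate(1.0) + DELTA      # exp(1) has mean 1

def sample_mean(draw, n=200_000, seed=0):
    rng = random.Random(seed)
    return sum(draw(rng) for _ in range(n)) / n

for name, draw in [("lnorm(1,0.5)", draw_lnorm),
                   ("unif(0,2)", draw_unif),
                   ("exp(1)", draw_exp)]:
    print(f"{name:>13}: sample mean = {sample_mean(draw):.3f}")
```

Each sample mean should be close to 1.05, the common average obtained once δ is added.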
Figure 3 presents, for each distribution under study, the theoretical yield achievable as a function of the cutting threshold. In Figure 3, we have ordered the distributions by non-increasing values of their optimal cutting threshold. One can see that different distributions, or the same distribution with different parameters, lead to different shapes of the yield function. For the first distributions in the figure, tasks should never be interrupted. For the following distributions, tasks should be interrupted, and sometimes quite early. Table 2 reports the optimal cutting threshold for each distribution. This variety of situations makes it challenging to determine a good cutting threshold when the distribution is unknown. In the remainder of this section, in order to ease the comparison of the behavior of the scheduling strategies for the different distributions, all graphs and tables report results with distributions ordered as in Figure 3.

Table 1 Symbol and parameters for the distributions used in the simulations. For all distributions, µ is the mean and σ the standard deviation, except for the truncated normal and half-normal distributions, where µ and σ are the mean and standard deviation of the original normal distribution.

Symbol | Distribution | Parameters
double_exp(λ1, λ2) | Bimodal exponential | λ1 = 1/1.005 ≈ 0.995, λ2 = 1/0.995 ≈ 1.005
double_exp(λ1, λ2) | Bimodal exponential | λ1 = 1/0.1 = 10, λ2 = 1/1.9 ≈ 0.526
double_truncnorm(µ1, σ1, µ2, σ2) | Bimodal truncated normal | µ1 = 0.5, σ1 ≈ 0.534, µ2 = 1, σ2 ≈ 1.068
For each simulation setting, we generate 1000 random instances (i.e., sets of task execution times). In addition, we compare the results of the proposed strategies with two reference heuristics. NeverInterrupt is the baseline heuristic, which lets all tasks run up to completion. Oracle knows in advance the distribution used to generate task execution times and computes the optimal threshold using that knowledge. Oracle is thus an upper bound on the performance of any strategy. Therefore, the closer to the performance of Oracle, the better the heuristic.
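To give an intuition of the gap between these two references, the sketch below approximates both yields for a Weibull distribution. Several points are assumptions on our side: the yield is taken to be Pr(X ≤ l) / E[min(X, l)], a large random sample stands in for the exact knowledge of the distribution that Oracle has, and weibull(0.411, 0.324) is read as shape 0.411 and scale 0.324 (the reading under which the mean is close to 1).

```python
import random

DELTA = 0.05

def yield_curve(times):
    """Assumed yield F(l) / E[min(X, l)] for every sampled time l."""
    xs = sorted(times)
    n = len(xs)
    curve, prefix = [], 0.0
    for i, l in enumerate(xs, start=1):
        prefix += l                                   # sum of the i shortest times
        expected_cost = (prefix + (n - i) * l) / n    # E[min(X, l)]
        curve.append((l, (i / n) / expected_cost))
    return curve

rng = random.Random(0)
# random.weibullvariate(alpha, beta): alpha is the scale, beta the shape.
sample = [rng.weibullvariate(0.324, 0.411) + DELTA for _ in range(20_000)]

curve = yield_curve(sample)
never_interrupt = curve[-1][1]    # threshold = longest time: no task is cut
oracle_l, oracle_yield = max(curve, key=lambda point: point[1])

print(f"NeverInterrupt yield: {never_interrupt:.3f}")
print(f"Best threshold on the sample: {oracle_l:.3f} (yield {oracle_yield:.3f})")
```

For such a heavy-tailed distribution, the best sampled threshold is far below the longest execution time, which illustrates why NeverInterrupt can be far from Oracle.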

Results
In Figure 4 (and Figure 8 in the appendix), we plot the ratio of the number of tasks successfully executed by each heuristic, over the value achieved by Oracle. Hence, the closer to 1, the better. We plot the performance of each heuristic while varying the percentage p of the budget spent for the observation phase (namely p = 1%, 2.5%, 5%, 10%, 15%, or 20%), and for the four different values of the budget b.
We observe that the performance of the different heuristics is strongly correlated with the shape of the yield functions, as illustrated by Figure 3. In particular, the performance of the heuristics evolves according to our ordering of the distributions. When the optimal threshold is infinite (i.e., for unif(0,2), truncnorm(0.8,0.75), lnorm(1,0.5), hnorm(1.25), double_truncnorm(0.5,0.5,0.53,1,1.07), double_exp(0.5,1,1.01), and exp(1)), NeverInterrupt has the same performance as Oracle. Also, the performance of the other heuristics increases with p. This is easily explained, since the behavior of the heuristics during the observation phase is, by definition, that of NeverInterrupt. Moreover, the longer the observation phase, the higher the probability that the accumulated data will be of good quality and lead to an efficient threshold.
When the optimal threshold is finite (i.e., for invgamma(2.33,1.33), gamma(0.33,3), lnorm(1,3), double_truncnorm(0.5,0.01,0.18,1,1.78), double_exp(0.5,10,0.53), and weibull(0.41,0.32)), NeverInterrupt performs predictably worse. The lower the optimal threshold, the lower the performance of NeverInterrupt. Also, the larger the budget, the lower the performance of NeverInterrupt, even if the decrease is not always significant. For the other heuristics, the best value for p decreases. This is once again easily explained: with larger values of p, the observation phase is longer, and thus the budget spent in a suboptimal mode is larger. The graphs are not decreasing from the start because a significant number of tasks must complete to make a decision close to the optimal one, rather than one that is heavily influenced by the random nature of the very few completion times available.
When the budget is large with respect to the average task execution time (e.g., b = 1000), many tasks complete before the end of the observation phase and we can infer a relatively precise threshold. Hence, the four heuristics perform globally well. For instance, when p = 10%, all heuristics achieve a performance that is at least 90% that of Oracle, whatever the distribution. For some distributions, some heuristics achieve a performance of 95% of this theoretical optimum. This is true even for distributions that, theoretically, need to be cut early, such as Weibull. Because we have enough budget to obtain a high-quality threshold after the observation period (which costs 10% of the budget), we achieve a performance close to the optimal for the rest of the execution (90% of the budget). Therefore, the overall results are very good although we do not interrupt tasks during the observation phase.
When the budget is either b = 500 or b = 1000, PerSurvival achieves the best performance, or a performance equivalent to that of the best of the four heuristics, except for the inverse-gamma distribution. For inverse-gamma, PerSurvival is sometimes very slightly below PerEmpirical for the same percentage p. However, these two heuristics achieve the same peak performance for that distribution.
On the contrary, when the budget is small with respect to the average task execution time (e.g., b = 50), the performance of all heuristics worsens. When b = 50 and p = 10%, each of the 10 processors executes tasks for only 0.5 seconds during the observation phase. Hence, the threshold has to be computed after very few tasks have completed, if any. It should therefore not be a surprise that the results are then far from optimal. The best performance is achieved either for the distributions that have a small optimal threshold (the performance is then rather good whatever the value of p) or when the value of p is large, which compensates for the small budget. PerSurvival remains the best heuristic when b = 100; when b = 50, there is no obvious heuristic of choice.
In conclusion, when the budget b is large with respect to the average task execution time, the four basic and periodic heuristics achieve a good performance (at least 90% of the optimal) if the parameter p is chosen carefully (e.g., p = 10%). Then, among the four heuristics, PerSurvival achieves the best performance overall and also in most instances. When the budget is small, the performance of the heuristics worsens. The main reason is that, for the same value of p, there are no longer enough completed tasks to make a relevant decision for the threshold. When the budget is small, p should be large if tasks should never be interrupted, and p should be small if tasks must be interrupted quickly.
Obviously, before running any task, we do not know what the distribution of task execution times will be, what the cutting threshold will be and, hence, how to adequately choose the value of p. In the next section, we present the AutoPerSurvival policy, which aims at mitigating this problem.

Dynamic algorithm
With the heuristics presented in Section 7, when we compute the threshold, we have no idea to what extent the accumulated data is representative of the actual distribution of execution times. Hence, we have no idea of the quality of the resulting threshold. To remedy this, in this section, we propose to automatically infer when to stop observing the distribution and to compute the threshold. In the experiments, we compare the new heuristic with PerSurvival, the best static heuristic of Section 7, and we observe a dramatic improvement in robustness.

Automatic inference heuristics
We do not want to compute the threshold before ascertaining that the data acquired on the distribution of task execution times is "good enough". However, we do not want to spend the whole budget trying to acquire information. Hence, we decide to rely on two parameters fixed a priori: a percentage p_max of the overall budget and a precision ϵ. The precision ϵ will guarantee that we have a good enough approximation of the data distribution because the mean value and standard deviation of the empirical distribution function have converged (up to the precision ϵ). In addition, the percentage p_max will be a large value guaranteeing that, in extreme cases, we will eventually make a decision before exhausting the budget. We compute the threshold as soon as one of the two following conditions is met: either observing convergence of the empirical distribution function, or having spent a fraction p_max b of the overall budget. In practice, each time a task completes, we recompute the mean value and standard deviation of the distribution. If both new values have a relative difference less than ϵ from the previous ones, we assume the approximation of the distribution to have converged.
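The stopping rule described above can be sketched as follows. The class and function names are ours. The sketch assumes that the mean and standard deviation are computed over completed tasks only, that the relative difference is measured with respect to the previous value, and that no convergence is declared while the previous standard deviation is still zero; none of these details is specified above.

```python
import math

class ConvergenceMonitor:
    """Tracks the mean and standard deviation of completed execution times."""

    def __init__(self, eps):
        self.eps = eps
        self.n = 0
        self.total = 0.0
        self.total_sq = 0.0
        self.previous = None          # (mean, sd) after the previous completion

    def add(self, duration):
        """Record one completed task; return True if both statistics are stable."""
        self.n += 1
        self.total += duration
        self.total_sq += duration * duration
        mean = self.total / self.n
        sd = math.sqrt(max(self.total_sq / self.n - mean * mean, 0.0))
        converged = False
        if self.previous is not None:
            prev_mean, prev_sd = self.previous
            converged = (prev_mean > 0 and prev_sd > 0
                         and abs(mean - prev_mean) / prev_mean < self.eps
                         and abs(sd - prev_sd) / prev_sd < self.eps)
        self.previous = (mean, sd)
        return converged

def should_compute_threshold(converged, spent, p_max, b):
    """Decide as soon as the data has converged or the budget cap is reached."""
    return converged or spent >= p_max * b

monitor = ConvergenceMonitor(eps=0.005)
flags = [monitor.add(x) for x in [1.0, 3.0] * 200]
print("first completion declared converged:", flags.index(True) + 1)
```

With a small ϵ, many completions are needed before the test succeeds, which is why the cap p_max is required when the budget is small.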
Once we have computed a cutting threshold, say after having spent a budget qb, we recompute it periodically each time we have spent max{0.01, q}b of the budget. We add the max for the cases where the budget is very large and the convergence very fast, in order to keep the number of decisions constant (as stated in Section 4). Obviously, the new strategy can be implemented for both the empirical distribution function and the survival analysis. However, because of the superiority of the survival analysis (shown in Section 7.2.2), we implement it only for survival analysis, leading to the new strategy AutoPerSurvival.
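As an illustration of this recomputation rule, the sketch below lists the budget levels at which the threshold is computed and then recomputed. The function name is ours, and we assume that recomputations stop once the budget b is reached.

```python
def decision_points(q, b):
    """Budget levels at which the threshold is computed, then recomputed."""
    period = max(0.01, q) * b
    points, k = [], 0
    while q * b + k * period < b:
        points.append(q * b + k * period)
        k += 1
    return points

# Slow convergence: the first decision comes after 25% of the budget.
print(decision_points(0.25, 100))
# Fast convergence with a large budget: the period is floored at 1% of b.
print(len(decision_points(0.004, 1000)), "decisions")
```

The floor at 1% of the budget bounds the number of decisions by about one hundred, however early the first threshold is computed.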

Performance evaluation
In this section, we assess the performance of the automatic inference heuristics introduced previously. The experimental methodology is the same as in Section 7.2.1. We first compare AutoPerSurvival with the PerSurvival heuristic. We then show the stability of AutoPerSurvival while varying different parameters. Finally, we summarize all simulation results.

AutoPerSurvival vs. PerSurvival
In Figure 5 (and Figure 9 in the appendix), we compare the performance of AutoPerSurvival for different values of p_max (namely p_max = 10%, 20%, 30%, 40%, or 50%) when varying ϵ (namely, ϵ = 0.0010, 0.0025, 0.0050, 0.0100, 0.0250, 0.0500, and 0.1000). We added the performance of PerSurvival using different values for p as a reference. In each figure, we plot the ratio of the number of tasks successfully executed by each heuristic, over the value achieved by Oracle.
In all graphs, we observe that the performance of AutoPerSurvival is influenced by the interplay of the two parameters ϵ and p_max. When the value of ϵ is very small, we need a very large (in expectation) number of launched tasks to meet the ϵ criterion. This, in turn, requires spending a large amount of budget on the observation phase. If ϵ is sufficiently small, on most instances this requirement will exceed the upper limit set by p_max on the budget spent during the observation phase. Hence, if ϵ is sufficiently small, the behavior of AutoPerSurvival is only dictated by the value of p_max. For instance, when b = 50, this is the case for most of the distributions when ϵ ≤ 0.0025. However, when the value of ϵ gets larger, convergence is reached sooner. Then an approximation of the data distribution deemed "good enough" (with respect to ϵ) is obtained before spending a share p_max of the budget. In that case, p_max does not play any role, and only ϵ has an influence on the observation period, and thus on the performance. For instance, when b = 100, this is the case for weibull(0.411,0.324) when ϵ ≥ 0.0250. However, for the uniform distribution when b = 100, note that p_max = 10% still plays a role when ϵ = 0.1000, which explains why AutoPerSurvival(10%, 0.1000) has a lower performance than the other AutoPerSurvival variants.
Furthermore, we see that the evolution of the performance depends upon the optimal cutting threshold. When the optimal cutting threshold is infinite, the smaller ϵ, the better the performance. Indeed, during the observation period, the optimal NeverInterrupt strategy is implemented, and, later on, the cutting-threshold strategy is applied. This is particularly true when we have enough time before convergence (large values of p_max and of b). In such a case, there is no performance penalty in having a large observation period during which tasks are not interrupted. In contrast, for distributions with a short optimal cutting threshold, small ϵ values (and longer observation periods) waste more budget without interrupting tasks, and the performance decreases when ϵ decreases.
Globally, when the budget and deadline are large enough, AutoPerSurvival (when ϵ ≤ 0.0050) performs similarly to PerSurvival, and they both have a good performance (around 90% in nearly all cases). In this case, all p_max values perform equally well. However, when the budget and deadline decrease, we already know that PerSurvival performs worse, and we observe that the performance of AutoPerSurvival is strongly correlated with the values of p_max and ϵ. Among the parameters tested, AutoPerSurvival(40%, 0.005) is a good choice, because it can successfully execute more than 77% of the tasks of the optimal heuristic Oracle, regardless of the distribution and the budget (deadline) values. In other words, using AutoPerSurvival(40%, 0.005) will always lead to good results, unlike all one-size-fits-all heuristics.

Stability of performance while varying µ, σ, and M
In Figure 6 (and Figure 10 in the appendix), we assess the performance of the different heuristics under a log-normal distribution of task execution times when b = 50 (and b = 1000 for Figure 10 in the appendix) for different values of the average task execution time (µ), of the standard deviation (σ), and of the number of processors in the platform (M). We use a log-normal distribution because it has been advocated to model file sizes [10], and thus task costs can also be assumed to follow this distribution. For the heuristics, we choose the parameters that achieved the best performance in the previous simulations: AutoPerSurvival is used with the parameters p_max = 40% and ϵ = 0.005. For the four one-size-fits-all strategies, we use the same value to define the observation phase: p = 10%.
When the budget is big enough (b = 1000), all heuristics perform similarly and close to the optimal in all configurations. AutoPerSurvival may perform slightly better than the four other heuristics in most of the cases, but the differences are minimal.
Figure 6 presents the more interesting case of a small budget b = 50 with respect to the average task execution time. The first row of subgraphs shows the influence of the average task execution time, µ, on the performance of the heuristics. Remark that for b = 50, p = 10%, and M = 10, the observation phase of the one-size-fits-all heuristics only lasts for 0.5 second, during which one expects that very few processors will be able to complete a task. This becomes even more pronounced when µ increases, and explains why the performance of the heuristics decreases. Nevertheless, the performance of AutoPerSurvival decreases more slowly than that of the other heuristics. For instance, when µ = 3, the four one-size-fits-all heuristics already achieve a rather bad performance while […]

The third row of subgraphs shows that varying the number of processors has no significant impact on the performance of the heuristics: all scenarios achieve near-optimal performance. Overall, AutoPerSurvival(40%, 0.005) is a very robust heuristic, which outperforms the other heuristics in all settings, and which, in the most adverse scenarios, exhibits a graceful degradation of performance with respect to the theoretical optimum.

Summary
To summarize our findings, we present two tables showing the number of tasks successfully executed by each heuristic for each distribution, expressed as a fraction of the optimal performance (that of Oracle). We present results for a large budget (Table 4 in the appendix, b = 1000 and d = 100) and a small one (Table 3, b = 50 and d = 5) with respect to the average task execution time (µ = 1). We use the same parameters as previously: ϵ = 0.005, p_max = 40%, and p = 10%.
With large values of budget and deadline, all heuristics perform well. Indeed, with the chosen parameters, all heuristics achieve at least 88% of the performance of the optimal. Among the one-size-fits-all heuristics, PerSurvival performs best and is the most robust, but the difference between these heuristics is not always significant. On average, the performances of AutoPerSurvival and PerSurvival are very similar.
Table 3 presents the results when budget and deadline are small. In this case, all one-size-fits-all heuristics can perform very poorly: each of them falls below 40% for the lnorm(1,0.5) distribution, for example. On the contrary, AutoPerSurvival always achieves good to very good performance: its worst case is 77% of the optimal. Once again, this shows the great robustness of AutoPerSurvival(40%, 0.005).

Conclusion
In this work, we have studied the problem of maximizing the number of stochastic independent tasks successfully executed on a parallel platform under deadline and budget constraints. When task execution times obey a probability distribution that is known before execution, previous results showed that long-running tasks must be interrupted at some optimal cutting threshold τ, and provided techniques to determine its value. Some probability distributions call for a very short threshold τ while others have a large or infinite one. The main challenge in this study is that the probability distribution of task execution times is unknown to the scheduler. We designed a set of scheduling heuristics to estimate the cutting threshold τ, some of which make use of the Kaplan-Meier estimator. We also assessed different decision mechanisms to compute and refine the threshold as the execution progresses. On the practical side, extensive simulations show that our best heuristic, AutoPerSurvival(40%, 0.005), achieves good performance for a wide spectrum of probability distributions and parameter sets. In the worst scenario, it can execute 77% of the tasks that an omniscient oracle (knowing the distribution) would be able to complete.
Future work will be dedicated to considering heterogeneous processors, still under the assumption that the distribution of task execution times is unknown on the different processors. Indeed, some cloud providers offer different categories of processors with different computing power and nominal cost, and execution times are not directly proportional to cost nor power. This heterogeneity will dramatically complicate the selection of a good processor subset, and the estimation of the cutting threshold for each of them.

To conclude, we have several cases to consider:
1. $\delta = 0$ (arbitrarily small execution times are allowed). Then, $k(0) = 0$ and $\lim_{l \to +\infty} k(l) \le 0$. We have two subcases to consider:
(a) $p(1-p)(\mu-\lambda) = 0$. In this case, we have a monomodal exponential distribution. Then, $y'(l)$ is null, $Y(l) = y(l)$ is constant and equal to $\lambda$. In other words, in this case the yield is optimal whatever the value chosen for the threshold.
(b) $p(1-p)(\mu-\lambda) \neq 0$. In this case, $Y(l) = y(l)$ is a decreasing function and its maximum is achieved when $l = 0$, and the optimum yield is then $\lim_{l \to 0} Y(l)$. To compute this limit, we use the first-order equivalent of $e^{-x}$ at 0, which is $1 - x$. We obtain:
Remark: this counter-intuitive result means that the shorter the threshold, the better. It means that, in practice, the scheduler should stop each task as soon as it is started. Obviously, this is not achievable in practice. This peculiar property is a consequence of allowing execution times to be arbitrarily small, as the remainder of this case study will illustrate.
2. $\delta > 0$. Once again, we have two subcases to consider:
• $p\left(\delta\lambda - (1-p)\frac{\mu-\lambda}{\mu}\right) \ge 0$. Then $k(l)$, and thus $g(l)$ and $y'(l)$, are nonnegative for all values of $l$. $y$ and $Y$ are thus increasing and we should never abort the execution of a running task (threshold $= +\infty$). The optimum yield in this case is then:
• $p\left(\delta\lambda - (1-p)\frac{\mu-\lambda}{\mu}\right) < 0$, which can be rewritten $\delta < (1-p)\frac{\mu-\lambda}{\lambda\mu}$. In this subcase, $y'(l)$ is a decreasing function, with $y'(0) > 0$ and $\lim_{l \to +\infty} k(l) < 0$, which implies that $y'(l)$ and $Y'(l)$ are negative when $l$ is sufficiently large. Therefore, $Y(l + \delta)$ is first increasing and then decreasing, and has a unique maximum. This maximum is achieved for $l$ satisfying $k(l) = 0$, which can only be solved numerically.

Additional graphs and statistics
We report here the theoretical threshold and the performance for the distributions that were not illustrated in the core of the article. We also report the performance of heuristics when the budget is large with respect to the average task execution time (b = 1000).

Fig. 1 Toy example with two processors, two successfully completed tasks (in blue) and two not-yet-completed tasks (in red) at time 20.

Fig. 2 Probability of survival (left) and yield (right) for the toy example of Figure 1 when using the empirical distribution function (blue) or the Kaplan-Meier estimator (red).

Fig. 3 Theoretical yield when varying the cutting threshold, for each distribution.

Fig. 4 Ratio to Oracle of the number of tasks successfully executed using different heuristics when varying p, for each distribution.

Fig. 5 Ratio to Oracle of the number of tasks successfully executed using either AutoPerSurvival (for different values of p_max and when varying ϵ) or PerSurvival (with different values of p).

Fig. 7 Theoretical yield when varying the cutting threshold, for each distribution.

Table 2 Optimal cutting threshold for each distribution.

Table 3 Ratio to Oracle of the number of tasks successfully executed for each heuristic and each distribution, with µ = 1, b = 50 and d = 5.