Towards Autonomous Robot Navigation in Human Populated Environments Using an Universal SFM and Parametrized MPC

Autonomous mobile robot navigation in a human populated and encumbered environment is recognized as a hard problem to be solved in real-time. Most of the time, robots face the so-called ‘Freezing Robot Problem’, that occurs when the robot stops because no feasible and safe motion can be found. In order to provide to the robot the capability of proactive navigation, in this work we generalize the classical Social Force Model into a Universal Social Force Model (USFM) that attributes to any object surrounding the robot (humans, robots, obstacles) a social behavior. Nonlinear Model Predictive Control (MPC) can be used to solve the autonomous navigation problem since it can take into account all the possible constraints coming from the interaction model between the robot and the different surrounding objects. However, to be effective, MPC requires a sufficiently large prediction horizon, which generally implies a high computational cost. In order to considerably reduce the computational cost, we propose a new control parametrisation based on Thin Plate Spline Radial Basis Functions that allow us to have a large prediction horizon with fewer parameters. The global control framework is validated in simulation with virtual pedestrians, and in real world environments.


I. INTRODUCTION
The utilization of robots in human populated environment is becoming increasingly effective and significant.Some applications include floor cleaning (such as in supermarkets, subway stations, etc.), package transportation and delivery, and orientation guidance (such as in historical centers, museums, fairs, etc.).To achieve autonomous navigation in such environments, it is necessary to estimate the behavior of actors (including humans, moving obstacles, and other robots), compute real-time control while ensuring safety, and potentially perform proactive navigation (i.e., compute actions that are compatible and acceptable to humans while avoiding the Freezing Robot Problem [1]).Understanding human behavior is likely the most complex issue that must be addressed.Firstly, understanding Human to Human interactions is essential.In [2], the author presents a categorization of the surrounding space into different types (Intimate, Personal, Social, Public) from which proxemics were defined for Human to Human interactions (H2H).In [3], the Social Force Model (SFM) is described.Pedestrian motion is modeled as a result of the application of a set of forces called "social forces", which are applied to the pedestrian from different sources.Several versions of this model have been developed, including the Extended Social Force Model (ESFM) [4], the Headed Social Force Model {philippe.martinet,ezio.malis}@inria.fr(HSFM) [5], and the Collision Prediction Extension of Social Force Model (CPESFM) [6].All these extensions consider interactions with Humans (H2H), Obstacles (H2O), Robots (H2R), and, more generally, everything (H2X).Based on the H2X Social Force model, the SPACiSS simulator [7] was developed and recently used to simulate pedestrian behavior in the presence of an autonomous vehicle operating in a shared and open space in a proactive manner [8].
The accumulated knowledge of H2X interaction modeling can be beneficial to the modeling of Robot to everything (R2X) interaction.However, predicting human intentions in particular environments remains challenging.For example, in [9], the authors propose a Vehicle to Human (V2H) interaction model where the pedestrian's cooperation with the vehicle is estimated online.In this work, we propose a further extension of the SFM by defining the Universal Social Force Model (USFM) that can be used by the robot controller for autonomous navigation.
Computing real-time control while ensuring safety over a fixed time horizon is a challenging task.One approach to address this challenge is Model Predictive Control (MPC), which involves computing optimal control while anticipating future events within a time horizon.However, state-of-the-art works often fail to mention the computational efficiency of MPC in real-time applications.A major drawback of using an MPC controller is that it requires considerable time and computational resources to perform an online optimization problem at each time step, which limits its application in realworld scenarios [10].Control parametrization is an effective solution [11], which significantly reduces the number of control variables in the optimization problem while minimizing performance loss.Recent works have applied control parametrization on real-time simple systems employing linear parameterizations [12] and B-spline [13], and on realtime dynamic vehicles [10].In this work, we propose a new control input parametrization that outperforms previous ones and allows for an efficient real-time implementation on a real robot in complex scenarios.The remainder of this paper is organized as follows.Section II introduces the Universal Social Force Model used in the description of the interaction with the environment.Section III presents the proposed non-linear MPC control framework including the parametrization of the control Input.Section IV describes the evaluation methodology and the comparison results obtained in different simulated and real world scenarios.The last section recalls the main contributions of the work and presents the future perspectives.

II. UNIVERSAL SOCIAL FORCE MODEL
The objective of this section is to introduce a suitable Universal Social Force Model for describing interactions between humans and their dynamic environment, denoted as H2X interactions.To enable the robot to behave in a humanlike manner, the H2X interaction model can be extended for use in R2X interactions, thereby making the interaction more sociable.Various approaches have been proposed to model R2H and R2O interactions between robots and humans, and between robots and obstacles.For instance, the reactive navigation of a robot treating pedestrians as static obstacles is described in [14].This is because in the Social Force Model (SFM) used, humans are not aware of the robot.In contrast, [15] proposes a solution based on a modified version of the SFM that takes into account the interaction force between pedestrians and the robot.This knowledge enables the robot to navigate through crowds using various algorithms.However, most of these works are developed for open environments, and the forces generated by obstacles and other robots are not well managed in indoor environments.To obtain a model capable of accurately describing the dynamic environment and pedestrian behavior in different scenarios, the human model proposed in [4] is considered.In particular the forces that influence the pedestrian i are: where f 0 i (v i ) is the driving force that attracts the pedestrian towards the goal, f ji (x i , x j ) is the repulsive interaction force with other pedestrian j, f oi (x i ) is the repulsive force generated by obstacles and f ai (x i , x a ) is the attraction force generated by points of interest.Most existing works define forces based on the center point of an object, making it unsuitable for long walls.Moreover, critical forces such as the Along Wall Force and the Social Slip Force are missing, which are essential for accurately simulating human behavior in crowded environments.To address these limitations, the proposed Universal Social Force Model (USFM) considers the forces affecting a pedestrian from nearby obstacles based on the closest points rather than the center point of the nearest obstacle.This approach results in smoother and more realistic navigation.The force is inversely proportional to the distance between the pedestrian and the obstacle and is activated when the distance is less than a threshold d o .
To prevent local minima and enable pedestrians to bypass obstacles, an Along Wall Force is added to the model, as illustrated in Figure 1.This force is computed as follows: where n o is the unit directional vector of the wall, dir i is the direction of the pedestrian and A o is the strength of the force.This force is activated when the distance between the pedestrian and the wall is less than a threshold d AW o .To encourage more collaborative behavior among pedestrians during frontal interactions, the Social Slip Force is introduced in the USFM.This force can be seen as a magnetic field that causes pedestrians to "slip" along each other, as illustrated in Fig. 1.The strength of this force is also based on the distance between pedestrians and is calculated as follows: where t ji is the unit directional vector of the new force composed by v iy ; v ix , which are the directional velocities of the pedestrian, and A S,ij is the strength of the force.This force is activated when the interaction angle between the direction of the two pedestrians is θ ij ∈ α, β and the distance between them is less than a threshold d S ij .This force also allows faster pedestrians to surpass slower ones, promoting a smoother movement.In addition to the previous described forces, the proposed Universal Social Force Model (USFM) incorporates a pedestrian's perception zone, which defines the region in which the pedestrian can perceive obstacles and other pedestrians.This zone is represented as an ellipse and the interaction forces are only computed if the obstacles or other pedestrians are within the pedestrian's perception zone.By utilizing the USFM, the robot can estimate the position and velocity of pedestrians in the environment.This information can be utilized by the Model Predictive Control (MPC) to predict future trajectories of the agents in the environment.The MPC can also manage R2O interactions between the robot and obstacles as constraints in the optimization problem.Since the robot is behaving like a pedestrian, generating socially acceptable trajectories, the USFM can manage R2R interactions between the robot and other robots as R2H interactions.

Fig. 1: Along Wall Force and Social Slip Force
The final forces that influence the pedestrian behavior in the USFM are:

FRAMEWORK
In Model Predictive Control, several types of mathematical formulations can be used to find a solution to the optimization problem [12].In order to achieve real-time feasibility, direct methods are widely used.With this method the original continuous system is modeled as a discrete-time system; given the discretized state vector x k ∈ R n and the discretized control input u k ∈ R m at the k-th time step, a transition function f d returns the state vector that should be reached at the next time step.
Finally, the Euler's forward integration is used to approximate the state vector x k+1 by a first-order Taylor expansion: where ∆t is the time sampling step.The general formulation of the nonlinear control problem is usually in the form of a quadratic cost function.The Model Predictive Control optimization problem can be stated by the following: where x = x 0 ; ...; x Np is the sequence of the states, x 0 is assumed known a priori, u = u 0 ; ...; u Np−1 is the sequence of control inputs, x * is the desired state vector and N p is a constant that replaces the prediction horizon T p with the following relation T p = ∆tN p .The constraints (7c) and (7d) can be added in order to enforce explicit constraints to the state or the control input.The constraint (7b) can be enforced in two different ways depending on the chosen decision variables [12].
A. Parameterization of the control input The effectiveness of the Model Predictive Control (MPC) is influenced by the length of the prediction horizon.A longer horizon can consider more optimal trajectories, but the complexity of the problem should be kept manageable.To extend the prediction horizon, one can use a longer sampling time instead of increasing the number of decision variables.However, some systems may require a shorter sampling time for stability in closed [16].Consequently, there is a tradeoff between increasing the prediction horizon and keeping the number of decision variables manageable.Control input parametrization can be a useful approach to addressing this trade-off [12].It aims to minimize performance loss while reducing the number of control variables.With control input parametrization, it is possible to decouple the prediction horizon (T p ) from the control horizon (T c ) allowing the system to have a large prediction horizon.Control input parametrization generates a sequence of control inputs from a set of parameters, which can be either pre-selected constants or tuned online during the optimization process.The approach uses a set of free control inputs as decision variables to define the entire control sequence.
Thanks to this method it is possible to reduce the number of optimization variables and of constraints, making the problem solvable even with limited computational resources.Different works at the state-of-the-art have already highlighted the advantages of using parametrizations in Model Predictive Control.We consider two distinct parameterization techniques, namely linear parameterization [12] and cubic Bspline parameterization [13], to approximate control inputs in our system.In [12] various linear parameterizations were investigated.The linear parameterization establishes a direct linear relationship between the parameters η and the corresponding control inputs u, which is expressed as follows: where η ∈ R Nη are the free control inputs and Π ∈ R m×Nη is the combination matrix.On the other hand [13] explored the implementation and characteristics of the cubic B-spline parameterization, which is expressed as follows: where B i,3 (t) are the cubic B-spline basis functions.Reducing the control horizon N c to be smaller than the prediction horizon N P is a common strategy in control system.The first N c control inputs are freely allocated and optimized online, while the remaining N P − N c control inputs are predetermined, often holding the same value as u Nc−1 [17].This approach reduces the optimization problem's complexity but may lead to instability due to prolonged periods of constant control input.To address stability concerns, alternative strategies known as Move Blocking have been proposed.One such strategy is the Zero Order Hold (ZOH) method, where the control input is held constant between consecutive control points [12].Another technique is the linear interpolation, called the LERP method, which spreads the "degree of freedom" uniformly along the prediction horizon [12].Both Move Blocking methods decrease online computation time by reducing complexity, thus promoting system convergence and maintaining solution accuracy and control performance.However, an even more flexible approach that provides smoother control trajectories is the cubic B-spline parameterization [13].The cubic B-spline utilizes higher-order polynomial interpolation, allowing for continuous and smooth control input changes between points.We propose a novel control parametrization, which is smoother and more compact compared to the LERP parameterization.Additionally, it operates in a single dimension, making it less complex than the B-spline approach.The objective of this parametrization is to reduce the computational time required by the optimization to solve the problem.The Radial Basis Function (RBF) is an excellent technique for interpolating data and functions.A Radial Basis Function depends on the Euclidean distance between given points x.
Approximation with RBF is grid-free and easy to implement [18].Considering a set of points x 1 , x 2 , ..., x N , the Radial Basis Function centered at x j has the following form: where ||x − x j || is the Euclidean norm.
The Radial Basis Function approximation of the control input u is defined as: where N b is the number of free control inputs, ω j is the weight of the thin plate spline function computed to satisfy some boundary conditions, c is the set of corresponding time instants of the free control inputs and t the time sampled from 0 to the prediction horizon N P .The thin plate spline function φ(r) is defined as: One of the main advantages of Thin Plate Spline RBF is the flexibility of the space, making the implementation of this method much easier than other functions [19].This type of parametrization has been already used in the literature to parametrize surfaces [20], and has been shown to provide stable results even in presence of noisy information [21].
We propose to use parametrized MPC in a more complex constrained environment than the one proposed in [12] and [13].The problem is formalized using the single-shooting method, where the Optimal Control Problem (OCP) is rewritten into a smaller optimization problem since only the control inputs are considered as decision variables.Thanks to the USFM, the MPC is able to extract the current positions and velocities of the pedestrians, and it is able to predict their future trajectories (x p k and y p k ).The OCP of the parametrized MPC used to navigate in human populated environment is the following: The control input used in the OCP is approximated with the control parametrizations previously presented.

IV. EXPERIMENTS
In this section, we focus on assessing the effectiveness of the proposed parametrization and validating the global nonlinear model predictive control (MPC) framework outlined in section III.To simulate the kinematic evolution of the mobile platform, which is defined as f d (), we use a nonholonomic model (a classical [2, 0] robot).We conducted experiments in two main environments at INRIA, using both a virtual representation and the real site in each case.The first environment, shown in Figure 3 (left), is a workshop with offices, corridors, and open area for robot experimentation.The second environment, shown in Figure 3 (right), is a conference hall.We used virtual pedestrians modeled with the USFM model and defined waypoints to create different frontal, lateral, and back interactions with the mobile robot.

A. Evaluation Methodology
Defining the evaluation of the concept of efficiency, natural and social behavior in shared spaces poses a challenge as there are no fixed rules that limit the possible use cases.Prior studies have assessed autonomous navigation based on three primary metrics: collision rate, success rate, and time to goal [22].However, recent works propose to include time efficiency and boundary violation as main metrics [23].It is also possible to evaluate the motion safety of the robot in proximity to pedestrians and the quality of autonomous vehicle motion by considering the traveled distance, selected trajectory, and traveled time, as suggested by [24].Navigation can be evaluated using two main categories: motion safety and trajectory quality.
a) Motion Safety: Motion safety in proximity to pedestrians is evaluated by analyzing the collision rate (CR), which measures the collisions occurring between the robot and pedestrians during navigation.A collision occurs when there is an overlap between the footprints of the robot and the pedestrian.A near-zero collision rate is required to validate autonomous navigation.
b) Trajectory Quality: Trajectory quality is another critical factor used to evaluate autonomous navigation and can be divided into two categories: energy and efficiency.Energy can be defined using two different metrics: Path Energy (E P ) and Dynamic Energy (E T ).Path Energy is based on the robot's maneuvers along the trajectory and measures the smoothness of the path.Energy increases when there are several maneuvers in the path, while a zero energy (E P = 0) occurs when the path is straight.Dynamic Energy estimates trajectory energy based on speed, measuring the average instantaneous variation of the speed from the preferred driving speed.Energy increases when instantaneous velocity variation occurs.Efficiency can generally be expressed in terms of time or distance.Traveled distance (C L ) from the starting pose of the robot to the goal pose is an important metric for Trajectory Quality.Another important factor in efficiency is the Relative Time To Goal (T T G ), which evaluates the traveled time to reach the goal at the average time.

B. Control input parametrization comparison
The OCP utilized in this study is the one presented in equation ( 14).The optimization solver employed is CasADi [25], a symbolic framework for algorithmic differentiation and non-linear optimization that is available as an opensource software package.To solve the optimization problem, CasADi will utilize the Interior Point Optimizer (IPOPT), an open-source software package designed for large-scale non-linear optimization.To evaluate the performance of the parametrized Model Predictive Controller (MPC), it was tested in a constrained environment that included both static and dynamic obstacles.The simulation and experimental parameters used in the study are described in Table I.In particular, the pedestrian values were tuned to produce smooth trajectories and interactions in a cluttered environment.TABLE I: Parameters used in the simulations The dynamic agents in the simulation are modeled using the presented USFM model, and they follow specific trajectories to increase the complexity of the simulation.Three different types of control input approximations are considered, while the number of free control inputs N η is fixed at 10, and the number of prediction points N p and sampling time ∆t are varied.During the simulation process, we implemented and tested the cubic B-spline parameterization.However, it was not able to accurately approximate the control input function, rendering the robot unable to navigate among pedestrians in complex environments.It proved inadequate in accurately modeling the control input function due to the limited number of free control points, especially when N p exceeded the number of free control points N p .Table II and Table III show the simulation results obtained with ∆t fixed at 0.1 s and 0.2 s, respectively.The Thin Plate Spline RBF always provides a faster and smoother solution with lower energy consumption than the LERP approximation, as indicated by the Time metric, which represents the average time required by the MPC to solve the OCP, and the other performance metrics used.Moreover, when N P ≫ N η , the LERP approximation leads to high values of Path Energy E P and Dynamic Energy E T , indicating that the robot's navigation is not smooth and it must stop when the environment becomes unsafe with no feasible solutions, leading to the Freezing Robot Problem.In contrast, the Thin Plate Spline RBF consistently produces satisfactory results even when N P is ten times greater than N η .The time required to solve the OCP is always small enough to allow the robot to update the position of the dynamic obstacles in the environment.The Thin Plate Spline RBF outperforms the current parameterizations found in state-of-the-art in the simulations of autonomous navigation in human-populated environments.Note that, in real-world experiments, it was not feasible to compare the two parameterizations as navigation with the LERP approximation was computationally impractical due to the high complexity of the environment.

C. Real world experiments
Real experiments were conducted in both environments using virtual pedestrians simulated on ROS in their respective digital twins During the experiments, the SCOUT MINI AgileX robot, controlled as a classical [2, 0] robot with nonholonomic kinematic constraint, utilized ROS to interface with the scenarios and detect virtual pedestrians as they entered its perception view.By employing the Thin Plate Spline RBF approximation, the OCP's dimension was significantly reduced, resulting in the model's real-time feasibility in the dynamic environment.In these experiments, the prediction horizon time steps were N p = 24, with N b = 9 free control variables.The robot model's sampling time was ∆t = 0.4 s, resulting in a prediction horizon of T p = 9.6 s.Several virtual pedestrians were considered in both environments, with noncollaborative agents included in the second environment, standing in front of vending machines and sofas.Twenty simulations per environment were carried out to extract critical parameters for evaluating autonomous navigation.One simulation was conducted without pedestrians to understand the parameter values.The proposed Model Predictive Control was successful in avoiding the Freezing Robot Problem by collaborating with other pedestrians.As shown in Table IV and Table V, the Collision Rate was always 0. During the optimal solution without pedestrians, the robot followed a straight line since no interactions occurred during the navigation, resulting in a low Path Energy value.Due to the presence of several agents in the environment, the robot had to turn several times to enable a safe and secure navigation, resulting in a higher Path Energy value.The mean value of the Dynamic Energy E T approached zero, indicating that the model actively collaborated with other pedestrians without decreasing its velocity.Despite several interactions, the robot was always near the optimal path during the experiments, as evidenced by the Relative Traveled Distance being close to the optimal value obtained in the empty environment.Regarding the Relative Time to Goal, the robot took longer to reach the final goal during the experiments, as it had to generate socially acceptable trajectories to ensure secure and safe navigation.This work discusses parameterized Model Predictive Control (MPC) for autonomous navigation in a human-populated environment, specifically addressing the Freezing Robot Problem.The Universal Social Force Model (USFM) was introduced to describe pedestrian behavior, and the MPC was optimized for real-time feasibility and quick convergence during autonomous navigation.Comparisons were made between different parameterizations in simulations and real-world experiments, demonstrating the superiority of the proposed approach in constrained environments.The robot successfully avoided the Freezing Robot Problem and interacted safely with pedestrians in indoor settings.We plan to continue our work, improving the MPC by also optimizing the trajectories provided by the USFM with an optimization problem to enable proactive navigation.

TABLE II :
Comparison between LERP and Thin Plate Spline RBF in a dynamic environment with ∆t = 0.1 s

TABLE III :
Comparison between LERP and Thin PlateSpline RBF in a dynamic environment with ∆t = 0.2 s

TABLE IV :
Experiments in workshop scenario

TABLE V :
Experiments in conference hall scenario V. CONCLUSIONS