Multi-Agent Planning

Source:
[1] L. Quan et al., "Robust and efficient trajectory planning for formation flight in dense environments," IEEE Trans. Robot., vol. 39, no. 6, pp. 4785–4804, Dec. 2023, doi: 10.1109/TRO.2023.3301295.
[2] L. Quan, L. Yin, C. Xu, and F. Gao, "Distributed swarm trajectory optimization for formation flight in dense environments," in Proc. Int. Conf. Robot. Autom., 2022, pp. 4979–4985.
[3] Z. Wang, X. Zhou, C. Xu, and F. Gao, "Geometrically constrained trajectory optimization for multicopters," IEEE Trans. Robot., vol. 38, no. 5, pp. 3259–3278, Oct. 2022, doi: 10.1109/TRO.2022.3160022.
[4] K. L. Teo, V. Rehbock, and L. S. Jennings, "A new computational algorithm for functional inequality constrained optimization problems," Automatica, vol. 29, no. 3, pp. 789–792, 1993.

Notation

Symbol	Meaning
$A$	Adjacency matrix of the formation graph
$a_{m}$	Maximum acceleration
$c_{i}$	Coefficients of the $i^{th}$ piece
$D$	Degree matrix of the formation graph
$d_{o}$	Safety threshold
$d_{r}$	Safe clearance between each robot
$des$	Subscript marking a quantity of the desired formation, e.g. ${\hat{L}}_{des}$
$f_{s} (\cdot)$	Differentiable formation similarity error metric: $(5)$
${\tilde{J}}_{⋆} (\cdot)$	Cost or penalty term of the discretized problem $(18)$
$H (\cdot)$	Continuous-time constraint function: similarity in group formation, dynamic feasibility, obstacle avoidance, and reciprocal avoidance of swarms
$J (\cdot)$	2nd-order continuous cost function
$J_{s}$	Formation similarity error
$J_{u}$	Uniform distribution cost
$L$	Laplacian matrix of the formation graph
$\hat{L}$	Symmetric normalized Laplacian matrix
$M_{c}$	Number of sample points with corresponding time stamps
$\mathcal{M} (\cdot)$	Parameter mapping from $(q, T)$ to $c$
$m$	Dimension of the trajectory (here $m = 3$ )
$N$	Number of robots
$P$	Cost: subscript $f$ : swarm formation similarity; $e$ : control effort; $t$ : total time; $o$ : obstacle avoidance; $r$ : swarm reciprocal avoidance; $d$ : dynamic feasibility
$p_{i, k}$	The $k^{th}$ sample point of the $i^{th}$ robot trajectory
$p_{i, k}^{*}$	The optimal $k^{th}$ sample point of the $i^{th}$ robot trajectory
$p (t)$	Trajectory, piecewise polynomial
$p^{(s - 1)} (t)$	The $(s - 1)^{th}$ derivative of the trajectory; the control input is the $s^{th}$ derivative $p^{(s)} (t)$
$p^{[s - 1]} (t)$	$(p (t)^{⊤}, \dot{p} (t)^{⊤}, \dots, p^{(s - 1)} (t)^{⊤})^{⊤} \in R^{m s}$
$p_{i} (t)$	The $i^{th}$ piece of the trajectory
${\bar{p}}_{0}$	Initial state
${\bar{p}}_{f}$	Final state
${\tilde{p}}_{i, j}$	The $j^{th}$ constraint point of the $i^{th}$ piece
$Q$	Degree of the polynomial, here $Q = 2 s - 1 = 5$
$q$	Intermediate waypoints
$s$	Integrator order, if controlling jerk, $s = 3$
$T$	Time allocation for each piece
$T_{Σ}$	Total time of the trajectory
$T_{MINCO}$	The class of trajectory parameterization based on piecewise polynomials aiming at minimizing control effort
$U$	Squared distance vector: $(\| {\hat{p}}_{i, 1}^{*} - {\hat{p}}_{i, 0}^{*} \|_{2}^{2}, \dots, \| {\hat{p}}_{i, M_{c}}^{*} - {\hat{p}}_{i, M_{c} - 1}^{*} \|_{2}^{2}) \in R^{M_{c}}$
$v_{m}$	Maximum velocity
$w_{i}$	The weight vector of robot $i$ , $w_{i} \in R^{N}$ , i.e., the weights of its $N$ adjacent edges
$β (t)$	Polynomial basis vector: $β (t) = [1, t, \dots, t^{Q}]^{⊤}$
$δ$	Sampling time interval
$λ_{s}, λ_{u}$	Weights for the formation similarity and uniform distribution costs
$κ_{i}$	Number of constraint points for the $i^{th}$ piece
$Φ$	All other robots in the swarm
$ϕ$	Element in $Φ$
$ρ$	Time regularization weight

1 Adaptive description of swarm formation (Sec. IV in [1])

1.1 Graph-based Formation Definition

In this article, a swarm formation of $N$ robots is modeled by an undirected graph $G = (V, E)$ , where $V := \{1, 2, \dots, N\}$ is the set of vertices and $E \subset V \times V$ is the set of edges. In graph $G$ , the vertex $i$ represents the $i^{th}$ robot with position vector $p_{i} = [x_{i}, y_{i}, z_{i}] \in R^{3}$ . An edge $e_{i j} \in E$ that connects vertex $i \in V$ and vertex $j \in V$ means that robots $i$ and $j$ can measure the geometric distance between each other.

In our work, each robot can obtain the positions of all robots $\{p_{1}, \dots, p_{i}, \dots, p_{N}\}$ , so the graph $G$ is complete. We then determine the adjacency matrix $A \in R^{N \times N}$ and degree matrix $D \in R^{N \times N}$ of the formation graph $G$ by

\begin{aligned} (1) & \quad A_{i j} = w_{i j} = ∥ p_{i} - p_{j} ∥^{2}, \\ (2) & \quad D_{i j} = \begin{cases} \sum_{j = 1}^{N} A_{i j}, & \text{if } i = j, \\ 0, & \text{otherwise}, \end{cases} \end{aligned}

where the non-negative edge weight $w_{i j}$ is the squared distance between the $i^{th}$ and $j^{th}$ robots, and $∥ \cdot ∥$ denotes the Euclidean norm. Thus, the corresponding Laplacian matrix is

\begin{matrix} (3) & L = D - A . \end{matrix}

With the above matrices, the symmetric normalized Laplacian matrix of graph $G$ is defined as

\begin{matrix} (4) & \hat{L} = D^{- 1 / 2} L D^{- 1 / 2} = I - D^{- 1 / 2} A D^{- 1 / 2}, \end{matrix}

where $I \in R^{N \times N}$ is the identity matrix. $\hat{L}$ contains the information that is invariant to scale, translation, and rotation.

(Explained by ChatGPT 5.5) Distances are invariant to translation and rotation. For uniform scaling:
$p_{i}^{'} = s p_{i}, s > 0 \to w_{i j}^{'} = s^{2} w_{i j} \to A^{'} = s^{2} A, D^{'} = s^{2} D \to {\hat{L}}^{'} = I - (s^{2} D)^{- 1 / 2} (s^{2} A) (s^{2} D)^{- 1 / 2} = \hat{L}$
More precisely, $\hat{L}$ is a shape descriptor invariant under similarity transformations (rotation + translation + uniform scaling). Since distances are also unchanged by reflection, $\hat{L}$ cannot distinguish a formation from its mirror image.
Other properties:
Symmetric and positive semidefinite
Eigenvalues lie in a bounded interval: $0 = λ_{1} \leq λ_{2} \leq \dots \leq λ_{N} \leq 2$
The multiplicity of the zero eigenvalue equals the number of connected components of the graph. Since the graph is complete and the robots occupy distinct positions (all weights positive): $mult (0) = 1 \to λ_{1} = 0, λ_{2} > 0$
The largest eigenvalue satisfies $λ_{N} \geq N / (N - 1)$ , because the eigenvalues sum to the trace $N$ and $λ_{1} = 0$ . For a connected graph, $λ_{N} = 2$ holds only if the graph is bipartite, which a complete graph with $N \geq 3$ is not
The eigenvector for $λ = 0$ is $D^{1 / 2} 1$ , since $\hat{L} D^{1 / 2} 1 = D^{- 1 / 2} L 1 = 0$ . The constant vector $1$ is the null vector of the unnormalized Laplacian $L$ , not of $\hat{L}$
The second smallest eigenvalue (normalized algebraic connectivity) indicates how well connected the weighted graph is: larger values mean the graph is harder to split into weakly coupled groups, while smaller values indicate a more fragile structure
The eigenvalues and eigenvectors of $\hat{L}$ (Laplacian eigenmaps) are also used for spectral clustering, community detection, and dimensionality reduction in graph-based machine learning tasks
Spectral invariance: the spectrum of $\hat{L}$ is a compact shape descriptor. Eigenvalue-only descriptors are attractive for formation recognition because the eigenvalues do not depend on how the robots are numbered.
Robot relabeling (permutation): relabeling the robots with a permutation matrix $Π$ maps $\hat{L}$ to $Π \hat{L} Π^{⊤}$ . The spectrum is unchanged, but the matrix itself is not, so the entry-wise metric $∥ \hat{L} - {\hat{L}}_{des} ∥_{F}^{2}$ in $(5)$ assumes a fixed assignment between robot $i$ and vertex $i$ of the desired formation.
The trace of $\hat{L}$ is equal to the number of vertices $N$ (since the diagonal entries are all 1)
The squared Frobenius norm is $∥ \hat{L} ∥_{F}^{2} = N + \sum_{i \neq j} A_{i j}^{2} / (D_{i i} D_{j j})$ , so it depends only on the degree-normalized edge weights
The normalized Laplacian can be interpreted as a discrete analog of the continuous Laplace-Beltrami operator on a manifold, which is useful for analyzing the geometry of the formation
The normalized Laplacian can be used to define a diffusion process on the graph, which can model how information or influence spreads through the swarm formation
The spectrum alone does not always uniquely determine the formation: different formations can occasionally be cospectral. In practice, the spectrum of $\hat{L}$ is often sufficient to distinguish between different formation shapes, especially when combined with other features or constraints.
The normalized Laplacian is particularly useful for comparing formations of different physical scales, since the degree normalization removes the overall scale. Comparing $\hat{L}$ and ${\hat{L}}_{des}$ entry by entry, as in $(5)$ , still requires both formations to have the same number of robots $N$ .
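
As a small worked example (an illustration added here, not taken from [1]), consider $N = 3$ robots at the corners of an equilateral triangle with side length $d$ . Every edge weight is $d^{2}$ , so $(1)$ – $(4)$ give

\begin{aligned} A = d^{2} \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}, \quad D = 2 d^{2} I, \quad \hat{L} = I - \frac{1}{2 d^{2}} A = \begin{pmatrix} 1 & - \frac{1}{2} & - \frac{1}{2} \\ - \frac{1}{2} & 1 & - \frac{1}{2} \\ - \frac{1}{2} & - \frac{1}{2} & 1 \end{pmatrix} . \end{aligned}

The side length $d$ cancels, which is the scale invariance in concrete form. The eigenvalues of this $\hat{L}$ are $0, 3 / 2, 3 / 2$ , consistent with the interval $[0, 2]$ and with a trace of $N = 3$ .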

Finally, we use graph theory to describe various desired formation shapes, such as squares, hexagons, and pyramids.

By specifying the desired positions $p_{i}^{d} = [x_{i}^{d}, y_{i}^{d}, z_{i}^{d}] \in R^{3}, i = 1, \dots, N$ , computing ${\hat{L}}_{des}$ via $(1)$ – $(4)$ is straightforward.

It's important to note that the desired formation shape is independent of the coordinate system as long as the relative positions are provided.

1.2 Differentiable Formation Similarity Error Metric

To assess the deviation from the desired formation, we propose a differentiable formation similarity error metric as

\begin{aligned} & f_{s} = f_{s} (p_{1}, \dots, p_{i}, \dots, p_{N}) = f_{s} (A, D) = f_{s} (\hat{L}, {\hat{L}}_{des}) \\ (5) & \quad = ∥ \hat{L} - {\hat{L}}_{des} ∥_{F}^{2} = \operatorname{tr} \{ (\hat{L} - {\hat{L}}_{des})^{⊤} (\hat{L} - {\hat{L}}_{des}) \}, \end{aligned}

where $\operatorname{tr} \{\cdot\}$ denotes the trace of a matrix, $\hat{L}$ is the symmetric normalized Laplacian of the current swarm formation, and ${\hat{L}}_{des}$ is its counterpart for the desired formation. The Frobenius norm $∥ \cdot ∥_{F}$ is used as the distance metric.

As a graph representation matrix, $\hat{L}$ contains information about the graph structure. This allows $f_{s}$ to consider only the geometric shape of the formation, and not be influenced by scaling, translation, or rotation. Additionally, $f_{s}$ is a dimensionless value that solely reflects the error in formation shape similarity.
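
The metric can be sketched in a few lines of plain C++. This is a minimal illustration of $(1)$ – $(5)$ only, not the implementation from [1]; the names `normalizedLaplacian` and `formationSimilarityError` are hypothetical, and the sketch assumes at least two robots at distinct positions so that every degree is positive.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec3 = std::array<double, 3>;
using Matrix = std::vector<std::vector<double>>;

// Symmetric normalized Laplacian (4) of the complete formation graph,
// with squared distances as edge weights (1) and row sums as degrees (2).
// Assumes at least two robots at distinct positions (all degrees positive).
Matrix normalizedLaplacian(const std::vector<Vec3>& p) {
    const std::size_t n = p.size();
    assert(n >= 2);
    Matrix A(n, std::vector<double>(n, 0.0));
    std::vector<double> deg(n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            if (i == j) continue;
            double w = 0.0;
            for (std::size_t a = 0; a < 3; ++a) {
                const double d = p[i][a] - p[j][a];
                w += d * d;
            }
            A[i][j] = w;
            deg[i] += w;
        }
    }
    Matrix Lhat(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            const double identity = (i == j) ? 1.0 : 0.0;
            Lhat[i][j] = identity - A[i][j] / std::sqrt(deg[i] * deg[j]);
        }
    }
    return Lhat;
}

// Formation similarity error (5): squared Frobenius distance between the
// normalized Laplacians of the current and the desired formation.
double formationSimilarityError(const std::vector<Vec3>& p,
                                const std::vector<Vec3>& pDes) {
    assert(p.size() == pDes.size());
    const Matrix L = normalizedLaplacian(p);
    const Matrix Ldes = normalizedLaplacian(pDes);
    double fs = 0.0;
    for (std::size_t i = 0; i < L.size(); ++i) {
        for (std::size_t j = 0; j < L.size(); ++j) {
            const double e = L[i][j] - Ldes[i][j];
            fs += e * e;
        }
    }
    return fs;
}
```

Evaluating `formationSimilarityError` on a rotated, scaled, and translated copy of the desired formation should return a value at rounding-error level, while a distorted shape yields a clearly positive error.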

In particular, under the distributed framework, each robot can only change its own position to reduce the overall formation similarity error. Therefore, the only variable for robot $i$ in $(5)$ is $p_{i}$ , and $f_{s} (p_{1}, \dots, p_{i}, \dots, p_{N})$ can be simplified as $f_{s} (p_{i})$ .

Our metric is analytically differentiable with respect to the position of each robot. For robot $i$ , we use the weights of its $N$ adjacent edges $\{e_{i 1}, e_{i 2}, \dots, e_{i N}\}$ to form a weight vector $w_{i} = [w_{i 1}, w_{i 2}, \dots, w_{i N}]^{⊤}$ . By the chain rule, the gradient of $f_{s}$ with respect to $p_{i}$ can be written as

\begin{matrix} (6) & \frac{\partial f_{s}}{\partial p_{i}} = \frac{\partial f_{s}}{\partial w_{i}^{⊤}} \frac{\partial w_{i}}{\partial p_{i}} . \end{matrix}

According to our metric $(5)$ , the gradient of $f_{s}$ with respect to each weight $w_{i j}$ can be computed as follows

\begin{aligned} & \frac{\partial f_{s}}{\partial w_{i j}} = \operatorname{tr} \left\{ \left( \frac{\partial f_{s}}{\partial \hat{L}} \right)^{⊤} \frac{\partial \hat{L}}{\partial w_{i j}} \right\}, \\ & \frac{\partial f_{s}}{\partial \hat{L}} = \frac{\partial ∥ \hat{L} - {\hat{L}}_{des} ∥_{F}^{2}}{\partial \hat{L}} = 2 (\hat{L} - {\hat{L}}_{des}), \\ (7) & \frac{\partial \hat{L}}{\partial w_{i j}} = - \frac{\partial (D^{- 1 / 2} A D^{- 1 / 2})}{\partial w_{i j}} . \end{aligned}

Then the gradient $\partial f_{s} / \partial w_{i}$ can be written as

\begin{matrix} (8) & \partial f_{s} / \partial w_{i} = [\partial f_{s} / \partial w_{i 1}, \partial f_{s} / \partial w_{i 2}, . . ., \partial f_{s} / \partial w_{i N}]^{⊤} . \end{matrix}

As for $\partial w_{i} / \partial p_{i}$ , the Jacobian can be easily derived since the weight function $(1)$ is a differentiable quadratic form.
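
For completeness, the Jacobian can be written out explicitly (a short derivation added here from $(1)$ ; the row-vector layout is an assumption chosen to match $(6)$ ). Since $w_{i j} = (p_{i} - p_{j})^{⊤} (p_{i} - p_{j})$ ,

\begin{aligned} \frac{\partial w_{i j}}{\partial p_{i}} = 2 (p_{i} - p_{j})^{⊤} \in R^{1 \times 3}, \quad \frac{\partial w_{i}}{\partial p_{i}} = 2 \begin{pmatrix} (p_{i} - p_{1})^{⊤} \\ \vdots \\ (p_{i} - p_{N})^{⊤} \end{pmatrix} \in R^{N \times 3}, \end{aligned}

where the row with $j = i$ is zero. Substituting into $(6)$ gives $\partial f_{s} / \partial p_{i} = 2 \sum_{j = 1}^{N} (\partial f_{s} / \partial w_{i j}) (p_{i} - p_{j})^{⊤}$ .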

1.3 Optimal Formation Position Sequence

Previous work [2] incorporated $f_{s}$ directly into the trajectory optimization, making formation flight a coupled trajectory optimization problem. While this method is suitable for small-scale formation flight, it becomes computationally inefficient as the number $N$ of robots increases.

Considering the simplified equation for coupled trajectory optimization

\begin{matrix} (9) & \min_{p_{i, 0}, \dots, p_{i, M_{c}}} \sum_{k = 0}^{M_{c}} f_{s} (p_{i, k}) + J_{other}, \end{matrix}

where, for convenience, $p_{i, k}$ denotes the $k^{th}$ sample point of the $i^{th}$ robot's trajectory in $(19)$ . $J_{other}$ represents all other cost functions, and $M_{c}$ is the number of sample points with corresponding timestamps.

The primary purpose of calculating $f_{s}$ is to supply gradient information for minimizing formation similarity error. However, since the graph $G$ is a complete graph, computing $f_{s}$ has a complexity of $O (N^{2})$ .

Consequently, the coupled trajectory optimization $(9)$ also exhibits high complexity of $O (N^{2})$ in each iteration, limiting its applicability to large-scale swarm operations.

To address this issue, we must identify an equivalent approach with reduced computational complexity to replace the function of $f_{s}$ in $(9)$ . We introduce the concept of optimal formation position $p_{i, k}^{*}$ for robot $i$ at timestamp $k$ , which is the position that minimizes the formation similarity error $f_{s}$ . Fig.1(a) illustrates this concept using a 2D formation as an example.

Figure 1 [1]: Illustration of the optimal formation position sequence using a 2D formation. (a) The surface shows the profile of the similarity metric when one UAV moves in the plane and the other three remain still. The minimum suggests the optimal formation position to form the desired shape. (b) The sequence of optimal formation positions corresponds to the timestamps.

It is evident from the figure that there exists an optimal formation position $p_{i, k}^{*}$ that results in a minimal formation similarity error, at which the gradient vanishes: $\partial f_{s} / \partial p_{i, k} = 0$ at $p_{i, k} = p_{i, k}^{*}$ .

Over the future horizon with timestamps $\{0, \dots, k, \dots, M_{c}\}$ , we represent the expected positions of robot $i$ by the optimal formation position sequence $p_{i}^{*} = \{p_{i, 0}^{*}, \dots, p_{i, k}^{*}, \dots, p_{i, M_{c}}^{*}\}$ , as shown in Fig. 1(b). By precomputing $p_{i}^{*}$ , we can use the quadratic distance to it in place of the gradient information offered by $f_{s}$ in $(9)$ , thus decreasing the computational requirements as follows

\begin{matrix} (10) & f_{s} (p_{i, k}) \Rightarrow ∥ p_{i, k} - p_{i, k}^{*} ∥^{2} . \end{matrix}

Since $f_{s}$ and the quadratic distance cost attain their minimum at the same point $p_{i, k}^{*}$ , the trajectory approaches the positions with minimal formation similarity error, maintaining the desired formation.

Thus, we can effectively solve the coupled trajectory optimization with a two-step procedure

\begin{aligned} & 1. \; p_{i}^{*} = \arg \min \sum_{k = 0}^{M_{c}} f_{s} (p_{i, k}), \\ (11) & \overset{p_{i}^{*}}{\Rightarrow} \; 2. \; \min_{p_{i, 0}, \dots, p_{i, M_{c}}} \sum_{k = 0}^{M_{c}} ∥ p_{i, k} - p_{i, k}^{*} ∥^{2} + J_{other} . \end{aligned}

As a result, the previously required calculation of $f_{s}$ in each trajectory optimization process is replaced by the computation of the quadratic distance, simplifying the optimization problem. This significantly reduces computational demands and enables large-scale swarm formation.

Formula $(11)$ indicates that trajectory optimization in the next section is performed on discretized points. Non-uniform discretized points may lead to poor trajectories and sub-optimal performance. Therefore it is crucial to ensure a uniform distribution of these points to maintain the effectiveness of the optimization process.

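In engineering practice, since the graphs $G$ are constructed separately at each of the discretized timestamps depicted in Fig. 1(b), each $p_{i, k}^{*}$ is computed independently of its neighbors in time, so the resulting sequence is not necessarily evenly spaced.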

To ensure a smoother trajectory, we introduce the uniform optimal formation position sequence ${\hat{p}}_{i}^{*}$ , which is generated by considering the formation similarity error $J_{s}$ and the uniform distribution cost $J_{u}$

\begin{aligned} (12) & {\hat{p}}_{i}^{*} = \arg \min λ_{s} J_{s} + λ_{u} J_{u}, \\ & J_{s} = \sum_{k = 0}^{M_{c}} f_{s} ({\hat{p}}_{i, k}^{*}), \\ (13) & J_{u} = E (U^{2}) - E (U)^{2} = \frac{∥ U ∥_{2}^{2}}{M_{c}} - \frac{∥ U ∥_{1}^{2}}{(M_{c})^{2}}, \end{aligned}

where $λ_{s}$ and $λ_{u}$ are the relative weights, $E (\cdot)$ is the mathematical expectation (here the mean over the $M_{c}$ entries), and the squared distance vector $U \in R^{M_{c}}$ is

\begin{matrix} (14) & U = (∥ {\hat{p}}_{i, 1}^{*} - {\hat{p}}_{i, 0}^{*} ∥_{2}^{2}, \dots, ∥ {\hat{p}}_{i, M_{c}}^{*} - {\hat{p}}_{i, M_{c} - 1}^{*} ∥_{2}^{2}) . \end{matrix}

We use the quasi-Newton method to solve this unconstrained optimization problem $(12)$ and generate uniform ${\hat{p}}_{i}^{*}$ for the later trajectory optimization $(18)$ .

By doing so, the trajectory resulting from these discretized points in the next section can be smoother and avoid sudden spatial changes.
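
The uniform distribution cost $(13)$ is simply the variance of the squared segment lengths in $(14)$ . The following is a minimal sketch under that reading; the function name `uniformDistributionCost` is hypothetical and not part of the referenced code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Vec3 = std::array<double, 3>;

// Uniform distribution cost (13): variance of the squared distances between
// consecutive points, i.e. of the entries of U in (14).
// pts holds the M_c + 1 points of one robot's sequence, in time order.
double uniformDistributionCost(const std::vector<Vec3>& pts) {
    assert(pts.size() >= 2);
    const std::size_t segments = pts.size() - 1;  // M_c
    double sumU = 0.0;   // equals ||U||_1 because every entry is non-negative
    double sumU2 = 0.0;  // equals ||U||_2^2
    for (std::size_t k = 1; k <= segments; ++k) {
        double u = 0.0;
        for (std::size_t a = 0; a < 3; ++a) {
            const double d = pts[k][a] - pts[k - 1][a];
            u += d * d;
        }
        sumU += u;
        sumU2 += u * u;
    }
    const double m = static_cast<double>(segments);
    const double mean = sumU / m;
    return sumU2 / m - mean * mean;
}
```

Evenly spaced points give a cost of zero. Points at $x = 0, 1, 3$ on a line give $U = (1, 4)$ and a cost of $8.5 - 2.5^{2} = 2.25$ .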

2 Spatial-temporal trajectory optimization for formation flight

2.1 Trajectory Representation

The differential flatness of multicopters benefits trajectory generation without integrating differential equations. Moreover, the motion planning of multicopters can be performed on low-dimensional smooth trajectories.

Here we adopt a state-of-the-art trajectory representation named MINCO [3] to achieve minimum-control-effort spatial-temporal trajectory planning for swarm aerial robots in three-dimensional environments. MINCO conducts spatial-temporal deformation of the flat-output $M$ -piece trajectory $p (t)$ by decoupling the space and time parameters with a linear-complexity mapping $\mathcal{M}$

\begin{matrix} (15) & p (t) = \mathcal{M}_{q, T} (t), \forall t \in [t_{0}, t_{M}], \end{matrix}

where $q = (q_{1}, \dots, q_{M - 1})^{⊤} \in R^{3 \times (M - 1)}$ are the intermediate points between each pair of connected pieces and $T = (T_{1}, \dots, T_{M})^{⊤} \in R_{> 0}^{M}$ collects the time durations of the pieces.

An $m$ -dimensional $M$ -piece trajectory $p (t)$ is represented by piecewise polynomials. The $i^{th}$ piece $p_{i} (t)$ is defined as a polynomial of degree $Q$ ( $Q = 5$ in this paper)

\begin{matrix} (16) & p_{i} (t) = c_{i}^{⊤} β (t), \forall t \in [0, T_{i}], \end{matrix}

where $c_{i} \in R^{(Q + 1) \times m}$ is the coefficient matrix and $β (t) = [t^{0}, t^{1}, \dots, t^{Q}]^{⊤}$ is the natural basis.

For an $s$ -integrator chain ( $s = 3$ here) with fixed boundary conditions, the minimum-control-effort $M$ -piece trajectory $p (t)$ of degree $2 s - 1$ is fully determined by $\{q, T\}$ .

Furthermore, MINCO converts $\{q, T\}$ to $\{c, T\}$ with a parameter mapping $c = \mathcal{M} (q, T)$ of linear time and space complexity, where $c = (c_{1}^{⊤}, \dots, c_{M}^{⊤})^{⊤}$ collects the polynomial coefficients.
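
To make $(16)$ concrete, the sketch below evaluates one axis of a single piece and its derivatives in the natural basis. It only illustrates the polynomial evaluation and does not implement the MINCO mapping itself; `Coeffs` and `evalPiece` are hypothetical names, and the coefficients are assumed to be ordered from $t^{0}$ to $t^{Q}$ .

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr int kDegree = 5;  // Q = 2s - 1 with s = 3
using Coeffs = std::array<double, kDegree + 1>;  // one axis of c_i: c_0..c_Q

// d-th time derivative of p(t) = sum_n c_n t^n, evaluated with Horner's rule.
// The n-th term contributes n (n-1) ... (n-d+1) c_n t^(n-d).
double evalPiece(const Coeffs& c, double t, int d) {
    assert(d >= 0);
    double value = 0.0;
    for (int n = kDegree; n >= d; --n) {
        double factor = 1.0;
        for (int j = 0; j < d; ++j) {
            factor *= static_cast<double>(n - j);
        }
        value = value * t + factor * c[static_cast<std::size_t>(n)];
    }
    return value;  // zero when d exceeds the degree
}
```

For example, with coefficients $(1, 2, 3, 4, 5, 6)$ and $t = 2$ , the position is $321$ , the velocity is $702$ , and the jerk is $1704$ .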

2.2 Problem Formulation

After determining the desired formation shape in the last section, we seek a set of trajectories for the swarm robots that are smooth, collision-free, and formation-preserving.

In practice, navigating swarm robots in an unknown dense environment with FOV-limited sensors and onboard computer requires an efficient real-time planner focusing on local information. Besides, centralized optimization is limited by the scale of the swarm.

Therefore, we choose a distributed local trajectory optimization for formation flight as follows

\begin{aligned} (17a) & \min_{q, T} \int_{t_{0}}^{t_{M}} ∥ p^{(s)} (t) ∥^{2} d t + ρ \cdot T_{Σ}, \\ (17b) & \text{s.t.} \quad p (t) = \mathcal{M}_{q, T} (t), \forall t \in [t_{0}, t_{M}], \\ (17c) & \qquad p^{[s - 1]} (0) = {\bar{p}}_{0}, \\ (17d) & \qquad p^{[s - 1]} (t_{M}) = {\bar{p}}_{f}, \\ (17e) & \qquad H (p (t), \dots, p^{(s)} (t)) ⪯ 0, \forall t \in [t_{0}, t_{M}] . \end{aligned}

We define the cost $(17 a)$ to trade off smoothness against aggressiveness and thus achieve smooth and efficient flight. $ρ$ is the time regularization parameter and $T_{Σ} = \sum_{i = 1}^{M} T_{i}$ is the total time. The robot trajectory $p (t)$ in $(17 b)$ is parameterized by the optimization variables $\{q, T\}$ . The vector $p^{[s - 1]} (t) = (p (t)^{⊤}, \dot{p} (t)^{⊤}, \dots, p^{(s - 1)} (t)^{⊤})^{⊤} \in R^{m s}$ stacks the position and its derivatives up to order $s - 1$ , i.e., the state of the $s$ -integrator chain. The boundary conditions are the initial state ${\bar{p}}_{0} \in R^{m s}$ in $(17 c)$ and the terminal state ${\bar{p}}_{f} \in R^{m s}$ in $(17 d)$ . The continuous-time constraints $H$ in $(17 e)$ cover swarm formation similarity, dynamic feasibility, obstacle avoidance, and swarm reciprocal avoidance.

2.3 Constraints Transcription

To solve the continuous constrained optimization problem $(17)$ in real time, we optimize directly over the MINCO variables in $(15)$ , which satisfy the equality constraints $(17 b)$ – $(17 d)$ by construction (see MINCO [3] for details). The inequality constraints $(17 e)$ are handled with the penalty function method. Every integral is then evaluated as a finite sum over sample points.

Penalty function method in [4] (Summarized by ChatGPT 5.5)
A functional inequality constrained optimization problem has the form
$\begin{aligned} \min_{x} \; & f (x), \\ \text{s.t.} \; & g (x, t) \leq 0, \forall t \in [0, T], \end{aligned}$
where $f (x)$ is the cost function, $g (x, t)$ is the constraint function, and $t$ is a continuous variable. The penalty function method transforms this problem into an unconstrained optimization problem by introducing a penalty term that penalizes any violation of the constraints. The transformed problem can be expressed as
$\min_{x} f (x) + r \int_{0}^{T} \max (0, g (x, t))^{2} d t,$
where $r$ is a penalty parameter that controls the weight of the penalty term. The key observation is that $\max (0, g (x, t))^{2}$ is zero when the constraint is satisfied (i.e., $g (x, t) \leq 0$ ) and positive when the constraint is violated (i.e., $g (x, t) > 0$ ).
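
A one-dimensional worked example (constructed here for illustration, not taken from [4]) shows the effect of $r$ . Take $f (x) = (x - 2)^{2}$ and $g (x, t) = x - 1 - t$ on $t \in [0, 1]$ . The constraint is tightest at $t = 0$ , so the feasible set is $x \leq 1$ and the constrained minimizer is $x = 1$ . For $1 \leq x \leq 2$ the constraint is violated on $t \in [0, x - 1)$ , hence

\begin{aligned} & \int_{0}^{1} \max (0, x - 1 - t)^{2} d t = \int_{0}^{x - 1} (x - 1 - t)^{2} d t = \frac{(x - 1)^{3}}{3}, \\ & \frac{d}{d x} \left[ (x - 2)^{2} + \frac{r}{3} (x - 1)^{3} \right] = 2 (x - 2) + r (x - 1)^{2} = 0 \; \Rightarrow \; x_{r} = 1 + \frac{- 1 + \sqrt{1 + 2 r}}{r} . \end{aligned}

This gives $x_{4} = 1.5$ , $x_{12} \approx 1.33$ , and $x_{40} = 1.2$ : the penalized minimizer approaches the constrained one from the infeasible side as $r$ grows, which is why safety-critical penalties are given large weights.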

Finally, the continuous constrained optimization problem is converted to a discrete unconstrained optimization problem

\begin{matrix} (18) & min_{q, T} \sum_{⋆} λ_{⋆} {\tilde{J}}_{⋆} (q, T, δ), \end{matrix}

where ${\tilde{J}}_{⋆}$ are the individual cost or penalty terms, and $λ_{⋆}$ are their relative weights. Subscripts $⋆ \in \{f, e, t, o, r, d\}$ :

( $f$ ) swarm formation similarity;
( $e$ ) control effort;
( $t$ ) total time;
( $o$ ) obstacle avoidance;
( $r$ ) swarm reciprocal avoidance;
( $d$ ) dynamic feasibility.

$δ$ is the sampling time interval.

In the previous work [2], we sampled a fixed number of points per piece, ${\hat{p}}_{i, k} = p_{i} ((k / κ_{i}) \cdot T_{i})$ , to transcribe the optimization problem, where $p_{i} (t)$ is the $i^{th}$ piece of the trajectory and $κ_{i}$ is the fixed number of samples on this piece.

However, because the piece durations and hence the total time $T_{Σ}$ change during the optimization process, the timestamps of these fixed-number sampling points ${\hat{p}}_{i, k}$ shift from iteration to iteration, and the points are difficult to space equally over the whole trajectory.

Therefore, we take fixed-time-interval sampling points for the whole trajectory to ensure the accuracy of the penalty function sampling transformation

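\begin{aligned} & {\tilde{p}}_{k} (t) = p_{i} \left( k δ - \sum_{l = 1}^{i - 1} T_{l} \right), \\ (19) & k \in \{0, \dots, κ\}, \quad κ = \left\lfloor \frac{T_{Σ}}{δ} \right\rfloor, \end{aligned}

where $κ$ is the number of sampling intervals, $i$ is the index of the piece that contains the time $k δ$ , and $T_{l}$ , $1 ⩽ l < i$ , are the durations of the pieces preceding piece $i$ .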
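
The index bookkeeping in $(19)$ can be sketched as follows. This is an illustrative helper, not code from the referenced repository; `Sample` and `sampleFixedInterval` are hypothetical names, piece indices are zero-based, and a sample that falls exactly on a junction is assigned to the later piece.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Sample {
    std::size_t piece;  // zero-based index of the piece containing k * delta
    double localTime;   // k * delta minus the durations of all earlier pieces
};

// Fixed-time-interval sampling (19): global timestamps k * delta stay the same
// no matter how the piece durations T change during optimization.
std::vector<Sample> sampleFixedInterval(const std::vector<double>& T, double delta) {
    assert(!T.empty() && delta > 0.0);
    double total = 0.0;  // T_Sigma
    for (double Ti : T) {
        assert(Ti > 0.0);
        total += Ti;
    }
    const std::size_t kappa = static_cast<std::size_t>(std::floor(total / delta));
    std::vector<Sample> samples;
    samples.reserve(kappa + 1);
    std::size_t piece = 0;
    double pieceStart = 0.0;  // sum of the durations of the preceding pieces
    for (std::size_t k = 0; k <= kappa; ++k) {
        const double t = static_cast<double>(k) * delta;
        while (piece + 1 < T.size() && t >= pieceStart + T[piece]) {
            pieceStart += T[piece];
            ++piece;
        }
        samples.push_back({piece, t - pieceStart});
    }
    return samples;
}
```

With durations $(1.0, 0.5)$ and $δ = 0.4$ , this yields $κ = 3$ and four samples; the last one, at $t = 1.2$ , lies in the second piece at local time $0.2$ .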

For the trajectory planning of swarm robots, the fixed time interval sampling points ${\tilde{p}}_{k} (t)$ can simplify the optimization problem. Compared with ${\hat{p}}_{i, k}$ , the timestamp corresponding to ${\tilde{p}}_{k} (t)$ is fixed, so the states of other robots at this timestamp are also constant during the optimization process.

Therefore, it is feasible to calculate the states of other robots w.r.t ${\tilde{p}}_{k} (t)$ according to the broadcast trajectories before optimization.

Then we can solve the uniform formation position sequence optimization $(12)$ in advance and use ${\hat{p}}_{i}^{*}$ to replace the formation similarity metric $f_{s}$ in trajectory optimization $(17 a)$ of $i^{t h}$ robot. This decoupled formation trajectory optimization results in higher computational efficiency, making our method suitable for large-scale swarm robots.

Although the objective is not differentiable at the values of $T$ where the sampling number $κ$ changes, the cost function remains continuous w.r.t. the time durations $T$ . We therefore still use the quasi-Newton method to solve the non-smooth discrete unconstrained optimization problem $(18)$ .

2.4 Cost Functions and Gradients

Given the fixed sampling time interval $δ$ , we can evaluate the cost functions and gradients of the whole trajectory by a finite sum of sampling points ${\tilde{p}}_{k} (t)$ .

The cost of each general-purpose penalty at the $k^{th}$ sampling point is

\begin{matrix} (20) & P_{⋆} (c, T, k δ) = P_{⋆} ({\tilde{p}}_{k} (t)), \end{matrix}

then the cost function ${\tilde{J}}_{⋆}$ in $(18)$ is calculated as follows

\begin{aligned} & {\tilde{J}}_{⋆} (q, T, δ) = J_{⋆} (c, T, δ) \\ (21) & \quad = δ \sum_{k = 0}^{κ} {\bar{ω}}_{k} P_{⋆} (c, T, k δ) + \frac{1}{2} (T_{Σ} - κ δ) [P_{⋆} (c, T, κ δ) + P_{⋆} (c, T, T_{Σ})], \end{aligned}

where $({\bar{ω}}_{0}, {\bar{ω}}_{1}, \dots, {\bar{ω}}_{κ - 1}, {\bar{ω}}_{κ}) = (1 / 2, 1, \dots, 1, 1 / 2)$ are the quadrature coefficients of the trapezoidal rule. The last term integrates the remaining interval $[κ δ, T_{Σ}]$ , which is shorter than $δ$ .
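
A minimal sketch of the quadrature in $(21)$ is given below. The penalty is passed as a plain function of the global time, which is a simplification made here for illustration (in the paper it depends on $c$ and $T$ ); `sampledCost` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>

// Approximates the integral of penalty(t) over [0, totalTime] as in (21):
// trapezoidal rule on the fixed grid k * delta, plus one trapezoid for the
// remaining interval [kappa * delta, totalTime].
double sampledCost(const std::function<double(double)>& penalty,
                   double totalTime, double delta) {
    assert(totalTime > 0.0 && delta > 0.0);
    const std::size_t kappa =
        static_cast<std::size_t>(std::floor(totalTime / delta));
    double sum = 0.0;
    if (kappa > 0) {  // with kappa == 0 the grid part has zero width
        for (std::size_t k = 0; k <= kappa; ++k) {
            const double weight = (k == 0 || k == kappa) ? 0.5 : 1.0;
            sum += weight * penalty(static_cast<double>(k) * delta);
        }
    }
    const double tEnd = static_cast<double>(kappa) * delta;
    const double tail =
        0.5 * (totalTime - tEnd) * (penalty(tEnd) + penalty(totalTime));
    return delta * sum + tail;
}
```

Because the trapezoidal rule is exact for linear integrands, `sampledCost` with the penalty $P (t) = t$ , a total time of $1.0$ , and $δ = 0.3$ returns $0.5$ up to rounding. The tail term is what keeps the result continuous when $T_{Σ}$ crosses a multiple of $δ$ .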

Moreover, MINCO allows any second-order continuous cost function ${\tilde{J}}_{⋆} (q, T)$ to be represented by $J_{⋆} (c, T)$ . Hence, $\partial {\tilde{J}}_{⋆} / \partial q$ and $\partial {\tilde{J}}_{⋆} / \partial T$ can be efficiently obtained from $\partial J_{⋆} / \partial c$ and $\partial J_{⋆} / \partial T$ , respectively, which simplifies the construction and solution of the optimization problem.

In $(19)$ , the sampling time $t = k δ - \sum_{l = 1}^{i - 1} T_{l}$ depends on the preceding durations $T_{l}$ , so the gradients of $J_{⋆}$ w.r.t. $c_{i}$ and $T_{l}$ are computed as

\begin{aligned} (22) & \frac{\partial J_{⋆}}{\partial c_{i}} = \frac{\partial J_{⋆}}{\partial P_{⋆}} \frac{\partial P_{⋆}}{\partial {\tilde{p}}_{k} (t)} \frac{\partial {\tilde{p}}_{k} (t)}{\partial c_{i}}, \\ (23) & \frac{\partial J_{⋆}}{\partial T_{l}} = \frac{\partial J_{⋆}}{\partial P_{⋆}} \frac{\partial P_{⋆}}{\partial {\tilde{p}}_{k} (t)} \frac{\partial {\tilde{p}}_{k} (t)}{\partial t} \frac{\partial t}{\partial T_{l}}, \\ (24) & \frac{\partial {\tilde{p}}_{k} (t)}{\partial c_{i}} = β (t), \quad \frac{\partial {\tilde{p}}_{k} (t)}{\partial t} = {\dot{\tilde{p}}}_{k} (t), \quad \frac{\partial t}{\partial T_{l}} = \begin{cases} 0, & l = i, \\ - 1, & l < i, \end{cases} \end{aligned}

where $\partial J_{⋆} / \partial P_{⋆}$ follows directly from $(21)$ , and the penalties $P_{⋆} ({\tilde{p}}_{k} (t))$ for the individual purposes are given as follows.

Cost of Swarm Formation Similarity $P_{f}$. In Sec. 1.3, we decouple the formation similarity error metric from trajectory optimization by constructing an unconstrained optimization problem to calculate the uniform optimal formation position sequence ${\hat{p}}_{i}^{*}$ for each sampling point. This improvement avoids repeated evaluations of the formation similarity metric $f_{s}$ . Then, we use the quadratic form to calculate the cost of swarm formation similarity

\begin{matrix} (25) & P_{f} ({\tilde{p}}_{k} (t)) = \max \{ ∥ {\tilde{p}}_{k} (t) - {\hat{p}}_{i, k}^{*} ∥^{2}, 0 \}^{3} . \end{matrix}

Control Effort $J_{e}$. The control effort is the integrated squared $s^{th}$ derivative of the trajectory ( $s = 3$ here, i.e., jerk). It and its gradients are written as

\begin{aligned} (26) & J_{e} = \sum_{i = 1}^{M} \int_{0}^{T_{i}} ∥ p_{i}^{(s)} (t) ∥^{2} d t, \\ (27) & \frac{\partial J_{e}}{\partial c_{i}} = 2 \left( \int_{0}^{T_{i}} β^{(s)} (t) β^{(s)} (t)^{⊤} d t \right) c_{i}, \\ (28) & \frac{\partial J_{e}}{\partial T_{i}} = c_{i}^{⊤} β^{(s)} (T_{i}) β^{(s)} (T_{i})^{⊤} c_{i} . \end{aligned}
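
For $s = 3$ and $Q = 5$ the integral in $(26)$ has a closed form, obtained by expanding the squared jerk $(6 c_{3} + 24 c_{4} t + 60 c_{5} t^{2})^{2}$ and integrating term by term. The sketch below covers one piece and one axis; the names are hypothetical and the coefficient ordering $c_{0}, \dots, c_{5}$ is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Coeffs = std::array<double, 6>;  // one axis of c_i, ordered c_0..c_5

// Jerk of the quintic p(t) = sum_n c_n t^n.
double jerk(const Coeffs& c, double t) {
    return 6.0 * c[3] + 24.0 * c[4] * t + 60.0 * c[5] * t * t;
}

// Integral of jerk^2 over [0, T]: one piece and one axis of (26).
double controlEffort(const Coeffs& c, double T) {
    assert(T > 0.0);
    const double T2 = T * T;
    const double T3 = T2 * T;
    const double T4 = T3 * T;
    const double T5 = T4 * T;
    return 36.0 * c[3] * c[3] * T
         + 144.0 * c[3] * c[4] * T2
         + (192.0 * c[4] * c[4] + 240.0 * c[3] * c[5]) * T3
         + 720.0 * c[4] * c[5] * T4
         + 720.0 * c[5] * c[5] * T5;
}

// One axis of (28): differentiating the integral w.r.t. its upper limit
// gives the integrand at T, i.e. the squared jerk at the end of the piece.
double controlEffortGradT(const Coeffs& c, double T) {
    const double j = jerk(c, T);
    return j * j;
}
```

A central finite difference of `controlEffort` in $T$ can be compared against `controlEffortGradT` as a quick consistency check of $(28)$ .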

Total Time $J_{t}$. To ensure the aggressiveness of the trajectory, we minimize the total time $J_{t} = \sum_{i = 1}^{M} T_{i}$ . The gradients are given by $\partial J_{t} / \partial c = 0$ and $\partial J_{t} / \partial T = 1$ , where $1$ denotes the all-ones vector in $R^{M}$ .

Cost of Obstacle Avoidance $P_{o}$. The obstacle avoidance penalty $J_{o}$ is computed using a Euclidean Signed Distance Field (ESDF). We penalize the sampling points that are too close to the obstacles

\begin{aligned} (29) & P_{o} ({\tilde{p}}_{k} (t)) = \max \{ ψ_{o} ({\tilde{p}}_{k} (t)), 0 \}^{3}, \\ (30) & ψ_{o} ({\tilde{p}}_{k} (t)) = d_{o} - d_{o} ({\tilde{p}}_{k} (t)), \end{aligned}

where $d_{o}$ is the safety threshold set according to the actual situation and $d_{o} ({\tilde{p}}_{k} (t))$ is the distance between ${\tilde{p}}_{k} (t)$ and the closest obstacle around it. The gradient of $P_{o}$ w.r.t ${\tilde{p}}_{k} (t)$ is

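\begin{matrix} (31) & \frac{\partial P_{o}}{\partial {\tilde{p}}_{k} (t)} = 3 \max \{ ψ_{o} ({\tilde{p}}_{k} (t)), 0 \}^{2} \frac{\partial ψ_{o}}{\partial {\tilde{p}}_{k} (t)}, \quad \frac{\partial ψ_{o}}{\partial {\tilde{p}}_{k} (t)} = - \nabla d^{⊤}, \end{matrix}

where $\nabla d$ is the gradient of the ESDF at ${\tilde{p}}_{k} (t)$ .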

Cost of Swarm Reciprocal Avoidance $P_{r}$. We penalize ${\tilde{p}}_{k} (t)$ when it is too close to the trajectories $p_{ϕ} (t), ϕ \in Φ$ at the fixed timestamp $t = k δ$ , where $Φ$ denotes all other robots in the swarm. Compared to previous work [2], the states of the other robots at the fixed timestamps, $p_{ϕ} (k δ)$ , are constant during the optimization process and produce no gradient w.r.t. $T$ in the cost function $J_{r}$ . The optimization problem and its gradients are therefore simpler. The cost of swarm reciprocal avoidance is defined as

\begin{aligned} (32) & P_{r} ({\tilde{p}}_{k} (t)) = \sum_{ϕ \in Φ} \max \{ ψ_{r} ({\tilde{p}}_{k} (t), p_{ϕ} (k δ)), 0 \}^{3}, \\ (33) & ψ_{r} ({\tilde{p}}_{k} (t), p_{ϕ} (k δ)) = d_{r}^{2} - ∥ {\tilde{p}}_{k} (t) - p_{ϕ} (k δ) ∥^{2}, \end{aligned}

where $d_{r}$ is the safe clearance between each pair of robots. The gradient of $P_{r}$ w.r.t. ${\tilde{p}}_{k} (t)$ follows from the chain rule:

\begin{matrix} (34) & \frac{\partial P_{r}}{\partial {\tilde{p}}_{k} (t)} = \sum_{ϕ \in Φ} 3 \max \{ ψ_{r}, 0 \}^{2} \frac{\partial ψ_{r}}{\partial {\tilde{p}}_{k} (t)}, \quad \frac{\partial ψ_{r}}{\partial {\tilde{p}}_{k} (t)} = - 2 ({\tilde{p}}_{k} (t) - p_{ϕ} (k δ))^{⊤} . \end{matrix}
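
The cubic penalty and its gradient at a single sample can be sketched as below. Applying the chain rule to $\max \{ψ_{r}, 0\}^{3}$ contributes the factor $3 \max \{ψ_{r}, 0\}^{2}$ on top of $\partial ψ_{r} / \partial {\tilde{p}}_{k} (t) = - 2 ({\tilde{p}}_{k} (t) - p_{ϕ} (k δ))^{⊤}$ . The names `PenaltyResult` and `reciprocalAvoidance` are hypothetical, and the positions of the other robots are assumed to have been evaluated at $t = k δ$ beforehand.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Vec3 = std::array<double, 3>;

struct PenaltyResult {
    double cost;  // P_r at this sample
    Vec3 grad;    // gradient of P_r w.r.t. the sample position
};

// Reciprocal avoidance penalty (32)-(33) at one sample position p, given the
// positions of the other robots at the same fixed timestamp.
PenaltyResult reciprocalAvoidance(const Vec3& p, const std::vector<Vec3>& others,
                                  double clearance) {
    assert(clearance > 0.0);
    PenaltyResult out{0.0, {0.0, 0.0, 0.0}};
    for (const Vec3& q : others) {
        Vec3 diff{0.0, 0.0, 0.0};
        double dist2 = 0.0;
        for (std::size_t a = 0; a < 3; ++a) {
            diff[a] = p[a] - q[a];
            dist2 += diff[a] * diff[a];
        }
        const double psi = clearance * clearance - dist2;  // (33)
        if (psi <= 0.0) continue;  // far enough apart: no cost, no gradient
        out.cost += psi * psi * psi;
        const double scale = 3.0 * psi * psi * (-2.0);  // chain rule factor
        for (std::size_t a = 0; a < 3; ++a) {
            out.grad[a] += scale * diff[a];
        }
    }
    return out;
}
```

For a robot at the origin, a neighbor at $(1, 0, 0)$ , and $d_{r} = 2$ , this gives $ψ_{r} = 3$ , a cost of $27$ , and a gradient of $(54, 0, 0)$ : moving toward the neighbor increases the cost, so a descent step pushes the robots apart.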

Cost of Dynamic Feasibility $P_{d}$. We limit the maximum value of velocity and acceleration to guarantee that the robots can execute the trajectory.

\begin{aligned} & P_{d} ({\tilde{p}}_{k} (t)) = P_{d, v} ({\tilde{p}}_{k} (t)) + P_{d, a} ({\tilde{p}}_{k} (t)), \\ & P_{d, v} ({\tilde{p}}_{k} (t)) = \max \{ ∥ {\dot{\tilde{p}}}_{k} (t) ∥^{2} - v_{m}^{2}, 0 \}^{3}, \\ (35) & P_{d, a} ({\tilde{p}}_{k} (t)) = \max \{ ∥ {\ddot{\tilde{p}}}_{k} (t) ∥^{2} - a_{m}^{2}, 0 \}^{3}, \end{aligned}

where $v_{m}$ and $a_{m}$ are the maximum velocity and acceleration.

2.5 Discussion on solution quality of trajectory optimization

The proposed trajectory optimization process $(17)$ aims to solve a challenging multi-stage Linear Quadratic Minimum Time (LQMT) problem, which is inherently non-convex and non-linear. Additionally, incorporating the ESDF for obstacle avoidance introduces further non-convex constraints. As a result, the quasi-Newton method cannot guarantee convergence to the global optimum.

To address concerns regarding local minima and infeasible solutions, we have implemented measures that prioritize safety and dynamic feasibility while maintaining high-performance formation flight.

Firstly, we use the hybrid-A* search algorithm to generate initial trajectories that are collision-free and dynamically feasible, which helps the optimizer reach a valid final trajectory.
During optimization, we give greater weight to obstacle avoidance and dynamic constraints to prioritize safety and feasibility.
Additionally, we conduct collision checks on trajectories to enhance safety.
Moreover, our distributed swarm optimization framework effectively mitigates the impact of local minima on overall formation performance.

Implementing these measures, our method reliably achieves robust formation flight while maintaining computational efficiency.

9 Conclusion

Procedure (two-step decoupled formation trajectory optimization)

Inputs:

Desired formation (relative positions) $\{p_{i}^{d}\}_{i = 1}^{N}$ ;
Initial/terminal states $({\bar{p}}_{0}, {\bar{p}}_{f})$ ;
Obstacle ESDF $d_{o} (\cdot)$ ;
Safety clearances $(d_{o}, d_{r})$ ;
Dynamic limits $(v_{m}, a_{m})$ ;
Weights $λ_{⋆}$ ;
Sampling interval $δ$ .

Describe the desired formation with a graph.

Build the (complete) formation graph $G$ , compute $A$ , $D$ , and normalized Laplacian $\hat{L}$ via $(1)$ – $(4)$ .
From $\{p_{i}^{d}\}$ compute ${\hat{L}}_{des}$ (codes).

cpp

bool SwarmGraph::setDesiredForm()

Define formation similarity and its gradients.

Use the differentiable similarity metric $f_{s} (\hat{L}, {\hat{L}}_{des})$ in $(5)$ and its gradients $(6)$ – $(8)$ to measure formation-shape error (codes).

cpp

bool PolyTrajOptimizer::swarmGraphGradCostP()

Precompute the (uniform) optimal formation position sequence.

For each robot $i$ and each discrete timestamp $k \in \{0, \dots, M_{c}\}$ , solve the formation-only optimization to obtain the optimal formation positions $p_{i, k}^{*}$ (Sec. 1.3).
Replace the $O (N^{2})$ formation similarity term by a quadratic distance surrogate $(10)$ , yielding the decoupled two-step pipeline $(11)$ .
Add the uniformity regularizer and solve $(12)$ – $(14)$ to obtain the uniform sequence ${\hat{p}}_{i, k}^{*}$ .

Initialize a feasible trajectory for each robot.

Generate an initial collision-free, dynamically feasible trajectory (e.g., hybrid-A*) to warm start the optimizer (Sec. 2.5). Codes: computeInitReferenceState $\to$ astarWithMinTraj $\to$ astarSearchAndGetSimplePath $\to$ AstarSearch

Parameterize each trajectory using MINCO.

Represent the piecewise polynomial trajectory with variables $(q, T)$ using $(15)$ – $(16)$ : (Codes: Variables & Grad in costFunctionCallback).
Formulate the distributed local optimization objective and constraints as $(17)$ .

Transcribe continuous constraints with fixed-interval sampling.

Sample the full trajectory using the fixed time interval $δ$ to obtain ${\tilde{p}}_{k} (t)$ as in $(19)$ , so the sample timestamps remain constant while $T$ changes.
For reciprocal avoidance, query other robots’ broadcast trajectories at the fixed timestamps $t = k δ$ .

Assemble the discrete unconstrained optimization objective.

Convert $(17)$ into the sampled objective $(18)$ using trapezoidal integration $(21)$ .
Use the following penalty/cost terms at samples:
- Formation similarity via distance to ${\hat{p}}_{i, k}^{*}$ $(25)$ .
- Control effort and time terms $(26)$ – $(28)$ .
- Obstacle avoidance using ESDF $(29)$ – $(31)$ .
- Reciprocal avoidance $(32)$ – $(34)$ .
- Dynamic feasibility $(35)$ .

Optimize per robot with quasi-Newton and analytic gradients.

Compute gradients w.r.t. coefficients and durations using the MINCO mapping and chain rules $(22)$ – $(24)$ .
Solve $(18)$ with a quasi-Newton method until convergence; run collision checks and keep high weights on obstacle/dynamic penalties for safety (Sec. 2.5).

Execute and iterate.

Broadcast the optimized trajectory to neighbors and execute; repeat the above steps in a distributed replanning loop as new environment/neighbor information arrives.
