Skip to content

Multi-Agent Planning

Source:

[1] L. Quan et al., ‘Robust and efficient trajectory planning for formation flight in dense environments’, IEEE Trans. Robot., vol. 39, no. 6, pp. 4785–4804, Dec. 2023, doi: 10.1109/TRO.2023.3301295.

[2] L. Quan, L. Yin, C. Xu, and F. Gao, “Distributed swarm trajectory optimization for formation flight in dense environments,” in Proc. Int. Conf. Robot. Autom., 2022, pp. 4979–4985.

[3] Z. Wang, X. Zhou, C. Xu, and F. Gao, "Geometrically constrained trajectory optimization for multicopters", IEEE Trans. Robot., vol. 38, no. 5, pp. 3259–3278, Oct. 2022, doi: 10.1109/TRO.2022.3160022.

Notation

SignMeaning
AAdjacency matrix of the formation graph
amMaximum acceleration
ciCoefficients of the ith piece
DDegree matrix of the formation graph
doSafety threshold
drSafe clearance between each robot
desSign for desired trajectory
fs()Differentiable formation similarity error metric: (5)
H()Cost function
H()Continuous-time constraint function: similarity in group formation, dynamic feasibility, obstacle avoidance, and reciprocal avoidance of swarms
J()2nd-order continuous cost function
JsFormation similarity error
JuUniform distribution cost
LLaplacian matrix of the formation graph
L^Symmetric normalized Laplacian matrix
McNumber of sample points with corresponding time stamps
M()Parameter mapping from (q,T) to c
mDimension of the robot
NNumber of robots
PCost: subscript f: swarm formation similarity; e: control effort; t: total time; o: obstacle avoidance; r: swarm reciprocal avoidance; d: dynamic feasibility
pi,jThe jth sample point of the ith robot trajectory
pi,jThe optimal jth sample point of the ith robot trajectory
p(t)Trajectory, piecewise polynomial
p(s1)(t)The (s1)th derivative of the trajectory, i.e., the control input
p[s1](t)(p(t),p˙(t),,p(s1)(t))Rms
pi(t)The ith piece of the trajectory
p¯0Initial state
p¯fFinal state
p~i,jThe jth constraint point of the ith piece
QDegree of the polynomial, here Q=2s1=5
qIntermediate waypoints
sIntegrator order, if controlling jerk, s=3
TTime allocation for each piece
TΣTotal time of the trajectory
TMINCOThe class of trajectory parameterization based on piecewise polynomials aiming at minimizing control effort
USquared distance vector: (|p^i,1p^i,0|22,,|p^i,Mcp^i,Mc1|22)RMc
vmMaximum velocity
wiThe weight vector of the robot i, RN, essentially the weights of its N adjacent edges
β(t)Polynomial basis vector: β(t)=[1,t,,tN]
δSampling time interval
λs,λuWeights for the formation similarity and uniform distribution costs
κiNumber of constraint points for the ith piece
ΦAll other robots in the swarm
ϕElement in Φ
ρTime regularization weight

1 Adaptive description of swarm formation (Sec. IV in [1])

1.1 Graph-based Formation Definition

In this article, a swarm formation of N robots is modeled by an undirected graph G=(V,E), where V:={1,2,...,N} is the set of vertices, and EV×V is the set of edges. In graph G, the vertex i represents the ith robot with position vector pi=[xi,yi,zi]R3 . An edge eijE that connects vertex iV and vertex jV means that robot i and j can measure the geometric distance between each other.

In our work, each robot can obtain the positions of all robots {p1,...,pi,...,pN}, thus the graph G is complete. Then we determine the adjacency matrix ARN×N and degree matrix DRN×N of the formation graph G by

(1)Aij=wij=∥pipj2,(2)Dij={j=1NAij,if  i=j,0,otherwise,

where the non-negative edge weight wij is the squared distance between the ith and jth robots, and denotes the Euclidean norm. Thus, the corresponding Laplacian matrix is

(3)L=DA.

With the above matrices, the symmetric normalized Laplacian matrix of graph G is defined as

(4)L^=D1/2LD1/2=ID1/2AD1/2,

where IRN×N is the identity matrix. L^ contains the information that is invariant to scale, translation, and rotation.

Finally, we use graph theory to describe various desired formation shapes, such as squares, hexagons, and pyramids.

By specifying the desired positions pid=[xid,yid,zid]R3,i=1,...,N, computing L^des is simple.

It's important to note that the desired formation shape is independent of the coordinate system as long as the relative positions are provided.

1.2 Differentiable Formation Similarity Error Metric

To assess the deviation from the desired formation, we propose a differentiable formation similarity error metric as

\begin{aligned} f_s & = f_s(\mathbf{p}_1,...,\mathbf{p}_i,...,\mathbf{p}_N) = f_s (\mathbf{A},\mathbf{D}) = f_s(\mathbf{\hat{L}},\mathbf{\hat{L}}_{des}) \\ & = \parallel\mathbf{\hat{L}}-\mathbf{\hat{L}}_{des}\parallel^2_F = \operatorname{tr}\{(\mathbf{\hat{L}}-\mathbf{\hat{L}}_{des})^\top(\mathbf{\hat{L}}-\mathbf{\hat{L}}_{des})\}, \tag{5} \end{aligned}

where tr{} denotes the trace of a matrix, L^ is the symmetric normalized Laplacian of the current swarm formation, L^des is the counterpart of the desired formation. Frobenius norm F is used in our distance metric.

As a graph representation matrix, L^ contains information about the graph structure. This allows fs to consider only the geometric shape of the formation, and not be influenced by scaling, translation, or rotation. Additionally, fs is a dimensionless value that solely reflects the error in formation shape similarity.

In particular, under the distributed framework, each robot can only change its positions to reduce the overall formation similarity error. Therefore, the only variable for robot i in (5) is pi, and fs(p1,...,pi,...,pN) can be simplified as fs(pi).

Our metric is analytically differentiable with respect to the position of each robot. For robot i, we use the weights of its N adjacent edges {ei1,ei2,...,,eiN} to form a weight vector wi=[wi1,wi2,...,,wiN]. By the chain rule, the gradient of fs with respect to pi can be written as

(6)fspi=fswiwipi.

According to our metric (5), the gradient of fs with respect to each weight wij can be computed as follow

fswij=tr{(fsL^)(L^wij)},fsL^=||L^L^des||F2L^=2(L^L^des),(7)L^wij=(D1/2AD1/2)wij.

Then the gradient fs/wi can be written as

(8)fs/wi=[fs/wi1,fs/wi2,...,fs/wiN].

As for wi/pi, the Jacobian can be easily derived since the weight function (1) is a differentiable quadratic form.

1.3 Optimal Formation Position Sequence

TODO: Previous work Previous work [2] incorporated fs directly into the trajectory optimization, making formation flight a coupled trajectory optimization problem. While this method is suitable for small-scale formation flight, it becomes computationally inefficient as the number N of robots increases.

Considering the simplified equation for coupled trajectory optimization

(9)minpi,0,...,pi,Mcj=0Mcfs(pi,j)+Jother,

where pi,j represent the jth sample point of ith robot trajectory in (19) for convenience. Jother represents all other cost functions, and Mc is the number of sample points with corresponding timestamps.

The primary purpose of calculating fs is to supply gradient information for minimizing formation similarity error. However, since the graph G is a complete graph, computing fs has a complexity of O(N2).

Consequently, the coupled trajectory optimization (9) also exhibits high complexity of O(N2) in each iteration, limiting its applicability to large-scale swarm operations.

To address this issue, we must identify an equivalent approach with reduced computational complexity to replace the function of fs in (9). We introduce the concept of optimal formation position pi,j for robot i at timestamp j, which is the position that minimizes the formation similarity error fs. Fig.1(a) illustrates this concept using a 2D formation as an example.

fig-10-1
Figure 1[1]: Illustration of optimal formation position sequence using a 2D formation. (a) The surface shows the profile of the similarity metric when one UAV moves in the plane and the other three remain still. The minimum suggests the optimal formation position to form the desired shape. (b) The sequence of optimal formation positions corresponds to the timestamps.

It is evident from the figure that there exists an optimal formation position pi,j that results in a minimal formation similarity error, and the partial derivative is fs/pi,j=0.

In the future period with a sequence of timestamps {0,...,j,...,Mc}, we represent the expected positions of robot i with the optimal formation position sequence pi={pi,0,,pi,j,,pi,Mc}, as shown in Fig. 1(b). By precomputing pi, we can utilize its quadratic distance to replace the gradient information offered by fs in (11), thus decreasing the computational requirements as follows

(10)fs(pi,j)pi,jpi,j2.

Since the optimal solutions of fs and quadratic distance cost are equivalent, the trajectory approaches the positions with minimal formation similarity error, maintaining the desired formation.

Thus, we can effectively solve the coupled trajectory optimization with a two-step procedure

1.  pi=\argminj=0Mcfs(pi,j),(11)pi  2.  minpi,0,...,pi,Mcpi,jpi,j2+Jother.

As a result, the previously required calculation of fs in each trajectory optimization process is replaced by the computation of the quadratic distance, simplifying the optimization problem. This significantly reduces computational demands and enables large-scale swarm formation.

Formula (11) indicates that trajectory optimization in the next section is performed on discretized points. Non-uniform discretized points may lead to poor trajectories and sub-optimal performance. Therefore it is crucial to ensure a uniform distribution of these points to maintain the effectiveness of the optimization process.

In engineering practice, since graphs G are constructed from a series of discretized timestamps as depicted in Fig. 1(b), each pi,j is independent.

To ensure a smoother trajectory, we introduce the uniform optimal formation position sequence p^i, which is generated by considering the formation similarity error Js and the uniform distribution cost Ju

(12)p^i=\argminλsJs+λuJu,Js=j=0Mcfs(p^i,j),(13)Ju=E(U2)E(U)2=U22McU12(Mc)2,

where λs and λu are the relative weights. E() is mathematic expectation and the squared distance vector URMc is

(14)U=(p^i,1p^i,022,,p^i,Mcp^i,Mc122).

We use the quasi-Newton method to solve this unconstrained optimization problem (12) and generate uniform p^i for the later trajectory optimization (18).

By doing so, the trajectory resulting from these discretized points in the next section can be smoother and avoid sudden spatial changes.

2 Spatial-temporal trajectory optimization for formation flight

2.1 Trajectory Representation

The differential flatness of multicopters benefits trajectory generation without integrating differential equations. Moreover, the motion planning of multicopters can be performed on low-dimensional smooth trajectories.

Here we adopt a state-of-the-art trajectory representation named MINCO to achieve minimum control effort spatial-temporal trajectory planning for swarm aerial robots in three-dimensional environments. MINCO conducts spatial-temporal deformation of the flat-output M-piece trajectory p(t) by decoupling the space and time parameters with a linear-complexity mapping M

(15)p(t)=Mq,T(t),  t[t0,tM],

where q=(q1,,qM1)R3×(M1) are the adjacent intermediate points between each pair of connected pieces and T=(T1,,TM)R>0M the time duration of each piece.

A m-dimensional M-piece trajectory p(t) is represented by piecewise polynomials. And ith piece pi(t) is defined as a multi-degree polynomial (Q=5 in this paper)

(16)pi(t)=ciβ(t),  t[0,Ti],

where ciR(Q+1)×m is the coefficient matrix and β(t)=[t0,t1,,tQ] is the natural basis.

For an s-integrator (s=3 here) chain dynamics system, a M-piece 2s1 degree trajectory p(t) is defined by constant boundaries and minimum control effort {q,T}.

Furthermore, MINCO is advanced in convert {q,T} to {c,T} using a linear-time and space parameter mapping c=M(q,T), where c=(c1,,cM) is polynomial coefficients.

2.2 Problem Formulation

After determining the desired formation shape in the last section, we expect a cluster of trajectories for swarm robots, which are smooth, collision-free, and formation maintained.

In practice, navigating swarm robots in an unknown dense environment with FOV-limited sensors and onboard computer requires an efficient real-time planner focusing on local information. Besides, centralized optimization is limited by the scale of the swarm.

Therefore, we choose a distributed local trajectory optimization for formation flight as follows

(17a)minq,T  t0tMp(s)(t)2dt+ρTΣ,(17b)s.t.    p(t)=Mq,T(t),t[t0,tM],(17c)p[s1](0)=p¯0,(17d)p[s1](tM)=p¯f,(17e)H(p(t),...,p(s)(t))0,t[t0,tM].

We define costs (17a) for smoothness and aggressiveness to achieve smooth and efficient flight. ρ is time regularization parameter, TΣ=i=1MTi. The state of robot p(t) (17b) is parameterized by the optimization variables {q,T}. p[s1](t)=(p(t),p˙(t),...,p(s1)(t))Rms represents the higher-order derivatives of a chain dynamic system with s-integrator. Boundary conditions involve initial state p¯0Rms (17c) and terminal state p¯fRms (17d). Continuous-time constraints H (17e) include swarm formation similarity, dynamic feasibility, obstacle avoidance, and swarm reciprocal avoidance.

2.3 Constraints Transcription

To solve the continuous constrained optimization problem (17) in real-time, we use the optimization variable of MINCO (15) to eliminate all kinds of equality constraints (17b)-(17d) (See MINCO for details). And penalty function method TODO is used to deal with the inequality constraints (17e). Then, every integral is evaluated by a finite sum of sample points.

Finally, the continuous constrained optimization problem is converted to a discrete unconstrained optimization problem

(18)minq,TxλJ~(q,T,δ),

where J~ are various terms of cost function or penalties, and λ are relative weights. Subscripts ={f,e,t,o,r,d}:

  • (f) swarm formation similarity;
  • (e) denote control effort;
  • (t) total time;
  • (o) obstacle avoidance;
  • (r) swarm reciprocal avoidance;
  • (d) dynamic feasibility.

δ is the sampling time interval.

In the previous work[2], we used the fixed number sampling points p^i,j=pi((j/κi)Ti) to transform the optimization problem, where pi(t) is the ith piece trajectory and κi is the fixed sample number on this piece.

However, considering that the total time TΣ changes during the optimization process, the fixed number sampling points p^i,j are difficult to space on the whole trajectory equally.

Therefore, we take fixed-time-interval sampling points for the whole trajectory to ensure the accuracy of the penalty function sampling transformation

p~j(t)=pi(jδl=1i1Tl),(19)j{0,,κ},κ=TΣδ,

where κ is the sample number and Tl is the preceding time for any 1l<i.

For the trajectory planning of swarm robots, the fixed time interval sampling points p~j(t) can simplify the optimization problem. Compared with p^i,j, the timestamp corresponding to p~j(t) is fixed, so the states of other robots at this timestamp are also constant during the optimization process.

Therefore, it is feasible to calculate the states of other robots w.r.t p~j(t) according to the broadcast trajectories before optimization.

Then we can solve the uniform formation position sequence optimization (12) in advance and use p^i to replace the formation similarity metric fs in trajectory optimization (17a) of ith robot. This decoupled formation trajectory optimization results in higher computational efficiency, making our method suitable for large-scale swarm robots.

Despite the optimization problem is not differentiable when sampling number κ changes, the cost function remains continuous w.r.t. time duration T. We use the quasi-Newton method to solve the non-smooth discrete unconstrained optimization problem (18).

2.4 Cost Functions and Gradients

Given the fixed sampling time interval δ, we can evaluate the cost functions and gradients of the whole trajectory by a finite sum of sampling points p~j(t).

The cost of various general purpose penalties at jth sampling points is

(20)P(c,T,jδ)=P(p~j(t)),

then the cost function J~ in (18) is calculated as follows

J~(q,T,δ)=J(c,T,δ),=δj=0κω¯jP(c,T,jδ)+(21)+12(TΣκδ)[P(c,T,κδ)+P(c,T,TΣ)],

where (ω¯0,ω¯1,,ω¯κ1,ω¯κ)=(1/2,1,,1,1/2) are the orthogonal coefficients following the trapezoidal rule.

And MINCO allows any second-order continuous cost function J~(q,T) to be represented by J(c,T). Hence, J~/q and J~/T can be efficiently obtained from J/c and J/T respectively, which is benefit to the construction and solution of the optimization problem.

In (19), the sampling time t=jδl=1i1Tl is related to the preceding time Tl, so the gradient of J w.r.t ci and Tl are computed as

(22)Jci=JPPp~j(t)p~j(t)ci,(23)JTl=JPPp~j(t)p~j(t)ttTl,(24)p~j(t)ci=β(t),p~j(t)t=p~˙j(t),tTl={0,l=i,1,l<i,

where the calculation of J/P is simple and the details of P(p~j(t)) for various general purpose are given as follow.

  • Cost of Swarm Formation Similarity Pf In Sec. 1.3, we decouple the formation similarity error metric from trajectory optimization by constructing an unconstrained optimization problem to calculate the uniform optimal formation position sequence p^i for each sampling point. This improvement avoids multiple calculations of formation similarity metric fs. Then, we use the quadratic form to calculate the cost of swarm formation similarity

(25)Pf(p~j(t))=max{p~j(t)p^i,j2,0}3.
  • Control Effort Je The sth (s=3 here) control input for the trajectory and its gradients are written as

(26)Je=i=1M0Tipi(s)(t)2 dt,(27)Jeci=2(0Tiβ(s)(t)β(s)(t) dt)ci,(28)JeTi=ciβ(s)(Ti)β(s)(Ti)ci.
  • Total Time Jt In order to ensure the aggressiveness of the trajectory, we minimize the total time Jt=i=1MTi. The gradients are given by Jt/c=0 and Jt/T=1.

  • Cost of Obstacle Avoidance Po Obstacle avoidance penalty Jo is computed using Euclidean Signed Distance Field (ESDF). We penalize the sampling points which are too close to the obstacles

(29)Po(p~j(t))=max{ψo(p~j(t)),0}3,(30)ψo(p~j(t))=dodo(p~j(t)),

where do is the safety threshold set according to the actual situation and do(p~j(t)) is the distance between p~j(t) and the closest obstacle around it. The gradient of Po w.r.t p~j(t) is

(31)Pop~j(t)=d,

where the d is the gradient of ESDF in p~j(t).

  • Cost of Swarm Reciprocal Avoidance Pr We penalize p~j(t) when it is too close to the trajectories pϕ(t),ϕΦ at the fixed timestamp t=jδ, where Φ represents the all other robots in the swarm. Compared to previous work[2], the state of other robots with fixed timestamp pϕ(jδ) are constant during the optimization process and do not produce a gradient w.r.t T for the cost function Jr. So the optimization problem and the gradients are simplified. The cost of swarm reciprocal avoidance is defined as
(32)Pr(p~j(t))=Φmax{ψr(p~j(t),pϕ(jδ)),0}3,(33)ψr(p~j(t),pϕ(jδ))=dr2p~j(t)pϕ(jδ)2,

where dr is the safe clearance between each robot. And the gradient of Pr w.r.t p~j(t) is

(34)Prp~j(t)=2(p~j(t)pϕ(jδ)).
  • Cost of Dynamic feasibility Pd We limit the maximum value of velocity and acceleration to guarantee that the robots can execute the trajectory.
\begin{aligned} \mathcal{P}_d(\tilde{\mathbf{p}}_{j}(t))&=\mathcal{P}_{d,v}(\tilde{\mathbf{p}}_{j}(t))+\mathcal{P}_{d,a}(\tilde{\mathbf{p}}_{j}(t)), \\ \mathcal{P}_{d,v}(\tilde{\mathbf{p}}_{j}(t)) &= \max \{ \parallel \dot{\tilde{\mathbf{p}}}_{j}(t)\parallel^2 - v_m^2, 0\}^3, \\ \mathcal{P}_{d,a}(\tilde{\mathbf{p}}_{j}(t)) &= \max \{ \parallel \ddot{\tilde{\mathbf{p}}}_{j}(t)\parallel^2 - a_m^2, 0\}^3, \tag{35} \end{aligned}

where vm and am are the maximum velocity and acceleration.

2.5 Discussion on solution quality of trajectory optimization

The proposed trajectory optimization process (17) aims to solve a challenging multi-stage Linear Quadratic Minimum Time (LQMT) problem, which is inherently non-convex and non-linear. Additionally, incorporating ESDF for obstacle avoidance introduces further non-convex constraints. As a result, guaranteeing the global optimal solution with the quasi-Newton method is not always possible.

To address concerns regarding local minima and infeasible solutions, we have implemented measures that prioritize safety and dynamic feasibility while maintaining high-performance formation flight.

  • Firstly, we utilize hybrid-A searching algorithm* to generate initial trajectories that are collision-free and dynamically feasible, ensuring a valid final solution trajectory.
  • During optimization, we give greater weight to obstacle avoidance and dynamic constraints to prioritize safety and feasibility.
  • Additionally, we conduct collision checks on trajectories to enhance safety.
  • Moreover, our distributed swarm optimization framework effectively mitigates the impact of local minima on overall formation performance.

Implementing these measures, our method reliably achieves robust formation flight while maintaining computational efficiency.

9 Conclusion

Procedure (two-step decoupled formation trajectory optimization)

Inputs:

  • Desired formation (relative positions) {pid}i=1N;
  • Initial/terminal states (p¯0,p¯f);
  • Obstacle ESDF do();
  • Safety clearances (do,dr);
  • Dynamic limits (vm,am);
  • Weights λ;
  • Sampling interval δ.
  1. Describe the desired formation with a graph.
  • Build the (complete) formation graph G, compute A, D, and normalized Laplacian L^ via (1)(4).
  • From {pid} compute L^des (codes).
cpp
bool SwarmGraph::setDesiredForm()
  1. Define formation similarity and its gradients.
  • Use the differentiable similarity metric fs(L^,L^des) in (5) and its gradients (6)(8) to measure formation-shape error (codes).
cpp
bool PolyTrajOptimizer::swarmGraphGradCostP()
  1. Precompute the (uniform) optimal formation position sequence.
  • For each robot i and each discrete timestamp j{0,,Mc}, solve the formation-only optimization to obtain the optimal formation positions pi,j (Sec. 1.3).
  • Replace the O(N2) formation similarity term by a quadratic distance surrogate (10), yielding the decoupled two-step pipeline (11).
  • Add the uniformity regularizer and solve (12)(14) to obtain the uniform sequence p^i,j.
  1. Initialize a feasible trajectory for each robot.
  1. Parameterize each trajectory using MINCO.
  1. Transcribe continuous constraints with fixed-interval sampling.
  • Sample the full trajectory using the fixed time interval δ to obtain p~j(t) as in (19), so timestamps remain constant while T changes.
  • For reciprocal avoidance, query other robots’ broadcast trajectories at the fixed timestamps t=jδ.
  1. Assemble the discrete unconstrained optimization objective.
  • Convert (17) into the sampled objective (18) using trapezoidal integration (21).
  • Use the following penalty/cost terms at samples:
    • Formation similarity via distance to p^i,j (25).
    • Control effort and time terms (26)(28).
    • Obstacle avoidance using ESDF (29)(31).
    • Reciprocal avoidance (32)(34).
    • Dynamic feasibility (35).
  1. Optimize per robot with quasi-Newton and analytic gradients.
  • Compute gradients w.r.t. coefficients and durations using the MINCO mapping and chain rules (22)(24).
  • Solve (18) with a quasi-Newton method until convergence; run collision checks and keep high weights on obstacle/dynamic penalties for safety (Sec. 2.5).
  1. Execute and iterate.
  • Broadcast the optimized trajectory to neighbors and execute; repeat the above steps in a distributed replanning loop as new environment/neighbor information arrives.

Released under the MIT License.