
4 The Pontryagin Maximum Principle

This important chapter moves us beyond the linear dynamics assumed in Chapters 2 and 3, to consider much wider classes of optimal control problems, to introduce the fundamental Pontryagin Maximum Principle, and to illustrate its uses in a variety of examples.

4.1 Calculus of Variations, Hamiltonian Dynamics

We begin in this section with a quick introduction to some variational methods. These ideas will later serve as motivation for the Pontryagin Maximum Principle.

Assume we are given a smooth function $L:\mathbb{R}^n\times\mathbb{R}^n\to\mathbb{R}$, $L=L(x,v)$; $L$ is called the Lagrangian. Let $T>0$ and points $x^0,x^1\in\mathbb{R}^n$ be given.

Basic Problem of the Calculus of Variations

Find a curve $x^*(\cdot):[0,T]\to\mathbb{R}^n$ that minimizes the functional

$$I[x(\cdot)]:=\int_0^T L(x(t),\dot{x}(t))\,dt$$

among all functions $x(\cdot)$ satisfying $x(0)=x^0$ and $x(T)=x^1$.

Now assume $x^*(\cdot)$ solves our variational problem. The fundamental question is this: how can we characterize $x^*(\cdot)$?

4.1.1 Derivation of Euler-Lagrange Equations

Notation

We write $L=L(x,v)$, and regard the variable $x$ as denoting position, the variable $v$ as denoting velocity. The partial derivatives of $L$ are

$$L_{x_i}=\frac{\partial L}{\partial x_i},\qquad L_{v_i}=\frac{\partial L}{\partial v_i}\qquad(1\le i\le n),$$

and we write

$$\nabla_x L:=(L_{x_1},\dots,L_{x_n}),\qquad \nabla_v L:=(L_{v_1},\dots,L_{v_n}).$$

Theorem 4.1: Euler-Lagrange Equations

Let $x^*(\cdot)$ solve the calculus of variations problem. Then $x^*(\cdot)$ solves the Euler–Lagrange differential equations:

$$\text{(E-L)}\qquad \frac{d}{dt}\Big[\nabla_v L(x^*(t),\dot{x}^*(t))\Big]=\nabla_x L(x^*(t),\dot{x}^*(t)).$$

The significance of the preceding theorem is that if we can solve the Euler–Lagrange equations (E-L), then the solution of our original calculus of variations problem (assuming it exists) will be among the solutions of (E-L).

Note that (E-L) is a quasilinear system of $n$ second-order ODEs. The $i$th component of the system reads

$$\frac{d}{dt}\Big[L_{v_i}(x^*(t),\dot{x}^*(t))\Big]=L_{x_i}(x^*(t),\dot{x}^*(t)).$$

Proof:

  1. Select any smooth curve $y:[0,T]\to\mathbb{R}^n$ satisfying $y(0)=y(T)=0$ (this requirement ensures the perturbed curve still satisfies the endpoint conditions). Define

$$i(\tau):=I[x(\cdot)+\tau y(\cdot)]$$

for $\tau\in\mathbb{R}$ and $x(\cdot)=x^*(\cdot)$. (To simplify notation we omit the superscript $*$.) Notice that $x(\cdot)+\tau y(\cdot)$ takes on the proper values at the endpoints $0,T$. Hence, since $x(\cdot)$ is a minimizer for the variational problem, we have

$$i(\tau)\ge I[x(\cdot)]=i(0).$$

Consequently $i(\cdot)$ has a minimum at $\tau=0$, and so

$$i'(0)=0.$$
  2. We must compute $i'(\tau)$. Note first that

$$i(\tau)=\int_0^T L(x(t)+\tau y(t),\dot{x}(t)+\tau\dot{y}(t))\,dt,$$

and hence, by the chain rule,

$$i'(\tau)=\int_0^T\Big(\sum_{i=1}^n L_{x_i}(x(t)+\tau y(t),\dot{x}(t)+\tau\dot{y}(t))\,y_i(t)+\sum_{i=1}^n L_{v_i}(\cdots)\,\dot{y}_i(t)\Big)\,dt.$$

Let $\tau=0$. Then

$$0=i'(0)=\sum_{i=1}^n\int_0^T L_{x_i}(x(t),\dot{x}(t))\,y_i(t)+L_{v_i}(x(t),\dot{x}(t))\,\dot{y}_i(t)\,dt.$$

This equality holds for all choices of $y:[0,T]\to\mathbb{R}^n$ with $y(0)=y(T)=0$.

  3. Fix any $1\le j\le n$. Choose $y(\cdot)$ so that

$$y_j(t)=\psi(t),\qquad y_i(t)\equiv 0\quad(i\ne j),$$

where $\psi$ is an arbitrary function. Use this choice of $y(\cdot)$ above:

$$0=\int_0^T L_{x_j}(x(t),\dot{x}(t))\,\psi(t)+L_{v_j}(x(t),\dot{x}(t))\,\dot{\psi}(t)\,dt.$$

Integrate by parts, recalling that $\psi(0)=\psi(T)=0$, so that the boundary term $\big[L_{v_j}\psi(t)\big]\big|_0^T=0$:

$$0=\int_0^T\Big[L_{x_j}(x(t),\dot{x}(t))-\frac{d}{dt}\big(L_{v_j}(x(t),\dot{x}(t))\big)\Big]\psi(t)\,dt.$$

This holds for all $\psi:[0,T]\to\mathbb{R}$ with $\psi(0)=\psi(T)=0$, and therefore

$$L_{x_j}(x(t),\dot{x}(t))-\frac{d}{dt}\big(L_{v_j}(x(t),\dot{x}(t))\big)=0$$

for all times $0\le t\le T$. To see this, observe that otherwise $L_{x_j}-\frac{d}{dt}(L_{v_j})$ would be, say, positive on some subinterval $I\subseteq[0,T]$. Choose $\psi\equiv 0$ off $I$ and $\psi>0$ on $I$. Then

$$\int_0^T\Big(L_{x_j}-\frac{d}{dt}(L_{v_j})\Big)\psi\,dt>0,$$

a contradiction. $\square$

4.1.2 Conversion to Hamilton's Equations

Definition: generalized momentum

For the given curve $x(\cdot)$, define

$$p(t):=\nabla_v L(x(t),\dot{x}(t))\qquad(0\le t\le T).$$

We call $p(\cdot)$ the generalized momentum.

Our intention now is to rewrite the Euler–Lagrange equations as a system of first–order ODE for x(),p().

Important Hypothesis: Assume that for all $x,p\in\mathbb{R}^n$, we can solve the equation

$$p=\nabla_v L(x,v)$$

for $v$ in terms of $x$ and $p$; that is, we suppose we can solve this identity for $v=v(x,p)$.

Definition: dynamical systems Hamiltonian H

Define the dynamical systems Hamiltonian $H:\mathbb{R}^n\times\mathbb{R}^n\to\mathbb{R}$ by the formula

$$H(x,p)=p\cdot v(x,p)-L(x,v(x,p)),$$

where $v$ is defined above.

Notation

The partial derivatives of $H$ are

$$H_{x_i}=\frac{\partial H}{\partial x_i},\qquad H_{p_i}=\frac{\partial H}{\partial p_i}\qquad(1\le i\le n),$$

and we write

$$\nabla_x H:=(H_{x_1},\dots,H_{x_n}),\qquad \nabla_p H:=(H_{p_1},\dots,H_{p_n}).$$

Theorem 4.2: Hamiltonian Dynamics

Let $x(\cdot)$ solve the Euler-Lagrange equations (E-L) and define $p(\cdot)$ as above. Then the pair $(x(\cdot),p(\cdot))$ solves Hamilton's equations:

$$\begin{cases}\dot{x}(t)=\nabla_p H(x(t),p(t))&\text{(ODE)}\\ \dot{p}(t)=-\nabla_x H(x(t),p(t))&\text{(ADJ)}\end{cases}$$

Furthermore, the mapping $t\mapsto H(x(t),p(t))$ is constant.

Proof: Recall that $H(x,p)=p\cdot v(x,p)-L(x,v(x,p))$, where $v=v(x,p)$ or, equivalently, $p=\nabla_v L(x,v)$. Then

$$\nabla_x H(x,p)=p\,\nabla_x v-\big[\nabla_x L(x,v(x,p))+\nabla_v L(x,v(x,p))\,\nabla_x v\big]=\big[p-\nabla_v L(x,v)\big]\nabla_x v-\nabla_x L(x,v(x,p))=-\nabla_x L(x,v(x,p))$$

(note that in $H(x,p)$ the variables $x$ and $p$ are independent; $p$ is not regarded as a function of $x$)

because $p=\nabla_v L$. Now $p(t)=\nabla_v L(x(t),\dot{x}(t))$ iff $\dot{x}(t)=v(x(t),p(t))$. Therefore (E-L) implies

$$\dot{p}(t)=\nabla_x L(x(t),\dot{x}(t))=\nabla_x L(x(t),v(x(t),p(t)))=-\nabla_x H(x(t),p(t)).$$

Also

$$\nabla_p H(x,p)=v(x,p)+p\,\nabla_p v-\nabla_v L\,\nabla_p v=v(x,p)$$

since $p=\nabla_v L(x,v(x,p))$. This implies

$$\nabla_p H(x(t),p(t))=v(x(t),p(t)).$$

But

$$p(t)=\nabla_v L(x(t),\dot{x}(t))$$

and so $\dot{x}(t)=v(x(t),p(t))$. Therefore

$$\dot{x}(t)=\nabla_p H(x(t),p(t)).$$

Finally, note that

$$\frac{d}{dt}H(x(t),p(t))=\nabla_x H\cdot\dot{x}(t)+\nabla_p H\cdot\dot{p}(t)=\nabla_x H\cdot\nabla_p H+\nabla_p H\cdot(-\nabla_x H)=0.$$
$\square$

A Physical Example

We define the Lagrangian

$$L(x,v)=\frac{m|v|^2}{2}-V(x),$$

which we interpret as the kinetic energy minus the potential energy $V$. Then

$$\begin{cases}\nabla_x L=-\nabla V(x)\\ \nabla_v L=mv.\end{cases}$$

Therefore the Euler-Lagrange equation is

$$m\ddot{x}(t)=-\nabla V(x(t)),$$

which is Newton's law. Furthermore,

$$p=\nabla_v L(x,v)=mv$$

is the momentum, and the Hamiltonian is

$$H(x,p)=p\cdot\frac{p}{m}-L\Big(x,\frac{p}{m}\Big)=\frac{|p|^2}{m}-\frac{m}{2}\Big|\frac{p}{m}\Big|^2+V(x)=\frac{|p|^2}{2m}+V(x),$$

the sum of the kinetic and potential energies. For this example, Hamilton's equations read

$$\begin{cases}\dot{x}(t)=\dfrac{p(t)}{m}\\ \dot{p}(t)=-\nabla V(x(t)).\end{cases}$$
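As a quick numerical illustration (not part of the original text), the following Python sketch integrates Hamilton's equations for this mechanical example with an assumed one-dimensional quadratic potential $V(x)=\frac{1}{2}kx^2$, using a symplectic Euler step, and checks that $H$ stays nearly constant along the trajectory, as Theorem 4.2 predicts. The values of $m$, $k$, the step size and the initial state are illustrative assumptions.

```python
# A minimal sketch: integrate Hamilton's equations x' = p/m, p' = -V'(x)
# for the assumed potential V(x) = 0.5*k*x^2 and check conservation of H.
m, k = 1.0, 4.0            # mass and spring constant (illustrative values)
V  = lambda x: 0.5 * k * x**2
dV = lambda x: k * x

def hamiltonian(x, p):
    return p**2 / (2 * m) + V(x)

# symplectic (semi-implicit) Euler: good energy behaviour for Hamiltonian systems
x, p, dt = 1.0, 0.0, 1e-3
H0 = hamiltonian(x, p)
for _ in range(10_000):
    p -= dt * dV(x)        # p' = -V'(x)
    x += dt * p / m        # x' =  p/m
print(f"relative drift in H: {abs(hamiltonian(x, p) - H0) / H0:.2e}")
```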

4.2 Review of Lagrange Multipliers

Constraint and Lagrange Multipliers

What first strikes us about general optimal control problems is the occurrence of many constraints, most notably that the dynamics be governed by the differential equation

$$\begin{cases}\dot{x}(t)=f(x(t),\alpha(t))&(t>0)\\ x(0)=x^0.\end{cases}$$

This is in contrast to standard calculus of variations problems, as discussed in §4.1, where we could take any curve x() as a candidate for a minimizer.

Now it is a general principle of variational and optimization theory that “constraints create Lagrange multipliers” and furthermore that these Lagrange multipliers often “contain valuable information”. This section provides a quick review of the standard method of Lagrange multipliers in solving multivariable constrained optimization problems.

Unconstrained Optimization

Suppose first that we wish to find a maximum point for a given smooth function $f:\mathbb{R}^n\to\mathbb{R}$. In this case there is no constraint, and therefore if $f(x^*)=\max_{x\in\mathbb{R}^n}f(x)$, then $x^*$ is a critical point of $f$:

$$\nabla f(x^*)=0.$$

Constrained Optimization

We modify the problem above by introducing the region

$$R:=\{x\in\mathbb{R}^n\mid g(x)\le 0\},$$

determined by some given function $g:\mathbb{R}^n\to\mathbb{R}$. Suppose $x^*\in R$ and $f(x^*)=\max_{x\in R}f(x)$. We would like a characterization of $x^*$ in terms of the gradients of $f$ and $g$.

  • Case 1: $x^*$ lies in the interior of $R$. Then the constraint is inactive, and so

    $$\nabla f(x^*)=0.$$
  • Case 2: $x^*$ lies on $\partial R$. We look at the direction of the vector $\nabla f(x^*)$. A geometric picture like the figure above is impossible; for if it were so, then $f(y)$ would be greater than $f(x^*)$ for some other point $y\in\partial R$. So $\nabla f(x^*)$ must be perpendicular to $\partial R$ at $x^*$. (If it were not perpendicular, there would exist a unit vector $\tau$ in the tangential direction with $\tau\cdot\nabla f(x^*)>0$, which is a positive directional derivative along $\partial R$, again a contradiction.) Since $\nabla g$ is perpendicular to $\partial R=\{g=0\}$ (the gradient is the direction of fastest change and hence normal to the level set), it follows that $\nabla f(x^*)$ is parallel to $\nabla g(x^*)$. Therefore

    $$\text{(4.4)}\qquad \nabla f(x^*)=\lambda\,\nabla g(x^*)$$

    for some real number $\lambda$, called a Lagrange multiplier.
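For concreteness, here is a small worked example of (4.4), not from the text: maximize $f(x_1,x_2)=x_1+x_2$ over $R=\{x_1^2+x_2^2-1\le 0\}$. Since $\nabla f=(1,1)\ne 0$, the maximizer lies on $\partial R$, and (4.4) together with the constraint gives

$$(1,1)=\lambda\,(2x_1^*,2x_2^*),\qquad (x_1^*)^2+(x_2^*)^2=1,$$

so $x_1^*=x_2^*=\frac{1}{\sqrt{2}}$ with Lagrange multiplier $\lambda=\frac{1}{\sqrt{2}}$; the three unknowns $x_1^*,x_2^*,\lambda$ are found together, exactly as emphasized in §4.4 below.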

Critique

The foregoing argument is in fact incomplete, since we implicitly assumed that $\nabla g(x^*)\ne 0$, in which case the Implicit Function Theorem implies that the set $\{g=0\}$ is an $(n-1)$-dimensional surface near $x^*$ (as illustrated).

If instead $\nabla g(x^*)=0$, the set $\{g=0\}$ need not have this simple form near $x^*$, and the reasoning discussed in Case 2 above is not complete.

The correct statement is this:

There exist real numbers $\lambda,\mu$, not both equal to $0$, such that

$$\text{(4.5)}\qquad \mu\,\nabla f(x^*)=\lambda\,\nabla g(x^*).$$

If $\mu\ne 0$, we can divide by $\mu$ and recover the formulation (4.4). And if $\nabla g(x^*)=0$, we can take $\lambda=1,\mu=0$, making assertion (4.5) correct (if not particularly useful).

4.3 Statement of the Pontryagin Maximum Principle

We come now to the key assertion of this chapter, the theoretically interesting and practically useful theorem that if $\alpha^*(\cdot)$ is an optimal control, then there exists a function $p^*(\cdot)$, called the costate, that satisfies a certain maximization principle. We should think of the function $p^*(\cdot)$ as a sort of Lagrange multiplier, which appears owing to the constraint that the optimal curve $x^*(\cdot)$ must satisfy (ODE). And just as conventional Lagrange multipliers are useful for actual calculations, so also will be the costate.

We quote Francis Clarke [C2]: “The maximum principle was, in fact, the culmination of a long search in the calculus of variations for a comprehensive multiplier rule, which is the correct way to view it: p(t) is a “Lagrange multiplier” ... It makes optimal control a design tool, whereas the calculus of variations was a way to study nature.”

4.3.1 Fixed Time, Free Endpoint Problem

Let us review the basic set-up for our control problem.

We are given $A\subseteq\mathbb{R}^m$ and also $f:\mathbb{R}^n\times A\to\mathbb{R}^n$, $x^0\in\mathbb{R}^n$. We as before denote the set of admissible controls by

$$\mathcal{A}=\{\alpha(\cdot):[0,\infty)\to A\mid \alpha(\cdot)\text{ is measurable}\}.$$

Then given $\alpha(\cdot)\in\mathcal{A}$, we solve for the corresponding evolution of our system:

$$\begin{cases}\dot{x}(t)=f(x(t),\alpha(t))&(t\ge 0)\\ x(0)=x^0.\end{cases}$$

We also introduce the payoff functional (note the terminal payoff term, reflecting the free endpoint)

$$\text{(P)}\qquad P[\alpha(\cdot)]=\int_0^T r(x(t),\alpha(t))\,dt+g(x(T)),$$

where the terminal time $T>0$, running payoff $r:\mathbb{R}^n\times A\to\mathbb{R}$ and terminal payoff $g:\mathbb{R}^n\to\mathbb{R}$ are given.

Basic Problem

Find a control $\alpha^*(\cdot)$ such that

$$P[\alpha^*(\cdot)]=\max_{\alpha(\cdot)\in\mathcal{A}}P[\alpha(\cdot)].$$

The Pontryagin Maximum Principle, stated below, asserts the existence of a function p(), which together with the optimal trajectory x() satisfies an analog of Hamilton's ODE from §4.1.2. For this, we will need an appropriate Hamiltonian:

Definition

The control theory Hamiltonian is the function

$$H(x,p,a):=f(x,a)\cdot p+r(x,a)\qquad(x,p\in\mathbb{R}^n,\ a\in A).$$

Theorem 4.3 (Pontryagin Maximum Principle)

Assume $\alpha^*(\cdot)$ is optimal for (ODE), (P), and $x^*(\cdot)$ is the corresponding trajectory.

Then there exists a function $p^*:[0,T]\to\mathbb{R}^n$ such that

$$\text{(ODE)}\qquad \dot{x}^*(t)=\nabla_p H(x^*(t),p^*(t),\alpha^*(t)),$$
$$\text{(ADJ)}\qquad \dot{p}^*(t)=-\nabla_x H(x^*(t),p^*(t),\alpha^*(t)),$$

and

$$\text{(M)}\qquad H(x^*(t),p^*(t),\alpha^*(t))=\max_{a\in A}H(x^*(t),p^*(t),a)\qquad(0\le t\le T).$$

In addition, the mapping

$$t\mapsto H(x^*(t),p^*(t),\alpha^*(t))\quad\text{is constant.}$$

Finally, we have the terminal condition

$$\text{(T)}\qquad p^*(T)=\nabla g(x^*(T)).$$

Remarks and Interpretations

  1. The identities (ADJ) are the adjoint equations and (M) the maximization principle. Notice that (ODE) and (ADJ) resemble the structure of Hamilton's equations, discussed in §4.1. We also call (T) the transversality condition and will discuss its significance later.

  2. More precisely, formula (ODE) says that for $1\le i\le n$ we have

$$\dot{x}^{*i}(t)=\frac{\partial H}{\partial p_i}(x^*(t),p^*(t),\alpha^*(t))=f^i(x^*(t),\alpha^*(t)),$$

which is just the original equation of motion. Likewise, (ADJ) says

$$\dot{p}^{*i}(t)=-\frac{\partial H}{\partial x_i}(x^*(t),p^*(t),\alpha^*(t))=-\sum_{j=1}^n p_j^*(t)\,\frac{\partial f^j}{\partial x_i}(x^*(t),\alpha^*(t))-\frac{\partial r}{\partial x_i}(x^*(t),\alpha^*(t)).$$

4.3.2 Free Time, Fixed Endpoint Problem

Let us next record the appropriate form of the Maximum Principle for a fixed endpoint problem.

As before, given a control $\alpha(\cdot)\in\mathcal{A}$, we solve for the corresponding evolution of our system:

$$\text{(ODE)}\qquad\begin{cases}\dot{x}(t)=f(x(t),\alpha(t))&(t\ge 0)\\ x(0)=x^0.\end{cases}$$

Assume now that a target point $x^1\in\mathbb{R}^n$ is given. We introduce then the payoff functional

$$P[\alpha(\cdot)]=\int_0^\tau r(x(t),\alpha(t))\,dt.$$

Here $r:\mathbb{R}^n\times A\to\mathbb{R}$ is the given running payoff, and $\tau=\tau[\alpha(\cdot)]$ denotes the first time the solution of (ODE) hits the target point $x^1$.

As before, the basic problem is to find an optimal control $\alpha^*(\cdot)$ such that

$$\text{(P)}\qquad P[\alpha^*(\cdot)]=\max_{\alpha(\cdot)\in\mathcal{A}}P[\alpha(\cdot)].$$

Define the Hamiltonian H as in §4.3.1.

Theorem 4.4 (Pontryagin Maximum Principle)

Assume $\alpha^*(\cdot)$ is optimal for (ODE), (P) and $x^*(\cdot)$ is the corresponding trajectory.

Then there exists a function $p^*:[0,\tau^*]\to\mathbb{R}^n$ such that

$$\text{(ODE)}\qquad \dot{x}^*(t)=\nabla_p H(x^*(t),p^*(t),\alpha^*(t)),$$
$$\text{(ADJ)}\qquad \dot{p}^*(t)=-\nabla_x H(x^*(t),p^*(t),\alpha^*(t)),$$

and

$$\text{(M)}\qquad H(x^*(t),p^*(t),\alpha^*(t))=\max_{a\in A}H(x^*(t),p^*(t),a)\qquad(0\le t\le\tau^*).$$

Also,

$$H(x^*(t),p^*(t),\alpha^*(t))\equiv 0\qquad(0\le t\le\tau^*).$$

Here $\tau^*$ denotes the first time the trajectory $x^*(\cdot)$ hits the target point $x^1$. We call $x^*(\cdot)$ the state of the optimally controlled system and $p^*(\cdot)$ the costate.

Remark and Warning

More precisely, we should define

$$H(x,p,q,a)=f(x,a)\cdot p+r(x,a)\,q\qquad(q\in\mathbb{R}).$$

A more careful statement of the Maximum Principle says "there exist a constant $q\ge 0$ and a function $p^*:[0,\tau^*]\to\mathbb{R}^n$ such that (ODE), (ADJ), and (M) hold".

  • If $q>0$, we can renormalize to get $q=1$, as we have done above.
  • If $q=0$, then $H$ does not depend on the running payoff $r$, and in this case the Pontryagin Maximum Principle is not useful. This is a so-called "abnormal problem".

Compare these comments with the critique of the usual Lagrange multiplier method at the end of §4.2, and see also the proof in §A.5 of the Appendix.

4.4 Application and Examples

How to Use the Maximum Principle

We mentioned earlier that the costate p() can be interpreted as a sort of Lagrange multiplier.

Calculations with Lagrange multipliers

Recall our discussion in §4.2 about finding a point $x^*$ that maximizes a function $f$, subject to the requirement that $g\le 0$. Now $x^*=(x_1^*,\dots,x_n^*)^T$ has $n$ unknown components we must find. Somewhat unexpectedly, it turns out in practice to be easier to solve (4.4) for the $n+1$ unknowns $x_1^*,\dots,x_n^*$ and $\lambda$. We repeat this key insight: it is actually easier to solve the problem if we add a new unknown, namely the Lagrange multiplier. Worked examples abound in multivariable calculus books.

Calculations with the costate

This same principle is valid for our much more complicated control theory problems: it is usually best not to look for an optimal control $\alpha^*(\cdot)$ and an optimal trajectory $x^*(\cdot)$ alone, but to look as well for the costate $p^*(\cdot)$. In practice, we add the equations (ADJ) and (M) to (ODE) and then try to solve for $\alpha^*(\cdot),x^*(\cdot)$ and for $p^*(\cdot)$.

The following examples show how this works in practice, in certain cases for which we can actually solve everything explicitly or, failing that, at least deduce some useful information.

4.4.1 Example 1: Linear Time-Optimal Control.

For this example, let $A$ denote the cube $[-1,1]^n$ in $\mathbb{R}^n$. We consider again the linear dynamics:

$$\text{(ODE)}\qquad\begin{cases}\dot{x}(t)=Mx(t)+N\alpha(t)\\ x(0)=x^0\end{cases}$$

for the payoff functional

$$P[\alpha(\cdot)]=-\int_0^\tau 1\,dt=-\tau,$$

where $\tau$ denotes the first time the trajectory hits the target point $x^1=0$. We have $r\equiv -1$, and so

$$H(x,p,a)=f\cdot p+r=(Mx+Na)\cdot p-1.$$

In §3.2 we introduced the Hamiltonian H=(Mx+Na)p, which differs by a constant from the present H. We can redefine H in §3.2 to match the present theory: compare then Theorems 3.4 and 4.4.

4.4.2 Example 2: Control of Production and Consumption

We return to Example 1 in Chapter 1, a model for optimal consumption in a simple economy. Recall that

$$\begin{aligned}x(t)&=\text{output of economy at time }t,\\ \alpha(t)&=\text{fraction of output reinvested at time }t.\end{aligned}$$

We have the constraint $0\le\alpha(t)\le 1$; that is, $A=[0,1]\subseteq\mathbb{R}$. The economy evolves according to the dynamics

$$\text{(ODE)}\qquad\begin{cases}\dot{x}(t)=\alpha(t)x(t)&(0\le t\le T)\\ x(0)=x^0,\end{cases}$$

where $x^0>0$ and we have set the growth factor $k=1$. We want to maximize the total consumption

$$P[\alpha(\cdot)]:=\int_0^T(1-\alpha(t))\,x(t)\,dt.$$

How can we characterize an optimal control $\alpha^*(\cdot)$?

Introducing the maximum principle

We apply the Pontryagin Maximum Principle, and to simplify notation we will not write the superscripts $*$ for the optimal control, trajectory, etc. We have $n=m=1$,

$$f(x,a)=xa,\qquad g\equiv 0,\qquad r(x,a)=(1-a)x,$$

and therefore

$$H(x,p,a)=f(x,a)\,p+r(x,a)=pxa+(1-a)x=x+ax(p-1).$$

The dynamical equation is

$$\text{(ODE)}\qquad \dot{x}(t)=\frac{\partial H}{\partial p}=\alpha(t)x(t),$$

and the adjoint equation is

$$\text{(ADJ)}\qquad \dot{p}(t)=-\frac{\partial H}{\partial x}=-1-\alpha(t)(p(t)-1).$$

The terminal condition reads

$$\text{(T)}\qquad p(T)=g_x(x(T))=0.$$

Lastly, the maximality principle asserts

$$\text{(M)}\qquad H(x(t),p(t),\alpha(t))=\max_{0\le a\le 1}\{x(t)+a\,x(t)(p(t)-1)\}.$$

Using the maximum principle

We now deduce useful information from (ODE), (ADJ), (M) and (T).

According to (M), at each time $t$ the control value $\alpha(t)$ must be selected to maximize $a(p(t)-1)$ for $0\le a\le 1$. This is so since $x(t)>0$. (Here $a$ is a placeholder variable denoting the possible values of $\alpha(t)$.) Thus

$$\alpha(t)=\begin{cases}1&\text{if }p(t)>1\\ 0&\text{if }p(t)\le 1.\end{cases}$$

Hence if we know $p(\cdot)$, we can design the optimal control $\alpha(\cdot)$. So next we must solve for the costate $p(\cdot)$. We know from (ADJ) and (T) that

$$\begin{cases}\dot{p}(t)=-1-\alpha(t)[p(t)-1]&(0\le t\le T)\\ p(T)=0.\end{cases}$$

Since $p(T)=0$, we deduce by continuity that $p(t)\le 1$ for $t$ close to $T$, $t<T$. Thus $\alpha(t)=0$ for such values of $t$. Therefore $\dot{p}(t)=-1$, and consequently $p(t)=T-t$ for times $t$ in this interval. So we have $p(t)=T-t$ so long as $p(t)\le 1$, and this holds for $T-1\le t\le T$.

But for times $t\le T-1$, with $t$ near $T-1$, we have $\alpha(t)=1$; and so (ADJ) becomes

$$\dot{p}(t)=-1-(p(t)-1)=-p(t).$$

Since $p(T-1)=1$, we see that $p(t)=e^{T-1-t}>1$ for all times $0\le t\le T-1$. In particular there are no switches in the control over this time interval.

Restoring the superscript $*$ to our notation, we consequently deduce that an optimal control is

$$\alpha^*(t)=\begin{cases}1&\text{if }0\le t\le t^*\\ 0&\text{if }t^*\le t\le T\end{cases}$$

for the optimal switching time $t^*=T-1$.

We leave it as an exercise to compute the switching time if the growth constant $k\ne 1$.
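As a quick sanity check (not part of the original text), the following Python sketch simulates the dynamics under the bang-bang control derived above with the assumed values $T=3$, $x^0=1$ and growth factor $k=1$, and compares the resulting payoff with that of a constant-reinvestment policy.

```python
# A minimal sketch: reinvest everything until t* = T - 1, then consume everything,
# versus a constant reinvestment fraction.
import numpy as np

T, x0, dt = 3.0, 1.0, 1e-4
ts = np.arange(0.0, T, dt)

def payoff(alpha_of_t):
    x, total = x0, 0.0
    for t in ts:
        a = alpha_of_t(t)
        total += (1 - a) * x * dt      # running payoff (1 - alpha) * x
        x += a * x * dt                # dynamics x' = alpha * x
    return total

bang_bang = lambda t: 1.0 if t <= T - 1 else 0.0
constant  = lambda t: 0.5
print("bang-bang :", payoff(bang_bang))   # approximately e^{T-1} for x0 = 1
print("constant  :", payoff(constant))    # strictly smaller
```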

4.4.3 Example 3: A Simple Linear-Quadratic Regulator

We take n=m=1 for this example, and consider the simple linear dynamics

$$\text{(ODE)}\qquad\begin{cases}\dot{x}(t)=x(t)+\alpha(t)\\ x(0)=x^0\end{cases}$$

with the quadratic cost functional

$$\int_0^T x(t)^2+\alpha(t)^2\,dt,$$

which we want to minimize. So we want to maximize the payoff functional

$$\text{(P)}\qquad P[\alpha(\cdot)]=-\int_0^T x(t)^2+\alpha(t)^2\,dt.$$

For this problem, the values of the controls are not constrained; that is, A=R.

Introducing the maximum principle

To simplify notation further we again drop the superscripts $*$. We have $n=m=1$,

$$f(x,a)=x+a,\qquad g\equiv 0,\qquad r(x,a)=-x^2-a^2;$$

and hence

$$H(x,p,a)=fp+r=(x+a)p-(x^2+a^2).$$

The maximality condition becomes

$$\text{(M)}\qquad H(x(t),p(t),\alpha(t))=\max_{a\in\mathbb{R}}\{-(x(t)^2+a^2)+p(t)(x(t)+a)\}.$$

We calculate the maximum on the right-hand side by setting $\frac{\partial H}{\partial a}=-2a+p=0$. Thus $a=\frac{p}{2}$, and so

$$\alpha(t)=\frac{p(t)}{2}.$$

The dynamical equations are therefore

$$\text{(ODE)}\qquad \dot{x}(t)=x(t)+\frac{p(t)}{2}$$

and

$$\text{(ADJ)}\qquad \dot{p}(t)=-\frac{\partial H}{\partial x}=2x(t)-p(t).$$

Moreover $x(0)=x^0$, and the terminal condition is

$$\text{(T)}\qquad p(T)=0.$$

Using the Maximum Principle

So we must look at the system of equations

$$\begin{pmatrix}\dot{x}\\ \dot{p}\end{pmatrix}=\underbrace{\begin{pmatrix}1&1/2\\ 2&-1\end{pmatrix}}_{=:M}\begin{pmatrix}x\\ p\end{pmatrix},$$

the general solution of which is

$$\begin{pmatrix}x(t)\\ p(t)\end{pmatrix}=e^{tM}\begin{pmatrix}x^0\\ p^0\end{pmatrix}.$$

Since we know $x^0$, the task is to choose $p^0$ so that $p(T)=0$.

Feedback controls

An elegant way to do so is to try to find the optimal control in linear feedback form; that is, to look for a function $c(\cdot):[0,T]\to\mathbb{R}$ for which

$$\alpha(t)=c(t)\,x(t).$$

We henceforth suppose that an optimal feedback control of this form exists, and attempt to calculate $c(\cdot)$. Now

$$\frac{p(t)}{2}=\alpha(t)=c(t)\,x(t),$$

whence $c(t)=\frac{p(t)}{2x(t)}$. Define now

$$d(t):=\frac{p(t)}{x(t)},$$

so that $c(t)=\frac{d(t)}{2}$.

We will next discover a differential equation that $d(\cdot)$ satisfies. Compute

$$\dot{d}=\frac{\dot{p}x-p\dot{x}}{x^2},$$

and recall that

$$\begin{cases}\dot{x}=x+\frac{p}{2}\\ \dot{p}=2x-p.\end{cases}$$

Therefore

$$\dot{d}=\frac{2x-p}{x}-\frac{p}{x^2}\Big(x+\frac{p}{2}\Big)=2-d-d\Big(1+\frac{d}{2}\Big)=2-2d-\frac{d^2}{2}.$$

Since $p(T)=0$, the terminal condition is $d(T)=0$. So we have obtained a nonlinear first-order ODE for $d(\cdot)$ with a terminal boundary condition:

$$\text{(R)}\qquad\begin{cases}\dot{d}=2-2d-\frac{1}{2}d^2&(0\le t<T)\\ d(T)=0.\end{cases}$$

This is called the Riccati equation.

In summary so far, to solve our linear-quadratic regulator problem, we need first to solve the Riccati equation (R) and then set

$$\alpha(t)=\frac{1}{2}d(t)\,x(t).$$

How to solve the Riccati equation. It turns out that we can convert (R) into a second-order, linear ODE. To accomplish this, write

$$d(t)=\frac{2\dot{b}(t)}{b(t)}$$

for a function $b(\cdot)$ to be found. What equation does $b(\cdot)$ solve? We compute

$$\dot{d}=\frac{2\ddot{b}}{b}-\frac{2(\dot{b})^2}{b^2}=\frac{2\ddot{b}}{b}-\frac{d^2}{2}.$$

Hence (R) gives

$$\frac{2\ddot{b}}{b}=\dot{d}+\frac{d^2}{2}=2-2d=2-\frac{4\dot{b}}{b},$$

and consequently

$$\begin{cases}\ddot{b}=b-2\dot{b}&(0\le t<T)\\ \dot{b}(T)=0,\ b(T)=1.\end{cases}$$

This is a terminal-value problem for a second-order linear ODE, which we can solve by standard techniques. We then set $d=\frac{2\dot{b}}{b}$ to derive the solution of the Riccati equation (R).

We will generalize this example later to systems, in §5.2.
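Before moving on, here is a minimal numerical sketch of the procedure just described (assumptions: $T=2$, $x^0=1$; not part of the original text): integrate the Riccati equation (R) backwards from $d(T)=0$ with an explicit Euler scheme, then run the closed loop $\dot{x}=x+\alpha$ with the feedback $\alpha(t)=\frac{1}{2}d(t)x(t)$ and compare the cost with applying no control at all.

```python
# A minimal sketch: solve (R) backwards in time, then simulate the feedback law.
import numpy as np

T, x0, n = 2.0, 1.0, 4000
dt = T / n

# integrate d' = 2 - 2 d - d^2/2 backwards in time, terminal condition d(T) = 0
d = np.zeros(n + 1)
for i in range(n, 0, -1):
    d[i - 1] = d[i] - dt * (2 - 2 * d[i] - 0.5 * d[i] ** 2)

def cost(feedback):
    x, c = x0, 0.0
    for i in range(n):
        a = feedback(i, x)
        c += (x**2 + a**2) * dt      # running cost x^2 + alpha^2
        x += (x + a) * dt            # dynamics x' = x + alpha
    return c

print("optimal feedback:", cost(lambda i, x: 0.5 * d[i] * x))
print("zero control    :", cost(lambda i, x: 0.0))
```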

4.4.4 Example 4: Moon Lander

This is a much more elaborate and interesting example, already introduced in Chapter 1.

Introduce the notation

$$\begin{aligned}h(t)&=\text{height at time }t,\\ v(t)&=\text{velocity}=\dot{h}(t),\\ m(t)&=\text{mass of spacecraft (changing as fuel is used up)},\\ \alpha(t)&=\text{thrust at time }t.\end{aligned}$$

The thrust is constrained so that $0\le\alpha(t)\le 1$; that is, $A=[0,1]$. There are also the constraints that the height and mass be nonnegative: $h(t)\ge 0$, $m(t)\ge 0$.

The dynamics are

$$\begin{cases}\dot{h}(t)=v(t)\\ \dot{v}(t)=-g+\dfrac{\alpha(t)}{m(t)}\\ \dot{m}(t)=-k\alpha(t),\end{cases}$$

with initial conditions

$$\begin{cases}h(0)=h_0>0\\ v(0)=v_0\\ m(0)=m_0>0.\end{cases}$$

The goal is to land on the moon safely, maximizing the remaining fuel $m(\tau)$, where $\tau=\tau[\alpha(\cdot)]$ is the first time $h(\tau)=v(\tau)=0$. Since $\alpha=-\frac{\dot{m}}{k}$, our intention is equivalently to minimize the total applied thrust before landing; so that

$$P[\alpha(\cdot)]=-\int_0^\tau\alpha(t)\,dt.$$

This is so since

$$\int_0^\tau\alpha(t)\,dt=\frac{m_0-m(\tau)}{k}.$$

Introducing the maximum principle

In terms of the general notation, we have

$$x(t)=\begin{bmatrix}h(t)\\ v(t)\\ m(t)\end{bmatrix},\qquad f=\begin{bmatrix}v\\ -g+a/m\\ -ka\end{bmatrix}.$$

Hence the Hamiltonian is

$$\begin{aligned}H(x,p,a)&=f\cdot p+r=(v,\ -g+a/m,\ -ka)\cdot(p_1,p_2,p_3)-a\\ &=-a+p_1v+p_2\Big(-g+\frac{a}{m}\Big)+p_3(-ka).\end{aligned}$$

We next have to figure out the adjoint dynamics (ADJ). For our particular Hamiltonian,

$$\frac{\partial H}{\partial x_1}=\frac{\partial H}{\partial h}=0,\qquad \frac{\partial H}{\partial x_2}=\frac{\partial H}{\partial v}=p_1,\qquad \frac{\partial H}{\partial x_3}=\frac{\partial H}{\partial m}=-\frac{p_2a}{m^2}.$$

Therefore

$$\text{(ADJ)}\qquad\begin{cases}\dot{p}_1(t)=0\\ \dot{p}_2(t)=-p_1(t)\\ \dot{p}_3(t)=\dfrac{p_2(t)\alpha(t)}{m(t)^2}.\end{cases}$$

The maximization condition (M) reads

$$\begin{aligned}\text{(M)}\qquad H(x(t),p(t),\alpha(t))&=\max_{0\le a\le 1}H(x(t),p(t),a)\\ &=\max_{0\le a\le 1}\Big\{-a+p_1(t)v(t)+p_2(t)\Big[-g+\frac{a}{m(t)}\Big]+p_3(t)(-ka)\Big\}\\ &=p_1(t)v(t)-p_2(t)g+\max_{0\le a\le 1}\Big\{a\Big(-1+\frac{p_2(t)}{m(t)}-kp_3(t)\Big)\Big\}.\end{aligned}$$

Thus the optimal control law is given by the rule:

$$\alpha(t)=\begin{cases}1&\text{if }1-\dfrac{p_2(t)}{m(t)}+kp_3(t)<0\\[4pt] 0&\text{if }1-\dfrac{p_2(t)}{m(t)}+kp_3(t)>0.\end{cases}$$

Using the maximum principle

Now we will attempt to figure out the form of the solution, and check it accords with the Maximum Principle.

Let us start by guessing that we first leave the rocket engine off (i.e., set $\alpha\equiv 0$) and turn the engine on only at the end. Denote by $\tau$ the first time that $h(\tau)=v(\tau)=0$, meaning that we have landed. We guess that there exists a switching time $t^*<\tau$ at which we turn the engine on at full power (i.e., set $\alpha\equiv 1$). Consequently,

$$\alpha(t)=\begin{cases}0&\text{for }0\le t\le t^*\\ 1&\text{for }t^*\le t\le\tau.\end{cases}$$

Therefore, for times $t^*\le t\le\tau$ our ODE becomes

$$\begin{cases}\dot{h}(t)=v(t)\\ \dot{v}(t)=-g+\dfrac{1}{m(t)}\\ \dot{m}(t)=-k\end{cases}\qquad(t^*\le t\le\tau)$$

with $h(\tau)=0$, $v(\tau)=0$, $m(t^*)=m_0$. We solve these dynamics:

$$\begin{cases}m(t)=m_0+k(t^*-t)\\ v(t)=g(\tau-t)+\dfrac{1}{k}\log\Big[\dfrac{m_0+k(t^*-\tau)}{m_0+k(t^*-t)}\Big]\\ h(t)=\text{complicated formula.}\end{cases}$$

Now put $t=t^*$:

$$\begin{cases}m(t^*)=m_0\\ v(t^*)=g(\tau-t^*)+\dfrac{1}{k}\log\Big[\dfrac{m_0+k(t^*-\tau)}{m_0}\Big]\\ h(t^*)=-\dfrac{g(t^*-\tau)^2}{2}-\dfrac{m_0}{k^2}\log\Big[\dfrac{m_0+k(t^*-\tau)}{m_0}\Big]+\dfrac{t^*-\tau}{k}.\end{cases}$$

Suppose the total amount of fuel to start with was $m_1$; so that $m_0-m_1$ is the weight of the empty spacecraft. When $\alpha\equiv 1$, the fuel is used up at rate $k$. Hence

$$k(\tau-t^*)\le m_1,$$

and so $0\le\tau-t^*\le\frac{m_1}{k}$. Before time $t^*$, we set $\alpha\equiv 0$. Then (ODE) reads

$$\begin{cases}\dot{h}=v\\ \dot{v}=-g\\ \dot{m}=0,\end{cases}$$

[Figure: the soft-landing curve in the v–h plane.]

and thus

$$\begin{cases}m(t)\equiv m_0\\ v(t)=-gt+v_0\\ h(t)=-\frac{1}{2}gt^2+tv_0+h_0.\end{cases}$$

We combine the formulas for $v(t)$ and $h(t)$ to discover

$$h(t)=h_0-\frac{1}{2g}\big(v^2(t)-v_0^2\big)\qquad(0\le t\le t^*).$$

We deduce that the freefall trajectory $(v(t),h(t))$ therefore lies on a parabola

$$h=h_0-\frac{1}{2g}\big(v^2-v_0^2\big).$$

[Figure: the freefall parabola meeting the soft-landing curve in the v–h plane.]

If we then move along this parabola until we hit the soft-landing curve from the previous picture, we can then turn on the rocket engine and land safely.

In the second case illustrated below, we miss the switching curve, and hence cannot land safely on the moon by switching only once.

[Figure: a freefall parabola that misses the soft-landing curve.]

To justify our guess about the structure of the optimal control, let us now find the costate $p(\cdot)$ so that $\alpha(\cdot)$ and $x(\cdot)$ described above satisfy (ODE), (ADJ), (M). To do this, we will have to figure out appropriate initial conditions

$$p_1(0)=\lambda_1,\qquad p_2(0)=\lambda_2,\qquad p_3(0)=\lambda_3.$$

We solve (ADJ) for $\alpha(\cdot)$ as above, and find

$$\begin{cases}p_1(t)\equiv\lambda_1&(0\le t\le\tau)\\ p_2(t)=\lambda_2-\lambda_1 t&(0\le t\le\tau)\\ p_3(t)=\begin{cases}\lambda_3&(0\le t\le t^*)\\ \lambda_3+\displaystyle\int_{t^*}^t\frac{\lambda_2-\lambda_1 s}{(m_0+k(t^*-s))^2}\,ds&(t^*\le t\le\tau).\end{cases}\end{cases}$$

Define

$$r(t):=1-\frac{p_2(t)}{m(t)}+p_3(t)\,k;$$

then

$$\dot{r}=-\frac{\dot{p}_2}{m}+\frac{p_2\dot{m}}{m^2}+\dot{p}_3k=\frac{\lambda_1}{m}+\frac{p_2}{m^2}(-k\alpha)+\Big(\frac{p_2\alpha}{m^2}\Big)k=\frac{\lambda_1}{m(t)}.$$

Choose $\lambda_1<0$, so that $r$ is decreasing. We calculate

$$r(t^*)=1-\frac{\lambda_2-\lambda_1 t^*}{m_0}+\lambda_3 k,$$

and then adjust $\lambda_2,\lambda_3$ so that $r(t^*)=0$. Then $r$ is nonincreasing, $r(t^*)=0$, and consequently $r>0$ on $[0,t^*)$, $r<0$ on $(t^*,\tau]$. But (M) says

$$\alpha(t)=\begin{cases}1&\text{if }r(t)<0\\ 0&\text{if }r(t)>0.\end{cases}$$

Thus an optimal control changes just once from $0$ to $1$; and so our earlier guess for $\alpha(\cdot)$ does indeed satisfy the Pontryagin Maximum Principle.
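The following Python sketch (with illustrative parameters $g=1$, $k=0.1$, $h_0=1$, $v_0=0$, $m_0=0.5$; not from the text) simulates this bang-bang strategy: it coasts in free fall and switches to full thrust at the first instant at which a full-thrust trajectory from the current state would reach $h=0$ and $v=0$ simultaneously, i.e. when the state reaches the soft-landing curve.

```python
# A minimal sketch of the free-fall-then-full-thrust landing strategy.
g, k, dt = 1.0, 0.1, 1e-3
h, v, m = 1.0, 0.0, 0.5            # initial height, velocity, mass (illustrative)

def height_when_stopped(h, v, m):
    """Altitude at the moment v returns to 0 if full thrust is applied from now on."""
    while v < 0.0 and h > 0.0:
        h += v * dt
        v += (-g + 1.0 / m) * dt
        m -= k * dt
    return h

t = 0.0
# phase 1: free fall (alpha = 0) until the full-thrust prediction just reaches h = 0,
# i.e. until the state hits the soft-landing curve
while height_when_stopped(h, v, m) > 0.0:
    h += v * dt
    v += -g * dt
    t += dt
t_switch = t
# phase 2: full thrust (alpha = 1) until we stop descending / touch down
while h > 0.0 and v < 0.0:
    h += v * dt
    v += (-g + 1.0 / m) * dt
    m -= k * dt
    t += dt
print(f"switch at t={t_switch:.3f}, end at t={t:.3f}, "
      f"h={h:.4f}, v={v:.4f}, fuel mass left={m:.3f}")   # h and v should be near 0
```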

4.5 Maximum Principle with Transversality Conditions

Consider again the dynamics

$$\text{(ODE)}\qquad \dot{x}(t)=f(x(t),\alpha(t))\qquad(t>0).$$

In this section we discuss another variant problem, one for which the initial position is constrained to lie in a given set $X_0\subseteq\mathbb{R}^n$ and the final position is also constrained to lie within a given set $X_1\subseteq\mathbb{R}^n$.

[Figure: trajectories connecting the starting set $X_0$ to the target set $X_1$.]

So in this model we get to choose the starting point $x^0\in X_0$ in order to maximize

$$\text{(P)}\qquad P[\alpha(\cdot)]=\int_0^\tau r(x(t),\alpha(t))\,dt,$$

where $\tau=\tau[\alpha(\cdot)]$ is the first time we hit $X_1$.

Notation

We will assume that X0,X1 are in fact smooth surfaces in Rn. We let T0 denote the tangent plane to X0 at x0, and T1 the tangent plane to X1 at x1.

Theorem 4.5 (More Transversality Conditions)

Let $\alpha^*(\cdot)$ and $x^*(\cdot)$ solve the problem above, with

$$x^0=x^*(0),\qquad x^1=x^*(\tau^*).$$

Then there exists a function $p^*(\cdot):[0,\tau^*]\to\mathbb{R}^n$ such that (ODE), (ADJ) and (M) hold for $0\le t\le\tau^*$. In addition,

$$\text{(T)}\qquad\begin{cases}p^*(\tau^*)\text{ is perpendicular to }T_1,\\ p^*(0)\text{ is perpendicular to }T_0.\end{cases}$$

We call (T) the transversality conditions.

Remarks and Interpretations

  • If we have $T>0$ fixed and

$$P[\alpha(\cdot)]=\int_0^T r(x(t),\alpha(t))\,dt+g(x(T)),$$

then (T) says

$$p^*(T)=\nabla g(x^*(T)),$$

in agreement with our earlier form of the terminal/transversality condition.

  • Suppose that the surface $X_1$ is the set $X_1=\{x\mid g_k(x)=0,\ k=1,\dots,l\}$. Then (T) says that $p^*(\tau^*)$ belongs to the "orthogonal complement" of the subspace $T_1$. But the orthogonal complement of $T_1$ is the span of $\nabla g_k(x^1)$ $(k=1,\dots,l)$. Thus

$$p^*(\tau^*)=\sum_{k=1}^l\lambda_k\,\nabla g_k(x^1)$$

for some unknown constants $\lambda_1,\dots,\lambda_l$.

4.6 More Applications

4.6.1 Example 1: Distance between Two Sets

As a first and simple example, let

$$\text{(ODE)}\qquad \dot{x}(t)=\alpha(t)$$

for $A=S^1$, the unit sphere in $\mathbb{R}^2$: $a\in S^1$ if and only if $|a|^2=a_1^2+a_2^2=1$. In other words, we are considering only curves that move with unit speed.

We take

$$\text{(P)}\qquad P[\alpha(\cdot)]=-\int_0^\tau|\dot{x}(t)|\,dt=-\text{the length of the curve}=-\int_0^\tau dt=-\text{the time it takes to reach }X_1.$$

We want to minimize the length of the curve and, as a check on our general theory, will prove that the minimum is of course a straight line.

Using the maximum principle

We have

$$H(x,p,a)=f(x,a)\cdot p+r(x,a)=a\cdot p-1=p_1a_1+p_2a_2-1.$$

The adjoint dynamics equation (ADJ) says

$$\dot{p}^*(t)=-\nabla_x H(x^*(t),p^*(t),\alpha^*(t))=0,$$

and therefore

$$p^*(t)\equiv\text{constant}=p^0\ne 0.$$

The maximization principle (M) tells us that

$$H(x^*(t),p^*(t),\alpha^*(t))=\max_{a\in S^1}\big[-1+p_1^0a_1+p_2^0a_2\big].$$

The right-hand side is maximized by $a^0=\frac{p^0}{|p^0|}$, a unit vector that points in the same direction as $p^0$. Thus $\alpha^*(\cdot)\equiv a^0$ is constant in time. According then to (ODE) we have $\dot{x}^*=a^0$, and consequently $x^*(\cdot)$ is a straight line.

Finally, the transversality conditions say that

$$\text{(T)}\qquad p^*(0)\perp T_0,\qquad p^*(\tau^*)\perp T_1.$$

In other words, $p^0\perp T_0$ and $p^0\perp T_1$; and this means that the tangent planes $T_0$ and $T_1$ are parallel.

[Figure: the shortest path between two sets, meeting both boundaries perpendicularly.]

Now all of this is pretty obvious from the picture, but it is reassuring that the general theory predicts the proper answer.

4.6.2 Example 2: Commodity Trading

Next is a simple model for the trading of a commodity, say wheat. We let T be the fixed length of trading period, and introduce the variables

$$\begin{aligned}x_1(t)&=\text{money on hand at time }t,\\ x_2(t)&=\text{amount of wheat owned at time }t,\\ \alpha(t)&=\text{rate of buying or selling of wheat},\\ q(t)&=\text{price of wheat at time }t\text{ (known)},\\ \lambda&=\text{cost of storing a unit amount of wheat for a unit of time.}\end{aligned}$$

We suppose that the price of wheat $q(t)$ is known for the entire trading period $0\le t\le T$ (although this is probably unrealistic in practice). We assume also that the rate of selling and buying is constrained:

$$|\alpha(t)|\le M,$$

where $\alpha(t)>0$ means buying wheat, and $\alpha(t)<0$ means selling.

Our intention is to maximize our holdings at the end time $T$, namely the sum of the cash on hand and the value of the wheat we then own:

$$\text{(P)}\qquad P[\alpha(\cdot)]=x_1(T)+q(T)\,x_2(T).$$

The evolution is

$$\text{(ODE)}\qquad\begin{cases}\dot{x}_1(t)=-\lambda x_2(t)-q(t)\alpha(t)\\ \dot{x}_2(t)=\alpha(t).\end{cases}$$

This is a nonautonomous ( = time dependent) case, but it turns out that the Pontryagin Maximum Principle still applies.

Using the maximum principle

What is our optimal buying and selling strategy? First, we compute the Hamiltonian

$$H(x,p,t,a)=f\cdot p+r=p_1(-\lambda x_2-q(t)a)+p_2a,$$

since $r\equiv 0$. The adjoint dynamics read

$$\text{(ADJ)}\qquad\begin{cases}\dot{p}_1=0\\ \dot{p}_2=\lambda p_1,\end{cases}$$

with the terminal condition

$$\text{(T)}\qquad p(T)=\nabla g(x(T)).$$

In our case $g(x_1,x_2)=x_1+q(T)x_2$, and hence

$$\text{(T)}\qquad\begin{cases}p_1(T)=1\\ p_2(T)=q(T).\end{cases}$$

We then can solve for the costate:

$$\begin{cases}p_1(t)\equiv 1\\ p_2(t)=\lambda(t-T)+q(T).\end{cases}$$

The maximization principle (M) tells us that

$$\begin{aligned}\text{(M)}\qquad H(x(t),p(t),t,\alpha(t))&=\max_{|a|\le M}\{p_1(t)(-\lambda x_2(t)-q(t)a)+p_2(t)a\}\\ &=-\lambda p_1(t)x_2(t)+\max_{|a|\le M}\{a\,(p_2(t)-q(t))\}.\end{aligned}$$

So

$$\alpha(t)=\begin{cases}M&\text{if }q(t)<p_2(t)\\ -M&\text{if }q(t)>p_2(t)\end{cases}$$

for $p_2(t):=\lambda(t-T)+q(T)$.
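A minimal numerical sketch of this trading rule (illustrative price path, horizon and costs; not from the text): apply $\alpha=+M$ when $q(t)<p_2(t)$ and $\alpha=-M$ when $q(t)>p_2(t)$, with $p_2(t)=\lambda(t-T)+q(T)$, and report the final holdings $x_1(T)+q(T)x_2(T)$.

```python
# A minimal sketch of the bang-bang trading rule derived above.
import numpy as np

T, M, lam, dt = 1.0, 1.0, 0.2, 1e-3              # horizon, trade limit, storage cost
q = lambda t: 1.0 + 0.3 * np.sin(4 * np.pi * t)  # known price of wheat (assumed)

p2 = lambda t: lam * (t - T) + q(T)               # costate p2 from (ADJ), (T)
x1, x2 = 1.0, 0.0                                 # cash, wheat
for t in np.arange(0.0, T, dt):
    a = M if q(t) < p2(t) else -M
    x1 += (-lam * x2 - q(t) * a) * dt             # cash: storage cost and purchases
    x2 += a * dt                                  # wheat holdings
    # note: no sign constraints on x1, x2 are enforced (see the Critique below)
print("final holdings:", x1 + q(T) * x2)
```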

Critique

In some situations the amount of money on hand $x_1(\cdot)$ becomes negative for part of the time. The economic problem has a natural constraint $x_2\ge 0$ (unless we can borrow with no interest charges) which we did not take into account in the mathematical model.

4.7 Maximum Principle with State Constraints

We return once again to our usual setting:

$$\text{(ODE)}\qquad\begin{cases}\dot{x}(t)=f(x(t),\alpha(t))\\ x(0)=x^0,\end{cases}\qquad\qquad \text{(P)}\qquad P[\alpha(\cdot)]=\int_0^\tau r(x(t),\alpha(t))\,dt$$

for $\tau=\tau[\alpha(\cdot)]$, the first time that $x(\tau)=x^1$. This is the fixed endpoint problem.

State Constraints

We introduce a new complication by asking that our dynamics $x(\cdot)$ must always remain within a given region $R\subset\mathbb{R}^n$. We will as above suppose that $R$ has the explicit representation

$$R=\{x\in\mathbb{R}^n\mid g(x)\le 0\}$$

for a given function $g(\cdot):\mathbb{R}^n\to\mathbb{R}$.

Definition

It will be convenient to introduce the quantity

$$c(x,a):=\nabla g(x)\cdot f(x,a).$$

Notice that if $x(t)\in\partial R$ for times $s_0\le t\le s_1$, then $c(x(t),\alpha(t))\equiv 0$ $(s_0\le t\le s_1)$. This is so since $f$ is then tangent to $\partial R$, whereas $\nabla g$ is perpendicular.

Theorem 4.6 (Maximum Principle with State Constraints)

Let $\alpha^*(\cdot),x^*(\cdot)$ solve the control theory problem above. Suppose also that $x^*(t)\in\partial R$ for $s_0\le t\le s_1$.

Then there exists a costate function $p^*(\cdot):[s_0,s_1]\to\mathbb{R}^n$ such that (ODE) holds. There also exists $\lambda^*(\cdot):[s_0,s_1]\to\mathbb{R}$ such that for times $s_0\le t\le s_1$ we have

$$\text{(ADJ')}\qquad \dot{p}^*(t)=-\nabla_x H(x^*(t),p^*(t),\alpha^*(t))+\lambda^*(t)\,\nabla_x c(x^*(t),\alpha^*(t));$$

and

$$\text{(M')}\qquad H(x^*(t),p^*(t),\alpha^*(t))=\max_{a\in A}\{H(x^*(t),p^*(t),a)\mid c(x^*(t),a)=0\}.$$

To keep things simple, we have omitted some technical assumptions really needed for the Theorem to be valid.

Remarks and Interpretations

  • Let $A\subset\mathbb{R}^m$ be of this form:

$$A=\{a\in\mathbb{R}^m\mid g_1(a)\le 0,\dots,g_s(a)\le 0\}$$

for given functions $g_1,\dots,g_s:\mathbb{R}^m\to\mathbb{R}$. In this case we can use Lagrange multipliers to deduce from (M') that

$$\text{(M'')}\qquad \nabla_a H(x^*(t),p^*(t),\alpha^*(t))=\lambda^*(t)\,\nabla_a c(x^*(t),\alpha^*(t))+\sum_{i=1}^s\mu_i(t)\,\nabla_a g_i(\alpha^*(t)).$$

The function $\lambda^*(\cdot)$ here is that appearing in (ADJ'). If $x^*(t)$ lies in the interior of $R$ for, say, the times $0\le t<s_0$, then the ordinary Maximum Principle holds.

  • Jump conditions. In the situation above, we always have

$$p(s_0-0)=p(s_0+0),$$

where $s_0$ is a time that $x^*$ hits $\partial R$. In other words, there is no jump in $p$ when we hit the boundary of the constraint region $R$.

However,

$$p(s_1+0)=p(s_1-0)-\lambda(s_1)\,\nabla g(x(s_1));$$

this says there is (possibly) a jump in $p(\cdot)$ when we leave $\partial R$.

4.8 More Applications

4.8.1 Example 1: Shortest Distance between Two Points, Avoiding An Obstacle.

[Figure: the shortest path between two points avoiding the disk $B(0,r)$.]

What is the shortest path between two points that avoids the disk B=B(0,r), as drawn?

Let us take

$$\text{(ODE)}\qquad\begin{cases}\dot{x}(t)=\alpha(t)\\ x(0)=x^0\end{cases}$$

for $A=S^1$, with the payoff

$$\text{(P)}\qquad P[\alpha(\cdot)]=-\int_0^\tau|\dot{x}|\,dt=-\text{length of the curve }x(\cdot).$$

We have

$$H(x,p,a)=f\cdot p+r=p_1a_1+p_2a_2-1.$$

Case 1: avoiding the obstacle

Assume $x(t)\notin\partial B$ on some time interval. In this case, the usual Pontryagin Maximum Principle applies, and we deduce as before that

$$\dot{p}=-\nabla_x H=0.$$

Hence

$$\text{(ADJ)}\qquad p(t)\equiv\text{constant}=p^0.$$

Condition (M) says

$$H(x(t),p(t),\alpha(t))=\max_{a\in S^1}\big(-1+p_1^0a_1+p_2^0a_2\big).$$

The maximum occurs for $\alpha=\frac{p^0}{|p^0|}$. Furthermore,

$$-1+p_1^0\alpha_1+p_2^0\alpha_2\equiv 0,$$

and therefore $\alpha\cdot p^0=1$. This means that $|p^0|=1$, and hence in fact $\alpha=p^0$. We have proved that the trajectory $x(\cdot)$ is a straight line away from the obstacle.

Case 2: touching the obstacle

Suppose now $x(t)\in\partial B$ for some time interval $s_0\le t\le s_1$. Now we use the modified version of the Maximum Principle, provided by Theorem 4.6.

First we must calculate $c(x,a)=\nabla g(x)\cdot f(x,a)$. In our case,

$$R=\mathbb{R}^2\setminus B=\{x\mid x_1^2+x_2^2\ge r^2\}=\{x\mid g:=r^2-x_1^2-x_2^2\le 0\}.$$

Then $\nabla g=\begin{pmatrix}-2x_1\\ -2x_2\end{pmatrix}$. Since $f=\begin{pmatrix}a_1\\ a_2\end{pmatrix}$, we have

$$c(x,a)=-2a_1x_1-2a_2x_2.$$

Now condition (ADJ') implies

$$\dot{p}(t)=-\nabla_x H+\lambda(t)\,\nabla_x c;$$

which is to say,

$$\text{(4.6)}\qquad\begin{cases}\dot{p}_1=-2\lambda\alpha_1\\ \dot{p}_2=-2\lambda\alpha_2.\end{cases}$$

Next, we employ the maximization principle (M'). We need to maximize

$$H(x(t),p(t),a)$$

subject to the requirements that $c(x(t),a)=0$ and $g_1(a)=a_1^2+a_2^2-1=0$, since $A=\{a\in\mathbb{R}^2\mid a_1^2+a_2^2=1\}$. According to (M'') we must solve

$$\nabla_a H=\lambda(t)\,\nabla_a c+\mu(t)\,\nabla_a g_1;$$

that is,

$$\begin{cases}p_1=\lambda(-2x_1)+2\mu\alpha_1\\ p_2=\lambda(-2x_2)+2\mu\alpha_2.\end{cases}$$

We can combine these identities to eliminate $\mu$. Since we also know that $x(t)\in\partial B$, we have $(x_1)^2+(x_2)^2=r^2$; and also $\alpha=(\alpha_1,\alpha_2)$ is tangent to $\partial B$. Using these facts, we find after some calculations that

$$\text{(4.7)}\qquad \lambda=\frac{p_2\alpha_1-p_1\alpha_2}{2r}.$$

But we also know

$$\text{(4.8)}\qquad(\alpha_1)^2+(\alpha_2)^2=1$$

and

$$H\equiv 0=-1+p_1\alpha_1+p_2\alpha_2;$$

hence

$$\text{(4.9)}\qquad p_1\alpha_1+p_2\alpha_2\equiv 1.$$

Solving for the unknowns. We now have the five equations (4.6)–(4.9) for the five unknown functions $p_1,p_2,\alpha_1,\alpha_2,\lambda$ that depend on $t$. We introduce the angle $\theta$, as illustrated, and note that $\frac{d}{d\theta}=r\frac{d}{dt}$. A calculation then confirms that the solutions are

$$\begin{cases}\alpha_1(\theta)=-\sin\theta\\ \alpha_2(\theta)=\cos\theta\\ \lambda=-\dfrac{k+\theta}{2r}\end{cases}$$

and

$$\begin{cases}p_1(\theta)=k\cos\theta-\sin\theta+\theta\cos\theta\\ p_2(\theta)=k\sin\theta+\cos\theta+\theta\sin\theta\end{cases}$$

for some constant $k$.

[Figure: Case 2 — the trajectory moving along the boundary of the disk.]

Case 3: approaching and leaving the obstacle

In general, we must piece together the results from Case 1 and Case 2. So suppose now $x(t)\in R=\mathbb{R}^2\setminus B$ for $0\le t<s_0$ and $x(t)\in\partial B$ for $s_0\le t\le s_1$.

We have shown that for times $0\le t<s_0$ the trajectory $x(\cdot)$ is a straight line. For this case we have shown already that $p\equiv\alpha$, and therefore

$$\begin{cases}p_1\equiv\cos\varphi_0\\ p_2\equiv\sin\varphi_0\end{cases}$$

for the angle $\varphi_0$ as shown in the picture.

By the jump conditions, $p(\cdot)$ is continuous when $x(\cdot)$ hits $\partial B$ at the time $s_0$, meaning in this case that

$$\begin{cases}k\cos\theta_0-\sin\theta_0+\theta_0\cos\theta_0=\cos\varphi_0\\ k\sin\theta_0+\cos\theta_0+\theta_0\sin\theta_0=\sin\varphi_0.\end{cases}$$

These identities hold if and only if

$$\begin{cases}k=-\theta_0\\ \varphi_0-\theta_0=\dfrac{\pi}{2}.\end{cases}$$

The second equality says that the optimal trajectory is tangent to the disk $B$ when it hits $\partial B$.

[Figure: Case 3 — the trajectory approaching and following the obstacle; the angles $\varphi_0,\theta_0$ are indicated.]

We turn next to the trajectory as it leaves $\partial B$: see the next picture. We then have

$$\begin{cases}p_1(\theta_1^-)=-\theta_0\cos\theta_1-\sin\theta_1+\theta_1\cos\theta_1\\ p_2(\theta_1^-)=-\theta_0\sin\theta_1+\cos\theta_1+\theta_1\sin\theta_1.\end{cases}$$

Now our formulas above for $\lambda$ and $k$ imply $\lambda(\theta_1)=\dfrac{\theta_0-\theta_1}{2r}$. The jump conditions give

$$p(\theta_1^+)=p(\theta_1^-)-\lambda(\theta_1)\,\nabla g(x(\theta_1))$$

for $g(x)=r^2-x_1^2-x_2^2$. Then

$$\lambda(\theta_1)\,\nabla g(x(\theta_1))=(\theta_1-\theta_0)\begin{pmatrix}\cos\theta_1\\ \sin\theta_1\end{pmatrix}.$$

Therefore

$$\begin{cases}p_1(\theta_1^+)=-\sin\theta_1\\ p_2(\theta_1^+)=\cos\theta_1,\end{cases}$$

and so the trajectory is tangent to $\partial B$ as it leaves. If we apply the usual Maximum Principle after $x(\cdot)$ leaves $B$, we find

$$\begin{cases}p_1\equiv\text{constant}=\cos\varphi_1\\ p_2\equiv\text{constant}=\sin\varphi_1.\end{cases}$$

Thus

$$\begin{cases}\cos\varphi_1=-\sin\theta_1\\ \sin\varphi_1=\cos\theta_1,\end{cases}$$

and so $\varphi_1-\theta_1=\dfrac{\pi}{2}$.

Critique

We have carried out elaborate calculations to derive some pretty obvious conclusions in this example. It is best to think of this as a confirmation in a simple case of Theorem 4.6, which applies in far more complicated situations.

4.8.2 An Inventory Control Model

Now we turn to a simple model for ordering and storing items in a warehouse. Let the time period T>0 be given, and introduce the variables

$$\begin{aligned}x(t)&=\text{amount of inventory at time }t,\\ \alpha(t)&=\text{rate of ordering from manufacturers},\ \alpha\ge 0,\\ d(t)&=\text{customer demand (known)},\\ \gamma&=\text{cost of ordering 1 unit},\\ \beta&=\text{cost of storing 1 unit.}\end{aligned}$$

Our goal is to fill all customer orders shipped from our warehouse, while keeping our storage and ordering costs at a minimum. Hence the payoff to be maximized is

$$\text{(P)}\qquad P[\alpha(\cdot)]=-\int_0^T\gamma\,\alpha(t)+\beta\,x(t)\,dt.$$

We have $A=[0,\infty)$ and the constraint that $x(t)\ge 0$. The dynamics are

$$\text{(ODE)}\qquad\begin{cases}\dot{x}(t)=\alpha(t)-d(t)\\ x(0)=x^0>0.\end{cases}$$

Guessing the optimal strategy

Let us just guess the optimal control strategy: we should at first not order anything ( α=0 ) and let the inventory in our warehouse fall off to zero as we fill demands; thereafter we should order just enough to meet our demands ( α=d ).


Using the maximum principle

We will prove this guess is right, using the Maximum Principle. Assume first that $x(t)>0$ on some interval $[0,s_0]$. We then have

$$H(x,p,a,t)=(a-d(t))\,p-\gamma a-\beta x,$$

and (ADJ) says $\dot{p}=-\nabla_x H=\beta$. Condition (M) implies

$$\begin{aligned}H(x(t),p(t),\alpha(t),t)&=\max_{a\ge 0}\{-\gamma a-\beta x(t)+p(t)(a-d(t))\}\\ &=-\beta x(t)-p(t)d(t)+\max_{a\ge 0}\{a\,(p(t)-\gamma)\}.\end{aligned}$$

Thus

$$\alpha(t)=\begin{cases}0&\text{if }p(t)\le\gamma\\ +\infty&\text{if }p(t)>\gamma.\end{cases}$$

If $\alpha(t)\equiv+\infty$ on some interval, then $P[\alpha(\cdot)]=-\infty$, which is impossible, because there exists a control with finite payoff. So it follows that $\alpha(\cdot)\equiv 0$ on $[0,s_0]$: we place no orders.

According to (ODE), we have

$$\begin{cases}\dot{x}(t)=-d(t)&(0\le t\le s_0)\\ x(0)=x^0.\end{cases}$$

Thus $s_0$ is the first time the inventory hits $0$. Now since $x(t)=x^0-\int_0^t d(s)\,ds$, we have $x(s_0)=0$. That is, $\int_0^{s_0}d(s)\,ds=x^0$ and we have hit the constraint. Now use the Pontryagin Maximum Principle with state constraint for times $t\ge s_0$:

$$R=\{x\ge 0\}=\{g(x):=-x\le 0\}$$

and

$$c(x,a,t)=\nabla g(x)\cdot f(x,a,t)=(-1)(a-d(t))=d(t)-a.$$

We have

$$\text{(M')}\qquad H(x(t),p(t),\alpha(t),t)=\max_{a\ge 0}\{H(x(t),p(t),a,t)\mid c(x(t),a,t)=0\}.$$

But $c(x(t),\alpha(t),t)=0$ if and only if $\alpha(t)=d(t)$. Then (ODE) reads

$$\dot{x}(t)=\alpha(t)-d(t)=0,$$

and so $x(t)=0$ for all times $t\ge s_0$. We have confirmed that our guess for the optimal strategy was right.
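As a small numerical check of this conclusion (illustrative demand, costs and horizon; not from the text), the sketch below compares the wait-then-match strategy with ordering at the demand rate from the start; the former incurs both lower ordering and lower storage costs.

```python
# A minimal sketch comparing the two ordering strategies for the inventory model.
import numpy as np

T, x0, gamma, beta, dt = 4.0, 1.0, 1.0, 0.5, 1e-3
d = lambda t: 0.5 + 0.25 * np.sin(t)       # known customer demand (assumed)

def cost(order_rule):
    x, c = x0, 0.0
    for t in np.arange(0.0, T, dt):
        a = order_rule(t, x)
        c += (gamma * a + beta * x) * dt   # ordering plus storage cost
        x = max(x + (a - d(t)) * dt, 0.0)  # inventory never goes negative
    return c

wait_then_match = lambda t, x: 0.0 if x > 0.0 else d(t)
match_from_start = lambda t, x: d(t)
print("wait-then-match :", cost(wait_then_match))
print("match-from-start:", cost(match_from_start))
```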

4.9 Numerical Solution of the Two Point Boundary Value Problem (TPBVP)

Reference: Chapter 12.6 in Numerical Optimal Control by Moritz Diehl and Sebastien Gros

First, let us note the main drawbacks of the indirect approach:

  1. it must be possible to eliminate the controls from the problem by algebraic manipulations, which is not always straightforward or might even be impossible;
  2. the optimal controls might be a discontinuous function of x and p, such that the BVP is possibly given by a non-smooth differential equation;
  3. the differential equation might become very nonlinear and unstable and not suitable for a forward simulation.

All these issues of the indirect approach can be partially addressed and, most importantly, the approach offers an exact and elegant characterization of the solution of optimal control problems in continuous time.

In this section we address the question of how we can compute a solution of the boundary value problem (BVP) in the indirect approach. The remarkable observation is that the only non-trivial unknown is the initial value for the adjoints, p(0). Once this value has been found, the complete optimal trajectory can in principle be recovered by a forward simulation of the combined differential equation. Let us first recall that the BVP that we want to solve is given as

$$\begin{aligned}
\text{(4.1a)}\qquad & b_0=x(0)-x_0=0,\\
\text{(4.1b)}\qquad & b_T=p(T)-\nabla g(x(T))=0,\\
\text{(4.1c)}\qquad & \dot{x}(t)-\nabla_p H(x(t),p(t))=0,\quad t\in[0,T],\\
\text{(4.1d)}\qquad & \dot{p}(t)+\nabla_x H(x(t),p(t))=0,\quad t\in[0,T].
\end{aligned}$$

Using the shorthands

$$y=\begin{bmatrix}x\\ p\end{bmatrix},\qquad \varphi(y)=\begin{bmatrix}\nabla_p H(x,p)\\ -\nabla_x H(x,p)\end{bmatrix}$$

and

$$b(y(0),y(T),x_0)=\begin{bmatrix}b_0(y(0),x_0)\\ b_T(y(T))\end{bmatrix},$$

the system of equations can be summarized as:

$$\text{(4.2a)}\qquad b(y(0),y(T),x_0)=0,$$
$$\text{(4.2b)}\qquad \dot{y}(t)-\varphi(y(t))=0,\quad t\in[0,T].$$

This BVP has $2n_x$ differential equations $\dot{y}=\varphi$ and $2n_x$ boundary conditions $b$, and is therefore usually well defined. We detail here three approaches to solving this TPBVP numerically: single shooting, multiple shooting, and collocation.

4.9.1 Single shooting

Single shooting starts with the following idea: for any guess of the initial value $p_0$, we can use a numerical integration routine in order to obtain the state-costate trajectory as a function of $p_0,x_0$, i.e. $y(t,p_0,x_0)$ for all $t\in[0,T]$. This is visualized in Figure 4.9.1. The result is that the differential equation (4.2b) is by construction already satisfied, as well as the initial boundary condition (4.1a). Thus, we only need to enforce the boundary condition (4.1b), which we can do using the terminal trajectory value $y(T,p_0,x_0)$:

$$b_T(y(T,p_0,x_0))=:R_T(p_0)=0.$$

For nonlinear dynamics $\varphi$, this equation can generally not be solved explicitly. We then use Newton's method, starting from an initial guess $p_0^0$ and iterating to the solution, i.e. we iterate

$$\text{(4.3)}\qquad p_0^{k+1}=p_0^k-t^k\Big(\frac{\partial R_T}{\partial p_0}(p_0^k)\Big)^{-1}R_T(p_0^k)$$

for some adequate step-size $t^k\in\,]0,1]$. It is important to note that in order to evaluate $\frac{\partial R_T}{\partial p_0}(p_0^k)$ we have to compute the ODE sensitivities $\frac{\partial y(T,y_0)}{\partial p_0}$.

In some cases, as noted above, the forward simulation of the combined ODE might be an ill-conditioned problem, so that single shooting cannot be employed. Even if the forward simulation problem is well defined, the region of attraction of the Newton iteration on $R_T(p_0)=0$ can be very small, such that a good guess for $p_0$ is often required. However, such a guess is typically unavailable. In the following example, we illustrate these observations on a simple optimal control problem.

Example 4.9.1

We consider the optimal control problem:

$$\text{(4.4)}\qquad\begin{aligned}\min_{x(\cdot),\,u(\cdot)}\ &\int_0^T x_1(t)^2+10\,x_2(t)^2+u(t)^2\,dt\\ \text{subject to}\quad &\dot{x}_1(t)=x_1(t)x_2(t)+u(t),\qquad x_1(0)=0,\\ &\dot{x}_2(t)=x_1(t),\qquad\qquad\qquad\ \ x_2(0)=1,\end{aligned}$$

with $T=5$. This example does not have a terminal cost or constraints, such that the terminal condition reads $R_T(p_0)=p(T,p_0,x_0)=0$. The state-costate trajectory at the solution is displayed in Figure 4.9.1. It is then interesting to build the function $p_0\mapsto p(T,p_0,x_0)$ for various values of $p_0$, see Figure 4.9.2. This function is very nonlinear, making it difficult for the Newton iteration to find the co-states' initial value $p_0$ resulting in $p(T,p_0,x_0)=0$. More specifically, the Newton iteration (full steps or reduced steps) converges only for a specific set of initial guesses $p_0^0$ provided to the iteration (4.3), see Figure 4.9.3.
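The following Python sketch (not from the reference) implements indirect single shooting for problem (4.4), using the minimization convention $H=L+p^\top f$, so that the stationarity condition gives $u=-\frac{p_1}{2}$ and the costates obey $\dot{p}=-\nabla_x H$. The residual $R_T(p_0)=p(T,p_0,x_0)$ is driven to zero by a damped Newton iteration with a finite-difference Jacobian; the initial guess for $p_0$ is an assumption and, as discussed above, convergence depends strongly on it.

```python
# A minimal single-shooting sketch for problem (4.4).
import numpy as np
from scipy.integrate import solve_ivp

T, x_init = 5.0, np.array([0.0, 1.0])

def rhs(t, y):
    x1, x2, p1, p2 = y
    u = -0.5 * p1                       # stationarity: dH/du = 2u + p1 = 0
    return [x1 * x2 + u,                # x1' = x1 x2 + u
            x1,                         # x2' = x1
            -(2 * x1 + p1 * x2 + p2),   # p1' = -dH/dx1
            -(20 * x2 + p1 * x1)]       # p2' = -dH/dx2

def residual(p0):
    y0 = np.concatenate([x_init, p0])
    sol = solve_ivp(rhs, (0.0, T), y0, rtol=1e-8, atol=1e-8)
    return sol.y[2:, -1]                # R_T(p0) = p(T; p0, x0)

p0 = np.array([3.0, 8.0])               # initial guess (assumed, near the solution)
for _ in range(20):
    r = residual(p0)
    if np.linalg.norm(r) < 1e-6:
        break
    eps = 1e-6                           # finite-difference Jacobian of R_T
    J = np.column_stack([(residual(p0 + eps * np.array([1.0, 0.0])) - r) / eps,
                         (residual(p0 + eps * np.array([0.0, 1.0])) - r) / eps])
    p0 = p0 - 0.5 * np.linalg.solve(J, r)   # damped Newton step (t^k = 0.5)
print("p(0) =", p0, " ||R_T|| =", np.linalg.norm(residual(p0)))
```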


Figure 4.9.1: Illustration of the state and co-state trajectories for example 4.9.1 at the solution delivering p(T,p0,x0)=0 for T=5, note that λ in the image is actually p.


Figure 4.9.2: Illustration of the map p0p(T,p0,x0), in the form of level curves for T=5. The black dot represents the solution of the TPBVP problem, where RT(p0)=p(T,p0,x0)=0. One can observe that the map is very nonlinear, such that the Newton method can struggle to converge to the solution p0 of the TPBVP, unless a very good initial guess is provided. Note that λ in the image is actually p


Figure 4.9.3: Illustration of the region of convergence of the Newton iteration (4.3) for problem (4.4) (in black, with full Newton steps on the left-hand side graph and with reduced steps on the right-hand side graph). Here $p_{0,1},p_{0,2}$ denote the components of the initial guess provided to the Newton iteration. The grey dots (at $(3.22,8.48)$) depict the solution to the TPBVP. Only a fairly small, disconnected and highly convoluted set of initial guesses for the co-states' initial conditions leads to convergence of the Newton iteration. Note that $\lambda$ in the image is actually $p$.

A crucial observation that will motivate an alternative to the single-shooting approach is illustrated in Figure 4.9.4, where the map p0p(t,p0,x0) is displayed for the integration times t=3 and t=4. The crucial observation here is that the map is fairly linear up to t=3, and becomes increasingly nonlinear for larger integration times. This observation is general and motivates the core idea behind the alternatives to single shooting, namely that integration shall never be performed over long time intervals, so as to avoid creating strongly nonlinear functions in the TPBVP.


Figure 4.9.4: Illustration of the map p0p(t,p0,x0), in the form of level curves for different times t. The black dot represents the solution of the TPBVP problem, where p(T,p0,x0)=0. One can observe that the map is close to linear for "small" integration times t (upper graphs, where t=3 ), and becomes increasingly nonlinear as the integration time increases (lower graph, where t=4 ), until it reaches the final time T=5, see Figure 4.9.2. This observation is general, and holds for most problems.

4.9.2 Multiple shooting

The nonlinearity of the integration map $p_0\mapsto y(t,p_0,x_0)$ for long integration times $t$ motivates "breaking down" the full integration into small pieces, so as to avoid creating very nonlinear maps in the TPBVP conditions. The idea is originally due to Osborne, and is based on dividing the time interval $[0,T]$ into (typically uniform) shooting intervals $[t_k,t_{k+1}]\subseteq[0,T]$, where the most common choice is $t_k=k\frac{T}{N}$. Let us then frame the integration over a short time interval $[t_k,t_{k+1}]$ with initial value $\beta_k$ as the function $\Phi_k(\beta_k)$, defined by:

$$\text{(4.5a)}\qquad \Phi_k(\beta_k)=y(t_{k+1})\quad\text{where}$$
$$\text{(4.5b)}\qquad \dot{y}(t)-\varphi(y(t))=0,\quad t\in[t_k,t_{k+1}],\quad\text{and}\quad y(t_k)=\beta_k$$

for $k=0,\dots,N-1$. We then rewrite the TPBVP conditions (4.2) as:

$$\text{(4.6a)}\qquad b(\beta_0,\beta_N,x_0)=0\qquad\text{(boundary conditions)}$$
$$\text{(4.6b)}\qquad \Phi_k(\beta_k)-\beta_{k+1}=0,\quad k=0,\dots,N-1\qquad\text{(continuity conditions)}$$

One can then gather the conditions (4.6) altogether as the function:

$$\text{(4.7)}\qquad R_{\mathrm{MS}}(\beta,x_0)=0,$$

where we write $\beta=(\beta_0,\dots,\beta_N)$. A Newton iteration can then be deployed on (4.7) to find the variables $\beta$; it reads

$$\text{(4.8)}\qquad \beta^{k+1}=\beta^k-t^k\Big(\frac{\partial R_{\mathrm{MS}}}{\partial\beta}(\beta^k,x_0)\Big)^{-1}R_{\mathrm{MS}}(\beta^k,x_0)$$

for some step-size $t^k\in\,]0,1]$. We illustrate the multiple-shooting approach in the following example.

Example 4.9.2

We consider the optimal control problem (4.4) from Example 4.9.1 with $T=5$. If we denote $\beta_k=(x_k,p_k)$, the boundary conditions for this example then become:

$$x_0=\begin{bmatrix}0\\ 1\end{bmatrix},\qquad p_N=0.$$

We illustrate the multiple-shooting procedure (4.8) in Figure 4.9.5 for $N=5$.


Figure 4.9.5: Illustration of the state and co-state trajectories for problem (4.4) during the multiple-shooting iterations (4.8), such that the conditions $\Phi_k(\beta_k)-\beta_{k+1}=0$ are not yet fulfilled. Here, the discrete times $t_k$ are depicted as grey dashed lines, the discrete state-costates $\beta_k=(x_{k,1},x_{k,2},p_{k,1},p_{k,2})$ are depicted as black dots, and the resulting integrations $\Phi_k=(\Phi_{k,1}^x,\Phi_{k,2}^x,\Phi_{k,1}^p,\Phi_{k,2}^p)$ are depicted as white dots. The black curves represent the state-costate trajectories on the various time intervals $[t_k,t_{k+1}]$. At the solution of (4.7), the conditions $\Phi_k(\beta_k)=\beta_{k+1}$ are enforced for $k=0,\dots,N-1$, such that the black and white dots coincide at each discrete time $t_k$.

One ought to observe that the time intervals $[t_k,t_{k+1}]$ are of size $\frac{T}{N}$, and hence get shorter as $N$ increases. Because one can "control" the length of the time interval over which the integration is performed via $N$, and because the functions $\Phi_k(\beta_k)-\beta_{k+1}$ become less nonlinear as the length of the time interval decreases, one can make them "arbitrarily" linear by increasing $N$. It follows that a sufficiently large $N$ typically allows one to solve the multiple-shooting conditions (4.6) using a Newton iteration even if no good initial guess is available.
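A minimal multiple-shooting sketch for the same problem (same assumed state-costate dynamics as in the single-shooting sketch; not from the reference): the unknowns are the node values $\beta_0,\dots,\beta_N$, the residual $R_{\mathrm{MS}}$ stacks the boundary conditions with the continuity conditions $\Phi_k(\beta_k)-\beta_{k+1}=0$, and a damped Newton iteration with a finite-difference Jacobian is applied, here starting from a crude initial guess.

```python
# A minimal indirect multiple-shooting sketch for problem (4.4).
import numpy as np
from scipy.integrate import solve_ivp

T, N, x_init = 5.0, 5, np.array([0.0, 1.0])
t_nodes = np.linspace(0.0, T, N + 1)           # shooting grid t_k = k T / N

def rhs(t, y):                                 # combined state-costate ODE y' = phi(y)
    x1, x2, p1, p2 = y
    u = -0.5 * p1
    return [x1 * x2 + u, x1, -(2 * x1 + p1 * x2 + p2), -(20 * x2 + p1 * x1)]

def Phi(beta_k, t0, t1):
    """Integrate the state-costate ODE over one shooting interval."""
    return solve_ivp(rhs, (t0, t1), beta_k, rtol=1e-8, atol=1e-8).y[:, -1]

def R_MS(beta_flat):
    beta = beta_flat.reshape(N + 1, 4)
    res = [beta[0, :2] - x_init,               # boundary condition x(0) = x0
           beta[N, 2:]]                        # boundary condition p(T) = 0
    for k in range(N):                         # continuity conditions
        res.append(Phi(beta[k], t_nodes[k], t_nodes[k + 1]) - beta[k + 1])
    return np.concatenate(res)

beta = np.zeros((N + 1) * 4)                   # crude initial guess
beta[1::4] = 1.0                               # x2-component of every node
for _ in range(30):
    r = R_MS(beta)
    if np.linalg.norm(r) < 1e-6:
        break
    eps = 1e-6                                 # finite-difference Jacobian
    J = np.column_stack([(R_MS(beta + eps * np.eye(beta.size)[:, i]) - r) / eps
                         for i in range(beta.size)])
    beta = beta - 0.5 * np.linalg.solve(J, r)  # damped Newton step
print("||R_MS|| =", np.linalg.norm(R_MS(beta)))
```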


Figure 4.9.6: Illustration of the sparsity pattern of the Jacobian matrix $\frac{\partial R_{\mathrm{MS}}}{\partial\beta}$ in the Newton iteration (4.8) for the optimal control problem (4.4) approached via indirect multiple shooting, for Example 4.9.2. Here we use $N=5$. One can readily observe that the Jacobian matrix is sparse and highly structured. This structure arises via organising the algebraic conditions (4.7) and the variables $\beta$ in time (i.e. in the order $k=0,\dots,N$). Note that here the last variables $\beta_N$ were eliminated using the equality $\beta_N=\Phi_{N-1}(\beta_{N-1})$. In the specific case of Example 4.9.2, the elimination has no impact on the Newton iteration because the boundary conditions $b(\beta_0,\beta_N,x_0)$ are linear.

It is important to observe that the set of algebraic conditions (4.7) holds a large number of variables β, such that the Newton iteration (4.8) is deployed using large Jacobian matrices RMSβ. However, these matrices are sparse, and if the algebraic conditions and variables are adequately organised, they are highly structured (see Figure 4.9.6), such that their factorization can be performed efficiently.

The second alternative to single-shooting is the object of the next section, and can be construed as an extreme case of Multiple-Shooting. We detail this next.

4.9.3 Collocation & Pseudo-spectral methods

The second alternative approach to single shooting is to use simultaneous collocation or pseudo-spectral methods. As we will see next, the two approaches are fairly similar. The key idea behind these methods is to introduce all the variables involved in the integration of the dynamics, together with the related algebraic conditions, into the set of algebraic equations to be solved.

The most common implementation of this idea is based on the Orthogonal Collocation method.

We consider the collocation-based integration of the state-costate dynamics on a time interval $[t_k,t_{k+1}]$, starting from the initial value $\beta_k$, as described by the collocation equations

$$c_k(v_k,t_{k,i},\beta_k)=\begin{bmatrix}v_{k,0}-\beta_k\\ \dot{p}_k(t_{k,1},v_k)-f(v_{k,1},t_{k,1})\\ \vdots\\ \dot{p}_k(t_{k,d},v_k)-f(v_{k,d},t_{k,d})\end{bmatrix}=0,$$

where $v_k$ collects the coefficients of the interpolating polynomial $p_k(\cdot,v_k)$.

The integration is then based on solving a set of collocation equations:

$$\text{(4.10a)}\qquad v_{k,0}=\beta_k,$$
$$\text{(4.10b)}\qquad \dot{p}(t_{k,i},v_k)=\varphi(v_{k,i},t_{k,i}),\quad i=1,\dots,d,$$

for $k=0,\dots,N-1$, where $t_{k,i}\in[t_k,t_{k+1}]$ for $i=0,\dots,d$, and the variables $v_k\in\mathbb{R}^{2n_x(d+1)}$ hold the discretisation of the continuous state-costate dynamics. The TPBVP discretised using orthogonal collocation then holds the variables $v_{k,i}$ and $\beta_k$ for $k=0,\dots,N-1$ and $i=1,\dots,d$, and the following constraints:

$$\begin{aligned}
\text{(4.11a)}\qquad & b(\beta_0,\beta_N,x_0)=0 &&\text{(boundary condition)},\\
\text{(4.11b)}\qquad & p(t_{k+1},v_k)-\beta_{k+1}=0 &&\text{(continuity condition)},\\
\text{(4.11c)}\qquad & v_{k,0}-\beta_k=0 &&\text{(initial values)},\\
\text{(4.11d)}\qquad & \dot{p}(t_{k,i},v_k)-\varphi(v_{k,i},t_{k,i})=0 &&\text{(dynamics)}.
\end{aligned}$$

One can observe that equations (4.11b) and (4.11c) are linear, while equation (4.11d) is nonlinear when the dynamics are nonlinear. One can also observe that the variables $\beta_0,\dots,\beta_{N-1}$ can actually be eliminated from (4.11), to yield a slightly more compact set of equations, with $k=0,\dots,N-1$ and $i=1,\dots,d$:

$$\begin{aligned}
\text{(4.12a)}\qquad & b(v_{0,0},v_{N,0},x_0)=0 &&\text{(boundary condition)},\\
\text{(4.12b)}\qquad & p(t_{k+1},v_k)-v_{k+1,0}=0 &&\text{(continuity condition)},\\
\text{(4.12c)}\qquad & \dot{p}(t_{k,i},v_k)-\varphi(v_{k,i},t_{k,i})=0 &&\text{(dynamics)}.
\end{aligned}$$

This elimination does not modify the behavior of the Newton iteration. We can gather the algebraic conditions (4.12) and the variables $v_{k,i}$ in the compact form

$$\text{(4.13)}\qquad R_{\mathrm{IC}}(w,x_0)=0,$$

where $w=\{v_{0,0},\dots,v_{0,d},\dots,v_{N-1,0},\dots,v_{N-1,d},v_{N,0}\}$. A Newton iteration can then be deployed on (4.13) to find the variables $w$; it reads

$$\text{(4.14)}\qquad w^{k+1}=w^k-t^k\Big(\frac{\partial R_{\mathrm{IC}}}{\partial w}(w^k,x_0)\Big)^{-1}R_{\mathrm{IC}}(w^k,x_0)$$

for some step-size $t^k\in\,]0,1]$. We illustrate the indirect collocation approach in the following example.


Figure 4.9.7: Illustration of the state and co-state trajectories for example 4.9.1 using the orthogonal collocation approach with N=20. The grey curves display the state-costate trajectories after the first full Newton step of (4.14), while the black curves report the state-costate trajectories at convergence. The discrete times tk are depicted as grey dashed lines, the discrete state-costates on the time grid tk,i are depicted as dots. Note that the continuity conditions (4.11b) in the collocation method are linear in the variables w, such that the trajectories are continuous after the first full Newton step (hence the grey curves are continuous, even though the problem is not solved yet).

Example 4.9.3

We consider the optimal control problem from Example 4.9.1 with $T=5$. We illustrate the orthogonal collocation procedure (4.11) in Figure 4.9.7 for $N=10$. The sparsity pattern of the Jacobian matrix $\frac{\partial R_{\mathrm{IC}}}{\partial w}$ from the Newton iteration (4.14) is illustrated in Figure 4.9.8. The variables and constraints were ordered with respect to time. Even though it is large, the complexity of forming factorisations of the Jacobian matrix $\frac{\partial R_{\mathrm{IC}}}{\partial w}$ is limited, as it is sparse and highly structured.


Figure 4.9.8: Illustration of the sparsity structure for the Jacobian RICw in the Newton iteration (4.14)

Pseudo-spectral methods

Pseudo-spectral methods deploy a very similar approach to the one described here, with the exception that they skip the division of the time interval $[0,T]$ into subintervals $[t_k,t_{k+1}]$, and use a single set of basis functions spanning the entire time interval $[0,T]$. Pseudo-spectral methods for the TPBVP problem (4.2) can then be framed as:

$$\text{(4.15a)}\qquad b(p(0,v),p(T,v),x_0)=0,$$
$$\text{(4.15b)}\qquad \dot{p}(t_k,v)-\varphi(p(t_k,v),t_k)=0,\qquad k=1,\dots,n,$$

where $t_k\in[0,T]$, and the variables $v\in\mathbb{R}^{n_x\cdot n}$ hold the discretization of the continuous dynamics. Because they attempt to capture the state trajectories in a single function $p(t,v)$, with $t\in[0,T]$, the Newton iteration solving the constraints (4.15) generally involves a dense Jacobian matrix, for which structure-exploiting linear algebra is generally ineffective.
