
4 The Pontryagin Maximum Principle

This important chapter moves us beyond the linear dynamics assumed in Chapters 2 and 3, to consider much wider classes of optimal control problems, to introduce the fundamental Pontryagin Maximum Principle, and to illustrate its uses in a variety of examples.

4.1 Calculus of Variations, Hamiltonian Dynamics

We begin in this section with a quick introduction to some variational methods. These ideas will later serve as motivation for the Pontryagin Maximum Principle.

Assume we are given a smooth function $L:\mathbb{R}^n\times\mathbb{R}^n\to\mathbb{R}$, $L=L(x,v)$; $L$ is called the Lagrangian. Let $T>0$ and points $x^0,x^1\in\mathbb{R}^n$ be given.

Basic Problem of the Calculus of Variations

Find a curve $x^*(\cdot):[0,T]\to\mathbb{R}^n$ that minimizes the functional

$$I[x(\cdot)]:=\int_0^T L(x(t),\dot{x}(t))\,dt$$

among all functions $x(\cdot)$ satisfying $x(0)=x^0$ and $x(T)=x^1$.

Now assume $x^*(\cdot)$ solves our variational problem. The fundamental question is this: how can we characterize $x^*(\cdot)$?

4.1.1 Derivation of Euler-Lagrange Equations

Notation

We write $L=L(x,v)$, and regard the variable $x$ as denoting position, the variable $v$ as denoting velocity. The partial derivatives of $L$ are

$$L_{x_i}=\frac{\partial L}{\partial x_i},\qquad L_{v_i}=\frac{\partial L}{\partial v_i}\qquad(1\le i\le n),$$

and we write

$$\nabla_x L:=(L_{x_1},\dots,L_{x_n}),\qquad \nabla_v L:=(L_{v_1},\dots,L_{v_n}).$$

Theorem 4.1: Euler-Lagrange Equations

Let $x^*(\cdot)$ solve the calculus of variations problem. Then $x^*(\cdot)$ solves the Euler–Lagrange differential equations:

$$\text{(E-L)}\qquad \frac{d}{dt}\Big[\nabla_v L(x^*(t),\dot{x}^*(t))\Big]=\nabla_x L(x^*(t),\dot{x}^*(t)).$$

The significance of the preceding theorem is that if we can solve the Euler–Lagrange equations (E-L), then the solution of our original calculus of variations problem (assuming it exists) will be among the solutions of (E-L).

Note that (E-L) is a quasilinear system of $n$ second-order ODEs. The $i$th component of the system reads

$$\frac{d}{dt}\Big[L_{v_i}(x^*(t),\dot{x}^*(t))\Big]=L_{x_i}(x^*(t),\dot{x}^*(t)).$$

Proof:

  1. Select any smooth curve $y:[0,T]\to\mathbb{R}^n$ satisfying $y(0)=y(T)=0$ (this requirement ensures the perturbed curve still satisfies the endpoint conditions). Define

$$i(\tau):=I[x(\cdot)+\tau y(\cdot)]$$

for $\tau\in\mathbb{R}$ and $x(\cdot)=x^*(\cdot)$. (To simplify notation we omit the superscript $*$.) Notice that $x(\cdot)+\tau y(\cdot)$ takes on the proper values at the endpoints $0,T$. Hence, since $x(\cdot)$ is a minimizer for the variational problem, we have

$$i(\tau)\ge I[x(\cdot)]=i(0).$$

Consequently $i(\cdot)$ has a minimum at $\tau=0$, and so

$$i'(0)=0.$$
  2. We must compute $i'(\tau)$. Note first that

$$i(\tau)=\int_0^T L(x(t)+\tau y(t),\dot{x}(t)+\tau\dot{y}(t))\,dt,$$

and hence, by the chain rule,

$$i'(\tau)=\int_0^T\Big(\sum_{i=1}^n L_{x_i}(x(t)+\tau y(t),\dot{x}(t)+\tau\dot{y}(t))\,y_i(t)+\sum_{i=1}^n L_{v_i}(\cdots)\,\dot{y}_i(t)\Big)\,dt.$$

Let $\tau=0$. Then

$$0=i'(0)=\sum_{i=1}^n\int_0^T L_{x_i}(x(t),\dot{x}(t))\,y_i(t)+L_{v_i}(x(t),\dot{x}(t))\,\dot{y}_i(t)\,dt.$$

This equality holds for all choices of $y:[0,T]\to\mathbb{R}^n$ with $y(0)=y(T)=0$.

  3. Fix any $1\le j\le n$. Choose $y(\cdot)$ so that

$$y_j(t)=\psi(t),\qquad y_i(t)\equiv 0\quad(i\ne j),$$

where $\psi$ is an arbitrary function. Use this choice of $y(\cdot)$ above:

$$0=\int_0^T L_{x_j}(x(t),\dot{x}(t))\,\psi(t)+L_{v_j}(x(t),\dot{x}(t))\,\dot{\psi}(t)\,dt.$$

Integrate by parts, recalling that $\psi(0)=\psi(T)=0$, so that the boundary term $\big[L_{v_j}\psi(t)\big]\big|_0^T=0$:

$$0=\int_0^T\Big[L_{x_j}(x(t),\dot{x}(t))-\frac{d}{dt}\big(L_{v_j}(x(t),\dot{x}(t))\big)\Big]\psi(t)\,dt.$$

This holds for all $\psi:[0,T]\to\mathbb{R}$ with $\psi(0)=\psi(T)=0$, and therefore

$$L_{x_j}(x(t),\dot{x}(t))-\frac{d}{dt}\big(L_{v_j}(x(t),\dot{x}(t))\big)=0$$

for all times $0\le t\le T$. To see this, observe that otherwise $L_{x_j}-\frac{d}{dt}(L_{v_j})$ would be, say, positive on some subinterval $I\subseteq[0,T]$. Choose $\psi\equiv 0$ off $I$ and $\psi>0$ on $I$. Then

$$\int_0^T\Big(L_{x_j}-\frac{d}{dt}(L_{v_j})\Big)\psi\,dt>0,$$

a contradiction. $\square$

4.1.2 Conversion to Hamilton's Equations

Definition: generalized momentum

For the given curve $x(\cdot)$, define

$$p(t):=\nabla_v L(x(t),\dot{x}(t))\qquad(0\le t\le T).$$

We call $p(\cdot)$ the generalized momentum.

Our intention now is to rewrite the Euler–Lagrange equations as a system of first–order ODE for x(),p().

Important Hypothesis: Assume that for all $x,p\in\mathbb{R}^n$, we can solve the equation

$$p=\nabla_v L(x,v)$$

for $v$ in terms of $x$ and $p$; that is, we suppose we can solve this identity for $v=v(x,p)$.

Definition: dynamical systems Hamiltonian H

Define the dynamical systems Hamiltonian $H:\mathbb{R}^n\times\mathbb{R}^n\to\mathbb{R}$ by the formula

$$H(x,p)=p\cdot v(x,p)-L(x,v(x,p)),$$

where $v$ is defined above.

Notation

The partial derivatives of $H$ are

$$H_{x_i}=\frac{\partial H}{\partial x_i},\qquad H_{p_i}=\frac{\partial H}{\partial p_i}\qquad(1\le i\le n),$$

and we write

$$\nabla_x H:=(H_{x_1},\dots,H_{x_n}),\qquad \nabla_p H:=(H_{p_1},\dots,H_{p_n}).$$

Theorem 4.2: Hamiltonian Dynamics

Let $x(\cdot)$ solve the Euler-Lagrange equations (E-L) and define $p(\cdot)$ as above. Then the pair $(x(\cdot),p(\cdot))$ solves Hamilton's equations:

$$\begin{cases}\dot{x}(t)=\nabla_p H(x(t),p(t))&\text{(ODE)}\\ \dot{p}(t)=-\nabla_x H(x(t),p(t))&\text{(ADJ)}\end{cases}$$

Furthermore, the mapping $t\mapsto H(x(t),p(t))$ is constant.

Proof: Recall that $H(x,p)=p\cdot v(x,p)-L(x,v(x,p))$, where $v=v(x,p)$ or, equivalently, $p=\nabla_v L(x,v)$. Then

$$\nabla_x H(x,p)=p\,\nabla_x v-\big[\nabla_x L(x,v(x,p))+\nabla_v L(x,v(x,p))\,\nabla_x v\big]=\big[p-\nabla_v L(x,v)\big]\nabla_x v-\nabla_x L(x,v(x,p))=-\nabla_x L(x,v(x,p))$$

(note that in $H(x,p)$ the variables $x$ and $p$ are independent; $p$ is not regarded as a function of $x$)

because $p=\nabla_v L$. Now $p(t)=\nabla_v L(x(t),\dot{x}(t))$ iff $\dot{x}(t)=v(x(t),p(t))$. Therefore (E-L) implies

$$\dot{p}(t)=\nabla_x L(x(t),\dot{x}(t))=\nabla_x L(x(t),v(x(t),p(t)))=-\nabla_x H(x(t),p(t)).$$

Also

$$\nabla_p H(x,p)=v(x,p)+p\,\nabla_p v-\nabla_v L\,\nabla_p v=v(x,p)$$

since $p=\nabla_v L(x,v(x,p))$. This implies

$$\nabla_p H(x(t),p(t))=v(x(t),p(t)).$$

But

$$p(t)=\nabla_v L(x(t),\dot{x}(t))$$

and so $\dot{x}(t)=v(x(t),p(t))$. Therefore

$$\dot{x}(t)=\nabla_p H(x(t),p(t)).$$

Finally, note that

$$\frac{d}{dt}H(x(t),p(t))=\nabla_x H\cdot\dot{x}(t)+\nabla_p H\cdot\dot{p}(t)=\nabla_x H\cdot\nabla_p H+\nabla_p H\cdot(-\nabla_x H)=0.$$
$\square$

A Physical Example

We define the Lagrangian

$$L(x,v)=\frac{m|v|^2}{2}-V(x),$$

which we interpret as the kinetic energy minus the potential energy $V$. Then

$$\begin{cases}\nabla_x L=-\nabla V(x)\\ \nabla_v L=mv.\end{cases}$$

Therefore the Euler-Lagrange equation is

$$m\ddot{x}(t)=-\nabla V(x(t)),$$

which is Newton's law. Furthermore,

$$p=\nabla_v L(x,v)=mv$$

is the momentum, and the Hamiltonian is

$$H(x,p)=p\cdot\frac{p}{m}-L\Big(x,\frac{p}{m}\Big)=\frac{|p|^2}{m}-\frac{m}{2}\Big|\frac{p}{m}\Big|^2+V(x)=\frac{|p|^2}{2m}+V(x),$$

the sum of the kinetic and potential energies. For this example, Hamilton's equations read

$$\begin{cases}\dot{x}(t)=\dfrac{p(t)}{m}\\ \dot{p}(t)=-\nabla V(x(t)).\end{cases}$$
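As a quick numerical illustration (not part of the original text), the following Python sketch integrates Hamilton's equations for this mechanical example with an assumed one-dimensional quadratic potential $V(x)=\frac{1}{2}kx^2$, using a symplectic Euler step, and checks that $H$ stays nearly constant along the trajectory, as Theorem 4.2 predicts. The values of $m$, $k$, the step size and the initial state are illustrative assumptions.

```python
# A minimal sketch: integrate Hamilton's equations x' = p/m, p' = -V'(x)
# for the assumed potential V(x) = 0.5*k*x^2 and check conservation of H.
m, k = 1.0, 4.0            # mass and spring constant (illustrative values)
V  = lambda x: 0.5 * k * x**2
dV = lambda x: k * x

def hamiltonian(x, p):
    return p**2 / (2 * m) + V(x)

# symplectic (semi-implicit) Euler: good energy behaviour for Hamiltonian systems
x, p, dt = 1.0, 0.0, 1e-3
H0 = hamiltonian(x, p)
for _ in range(10_000):
    p -= dt * dV(x)        # p' = -V'(x)
    x += dt * p / m        # x' =  p/m
print(f"relative drift in H: {abs(hamiltonian(x, p) - H0) / H0:.2e}")
```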

4.2 Review of Lagrange Multipliers

Constraint and Lagrange Multipliers

What first strikes us about general optimal control problems is the occurrence of many constraints, most notably that the dynamics be governed by the differential equation

$$\begin{cases}\dot{x}(t)=f(x(t),\alpha(t))&(t>0)\\ x(0)=x^0.\end{cases}$$

This is in contrast to standard calculus of variations problems, as discussed in §4.1, where we could take any curve x() as a candidate for a minimizer.

Now it is a general principle of variational and optimization theory that “constraints create Lagrange multipliers” and furthermore that these Lagrange multipliers often “contain valuable information”. This section provides a quick review of the standard method of Lagrange multipliers in solving multivariable constrained optimization problems.

Unconstrained Optimization

Suppose first that we wish to find a maximum point for a given smooth function $f:\mathbb{R}^n\to\mathbb{R}$. In this case there is no constraint, and therefore if $f(x^*)=\max_{x\in\mathbb{R}^n}f(x)$, then $x^*$ is a critical point of $f$:

$$\nabla f(x^*)=0.$$

Constrained Optimization

We modify the problem above by introducing the region

$$R:=\{x\in\mathbb{R}^n\mid g(x)\le 0\},$$

determined by some given function $g:\mathbb{R}^n\to\mathbb{R}$. Suppose $x^*\in R$ and $f(x^*)=\max_{x\in R}f(x)$. We would like a characterization of $x^*$ in terms of the gradients of $f$ and $g$.

  • Case 1: $x^*$ lies in the interior of $R$. Then the constraint is inactive, and so

    $$\nabla f(x^*)=0.$$
  • Case 2: $x^*$ lies on $\partial R$. We look at the direction of the vector $\nabla f(x^*)$. A geometric picture like the figure above is impossible; for if it were so, then $f(y)$ would be greater than $f(x^*)$ for some other point $y\in\partial R$. So $\nabla f(x^*)$ must be perpendicular to $\partial R$ at $x^*$. (If it were not perpendicular, there would exist a unit vector $\tau$ in the tangential direction with $\tau\cdot\nabla f(x^*)>0$, which is a positive directional derivative along $\partial R$, again a contradiction.) Since $\nabla g$ is perpendicular to $\partial R=\{g=0\}$ (the gradient is the direction of fastest change and hence normal to the level set), it follows that $\nabla f(x^*)$ is parallel to $\nabla g(x^*)$. Therefore

    $$\text{(4.4)}\qquad \nabla f(x^*)=\lambda\,\nabla g(x^*)$$

    for some real number $\lambda$, called a Lagrange multiplier.
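For concreteness, here is a small worked example of (4.4), not from the text: maximize $f(x_1,x_2)=x_1+x_2$ over $R=\{x_1^2+x_2^2-1\le 0\}$. Since $\nabla f=(1,1)\ne 0$, the maximizer lies on $\partial R$, and (4.4) together with the constraint gives

$$(1,1)=\lambda\,(2x_1^*,2x_2^*),\qquad (x_1^*)^2+(x_2^*)^2=1,$$

so $x_1^*=x_2^*=\frac{1}{\sqrt{2}}$ with Lagrange multiplier $\lambda=\frac{1}{\sqrt{2}}$; the three unknowns $x_1^*,x_2^*,\lambda$ are found together, exactly as emphasized in §4.4 below.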

Critique

The foregoing argument is in fact incomplete, since we implicitly assumed that $\nabla g(x^*)\ne 0$, in which case the Implicit Function Theorem implies that the set $\{g=0\}$ is an $(n-1)$-dimensional surface near $x^*$ (as illustrated).

If instead $\nabla g(x^*)=0$, the set $\{g=0\}$ need not have this simple form near $x^*$, and the reasoning discussed in Case 2 above is not complete.

The correct statement is this:

There exist real numbers $\lambda,\mu$, not both equal to $0$, such that

$$\text{(4.5)}\qquad \mu\,\nabla f(x^*)=\lambda\,\nabla g(x^*).$$

If $\mu\ne 0$, we can divide by $\mu$ and recover the formulation (4.4). And if $\nabla g(x^*)=0$, we can take $\lambda=1,\mu=0$, making assertion (4.5) correct (if not particularly useful).

4.3 Statement of the Pontryagin Maximum Principle

We come now to the key assertion of this chapter, the theoretically interesting and practically useful theorem that if $\alpha^*(\cdot)$ is an optimal control, then there exists a function $p^*(\cdot)$, called the costate, that satisfies a certain maximization principle. We should think of the function $p^*(\cdot)$ as a sort of Lagrange multiplier, which appears owing to the constraint that the optimal curve $x^*(\cdot)$ must satisfy (ODE). And just as conventional Lagrange multipliers are useful for actual calculations, so also will be the costate.

We quote Francis Clarke [C2]: “The maximum principle was, in fact, the culmination of a long search in the calculus of variations for a comprehensive multiplier rule, which is the correct way to view it: p(t) is a “Lagrange multiplier” ... It makes optimal control a design tool, whereas the calculus of variations was a way to study nature.”

4.3.1 Fixed Time, Free Endpoint Problem

Let us review the basic set-up for our control problem.

We are given $A\subseteq\mathbb{R}^m$ and also $f:\mathbb{R}^n\times A\to\mathbb{R}^n$, $x^0\in\mathbb{R}^n$. We as before denote the set of admissible controls by

$$\mathcal{A}=\{\alpha(\cdot):[0,\infty)\to A\mid \alpha(\cdot)\text{ is measurable}\}.$$

Then given $\alpha(\cdot)\in\mathcal{A}$, we solve for the corresponding evolution of our system:

$$\begin{cases}\dot{x}(t)=f(x(t),\alpha(t))&(t\ge 0)\\ x(0)=x^0.\end{cases}$$

We also introduce the payoff functional (note the terminal payoff term, reflecting the free endpoint)

$$\text{(P)}\qquad P[\alpha(\cdot)]=\int_0^T r(x(t),\alpha(t))\,dt+g(x(T)),$$

where the terminal time $T>0$, running payoff $r:\mathbb{R}^n\times A\to\mathbb{R}$ and terminal payoff $g:\mathbb{R}^n\to\mathbb{R}$ are given.

Basic Problem

Find a control $\alpha^*(\cdot)$ such that

$$P[\alpha^*(\cdot)]=\max_{\alpha(\cdot)\in\mathcal{A}}P[\alpha(\cdot)].$$

The Pontryagin Maximum Principle, stated below, asserts the existence of a function p(), which together with the optimal trajectory x() satisfies an analog of Hamilton's ODE from §4.1.2. For this, we will need an appropriate Hamiltonian:

Definition

The control theory Hamiltonian is the function

$$H(x,p,a):=f(x,a)\cdot p+r(x,a)\qquad(x,p\in\mathbb{R}^n,\ a\in A).$$

Theorem 4.3 (Pontryagin Maximum Principle)

Assume $\alpha^*(\cdot)$ is optimal for (ODE), (P), and $x^*(\cdot)$ is the corresponding trajectory.

Then there exists a function $p^*:[0,T]\to\mathbb{R}^n$ such that

$$\text{(ODE)}\qquad \dot{x}^*(t)=\nabla_p H(x^*(t),p^*(t),\alpha^*(t)),$$
$$\text{(ADJ)}\qquad \dot{p}^*(t)=-\nabla_x H(x^*(t),p^*(t),\alpha^*(t)),$$

and

$$\text{(M)}\qquad H(x^*(t),p^*(t),\alpha^*(t))=\max_{a\in A}H(x^*(t),p^*(t),a)\qquad(0\le t\le T).$$

In addition, the mapping

$$t\mapsto H(x^*(t),p^*(t),\alpha^*(t))\quad\text{is constant.}$$

Finally, we have the terminal condition

$$\text{(T)}\qquad p^*(T)=\nabla g(x^*(T)).$$

Remarks and Interpretations

  1. The identities (ADJ) are the adjoint equations and (M) the maximization principle. Notice that (ODE) and (ADJ) resemble the structure of Hamilton's equations, discussed in §4.1. We also call (T) the transversality condition and will discuss its significance later.

  2. More precisely, formula (ODE) says that for $1\le i\le n$ we have

$$\dot{x}^{*i}(t)=\frac{\partial H}{\partial p_i}(x^*(t),p^*(t),\alpha^*(t))=f^i(x^*(t),\alpha^*(t)),$$

which is just the original equation of motion. Likewise, (ADJ) says

$$\dot{p}^{*i}(t)=-\frac{\partial H}{\partial x_i}(x^*(t),p^*(t),\alpha^*(t))=-\sum_{j=1}^n p_j^*(t)\,\frac{\partial f^j}{\partial x_i}(x^*(t),\alpha^*(t))-\frac{\partial r}{\partial x_i}(x^*(t),\alpha^*(t)).$$

4.3.2 Free Time, Fixed Endpoint Problem

Let us next record the appropriate form of the Maximum Principle for a fixed endpoint problem.

As before, given a control $\alpha(\cdot)\in\mathcal{A}$, we solve for the corresponding evolution of our system:

$$\text{(ODE)}\qquad\begin{cases}\dot{x}(t)=f(x(t),\alpha(t))&(t\ge 0)\\ x(0)=x^0.\end{cases}$$

Assume now that a target point $x^1\in\mathbb{R}^n$ is given. We introduce then the payoff functional

$$P[\alpha(\cdot)]=\int_0^\tau r(x(t),\alpha(t))\,dt.$$

Here $r:\mathbb{R}^n\times A\to\mathbb{R}$ is the given running payoff, and $\tau=\tau[\alpha(\cdot)]$ denotes the first time the solution of (ODE) hits the target point $x^1$.

As before, the basic problem is to find an optimal control $\alpha^*(\cdot)$ such that

$$\text{(P)}\qquad P[\alpha^*(\cdot)]=\max_{\alpha(\cdot)\in\mathcal{A}}P[\alpha(\cdot)].$$

Define the Hamiltonian H as in §4.3.1.

Theorem 4.4 (Pontryagin Maximum Principle)

Assume $\alpha^*(\cdot)$ is optimal for (ODE), (P) and $x^*(\cdot)$ is the corresponding trajectory.

Then there exists a function $p^*:[0,\tau^*]\to\mathbb{R}^n$ such that

$$\text{(ODE)}\qquad \dot{x}^*(t)=\nabla_p H(x^*(t),p^*(t),\alpha^*(t)),$$
$$\text{(ADJ)}\qquad \dot{p}^*(t)=-\nabla_x H(x^*(t),p^*(t),\alpha^*(t)),$$

and

$$\text{(M)}\qquad H(x^*(t),p^*(t),\alpha^*(t))=\max_{a\in A}H(x^*(t),p^*(t),a)\qquad(0\le t\le\tau^*).$$

Also,

$$H(x^*(t),p^*(t),\alpha^*(t))\equiv 0\qquad(0\le t\le\tau^*).$$

Here $\tau^*$ denotes the first time the trajectory $x^*(\cdot)$ hits the target point $x^1$. We call $x^*(\cdot)$ the state of the optimally controlled system and $p^*(\cdot)$ the costate.

Remark and Warning

More precisely, we should define

$$H(x,p,q,a)=f(x,a)\cdot p+r(x,a)\,q\qquad(q\in\mathbb{R}).$$

A more careful statement of the Maximum Principle says "there exist a constant $q\ge 0$ and a function $p^*:[0,\tau^*]\to\mathbb{R}^n$ such that (ODE), (ADJ), and (M) hold".

  • If $q>0$, we can renormalize to get $q=1$, as we have done above.
  • If $q=0$, then $H$ does not depend on the running payoff $r$, and in this case the Pontryagin Maximum Principle is not useful. This is a so-called "abnormal problem".

Compare these comments with the critique of the usual Lagrange multiplier method at the end of §4.2, and see also the proof in §A.5 of the Appendix.

4.4 Application and Examples

How to Use the Maximum Principle

We mentioned earlier that the costate p() can be interpreted as a sort of Lagrange multiplier.

Calculations with Lagrange multipliers

Recall our discussion in §4.2 about finding a point $x^*$ that maximizes a function $f$, subject to the requirement that $g\le 0$. Now $x^*=(x_1^*,\dots,x_n^*)^T$ has $n$ unknown components we must find. Somewhat unexpectedly, it turns out in practice to be easier to solve (4.4) for the $n+1$ unknowns $x_1^*,\dots,x_n^*$ and $\lambda$. We repeat this key insight: it is actually easier to solve the problem if we add a new unknown, namely the Lagrange multiplier. Worked examples abound in multivariable calculus books.

Calculations with the costate

This same principle is valid for our much more complicated control theory problems: it is usually best not to look for an optimal control $\alpha^*(\cdot)$ and an optimal trajectory $x^*(\cdot)$ alone, but to look as well for the costate $p^*(\cdot)$. In practice, we add the equations (ADJ) and (M) to (ODE) and then try to solve for $\alpha^*(\cdot),x^*(\cdot)$ and for $p^*(\cdot)$.

The following examples show how this works in practice, in certain cases for which we can actually solve everything explicitly or, failing that, at least deduce some useful information.

4.4.1 Example 1: Linear Time-Optimal Control.

For this example, let $A$ denote the cube $[-1,1]^n$ in $\mathbb{R}^n$. We consider again the linear dynamics:

$$\text{(ODE)}\qquad\begin{cases}\dot{x}(t)=Mx(t)+N\alpha(t)\\ x(0)=x^0\end{cases}$$

for the payoff functional

$$P[\alpha(\cdot)]=-\int_0^\tau 1\,dt=-\tau,$$

where $\tau$ denotes the first time the trajectory hits the target point $x^1=0$. We have $r\equiv -1$, and so

$$H(x,p,a)=f\cdot p+r=(Mx+Na)\cdot p-1.$$

In §3.2 we introduced the Hamiltonian H=(Mx+Na)p, which differs by a constant from the present H. We can redefine H in §3.2 to match the present theory: compare then Theorems 3.4 and 4.4.

4.4.2 Example 2: Control of Production and Consumption

We return to Example 1 in Chapter 1, a model for optimal consumption in a simple economy. Recall that

$$\begin{aligned}x(t)&=\text{output of economy at time }t,\\ \alpha(t)&=\text{fraction of output reinvested at time }t.\end{aligned}$$

We have the constraint $0\le\alpha(t)\le 1$; that is, $A=[0,1]\subseteq\mathbb{R}$. The economy evolves according to the dynamics

$$\text{(ODE)}\qquad\begin{cases}\dot{x}(t)=\alpha(t)x(t)&(0\le t\le T)\\ x(0)=x^0,\end{cases}$$

where $x^0>0$ and we have set the growth factor $k=1$. We want to maximize the total consumption

$$P[\alpha(\cdot)]:=\int_0^T(1-\alpha(t))\,x(t)\,dt.$$

How can we characterize an optimal control $\alpha^*(\cdot)$?

Introducing the maximum principle

We apply the Pontryagin Maximum Principle, and to simplify notation we will not write the superscripts $*$ for the optimal control, trajectory, etc. We have $n=m=1$,

$$f(x,a)=xa,\qquad g\equiv 0,\qquad r(x,a)=(1-a)x,$$

and therefore

$$H(x,p,a)=f(x,a)\,p+r(x,a)=pxa+(1-a)x=x+ax(p-1).$$

The dynamical equation is

$$\text{(ODE)}\qquad \dot{x}(t)=\frac{\partial H}{\partial p}=\alpha(t)x(t),$$

and the adjoint equation is

$$\text{(ADJ)}\qquad \dot{p}(t)=-\frac{\partial H}{\partial x}=-1-\alpha(t)(p(t)-1).$$

The terminal condition reads

$$\text{(T)}\qquad p(T)=g_x(x(T))=0.$$

Lastly, the maximality principle asserts

$$\text{(M)}\qquad H(x(t),p(t),\alpha(t))=\max_{0\le a\le 1}\{x(t)+a\,x(t)(p(t)-1)\}.$$

Using the maximum principle

We now deduce useful information from (ODE), (ADJ), (M) and (T).

According to (M), at each time $t$ the control value $\alpha(t)$ must be selected to maximize $a(p(t)-1)$ for $0\le a\le 1$. This is so since $x(t)>0$. (Here $a$ is a placeholder variable denoting the possible values of $\alpha(t)$.) Thus

$$\alpha(t)=\begin{cases}1&\text{if }p(t)>1\\ 0&\text{if }p(t)\le 1.\end{cases}$$

Hence if we know $p(\cdot)$, we can design the optimal control $\alpha(\cdot)$. So next we must solve for the costate $p(\cdot)$. We know from (ADJ) and (T) that

$$\begin{cases}\dot{p}(t)=-1-\alpha(t)[p(t)-1]&(0\le t\le T)\\ p(T)=0.\end{cases}$$

Since $p(T)=0$, we deduce by continuity that $p(t)\le 1$ for $t$ close to $T$, $t<T$. Thus $\alpha(t)=0$ for such values of $t$. Therefore $\dot{p}(t)=-1$, and consequently $p(t)=T-t$ for times $t$ in this interval. So we have $p(t)=T-t$ so long as $p(t)\le 1$, and this holds for $T-1\le t\le T$.

But for times $t\le T-1$, with $t$ near $T-1$, we have $\alpha(t)=1$; and so (ADJ) becomes

$$\dot{p}(t)=-1-(p(t)-1)=-p(t).$$

Since $p(T-1)=1$, we see that $p(t)=e^{T-1-t}>1$ for all times $0\le t\le T-1$. In particular there are no switches in the control over this time interval.

Restoring the superscript $*$ to our notation, we consequently deduce that an optimal control is

$$\alpha^*(t)=\begin{cases}1&\text{if }0\le t\le t^*\\ 0&\text{if }t^*\le t\le T\end{cases}$$

for the optimal switching time $t^*=T-1$.

We leave it as an exercise to compute the switching time if the growth constant $k\ne 1$.
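As a quick sanity check (not part of the original text), the following Python sketch simulates the dynamics under the bang-bang control derived above with the assumed values $T=3$, $x^0=1$ and growth factor $k=1$, and compares the resulting payoff with that of a constant-reinvestment policy.

```python
# A minimal sketch: reinvest everything until t* = T - 1, then consume everything,
# versus a constant reinvestment fraction.
import numpy as np

T, x0, dt = 3.0, 1.0, 1e-4
ts = np.arange(0.0, T, dt)

def payoff(alpha_of_t):
    x, total = x0, 0.0
    for t in ts:
        a = alpha_of_t(t)
        total += (1 - a) * x * dt      # running payoff (1 - alpha) * x
        x += a * x * dt                # dynamics x' = alpha * x
    return total

bang_bang = lambda t: 1.0 if t <= T - 1 else 0.0
constant  = lambda t: 0.5
print("bang-bang :", payoff(bang_bang))   # approximately e^{T-1} for x0 = 1
print("constant  :", payoff(constant))    # strictly smaller
```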

4.4.3 Example 3: A Simple Linear-Quadratic Regulator

We take n=m=1 for this example, and consider the simple linear dynamics

$$\text{(ODE)}\qquad\begin{cases}\dot{x}(t)=x(t)+\alpha(t)\\ x(0)=x^0\end{cases}$$

with the quadratic cost functional

$$\int_0^T x(t)^2+\alpha(t)^2\,dt,$$

which we want to minimize. So we want to maximize the payoff functional

$$\text{(P)}\qquad P[\alpha(\cdot)]=-\int_0^T x(t)^2+\alpha(t)^2\,dt.$$

For this problem, the values of the controls are not constrained; that is, A=R.

Introducing the maximum principle

To simplify notation further we again drop the superscripts $*$. We have $n=m=1$,

$$f(x,a)=x+a,\qquad g\equiv 0,\qquad r(x,a)=-x^2-a^2;$$

and hence

$$H(x,p,a)=fp+r=(x+a)p-(x^2+a^2).$$

The maximality condition becomes

$$\text{(M)}\qquad H(x(t),p(t),\alpha(t))=\max_{a\in\mathbb{R}}\{-(x(t)^2+a^2)+p(t)(x(t)+a)\}.$$

We calculate the maximum on the right-hand side by setting $\frac{\partial H}{\partial a}=-2a+p=0$. Thus $a=\frac{p}{2}$, and so

$$\alpha(t)=\frac{p(t)}{2}.$$

The dynamical equations are therefore

$$\text{(ODE)}\qquad \dot{x}(t)=x(t)+\frac{p(t)}{2}$$

and

$$\text{(ADJ)}\qquad \dot{p}(t)=-\frac{\partial H}{\partial x}=2x(t)-p(t).$$

Moreover $x(0)=x^0$, and the terminal condition is

$$\text{(T)}\qquad p(T)=0.$$

Using the Maximum Principle

So we must look at the system of equations

$$\begin{pmatrix}\dot{x}\\ \dot{p}\end{pmatrix}=\underbrace{\begin{pmatrix}1&1/2\\ 2&-1\end{pmatrix}}_{=:M}\begin{pmatrix}x\\ p\end{pmatrix},$$

the general solution of which is

$$\begin{pmatrix}x(t)\\ p(t)\end{pmatrix}=e^{tM}\begin{pmatrix}x^0\\ p^0\end{pmatrix}.$$

Since we know $x^0$, the task is to choose $p^0$ so that $p(T)=0$.

Feedback controls

An elegant way to do so is to try to find the optimal control in linear feedback form; that is, to look for a function $c(\cdot):[0,T]\to\mathbb{R}$ for which

$$\alpha(t)=c(t)\,x(t).$$

We henceforth suppose that an optimal feedback control of this form exists, and attempt to calculate $c(\cdot)$. Now

$$\frac{p(t)}{2}=\alpha(t)=c(t)\,x(t),$$

whence $c(t)=\frac{p(t)}{2x(t)}$. Define now

$$d(t):=\frac{p(t)}{x(t)},$$

so that $c(t)=\frac{d(t)}{2}$.

We will next discover a differential equation that $d(\cdot)$ satisfies. Compute

$$\dot{d}=\frac{\dot{p}x-p\dot{x}}{x^2},$$

and recall that

$$\begin{cases}\dot{x}=x+\frac{p}{2}\\ \dot{p}=2x-p.\end{cases}$$

Therefore

$$\dot{d}=\frac{2x-p}{x}-\frac{p}{x^2}\Big(x+\frac{p}{2}\Big)=2-d-d\Big(1+\frac{d}{2}\Big)=2-2d-\frac{d^2}{2}.$$

Since $p(T)=0$, the terminal condition is $d(T)=0$. So we have obtained a nonlinear first-order ODE for $d(\cdot)$ with a terminal boundary condition:

$$\text{(R)}\qquad\begin{cases}\dot{d}=2-2d-\frac{1}{2}d^2&(0\le t<T)\\ d(T)=0.\end{cases}$$

This is called the Riccati equation.

In summary so far, to solve our linear-quadratic regulator problem, we need first to solve the Riccati equation (R) and then set

$$\alpha(t)=\frac{1}{2}d(t)\,x(t).$$

How to solve the Riccati equation. It turns out that we can convert (R) into a second-order, linear ODE. To accomplish this, write

$$d(t)=\frac{2\dot{b}(t)}{b(t)}$$

for a function $b(\cdot)$ to be found. What equation does $b(\cdot)$ solve? We compute

$$\dot{d}=\frac{2\ddot{b}}{b}-\frac{2(\dot{b})^2}{b^2}=\frac{2\ddot{b}}{b}-\frac{d^2}{2}.$$

Hence (R) gives

$$\frac{2\ddot{b}}{b}=\dot{d}+\frac{d^2}{2}=2-2d=2-\frac{4\dot{b}}{b},$$

and consequently

$$\begin{cases}\ddot{b}=b-2\dot{b}&(0\le t<T)\\ \dot{b}(T)=0,\ b(T)=1.\end{cases}$$

This is a terminal-value problem for a second-order linear ODE, which we can solve by standard techniques. We then set $d=\frac{2\dot{b}}{b}$ to derive the solution of the Riccati equation (R).

We will generalize this example later to systems, in §5.2.
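Before moving on, here is a minimal numerical sketch of the procedure just described (assumptions: $T=2$, $x^0=1$; not part of the original text): integrate the Riccati equation (R) backwards from $d(T)=0$ with an explicit Euler scheme, then run the closed loop $\dot{x}=x+\alpha$ with the feedback $\alpha(t)=\frac{1}{2}d(t)x(t)$ and compare the cost with applying no control at all.

```python
# A minimal sketch: solve (R) backwards in time, then simulate the feedback law.
import numpy as np

T, x0, n = 2.0, 1.0, 4000
dt = T / n

# integrate d' = 2 - 2 d - d^2/2 backwards in time, terminal condition d(T) = 0
d = np.zeros(n + 1)
for i in range(n, 0, -1):
    d[i - 1] = d[i] - dt * (2 - 2 * d[i] - 0.5 * d[i] ** 2)

def cost(feedback):
    x, c = x0, 0.0
    for i in range(n):
        a = feedback(i, x)
        c += (x**2 + a**2) * dt      # running cost x^2 + alpha^2
        x += (x + a) * dt            # dynamics x' = x + alpha
    return c

print("optimal feedback:", cost(lambda i, x: 0.5 * d[i] * x))
print("zero control    :", cost(lambda i, x: 0.0))
```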

4.4.4 Example 4: Moon Lander

This is a much more elaborate and interesting example, already introduced in Chapter 1.

Introduce the notation

$$\begin{aligned}h(t)&=\text{height at time }t,\\ v(t)&=\text{velocity}=\dot{h}(t),\\ m(t)&=\text{mass of spacecraft (changing as fuel is used up)},\\ \alpha(t)&=\text{thrust at time }t.\end{aligned}$$

The thrust is constrained so that $0\le\alpha(t)\le 1$; that is, $A=[0,1]$. There are also the constraints that the height and mass be nonnegative: $h(t)\ge 0$, $m(t)\ge 0$.

The dynamics are

$$\begin{cases}\dot{h}(t)=v(t)\\ \dot{v}(t)=-g+\dfrac{\alpha(t)}{m(t)}\\ \dot{m}(t)=-k\alpha(t),\end{cases}$$

with initial conditions

$$\begin{cases}h(0)=h_0>0\\ v(0)=v_0\\ m(0)=m_0>0.\end{cases}$$

The goal is to land on the moon safely, maximizing the remaining fuel $m(\tau)$, where $\tau=\tau[\alpha(\cdot)]$ is the first time $h(\tau)=v(\tau)=0$. Since $\alpha=-\frac{\dot{m}}{k}$, our intention is equivalently to minimize the total applied thrust before landing; so that

$$P[\alpha(\cdot)]=-\int_0^\tau\alpha(t)\,dt.$$

This is so since

$$\int_0^\tau\alpha(t)\,dt=\frac{m_0-m(\tau)}{k}.$$

Introducing the maximum principle

In terms of the general notation, we have

$$x(t)=\begin{bmatrix}h(t)\\ v(t)\\ m(t)\end{bmatrix},\qquad f=\begin{bmatrix}v\\ -g+a/m\\ -ka\end{bmatrix}.$$

Hence the Hamiltonian is

$$\begin{aligned}H(x,p,a)&=f\cdot p+r=(v,\ -g+a/m,\ -ka)\cdot(p_1,p_2,p_3)-a\\ &=-a+p_1v+p_2\Big(-g+\frac{a}{m}\Big)+p_3(-ka).\end{aligned}$$

We next have to figure out the adjoint dynamics (ADJ). For our particular Hamiltonian,

$$\frac{\partial H}{\partial x_1}=\frac{\partial H}{\partial h}=0,\qquad \frac{\partial H}{\partial x_2}=\frac{\partial H}{\partial v}=p_1,\qquad \frac{\partial H}{\partial x_3}=\frac{\partial H}{\partial m}=-\frac{p_2a}{m^2}.$$

Therefore

$$\text{(ADJ)}\qquad\begin{cases}\dot{p}_1(t)=0\\ \dot{p}_2(t)=-p_1(t)\\ \dot{p}_3(t)=\dfrac{p_2(t)\alpha(t)}{m(t)^2}.\end{cases}$$

The maximization condition (M) reads

$$\begin{aligned}\text{(M)}\qquad H(x(t),p(t),\alpha(t))&=\max_{0\le a\le 1}H(x(t),p(t),a)\\ &=\max_{0\le a\le 1}\Big\{-a+p_1(t)v(t)+p_2(t)\Big[-g+\frac{a}{m(t)}\Big]+p_3(t)(-ka)\Big\}\\ &=p_1(t)v(t)-p_2(t)g+\max_{0\le a\le 1}\Big\{a\Big(-1+\frac{p_2(t)}{m(t)}-kp_3(t)\Big)\Big\}.\end{aligned}$$

Thus the optimal control law is given by the rule:

$$\alpha(t)=\begin{cases}1&\text{if }1-\dfrac{p_2(t)}{m(t)}+kp_3(t)<0\\[4pt] 0&\text{if }1-\dfrac{p_2(t)}{m(t)}+kp_3(t)>0.\end{cases}$$

Using the maximum principle

Now we will attempt to figure out the form of the solution, and check it accords with the Maximum Principle.

Let us start by guessing that we first leave the rocket engine off (i.e., set $\alpha\equiv 0$) and turn the engine on only at the end. Denote by $\tau$ the first time that $h(\tau)=v(\tau)=0$, meaning that we have landed. We guess that there exists a switching time $t^*<\tau$ at which we turn the engine on at full power (i.e., set $\alpha\equiv 1$). Consequently,

$$\alpha(t)=\begin{cases}0&\text{for }0\le t\le t^*\\ 1&\text{for }t^*\le t\le\tau.\end{cases}$$

Therefore, for times $t^*\le t\le\tau$ our ODE becomes

$$\begin{cases}\dot{h}(t)=v(t)\\ \dot{v}(t)=-g+\dfrac{1}{m(t)}\\ \dot{m}(t)=-k\end{cases}\qquad(t^*\le t\le\tau)$$

with $h(\tau)=0$, $v(\tau)=0$, $m(t^*)=m_0$. We solve these dynamics:

$$\begin{cases}m(t)=m_0+k(t^*-t)\\ v(t)=g(\tau-t)+\dfrac{1}{k}\log\Big[\dfrac{m_0+k(t^*-\tau)}{m_0+k(t^*-t)}\Big]\\ h(t)=\text{complicated formula.}\end{cases}$$

Now put $t=t^*$:

$$\begin{cases}m(t^*)=m_0\\ v(t^*)=g(\tau-t^*)+\dfrac{1}{k}\log\Big[\dfrac{m_0+k(t^*-\tau)}{m_0}\Big]\\ h(t^*)=-\dfrac{g(t^*-\tau)^2}{2}-\dfrac{m_0}{k^2}\log\Big[\dfrac{m_0+k(t^*-\tau)}{m_0}\Big]+\dfrac{t^*-\tau}{k}.\end{cases}$$

Suppose the total amount of fuel to start with was $m_1$; so that $m_0-m_1$ is the weight of the empty spacecraft. When $\alpha\equiv 1$, the fuel is used up at rate $k$. Hence

$$k(\tau-t^*)\le m_1,$$

and so $0\le\tau-t^*\le\frac{m_1}{k}$. Before time $t^*$, we set $\alpha\equiv 0$. Then (ODE) reads

$$\begin{cases}\dot{h}=v\\ \dot{v}=-g\\ \dot{m}=0,\end{cases}$$

[Figure: the soft-landing curve in the v–h plane.]

and thus

$$\begin{cases}m(t)\equiv m_0\\ v(t)=-gt+v_0\\ h(t)=-\frac{1}{2}gt^2+tv_0+h_0.\end{cases}$$

We combine the formulas for $v(t)$ and $h(t)$ to discover

$$h(t)=h_0-\frac{1}{2g}\big(v^2(t)-v_0^2\big)\qquad(0\le t\le t^*).$$

We deduce that the freefall trajectory $(v(t),h(t))$ therefore lies on a parabola

$$h=h_0-\frac{1}{2g}\big(v^2-v_0^2\big).$$

[Figure: the freefall parabola meeting the soft-landing curve in the v–h plane.]

If we then move along this parabola until we hit the soft-landing curve from the previous picture, we can then turn on the rocket engine and land safely.

In the second case illustrated below, we miss the switching curve, and hence cannot land safely on the moon by switching only once.

[Figure: a freefall parabola that misses the soft-landing curve.]

To justify our guess about the structure of the optimal control, let us now find the costate $p(\cdot)$ so that $\alpha(\cdot)$ and $x(\cdot)$ described above satisfy (ODE), (ADJ), (M). To do this, we will have to figure out appropriate initial conditions

$$p_1(0)=\lambda_1,\qquad p_2(0)=\lambda_2,\qquad p_3(0)=\lambda_3.$$

We solve (ADJ) for $\alpha(\cdot)$ as above, and find

$$\begin{cases}p_1(t)\equiv\lambda_1&(0\le t\le\tau)\\ p_2(t)=\lambda_2-\lambda_1 t&(0\le t\le\tau)\\ p_3(t)=\begin{cases}\lambda_3&(0\le t\le t^*)\\ \lambda_3+\displaystyle\int_{t^*}^t\frac{\lambda_2-\lambda_1 s}{(m_0+k(t^*-s))^2}\,ds&(t^*\le t\le\tau).\end{cases}\end{cases}$$

Define

$$r(t):=1-\frac{p_2(t)}{m(t)}+p_3(t)\,k;$$

then

$$\dot{r}=-\frac{\dot{p}_2}{m}+\frac{p_2\dot{m}}{m^2}+\dot{p}_3k=\frac{\lambda_1}{m}+\frac{p_2}{m^2}(-k\alpha)+\Big(\frac{p_2\alpha}{m^2}\Big)k=\frac{\lambda_1}{m(t)}.$$

Choose $\lambda_1<0$, so that $r$ is decreasing. We calculate

$$r(t^*)=1-\frac{\lambda_2-\lambda_1 t^*}{m_0}+\lambda_3 k,$$

and then adjust $\lambda_2,\lambda_3$ so that $r(t^*)=0$. Then $r$ is nonincreasing, $r(t^*)=0$, and consequently $r>0$ on $[0,t^*)$, $r<0$ on $(t^*,\tau]$. But (M) says

$$\alpha(t)=\begin{cases}1&\text{if }r(t)<0\\ 0&\text{if }r(t)>0.\end{cases}$$

Thus an optimal control changes just once from $0$ to $1$; and so our earlier guess for $\alpha(\cdot)$ does indeed satisfy the Pontryagin Maximum Principle.
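The following Python sketch (with illustrative parameters $g=1$, $k=0.1$, $h_0=1$, $v_0=0$, $m_0=0.5$; not from the text) simulates this bang-bang strategy: it coasts in free fall and switches to full thrust at the first instant at which a full-thrust trajectory from the current state would reach $h=0$ and $v=0$ simultaneously, i.e. when the state reaches the soft-landing curve.

```python
# A minimal sketch of the free-fall-then-full-thrust landing strategy.
g, k, dt = 1.0, 0.1, 1e-3
h, v, m = 1.0, 0.0, 0.5            # initial height, velocity, mass (illustrative)

def height_when_stopped(h, v, m):
    """Altitude at the moment v returns to 0 if full thrust is applied from now on."""
    while v < 0.0 and h > 0.0:
        h += v * dt
        v += (-g + 1.0 / m) * dt
        m -= k * dt
    return h

t = 0.0
# phase 1: free fall (alpha = 0) until the full-thrust prediction just reaches h = 0,
# i.e. until the state hits the soft-landing curve
while height_when_stopped(h, v, m) > 0.0:
    h += v * dt
    v += -g * dt
    t += dt
t_switch = t
# phase 2: full thrust (alpha = 1) until we stop descending / touch down
while h > 0.0 and v < 0.0:
    h += v * dt
    v += (-g + 1.0 / m) * dt
    m -= k * dt
    t += dt
print(f"switch at t={t_switch:.3f}, end at t={t:.3f}, "
      f"h={h:.4f}, v={v:.4f}, fuel mass left={m:.3f}")   # h and v should be near 0
```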

4.5 Maximum Principle with Transversality Conditions

Consider again the dynamics

$$\text{(ODE)}\qquad \dot{x}(t)=f(x(t),\alpha(t))\qquad(t>0).$$

In this section we discuss another variant problem, one for which the initial position is constrained to lie in a given set $X_0\subseteq\mathbb{R}^n$ and the final position is also constrained to lie within a given set $X_1\subseteq\mathbb{R}^n$.

[Figure: trajectories connecting the starting set $X_0$ to the target set $X_1$.]

So in this model we get to choose the starting point $x^0\in X_0$ in order to maximize

$$\text{(P)}\qquad P[\alpha(\cdot)]=\int_0^\tau r(x(t),\alpha(t))\,dt,$$

where $\tau=\tau[\alpha(\cdot)]$ is the first time we hit $X_1$.

Notation

We will assume that X0,X1 are in fact smooth surfaces in Rn. We let T0 denote the tangent plane to X0 at x0, and T1 the tangent plane to X1 at x1.

Theorem 4.5 (More Transversality Conditions)

Let $\alpha^*(\cdot)$ and $x^*(\cdot)$ solve the problem above, with

$$x^0=x^*(0),\qquad x^1=x^*(\tau^*).$$

Then there exists a function $p^*(\cdot):[0,\tau^*]\to\mathbb{R}^n$ such that (ODE), (ADJ) and (M) hold for $0\le t\le\tau^*$. In addition,

$$\text{(T)}\qquad\begin{cases}p^*(\tau^*)\text{ is perpendicular to }T_1,\\ p^*(0)\text{ is perpendicular to }T_0.\end{cases}$$

We call (T) the transversality conditions.

Remarks and Interpretations

  • If we have $T>0$ fixed and

$$P[\alpha(\cdot)]=\int_0^T r(x(t),\alpha(t))\,dt+g(x(T)),$$

then (T) says

$$p^*(T)=\nabla g(x^*(T)),$$

in agreement with our earlier form of the terminal/transversality condition.

  • Suppose that the surface $X_1$ is the set $X_1=\{x\mid g_k(x)=0,\ k=1,\dots,l\}$. Then (T) says that $p^*(\tau^*)$ belongs to the "orthogonal complement" of the subspace $T_1$. But the orthogonal complement of $T_1$ is the span of $\nabla g_k(x^1)$ $(k=1,\dots,l)$. Thus

$$p^*(\tau^*)=\sum_{k=1}^l\lambda_k\,\nabla g_k(x^1)$$

for some unknown constants $\lambda_1,\dots,\lambda_l$.

4.6 More Applications

4.6.1 Example 1: Distance between Two Sets

As a first and simple example, let

$$\text{(ODE)}\qquad \dot{x}(t)=\alpha(t)$$

for $A=S^1$, the unit sphere in $\mathbb{R}^2$: $a\in S^1$ if and only if $|a|^2=a_1^2+a_2^2=1$. In other words, we are considering only curves that move with unit speed.

We take

$$\text{(P)}\qquad P[\alpha(\cdot)]=-\int_0^\tau|\dot{x}(t)|\,dt=-\text{the length of the curve}=-\int_0^\tau dt=-\text{the time it takes to reach }X_1.$$

We want to minimize the length of the curve and, as a check on our general theory, will prove that the minimum is of course a straight line.

Using the maximum principle

We have

$$H(x,p,a)=f(x,a)\cdot p+r(x,a)=a\cdot p-1=p_1a_1+p_2a_2-1.$$

The adjoint dynamics equation (ADJ) says

$$\dot{p}^*(t)=-\nabla_x H(x^*(t),p^*(t),\alpha^*(t))=0,$$

and therefore

$$p^*(t)\equiv\text{constant}=p^0\ne 0.$$

The maximization principle (M) tells us that

$$H(x^*(t),p^*(t),\alpha^*(t))=\max_{a\in S^1}\big[-1+p_1^0a_1+p_2^0a_2\big].$$

The right-hand side is maximized by $a^0=\frac{p^0}{|p^0|}$, a unit vector that points in the same direction as $p^0$. Thus $\alpha^*(\cdot)\equiv a^0$ is constant in time. According then to (ODE) we have $\dot{x}^*=a^0$, and consequently $x^*(\cdot)$ is a straight line.

Finally, the transversality conditions say that

$$\text{(T)}\qquad p^*(0)\perp T_0,\qquad p^*(\tau^*)\perp T_1.$$

In other words, $p^0\perp T_0$ and $p^0\perp T_1$; and this means that the tangent planes $T_0$ and $T_1$ are parallel.

[Figure: the shortest path between two sets, meeting both boundaries perpendicularly.]

Now all of this is pretty obvious from the picture, but it is reassuring that the general theory predicts the proper answer.

4.6.2 Example 2: Commodity Trading

Next is a simple model for the trading of a commodity, say wheat. We let T be the fixed length of trading period, and introduce the variables

$$\begin{aligned}x_1(t)&=\text{money on hand at time }t,\\ x_2(t)&=\text{amount of wheat owned at time }t,\\ \alpha(t)&=\text{rate of buying or selling of wheat},\\ q(t)&=\text{price of wheat at time }t\text{ (known)},\\ \lambda&=\text{cost of storing a unit amount of wheat for a unit of time.}\end{aligned}$$

We suppose that the price of wheat $q(t)$ is known for the entire trading period $0\le t\le T$ (although this is probably unrealistic in practice). We assume also that the rate of selling and buying is constrained:

$$|\alpha(t)|\le M,$$

where $\alpha(t)>0$ means buying wheat, and $\alpha(t)<0$ means selling.

Our intention is to maximize our holdings at the end time $T$, namely the sum of the cash on hand and the value of the wheat we then own:

$$\text{(P)}\qquad P[\alpha(\cdot)]=x_1(T)+q(T)\,x_2(T).$$

The evolution is

$$\text{(ODE)}\qquad\begin{cases}\dot{x}_1(t)=-\lambda x_2(t)-q(t)\alpha(t)\\ \dot{x}_2(t)=\alpha(t).\end{cases}$$

This is a nonautonomous ( = time dependent) case, but it turns out that the Pontryagin Maximum Principle still applies.

Using the maximum principle

What is our optimal buying and selling strategy? First, we compute the Hamiltonian

$$H(x,p,t,a)=f\cdot p+r=p_1(-\lambda x_2-q(t)a)+p_2a,$$

since $r\equiv 0$. The adjoint dynamics read

$$\text{(ADJ)}\qquad\begin{cases}\dot{p}_1=0\\ \dot{p}_2=\lambda p_1,\end{cases}$$

with the terminal condition

$$\text{(T)}\qquad p(T)=\nabla g(x(T)).$$

In our case $g(x_1,x_2)=x_1+q(T)x_2$, and hence

$$\text{(T)}\qquad\begin{cases}p_1(T)=1\\ p_2(T)=q(T).\end{cases}$$

We then can solve for the costate:

$$\begin{cases}p_1(t)\equiv 1\\ p_2(t)=\lambda(t-T)+q(T).\end{cases}$$

The maximization principle (M) tells us that

$$\begin{aligned}\text{(M)}\qquad H(x(t),p(t),t,\alpha(t))&=\max_{|a|\le M}\{p_1(t)(-\lambda x_2(t)-q(t)a)+p_2(t)a\}\\ &=-\lambda p_1(t)x_2(t)+\max_{|a|\le M}\{a\,(p_2(t)-q(t))\}.\end{aligned}$$

So

$$\alpha(t)=\begin{cases}M&\text{if }q(t)<p_2(t)\\ -M&\text{if }q(t)>p_2(t)\end{cases}$$

for $p_2(t):=\lambda(t-T)+q(T)$.
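A minimal numerical sketch of this trading rule (illustrative price path, horizon and costs; not from the text): apply $\alpha=+M$ when $q(t)<p_2(t)$ and $\alpha=-M$ when $q(t)>p_2(t)$, with $p_2(t)=\lambda(t-T)+q(T)$, and report the final holdings $x_1(T)+q(T)x_2(T)$.

```python
# A minimal sketch of the bang-bang trading rule derived above.
import numpy as np

T, M, lam, dt = 1.0, 1.0, 0.2, 1e-3              # horizon, trade limit, storage cost
q = lambda t: 1.0 + 0.3 * np.sin(4 * np.pi * t)  # known price of wheat (assumed)

p2 = lambda t: lam * (t - T) + q(T)               # costate p2 from (ADJ), (T)
x1, x2 = 1.0, 0.0                                 # cash, wheat
for t in np.arange(0.0, T, dt):
    a = M if q(t) < p2(t) else -M
    x1 += (-lam * x2 - q(t) * a) * dt             # cash: storage cost and purchases
    x2 += a * dt                                  # wheat holdings
    # note: no sign constraints on x1, x2 are enforced (see the Critique below)
print("final holdings:", x1 + q(T) * x2)
```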

Critique

In some situations the amount of money on hand $x_1(\cdot)$ becomes negative for part of the time. The economic problem has a natural constraint $x_2\ge 0$ (unless we can borrow with no interest charges) which we did not take into account in the mathematical model.

4.7 Maximum Principle with State Constraints

We return once again to our usual setting:

$$\text{(ODE)}\qquad\begin{cases}\dot{x}(t)=f(x(t),\alpha(t))\\ x(0)=x^0,\end{cases}\qquad\qquad \text{(P)}\qquad P[\alpha(\cdot)]=\int_0^\tau r(x(t),\alpha(t))\,dt$$

for $\tau=\tau[\alpha(\cdot)]$, the first time that $x(\tau)=x^1$. This is the fixed endpoint problem.

State Constraints

We introduce a new complication by asking that our dynamics $x(\cdot)$ must always remain within a given region $R\subset\mathbb{R}^n$. We will as above suppose that $R$ has the explicit representation

$$R=\{x\in\mathbb{R}^n\mid g(x)\le 0\}$$

for a given function $g(\cdot):\mathbb{R}^n\to\mathbb{R}$.

Definition

It will be convenient to introduce the quantity

$$c(x,a):=\nabla g(x)\cdot f(x,a).$$

Notice that if $x(t)\in\partial R$ for times $s_0\le t\le s_1$, then $c(x(t),\alpha(t))\equiv 0$ $(s_0\le t\le s_1)$. This is so since $f$ is then tangent to $\partial R$, whereas $\nabla g$ is perpendicular.

Theorem 4.6 (Maximum Principle with State Constraints)

Let $\alpha^*(\cdot),x^*(\cdot)$ solve the control theory problem above. Suppose also that $x^*(t)\in\partial R$ for $s_0\le t\le s_1$.

Then there exists a costate function $p^*(\cdot):[s_0,s_1]\to\mathbb{R}^n$ such that (ODE) holds. There also exists $\lambda^*(\cdot):[s_0,s_1]\to\mathbb{R}$ such that for times $s_0\le t\le s_1$ we have

$$\text{(ADJ')}\qquad \dot{p}^*(t)=-\nabla_x H(x^*(t),p^*(t),\alpha^*(t))+\lambda^*(t)\,\nabla_x c(x^*(t),\alpha^*(t));$$

and

$$\text{(M')}\qquad H(x^*(t),p^*(t),\alpha^*(t))=\max_{a\in A}\{H(x^*(t),p^*(t),a)\mid c(x^*(t),a)=0\}.$$

To keep things simple, we have omitted some technical assumptions really needed for the Theorem to be valid.

Remarks and Interpretations

  • Let $A\subset\mathbb{R}^m$ be of this form:

$$A=\{a\in\mathbb{R}^m\mid g_1(a)\le 0,\dots,g_s(a)\le 0\}$$

for given functions $g_1,\dots,g_s:\mathbb{R}^m\to\mathbb{R}$. In this case we can use Lagrange multipliers to deduce from (M') that

$$\text{(M'')}\qquad \nabla_a H(x^*(t),p^*(t),\alpha^*(t))=\lambda^*(t)\,\nabla_a c(x^*(t),\alpha^*(t))+\sum_{i=1}^s\mu_i(t)\,\nabla_a g_i(\alpha^*(t)).$$

The function $\lambda^*(\cdot)$ here is that appearing in (ADJ'). If $x^*(t)$ lies in the interior of $R$ for, say, the times $0\le t<s_0$, then the ordinary Maximum Principle holds.

  • Jump conditions. In the situation above, we always have

$$p(s_0-0)=p(s_0+0),$$

where $s_0$ is a time that $x^*$ hits $\partial R$. In other words, there is no jump in $p$ when we hit the boundary of the constraint region $R$.

However,

$$p(s_1+0)=p(s_1-0)-\lambda(s_1)\,\nabla g(x(s_1));$$

this says there is (possibly) a jump in $p(\cdot)$ when we leave $\partial R$.

4.8 More Applications

4.8.1 Example 1: Shortest Distance between Two Points, Avoiding An Obstacle.

[Figure: the shortest path between two points avoiding the disk $B(0,r)$.]

What is the shortest path between two points that avoids the disk B=B(0,r), as drawn?

Let us take

$$\text{(ODE)}\qquad\begin{cases}\dot{x}(t)=\alpha(t)\\ x(0)=x^0\end{cases}$$

for $A=S^1$, with the payoff

$$\text{(P)}\qquad P[\alpha(\cdot)]=-\int_0^\tau|\dot{x}|\,dt=-\text{length of the curve }x(\cdot).$$

We have

$$H(x,p,a)=f\cdot p+r=p_1a_1+p_2a_2-1.$$

Case 1: avoiding the obstacle

Assume $x(t)\notin\partial B$ on some time interval. In this case, the usual Pontryagin Maximum Principle applies, and we deduce as before that

$$\dot{p}=-\nabla_x H=0.$$

Hence

$$\text{(ADJ)}\qquad p(t)\equiv\text{constant}=p^0.$$

Condition (M) says

$$H(x(t),p(t),\alpha(t))=\max_{a\in S^1}\big(-1+p_1^0a_1+p_2^0a_2\big).$$

The maximum occurs for $\alpha=\frac{p^0}{|p^0|}$. Furthermore,

$$-1+p_1^0\alpha_1+p_2^0\alpha_2\equiv 0,$$

and therefore $\alpha\cdot p^0=1$. This means that $|p^0|=1$, and hence in fact $\alpha=p^0$. We have proved that the trajectory $x(\cdot)$ is a straight line away from the obstacle.

Case 2: touching the obstacle

Suppose now $x(t)\in\partial B$ for some time interval $s_0\le t\le s_1$. Now we use the modified version of the Maximum Principle, provided by Theorem 4.6.

First we must calculate $c(x,a)=\nabla g(x)\cdot f(x,a)$. In our case,

$$R=\mathbb{R}^2\setminus B=\{x\mid x_1^2+x_2^2\ge r^2\}=\{x\mid g:=r^2-x_1^2-x_2^2\le 0\}.$$

Then $\nabla g=\begin{pmatrix}-2x_1\\ -2x_2\end{pmatrix}$. Since $f=\begin{pmatrix}a_1\\ a_2\end{pmatrix}$, we have

$$c(x,a)=-2a_1x_1-2a_2x_2.$$

Now condition (ADJ') implies

$$\dot{p}(t)=-\nabla_x H+\lambda(t)\,\nabla_x c;$$

which is to say,

$$\text{(4.6)}\qquad\begin{cases}\dot{p}_1=-2\lambda\alpha_1\\ \dot{p}_2=-2\lambda\alpha_2.\end{cases}$$

Next, we employ the maximization principle (M'). We need to maximize

$$H(x(t),p(t),a)$$

subject to the requirements that $c(x(t),a)=0$ and $g_1(a)=a_1^2+a_2^2-1=0$, since $A=\{a\in\mathbb{R}^2\mid a_1^2+a_2^2=1\}$. According to (M'') we must solve

$$\nabla_a H=\lambda(t)\,\nabla_a c+\mu(t)\,\nabla_a g_1;$$

that is,

$$\begin{cases}p_1=\lambda(-2x_1)+2\mu\alpha_1\\ p_2=\lambda(-2x_2)+2\mu\alpha_2.\end{cases}$$

We can combine these identities to eliminate $\mu$. Since we also know that $x(t)\in\partial B$, we have $(x_1)^2+(x_2)^2=r^2$; and also $\alpha=(\alpha_1,\alpha_2)$ is tangent to $\partial B$. Using these facts, we find after some calculations that

$$\text{(4.7)}\qquad \lambda=\frac{p_2\alpha_1-p_1\alpha_2}{2r}.$$

But we also know

$$\text{(4.8)}\qquad(\alpha_1)^2+(\alpha_2)^2=1$$

and

$$H\equiv 0=-1+p_1\alpha_1+p_2\alpha_2;$$

hence

$$\text{(4.9)}\qquad p_1\alpha_1+p_2\alpha_2\equiv 1.$$

Solving for the unknowns. We now have the five equations (4.6)–(4.9) for the five unknown functions $p_1,p_2,\alpha_1,\alpha_2,\lambda$ that depend on $t$. We introduce the angle $\theta$, as illustrated, and note that $\frac{d}{d\theta}=r\frac{d}{dt}$. A calculation then confirms that the solutions are

$$\begin{cases}\alpha_1(\theta)=-\sin\theta\\ \alpha_2(\theta)=\cos\theta\\ \lambda=-\dfrac{k+\theta}{2r}\end{cases}$$

and

$$\begin{cases}p_1(\theta)=k\cos\theta-\sin\theta+\theta\cos\theta\\ p_2(\theta)=k\sin\theta+\cos\theta+\theta\sin\theta\end{cases}$$

for some constant $k$.

[Figure: Case 2 — the trajectory moving along the boundary of the disk.]

Case 3: approaching and leaving the obstacle

In general, we must piece together the results from Case 1 and Case 2. So suppose now $x(t)\in R=\mathbb{R}^2\setminus B$ for $0\le t<s_0$ and $x(t)\in\partial B$ for $s_0\le t\le s_1$.

We have shown that for times $0\le t<s_0$ the trajectory $x(\cdot)$ is a straight line. For this case we have shown already that $p\equiv\alpha$, and therefore

$$\begin{cases}p_1\equiv\cos\varphi_0\\ p_2\equiv\sin\varphi_0\end{cases}$$

for the angle $\varphi_0$ as shown in the picture.

By the jump conditions, $p(\cdot)$ is continuous when $x(\cdot)$ hits $\partial B$ at the time $s_0$, meaning in this case that

$$\begin{cases}k\cos\theta_0-\sin\theta_0+\theta_0\cos\theta_0=\cos\varphi_0\\ k\sin\theta_0+\cos\theta_0+\theta_0\sin\theta_0=\sin\varphi_0.\end{cases}$$

These identities hold if and only if

$$\begin{cases}k=-\theta_0\\ \varphi_0-\theta_0=\dfrac{\pi}{2}.\end{cases}$$

The second equality says that the optimal trajectory is tangent to the disk $B$ when it hits $\partial B$.

[Figure: Case 3 — the trajectory approaching and following the obstacle; the angles $\varphi_0,\theta_0$ are indicated.]

We turn next to the trajectory as it leaves $\partial B$: see the next picture. We then have

$$\begin{cases}p_1(\theta_1^-)=-\theta_0\cos\theta_1-\sin\theta_1+\theta_1\cos\theta_1\\ p_2(\theta_1^-)=-\theta_0\sin\theta_1+\cos\theta_1+\theta_1\sin\theta_1.\end{cases}$$

Now our formulas above for $\lambda$ and $k$ imply $\lambda(\theta_1)=\dfrac{\theta_0-\theta_1}{2r}$. The jump conditions give

$$p(\theta_1^+)=p(\theta_1^-)-\lambda(\theta_1)\,\nabla g(x(\theta_1))$$

for $g(x)=r^2-x_1^2-x_2^2$. Then

$$\lambda(\theta_1)\,\nabla g(x(\theta_1))=(\theta_1-\theta_0)\begin{pmatrix}\cos\theta_1\\ \sin\theta_1\end{pmatrix}.$$

Therefore

$$\begin{cases}p_1(\theta_1^+)=-\sin\theta_1\\ p_2(\theta_1^+)=\cos\theta_1,\end{cases}$$

and so the trajectory is tangent to $\partial B$ as it leaves. If we apply the usual Maximum Principle after $x(\cdot)$ leaves $B$, we find

$$\begin{cases}p_1\equiv\text{constant}=\cos\varphi_1\\ p_2\equiv\text{constant}=\sin\varphi_1.\end{cases}$$

Thus

$$\begin{cases}\cos\varphi_1=-\sin\theta_1\\ \sin\varphi_1=\cos\theta_1,\end{cases}$$

and so $\varphi_1-\theta_1=\dfrac{\pi}{2}$.

Critique

We have carried out elaborate calculations to derive some pretty obvious conclusions in this example. It is best to think of this as a confirmation in a simple case of Theorem 4.6, which applies in far more complicated situations.

4.8.2 An Inventory Control Model

Now we turn to a simple model for ordering and storing items in a warehouse. Let the time period T>0 be given, and introduce the variables

$$\begin{aligned}x(t)&=\text{amount of inventory at time }t,\\ \alpha(t)&=\text{rate of ordering from manufacturers},\ \alpha\ge 0,\\ d(t)&=\text{customer demand (known)},\\ \gamma&=\text{cost of ordering 1 unit},\\ \beta&=\text{cost of storing 1 unit.}\end{aligned}$$

Our goal is to fill all customer orders shipped from our warehouse, while keeping our storage and ordering costs at a minimum. Hence the payoff to be maximized is

$$\text{(P)}\qquad P[\alpha(\cdot)]=-\int_0^T\gamma\,\alpha(t)+\beta\,x(t)\,dt.$$

We have $A=[0,\infty)$ and the constraint that $x(t)\ge 0$. The dynamics are

$$\text{(ODE)}\qquad\begin{cases}\dot{x}(t)=\alpha(t)-d(t)\\ x(0)=x^0>0.\end{cases}$$

Guessing the optimal strategy

Let us just guess the optimal control strategy: we should at first not order anything ( α=0 ) and let the inventory in our warehouse fall off to zero as we fill demands; thereafter we should order just enough to meet our demands ( α=d ).


Using the maximum principle

We will prove this guess is right, using the Maximum Principle. Assume first that $x(t)>0$ on some interval $[0,s_0]$. We then have

$$H(x,p,a,t)=(a-d(t))\,p-\gamma a-\beta x,$$

and (ADJ) says $\dot{p}=-\nabla_x H=\beta$. Condition (M) implies

$$\begin{aligned}H(x(t),p(t),\alpha(t),t)&=\max_{a\ge 0}\{-\gamma a-\beta x(t)+p(t)(a-d(t))\}\\ &=-\beta x(t)-p(t)d(t)+\max_{a\ge 0}\{a\,(p(t)-\gamma)\}.\end{aligned}$$

Thus

$$\alpha(t)=\begin{cases}0&\text{if }p(t)\le\gamma\\ +\infty&\text{if }p(t)>\gamma.\end{cases}$$

If $\alpha(t)\equiv+\infty$ on some interval, then $P[\alpha(\cdot)]=-\infty$, which is impossible, because there exists a control with finite payoff. So it follows that $\alpha(\cdot)\equiv 0$ on $[0,s_0]$: we place no orders.

According to (ODE), we have

$$\begin{cases}\dot{x}(t)=-d(t)&(0\le t\le s_0)\\ x(0)=x^0.\end{cases}$$

Thus $s_0$ is the first time the inventory hits $0$. Now since $x(t)=x^0-\int_0^t d(s)\,ds$, we have $x(s_0)=0$. That is, $\int_0^{s_0}d(s)\,ds=x^0$ and we have hit the constraint. Now use the Pontryagin Maximum Principle with state constraint for times $t\ge s_0$:

$$R=\{x\ge 0\}=\{g(x):=-x\le 0\}$$

and

$$c(x,a,t)=\nabla g(x)\cdot f(x,a,t)=(-1)(a-d(t))=d(t)-a.$$

We have

$$\text{(M')}\qquad H(x(t),p(t),\alpha(t),t)=\max_{a\ge 0}\{H(x(t),p(t),a,t)\mid c(x(t),a,t)=0\}.$$

But $c(x(t),\alpha(t),t)=0$ if and only if $\alpha(t)=d(t)$. Then (ODE) reads

$$\dot{x}(t)=\alpha(t)-d(t)=0,$$

and so $x(t)=0$ for all times $t\ge s_0$. We have confirmed that our guess for the optimal strategy was right.
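As a small numerical check of this conclusion (illustrative demand, costs and horizon; not from the text), the sketch below compares the wait-then-match strategy with ordering at the demand rate from the start; the former incurs both lower ordering and lower storage costs.

```python
# A minimal sketch comparing the two ordering strategies for the inventory model.
import numpy as np

T, x0, gamma, beta, dt = 4.0, 1.0, 1.0, 0.5, 1e-3
d = lambda t: 0.5 + 0.25 * np.sin(t)       # known customer demand (assumed)

def cost(order_rule):
    x, c = x0, 0.0
    for t in np.arange(0.0, T, dt):
        a = order_rule(t, x)
        c += (gamma * a + beta * x) * dt   # ordering plus storage cost
        x = max(x + (a - d(t)) * dt, 0.0)  # inventory never goes negative
    return c

wait_then_match = lambda t, x: 0.0 if x > 0.0 else d(t)
match_from_start = lambda t, x: d(t)
print("wait-then-match :", cost(wait_then_match))
print("match-from-start:", cost(match_from_start))
```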

4.9 Numerical Solution of the Two Point Boundary Value Problem (TPBVP)

Reference: Chapter 12.6 in Numerical Optimal Control by Moritz Diehl and Sebastien Gros

First, let us note the main drawbacks of the indirect approach:

  1. it must be possible to eliminate the controls from the problem by algebraic manipulations, which is not always straightforward or might even be impossible;
  2. the optimal controls might be a discontinuous function of x and p, such that the BVP is possibly given by a non-smooth differential equation;
  3. the differential equation might become very nonlinear and unstable and not suitable for a forward simulation.

All these issues of the indirect approach can be partially addressed and, most importantly, the approach offers an exact and elegant characterization of the solution of optimal control problems in continuous time.

In this section we address the question of how we can compute a solution of the boundary value problem (BVP) in the indirect approach. The remarkable observation is that the only non-trivial unknown is the initial value for the adjoints, p(0). Once this value has been found, the complete optimal trajectory can in principle be recovered by a forward simulation of the combined differential equation. Let us first recall that the BVP that we want to solve is given as

$$\begin{aligned}
\text{(4.1a)}\qquad & b_0=x(0)-x_0=0,\\
\text{(4.1b)}\qquad & b_T=p(T)-\nabla g(x(T))=0,\\
\text{(4.1c)}\qquad & \dot{x}(t)-\nabla_p H(x(t),p(t))=0,\quad t\in[0,T],\\
\text{(4.1d)}\qquad & \dot{p}(t)+\nabla_x H(x(t),p(t))=0,\quad t\in[0,T].
\end{aligned}$$

Using the shorthands

$$y=\begin{bmatrix}x\\ p\end{bmatrix},\qquad \varphi(y)=\begin{bmatrix}\nabla_p H(x,p)\\ -\nabla_x H(x,p)\end{bmatrix}$$

and

$$b(y(0),y(T),x_0)=\begin{bmatrix}b_0(y(0),x_0)\\ b_T(y(T))\end{bmatrix},$$

the system of equations can be summarized as:

$$\text{(4.2a)}\qquad b(y(0),y(T),x_0)=0,$$
$$\text{(4.2b)}\qquad \dot{y}(t)-\varphi(y(t))=0,\quad t\in[0,T].$$

This BVP has $2n_x$ differential equations $\dot{y}=\varphi$ and $2n_x$ boundary conditions $b$, and is therefore usually well defined. We detail here three approaches to solving this TPBVP numerically: single shooting, multiple shooting, and collocation.

4.9.1 Single shooting

Single shooting starts with the following idea: for any guess of the initial value $p_0$, we can use a numerical integration routine in order to obtain the state-costate trajectory as a function of $p_0,x_0$, i.e. $y(t,p_0,x_0)$ for all $t\in[0,T]$. This is visualized in Figure 4.9.1. The result is that the differential equation (4.2b) is by construction already satisfied, as well as the initial boundary condition (4.1a). Thus, we only need to enforce the boundary condition (4.1b), which we can do using the terminal trajectory value $y(T,p_0,x_0)$:

$$b_T(y(T,p_0,x_0))=:R_T(p_0)=0.$$

For nonlinear dynamics $\varphi$, this equation can generally not be solved explicitly. We then use Newton's method, starting from an initial guess $p_0^0$ and iterating to the solution, i.e. we iterate

$$\text{(4.3)}\qquad p_0^{k+1}=p_0^k-t^k\Big(\frac{\partial R_T}{\partial p_0}(p_0^k)\Big)^{-1}R_T(p_0^k)$$

for some adequate step-size $t^k\in\,]0,1]$. It is important to note that in order to evaluate $\frac{\partial R_T}{\partial p_0}(p_0^k)$ we have to compute the ODE sensitivities $\frac{\partial y(T,y_0)}{\partial p_0}$.

In some cases, as noted above, the forward simulation of the combined ODE might be an ill-conditioned problem, so that single shooting cannot be employed. Even if the forward simulation problem is well defined, the region of attraction of the Newton iteration on $R_T(p_0)=0$ can be very small, such that a good guess for $p_0$ is often required. However, such a guess is typically unavailable. In the following example, we illustrate these observations on a simple optimal control problem.

Example 4.9.1

We consider the optimal control problem:

$$\text{(4.4)}\qquad\begin{aligned}\min_{x(\cdot),\,u(\cdot)}\ &\int_0^T x_1(t)^2+10\,x_2(t)^2+u(t)^2\,dt\\ \text{subject to}\quad &\dot{x}_1(t)=x_1(t)x_2(t)+u(t),\qquad x_1(0)=0,\\ &\dot{x}_2(t)=x_1(t),\qquad\qquad\qquad\ \ x_2(0)=1,\end{aligned}$$

with $T=5$. This example does not have a terminal cost or constraints, such that the terminal condition reads $R_T(p_0)=p(T,p_0,x_0)=0$. The state-costate trajectory at the solution is displayed in Figure 4.9.1. It is then interesting to build the function $p_0\mapsto p(T,p_0,x_0)$ for various values of $p_0$, see Figure 4.9.2. This function is very nonlinear, making it difficult for the Newton iteration to find the co-states' initial value $p_0$ resulting in $p(T,p_0,x_0)=0$. More specifically, the Newton iteration (full steps or reduced steps) converges only for a specific set of initial guesses $p_0^0$ provided to the iteration (4.3), see Figure 4.9.3.
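The following Python sketch (not from the reference) implements indirect single shooting for problem (4.4), using the minimization convention $H=L+p^\top f$, so that the stationarity condition gives $u=-\frac{p_1}{2}$ and the costates obey $\dot{p}=-\nabla_x H$. The residual $R_T(p_0)=p(T,p_0,x_0)$ is driven to zero by a damped Newton iteration with a finite-difference Jacobian; the initial guess for $p_0$ is an assumption and, as discussed above, convergence depends strongly on it.

```python
# A minimal single-shooting sketch for problem (4.4).
import numpy as np
from scipy.integrate import solve_ivp

T, x_init = 5.0, np.array([0.0, 1.0])

def rhs(t, y):
    x1, x2, p1, p2 = y
    u = -0.5 * p1                       # stationarity: dH/du = 2u + p1 = 0
    return [x1 * x2 + u,                # x1' = x1 x2 + u
            x1,                         # x2' = x1
            -(2 * x1 + p1 * x2 + p2),   # p1' = -dH/dx1
            -(20 * x2 + p1 * x1)]       # p2' = -dH/dx2

def residual(p0):
    y0 = np.concatenate([x_init, p0])
    sol = solve_ivp(rhs, (0.0, T), y0, rtol=1e-8, atol=1e-8)
    return sol.y[2:, -1]                # R_T(p0) = p(T; p0, x0)

p0 = np.array([3.0, 8.0])               # initial guess (assumed, near the solution)
for _ in range(20):
    r = residual(p0)
    if np.linalg.norm(r) < 1e-6:
        break
    eps = 1e-6                           # finite-difference Jacobian of R_T
    J = np.column_stack([(residual(p0 + eps * np.array([1.0, 0.0])) - r) / eps,
                         (residual(p0 + eps * np.array([0.0, 1.0])) - r) / eps])
    p0 = p0 - 0.5 * np.linalg.solve(J, r)   # damped Newton step (t^k = 0.5)
print("p(0) =", p0, " ||R_T|| =", np.linalg.norm(residual(p0)))
```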


Figure 4.9.1: Illustration of the state and co-state trajectories for example 4.9.1 at the solution delivering p(T,p0,x0)=0 for T=5, note that λ in the image is actually p.


Figure 4.9.2: Illustration of the map p0p(T,p0,x0), in the form of level curves for T=5. The black dot represents the solution of the TPBVP problem, where RT(p0)=p(T,p0,x0)=0. One can observe that the map is very nonlinear, such that the Newton method can struggle to converge to the solution p0 of the TPBVP, unless a very good initial guess is provided. Note that λ in the image is actually p


Figure 4.9.3: Illustration of the region of convergence of the Newton iteration (4.3) for problem (4.4) (in black, with full Newton steps on the left-hand side graph and with reduced steps on the right-hand side graph). Here $p_{0,1},p_{0,2}$ denote the components of the initial guess provided to the Newton iteration. The grey dots (at $(3.22,8.48)$) depict the solution to the TPBVP. Only a fairly small, disconnected and highly convoluted set of initial guesses for the co-states' initial conditions leads to convergence of the Newton iteration. Note that $\lambda$ in the image is actually $p$.

A crucial observation that will motivate an alternative to the single-shooting approach is illustrated in Figure 4.9.4, where the map p0p(t,p0,x0) is displayed for the integration times t=3 and t=4. The crucial observation here is that the map is fairly linear up to t=3, and becomes increasingly nonlinear for larger integration times. This observation is general and motivates the core idea behind the alternatives to single shooting, namely that integration shall never be performed over long time intervals, so as to avoid creating strongly nonlinear functions in the TPBVP.


Figure 4.9.4: Illustration of the map p0p(t,p0,x0), in the form of level curves for different times t. The black dot represents the solution of the TPBVP problem, where p(T,p0,x0)=0. One can observe that the map is close to linear for "small" integration times t (upper graphs, where t=3 ), and becomes increasingly nonlinear as the integration time increases (lower graph, where t=4 ), until it reaches the final time T=5, see Figure 4.9.2. This observation is general, and holds for most problems.

4.9.2 Multiple shooting

The nonlinearity of the integration map $p_0\mapsto y(t,p_0,x_0)$ for long integration times $t$ motivates "breaking down" the full integration into small pieces, so as to avoid creating very nonlinear maps in the TPBVP conditions. The idea is originally due to Osborne, and is based on dividing the time interval $[0,T]$ into (typically uniform) shooting intervals $[t_k,t_{k+1}]\subseteq[0,T]$, where the most common choice is $t_k=k\frac{T}{N}$. Let us then frame the integration over a short time interval $[t_k,t_{k+1}]$ with initial value $\beta_k$ as the function $\Phi_k(\beta_k)$, defined by:

$$\text{(4.5a)}\qquad \Phi_k(\beta_k)=y(t_{k+1})\quad\text{where}$$
$$\text{(4.5b)}\qquad \dot{y}(t)-\varphi(y(t))=0,\quad t\in[t_k,t_{k+1}],\quad\text{and}\quad y(t_k)=\beta_k$$

for $k=0,\dots,N-1$. We then rewrite the TPBVP conditions (4.2) as:

$$\text{(4.6a)}\qquad b(\beta_0,\beta_N,x_0)=0\qquad\text{(boundary conditions)}$$
$$\text{(4.6b)}\qquad \Phi_k(\beta_k)-\beta_{k+1}=0,\quad k=0,\dots,N-1\qquad\text{(continuity conditions)}$$

One can then gather the conditions (4.6) altogether as the function:

$$\text{(4.7)}\qquad R_{\mathrm{MS}}(\beta,x_0)=0,$$

where we write $\beta=(\beta_0,\dots,\beta_N)$. A Newton iteration can then be deployed on (4.7) to find the variables $\beta$; it reads

$$\text{(4.8)}\qquad \beta^{k+1}=\beta^k-t^k\Big(\frac{\partial R_{\mathrm{MS}}}{\partial\beta}(\beta^k,x_0)\Big)^{-1}R_{\mathrm{MS}}(\beta^k,x_0)$$

for some step-size $t^k\in\,]0,1]$. We illustrate the multiple-shooting approach in the following example.

Example 4.9.2

We consider the optimal control problem (4.4) from Example 4.9.1 with $T=5$. If we denote $\beta_k=(x_k,p_k)$, the boundary conditions for this example then become:

$$x_0=\begin{bmatrix}0\\ 1\end{bmatrix},\qquad p_N=0.$$

We illustrate the multiple-shooting procedure (4.8) in Figure 4.9.5 for $N=5$.


Figure 4.9.5: Illustration of the state and co-state trajectories for problem (4.4) during the multiple-shooting iterations (4.8), such that the conditions $\Phi_k(\beta_k)-\beta_{k+1}=0$ are not yet fulfilled. Here, the discrete times $t_k$ are depicted as grey dashed lines, the discrete state-costates $\beta_k=(x_{k,1},x_{k,2},p_{k,1},p_{k,2})$ are depicted as black dots, and the resulting integrations $\Phi_k=(\Phi_{k,1}^x,\Phi_{k,2}^x,\Phi_{k,1}^p,\Phi_{k,2}^p)$ are depicted as white dots. The black curves represent the state-costate trajectories on the various time intervals $[t_k,t_{k+1}]$. At the solution of (4.7), the conditions $\Phi_k(\beta_k)=\beta_{k+1}$ are enforced for $k=0,\dots,N-1$, such that the black and white dots coincide at each discrete time $t_k$.

One ought to observe that the time intervals $[t_k,t_{k+1}]$ are of size $\frac{T}{N}$, and hence get shorter as $N$ increases. Because one can "control" the length of the time interval over which the integration is performed via $N$, and because the functions $\Phi_k(\beta_k)-\beta_{k+1}$ become less nonlinear as the length of the time interval decreases, one can make them "arbitrarily" linear by increasing $N$. It follows that a sufficiently large $N$ typically allows one to solve the multiple-shooting conditions (4.6) using a Newton iteration even if no good initial guess is available.
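A minimal multiple-shooting sketch for the same problem (same assumed state-costate dynamics as in the single-shooting sketch; not from the reference): the unknowns are the node values $\beta_0,\dots,\beta_N$, the residual $R_{\mathrm{MS}}$ stacks the boundary conditions with the continuity conditions $\Phi_k(\beta_k)-\beta_{k+1}=0$, and a damped Newton iteration with a finite-difference Jacobian is applied, here starting from a crude initial guess.

```python
# A minimal indirect multiple-shooting sketch for problem (4.4).
import numpy as np
from scipy.integrate import solve_ivp

T, N, x_init = 5.0, 5, np.array([0.0, 1.0])
t_nodes = np.linspace(0.0, T, N + 1)           # shooting grid t_k = k T / N

def rhs(t, y):                                 # combined state-costate ODE y' = phi(y)
    x1, x2, p1, p2 = y
    u = -0.5 * p1
    return [x1 * x2 + u, x1, -(2 * x1 + p1 * x2 + p2), -(20 * x2 + p1 * x1)]

def Phi(beta_k, t0, t1):
    """Integrate the state-costate ODE over one shooting interval."""
    return solve_ivp(rhs, (t0, t1), beta_k, rtol=1e-8, atol=1e-8).y[:, -1]

def R_MS(beta_flat):
    beta = beta_flat.reshape(N + 1, 4)
    res = [beta[0, :2] - x_init,               # boundary condition x(0) = x0
           beta[N, 2:]]                        # boundary condition p(T) = 0
    for k in range(N):                         # continuity conditions
        res.append(Phi(beta[k], t_nodes[k], t_nodes[k + 1]) - beta[k + 1])
    return np.concatenate(res)

beta = np.zeros((N + 1) * 4)                   # crude initial guess
beta[1::4] = 1.0                               # x2-component of every node
for _ in range(30):
    r = R_MS(beta)
    if np.linalg.norm(r) < 1e-6:
        break
    eps = 1e-6                                 # finite-difference Jacobian
    J = np.column_stack([(R_MS(beta + eps * np.eye(beta.size)[:, i]) - r) / eps
                         for i in range(beta.size)])
    beta = beta - 0.5 * np.linalg.solve(J, r)  # damped Newton step
print("||R_MS|| =", np.linalg.norm(R_MS(beta)))
```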


Figure 4.9.6: Illustration of the sparsity pattern of the Jacobian matrix $\frac{\partial R_{\mathrm{MS}}}{\partial\beta}$ in the Newton iteration (4.8) for the optimal control problem (4.4) approached via indirect multiple shooting, for Example 4.9.2. Here we use $N=5$. One can readily observe that the Jacobian matrix is sparse and highly structured. This structure arises via organising the algebraic conditions (4.7) and the variables $\beta$ in time (i.e. in the order $k=0,\dots,N$). Note that here the last variables $\beta_N$ were eliminated using the equality $\beta_N=\Phi_{N-1}(\beta_{N-1})$. In the specific case of Example 4.9.2, the elimination has no impact on the Newton iteration because the boundary conditions $b(\beta_0,\beta_N,x_0)$ are linear.

It is important to observe that the set of algebraic conditions (4.7) holds a large number of variables β, such that the Newton iteration (4.8) is deployed using large Jacobian matrices RMSβ. However, these matrices are sparse, and if the algebraic conditions and variables are adequately organised, they are highly structured (see Figure 4.9.6), such that their factorization can be performed efficiently.

The second alternative to single-shooting is the object of the next section, and can be construed as an extreme case of Multiple-Shooting. We detail this next.

4.9.3 Collocation & Pseudo-spectral methods

The second alternative approach to single shooting is to use simultaneous collocation or pseudo-spectral methods. As we will see next, the two approaches are fairly similar. The key idea behind these methods is to introduce all the variables involved in the integration of the dynamics, together with the related algebraic conditions, into the set of algebraic equations to be solved.

The most common implementation of this idea is based on the Orthogonal Collocation method.

We consider the collocation-based integration of the state-costate dynamics on a time interval $[t_k,t_{k+1}]$, starting from the initial value $\beta_k$, as described by the collocation equations

$$c_k(v_k,t_{k,i},\beta_k)=\begin{bmatrix}v_{k,0}-\beta_k\\ \dot{p}_k(t_{k,1},v_k)-f(v_{k,1},t_{k,1})\\ \vdots\\ \dot{p}_k(t_{k,d},v_k)-f(v_{k,d},t_{k,d})\end{bmatrix}=0,$$

where $v_k$ collects the coefficients of the interpolating polynomial $p_k(\cdot,v_k)$.

The integration is then based on solving a set of collocation equations:

$$\text{(4.10a)}\qquad v_{k,0}=\beta_k,$$
$$\text{(4.10b)}\qquad \dot{p}(t_{k,i},v_k)=\varphi(v_{k,i},t_{k,i}),\quad i=1,\dots,d,$$

for $k=0,\dots,N-1$, where $t_{k,i}\in[t_k,t_{k+1}]$ for $i=0,\dots,d$, and the variables $v_k\in\mathbb{R}^{2n_x(d+1)}$ hold the discretisation of the continuous state-costate dynamics. The TPBVP discretised using orthogonal collocation then holds the variables $v_{k,i}$ and $\beta_k$ for $k=0,\dots,N-1$ and $i=1,\dots,d$, and the following constraints:

$$\begin{aligned}
\text{(4.11a)}\qquad & b(\beta_0,\beta_N,x_0)=0 &&\text{(boundary condition)},\\
\text{(4.11b)}\qquad & p(t_{k+1},v_k)-\beta_{k+1}=0 &&\text{(continuity condition)},\\
\text{(4.11c)}\qquad & v_{k,0}-\beta_k=0 &&\text{(initial values)},\\
\text{(4.11d)}\qquad & \dot{p}(t_{k,i},v_k)-\varphi(v_{k,i},t_{k,i})=0 &&\text{(dynamics)}.
\end{aligned}$$

One can observe that equations (4.11b) and (4.11c) are linear, while equation (4.11d) is nonlinear when the dynamics are nonlinear. One can also observe that the variables $\beta_0,\dots,\beta_{N-1}$ can actually be eliminated from (4.11), to yield a slightly more compact set of equations, with $k=0,\dots,N-1$ and $i=1,\dots,d$:

$$\begin{aligned}
\text{(4.12a)}\qquad & b(v_{0,0},v_{N,0},x_0)=0 &&\text{(boundary condition)},\\
\text{(4.12b)}\qquad & p(t_{k+1},v_k)-v_{k+1,0}=0 &&\text{(continuity condition)},\\
\text{(4.12c)}\qquad & \dot{p}(t_{k,i},v_k)-\varphi(v_{k,i},t_{k,i})=0 &&\text{(dynamics)}.
\end{aligned}$$

This elimination does not modify the behavior of the Newton iteration. We can gather the algebraic conditions (4.12) and the variables $v_{k,i}$ in the compact form

$$\text{(4.13)}\qquad R_{\mathrm{IC}}(w,x_0)=0,$$

where $w=\{v_{0,0},\dots,v_{0,d},\dots,v_{N-1,0},\dots,v_{N-1,d},v_{N,0}\}$. A Newton iteration can then be deployed on (4.13) to find the variables $w$; it reads

$$\text{(4.14)}\qquad w^{k+1}=w^k-t^k\Big(\frac{\partial R_{\mathrm{IC}}}{\partial w}(w^k,x_0)\Big)^{-1}R_{\mathrm{IC}}(w^k,x_0)$$

for some step-size $t^k\in\,]0,1]$. We illustrate the indirect collocation approach in the following example.


Figure 4.9.7: Illustration of the state and co-state trajectories for example 4.9.1 using the orthogonal collocation approach with N=20. The grey curves display the state-costate trajectories after the first full Newton step of (4.14), while the black curves report the state-costate trajectories at convergence. The discrete times tk are depicted as grey dashed lines, the discrete state-costates on the time grid tk,i are depicted as dots. Note that the continuity conditions (4.11b) in the collocation method are linear in the variables w, such that the trajectories are continuous after the first full Newton step (hence the grey curves are continuous, even though the problem is not solved yet).

Example 4.9.3

We consider the optimal control problem from Example 4.9.1 with $T=5$. We illustrate the orthogonal collocation procedure (4.11) in Figure 4.9.7 for $N=10$. The sparsity pattern of the Jacobian matrix $\frac{\partial R_{\mathrm{IC}}}{\partial w}$ from the Newton iteration (4.14) is illustrated in Figure 4.9.8. The variables and constraints were ordered with respect to time. Even though it is large, the complexity of forming factorisations of the Jacobian matrix $\frac{\partial R_{\mathrm{IC}}}{\partial w}$ is limited, as it is sparse and highly structured.


Figure 4.9.8: Illustration of the sparsity structure for the Jacobian RICw in the Newton iteration (4.14)

Pseudo-spectral methods

Pseudo-spectral methods deploy a very similar approach to the one described here, with the exception that they skip the division of the time interval $[0,T]$ into subintervals $[t_k,t_{k+1}]$, and use a single set of basis functions spanning the entire time interval $[0,T]$. Pseudo-spectral methods for the TPBVP problem (4.2) can then be framed as:

$$\text{(4.15a)}\qquad b(p(0,v),p(T,v),x_0)=0,$$
$$\text{(4.15b)}\qquad \dot{p}(t_k,v)-\varphi(p(t_k,v),t_k)=0,\qquad k=1,\dots,n,$$

where $t_k\in[0,T]$, and the variables $v\in\mathbb{R}^{n_x\cdot n}$ hold the discretization of the continuous dynamics. Because they attempt to capture the state trajectories in a single function $p(t,v)$, with $t\in[0,T]$, the Newton iteration solving the constraints (4.15) generally involves a dense Jacobian matrix, for which structure-exploiting linear algebra is generally ineffective.
