
1 Introduction

The first five parts of these notes follow the e-book *An Introduction to Mathematical Optimal Control Theory* by Lawrence C. Evans (2024). Chapters 6 (differential games) and 7 (stochastic control) of that e-book are not covered here. The sixth part is compiled from various notes found online.

1.1 Basic Problem

DYNAMICS: Consider an ODE:

$$\begin{cases} \dot{x}(t) = f(x(t)) & (t > 0) \\ x(0) = x^0 \end{cases}$$

  • $x^0 \in \mathbb{R}^n$: initial point
  • $f : \mathbb{R}^n \to \mathbb{R}^n$: dynamics function
  • $x : [0, \infty) \to \mathbb{R}^n$: unknown; the dynamic evolution of the state of some "system"

CONTROLLED DYNAMICS: Generalizing a bit, suppose $f$ also depends upon some "control" parameter $a \in A \subseteq \mathbb{R}^m$, so that $f : \mathbb{R}^n \times A \to \mathbb{R}^n$:

$$\begin{cases} \dot{x}(t) = f(x(t), a) & (t > 0) \\ x(0) = x^0 \end{cases}$$

We may change the value of $a$ as the system evolves, for instance:

$$\alpha(t) = \begin{cases} a_1 & 0 \le t \le t_1 \\ a_2 & t_1 < t \le t_2 \\ a_3 & t_2 < t \le t_3 \\ \text{etc.} \end{cases}$$

Then the dynamic equation becomes:

$$\begin{cases} \dot{x}(t) = f(x(t), \alpha(t)) & (t > 0) \\ x(0) = x^0 \end{cases} \qquad \text{(controlled dynamics)}$$

We call a function $\alpha : [0, \infty) \to A$ a control and regard the trajectory $x(t)$ as the corresponding response of the system.
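To make the controlled dynamics concrete, here is a minimal simulation sketch (my own illustration; forward Euler, the function names, and the example $f$ and $\alpha$ are all assumptions, not from the text):

```python
import numpy as np

def simulate(f, alpha, x0, T=1.0, dt=1e-3):
    """Integrate x'(t) = f(x(t), alpha(t)), x(0) = x0 by forward Euler."""
    ts = np.arange(0.0, T + dt, dt)
    xs = np.empty((len(ts), np.size(x0)))
    xs[0] = x0
    for i in range(len(ts) - 1):
        xs[i + 1] = xs[i] + dt * np.asarray(f(xs[i], alpha(ts[i])))
    return ts, xs

# Illustrative scalar dynamics with a piecewise-constant control that
# switches its value as the system evolves, as in alpha(t) above.
f = lambda x, a: a * x
alpha = lambda t: 1.0 if t < 0.5 else -1.0
ts, xs = simulate(f, alpha, x0=np.array([1.0]))
print(xs[-1])   # response of the system at the terminal time
```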

Notation

  • Introduce:

    $$\mathcal{A} = \{\alpha : [0, \infty) \to A \mid \alpha(\cdot) \text{ measurable}\}$$

    to denote the collection of all admissible controls.

  • $x(\cdot) = x(\cdot, \alpha(\cdot), x^0)$ would be more precise notation.

Payoffs

Our overall task will be to determine the "best" control for our system. For this we need to specify a payoff (or reward) criterion. Let us define the payoff functional

$$P[\alpha(\cdot)] := \int_0^T r(x(t), \alpha(t)) \, dt + g(x(T))$$

where:

  • $x(\cdot)$ solves the controlled ODE for the control $\alpha(\cdot)$
  • $r : \mathbb{R}^n \times A \to \mathbb{R}$: given; running payoff
  • $g : \mathbb{R}^n \to \mathbb{R}$: given; terminal payoff
  • $T$: given; terminal time
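Given a sampled trajectory, the payoff functional can be approximated numerically. A minimal sketch (my own illustration, pairing with the simulator above; `r` and `g` below are placeholders, not payoffs from the text):

```python
import numpy as np

def payoff(ts, xs, alpha, r, g):
    """Approximate P[alpha] = int_0^T r(x, alpha) dt + g(x(T)) by trapezoids."""
    running = np.array([r(x, alpha(t)) for t, x in zip(ts, xs)])
    integral = np.sum((running[1:] + running[:-1]) * np.diff(ts)) / 2.0
    return integral + g(xs[-1])

# Placeholder running and terminal payoffs:
r = lambda x, a: -float(np.dot(x, x))   # penalize distance from the origin
g = lambda xT: 0.0                      # no terminal payoff
```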

The basic problem

Aim: find a control $\alpha^*(\cdot)$ which maximizes the payoff:

$$P[\alpha^*(\cdot)] \ge P[\alpha(\cdot)]$$

for all controls $\alpha(\cdot) \in \mathcal{A}$. Such a control $\alpha^*(\cdot)$ is called an optimal control.

This task presents us with these mathematical issues:

  1. Does an optimal control exist?
  2. How can we characterize an optimal control mathematically?
  3. How can we construct an optimal control?

These turn out to be sometimes subtle problems, as the following collection of examples illustrates.

1.2 Examples

Example 1: Control of Production and Consumption

Suppose we own, say, a factory whose output we can control. Let us begin to construct a mathematical model by setting

$x(t)$ = amount of output produced at time $t \ge 0$

We suppose that we consume some fraction of our output at each time, and likewise can reinvest the remaining fraction. Let us denote:

$\alpha(t)$ = fraction of output reinvested at time $t \ge 0$

This will be our control, subject to the obvious constraint that:

$$0 \le \alpha(t) \le 1 \quad \text{for each time } t \ge 0$$

The corresponding dynamic equation is:

$$\begin{cases} \dot{x}(t) = k\alpha(t)x(t) & (t > 0) \\ x(0) = x^0 \end{cases}$$

where the constant $k > 0$ models the growth rate of our reinvestment.

Take as a payoff functional:

$$P[\alpha(\cdot)] := \int_0^T (1 - \alpha(t)) x(t) \, dt$$

That means we want to maximize our total consumption of the output, our consumption at a given time $t$ being $(1 - \alpha(t))x(t)$. Here $n = m = 1$ and:

$$A = [0, 1], \quad f(x, a) = kax, \quad r(x, a) = (1 - a)x, \quad g \equiv 0.$$

In Section 4.4.2, we will see that the optimal control $\alpha^*$ is given by:

$$\alpha^*(t) = \begin{cases} 1 & 0 \le t \le t^* \\ 0 & t^* < t \le T \end{cases}$$

In other words, we should reinvest all the output (and therefore consume nothing) up until time $t^*$, and afterwards we should consume everything (and therefore reinvest nothing). The switchover time $t^*$ will have to be determined. We call $\alpha^*(\cdot)$ a bang–bang control.
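As a hedged numerical check of this bang–bang structure (my own sketch; $k = 1$, $T = 2$, $x^0 = 1$ are arbitrary choices, not values from the text), we can simulate the dynamics for each candidate switchover time and compare total consumption; the winner can then be checked against the formula derived in Section 4.4.2:

```python
import numpy as np

def consumption(t_s, k=1.0, T=2.0, x0=1.0, dt=1e-3):
    """Total consumption when reinvesting fully before t_s, consuming after."""
    x, total = x0, 0.0
    for t in np.arange(0.0, T, dt):
        a = 1.0 if t < t_s else 0.0
        total += (1.0 - a) * x * dt     # running payoff (1 - alpha) x
        x += k * a * x * dt             # dynamics x' = k alpha x
    return total

switch_times = np.linspace(0.0, 2.0, 201)
best = max(switch_times, key=consumption)
print(f"best switchover time among candidates: {best:.2f}")
```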

Example 2: Reproduction Strategies in Social Insects

In this example we consider a population of social insects, say a population of bees. Write $T$ for the length of the season, and introduce the variables:

  • $w(t)$ = number of workers at time $t$
  • $q(t)$ = number of queens
  • $\alpha(t)$ = fraction of colony effort devoted to increasing the work force

Constraint on $\alpha(t)$: $0 \le \alpha(t) \le 1$

Introduce the dynamics for the number of workers and the number of queens:

  • workers:
    $$\begin{cases} \dot{w}(t) = -\mu w(t) + b s(t)\alpha(t)w(t) \\ w(0) = w^0 \end{cases}$$
    • $\mu$: the death rate of workers; a given constant
    • $s(t)$: the known rate at which each worker contributes to the bee economy
  • queens:
    $$\begin{cases} \dot{q}(t) = -\nu q(t) + c(1 - \alpha(t))s(t)w(t) \\ q(0) = q^0 \end{cases}$$
    • $\nu$, $c$: given constants

Goal: maximize the queens at time T:

$$P[\alpha(\cdot)] = q(T)$$

We have $x(t) = (w(t), q(t))$ and $x^0 = (w^0, q^0)$; take $r \equiv 0$ and $g(w, q) = q$.

The answer will again turn out to be a bang–bang control.
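A minimal sketch of these dynamics under a candidate bang–bang control (my own illustration; all constants, the constant choice of $s(t)$, and the switch time are guesses, not values from the text):

```python
import numpy as np

def simulate_colony(t_s, T=1.0, mu=0.2, nu=0.1, b=1.0, c=1.0,
                    w0=100.0, q0=1.0, dt=1e-3):
    """Final queen count q(T) when all effort goes to workers before t_s."""
    s = lambda t: 1.0                   # assumed constant contribution rate
    w, q = w0, q0
    for t in np.arange(0.0, T, dt):
        a = 1.0 if t < t_s else 0.0     # grow the work force first, then queens
        w += dt * (-mu * w + b * s(t) * a * w)
        q += dt * (-nu * q + c * (1.0 - a) * s(t) * w)
    return q                            # payoff P = q(T)

print(simulate_colony(t_s=0.6))
```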

Example 3: A Pendulum

Consider a hanging pendulum, and let $\theta(t)$ = angle of the pendulum at time $t \ge 0$.

If no external force:

$$\begin{cases} \ddot{\theta}(t) + \lambda\dot{\theta}(t) + \omega^2\theta(t) = 0 \\ \theta(0) = \theta_1, \quad \dot{\theta}(0) = \theta_2 \end{cases}$$

The solution will be a damped oscillation, provided $\lambda > 0$.

Let $\alpha(t)$ denote an applied torque, with $|\alpha(t)| \le 1$.

Our dynamics now become

$$\begin{cases} \ddot{\theta}(t) + \lambda\dot{\theta}(t) + \omega^2\theta(t) = \alpha(t) \\ \theta(0) = \theta_1, \quad \dot{\theta}(0) = \theta_2 \end{cases}$$

Define $x_1(t) = \theta(t)$, $x_2(t) = \dot{\theta}(t)$, and $x(t) = (x_1(t), x_2(t))$. Then we can write the evolution as the system

$$\dot{x}(t) = \begin{pmatrix} \dot{x}_1 \\ \dot{x}_2 \end{pmatrix} = \begin{pmatrix} \dot{\theta} \\ \ddot{\theta} \end{pmatrix} = \begin{pmatrix} x_2 \\ -\lambda x_2 - \omega^2 x_1 + \alpha(t) \end{pmatrix} = f(x, \alpha).$$

We introduce as well

$$P[\alpha(\cdot)] = -\int_0^\tau 1 \, dt = -\tau$$

for

$\tau = \tau(\alpha(\cdot))$ = the first time that $x(\tau) = 0$ (that is, $\theta(\tau) = \dot{\theta}(\tau) = 0$).

We maximize $P[\cdot]$, meaning that we want to minimize the time it takes to bring the pendulum to rest.

The terminal time isn't fixed, but rather depends upon the control. This is a fixed-endpoint, free-time problem.
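A minimal sketch of this first-order system (my own illustration; $\lambda$, $\omega$, the candidate torque, and the initial state are arbitrary choices):

```python
import numpy as np
from scipy.integrate import solve_ivp

lambda_, omega = 0.3, 2.0               # illustrative damping and frequency

def f(t, x, alpha):
    """x = (theta, theta_dot); the torque alpha(t) enters the second row."""
    return [x[1], -lambda_ * x[1] - omega**2 * x[0] + alpha(t)]

alpha = lambda t: -1.0 if t < 1.0 else 0.0    # a crude candidate, |alpha| <= 1
sol = solve_ivp(f, (0.0, 5.0), [1.0, 0.0], args=(alpha,), max_step=1e-2)
print(sol.y[:, -1])     # final (theta, theta_dot); (0, 0) would mean at rest
```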

Example 4: A Moon Lander

This model asks us to bring a spacecraft to a soft landing on the lunar surface, using the least amount of fuel.

Introduce the notation:

Notation

  • $h(t)$: height at time $t$
  • $v(t)$: velocity, $= \dot{h}(t)$
  • $m(t)$: mass of spacecraft at time $t$ (changing as fuel is burned)
  • $\alpha(t)$: thrust at time $t$, assumed to satisfy $0 \le \alpha(t) \le 1$

By Newton's law:

$$m\ddot{h} = -gm + \alpha$$

These dynamics are modelled by the ODE system:

$$\begin{cases} \dot{v}(t) = -g + \dfrac{\alpha(t)}{m(t)} \\ \dot{h}(t) = v(t) \\ \dot{m}(t) = -k\alpha(t) \end{cases}$$

We want to minimize the amount of fuel used up, that is, to maximize the amount remaining once we have landed. Thus

$$P[\alpha(\cdot)] = m(\tau)$$

where $\tau$ is the first time that $h(\tau) = v(\tau) = 0$. This is a variable-endpoint problem, since the final time is not given in advance.

We also have the extra constraints $h(t) \ge 0$, $m(t) \ge 0$.
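A minimal sketch of these dynamics (my own illustration; $g$, $k$, and the initial state are arbitrary values; the loop simply stops at touchdown or near fuel exhaustion, reflecting the constraints $h \ge 0$, $m \ge 0$):

```python
def descend(alpha, h=1.0, v=-0.5, m=1.0, g=0.1, k=0.5, dt=1e-3):
    """Integrate until touchdown (h <= 0) or near fuel exhaustion (m small)."""
    t = 0.0
    while h > 0.0 and m > 1e-2:
        a = alpha(t, h, v, m)
        v += dt * (-g + a / m)          # v' = -g + alpha / m
        h += dt * v                     # h' = v
        m -= dt * k * a                 # m' = -k alpha
        t += dt
    return t, h, v, m

# Full thrust throughout: with these numbers the craft climbs and runs out
# of fuel aloft, so a good control must trade thrust against remaining mass.
print(descend(lambda t, h, v, m: 1.0))
```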

Example 5: Rocket Railroad Car

Imagine a railroad car powered by rocket engines on each side. We introduce the variables:

  • $q(t)$: position at time $t$
  • $v(t) = \dot{q}(t)$: velocity at time $t$
  • $\alpha(t)$: thrust from the rockets at time $t$, assumed to satisfy $-1 \le \alpha(t) \le 1$

We want to figure out how to fire the rockets, so as to arrive at the origin 0 with zero velocity in a minimum amount of time. Assuming the car has mass m=1, the law of motion is

$$\ddot{q}(t) = \alpha(t)$$

Rewrite this by setting $x(t) = (q(t), v(t))$. Then

$$\begin{cases} \dot{x}(t) = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} x(t) + \begin{pmatrix} 0 \\ 1 \end{pmatrix}\alpha(t) \\ x(0) = x^0 = (q_0, v_0)^T \end{cases}$$

Take

$$P[\alpha(\cdot)] = -\int_0^\tau 1 \, dt = -\tau$$

for $\tau$ = the first time that $q(\tau) = v(\tau) = 0$.

1.3 A geometric solution

We introduce some ad hoc calculus and geometry methods for the rocket car problem.

First of all, let us guess that to find an optimal solution we will need only to consider the cases $\alpha = 1$ or $\alpha = -1$. In other words, we will focus our attention only upon those controls for which at each moment of time either the left or the right rocket engine is fired at full power.

  • CASE 1: $\alpha \equiv 1$

    $$\begin{cases} \dot{q} = v \\ \dot{v} = 1 \end{cases}$$

    Then

    $$v\dot{v} = \dot{q}$$

    and so

    $$\frac{1}{2}\frac{d}{dt}\left(v^2\right) = \dot{q}$$

    Let $t_0$ belong to the time interval where $\alpha \equiv 1$ and integrate from $t_0$ to $t$:

    $$\frac{v^2(t)}{2} - \frac{v^2(t_0)}{2} = q(t) - q(t_0)$$

    Then

    $$v^2(t) = 2q(t) + \underbrace{\left(v^2(t_0) - 2q(t_0)\right)}_{=:\,b}$$

    In other words, so long as the control is set to $\alpha \equiv 1$, the trajectory stays on the curve $v^2 = 2q + b$ for some constant $b$.

    *(Figure: the family of right-pointing parabolas $v^2 = 2q + b$.)*
  • CASE 2: $\alpha \equiv -1$

    $$\begin{cases} \dot{q} = v \\ \dot{v} = -1 \end{cases}$$

    Then

    $$\frac{1}{2}\frac{d}{dt}\left(v^2\right) = -\dot{q}$$

    Let $t_1$ belong to the time interval where $\alpha \equiv -1$ and integrate from $t_1$ to $t$:

    $$v^2(t) = -2q(t) + \underbrace{\left(2q(t_1) + v^2(t_1)\right)}_{=:\,c}$$

    As long as the control is set to $\alpha \equiv -1$, the trajectory stays on the curve $v^2 = -2q + c$ for some constant $c$.

*(Figure: the family of left-pointing parabolas $v^2 = -2q + c$.)*

Geometric interpretation

Now we can design an optimal control $\alpha^*(\cdot)$ which causes the trajectory to jump between the families of right- and left-pointing parabolas, as drawn. Say we start at the black dot and wish to steer to the origin. This we accomplish by first setting the control to the value $\alpha = -1$, causing us to move down along the second family of parabolas. We then switch to the control $\alpha = 1$, and thereupon move to a parabola from the first family, along which we move up and to the left, ending up at the origin. See the picture; a simulation of this switching strategy is sketched below.

*(Figure: the optimal trajectory, switching from a left-pointing to a right-pointing parabola through the origin.)*
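The picture can be restated as a feedback law. The parabola arcs through the origin form the switching curve $q = -\tfrac{1}{2}v|v|$; the classical minimum-time rule for this double integrator (stated here as a known result consistent with the parabolas above, not derived in the text) is to fire $\alpha = -1$ above the curve and $\alpha = +1$ below it. A minimal simulation sketch, with an arbitrary starting state and tolerances:

```python
import numpy as np

def bang_bang(q, v):
    """Minimum-time feedback for q'' = alpha, |alpha| <= 1."""
    s = q + 0.5 * v * abs(v)            # which side of the switching curve
    if abs(s) < 1e-9:                   # (numerically) on the curve
        return -np.sign(v)
    return -1.0 if s > 0 else 1.0

q, v, t, dt = 1.0, 1.0, 0.0, 1e-4       # start at an illustrative "black dot"
while abs(q) > 1e-3 or abs(v) > 1e-3:
    a = bang_bang(q, v)
    q += dt * v                         # q' = v
    v += dt * a                         # v' = alpha
    t += dt
print(f"approximate time to reach the origin: {t:.3f}")
```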

1.4 Optimal Control Solutions

Reference for this subsection: the Zhihu Zhuanlan post "3-Direct method (Single/Multiple shooting, collocation method)", together with the e-book *Numerical Optimal Control* by Moritz Diehl and Sebastien Gros, which is the main reference for Section 1.4.

There are 3 basic families of approaches to address continuous-time optimal control problems (OCP):

  1. State-space approaches: the Hamilton–Jacobi–Bellman (HJB) equation (Dynamic Programming in the discrete-time setting);
    • Core: uses the principle of optimality, which states that each subarc of an optimal trajectory must itself be optimal
    • Leads to a PDE in the state space
    • Methods to numerically compute approximate solutions exist
      • but the approach severely suffers from Bellman's "curse of dimensionality" and is thus restricted to small state dimensions
  2. Indirect Methods: the Pontryagin Maximum Principle (PMP);
    • Core: derive a Boundary Value Problem (BVP) in ODEs from the necessary conditions of optimality of the infinite-dimensional problem
    • This BVP must then be solved numerically
    • First optimize, then discretize
      • the conditions of optimality are first written in continuous time for the given problem, and then discretized in one way or another in order to compute a numerical solution
    • Numerical solution of the BVP: shooting techniques or collocation
      • 2 major drawbacks:
        • the underlying differential equations are often difficult to solve due to strong nonlinearity and instability; moreover, changes in the control structure, i.e. the sequence of arcs where different constraints are active, are difficult to handle, since they usually require a completely new problem setup
        • on so-called singular arcs, higher-index differential-algebraic equations (DAEs) arise, which necessitate specialized solution techniques
  3. Direct Methods:
    • Core: transform the original infinite-dimensional OCP into a finite-dimensional Nonlinear Program (NLP)
    • The NLP is solved by structure-exploiting numerical optimization methods
    • First discretize, then optimize
      • the problem is first converted into a discrete one, on which optimization techniques are then deployed
    • Advantage over indirect methods: they can easily treat all sorts of constraints, such as inequality path constraints, in one formulation
      • the activation and de-activation of inequality constraints, i.e. structural changes in the active set occurring during the optimization procedure, are treated by well-developed NLP methods that can efficiently deal with such active-set changes
    • All direct methods are based on some form of finite-dimensional parameterization of the control trajectory, but differ significantly in how the state trajectory is handled
    • For the solution of constrained optimal control problems in real-world applications, direct methods are nowadays by far the most widespread and successfully used techniques
    1. Direct Single Shooting (a minimal sketch follows this list)
    2. Direct Multiple Shooting
    3. Direct Collocation
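As an illustration of direct single shooting, here is a minimal sketch (my own construction following the "first discretize, then optimize" recipe, not code from the references): the control is parameterized as piecewise constant on $N$ intervals, the state and running payoff of Example 1 are integrated inside the objective by Euler substeps, and the resulting finite-dimensional NLP is handed to a generic solver. The values $k = 1$, $T = 2$, $x^0 = 1$ are arbitrary.

```python
import numpy as np
from scipy.optimize import minimize

N, T, k, x_start = 20, 2.0, 1.0, 1.0
substeps = 50
dt = T / N / substeps

def neg_payoff(a):
    """Negative of P[alpha] for Example 1; the solver minimizes this."""
    x, J = x_start, 0.0
    for ai in a:                        # one control value per interval
        for _ in range(substeps):
            J += (1.0 - ai) * x * dt    # running payoff (1 - alpha) x
            x += k * ai * x * dt        # dynamics x' = k alpha x
    return -J

res = minimize(neg_payoff, np.full(N, 0.5), bounds=[(0.0, 1.0)] * N)
print(np.round(res.x, 2))               # expected to look bang-bang: 1s then 0s
print("payoff:", -res.fun)
```

Note how the single-shooting NLP has only the control parameters as variables; multiple shooting would additionally introduce the state at each interval boundary as a variable, together with continuity constraints.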
