
1 Introduction

Reference e-book: An Introduction to Mathematical Optimal Control Theory by Lawrence C. Evans (2024)

1.1 The basic problem

DYNAMICS: Consider an ODE:

$$\begin{cases} \dot{x}(t) = f(x(t)) & (t > 0) \\ x(0) = x^0 \end{cases}$$
  • $x^0 \in \mathbb{R}^n$: initial point
  • $f : \mathbb{R}^n \to \mathbb{R}^n$: dynamics function
  • $x : [0, \infty) \to \mathbb{R}^n$: unknown; dynamic evolution of the state of some "system"

CONTROLLED DYNAMICS: Generalize a bit: suppose $f$ also depends upon some "control" parameter $a \in A \subseteq \mathbb{R}^m$, so that $f : \mathbb{R}^n \times A \to \mathbb{R}^n$:

$$\begin{cases} \dot{x}(t) = f(x(t), a) & (t > 0) \\ x(0) = x^0 \end{cases}$$

We change the value of $a$ as the system evolves, for example:

$$\alpha(t) = \begin{cases} a_1 & 0 \le t \le t_1 \\ a_2 & t_1 < t \le t_2 \\ a_3 & t_2 < t \le t_3 \\ \text{etc.} \end{cases}$$

Then the dynamic equation becomes:

$$\begin{cases} \dot{x}(t) = f(x(t), \alpha(t)) & (t > 0) \\ x(0) = x^0 \end{cases} \qquad \text{(controlled dynamics)}$$

We call a function $\alpha : [0, \infty) \to A$ a control and regard the trajectory $x(t)$ as the corresponding response of the system.

Notation

  • Introduce:

    $$\mathcal{A} = \{\alpha : [0, \infty) \to A \mid \alpha(\cdot) \text{ measurable}\}$$

    to denote the collection of all admissible controls.

  • The notation $x(\cdot) = x(\cdot, \alpha(\cdot), x^0)$ would be more precise, since the trajectory depends on both the control and the initial point.

Payoffs

Our overall task will be to determine the "best" control for our system. For this we need to specify a payoff (or reward) criterion. Let us define the payoff functional

$$P[\alpha(\cdot)] := \int_0^T r(x(t), \alpha(t))\,dt + g(x(T))$$

where:

  • $x(\cdot)$ solves the controlled ODE for the control $\alpha(\cdot)$
  • $r : \mathbb{R}^n \times A \to \mathbb{R}$: given; running payoff
  • $g : \mathbb{R}^n \to \mathbb{R}$: given; terminal payoff
  • $T$: given; terminal time

The basic problem

Aim: find a control $\alpha^*(\cdot)$ which maximizes the payoff:

$$P[\alpha^*(\cdot)] \ge P[\alpha(\cdot)]$$

for all controls $\alpha(\cdot) \in \mathcal{A}$. Such a control $\alpha^*(\cdot)$ is called an optimal control.

This task presents us with these mathematical issues:

  1. Does an optimal control exist?
  2. How can we characterize an optimal control mathematically?
  3. How can we construct an optimal control?

These sometimes turn out to be subtle problems, as the following collection of examples illustrates.
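Before the examples, here is a minimal computational sketch of the basic problem (not from the text): it approximates $P[\alpha(\cdot)]$ for a given control by forward-Euler integration of the dynamics. The helper name `payoff`, the step size, and the use of NumPy are assumptions made for illustration.

```python
import numpy as np

def payoff(f, r, g, alpha, x0, T, dt=1e-3):
    """Approximate P[alpha] = integral_0^T r(x(t), alpha(t)) dt + g(x(T)),
    where x solves x' = f(x, alpha(t)), x(0) = x0, by forward Euler."""
    x = np.asarray(x0, dtype=float)
    total = 0.0
    for i in range(int(T / dt)):
        a = alpha(i * dt)                  # control value at the current time
        total += r(x, a) * dt              # accumulate the running payoff
        x = x + dt * np.asarray(f(x, a))   # one Euler step of the dynamics
    return total + g(x)                    # add the terminal payoff
```

Comparing `payoff` across candidate controls gives a crude, brute-force handle on question 3, though it says nothing about questions 1 and 2.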

1.2 Examples

Example 1: Control of Production and Consumption

Suppose we own, say, a factory whose output we can control. Let us begin to construct a mathematical model by setting

$x(t) =$ amount of output produced at time $t \ge 0$

We suppose that we consume some fraction of our output at each time, and likewise can reinvest the remaining fraction. Let us denote:

$\alpha(t) =$ fraction of output reinvested at time $t \ge 0$

This will be our control, subject to the obvious constraint that:

$$0 \le \alpha(t) \le 1, \quad \text{for each time } t \ge 0$$

The corresponding dynamic equation is:

$$\begin{cases} \dot{x}(t) = k\,\alpha(t)\,x(t) & (t > 0) \\ x(0) = x^0 \end{cases}$$

The constant $k > 0$ models the growth rate of our reinvestment.

Take as a payoff functional:

$$P[\alpha(\cdot)] := \int_0^T (1 - \alpha(t))\,x(t)\,dt$$

This means we want to maximize our total consumption of the output, our consumption at a given time $t$ being $(1 - \alpha(t))\,x(t)$. Here $n = m = 1$ and:

$$A = [0, 1], \qquad f(x, a) = kax, \qquad r(x, a) = (1 - a)x, \qquad g \equiv 0.$$

In Section 4.4.2, we will see that an optimal control $\alpha^*(\cdot)$ is given by:

$$\alpha^*(t) = \begin{cases} 1 & 0 \le t \le t^* \\ 0 & t^* < t \le T \end{cases}$$

for an appropriate switching time $0 \le t^* \le T$.

In other words, we should reinvest all the output (and therefore consume nothing) up until time $t^*$, and afterwards we should consume everything (and therefore reinvest nothing). The switchover time $t^*$ will have to be determined. We call $\alpha^*(\cdot)$ a bang–bang control.
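As a sanity check on this claim (a sketch under assumed parameter values, not part of the text): under the bang–bang control above, the dynamics integrate in closed form, $x(t) = x^0 e^{kt}$ for $t \le t^*$ and $x(t) = x^0 e^{kt^*}$ afterwards, so

$$P[\alpha^*(\cdot)] = \int_{t^*}^T x^0 e^{kt^*}\,dt = x^0 e^{kt^*}\,(T - t^*),$$

which is maximized over $t^*$ at $t^* = T - 1/k$ (provided $kT > 1$). A self-contained numerical sweep agrees; the constants and the helper name `P` below are arbitrary choices:

```python
import numpy as np

k, T, x0 = 1.0, 3.0, 1.0   # illustrative constants: growth rate, horizon, start

def P(t_star, dt=1e-3):
    """Payoff of the bang-bang control that reinvests fully until t_star."""
    x, total = x0, 0.0
    for i in range(int(T / dt)):
        a = 1.0 if i * dt <= t_star else 0.0   # bang-bang control
        total += (1.0 - a) * x * dt            # running payoff (1 - a) x
        x += dt * k * a * x                    # dynamics x' = k a x
    return total

switches = np.linspace(0.0, T, 301)
best = switches[np.argmax([P(s) for s in switches])]
print(f"best switchover ~ {best:.2f}; theory: {T - 1.0 / k:.2f}")
```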

Example 2: Reproduction Strategies in Social Insects

In this example we consider a population of social insects, say a population of bees. Write $T$ for the length of the season, and introduce the variables:

  • $w(t) =$ number of workers at time $t$
  • $q(t) =$ number of queens
  • $\alpha(t) =$ fraction of colony effort devoted to increasing the work force

Constraint on $\alpha(t)$: $0 \le \alpha(t) \le 1$

Introduce the dynamics for the number of workers and the number of queens:

  • workers:
    $$\begin{cases} \dot{w}(t) = -\mu w(t) + b\,s(t)\,\alpha(t)\,w(t) \\ w(0) = w^0 \end{cases}$$
    • $\mu$: death rate of workers; given constant
    • $b$: given constant
    • $s(t)$: the known rate at which each worker contributes to the bee economy
  • queens:
    $$\begin{cases} \dot{q}(t) = -\nu q(t) + c\,(1 - \alpha(t))\,s(t)\,w(t) \\ q(0) = q^0 \end{cases}$$
    • $\nu, c$: given constants

Goal: maximize the number of queens at time $T$:

$$P[\alpha(\cdot)] = q(T)$$

We have $x(t) = (w(t), q(t))$ and $x^0 = (w^0, q^0)$; take $r \equiv 0$ and $g(w, q) = q$.

The answer will again turn out to be a bang–bang control.
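To get a feel for the model, here is a small standalone simulation (every constant, the constant choice of $s(t)$, and the helper name `q_final` are illustrative assumptions, not values from the text) that sweeps the switchover time of a candidate bang–bang control:

```python
import numpy as np

# Illustrative constants: death rates, efficiency factors, season length
mu, nu, b, c, T = 0.2, 0.1, 1.0, 1.0, 10.0
s = lambda t: 1.0   # assume each worker contributes at a constant rate

def q_final(t_star, w0=100.0, q0=0.0, dt=1e-3):
    """q(T) under a bang-bang control: all effort to workers until t_star."""
    w, q = w0, q0
    for i in range(int(T / dt)):
        t = i * dt
        a = 1.0 if t <= t_star else 0.0
        w += dt * (-mu * w + b * s(t) * a * w)          # worker dynamics
        q += dt * (-nu * q + c * (1.0 - a) * s(t) * w)  # queen dynamics
    return q

switches = np.linspace(0.0, T, 201)
print(f"best switchover ~ {max(switches, key=q_final):.2f}")
```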

Example 3: A Pendulum

A hanging pendulum: $\theta(t) =$ angle of the pendulum at time $t \ge 0$

If there is no external force, the motion satisfies:

$$\begin{cases} \ddot{\theta}(t) + \lambda \dot{\theta}(t) + \omega^2 \theta(t) = 0 \\ \theta(0) = \theta_1, \quad \dot{\theta}(0) = \theta_2 \end{cases}$$

The solution is a damped oscillation, provided $\lambda > 0$.

Let $\alpha(t)$ denote an applied torque, with $|\alpha(t)| \le 1$.

Our dynamics now become

$$\begin{cases} \ddot{\theta}(t) + \lambda \dot{\theta}(t) + \omega^2 \theta(t) = \alpha(t) \\ \theta(0) = \theta_1, \quad \dot{\theta}(0) = \theta_2 \end{cases}$$

Define $x_1(t) = \theta(t)$, $x_2(t) = \dot{\theta}(t)$, and $x(t) = (x_1(t), x_2(t))$. Then we can write the evolution as the system

$$\dot{x}(t) = \begin{pmatrix} \dot{x}_1 \\ \dot{x}_2 \end{pmatrix} = \begin{pmatrix} \dot{\theta} \\ \ddot{\theta} \end{pmatrix} = \begin{pmatrix} x_2 \\ -\lambda x_2 - \omega^2 x_1 + \alpha(t) \end{pmatrix} = f(x, \alpha).$$

We introduce as well

$$P[\alpha(\cdot)] = -\int_0^\tau 1\,dt = -\tau$$

for

$$\tau = \tau(\alpha(\cdot)) = \text{first time that } x(\tau) = 0 \ \text{(that is, } \theta(\tau) = \dot{\theta}(\tau) = 0\text{)}.$$

We maximize $P[\,\cdot\,]$, meaning that we want to minimize the time it takes to bring the pendulum to rest.

The terminal time is not fixed, but rather depends upon the control. This is a fixed endpoint, free time problem.
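To see the state-space form in action, here is a sketch that integrates the system under a simple saturating damping feedback $\alpha = \mathrm{clip}(-5\,\dot{\theta},\,-1,\,1)$; this is only a plausible stabilizing heuristic, not the time-optimal control, and the constants $\lambda$, $\omega$, the gain, and the initial state are arbitrary:

```python
import numpy as np

lam, omega = 0.1, 1.0   # damping and frequency, chosen arbitrarily

def step(x, a, dt):
    """One Euler step of x' = f(x, a), with x = (theta, theta_dot)."""
    x1, x2 = x
    return np.array([x1 + dt * x2,
                     x2 + dt * (-lam * x2 - omega ** 2 * x1 + a)])

x, t, dt = np.array([1.0, 0.0]), 0.0, 1e-3
while np.hypot(x[0], x[1]) > 1e-2 and t < 100.0:
    a = np.clip(-5.0 * x[1], -1.0, 1.0)   # braking torque, respects |a| <= 1
    x, t = step(x, a, dt), t + dt
print(f"(near) rest reached at t ~ {t:.2f}")
```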

Example 4: A Moon Lander

This model asks us to bring a spacecraft to a soft landing on the lunar surface, using the least amount of fuel.

Introduce the notation:

Notation

  • $h(t)$: height at time $t$
  • $v(t)$: velocity $= \dot{h}(t)$
  • $m(t)$: mass of spacecraft at time $t$ (changing as fuel is burned)
  • $\alpha(t)$: thrust at time $t$, assumed to satisfy $0 \le \alpha(t) \le 1$

By Newton's law:

$$m(t)\,\ddot{h}(t) = -g\,m(t) + \alpha(t)$$

This is modelled by the ODE system:

$$\begin{cases} \dot{v}(t) = -g + \dfrac{\alpha(t)}{m(t)} \\ \dot{h}(t) = v(t) \\ \dot{m}(t) = -k\,\alpha(t) \end{cases}$$

We want to minimize the amount of fuel used up, that is, to maximize the amount remaining once we have landed. Thus

$$P[\alpha(\cdot)] = m(\tau)$$

where $\tau$ is the first time that $h(\tau) = v(\tau) = 0$. This is a variable endpoint problem, since the final time is not given in advance.

We also have the extra constraints $h(t) \ge 0$, $m(t) \ge 0$.
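A rough simulation makes the constraint structure visible. The rule below (coast, then fire full thrust once the available braking just suffices) is only a plausible guess at a landing policy; the constants $g$, $k$, the initial state, and the helper name `land` are assumptions for illustration:

```python
def land(h, v, m, g=0.1, k=0.1, dt=1e-4):
    """Integrate the lander ODEs under a coast-then-full-thrust rule:
    fire (alpha = 1) once the braking distance matches the height left."""
    t = 0.0
    while h > 0.0 and m > 0.0:
        braking = 1.0 / m - g                    # max deceleration available
        a = 1.0 if (v < 0.0 and v * v / 2.0 >= h * braking) else 0.0
        v += dt * (-g + a / m)                   # v' = -g + alpha / m
        h += dt * v                              # h' = v
        m += dt * (-k * a)                       # m' = -k alpha
        t += dt
    return h, v, m, t

h, v, m, t = land(h=10.0, v=0.0, m=1.0)
print(f"touchdown speed ~ {abs(v):.3f}, fuel mass left ~ {m:.3f}, t ~ {t:.2f}")
```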

Example 5: Rocket Railroad Car

Imagine a railroad car powered by rocket engines on each side. We introduce the variables:

  • $q(t)$: position at time $t$
  • $v(t) = \dot{q}(t)$: velocity at time $t$
  • $\alpha(t)$: thrust from rockets at time $t$, assumed to satisfy $-1 \le \alpha(t) \le 1$

We want to figure out how to fire the rockets so as to arrive at the origin $0$ with zero velocity in a minimum amount of time. Assuming the car has mass $m = 1$, the law of motion is

$$\ddot{q}(t) = \alpha(t)$$

Rewrite by setting $x(t) = (q(t), v(t))^T$. Then

$$\begin{cases} \dot{x}(t) = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} x(t) + \begin{pmatrix} 0 \\ 1 \end{pmatrix} \alpha(t) \\ x(0) = x^0 = (q_0, v_0)^T \end{cases}$$

Take

$$P[\alpha(\cdot)] = -\int_0^\tau 1\,dt = -\tau$$

for $\tau =$ first time that $q(\tau) = v(\tau) = 0$.

1.3 A geometric solution

We introduce some ad hoc calculus and geometry methods for the rocket car problem.

First of all, let us guess that to find an optimal solution we will need only consider the cases $\alpha = 1$ or $\alpha = -1$. In other words, we will focus our attention only upon those controls for which, at each moment of time, either the left or the right rocket engine is fired at full power.

  • CASE 1: $\alpha \equiv 1$

    $$\begin{cases} \dot{q} = v \\ \dot{v} = 1 \end{cases}$$

    Then

    $$v\,\dot{v} = \dot{q}$$

    And so

    $$\frac{1}{2}\frac{d}{dt}\left(v^2\right) = \dot{q}$$

    Let $t_0$ belong to the time interval where $\alpha \equiv 1$ and integrate from $t_0$ to $t$:

    $$\frac{v^2(t)}{2} - \frac{v^2(t_0)}{2} = q(t) - q(t_0)$$

    Then

    $$v^2(t) = 2q(t) + \underbrace{v^2(t_0) - 2q(t_0)}_{=:~b}$$

    In other words, so long as the control is set for $\alpha \equiv 1$, the trajectory stays on the curve $v^2 = 2q + b$ for some constant $b$.

    (Figure: the family of right-opening parabolas $v^2 = 2q + b$, traced when $\alpha \equiv 1$)
  • CASE 2: $\alpha \equiv -1$

    $$\begin{cases} \dot{q} = v \\ \dot{v} = -1 \end{cases}$$

    Then

    $$\frac{1}{2}\frac{d}{dt}\left(v^2\right) = -\dot{q}$$

    Let $t_1$ belong to the time interval where $\alpha \equiv -1$ and integrate from $t_1$ to $t$:

    $$v^2(t) = -2q(t) + \underbrace{2q(t_1) + v^2(t_1)}_{=:~c}$$

    As long as the control is set for $\alpha \equiv -1$, the trajectory stays on the curve $v^2 = -2q + c$ for some constant $c$.

    (Figure: the family of left-opening parabolas $v^2 = -2q + c$, traced when $\alpha \equiv -1$)

Geometric interpretation

Now we can design an optimal control $\alpha^*(\cdot)$, which causes the trajectory to jump between the families of right– and left–pointing parabolas, as drawn. Say we start at the black dot and wish to steer to the origin. We accomplish this by first setting the control to the value $\alpha = -1$, causing us to move down along the second family of parabolas. We then switch to the control $\alpha = 1$, and thereupon move to a parabola from the first family, along which we move up and to the left, ending up at the origin. See the picture.

(Figure: the optimal trajectory, jumping between the two parabola families to reach the origin)
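The picture translates directly into a feedback law: the two parabolas through the origin form a switching curve, and the sign of $s = q + \tfrac{1}{2}v\,|v|$ (which vanishes exactly on that curve) tells us which engine to fire. The sketch below is one standard way to encode this; the helper name, tolerances, and step size are assumptions:

```python
import numpy as np

def steer_to_origin(q, v, dt=1e-4, t_max=20.0):
    """Bang-bang steering for q'' = alpha via the parabolic switching curve.
    s = q + v|v|/2 is zero exactly on the two parabolas through the origin."""
    t = 0.0
    while (abs(q) > 1e-3 or abs(v) > 1e-3) and t < t_max:
        s = q + v * abs(v) / 2.0
        a = -np.sign(s) if s != 0.0 else -np.sign(v)   # always full thrust
        v += dt * a                                    # v' = alpha
        q += dt * v                                    # q' = v
        t += dt
    return t

# From rest at q = 3, the minimum time should be 2 * sqrt(3) ~ 3.46.
print(f"time to reach the origin ~ {steer_to_origin(3.0, 0.0):.2f}")
```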
