arXiv:math/0703377v1 [math.OC] 13 Mar 2007

NONLINEAR OPTIMAL CONTROL VIA OCCUPATION MEASURES AND LMI-RELAXATIONS JEAN B. LASSERRE, DIDIER HENRION, CHRISTOPHE PRIEUR, ´ AND EMMANUEL TRELAT Abstract. We consider the class of nonlinear optimal control problems (OCP) with polynomial data, i.e., the differential equation, state and control constraints and cost are all described by polynomials, and more generally for OCPs with smooth data. In addition, state constraints as well as state and/or action constraints are allowed. We provide a simple hierarchy of LMI (linear matrix inequality)-relaxations whose optimal values form a nondecreasing sequence of lower bounds on the optimal value. Under some convexity assumptions, the sequence converges to the optimal value of the OCP. Preliminary results show that good approximations are obtained with few moments.

1. INTRODUCTION Solving a general nonlinear optimal control problem (OCP) is a difficult challenge, despite powerful theoretical tools are available, e.g. the maximum principle and Hamilton-Jacobi-Bellman (HJB) optimality equation. The problem is even more difficult in the presence of state and/or control constraints. State constraints are particularly difficult to handle, and the interested reader is referred to CapuzzoDolcetta and Lions [7] and Soner [39] for a detailed account of HJB theory in the case of state constraints. There exist many numerical methods to compute the solution of a given optimal control problem; for instance, multiple shooting techniques which solve two-point boundary value problems as described, e.g., in [40, 34], or direct methods, as, e.g., in [41, 12, 13], which use, among others, descent or gradientlike algorithms. To deal with optimal control problems with state constraints, some adapted versions of the maximum principle have been developed (see [25, 32], and see [14] for a survey of this theory), but happen to be very hard to implement in general. On the other hand, the OCP can be written as an infinite-dimensional linear program (LP) over two spaces of measures. This is called the weak formulation of the OCP in Vinter [44] (stated in the more general context of differential inclusions). The two unknown measures are the state-action occupation measure (o.m.) up to the final time T , and the state o.m. at time T . The optimal value of the resulting LP always provides a lower bound on the optimal value of the OCP, and under some convexity assumptions, both values coincide; see Vinter [44] and HernandezHernandez et al. [23] as well. Date: February 2, 2008. 1991 Mathematics Subject Classification. 90C22, 93C10, 28A99. Key words and phrases. Nonlinear control; Optimal control; Semidefinite Programming; Measures; Moments. 1

´ 2JEAN B. LASSERRE, DIDIER HENRION, CHRISTOPHE PRIEUR, AND EMMANUEL TRELAT

The dual of the original infinite dimensional LP has an interpretation in terms of ”subsolutions” of related HJB-like optimality conditions, as for the unconstrained case. The only difference with the unconstrained case is the underlying function space involved, which directly incorporate the state constraints. Namely, the functions are only defined on the state constraint set . An interesting feature of this LP approach with o.m.’s is that state constraints, as well as state and/or action constraints are all easy to handle; indeed they simply translate into constraints on the supports of the unknown o.m.’s. It thus provides an alternative to the use of maximum principles with state constraints. However, although this LP approach is valid for any OCP, solving the corresponding (infinitedimensional) LP is difficult in general; however, general LP approximation schemes based on grids have been proposed in e.g. Hernandez and Lasserre [21]. This LP approach has also been used in the context of discrete-time Markov control processes, and is dual to Bellman’s optimality principle. For more details the interested reader is referred to Borkar [4], Hernandez-Lerma and Lasserre [18, 19, 22] and many references therein. For some continuous-time stochastic control problems (e.g., modeled by diffusions) and optimal stopping problems, the LP approach has also been used with success to prove existence of stationary optimal policies; see for instance Cho and Stockbridge [8], Helmes and Stockbridge [15], Helmes et al. [16], Kurtz and Stockbridge [27], and also Fleming and Vermes [11]. In some of these works, the moment approach is also used to approximate the resulting infinitedimensional LP. Contribution. In this paper, we consider the particular class of nonlinear OCP’s with state and/or control constraints, for which all data describing the problem (dynamics, state and control constraints) are polynomials. The approach also extends to the case of problems with smooth data and compact sets, because polynomials are dense in the space of functions considered; this point of view is detailed in §4. In this restricted polynomial framework, the LP approach has interesting additional features that can be exploited for effective numerical computation. Indeed, what makes this LP approach attractive is that for the class of OCPs considered: • Only the moments of the o.m.’s appear in the LP formulation, so that we already end up with countably many variables, a significant progress. • Constraints on the support of the o.m.’s translate easily into either LP or SDP (Semi Definite Programming) necessary constraints on their moments. Even more, for (semi-algebraic) compact supports, relatively recent powerful results from real algebraic geometry make these constraints also sufficient. • When truncating to finitely many moments, the resulting LP or SDP’s are solvable and their optimal values form a monotone nondecreasing sequence (indexed by the number of moments considered) of lower bounds on the optimal value of the LP (and thus of the OCP). Therefore, based on the above observations, we propose an approximation of the optimal value of the OCP via solving a hierarchy of SDPs (or linear matrix inequalities, LMIs)that provides a monotone nondecreasing sequence of lower bounds on the optimal value of the weak LP formulation of the OCP. In adddition, under some compactness assumption of the state and control constraint sets, the sequence of lower bounds is shown to converge to the optimal value of the LP, and thus the optimal value of the OCP when the former and latter are equal.

OPTIMAL CONTROL AND LMI-RELAXATIONS

3

As such, it could be seen as a complement to the above shooting or direct methods, and when the sequence of lower bounds converges to the optimal value, a test of their efficiency. Finally this approach can also be used to provide a certificate of unfeasibility. Indeed, if in the hierarchy of LMI-relaxations of the minimum time OCP, one is infeasible then the OCP itself is infeasible. It turns out that sometimes this certificate is provided at an early stage in the hierarchy, i.e. with very few moments. This is illustrated on two simple examples. In a pioneering paper, Dawson [10] had suggested the use of moments in the LP approach with o.m.’s, but results on the K-moment problem by Schm¨ udgen [38] and Putinar [37] were either not available at that time. Later, Helmes and Stockbridge [15] and Helmes, R¨ ohl and Stockbridge [16] have used LP moment conditions for computing some exit time moments in some diffusion model, whereas for the same models, Lasserre and Prieto-Rumeau [29] have shown that SDP moment conditions are superior in terms of precision and number of moments to consider; in [30], they have extended the moment approach for options pricing problems in some mathematical finance models. More recently, Lasserre, Prieur and Henrion [31] have used the o.m. approach for minimum time OCP without state constraint. Preliminary experimental results on Brockett’s integrator example, and the double integrator show fast convergence with few moments. 2. Occupation measures and the LP approach 2.1. Definition of the optimal control problem. Let n and m be nonzero integers. Consider on Rn the control system (2.1)

x(t) ˙ = f (t, x(t), u(t)),

where f : [0, +∞)×Rn ×Rm −→ Rn is smooth, and where the controls are bounded measurable functions, defined on intervals [0, T (u)] of R+ , and taking their values in a compact subset U of Rm . Let x0 ∈ Rn , and let X and K be compact subsets of Rn . For T > 0, a control u is said admissible on [0, T ] whenever the solution x(·) of (2.1), such that x(0) = x0 , is well defined on [0, T ], and satisfies (2.2)

(x(t), u(t)) ∈ X × U a.e. on [0, T ],

and (2.3)

x(T ) ∈ K.

Denote by UT the set of admissible controls on [0, T ]. For u ∈ UT , the cost of the associated trajectory x(·) is defined by Z T h(t, x(t), u(t))dt + H(x(T )), (2.4) J(0, T, x0 , u) = 0

where h : [0, +∞) × Rn × Rm −→ R and H : Rn → R are smooth functions. Consider the optimal control problem of determining a trajectory solution of (2.1, starting from x(0) = x0 , satisfying the state and control constraints (2.2), the terminal constraint (2.3), and minimizing the cost (2.4). The final time T may be fixed or not. If the final time T is fixed, we set (2.5)

J ∗ (0, T, x0 ) := inf J(0, T, x0 , u), u∈UT

´ 4JEAN B. LASSERRE, DIDIER HENRION, CHRISTOPHE PRIEUR, AND EMMANUEL TRELAT

and if T is free, we set (2.6)

J ∗ (0, x0 ) :=

inf

T >0, u∈UT

J(0, T, x0 , u),

Note that a particular OCP is the minimal time problem from x0 to K, by letting h ≡ 1, H ≡ 0. In this particular case, the minimal time is the first hitting time of the set K. It is possible to associate a stochastic or deterministic OCP with an abstract infinite dimensional linear programming (LP) problem P together with its dual P∗ (see for instance Hern´andez-lerma and Lasserre [18] for discrete-time Markov control problems, and Vinter [44], Hernandez et al. [23] for deterministic optimal control problems, as well as many references therein). We next describe this LP approach in the present context. 2.2. Notations and definitions. For a topological space X , let M(X ) be the Banach space of finite signed Borel measures on X , equipped with the norm of total variation, and denote by M(X )+ its positive cone, that is, the space of finite measures on X . Let C(X ) be the Banach space of bounded continuous functions on X , equipped with the sup-norm. Notice that when X is compact Hausdorff, then M(X ) ≃ C(X )∗ , i.e., M(X ) is the topological dual of C(X ). Let Σ := [0, T ] × X, S := Σ × U, and let C1 (Σ) be the Banach space of functions ϕ ∈ C(Σ) that are continuously differentiable. For ease of exposition we use the same notation g (resp. h) for a polynomial g ∈ R[t, x] (resp. h ∈ R[x]) and its restriction to the compact set Σ (resp. K). Next, with u ∈ U, let A : C1 (Σ) → C(S) be the mapping ∂ϕ (t, x) + hf (t, x, u), ∇x ϕ(t, x)i. ∂t Notice that ∂ϕ/∂t + h∇x ϕ, f i ∈ C(S) for every ϕ ∈ C1 (Σ), because both X and U are compact, and f is understood as its restriction to S. Next, let L : C1 (Σ) → C(S) × C(K) be the linear mapping

(2.7)

ϕ 7→ Aϕ(t, x, u) :=

(2.8)

ϕ 7→ Lϕ := (−Aϕ, ϕT ),

where ϕT (x) := ϕ(T, x), for all x ∈ X. Obviously, L is continuous with respect to the strong topologies of C1 (Σ) and C(S) × C(K). Both (C(S), M(S)) and (C(K), M(K)) form a dual pair of vector spaces, with duality brackets Z hh, µi = h dµ, ∀ (h, µ) ∈ C(S) × M(S),

and

hg, νi = .

Z

g dν,

∀ (g, ν) ∈ C(K) × M(K)

Let L∗ : M (S) × M (K) → C1 (Σ)∗ be the adjoint mapping of L, defined by (2.9)

h(µ, ν), Lϕi = hL∗ (µ, ν), ϕi,

for all ((µ, ν), ϕ) ∈ M (S) × M (K) × C1 (Σ). Remark 2.1. (i) The mapping L∗ is continuous with respect to the weak topologies σ(M(S) × M(K), C(S) × C(K)), and σ(C1 (Σ)∗ , C1 (Σ)).

OPTIMAL CONTROL AND LMI-RELAXATIONS

5

(ii) Since the mapping L is continuous in the strong sense, it is also continuous with respect to the weak topologies σ(C1 (Σ), C1 (Σ)∗ ) and σ(C(S) × C(K), M(S) × M(K)). (iii) In the case of a free terminal time T ≤ T0 , it suffices to redefine L : C1 (Σ) → C(S) × C([0, T0 ] × K) by Lϕ := (−Aϕ, ϕ). The operator L∗ : M (S)×M ([0, T0 ]×K) → C1 (Σ)∗ is still defined by (2.9), for all ((µ, ν), ϕ) ∈ M (S) × M ([0, T0 ] × K) × C1 (Σ). For time-homogeneous free terminal time problems, one only needs functions ϕ of x only, and so Σ = S = X × U and L : C1 (Σ) → C(S) × C(K). 2.3. Occupation measures and primal LP formulation. Let T > 0, and let u = {u(t), 0 ≤ t < T } be a control such that the solution of (2.1), with x(0) = x0 , is well defined on [0, T ]. Define the probability measure ν u on Rn , and the measure µu on [0, T ] × Rn × Rm , by (2.10)

ν u (D)

(2.11)

µu (A × B × C)

:= ID [x(T )], D ∈ Bn , Z IB×C [(x(t), u(t))] dt, := [0,T ]∩A

for all rectangles (A × B × C), with (A, B, C) ∈ A × Bn × Bm , and where Bn (resp. Bm ) denotes the usual Borel σ-algebra associated with Rn (resp. Rm ), and A the Borel σ-algebra of [0, T ], and IB (•) the indicator function of the set B. The measure µu is called the occupation measure of the state-action (deterministic) process (t, x(t), u(t)) up to time T , whereas ν u is the occupation measure of the state x(T ) at time T . Remark 2.2. If the control u is admissible on [0, T ], i.e., if the trajectory x(·) satisfies the constraints (2.2) and (2.3), then ν u is a probability measure supported on K (i.e. ν u ∈ M(K)+ ), and µu is supported on [0, T ]×X×U (i.e. µu ∈ M(S)+ ). In particular, T = µu (S). Conversely, if the support of µu is contained in S = [0, T ] × X × U and if u µ (S) = T , then (x(t), u(t)) ∈ X × U for almost every t ∈ [0, T ]. Indeed, with (2.11), Z T T = IX×U [(x(s), u(s))] ds 0

⇒ IX×U [(x(s), u(s))] = 1 a.e. in [0, T ], and hence (x(t), u(t)) ∈ X × U, for almost every t ∈ [0, T ]. If moreover the support of ν u is contained in K, then x(T ) ∈ K. Therefore, u is an admissible control on [0, T ]. Then, observe that the optimization criterion (2.5) now writes Z Z H dν u + h dµu = h(µu , ν u ), (h, H)i, J(0, T, x0 , u) = K

S

and one infers from (2.1), (2.2) and (2.3) that Z Z ∂g + h∇x g, f i dµu , gT dν u − g(0, x0 ) = (2.12) ∂t S K

for every g ∈ C1 (Σ) (where gT (x) ≡ g(T, x) for every x ∈ K), or equivalently, in view of (2.8) and (2.9), hg, L∗ (µu , ν u )i = hg, δ(0,x0 ) i,

∀g ∈ C1 (Σ).

´ 6JEAN B. LASSERRE, DIDIER HENRION, CHRISTOPHE PRIEUR, AND EMMANUEL TRELAT

This in turn implies that L∗ (µu , ν u ) = δ(0,x0 ) . Therefore, consider the infinite-dimensional linear program P (2.13)

P:

inf

(µ,ν)∈∆

{h(µ, ν), (h, H)i | L∗ (µ, ν) = δ(0,x0 ) }

(where ∆ := M(S)+ × M(K)+ ). Denote by inf P its optimal value and min P is the infimum is attained, in which case P is said to be solvable. The problem P is said feasible if there exists (µ, ν) ∈ ∆ such that L∗ (µ, ν) = δ(0,x0 ) . Note that P is feasible whenever there exists an admissible control. The linear program P is a rephrasing of the OCP (2.1)–(2.5) in terms of the occupation measures of its trajectories (t, x(t), u(t)). Its dual LP reads (2.14)

P∗ :

sup

{hδ(0,x0 ) , ϕi | Lϕ ≤ (h, H)}

ϕ∈C1 (Σ)

where Lϕ ≤ (h, H) ⇔

Aϕ(t, x, u) + h(t, x, u) ϕ(T, x)

≥0 ≤ H(x)

∀(t, x, u) ∈ S . ∀x ∈ K

Denote by sup P∗ its optimal value and max P∗ is the supremum is attained, i.e. if P∗ is solvable. Discrete-time stochastic analogues of the linear programs P and P∗ are also described in Hern´andez-Lerma and Lasserre [18, 19] for discrete time Markov control problems. Similarly see Cho and Stockbridge [8], Kurtz and Stockbridge [27], and Helmes and Stcokbridge [16] for some continuous-time stochastic problems. Theorem 2.3. If P is feasible, then: (i) P is solvable, i.e., inf P = min P ≤ J(0, T, x0 ). (ii) There is no duality gap, i.e., sup P∗ = min P. (iii) If moreover, for every (t, x) ∈ Σ, the set f (t, x, U) ⊂ Rn is convex, and the function v 7→ gt,x (v) := inf { h(t, x, u) : u∈U

v = f (t, x, u)}

is convex, then the OCP (2.1)–(2.5) has an optimal solution and sup P∗ = inf P = min P = J ∗ (0, T, x0 ). For a proof see §5.4. Theorem 2.3(iii) is due to Vinter [44]. 3. Semidefinite programming relaxations of P The linear program P is infinite dimensional, and thus, not tractable as it stands. Therefore, we next present a relaxation scheme that provides a sequence of semidefinite programming, or linear matrix inequality relaxations (in short, LMIrelaxations) {Qr }, each with finitely many constraints and variables. Let R[x] = [x1 , . . . xn ] (resp. R[t, x, u] = R[t, x1 , . . . xn , u1 , . . . , um ]) denote the set of polynomial functions of the variable x (resp., of the variables t, x, u). Assume that X and K (resp., U) are compact semi-algebraic subsets of Rn (resp. of Rm ), of the form (3.1) (3.2)

X := K :=

{x ∈ Rn {x ∈ Rn

| vj (x) ≥ 0, | θj (x) ≥ 0,

(3.3)

U :=

{u ∈ Rm

| wj (u) ≥ 0,

j ∈ J}, j ∈ JT }, j ∈ W },

OPTIMAL CONTROL AND LMI-RELAXATIONS

7

for some finite index sets JT , J and W , where vj , θj and wj are polynomial functions. Define (3.4)

d(X, K, U) :=

max

j∈J1 , l∈J, k∈W

(deg θj , deg vl , deg wk ).

To highlight the main ideas, in this section we assume that f , h and H are polynomial functions, that is, h ∈ R[t, x, u], H ∈ R[x], and f : [0, +∞)×Rn ×Rm → Rn is polynomial, i.e., every component of f satisfies fk ∈ R[t, x, u], for k = 1, . . . , n. 3.1. The underlying idea. Observe the following important facts. The restriction of R[t, x] to Σ belongs to C1 (Σ). Therefore, L∗ (µ, ν) = δ(0,x0 )

⇔

hg, L∗ (µ, ν)i = g(0, x0 ),

∀g ∈ R[t, x],

because Σ being compact, polynomial functions are dense in C1 (Σ) for the supnorm. Indeed, on a compact set, one may simultaneously approximate a function and its (continuous) partial derivatives by a polynomial and its derivatives, uniformly (see [24] pp. 65-66). Hence, the linear program P can be written ( inf {h(µ, ν), (h, H)i (µ,ν)∈∆ P: s.t. hg, L∗ (µ, ν)i = g(0, x0 ), ∀g ∈ R[t, x], or, equivalently, by linearity, ( inf {h(µ, ν), (h, H)i (µ,ν)∈∆ (3.5) P: s.t. hLg, (µ, ν)i = g(0, x0 ),

∀ g = (tp xα ); (p, α) ∈ N × Nn .

The constraints of P, (3.6)

hLg, (µ, ν)i = g(0, x0 ),

∀ g = (tp xα ); (p, α) ∈ N × Nn ,

define countably many linear equality constraints linking the moments of µ and ν, because if g is polynomial then so are ∂g/∂t and ∂g/∂xk , for every k, and h∇x g, f i. And so, Lg is polynomial. The functions h, H being also polynomial, the cost h(µ, ν), (h, H)i of the OCP (2.1)–(2.5) is also a linear combination of the moments of µ and ν. Therefore, the linear program P in (3.5) can be formulated as a LP with countably many variables (the moments of µ and ν), and countably many linear equality constraints. However, it remains to express the fact that the variables should be moments of some measures µ and ν, with support contained in S and K respectively. At this stage, one will make some (weak) additional assumptions on the polynomials that define the compact semi-algebraic sets X, K and U. Under such assumptions, one may then invoke recent results of real algebraic geometry on the representation of polynomials positive on a compact set, and get necessary and sufficient conditions on the variables of P to be indeed moments of two measures µ and ν, with appropriate support. We will use Putinar’s Positivstellensatz [37] described in the next section, which yields SDP constraints on the variables. One might also use other representation results like e.g. Krivine [26] and Vasilescu [43], and obtain linear constraints on the variables (as opposed to SDP constraints). This is the approach taken in e.g. Helmes et al. [16]. However, a comparison of the use of LP-constraints versus SDP constraints on a related problem [29] has dictated our choice of the former.

´ 8JEAN B. LASSERRE, DIDIER HENRION, CHRISTOPHE PRIEUR, AND EMMANUEL TRELAT

Finally, if g in (3.6) runs only over all monomials of degree less than r, one then obtains a corresponding relaxation Qr of P, which is now a finite-dimensional SDP that one may solve with public software packages. At last, one lets r → ∞. 3.2. Notations, definitions and auxiliary results. For a multi-index α = αn 1 (α1 , . . . , αn ) ∈ Nn , and for x = (x1 , . . . , xn ) ∈ Rn , denote xα := xα 1 · · · xn . Consider the canonical basis {xα }α∈Nn (resp., {tp xα uβ }p∈N,α∈Nn,β∈Nm ) of R[x] (resp., of R[t, x, u]). Given two sequences y = {yα }α∈Nn and z = {zγ }γ∈N×Nn×Nm of real numbers, define the linear functional Ly : R[x] → R by X X H(:= Hα xα ) 7→ Ly (H) := Hα y α , α∈Nn

α∈Nn

and similarly, define the linear functional Lz : R[t, x, u] → R by X X h 7→ Lz (h) := hγ z γ = hpαβ zpαβ , γ∈N×Nn ×Nm

P

p∈N,α∈Nn ,β∈Nm

p α β

where h(t, x, u) = p∈N,α∈Nn ,β∈Nm hpαβ t x u . Note that, for a given measure ν (resp., µ) on R (resp., on R × Rn × Rm ), there holds, for every H ∈ R[x] (resp., for every h ∈ R[t, x, u]), Z Z X X Hα xα dν = Hα yα = Ly (H), hν, Hi = Hdν = R R R α where the real numbers yα = x dν are the moments of the measure ν (resp., hµ, hi = Lz (h), where z is the sequence of moments of the measure µ). Definition 3.1. For a given sequence z = {zγ }γ∈N×Nn×Nm of real numbers, the moment matrix Mr (z) of order r associated with z, has its rows and columns indexed in the canonical basis {tp xα uβ }, and is defined by Mr (z)(γ, β) = zγ+β , γ, β ∈ N × Nn × Nm , |γ|, |β| ≤ r, P where |γ| := j γj . The moment matrix Mr (y) of order r associated with a given sequence y = {yα }α∈Nn , has its rows and columns indexed in the canonical basis {xα }, and is defined in a similar fashion.

(3.7)

Note that, if z has a representing measure µ, i.e., R if z is the sequence of moments of the measure µ on R × Rn × Rm , then Lz (h) = hdµ, for every h ∈ R[t, x, u], and if h denotes the vector of coefficients of h ∈ R[t, x, u] of degree less than r, then Z hh, Mr (z)hi = Lz (h2 ) = h2 dµ ≥ 0.

This implies that Mr (z) is symmetric nonnegative (denoted Mr (z) 0), for every r. The same holds for Mr (y). Conversely, not every sequence y that satisfies Mr (y) 0 for every r, has a representing measure. However, several sufficient conditions exist, and in particular the following one, due to Berg [3]. Proposition 3.2. If y = {yα }α∈Nn satisfies |yα | ≤ 1 for every α ∈ Nn , and Mr (y) 0 for every integer r, then y has a representing measure on Rn , with support contained in the unit ball [−1, 1]n . We next present another sufficient condition which is crucial in the proof of our main result.

OPTIMAL CONTROL AND LMI-RELAXATIONS

9

Definition 3.3. For a given polynomial θ ∈ R[t, x, u], written X θδ tp xα uβ , θ(t, x, u) = δ=(p,α,β)

define the localizing matrix Mr (θ z) associated with z, θ, and with rows and columns also indexed in the canonical basis of R[t, x, u], by X (3.8) Mr (θ z)(γ, β) = θδ zδ+γ+β γ, β ∈ N × Nn × Nm , |γ|, |β| ≤ r. δ

The localizing matrix Mr (θ y) associated with a given sequence y = {yα }α∈Nn is defined similarly. Note that, if z has a representing measure µ on R × Rn × Rm with support contained in the level set {(t, x, u) : θ(t, x, u) ≥ 0}, and if h ∈ R[t, x, u] has degree less than r, then Z hh, Mr (θ, z)hi = Lz (θ h2 ) = θh2 dµ ≥ 0. Hence, Mr (θ z) 0, for every r. Let Σ2 ⊂ R[x] be the convex cone generated in R[x] by all squares of polynomials, and let Ω ⊂ Rn be the compact basic semi-algebraic set defined by (3.9)

Ω := {x ∈ Rn

| gj (x) ≥ 0,

j = 1, . . . , m}

for some family {gj }m j=1 ⊂ R[x]. Definition 3.4. The set Ω ⊂ Rn defined by (3.9) satisfies Putinar’s condition if Pm 2 there exists u ∈ R[x] such that u = u0 + j=1 uj gj for some family {uj }m j=0 ⊂ Σ , and the level set {x ∈ Rn | u(x) ≥ 0} is compact. Putinar’s condition is satisfied if e.g. the level set {x : gk (x) ≥ 0} is compact for some k, or if all the gj ’s are linear, in which case Ω is a polytope. In addition, if one knows some M such that kxk ≤ M whenever x ∈ Ω, then it suffices to add the redundant quadratic constraint M 2 − kxk2 ≥ 0 in the definition (3.9) of Ω, and Putinar’s condition is satisfied (take u := M 2 − kxk2 ). Theorem 3.5 (Putinar’s Positivstellensatz [37]). Assume that the set Ω defined by (3.9) satisfies Putinar’s condition. (a) If f ∈ R[x] and f > 0 on Ω, then (3.10)

f = f0 +

m X

fj gj ,

j=1

2 for some family {fj }m j=0 ⊂ Σ . (b) Let y = {yα }α∈Nn be a sequence of real numbers. If

(3.11)

Mr (y) 0 ;

Mr (gj y) 0,

j = 1, . . . , m;

∀ r = 0, 1, . . .

then y has a representing measure with support contained in Ω.

´ 10 JEAN B. LASSERRE, DIDIER HENRION, CHRISTOPHE PRIEUR, AND EMMANUEL TRELAT

3.3. LMI-relaxations of P. Consider the linear program P defined by (3.5). Since the semi-algebraic sets X, K and U defined respectively by (3.1), (3.2) and (3.3) are compact, with no loss of generality, we assume (up to a scaling of the variables x, u and t) that T = 1, X, K ⊆ [−1, 1]n and U ⊆ [−1, 1]m . Next, given a sequence z = {zγ } indexed in the basis of R[t, x, u] denote z(t), z(x) and z(u) its marginals with respect to the variables t, x and u, respectively. These sequences are indexed in the canonical basis of R[t], R[x] and R[u] repectively. For instance, writing γ = (k, α, β) ∈ N × Nn × Nn , {z(t)} = {zk,0,0 }k∈N ;

{z(x)} = {z0,α,0 }α∈Nn ;

{z(u)} = {z0,0,β }β∈Nm .

Let r0 be an integer such that 2r0 ≥ max (deg f, deg h, deg H, 2d(X, K, U)), where d(X, K, U) is defined by (3.4). For every r ≥ r0 , consider the LMI-relaxation inf Lz (h) + Ly (H) y,z M r (y), Mr (z) 0 j ∈ J1 M r−deg θj (θj y) 0, (v z(x)) 0, j∈J M j r−deg v j (3.12) Qr : , M (w z(u)) 0, k∈W r−deg w k k Mr−1 (t(1 − t) z(t)) 0 p α L y (g1 ) − Lz (∂g/∂t + h∇x g, f i) = g(0, x0 ), ∀g = (t x ) with p + |α| − 1 + deg f ≤ 2r

whose optimal value is denoted by inf Qr .

OCP with free terminal time. For the OCP (2.6), i.e., with free terminal time T ≤ T0 , we need adapt the notation because T is now a variable. As already mentioned in Remark 2.1(iii), the measure ν in the infinite dimensional linear program P defined in (2.13), is now supported in [0, T0 ] × K (and [0, 1] × K after re-scaling) instead of K previously. Hence, the sequence y associated with ν is now indexed in the basis {tp xα } of R[t, x] instead of {xα } previously. Therefore, given y = {ykα } indexed in that basis, let y(t) and y(x) be the subsequences of y defined by: y(t) := {yk0 }k ,

k ∈ N; ;

y(x) = {y0α },

α ∈ Nn .

Then again (after rescaling), the LMI-relaxation Qr now reads inf Lz (h) + Ly (H) y,z Mr (y), Mr (z) 0 Mr−r(θj ) (θj y) 0, j ∈ J1 Mr−r(vj ) (vj z(x)) 0, j ∈ J . (3.13) Qr : Mr−r(wk ) (wk z(u)) 0, k ∈ W M (t(1 − t) y(t)) 0 r−1 Mr−1 (t(1 − t) z(t)) 0 Ly (g) − Lz (∂g/∂t + h∇x g, f i) = g(0, x0 ), ∀g = (tp xα ) with p + |α| − 1 + deg f ≤ 2r

The particular case of minimal time problem is obtained with h ≡ 1, H ≡ 0. For time-homogeneous problems, i.e., when h and f do not depend on t, one may take µ (resp. ν) supported on X × U (resp. K), which simplifies the associated LMI-relaxation (3.13). The main result is the following.

OPTIMAL CONTROL AND LMI-RELAXATIONS

11

Theorem 3.6. Let X, K ⊂ [−1, 1]n, and U ⊂ [−1, 1]m be compact basic semialgebraic-sets respectively defined by (3.1), (3.2) and (3.3). Assume that X, K and U satisfy Putinar’s condition (see Definition (3.4)), and let Qr be the LMIrelaxation defined in (3.12). Then, (i) inf Qr ↑ min P as r → ∞; (ii) if moreover, for every (t, x) ∈ Σ, the set f (t, x, U) ⊂ Rn is convex, and the function v 7→ gt,x (v) := inf { h(t, x, u) | v = f (t, x, u)} u∈U

is convex, then inf Qr ↑ min P = J ∗ (0, T, x0 ), as r → ∞. The proof of this result is postponed to the Appendix in Section §5.5. 3.4. The dual Q∗r . We describe the dual of the LMI-relaxation Qr which is also a semidefinite program, denoted Q∗r , and relate Q∗r with the dual P∗ of P, defined in (2.14). Let s(r) be the cardinal of the set Vr := {(k, α) ∈ N × Nn | k + |α| ≤ r − r0 }, and given λ ∈ Rs(r) , let Λr ∈ R[t, x] be the polynomial X λkα tk xα . (t, x) 7→ Λr (t, x) := (k,α)∈Vr

Consider the semidefinite program: Λr (0, x0 ), sup y q0 ,qjx ,qk ,l0 ,lj ,Λr P P h + A Λr = q0 t(1 − t) + k∈W qku wk + j∈J qjx vj , P ∗ H − Λr (1, .) = l0 + j∈J1 lj θj , . (3.14) Qr : q0 ∈ R[t], qku ∈ R[u], qjx ∈ R[x], lj ∈ R[x] x u of squares polynomials), and {q0 , qj , qk , l0 , lxj } s.o.s. (sums deg lj θj , deg qj vj , deg qku wk , deg q0 ≤ 2r − 2; deg l0 ≤ 2r.

The LMI Q∗r is a reinforcement of P∗ in the following sense:

• the unknown function ϕ ∈ C1 (Σ) is now replaced with a polynomial Λr ∈ R[t, x] of degree less than 2r; • the constraint −Aϕ ≤ h for (t, x, u) ∈ S, is now replaced with the constraint h + AΛr ≥ 0 on S and the polynomial h + AΛrP≥ 0 which is nonnegative P on S, has Putinar’s representation q0 t(1 − t) + k∈W qku wk + j∈J qjx vj ; • the constraint ϕ1 ≤ H for x ∈ K, is replaced with the constraint H − Λr (1, .) ≥ 0 on K, and the polynomial P H − Λr (1, .) which is nonnegative on K, has Putinar’s representation l0 + j∈J1 lj θj .

Assume that Q∗r is solvable. A natural question is to know whether or not we can use an optimal solution q0 , qjx , qky , l0 , lj , Λr

´ 12 JEAN B. LASSERRE, DIDIER HENRION, CHRISTOPHE PRIEUR, AND EMMANUEL TRELAT

of Q∗r to obtain some information on an optimal solution of P. The most natural idea is to look for the zero set in S of the polynomial X X (t, x, u) 7→ q0 t(1 − t) + qku wk + qjx vj . k∈W

j∈J

sup Q∗r

Indeed, under the assumptions of Theorem 3.6, if = inf Qr , then Λr (0, x0 ) ≈ inf Qr ≈ min P = sup P∗ , and so, the polynomial Λr ∈ R[t, x] seems to be a good candidate to approximate a nearly optimal solution ϕ ∈ C1 (Σ) of P∗ . Next, as Q∗r is an approximation of a weak formulation of the HJB optimality equation, one may hope that the zero set in S of the polynomial h + AΛr provides some good information on the possible states x∗ (t) and controls u∗ (t) at time t in an optimal solution of the OCP (2.1)–(2.5). That is, fixing an arbitrary t0 ∈ [0, 1], one may solve the equation X X qku (u) wk (u) + qjx (x) vj (x) = −q0 t0 (1 − t0 ), k∈W

j∈J

and look for solutions (x, u) ∈ X × U. All these issues deserve further investigation beyond the scope of the present paper. However, at least in the minimum time problem for the (state and control constrained) double integrator example considered in §5.1, we already have some numerical support for the above claims. 3.5. Certificates of non controllability. For minimum time OCPs, i.e., with free terminal time T and instantaneous cost h ≡ 1, and H ≡ 0, the LMI-relaxations Qr defined in (3.13) may provide certificates of non controllability. Indeed, if for a given initial state x0 ∈ X, some LMI-relaxation Qr in the hierarchy has no feasible solution, then the initial state x0 cannot be steered to the origin in finite time. In other words, inf Qr = +∞ provides a certificate of uncontrollability of the initial state x0 . It turns out that sometimes such certificates can be provided at cheap cost, i.e., with LMI-relaxations of low order r. This is illustrated on the Zermelo problem in §5.3. Moreover, one may also consider controllability in given finite time T , that is, consider the LMI-relaxations as defined in (3.12) with T fixed, and H ≡ 0, h ≡ 1. Again, if for a given initial state x0 ∈ X, the LMI-relaxation Qr has no feasible solution, the initial state x0 cannot be steered to the origin in less than T units of time. And so, inf Qr = +∞ also provides a certificate of uncontrollability of the initial state x0 . 4. Generalization to smooth optimal control problems In the previous section, we assumed, to highlight the main ideas, that f , h and H were polynomials. In this section, we generalize Theorem 3.6, and simply assume that f , h and H are smooth. Consider the linear program P defined in the previous section ( inf {h(µ, ν), (h, H)i (µ,ν)∈∆ P: s.t. hg, L∗ (µ, ν)i = g(0, x0 ), ∀g ∈ R[t, x]. Since the sets X, K and U, defined previously, are compact, it follows from [9] (see also [24, pp. 65-66]) that f (resp. h, resp. H) is the limit in C1 (S) (resp. C1 (S), resp. C1 (K)) of a sequence of polynomials fp (resp. hp , resp. Hp ) of degree p, as p → +∞.

OPTIMAL CONTROL AND LMI-RELAXATIONS

13

Hence, for every integer p, consider the linear program Pp ( inf {h(µ, ν), (hp , Hp )i (µ,ν)∈∆ Pp : s.t. hg, L∗p (µ, ν)i = g(0, x0 ), ∀g ∈ R[t, x], the smooth analogue of P, where the linear mapping Lp : C1 (Σ) → C(S) × C(K) is defined by Lp ϕ := (−Ap ϕ, ϕT ), and where Ap : C1 (Σ) → C(S) is defined by ∂ϕ (t, x) + hfp (t, x, u), ∇x ϕ(t, x)i. ∂t For every integer r ≥ max(p/2, d(X, K, U)), let Qr,p denote the LMI-relaxation (3.12) associated with the linear program Pp . Recall that from Theorem 3.6, if K, X and U satisfy Putinar’s condition, then inf Qr,p ↑ min Pp as r → +∞; The next result, generalizing Theorem 3.6, shows that it is possible to let p tend to +∞. For convenience, set Ap ϕ(t, x, u) :=

vr,p = inf Qr,p , vp = min Pp , v = min P. Theorem 4.1. Let X, K ⊂ [−1, 1]n , and U ⊂ [−1, 1]m be compact semi-algebraicsets respectively defined by (3.1), (3.2) and (3.3). Assume that X, K and U satisfy Putinar’s condition (see Definition (3.4)). Then, (i) v = lim lim vr,p = lim sup vr,p ≤ J ∗ (0, T, x0 ). p→+∞ r>p/2

p→+∞ r→+∞ 2r>p

(ii) Moreover if for every (t, x) ∈ Σ, the set f (t, x, U) ⊂ Rn is convex, and the function v 7→ gt,x (v) := inf { h(t, x, u) | v = f (t, x, u)} u∈U

∗

is convex, then v = J (0, T, x0 ). The proof of this result is in the Appendix, Section §5.6. From the numerical point of view, depending on the functions f , h, H, the degree of the polynomials of the approximate OCP Pp may be required to be large, and hence the hierarchy of LMI-relaxations (Qr ) in (3.12) might not be efficiently implementable, at least in view of the performances of public SDP solvers available at present. Remark 4.2. The previous construction extends to smooth optimal control problems on Riemannian manifolds, as follows. Let M and N be smooth Riemannian manifolds. Consider on M the control system (2.1), where f : [0, +∞) × M × N −→ T M is smooth, and where the controls are bounded measurable functions, defined on intervals [0, T (u)] of R+ , and taking their values in a compact subset U of N . Let x0 ∈ M , and let X and K be compact subsets of M . Admissible controls are defined as previously. For an admissible control u on [0, T ], the cost of the associated trajectory x(·) is defined by (2.4), where where h : [0, +∞) × M × N −→ R and H : M → R are smooth functions. According to Nash embedding Theorem [33], there exist an integer n (resp. m) such that M (resp. N ) is smoothly isometrically embedded in Rn (resp. Rm ). In this context, all previous results apply.

´ 14 JEAN B. LASSERRE, DIDIER HENRION, CHRISTOPHE PRIEUR, AND EMMANUEL TRELAT

This remark is important for the applicability of the method described in this article. Indeed, many practical control problems (in particular, in mechanics) are expressed on manifolds, and since the optimal control problem investigated here is global, they cannot be expressed in general as control systems in Rn (in a global chart). 5. Illustrative examples We consider here the minimal time OCP, that is, we aim to approximate the minimal time to steer a given initial condition to the origin. We have tested the above methodology on two test OCPs, the double and Brockett integrators, for which the associated optimal value T ∗ can be calculated exactly. The numerical examples in this section were processed with our Matlab package GloptiPoly 3 1. 5.1. The double integrator. Consider the double integrator system in R2 (5.1)

x˙ 1 (t) = x˙ 2 (t) =

x2 (t), u(t),

where x = (x1 , x2 ) is the state and the control u = u(t) ∈ U, satisfies the constraint |u(t)| ≤ 1, for all t ≥ 0. In addition, the state is constrained to satisfy x2 (t) ≥ −1, for all t. In this time-homogeneous case, and with the notation of Section 2, we have X = {x ∈ R2 : x2 ≥ −1}, K = {(0, 0)}, and U = [−1, 1]. Remark 5.1. The theorem obviously extends, up to scaling, to the case of arbitrary compact subsets X, K ⊂ Rn and U ⊂ [−1, 1]m . Observe that X is not compact and so the convergence result of Theorem 3.6 may not hold. In fact, we may impose the additional constraint kx(t)k∞ ≤ M for some large M (and modify X accordingly), because for initial states x0 with kx0 k∞ relatively small with respect to M , the optimal trajectory remains in X. However, in the numerical experiments, we have not enforced an additional constraint. We have maintained the original constraint x2 ≥ −1 in the localizing constraint Mr−r(vj ) (vj z(x)) 0, with x 7→ vj (x) = x2 + 1. 5.1.1. Exact computation. For this very simple system, one is able to compute exactly the optimal minimum time. Denoting T (x) the minimal time to reach the origin from x = (x1 , x2 ), we have: If x1 ≥ 1 − x22 /2 and x2 ≥ −1 then T (x) p = x22 /2 + x1 + x2 + 1. If −x22 /2 sign x2 ≤ then T (x) = 2 x22 /2 + x1 + x2 . If x1 < −x22 /2 sign x2 x1 ≤ 1 − x22 /2 and x2 ≥ −1 p and x2 ≥ −1 then T (x) = 2 x22 /2 − x1 − x2 .

5.1.2. Numerical approximation. Table 1 displays the values of the initial state x0 ∈ X, and denoting inf Qr (x0 ) the optimal value of the LMI-relaxation (3.13) for the minimum time OCP (5.1) with initial state x0 , Tables 2, 3, and 4 display the numerical values of the ratii inf Qr (x0 )/T (x0 ) for r = 2, 3 and 5 respectively. In Figures 1, 2, and 3 one displays the level sets of the ratii inf Qr /T (x0 ) for r = 2, 3 and 5 respectively. The closer to white the color, the closer to 1 the ratio inf Qr /T (x0 ). Finally, for this double integrator example we have plotted in Figure 4 the level sets of the function Λ5 (x) − T (x) where T (x) is the known optimal minimum time 1 GloptiPoly 3 can be downloaded at www.laas.fr/∼henrion/software

OPTIMAL CONTROL AND LMI-RELAXATIONS

15

Table 1. Double integrator: data initial state x0 = (x01 , x02 ) x01 x02

0.0 -1.0

0.2 -0.8

0.4 -0.6

0.6 -0.4

0.8 -0.2

1.0 0.0

1.2 0.2

1.4 0.4

1.6 0.6

1.8 0.8

2.0 1.0

Table 2. Double integrator: ratio inf Q2 /T (x0 ) second LMI-relaxation: r=2 0.4598 0.4534 0.4390 0.4301 0.4212 0.0000 0.4501 0.4878 0.5248 0.5629 0.6001

0.3964 0.3679 0.9722 0.7698 0.4919 0.2238 0.3536 0.4493 0.5142 0.5673 0.6099

0.3512 0.9653 0.8653 0.7057 0.5039 0.3165 0.3950 0.4699 0.5355 0.5842 0.6245

0.9817 0.9347 0.8457 0.7050 0.5422 0.3877 0.4403 0.5025 0.5591 0.6044 0.6470

0.9674 0.9355 0.8518 0.7286 0.5833 0.4476 0.4846 0.5342 0.5840 0.6296 0.6617

0.9634 0.9383 0.8639 0.7542 0.6230 0.5005 0.5276 0.5691 0.6124 0.6465 0.6792

0.9628 0.9385 0.8720 0.7752 0.6613 0.5460 0.5663 0.5981 0.6312 0.6674 0.6972

0.9608 0.9386 0.8848 0.7964 0.6870 0.5839 0.5934 0.6219 0.6544 0.6829 0.7028

0.9600 0.9413 0.8862 0.8085 0.7121 0.6158 0.6204 0.6446 0.6689 0.6906 0.7153

0.9596 0.9432 0.8983 0.8187 0.7329 0.6434 0.6474 0.6647 0.6869 0.7083 0.7279

0.9595 0.9445 0.9015 0.8351 0.7513 0.6671 0.6667 0.6824 0.7005 0.7204 0.7369

0.9984 0.9815 0.9339 0.8605 0.7711 0.6925 0.6972 0.7192 0.7438 0.7662 0.7917

0.9984 0.9829 0.9385 0.8714 0.7891 0.7254 0.7342 0.7347 0.7555 0.7767 0.8012

0.9985 0.9923 0.9862 0.9793 0.9665 0.9544 0.9475 0.9452 0.9567 0.9634 0.9755

0.9984 0.9938 0.9871 0.9776 0.9637 0.9534 0.9580 0.9528 0.9604 0.9733 0.9764

Table 3. Double integrator: ratio inf Q3 /T (x0 ) third LMI-relaxation: r=3 0.5418 0.5115 0.4848 0.4613 0.4359 0.0000 0.4556 0.4978 0.5396 0.5823 0.6255

0.4400 0.3864 0.9793 0.7899 0.5179 0.2458 0.3740 0.4709 0.5395 0.5946 0.6434

0.3630 0.9803 0.8877 0.7321 0.5361 0.3496 0.4242 0.5020 0.5638 0.6190 0.6656

0.9989 0.9648 0.8745 0.7401 0.5772 0.4273 0.4789 0.5393 0.5955 0.6453 0.6903

0.9987 0.9687 0.8847 0.7636 0.6205 0.4979 0.5253 0.5784 0.6314 0.6703 0.7193

0.9987 0.9726 0.8997 0.7915 0.6629 0.5571 0.5767 0.6179 0.6600 0.7019 0.7326

0.9985 0.9756 0.9110 0.8147 0.7013 0.5978 0.6166 0.6477 0.6856 0.7177 0.7543

0.9984 0.9778 0.9208 0.8339 0.7302 0.6409 0.6437 0.6776 0.7089 0.7382 0.7649

0.9983 0.9798 0.9277 0.8484 0.7540 0.6719 0.6807 0.6976 0.7269 0.7539 0.7776

Table 4. Double integrator: ratio inf Q5 /T (x0 ) fifth LMI-relaxation: r=5 0.7550 0.6799 0.6062 0.5368 0.4713 0.0000 0.4742 0.5410 0.6106 0.6864 0.7462

0.5539 0.4354 0.9805 0.8422 0.6417 0.4184 0.5068 0.6003 0.6826 0.7330 0.8032

0.3928 0.9828 0.9314 0.8550 0.7334 0.5962 0.6224 0.6988 0.7416 0.7979 0.8564

0.9995 0.9794 0.9462 0.8911 0.8186 0.7144 0.7239 0.7585 0.8125 0.8588 0.9138

0.9995 0.9896 0.9706 0.9394 0.8622 0.8053 0.7988 0.8236 0.8725 0.9183 0.9394

0.9995 0.9923 0.9836 0.9599 0.9154 0.8825 0.8726 0.8860 0.9241 0.9473 0.9610

0.9994 0.9917 0.9853 0.9684 0.9448 0.9044 0.8860 0.9128 0.9305 0.9481 0.9678

0.9992 0.9919 0.9847 0.9741 0.9501 0.9210 0.9097 0.9257 0.9375 0.9480 0.9678

0.9988 0.9923 0.9848 0.9727 0.9505 0.9320 0.9263 0.9358 0.9507 0.9559 0.9696

´ 16 JEAN B. LASSERRE, DIDIER HENRION, CHRISTOPHE PRIEUR, AND EMMANUEL TRELAT

Figure 1. Double integrator: level sets inf Q2 /T (x0 )

Figure 2. Double integrator: level sets inf Q3 /T (x0 )

to steer the initial state x to 0; the problem being time-homogeneous, one may take Λr ∈ R[x] instead of R[t, x]. For instance, one may verify that when the initial state is in the region where the approximation is good, then the whole optimal trajectory also lies in that region. 5.2. The Brockett integrator. Consider the so-called Brockett system in R3 (5.2)

x˙ 1 (t) = x˙ 2 (t) = x˙ 3 (t) =

u1 (t), u2 (t), u1 (t)x2 (t) − u2 (t)x1 (t),

OPTIMAL CONTROL AND LMI-RELAXATIONS

17

Figure 3. Double integrator: level sets inf Q5 /T (x0 ) 2

4.5

4 1.5 3.5

1

3

x02

2.5 0.5 2

0

1.5

1 −0.5 0.5

−1 −2

−1.5

−1

−0.5

0 x

0.5

1

1.5

2

0

01

Figure 4. Double integrator: level sets Λ5 (x) − T (x) and optimal trajectory starting from x1 (0) = x2 (0) = 1. where x = (x1 , x2 , x3 ), and the control u = (u1 (t), u2 (t)) ∈ U, satisfies the constraint (5.3)

u1 (t)2 + u2 (t)2 ≤ 1,

∀t ≥ 0.

In this case, we have X = R3 , K = {(0, 0, 0)}, and U is the closed unit ball of R2 , centered at the origin. Note that set X is not compact and so the convergence result of Theorem 3.6 may not hold, see the discussion at the beginning of example 5.1. Nevertheless, in the numerical examples, we have not enforced additional state constraints.

´ 18 JEAN B. LASSERRE, DIDIER HENRION, CHRISTOPHE PRIEUR, AND EMMANUEL TRELAT

5.2.1. Exact computation. Let T (x) be the minimum time needed to steer an initial condition x ∈ R3 to the origin. We recall the following result of [2] (in fact given for equivalent (reachability) OCP of steering the origin to a given point x). Proposition 5.2. Consider the minimum time OCP for the system (5.2) with control constraint (5.3). The minimum time T (x) needed to steer the origin to a point x = (x1 , x2 , x3 ) ∈ R3 is given by p θ x21 + x22 + 2|x3 | , (5.4) T (x1 , x2 , x3 ) = p θ + sin2 θ − sin θ cos θ

where θ = θ(x1 , x2 , x3 ) is the unique solution in [0, π) of (5.5)

θ − sin θ cos θ 2 (x1 + x22 ) = 2|x3 |. sin2 θ

Moreover, the function T is continuous on R3 , and is analytic outside the line x1 = x2 = 0. Remark 5.3. Along the line x1 = x2 = 0, one has p T (0, 0, x3 ) = 2π|x3 |.

The singular set of the function T , i.e. the set where T is not C 1 , is the line x1 = x2 = 0 in R3 . More precisely, the gradients ∂T /∂xi , i = 1, 2, are discontinuous at every point (0, 0, x3 ), x3 6= 0. For the interested reader, the level sets {(x1 , x2 , x3 ) ∈ R3 | T (x1 , x2 , x3 ) = r}, with r > 0, are displayed, e.g., in Prieur and Tr´elat [36]. 5.2.2. Numerical approximation. Recall that the convergence result of Theorem 3.6 is guaranteed for X compact only. However, in the present case X = R3 is not compact. One possibility is to take for X a large ball of R3 centered at the origin because for initial states x0 with norm kx0 k relatively small, the optimal trajectory remains in X. However, in the numerical experiments presented below, we have chosen to maintain X = R3 , that is, the LMI-relaxation Qr does not include any localizing constraint Mr−r(vj ) (vj z(x)) 0 on the moment sequence z(x). Recall that inf Qr ↑ min P as r increases, i.e., the more moments we consider, the closer to the exact value we get. In Table 5 we have displayed the optimal values inf Qr for 16 different values of the initial state x(0) = x0 , in fact, all 16 combinations of x01 = 0, x02 = 0, 1, 2, 3, and x03 = 0, 1, 2, 3. So, the entry (2, 3) of Table 5 for the second LMI-relaxation is inf Q2 for the initial condition x0 = (0, 1, 2). At some (few) places in the table, the ∗ indicates that the SDP solver encountered some numerical problems, which explains why one finds a lower bound inf Qr−1 slightly higher than inf Qr , when practically equal to the exact value T ∗ . Notice that the upper triangular part (i.e., when both first coordinates x02 , x03 of the initial condition are away from zero) displays very good approximations with low order moments. In addition, the further the coordinates from zero, the best. For another set of initial conditions x01 = 1 and x0i = {1, 2, 3} Table 6 displays the results obtained at the LMI-relaxation Q4 . The regularity property of the minimal-time function seems to be an important topic of further investigation.

OPTIMAL CONTROL AND LMI-RELAXATIONS

19

Table 5. Brockett integrator: LMI-relaxations: inf Qr first LMI-relaxation: r=1 0.0000 0.9999 1.9999 2.9999 0.0140 1.0017 2.0010 3.0006 0.0243 1.0032 2.0017 3.0024 0.0295 1.0101 2.0034 3.0040 Second LMI-relaxation: r=2 0.0000 0.9998 1.9997∗ 2.9994∗ 0.2012 1.1199 2.0762 3.0453 0.3738 1.2003 2.1631 3.1304 0.4946 1.3467 2.2417 3.1943 Third LMI-relaxation: r=3 0.0000 0.9995 1.9987∗ 2.9984∗ 0.7665 1.3350 2.1563 3.0530 1.0826 1.7574 2.4172 3.2036 1.3804 2.0398 2.6797 3.4077 Fourth LMI-relaxation: r=4 0.0000 0.9992 1.9977 2.9952 1.2554 1.5925 2.1699 3.0478 1.9962 2.1871 2.5601 3.1977 2.7006 2.7390 2.9894 3.4254 Optimal time T ∗ 0.0000 1.0000 2.0000 3.0000 2.5066 1.7841 2.1735 3.0547 3.5449 2.6831 2.5819 3.2088 4.3416 3.4328 3.0708 3.4392 Table 6. Brockett integrator: inf Q4 with x01 = 1 fourth LMI-relaxation: r=4 1.7979 2.3614 3.2004 2.3691 2.6780 3.3341 2.8875 3.0654 3.5337 Optimal time T ∗ 1.8257 2.3636 3.2091 2.5231 2.6856 3.3426 3.1895 3.1008 3.5456

5.3. Certificate of uncontrollabilty in finite time. Consider the so-called Zermelo problem in R2 studied in Bokanowski et al. [5] (5.6)

x˙ 1 (t) x˙ 2 (t)

= =

1 − a x2 (t) + v cos θ v sin θ

´ 20 JEAN B. LASSERRE, DIDIER HENRION, CHRISTOPHE PRIEUR, AND EMMANUEL TRELAT

Figure 5. Zermelo problem: uncontrollable states with Q1 with a = 0.1. The state x is constrained to remain in the set X := [−6, 2]×[−2, 2] ⊂ R2 , and we also have the control constraints 0 ≤ v ≤ 0.44, as well as θ ∈ [0, 2π]. The target K to reach from an initial state x0 is the ball B(0, ρ) with ρ := 0.44. With the change of variable u1 = v cos θ, u2 = v sin θ, and U := {u : u21 + u22 ≤ 2 ρ }, this problem is formulated as a minimum time OCP with associated hierarchy of LMI-relaxations (Qr ) defined in (3.13). Therefore, if an LMI-relaxation Qr at some stage r of the hierarchy is infeasible then the OCP itself is infeasible, i.e., the initial state x0 cannot be steered to the target K while the trajectory remains in X. Figures 5 and 6 display the uncontrollable initial states x0 ∈ X found with the infeasibility of the LMI-relaxations Q1 and Q2 respectively. With Q1 , i.e. by using only second moments, we already have a very good approximation of the controllable set displayed in [5], and Q2 brings only a small additional set of uncontrollable states.

OPTIMAL CONTROL AND LMI-RELAXATIONS

21

Figure 6. Zermelo problem: uncontrollable states with Q2 Appendix 5.4. Proof of Theorem 2.3. We first prove Item (i). Consider the linear program P defined in (2.13), assumed to be feasible. From the constraint L∗ (µ, ν) = δ(0,x0 ) , one has Z Z ∂g ∂g (t, x) + h (t, x), f (t, x, u)i dµ = g(0, x0 ), ∀g ∈ C1 (Σ). g(T, x)dν − ∂t ∂x S K

In particular, taking g(t, x) = 1 and g(t, x) = T − t, it follows that µ(S) = T and ν(K) = 1. Hence, for every (µ, ν) satisfying L∗ (µ, ν) = δ(0,x0 ) , the pair ( T1 µ, ν) belongs to the unit ball B1 of (M(S) × M(K)). From Banach-Alaoglu Theorem, B1 is compact for the weak ⋆ topology, and even sequentially compact because B1 is metrizable (see e.g. Hern´andez-Lerma and Lasserre [20, Lemma 1.3.2]). Since L∗ is continuous (see Remark 2.1), it follows that the set of (µ, ν) satisfying L∗ (µ, ν) = δ(0,x0 ) is a closed subset of B1 ∩(M(S)+ ×M(K)+ ), and thus is compact. Moreover, since the linear program P is feasible, this set is nonempty. Finally, since the linear functional to be minimized is continuous, P is solvable. We next prove Item (ii). Consider the set D := {(L∗ (µ, ν), h(h, H), (µ, ν)i) | (µ, ν) ∈ M(S)+ × M(K)+ }. To prove that D is closed, consider a sequence {(µn , νn )}n∈N of M(S)+ × M(K)+ such that (5.7)

(L∗ (µα , να ), h(h, H), (µα , να )i) → (a, b),

for some (a, b) ∈ C1 (Σ)∗ ×R. It means that L∗ (µn , νn ) → a, and h(h, H), (µn , νn )i → b. In particular, taking ϕ0 := T − t and ϕ1 = 1, there must hold µn (S) = hϕ0 , L∗ (µn , νn )i → hϕ0 , ai,

νn (K) = hϕ1 , L∗ (µn , νn )i → hϕ1 , ai.

Hence, there exist n0 ∈ N and a ball Br of M(S) × M(K), such that (µn , νn ) ∈ Br for every n ≥ n0 . Since Br is compact, up to a subsequence (µn , νn ) converges weakly to some (µ, ν) ∈ M(S)+ × M(K)+ . This fact, combined with (5.7) and the continuity of L∗ , yields a = L∗ (µ, ν), and b = h(h, H), (µ, ν)i. Therefore, the set D

´ 22 JEAN B. LASSERRE, DIDIER HENRION, CHRISTOPHE PRIEUR, AND EMMANUEL TRELAT

is closed. From Anderson and Nash [1, Theorems 3.10 and 3.22], it follows that there is no duality gap between P and P∗ , and hence, with (i), sup P∗ = min P. Item (iii) follows from Vinter [44, Theorems 2.1 and 2.3], applied to the mappings F (t, x) := f (t, x, U ) , n

for (t, x) ∈ R × R .

l(t, x, v) := inf { h(t, x, u) | v = f (t, x, u) }, u∈U

5.5. Proof of Theorem 3.6. First of all, observe that Qr has a feasible solution. Indeed, it suffices to consider the sequences y and z consisting of the moments (up to order 2r) of the occupations measures ν u and µu associated with an admissible control u ∈ U of the OCP (2.1)-(2.5). Next, observe that, for every couple (y, z) satisfying all constraints of Qr , there must holds y0 = 1 and z0 = 1. Indeed, it suffices to choose g(t, x) = 1 and g(t, x) = 1 − t (or t) in the constraint Ly (g1 ) − Lz (∂g/∂t + h∇x g, f i) = g(0, x0 ). We next prove that, for r sufficiently large, one has |z(x)α | ≤ 1, |z(u)β | ≤ 1, |z(t)k | ≤ 1, for every k, and |yα | ≤ 1. We only provide the details of the proof for z(x), the arguments being similar for z(u), z(t) and y. Let Σ2 ⊂ R[x] be the space of sums of squares (s.o.s.) polynomials, and let Q ⊂ R[x] be the quadratic modulus generated by the polynomials vj ∈ R[x] that define X, i.e., X Q := { σ ∈ R[x] | σ = σ0 + σj vj with σj ∈ Σ2 , ∀ j ∈ {0} ∪ J}. j∈J

In addition, let Q(t) ⊂ Q be the set of elements σ of Q which have a representation P σ0 + j∈J σj vj for some s.o.s. family {σj } ⊂ Σ2 with deg σ0 ≤ 2t and deg σj vj ≤ 2t for every j ∈ J. Let r ∈ N be fixed. Since X ⊂ [−1, 1]n , there holds 1 ± xα > 0 on X, for every α ∈ Nn with |α| ≤ 2r. Therefore, since X satisfies Putinar’ condition (see Definition 3.4), the polynomial x 7→ 1 ± xα belongs to Q (see Putinar [37]). Moreover, there exists l(r) such that x 7→ 1 ± xα ∈ Q(l(r)) for every |α| ≤ 2r. Of course, x 7→ 1 ± xα ∈ Q(l) for every |α| ≤ 2r, whenever l ≥ l(r). For every feasible solution z of Ql(r) one has |z(x)α | = | Lz (xα ) | ≤ z0 = 1,

|α| ≤ 2r.

This follows from z0 = 1, Ml(r) (z) 0 and Ml(r)−r(vj ) (vj z(x)) 0, which implies α

z0 ± z(x)α = Lz (1 ± x ) = Lz (σ0 ) +

m X

Lz(x) (σj vj ) ≥ 0.

j=1

With similar arguments, one redefines l(r) so that, for every couple (y, z) satisfying the contraints of Ql(r) , one has 0 ≤ zk (t) ≤ 1 and |z(x)α |, |z(u)β |, |yα | ≤ 1,

∀ k, |α|, |β| ≤ 2r.

From this, and with l(r) := 2l(r), we immediately deduce P that |zγ | ≤ 1 whenever P |γ| ≤ 2r. In particular, Ly (H) + Lz (h) ≥ − β |Hβ | − γ |hγ |, which proves that inf Ql(r) > −∞, and so inf Qr > −∞ for r sufficiently large.

OPTIMAL CONTROL AND LMI-RELAXATIONS

23

Let ρ := inf P = min P (by Theorem 2.3), let r ≥ l(r0 ), and let (z r , y r ) be a nearly optimal solution of Qr with value 1 1 ≤ρ+ . (5.8) inf Qr ≤ Lzr (h) + Lyr (H) ≤ inf Qr + r r

Complete the finite vectors y r and z r with zeros to make them infinite sequences. Since for arbitrary s ∈ N one has |yαr |, |zγr | ≤ 1 whenever |α|, |γ| ≤ 2s, provided r is sufficiently large, by a standard diagonal argument, there exists a subsequence {rk } and two infinite sequences y and z, with |yα | ≤ 1 and |zγ | ≤ 1, for all α, γ, and such that (5.9)

lim yαrk = yα

k→∞

∀α ∈ Nn ;

lim zγrk = zγ

k→∞

∀γ ∈ N × Nn × Nm .

Next, let r be fixed arbitrarily. Observe that Mrk (y rk ) 0 implies Mr (y rk ) 0 whenever rk ≥ r, and similarly, Mr (z rk ) 0. Therefore, from (5.9) and Mr (y rk ) 0, we deduce that Mr (y) 0, and similarly Mr (z) 0. Since this holds for arbitrary r, and |yα |, |zγ | ≤ 1 for all α, γ, one infers from Proposition 3.2 that y and z are moment sequences of two measures ν and µ with support contained in [−1, 1]n and [0, 1] × [−1, 1]n × [−1, 1]m respectively. In addition, from the equalities y0rk = 1 and z0rk = 1 for every k, it follows that ν and µ are probability measures on [−1, 1]n , and [0, 1] × [−1, 1]n × [−1, 1]m. Next, let (t, α) ∈ N × Nn be fixed, arbitrarily. From Lyrk (g1 ) − g(0, x0 ) − Lzrk (∂g/∂t + h∇x g, f i) = 0,

with g = (tp xα ),

and the convergence (5.9), we obtain Ly (g1 ) − g(0, x0 ) − Lz (∂g/∂t + h∇x g, f i) = 0,

with g = (tp xα ),

that is, hLg, (µ, ν)i = hg, δ(0,x0 ) i. Since (t, α) ∈ N × Nn is arbitrary, we have hg, L∗ (µ, ν)i = hL g, (µ, ν)i = hg, δ(0,x0 ) i ∀ g ∈ R[t, x], which implies that L∗ (µ, ν) = δ(0,x0 ) . Let z(x), z(u) and z(t) denote the moment vectors of the marginals of µ with respect to the variables x ∈ Rn , u ∈ Rm and t ∈ R, respectively, i.e., Z Z z(x)α = xα µ(d(t, x, u)) ∀ α ∈ Nn , z(u)β = uβ µ(d(t, x, u)) ∀β ∈ Nm ,

R and z(t)k = tk µ(d(t, x, u)) for every k ∈ N. With r fixed arbitrarily, and using again (5.9), we also have Mr (θj y) 0 for every j ∈ JT , and Mr (vj z(x)) 0

∀ j ∈ J,

Mr (wk z(u)) 0 ∀k ∈ W,

Mr (t(1 − t) z(t)) 0.

Since X, K and U satisfy Putinar’s condition (see Definition 3.4), from Theorem 3.5 (Putinar’s Positivstellensatz), y is the moment sequence of a probability measure ν supported on K ⊂ [−1, 1]n . Similarly, z(x) is the moment sequence of a measure µx supported on X ⊂ [−1, 1]n , z(u) is the moment sequence of a measure µu supported on U ⊂ [−1, 1]m , and z(t) is the moment sequence of a measure µt supported on [0, 1]. Since measures on compact sets are moment determinate, it follows that µx , µu , and µt are the marginals of µ with respect to x, u and t respectively. Therefore, µ has its support contained in S, and from L∗ (µ, ν) = δ(0,x0 ) it follows that (µ, ν) satisfies all constraints of the problem P.

´ 24 JEAN B. LASSERRE, DIDIER HENRION, CHRISTOPHE PRIEUR, AND EMMANUEL TRELAT

Moreover, one has lim inf Qrk

k→∞

= = =

lim Lzrk (h) + Lyrk (H)

k→∞

Lz (h) + Ly (H) Z Z h dµ + H dν

(by (5.8))

(by (5.9)) ≤ ρ = min P.

Hence, (µ, ν) is an optimal solution of P, and min Qr ↑ min P (the sequence is monotone nondecreasing). Item (i) is proved. Item (ii) follows from Theorem 2.3 (iii). 5.6. Proof of Theorem 4.1. It suffices to prove that vp → v as p → +∞. For every integer p, vp = min Pp is attained for a couple of measures (µp , νp ). As in the proof of Theorem 2.3, the sequence {(µp , νp )}p∈N is bounded in M(S)+ × M(K)+ , and hence, up to a subsequence, it converges to an element (µ, ν) of this space for the weak ⋆ topology. On the one hand, by definition, L∗p (µp , νp ) = δ(0,x0 ) for every p. On the other, L∗p tends strongly to L∗ , and so L∗ (µ, ν) = δ(0,x0 ) . Moreover, since (hp , Hp ) tends strongly to (h, H) in C1 (S) × C1 (K), one has vp = h(µp , νp ), (hp , Hp )i −→ h(µ, ν), (h, H)i, and so v ≤ h(µ, ν), (h, H)i. We next prove that v = h(µ, ν), (h, H)i. Since (µp , νp ) is an optimal solution of Pp , h(µp , νp ), (hp , Hp )i ≤ h(¯ µ, ν¯), (hp , Hp ),

∀(¯ µ, ν¯) | L∗p (¯ µ, ν¯) = δ(0,x0 ) .

Hence, passing to the limit, h(µ, ν), (h, H)i ≤ h(¯ µ, ν¯), (h, H),

∀(¯ µ, ν¯) | L∗ (¯ µ, ν¯) = δ(0,x0 ) ,

and so, (µ, ν) is an optimal solution of P, i.e., v = h(µ, ν), (h, H)i.

Acknowledgments The research of Didier Henrion was partly supported by Project No. 102/06/0652 of the Grant Agency of the Czech Republic and Research Program No. MSM6840770038 of the Ministry of Education of the Czech Republic. References [1] E.J. Anderson, P. Nash. Linear Programming in Infinite-Dimensional Spaces. John Wiley, Chichester, UK, 1987. [2] R. Beals, B. Gaveau, P.C. Greiner. Hamilton-Jacobi theory and the heat kernel on Heisenberg groups. J. Math. Pures Appl. 79 (2000), 633–689. [3] C. Berg. The multidimensional moment problem and semigroups. Proc. Symp. Appl. Math. 37 (1987), 110–124. [4] V. Borkar. Convex analytic methods in Markov decision processes. In: Hanbook of Markov Decision Processes. E.A. Feinberg and A. Shwartz, Eds. Kluwer Academic Publishers, 2002, 377–408. [5] O. Bokanowski, S. Martin, R. Munos, H. Zidani. An anti-diffusive scheme for viability problems. Submitted. [6] R.W. Brockett. Asymptotic stability and feedback stabilization. In: Differential geometric control theory. R.W. Brockett, R.S. Millman and H.J. Sussmann, Eds. Birkh¨ auser, Boston, 1983, pp. 181–191. [7] I. Capuzzo-Dolcetta, P.L. Lions. Hamilton-Jacobi equations with state constraints. Trans. Amer. Math. Soc. 318 (1990), 643–683.

OPTIMAL CONTROL AND LMI-RELAXATIONS

25

[8] Cho, Moon Jung, R.H. Stockbridge. Linear programming formulation for optimal stopping problems. SIAM J. Control Optim. 40 (2002), 1965–1982. [9] C. Coatm´ elec. Approximation et interpolation des fonctions diff´ erentiables de plusieurs variables. Annales Scientifiques de l’Ecole Normale Sup´ erieure, 3` eme s´ erie, 83, 4 (1966), 271–341. [10] D.A. Dawson. Qualitative behavior of geostochastic systems. Stoch. Proc. Appl. 10 (1980), 1–31. [11] W.H. Fleming, D. Vermes. Convex duality approach to the optimal control of diffusions. SIAM J. Control Optim. 27 (1989), 1136–1155. [12] R. Fletcher. Practical methods of optimization. Vol. 1. Unconstrained optimization. John Wiley & Sons, Ltd., Chichester, 1980. [13] P.E. Gill, W. Murray, M.H. Wright. Practical optimization. Academic Press, Inc., LondonNew York, 1981. [14] R.F. Hartl, S.P. Sethi, R.G. Vickson. A survey of the maximum principles for optimal control problems with state constraints. SIAM Rev. 37 (1995), 181–218. [15] K. Helmes, R.H. Stockbridge. Numerical comparison of controls and verification of optimality for stochastic control problems. J. Optim. Theory Appl. 106 (2000), 107–127 [16] K. Helmes, S. R¨ ohl, R.H. Stockbridge. Computing moments of the exit time distribution for Markov processes by linear programming. Oper. Res. 49 (2001), 516–530. [17] D. Henrion, J.B. Lasserre. Solving nonconvex optimization problems. IEEE Control Systems Magazine 24, 2004, pp. 72–83. [18] O. Hern´ andez-Lerma, J.B. Lasserre. Discrete-Time Markov Control Processes: Basic Optimality Criteria. Springer Verlag, New York, 1996. [19] O. Hern´ andez-Lerma, J.B. Lasserre. Further Topics in Discrete-Time Markov Control Processes. Springer Verlag, New York, 1999. [20] O. Hern´ andez-Lerma, J.B. Lasserre. Markov Chains and Invariant Probabilities. Birkh¨ auser Verlag, Basel, 2003. [21] O. Hern´ andez-Lerma, J.B. Lasserre. Approximation schemes for infinite linear programs, SIAM J. Optim. 8, 1998, pp. 973–988. [22] O. Hern´ andez-Lerma, J.B. Lasserre. The linear programming approach, In: Hanbook of Markov Decision Processes. E.A. Feinberg and A. Shwartz, Eds. Kluwer Academic Publishers, 2002, pp. 377–408. [23] D. Hernandez-Hernandez, O. Hern´ andez-Lerma, M. Taksar. The linear programming approach to deterministic optimal control problems. Appl. Math. 24, 1996, pp. 17–33. [24] M.W. Hirsch. Differential topology. Graduate Texts in Mathematics 33, Springer-Verlag, 1976. [25] D. Jacobson, M. Lele, J. L. Speyer. New necessary conditions of optimality for control problems with state-variable inequality constraints. J. Math. Analysis Appl. 35 (1971) 255–284. [26] J.L. Krivine. Anneaux pr´ eordonn´ es. J. Anal. Math. 12 (1964), 307–326. [27] T.G. Kurtz, R.H. Stockbridge. Existence of Markov controls and characterization of optimal Markov controls. SIAM J. Control Optim. 36 (1998), 609–653. [28] J.B. Lasserre. Global optimization with polynomials and the problem of moments. SIAM J. Optim. 11, 2001, pp. 796–817. [29] J.B. Lasserre, T. Pri´ eto-Rumeau. SDP vs. LP relaxations for the moment approach in some performance evaluation problems. Stoch. Models 20 (2004), 439–456. [30] J.B. Lasserre, T. Pri´ eto-Rumeau, M. Zervos. Pricing a class of exotic options via moments and SDP relaxations. Math. Finance 16 (2006), 469–494. [31] J.B. Lasserre, C. Prieur, D. Henrion. Nonlinear optimal control: Numerical approximations via moments and LMI relaxations. Proc. 44th IEEE Conference on Decision and Control, Sevilla, Spain, December 2005, pp. 1648–1653. [32] H. Maurer. On optimal control problems with bounded state variables and control appearing linearly. SIAM J. Cont. Optim. 15 3 (1977) 345–362. [33] J. Nash. The imbedding problem for Riemannian manifolds. Ann. Math. 63 (1956), 20–63. [34] H.J. Pesch. A practical guide to the solution of real-life optimal control problems. Control Cybernet. 23, 1-2 (1994), 7–60. [35] L.S. Pontryagin, V.G. Boltyanskij, R.V. Gamkrelidze, E.F. Mishchenko. The Mathematical Theory of Optimal Processes John Wiley & Sons, New York, 1962.. [36] C. Prieur, E. Tr´ elat. Robust optimal stabilization of the Brockett integrator via a hybrid feedback. Math. Control Signals Systems 17, 3 (2005), 201–216.

´ 26 JEAN B. LASSERRE, DIDIER HENRION, CHRISTOPHE PRIEUR, AND EMMANUEL TRELAT

[37] M. Putinar. Positive polynomials on compact semi-algebraic sets. Ind. Univ. Math. J. 42, 1993, pp. 969–984. [38] K. Schm¨ udgen. The K-moment problem for compact semi-algebraic sets. Math. Ann. 289 (1991), 203–206. [39] M.H. Soner. Optimal control with state-space constraints. SIAM J. Control Optim. 24 (1986), 552–561. [40] J. Stoer, R. Bulirsch. Introduction to Numerical Analysis. Third Edition, Springer-Verlag, New York, 2002. [41] O. von Stryk, R. Bulirsch. Direct and indirect methods for trajectory optimization. Ann. Oper. Res. 37, 1992, pp. 357–373. [42] E. Tr´ elat. Contrˆ ole optimal: th´ eorie et applications. Vuibert, Paris, 2005. (in french) [43] F.-H. Vasilescu. Spectral measures and moment problems. Spectral Theory and Its Applications, Theta 2003, 173–215. [44] R. Vinter. Convex duality and nonlinear optimal control. SIAM J. Control Optim. 31, 2 (1993), 518–538. LAAS-CNRS and Institute of Mathematics, University of Toulouse, France. E-mail address: [email protected] LAAS-CNRS, University of Toulouse, France and Czech Technical University in Prague, Czech Republic. E-mail address: [email protected] LAAS-CNRS, University of Toulouse, France. E-mail address: [email protected] University of Orl´ eans, France E-mail address: [email protected]

NONLINEAR OPTIMAL CONTROL VIA OCCUPATION MEASURES AND LMI-RELAXATIONS JEAN B. LASSERRE, DIDIER HENRION, CHRISTOPHE PRIEUR, ´ AND EMMANUEL TRELAT Abstract. We consider the class of nonlinear optimal control problems (OCP) with polynomial data, i.e., the differential equation, state and control constraints and cost are all described by polynomials, and more generally for OCPs with smooth data. In addition, state constraints as well as state and/or action constraints are allowed. We provide a simple hierarchy of LMI (linear matrix inequality)-relaxations whose optimal values form a nondecreasing sequence of lower bounds on the optimal value. Under some convexity assumptions, the sequence converges to the optimal value of the OCP. Preliminary results show that good approximations are obtained with few moments.

1. INTRODUCTION Solving a general nonlinear optimal control problem (OCP) is a difficult challenge, despite powerful theoretical tools are available, e.g. the maximum principle and Hamilton-Jacobi-Bellman (HJB) optimality equation. The problem is even more difficult in the presence of state and/or control constraints. State constraints are particularly difficult to handle, and the interested reader is referred to CapuzzoDolcetta and Lions [7] and Soner [39] for a detailed account of HJB theory in the case of state constraints. There exist many numerical methods to compute the solution of a given optimal control problem; for instance, multiple shooting techniques which solve two-point boundary value problems as described, e.g., in [40, 34], or direct methods, as, e.g., in [41, 12, 13], which use, among others, descent or gradientlike algorithms. To deal with optimal control problems with state constraints, some adapted versions of the maximum principle have been developed (see [25, 32], and see [14] for a survey of this theory), but happen to be very hard to implement in general. On the other hand, the OCP can be written as an infinite-dimensional linear program (LP) over two spaces of measures. This is called the weak formulation of the OCP in Vinter [44] (stated in the more general context of differential inclusions). The two unknown measures are the state-action occupation measure (o.m.) up to the final time T , and the state o.m. at time T . The optimal value of the resulting LP always provides a lower bound on the optimal value of the OCP, and under some convexity assumptions, both values coincide; see Vinter [44] and HernandezHernandez et al. [23] as well. Date: February 2, 2008. 1991 Mathematics Subject Classification. 90C22, 93C10, 28A99. Key words and phrases. Nonlinear control; Optimal control; Semidefinite Programming; Measures; Moments. 1

´ 2JEAN B. LASSERRE, DIDIER HENRION, CHRISTOPHE PRIEUR, AND EMMANUEL TRELAT

The dual of the original infinite dimensional LP has an interpretation in terms of ”subsolutions” of related HJB-like optimality conditions, as for the unconstrained case. The only difference with the unconstrained case is the underlying function space involved, which directly incorporate the state constraints. Namely, the functions are only defined on the state constraint set . An interesting feature of this LP approach with o.m.’s is that state constraints, as well as state and/or action constraints are all easy to handle; indeed they simply translate into constraints on the supports of the unknown o.m.’s. It thus provides an alternative to the use of maximum principles with state constraints. However, although this LP approach is valid for any OCP, solving the corresponding (infinitedimensional) LP is difficult in general; however, general LP approximation schemes based on grids have been proposed in e.g. Hernandez and Lasserre [21]. This LP approach has also been used in the context of discrete-time Markov control processes, and is dual to Bellman’s optimality principle. For more details the interested reader is referred to Borkar [4], Hernandez-Lerma and Lasserre [18, 19, 22] and many references therein. For some continuous-time stochastic control problems (e.g., modeled by diffusions) and optimal stopping problems, the LP approach has also been used with success to prove existence of stationary optimal policies; see for instance Cho and Stockbridge [8], Helmes and Stockbridge [15], Helmes et al. [16], Kurtz and Stockbridge [27], and also Fleming and Vermes [11]. In some of these works, the moment approach is also used to approximate the resulting infinitedimensional LP. Contribution. In this paper, we consider the particular class of nonlinear OCP’s with state and/or control constraints, for which all data describing the problem (dynamics, state and control constraints) are polynomials. The approach also extends to the case of problems with smooth data and compact sets, because polynomials are dense in the space of functions considered; this point of view is detailed in §4. In this restricted polynomial framework, the LP approach has interesting additional features that can be exploited for effective numerical computation. Indeed, what makes this LP approach attractive is that for the class of OCPs considered: • Only the moments of the o.m.’s appear in the LP formulation, so that we already end up with countably many variables, a significant progress. • Constraints on the support of the o.m.’s translate easily into either LP or SDP (Semi Definite Programming) necessary constraints on their moments. Even more, for (semi-algebraic) compact supports, relatively recent powerful results from real algebraic geometry make these constraints also sufficient. • When truncating to finitely many moments, the resulting LP or SDP’s are solvable and their optimal values form a monotone nondecreasing sequence (indexed by the number of moments considered) of lower bounds on the optimal value of the LP (and thus of the OCP). Therefore, based on the above observations, we propose an approximation of the optimal value of the OCP via solving a hierarchy of SDPs (or linear matrix inequalities, LMIs)that provides a monotone nondecreasing sequence of lower bounds on the optimal value of the weak LP formulation of the OCP. In adddition, under some compactness assumption of the state and control constraint sets, the sequence of lower bounds is shown to converge to the optimal value of the LP, and thus the optimal value of the OCP when the former and latter are equal.

OPTIMAL CONTROL AND LMI-RELAXATIONS

3

As such, it could be seen as a complement to the above shooting or direct methods, and when the sequence of lower bounds converges to the optimal value, a test of their efficiency. Finally this approach can also be used to provide a certificate of unfeasibility. Indeed, if in the hierarchy of LMI-relaxations of the minimum time OCP, one is infeasible then the OCP itself is infeasible. It turns out that sometimes this certificate is provided at an early stage in the hierarchy, i.e. with very few moments. This is illustrated on two simple examples. In a pioneering paper, Dawson [10] had suggested the use of moments in the LP approach with o.m.’s, but results on the K-moment problem by Schm¨ udgen [38] and Putinar [37] were either not available at that time. Later, Helmes and Stockbridge [15] and Helmes, R¨ ohl and Stockbridge [16] have used LP moment conditions for computing some exit time moments in some diffusion model, whereas for the same models, Lasserre and Prieto-Rumeau [29] have shown that SDP moment conditions are superior in terms of precision and number of moments to consider; in [30], they have extended the moment approach for options pricing problems in some mathematical finance models. More recently, Lasserre, Prieur and Henrion [31] have used the o.m. approach for minimum time OCP without state constraint. Preliminary experimental results on Brockett’s integrator example, and the double integrator show fast convergence with few moments. 2. Occupation measures and the LP approach 2.1. Definition of the optimal control problem. Let n and m be nonzero integers. Consider on Rn the control system (2.1)

x(t) ˙ = f (t, x(t), u(t)),

where f : [0, +∞)×Rn ×Rm −→ Rn is smooth, and where the controls are bounded measurable functions, defined on intervals [0, T (u)] of R+ , and taking their values in a compact subset U of Rm . Let x0 ∈ Rn , and let X and K be compact subsets of Rn . For T > 0, a control u is said admissible on [0, T ] whenever the solution x(·) of (2.1), such that x(0) = x0 , is well defined on [0, T ], and satisfies (2.2)

(x(t), u(t)) ∈ X × U a.e. on [0, T ],

and (2.3)

x(T ) ∈ K.

Denote by UT the set of admissible controls on [0, T ]. For u ∈ UT , the cost of the associated trajectory x(·) is defined by Z T h(t, x(t), u(t))dt + H(x(T )), (2.4) J(0, T, x0 , u) = 0

where h : [0, +∞) × Rn × Rm −→ R and H : Rn → R are smooth functions. Consider the optimal control problem of determining a trajectory solution of (2.1, starting from x(0) = x0 , satisfying the state and control constraints (2.2), the terminal constraint (2.3), and minimizing the cost (2.4). The final time T may be fixed or not. If the final time T is fixed, we set (2.5)

J ∗ (0, T, x0 ) := inf J(0, T, x0 , u), u∈UT

´ 4JEAN B. LASSERRE, DIDIER HENRION, CHRISTOPHE PRIEUR, AND EMMANUEL TRELAT

and if T is free, we set (2.6)

J ∗ (0, x0 ) :=

inf

T >0, u∈UT

J(0, T, x0 , u),

Note that a particular OCP is the minimal time problem from x0 to K, by letting h ≡ 1, H ≡ 0. In this particular case, the minimal time is the first hitting time of the set K. It is possible to associate a stochastic or deterministic OCP with an abstract infinite dimensional linear programming (LP) problem P together with its dual P∗ (see for instance Hern´andez-lerma and Lasserre [18] for discrete-time Markov control problems, and Vinter [44], Hernandez et al. [23] for deterministic optimal control problems, as well as many references therein). We next describe this LP approach in the present context. 2.2. Notations and definitions. For a topological space X , let M(X ) be the Banach space of finite signed Borel measures on X , equipped with the norm of total variation, and denote by M(X )+ its positive cone, that is, the space of finite measures on X . Let C(X ) be the Banach space of bounded continuous functions on X , equipped with the sup-norm. Notice that when X is compact Hausdorff, then M(X ) ≃ C(X )∗ , i.e., M(X ) is the topological dual of C(X ). Let Σ := [0, T ] × X, S := Σ × U, and let C1 (Σ) be the Banach space of functions ϕ ∈ C(Σ) that are continuously differentiable. For ease of exposition we use the same notation g (resp. h) for a polynomial g ∈ R[t, x] (resp. h ∈ R[x]) and its restriction to the compact set Σ (resp. K). Next, with u ∈ U, let A : C1 (Σ) → C(S) be the mapping ∂ϕ (t, x) + hf (t, x, u), ∇x ϕ(t, x)i. ∂t Notice that ∂ϕ/∂t + h∇x ϕ, f i ∈ C(S) for every ϕ ∈ C1 (Σ), because both X and U are compact, and f is understood as its restriction to S. Next, let L : C1 (Σ) → C(S) × C(K) be the linear mapping

(2.7)

ϕ 7→ Aϕ(t, x, u) :=

(2.8)

ϕ 7→ Lϕ := (−Aϕ, ϕT ),

where ϕT (x) := ϕ(T, x), for all x ∈ X. Obviously, L is continuous with respect to the strong topologies of C1 (Σ) and C(S) × C(K). Both (C(S), M(S)) and (C(K), M(K)) form a dual pair of vector spaces, with duality brackets Z hh, µi = h dµ, ∀ (h, µ) ∈ C(S) × M(S),

and

hg, νi = .

Z

g dν,

∀ (g, ν) ∈ C(K) × M(K)

Let L∗ : M (S) × M (K) → C1 (Σ)∗ be the adjoint mapping of L, defined by (2.9)

h(µ, ν), Lϕi = hL∗ (µ, ν), ϕi,

for all ((µ, ν), ϕ) ∈ M (S) × M (K) × C1 (Σ). Remark 2.1. (i) The mapping L∗ is continuous with respect to the weak topologies σ(M(S) × M(K), C(S) × C(K)), and σ(C1 (Σ)∗ , C1 (Σ)).

OPTIMAL CONTROL AND LMI-RELAXATIONS

5

(ii) Since the mapping L is continuous in the strong sense, it is also continuous with respect to the weak topologies σ(C1 (Σ), C1 (Σ)∗ ) and σ(C(S) × C(K), M(S) × M(K)). (iii) In the case of a free terminal time T ≤ T0 , it suffices to redefine L : C1 (Σ) → C(S) × C([0, T0 ] × K) by Lϕ := (−Aϕ, ϕ). The operator L∗ : M (S)×M ([0, T0 ]×K) → C1 (Σ)∗ is still defined by (2.9), for all ((µ, ν), ϕ) ∈ M (S) × M ([0, T0 ] × K) × C1 (Σ). For time-homogeneous free terminal time problems, one only needs functions ϕ of x only, and so Σ = S = X × U and L : C1 (Σ) → C(S) × C(K). 2.3. Occupation measures and primal LP formulation. Let T > 0, and let u = {u(t), 0 ≤ t < T } be a control such that the solution of (2.1), with x(0) = x0 , is well defined on [0, T ]. Define the probability measure ν u on Rn , and the measure µu on [0, T ] × Rn × Rm , by (2.10)

ν u (D)

(2.11)

µu (A × B × C)

:= ID [x(T )], D ∈ Bn , Z IB×C [(x(t), u(t))] dt, := [0,T ]∩A

for all rectangles (A × B × C), with (A, B, C) ∈ A × Bn × Bm , and where Bn (resp. Bm ) denotes the usual Borel σ-algebra associated with Rn (resp. Rm ), and A the Borel σ-algebra of [0, T ], and IB (•) the indicator function of the set B. The measure µu is called the occupation measure of the state-action (deterministic) process (t, x(t), u(t)) up to time T , whereas ν u is the occupation measure of the state x(T ) at time T . Remark 2.2. If the control u is admissible on [0, T ], i.e., if the trajectory x(·) satisfies the constraints (2.2) and (2.3), then ν u is a probability measure supported on K (i.e. ν u ∈ M(K)+ ), and µu is supported on [0, T ]×X×U (i.e. µu ∈ M(S)+ ). In particular, T = µu (S). Conversely, if the support of µu is contained in S = [0, T ] × X × U and if u µ (S) = T , then (x(t), u(t)) ∈ X × U for almost every t ∈ [0, T ]. Indeed, with (2.11), Z T T = IX×U [(x(s), u(s))] ds 0

⇒ IX×U [(x(s), u(s))] = 1 a.e. in [0, T ], and hence (x(t), u(t)) ∈ X × U, for almost every t ∈ [0, T ]. If moreover the support of ν u is contained in K, then x(T ) ∈ K. Therefore, u is an admissible control on [0, T ]. Then, observe that the optimization criterion (2.5) now writes Z Z H dν u + h dµu = h(µu , ν u ), (h, H)i, J(0, T, x0 , u) = K

S

and one infers from (2.1), (2.2) and (2.3) that Z Z ∂g + h∇x g, f i dµu , gT dν u − g(0, x0 ) = (2.12) ∂t S K

for every g ∈ C1 (Σ) (where gT (x) ≡ g(T, x) for every x ∈ K), or equivalently, in view of (2.8) and (2.9), hg, L∗ (µu , ν u )i = hg, δ(0,x0 ) i,

∀g ∈ C1 (Σ).

´ 6JEAN B. LASSERRE, DIDIER HENRION, CHRISTOPHE PRIEUR, AND EMMANUEL TRELAT

This in turn implies that L∗ (µu , ν u ) = δ(0,x0 ) . Therefore, consider the infinite-dimensional linear program P (2.13)

P:

inf

(µ,ν)∈∆

{h(µ, ν), (h, H)i | L∗ (µ, ν) = δ(0,x0 ) }

(where ∆ := M(S)+ × M(K)+ ). Denote by inf P its optimal value and min P is the infimum is attained, in which case P is said to be solvable. The problem P is said feasible if there exists (µ, ν) ∈ ∆ such that L∗ (µ, ν) = δ(0,x0 ) . Note that P is feasible whenever there exists an admissible control. The linear program P is a rephrasing of the OCP (2.1)–(2.5) in terms of the occupation measures of its trajectories (t, x(t), u(t)). Its dual LP reads (2.14)

P∗ :

sup

{hδ(0,x0 ) , ϕi | Lϕ ≤ (h, H)}

ϕ∈C1 (Σ)

where Lϕ ≤ (h, H) ⇔

Aϕ(t, x, u) + h(t, x, u) ϕ(T, x)

≥0 ≤ H(x)

∀(t, x, u) ∈ S . ∀x ∈ K

Denote by sup P∗ its optimal value and max P∗ is the supremum is attained, i.e. if P∗ is solvable. Discrete-time stochastic analogues of the linear programs P and P∗ are also described in Hern´andez-Lerma and Lasserre [18, 19] for discrete time Markov control problems. Similarly see Cho and Stockbridge [8], Kurtz and Stockbridge [27], and Helmes and Stcokbridge [16] for some continuous-time stochastic problems. Theorem 2.3. If P is feasible, then: (i) P is solvable, i.e., inf P = min P ≤ J(0, T, x0 ). (ii) There is no duality gap, i.e., sup P∗ = min P. (iii) If moreover, for every (t, x) ∈ Σ, the set f (t, x, U) ⊂ Rn is convex, and the function v 7→ gt,x (v) := inf { h(t, x, u) : u∈U

v = f (t, x, u)}

is convex, then the OCP (2.1)–(2.5) has an optimal solution and sup P∗ = inf P = min P = J ∗ (0, T, x0 ). For a proof see §5.4. Theorem 2.3(iii) is due to Vinter [44]. 3. Semidefinite programming relaxations of P The linear program P is infinite dimensional, and thus, not tractable as it stands. Therefore, we next present a relaxation scheme that provides a sequence of semidefinite programming, or linear matrix inequality relaxations (in short, LMIrelaxations) {Qr }, each with finitely many constraints and variables. Let R[x] = [x1 , . . . xn ] (resp. R[t, x, u] = R[t, x1 , . . . xn , u1 , . . . , um ]) denote the set of polynomial functions of the variable x (resp., of the variables t, x, u). Assume that X and K (resp., U) are compact semi-algebraic subsets of Rn (resp. of Rm ), of the form (3.1) (3.2)

X := K :=

{x ∈ Rn {x ∈ Rn

| vj (x) ≥ 0, | θj (x) ≥ 0,

(3.3)

U :=

{u ∈ Rm

| wj (u) ≥ 0,

j ∈ J}, j ∈ JT }, j ∈ W },

OPTIMAL CONTROL AND LMI-RELAXATIONS

7

for some finite index sets JT , J and W , where vj , θj and wj are polynomial functions. Define (3.4)

d(X, K, U) :=

max

j∈J1 , l∈J, k∈W

(deg θj , deg vl , deg wk ).

To highlight the main ideas, in this section we assume that f , h and H are polynomial functions, that is, h ∈ R[t, x, u], H ∈ R[x], and f : [0, +∞)×Rn ×Rm → Rn is polynomial, i.e., every component of f satisfies fk ∈ R[t, x, u], for k = 1, . . . , n. 3.1. The underlying idea. Observe the following important facts. The restriction of R[t, x] to Σ belongs to C1 (Σ). Therefore, L∗ (µ, ν) = δ(0,x0 )

⇔

hg, L∗ (µ, ν)i = g(0, x0 ),

∀g ∈ R[t, x],

because Σ being compact, polynomial functions are dense in C1 (Σ) for the supnorm. Indeed, on a compact set, one may simultaneously approximate a function and its (continuous) partial derivatives by a polynomial and its derivatives, uniformly (see [24] pp. 65-66). Hence, the linear program P can be written ( inf {h(µ, ν), (h, H)i (µ,ν)∈∆ P: s.t. hg, L∗ (µ, ν)i = g(0, x0 ), ∀g ∈ R[t, x], or, equivalently, by linearity, ( inf {h(µ, ν), (h, H)i (µ,ν)∈∆ (3.5) P: s.t. hLg, (µ, ν)i = g(0, x0 ),

∀ g = (tp xα ); (p, α) ∈ N × Nn .

The constraints of P, (3.6)

hLg, (µ, ν)i = g(0, x0 ),

∀ g = (tp xα ); (p, α) ∈ N × Nn ,

define countably many linear equality constraints linking the moments of µ and ν, because if g is polynomial then so are ∂g/∂t and ∂g/∂xk , for every k, and h∇x g, f i. And so, Lg is polynomial. The functions h, H being also polynomial, the cost h(µ, ν), (h, H)i of the OCP (2.1)–(2.5) is also a linear combination of the moments of µ and ν. Therefore, the linear program P in (3.5) can be formulated as a LP with countably many variables (the moments of µ and ν), and countably many linear equality constraints. However, it remains to express the fact that the variables should be moments of some measures µ and ν, with support contained in S and K respectively. At this stage, one will make some (weak) additional assumptions on the polynomials that define the compact semi-algebraic sets X, K and U. Under such assumptions, one may then invoke recent results of real algebraic geometry on the representation of polynomials positive on a compact set, and get necessary and sufficient conditions on the variables of P to be indeed moments of two measures µ and ν, with appropriate support. We will use Putinar’s Positivstellensatz [37] described in the next section, which yields SDP constraints on the variables. One might also use other representation results like e.g. Krivine [26] and Vasilescu [43], and obtain linear constraints on the variables (as opposed to SDP constraints). This is the approach taken in e.g. Helmes et al. [16]. However, a comparison of the use of LP-constraints versus SDP constraints on a related problem [29] has dictated our choice of the former.

´ 8JEAN B. LASSERRE, DIDIER HENRION, CHRISTOPHE PRIEUR, AND EMMANUEL TRELAT

Finally, if g in (3.6) runs only over all monomials of degree less than r, one then obtains a corresponding relaxation Qr of P, which is now a finite-dimensional SDP that one may solve with public software packages. At last, one lets r → ∞. 3.2. Notations, definitions and auxiliary results. For a multi-index α = αn 1 (α1 , . . . , αn ) ∈ Nn , and for x = (x1 , . . . , xn ) ∈ Rn , denote xα := xα 1 · · · xn . Consider the canonical basis {xα }α∈Nn (resp., {tp xα uβ }p∈N,α∈Nn,β∈Nm ) of R[x] (resp., of R[t, x, u]). Given two sequences y = {yα }α∈Nn and z = {zγ }γ∈N×Nn×Nm of real numbers, define the linear functional Ly : R[x] → R by X X H(:= Hα xα ) 7→ Ly (H) := Hα y α , α∈Nn

α∈Nn

and similarly, define the linear functional Lz : R[t, x, u] → R by X X h 7→ Lz (h) := hγ z γ = hpαβ zpαβ , γ∈N×Nn ×Nm

P

p∈N,α∈Nn ,β∈Nm

p α β

where h(t, x, u) = p∈N,α∈Nn ,β∈Nm hpαβ t x u . Note that, for a given measure ν (resp., µ) on R (resp., on R × Rn × Rm ), there holds, for every H ∈ R[x] (resp., for every h ∈ R[t, x, u]), Z Z X X Hα xα dν = Hα yα = Ly (H), hν, Hi = Hdν = R R R α where the real numbers yα = x dν are the moments of the measure ν (resp., hµ, hi = Lz (h), where z is the sequence of moments of the measure µ). Definition 3.1. For a given sequence z = {zγ }γ∈N×Nn×Nm of real numbers, the moment matrix Mr (z) of order r associated with z, has its rows and columns indexed in the canonical basis {tp xα uβ }, and is defined by Mr (z)(γ, β) = zγ+β , γ, β ∈ N × Nn × Nm , |γ|, |β| ≤ r, P where |γ| := j γj . The moment matrix Mr (y) of order r associated with a given sequence y = {yα }α∈Nn , has its rows and columns indexed in the canonical basis {xα }, and is defined in a similar fashion.

(3.7)

Note that, if z has a representing measure µ, i.e., R if z is the sequence of moments of the measure µ on R × Rn × Rm , then Lz (h) = hdµ, for every h ∈ R[t, x, u], and if h denotes the vector of coefficients of h ∈ R[t, x, u] of degree less than r, then Z hh, Mr (z)hi = Lz (h2 ) = h2 dµ ≥ 0.

This implies that Mr (z) is symmetric nonnegative (denoted Mr (z) 0), for every r. The same holds for Mr (y). Conversely, not every sequence y that satisfies Mr (y) 0 for every r, has a representing measure. However, several sufficient conditions exist, and in particular the following one, due to Berg [3]. Proposition 3.2. If y = {yα }α∈Nn satisfies |yα | ≤ 1 for every α ∈ Nn , and Mr (y) 0 for every integer r, then y has a representing measure on Rn , with support contained in the unit ball [−1, 1]n . We next present another sufficient condition which is crucial in the proof of our main result.

OPTIMAL CONTROL AND LMI-RELAXATIONS

9

Definition 3.3. For a given polynomial θ ∈ R[t, x, u], written X θδ tp xα uβ , θ(t, x, u) = δ=(p,α,β)

define the localizing matrix Mr (θ z) associated with z, θ, and with rows and columns also indexed in the canonical basis of R[t, x, u], by X (3.8) Mr (θ z)(γ, β) = θδ zδ+γ+β γ, β ∈ N × Nn × Nm , |γ|, |β| ≤ r. δ

The localizing matrix Mr (θ y) associated with a given sequence y = {yα }α∈Nn is defined similarly. Note that, if z has a representing measure µ on R × Rn × Rm with support contained in the level set {(t, x, u) : θ(t, x, u) ≥ 0}, and if h ∈ R[t, x, u] has degree less than r, then Z hh, Mr (θ, z)hi = Lz (θ h2 ) = θh2 dµ ≥ 0. Hence, Mr (θ z) 0, for every r. Let Σ2 ⊂ R[x] be the convex cone generated in R[x] by all squares of polynomials, and let Ω ⊂ Rn be the compact basic semi-algebraic set defined by (3.9)

Ω := {x ∈ Rn

| gj (x) ≥ 0,

j = 1, . . . , m}

for some family {gj }m j=1 ⊂ R[x]. Definition 3.4. The set Ω ⊂ Rn defined by (3.9) satisfies Putinar’s condition if Pm 2 there exists u ∈ R[x] such that u = u0 + j=1 uj gj for some family {uj }m j=0 ⊂ Σ , and the level set {x ∈ Rn | u(x) ≥ 0} is compact. Putinar’s condition is satisfied if e.g. the level set {x : gk (x) ≥ 0} is compact for some k, or if all the gj ’s are linear, in which case Ω is a polytope. In addition, if one knows some M such that kxk ≤ M whenever x ∈ Ω, then it suffices to add the redundant quadratic constraint M 2 − kxk2 ≥ 0 in the definition (3.9) of Ω, and Putinar’s condition is satisfied (take u := M 2 − kxk2 ). Theorem 3.5 (Putinar’s Positivstellensatz [37]). Assume that the set Ω defined by (3.9) satisfies Putinar’s condition. (a) If f ∈ R[x] and f > 0 on Ω, then (3.10)

f = f0 +

m X

fj gj ,

j=1

2 for some family {fj }m j=0 ⊂ Σ . (b) Let y = {yα }α∈Nn be a sequence of real numbers. If

(3.11)

Mr (y) 0 ;

Mr (gj y) 0,

j = 1, . . . , m;

∀ r = 0, 1, . . .

then y has a representing measure with support contained in Ω.

´ 10 JEAN B. LASSERRE, DIDIER HENRION, CHRISTOPHE PRIEUR, AND EMMANUEL TRELAT

3.3. LMI-relaxations of P. Consider the linear program P defined by (3.5). Since the semi-algebraic sets X, K and U defined respectively by (3.1), (3.2) and (3.3) are compact, with no loss of generality, we assume (up to a scaling of the variables x, u and t) that T = 1, X, K ⊆ [−1, 1]n and U ⊆ [−1, 1]m . Next, given a sequence z = {zγ } indexed in the basis of R[t, x, u] denote z(t), z(x) and z(u) its marginals with respect to the variables t, x and u, respectively. These sequences are indexed in the canonical basis of R[t], R[x] and R[u] repectively. For instance, writing γ = (k, α, β) ∈ N × Nn × Nn , {z(t)} = {zk,0,0 }k∈N ;

{z(x)} = {z0,α,0 }α∈Nn ;

{z(u)} = {z0,0,β }β∈Nm .

Let r0 be an integer such that 2r0 ≥ max (deg f, deg h, deg H, 2d(X, K, U)), where d(X, K, U) is defined by (3.4). For every r ≥ r0 , consider the LMI-relaxation inf Lz (h) + Ly (H) y,z M r (y), Mr (z) 0 j ∈ J1 M r−deg θj (θj y) 0, (v z(x)) 0, j∈J M j r−deg v j (3.12) Qr : , M (w z(u)) 0, k∈W r−deg w k k Mr−1 (t(1 − t) z(t)) 0 p α L y (g1 ) − Lz (∂g/∂t + h∇x g, f i) = g(0, x0 ), ∀g = (t x ) with p + |α| − 1 + deg f ≤ 2r

whose optimal value is denoted by inf Qr .

OCP with free terminal time. For the OCP (2.6), i.e., with free terminal time T ≤ T0 , we need adapt the notation because T is now a variable. As already mentioned in Remark 2.1(iii), the measure ν in the infinite dimensional linear program P defined in (2.13), is now supported in [0, T0 ] × K (and [0, 1] × K after re-scaling) instead of K previously. Hence, the sequence y associated with ν is now indexed in the basis {tp xα } of R[t, x] instead of {xα } previously. Therefore, given y = {ykα } indexed in that basis, let y(t) and y(x) be the subsequences of y defined by: y(t) := {yk0 }k ,

k ∈ N; ;

y(x) = {y0α },

α ∈ Nn .

Then again (after rescaling), the LMI-relaxation Qr now reads inf Lz (h) + Ly (H) y,z Mr (y), Mr (z) 0 Mr−r(θj ) (θj y) 0, j ∈ J1 Mr−r(vj ) (vj z(x)) 0, j ∈ J . (3.13) Qr : Mr−r(wk ) (wk z(u)) 0, k ∈ W M (t(1 − t) y(t)) 0 r−1 Mr−1 (t(1 − t) z(t)) 0 Ly (g) − Lz (∂g/∂t + h∇x g, f i) = g(0, x0 ), ∀g = (tp xα ) with p + |α| − 1 + deg f ≤ 2r

The particular case of minimal time problem is obtained with h ≡ 1, H ≡ 0. For time-homogeneous problems, i.e., when h and f do not depend on t, one may take µ (resp. ν) supported on X × U (resp. K), which simplifies the associated LMI-relaxation (3.13). The main result is the following.

OPTIMAL CONTROL AND LMI-RELAXATIONS

11

Theorem 3.6. Let X, K ⊂ [−1, 1]n, and U ⊂ [−1, 1]m be compact basic semialgebraic-sets respectively defined by (3.1), (3.2) and (3.3). Assume that X, K and U satisfy Putinar’s condition (see Definition (3.4)), and let Qr be the LMIrelaxation defined in (3.12). Then, (i) inf Qr ↑ min P as r → ∞; (ii) if moreover, for every (t, x) ∈ Σ, the set f (t, x, U) ⊂ Rn is convex, and the function v 7→ gt,x (v) := inf { h(t, x, u) | v = f (t, x, u)} u∈U

is convex, then inf Qr ↑ min P = J ∗ (0, T, x0 ), as r → ∞. The proof of this result is postponed to the Appendix in Section §5.5. 3.4. The dual Q∗r . We describe the dual of the LMI-relaxation Qr which is also a semidefinite program, denoted Q∗r , and relate Q∗r with the dual P∗ of P, defined in (2.14). Let s(r) be the cardinal of the set Vr := {(k, α) ∈ N × Nn | k + |α| ≤ r − r0 }, and given λ ∈ Rs(r) , let Λr ∈ R[t, x] be the polynomial X λkα tk xα . (t, x) 7→ Λr (t, x) := (k,α)∈Vr

Consider the semidefinite program: Λr (0, x0 ), sup y q0 ,qjx ,qk ,l0 ,lj ,Λr P P h + A Λr = q0 t(1 − t) + k∈W qku wk + j∈J qjx vj , P ∗ H − Λr (1, .) = l0 + j∈J1 lj θj , . (3.14) Qr : q0 ∈ R[t], qku ∈ R[u], qjx ∈ R[x], lj ∈ R[x] x u of squares polynomials), and {q0 , qj , qk , l0 , lxj } s.o.s. (sums deg lj θj , deg qj vj , deg qku wk , deg q0 ≤ 2r − 2; deg l0 ≤ 2r.

The LMI Q∗r is a reinforcement of P∗ in the following sense:

• the unknown function ϕ ∈ C1 (Σ) is now replaced with a polynomial Λr ∈ R[t, x] of degree less than 2r; • the constraint −Aϕ ≤ h for (t, x, u) ∈ S, is now replaced with the constraint h + AΛr ≥ 0 on S and the polynomial h + AΛrP≥ 0 which is nonnegative P on S, has Putinar’s representation q0 t(1 − t) + k∈W qku wk + j∈J qjx vj ; • the constraint ϕ1 ≤ H for x ∈ K, is replaced with the constraint H − Λr (1, .) ≥ 0 on K, and the polynomial P H − Λr (1, .) which is nonnegative on K, has Putinar’s representation l0 + j∈J1 lj θj .

Assume that Q∗r is solvable. A natural question is to know whether or not we can use an optimal solution q0 , qjx , qky , l0 , lj , Λr

´ 12 JEAN B. LASSERRE, DIDIER HENRION, CHRISTOPHE PRIEUR, AND EMMANUEL TRELAT

of Q∗r to obtain some information on an optimal solution of P. The most natural idea is to look for the zero set in S of the polynomial X X (t, x, u) 7→ q0 t(1 − t) + qku wk + qjx vj . k∈W

j∈J

sup Q∗r

Indeed, under the assumptions of Theorem 3.6, if = inf Qr , then Λr (0, x0 ) ≈ inf Qr ≈ min P = sup P∗ , and so, the polynomial Λr ∈ R[t, x] seems to be a good candidate to approximate a nearly optimal solution ϕ ∈ C1 (Σ) of P∗ . Next, as Q∗r is an approximation of a weak formulation of the HJB optimality equation, one may hope that the zero set in S of the polynomial h + AΛr provides some good information on the possible states x∗ (t) and controls u∗ (t) at time t in an optimal solution of the OCP (2.1)–(2.5). That is, fixing an arbitrary t0 ∈ [0, 1], one may solve the equation X X qku (u) wk (u) + qjx (x) vj (x) = −q0 t0 (1 − t0 ), k∈W

j∈J

and look for solutions (x, u) ∈ X × U. All these issues deserve further investigation beyond the scope of the present paper. However, at least in the minimum time problem for the (state and control constrained) double integrator example considered in §5.1, we already have some numerical support for the above claims. 3.5. Certificates of non controllability. For minimum time OCPs, i.e., with free terminal time T and instantaneous cost h ≡ 1, and H ≡ 0, the LMI-relaxations Qr defined in (3.13) may provide certificates of non controllability. Indeed, if for a given initial state x0 ∈ X, some LMI-relaxation Qr in the hierarchy has no feasible solution, then the initial state x0 cannot be steered to the origin in finite time. In other words, inf Qr = +∞ provides a certificate of uncontrollability of the initial state x0 . It turns out that sometimes such certificates can be provided at cheap cost, i.e., with LMI-relaxations of low order r. This is illustrated on the Zermelo problem in §5.3. Moreover, one may also consider controllability in given finite time T , that is, consider the LMI-relaxations as defined in (3.12) with T fixed, and H ≡ 0, h ≡ 1. Again, if for a given initial state x0 ∈ X, the LMI-relaxation Qr has no feasible solution, the initial state x0 cannot be steered to the origin in less than T units of time. And so, inf Qr = +∞ also provides a certificate of uncontrollability of the initial state x0 . 4. Generalization to smooth optimal control problems In the previous section, we assumed, to highlight the main ideas, that f , h and H were polynomials. In this section, we generalize Theorem 3.6, and simply assume that f , h and H are smooth. Consider the linear program P defined in the previous section ( inf {h(µ, ν), (h, H)i (µ,ν)∈∆ P: s.t. hg, L∗ (µ, ν)i = g(0, x0 ), ∀g ∈ R[t, x]. Since the sets X, K and U, defined previously, are compact, it follows from [9] (see also [24, pp. 65-66]) that f (resp. h, resp. H) is the limit in C1 (S) (resp. C1 (S), resp. C1 (K)) of a sequence of polynomials fp (resp. hp , resp. Hp ) of degree p, as p → +∞.

OPTIMAL CONTROL AND LMI-RELAXATIONS

13

Hence, for every integer p, consider the linear program Pp ( inf {h(µ, ν), (hp , Hp )i (µ,ν)∈∆ Pp : s.t. hg, L∗p (µ, ν)i = g(0, x0 ), ∀g ∈ R[t, x], the smooth analogue of P, where the linear mapping Lp : C1 (Σ) → C(S) × C(K) is defined by Lp ϕ := (−Ap ϕ, ϕT ), and where Ap : C1 (Σ) → C(S) is defined by ∂ϕ (t, x) + hfp (t, x, u), ∇x ϕ(t, x)i. ∂t For every integer r ≥ max(p/2, d(X, K, U)), let Qr,p denote the LMI-relaxation (3.12) associated with the linear program Pp . Recall that from Theorem 3.6, if K, X and U satisfy Putinar’s condition, then inf Qr,p ↑ min Pp as r → +∞; The next result, generalizing Theorem 3.6, shows that it is possible to let p tend to +∞. For convenience, set Ap ϕ(t, x, u) :=

vr,p = inf Qr,p , vp = min Pp , v = min P. Theorem 4.1. Let X, K ⊂ [−1, 1]n , and U ⊂ [−1, 1]m be compact semi-algebraicsets respectively defined by (3.1), (3.2) and (3.3). Assume that X, K and U satisfy Putinar’s condition (see Definition (3.4)). Then, (i) v = lim lim vr,p = lim sup vr,p ≤ J ∗ (0, T, x0 ). p→+∞ r>p/2

p→+∞ r→+∞ 2r>p

(ii) Moreover if for every (t, x) ∈ Σ, the set f (t, x, U) ⊂ Rn is convex, and the function v 7→ gt,x (v) := inf { h(t, x, u) | v = f (t, x, u)} u∈U

∗

is convex, then v = J (0, T, x0 ). The proof of this result is in the Appendix, Section §5.6. From the numerical point of view, depending on the functions f , h, H, the degree of the polynomials of the approximate OCP Pp may be required to be large, and hence the hierarchy of LMI-relaxations (Qr ) in (3.12) might not be efficiently implementable, at least in view of the performances of public SDP solvers available at present. Remark 4.2. The previous construction extends to smooth optimal control problems on Riemannian manifolds, as follows. Let M and N be smooth Riemannian manifolds. Consider on M the control system (2.1), where f : [0, +∞) × M × N −→ T M is smooth, and where the controls are bounded measurable functions, defined on intervals [0, T (u)] of R+ , and taking their values in a compact subset U of N . Let x0 ∈ M , and let X and K be compact subsets of M . Admissible controls are defined as previously. For an admissible control u on [0, T ], the cost of the associated trajectory x(·) is defined by (2.4), where where h : [0, +∞) × M × N −→ R and H : M → R are smooth functions. According to Nash embedding Theorem [33], there exist an integer n (resp. m) such that M (resp. N ) is smoothly isometrically embedded in Rn (resp. Rm ). In this context, all previous results apply.

´ 14 JEAN B. LASSERRE, DIDIER HENRION, CHRISTOPHE PRIEUR, AND EMMANUEL TRELAT

This remark is important for the applicability of the method described in this article. Indeed, many practical control problems (in particular, in mechanics) are expressed on manifolds, and since the optimal control problem investigated here is global, they cannot be expressed in general as control systems in Rn (in a global chart). 5. Illustrative examples We consider here the minimal time OCP, that is, we aim to approximate the minimal time to steer a given initial condition to the origin. We have tested the above methodology on two test OCPs, the double and Brockett integrators, for which the associated optimal value T ∗ can be calculated exactly. The numerical examples in this section were processed with our Matlab package GloptiPoly 3 1. 5.1. The double integrator. Consider the double integrator system in R2 (5.1)

x˙ 1 (t) = x˙ 2 (t) =

x2 (t), u(t),

where x = (x1 , x2 ) is the state and the control u = u(t) ∈ U, satisfies the constraint |u(t)| ≤ 1, for all t ≥ 0. In addition, the state is constrained to satisfy x2 (t) ≥ −1, for all t. In this time-homogeneous case, and with the notation of Section 2, we have X = {x ∈ R2 : x2 ≥ −1}, K = {(0, 0)}, and U = [−1, 1]. Remark 5.1. The theorem obviously extends, up to scaling, to the case of arbitrary compact subsets X, K ⊂ Rn and U ⊂ [−1, 1]m . Observe that X is not compact and so the convergence result of Theorem 3.6 may not hold. In fact, we may impose the additional constraint kx(t)k∞ ≤ M for some large M (and modify X accordingly), because for initial states x0 with kx0 k∞ relatively small with respect to M , the optimal trajectory remains in X. However, in the numerical experiments, we have not enforced an additional constraint. We have maintained the original constraint x2 ≥ −1 in the localizing constraint Mr−r(vj ) (vj z(x)) 0, with x 7→ vj (x) = x2 + 1. 5.1.1. Exact computation. For this very simple system, one is able to compute exactly the optimal minimum time. Denoting T (x) the minimal time to reach the origin from x = (x1 , x2 ), we have: If x1 ≥ 1 − x22 /2 and x2 ≥ −1 then T (x) p = x22 /2 + x1 + x2 + 1. If −x22 /2 sign x2 ≤ then T (x) = 2 x22 /2 + x1 + x2 . If x1 < −x22 /2 sign x2 x1 ≤ 1 − x22 /2 and x2 ≥ −1 p and x2 ≥ −1 then T (x) = 2 x22 /2 − x1 − x2 .

5.1.2. Numerical approximation. Table 1 displays the values of the initial state x0 ∈ X, and denoting inf Qr (x0 ) the optimal value of the LMI-relaxation (3.13) for the minimum time OCP (5.1) with initial state x0 , Tables 2, 3, and 4 display the numerical values of the ratii inf Qr (x0 )/T (x0 ) for r = 2, 3 and 5 respectively. In Figures 1, 2, and 3 one displays the level sets of the ratii inf Qr /T (x0 ) for r = 2, 3 and 5 respectively. The closer to white the color, the closer to 1 the ratio inf Qr /T (x0 ). Finally, for this double integrator example we have plotted in Figure 4 the level sets of the function Λ5 (x) − T (x) where T (x) is the known optimal minimum time 1 GloptiPoly 3 can be downloaded at www.laas.fr/∼henrion/software

OPTIMAL CONTROL AND LMI-RELAXATIONS

15

Table 1. Double integrator: data initial state x0 = (x01 , x02 ) x01 x02

0.0 -1.0

0.2 -0.8

0.4 -0.6

0.6 -0.4

0.8 -0.2

1.0 0.0

1.2 0.2

1.4 0.4

1.6 0.6

1.8 0.8

2.0 1.0

Table 2. Double integrator: ratio inf Q2 /T (x0 ) second LMI-relaxation: r=2 0.4598 0.4534 0.4390 0.4301 0.4212 0.0000 0.4501 0.4878 0.5248 0.5629 0.6001

0.3964 0.3679 0.9722 0.7698 0.4919 0.2238 0.3536 0.4493 0.5142 0.5673 0.6099

0.3512 0.9653 0.8653 0.7057 0.5039 0.3165 0.3950 0.4699 0.5355 0.5842 0.6245

0.9817 0.9347 0.8457 0.7050 0.5422 0.3877 0.4403 0.5025 0.5591 0.6044 0.6470

0.9674 0.9355 0.8518 0.7286 0.5833 0.4476 0.4846 0.5342 0.5840 0.6296 0.6617

0.9634 0.9383 0.8639 0.7542 0.6230 0.5005 0.5276 0.5691 0.6124 0.6465 0.6792

0.9628 0.9385 0.8720 0.7752 0.6613 0.5460 0.5663 0.5981 0.6312 0.6674 0.6972

0.9608 0.9386 0.8848 0.7964 0.6870 0.5839 0.5934 0.6219 0.6544 0.6829 0.7028

0.9600 0.9413 0.8862 0.8085 0.7121 0.6158 0.6204 0.6446 0.6689 0.6906 0.7153

0.9596 0.9432 0.8983 0.8187 0.7329 0.6434 0.6474 0.6647 0.6869 0.7083 0.7279

0.9595 0.9445 0.9015 0.8351 0.7513 0.6671 0.6667 0.6824 0.7005 0.7204 0.7369

0.9984 0.9815 0.9339 0.8605 0.7711 0.6925 0.6972 0.7192 0.7438 0.7662 0.7917

0.9984 0.9829 0.9385 0.8714 0.7891 0.7254 0.7342 0.7347 0.7555 0.7767 0.8012

0.9985 0.9923 0.9862 0.9793 0.9665 0.9544 0.9475 0.9452 0.9567 0.9634 0.9755

0.9984 0.9938 0.9871 0.9776 0.9637 0.9534 0.9580 0.9528 0.9604 0.9733 0.9764

Table 3. Double integrator: ratio inf Q3 /T (x0 ) third LMI-relaxation: r=3 0.5418 0.5115 0.4848 0.4613 0.4359 0.0000 0.4556 0.4978 0.5396 0.5823 0.6255

0.4400 0.3864 0.9793 0.7899 0.5179 0.2458 0.3740 0.4709 0.5395 0.5946 0.6434

0.3630 0.9803 0.8877 0.7321 0.5361 0.3496 0.4242 0.5020 0.5638 0.6190 0.6656

0.9989 0.9648 0.8745 0.7401 0.5772 0.4273 0.4789 0.5393 0.5955 0.6453 0.6903

0.9987 0.9687 0.8847 0.7636 0.6205 0.4979 0.5253 0.5784 0.6314 0.6703 0.7193

0.9987 0.9726 0.8997 0.7915 0.6629 0.5571 0.5767 0.6179 0.6600 0.7019 0.7326

0.9985 0.9756 0.9110 0.8147 0.7013 0.5978 0.6166 0.6477 0.6856 0.7177 0.7543

0.9984 0.9778 0.9208 0.8339 0.7302 0.6409 0.6437 0.6776 0.7089 0.7382 0.7649

0.9983 0.9798 0.9277 0.8484 0.7540 0.6719 0.6807 0.6976 0.7269 0.7539 0.7776

Table 4. Double integrator: ratio inf Q5 /T (x0 ) fifth LMI-relaxation: r=5 0.7550 0.6799 0.6062 0.5368 0.4713 0.0000 0.4742 0.5410 0.6106 0.6864 0.7462

0.5539 0.4354 0.9805 0.8422 0.6417 0.4184 0.5068 0.6003 0.6826 0.7330 0.8032

0.3928 0.9828 0.9314 0.8550 0.7334 0.5962 0.6224 0.6988 0.7416 0.7979 0.8564

0.9995 0.9794 0.9462 0.8911 0.8186 0.7144 0.7239 0.7585 0.8125 0.8588 0.9138

0.9995 0.9896 0.9706 0.9394 0.8622 0.8053 0.7988 0.8236 0.8725 0.9183 0.9394

0.9995 0.9923 0.9836 0.9599 0.9154 0.8825 0.8726 0.8860 0.9241 0.9473 0.9610

0.9994 0.9917 0.9853 0.9684 0.9448 0.9044 0.8860 0.9128 0.9305 0.9481 0.9678

0.9992 0.9919 0.9847 0.9741 0.9501 0.9210 0.9097 0.9257 0.9375 0.9480 0.9678

0.9988 0.9923 0.9848 0.9727 0.9505 0.9320 0.9263 0.9358 0.9507 0.9559 0.9696

´ 16 JEAN B. LASSERRE, DIDIER HENRION, CHRISTOPHE PRIEUR, AND EMMANUEL TRELAT

Figure 1. Double integrator: level sets inf Q2 /T (x0 )

Figure 2. Double integrator: level sets inf Q3 /T (x0 )

to steer the initial state x to 0; the problem being time-homogeneous, one may take Λr ∈ R[x] instead of R[t, x]. For instance, one may verify that when the initial state is in the region where the approximation is good, then the whole optimal trajectory also lies in that region. 5.2. The Brockett integrator. Consider the so-called Brockett system in R3 (5.2)

x˙ 1 (t) = x˙ 2 (t) = x˙ 3 (t) =

u1 (t), u2 (t), u1 (t)x2 (t) − u2 (t)x1 (t),

OPTIMAL CONTROL AND LMI-RELAXATIONS

17

Figure 3. Double integrator: level sets inf Q5 /T (x0 ) 2

4.5

4 1.5 3.5

1

3

x02

2.5 0.5 2

0

1.5

1 −0.5 0.5

−1 −2

−1.5

−1

−0.5

0 x

0.5

1

1.5

2

0

01

Figure 4. Double integrator: level sets Λ5 (x) − T (x) and optimal trajectory starting from x1 (0) = x2 (0) = 1. where x = (x1 , x2 , x3 ), and the control u = (u1 (t), u2 (t)) ∈ U, satisfies the constraint (5.3)

u1 (t)2 + u2 (t)2 ≤ 1,

∀t ≥ 0.

In this case, we have X = R3 , K = {(0, 0, 0)}, and U is the closed unit ball of R2 , centered at the origin. Note that set X is not compact and so the convergence result of Theorem 3.6 may not hold, see the discussion at the beginning of example 5.1. Nevertheless, in the numerical examples, we have not enforced additional state constraints.

´ 18 JEAN B. LASSERRE, DIDIER HENRION, CHRISTOPHE PRIEUR, AND EMMANUEL TRELAT

5.2.1. Exact computation. Let T (x) be the minimum time needed to steer an initial condition x ∈ R3 to the origin. We recall the following result of [2] (in fact given for equivalent (reachability) OCP of steering the origin to a given point x). Proposition 5.2. Consider the minimum time OCP for the system (5.2) with control constraint (5.3). The minimum time T (x) needed to steer the origin to a point x = (x1 , x2 , x3 ) ∈ R3 is given by p θ x21 + x22 + 2|x3 | , (5.4) T (x1 , x2 , x3 ) = p θ + sin2 θ − sin θ cos θ

where θ = θ(x1 , x2 , x3 ) is the unique solution in [0, π) of (5.5)

θ − sin θ cos θ 2 (x1 + x22 ) = 2|x3 |. sin2 θ

Moreover, the function T is continuous on R3 , and is analytic outside the line x1 = x2 = 0. Remark 5.3. Along the line x1 = x2 = 0, one has p T (0, 0, x3 ) = 2π|x3 |.

The singular set of the function T , i.e. the set where T is not C 1 , is the line x1 = x2 = 0 in R3 . More precisely, the gradients ∂T /∂xi , i = 1, 2, are discontinuous at every point (0, 0, x3 ), x3 6= 0. For the interested reader, the level sets {(x1 , x2 , x3 ) ∈ R3 | T (x1 , x2 , x3 ) = r}, with r > 0, are displayed, e.g., in Prieur and Tr´elat [36]. 5.2.2. Numerical approximation. Recall that the convergence result of Theorem 3.6 is guaranteed for X compact only. However, in the present case X = R3 is not compact. One possibility is to take for X a large ball of R3 centered at the origin because for initial states x0 with norm kx0 k relatively small, the optimal trajectory remains in X. However, in the numerical experiments presented below, we have chosen to maintain X = R3 , that is, the LMI-relaxation Qr does not include any localizing constraint Mr−r(vj ) (vj z(x)) 0 on the moment sequence z(x). Recall that inf Qr ↑ min P as r increases, i.e., the more moments we consider, the closer to the exact value we get. In Table 5 we have displayed the optimal values inf Qr for 16 different values of the initial state x(0) = x0 , in fact, all 16 combinations of x01 = 0, x02 = 0, 1, 2, 3, and x03 = 0, 1, 2, 3. So, the entry (2, 3) of Table 5 for the second LMI-relaxation is inf Q2 for the initial condition x0 = (0, 1, 2). At some (few) places in the table, the ∗ indicates that the SDP solver encountered some numerical problems, which explains why one finds a lower bound inf Qr−1 slightly higher than inf Qr , when practically equal to the exact value T ∗ . Notice that the upper triangular part (i.e., when both first coordinates x02 , x03 of the initial condition are away from zero) displays very good approximations with low order moments. In addition, the further the coordinates from zero, the best. For another set of initial conditions x01 = 1 and x0i = {1, 2, 3} Table 6 displays the results obtained at the LMI-relaxation Q4 . The regularity property of the minimal-time function seems to be an important topic of further investigation.

OPTIMAL CONTROL AND LMI-RELAXATIONS

19

Table 5. Brockett integrator: LMI-relaxations: inf Qr first LMI-relaxation: r=1 0.0000 0.9999 1.9999 2.9999 0.0140 1.0017 2.0010 3.0006 0.0243 1.0032 2.0017 3.0024 0.0295 1.0101 2.0034 3.0040 Second LMI-relaxation: r=2 0.0000 0.9998 1.9997∗ 2.9994∗ 0.2012 1.1199 2.0762 3.0453 0.3738 1.2003 2.1631 3.1304 0.4946 1.3467 2.2417 3.1943 Third LMI-relaxation: r=3 0.0000 0.9995 1.9987∗ 2.9984∗ 0.7665 1.3350 2.1563 3.0530 1.0826 1.7574 2.4172 3.2036 1.3804 2.0398 2.6797 3.4077 Fourth LMI-relaxation: r=4 0.0000 0.9992 1.9977 2.9952 1.2554 1.5925 2.1699 3.0478 1.9962 2.1871 2.5601 3.1977 2.7006 2.7390 2.9894 3.4254 Optimal time T ∗ 0.0000 1.0000 2.0000 3.0000 2.5066 1.7841 2.1735 3.0547 3.5449 2.6831 2.5819 3.2088 4.3416 3.4328 3.0708 3.4392 Table 6. Brockett integrator: inf Q4 with x01 = 1 fourth LMI-relaxation: r=4 1.7979 2.3614 3.2004 2.3691 2.6780 3.3341 2.8875 3.0654 3.5337 Optimal time T ∗ 1.8257 2.3636 3.2091 2.5231 2.6856 3.3426 3.1895 3.1008 3.5456

5.3. Certificate of uncontrollabilty in finite time. Consider the so-called Zermelo problem in R2 studied in Bokanowski et al. [5] (5.6)

x˙ 1 (t) x˙ 2 (t)

= =

1 − a x2 (t) + v cos θ v sin θ

´ 20 JEAN B. LASSERRE, DIDIER HENRION, CHRISTOPHE PRIEUR, AND EMMANUEL TRELAT

Figure 5. Zermelo problem: uncontrollable states with Q1 with a = 0.1. The state x is constrained to remain in the set X := [−6, 2]×[−2, 2] ⊂ R2 , and we also have the control constraints 0 ≤ v ≤ 0.44, as well as θ ∈ [0, 2π]. The target K to reach from an initial state x0 is the ball B(0, ρ) with ρ := 0.44. With the change of variable u1 = v cos θ, u2 = v sin θ, and U := {u : u21 + u22 ≤ 2 ρ }, this problem is formulated as a minimum time OCP with associated hierarchy of LMI-relaxations (Qr ) defined in (3.13). Therefore, if an LMI-relaxation Qr at some stage r of the hierarchy is infeasible then the OCP itself is infeasible, i.e., the initial state x0 cannot be steered to the target K while the trajectory remains in X. Figures 5 and 6 display the uncontrollable initial states x0 ∈ X found with the infeasibility of the LMI-relaxations Q1 and Q2 respectively. With Q1 , i.e. by using only second moments, we already have a very good approximation of the controllable set displayed in [5], and Q2 brings only a small additional set of uncontrollable states.

OPTIMAL CONTROL AND LMI-RELAXATIONS

21

Figure 6. Zermelo problem: uncontrollable states with Q2 Appendix 5.4. Proof of Theorem 2.3. We first prove Item (i). Consider the linear program P defined in (2.13), assumed to be feasible. From the constraint L∗ (µ, ν) = δ(0,x0 ) , one has Z Z ∂g ∂g (t, x) + h (t, x), f (t, x, u)i dµ = g(0, x0 ), ∀g ∈ C1 (Σ). g(T, x)dν − ∂t ∂x S K

In particular, taking g(t, x) = 1 and g(t, x) = T − t, it follows that µ(S) = T and ν(K) = 1. Hence, for every (µ, ν) satisfying L∗ (µ, ν) = δ(0,x0 ) , the pair ( T1 µ, ν) belongs to the unit ball B1 of (M(S) × M(K)). From Banach-Alaoglu Theorem, B1 is compact for the weak ⋆ topology, and even sequentially compact because B1 is metrizable (see e.g. Hern´andez-Lerma and Lasserre [20, Lemma 1.3.2]). Since L∗ is continuous (see Remark 2.1), it follows that the set of (µ, ν) satisfying L∗ (µ, ν) = δ(0,x0 ) is a closed subset of B1 ∩(M(S)+ ×M(K)+ ), and thus is compact. Moreover, since the linear program P is feasible, this set is nonempty. Finally, since the linear functional to be minimized is continuous, P is solvable. We next prove Item (ii). Consider the set D := {(L∗ (µ, ν), h(h, H), (µ, ν)i) | (µ, ν) ∈ M(S)+ × M(K)+ }. To prove that D is closed, consider a sequence {(µn , νn )}n∈N of M(S)+ × M(K)+ such that (5.7)

(L∗ (µα , να ), h(h, H), (µα , να )i) → (a, b),

for some (a, b) ∈ C1 (Σ)∗ ×R. It means that L∗ (µn , νn ) → a, and h(h, H), (µn , νn )i → b. In particular, taking ϕ0 := T − t and ϕ1 = 1, there must hold µn (S) = hϕ0 , L∗ (µn , νn )i → hϕ0 , ai,

νn (K) = hϕ1 , L∗ (µn , νn )i → hϕ1 , ai.

Hence, there exist n0 ∈ N and a ball Br of M(S) × M(K), such that (µn , νn ) ∈ Br for every n ≥ n0 . Since Br is compact, up to a subsequence (µn , νn ) converges weakly to some (µ, ν) ∈ M(S)+ × M(K)+ . This fact, combined with (5.7) and the continuity of L∗ , yields a = L∗ (µ, ν), and b = h(h, H), (µ, ν)i. Therefore, the set D

´ 22 JEAN B. LASSERRE, DIDIER HENRION, CHRISTOPHE PRIEUR, AND EMMANUEL TRELAT

is closed. From Anderson and Nash [1, Theorems 3.10 and 3.22], it follows that there is no duality gap between P and P∗ , and hence, with (i), sup P∗ = min P. Item (iii) follows from Vinter [44, Theorems 2.1 and 2.3], applied to the mappings F (t, x) := f (t, x, U ) , n

for (t, x) ∈ R × R .

l(t, x, v) := inf { h(t, x, u) | v = f (t, x, u) }, u∈U

5.5. Proof of Theorem 3.6. First of all, observe that Qr has a feasible solution. Indeed, it suffices to consider the sequences y and z consisting of the moments (up to order 2r) of the occupations measures ν u and µu associated with an admissible control u ∈ U of the OCP (2.1)-(2.5). Next, observe that, for every couple (y, z) satisfying all constraints of Qr , there must holds y0 = 1 and z0 = 1. Indeed, it suffices to choose g(t, x) = 1 and g(t, x) = 1 − t (or t) in the constraint Ly (g1 ) − Lz (∂g/∂t + h∇x g, f i) = g(0, x0 ). We next prove that, for r sufficiently large, one has |z(x)α | ≤ 1, |z(u)β | ≤ 1, |z(t)k | ≤ 1, for every k, and |yα | ≤ 1. We only provide the details of the proof for z(x), the arguments being similar for z(u), z(t) and y. Let Σ2 ⊂ R[x] be the space of sums of squares (s.o.s.) polynomials, and let Q ⊂ R[x] be the quadratic modulus generated by the polynomials vj ∈ R[x] that define X, i.e., X Q := { σ ∈ R[x] | σ = σ0 + σj vj with σj ∈ Σ2 , ∀ j ∈ {0} ∪ J}. j∈J

In addition, let Q(t) ⊂ Q be the set of elements σ of Q which have a representation P σ0 + j∈J σj vj for some s.o.s. family {σj } ⊂ Σ2 with deg σ0 ≤ 2t and deg σj vj ≤ 2t for every j ∈ J. Let r ∈ N be fixed. Since X ⊂ [−1, 1]n , there holds 1 ± xα > 0 on X, for every α ∈ Nn with |α| ≤ 2r. Therefore, since X satisfies Putinar’ condition (see Definition 3.4), the polynomial x 7→ 1 ± xα belongs to Q (see Putinar [37]). Moreover, there exists l(r) such that x 7→ 1 ± xα ∈ Q(l(r)) for every |α| ≤ 2r. Of course, x 7→ 1 ± xα ∈ Q(l) for every |α| ≤ 2r, whenever l ≥ l(r). For every feasible solution z of Ql(r) one has |z(x)α | = | Lz (xα ) | ≤ z0 = 1,

|α| ≤ 2r.

This follows from z0 = 1, Ml(r) (z) 0 and Ml(r)−r(vj ) (vj z(x)) 0, which implies α

z0 ± z(x)α = Lz (1 ± x ) = Lz (σ0 ) +

m X

Lz(x) (σj vj ) ≥ 0.

j=1

With similar arguments, one redefines l(r) so that, for every couple (y, z) satisfying the contraints of Ql(r) , one has 0 ≤ zk (t) ≤ 1 and |z(x)α |, |z(u)β |, |yα | ≤ 1,

∀ k, |α|, |β| ≤ 2r.

From this, and with l(r) := 2l(r), we immediately deduce P that |zγ | ≤ 1 whenever P |γ| ≤ 2r. In particular, Ly (H) + Lz (h) ≥ − β |Hβ | − γ |hγ |, which proves that inf Ql(r) > −∞, and so inf Qr > −∞ for r sufficiently large.

OPTIMAL CONTROL AND LMI-RELAXATIONS

23

Let ρ := inf P = min P (by Theorem 2.3), let r ≥ l(r0 ), and let (z r , y r ) be a nearly optimal solution of Qr with value 1 1 ≤ρ+ . (5.8) inf Qr ≤ Lzr (h) + Lyr (H) ≤ inf Qr + r r

Complete the finite vectors y r and z r with zeros to make them infinite sequences. Since for arbitrary s ∈ N one has |yαr |, |zγr | ≤ 1 whenever |α|, |γ| ≤ 2s, provided r is sufficiently large, by a standard diagonal argument, there exists a subsequence {rk } and two infinite sequences y and z, with |yα | ≤ 1 and |zγ | ≤ 1, for all α, γ, and such that (5.9)

lim yαrk = yα

k→∞

∀α ∈ Nn ;

lim zγrk = zγ

k→∞

∀γ ∈ N × Nn × Nm .

Next, let r be fixed arbitrarily. Observe that Mrk (y rk ) 0 implies Mr (y rk ) 0 whenever rk ≥ r, and similarly, Mr (z rk ) 0. Therefore, from (5.9) and Mr (y rk ) 0, we deduce that Mr (y) 0, and similarly Mr (z) 0. Since this holds for arbitrary r, and |yα |, |zγ | ≤ 1 for all α, γ, one infers from Proposition 3.2 that y and z are moment sequences of two measures ν and µ with support contained in [−1, 1]n and [0, 1] × [−1, 1]n × [−1, 1]m respectively. In addition, from the equalities y0rk = 1 and z0rk = 1 for every k, it follows that ν and µ are probability measures on [−1, 1]n , and [0, 1] × [−1, 1]n × [−1, 1]m. Next, let (t, α) ∈ N × Nn be fixed, arbitrarily. From Lyrk (g1 ) − g(0, x0 ) − Lzrk (∂g/∂t + h∇x g, f i) = 0,

with g = (tp xα ),

and the convergence (5.9), we obtain Ly (g1 ) − g(0, x0 ) − Lz (∂g/∂t + h∇x g, f i) = 0,

with g = (tp xα ),

that is, hLg, (µ, ν)i = hg, δ(0,x0 ) i. Since (t, α) ∈ N × Nn is arbitrary, we have hg, L∗ (µ, ν)i = hL g, (µ, ν)i = hg, δ(0,x0 ) i ∀ g ∈ R[t, x], which implies that L∗ (µ, ν) = δ(0,x0 ) . Let z(x), z(u) and z(t) denote the moment vectors of the marginals of µ with respect to the variables x ∈ Rn , u ∈ Rm and t ∈ R, respectively, i.e., Z Z z(x)α = xα µ(d(t, x, u)) ∀ α ∈ Nn , z(u)β = uβ µ(d(t, x, u)) ∀β ∈ Nm ,

R and z(t)k = tk µ(d(t, x, u)) for every k ∈ N. With r fixed arbitrarily, and using again (5.9), we also have Mr (θj y) 0 for every j ∈ JT , and Mr (vj z(x)) 0

∀ j ∈ J,

Mr (wk z(u)) 0 ∀k ∈ W,

Mr (t(1 − t) z(t)) 0.

Since X, K and U satisfy Putinar’s condition (see Definition 3.4), from Theorem 3.5 (Putinar’s Positivstellensatz), y is the moment sequence of a probability measure ν supported on K ⊂ [−1, 1]n . Similarly, z(x) is the moment sequence of a measure µx supported on X ⊂ [−1, 1]n , z(u) is the moment sequence of a measure µu supported on U ⊂ [−1, 1]m , and z(t) is the moment sequence of a measure µt supported on [0, 1]. Since measures on compact sets are moment determinate, it follows that µx , µu , and µt are the marginals of µ with respect to x, u and t respectively. Therefore, µ has its support contained in S, and from L∗ (µ, ν) = δ(0,x0 ) it follows that (µ, ν) satisfies all constraints of the problem P.

´ 24 JEAN B. LASSERRE, DIDIER HENRION, CHRISTOPHE PRIEUR, AND EMMANUEL TRELAT

Moreover, one has lim inf Qrk

k→∞

= = =

lim Lzrk (h) + Lyrk (H)

k→∞

Lz (h) + Ly (H) Z Z h dµ + H dν

(by (5.8))

(by (5.9)) ≤ ρ = min P.

Hence, (µ, ν) is an optimal solution of P, and min Qr ↑ min P (the sequence is monotone nondecreasing). Item (i) is proved. Item (ii) follows from Theorem 2.3 (iii). 5.6. Proof of Theorem 4.1. It suffices to prove that vp → v as p → +∞. For every integer p, vp = min Pp is attained for a couple of measures (µp , νp ). As in the proof of Theorem 2.3, the sequence {(µp , νp )}p∈N is bounded in M(S)+ × M(K)+ , and hence, up to a subsequence, it converges to an element (µ, ν) of this space for the weak ⋆ topology. On the one hand, by definition, L∗p (µp , νp ) = δ(0,x0 ) for every p. On the other, L∗p tends strongly to L∗ , and so L∗ (µ, ν) = δ(0,x0 ) . Moreover, since (hp , Hp ) tends strongly to (h, H) in C1 (S) × C1 (K), one has vp = h(µp , νp ), (hp , Hp )i −→ h(µ, ν), (h, H)i, and so v ≤ h(µ, ν), (h, H)i. We next prove that v = h(µ, ν), (h, H)i. Since (µp , νp ) is an optimal solution of Pp , h(µp , νp ), (hp , Hp )i ≤ h(¯ µ, ν¯), (hp , Hp ),

∀(¯ µ, ν¯) | L∗p (¯ µ, ν¯) = δ(0,x0 ) .

Hence, passing to the limit, h(µ, ν), (h, H)i ≤ h(¯ µ, ν¯), (h, H),

∀(¯ µ, ν¯) | L∗ (¯ µ, ν¯) = δ(0,x0 ) ,

and so, (µ, ν) is an optimal solution of P, i.e., v = h(µ, ν), (h, H)i.

Acknowledgments The research of Didier Henrion was partly supported by Project No. 102/06/0652 of the Grant Agency of the Czech Republic and Research Program No. MSM6840770038 of the Ministry of Education of the Czech Republic. References [1] E.J. Anderson, P. Nash. Linear Programming in Infinite-Dimensional Spaces. John Wiley, Chichester, UK, 1987. [2] R. Beals, B. Gaveau, P.C. Greiner. Hamilton-Jacobi theory and the heat kernel on Heisenberg groups. J. Math. Pures Appl. 79 (2000), 633–689. [3] C. Berg. The multidimensional moment problem and semigroups. Proc. Symp. Appl. Math. 37 (1987), 110–124. [4] V. Borkar. Convex analytic methods in Markov decision processes. In: Hanbook of Markov Decision Processes. E.A. Feinberg and A. Shwartz, Eds. Kluwer Academic Publishers, 2002, 377–408. [5] O. Bokanowski, S. Martin, R. Munos, H. Zidani. An anti-diffusive scheme for viability problems. Submitted. [6] R.W. Brockett. Asymptotic stability and feedback stabilization. In: Differential geometric control theory. R.W. Brockett, R.S. Millman and H.J. Sussmann, Eds. Birkh¨ auser, Boston, 1983, pp. 181–191. [7] I. Capuzzo-Dolcetta, P.L. Lions. Hamilton-Jacobi equations with state constraints. Trans. Amer. Math. Soc. 318 (1990), 643–683.

OPTIMAL CONTROL AND LMI-RELAXATIONS

25

[8] Cho, Moon Jung, R.H. Stockbridge. Linear programming formulation for optimal stopping problems. SIAM J. Control Optim. 40 (2002), 1965–1982. [9] C. Coatm´ elec. Approximation et interpolation des fonctions diff´ erentiables de plusieurs variables. Annales Scientifiques de l’Ecole Normale Sup´ erieure, 3` eme s´ erie, 83, 4 (1966), 271–341. [10] D.A. Dawson. Qualitative behavior of geostochastic systems. Stoch. Proc. Appl. 10 (1980), 1–31. [11] W.H. Fleming, D. Vermes. Convex duality approach to the optimal control of diffusions. SIAM J. Control Optim. 27 (1989), 1136–1155. [12] R. Fletcher. Practical methods of optimization. Vol. 1. Unconstrained optimization. John Wiley & Sons, Ltd., Chichester, 1980. [13] P.E. Gill, W. Murray, M.H. Wright. Practical optimization. Academic Press, Inc., LondonNew York, 1981. [14] R.F. Hartl, S.P. Sethi, R.G. Vickson. A survey of the maximum principles for optimal control problems with state constraints. SIAM Rev. 37 (1995), 181–218. [15] K. Helmes, R.H. Stockbridge. Numerical comparison of controls and verification of optimality for stochastic control problems. J. Optim. Theory Appl. 106 (2000), 107–127 [16] K. Helmes, S. R¨ ohl, R.H. Stockbridge. Computing moments of the exit time distribution for Markov processes by linear programming. Oper. Res. 49 (2001), 516–530. [17] D. Henrion, J.B. Lasserre. Solving nonconvex optimization problems. IEEE Control Systems Magazine 24, 2004, pp. 72–83. [18] O. Hern´ andez-Lerma, J.B. Lasserre. Discrete-Time Markov Control Processes: Basic Optimality Criteria. Springer Verlag, New York, 1996. [19] O. Hern´ andez-Lerma, J.B. Lasserre. Further Topics in Discrete-Time Markov Control Processes. Springer Verlag, New York, 1999. [20] O. Hern´ andez-Lerma, J.B. Lasserre. Markov Chains and Invariant Probabilities. Birkh¨ auser Verlag, Basel, 2003. [21] O. Hern´ andez-Lerma, J.B. Lasserre. Approximation schemes for infinite linear programs, SIAM J. Optim. 8, 1998, pp. 973–988. [22] O. Hern´ andez-Lerma, J.B. Lasserre. The linear programming approach, In: Hanbook of Markov Decision Processes. E.A. Feinberg and A. Shwartz, Eds. Kluwer Academic Publishers, 2002, pp. 377–408. [23] D. Hernandez-Hernandez, O. Hern´ andez-Lerma, M. Taksar. The linear programming approach to deterministic optimal control problems. Appl. Math. 24, 1996, pp. 17–33. [24] M.W. Hirsch. Differential topology. Graduate Texts in Mathematics 33, Springer-Verlag, 1976. [25] D. Jacobson, M. Lele, J. L. Speyer. New necessary conditions of optimality for control problems with state-variable inequality constraints. J. Math. Analysis Appl. 35 (1971) 255–284. [26] J.L. Krivine. Anneaux pr´ eordonn´ es. J. Anal. Math. 12 (1964), 307–326. [27] T.G. Kurtz, R.H. Stockbridge. Existence of Markov controls and characterization of optimal Markov controls. SIAM J. Control Optim. 36 (1998), 609–653. [28] J.B. Lasserre. Global optimization with polynomials and the problem of moments. SIAM J. Optim. 11, 2001, pp. 796–817. [29] J.B. Lasserre, T. Pri´ eto-Rumeau. SDP vs. LP relaxations for the moment approach in some performance evaluation problems. Stoch. Models 20 (2004), 439–456. [30] J.B. Lasserre, T. Pri´ eto-Rumeau, M. Zervos. Pricing a class of exotic options via moments and SDP relaxations. Math. Finance 16 (2006), 469–494. [31] J.B. Lasserre, C. Prieur, D. Henrion. Nonlinear optimal control: Numerical approximations via moments and LMI relaxations. Proc. 44th IEEE Conference on Decision and Control, Sevilla, Spain, December 2005, pp. 1648–1653. [32] H. Maurer. On optimal control problems with bounded state variables and control appearing linearly. SIAM J. Cont. Optim. 15 3 (1977) 345–362. [33] J. Nash. The imbedding problem for Riemannian manifolds. Ann. Math. 63 (1956), 20–63. [34] H.J. Pesch. A practical guide to the solution of real-life optimal control problems. Control Cybernet. 23, 1-2 (1994), 7–60. [35] L.S. Pontryagin, V.G. Boltyanskij, R.V. Gamkrelidze, E.F. Mishchenko. The Mathematical Theory of Optimal Processes John Wiley & Sons, New York, 1962.. [36] C. Prieur, E. Tr´ elat. Robust optimal stabilization of the Brockett integrator via a hybrid feedback. Math. Control Signals Systems 17, 3 (2005), 201–216.

´ 26 JEAN B. LASSERRE, DIDIER HENRION, CHRISTOPHE PRIEUR, AND EMMANUEL TRELAT

[37] M. Putinar. Positive polynomials on compact semi-algebraic sets. Ind. Univ. Math. J. 42, 1993, pp. 969–984. [38] K. Schm¨ udgen. The K-moment problem for compact semi-algebraic sets. Math. Ann. 289 (1991), 203–206. [39] M.H. Soner. Optimal control with state-space constraints. SIAM J. Control Optim. 24 (1986), 552–561. [40] J. Stoer, R. Bulirsch. Introduction to Numerical Analysis. Third Edition, Springer-Verlag, New York, 2002. [41] O. von Stryk, R. Bulirsch. Direct and indirect methods for trajectory optimization. Ann. Oper. Res. 37, 1992, pp. 357–373. [42] E. Tr´ elat. Contrˆ ole optimal: th´ eorie et applications. Vuibert, Paris, 2005. (in french) [43] F.-H. Vasilescu. Spectral measures and moment problems. Spectral Theory and Its Applications, Theta 2003, 173–215. [44] R. Vinter. Convex duality and nonlinear optimal control. SIAM J. Control Optim. 31, 2 (1993), 518–538. LAAS-CNRS and Institute of Mathematics, University of Toulouse, France. E-mail address: [email protected] LAAS-CNRS, University of Toulouse, France and Czech Technical University in Prague, Czech Republic. E-mail address: [email protected] LAAS-CNRS, University of Toulouse, France. E-mail address: [email protected] University of Orl´ eans, France E-mail address: [email protected]