Solving Optimization Problems with Equality Constraints


Basic Concepts

This article will discuss optimization problems of the following form:

\begin{align*}
\text{minimize}\quad & f(x)\\
\text{subject to}\quad & h(x)=0
\end{align*}

where $x\in R^{n}$, $f:R^{n}\to R$, $h:R^{n}\to R^{m}$, $h=[h_{1},\ldots,h_{m}]^{T}$, $m\le n$. We assume the function $h$ is continuously differentiable, i.e., $h\in C^{1}$.
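
For concreteness, one small instance of this form (an illustrative choice, not taken from the original) is

\begin{align*}
\text{minimize}\quad & x_{1}+x_{2}\\
\text{subject to}\quad & x_{1}^{2}+x_{2}^{2}-2=0
\end{align*}

so that $n=2$, $m=1$, $f(x)=x_{1}+x_{2}$, and $h(x)=x_{1}^{2}+x_{2}^{2}-2$. This instance is reused as a running example in the sketches below.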

Here are some basic concepts:

Regular Point: For a point $x^{*}$ satisfying the constraints $h_{1}(x^{*})=0,\ldots,h_{m}(x^{*})=0$, if the gradient vectors $\nabla h_{1}(x^{*}),\ldots,\nabla h_{m}(x^{*})$ are linearly independent, then $x^{*}$ is called a regular point of the constraints.

Tangent Space: The tangent space at a point $x^{*}$ on the surface $S=\{x\in R^{n}:h(x)=0\}$ is the set $T(x^{*})=\{y:Dh(x^{*})y=0\}$. The tangent space $T(x^{*})$ is the null space of the matrix $Dh(x^{*})$, i.e., $T(x^{*})=N(Dh(x^{*}))$.

Normal Space: The normal space at a point $x^{*}$ on the surface $S=\{x\in R^{n}:h(x)=0\}$ is the set $N(x^{*})=\{x\in R^{n}:x=Dh(x^{*})^{T}z,\ z\in R^{m}\}$. The normal space $N(x^{*})$ is the range of the matrix $Dh(x^{*})^{T}$, i.e., $N(x^{*})=R(Dh(x^{*})^{T})$.
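
These objects are easy to compute numerically. Below is a minimal sketch in Python (NumPy/SciPy) for the running example above, evaluated at the feasible point $x^{*}=(1,1)$; the point and code are illustrative, not part of the original derivation.

```python
import numpy as np
from scipy.linalg import null_space, orth

# Running example (illustrative): h(x) = x1^2 + x2^2 - 2, so Dh(x) = [2*x1, 2*x2]
def Dh(x):
    return np.array([[2 * x[0], 2 * x[1]]])   # 1 x 2 Jacobian of h

x_star = np.array([1.0, 1.0])                 # feasible point: h(x_star) = 0
J = Dh(x_star)

# x_star is a regular point iff the rows of Dh(x_star) are linearly independent
print("regular point:", np.linalg.matrix_rank(J) == J.shape[0])   # True

# Tangent space T(x*) = null space of Dh(x*); normal space N(x*) = range of Dh(x*)^T
print("tangent space basis:\n", null_space(J))   # ~ span{(1, -1)}
print("normal space basis:\n", orth(J.T))        # ~ span{(1, 1)}
```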

Lagrange Conditions

First, consider an optimization problem with only two decision variables and one equality constraint. Let $h:R^{2}\to R$ be the constraint function. The gradient $\nabla h(x)$ at a point $x$ in the domain of $h$ is orthogonal to the level set of $h$ passing through that point. Choose a point $x^{*}=[x^{*}_{1},x^{*}_{2}]^{T}$ such that $h(x^{*})=0$ and $\nabla h(x^{*})\neq 0$. The level set passing through $x^{*}$ is the set $\{x:h(x)=0\}$. In a neighborhood of $x^{*}$, we can parameterize this level set by a curve $x(t)$, where $x:R\to R^{2}$ is a continuously differentiable vector function:

\begin{align*}
x(t)=[x_{1}(t),x_{2}(t)]^{T},\quad t\in(a,b),\qquad x^{*}=x(t^{*}),\quad \dot{x}(t^{*})\neq 0,\quad t^{*}\in(a,b)
\end{align*}

Next, it can be proven that $\nabla h(x^{*})$ is orthogonal to $\dot{x}(t^{*})$. Since $h$ is identically zero on the curve $\{x(t):t\in(a,b)\}$, for all $t\in(a,b)$ we have

\begin{align*}
h(x(t))=0
\end{align*}

Thus, for any $t\in(a,b)$, we have

\begin{align*}
\frac{d}{dt}h(x(t))=0
\end{align*}

Using the chain rule, we get

\begin{align*}
\frac{d}{dt}h(x(t))=\nabla h(x(t))^{T}\dot{x}(t)=0
\end{align*}

Therefore, $\nabla h(x^{*})$ and $\dot{x}(t^{*})$ are orthogonal. When $x^{*}$ is a local minimum of $f:R^{2}\to R$ subject to $h(x)=0$, it can also be proven that $\nabla f(x^{*})$ is orthogonal to $\dot{x}(t^{*})$. Construct the composite function of $t$:

\begin{align*}
\phi(t)=f(x(t))
\end{align*}

Since $x^{*}=x(t^{*})$ is a constrained local minimum, $\phi$ attains a local minimum at $t=t^{*}$. According to the first-order necessary condition for unconstrained extrema, we have

\begin{align*}
\frac{d\phi}{dt}(t^{*})=0
\end{align*}

Using the chain rule, we get

\begin{align*}
\frac{d}{dt}\phi(t^{*})=\nabla f(x(t^{*}))^{T}\dot{x}(t^{*})=\nabla f(x^{*})^{T}\dot{x}(t^{*})=0
\end{align*}

Therefore, $\nabla f(x^{*})$ and $\dot{x}(t^{*})$ are orthogonal. Since both $\nabla f(x^{*})$ and $\nabla h(x^{*})$ are orthogonal to $\dot{x}(t^{*})\neq 0$, and in $R^{2}$ the orthogonal complement of the line spanned by $\dot{x}(t^{*})$ is one-dimensional, the vectors $\nabla f(x^{*})$ and $\nabla h(x^{*})$ must be parallel. Thus, we obtain the Lagrange theorem for this case:

Lagrange Theorem for $n=2$, $m=1$: Let $x^{*}$ be a local minimum of the function $f:R^{2}\to R$ subject to the constraint $h(x)=0$, $h:R^{2}\to R$. Then $\nabla f(x^{*})$ and $\nabla h(x^{*})$ are parallel; that is, if $\nabla h(x^{*})\neq 0$, there exists a scalar $\lambda^{*}$ such that

\begin{align*}
\nabla f(x^{*})+\lambda^{*}\nabla h(x^{*})=0
\end{align*}

where $\lambda^{*}$ is the Lagrange multiplier. Extending this theorem to the general case, i.e., $f:R^{n}\to R$, $h:R^{n}\to R^{m}$, $m\le n$, we get:

Lagrange Theorem: Let $x^{*}$ be a local minimum (or maximum) of $f:R^{n}\to R$ subject to the constraint $h(x)=0$, $h:R^{n}\to R^{m}$, $m\le n$. If $x^{*}$ is a regular point, then there exists $\lambda^{*}\in R^{m}$ such that

\begin{align*}
Df(x^{*})+\lambda^{*T}Dh(x^{*})=0^{T}
\end{align*}
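
As a minimal sketch of applying the theorem, the Lagrange condition for the running example can be solved symbolically in Python (SymPy); the objective and constraint are the illustrative choices introduced earlier, and the variable names below are arbitrary.

```python
import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lam', real=True)
f = x1 + x2                      # running example objective (illustrative)
h = x1**2 + x2**2 - 2            # running example constraint, h(x) = 0

# Lagrange condition grad f + lam * grad h = 0, together with feasibility h = 0
eqs = [sp.diff(f, x1) + lam * sp.diff(h, x1),
       sp.diff(f, x2) + lam * sp.diff(h, x2),
       h]
print(sp.solve(eqs, [x1, x2, lam], dict=True))
# two candidates: (x1, x2) = (1, 1) with lam = -1/2, and (-1, -1) with lam = 1/2
```

The Lagrange condition only identifies candidate points; which candidate is the constrained minimum is settled by the second-order conditions below.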

Second-Order Conditions

Second-Order Necessary Condition: Let $x^{*}$ be a local minimum of $f:R^{n}\to R$ subject to the constraint $h(x)=0$, $h:R^{n}\to R^{m}$, $m\le n$, $f,h\in C^{2}$. If $x^{*}$ is a regular point, then there exists $\lambda^{*}\in R^{m}$ such that

  1. $Df(x^{*})+\lambda^{*T}Dh(x^{*})=0^{T}$
  2. For all $y\in T(x^{*})$, we have $y^{T}L(x^{*},\lambda^{*})y\ge 0$, where $L(x^{*},\lambda^{*})$ is the Hessian of the Lagrangian with respect to $x$ (see the explicit form below)
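
Writing $F(x)$ for the Hessian of $f$ and $H_{k}(x)$ for the Hessian of $h_{k}$ (this notation is introduced here for convenience), the matrix $L(x,\lambda)$ used in both second-order conditions is the Hessian, with respect to $x$, of the Lagrangian $l(x,\lambda)=f(x)+\lambda^{T}h(x)$:

\begin{align*}
L(x,\lambda)=F(x)+\lambda_{1}H_{1}(x)+\cdots+\lambda_{m}H_{m}(x)
\end{align*}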

Second-Order Sufficient Condition: If $f,h\in C^{2}$, and there exist a point $x^{*}\in R^{n}$ and $\lambda^{*}\in R^{m}$ such that

  1. $Df(x^{*})+\lambda^{*T}Dh(x^{*})=0^{T}$
  2. For all $y\in T(x^{*})$ with $y\neq 0$, we have $y^{T}L(x^{*},\lambda^{*})y>0$

Then $x^{*}$ is a strict local minimum of $f$ under the constraint $h(x)=0$.
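
Continuing the running example (illustrative, not from the original), the sufficient condition can be checked numerically at the candidate $x^{*}=(-1,-1)$, $\lambda^{*}=1/2$: here $F(x^{*})=0$ and $H(x^{*})=2I$, so $L(x^{*},\lambda^{*})=I$, and restricting it to the tangent space gives a positive definite $1\times 1$ matrix.

```python
import numpy as np
from scipy.linalg import null_space

# Running example (illustrative): f(x) = x1 + x2, h(x) = x1^2 + x2^2 - 2
x_star = np.array([-1.0, -1.0])
lam_star = 0.5

F = np.zeros((2, 2))                 # Hessian of f (f is linear)
H = 2.0 * np.eye(2)                  # Hessian of h
L = F + lam_star * H                 # L(x*, lambda*) = F(x*) + lambda* * H(x*)

Dh = np.array([[2 * x_star[0], 2 * x_star[1]]])   # Jacobian of h at x*
Y = null_space(Dh)                   # columns form a basis of the tangent space T(x*)

# Restrict L to T(x*) and check strict positivity (condition 2 of the sufficient condition)
L_tangent = Y.T @ L @ Y
print(L_tangent)                                     # [[1.]]
print(np.all(np.linalg.eigvalsh(L_tangent) > 0))     # True -> strict local minimum
```

The other candidate, $(1,1)$ with $\lambda^{*}=-1/2$, fails this check ($L=-I$ there), consistent with it being the constrained maximum.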

This article introduced the Lagrange multiplier method for equality constraints. A future article will cover the Lagrange multiplier method for inequality constraints and the KKT conditions. To be continued…