Solving Optimization Problems with Equality Constraints


Basic Concepts

This article will discuss optimization problems of the following form:

\begin{align*}
\text{minimize}\quad & f(x)\\
\text{subject to}\quad & h(x)=0
\end{align*}

where $x\in R^{n}$, $f:R^{n}\to R$, $h:R^{n}\to R^{m}$, $h=[h_{1},\ldots,h_{m}]^{T}$, $m\le n$. We assume the function $h$ is continuously differentiable, i.e., $h\in C^{1}$.
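
For concreteness, one small instance of this form (an illustrative choice, not taken from the original) is

\begin{align*}
\text{minimize}\quad & x_{1}+x_{2}\\
\text{subject to}\quad & x_{1}^{2}+x_{2}^{2}-2=0
\end{align*}

so that $n=2$, $m=1$, $f(x)=x_{1}+x_{2}$, and $h(x)=x_{1}^{2}+x_{2}^{2}-2$. This instance is reused as a running example in the sketches below.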

Here are some basic concepts:

Regular Point: For a point $x^{*}$ satisfying the constraints $h_{1}(x^{*})=0,\ldots,h_{m}(x^{*})=0$, if the gradient vectors $\nabla h_{1}(x^{*}),\ldots,\nabla h_{m}(x^{*})$ are linearly independent, then $x^{*}$ is called a regular point of the constraints.

Tangent Space: The tangent space at a point $x^{*}$ on the surface $S=\{x\in R^{n}:h(x)=0\}$ is the set $T(x^{*})=\{y:Dh(x^{*})y=0\}$. The tangent space $T(x^{*})$ is the null space of the matrix $Dh(x^{*})$, i.e., $T(x^{*})=N(Dh(x^{*}))$.

Normal Space: The normal space at a point $x^{*}$ on the surface $S=\{x\in R^{n}:h(x)=0\}$ is the set $N(x^{*})=\{x\in R^{n}:x=Dh(x^{*})^{T}z,\ z\in R^{m}\}$. The normal space $N(x^{*})$ is the range of the matrix $Dh(x^{*})^{T}$, i.e., $N(x^{*})=R(Dh(x^{*})^{T})$.
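
These objects are easy to compute numerically. Below is a minimal sketch in Python (NumPy/SciPy) for the running example above, evaluated at the feasible point $x^{*}=(1,1)$; the point and code are illustrative, not part of the original derivation.

```python
import numpy as np
from scipy.linalg import null_space, orth

# Running example (illustrative): h(x) = x1^2 + x2^2 - 2, so Dh(x) = [2*x1, 2*x2]
def Dh(x):
    return np.array([[2 * x[0], 2 * x[1]]])   # 1 x 2 Jacobian of h

x_star = np.array([1.0, 1.0])                 # feasible point: h(x_star) = 0
J = Dh(x_star)

# x_star is a regular point iff the rows of Dh(x_star) are linearly independent
print("regular point:", np.linalg.matrix_rank(J) == J.shape[0])   # True

# Tangent space T(x*) = null space of Dh(x*); normal space N(x*) = range of Dh(x*)^T
print("tangent space basis:\n", null_space(J))   # ~ span{(1, -1)}
print("normal space basis:\n", orth(J.T))        # ~ span{(1, 1)}
```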

Lagrange Conditions

First, consider an optimization problem with only two decision variables and one equality constraint. Let $h:R^{2}\to R$ be the constraint function. The gradient $\nabla h(x)$ at a point $x$ in the domain of $h$ is orthogonal to the level set of $h$ passing through that point. Choose a point $x^{*}=[x^{*}_{1},x^{*}_{2}]^{T}$ such that $h(x^{*})=0$ and $\nabla h(x^{*})\neq 0$. The level set passing through $x^{*}$ is the set $\{x:h(x)=0\}$. In a neighborhood of $x^{*}$, we can parameterize this level set by a curve $x(t)$, where $x:R\to R^{2}$ is a continuously differentiable vector function:

\begin{align*}
x(t)=[x_{1}(t),x_{2}(t)]^{T},\quad t\in(a,b),\qquad x^{*}=x(t^{*}),\quad \dot{x}(t^{*})\neq 0,\quad t^{*}\in(a,b)
\end{align*}

Next, it can be proven that $\nabla h(x^{*})$ is orthogonal to $\dot{x}(t^{*})$. Since $h$ is identically zero on the curve $\{x(t):t\in(a,b)\}$, for all $t\in(a,b)$ we have

\begin{align*}
h(x(t))=0
\end{align*}

Thus, for any $t\in(a,b)$, we have

\begin{align*}
\frac{d}{dt}h(x(t))=0
\end{align*}

Using the chain rule, we get

\begin{align*}
\frac{d}{dt}h(x(t))=\nabla h(x(t))^{T}\dot{x}(t)=0
\end{align*}

Therefore, $\nabla h(x^{*})$ and $\dot{x}(t^{*})$ are orthogonal. When $x^{*}$ is a local minimum of $f:R^{2}\to R$ subject to $h(x)=0$, it can also be proven that $\nabla f(x^{*})$ is orthogonal to $\dot{x}(t^{*})$. Construct the composite function of $t$:

\begin{align*}
\phi(t)=f(x(t))
\end{align*}

Since $x^{*}=x(t^{*})$ is a constrained local minimum, $\phi$ attains a local minimum at $t=t^{*}$. According to the first-order necessary condition for unconstrained extrema, we have

\begin{align*}
\frac{d\phi}{dt}(t^{*})=0
\end{align*}

Using the chain rule, we get

\begin{align*}
\frac{d}{dt}\phi(t^{*})=\nabla f(x(t^{*}))^{T}\dot{x}(t^{*})=\nabla f(x^{*})^{T}\dot{x}(t^{*})=0
\end{align*}

Therefore, $\nabla f(x^{*})$ and $\dot{x}(t^{*})$ are orthogonal. Since both $\nabla f(x^{*})$ and $\nabla h(x^{*})$ are orthogonal to $\dot{x}(t^{*})\neq 0$, and in $R^{2}$ the orthogonal complement of the line spanned by $\dot{x}(t^{*})$ is one-dimensional, the vectors $\nabla f(x^{*})$ and $\nabla h(x^{*})$ must be parallel. Thus, we obtain the Lagrange theorem for this case:

Lagrange Theorem for $n=2$, $m=1$: Let $x^{*}$ be a local minimum of the function $f:R^{2}\to R$ subject to the constraint $h(x)=0$, $h:R^{2}\to R$. Then $\nabla f(x^{*})$ and $\nabla h(x^{*})$ are parallel; that is, if $\nabla h(x^{*})\neq 0$, there exists a scalar $\lambda^{*}$ such that

\begin{align*}
\nabla f(x^{*})+\lambda^{*}\nabla h(x^{*})=0
\end{align*}

where $\lambda^{*}$ is the Lagrange multiplier. Extending this theorem to the general case, i.e., $f:R^{n}\to R$, $h:R^{n}\to R^{m}$, $m\le n$, we get:

Lagrange Theorem: Let $x^{*}$ be a local minimum (or maximum) of $f:R^{n}\to R$ subject to the constraint $h(x)=0$, $h:R^{n}\to R^{m}$, $m\le n$. If $x^{*}$ is a regular point, then there exists $\lambda^{*}\in R^{m}$ such that

\begin{align*}
Df(x^{*})+\lambda^{*T}Dh(x^{*})=0^{T}
\end{align*}
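
As a minimal sketch of applying the theorem, the Lagrange condition for the running example can be solved symbolically in Python (SymPy); the objective and constraint are the illustrative choices introduced earlier, and the variable names below are arbitrary.

```python
import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lam', real=True)
f = x1 + x2                      # running example objective (illustrative)
h = x1**2 + x2**2 - 2            # running example constraint, h(x) = 0

# Lagrange condition grad f + lam * grad h = 0, together with feasibility h = 0
eqs = [sp.diff(f, x1) + lam * sp.diff(h, x1),
       sp.diff(f, x2) + lam * sp.diff(h, x2),
       h]
print(sp.solve(eqs, [x1, x2, lam], dict=True))
# two candidates: (x1, x2) = (1, 1) with lam = -1/2, and (-1, -1) with lam = 1/2
```

The Lagrange condition only identifies candidate points; which candidate is the constrained minimum is settled by the second-order conditions below.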

Second-Order Conditions

Second-Order Necessary Condition: Let $x^{*}$ be a local minimum of $f:R^{n}\to R$ subject to the constraint $h(x)=0$, $h:R^{n}\to R^{m}$, $m\le n$, $f,h\in C^{2}$. If $x^{*}$ is a regular point, then there exists $\lambda^{*}\in R^{m}$ such that

  1. $Df(x^{*})+\lambda^{*T}Dh(x^{*})=0^{T}$
  2. For all $y\in T(x^{*})$, we have $y^{T}L(x^{*},\lambda^{*})y\ge 0$, where $L(x^{*},\lambda^{*})$ is the Hessian of the Lagrangian with respect to $x$ (see the explicit form below)
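
Writing $F(x)$ for the Hessian of $f$ and $H_{k}(x)$ for the Hessian of $h_{k}$ (this notation is introduced here for convenience), the matrix $L(x,\lambda)$ used in both second-order conditions is the Hessian, with respect to $x$, of the Lagrangian $l(x,\lambda)=f(x)+\lambda^{T}h(x)$:

\begin{align*}
L(x,\lambda)=F(x)+\lambda_{1}H_{1}(x)+\cdots+\lambda_{m}H_{m}(x)
\end{align*}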

Second-Order Sufficient Condition: If $f,h\in C^{2}$, and there exist a point $x^{*}\in R^{n}$ and $\lambda^{*}\in R^{m}$ such that

  1. $Df(x^{*})+\lambda^{*T}Dh(x^{*})=0^{T}$
  2. For all $y\in T(x^{*})$ with $y\neq 0$, we have $y^{T}L(x^{*},\lambda^{*})y>0$

Then $x^{*}$ is a strict local minimum of $f$ under the constraint $h(x)=0$.
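
Continuing the running example (illustrative, not from the original), the sufficient condition can be checked numerically at the candidate $x^{*}=(-1,-1)$, $\lambda^{*}=1/2$: here $F(x^{*})=0$ and $H(x^{*})=2I$, so $L(x^{*},\lambda^{*})=I$, and restricting it to the tangent space gives a positive definite $1\times 1$ matrix.

```python
import numpy as np
from scipy.linalg import null_space

# Running example (illustrative): f(x) = x1 + x2, h(x) = x1^2 + x2^2 - 2
x_star = np.array([-1.0, -1.0])
lam_star = 0.5

F = np.zeros((2, 2))                 # Hessian of f (f is linear)
H = 2.0 * np.eye(2)                  # Hessian of h
L = F + lam_star * H                 # L(x*, lambda*) = F(x*) + lambda* * H(x*)

Dh = np.array([[2 * x_star[0], 2 * x_star[1]]])   # Jacobian of h at x*
Y = null_space(Dh)                   # columns form a basis of the tangent space T(x*)

# Restrict L to T(x*) and check strict positivity (condition 2 of the sufficient condition)
L_tangent = Y.T @ L @ Y
print(L_tangent)                                     # [[1.]]
print(np.all(np.linalg.eigvalsh(L_tangent) > 0))     # True -> strict local minimum
```

The other candidate, $(1,1)$ with $\lambda^{*}=-1/2$, fails this check ($L=-I$ there), consistent with it being the constrained maximum.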

This article introduced the Lagrange multiplier method for equality constraints. A future article will cover the Lagrange multiplier method for inequality constraints and the KKT conditions. To be continued…