ECSE 507 Lecture Notes


These are notes used for lecturing ECSE 507 during the Winter 2024 semester.
Please note that these notes are constantly evolving; beware of typos/errors.
These notes are heavily influenced by Boyd and Vandenberghe’s text Convex Optimization.
I appreciate being informed about typos/errors in these notes.



Introduction
General Problem We are interested in solving and studying minimization problems subject to constraints:

\begin{cases} \text{minimize} & f_0(x)\\ \text{subject to} & x \in D \end{cases},

where
  • x \in \mathbb{R}^n is the vector of problem parameters;
  • f_0:\mathbb{R}^n \to \mathbb{R} is the objective function, and f_0(x) is usually interpreted as the cost of choosing x ;
  • D \subset\mathbb{R}^n is a constraint set often described “geometrically”.
Example 1. Design Problem Interpretation Let
  • x,y=scalar-valued design variables
    (e.g., dimensions of manufactured object, yaw and pitch of jet).
  • f_0(x,y) = penalty for choosing design (x,y)
    (e.g., cost in material, energy, time, deviation from desired path).
  • D = design specifications
    (i.e., allowable/possible values for (x,y)).
    E.g., D = \{ (x,y) : a<x<b, c<y<d\} specifies minimum and maximum design values.
Then the problem is to find optimal design values (x,y) which minimize cost f_0 and satisfy the design specifications (x,y) \in D.


Example 2. Minimize function over ellipse Let
f_0(x,y) = 1 - \frac{\left(x^{2}+y^{2}\right)}{2}
D = \{ (x,y) \in \mathbb{R}^2 : x^2 + 2 y^2 \leq 1 \} .
Then the problem

\begin{cases} \text{minimize} & 1-\frac{\left(x^{2}+y^{2}\right)}{2}\\ \text{subject to} & x^2 + 2 y^2 \leq 1 \end{cases}

is a familiar kind of purely geometric optimization problem.
It has two solutions: (x,y,f_0(x,y)) = (\pm 1, 0 , \frac{1}{2}) .
Many applied problems can look like this, but with an applied interpretation.
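N.B.: claims like this can be sanity-checked numerically. The following sketch (assuming NumPy is available) does a brute-force grid search over the ellipse D of Example 2; it is a check on the analysis, not a substitute for it.

```python
import numpy as np

# Brute-force check of Example 2: minimize 1 - (x^2 + y^2)/2
# over D = {(x, y) : x^2 + 2 y^2 <= 1} on a fine grid.
xs = np.linspace(-1.2, 1.2, 481)
X, Y = np.meshgrid(xs, xs)

f0 = 1 - (X**2 + Y**2) / 2        # objective function
inside = X**2 + 2 * Y**2 <= 1     # membership in the constraint set D

fmin = f0[inside].min()           # minimal value over grid points in D
print(fmin)                       # close to 1/2, attained near (±1, 0)
```

The grid minimum agrees with the solutions (\pm 1, 0, \frac{1}{2}) found above, up to grid resolution.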
Problems With Structure General optimization problems can be numerically inefficient to solve or analytically difficult, unless f_0 and D have additional structure/properties.
Identifying nice structure/properties of problem \implies problem may become analytically solvable or numerically efficient.
Examples of nice structure:
linearity
f_0(x) = c^T x
D defined in terms of linear equalities/inequalities;
e.g., a_1^T x \geq 0, \ldots, a_m^T x \geq 0, or succinctly written A x \succeq 0.


convexity or quasiconvexity
E.g., f_0 is convex if f_0(tx+(1-t)y) \leq tf_0(x) + (1-t)f_0(y) for all x,y and 0<t<1 ;
D is itself a convex set.


sparsity or other matrix structure
E.g., f_0(x) = x^T C x and D given by Ax \succeq 0
If C,A sparse
\implies many terms vanish
\implies improved computational efficiency
\begin{bmatrix} 1 & 0 & 1 & 0 & 0 & 1\\ 0 & 1 & 0 & 0 & 0 & 0\\ 1 & 0 & 0 & 1 & 0 & 1\\ 0 & 1 & 0 & 1 & 0 & 0\\ 0 & 1 & 0 & 0 & 1 & 0 \end{bmatrix} ,\qquad \begin{bmatrix} 1&2&0&0\\ 2&1&0&0\\ 0&0&2&1\\ 0&0&1&2 \end{bmatrix}
Linear Programming Simplest case: f_0 is linear and D defined in terms of linear constraints.
Notation: if x, y \in \mathbb{R}^n , then

x \succeq y means x_i \geq y_i for i =1,\ldots, n .

c^T = transpose of the column vector c \in \mathbb{R}^n.
Linear program: given c \in \mathbb{R}^n, b \in \mathbb{R}^m, A \in \mathbb{R}^{m \times n}, solve

\begin{cases} \text{minimize}&c^T x\\ \text{subject to}& Ax \succeq b\\ &x \succeq 0 \end{cases}.

Thus f_0(x) = c^T x and D = \{x \in \mathbb{R}^n: Ax \succeq b, x \succeq 0\}.
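For concreteness, small linear programs can be handed to an off-the-shelf solver. The sketch below (assuming SciPy is available) solves the hypothetical instance c=(1,2), A=\begin{bmatrix}1&1\end{bmatrix}, b=1, whose optimum is x=(1,0) with value 1; note the sign flip needed because linprog expects “\leq” constraints.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical instance of: minimize c^T x subject to Ax >= b, x >= 0.
c = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0]])   # one constraint: x1 + x2 >= 1
b = np.array([1.0])

# linprog expects "<=" constraints, so negate: Ax >= b  <=>  -Ax <= -b.
# The default variable bounds are x >= 0, matching the standard form above.
res = linprog(c, A_ub=-A, b_ub=-b)
print(res.x, res.fun)        # optimum at x = (1, 0) with value 1
```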

Positives of linear programming:
  • Conceptually simple: relies heavily on linear algebra
  • There are classical numerical methods which are often very efficient.
  • If x^\star \in D is local minimizer of f_0 on D, then it is automatically a global minimizer on D.
  • Can sometimes approximate smooth problems linearly; however, usually can only give “local” results.
    (E.g., \sin(x) \sim x for 0 \leq x \ll 1 .)


Shortcomings of linear programming:
  • Many applied problems are not linear.
  • Many problems may not even be (suitably) approximated by linear programs.
    E.g., the “barrier”

    I_{-}(x) =  \begin{cases} 0 & x <0\\ \infty & x\geq0 \end{cases}

    is better approximated by a “logarithmic barrier” of the form - c \log(-x) than by any linear function.
Convex Optimization Convex optimization problem: f_0 and D are convex.
This is the main focus of the course.

Positives of Convex Optimization:
  • Relatively conceptually simple.
  • Still often have efficient, albeit more sophisticated, numerical methods.
  • Many applied problems may be recast as or approximated by convex optimization problems.
  • If x^\star \in D is local minimizer of f_0 on D, then it is automatically a global minimizer on D.


Shortcomings of Convex Optimization:
  • Problems may seriously fail to be analytically tractable or numerically efficient.
  • There exist nonlinear problems which cannot be approximated by convex problems.
  • Example (Least Squares) A standard and ubiquitous kind of convex optimization problem is the least squares problem.
    This problem takes the form:

    \begin{cases} \text{minimize} & \Vert Ax-b \Vert^2\\ \text{subject to}& f_i(x) \leq 0, i=1,\ldots,m\\ &h_i(x) =0, i=1,\ldots,p \end{cases},

    where
    • \Vert \cdot \Vert is some norm
    • A \in \mathbb{R}^{m \times n} a matrix
    • b \in \mathbb{R}^m a fixed vector
    • f_0(x) = \Vert A x-b \Vert^2 is the (convex) objective function
    • f_i,h_i are convex.
    N.B.: will come back to this problem.
    Example: Distance from points to ellipse If \Vert x \Vert^2 = x_1^2 + \cdots + x_n^2, A is the identity matrix, and D is an ellipsoid, then the solution is the point in the ellipsoid closest to the point b.
    The image below depicts this situation with

    b = \begin{bmatrix}0\\2\end{bmatrix} and D = \{(x,y): (x-2)^2 + 2(y-2)^2 \leq 1 \} .

    Here, the optimal solution is (x,y) = (1,2) .
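This projection can be checked numerically. Since b lies outside D, the minimizer sits on the boundary of the ellipse, which the sketch below (assuming NumPy is available) parametrizes directly and searches over a fine grid.

```python
import numpy as np

# Project b = (0, 2) onto D = {(x, y) : (x-2)^2 + 2(y-2)^2 <= 1}.
# The boundary is parametrized by (2 + cos t, 2 + sin t / sqrt(2)).
b = np.array([0.0, 2.0])
theta = np.linspace(0, 2 * np.pi, 100001)
boundary = np.stack([2 + np.cos(theta),
                     2 + np.sin(theta) / np.sqrt(2)], axis=1)

dists = np.linalg.norm(boundary - b, axis=1)   # distance from b to each point
x_star = boundary[np.argmin(dists)]            # closest boundary point
print(x_star)                                  # approximately (1, 2)
```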
    Optimal Control Let
    x(t),u(t),a(t),f(t) be functions \mathbb{R} \to \mathbb{R} .
    Assume x(t) evolves by

    \dot x(t) = a(t) x(t) + f(t) u(t).

    Here,
    • \dot x(t) is the time derivative of x(t) .
    • a(t),f(t) are assumed to be given;
    • we think of x(t) as being the state of some system at time t ;
    • we think of u(t) as an input we are allowed to choose in order to dictate the evolution of x(t) from the initial state x(0) ; i.e., u(t) “controls” the system;
    • When u(t)=u(x(t),t) , the system experiences “feedback.”
    • the goal: choose control law u(t) = u(x(t),t) so that x(t) is as “desirable” as possible.
    Example: Optimal Control Problems Optimal control problem: choose the “best” control u(t) which gives the “most” desirable x(t) .
    Typically “best” and “desirable” are determined by size/cost of u(t) and x(t) ; e.g., one may wish to minimize

    \int u(t)^2 + x(t)^2 dt.

    Therefore: problem is to solve (roughly speaking)

    \begin{cases} \text{minimize} & \int \left(u(t)^2 + x(t)^2\right) dt\\ \text{subject to} & \dot x(t) = a(t) x(t) + f(t) u(t) \end{cases}

    This will be another focus of the course and we will see some optimal control problems can be recast as convex optimization problems.
    Rough Outline of Course
    Part 1: Basics of Convexity and Convex Optimization Problems.
    Part 2: Applications of Convex Optimization Problems.
    Part 3: Algorithms for Solving Convex Optimization Problems.
    Part 4: Topics in Optimal Control.
    Convex Geometry
    Convex Sets Convex set: a subset X \subset \mathbb{R}^n satisfying:

    for all x,y \in X and t \in [0,1], there holds tx + (1-t) y \in X.

    I.e., X contains all line segments whose endpoints belong to X.

    Examples: some standard convex sets.

    1. Closed or open polytopes in \mathbb{R}^n.
      E.g., the interior of a tetrahedron.
    2. Euclidean balls, ellipsoids.
    3. Linear subspaces and affine spaces (e.g., lines, planes).
    4. Given a norm \Vert \cdot \Vert on \mathbb{R}^n, the \Vert \cdot \Vert-ball

      \{ x \in \mathbb{R}^n : \Vert x - x_{0} \Vert \leq r \}

      with center x_0 \in \mathbb{R}^n and radius r>0 is a convex set.
      Recall: a norm satisfies
      • \Vert x+ y \Vert \leq \Vert x \Vert + \Vert y \Vert for all vectors x,y ;
      • \Vert c x \Vert = |c| \Vert x \Vert for all vectors x and scalars c ;
      • \Vert x \Vert=0 iff x =0.
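The convexity of norm balls (which follows from the triangle inequality and homogeneity) can be spot-checked empirically. The sketch below (assuming NumPy is available) samples points in the \ell_1-ball and confirms convex combinations stay inside; the sampler and parameters are illustrative choices, not from the notes.

```python
import numpy as np

# Empirical check that a norm ball is convex: sample pairs of points in
# the l1-ball of radius r and confirm every convex combination stays inside.
rng = np.random.default_rng(0)
r, n = 1.0, 3

def sample_in_l1_ball(k):
    # Rejection sampling from the cube [-r, r]^n (fine for small n).
    pts = []
    while len(pts) < k:
        p = rng.uniform(-r, r, size=n)
        if np.abs(p).sum() <= r:
            pts.append(p)
    return np.array(pts)

X, Y = sample_in_l1_ball(200), sample_in_l1_ball(200)
ok = True
for t in np.linspace(0, 1, 11):
    Z = t * X + (1 - t) * Y                       # convex combinations
    ok &= bool((np.abs(Z).sum(axis=1) <= r + 1e-12).all())
print(ok)   # True, as guaranteed by the triangle inequality
```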
    Affine Subsets Affine subset: A subset X \subset \mathbb{R}^n satisfying:

    For all x,y \in X and t \in \mathbb{R}, there holds tx + (1-t) y \in X.

    I.e., X contains all lines which pass through two distinct points in X.
    N.B.: An affine subset is just a translated linear subspace:
    “a linear space that’s forgotten its origin”.
    Example 1. Let X = \{ (x,y,0): x,y \in \mathbb{R} \} be the xy-plane in \mathbb{R}^3.
    Then any translation or rotation of X is an affine subset.
    Example 2. If A \in \mathbb{R}^{m \times n}, b \in \mathbb{R}^m, then X=\{x \in \mathbb{R}^n: Ax = b\} is affine.
    (X is just a translate of \text{ker}\,A.)
    Example 3. Let

    A =  \begin{bmatrix} 0&0&0\\ 0&0&0\\ 0&0&1 \end{bmatrix}\qquad \text{ and } \qquad b =  \begin{bmatrix} 0\\0\\3 \end{bmatrix} .

    Then the solution set to Ax=b is

    \{(x,y,3): x,y \in\mathbb{R}\},

    which is just \text{ker}\, A translated by 3 in the z-direction:
    Cones Cone: A subset X \subset \mathbb{R}^n satisfying:

    For all x \in X and t \geq 0, there holds tx \in X.

    I.e., X contains all “positive” rays emanating from the origin and passing through any of its points.
    Proposition. X \subset \mathbb{R}^n is a convex cone iff for all x_1,x_2 \in X and \theta_1,\theta_2\geq0, there holds \theta_1 x_1 + \theta_2 x_2 \in X.
    Proof.
    Step 1. (\implies ) Suppose X is a convex cone and let x_1,x_2 \in X and \theta_1,\theta_2\geq0 be arbitrary.
    Want to show: \theta_1 x_1 + \theta_2 x_2 \in X .
    Step 2. Being conic implies x = \frac{\theta_1}{t} x_1 and y=\frac{\theta_2}{1-t}x_2 belong to X for all 0<t<1.
    Step 3. Being convex implies tx+ (1-t)y = \theta_1 x_1 + \theta_2 x_2 \in X, as desired.
    Step 4. (\impliedby ) Suppose X is such that \theta_1 x_1 + \theta_2 x_2 \in X for all x_1,x_2 \in X and \theta_1,\theta_2\geq0.
    Want to show: X is a convex cone.
    Step 5. X being conic follows from taking \theta_1\geq0 arbitrary and \theta_2 = 0.
    Step 6. Convexity follows from taking \theta_1 + \theta_2 = 1 with \theta_1,\theta_2 \geq0 and 0 \leq \theta_1 \leq 1 .
    Indeed: \theta_1 + \theta_2 = 1 and t := \theta_1 \implies t x_1 + (1-t) x_2 \in X for 0 \leq t \leq 1 since 1-t = \theta_2 .

    Examples.

    1. Hyperplanes \{ x \in \mathbb{R}^n : a^{T} x = 0\} with normal a \in \mathbb{R}^n,
      halfspaces \{ x \in \mathbb{R}^n : a^{T}x \leq 0 \},
      nonnegative orthants \{x \in \mathbb{R}^n: x \succeq 0\} are all convex cones.
      (Here, u \succeq v if u_i \geq v_i for i = 1,\ldots,n.)
    2. Given a norm \Vert\cdot\Vert on \mathbb{R}^n, the \Vert\cdot\Vert-norm cone is

      \{ (x,t) \in \mathbb{R}^{n+1} : \Vert x \Vert \leq t \},

      which is a convex cone in \mathbb{R}^{n+1}.
    3. See “positive semidefinite cone” below.
    Polyhedra Polyhedron: Any subset X \subset \mathbb{R}^n of the form

    X = \{ x \in \mathbb{R}^n : a_j^T x \leq b_j,\ j=1,\ldots,m;\ c_i^T x = d_i,\ i=1,\ldots,p \}

    given the vectors a_j,c_i \in \mathbb{R}^n and scalars b_j,d_i \in \mathbb{R}.
    Thus, X is a finite intersection of halfspaces and hyperplanes.
    N.B.: Introducing equality constraints can be used to reduce dimension.
    Example. The polyhedron below is given by the indicated system of inequalities:



    \begin{aligned} y-x& \leq0\\ -x-y&\leq -1\\ x &\leq 3\\ -x &\leq -1 \end{aligned}

    It is an easy exercise to rewrite the inequalities in the notation a_j^T x \leq b_j for suitable a_j,b_j .
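As an illustration of the a_j^T x \leq b_j notation, the sketch below (assuming NumPy is available) encodes the four inequalities as rows of a matrix and tests membership; the test points are illustrative choices.

```python
import numpy as np

# The four inequalities above in the form a_j^T x <= b_j: each row of A
# is one a_j^T, acting on the vector x = (x, y).
A = np.array([[-1.0,  1.0],   #  y - x <= 0
              [-1.0, -1.0],   # -x - y <= -1
              [ 1.0,  0.0],   #  x     <= 3
              [-1.0,  0.0]])  # -x     <= -1
b = np.array([0.0, -1.0, 3.0, -1.0])

def in_polyhedron(p):
    # p belongs to X iff every inequality a_j^T p <= b_j holds.
    return bool((A @ p <= b).all())

print(in_polyhedron(np.array([2.0, 1.0])))   # True
print(in_polyhedron(np.array([0.0, 0.0])))   # False
```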
    Positive Semidefiniteness
    Symmetric matrix: a matrix X \in \mathbb{R}^{n \times n} satisfying X = X^T ; i.e.,

    X = \begin{bmatrix}  x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ x_{n1} & x_{n2} & \cdots & x_{nn} \end{bmatrix}  = \begin{bmatrix} x_{11} & x_{21} & \cdots & x_{n1} \\ x_{12} & x_{22} & \cdots & x_{n2}\\ \vdots & \vdots & \ddots & \vdots\\ x_{1n} & x_{2n} & \cdots & x_{nn} \end{bmatrix}  = X^T.

    Set of symmetric matrices:

    \boldsymbol{S}^n = \{ X \in \mathbb{R}^{n \times n}: X = X^T \}.


    Positive semidefinite matrix: X \in \boldsymbol{S}^n satisfying z^T X z \geq 0 for all z \in \mathbb{R}^n.
    Equivalently, X only has nonnegative eigenvalues.
    If X \in \boldsymbol{S}^n is positive semidefinite, then write X \succeq 0 .
    If X,Y \in \boldsymbol{S}^n and X-Y\succeq0 , then write X \succeq Y .
    Set of symmetric positive semidefinite matrices:

    \boldsymbol{S}_+^n = \{ X \in \boldsymbol{S}^n: X \succeq 0 \}.

    (N.B.: \succeq is not the same as component-wise inequality, as was the case for vectors.)

    Positive definite matrix: X \in \boldsymbol{S}_+^n satisfying z^T X z = 0 if and only if z=0.
    Equivalently, X only has positive eigenvalues.
    If X \in \boldsymbol{S}^n_+ is positive definite, then write X \succ 0 .
    If X,Y \in \boldsymbol{S}^n_+ and X-Y\succ0 , then write X \succ Y .
    Set of symmetric positive definite matrices:

    \boldsymbol{S}_{++}^n = \{ X \in \boldsymbol{S}^n: X \succ 0 \}.



    Example 1 Let A = \begin{bmatrix} 2 & 0 \\  0 & 4  \end{bmatrix} .
    Since A has positive eigenvalues \{2,4\} , it follows that A \succ 0 .
    To see A \succ 0 explicitly, observe

    \begin{aligned}  z^T A z &= \begin{bmatrix} z_1 & z_2 \end{bmatrix}  \begin{bmatrix} 2 & 0\\ 0 & 4 \end{bmatrix}  \begin{bmatrix} z_1 \\ z_2 \end{bmatrix}\\ &= 2 z_1^2 + 4 z_2^2\\ &\geq0 \end{aligned}


    with z^T A z = 0 iff z = \begin{bmatrix}0\\0\end{bmatrix} .
    Example 2. Let B = \begin{bmatrix} 4 & 0 \\ 0 & 0 \end{bmatrix}.
    Since B has nonnegative eigenvalues \{ 4,0 \} , it follows that B \succeq 0 .
    To see B \succeq 0 explicitly, observe

    \begin{aligned} z^TBz &= \begin{bmatrix}z_1 & z_2 \end{bmatrix} \begin{bmatrix} 4 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} z_1 \\ z_2 \end{bmatrix}\\ &=4 z_1^2\\ &\geq 0. \end{aligned}

    Evidently, z^T B z=0 for z = \begin{bmatrix} 0\\z_2 \end{bmatrix} and so B \not\succ 0.
    Example 3. Let C =\begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix} .
    One can conclude C \succ0 by showing that C has positive eigenvalues \{ 5,1 \} .
    To see it directly, compute

    \begin{aligned} z^T C z &= \begin{bmatrix}z_1 & z_2 \end{bmatrix} \begin{bmatrix} 3&2\\2&3 \end{bmatrix} \begin{bmatrix} z_1 \\z_2 \end{bmatrix}\\ &= 3z_1^2 + 4z_1z_2 + 3z_2^2. \end{aligned}

    But the discriminant (with respect to z_1 ) of this quadratic satisfies -20 z_2^2 \leq 0 , from which we conclude the polynomial is positive unless z_1=z_2=0 and hence C \succ 0 .
    Example 4. Let D = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix} .
    To conclude D \not\succeq 0 , either compute its eigenvalues \{3,-1\} or observe that \det D = 1 - 4 = -3 < 0 , whence D cannot have only nonnegative eigenvalues (the determinant is their product).
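Each of Examples 1-4 can be verified with a numerical eigenvalue routine; e.g. (assuming NumPy is available):

```python
import numpy as np

# Eigenvalue check of Examples 1-4. For symmetric matrices, eigvalsh is
# the right tool: it returns real eigenvalues in ascending order.
A = np.array([[2.0, 0.0], [0.0, 4.0]])
B = np.array([[4.0, 0.0], [0.0, 0.0]])
C = np.array([[3.0, 2.0], [2.0, 3.0]])
D = np.array([[1.0, 2.0], [2.0, 1.0]])

for name, M in [("A", A), ("B", B), ("C", C), ("D", D)]:
    ev = np.linalg.eigvalsh(M)
    # PSD iff the smallest eigenvalue is >= 0; PD iff it is > 0.
    print(name, ev, "PSD:", bool(ev[0] >= -1e-12), "PD:", bool(ev[0] > 0))
```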
    Positive Semidefinite Cone Proposition 1. \boldsymbol{S}^n is a \frac{n(n+1)}{2}-dimensional real vector space and \boldsymbol{S}_+^n is a convex cone in \boldsymbol{S}^n.
    Proof.
    Step 1. \boldsymbol{S}^n is a vector space: if X,Y \in \boldsymbol{S}^n and c \in \mathbb{R}, then it is easy to see:

    (X+cY)^T = X^T + cY^T = X + cY.

    and so X+cY \in \boldsymbol{S}^n.


    Step 2. \text{dim}\,\boldsymbol{S}^n = \frac{n(n+1)}{2}: since X \in \boldsymbol{S}^n implies X=X^T, we have the identification

    \begin{aligned} X&= \begin{bmatrix} \boldsymbol{x_{11}} & \boldsymbol{x_{12}} &  \boldsymbol{x_{13}} & \cdots & \boldsymbol{x_{1n}}\\ x_{12} & \boldsymbol{x_{22}} & \boldsymbol{x_{23}} &\cdots & \boldsymbol{x_{2n}}\\ x_{13} & x_{23} & \boldsymbol{x_{33}} &\cdots & \boldsymbol{x_{3n}}\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ x_{1n} & x_{2n} & x_{3n} & \cdots & \boldsymbol{x_{nn}}\\ \end{bmatrix}\\ &\qquad\qquad\qquad\qquad\iff\\ \xi&:= \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} & x_{22} & x_{23} & \cdots & x_{2n} & \cdots & x_{nn} \end{bmatrix}^T, \end{aligned}

    where the bolded entries in X indicate the unique contributions to making \xi.
    Counting the number of bold entries shows \xi has \frac{n(n+1)}{2} entries and hence \xi \in \mathbb{R}^{\frac{n(n+1)}{2}}.


    Step 3. \boldsymbol{S}_+^n is a convex cone: For \theta_1,\theta_2\geq0, X,Y \in \boldsymbol{S}_+^n and z \in \mathbb{R}^n there holds

    z^T(\theta_1 X +\theta_2 Y)z = \theta_1 z^T X z + \theta_2 z^T Y z \geq 0,

    and so \theta_1 X + \theta_2 Y \in \boldsymbol{S}_+^n .
    By the proposition in Convex Geometry.Cones, we conclude the desired result.


    Proposition 2. \begin{bmatrix} a&b\\b&c \end{bmatrix} \in \boldsymbol{S}_+^2 iff

    a,c\geq0 and \det \begin{bmatrix} a&b\\b&c \end{bmatrix} = ac - b^2 \geq 0.

    Proof.
    Step 1. Let

    x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \quad \text{and} \quad  X = \begin{bmatrix} a&b\\b&c \end{bmatrix},


    recalling that X \in \boldsymbol{S}_+^2 iff x^T X x \geq 0 for all x \in \mathbb{R}^2.

    Step 2. (Case a=0 ) First compute

    x^T X x = \begin{bmatrix}x_1 & x_2 \end{bmatrix} \begin{bmatrix} 0&b\\b&c \end{bmatrix}\begin{bmatrix} x_1\\x_2\end{bmatrix} = 2bx_1x_2+cx_2^2.

    Observe that

    2bx_1x_2 + cx_2^2 \geq 0 for all x_1,x_2 \in \mathbb{R} iff b=0,c\geq0 .

    Note (in case a=0 ): ac - b^2 = 0 iff b=0 .
    Can thus conclude (in case a=0 ):

    x^TXx \geq 0 for all x \in \mathbb{R}^2 iff c\geq0, \det X =0 .



    Step 3. (Case a \neq 0 ) Completing the square gives

    \begin{aligned}  x^T X x &= \begin{bmatrix}x_1&x_2\end{bmatrix}\begin{bmatrix}a&b\\b&c\end{bmatrix}\begin{bmatrix}x_1\\x_2\end{bmatrix}\\ &=ax_1^2 + 2bx_1 x_2 + c x_2^2 \\ &= a(x_1 + a^{-1}bx_2)^2 + a^{-1}\det\, X\, x_2^2. \end{aligned}

    But

    a(x_1 + a^{-1}bx_2)^2 + a^{-1}\det\, X\, x_2^2 \geq0 for all x_1,x_2 \in \mathbb{R}

    iff

    a>0 and \det\,X \geq0 .

    (To conclude a>0 , take x_2=0.)
    N.B.: strictly speaking, c\geq0 was not used anywhere; however, X \succeq0 immediately implies c \geq0 , and a>0,ac-b^2 \geq0 also implies c\geq0 .

    Step 4. Putting Steps 2. and 3. together, we conclude:
    \begin{bmatrix} a&b\\b&c \end{bmatrix} \in \boldsymbol{S}_+^2 iff

    a,c\geq0 and \det \begin{bmatrix} a&b\\b&c \end{bmatrix} = ac - b^2 \geq 0.
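Proposition 2 gives a cheap positive-semidefiniteness test for 2 \times 2 matrices. The sketch below (assuming NumPy is available) cross-checks it against an eigenvalue computation on random symmetric matrices; the sampling range is an illustrative choice.

```python
import numpy as np

# Cross-check Proposition 2 against eigenvalues on random symmetric
# 2x2 matrices: a, c >= 0 and ac - b^2 >= 0 iff both eigenvalues >= 0.
rng = np.random.default_rng(1)

def psd_by_criterion(a, b, c):
    # The test from Proposition 2.
    return a >= 0 and c >= 0 and a * c - b * b >= 0

def psd_by_eigs(a, b, c):
    # The definition: all eigenvalues of [[a, b], [b, c]] nonnegative.
    return bool(np.linalg.eigvalsh(np.array([[a, b], [b, c]]))[0] >= 0)

agree = all(
    psd_by_criterion(a, b, c) == psd_by_eigs(a, b, c)
    for a, b, c in rng.uniform(-2, 2, size=(1000, 3))
)
print(agree)   # True
```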



    Image of \boldsymbol{S}_+^2 In light of Proposition 1 and Proposition 2, we can plot

    \boldsymbol{S}_+^2 \subset \boldsymbol{S}^2 \cong \mathbb{R}^3.

    The image below depicts the boundary of

    \boldsymbol{S}_+^2 \cong \{ (x,y,z): xz \geq y^2 ,x\geq0,z\geq0\}.

    Separating Hyperplanes Let A,B \subset \mathbb{R}^n be two sets.
    Separating hyperplane: a hyperplane given by

    P_{a,b} := \{ x : a^T x = b \}

    for some a \in \mathbb{R}^n and b \in \mathbb{R} such that

    \begin{aligned} a^Tx -b \geq 0 & \text{ on } A\\ a^Tx - b \leq 0 & \text{ on } B \end{aligned}.

    P_{a,b} is said to separate A and B .
    Thus, P_{a,b} cuts \mathbb{R}^n into two halfspaces with one containing all of A and the other containing all of B .
    Separating Hyperplane Theorem. If A,B \subset \mathbb{R}^n are two nonempty disjoint convex sets, then there exist a \in \mathbb{R}^n and b \in \mathbb{R} such that P_{a,b} is a separating hyperplane which separates A and B .
    Example 1. Consider the convex sets

    \begin{aligned} A &= \{(x,y) : \left(x-1\right)^{2}+2\left(y-1\right)^{2\ }\leq1 \}\\ B &= \{(x,y) : \left(x+1\right)^{2\ }+\ \left(y+1\right)^{2\ }\leq1\}\\ C &= \{(x,y) : \left(x-2\right)^{2}+2\left(y-1\right)^{2\ }\leq1 \}\\ P:&=P_{(1,1),0} = \{(x,y) : y+x=0 \} \end{aligned}.

    These three sets are indicated in the image below.
    Note that P separates the pairs [B,A] and [B,C] . Moreover, the pair [A,C] cannot be separated since A and C have significant overlap.
    Example 2. Consider the convex sets

    \begin{aligned} A &= \{(x,y) : x^{2\ }+\ y^{2}<1 \}\\ B &= \{(x,y) : \left(x+2\right)^{2\ }+y^{2}<1 \}\\ P:&=P_{(1,0),-1} = \{(x,y) : x=-1 \} \end{aligned}.

    These sets are indicated in the image below.
    First note A \cap B = \emptyset since neither contain their boundaries.
    As such, they have a separating hyperplane which is given by P .

    N.B.: Replacing A,B with their respective closures \overline{A},\overline{B} , the plane P still separates \overline{A},\overline{B} .
    Indeed, x+1 \geq 0 for (x,y) \in A and x+1 \leq 0 for (x,y) \in B .
    Supporting Hyperplanes Let A \subset \mathbb{R}^n be a fixed set and fix a boundary point

    x_0 \in \text{bd}A := \overline{A} \setminus \text{int}A .

    If the plane

    P_{a,a^Tx_0} = \{ x : a^Tx = a^Tx_0 \}

    separates A and the singleton \{x_0\} , then P_{a,a^Tx_0} is called a supporting hyperplane of A at x_0 .
    Equivalently, A lies entirely in a halfspace with boundary given by P_{a,a^Tx_0} .

    (Here: \overline{A} indicates the closure of A and \text{int} A indicates its interior.)

    Example. Consider the convex sets

    \begin{aligned} A &= \{ (x,y) : (x-2)^2 + (y-2)^2 \leq 1 \}\\ P :&= P_{(-1,0),-1} = \{ (x,y) : x = 1 \} \end{aligned}

    with boundary point x_0 = (1,2) \in \partial A .
    Letting a = \begin{bmatrix} -1\\0\end{bmatrix} , we note a^T x_0 + 1 = 0 \geq 0 .
    Next, observe that, if (x,y) \in A , then x \geq 1 and so

    a^Tx + 1 = \begin{bmatrix} -1&0 \end{bmatrix} \begin{bmatrix} x\\y \end{bmatrix} +1 = -x +1 \leq 0.

    Thus P separates A and \{ x_0 \} , showing that P is a supporting hyperplane of A at the boundary point x_0 ; see image below.
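This computation can be spot-checked by sampling points of A ; e.g. (assuming NumPy is available, with rejection sampling from the bounding box as an illustrative choice):

```python
import numpy as np

# Verify the supporting hyperplane numerically: for points of
# A = {(x, y) : (x-2)^2 + (y-2)^2 <= 1}, check a^T x <= a^T x0
# with a = (-1, 0) and x0 = (1, 2), i.e. A lies in the halfspace {x >= 1}.
rng = np.random.default_rng(2)
a, x0 = np.array([-1.0, 0.0]), np.array([1.0, 2.0])

# Sample A by rejection from its bounding box [1, 3] x [1, 3].
pts = rng.uniform(1, 3, size=(5000, 2))
pts = pts[((pts - [2, 2]) ** 2).sum(axis=1) <= 1]

supported = bool((pts @ a <= a @ x0 + 1e-12).all())
print(supported, a @ x0)   # True, with a^T x0 = -1
```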
    Hulls Let X \subset \mathbb{R}^n be a fixed subset.
    Convex hull: the set

    \{ \theta_1 x_1 + \cdots + \theta_k x_k : x_i \in X,\, \theta_i \geq 0,\, i = 1,\ldots,k,\, \theta_1 + \cdots + \theta_k= 1\}.

    This is just the collection of all convex combinations of points in X and is itself convex.
    Example: The images below depict a set of three points and its convex hull.
    3 points in the plane
    Convex hull of 3 points


    Affine hull: the set

    \{ \theta_1 x_1 + \cdots + \theta_k x_k : x_i \in X,\, i = 1,\ldots,k,\, \theta_1 + \cdots + \theta_k= 1\}.

    This is just the collection of all affine combinations of points in X and is itself affine.
    Example: The images below depict two points and their affine hull.
    Two points in the plane
      
    Affine hull of two points


    Conic hull: The set

    \{ \theta_1 x_1 + \cdots + \theta_k x_k : x_i \in X,\, \theta_i \geq 0,\, i = 1,\ldots,k\}.

    This is just the collection of all conic combinations of points in X and is itself a cone.
    Example: The images below depict two points x_1 = (0.5,1) , x_2 = (1.5,0.5) and their conic hull.
    Two points in the plane
      
    Conic hull of two points
    Details To see that the conic hull really is the shaded region, note that, by taking \theta_1 = t s_1 and \theta_2 = (1-t)s_2 , where t \in [0,1] and s_1,s_2\geq0 , the conic hull contains all points of the form ts_1x_1 + (1-t)s_2x_2 .
    Thus, it contains all line segments connecting any two points on the nonnegative rays \{ s x_1 : s \geq 0 \} and \{ s x_2 : s \geq 0 \} .
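Since x_1,x_2 are linearly independent, membership in their conic hull reduces to solving a 2 \times 2 linear system and checking signs; e.g. (assuming NumPy is available, with illustrative test points):

```python
import numpy as np

# Membership test for the conic hull of x1 = (0.5, 1), x2 = (1.5, 0.5):
# since x1, x2 are linearly independent, p = theta1*x1 + theta2*x2 has a
# unique solution, and p is in the conic hull iff theta1, theta2 >= 0.
M = np.array([[0.5, 1.5],
              [1.0, 0.5]])   # columns are x1 and x2

def in_conic_hull(p):
    theta = np.linalg.solve(M, p)          # unique coefficients
    return bool((theta >= -1e-12).all())   # conic iff both nonnegative

print(in_conic_hull(np.array([2.0, 1.5])))   # x1 + x2 -> True
print(in_conic_hull(np.array([-1.0, 1.0])))  # needs theta2 < 0 -> False
```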


    N.B.:
    1. Conic hulls are convex cones.
    2. Taking the “___ hull” of X does indeed result in a “___” set.
    3. The “___ hull” is a construction of the smallest “___” subset containing X.
    Generalized Inequalities Proper cone: a convex cone K \subset \mathbb{R}^n satisfying
    1. K is closed (i.e., K contains its boundary)
    2. K has nonempty interior
    3. x,-x \in K \implies x=0 .
    Generalized inequality: given a proper cone K , a partial ordering \preceq_K on \mathbb{R}^n defined by

    x \preceq_K y \iff y-x \in K .

    N.B.: \preceq_K is only a partial ordering, so not every pair x,y is comparable under \preceq_K . Generalized strict inequality: given a proper cone K , a strict ordering \prec_K on \mathbb{R}^n defined by

    x \prec_K y \iff y-x \in \text{int}\,K .

    Examples
    1. (CO Example 2.14)
      If K = \mathbb{R}^n_+ , then \preceq_K is the standard componentwise vector inequality:

      v \preceq w \iff v_i \leq w_i, \, i = 1,\ldots,n .

      N.B.: \preceq_{\mathbb{R}_+} is the standard inequality on \mathbb{R} .
    2. (CO Example 2.15)
      If K = \boldsymbol{S}_+^n , then

      \begin{aligned} A &\preceq_K B \iff B-A \text{ is positive semidefinite}\\ A &\prec_K B \iff B-A \text{ is positive definite} \end{aligned} .

    3. Let

      K = \{(x_1,x_2) : x_1 \leq 2x_2, x_2 \leq 2x_1 \} .

      Then K is a proper cone.
      In the image below:
      • K is the cone with vertex (0,0) .
      • The cone with vertex x = (-1,1) depicts those y \in \mathbb{R}^2 with x \preceq_K y .
      • The cone with vertex x = (1,-1) depicts those y \in \mathbb{R}^2 with x \preceq_K y .
      N.B.: (0,2)-(-1,1) = (1,1) \in K , and so (-1,1) \preceq_K (0,2) , as indicated in the image.
      Moreover, (-1,1) and (1,-1) are not comparable.
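The comparisons in this example follow directly from the definition x \preceq_K y \iff y - x \in K ; e.g. (assuming NumPy is available):

```python
import numpy as np

# The generalized inequality from the example: x <=_K y iff y - x lies in
# K = {(v1, v2) : v1 <= 2*v2, v2 <= 2*v1}.
def in_K(v):
    return bool(v[0] <= 2 * v[1]) and bool(v[1] <= 2 * v[0])

def leq_K(x, y):
    return in_K(np.asarray(y) - np.asarray(x))

print(leq_K((-1, 1), (0, 2)))    # True: (0,2)-(-1,1) = (1,1) is in K
print(leq_K((-1, 1), (1, -1)))   # False
print(leq_K((1, -1), (-1, 1)))   # False: the pair is not comparable
```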
    Convex Function Theory
    Conventions and Notations
    1. Writing f:\mathbb{R}^n \to \mathbb{R} always means a partial function with domain \text{dom}\,f possibly smaller than \mathbb{R}^n.
      “Function” will mean “partial function.”
    2. If \text{dom}\, f \neq \mathbb{R}^n, we may work with the extension \tilde{f}:\mathbb{R}^n \to \mathbb{R} \cup \{ +\infty \} given by

      \tilde{f}(x) = \begin{cases} f(x) & x \in \text{dom}\, f\\ +\infty & x \notin \text{dom}\, f \end{cases}.

      It is common to implicitly assume f has been extended and to write f for the partial function f and its extension \tilde{f}.
    3. Given a set C \subset \mathbb{R}^n, its indicator function is

      \tilde{I}_C(x) = \begin{cases} 0 & x \in C\\ +\infty & x \notin C \end{cases}.

    4. We write

      \begin{aligned} \mathbb{R}_+ &:= \{ x \in \mathbb{R}: x\geq0 \}\\ \mathbb{R}_{++} &:= \{ x \in \mathbb{R}: x > 0 \} \end{aligned}.

    Convex Functions Let f:\mathbb{R}^n \to \mathbb{R} be a function with convex domain \text{dom}\, f.
    Convexity: for all x,y \in \text{dom}\, f, t \in [0,1] there holds

    f(t x + (1-t)y) \leq t f(x) + (1-t)f(y)

    (This inequality is often called Jensen’s inequality.)



    Strict convexity: for all x,y \in \text{dom}f,\,  x\neq y, t \in (0,1) there holds

    f(t x + (1-t)y) < t f(x) + (1-t)f(y).


    Example: failure of strict convexity In the figure, the solid line indicates part of the graph of x^4 and the dashed line indicates part of the graph of a linear function.
    The linear function is convex but fails to be strictly convex, since linear functions satisfy

    f(tx + (1-t)y) = tf(x) + (1-t)f(y) .





    Concavity and strict concavity: when -f is, respectively, convex and strictly convex.



    Remarks.
    1. It is instructive to compare convexity/concavity with linearity and view the former as weak versions of linearity.

    2. It is common to extend the definition of convexity to extended functions, i.e., those of the form f: \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\} .

      For example, the indicator function \tilde{I}_{(-\infty,2)} is convex in this sense.
      To give insight, consider the image below, where the thick line is the “graph” of \tilde{I}_{(-\infty,2)} and the dashed line is the “secant line” connecting the points (1,0) to (x,\infty) for any x\geq 2 .

    Examples
    1. All linear functions are convex and concave on their domains.
    2. e^x is convex on \mathbb{R}.
    3. |x|^p is convex on \mathbb{R} for p \geq 1 .
    4. x^p is convex on \mathbb{R}_{++} for p \geq 1 or p \leq 0 and concave for 0 \leq p \leq 1.
    5. - \log\det X is convex on \boldsymbol{S}_{++}^n .
    6. If C \subset \mathbb{R}^n is convex, then its indicator function \tilde{I}_C is convex (in the extended value sense).
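Jensen’s inequality for these examples can be spot-checked at random points; e.g. (assuming NumPy is available, with e^x and |x|^3 as illustrative choices):

```python
import numpy as np

# Spot-check Jensen's inequality f(t x + (1-t) y) <= t f(x) + (1-t) f(y)
# for f = exp and f = |x|^3 at random points and random t in [0, 1].
rng = np.random.default_rng(3)
ok = True
for f in (np.exp, lambda x: np.abs(x) ** 3):
    x, y = rng.uniform(-2, 2, 1000), rng.uniform(-2, 2, 1000)
    t = rng.uniform(0, 1, 1000)
    lhs = f(t * x + (1 - t) * y)
    rhs = t * f(x) + (1 - t) * f(y)
    ok &= bool((lhs <= rhs + 1e-12).all())
print(ok)   # True
```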
    One Dimensional Characterization Proposition. Let f:\mathbb{R}^n \to \mathbb{R} have convex domain and, given

    x \in \text{dom}\, f and v \in \mathbb{R}^n ,

    define the function g:\mathbb{R} \to \mathbb{R} by

    g(t) = f(x+tv)

    with

    \text{dom}\,g := \{ t \in \mathbb{R} : x + tv \in \text{dom}\, f \}.

    Then f is convex iff g is convex for all x \in \text{dom}\,f and v \in \mathbb{R}^n such that g is well-defined.
    Proof.
    Step 1. First note that \text{dom}\,g is convex: it is the intersection of \text{dom}\,f with the line passing through x  with direction v .
    Step 2. (\implies ) Suppose f is convex and let x \in \text{dom}\,f and v \in \mathbb{R}^n be arbitrary.
    Then, for \theta \in [0,1] and t_1,t_2 \in \text{dom}\, g, there holds

    \begin{aligned} g(\theta t_1 + (1-\theta) t_2) &= f(x + (\theta t_1 + (1-\theta)t_2)v)\\ &= f(x + \theta t_1 v + (1-\theta) t_2 v)\\ &= f(\theta(x + t_1 v) + (1-\theta)(x+t_2 v) ) \\ &\leq \theta f(x+t_1 v) + (1-\theta)f(x + t_2 v)\\ &= \theta g(t_1) + (1-\theta)g(t_2), \end{aligned}

    proving that g is convex.
    Step 3. (\impliedby ) Suppose now that each g is convex.
    Fix x,y \in \text{dom}\,f and let \theta \in [0,1] .
    We want to show

    f(\theta x + (1-\theta)y) \leq \theta f(x) + (1-\theta)f(y) .

    Let v = y-x and

    g(t) = f(x + t(y-x)),

    noting that g(0)=f(x), g(1)=f(y).
    Since g is convex, we conclude

    \begin{aligned} f(\theta x + (1-\theta)y) &= f(x + (1-\theta)(y-x))\\ &= g(1-\theta)\\ &=g(\theta \cdot 0 + (1-\theta)\cdot 1)\\ &\leq \theta g(0) + (1-\theta) g(1)\\ &=\theta f(x) + (1-\theta) f(y). \end{aligned}

    This is enough to conclude f is convex.
    First Order Characterization Proposition. If f:\mathbb{R}^n \to \mathbb{R} is differentiable with convex domain \text{dom}\, f, then f is convex iff

    f(x) + \nabla f(x)^{T}(y-x) \leq f(y), \quad \forall x,y \in \text{dom}\, f.


    Proof (sketch). We prove it in case f:\mathbb{R} \to \mathbb{R} ; the higher dimensional case follows by using that f:\mathbb{R}^n \to \mathbb{R} with convex domain is convex iff it is convex as a single variable function when restricted to lines intersecting \text{dom}\,f .
    Throughout, let x,y \in \text{dom}\,f and t \in [0,1] .
    Step 1. (\implies ) If f is convex, then we obtain the following inequalities

    \begin{aligned} f(ty + (1-t)x) \leq tf(y) + (1-t)f(x)\quad&\text{(convexity)}\\ f(x) + \frac{f(x+t(y-x))-f(x)}{t}  \leq f(y)\quad&\text{(rearranging)}\\ f(x) + \frac{f(x+t(y-x))-f(x)}{t(y-x)}(y-x)  \leq f(y)\quad &\text{(rearranging)}\\ f(x) + f'(x)(y-x)  \leq f(y)\quad&\text{(taking }t \to 0 \text{)}. \end{aligned}

    Step 2. (\impliedby ) Supposing

    f(x) + f'(x)(y-x) \leq f(y)

    we set z = tx + (1-t)y and add the two inequalities

    \begin{aligned} t f(z) + tf'(z)(x - z) &\leq tf(x)\\ (1-t) f(z) + (1-t)f'(z)(y - z) &\leq (1-t)f(y) \end{aligned}

    to obtain

    f(tx+(1-t)y)=f(z) \leq tf(x) + (1-t)f(y).




    Remarks.
    1. For fixed x \in \text{dom}\,f, the mapping

      A_x: \, y \mapsto f(x) + \nabla f(x)^{T}(y-x)

      is affine whose graph is a hyperplane passing through the point (x,f(x)) .
      Therefore, the inequality A_x(y) \leq f(y) means this hyperplane is tangent to the graph of f at (x,f(x)) and lies under the graph of f .
      In fact, this plane is a supporting hyperplane of the epigraph

      \text{epi}\,f := \{ (x,t) \in \mathbb{R}^{n} \times \mathbb{R} : x \in \text{dom}\,f, t \geq f(x)\}

      at the point (x,f(x)) .


    2. The affine mapping A_x is just the first order Taylor approximation of f at x .
      Thus, differentiable convex functions are such that their first order Taylor approximations serve as global underestimators of f .

    Example. In the image below:
    • solid line is the graph of f(x) = e^x ;
    • shaded region is the convex set given by \text{epi}\,f = \{ (x,t): t\geq e^x \} ;
    • dashed line is the supporting hyperplane at (-1,e^{-1}) given by the graph of e^{-1} + e^{-1}(x+1) .


    f(x) + f'(x)(y-x) gives a supporting hyperplane
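The supporting-hyperplane picture for e^x can be verified on a grid; e.g. (assuming NumPy is available):

```python
import numpy as np

# Check the first-order underestimator for f = exp at x = -1: the tangent
# line e^{-1} + e^{-1}(y + 1) should lie below e^y everywhere, touching at y = -1.
y = np.linspace(-5, 5, 2001)
tangent = np.exp(-1) + np.exp(-1) * (y + 1)
gap = np.exp(y) - tangent   # nonnegative iff the tangent underestimates f

print(bool((gap >= -1e-12).all()), float(gap.min()))   # True, minimum gap ~ 0 at y = -1
```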
    Second Order Characterization Proposition. If f is twice-differentiable with \text{dom}\,f convex, then f is convex iff

    \nabla^2 f(x) \succeq 0, \quad \forall x \in \text{dom}\, f.


    Recall: if f: \mathbb{R}^n \to \mathbb{R} is twice-differentiable, then its Hessian is

    \nabla^2 f(x) =  \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2}(x) & \frac{\partial^2 f}{\partial x_1 \partial x_2}(x) & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n}(x) \\ \frac{\partial^2 f}{\partial x_2 \partial x_1 }(x) & \frac{\partial^2 f}{\partial x_2^2 }(x) & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n}(x) \\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial^2 f}{\partial x_n \partial x_1}(x) & \frac{\partial^2 f}{\partial x_n \partial x_2}(x) & \cdots & \frac{\partial^2 f}{\partial x_n^2}(x) \\ \end{bmatrix} \in \mathbb{R}^{n \times n}.


    Proof (sketch).
    Step 0. The proof is a little more involved, so let us just give two intuitive justifications.
    Justification 1. The second order Taylor approximation gives

    f(y) = f(x) + \nabla f(x)^T (y-x)  + \frac{1}{2} (y-x)^T \nabla^2 f(x)(y-x)

    up to some small error. But

    \nabla^2 f(x) \succeq 0

    implies

    (y-x)^T \nabla^2 f(x)(y-x) \geq0

    and so

    f(y) \geq f(x) + \nabla f(x)^T (y-x)

    (again, up to some small error).
    The first order approximation from Convex Function Theory.First Order Characterization then implies convexity.
    Justification 2. Another intuitive justification is that \nabla^2 f(x) \succeq 0 means the graph of f curves everywhere upward like a paraboloid, which evidently suggests convexity.

    Remarks.
    1. Recall: for x \in \text{dom}\,f , there holds

      \begin{aligned} \nabla^2 f(x) \succeq 0 & \iff \nabla^2 f(x) \text{ is positive semidefinite}\\ & \iff \nabla^2 f(x) \in \boldsymbol{S}_+^n. \end{aligned}

    2. \nabla^2 f (x) \succ 0 for all x \in \text{dom}\, f implies f is strictly convex.
      The converse is false: x^4 is strictly convex, yet its second derivative vanishes at x=0 .
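The second-order test can be illustrated numerically. A minimal Python sketch (the random quadratic and sample points are illustrative choices): for f(x) = \frac{1}{2}x^T Q x with Q = A^T A \succeq 0, the Hessian is Q, and the convexity inequality holds at sampled points.

```python
import numpy as np

# Numerical illustration (not a proof) of the second-order test: for
# f(x) = (1/2) x^T Q x with Q = A^T A (positive semidefinite), the Hessian
# is Q, and the convexity inequality holds at randomly sampled points.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
Q = A.T @ A  # PSD by construction; this is the (constant) Hessian of f

f = lambda x: 0.5 * x @ Q @ x
assert np.all(np.linalg.eigvalsh(Q) >= -1e-10)  # Hessian is PSD

for _ in range(100):
    x, y = rng.standard_normal(4), rng.standard_normal(4)
    t = rng.uniform()
    assert f(t * x + (1 - t) * y) <= t * f(x) + (1 - t) * f(y) + 1e-9
print("Hessian PSD and convexity inequality verified on samples")
```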

    Level Sets Fix a function f: \mathbb{R}^n \to \mathbb{R} and let c \in \mathbb{R}
    c-Level set: the set

    S(c):=\{ x \in \text{dom}\, f : f(x) = c\}.

    The figure below depicts level sets of x^2 + e^{y^2} with c=5,10,17,26.


    c-Sublevel set: the set

    S_c = \{ x \in \text{dom}\,f : f(x) \leq c\}.

    The figure below depicts the sublevel sets of x^2+e^{y^2} with c=5,10,17,26.
    Each shade of gray indicates a new sublevel set and of course S_c \subset S_{c'} for c<c' .


    c-Superlevel set: the set

    S^c = \{ x \in \text{dom}\,f : f(x) \geq c \}.

    The figure below depicts the superlevel sets of x^2+e^{y^2} with c=5,10,17,26.
    Each shade of gray indicates a new superlevel set and of course S^c \supset S^{c'} for c<c' .


    Proposition. If f is convex, then the sublevel set S_c is convex for all c \in \mathbb{R}.
    Equivalently, if f is concave, then the superlevel set S^c is convex for all c \in \mathbb{R} .


    Proof. Want to show: x,y \in S_c implies tx + (1-t)y \in S_c for all t \in [0,1] .
    If x,y \in S_c , then f(x),f(y) \leq c and so convexity of f gives

    \begin{aligned}  f(tx+(1-t)y) &\leq t f(x) + (1-t)f(y) \\ &\leq t c + (1-t)c \\ &= c  \end{aligned}


    and hence tx + (1-t)y \in S_c as desired.
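The proposition can be checked numerically on the running example f(x,y) = x^2 + e^{y^2}: sample points of a sublevel set and verify that midpoints stay inside (a sketch; the level c = 10 and the sampling box are arbitrary choices).

```python
import numpy as np

# Numerical check that a sublevel set of the convex example above,
# f(x, y) = x^2 + e^(y^2), is convex: midpoints of sampled points of
# S_c remain in S_c. The level c = 10 and grids are arbitrary choices.
f = lambda x, y: x**2 + np.exp(y**2)
c = 10.0

rng = np.random.default_rng(1)
pts = rng.uniform(-3, 3, size=(2000, 2))
inside = pts[f(pts[:, 0], pts[:, 1]) <= c]  # sampled points of S_c

for _ in range(200):
    i, j = rng.integers(len(inside), size=2)
    m = 0.5 * (inside[i] + inside[j])  # midpoint of two points of S_c
    assert f(m[0], m[1]) <= c + 1e-12
print("all sampled midpoints stay in the sublevel set")
```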
    Graphs Fix a function f: \mathbb{R}^n \to \mathbb{R}.
    Graph: the set

    \{ (x,f(x)):x \in \text{dom}\,f \} \subset \mathbb{R}^n \times \mathbb{R}.

    Example: graph of e^x is given below.


    Epigraph: the set

    \text{epi}\, f = \{ (x,t): x \in \text{dom}\, f, f(x) \leq t \} \subset \mathbb{R}^n \times \mathbb{R}.

    Example: epigraph of e^x is given below.


    Hypograph: the set

    \text{hypo}\, f = \{ (x,t) : x \in \text{dom}\, f, f(x) \geq t \} \subset \mathbb{R}^n \times \mathbb{R}.

    Example: hypograph of e^x is given below.


    Proposition. f is convex iff \text{epi}\, f is convex.
    Equivalently, f is concave iff \text{hypo}\, f is convex.
    Proof. (sketch) We consider the case f:\mathbb{R} \to \mathbb{R} for simplicity.
    Step 1. (\implies ) Suppose f is convex and let x,y \in \text{epi}\,f be distinct points.
    If x,y both lie on a vertical line, then clearly tx + (1-t)y \in \text{epi}\,f for t \in [0,1] ; thus, suppose otherwise.
    Let \ell be the line passing through x,y and let x',y' be the two intersection points of \ell with the graph of f .
    (If at most one intersection point exists, then it is easy to see that the segment connecting x and y is in \text{epi}\,f .)
    By convexity of f , the segment formed by tx' + (1-t)y' for t \in [0,1] lies in \text{epi}\,f , which is enough to conclude the segment given by tx + (1-t)y for t \in [0,1] lies in \text{epi}\,f.
    This shows \text{epi}\, f is convex.
    Step 2. (\impliedby ) Suppose now \text{epi}\,f is convex.
    Let x,y be two distinct points on the graph of f .
    Then x,y \in \text{epi}\, f.
    But convexity of \text{epi}\,f implies the segment formed by tx+(1-t)y for t \in [0,1] lies entirely in \text{epi}\,f.
    This is enough to conclude f is convex.
    Convex Calculus The following list details some operations that preserve convexity.
    The main point: to conclude a function f is convex, one often verifies that f may be built from other convex functions using, for example, the operations below.
    N.B.: Conclusions only hold on the common domains of the functions.
    Conical combinations:

    f_1,\ldots,f_m convex and c_1,\ldots,c_m \geq0

    \implies

    c_1 f_1 + \cdots + c_m f_m convex.

    Weighted averages:

    f(x,y) convex in x, w(y) \geq0 \implies \int f(x,y) w(y) dy convex.

    Affine change of variables:

    f:\mathbb{R}^n \to \mathbb{R} convex, A \in \mathbb{R}^{n\times m}, b \in \mathbb{R}^n

    \implies

    f(A x + b) convex.

    Maximum:

    f_1,\ldots,f_m convex

    \implies

    f(x) := \max\{f_1(x),\ldots,f_m(x)\} convex.

    Supremum:

    f(x,a) convex in x for each a \in \mathcal{A}

    \implies

    h(x):=\sup\{f(x,a):a \in \mathcal{A}\} convex.



    Justification. For t \in [0,1] , there holds

    \begin{aligned}  h(tx + (1-t)y) &= \sup \{ f(tx + (1-t)y,a) : a \in \mathcal{A} \}\\ &\leq \sup \{ tf(x,a) + (1-t) f(y,a) : a \in \mathcal{A}\}\\ &\leq t \sup\{ f(x,a) :a \in \mathcal{A}\} + (1-t)\sup\{f(y,a):a \in \mathcal{A} \}\\ &=th(x) + (1-t)h(y). \end{aligned}



    Example. Let

    \begin{aligned} g&:\mathbb{R}^n \to \mathbb{R} \text{ be given}\\ f(x,y)&:= y^T x - g(x). \end{aligned}

    N.B.: for each x , the mapping y \mapsto f(x,y) is affine and hence convex.
    Thus

    h(y):=\sup\{y^Tx - g(x) : x \in \text{dom}\,g \}

    defines a convex function.
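The supremum rule can be illustrated numerically: h(y) = \sup\{yx - g(x)\} is convex even when g is nonconvex. A Python sketch with the arbitrary, illustrative choice g(x) = \sin x on a sampled domain:

```python
import numpy as np

# Numerical illustration of the supremum rule: h(y) = sup_x { y*x - g(x) }
# is convex even when g is nonconvex. Here g(x) = sin(x) on a sampled
# domain [-3, 3] (an arbitrary, illustrative choice).
xs = np.linspace(-3.0, 3.0, 2001)
g = np.sin(xs)
h = lambda y: np.max(y * xs - g)  # sup over the sampled domain

rng = np.random.default_rng(2)
for _ in range(200):
    y1, y2, t = rng.uniform(-2, 2), rng.uniform(-2, 2), rng.uniform()
    assert h(t * y1 + (1 - t) * y2) <= t * h(y1) + (1 - t) * h(y2) + 1e-9
print("h is convex on the sampled points")
```

Here h is a maximum of finitely many affine functions of y, so the convexity inequality in fact holds exactly up to floating-point error.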
    Infimum:

    f:\mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R} convex in (x,y)

    C \subset \mathbb{R}^m convex

    \inf_{y \in C}f(x,y) finite for some x

    \implies

    g(x):=\inf_{y \in C}f(x,y) convex on

    \text{dom}\,g := \{ x : (x,y) \in \text{dom}\, f \text{ for some } y \in C\}.

    Fenchel conjugation Let f:\mathbb{R}^n \to \mathbb{R} be given (not necessarily convex).
    Fenchel conjugate: f^*(y) = \sup \{ y^T x - f(x) : x \in \text{dom}\,f\}.

    N.B.:

    \text{dom}\, f^* = \{ y \in \mathbb{R}^n : f^*(y) < \infty \} ;

    i.e., those y \in \mathbb{R}^n for which y^Tx - f(x) is bounded above on \text{dom}\,f as a function of x .

    Intuition. Suppose f: \mathbb{R}_+ \to \mathbb{R}_+ is a differentiable convex function denoting the cost to produce x items.
    For a given unit price y \in \mathbb{R}_+ , the profit of selling x units is

    P(x,y) = yx - f(x) .

    Thus f^*(y) is just the optimal profit for selling at price y .
    N.B.: f convex implies P(\cdot,y) is concave for each y .
    Thus, P(x,y_0) is maximal at x_0 satisfying P'(x_0,y_0) = 0 , i.e., when y_0 = f'(x_0) .
    Viz.: the x_0 where f has slope y_0 .
    The tangent line through (x_0,f(x_0)) is then given by y=y_0 (x - x_0) + f(x_0) .
    Lastly, note that the y-intercept of this line is -y_0x_0 + f(x_0)=-f^*(y_0) .


    Remarks
    1. Often f^* is just called the conjugate function of f .
    2. Since f^* is the supremum of a family of affine functions, f^* is always convex, even if f is not.
      (Follows from Convex Function Theory.Convex Calculus.)
    3. If
      • f is convex
      • \text{epi}\,f is a closed subset of \mathbb{R}^n \times \mathbb{R} ,
      then f^{**} = f .


    Example. We will compute the conjugate function of

    \begin{aligned} f(x) &= e^x\\ \text{dom}\,f &= \mathbb{R}. \end{aligned}


    Thus, let

    \begin{aligned} h_y(x) &= yx - f(x) \\ &= yx - e^x . \end{aligned}

    Case y<0 : h_y(x) is unbounded on \text{dom}\,f = \mathbb{R} since

    h_y(x) \to +\infty as x \to -\infty .

    Thus

    \sup\{yx-e^x: x \in \text{dom}\,f\} = +\infty .



    Case y>0 : Compute

    \begin{aligned} h_y'(x) &= y - e^x =0 \text{ when } x=\log y \\  h_y''(x)&=-e^x \leq 0. \end{aligned}

    Thus x=\log y maximizes h_y and so

    \begin{aligned} \sup\{yx-e^x:x \in \text{dom}\,f\} &= \max\{yx-e^x:x \in \text{dom}\,f \}\\ &= h_y(\log y) \\ &= y\log y - y  \end{aligned}



    Case y=0 : Compute

    h_0(x) = -e^x,

    which evidently has least upper bound 0 and so

    \sup\{ -e^x : x \in \text{dom}\,f \} = 0.



    Conclusion: Since

    yx-e^x

    is bounded above on \text{dom}\,f only for y \geq 0 , it follows that

    \text{dom}\, f^* = \mathbb{R}_{+} .

    Putting everything together:

    f^*(y) = \sup\{yx-e^x\} = y\log y - y for y \in \mathbb{R}_+ ,

    where we take 0 \log 0 = 0 .
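This calculation can be corroborated numerically by approximating the supremum over a fine grid (the grid bounds and tolerances are ad hoc):

```python
import numpy as np

# Numerical corroboration of f*(y) = y log y - y for f(x) = e^x:
# approximate the supremum of y*x - e^x over a fine grid and compare
# with the closed form. Grid bounds and tolerances are ad hoc.
xs = np.linspace(-20.0, 5.0, 200001)
ex = np.exp(xs)

for y in [0.5, 1.0, 2.0, 3.0]:
    approx = np.max(y * xs - ex)   # grid approximation of the sup
    exact = y * np.log(y) - y      # closed form derived above
    assert abs(approx - exact) < 1e-4, (y, approx, exact)

# Case y = 0: the sup is 0, approached as x -> -infinity.
assert abs(np.max(-ex)) < 1e-8
print("grid supremum matches y*log(y) - y")
```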
    Legendre Transform Let f:\mathbb{R}^n \to \mathbb{R} be convex, differentiable and with \text{dom}\,f = \mathbb{R}^n .
    Then, the Fenchel conjugate f^* of f is often called the Legendre transform of f .


    Proposition. If f is as above, z \in \mathbb{R}^n and y = \nabla f(z), then

    f^*(y) = z^T \nabla f(z) - f(z) .

    Proof.
    Step 1. Let

    h_y(x) = y^T x - f(x) .

    and note

    x^\star \in \mathbb{R}^n maximizes h_y iff \nabla h_y(x^\star) = 0

    since h_y is a sum of concave functions and hence concave.
    Step 2. Using Step 1. and

    \begin{aligned}  \nabla(y^T x) &= y\\ \nabla h_y(x) &= \nabla(y^Tx-f(x)) = y - \nabla f(x). \end{aligned}

    conclude

    y = \nabla f(x^\star) iff x^\star maximizes h_y .

    (In particular, z \in \mathbb{R}^n maximizes h_{\nabla f(z)} .)
    Step 3. Letting z, y \in \mathbb{R}^n satisfy

    \begin{aligned} y &= \nabla f(z) \end{aligned}

    and using Steps 1. and 2., we conclude

    \begin{aligned}  f^*(y) &= \sup \{ y^T x - f(x) : x \in \text{dom}\, f\}\\ &= \max \{ h_y(x) : x \in \text{dom}\,f \}\\ & = z^T \nabla f(z) - f(z)  \end{aligned} ,

    as desired.


    Example 1. Let

    f(x) = e^x ,

    and compute

    f'(x) = e^x .

    Given z \in \mathbb{R} , let

    y = f'(z) = e^z ; i.e., z = \log y .

    Thus

    f^*(y) = z f'(z) - f(z) = y \log y - y ,

    which agrees with our calculation for f^* in a previous example.


    Example 2. Fix Q \in \boldsymbol{S}_{++}^n and let

    \begin{aligned} f(x) &= \frac{1}{2} x^T Q x\\ \text{dom}\,f&= \mathbb{R}^n. \end{aligned}

    We will compute

    f^*(y) = \frac{1}{2}y^T Q^{-1}y .

    Step 0. Observe
    1. f is convex:

      \begin{aligned} \nabla^2 f(x) = Q \succ 0. \end{aligned}

      (Justification) Consider case n = 2 .
      Let Q = \begin{bmatrix}a&b\\b&d\end{bmatrix}.
      Thus f(x_1,x_2) = \frac{1}{2}(ax_1^2 + 2 b x_1 x_2 + d x_2^2) .
      Easy now to see \nabla^2 f = Q .
    2. Q \succ 0 implies Q is invertible since then \det Q > 0 .
    Step 1. Using

    \begin{aligned} \nabla f(x) &=\nabla(\frac{1}{2}x^TQx) \\ &= Qx, \end{aligned}

    we conclude

    y = \nabla f(z) \iff y = Qz \iff z = Q^{-1}y.

    Step 2. Let y = \nabla f(z) .
    By preceding proposition and Step 1., there holds

    \begin{aligned}  f^*(y) &= z^T \nabla f(z) - f(z)\\ &= (Q^{-1}y)^T y - \frac{1}{2}(Q^{-1}y)^TQ(Q^{-1}y)\\ &= y^T Q^{-1}y - \frac{1}{2} y^T Q^{-1} y\\ &= \frac{1}{2}y^T Q^{-1} y. \end{aligned}
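A quick numerical check of this identity with a randomly generated positive definite Q (an illustrative sketch, not part of the derivation):

```python
import numpy as np

# Numerical check of f*(y) = (1/2) y^T Q^{-1} y via the proposition:
# with y = grad f(z) = Qz, f*(y) = z^T grad f(z) - f(z). The random Q
# is made positive definite by construction.
rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3))
Q = A.T @ A + 3.0 * np.eye(3)  # positive definite

f = lambda x: 0.5 * x @ Q @ x
for _ in range(20):
    z = rng.standard_normal(3)
    y = Q @ z                                # y = grad f(z)
    lhs = z @ (Q @ z) - f(z)                 # z^T grad f(z) - f(z)
    rhs = 0.5 * y @ np.linalg.solve(Q, y)    # (1/2) y^T Q^{-1} y
    assert abs(lhs - rhs) < 1e-8
print("Legendre transform of the quadratic matches (1/2) y^T Q^{-1} y")
```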

    Other Notions of Convexity There are two other important notions of convexity that we will return to if needed.
    Let f:\mathbb{R}^n \to \mathbb{R} be given.

    Quasiconvexity: \text{dom}\,f and the sublevel sets

    \{x \in \text{dom}\,f: f(x) \leq \alpha \}

    are convex for all \alpha \in \mathbb{R}.
    Features:
    1. Quasiconvex problems may sometimes be suitably approximated by convex problems.
    2. Local minima need not be global minima.


    Log-convexity: f>0 on \text{dom}\, f and \log f is convex; equivalently

    f(tx+(1-t)y) \leq f(x)^tf(y)^{1-t} for all t \in [0,1].

    Generalized Convexity Let K \subset \mathbb{R}^m be a proper cone and let f: \mathbb{R}^n \to \mathbb{R}^m .

    K -convexity: for all x,y \in \mathbb{R}^n and t \in [0,1] , there holds

    f(tx + (1-t)y) \preceq_K t f(x) + (1-t)f(y) .

    Strict K -convexity: for all x \neq y \in \mathbb{R}^n and t \in (0,1) , there holds

    f(tx + (1-t)y) \prec_K t f(x) + (1-t)f(y) .

    Examples
    1. (CO Example 3.47)
      Let K = \mathbb{R}_+^m .
      Then f: \mathbb{R}^n \to \mathbb{R}^m is K-convex iff: \text{dom}\, f is convex and for all x,y \in \text{dom}\,f and t \in[0,1], there holds

      f(tx+(1-t)y) \preceq tf(x) + (1-t)f(y)

      which holds iff

      f_i(tx+(1-t)y) \leq tf_i(x) + (1-t)f_i(y)

      for each i = 1,\ldots, m , i.e., iff f is componentwise convex.
    2. (CO Example 3.48)
      A function f: \mathbb{R}^n \to \boldsymbol{S}^m is \boldsymbol{S}_+^m-convex iff : \text{dom}\,f is convex and for all x,y \in \text{dom}\,f and t \in [0,1] , there holds

      f(tx + (1-t)y) \preceq t f(x) + (1-t)f(y) .

      N.B.:
      • this is a matrix inequality and \boldsymbol{S}_+^m -convexity is often called matrix convexity.
      • f is matrix convex iff z^Tf(x)z is convex for all z \in \mathbb{R}^m .
      • The two functions

        \begin{aligned} \mathbb{R}^{n \times m} \ni X &\mapsto XX^T\\ \boldsymbol{S}_{++}^n \ni X &\mapsto X^p, \quad 1 \leq p \leq 2 \text{ or } -1 \leq p \leq 0. \end{aligned}

        are matrix convex.
    Basics of Optimization Problems
    General Optimization Problems By an optimization problem (OP) we mean the following:

    \text{(OP)} \begin{cases} \text{minimize } & f_0(x) \quad \text{(objective)}\\ \text{subject to }& f_i(x) \leq 0, \quad i=1,\ldots,m \quad \text{(inequality constraints)}\\ & h_i(x) = 0, \quad i=1,\ldots,p \quad \text{(equality constraints)}  \end{cases}.

    We call

    f_0:\mathbb{R}^n \to \mathbb{R} the objective function;
    x \in \mathbb{R}^n the optimization variable or parameters;
    f_i:\mathbb{R}^n \to \mathbb{R}, i=1,\ldots,m, the inequality constraint functions; and
    h_i:\mathbb{R}^n \to \mathbb{R}, i=1,\ldots,p, the equality constraint functions.

    The domain of (OP) is the intersection

    D = \bigcap_{i=0}^{m} \text{dom} \, f_i \cap \bigcap_{i=1}^{p} \text{dom} \, h_i.

    Feasibility Consider an (OP) as above.
    Feasible point: those x \in D satisfying

    \begin{aligned} f_i(x)&\leq 0\quad\text{for } i =1,\ldots,m\\  h_i(x) &= 0 \text{ for }i=1,\ldots,p. \end{aligned}

    Feasible set: the subset F \subset D consisting of the feasible points.
    Feasible problem: A problem with nonempty feasible set, i.e., F \neq \emptyset.
    Infeasible problem: A problem with empty feasible set; i.e., there are no x \in D which satisfy the inequality and equality constraints.

    Remark.
    1. A feasible problem need not have a solution; e.g., f(x) = e^x has no minimizer nor minimum on \mathbb{R} .
    2. An infeasible problem never has a solution: there are no parameters x to even test.
    Basic Example Consider the problem

    \begin{cases} \text{minimize } & \log(1-x^2-y^2)\\  \text{subject to }& (x-1)^2+(y-1)^2-1\leq0\\ & (x-y-1)^2+(y-1)^2-1\leq0 \end{cases}.

    The objective function is

    \begin{aligned}  f_0(x,y) &= \log(1-x^2-y^2)\\ \text{dom}\,f_0 &= \{ (x,y) \in \mathbb{R}^2 : x^2+y^2<1\} \end{aligned} ,

    The inequality constraint functions are

    \begin{aligned} f_1(x,y) &=(x-1)^2 + (y-1)^2-1\\ f_2(x,y) &=(x-y-1)^2 + (y-1)^2 - 1\\ \text{dom}\,f_1 &= \text{dom}\,f_2 = \mathbb{R}^2. \end{aligned} .

    The domain of the problem is

    D = \text{dom}\, f_0 \cap \text{dom}\, f_1 \cap \text{dom}\, f_2 = \{ x^2+y^2<1\}.

    The feasible set: Let

    \begin{aligned} A &= \text{dom}\,f_0\\ B &= \{(x-1)^2+(y-1)^2-1\leq0\}\\ C &= \{(x-y-1)^2 +(y-1)^2-1\leq0\} \end{aligned} .

    These three sets are depicted in the image below.
    Note that the darkest region given by A \cap B \cap C is the feasible set.
    Can we solve the problem? Noting
    1. \log(1-x^2-y^2) \to -\infty as (x,y) approaches a point on the circle \{ x^2+y^2 = 1 \} , and
    2. such sequences exist in the feasible set,
    we conclude the problem does not have a solution.
    The Feasibility Problem Feasibility problem: Given an (OP) with

    \begin{aligned} &\text{inequality constraint functions } f_i, i = 1,\ldots,m\\ &\text{equality constraint functions } h_i, i = 1, \ldots,p \end{aligned}

    solve

    \begin{cases} \text{find} & x\\ \text{subject to} & f_i(x) \leq 0, i = 1, \ldots, m\\ & h_i(x) = 0, i = 1,\ldots, p \end{cases}.

    Viz.: the feasibility problem determines whether the constraints are consistent.

    Example 1. The problem

    \begin{cases} \text{find} & (x,y)\\ \text{subject to} &f_1(x,y) = x^2 + y^2 - 1 \leq 0\\ &f_2(x,y) = (x-1)^2 + y^2 - 1 \leq 0 \end{cases}

    has a solution since the two inequality constraints describe two intersecting disks.
    This is depicted below.


    Example 2. The problem

    \begin{cases} \text{find} & (x,y)\\ \text{subject to} &f_1(x,y) = x^2 + y^2 - 1 \leq 0\\ &f_2(x,y) = (x-1)^2 + y^2 - 1 \leq 0\\ &h_1(x,y) = (x-\frac{1}{2})^2 + y^2 - 1 =0 \end{cases}

    has no solution since the circle given by h_1=0 lies outside of the intersection of the two disks.
    This is depicted below, where the red circle is given by h_1=0 .
    Optimal Value and Solvability Recall:

    \begin{aligned} F &= \text{ feasible set of problem}\\ D &= \text{ domain of problem }. \end{aligned}

    Optimal value: The value

    p^\star = \inf \{ f_0(x) : x \in F \},

    i.e., p^\star is the greatest lower bound: when finite, it is the largest p \in \mathbb{R} such that p\leq f_0(x) for all x \in F .
    N.B.: p^\star \in \mathbb{R} \cup \{ -\infty, + \infty \} .

    Example: Below depicts the graph of f(x) = \frac{1}{x} on \mathbb{R}_{++} .
    Evidently, \inf \{ \frac{1}{x} : x \in \mathbb{R}_{++} \} = 0 .




    Solvable: When the problem satisfies

    there exists x^\star \in F with f_0(x^\star) = p^\star,

    i.e., the minimum value p^\star is attainable.

    Example: Below depicts the graph of a quartic q(x) .
    The problem of minimizing q(x) on \mathbb{R} is solvable with solution given by the minimal point A .
    N.B.: Point B is a local minimum and hence does not give a solution.


    Remarks.
    1. p^\star = \min\{f_0(x):x \in F \} iff the (OP) is solvable.
      Indeed, \min\{f_0(x):x \in F \} is not well-defined unless the (OP) is solvable.
    2. p^\star need not be finite:
      p^\star = -\infty if f_0 is unbounded below on the feasible set; and
      p^\star = + \infty if the OP is infeasible.
    Standard Form Optimization problems need not be placed in the form we defined them.
    We therefore introduce the following definition.

    (OP) in Standard form:

    \text{(OP)} \begin{cases} \text{minimize } & f_0(x) \\ \text{subject to }& f_i(x) \leq 0, \quad i=1,\ldots,m \\ & h_i(x) = 0, \quad i=1,\ldots,p  \end{cases}.

    (This is how we defined (OP) before.)

    Example: Rewriting in standard form. We can recast more general optimization problems in standard form; e.g., consider

    \text{(OP2)} \begin{cases} \text{maximize } & F_0(x) \\ \text{subject to }& F_i(x) \leq G_i(x), \quad i=1,\ldots,m \\ & H_i(x) = K_i(x), \quad i=1,\ldots,p \end{cases}.

    Indeed, taking
    f_0 = -F_0 (noting \min f_0 = -\max F_0)
    f_i = F_i - G_i for i=1,\ldots,m
    h_i = H_i - K_i for i=1,\ldots,p
    we readily recast (OP2) into the standard form (OP).
    Equivalent Problems Suppose we are given two OP’s: (OP1) and (OP2).
    We say (OP1) and (OP2) are equivalent if: solving (OP1) allows one to solve (OP2), and vice versa.
    N.B.: Two problems being equivalent does not mean the problems are the same nor that they have the same solutions.

    Example. Consider the two problems:

    \begin{cases} \text{minimize} & f(x) = x^2\\ \text{subject to}& x \in [1,2] \end{cases}  \quad \text{and} \quad  \begin{cases} \text{minimize}&g(x) = (x+1)^2+1 \\ \text{subject to}&x \in [0,1] \end{cases}.

    Observing

    x^\star \in [1,2] minimizes f(x) on [1,2]

    iff

    x^\star - 1 \in [0,1] minimizes g(x) on [0,1] ,

    we readily see the two problems are equivalent.
    Indeed, if we find the solution x^\star = 1 to the first problem, we readily obtain the solution x^\star - 1 = 0 to the second problem, and vice versa.
    Change of Variables Suppose \phi:\mathbb{R}^n \to \mathbb{R}^n is an injective function with D \subset \phi(\text{dom} \,\phi).
    Then, under the change of variable x \mapsto \phi(x) , we have

    \text{(OP1)} \begin{cases} \text{minimize} & f_0(x)\\ \text{subject to}& f_i(x) \leq 0, i=1,\ldots,m\\ & h_i(x) = 0, i=1,\ldots,p \end{cases}

    is equivalent to

    \text{(OP2)} \begin{cases} \text{minimize} & f_0(\phi(x))\\ \text{subject to}& f_i(\phi(x)) \leq 0, i=1,\ldots,m\\ & h_i(\phi(x)) = 0, i=1,\ldots,p \end{cases}.

    N.B.: such a change of variables does not change the optimal value p^\star .
    Moreover, injectivity may be dropped.

    Justification. Indeed,
    • if x solves (OP1), then \phi^{-1}(x) solves (OP2).
      (More generally, z such that \phi(z) = x solves (OP2).)
    • if z solves (OP2), then \phi(z) solves (OP1).
    Example Consider the problem

    \begin{cases} \text{minimize} & e^x\\ \text{subject to} & \sqrt{x} - y \leq 0\\ &y-5 \leq 0\\ &x-5\leq 0 \end{cases} .

    In the image below, the shaded region is the feasible set and the curve is the graph of f_0(x) = e^x .
    Consider the change of variables

    \phi(x) = x^2 .

    The objective and constraints change as follows:

    \begin{aligned} f_0(x) = e^x & \to f_0(\phi(x)) = e^{x^2}\\ \sqrt{x}-y \leq 0 & \to |x| - y \leq 0\\ y-5 \leq 0 & \to y - 5 \leq 0\\ x - 5 \leq 0 & \to x^2 - 5 \leq 0 \end{aligned}

    The new feasible region and objective function are plotted below.
    Evidently, this change of variable changed a nonconvex (OP) into a convex one.
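A crude grid search (not a solver; the grid ranges and resolutions are arbitrary) suggests both problems attain the same optimal value p^\star = e^0 = 1, as the equivalence predicts:

```python
import numpy as np

# Crude grid search (not a solver) suggesting both problems attain the
# same optimal value p* = e^0 = 1, as the change-of-variables equivalence
# predicts. Grid ranges/resolutions are arbitrary.
xs = np.linspace(-5.0, 5.0, 1001)
ys = np.linspace(-6.0, 6.0, 241)

# Original problem: minimize e^x s.t. sqrt(x) <= y, y <= 5, x <= 5.
p1 = np.inf
for x in xs[xs >= 0]:  # sqrt requires x >= 0
    if np.any((np.sqrt(x) <= ys) & (ys <= 5) & (x <= 5)):
        p1 = min(p1, np.exp(x))

# Transformed problem: minimize e^(x^2) s.t. |x| <= y, y <= 5, x^2 <= 5.
p2 = np.inf
for x in xs:
    if np.any((np.abs(x) <= ys) & (ys <= 5) & (x**2 <= 5)):
        p2 = min(p2, np.exp(x**2))

assert abs(p1 - 1.0) < 1e-9 and abs(p2 - 1.0) < 1e-9
print("both problems have optimal value 1 on the grid")
```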
    Eliminating Linear Constraints Let A \in \mathbb{R}^{m \times n}, b \in \mathbb{R}^m and x_0 \in \mathbb{R}^n a solution to Ax=b.
    Let B \in \mathbb{R}^{n\times k} be such that \text{range}\, B = \text{kernel}\, A.
    Then Ax=b iff x=By + x_0 for some y \in \mathbb{R}^{k}.

    Consequently

    \text{(OP1)} \begin{cases} \text{minimize} & f_0(x)\\ \text{subject to}& f_i(x) \leq 0, i=1,\ldots,m\\ & Ax = b \end{cases}

    is equivalent to

    \text{(OP2)} \begin{cases} \text{minimize} & f_0(By+x_0)\\ \text{subject to}& f_i(By+x_0) \leq 0, i=1,\ldots,m\\ \end{cases}.

    N.B.: this can reduce dimension of problem by \text{rank}\, A many variables. (Recall: n = \text{rank}\,A + \text{null}\,A.)

    Justification. Indeed,
    • if x solves (OP1), then any y \in \mathbb{R}^k with x = By+x_0 solves (OP2), and
    • if y solves (OP2), then x = By + x_0 solves (OP1)


    Example. Consider the minimization problem

    \begin{cases} \text{minimize} & x^2 + y^2 \\ \text{subject to} &x\geq0\\ & y-x=1 \end{cases}.

    We may eliminate the variable y by simply using y=x+1 .

    But, to match with above: let

    \begin{aligned} f_0(x,y)=x^2 +y^2, &\quad A = \begin{bmatrix}-1&1\end{bmatrix}, \quad b = 1 \\ x_0 = \begin{bmatrix}0\\1\end{bmatrix},&\quad B = \begin{bmatrix} 1\\1\end{bmatrix} \end{aligned} .

    Thus

    A \begin{bmatrix}x\\y\end{bmatrix}=b \iff \begin{bmatrix}x\\y \end{bmatrix} = Bt + x_0 = \begin{bmatrix}t\\t+1\end{bmatrix}

    for some t \in \mathbb{R} , and so

    f_0(Bt + x_0) = f_0(t,t+1) = t^2 + (1+t)^2 .



    Therefore, the minimization problem becomes

    \begin{cases} \text{minimize} & t^2 + (t+1)^2 \\ \text{subject to} &t\geq0\\ \end{cases},

    which has the obvious solution t^\star = 0 with optimal value p^\star = 1 .
    Thus, the original problem has solution

    \begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}t\\t+1\end{bmatrix} = \begin{bmatrix}0\\1\end{bmatrix} .
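The elimination above can be reproduced numerically, using the matrices from the example and a grid over t (a crude check, not a general-purpose routine):

```python
import numpy as np

# The example above, carried out numerically: eliminate y via y - x = 1
# and minimize t^2 + (t+1)^2 over t >= 0 on a grid (a crude check, not
# a general solver).
A = np.array([[-1.0, 1.0]])
b = np.array([1.0])
x0 = np.array([0.0, 1.0])       # particular solution: A @ x0 = b
B = np.array([[1.0], [1.0]])    # range(B) = kernel(A)
assert np.allclose(A @ x0, b) and np.allclose(A @ B, 0)

f0 = lambda v: v[0]**2 + v[1]**2
ts = np.linspace(0.0, 5.0, 5001)               # feasible t >= 0
vals = np.array([f0(B[:, 0] * t + x0) for t in ts])
t_star = ts[int(np.argmin(vals))]
assert abs(t_star) < 1e-9 and abs(vals.min() - 1.0) < 1e-9
print("t* = 0, p* = 1, recovering (x, y) = (0, 1)")
```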

    Slack Variables Given f:\mathbb{R}^n \to \mathbb{R} with f(x) \leq 0 , then there is a variable s \geq 0 such that f(x) + s = 0 ; such a variable s is called a slack variable.

    Using slack variables s_i,i=1\ldots,m , the problem

    \text{(OP1)}\begin{cases} \text{minimize} & f_0(x)\\ \text{subject to}& f_i(x) \leq 0, i=1,\ldots,m\\ & h_i(x) = 0, i = 1,\ldots,p \end{cases}.

    is equivalent to the problem

    \text{(OP2)}\begin{cases} \text{minimize} & f_0(x)\\ \text{subject to} & s_i \geq 0, i = 1,\ldots,m\\ & f_i(x)+s_i = 0, i=1,\ldots,m\\ & h_i(x) = 0, i=1,\ldots,p \end{cases}.



    Remarks.
    1. The x which satisfy the constraints of (OP2) (for some choice of s ) are exactly those which satisfy the constraints of (OP1); this justifies the equivalence.
    2. Let F_1 be the feasible set of (OP1) and F_2 that of (OP2).
      Then F_1 \subset \mathbb{R}^n and F_2 \subset \mathbb{R}^{n+m} ; i.e., the feasible sets are not the same object.
    3. Example: in the images below, the disk depicts a feasible set F_1 = \{ x^2+y^2-1 \leq 0 \} \subset \mathbb{R}^2 and the paraboloid-type set depicts the feasible set F_2 = \{x^2+y^2-1+s =0, s \geq 0 \} with slack variable s .
      N.B.: the permissible (x,y) coordinates are the same for both sets.


    Main point: Solving the system of equations

    \begin{aligned} f_i(x)+s_i &= 0, i=1,\ldots,m\\  h_i(x) &= 0, i=1,\ldots,p \end{aligned}

    and considering only those solutions with s_i \geq 0 may be easier than solving the system of inequalities

    \begin{aligned} f_i(x)&\leq0, i=1,\ldots,m\\  h_i(x) &= 0, i=1,\ldots,p \end{aligned}.



    Example. Consider

    \text{(OP1)} \begin{cases} \text{minimize} & f_0(x,y)\\ \text{subject to} & a_1 x + b_1 y - c_1 \leq 0\\ & a_2 x + b_2 y - c_2 = 0 \end{cases}.

    Introduce slack variable s\geq0 satisfying

    a_1 x + b_1 y - c_1 +s = 0.

    Then (OP1) is equivalent to the problem

    \text{(OP2)} \begin{cases} \text{minimize} & f_0(x,y)\\ \text{subject to} & s\geq0\\ & a_1 x + b_1 y - c_1 + s = 0\\ & a_2 x + b_2 y - c_2 = 0 \end{cases}.

    Thus, finding feasible (x,y,s) is just a matter of solving a system of equations and choosing those (x,y,s) with s\geq0 .

    Moreover, one can solve the problem

    \text{(OP3)} \begin{cases} \text{minimize} & f_0(x,y)\\ \text{subject to} & a_1 x + b_1 y - c_1 + s = 0\\ & a_2 x + b_2 y - c_2 = 0 \end{cases}.

    and just choose solutions with s \geq 0 to obtain solutions to (OP2), and hence (OP1).
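A small numerical illustration of this idea, with made-up coefficients a_1 = b_1 = 1, c_1 = 2 and equality constraint x - y = 0: each choice of s \geq 0 turns the constraints into a square linear system whose solution is feasible for the original problem.

```python
import numpy as np

# Slack-variable illustration with made-up coefficients a1 = b1 = 1,
# c1 = 2 and equality constraint x - y = 0: for each s >= 0, the
# constraints become the square linear system
#   x + y - 2 + s = 0,   x - y = 0,
# and every solution is feasible for the original inequality problem.
M = np.array([[1.0, 1.0], [1.0, -1.0]])
for s in np.linspace(0.0, 4.0, 9):
    x, y = np.linalg.solve(M, np.array([2.0 - s, 0.0]))
    assert x + y - 2 <= 1e-12      # original inequality constraint holds
    assert abs(x - y) < 1e-12      # equality constraint holds
print("each s >= 0 yields a feasible point of the original problem")
```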
    Epigraph Form Recall: \text{epi}\, f = \{(x,t): x \in \text{dom}\,f, t \geq f(x) \}.

    The optimization problem

    \text{(OP1)} \begin{cases} \text{minimize} & f_0(x)\\ \text{subject to}& f_i(x) \leq 0, i=1,\ldots,m\\ & h_i(x) = 0, i=1,\ldots,p \end{cases}

    is equivalent to its epigraph form

    \text{(OP2)} \begin{cases} \text{minimize} & t\\ \text{subject to} &f_0(x) - t \leq0\\ & f_i(x) \leq 0, i=1,\ldots,m\\ & h_i(x) = 0, i=1,\ldots,p \end{cases}

    Viz., minimizing f_0 subject to constraints is equivalent to finding the smallest t such that (x,t) \in \text{epi}\, f_0 for some feasible x .

    Proof by picture. The dark curve and shaded region below indicate the epigraph of a function f .
    The red dot indicates the minimum point (x^\star,p^\star) .
    The black dots indicate points (x^\star,t) \in \text{epi}\,f for different values of t .
    Evidently, the smallest t^\star for which (x^\star,t^\star) \in \text{epi}\, f is given by t^\star = p^\star .
    Fragmenting a Problem Proposition. Given f: \mathbb{R}^n \to \mathbb{R} and sets F,F_1,\ldots,F_q with F = F_1 \cup \cdots \cup F_q, let

    \begin{aligned} p^\star &= \inf \{f(x):x \in F\}\\ p_i^\star &= \inf\{ f(x): x \in F_i\}, \quad i = 1,\ldots, q. \end{aligned}

    Then

    p^\star = \min\{p_i^\star: i = 1,\ldots,q\}.

    (Assuming \min\{-\infty,a\} = -\infty for any real number a .)


    Viz., to minimize a function on a set F, one may instead minimize f over pieces of F and then take the minimum optimal value of this procedure.


    Example. Consider the (OP)

    \text{(OP)} \begin{cases} \text{minimize} & f_0(x)\\ \text{subject to} & x \in F \end{cases}

    where F \subset \text{dom}\, f and where the feasible set F is depicted below.
    Consider breaking up F into three regions F_1,F_2,F_3 as indicated below.
    Now formulate the (OP)’s

    \text{(OPi)} \begin{cases} \text{minimize} & f_0(x)\\ \text{subject to} & x \in F_i \end{cases}

    for i = 1,2,3 , and let p_i^\star be the optimal value for (OPi).
    Using the preceding proposition, the optimal value of (OP) is given by

    p^\star = \min \{ p_1^\star, p_2^\star, p_3^\star \} .

    Conclusion: solving (OP), whose feasible set is not convex, may be achieved by solving three subproblems (OP1),(OP2),(OP3) whose feasible sets are convex.
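The fragmenting proposition in action, as a one-dimensional Python sketch (the function and the pieces are made up for illustration):

```python
import numpy as np

# One-dimensional sketch of the fragmenting proposition (function and
# pieces are made up): minimize f(x) = (x - 1)^2 over the nonconvex set
# F = [-3, -1] U [0, 2] U [3, 4] by minimizing over each convex piece.
f = lambda x: (x - 1.0)**2
pieces = [(-3.0, -1.0), (0.0, 2.0), (3.0, 4.0)]

p_stars = [np.min(f(np.linspace(lo, hi, 10001))) for lo, hi in pieces]
p_star = min(p_stars)  # p* = min of the piecewise optimal values

assert abs(p_star) < 1e-7  # minimizer x = 1 lies in the piece [0, 2]
print("piecewise optimal values:", p_stars, "-> p* =", p_star)
```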
    Basics of Convex Optimization
    Convex Optimization Problems Abstract convex optimization problem: A problem involving minimizing a convex objective function on a convex set.
    Convex optimization problem: a problem of the form

    \text{(COP)} \begin{cases} \text{minimize} & f_0(x)\\ \text{subject to} & f_i(x) \leq 0, i = 1,\ldots,m\\ & a_i^T x = b_i, i =1,\ldots, p \end{cases},

    where
    f_0:\mathbb{R}^n \to \mathbb{R} and f_i:\mathbb{R}^n \to \mathbb{R} are convex; and
    a_i \in \mathbb{R}^n and b_i \in \mathbb{R} are fixed.


    Some Remarks
    Remark 1. As defined, a (COP) is an (OP) in standard form; naturally, there are nonstandard form (OP)’s equivalent to (COP)’s.
    E.g., the abstract (COP)

    \begin{cases} \text{minimize} & f_0(x,y) \\ \text{subject to} & (x+y+1)^2 = 0 \end{cases}

    is readily seen to be equivalent to the standard form (COP)

    \begin{cases} \text{minimize} & f_0(x,y) \\ \text{subject to}& \begin{bmatrix}1&1\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix} + 1 = 0 \end{cases} .



    Remark 2. We emphasize: the equality constraints are assumed to be affine constraints.
    Moreover, the equality constraints

    a_i^T x = b_i , \quad i=1,\ldots,p

    can be rewritten as

    A x = b ,

    where

    \begin{aligned} A &= \begin{bmatrix} a_1^T\\a_2^T\\\vdots\\a_p^T\end{bmatrix} \in \mathbb{R}^{p \times n}, \quad b = \begin{bmatrix}b_1\\b_2\\\vdots\\b_p\end{bmatrix} \in \mathbb{R}^{p} \end{aligned} .



    Remark 3. The affine assumption on the equality constraints can be lifted at the possible expense of an intractable theory/numerical analysis.
    E.g., if h: \mathbb{R}^n \to \mathbb{R} is quasilinear, then h(x) = 0 defines a convex set.


    Remark 4. Generally, h(x) convex does not imply the level set h(x) = 0 is convex; e.g., h(x,y)=x^2+y^2-1 gives a circle, which is not convex.


    Remark 5. The common domain

    D = \bigcap_{i=0}^m \text{dom}\, f_i

    is convex since it is an intersection of convex sets.
    Optimality for Convex Optimization Problems Assume throughout that f_0:\mathbb{R}^n \to \mathbb{R} is the objective function for some given (COP) and that F is the feasible set.

    Proposition 1. If x^\star is a feasible local minimizer for a (COP), then it is a global minimizer for the (COP).
    Proof. We will follow a proof by contradiction; i.e., we will show that assuming x^\star is not a global minimizer leads to a contradiction.
    Step 1. x^\star being a feasible local minimizer means x^\star \in F and that there is a R>0 such that

    f_0(x^\star) = \inf\{ f_0(z) : z \in F, \quad \Vert z-x^\star \Vert_2 \leq R \} ;

    i.e., f_0(x^\star) \leq f_0(z) for all z \in F within distance R of x^\star .
    Step 2. Supposing x^\star is not a global minimizer, then there exists y \in F such that f_0(y) < f_0(x^\star) .
    By choice of R , there must also hold \Vert{y-x^\star}\Vert_2 > R .
    Step 3. Set

    \begin{aligned} z &= (1-t)x^\star + ty\\ t &= \frac{R}{2\Vert y-x^\star \Vert_2}, \end{aligned}

    noting that t \in (0,\tfrac{1}{2}) \subset [0,1] by Step 2. and so z \in F since F is convex.
    It follows that

    \begin{aligned} \Vert z - x^\star \Vert_2 &= \Vert (1-t)x^\star + ty - x^\star \Vert_2\\ &= \Vert t(y-x^\star ) \Vert_2\\ &= \frac{R}{2\Vert y-x^\star\Vert_2} \Vert y-x^\star\Vert_2\\ &= \frac{R}{2}\\ &< R \end{aligned}

    Step 4. Since z is a convex combination of feasible points, since f_0 is convex and since f_0(y)<f_0(x^\star) , there holds

    \begin{aligned}  f_0(z) &= f_0((1-t)x^\star + ty) \\ &\leq (1-t)f_0(x^\star) + t f_0(y)\\ & < (1-t) f_0(x^\star) + t f_0(x^\star)\\ &= f_0(x^\star). \end{aligned}

    But, since x^\star minimizes f_0 on

    \{x \in F : \Vert x - x^\star \Vert_2 \leq R \}

    and since

    \Vert z - x^\star \Vert_2 \leq R

    we also have

    f_0(x^\star) \leq f_0(z).

    This is a contradiction and so x^\star must be a global minimizer.
    Proposition 2. If f_0 is differentiable on F , then x^\star \in F is a minimizer iff for all y \in F there holds

    \nabla f_0(x^\star)^T (y - x^\star) \geq 0 .

    Proof.
    Step 0. N.B.: since f_0 is differentiable and convex on F , then for each x,y \in F there holds

    f_0(y) \geq f_0(x) + \nabla f_0(x)^T (y-x) .

    (C.f., Convex Function Theory.First Order Characterization.)
    Step 1.(\implies ) Suppose x^\star is a minimizer and suppose for contradiction that

    \nabla f_0(x^\star)^T(y-x^\star)<0

    for some y \in F .
    Set z_t = ty+(1-t)x^\star , noting that z_t \in F since F is convex.
    Using

    \frac{d}{dt} f_0(z_t)|_{t=0} = \nabla f_0(x^\star)^T (y-x^\star) < 0 ,

    we conclude f_0 is decreasing near z_0 = x^\star in the direction y-x^\star and so f_0(z_t)< f_0(x^\star) for small t .
    Since z_t \in F , this contradicts x^\star being a minimizer.
    Additional justification Since z_t defines a line passing through x^\star with direction y-x^\star , it follows that \frac{d}{dt} f_0(z_t)|_{t=0} is the directional derivative in direction (y-x^\star), i.e., \nabla f_0(x^\star)^T(y-x^\star).
    Step 2.(\impliedby ) Supposing

    \nabla f_0(x^\star)^T(y-x^\star) \geq 0

    for all y \in F and using the first order characterization at x^\star , namely,

    f_0(y) \geq f_0(x^\star) + \nabla f_0(x^\star)^T(y-x^\star)

    we readily conclude

    f_0(y) \geq f_0(x^\star)

    for all y \in F ; i.e., that x^\star is a minimizer for the problem.
    Corollary In case f_0 is differentiable and F = \text{dom}\,f_0 (equivalently, there are no nontrivial constraints), x^\star \in \text{dom}\,f_0 is a minimizer iff

    \nabla f_0(x^\star) = 0 .

    Proof. By Proposition 2., we have that x^\star \in \text{dom}\,f_0 is a minimizer iff

    \nabla f_0(x^\star)^T(y-x^\star) \geq 0

    for all y \in \text{dom}\, f_0 .
    Differentiability of f_0 requires \text{dom}\,f_0 is open and so, for small t \in \mathbb{R} , there holds

    y: = x^\star - t \nabla f_0(x^\star) \in \text{dom}\,f_0 .

    But then

    \begin{aligned} \nabla f_0(x^\star)^T(y-x^\star) &= \nabla f_0(x^\star)^T(x^\star - t \nabla f_0(x^\star) - x^\star)\\ &= -t\nabla f_0(x^\star)^T\nabla f_0(x^\star)\\ &= -t \Vert \nabla f_0(x^\star) \Vert_2^2\\ &\geq 0, \end{aligned}

    which, for t>0 , holds iff \nabla f_0(x^\star) = 0 .
    Some Examples
    Example 1. Let

    \begin{aligned} Q \in &\boldsymbol{S}_+^n, \quad a \in \mathbb{R}^n,\quad b \in \mathbb{R}\\ f_0(x)&= \frac{1}{2} x^T Q x + a^T x + b. \end{aligned}

    Consider the unconstrained problem:

    \text{(OP)} \begin{cases} \text{minimize} & f_0(x) = \frac{1}{2} x^T Q x + a^T x + b\\ \text{subject to }& x \in \mathbb{R}^n \end{cases} .

    Note that \nabla^2 f_0 = Q \succeq 0 implies f_0 is convex.
    C.f.,Convex Function Theory.Second Order Characterization. By the preceding corollary, we have x^\star is a solution to (OP) iff

    \nabla f_0(x^\star) = Qx^\star + a = 0.

    Thus solvability of (OP) rests on whether -a \in \text{range}\, Q.

    Three cases:
    1. -a \notin \text{range}\,Q \implies f_0 is unbounded below and hence (OP) is unsolvable;
    2. Q \succ 0 \implies Q is invertible and so x^\star = -Q^{-1}a is the unique solution to (OP);
    3. Q \not\succ 0 and -a \in \text{range}\,Q \implies Qx^\star = -a has an affine set of solutions.
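    The three cases can be checked numerically. The following is a minimal sketch (assuming NumPy, with hypothetical Q and a) that uses a least-squares solve to test whether -a \in \text{range}\,Q:

    ```python
    import numpy as np

    def solve_unconstrained_qp(Q, a):
        """Solve min (1/2) x^T Q x + a^T x + b for symmetric Q >= 0.

        Returns one minimizer if -a is in range(Q), else None (f_0 unbounded below).
        """
        # lstsq returns a least-squares solution of Q x = -a ...
        x, *_ = np.linalg.lstsq(Q, -a, rcond=None)
        # ... which solves Q x = -a exactly iff -a lies in range(Q).
        if np.allclose(Q @ x, -a):
            return x
        return None  # -a not in range(Q): case 1, unbounded below

    # Case 2: Q > 0, unique solution x* = -Q^{-1} a.
    Q = np.array([[2.0, 0.0], [0.0, 1.0]])
    a = np.array([-2.0, 1.0])
    x_star = solve_unconstrained_qp(Q, a)
    assert np.allclose(Q @ x_star + a, 0.0)   # gradient vanishes at x*

    # Case 1: Q singular and -a not in range(Q).
    Q_sing = np.array([[1.0, 0.0], [0.0, 0.0]])
    a_bad = np.array([0.0, 1.0])
    assert solve_unconstrained_qp(Q_sing, a_bad) is None
    ```

    In case 3. the same `lstsq` call returns one particular solution of Qx^\star=-a; the full solution set is that point plus \text{null}\,Q.
    
    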


    Example 2. Let

    \begin{aligned} f_0:\mathbb{R}^n \to \mathbb{R}&  \text{ be convex and differentiable}\\ A &\in \mathbb{R}^{m \times n}, \quad b \in \mathbb{R}^{m} \end{aligned}

    and consider the problem

    \text{(OP)} \begin{cases} \text{minimize} & f_0(x)\\ \text{subject to} & Ax = b \end{cases} .

    Using a preceding proposition, x^\star satisfying Ax^\star = b is a minimizer iff

    \nabla f_0(x^\star)^T(y-x^\star) \geq0

    for all y satisfying Ay= b.


    Two cases
    1. Ax=b is an inconsistent system \implies the problem is infeasible.
    2. Ax=b is a consistent system; then

      Ax^\star = b, Ay=b

      \iff

      y = x^\star + x for some x \in \text{null}A .



    In case 2., we have

    \nabla f_0(x^\star)^T(y-x^\star) = \nabla f_0(x^\star)^Tx \geq 0

    for all

    x \in \text{null}A and y = x^\star + x .

    Since \text{null}\,A is a linear space (so x \in \text{null}\,A \implies -x \in \text{null}\,A ), this is possible iff

    \nabla f_0(x^\star)^Tx = 0 for all x \in \text{null}\,A ,

    i.e., iff

    \nabla f_0(x^\star) \perp \text{null}A.

    But (\text{null}\,A)^\perp = \text{range}\,A^T and so this condition means there exists \nu \in \mathbb{R}^m such that

    \nabla f_0(x^\star) + A^T \nu = 0 .

    This is just a Lagrange multiplier condition, as we will see later.
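    For a quadratic objective as in Example 1. (a hypothetical choice here, with Q \succ 0), the two conditions Ax^\star = b and \nabla f_0(x^\star) + A^T\nu = 0 form one linear "KKT" system, which the sketch below (assuming NumPy and made-up data) solves directly:

    ```python
    import numpy as np

    # Hypothetical data: f_0(x) = 1/2 x^T Q x + a^T x with Q > 0, constraint A x = b.
    Q = np.array([[3.0, 1.0], [1.0, 2.0]])
    a = np.array([1.0, -1.0])
    A = np.array([[1.0, 1.0]])     # single equality constraint: x_1 + x_2 = 1
    b = np.array([1.0])

    n, m = Q.shape[0], A.shape[0]
    # Stack the optimality conditions  Q x + a + A^T nu = 0  and  A x = b
    # into one linear system in the unknowns (x, nu).
    K = np.block([[Q, A.T], [A, np.zeros((m, m))]])
    rhs = np.concatenate([-a, b])
    sol = np.linalg.solve(K, rhs)
    x_star, nu = sol[:n], sol[n:]

    # x* is feasible and grad f_0(x*) = -A^T nu, i.e., grad f_0(x*) ⟂ null(A).
    assert np.allclose(A @ x_star, b)
    assert np.allclose(Q @ x_star + a + A.T @ nu, 0.0)
    ```
    
    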
    Linear Programming Linear program: a (COP) of the form

    \text{(LP)} \begin{cases} \text{minimize} & c^T x \\ \text{subject to} & Gx \preceq h\\ & Ax= b \end{cases} ,

    where

    \begin{aligned} &G \in \mathbb{R}^{m \times n}, \quad A \in \mathbb{R}^{p\times n}\\ &h \in \mathbb{R}^m, \quad b \in \mathbb{R}^p, \quad x \in \mathbb{R}^n. \end{aligned}

    The feasible set F is a polyhedron (see below).

    Recall (\preceq ): For a,b \in \mathbb{R}^n the vector inequality

    a \preceq b

    means

    a_1 \leq b_1, \, a_2 \leq b_2, \, \ldots, \, a_n \leq b_n .



    Different than: A,B \in \mathbb{R}^{n \times n} satisfying the matrix inequality A \preceq B , which means B - A is positive semidefinite.
    Determining the Feasible set:
    Step 1. (A x = b ) Given A \in \mathbb{R}^{p \times n} , b \in \mathbb{R}^p , then

    \{x: Ax=b \}

    is an affine subspace of \mathbb{R}^n or empty.


    Step 2. Given \gamma \in \mathbb{R}^n, \eta \in \mathbb{R}, then

    \{x:\gamma^Tx \leq \eta \}

    is a half space in \mathbb{R}^n .


    Step 3. (G x \preceq h ) Given

    g_i \in \mathbb{R}^n, \quad G = \begin{bmatrix} g_1^T \\ g_2^T \\ \vdots \\ g_m^T \end{bmatrix} \in \mathbb{R}^{m\times n}, \quad Gx = \begin{bmatrix} g_1^Tx\\g_2^Tx\\ \vdots \\ g_m^Tx \end{bmatrix}

    Step 2. implies

    \{x : G x \preceq h \} = \{x : g_i^Tx\leq h_i, i = 1,\ldots,m\}

    is a finite intersection of half spaces.


    Step 4. Steps 1. and 3. imply the feasible set F of (LP) is the finite intersection of half spaces and an affine space, i.e., F is a polyhedron.
    (c.f. Convex Geometry.Polyhedra.)
    Example. Let

    \begin{aligned} c&=\begin{bmatrix}0.1\\1\end{bmatrix},\, G= \begin{bmatrix} 0.8&0.8\\ -1&-1\\ 0&-1\\ 0&1 \end{bmatrix},\, h= \begin{bmatrix} 4\\-3\\-1\\3 \end{bmatrix} \end{aligned}.

    Consider the resulting (LP):

    \text{(LP)} \begin{cases} \text{minimize} & c^T x \\ \text{subject to} & Gx\preceq h \end{cases}.

    Explicitly, this (LP) is given by

    \text{(LP)} \begin{cases} \text{minimize} & 0.1x_1 + x_2\\ \text{subject to} & 0.8x_1 + 0.8x_2 \leq 4\\ & -x_1-x_2 \leq -3\\ &-x_2\leq -1\\ &x_2\leq3 \end{cases}.

    The feasible set F and the graph of the objective function over F are indicated in the image below.
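    This small (LP) can also be solved numerically; a minimal sketch assuming SciPy's linprog (the bounds argument frees both variables, which linprog otherwise constrains to be nonnegative):

    ```python
    import numpy as np
    from scipy.optimize import linprog

    c = np.array([0.1, 1.0])
    G = np.array([[0.8, 0.8],
                  [-1.0, -1.0],
                  [0.0, -1.0],
                  [0.0, 1.0]])
    h = np.array([4.0, -3.0, -1.0, 3.0])

    # linprog solves min c^T x subject to A_ub x <= b_ub.
    res = linprog(c, A_ub=G, b_ub=h, bounds=[(None, None)] * 2)
    assert res.success
    print(res.x, res.fun)   # optimal vertex (2, 1) with value 1.2
    ```

    The optimum sits at the vertex where the constraints x_1+x_2 \geq 3 and x_2 \geq 1 are both active, as the picture of F suggests.
    
    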
    Remarks.
    1. Given d \in \mathbb{R} , have equivalent problem with objective c^Tx + d .
      Indeed,

      \begin{aligned} \min\{c^Tx + d :x \in F\} &= \min\{ c^Tx: x \in F \} + d\\ \text{argmin}\{c^Tx + d :x \in F\} &=\text{argmin}\{ c^Tx: x \in F \}. \end{aligned}

      WLOG: can assume d=0 to solve problem.


    2. Since

      \begin{aligned} \max\{c^Tx :x \in F\} &= -\min\{ -c^Tx: x \in F \}\\ \text{argmax}\{c^Tx :x \in F\} &=\text{argmin}\{ -c^Tx: x \in F \}. \end{aligned}

      one also calls the problem of maximizing c^Tx over a polyhedron a (LP).
    Example: Integer Linear Programming Relaxation. Integer Linear Program: an (OP) of the form

    \text{(ILP)} \begin{cases} \text{minimize} & c^T x\\ \text{subject to}& Gx \preceq h\\ & x \in \mathbb{Z}^n \end{cases}

    where

    \begin{aligned} &G \in \mathbb{R}^{m \times n},\qquad h \in \mathbb{R}^m\\ \mathbb{Z}^n : &= \{x \in \mathbb{R}^n : x_i \text{ an integer for each } i=1,\ldots,n\} \end{aligned}

    The constraint x \in \mathbb{Z}^n is suitable for parameters which take on discrete quantities.
    N.B.: An (ILP) is not a convex problem, but may be approximated by one (see below).

    The feasible set Let

    F = \{x \in \mathbb{Z}^n: Gx \preceq h \}

    denote the feasible set of an (ILP).
    Then F is just the collection of integer vectors in the polyhedron

    P = \{ x \in \mathbb{R}^n : Gx \preceq h \} .



    Example. Consider an (ILP) with constraints given by (x,y) \in \mathbb{Z}^2 satisfying

    \begin{aligned} &-x\leq0\\ &-y\leq0\\ &y-2x\leq1\\ &y\leq2.5\\ &y+0.9x\leq4 \end{aligned}.

    The image below depicts the feasible set F (the collection of dots) and polyhedron P (shaded region).
    Remarks.
    1. Can of course also impose equality constraints Ax=b :

      \begin{cases} \text{minimize} & c^T x\\ \text{subject to}& Gx \preceq h\\ & Ax = b\\ & x \in \mathbb{Z}^n\\ \end{cases}

    2. If we impose x_i \in \{0,1\} , then the problem is called a boolean linear program:

      \text{(BLP)} \begin{cases} \text{minimize} & c^T x\\ \text{subject to}& Gx \preceq h\\ & x_i \in \{0,1\} \end{cases}

      Suitable for when coordinates of x indicate whether something is “off” or “on” or a decision is “no” or “yes”.
      Can also use x_i \in \{-1,1\} instead.
    Relaxation of (ILP) The LP

    \text{(LP)} \begin{cases} \text{minimize} & c^T x\\ \text{subject to}& Gx \preceq h \end{cases}

    is called a relaxation of the (ILP) and is a convex approximation.
    Important points:
    1. The “tightest” convex relaxation is given by

      F=\{ x \in \mathbb{Z}^n : Gx \preceq h \} \to \text{convex hull of } F.

      Generally speaking, finding the convex hull is not an efficient way of approaching (ILPs).
    2. The relaxation (LP) of (ILP) is generally easier to solve, though exact algorithms exist for (ILP).
    3. If

      \begin{aligned} p_{LP}^\star &= \text{ optimal value for (LP)}\\ p_{ILP}^\star &= \text{ optimal value for (ILP)} \end{aligned}

      then p_{LP}^\star \leq p_{ILP}^\star.
      (Indeed, the relaxation has a larger feasible set.)
    4. If x^\star \in \mathbb{Z}^n solves the (LP), then it solves the (ILP).
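    Point 3. can be illustrated on the polyhedron from the (ILP) example above; a sketch assuming SciPy's linprog and a hypothetical cost vector c = (-1,-1) (i.e., maximizing x+y), with the (ILP) solved by brute force over a bounding box:

    ```python
    import itertools
    import numpy as np
    from scipy.optimize import linprog

    # Polyhedron P = {(x, y) : G z <= h} from the (ILP) example above.
    G = np.array([[-1.0, 0.0],
                  [0.0, -1.0],
                  [-2.0, 1.0],
                  [0.0, 1.0],
                  [0.9, 1.0]])
    h = np.array([0.0, 0.0, 1.0, 2.5, 4.0])
    c = np.array([-1.0, -1.0])   # hypothetical objective: maximize x + y

    # Optimal value of the LP relaxation.
    res = linprog(c, A_ub=G, b_ub=h, bounds=[(None, None)] * 2)
    p_lp = res.fun

    # Brute force over the integer points in a bounding box (fine for this tiny P).
    feasible = [z for z in itertools.product(range(6), repeat=2)
                if np.all(G @ np.array(z, float) <= h + 1e-9)]
    p_ilp = min(c @ np.array(z, float) for z in feasible)

    assert p_lp <= p_ilp + 1e-9   # the relaxation lower-bounds the (ILP)
    ```
    
    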
    Relaxation of (BLP) Explicitly, a (BLP) is of the form

    \text{(BLP)} \begin{cases} \text{minimize} & c^T x\\ \text{subject to}& Gx \preceq h\\ &x_i \in \{0,1\}, i=1,\ldots,n \end{cases}.

    The (LP)

    \text{(LP)} \begin{cases} \text{minimize} & c^T x\\ \text{subject to}& Gx \preceq h\\ &0 \leq x_i \leq 1, i=1,\ldots,n \end{cases}

    is called a relaxation of the (BLP).
    N.B.: the relaxation

    F \to \{x \in [0,1]^n: Gx \preceq h \}

    generally provides a better approximation than the relaxation

    F \to \{x \in \mathbb{R}^n: Gx \preceq h \} .

    Indeed, the former biases approximate solutions to be close to being binary.
    Example. Problem Given m workers and n locations with m \leq n ,
    • assign each worker to work at some location
    • assign at most one worker to a location
    • minimize cost of operation and transportation
    Notational set up

    \begin{aligned} c_j &= \text{cost to operate at location }j\\ c_{ij} &= \text{cost to transport worker } i \text{ to location }j\\ x_j &=  \begin{cases} 0 & \text{ if location }j\text{ is not operating}\\ 1 & \text{ if location }j\text{ is operating} \end{cases}\\ x_{ij} &=  \begin{cases} 0 & \text{ if worker }i \text{ is not working at location }j\\ 1 & \text{ if worker }i \text{ is working at location }j \end{cases}. \end{aligned}

    Let

    \begin{aligned} \text{Cost vectors} & \begin{cases} c &= (c_j) \in \mathbb{R}^n\\ C &= (c_{ij}) \in \mathbb{R}^{m \times n} \end{cases}\\ \text{Optimization variables} &\begin{cases} x &= (x_j) \in \{0,1\}^n\\ X &= (x_{ij}) \in \{0,1\}^{m \times n} \end{cases} \end{aligned}.



    Construct objective function
    We find

    \begin{aligned} \text{total operational cost }&= c^Tx = c_1x_1 + \cdots +c_nx_n\\ \text{total transportation cost }&= \text{tr}\,(C^TX) = \sum_{i=1}^m \sum_{j=1}^n c_{ij}x_{ij}\\ \text{total cost }&= c^Tx + \text{tr}\,(C^TX). \end{aligned}

    Thus the objective function is f_0(x,X) = c^Tx + \text{tr}\,(C^TX).

    Construct constraints
    The constraint that x_j,x_{ij} are binary is of course natural for the problem.
    Since each worker is assigned to exactly one location, we have

    \sum_{j=1}^n x_{ij} = 1, \quad \text{ for each } i=1,\ldots,m.

    Next, observe x_j=0 \implies x_{ij}=0 since the j th location not operating means it cannot host a worker; thus we have x_{ij} \leq x_j .
    Lastly, since at most one worker may be assigned to a location, we have

    \sum_{i=1}^m x_{ij} \leq 1, \quad \text{ for each } j=1,\ldots,n.

    Formulate Problem
    Putting everything together, the (BLP) formulation of the problem is

    \begin{cases} \text{minimize} & c^Tx + \text{tr}\,(C^TX)\\ \text{subject to} & x \in \{0,1\}^n\\ &X \in \{0,1\}^{m \times n}\\ &\sum_{j=1}^n x_{ij} =1, i = 1,\ldots,m\\ &\sum_{i=1}^m x_{ij} \leq 1, j = 1,\ldots,n\\ & x_{ij} \leq x_j, i=1,\ldots,m,j=1,\ldots,n \end{cases}.

    An (LP) relaxation (and hence convex approximation) of this (BLP) is

    \begin{cases} \text{minimize} & c^Tx + \text{tr}\,(C^TX)\\ \text{subject to} & x \in [0,1]^n\\ &X \in [0,1]^{m \times n}\\ &\sum_{j=1}^n x_{ij} =1, i = 1,\ldots,m\\ &\sum_{i=1}^m x_{ij} \leq 1, j = 1,\ldots,n\\ & x_{ij} \leq x_j, i=1,\ldots,m,j=1,\ldots,n \end{cases}.
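    For a tiny hypothetical instance (m = 2 workers, n = 3 locations, made-up cost data), the (BLP) can be solved by brute-force enumeration, which gives a ground truth to compare relaxations against. The constraint that at most one worker sits at a location is enforced by requiring distinct assignments:

    ```python
    import itertools
    import numpy as np

    # Hypothetical tiny instance: m = 2 workers, n = 3 locations.
    c = np.array([4.0, 1.0, 2.0])              # c_j: cost to operate location j
    C = np.array([[1.0, 3.0, 2.0],             # c_ij: transport worker i -> location j
                  [2.0, 1.0, 4.0]])
    m, n = C.shape

    best = None
    # Enumerate all binary (x, X); brute force is fine at this size.
    for x in itertools.product((0, 1), repeat=n):
        for rows in itertools.product(range(n), repeat=m):
            if len(set(rows)) < m:             # at most one worker per location
                continue
            X = np.zeros((m, n))
            for i, j in enumerate(rows):
                X[i, j] = 1.0                  # worker i works at location j
            if np.any(X > np.array(x)):        # enforce x_ij <= x_j
                continue
            cost = c @ np.array(x, float) + np.trace(C.T @ X)
            if best is None or cost < best[0]:
                best = (cost, x, rows)

    print(best)   # (cost, opened locations, assignment)
    ```
    
    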



    Quadratic Programming Quadratic program: an (OP) of the form

    \text{(QP)} \begin{cases} \text{minimize}&\frac{1}{2}x^TQx + q^Tx\\ \text{subject to} &Gx \preceq h\\ &Ax =b \end{cases}

    where

    \begin{aligned} &Q \in \boldsymbol{S}^n, \quad G \in \mathbb{R}^{m \times n}, \quad A \in \mathbb{R}^{p \times n}\\ &q \in \mathbb{R}^n, \quad h \in \mathbb{R}^m,\quad  b \in \mathbb{R}^p. \end{aligned}

    If Q \in \boldsymbol{S}_+^n , then the problem is convex (see remarks).

    Remarks.
    1. As for (LP), the constraints

      \begin{cases} Gx \preceq h\\ Ax =b \end{cases}

      describe a polyhedron.
    2. Given d \in \mathbb{R} , then

      \begin{aligned} \min\{\frac{1}{2}x^TQx + q^Tx + d :x \in F\} &= \min\{ \frac{1}{2}x^TQx + q^Tx: x \in F \} + d\\ \text{argmin}\{\frac{1}{2}x^TQx + q^Tx + d :x \in F\} &= \text{argmin}\{ \frac{1}{2}x^TQx + q^Tx: x \in F \} \\ \end{aligned}

      WLOG: can assume d=0 to solve problem.
    3. If

      f_0(x) = \frac{1}{2}x^TQx + q^Tx ,

      then

      \begin{aligned} \nabla f_0(x) &= Qx + q\\ \nabla^2 f_0(x) &= Q. \end{aligned}

      Thus Q \succeq 0 implies f_0 is convex.
      N.B.: The factor \frac{1}{2} is just a convenient normalization.
    4. The generalization

      \begin{aligned} &Gx \preceq h \, \to \, \frac{1}{2} x^T Q_ix+g_i^Tx \leq h_i \\ &Q_i \in \mathbb{R}^{n\times n}, g_i \in \mathbb{R}^n, h_i \in \mathbb{R}, \quad i =1,\ldots,m \\ \end{aligned}

      results in quadratically constrained quadratic programming (QCQP).
      Imposing Q,Q_i \in \boldsymbol{S}_+^n ensures the (QCQP) is convex.
      (C.f., Convex Function Theory.Level Sets.)
    5. The generalization

      \begin{aligned} &Ax=b \, \to \, \frac{1}{2} x^T P_ix+a_i^Tx =b_i  \\ &P_i \in \mathbb{R}^{n\times n}, a_i \in \mathbb{R}^n, b_i \in \mathbb{R}, \quad i =1,\ldots,m \\ \end{aligned}

      also results in a (QCQP), but this can break convexity.
      E.g., The quadratic constraints

      x_i(x_i-1)=0 \text{ or } x_i^2 = 1

      result in a (BLP) since these constraints enforce x_i \in \{0,1\} or x_i \in \{-1,1\}, respectively.
    6. There holds

      \begin{aligned} \text{(QCQP)} + Q_i = P_i= 0  &\iff (QP)\\ (QP) + Q=0 &\iff (LP). \end{aligned}


    7. The assumption Q \in \boldsymbol{S}^n (i.e., Q = Q^T ) is not a serious restriction.
      Indeed, first note that, since x^TQx is a scalar, we have

      x^TQx = (x^TQx)^T = x^TQ^Tx .



      Let

      f(x) = \frac{1}{2}x^TQx + q^T x .



      Thus,

      \begin{aligned} f(x) &= \frac{f(x) + f(x)}{2}\\ &= \frac{1}{2}\left( \frac{1}{2}x^T Q x + q^Tx + \frac{1}{2}x^T Q^T x + q^Tx \right)\\ &=\frac{1}{2}\left( \frac{1}{2}x^T(Q+Q^T)x + 2 q^Tx \right)\\ &=:\frac{1}{2}x^T \tilde{Q} x + q^T x, \end{aligned}

      where

      \tilde{Q} = \frac{1}{2}(Q+Q^T) .



      Lastly, observe the symmetry of \tilde{Q} :

      \tilde{Q}^T = \left( \frac{1}{2}(Q+Q^T) \right)^T = \frac{1}{2}(Q^T+Q) = \tilde{Q}.



      Thus every quadratic

      f(x) = \frac{1}{2}x^TQx + q^Tx

      has a “symmetric representation” of the form

      f(x) = \frac{1}{2}x^T\tilde{Q}x + q^Tx with \tilde{Q} \in \boldsymbol{S}^n.
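    The symmetric representation is easy to verify numerically; a short NumPy check with a random nonsymmetric Q:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    Q = rng.standard_normal((4, 4))        # generic, non-symmetric Q
    q = rng.standard_normal(4)
    Q_tilde = 0.5 * (Q + Q.T)              # symmetric representation

    assert np.allclose(Q_tilde, Q_tilde.T)
    for _ in range(5):
        x = rng.standard_normal(4)
        # the quadratic form is unchanged by symmetrization
        f = 0.5 * x @ Q @ x + q @ x
        f_sym = 0.5 * x @ Q_tilde @ x + q @ x
        assert np.isclose(f, f_sym)
    ```
    
    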

    Example: Least Squares Least squares: an unconstrained (QP) of the form

    \text{(LS)} \begin{cases} \text{minimize}&\Vert Ax-b\Vert_2^2= x^TA^TAx-2b^TAx + b^Tb \end{cases}

    where A \in \mathbb{R}^{m \times n} and b \in \mathbb{R}^m .
    Features:
    • WLOG: may assume columns of A are linearly independent, and so

      m \geq n .

    • N.B.: the least-norm solution to (LS) is in general given by

      x^\star = A^{\dagger}b ,

      where A^\dagger is the pseudo-inverse (aka Moore-Penrose inverse) of A .
    • Under the WLOG assumption, there holds

      x^\star = (A^TA)^{-1}A^Tb .



    Recall (definition of \Vert \cdot \Vert_2 ) For x \in \mathbb{R}^n , the notation \Vert x \Vert_2 means the vector norm

    \Vert x \Vert_2 := \sqrt{x_1^2 + \cdots + x_n^2 } .

    Thus

    \Vert x- y \Vert_2

    is the distance between the two vectors x,y \in \mathbb{R}^n .
    Rough Justification of WLOG Let

    \begin{aligned} A  &= \begin{bmatrix} a_1 & a_2 & a_3 \end{bmatrix} \in \mathbb{R}^{2 \times 3}, \quad a_1,a_2,a_3 \in \mathbb{R}^2 \\ x &=  \begin{bmatrix}  x_1 \\ x_2 \\ x_3 \end{bmatrix}, \quad b =  \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}. \end{aligned}

    N.B.: a set of three 2-dimensional vectors is always linearly dependent and so

    a_3 = c_1 a_1 + c_2 a_2

    for some c_1,c_2 \in \mathbb{R}
    Therefore, we may write

    A=: \begin{bmatrix} a & \alpha & u \\ b & \beta & v  \end{bmatrix} = \begin{bmatrix} a & \alpha & c_1 a + c_2 \alpha\\ b & \beta & c_1 b + c_2 \beta \end{bmatrix} .



    Computing

    \begin{aligned} Ax &= \begin{bmatrix} a & \alpha & c_1 a + c_2 \alpha\\ b & \beta & c_1 b + c_2 \beta \end{bmatrix} \begin{bmatrix}  x_1 \\ x_2 \\ x_3 \end{bmatrix}\\ &= \begin{bmatrix} ax_1 + \alpha x_2 + c_1 a x_3 + c_2 \alpha x_3\\ bx_1 + \beta x_2 + c_1 b x_3 + c_2 \beta x_3 \end{bmatrix}\\ &= \begin{bmatrix} a(x_1 + c_1 x_3) + \alpha (x_2 + c_2 x_3)\\ b(x_1 + c_1 x_3) + \beta(x_2 + c_2 x_3) \end{bmatrix} \end{aligned}

    we see that minimizing

    \Vert Ax-b \Vert_2^2 = \bigg\Vert \begin{bmatrix} a(x_1 + c_1 x_3) + \alpha (x_2 + c_2 x_3)\\ b(x_1 + c_1 x_3) + \beta(x_2 + c_2 x_3) \end{bmatrix} - \begin{bmatrix} b_1\\b_2 \end{bmatrix} \bigg\Vert_2^2

    is equivalent to minimizing

    \begin{aligned} \Vert \hat{A}y - b \Vert_2^2 &:= \bigg\Vert \begin{bmatrix} a&\alpha\\b&\beta \end{bmatrix} \begin{bmatrix}y_1\\y_2\end{bmatrix}-\begin{bmatrix}b_1\\b_2\end{bmatrix} \bigg\Vert_2^2\\ &= \bigg\Vert \begin{bmatrix} ay_1 + \alpha y_2\\ by_1 + \beta y_2 \end{bmatrix} -\begin{bmatrix}b_1\\b_2\end{bmatrix} \bigg\Vert_2^2 \end{aligned}.

    Equivalence follows from the change of variables

    \begin{aligned} y_1 &= x_1 + c_1 x_3\\ y_2 &= x_2 + c_2 x_3  \end{aligned} .



    Therefore, a (LS) problem with matrix of size 2 \times 3 is equivalent to a (LS) problem with matrix of size 2 \times 2 .

    This argument holds in general: if A has linearly dependent columns, can use change of variables to ensure linear independence.
    Remarks.
    1. If Ax=b has solution x^\star , then x^\star solves the (LS) problem.

    2. If Ax=b has no solution, then any x^\star solving the (LS) problem gives a “best” estimate solution to Ax = b, where “best” is chosen to mean in terms of the vector norm \Vert \cdot \Vert_2 .

    3. While minimizing \Vert \cdot \Vert_2^2 is equivalent to minimizing \Vert \cdot \Vert_2 , the exponent 2 ensures \Vert \cdot \Vert_2^2 is differentiable everywhere (including at 0 ).
      (C.f.: |x| is not differentiable at x=0 , but x^2 is.)
    Solving the (LS)
    Step 1. (The problem is convex) The objective function

    f_0(x) = \Vert Ax-b\Vert_2^2 = x^TA^TAx - 2b^TAx+b^Tb

    is convex since

    \nabla^2 f_0 = 2 A^TA \succeq 0 .

    To see A^TA \succeq 0 , observe:
    1. (A^T A)^T = A^T A and so A^TA \in \boldsymbol{S}^n , i.e., A^TA is symmetric;
    2. for all z \in \mathbb{R}^n there holds

      z^T A^T A z = (Az)^T Az = \Vert Az \Vert_2^2 \geq 0 ,

      i.e., A^TA \in \boldsymbol{S}_+^n .
    Step 2. (The critical points) We compute

    \begin{aligned} \nabla f_0 &= \nabla (x^T A^TA x) - \nabla(2b^TAx) + \nabla(b^Tb)\\ &= 2 A^TAx - 2A^Tb + 0\\ &= 2(A^TAx - A^Tb). \end{aligned}

    Therefore, \nabla f_0 = 0 iff

    A^TAx = A^T b .

    This system of equations is known as the normal equations.
    Step 3. (The solution) By corollary proved above: x^\star is a solution iff \nabla f_0(x^\star) = 0 .
    Moreover, A having linearly independent columns ensures that A^TA is invertible:
    1. linearly independent columns \implies Az=0 iff z=0.
    2. A^TAz = 0 \iff Az=0 \iff z = 0 .
    3. Since A^TA is square, can conclude A^TA is invertible.


    Lastly: since

    \nabla f_0(x^\star)= 0 iff A^TA x^\star = A^T b

    and since A^TA is invertible, we conclude that the solution to the (LS) is

    x^\star = (A^TA)^{-1}A^Tb .
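    The normal-equation solution can be cross-checked against library least-squares routines; a sketch assuming NumPy and randomly generated (A, b) with m > n:

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((8, 3))   # m >= n; columns (almost surely) independent
    b = rng.standard_normal(8)

    # Solve the normal equations A^T A x = A^T b directly ...
    x_ne = np.linalg.solve(A.T @ A, A.T @ b)
    # ... and compare against library least-squares and pseudo-inverse routines.
    x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
    x_pinv = np.linalg.pinv(A) @ b

    assert np.allclose(x_ne, x_lstsq) and np.allclose(x_ne, x_pinv)
    # the gradient 2(A^T A x - A^T b) vanishes at the solution
    assert np.allclose(A.T @ (A @ x_ne - b), 0.0)
    ```

    In practice one solves the normal equations (or better, uses a QR/SVD routine as `lstsq` does) rather than forming (A^TA)^{-1} explicitly.
    
    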

    Example: Distance Between Convex Sets Let K_1,K_2 \subset \mathbb{R}^n be convex subsets.
    Nearest Point Problem (NPP): Among x \in K_1, y \in K_2 , which pair (x,y) minimizes the distance \Vert x-y \Vert_2^2 ?



    The (NPP) may be expressed as a standard form (COP).
    Indeed, supposing

    \begin{aligned} K_1 &= \{ f_1(x) \leq 0 \}\\ K_2 &= \{ f_2(x) \leq 0 \}, \end{aligned}

    with f_1,f_2 convex, then the (NPP) for K_1,K_2 is the (COP)

    \begin{cases} \text{minimize} & \Vert x - y \Vert_2^2\\ \text{subject to} & f_1(x) \leq 0 \\ &f_2(y) \leq 0 \end{cases}.

    N.B.: the feasible set is

    \begin{aligned} F&=\{(x,y) \in \mathbb{R}^n \times \mathbb{R}^n : f_1(x) \leq 0, f_2(y) \leq 0 \}\\ &=K_1 \times K_2. \end{aligned}

    Viz.: (x,y) \in F iff x \in K_1, y \in K_2 .

    When the K_1,K_2 are polyhedra, then the (NPP) is a (QP).
    Indeed, supposing

    \begin{aligned} K_1 &= \{ G_1 x \preceq h_1 \}\\ K_2 &= \{ G_2 x \preceq h_2 \}, \end{aligned}

    then the (NPP) for K_1,K_2 is the (QP) given by

    \begin{cases} \text{minimize} & \Vert x - y \Vert_2^2\\ \text{subject to} & G_1 x \preceq h_1\\ & G_2 y \preceq h_2 \end{cases}.

    Can formulate as constrained least squares problem: defining

    \begin{aligned}  B \in \mathbb{R}^{p \times n},\, C \in \mathbb{R}^{q \times n} &\mapsto B \oplus C =  \begin{bmatrix} B & \vline & 0\\ \hline 0 & \vline & C \end{bmatrix} \in \mathbb{R}^{(p+q) \times 2n}, \\ v \in \mathbb{R}^p,\, w \in \mathbb{R}^q &\mapsto  v \oplus w = \begin{bmatrix} v_1 & \cdots & v_p & w_1 & \cdots & w_q \end{bmatrix}^T \in \mathbb{R}^{p + q}\\ A &= \begin{bmatrix} Id_{n\times n} &\vline & - Id_{n\times n}\\ \hline 0 &\vline& 0 \end{bmatrix} \in \mathbb{R}^{2n\times 2n} \end{aligned}

    we readily see that the problem is equivalent to

    \begin{cases} \text{minimize} & \Vert A(x\oplus y) \Vert_2^2\\ \text{subject to} & (G_1 \oplus G_2) (x \oplus y) \preceq h_1 \oplus h_2 \end{cases}.
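    As an illustration, the (NPP) for two hypothetical boxes can be solved with a general-purpose solver; a sketch assuming SciPy's SLSQP method (a generic local solver, not a dedicated QP code), with each box written as \{z : Gz \preceq h\}:

    ```python
    import numpy as np
    from scipy.optimize import minimize

    # Hypothetical polyhedra: K1 = [0,1]^2 and K2 = [2,3] x [0,1].
    G1 = np.vstack([np.eye(2), -np.eye(2)]); h1 = np.array([1.0, 1.0, 0.0, 0.0])
    G2 = np.vstack([np.eye(2), -np.eye(2)]); h2 = np.array([3.0, 1.0, -2.0, 0.0])

    def objective(z):                    # z = x ⊕ y; squared distance ||x - y||^2
        x, y = z[:2], z[2:]
        return np.sum((x - y) ** 2)

    # SLSQP inequality constraints require fun(z) >= 0, i.e., h - G z >= 0.
    cons = [{"type": "ineq", "fun": lambda z: h1 - G1 @ z[:2]},
            {"type": "ineq", "fun": lambda z: h2 - G2 @ z[2:]}]
    z0 = np.array([0.5, 0.5, 2.5, 0.5])  # feasible starting point
    res = minimize(objective, z0, constraints=cons, method="SLSQP")

    # The boxes are separated by a gap of width 1, so the minimal distance is 1.
    assert np.isclose(res.fun, 1.0, atol=1e-4)
    ```
    
    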



    Polyhedral Approximation. Let K_1,K_2 be nonempty convex domains, not necessarily polyhedral. Let P_1,P_2,Q_1,Q_2 be polyhedral domains such that

    \begin{aligned} &P_1 \subset K_1 \subset Q_1\\ &P_2 \subset K_2 \subset Q_2. \end{aligned}

    Then the (NPP) for the pairs [P_1,P_2] and [Q_1,Q_2] are (QP) approximations of the (NPP) for [K_1,K_2] and provide, respectively, upper and lower bounds on the optimal value of the original (NPP).
    Geometric Programming Monomial function: given

    c > 0,\, a_i \in \mathbb{R},\, i =1,\ldots,n ,

    a function of the form

    \begin{aligned} f&: \mathbb{R}^n \to \mathbb{R}\\ f(x)&= c x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n}\\ \text{dom}\,f&=\mathbb{R}_{++}^n. \end{aligned}

    Example.

    3x_1^2x_2^{3.4}x_3^{-\pi}.

    Posynomial: given

    c_k > 0,\, a_{ik}\in \mathbb{R},\, i=1,\ldots,n,\, k=1,\ldots,K ,

    a function of the form

    \begin{aligned} f&: \mathbb{R}^n \to \mathbb{R}\\ f(x)&= \sum_{k=1}^K c_k x_1^{a_{1k}} x_2^{a_{2k}} \cdots x_n^{a_{nk}}\\ \text{dom}\,f&=\mathbb{R}_{++}^n. \end{aligned}

    Example.

    f(x) = \sqrt{2}x_1x_2\sqrt{x_3} + 3 x_1^4x_2^{-4}x_3^{-1.5}+x_5

    Geometric Programming: An (OP) of the form

    \text{(GP)} \begin{cases} \text{minimize}&f_0(x)\\ \text{subject to}&f_i(x) \leq 1, \, i=1,\ldots,m\\ & h_i(x) = 1, \, i=1,\ldots,p \end{cases},

    where

    \begin{aligned} f_0,f_1,\ldots,f_m & \text{ are posynomials}\\ h_1,\ldots,h_p & \text{ are monomials}. \end{aligned}

    N.B.: this is an undercover (COP).
    Remarks.
    1. Let

      \begin{aligned} Posy_n &= \text{ set of posynomials on } \mathbb{R}_{++}^n\\ Mon_n &= \text{ set of monomials on } \mathbb{R}_{++}^n. \end{aligned}

      Since any monomial is a posynomial, we have Mon_n \subset Posy_n .
    2. If

      \begin{aligned}  \lambda &\geq 0\\ p(x),q(x) &\in Posy_n\\ f(x),g(x) &\in Mon_n , \end{aligned}

      then

      \begin{aligned} \lambda p(x),\, p(x) + q(x),\, p(x) q(x),\, \frac{p(x)}{f(x)} \in Posy_n\\ \lambda f(x),\, f(x)g(x),\, \frac{f(x)}{g(x)} \in Mon_n. \end{aligned}

      Thus Mon_n is a group under multiplication, while Posy_n forms a convex cone generated by Mon_n .
    3. As usual, we use the language “writing in standard form” to refer to writing an equivalent (OP) written in the form (GP) above.

      General (OPs) clearly equivalent to a (GP) may be called a geometric program in nonstandard form.

      For example, the geometric program

      \begin{cases} \text{maximize}&f_0(x)\\ \text{subject to} & f_i(x) \leq g_i(x)\\ & h_i(x) = k_i(x) \end{cases}

      with

      \begin{aligned} f_i  &\text{ are posynomials}\\ f_0,h_i,g_i\not\equiv 0 ,k_i\not\equiv0 & \text{ are monomials} \end{aligned}

      is readily rewritten as a standard form (GP):

      \begin{cases} \text{minimize}&\frac{1}{f_0(x)}\\ \text{subject to} & \frac{f_i(x)}{g_i(x)} \leq 1\\ & \frac{h_i(x)}{k_i(x)} = 1 \end{cases}.

    Rewriting (GP) as a (COP) General (GPs) are not convex (e.g., f_0(x) = \sqrt{x} ).
    However, any (GP) is easily recast as a (COP) via change of variable.

    Step 1. (The change of variable) We will write x \mapsto y to mean the change of variable given by

    x_i = e^{y_i}.

    Step 2. (Monomials \to convex function) Let

    \begin{matrix} c>0, \quad b = \log c,& f(x) \in Mon_n,\\  a = \begin{bmatrix} a_1 & \cdots & a_n \end{bmatrix}^T \in \mathbb{R}^n,\quad & f(x) = c x_1^{a_1}x_2^{a_2} \cdots x_n^{a_n}. \end{matrix}

    Under the change of variable x \mapsto y:

    \begin{aligned} f(x) &= f(x_1,\ldots,x_n)\\ &= f(e^{y_1},\ldots, e^{y_n})\\ &= c(e^{y_1})^{a_1} \cdots (e^{y_n})^{a_n}\\ &= e^{\log c}e^{a_1y_1}\cdots e^{a_ny_n}\\ &= e^{a^Ty+b} \end{aligned}

    But

    F\, convex \implies e^F\, convex

    and so e^{a^Ty+b} is convex since affine functions are convex.
    Step 3. (Posynomial \to convex function) For k=1,\ldots,K , let

    \begin{matrix} c_k>0, \quad b_k = \log c_k,& f(x) \in Posy_n,\\  a_k = \begin{bmatrix} a_{1k} & \cdots & a_{nk} \end{bmatrix}^T \in \mathbb{R}^{n},\quad & f(x) = \sum_{k=1}^K c_k x_1^{a_{1k}}\cdots x_{n}^{a_{nk}}. \end{matrix}

    By Step 2., there holds

    \begin{aligned} f(x) = f(y) = \sum_{k=1}^K e^{a_k^Ty + b_k}, \end{aligned}

    which is again a convex function.
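    The identity in Step 3. can be verified numerically for a hypothetical posynomial; a short NumPy sketch with c = (2, 3) and exponent vectors a_1 = (1, 0.5), a_2 = (-1, 2):

    ```python
    import numpy as np

    # Hypothetical posynomial f(x) = 2 x1 x2^0.5 + 3 x1^-1 x2^2 on R_{++}^2.
    c = np.array([2.0, 3.0])
    a = np.array([[1.0, 0.5], [-1.0, 2.0]])   # one exponent vector per row
    b = np.log(c)

    def f(x):
        return sum(ck * np.prod(x ** ak) for ck, ak in zip(c, a))

    rng = np.random.default_rng(2)
    for _ in range(5):
        x = rng.uniform(0.1, 5.0, size=2)
        y = np.log(x)                         # change of variable x_i = e^{y_i}
        # posynomial value equals the sum of exponentials in the y variables
        assert np.isclose(f(x), np.sum(np.exp(a @ y + b)))
    ```
    
    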
    Step 4. ((GP) \to (COP)) We explicitly write the (GP) as:

    \begin{cases} \text{minimize}&f_0(x) = \sum_{k=1}^{K_0} c_{0k} x_1^{a_{01k}}\cdots x_n^{a_{0nk}}\\ \text{subject to}& f_i(x) = \sum_{k=1}^{K_i} c_{ik} x_1^{a_{i1k}} \cdots x_n^{a_{ink}} \leq 1, \, i=1,\ldots,m\\ & h_i(x) =d_{i}x_1^{b_{i1}}\cdots x_n^{b_{in}} = 1, \, i=1,\ldots,p \end{cases}.

    Let

    \begin{aligned} a_{ik} &= \begin{bmatrix} a_{i1k} & \cdots & a_{ink} \end{bmatrix}^T \in \mathbb{R}^n\\ b_i &= \begin{bmatrix} b_{i1} & \cdots & b_{in} \end{bmatrix}^T \in \mathbb{R}^n\\ \alpha_{ik} &= \log c_{ik}\\ \delta_i &= \log d_{i}  \end{aligned}.

    Under the change of variable x \mapsto y , this (GP) becomes the (COP)

    \begin{cases} \text{minimize}&f_0(y) = \sum_{k=1}^{K_0} e^{a_{0k}^T y + \alpha_{0k}}\\ \text{subject to}& f_i(y) = \sum_{k=1}^{K_i}  e^{a_{ik}^T y + \alpha_{ik}} \leq 1, \, i=1,\ldots,m\\ & h_i(y) = e^{b_{i}^Ty + \delta_i} = 1, \, i=1,\ldots,p \end{cases}.

    Step 5. ((GP) in convex form) At last, since exponentiation may result in unreasonably large numbers, it is customary to take logarithms, resulting in the geometric problem in convex form:

    \begin{cases} \text{minimize}& \log \left( \sum_{k=1}^{K_0} e^{a_{0k}^T y + \alpha_{0k}} \right)\\ \text{subject to}& \log \left(\sum_{k=1}^{K_i}  e^{a_{ik}^T y + \alpha_{ik}}\right) \leq 0, \, i=1,\ldots,m\\ & b_{i}^Ty + \delta_i = 0, \, i=1,\ldots,p \end{cases}.

    N.B.:
    1. The log-sum-exp function u \mapsto \log \left( \sum_{k} e^{u_k} \right) is convex, and composing it with the affine maps y \mapsto a_{ik}^Ty + \alpha_{ik} preserves convexity, so the problem is still convex.
    2. The constraints are equivalent since \log is monotonic and injective on \mathbb{R}_{++}.
    Example. (Taken from Boyd-Kim-Vandenberghe-Hassibi: A tutorial on geometric programming)
    Problem Maximize the volume of a box with
    • a limit on total wall area;
    • a limit on total floor area; and
    • upper and lower bounds on the aspect ratios height/width and depth/width .
    Notational set up

    \begin{aligned} \text{Optimization Variables}& \begin{cases} w &= \text{width}\\ h &= \text{height}\\ d &= \text{depth} \end{cases}\\ \text{Problem Parameters}& \begin{cases} A_{\text{wall}} &= \text{max wall area}\\ A_{\text{floor}} &= \text{max floor area}\\ \alpha_{1},\alpha_2 &= \text{lower and upper aspect ratio bounds for }h/w\\ \beta_{1},\beta_2 &= \text{lower and upper aspect ratio bounds for }d/w \end{cases} \end{aligned}

    Construct objective function
    The volume of the box is

    hwd

    and so the objective function is

    f_0(h,w,d) = hwd .

    N.B.: f_0 \in Mon_3



    Construct constraints

    \begin{aligned} \text{wall area bound }:&\quad 2hw+2hd \leq A_{\text{wall}}\\ \text{floor area bound }:&\quad wd \leq A_{\text{floor}}\\ \text{aspect ratio bounds }:&\quad \alpha_1 \leq \frac{h}{w} \leq \alpha_2\\ &\quad \beta_1 \leq \frac{d}{w} \leq \beta_2 \end{aligned}

    N.B.:

    \begin{aligned} 2hw+2hd &\in Posy_3\\ wd,\, hw^{-1},\, dw^{-1} &\in Mon_3 \end{aligned}

    Formulate Problem
    Putting everything together, we realize the problem may be formulated as the following (GP):

    \begin{cases} \text{maximize} & hwd\\ \text{subject to} & 2hw+2hd\leq A_{\text{wall}}\\ & wd \leq A_{\text{floor}}\\ & \alpha_1 \leq hw^{-1} \leq \alpha_2\\ & \beta_1 \leq dw^{-1} \leq \beta_2 \end{cases}.

    To write the problem in standard form: note the following equivalence of constraints

    \begin{aligned} 2hw+2hd\leq A_{\text{wall}} & \iff A_{\text{wall}}^{-1}2hw + A_{\text{wall}}^{-1}2hd \leq 1\\  wd \leq A_{\text{floor}} & \iff A_{\text{floor}}^{-1}wd \leq 1\\ &\\ \alpha_1 \leq hw^{-1} \leq \alpha_2 &\iff \begin{array}{l} \alpha_2^{-1}hw^{-1} \leq 1 \\  \alpha_1 h^{-1}w \leq 1 \end{array}\\ \beta_1 \leq dw^{-1} \leq \beta_2 & \iff \begin{array}{l} \beta_2^{-1} dw^{-1} \leq 1\\ \beta_1 d^{-1}w \leq 1 \end{array} \end{aligned}

    Moreover, maximizing hwd is equivalent to minimizing h^{-1}w^{-1}d^{-1} . Therefore, the problem in standard form is given by

    \begin{cases} \text{minimize} & h^{-1}w^{-1}d^{-1}\\ \text{subject to} & A_{\text{wall}}^{-1}2hw +A_{\text{wall}}^{-1}2hd \leq 1\\ & A_{\text{floor}}^{-1} wd \leq 1\\ &  \alpha_2^{-1} hw^{-1}\leq 1 \\ & \alpha_1 wh^{-1} \leq 1\\ & \beta_2^{-1}dw^{-1} \leq 1\\ & \beta_1 wd^{-1} \leq 1 \end{cases}.
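The constraint rewriting above can be sanity-checked numerically. Below is a small pure-Python sketch (the parameter values and sample points are assumptions, not from the notes) verifying that the original constraints and the standard-form monomial/posynomial constraints accept exactly the same designs:

```python
# Verify (on sample data) that the original box-design constraints and
# the standard-form constraints are equivalent.
A_wall, A_floor = 100.0, 40.0   # assumed parameter values
a1, a2 = 0.5, 2.0               # aspect ratio bounds for h/w
b1, b2 = 0.5, 2.0               # aspect ratio bounds for d/w

def original_feasible(h, w, d):
    return (2*h*w + 2*h*d <= A_wall
            and w*d <= A_floor
            and a1 <= h/w <= a2
            and b1 <= d/w <= b2)

def standard_feasible(h, w, d):
    # every constraint rewritten as (monomial or posynomial) <= 1
    return (2*h*w/A_wall + 2*h*d/A_wall <= 1
            and w*d/A_floor <= 1
            and (h/w)/a2 <= 1
            and a1*(w/h) <= 1
            and (d/w)/b2 <= 1
            and b1*(w/d) <= 1)

for h, w, d in [(2, 3, 4), (5, 5, 5), (1, 10, 1), (4, 4, 8)]:
    assert original_feasible(h, w, d) == standard_feasible(h, w, d)
```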

    Semidefinite Programming (Heavily influenced by Vandenberghe-Boyd Semidefinite Programming.)
    Linear matrix inequality (LMI): given

    \begin{aligned}  &F_0,F_1,\ldots,F_n \in \boldsymbol{S}^m\\ &x = \begin{bmatrix}x_1 & \cdots & x_n \end{bmatrix}^T \in \mathbb{R}^n\\ F(x) &:= F_0 + x_1 F_1 + \cdots + x_n F_n \end{aligned}

    an inequality of the form

    F(x) \succeq 0 .

    Recall: for A \in \boldsymbol{S}^m , we write A \succeq 0 to mean A is positive semidefinite, i.e., z^TAz \geq 0 for all z \in \mathbb{R}^m.

    Semidefinite program (SDP): an (OP) of the form

    \text{(SDP)} \begin{cases} \text{minimize} & c^Tx\\ \text{subject to} & F(x) \succeq 0, \end{cases}

    where

    \begin{aligned}  &F_0,F_1,\ldots,F_n \in \boldsymbol{S}^m\\ F(x) &:= F_0 + x_1 F_1 + \cdots + x_n F_n\\ c & \in \mathbb{R}^n \end{aligned}

    N.B.: The LMI F(x) \succeq 0 defines a feasible set which is convex and hence (SDPs) are convex problems.

    Convexity of Feasible Set. To see that (SDP) is a convex problem, first note: if

    t>0 and A \succeq 0,

    then

    z^T(tA)z = t ( z^TAz) \geq 0

    and so

    tA \succeq 0 .

    Next, observe: for x,y feasible and t \in [0,1] , the function

    F(x) = F_0 + x_1 F_1 + \cdots + x_n F_n

    evaluated at the convex combination

    tx + (1-t)y

    is

    \begin{aligned}  F(tx +(1-t)y) &= F_0 + (tx_1 + (1-t)y_1) F_1 + \cdots + (tx_n + (1-t)y_n)F_n. \end{aligned}

    Expanding, rearranging and using

    F_0 = tF_0 + (1-t)F_0

    gives:

    \begin{aligned}  F(tx +(1-t)y) &= tF_0  + tx_1F_1 + \cdots tx_nF_n \\ &+ (1-t)F_0 + (1-t)y_1F_1 + \cdots  + (1-t)y_nF_n\\ &= tF(x) + (1-t)F(y). \end{aligned}

    Using F(x),F(y)\succeq 0 , together with the fact that nonnegative scalings and sums of positive semidefinite matrices are positive semidefinite ( z^T(A+B)z = z^TAz + z^TBz \geq 0 ), we conclude

    F(tx+(1-t)y) = tF(x) + (1-t)F(y) \succeq 0 .

    Thus, x,y feasible \implies tx + (1-t)y feasible for t \in [0,1], i.e., the feasible set is convex.
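The affinity computation above can be illustrated numerically. Here is a minimal pure-Python sketch (the 2×2 matrices F_0, F_1, F_2 and the points x, y are assumed sample data) checking F(tx+(1-t)y) = tF(x)+(1-t)F(y) and the resulting PSD-ness of the convex combination:

```python
# F(x) = F0 + x1*F1 + x2*F2 with 2x2 symmetric matrices (sample data).
F0 = [[2.0, 0.0], [0.0, 2.0]]
F1 = [[1.0, 0.5], [0.5, 0.0]]
F2 = [[0.0, 0.0], [0.0, 1.0]]

def F(x):
    # entrywise F0 + x[0]*F1 + x[1]*F2
    return [[F0[i][j] + x[0]*F1[i][j] + x[1]*F2[i][j] for j in range(2)]
            for i in range(2)]

def is_psd_2x2(M):
    # 2x2 symmetric M is PSD iff both diagonal entries and det are >= 0
    return M[0][0] >= 0 and M[1][1] >= 0 and M[0][0]*M[1][1] - M[0][1]*M[1][0] >= 0

x, y, t = [0.5, 0.2], [-0.3, 0.8], 0.4
lhs = F([t*x[0] + (1-t)*y[0], t*x[1] + (1-t)*y[1]])
rhs = [[t*F(x)[i][j] + (1-t)*F(y)[i][j] for j in range(2)] for i in range(2)]
# F is affine in x, so the two sides agree entrywise ...
assert all(abs(lhs[i][j] - rhs[i][j]) < 1e-12 for i in range(2) for j in range(2))
# ... and the convex combination of feasible points stays feasible.
assert is_psd_2x2(F(x)) and is_psd_2x2(F(y)) and is_psd_2x2(lhs)
```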
    Example 1. LPs are SDPs Consider the (LP)

    \begin{cases} \text{minimize} & c^Tx\\ \text{subject to} & Ax + b \succeq 0  \end{cases},

    where

    A \in \mathbb{R}^{m \times n}, \quad b \in \mathbb{R}^{m}, \quad c \in \mathbb{R}^n

    and

    Ax+b \succeq 0

    means componentwise inequality.

    Given v = \begin{bmatrix}v_1 & \cdots & v_m \end{bmatrix}^T \in \mathbb{R}^m , define

    \text{diag}(v) =  \begin{bmatrix} v_1 & 0 & \cdots & 0\\ 0 & v_2 &\cdots &0\\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & v_m \end{bmatrix}.



    Since A \in \boldsymbol{S}^m satisfies A \succeq 0 iff A has nonnegative eigenvalues, we have

    v \succeq 0 \iff \text{diag}(v) \succeq 0.

    (Indeed, the eigenvalues of \text{diag}(v) are the components of v .) Therefore,

    \begin{aligned} Ax+b \succeq 0  &\iff \text{diag}(Ax+b) \succeq 0 \\ \text{(vector inequality)} & \iff \text{(matrix inequality)}. \end{aligned}



    Letting

    A=\begin{bmatrix}a_1 & \cdots & a_n \end{bmatrix}, \quad a_i \in \mathbb{R}^m ,

    we have

    Ax + b = b + x_1 a_1 + \cdots + x_n a_n .



    Therefore, using

    \begin{aligned} \text{diag}(v+\lambda w) &=  \begin{bmatrix}  v_1 + \lambda w_1 &0 & \cdots & 0\\ 0 & v_2 + \lambda w_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & v_m + \lambda w_m  \end{bmatrix}\\ &= \begin{bmatrix}  v_1  &0 & \cdots & 0\\ 0 & v_2  & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & v_m  \end{bmatrix} + \lambda \begin{bmatrix}  w_1 &0 & \cdots & 0\\ 0 &  w_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots &  w_m  \end{bmatrix}\\ &= \text{diag}(v) + \lambda \text{diag}(w) \end{aligned}

    we have

    \begin{aligned}  \text{diag}(Ax + b) &= \text{diag}(b + x_1 a_1 + \cdots + x_n a_n)\\ &= \text{diag}(b) + x_1 \text{diag}(a_1) + \cdots + x_n \text{diag}(a_n). \end{aligned}



    Therefore, defining

    \begin{aligned}  F_0 &= \text{diag}(b), \quad F_i = \text{diag}(a_i)\\ F(x) &= F_0 + x_1 F_1 + \cdots + x_n F_n = \text{diag}(Ax+b), \end{aligned}

    we conclude

    \begin{aligned} Ax+b \succeq 0 \iff \text{diag}(Ax+b) \succeq 0 \iff F(x) \succeq 0 \end{aligned} .



    In conclusion, we have that the (LP)

    \begin{cases} \text{minimize} & c^Tx\\ \text{subject to} & Ax + b \succeq 0  \end{cases},

    is equivalent to the (SDP)

    \begin{cases} \text{minimize} & c^Tx\\ \text{subject to} & F(x)= \text{diag}(Ax+b) \succeq 0  \end{cases}.
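The key observation — componentwise nonnegativity of Ax+b is the same as positive semidefiniteness of diag(Ax+b) — can be spot-checked in a few lines of Python (the matrix A and vector b below are assumed sample data):

```python
# Componentwise Ax + b >= 0 holds iff diag(Ax + b) is PSD: the
# eigenvalues of a diagonal matrix are its diagonal entries.
A = [[1.0, 0.0], [-1.0, 2.0], [0.0, 1.0]]   # sample 3x2 data
b = [1.0, 0.5, -0.25]

def affine(x):
    # the vector Ax + b
    return [sum(A[i][j]*x[j] for j in range(2)) + b[i] for i in range(3)]

def diag_psd(v):
    # diag(v) >= 0 iff every diagonal entry (= eigenvalue) is nonnegative
    return all(vi >= 0 for vi in v)

for x in ([1.0, 1.0], [0.0, 0.0], [-2.0, 0.1]):
    v = affine(x)
    componentwise = all(vi >= 0 for vi in v)
    assert componentwise == diag_psd(v)
```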

    Example 2. Nonlinear OP as an SDP Consider the nonlinear (COP)

    \text{(OP1)} \begin{cases} \text{minimize}& \frac{(c^Tx)^2}{d^Tx}\\ \text{subject to} & Ax + b  \succeq 0\\ & d^Tx > 0 \end{cases}

    where

    \begin{aligned} &c,d \in \mathbb{R}^n\\ &A \in \mathbb{R}^{m \times n}, b \in \mathbb{R}^m. \end{aligned}

    We will recast (OP1) as an (SDP).
    N.B.:
    1. The constraint d^Tx > 0 fixes the domain of the objective (an open halfspace) and ensures the objective is convex there.
    2. On the halfspace d^Tx < 0 , the objective would instead be concave.
    To begin, recall we may recast the problem in epigraph form:

    \text{(OP1)} \begin{cases} \text{minimize}& t\\ \text{subject to} & Ax + b  \succeq 0\\ & d^Tx > 0\\ & \frac{(c^Tx)^2}{d^Tx} \leq t \end{cases}.

    N.B.: this introduces the new optimization variable t . Goal: find a symmetric matrix-valued function

    F(x,t) = F_0 + x_1F_1 + \cdots + x_nF_n + t F_{n+1}

    such that the constraints

    \begin{cases} &Ax + b  \succeq 0\\ & d^Tx > 0\\ & \frac{(c^Tx)^2}{d^Tx} \leq t \end{cases}

    may be recast as the LMI

    \begin{cases} F(x,t) \succeq 0 \end{cases}.

    Idea: we know

    Ax + b  \succeq 0 \iff \text{diag}(Ax+b) \succeq 0 .

    On the other hand:

    \begin{aligned} \frac{(c^Tx)^2}{d^Tx} \leq t &\iff (c^Tx)^2 \leq td^Tx \\ &\iff td^Tx - (c^Tx)^2 \geq0  \end{aligned} .

    Recall: if \gamma>0 , then

    \begin{bmatrix}\alpha&\beta\\\beta&\gamma\end{bmatrix}\succeq0 \iff \alpha\geq0, \alpha\gamma - \beta^2 \geq 0

    Therefore, given d^Tx>0, we have

    \begin{aligned} \frac{(c^Tx)^2}{d^Tx} \leq t & \iff \begin{bmatrix}t & c^Tx\\ c^Tx & d^Tx \end{bmatrix} \succeq 0 \end{aligned} .
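This 2×2 equivalence is easy to spot-check numerically. Below is a small sketch (the sample values of c^Tx, d^Tx and t are assumptions for illustration):

```python
# With d^T x > 0: (c^T x)^2 / (d^T x) <= t iff [[t, c^Tx], [c^Tx, d^Tx]] is PSD.
def psd_2x2(a, bb, g):
    # [[a, bb], [bb, g]] is PSD iff a >= 0, g >= 0 and a*g - bb^2 >= 0
    return a >= 0 and g >= 0 and a*g - bb*bb >= 0

# sample (c^T x, d^T x, t) triples, including boundary cases
for ctx, dtx, t in [(1.0, 2.0, 0.5), (1.0, 2.0, 0.4), (3.0, 1.0, 9.0), (3.0, 1.0, 8.9)]:
    lhs = ctx**2 / dtx <= t
    rhs = psd_2x2(t, ctx, dtx)
    assert lhs == rhs
```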

    Using that

    \begin{bmatrix} A &\vline &0\\ \hline 0 & \vline& B \end{bmatrix} \succeq 0 \iff A \succeq 0 , B \succeq 0,

    we therefore introduce

    E =  \begin{bmatrix} \text{diag}(Ax+b) & 0 & 0\\ 0 & t  & c^Tx\\ 0 & c^Tx & d^Tx \end{bmatrix}

    to capture the problem's constraints.
    Indeed, evidently, E \succeq 0 iff

    \text{diag}(Ax+b) \succeq 0 and \begin{bmatrix} t & c^Tx\\c^Tx & d^Tx \end{bmatrix} \succeq 0 .

    Therefore,

    \begin{array}{l}  Ax + b  \succeq 0\\  d^Tx > 0\\  \frac{(c^Tx)^2}{d^Tx} \leq t \end{array} \iff \begin{array}{l} \begin{bmatrix} \text{diag}(Ax+b) & 0 & 0\\ 0 & t  & c^Tx\\ 0 & c^Tx & d^Tx \end{bmatrix} \succeq 0 \end{array} .

    This is enough to conclude (OP1) may be recast as an (SDP).

    To make it clearer, introduce the notation

    A = \begin{bmatrix} a_1 & \cdots & a_n \end{bmatrix}, \quad a_i \in \mathbb{R}^m

    and (m+2)\times(m+2) matrices

    \begin{aligned} F_0 &= \text{diag}(b,0,0)  \\ F_i & =  \begin{bmatrix} \text{diag}(a_i) &0 &0\\ 0&0&c_i\\ 0&c_i&d_i \end{bmatrix}\\ F_{n+1} & = \begin{bmatrix}0_{m \times m} &0 &0\\0 & 1 & 0\\0 & 0 & 0\end{bmatrix}\\ \end{aligned}.

    Then

    \begin{aligned} \begin{bmatrix} \text{diag}(Ax+b) & 0 & 0\\ 0 & t  & c^Tx\\ 0 & c^Tx & d^Tx \end{bmatrix} = F_0 + x_1F_1 + \cdots + x_nF_n + t F_{n+1}:=F(x,t) \end{aligned}

    and so (OP1) is equivalent to the (SDP)

    \begin{cases} \text{minimize}& t\\ \text{subject to} & F(x,t) \succeq 0. \end{cases}
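The assembly of F(x,t) from the pieces F_0, F_i, F_{n+1} can be verified directly. Here is a pure-Python sketch (the data A, b, c, d below are assumed sample values) building F(x,t) term by term and comparing it with the block matrix above:

```python
# Build F(x,t) = F0 + x1*F1 + ... + xn*Fn + t*F_{n+1} and compare it
# against the block matrix [[diag(Ax+b), 0, 0], [0, t, c^Tx], [0, c^Tx, d^Tx]].
m, n = 2, 2
A = [[1.0, 2.0], [0.0, -1.0]]   # sample data
b = [0.5, 1.0]
c = [1.0, -1.0]
d = [2.0, 0.5]

def zeros(k):
    return [[0.0]*k for _ in range(k)]

def F_i(i):
    # the (m+2) x (m+2) piece attached to variable x_i
    M = zeros(m + 2)
    for r in range(m):
        M[r][r] = A[r][i]            # diag(a_i) block
    M[m][m + 1] = M[m + 1][m] = c[i] # off-diagonal c_i entries
    M[m + 1][m + 1] = d[i]           # d_i entry
    return M

def F(x, t):
    M = zeros(m + 2)
    for r in range(m):
        M[r][r] = b[r]               # F0 = diag(b, 0, 0)
    M[m][m] = t                      # t * F_{n+1}
    for i in range(n):
        Fi = F_i(i)
        for r in range(m + 2):
            for s in range(m + 2):
                M[r][s] += x[i]*Fi[r][s]
    return M

x, t = [1.0, 0.5], 3.0
M = F(x, t)
Ax_b = [sum(A[r][j]*x[j] for j in range(n)) + b[r] for r in range(m)]
cTx = sum(c[j]*x[j] for j in range(n))
dTx = sum(d[j]*x[j] for j in range(n))
assert [M[r][r] for r in range(m)] == Ax_b     # diag(Ax+b) block
assert M[m][m] == t and M[m][m+1] == cTx and M[m+1][m+1] == dTx
```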

    Lagrangian Duality

    Throughout, let (OP) denote a given optimization problem of the form

    \text{(OP)} \begin{cases} \text{minimize} & f_0(x)\\ \text{subject to} & f_i(x) \leq 0 , \quad i=1,\ldots,m\\ & h_i(x) = 0, \quad i = 1,\ldots,p \end{cases}.

    Recall:

    \begin{aligned} \text{Problem domain: }& D := \bigcap_{i=0}^m \text{dom}\, f_i \cap \bigcap_{i=1}^p \text{dom}\,h_i\\ \text{Optimal value: } & p^\star := \inf\{f_0(x): x \in D , x \text{ feasible}\}. \end{aligned}

    The Lagrange Dual Lagrangian: the function

    \begin{aligned}  L&:\mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^p \to \mathbb{R} \\ \text{dom}\,L &= D \times \mathbb{R}^m \times \mathbb{R}^p \end{aligned}

    given by

    \begin{aligned} L(x,\lambda,\nu) = f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x) \end{aligned} .

    N.B.: for each fixed x \in D , the function

    (\lambda,\nu) \mapsto L(x,\lambda,\nu)

    is affine.

    Lagrange multipliers: the variables \lambda_i and \nu_i .
    The vectors

    \begin{aligned} \lambda :=  \begin{bmatrix}\lambda_1\\\lambda_2\\\vdots\\\lambda_m\end{bmatrix}, \quad  \nu := \begin{bmatrix}\nu_1\\\nu_2\\\vdots\\\nu_p\end{bmatrix} \end{aligned}

    are called dual variables.

    Lagrange dual function: the function

    \begin{aligned} g&:\mathbb{R}^m \times \mathbb{R}^p \to \mathbb{R}\\ \end{aligned}

    given by

    g(\lambda,\nu) = \inf\{L(x,\lambda,\nu): x \in D\} .

    Those (\lambda,\nu) \in \mathbb{R}^m_+ \times \mathbb{R}^p satisfying

    g(\lambda,\nu) > -\infty

    are called dual feasible.
    N.B.: as an infimum of affine functions, g is automatically concave.

    Proposition. For

    \lambda \in \mathbb{R}^m_+, \quad \nu \in \mathbb{R}^p ,

    there holds

    g(\lambda,\nu) \leq p^\star,

    where p^\star is the optimal value for the given (OP).
    Proof.
    1. Let x \in D be feasible.
      Then

      \begin{aligned} f_i(x) &\leq 0, \quad i=1,\ldots,m\\ h_i(x) &=0, \quad i=1,\ldots,p. \end{aligned}



    2. Let \lambda \succeq 0 and \nu be arbitrary.
      Then feasibility of x implies

      \begin{aligned} \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x) = \sum_{i=1}^m \lambda_i f_i(x) \leq 0. \end{aligned}

      Consequently,

      \begin{aligned}  L(x,\lambda,\nu) = f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x) \leq f_0(x). \end{aligned}



    3. Therefore, for all feasible x , for \lambda \succeq 0 and for arbitrary \nu , there holds

      \begin{aligned} g(\lambda,\nu) = \inf\{L(z,\lambda,\nu):z \in D \} \leq L(x,\lambda,\nu) \leq f_0(x) \end{aligned}

      and so

      g(\lambda,\nu) \leq p^\star .

      (Indeed, g(\lambda,\nu) is a lower bound of f_0(x) and p^\star is the greatest lower bound of f_0(x) .)
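The lower-bound property is easy to watch on a toy instance. The following sketch (the one-dimensional problem is an assumed example, not from the notes) takes minimize x^2 subject to 1 - x \leq 0, whose optimal value is p^\star = 1 at x = 1; here L(x,\lambda) = x^2 + \lambda(1-x) is minimized at x = \lambda/2, giving g(\lambda) = \lambda - \lambda^2/4:

```python
# Weak duality on the toy problem: minimize x^2 s.t. 1 - x <= 0, p* = 1.
p_star = 1.0

def g(lam):
    # inf over x of x^2 + lam*(1 - x), attained at x = lam/2
    return lam - lam**2 / 4.0

# g(lam) <= p* for every lam >= 0 ...
for lam in [0.0, 0.5, 1.0, 2.0, 3.0, 10.0]:
    assert g(lam) <= p_star + 1e-12

# ... and the bound is tight here: g(2) equals p*.
assert abs(g(2.0) - p_star) < 1e-12
```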


    Lagrangian as underestimator.
    (See CO 5.1.4)
    Define the indicator functions

    I_-(t)= \begin{cases} 0 &t \leq 0\\ +\infty & t> 0 \end{cases},\qquad I_0(t)= \begin{cases} 0 & t=0\\ +\infty & t\neq0 \end{cases}.

    Then

    \begin{aligned} I_-(f_i(x)) &\text{ indicates when the constraint }f_i(x) \leq 0 \text{ is violated}\\ I_0(h_i(x)) &\text{ indicates when the constraint }h_i(x) = 0 \text{ is violated} \end{aligned}

    and the (OP) is equivalent to

    \begin{aligned} \begin{cases} \text{ minimize } f_0(x) + \sum_{i=1}^mI_-(f_i(x)) + \sum_{i=1}^p I_0(h_i(x)). \end{cases} \end{aligned}

    N.B.: the terms in

    \begin{aligned}\sum_{i=1}^mI_-(f_i(x)) + \sum_{i=1}^p I_0(h_i(x)) \end{aligned}

    act as penalties for breaking the desired constraints.

    N.B.: for any x \in D (feasible or not) and (\lambda,\nu) \in \mathbb{R}_+^m\times\mathbb{R}^p , we have

    \begin{aligned} \lambda_i f_i(x) &\leq I_-(f_i(x))\\ \nu_i h_i(x) &\leq I_0(h_i(x)) \end{aligned}

    and hence

    \begin{aligned}L(x,\lambda,\nu) & = f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x) \\ &\leq f_0(x) + \sum_{i=1}^mI_-(f_i(x)) + \sum_{i=1}^p I_0(h_i(x))\end{aligned} .

    Viz., L is an underestimator of the objective function

    f_0(x) + \sum_{i=1}^mI_-(f_i(x)) + \sum_{i=1}^p I_0(h_i(x))

    obtained by “softening” or “weakening” the penalty functions:

    \begin{aligned} I_-(t) & \to ct, \quad c\geq 0\\ I_0(t) & \to bt, \quad b \in \mathbb{R}. \end{aligned}



    In particular, for each (\lambda,\nu) \in \mathbb{R}_+^m \times \mathbb{R}^p , the problem

    \begin{cases} \text{minimize } L(x,\lambda,\nu) \end{cases}

    has optimal value g(\lambda,\nu) , which underestimates the optimal value of the original (OP).
    Example 1 Consider the least squares problem

    (LS) \begin{cases} \text{minimize} & x^Tx\\ \text{subject to}& Ax=b \end{cases}

    for given

    A = \begin{bmatrix} a_1^T \\ \vdots \\ a_p^T \end{bmatrix} \in \mathbb{R}^{p\times n}, \quad a_i \in \mathbb{R}^n, \quad b \in \mathbb{R}^p .

    N.B.:

    \begin{aligned} Ax = b \iff h_i(x) = a_i^Tx - b_i = 0 \text{ for } i=1,\ldots,p \end{aligned}.



    Therefore, the Lagrangian L for (LS) is

    \begin{aligned} L&:\mathbb{R}^n \times \mathbb{R}^p \to \mathbb{R}\\ \text{dom}\,L &= \mathbb{R}^n \times \mathbb{R}^p\\ L(x,\nu) &= x^Tx + \sum_{i=1}^p \nu_i (a_i^Tx - b_i) \\ &= x^Tx + \nu^T(Ax-b) \end{aligned}

    and the Lagrange dual is

    \begin{aligned} g&:\mathbb{R}^p \to \mathbb{R}\\ g(\nu) &= \inf\{ x^Tx + \nu^T(Ax-b) : x \in \mathbb{R}^n \}. \end{aligned}



    N.B.:

    \begin{aligned} \nabla^2_x L(x,\nu) &= \nabla_x^2 ( x^Tx + \nu^T(Ax-b)) \\ &= 2Id_{n \times n}\\ & \succeq 0  \end{aligned}

    and so x\mapsto L(x,\nu) is convex.
    Consequently,

    \begin{aligned} L(x^\star,\nu) = \inf\{L(x,\nu):x \in \mathbb{R}^n \} = \text{min}\{L(x,\nu):x \in \mathbb{R}^n \} \end{aligned}

    iff

    \nabla_x L(x^\star,\nu) = 2x^\star + A^T\nu = 0 ,

    i.e., iff

    x^\star = -\frac{1}{2}A^T\nu.



    In conclusion,

    \begin{aligned} g(\nu) &= L(x^\star,\nu)\\ &= (x^\star)^Tx^\star + \nu^T(Ax^\star-b)\\ &= \left(-\frac{1}{2}A^T\nu\right)^T\left(-\frac{1}{2}A^T\nu\right) +\nu^T\left(A\left(-\frac{1}{2}A^T\nu\right) - b\right)\\ &=\frac{1}{4}\nu^TAA^T\nu - \frac{1}{2}\nu^TAA^T\nu - \nu^Tb\\ &=-\frac{1}{4}\nu^TAA^T\nu - \nu^Tb. \end{aligned}

    In particular,

    -\frac{1}{4}\nu^TAA^T\nu - b^T\nu \leq \inf\{x^Tx:Ax=b\}

    for all \nu \in \mathbb{R}^p.
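The dual bound can be checked on a tiny instance. The data below are assumed for illustration: A = [[1, 1]], b = [1], i.e. minimize x^Tx subject to x_1 + x_2 = 1, with optimum x = (1/2, 1/2) and p^\star = 1/2; here AA^T = [[2]], so the formula gives g(\nu) = -\nu^2/2 - \nu:

```python
# Least-squares dual bound: g(nu) = -(1/4) nu^T A A^T nu - b^T nu
# for A = [[1, 1]], b = [1]; the primal optimum is p* = 1/2.
p_star = 0.5

def g(nu):
    return -0.25 * nu * 2.0 * nu - nu   # A A^T = [[2]]

# g(nu) <= p* for every nu ...
for nu in [-3.0, -1.0, 0.0, 1.0, 2.5]:
    assert g(nu) <= p_star + 1e-12

# ... and maximizing g recovers p*: g(-1) = 1/2.
assert abs(g(-1.0) - p_star) < 1e-12
```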
    Example 2 Consider the linear program

    \text{(LP)} \begin{cases} \text{minimize} & c^Tx\\ \text{subject to}& Ax=b\\ & x\succeq 0 \end{cases}

    for given

    c \in \mathbb{R}^n, \quad A \in \mathbb{R}^{p \times n}, \quad b \in \mathbb{R}^p .

    N.B.:
    • equality constraints given by

      h_i(x) = a_i^Tx-b_i =0, \quad i=1,\ldots,p.

    • x\succeq 0 iff

      x_i \geq 0, \quad i=1,\ldots,n\quad

      iff

      \quad f_i(x) = -x_i \leq 0,\quad i=1,\ldots,n.



    Therefore, the Lagrangian for (LP) is

    \begin{aligned} L&: \mathbb{R}^n \times \mathbb{R}^n \times \mathbb{R}^p \to \mathbb{R}\\ \text{dom}\, L &=\mathbb{R}^n \times \mathbb{R}^n \times \mathbb{R}^p \\ L(x,\lambda,\nu) &= c^Tx - \sum_{i=1}^n \lambda_i x_i + \sum_{i=1}^p \nu_i(a_i^Tx-b_i)\\ &= c^Tx - \lambda^Tx + \nu^T(Ax-b)\\ &=(c - \lambda + A^T\nu)^Tx - \nu^Tb \end{aligned}



    Want to compute

    g(\lambda,\nu) = \inf\{ L(x,\lambda,\nu) : x \in \mathbb{R}^n \},

    but

    x \mapsto (c - \lambda + A^T\nu)^Tx - \nu^Tb

    is an affine function with domain \mathbb{R}^n .
    Therefore,

    x \mapsto (c - \lambda + A^T\nu)^Tx - \nu^Tb

    is bounded below iff (\lambda,\nu) satisfy

    c - \lambda + A^T\nu = 0 .



    Therefore, the Lagrange dual is

    \begin{aligned} g&:\mathbb{R}^n \times \mathbb{R}^p \to \mathbb{R}\\ \text{dom}\, g &= \{(\lambda,\nu) \in \mathbb{R}^n \times \mathbb{R}^p : c - \lambda + A^T\nu = 0\}\\ g(\lambda,\nu) &= \begin{cases} -b^T\nu & c - \lambda + A^T\nu = 0\\ -\infty & \text{else} \end{cases}. \end{aligned}



    In particular, for dual feasible (\lambda,\nu), there holds

    -b^T\nu \leq c^Tx

    for all feasible x .
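A tiny numerical instance makes the bound concrete. The data below are an assumed example: minimize c^Tx with c = (1, 2) subject to x_1 + x_2 = 1, x \succeq 0, so p^\star = 1 at x = (1, 0); dual feasibility requires \lambda = c + A^T\nu \succeq 0, i.e. \nu \geq -1:

```python
# LP weak duality: -b^T nu <= c^T x for dual feasible (lambda, nu)
# and primal feasible x, on the instance c=(1,2), A=[[1,1]], b=[1].
c = [1.0, 2.0]

def primal(x1):
    x = [x1, 1.0 - x1]                 # Ax = b holds by construction
    assert x[0] >= 0 and x[1] >= 0     # primal feasibility
    return c[0]*x[0] + c[1]*x[1]

for nu in [-1.0, -0.5, 0.0, 2.0]:      # each gives lambda = c + nu >= 0
    lam = [c[0] + nu, c[1] + nu]
    assert lam[0] >= 0 and lam[1] >= 0 # dual feasibility
    for x1 in [0.0, 0.25, 0.5, 1.0]:
        assert -nu <= primal(x1) + 1e-12
```

At \nu = -1 the bound -b^T\nu = 1 matches p^\star = 1, i.e. the dual bound is tight for this instance.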
    Return of Conjugate Function Recall: given f:\mathbb{R}^n \to \mathbb{R}, its conjugate function f^* is the convex function

    f^*(y) = \sup\{y^Tx - f(x) : x \in \text{dom}\,f\}.

    Interestingly: the conjugate function is related to the Lagrange dual. Example.
    Consider the (OP)

    \text{(OP)} \begin{cases} \text{minimize}&f_0(x)\\ \text{subject to}&Ax \preceq b\\ &Cx=d \end{cases},

    for given

    \begin{aligned} f_0:\mathbb{R}^n \to \mathbb{R},\quad A \in \mathbb{R}^{m \times n}, \quad b \in \mathbb{R}^m\\ C \in \mathbb{R}^{p \times n}, \quad d \in \mathbb{R}^p. \end{aligned}

    We may write

    \begin{aligned} Ax\preceq b &\iff f_i(x) = a_i^Tx - b_i \leq0 , \quad i =1,\ldots, m\\ Cx = d &\iff h_i(x) =c_i^Tx - d_i=0, \quad i=1,\ldots,p \end{aligned}

    for suitable

    a_i ,c_i \in \mathbb{R}^n.



    The Lagrangian is

    \begin{aligned} L&:\mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^p \to \mathbb{R}\\ \text{dom}\, L &= D \times \mathbb{R}^m \times \mathbb{R}^p\\ L(x,\lambda,\nu) &= f_0(x) + \sum_{i=1}^m \lambda_i(a_i^Tx - b_i) + \sum_{i=1}^p \nu_i(c_i^Tx - d_i)\\ &= f_0(x) + \lambda^T(Ax-b) + \nu^T(Cx-d)\\ &= f_0(x) + \lambda^TAx + \nu^TCx - \lambda^Tb - \nu^Td \end{aligned}



    We may now compute the Lagrange dual in terms of the conjugate f_0^* :

    \begin{aligned} g&:\mathbb{R}^m \times \mathbb{R}^p \to \mathbb{R}\\ g(\lambda,\nu)&= \inf\{L(x,\lambda,\nu):x \in D\}\\ &=\inf\{f_0(x) + \lambda^TAx + \nu^TCx - \lambda^Tb - \nu^Td : x \in D\}\\ &=\inf\{ (A^T\lambda + C^T\nu)^Tx + f_0(x):x \in D\} - \lambda^Tb - \nu^Td \\ &=-\sup\{-(A^T\lambda+C^T\nu)^Tx - f_0(x) : x \in D\} - \lambda^Tb - \nu^Td\\ &=-f_0^*(-A^T\lambda-C^T\nu) - \lambda^Tb - \nu^Td. \end{aligned}

    Since

    \begin{aligned} g(\lambda,\nu)>-\infty \iff f_0^*(-A^T\lambda-C^T\nu)< +\infty \end{aligned},

    we conclude

    \begin{aligned}\text{dom}\,g = \{(\lambda,\nu)\in \mathbb{R}^m\times\mathbb{R}^p: -A^T\lambda-C^T\nu \in \text{dom}\,f_0^* \} \end{aligned}.
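A one-dimensional check of the conjugate formula (the setup below is an assumed example): f_0(x) = x^2 has conjugate f_0^*(y) = y^2/4, so for "minimize x^2 subject to x - b_0 \leq 0" the formula g(\lambda) = -f_0^*(-\lambda) - \lambda b_0 gives -\lambda^2/4 - \lambda b_0, which we compare against a brute-force grid minimization of the Lagrangian:

```python
# Dual via conjugate: g(lam) = -f0*(-A^T lam) - lam*b0 for f0(x) = x^2,
# A = [1], b = [b0], compared to direct minimization of L(x, lam).
b0 = 2.0   # assumed sample value

def f0_conj(y):
    # conjugate of x^2 is y^2/4
    return y*y / 4.0

def g_formula(lam):
    return -f0_conj(-lam) - lam*b0

def g_grid(lam):
    # brute-force inf of x^2 + lam*(x - b0) over a grid on [-10, 10]
    xs = [i/100.0 for i in range(-1000, 1001)]
    return min(x*x + lam*(x - b0) for x in xs)

for lam in [0.0, 0.5, 1.0, 4.0]:
    assert abs(g_formula(lam) - g_grid(lam)) < 1e-3
```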

    Appendix
    Differentiating b^Tx Given

    b = \begin{bmatrix}b_1\\b_2\\\vdots\\b_n\end{bmatrix} \in \mathbb{R}^n ,

    define the scalar function

    g(x) = b^Tx = b_1x_1 + \cdots + b_n x_n .

    Using the Taylor expansion

    \begin{aligned} g(y) = g(x) + (y-x)^T \nabla g(x) + \cdots \end{aligned}

    at y = 0, we find

    \begin{aligned} 0 = b^Tx - x^T \nabla g(x) + \cdots \end{aligned}

    and so

    x^T \nabla g(x) = b^Tx \quad \text{for all } x, \quad \text{i.e., } \nabla g(x) = b.

    To see this computed directly, computing

    \begin{aligned}  \frac{\partial}{\partial x_k} g(x) &= \frac{\partial}{\partial x_k}(b_1x_1 + \cdots + b_nx_n)\\ &=b_k, \end{aligned}

    we conclude

    \nabla g = \begin{bmatrix}\frac{\partial}{\partial x_1} g(x) \\ \frac{\partial}{\partial x_2}g(x) \\ \vdots \\ \frac{\partial}{\partial x_n}g(x) \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix} = b.
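A finite-difference check confirms the computation (the vector b and test point x are assumed sample data):

```python
# Central-difference check that the gradient of g(x) = b^T x is b.
b = [1.0, -2.0, 3.5]   # sample data

def g(x):
    return sum(bi*xi for bi, xi in zip(b, x))

def fd_grad(f, x, h=1e-6):
    # central differences in each coordinate
    grad = []
    for k in range(len(x)):
        xp = list(x); xp[k] += h
        xm = list(x); xm[k] -= h
        grad.append((f(xp) - f(xm)) / (2*h))
    return grad

x = [0.3, -1.2, 4.0]
for gk, bk in zip(fd_grad(g, x), b):
    assert abs(gk - bk) < 1e-6
```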

    Differentiating \frac{1}{2}x^TQx Given

    Q  = \begin{bmatrix} q_{11} & q_{12} & \cdots & q_{1n}\\ q_{21} & q_{22} & \cdots & q_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ q_{n1} & q_{n2} & \cdots & q_{nn} \end{bmatrix} \in \boldsymbol{S}^n

    define the scalar function

    f(x) = \frac{1}{2}x^T Q x.

    Using the Taylor expansion

    \begin{aligned} f(y) = f(x) + (y-x)^T \nabla f(x) + \frac{1}{2} (y-x)^T \nabla^2 f(x) (y-x) + \cdots \end{aligned}

    at y = 0 , we find

    \begin{aligned} 0 &= \frac{1}{2}x^TQx - x^T \nabla f(x) + \frac{1}{2} x^T \nabla^2 f(x)x + \cdots\\ &= x^TQx - x^T \nabla f(x) - \frac{1}{2}x^TQx + \frac{1}{2}x^T\nabla^2 f(x) x + \cdots, \end{aligned}

    which is evidently satisfied in case

    \begin{aligned} \nabla f(x) &= Qx\\ \nabla^2 f(x) &= Q. \end{aligned}

    To see this computed directly: expanding the matrix multiplication, we have

    \begin{aligned} f(x) = \frac{1}{2} \sum_{i,j=1}^{n} x_i x_j q_{ij}  \end{aligned}  .

    Computing

    \begin{aligned}  \frac{\partial}{\partial x_k} f(x) &= \frac{1}{2}\sum_{i,j=1}^n \frac{\partial}{\partial x_k} (x_i x_j q_{ij})\\ &=\frac{1}{2} \left(\sum_{i=1}^n x_i q_{ik} + \sum_{j=1}^n x_j q_{kj} \right)\\ &=\frac{1}{2}\sum_{i=1}^n x_i(q_{ik}+q_{ki})\\ &=\sum_{i=1}^n x_iq_{ik} \end{aligned}

    we have

    \begin{aligned} \nabla f(x) &=  \begin{bmatrix} \frac{\partial}{\partial x_1} f(x)\\ \frac{\partial}{\partial x_2} f(x)\\ \vdots\\ \frac{\partial}{\partial x_n} f(x) \end{bmatrix} =Qx \end{aligned} .

    Taking the second derivative gives

    \begin{aligned}  \frac{\partial^2}{\partial x_k \partial x_l} f(x) &= \sum_{i=1}^n\frac{\partial}{\partial x_l}x_i q_{ik}\\ &=q_{lk}. \end{aligned}

    Consequently, there holds

    \begin{aligned} \nabla^2 f(x) &=  \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2}(x) & \frac{\partial^2 f}{\partial x_1 \partial x_2}(x) & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n}(x) \\ \frac{\partial^2 f}{\partial x_2 \partial x_1 }(x) & \frac{\partial^2 f}{\partial x_2^2 }(x) & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n}(x) \\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial^2 f}{\partial x_n \partial x_1}(x) & \frac{\partial^2 f}{\partial x_n \partial x_2}(x) & \cdots & \frac{\partial^2 f}{\partial x_n^2}(x) \\ \end{bmatrix}\\ & = Q. \end{aligned}
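Both identities can be verified by differencing (the symmetric Q and the test point are assumed sample data; for a quadratic, central first differences and unit second differences are exact up to rounding):

```python
# Check that for f(x) = (1/2) x^T Q x with symmetric Q:
# grad f(x) = Q x and the Hessian is Q.
Q = [[2.0, 1.0], [1.0, 3.0]]   # sample symmetric matrix

def f(x):
    return 0.5*sum(x[i]*Q[i][j]*x[j] for i in range(2) for j in range(2))

def fd_grad(fun, x, h=1e-5):
    out = []
    for k in range(len(x)):
        xp = list(x); xp[k] += h
        xm = list(x); xm[k] -= h
        out.append((fun(xp) - fun(xm)) / (2*h))
    return out

x = [1.0, -2.0]
Qx = [sum(Q[i][j]*x[j] for j in range(2)) for i in range(2)]
for approx, exact in zip(fd_grad(f, x), Qx):
    assert abs(approx - exact) < 1e-6

# Unit second differences recover the diagonal of the Hessian exactly.
for k in range(2):
    xp = list(x); xp[k] += 1.0
    xm = list(x); xm[k] -= 1.0
    assert abs((f(xp) - 2*f(x) + f(xm)) - Q[k][k]) < 1e-9
```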

    Example. Let

    \begin{aligned} f(x_1,x_2) &= \frac{1}{2}\begin{bmatrix}x_1 & x_2 \end{bmatrix} \begin{bmatrix} a&b\\c&d \end{bmatrix} \begin{bmatrix}x_1\\x_2 \end{bmatrix}\\ &=\frac{1}{2} \begin{bmatrix}x_1&x_2\end{bmatrix}\begin{bmatrix}ax_1+bx_2\\cx_1+dx_2\end{bmatrix}\\ &=\frac{1}{2}(ax_1^2 + (b+c)x_1x_2 + dx_2^2). \end{aligned}

    Compute

    \begin{aligned} \frac{\partial}{\partial x_1} f(x_1,x_2) &= \frac{1}{2}\frac{\partial}{\partial x_1} (ax_1^2 + (b+c)x_1x_2 + dx_2^2)\\ &=\frac{1}{2}(2ax_1 + (b+c)x_2)\\ \frac{\partial}{\partial x_2} f(x_1,x_2) &= \frac{1}{2}\frac{\partial}{\partial x_2} (ax_1^2 + (b+c)x_1x_2 + dx_2^2)\\ &=\frac{1}{2}((b+c)x_1 + 2dx_2)\\ \end{aligned}.

    Consequently,

    \begin{aligned} \nabla f(x) &= \begin{bmatrix} \frac{\partial}{\partial x_1} f(x) \\ \frac{\partial}{\partial x_2} f(x) \end{bmatrix}\\ &=\begin{bmatrix} \frac{1}{2}(2ax_1 + (b+c)x_2)\\ \frac{1}{2}((b+c)x_1 + 2dx_2) \end{bmatrix}\\ &=\frac{1}{2} \left(\begin{bmatrix}ax_1 + bx_2 \\ cx_1 + dx_2 \end{bmatrix} + \begin{bmatrix} ax_1 + cx_2 \\ bx_1 + dx_2 \end{bmatrix} \right)\\ &=\frac{1}{2}\left( \begin{bmatrix}a&b\\c&d\end{bmatrix}\begin{bmatrix}x_1\\x_2\end{bmatrix} + \begin{bmatrix}a&c\\b&d\end{bmatrix}\begin{bmatrix}x_1\\x_2\end{bmatrix}\right)\\ &=\frac{1}{2}(Q+Q^T)x. \end{aligned}

    Next compute the second derivatives:

    \begin{aligned} \frac{\partial^2}{\partial x_1^2} f(x_1,x_2) &=\frac{1}{2}\frac{\partial}{\partial x_1}(2ax_1 + (b+c)x_2)\\ &=a\\ \frac{\partial^2}{\partial x_1 \partial x_2} f(x_1,x_2) &=\frac{1}{2}\frac{\partial}{\partial x_2}(2ax_1 + (b+c)x_2)\\ &=\frac{1}{2}(b+c)\\ \frac{\partial^2}{\partial x_2 \partial x_1} f(x_1,x_2) &= \frac{1}{2}\frac{\partial}{\partial x_1}((b+c)x_1 + 2dx_2)\\ &=\frac{1}{2}(b+c)\\ \frac{\partial^2}{\partial x_2^2} f(x_1,x_2) &= \frac{1}{2}\frac{\partial}{\partial x_2}((b+c)x_1 + 2dx_2)\\ &=d. \end{aligned}

    Putting this together gives

    \begin{aligned} \nabla^2 f(x) &= \begin{bmatrix} \frac{\partial^2}{\partial x_1^2}f & \frac{\partial^2}{\partial x_1 \partial x_2}f\\ \frac{\partial^2}{\partial x_2 \partial x_1}f & \frac{\partial^2}{\partial x_2^2}f \end{bmatrix}\\ &= \begin{bmatrix} a & \frac{1}{2}(b+c)\\ \frac{1}{2}(c+b) & d \end{bmatrix}\\ &= \frac{1}{2}\begin{bmatrix} a & b\\ c& d \end{bmatrix} + \frac{1}{2}\begin{bmatrix}a&c\\b&d\end{bmatrix}\\ &= \frac{1}{2}(Q+Q^T). \end{aligned}

    Differentiating b^TAx Given A \in \mathbb{R}^{m \times n} , b \in \mathbb{R}^m , define the scalar function h(x) = b^TAx.
    Using the Taylor expansion

    h(y) = h(x) + (y-x)^T\nabla h(x) + \cdots

    at y = 0, we get

    \begin{aligned} 0 &= b^TAx - x^T \nabla h(x) + \cdots\\ &= x^T(A^Tb) - x^T \nabla h(x) + \cdots \end{aligned}

    which directly implies

    \nabla h(x) = A^Tb .

    To see this computed directly, first compute

    \begin{aligned} b^TAx &= \begin{bmatrix}b_1& \cdots & b_m \end{bmatrix} \begin{bmatrix} a_1^T\\ \vdots \\ a_m^T \end{bmatrix} \begin{bmatrix}x_1\\\vdots\\x_n\end{bmatrix}\\ &=\begin{bmatrix}b_1& \cdots & b_m \end{bmatrix} \begin{bmatrix} a_1^Tx\\\vdots\\ a_m^T x\end{bmatrix}\\ &= b_1a_1^Tx + \cdots + b_ma_m^Tx. \end{aligned}

    Computing

    \begin{aligned} \frac{\partial}{\partial x_k} b^TAx &= \frac{\partial}{\partial x_k}(b_1a_1^Tx + \cdots + b_ma_m^Tx)\\ &= b_1 a_{1k} + \cdots + b_m a_{mk}, \end{aligned}

    we conclude

    \begin{aligned} \nabla b^TAx &= \begin{bmatrix}  b_1 a_{11} + \cdots + b_m a_{m1}\\ \vdots\\ b_1 a_{1n} + \cdots + b_m a_{mn} \end{bmatrix}\\ &= A^Tb. \end{aligned}
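A finite-difference check of this gradient (the matrix A, vector b, and test point are assumed sample data; note that A^Tb is a vector in \mathbb{R}^n, the correct shape for the gradient):

```python
# Check that the gradient of h(x) = b^T A x is A^T b.
A = [[1.0, 2.0, 0.0], [-1.0, 0.5, 3.0]]   # sample 2x3 data
b = [2.0, -1.0]

def h(x):
    Ax = [sum(A[i][j]*x[j] for j in range(3)) for i in range(2)]
    return sum(b[i]*Ax[i] for i in range(2))

# the claimed gradient A^T b
ATb = [sum(A[i][j]*b[i] for i in range(2)) for j in range(3)]

eps = 1e-6
x = [0.1, 0.2, 0.3]
for k in range(3):
    xp = list(x); xp[k] += eps
    xm = list(x); xm[k] -= eps
    fd = (h(xp) - h(xm)) / (2*eps)   # central difference in coordinate k
    assert abs(fd - ATb[k]) < 1e-6
```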