Study notes from Convex Optimization by Stephen Boyd, Lieven Vandenberghe.

1. Affine and convex sets

1.1 Lines and line segments

\qquadSuppose x1x2x_1 \neq x_2 are two points in Rn\boldsymbol{R}^n. Points of the form

y=θx1+(1θ)x2(1.1.1)\begin{array}{c} y = \theta x_1 + (1-\theta) x_2 \tag{1.1.1} \end{array}

where θR\theta \in \boldsymbol{R},form the line passing through x1x_1 and x2x_2. The parameter value θ=0\theta = 0 corresponds to y=x2y = x_2, and the parameter value θ=1\theta = 1 corresponds to y=x1y = x_1. Values of the parameter θ\theta between 00 and 11 corresponds to the line segment between x1x_1 and x2x_2.

1.2 Affine sets

\qquadA set CRnC \subseteq \boldsymbol{R}^n is affine sets if the line through any two distinct points in CC lies in CC, i.e.i.e.,

ifx1,x2C,θRthenθx1+(1θ)x2C(1.2.1)\begin{array}{rl} \text{if} & \forall x_1, x_2 \in C, \theta \in \boldsymbol{R} \\ \text{then} & \theta x_1 + (1-\theta) x_2 \in C \tag{1.2.1} \end{array}

\qquadIn other words, CC contains the linear combination of any two points in CC, provided the coefficients in the linear combination sum to one. This idea can be generalized to more than two points. We refer to a point of the form θx1++θkxk\theta x_1 + \cdots + \theta_k x_k, where θ1++θk=1\theta_1 + \cdots + \theta_k = 1, as an affine combination of the points x1,,xkx_1, \cdots, x_k.

ifx1,,xkC,θ1,,θkRθ1++θk=1thenθ1x1++θkxkC(1.2.2)\begin{array}{rl} \text{if} & \forall x_1,\cdots,x_k \in C, \theta_1,\cdots,\theta_k \in R \\ & \theta_1 + \cdots + \theta_k = 1 \\ \text{then} & \theta_1 x_1 + \cdots + \theta_k x_k \in C \tag{1.2.2} \end{array}

\qquadUsing induction from the definition of affine, it can be shown that an affine set contains every affine combination of its points: if CC is an affine set, x1,,xkCx_1, \cdots, x_k \in C, and θ1++θk=1\theta_1 + \cdots + \theta_k = 1, the the point θ1x1++θkxkC\theta_1 x_1 + \cdots + \theta_k x_k \in C. If CC is an affine set and x0Cx_0 \in C, then the set

V=Cx0={xx0xC}(1.2.3)\begin{array}{c} V = C - x_0 = \{x - x_0 | x \in C\} \tag{1.2.3} \end{array}

is a subspace, i.e.i.e., closed under sums and scalar multiplication. Thus, the affine set CC can be expressed as C=V+x0={v+x0vV}C = V + x_0 = \{v + x_0 | v \in V\}, i.e.i.e., as a subplace plus an offset. We define the dimension of an affine set CC as the dimension of the subspace V=Cx0V = C - x_0, where x0x_0 is any element of CC.

\qquadThe set of all affine combinations of points in some set CRnC \subseteq \boldsymbol{R}^n is called the affine hull of CC, and denoted aff C\textbf{aff } C:

aff C={θ1x1++θkxkx1,,xkC,θ1++θk=1}(1.2.4)\begin{array}{c} \textbf{aff } C = \{\theta_1 x_1 + \cdots + \theta_k x_k | x_1, \cdots, x_k \in C, \theta_1 + \cdots + \theta_k = 1\} \tag{1.2.4} \end{array}

The affine hull is the smallest affine set that contains CC, in the following sense: if SS is any affine set with CSC \subseteq S, then aff CS\textbf{aff } C \subseteq S.

\qquadWe define the affine dimension of a set CC as the dimension of its affine hull. Affine dimension is useful in the context of convex analysis and optimization, but is not always consistent with other definitions of dimension. If the affine dimension of a set CRnC \subseteq \boldsymbol{R}^n is less than nn, then the set lies in the affine set aff CRn\textbf{aff } C \neq \boldsymbol{R}^n. We define the releative interior of the set CC, denoted relint C\textbf{relint }C, as its interior relative to aff C\textbf{aff }C:

relint C={xCB(x,r)aff CC for some r>0}(1.2.5)\begin{array}{c} \textbf{relint } C = \{x \in C | B(x,r) \cap \textbf{aff } C \subseteq C \text{ for some } r \gt 0\} \tag{1.2.5} \end{array}

where B(x,r)={yyxr}B(x,r) = \{y | \parallel{y-x}\parallel \leq r \}, the ball of radius rr and center xx in the norm \parallel\cdot\parallel. (Here \parallel\cdot\parallel is any norm, all norms define the same relative interior.) We can the define the relative boundary of a set CC as cl C\relint C\textbf{cl }C \backslash \textbf{relint }C, where cl C\textbf{cl }C is the closure of CC.

1.3 Convex sets

\qquadA set CC is convex if the line segment between any two points in CC lies in CC, i.e.i.e., if for any x1,x2Cx_1, x_2 \in C and any θ\theta with 0θ10 \leq \theta \leq 1, we have

θx1+(1θ)x2C(1.3.1)\begin{array}{c} \theta x_1 + (1 - \theta) x_2 \in C \tag{1.3.1} \end{array}

Every affine set is also convex, since it contains the entire line between any two distinct points in it, and therefor also the line segment between the points.

\qquadWe call a point of the form θ1x1++θkxk\theta_1 x_1 + \cdots + \theta_k x_k, where θ1++θk=1\theta_1 + \cdots + \theta_k = 1 and θi0,i=1,,k\theta_i \geq 0, i=1,\cdots,k, a convex combination of the points x1,,xkx_1,\cdots,x_k. As with affine sets, it can be shown that a set is convex if and only if it contains every convex combination of its points.

\qquadThe convex hull of a set CC, denoted conv C\textbf{conv }C, is the set of all convex combinations of points in CC:

conv C={θ1x1++θkxkxiC,θi0,i=1,,k,θ1++θk=1}(1.3.2)\begin{array}{c} \textbf{conv }C = \{\theta_1 x_1 + \cdots + \theta_k x_k | x_i \in C, \theta_i \geq 0, i=1,\cdots,k, \theta_1 + \cdots + \theta_k = 1\} \tag{1.3.2} \end{array}

As the name suggests, the convex hull conv C\textbf{conv }C is always convex. It is the smallest convex set that contains CC: If BB is any convex set that contains CC, then conv CB\textbf{conv }C \subseteq B.

The idea of a convex combination can be generalized to include infinite sums, integrals, and, in the most general form, probability distributions. Suppose θ1,θ2,\theta_1,\theta_2,\cdots satisfy

θi0,i=1,2,,i=1θi=1\begin{array}{c} \theta_i \geq 0, i = 1,2,\cdots, \sum_{i=1}^{\infty} \theta_i = 1 \end{array}

and x1,x2,Cx_1,x_2,\cdots \in C, where CRnC \subseteq \boldsymbol{R}^n is convex. Then

i=1θixiC\begin{array}{c} \sum_{i=1}^{\infty} \theta_i x_i \in C \end{array}

if the series converges. More generally, suppose p:RnRp:\boldsymbol{R}^n \rightarrow \boldsymbol{R} satisfies p(x)0p(x) \geq 0 for all xCx \in C and Cp(x)dx=1\int_{C} p(x) dx = 1, where CRnC \subset \boldsymbol{R}^n is convex. Then

Cp(x)xdxC\begin{array}{c} \int_{C} p(x)x dx \in C \end{array}

if the integral exists.

\qquadIn the most general form, suppose CRnC \subseteq \boldsymbol{R}^n is convex and x is a random vector with xCx \in C with probability one. The ExC\boldsymbol{E}x \in C. Indeed, this form includes all the others as special cases.

2. Cones

\qquadA set CC is called a cone, or nonnegative homogeneous, if for every xCx \in C and θ0\theta \geq 0 we have

θxC(2.1)\begin{array}{c} \theta x \in C \tag{2.1} \end{array}

\qquadA set CC is a convex cone is it is convex and a cone, which means that for any x1,x2Cx_1,x_2 \in C and θ1,θ20\theta_1,\theta_2 \geq 0, we have

θ1x1+θ2x2C(2.2)\begin{array}{c} \theta_1 x_1 + \theta_2 x_2 \in C \tag{2.2} \end{array}

\qquadA point of the form θ1x1++θkxk\theta_1 x_1 + \cdots + \theta_k x_k with θ1,,θk0\theta_1, \cdots, \theta_k \geq 0 is called a conic combination (or a nonnegative linear combination) of x1,,xkx_1, \cdots, x_k. If xix_i are in a convex cone CC, then every conic combination of xix_i is in CC. Conversely, a set CC is a convex cone if and only if it contains all conic combinations of its elements.

\qquadThe conic hull of a set CC is the set of all conic combinations of points in CC, i.e.i.e.,

{θ1x1++θkxkxiC,θi0,i=1,,k}(2.3)\begin{array}{c} \{\theta_1 x_1 + \cdots + \theta_k x_k | x_i \in C, \theta_i \geq 0, i = 1, \cdots, k\} \tag{2.3} \end{array}

The conic hull is also the the smallest convex cone that contains CC.

3. Hyperplanes and halfspaces

3.1 Hyperplanes

\qquadA hyperplane is a set of the form

{xaTx=b}(3.1.1)\begin{array}{c} \{x | a^T x = b\} \tag{3.1.1} \end{array}

where aRn,a0a \in \boldsymbol{R}^n, a \neq 0 and bRb \in \boldsymbol{R}. Analytically it is the solution set of a nontrivial linear equation among the components of xx (and hence an affine set). The geometric interpretation can be understood by expressing the hyperplane in the form

{xaT(xx0)=0}(3.1.2)\begin{array}{c} \{x | a^T(x-x_0) = 0\} \tag{3.1.2} \end{array}

where x0x_0 is any point in the hyperplane (i.e.i.e., any point that satisfies aTx0=ba^Tx_0 = b). This representation can in turn be expressed as

{xaT(xx0)=0}=x0+a(3.1.3)\begin{array}{c} \{x | a^T(x-x_0) = 0\} = x_0 + a^{\bot} \tag{3.1.3} \end{array}

where aa^\bot denotes the orthogonal complement of aa, i.e.i.e., the set of all vectors orthogonal to it:

a={vaTv=0}(3.1.4)\begin{array}{c} a^\bot = \{v | a^T v = 0\} \tag{3.1.4} \end{array}

This shows that the hyperplane consists of an offset x0x_0, plus all vectors orthogonal to the vector aa.

3.2 Halfspaces

\qquadA hyperplane divides Rn\boldsymbol{R}^n into two halfspaces. A (closed) halfspace is a set of the form

{xaTxb}(3.2.1)\begin{array}{c} \{x | a^T x \leq b\} \tag{3.2.1} \end{array}

where a0a \neq 0, i.e.i.e., the solution set of one (nontrivial) linear inequality. Halfspaces are convex, but not affine. The halfspace can also be expressed as

{xaT(xx0)0}(3.2.2)\begin{array}{c} \{x | a^T (x-x_0) \leq 0\} \tag{3.2.2} \end{array}

Where x0x_0 is any point on the associated hyperplane, i.e.i.e., satisfies aTx0=ba^T x_0 = b. The representation (3.2.2)(3.2.2) suggest a simple geometric interpretation: the halfspace consists of x0x_0 plus any vector makes an obtuse (or right) angle with the vector aa. The boundary of the halfspace is the hyperplane {xaT=b}\{x | a^T = b\}. The set {xaTx<b}\{x | a^T x \lt b\}, which is the interior of the halfspace {xaTxb}\{x | a^T x \leq b\}, is called an open halfspace.

4. Euclidean balls and ellipsoids

4.1 Euclidean balls

\qquadA Euclidean ball (or just ball) in Rn\boldsymbol{R}^n has the form

B(xc,r)={xxxc2r}={x(xxc)T(xxc)r2}(4.1.1)\begin{array}{c} B(x_c,r) = \{x | \parallel{x - x_c}\parallel_2 \leq r\} = \{x | (x-x_c)^T(x-x_c) \leq r^2\} \tag{4.1.1} \end{array}

where r>0r \gt 0, and 2\parallel\cdot\parallel_2 denotes the Euclidean norm, i.e.i.e., u2=(uTu)12\parallel{u}\parallel_2 = (u^Tu)^{\frac{1}{2}}. The vector xcx_c is the center of the ball and the scalar rr is its radius. B(xc,r)B(x_c,r) consists of all points within a distance rr of the center xcx_c. Another common representation for Euclidean ball is

B(xc,r)={xc+ruu21}(4.1.2)\begin{array}{c} B(x_c,r) = \{x_c + ru | \parallel{u}\parallel_2 \leq 1\} \tag{4.1.2} \end{array}

A Euclidean ball is a convex set: if x1xc2r,x2xc2r\parallel x_1 - x_c \parallel_2 \leq r, \parallel x_2 - x_c \parallel_2 \leq r,and 0θ10 \leq \theta \leq 1, then

θx1+(1θ)x2xc2=θ(x1xc)+(1θ)(x2xc)2θx1xc2+(1θ)x2xc2triangle inequalityr\begin{array}{llll} \parallel \theta x_1 + (1-\theta)x_2 - x_c \parallel_2 & = & \parallel \theta(x_1 - x_c) + (1-\theta)(x_2 - x_c) \parallel_2 \\ & \leq & \theta \parallel x_1 - x_c \parallel_2 + (1-\theta) \parallel x_2 - x_c \parallel_2 & \text{triangle inequality} \\ & \leq & r \end{array}

4.2 Ellipsoids

\qquadA related family of convex sets is the ellipsoids, which have the form

E={x(xxc)TP1(xxc)1}(4.2.1)\begin{array}{c} \mathcal{E} = \{x | (x-x_c)^T P^{-1} (x-x_c) \leq 1 \} \tag{4.2.1} \end{array}

where P=PT0P = P^T \succ 0, i.e.i.e., PP is symmetric and positive definite. The vector xcRnx_c \in \boldsymbol{R}^n is the center of the ellipsoid. The matrix PP determines how far the ellipsoid extends in every direction from xcx_c. The lengths of the semi-axes of E\mathcal{E} are given by λi\sqrt{\lambda_i}, where λi\lambda_i are the eigenvalues of PP. A ball is an ellipsoid with P=r2IP = r^2 I. Another common representation of an ellipsoid is

E={xc+Auu21}(4.2.2)\begin{array}{c} \mathcal{E} = \{x_c + Au | \parallel u \parallel_2 \leq 1\} \tag{4.2.2} \end{array}

where AA is square and nonsingular. In this representation we can assume without loss of generality that AA is symmetric and positive definite. By taking A=P12A = P^{\frac{1}{2}}, this representation gives the ellipsoid defined in (4.2.1)(4.2.1). When the matrix AA in (4.2.2)(4.2.2) is symmetric positive semidefinite but singular, the set in (4.2.2)(4.2.2) is called a degenerate ellipsoid. Its affine dimension is equal to the rank of AA. Degenrate elliposids are also convex.

5. Norm balls and norm cones

5.1 Norm balls

\qquadSuppose \parallel\cdot\parallel is any norm on Rn\boldsymbol{R}^n. From the general properties of norms it can be shown that a norm ball of radius rr and center xcx_c, given by

{xxxcr}(5.1.1)\begin{array}{c} \{x | \parallel x-x_c \parallel \leq r\} \tag{5.1.1} \end{array}

Norm ball is convex.

5.2 Norm cones

\qquadThe norm cone associated with the norm \parallel\cdot\parallel is the set

C={(x,t)xt}Rn+1(5.2.1)\begin{array}{c} C = \{(x,t) | \parallel x \parallel \leq t\} \subseteq \boldsymbol{R}^{n+1} \tag{5.2.1} \end{array}

Norm cone is also a convex cone.

6. Polyhedra

\qquadA polyhedron is defined as the solution set of a finite number of linear equalities and inequalities:

P={xajTxbj,j=1,,m,cjTx=dj,j=1,,p}(6.1)\begin{array}{c} \mathcal{P} = \{x | a_j^T x \leq b_j, j = 1, \cdots, m, c_j^T x = d_j, j = 1, \cdots, p\} \tag{6.1} \end{array}

A polyhedron is thus the intersection of a finite number of halfspaces and hyperplanes. Affine sets, rays, line segments, and halfspaces are all polyhedra. It is easily shown that polyhedra are convex sets. A bounded polyhedron is sometimes called a polytope, but some authors use the opposite convention (i.e.i.e., polytope for any set of the form (6.1)(6.1), and polyhedron when it is bounded). It will be convenient to use the compact notation

P={xAxb,Cx=d}A=[a1TamT],C=[c1TcpT](6.2)\begin{array}{c} \mathcal{P} = \{x | Ax \preceq b, Cx = d\} \\ A = \left[\begin{matrix} a_1^T \\ \vdots \\ a_m^T \end{matrix}\right], C = \left[\begin{matrix} c_1^T \\ \vdots \\ c_p^T \end{matrix}\right] \tag{6.2} \end{array}

where the symbol \preceq denotes vector inequality or componentwise inequality in Rm\boldsymbol{R}^m: uvu \preceq v means uiviu_i \leq v_i for i=1,,mi = 1, \cdots, m.

7. Simplexes

\qquadSimplexes are another important family of polyhedra. Suppose the k+1k+1 points v0,,vkRnv_0,\cdots,v_k \in \boldsymbol{R}^n are affinely independent, which means v1v0,,vkv0v_1-v_0, \cdots, v_k-v_0 are linearly independent. The simplex determined by them is given by

C=conv{v0,,vk}={θ0v0++θkvkθ0,1Tθ=1}(7.1)\begin{array}{c} C = \textbf{conv}\{v_0,\cdots,v_k\} = \{\theta_0 v_0 + \cdots + \theta_k v_k | \theta \succeq 0, \boldsymbol{1}^T \theta = 1\} \tag{7.1} \end{array}

Where 1\boldsymbol{1} denotes the vector with all entries one. The affine dimension of this simplex is kk, so it is sometimes referred to as a kk-dimensional simplex in Rn\boldsymbol{R}^n.

8. Positive semidefinite cone

\qquadWe use the notation Sn\boldsymbol{S}^n to denote the set of symmetric n×nn \times n matrices,

Sn={XRn×nX=XT}(8.1)\begin{array}{c} \boldsymbol{S}^n = \{X \in \boldsymbol{R}^{n \times n} | X = X^T\} \tag{8.1} \end{array}

which is a vector space with dimension n(n+1)2\frac{n(n+1)}{2}. We use the notation S+n\boldsymbol{S}_+^n to denote the set of symmetric positive semidefinite matrices:

S+n={XSnX0}(8.2)\begin{array}{c} \boldsymbol{S}_+^n = \{X \in \boldsymbol{S}^n | X \succeq 0\} \tag{8.2} \end{array}

and the notation S++n\boldsymbol{S}_{++}^n to denote the set of symmetric positive definite matrices:

S++n={XSnX0}(8.3)\begin{array}{c} \boldsymbol{S}_{++}^n = \{X \in \boldsymbol{S}^n | X \succeq 0\} \tag{8.3} \end{array}

(This notation is meant to be analogous to R+\boldsymbol{R}_+, which denotes the nonnegative reals, and R++\boldsymbol{R}_{++}, which denotes the positive reals.)

\qquadThe set S+n\boldsymbol{S}_+^n is a positive semidefinite cone. Obviously, S+n\boldsymbol{S}_+^n is also a convex cone: if θ1,θ20\theta_1,\theta_2 \geq 0 and A,BS+nA,B \in \boldsymbol{S}_+^n, then θ1A+θ2BS+n\theta_1 A + \theta_2 B \in \boldsymbol{S}_+^n. This can be seen directly from the definition of positive semidefiniteness: for any xRnx \in \boldsymbol{R}^n, we have

xT(θ1A+θ2B)x=θ1xTAx+θ2xTBx0(8.4)\begin{array}{c} x^T(\theta_1 A + \theta_2 B)x = \theta_1 x^T A x + \theta_2 x^T B x \geq 0 \tag{8.4} \end{array}

if A0,B0A \succeq 0, B \succeq 0 and θ1,θ20\theta_1, \theta_2 \geq 0.