Convex_Sets

Study notes from Convex Optimization by Stephen Boyd, Lieven Vandenberghe.

1. Affine and convex sets

1.1 Lines and line segments

$\qquad$ Suppose $x_1 \neq x_2$ are two points in $\boldsymbol{R}^n$ . Points of the form

$\begin{array}{c} y = \theta x_1 + (1-\theta) x_2 \tag{1.1.1} \end{array}$

where $\theta \in \boldsymbol{R}$ ，form the line passing through $x_1$ and $x_2$ . The parameter value $\theta = 0$ corresponds to $y = x_2$ , and the parameter value $\theta = 1$ corresponds to $y = x_1$ . Values of the parameter $\theta$ between $0$ and $1$ corresponds to the line segment between $x_1$ and $x_2$ .

1.2 Affine sets

$\qquad$ A set $C \subseteq \boldsymbol{R}^n$ is affine sets if the line through any two distinct points in $C$ lies in $C$ , $i.e.$ ,

$\begin{array}{rl} \text{if} & \forall x_1, x_2 \in C, \theta \in \boldsymbol{R} \\ \text{then} & \theta x_1 + (1-\theta) x_2 \in C \tag{1.2.1} \end{array}$

$\qquad$ In other words, $C$ contains the linear combination of any two points in $C$ , provided the coefficients in the linear combination sum to one. This idea can be generalized to more than two points. We refer to a point of the form $\theta x_1 + \cdots + \theta_k x_k$ , where $\theta_1 + \cdots + \theta_k = 1$ , as an affine combination of the points $x_1, \cdots, x_k$ .

$\begin{array}{rl} \text{if} & \forall x_1,\cdots,x_k \in C, \theta_1,\cdots,\theta_k \in R \\ & \theta_1 + \cdots + \theta_k = 1 \\ \text{then} & \theta_1 x_1 + \cdots + \theta_k x_k \in C \tag{1.2.2} \end{array}$

$\qquad$ Using induction from the definition of affine, it can be shown that an affine set contains every affine combination of its points: if $C$ is an affine set, $x_1, \cdots, x_k \in C$ , and $\theta_1 + \cdots + \theta_k = 1$ , the the point $\theta_1 x_1 + \cdots + \theta_k x_k \in C$ . If $C$ is an affine set and $x_0 \in C$ , then the set

$\begin{array}{c} V = C - x_0 = \{x - x_0 | x \in C\} \tag{1.2.3} \end{array}$

is a subspace, $i.e.$ , closed under sums and scalar multiplication. Thus, the affine set $C$ can be expressed as $C = V + x_0 = \{v + x_0 | v \in V\}$ , $i.e.$ , as a subplace plus an offset. We define the dimension of an affine set $C$ as the dimension of the subspace $V = C - x_0$ , where $x_0$ is any element of $C$ .

$\qquad$ The set of all affine combinations of points in some set $C \subseteq \boldsymbol{R}^n$ is called the affine hull of $C$ , and denoted $\textbf{aff } C$ :

$\begin{array}{c} \textbf{aff } C = \{\theta_1 x_1 + \cdots + \theta_k x_k | x_1, \cdots, x_k \in C, \theta_1 + \cdots + \theta_k = 1\} \tag{1.2.4} \end{array}$

The affine hull is the smallest affine set that contains $C$ , in the following sense: if $S$ is any affine set with $C \subseteq S$ , then $\textbf{aff } C \subseteq S$ .

$\qquad$ We define the affine dimension of a set $C$ as the dimension of its affine hull. Affine dimension is useful in the context of convex analysis and optimization, but is not always consistent with other definitions of dimension. If the affine dimension of a set $C \subseteq \boldsymbol{R}^n$ is less than $n$ , then the set lies in the affine set $\textbf{aff } C \neq \boldsymbol{R}^n$ . We define the releative interior of the set $C$ , denoted $\textbf{relint }C$ , as its interior relative to $\textbf{aff }C$ :

$\begin{array}{c} \textbf{relint } C = \{x \in C | B(x,r) \cap \textbf{aff } C \subseteq C \text{ for some } r \gt 0\} \tag{1.2.5} \end{array}$

where $B(x,r) = \{y | \parallel{y-x}\parallel \leq r \}$ , the ball of radius $r$ and center $x$ in the norm $\parallel\cdot\parallel$ . (Here $\parallel\cdot\parallel$ is any norm, all norms define the same relative interior.) We can the define the relative boundary of a set $C$ as $\textbf{cl }C \backslash \textbf{relint }C$ , where $\textbf{cl }C$ is the closure of $C$ .

1.3 Convex sets

$\qquad$ A set $C$ is convex if the line segment between any two points in $C$ lies in $C$ , $i.e.$ , if for any $x_1, x_2 \in C$ and any $\theta$ with $0 \leq \theta \leq 1$ , we have

$\begin{array}{c} \theta x_1 + (1 - \theta) x_2 \in C \tag{1.3.1} \end{array}$

Every affine set is also convex, since it contains the entire line between any two distinct points in it, and therefor also the line segment between the points.

$\qquad$ We call a point of the form $\theta_1 x_1 + \cdots + \theta_k x_k$ , where $\theta_1 + \cdots + \theta_k = 1$ and $\theta_i \geq 0, i=1,\cdots,k$ , a convex combination of the points $x_1,\cdots,x_k$ . As with affine sets, it can be shown that a set is convex if and only if it contains every convex combination of its points.

$\qquad$ The convex hull of a set $C$ , denoted $\textbf{conv }C$ , is the set of all convex combinations of points in $C$ :

$\begin{array}{c} \textbf{conv }C = \{\theta_1 x_1 + \cdots + \theta_k x_k | x_i \in C, \theta_i \geq 0, i=1,\cdots,k, \theta_1 + \cdots + \theta_k = 1\} \tag{1.3.2} \end{array}$

As the name suggests, the convex hull $\textbf{conv }C$ is always convex. It is the smallest convex set that contains $C$ : If $B$ is any convex set that contains $C$ , then $\textbf{conv }C \subseteq B$ .

The idea of a convex combination can be generalized to include infinite sums, integrals, and, in the most general form, probability distributions. Suppose $\theta_1,\theta_2,\cdots$ satisfy

$\begin{array}{c} \theta_i \geq 0, i = 1,2,\cdots, \sum_{i=1}^{\infty} \theta_i = 1 \end{array}$

and $x_1,x_2,\cdots \in C$ , where $C \subseteq \boldsymbol{R}^n$ is convex. Then

$\begin{array}{c} \sum_{i=1}^{\infty} \theta_i x_i \in C \end{array}$

if the series converges. More generally, suppose $p:\boldsymbol{R}^n \rightarrow \boldsymbol{R}$ satisfies $p(x) \geq 0$ for all $x \in C$ and $\int_{C} p(x) dx = 1$ , where $C \subset \boldsymbol{R}^n$ is convex. Then

$\begin{array}{c} \int_{C} p(x)x dx \in C \end{array}$

if the integral exists.

$\qquad$ In the most general form, suppose $C \subseteq \boldsymbol{R}^n$ is convex and $x$ is a random vector with $x \in C$ with probability one. The $\boldsymbol{E}x \in C$ . Indeed, this form includes all the others as special cases.

2. Cones

$\qquad$ A set $C$ is called a cone, or nonnegative homogeneous, if for every $x \in C$ and $\theta \geq 0$ we have

$\begin{array}{c} \theta x \in C \tag{2.1} \end{array}$

$\qquad$ A set $C$ is a convex cone is it is convex and a cone, which means that for any $x_1,x_2 \in C$ and $\theta_1,\theta_2 \geq 0$ , we have

$\begin{array}{c} \theta_1 x_1 + \theta_2 x_2 \in C \tag{2.2} \end{array}$

$\qquad$ A point of the form $\theta_1 x_1 + \cdots + \theta_k x_k$ with $\theta_1, \cdots, \theta_k \geq 0$ is called a conic combination (or a nonnegative linear combination) of $x_1, \cdots, x_k$ . If $x_i$ are in a convex cone $C$ , then every conic combination of $x_i$ is in $C$ . Conversely, a set $C$ is a convex cone if and only if it contains all conic combinations of its elements.

$\qquad$ The conic hull of a set $C$ is the set of all conic combinations of points in $C$ , $i.e.$ ,

$\begin{array}{c} \{\theta_1 x_1 + \cdots + \theta_k x_k | x_i \in C, \theta_i \geq 0, i = 1, \cdots, k\} \tag{2.3} \end{array}$

The conic hull is also the the smallest convex cone that contains $C$ .

3. Hyperplanes and halfspaces

3.1 Hyperplanes

$\qquad$ A hyperplane is a set of the form

$\begin{array}{c} \{x | a^T x = b\} \tag{3.1.1} \end{array}$

where $a \in \boldsymbol{R}^n, a \neq 0$ and $b \in \boldsymbol{R}$ . Analytically it is the solution set of a nontrivial linear equation among the components of $x$ (and hence an affine set). The geometric interpretation can be understood by expressing the hyperplane in the form

$\begin{array}{c} \{x | a^T(x-x_0) = 0\} \tag{3.1.2} \end{array}$

where $x_0$ is any point in the hyperplane ( $i.e.$ , any point that satisfies $a^Tx_0 = b$ ). This representation can in turn be expressed as

$\begin{array}{c} \{x | a^T(x-x_0) = 0\} = x_0 + a^{\bot} \tag{3.1.3} \end{array}$

where $a^\bot$ denotes the orthogonal complement of $a$ , $i.e.$ , the set of all vectors orthogonal to it:

$\begin{array}{c} a^\bot = \{v | a^T v = 0\} \tag{3.1.4} \end{array}$

This shows that the hyperplane consists of an offset $x_0$ , plus all vectors orthogonal to the vector $a$ .

3.2 Halfspaces

$\qquad$ A hyperplane divides $\boldsymbol{R}^n$ into two halfspaces. A (closed) halfspace is a set of the form

$\begin{array}{c} \{x | a^T x \leq b\} \tag{3.2.1} \end{array}$

where $a \neq 0$ , $i.e.$ , the solution set of one (nontrivial) linear inequality. Halfspaces are convex, but not affine. The halfspace can also be expressed as

$\begin{array}{c} \{x | a^T (x-x_0) \leq 0\} \tag{3.2.2} \end{array}$

Where $x_0$ is any point on the associated hyperplane, $i.e.$ , satisfies $a^T x_0 = b$ . The representation $(3.2.2)$ suggest a simple geometric interpretation: the halfspace consists of $x_0$ plus any vector makes an obtuse (or right) angle with the vector $a$ . The boundary of the halfspace is the hyperplane $\{x | a^T = b\}$ . The set $\{x | a^T x \lt b\}$ , which is the interior of the halfspace $\{x | a^T x \leq b\}$ , is called an open halfspace.

4. Euclidean balls and ellipsoids

4.1 Euclidean balls

$\qquad$ A Euclidean ball (or just ball) in $\boldsymbol{R}^n$ has the form

$\begin{array}{c} B(x_c,r) = \{x | \parallel{x - x_c}\parallel_2 \leq r\} = \{x | (x-x_c)^T(x-x_c) \leq r^2\} \tag{4.1.1} \end{array}$

where $r \gt 0$ , and $\parallel\cdot\parallel_2$ denotes the Euclidean norm, $i.e.$ , $\parallel{u}\parallel_2 = (u^Tu)^{\frac{1}{2}}$ . The vector $x_c$ is the center of the ball and the scalar $r$ is its radius. $B(x_c,r)$ consists of all points within a distance $r$ of the center $x_c$ . Another common representation for Euclidean ball is

$\begin{array}{c} B(x_c,r) = \{x_c + ru | \parallel{u}\parallel_2 \leq 1\} \tag{4.1.2} \end{array}$

A Euclidean ball is a convex set: if $\parallel x_1 - x_c \parallel_2 \leq r, \parallel x_2 - x_c \parallel_2 \leq r$ ,and $0 \leq \theta \leq 1$ , then

$\begin{array}{llll} \parallel \theta x_1 + (1-\theta)x_2 - x_c \parallel_2 & = & \parallel \theta(x_1 - x_c) + (1-\theta)(x_2 - x_c) \parallel_2 \\ & \leq & \theta \parallel x_1 - x_c \parallel_2 + (1-\theta) \parallel x_2 - x_c \parallel_2 & \text{triangle inequality} \\ & \leq & r \end{array}$

4.2 Ellipsoids

$\qquad$ A related family of convex sets is the ellipsoids, which have the form

$\begin{array}{c} \mathcal{E} = \{x | (x-x_c)^T P^{-1} (x-x_c) \leq 1 \} \tag{4.2.1} \end{array}$

where $P = P^T \succ 0$ , $i.e.$ , $P$ is symmetric and positive definite. The vector $x_c \in \boldsymbol{R}^n$ is the center of the ellipsoid. The matrix $P$ determines how far the ellipsoid extends in every direction from $x_c$ . The lengths of the semi-axes of $\mathcal{E}$ are given by $\sqrt{\lambda_i}$ , where $\lambda_i$ are the eigenvalues of $P$ . A ball is an ellipsoid with $P = r^2 I$ . Another common representation of an ellipsoid is

$\begin{array}{c} \mathcal{E} = \{x_c + Au | \parallel u \parallel_2 \leq 1\} \tag{4.2.2} \end{array}$

where $A$ is square and nonsingular. In this representation we can assume without loss of generality that $A$ is symmetric and positive definite. By taking $A = P^{\frac{1}{2}}$ , this representation gives the ellipsoid defined in $(4.2.1)$ . When the matrix $A$ in $(4.2.2)$ is symmetric positive semidefinite but singular, the set in $(4.2.2)$ is called a degenerate ellipsoid. Its affine dimension is equal to the rank of $A$ . Degenrate elliposids are also convex.

5. Norm balls and norm cones

5.1 Norm balls

$\qquad$ Suppose $\parallel\cdot\parallel$ is any norm on $\boldsymbol{R}^n$ . From the general properties of norms it can be shown that a norm ball of radius $r$ and center $x_c$ , given by

$\begin{array}{c} \{x | \parallel x-x_c \parallel \leq r\} \tag{5.1.1} \end{array}$

Norm ball is convex.

5.2 Norm cones

$\qquad$ The norm cone associated with the norm $\parallel\cdot\parallel$ is the set

$\begin{array}{c} C = \{(x,t) | \parallel x \parallel \leq t\} \subseteq \boldsymbol{R}^{n+1} \tag{5.2.1} \end{array}$

Norm cone is also a convex cone.

6. Polyhedra

$\qquad$ A polyhedron is defined as the solution set of a finite number of linear equalities and inequalities:

$\begin{array}{c} \mathcal{P} = \{x | a_j^T x \leq b_j, j = 1, \cdots, m, c_j^T x = d_j, j = 1, \cdots, p\} \tag{6.1} \end{array}$

A polyhedron is thus the intersection of a finite number of halfspaces and hyperplanes. Affine sets, rays, line segments, and halfspaces are all polyhedra. It is easily shown that polyhedra are convex sets. A bounded polyhedron is sometimes called a polytope, but some authors use the opposite convention ( $i.e.$ , polytope for any set of the form $(6.1)$ , and polyhedron when it is bounded). It will be convenient to use the compact notation

$\begin{array}{c} \mathcal{P} = \{x | Ax \preceq b, Cx = d\} \\ A = \left[\begin{matrix} a_1^T \\ \vdots \\ a_m^T \end{matrix}\right], C = \left[\begin{matrix} c_1^T \\ \vdots \\ c_p^T \end{matrix}\right] \tag{6.2} \end{array}$

where the symbol $\preceq$ denotes vector inequality or componentwise inequality in $\boldsymbol{R}^m$ : $u \preceq v$ means $u_i \leq v_i$ for $i = 1, \cdots, m$ .

7. Simplexes

$\qquad$ Simplexes are another important family of polyhedra. Suppose the $k+1$ points $v_0,\cdots,v_k \in \boldsymbol{R}^n$ are affinely independent, which means $v_1-v_0, \cdots, v_k-v_0$ are linearly independent. The simplex determined by them is given by

$\begin{array}{c} C = \textbf{conv}\{v_0,\cdots,v_k\} = \{\theta_0 v_0 + \cdots + \theta_k v_k | \theta \succeq 0, \boldsymbol{1}^T \theta = 1\} \tag{7.1} \end{array}$

Where $\boldsymbol{1}$ denotes the vector with all entries one. The affine dimension of this simplex is $k$ , so it is sometimes referred to as a $k$ -dimensional simplex in $\boldsymbol{R}^n$ .

8. Positive semidefinite cone

$\qquad$ We use the notation $\boldsymbol{S}^n$ to denote the set of symmetric $n \times n$ matrices,

$\begin{array}{c} \boldsymbol{S}^n = \{X \in \boldsymbol{R}^{n \times n} | X = X^T\} \tag{8.1} \end{array}$

which is a vector space with dimension $\frac{n(n+1)}{2}$ . We use the notation $\boldsymbol{S}_+^n$ to denote the set of symmetric positive semidefinite matrices:

$\begin{array}{c} \boldsymbol{S}_+^n = \{X \in \boldsymbol{S}^n | X \succeq 0\} \tag{8.2} \end{array}$

and the notation $\boldsymbol{S}_{++}^n$ to denote the set of symmetric positive definite matrices:

$\begin{array}{c} \boldsymbol{S}_{++}^n = \{X \in \boldsymbol{S}^n | X \succeq 0\} \tag{8.3} \end{array}$

(This notation is meant to be analogous to $\boldsymbol{R}_+$ , which denotes the nonnegative reals, and $\boldsymbol{R}_{++}$ , which denotes the positive reals.)

$\qquad$ The set $\boldsymbol{S}_+^n$ is a positive semidefinite cone. Obviously, $\boldsymbol{S}_+^n$ is also a convex cone: if $\theta_1,\theta_2 \geq 0$ and $A,B \in \boldsymbol{S}_+^n$ , then $\theta_1 A + \theta_2 B \in \boldsymbol{S}_+^n$ . This can be seen directly from the definition of positive semidefiniteness: for any $x \in \boldsymbol{R}^n$ , we have