Properties of (the Inverse of) the Gradient of a (Strictly) Convex \(\mathrm{C}^{1}\) Functions

In this short note, I present the characterizations of convex, continuously differentiable functions in a Eucledean space \(\mathbb{R}^{n}\) as the functions whose derivatives are monotone. If it is strictly convex, then its derivatives are also strictly monotone and are one-to-one mappings. This note is the supplemental material of Section 3.3 of the paper [5] since it is obvious by the results that if a strictly convex \(\mathrm{C}^{1}\) function \(S|_{\mathsf{int}}:\mathcal{U}_{\mathsf{int}}\to\mathbb{R}\) with an \(m\)-dim. manifold \(\mathcal{U}_{\mathsf{int}}\subseteq\mathbb{R}^{m}\) satisfies \(\mathrm{Im}(\nabla S|_{\mathsf{int}}^{\mathsf{T}})=\mathbb{R}^{m}\), then

  1. 1.

    its gradient \(\nabla S|_{\mathsf{int}}\) is strictly monotone and bijective (by Theorem 1 below);

  2. 2.

    its inverse \(\sigma\doteq\big{(}\nabla S|_{\mathsf{int}}^{\mathsf{T}}\big{)}^{-1}\) exists and is also strictly monotone, continuous, and bijective (by Theorem 3 below).

Throughout the note, we let \(D\) be a given open subset of \(\mathbb{R}^{m}\).

(Strictly) Convex \(\mathrm{C}^{1}\) Functions

As a first step, define convex sets and convex functions as follows:

Definition 1. \(D\) is convex if:

\begin{equation} x,y\in D\quad\Longrightarrow\quad tx+(1-t)y\in D\;\,\forall t\in[0,1].\nonumber \\ \end{equation}

Definition 2. A function \(f:D\to\mathbb{R}\) is convex if

  1. 1)

    \(D\) is convex;

  2. 2)

    for any \(x,y\in D\): \(\quad f(tx+(1-t)y)\leq t\,f(x)+(1-t)\,f(y)\qquad\forall t\in[0,1]\).

\(\;\,f\) is said to be strictly convex if 2) is replaced by

  1. \(\;\,\)2\({}^{\prime}\))

    for any \(x,y\in D\) such that \(x\neq y\), \(\quad f(tx+(1-t)y)<t\,f(x)+(1-t)\,f(y)\qquad\forall t\in(0,1)\).

By a \(\mathrm{C}^{1}\) function, it is meant a function \(f:D\to\mathbb{R}\) whose first partial derivatives exist and all continuous. The following lemma is essential in the proof of the theorems.

Lemma 1. Suppose \(f:D\to\mathbb{R}\) is convex and \(\mathrm{C}^{1}\). Then, for any \(x\), \(y\in D\),

\begin{equation} f(y)\geq f(x)+\nabla f(x)(y-x).\nonumber \\ \end{equation}

Moreover, if \(f\) is strictly convex, then for any \(x\), \(y\in D\) such that \(x\neq y\),

\begin{equation} f(y)>f(x)+\nabla f(x)(y-x).\nonumber \\ \end{equation}

Monotonicity of the Gradients

Here, we show that the gradient of a convex (resp. strictly convex) \(\mathrm{C}^{1}\) function \(f:D\to\mathbb{R}\) is monotone (resp. strictly monotone). For this, we precisely define the monotone function as follows.

Definition 3. A map \(F:D\to\mathbb{R}^{m}\) on a convex open subset \(D\) of \(\mathbb{R}^{m}\) is monotone if for all \(x\), \(y\in D\),

\begin{equation} (y-x)^{\mathsf{T}}(F(y)-F(x))\geq 0.\nonumber \\ \end{equation}

\(F\) is said to be strictly monotone if for all \(x\), \(y\in D\) such that \(x\neq y\),

\begin{equation} (y-x)^{\mathsf{T}}(F(y)-F(x))>0.\nonumber \\ \end{equation}

Theorem 1. Let \(f:D\to\mathbb{R}\) be \(\mathrm{C}^{1}\). Then, its gradient \(\nabla f^{\mathsf{T}}:D\to\mathbb{R}^{m}\) is monotone (resp. strictly monotone) if \(f\) is convex (resp. strictly convex).

Proof. For any \(x\), \(y\in D\), we have

\begin{equation} f(y)\geq f(x)+\nabla f(x)(y-x)\nonumber \\ \end{equation}

by Lemma 1. By choosing the points reverse, we also have

\begin{equation} f(x)\geq f(y)+\nabla f(y)(x-y)\nonumber \\ \end{equation}

Adding these inequalities and rearranging it finally yield

\begin{equation} (\nabla f(y)-\nabla f(x))(y-x)\geq 0.\nonumber \\ \end{equation}

The proof of the strictly convex case can be done in the exactly same manner with the strict inequality ‘\(>\)’ rather than ‘\(\geq\).’ ∎

Properties of the Gradient and its Inverse

When \(f:D\to\mathbb{R}\) is strictly convex and \(\mathrm{C}^{1}\) function, then its gradient \(\nabla f^{\mathsf{T}}:D\to\mathbb{R}^{m}\) is an injective mapping. This can be shown via applying the following lemma:

Lemma 2. A map \(F:D\to\mathbb{R}^{m}\) is injective if it is strictly monotone.

Proof. The proof can be easily done by contradiction. Suppose there exist \(x\), \(y\in D\) such that \(F(x)=F(y)\) but \(x\neq y\). Then, we can directly see the contradiction from the definition of the strict monotonicity as

\begin{equation} 0=(y-x)^{\mathsf{T}}\mathbf{0}=(y-x)^{\mathsf{T}}(F(y)-F(x))>0,\nonumber \\ \end{equation}

which implies “\(0>0\),” where \(\mathbf{0}\) is the zero vector in \(\mathbb{R}^{m}\). Hence, if \(x\), \(y\in D\) satisfies \(F(x)=F(y)\), then \(x=y\), so that \(F\) is injective. ∎

Theorem 2. Suppose \(f:D\to\mathbb{R}\) is strictly convex and \(\mathrm{C}^{1}\). Then, its gradient \(\nabla f^{\mathsf{T}}\) is an injective mapping.

Proof. By Theorem 1, \(\nabla f^{\mathsf{T}}\) is strictly monotone, and hence it is injective by Lemma 2. ∎

The key point given by Theorem 2 is that if \(\mathrm{Im}(\nabla f^{\mathsf{T}})=\mathbb{R}^{m}\), then \(\nabla f\) is now bijective, so that the inverse function \((\nabla f^{\mathsf{T}})^{-1}:\mathbb{R}^{m}\to D\) is defined over \(\mathbb{R}^{m}\). This inverse function has the following properties.

Theorem 3. Suppose \(f:D\to\mathbb{R}\) is strictly convex and \(\mathrm{C}^{1}\). If \(\mathrm{Im}(\nabla f^{\mathsf{T}})=\mathbb{R}^{m}\), then

  1. 1.

    \(\nabla f\) is bijective and the inverse function \((\nabla f^{\mathsf{T}})^{-1}:\mathbb{R}^{m}\to\mathcal{D}\) is defined over \(\mathbb{R}^{m}\);

  2. 2.

    the inverse \((\nabla f^{\mathsf{T}})^{-1}\) is also strictly monotone, bijective, and continuous.

Proof. The first argument and the fact that the inverse \((\nabla f^{\mathsf{T}})^{-1}\) is also bijective are obvious by the above discussion. To show strict monotonicity, suppose \(z,w\in\mathbb{R}^{m}\) and let \(x^{\mathsf{T}}\doteq(\nabla f)^{-1}(z)\) and \(y^{\mathsf{T}}\doteq(\nabla f)^{-1}(w)\). Then, since \(\nabla f\) (and \((\nabla f)^{-1}\)) is bijective, we have \(z=\nabla f^{\mathsf{T}}(x)\) and \(w=\nabla f^{\mathsf{T}}(y)\). Therefore, by strict monotonicity of \(\nabla f\) (Theorem 1), we conclude that

\begin{equation} \big{(}(\nabla f)^{-1}(z)-(\nabla f)^{-1}(w)\big{)}(z-w)=(x-y)^{\mathsf{T}}(\nabla f^{\mathsf{T}}(x)-\nabla f^{\mathsf{T}}(y))>0,\nonumber \\ \end{equation}

meaning that \((\nabla f^{\mathsf{T}})^{-1}\) is also strictly monotone. Finally, as \(\nabla f^{\mathsf{T}}\) is bijective and continuous, the application of invariance of domain [4] proves continuity of \((\nabla f^{\mathsf{T}})^{-1}\).     ∎