# MATH stuff

### Complex derivative

Here we define the “complex” derivative of a real-valued function $$f : {\mathbb{C}}^n \to {\mathbb{R}}$$ with respect to its complex variables. The notation $$f : {\mathbb{C}}^n \to {\mathbb{R}}$$ means “$$f$$ is a mapping (or function) from the set of column vectors of size $$n$$ with complex components (denoted $${\mathbb{C}}^n$$) into the set of real numbers (denoted $${\mathbb{R}}$$).”

For a real-valued function $$f$$ of a single complex variable $$x = a + jb \in {\mathbb{C}}$$, $$a,b \in {\mathbb{R}}$$, the complex derivative of $$f$$ with respect to $$x$$ is defined as $\frac{df}{dx} = \frac{df}{da} + j\frac{df}{db}.$
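The definition combines two ordinary real partial derivatives, so it can be approximated numerically. The following is a minimal sketch, assuming central differences with a step size `h`; the helper name `complex_derivative` and the test function `product_of_parts` are illustrative and do not come from the original text.

```python
def complex_derivative(f, x, h=1e-6):
    """Approximate df/dx = df/da + j*df/db for a real-valued f at complex x.

    Both partial derivatives are estimated with central differences
    (an assumption of this sketch, not part of the definition).
    """
    df_da = (f(x + h) - f(x - h)) / (2 * h)            # step along the real part a
    df_db = (f(x + 1j * h) - f(x - 1j * h)) / (2 * h)  # step along the imaginary part b
    return df_da + 1j * df_db


def product_of_parts(x):
    # Hypothetical test function f(a + jb) = a * b, so df/da = b and df/db = a.
    return x.real * x.imag


result = complex_derivative(product_of_parts, 3 + 4j)
print(result)  # close to b + j*a = 4 + 3j
```

For $$f(a + jb) = ab$$ the definition gives $$b + ja$$, which is what the sketch approximates at $$x = 3 + 4j$$.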

#### Example 1.

Given $$x = a + jb \in {\mathbb{C}}$$, $$a,b \in {\mathbb{R}}$$, what is $$D|x|$$?

Solution:
We have $|x| = \sqrt{x^*x} = \sqrt{(a-jb)(a+jb)} = \sqrt{a^2 + b^2}.$ Applying the definition of the complex derivative yields, for $$x \neq 0$$, \begin{aligned} \frac{d|x|}{dx} &= \frac{d|x|}{da} + j\frac{d|x|}{db} \\ &= \frac{2a}{2\sqrt{a^2 + b^2}} + j\frac{2b}{2\sqrt{a^2 + b^2}} \\ &= \frac{a}{\sqrt{a^2 + b^2}} + j\frac{b}{\sqrt{a^2 + b^2}} \\ &= \frac{x}{|x|}. \end{aligned}
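The result of Example 1 says that the derivative of $$|x|$$ is the unit-modulus number pointing in the direction of $$x$$. A short numerical check, assuming central differences and a few arbitrarily chosen nonzero sample points (the helper name and the points are illustrative only):

```python
def complex_derivative(f, x, h=1e-6):
    # df/dx = df/da + j*df/db, each partial estimated by a central difference
    df_da = (f(x + h) - f(x - h)) / (2 * h)
    df_db = (f(x + 1j * h) - f(x - 1j * h)) / (2 * h)
    return df_da + 1j * df_db


sample_points = (3 + 4j, -1 + 2j, 0.5 - 0.5j)  # arbitrary nonzero test values
checks = []
for x in sample_points:
    numeric = complex_derivative(abs, x)   # built-in abs gives |x| for complex x
    closed_form = x / abs(x)               # result derived in Example 1
    checks.append((numeric, closed_form))
    print(x, numeric, closed_form, abs(numeric))
```

The last printed column should be close to 1 at every sample point. The point $$x = 0$$ is deliberately left out, since $$x/|x|$$ is undefined there.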

#### Example 2.

Given $$x = a + jb \in {\mathbb{C}}$$, $$a,b \in {\mathbb{R}}$$, what is $$D|x|^2$$?

Solution:
We have $|x|^2 = x^*x = (a-jb)(a+jb) = a^2 + b^2.$ Applying the definition of the complex derivative yields \begin{aligned} \frac{d|x|^2}{dx} &= \frac{d|x|^2}{da} + j\frac{d|x|^2}{db} \\ &= 2a + j2b \\ &= 2x. \end{aligned}

Suppose $$f: {\mathbb{C}}^n \to {\mathbb{R}}$$ is a real-valued function and $$x \in {\mathop{\bf int}}{\mathop{\bf dom}}f$$. The derivative $$Df(x)$$ is a $$1 \times n$$ matrix (a row vector), defined by $\label{eqn:derivative} Df(x) = \left[ \frac{\partial f}{\partial x_1}(x), \dots, \frac{\partial f}{\partial x_n}(x) \right],$ where each entry is the complex derivative of $$f$$ with respect to the component $$x_i$$, with the other components held fixed.

#### Example 3.

Given $$x = [x_1, \ldots, x_n]^T \in {\mathbb{C}}^n$$ with $$x_i = a_i + jb_i \in {\mathbb{C}}$$, $$a_i,b_i \in {\mathbb{R}}$$, what is $$D\|x\|_{\ell_2}^2$$?

Solution:
We have \begin{aligned} \|x\|_{\ell_2}^2 &= \sum_{i=1}^n |x_i|^2 = \sum_{i=1}^n x_i^*x_i \\ &= \sum_{i=1}^n (a_i +jb_i)^*(a_i +jb_i) \\ &= \sum_{i=1}^n (a_i -jb_i)(a_i +jb_i) \\ &= \sum_{i=1}^n (a_i^2 +b_i^2). \end{aligned} Consider the first element of Equation \ref{eqn:derivative} with $$f(x) = \|x\|_{\ell_2}^2$$. Applying the definition of the complex derivative gives \begin{aligned} \frac{\partial \|x\|_{\ell_2}^2}{\partial x_1} &= \frac{\partial \|x\|_{\ell_2}^2}{\partial a_1} + j\frac{\partial \|x\|_{\ell_2}^2}{\partial b_1} \\ &= \frac{\partial }{\partial a_1} \left(\sum_{i=1}^n (a_i^2 +b_i^2)\right) + j\frac{\partial }{\partial b_1} \left(\sum_{i=1}^n (a_i^2 +b_i^2)\right) \\ &= 2a_1 + j2b_1 \\ &= 2x_1. \end{aligned} The same argument applies to every component, so it follows that \begin{aligned} Df(x) &= \left[ \frac{\partial \|x\|_{\ell_2}^2}{\partial x_1}, \dots, \frac{\partial \|x\|_{\ell_2}^2}{\partial x_n} \right] \\ &= \left[2x_1, \ldots, 2x_n \right] \\ &= 2x^T. \end{aligned}
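The row-vector derivative can be approximated one component at a time, which gives a simple check of Example 3. This is a sketch under stated assumptions: central differences, a plain Python list standing in for the column vector $$x$$, and illustrative names (`derivative_row`, `squared_l2_norm`) that do not appear in the original text.

```python
def derivative_row(f, x, h=1e-6):
    """Approximate Df(x) = [df/dx_1, ..., df/dx_n] for real-valued f on C^n."""
    row = []
    for i in range(len(x)):
        def f_shifted(delta):
            # Evaluate f with only component i perturbed by delta.
            point = list(x)
            point[i] += delta
            return f(point)

        df_da = (f_shifted(h) - f_shifted(-h)) / (2 * h)            # partial w.r.t. a_i
        df_db = (f_shifted(1j * h) - f_shifted(-1j * h)) / (2 * h)  # partial w.r.t. b_i
        row.append(df_da + 1j * df_db)
    return row


def squared_l2_norm(x):
    # ||x||^2 = sum_i |x_i|^2
    return sum(abs(xi) ** 2 for xi in x)


x = [1 + 2j, -0.5j, 3.0 + 0j]  # arbitrary test vector
numeric_row = derivative_row(squared_l2_norm, x)
expected_row = [2 * xi for xi in x]  # 2 x^T from Example 3
print(numeric_row)
print(expected_row)
```

The two printed rows should agree up to the finite-difference error.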

#### Example 4.

Suppose $$A \in {\mathbb{C}}^{m \times n}$$, and $$x = [x_1, \ldots, x_n]^T \in {\mathbb{C}}^n$$ with $$x_i = a_i + jb_i \in {\mathbb{C}}$$, $$a_i,b_i \in {\mathbb{R}}$$. What is $$D(Ax)$$?

Solution:
Since $$f(x) = Ax : {\mathbb{C}}^n \to {\mathbb{C}}^m$$, we have $D(Ax) = \left[ \frac{\partial (Ax)}{\partial x_1}, \dots, \frac{\partial (Ax)}{\partial x_n} \right].$ Since $$Ax \in {\mathbb{C}}^m$$, we express it as $Ax = \left[ \begin{array}{c} (Ax)_1 \\ \vdots \\ (Ax)_m \end{array} \right] = \left[ \begin{array}{c} \sum_{i=1}^n A_{1i}x_i \\ \vdots \\ \sum_{i=1}^n A_{mi}x_i \end{array} \right].$ Here $$Ax$$ is complex-valued and each entry $$(Ax)_k$$ is linear in $$x_i$$, so each partial derivative is the ordinary derivative of a linear function, $$\partial (Ax)_k / \partial x_i = A_{ki}$$. It follows that $\frac{\partial (Ax)}{\partial x_1} = \left[ \begin{array}{c} \frac{\partial (Ax)_1}{\partial x_1} \\ \vdots \\ \frac{\partial (Ax)_m}{\partial x_1} \end{array} \right] = \left[ \begin{array}{c} A_{11} \\ \vdots \\ A_{m1} \end{array} \right].$ Using the expression above for every column, we write the derivative of $$Ax$$ as $D(Ax) = \left[ \begin{array}{ccc} \frac{\partial (Ax)_1}{\partial x_1} & \cdots & \frac{\partial (Ax)_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial (Ax)_m}{\partial x_1} & \cdots & \frac{\partial (Ax)_m}{\partial x_n} \end{array} \right] = \left[ \begin{array}{ccc} A_{11} & \cdots & A_{1n} \\ \vdots & \ddots & \vdots \\ A_{m1} & \cdots & A_{mn} \end{array} \right] = A.$
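Example 4 can be illustrated by building the matrix of partial derivatives column by column. The sketch below assumes each partial derivative is an ordinary difference quotient along $$x_i$$, which is how the example treats the linear map $$Ax$$. The matrix, vector, step size, and helper names are all illustrative.

```python
def matvec(A, x):
    # Matrix-vector product for a matrix stored as a list of rows.
    return [sum(A[r][c] * x[c] for c in range(len(x))) for r in range(len(A))]


def jacobian_of_linear_map(A, x, h=1e-3):
    """Estimate D(Ax): column i is (A(x + h*e_i) - A x) / h."""
    base = matvec(A, x)
    columns = []
    for i in range(len(x)):
        shifted = list(x)
        shifted[i] += h  # perturb only component x_i
        moved = matvec(A, shifted)
        columns.append([(mv - bv) / h for mv, bv in zip(moved, base)])
    # Transpose the list of columns into an m x n list of rows.
    return [list(row) for row in zip(*columns)]


A = [[1 + 1j, 2], [0, 3 - 2j], [1j, -1]]  # arbitrary 3 x 2 complex matrix
x = [0.5 - 1j, 2 + 0.25j]                 # arbitrary test point
jac = jacobian_of_linear_map(A, x)
for row in jac:
    print(row)
```

Because the map is linear, the difference quotient reproduces the entries of $$A$$ up to rounding, independently of the chosen point $$x$$.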

Suppose $$f: {\mathbb{C}}^n \to {\mathbb{R}}$$ is a real-valued function and $$x \in {\mathop{\bf int}}{\mathop{\bf dom}}f$$. The gradient $$\nabla f(x)$$ is an $$n \times 1$$ matrix (a column vector), defined by $\nabla f(x) = Df(x)^T = \left[ \begin{array}{c} \frac{\partial f}{\partial x_1}(x) \\ \vdots \\ \frac{\partial f}{\partial x_n}(x) \end{array} \right].$

Let $$f : {\mathbb{C}}^n \to {\mathbb{C}}^m$$ be differentiable at $$x \in {\mathop{\bf int}}{\mathop{\bf dom}}f$$, and let $$g : {\mathbb{C}}^m \to {\mathbb{R}}$$ be differentiable at $$f(x) \in {\mathop{\bf int}}{\mathop{\bf dom}}g$$. Define the composite function $$h = g \circ f: {\mathbb{C}}^n \to {\mathbb{R}}$$ by $$h(x) = g(f(x))$$, with $${\mathop{\bf dom}}h = \{x \, | \, f(x) \in {\mathop{\bf dom}}g\}$$. Then $$h$$ is differentiable at $$x$$, with derivative $Dh(x) = Dg(f(x))Df(x).$ Taking the transpose of $$Dh(x) = Dg(f(x))Df(x)$$ gives the gradient of $$h(x)$$: \begin{aligned} \nabla h(x) &= Dh(x)^T \\ &= (Dg(f(x))Df(x))^T \\ &= Df(x)^T Dg(f(x))^T \\ &= Df(x)^T \nabla g(f(x)). \end{aligned}

Suppose $$f : {\mathbb{C}}^n \to {\mathbb{C}}^m$$ and $$x \in {\mathop{\bf int}}{\mathop{\bf dom}}f$$. If $f(x) = \left[ \begin{array}{c} f_1(x_1, \ldots, x_n) \\ \vdots \\ f_m(x_1, \ldots, x_n) \end{array} \right],$ then the derivative of $$f$$ at $$x$$, denoted $$Df(x) \in {\mathbb{C}}^{m \times n}$$, is given by \begin{aligned} Df(x) &= \left[ \frac{\partial f}{\partial x_1}(x), \dots, \frac{\partial f}{\partial x_n}(x) \right] \\ &= \left[ \begin{array}{ccc} \frac{\partial f_1}{\partial x_1}(x_1, \ldots, x_n) & \cdots & \frac{\partial f_1}{\partial x_n}(x_1, \ldots, x_n) \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1}(x_1, \ldots, x_n) & \cdots & \frac{\partial f_m}{\partial x_n}(x_1, \ldots, x_n) \\ \end{array} \right]. \end{aligned}


Here we provide a detailed derivation of the “complex” gradient of each term in the cost function defined in Equation [A1]. Since by definition the “complex” gradient is the transpose of the “complex” derivative, we first find expressions in terms of derivatives. The cost function $$f : {\mathbb{C}}^{{n_v}} \to {\mathbb{R}}$$ is given by $f(m) = \| {\mathcal{F}_u}m - y\|^2_{\ell_2} + \lambda \| \Psi m\|_{\ell_1}.$ The “complex” derivative of the cost function $$f$$ at $$m$$, denoted $$Df(m)$$, is a $$1 \times {n_v}$$ row vector given by $Df(m) = D\| {\mathcal{F}_u}m - y\|^2_{\ell_2} + \lambda D\| \Psi m\|_{\ell_1}.$ We consider each term separately.

For the first term, $$D\| {\mathcal{F}_u}m - y\|^2_{\ell_2}$$, we apply the chain rule to the composite function $$h({\mathcal{F}_u}m - y)$$, where $$h(x) = \|x\|^2_{\ell_2} : {\mathbb{C}}^{{n_k}} \to {\mathbb{R}}$$. Using $$Dh(x) = 2x^T$$ from Example 3 and $$D({\mathcal{F}_u}m - y) = {\mathcal{F}_u}$$ from Example 4 (the constant $$y$$ does not contribute), it follows that \begin{aligned} D\| {\mathcal{F}_u}m - y\|^2_{\ell_2} &= Dh({\mathcal{F}_u}m - y) D({\mathcal{F}_u}m - y) \\ &= 2({\mathcal{F}_u}m - y)^T {\mathcal{F}_u}. \end{aligned}

The second term involves the absolute value, which is not differentiable at zero. Lustig et al. used a smooth approximation of the absolute value of a complex number $$x = a + jb \in {\mathbb{C}}$$, given as $|x| \approx \sqrt{x^*x + \mu},$ where $$\mu$$ is a positive smoothing parameter. With this approximation, applying the definition of the complex derivative yields \begin{aligned} \frac{d|x|}{dx} &= \frac{d|x|}{da} + j\frac{d|x|}{db} \\ &\approx \frac{d}{da} \left( \sqrt{(a + jb)^*(a + jb) + \mu} \right) + j\frac{d}{db} \left( \sqrt{(a + jb)^*(a + jb) + \mu} \right) \\ &= \frac{d}{da} \left( \sqrt{a^2 + b^2 + \mu} \right) + j\frac{d}{db} \left( \sqrt{a^2 + b^2 + \mu} \right) \\ &= \frac{2a}{2\sqrt{a^2 + b^2 + \mu}} + j\frac{2b}{2\sqrt{a^2 + b^2 + \mu}} \\ &= \frac{x}{\sqrt{x^*x + \mu}}. \end{aligned}

For the second term, we apply the chain rule to the composite function $$h = g \circ f : {\mathbb{C}}^{{n_v}} \to {\mathbb{R}}$$ defined by $$h(m) = g(f(m))$$, with $$g(x) = \|x\|_{\ell_1} : {\mathbb{C}}^{{n_k}} \to {\mathbb{R}}$$ and $$f(m) = \Psi m : {\mathbb{C}}^{{n_v}} \to {\mathbb{C}}^{n_k}$$. Using the definition of the $$\ell_1$$-norm $\|x\|_{\ell_1} = \sum_{i=1}^{{n_k}} |x_i|,$ and the smooth approximation, the derivative of $$g$$ at $$x$$ is given by \begin{aligned} Dg(x) &= \left[ \frac{\partial g}{\partial x_1}(x), \dots, \frac{\partial g}{\partial x_{{n_k}}}(x) \right] \\ &= \left[ \frac{\partial }{\partial x_1} \left(\sum_{i=1}^{{n_k}} |x_i|\right), \dots, \frac{\partial }{\partial x_{{n_k}}} \left(\sum_{i=1}^{{n_k}} |x_i|\right) \right] \\ &= \left[ \frac{d |x_1|}{d x_1}, \dots, \frac{d |x_{{n_k}}|}{d x_{{n_k}}} \right] \\ &\approx \left[ \frac{x_1}{\sqrt{x_1^*x_1 + \mu}}, \dots, \frac{x_{{n_k}}}{\sqrt{x_{{n_k}}^*x_{{n_k}} + \mu}} \right]. \end{aligned} Therefore, the derivative of the second term is \begin{aligned} \label{eqn:derivative_2nd_term} D\|\Psi m\|_{\ell_1} &= Dh(m) \nonumber\\ &= Dg(f(m)) Df(m) \nonumber\\ &= Dg(\Psi m) D(\Psi m) \nonumber\\ &\approx \left[ \frac{(\Psi m)_1}{\sqrt{(\Psi m)_1^*(\Psi m)_1 + \mu}}, \dots, \frac{(\Psi m)_{{n_k}}}{\sqrt{(\Psi m)_{{n_k}}^*(\Psi m)_{{n_k}} + \mu}} \right] \Psi.\end{aligned}

Taking the transpose of Equation \ref{eqn:derivative_2nd_term} and using $$(v\Psi)^T = \Psi^T v^T$$ yields \begin{aligned} \nabla \|\Psi m\|_{\ell_1} &\approx \left( \left[ \frac{(\Psi m)_1}{\sqrt{(\Psi m)_1^*(\Psi m)_1 + \mu}}, \dots, \frac{(\Psi m)_{{n_k}}}{\sqrt{(\Psi m)_{{n_k}}^*(\Psi m)_{{n_k}} + \mu}} \right] \Psi \right)^T \\ &= \Psi^T \left[ \begin{array}{c} \frac{(\Psi m)_1}{\sqrt{(\Psi m)_1^*(\Psi m)_1 + \mu}} \\ \vdots \\ \frac{(\Psi m)_{{n_k}}}{\sqrt{(\Psi m)_{{n_k}}^*(\Psi m)_{{n_k}} + \mu}} \end{array} \right]. \end{aligned}
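A finite-difference check is one way to sanity-check gradient expressions of this kind. The sketch below is an illustration under stated assumptions, not the reference implementation of any particular solver: small real-valued matrices `A` and `Psi` stand in for $${\mathcal{F}_u}$$ and $$\Psi$$, the $$\ell_1$$ term uses the smoothed absolute value, and all names and test values are hypothetical. Real-valued matrices are used deliberately, because transpose and conjugate transpose coincide for them. Complex operators such as a Fourier matrix need the conjugation handled with care and are not covered by this check.

```python
import math


def matvec(M, v):
    # Matrix-vector product for a matrix stored as a list of rows.
    return [sum(M[r][c] * v[c] for c in range(len(v))) for r in range(len(M))]


def transpose(M):
    return [list(col) for col in zip(*M)]


def cost(m, A, y, Psi, lam, mu):
    # ||A m - y||^2 + lam * sum_i sqrt(|(Psi m)_i|^2 + mu)
    residual = [u - v for u, v in zip(matvec(A, m), y)]
    data_term = sum(abs(r) ** 2 for r in residual)
    sparsity_term = sum(math.sqrt(abs(c) ** 2 + mu) for c in matvec(Psi, m))
    return data_term + lam * sparsity_term


def analytic_gradient(m, A, y, Psi, lam, mu):
    # 2 A^T (A m - y) + lam * Psi^T w, with w_i = (Psi m)_i / sqrt(|(Psi m)_i|^2 + mu)
    residual = [u - v for u, v in zip(matvec(A, m), y)]
    coeffs = matvec(Psi, m)
    w = [c / math.sqrt(abs(c) ** 2 + mu) for c in coeffs]
    g_data = matvec(transpose(A), residual)
    g_l1 = matvec(transpose(Psi), w)
    return [2 * gd + lam * gl for gd, gl in zip(g_data, g_l1)]


def numerical_gradient(f, m, h=1e-6):
    # Component i is df/da_i + j*df/db_i, estimated by central differences.
    grad = []
    for i in range(len(m)):
        def f_shifted(delta):
            point = list(m)
            point[i] += delta
            return f(point)

        d_a = (f_shifted(h) - f_shifted(-h)) / (2 * h)
        d_b = (f_shifted(1j * h) - f_shifted(-1j * h)) / (2 * h)
        grad.append(d_a + 1j * d_b)
    return grad


# Arbitrary small test problem (real matrices, complex data and image).
A = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5]]
Psi = [[1.0, -1.0], [0.5, 2.0]]
y = [1 + 1j, 2 - 1j, 0.5j]
m = [0.3 - 0.2j, -1.0 + 0.7j]
lam, mu = 0.1, 1e-3

num = numerical_gradient(lambda p: cost(p, A, y, Psi, lam, mu), m)
ana = analytic_gradient(m, A, y, Psi, lam, mu)
max_diff = max(abs(n - a) for n, a in zip(num, ana))
print(max_diff)  # should be small compared with the gradient entries
```

If the printed difference is not small, either the analytic expression or the finite-difference step deserves a second look.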