Non-convex optimization formulation

Since \(\mathbf{A}_0\) is complete (square and invertible), the row space of \(\mathbf{Y}\) equals the row space of \(\mathbf{X}_0\). By a result of \cite{spielman2013exact}, under the generative model the rows of \(\mathbf{X}_0\) correspond to the \(n\) sparsest vectors in this row space. Once \(\mathbf{X}_0\) is recovered, \(\mathbf{A}_0\) follows by least squares: \(\mathbf{A}_0 = \mathbf{Y} \mathbf{X}_0^T (\mathbf{X}_0 \mathbf{X}_0^T)^{-1}\).
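
As a quick numerical illustration (our own sketch, not from the paper), the following Python snippet builds a hypothetical instance of the generative model and recovers \(\mathbf{A}_0\) from \(\mathbf{X}_0\) by the least-squares formula above; all names and parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 10, 500

# Hypothetical instance of the generative model: A0 complete, X0 sparse.
A0 = rng.standard_normal((n, n))
X0 = rng.standard_normal((n, p)) * (rng.random((n, p)) < 0.1)  # ~10% nonzeros
Y = A0 @ X0

# Least squares: A0 = Y X0^T (X0 X0^T)^{-1}, solved without explicit inversion.
A0_hat = np.linalg.solve(X0 @ X0.T, X0 @ Y.T).T
print(np.allclose(A0_hat, A0))  # True up to numerical precision
```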

A straightforward formulation for recovering one row of \(\mathbf{X}_0\) is \[\min_{\mathbf{q}} \| \mathbf{q}^{T} \mathbf{Y} \|_0 \quad \text{s.t.} \ \mathbf{q} \neq \mathbf{0}.\] Indeed, taking \(\mathbf{q} = \mathbf{A}_0^{-T} \mathbf{e}_i\) gives \(\mathbf{q}^T \mathbf{Y} = \mathbf{e}_i^T \mathbf{A}_0^{-1} \mathbf{A}_0 \mathbf{X}_0 = \mathbf{e}_i^T \mathbf{X}_0\), the \(i\)-th row of \(\mathbf{X}_0\); see the check below. \(\mathbf{X}_0\) can then be recovered row by row by deflation.
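
A minimal check of this identity, reusing the same toy setup as above (hypothetical names; the index \(i\) is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 10, 500
A0 = rng.standard_normal((n, n))                      # complete (invertible) dictionary
X0 = rng.standard_normal((n, p)) * (rng.random((n, p)) < 0.1)
Y = A0 @ X0

i = 3
q = np.linalg.inv(A0).T[:, i]   # q = A0^{-T} e_i
row = q @ Y                     # q^T Y = e_i^T A0^{-1} A0 X0 = i-th row of X0
print(np.allclose(row, X0[i]))  # True: q^T Y is a sparse row of X0
```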

However, the \(\ell_0\) objective is discontinuous, and the scale of \(\mathbf{q}\) is unidentifiable: any nonzero rescaling of \(\mathbf{q}\) leaves the support of \(\mathbf{q}^T \mathbf{Y}\) unchanged. The problem above is therefore recast as

\[\min_{\mathbf{q}} f(\mathbf{q}; \hat{\mathbf{Y}}) = \frac{1}{p} \sum_{k=1}^p h_{\mu} (\mathbf{q}^T \hat{\mathbf{y}}_k) \quad \text{s.t.} \ \|\mathbf{q}\|_2 = 1,\] where \(\hat{\mathbf{Y}}\) is the data after preconditioning and \(\hat{\mathbf{y}}_k\) is its \(k\)-th column. The loss \(h_{\mu}\) is a smooth convex surrogate for the absolute value, namely

\[h_{\mu} (z) = \mu \log \cosh(z / \mu).\]
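
Since \(h_{\mu}'(z) = \tanh(z/\mu)\), the Euclidean gradient of the objective is \(\nabla f(\mathbf{q}) = \frac{1}{p} \sum_{k=1}^p \tanh(\mathbf{q}^T \hat{\mathbf{y}}_k / \mu)\, \hat{\mathbf{y}}_k\). The sketch below shows one simple way to attack the sphere-constrained problem, projected Riemannian gradient descent; this is an illustrative sketch, not the paper's actual algorithm, and the function names, step size, and iteration count are our own assumptions.

```python
import numpy as np

def f_and_grad(q, Yhat, mu=1e-2):
    """Smoothed objective f(q; Yhat) and its Euclidean gradient.

    f(q)      = (1/p) * sum_k mu * log cosh(q^T yhat_k / mu)
    grad f(q) = (1/p) * Yhat @ tanh(Yhat^T q / mu)
    """
    p = Yhat.shape[1]
    z = Yhat.T @ q                                   # q^T yhat_k for all k
    # Numerically stable log cosh: log cosh(x) = logaddexp(x, -x) - log 2.
    f = np.mean(mu * (np.logaddexp(z / mu, -z / mu) - np.log(2)))
    grad = (Yhat @ np.tanh(z / mu)) / p
    return f, grad

def sphere_gd(Yhat, steps=500, lr=0.1, mu=1e-2, seed=0):
    """Projected gradient descent over the unit sphere (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    q = rng.standard_normal(Yhat.shape[0])
    q /= np.linalg.norm(q)
    for _ in range(steps):
        _, g = f_and_grad(q, Yhat, mu)
        g_tan = g - (g @ q) * q                      # project onto tangent space at q
        q = q - lr * g_tan
        q /= np.linalg.norm(q)                       # retract back to the sphere
    return q
```

Note that although \(h_{\mu}\) itself is convex, the spherical constraint makes the overall problem non-convex, which is why its landscape properties require separate analysis.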

Next, we summarize some key properties of this problem, which the paper formalizes later.