Parallel implementations of a TV-\(L^{1}\) image-denoising algorithm

I describe parallel implementations of a total variational image-denoising algorithm. Specifically, the primal-dual formulation of the TV-\(L^{1}\) model is implemented in both the shared memory and distributed memory paradigms using the Open Multi-Processing (OpenMP) API and the Message Passing Interface (MPI), respectively. Experiments on the Stampede supercomputer show good weak and strong scaling performance. Weak scaling performance suggests that the distributed implementation may be particularly suited to very large problems where the overhead associated with passing and receiving messages is much less than the local work.


Imaging acquisition involves many hardware and software stages that introduce error sources. This is seen as visual artifacts in the image, typically recognized as noise. This can be especially noticeable in images acquired with low levels of illumination, such as night photography.

Two common manifestations of noise in digital images are Gaussian noise and salt-and-pepper noise. Gaussian noise is typically associated with errors in detection. It produces pixel values that vary within a quasi-normally distributed range about the ”true” value at that point in the image. Salt-and-pepper noise typically arises from transmission errors, and the pixel value is recorded as either fully on or fully off (in grayscale, white or black). Removing these artifacts can be desirable from an aesthetic perspective or in order to pre-process images for other workflows. Common techniques to address this include Gaussian blurring, which takes the convolution of an image with a Gaussian kernel window, and median filtering, which replaces pixels with the median value of a sliding window. These techniques can reduce noise; however, they are also susceptible to blurring edges of features.

The total variation technique was introduced in 1992 by Rudin, Osher and Fatemi (ROF) (Rudin 1992) as an alternative denoising method. The method works by iteratively constructing a function \(u\) on a domain \(\Omega\) that minimizes its energy with an input function \(f\):

\begin{equation} \min_{u}\int_{\Omega}\left\|{\nabla{u}}\right\|+\lambda\int_{\Omega}(u-f)^{2}\\ \end{equation}

with \(\|\cdot\|\) being the \(L^{2}\) norm. The first term is the total variation of the image. It is a regularizer for the minimization function. The second term also includes an \(L^{2}\) norm, which means that the problem is convex and has a unique solution (Chambolle 2009).

Replacing the second term above with an \(L^{1}\) norm leads to the TV-\(L^{1}\) model:

\begin{equation} \min_{u}\int_{\Omega}\left\|{\nabla{u}}\right\|+\lambda\int_{\Omega}\left|{u-f}\right|\\ \end{equation}

The TV-\(L^{1}\) model is not a (necessarily) convex model; however, when applied to discrete images, the model tends to offer better performance at removing salt-and-pepper noise from images than the ROF model and is thus an important image processing technique. It is a standard method implemented in many computer vision packages, including OpenCV (opencv: {Open} {Sourc...).


For discrete images in a TV-\(L^{1}\) model, \(\Omega\subset\mathbb{R}^{2}\) and the integrals become sums over the pixels. The discretization of the TV-\(L^{1}\) problem is well suited for parallelization. At all stages of the calculation, only immediate neighbors are needed. Each stage of the calculation is also completely inside of a loop over each pixel, making the shared memory OpenMP implementation particularly straightforward.

A pseudocode representation of the algorithm is given below. The interpretation of the parameters are not included here; the function descriptions are provided to illustrate the algorithm dependencies. For full details of the parameters, see (Chambolle 2010).


\label{euclid}Primal-dual TV-\(L^{1}\) denoising (after Chambolle and Pock) {algorithmic}[1] \ProcedureTVL1 \State\(X_{0}\leftarrow image\) \State\((px,py)\leftarrow\nabla(X_{0})\) \For\(iter:=0,maxiter\) \ForAll\((i,j)\in X_{iter}\): \State\((dx,dy)\leftarrow\nabla(X_{iter})\) \State\((px,py)\leftarrow\textbf{{project}}(px+\sigma dx,py+\sigma dy)\) \State\(nablat\leftarrow\nabla^{T}(px,py)\) \State\(X^{\prime}\leftarrow\textbf{{shrink}}(X_{iter}-\tau nablat,\lambda\tau)\) \State\(X_{iter+1}\leftarrow X_{iter}+(X_{iter}-X^{\prime})\) \EndFor\EndFor\Return\(X_{maxiter}\)\CommentThe final update. \EndProcedure

The calculation of the gradient for an image \(u\) on \(i\) by \(j\) pixels is given by

\begin{equation} \nabla{u_{i,j}}=\left(\begin{matrix}dx_{i,j}\\ dy_{i,j}\end{matrix}\right)=\left(\begin{matrix}u_{i+1,j}-u_{i,j}\\ u_{i,j+1}-u_{i,j}\end{matrix}\right)\\ \end{equation}

The project function takes the partial gradients \(dx\) and \(dy\) and projects them into a unit circle: If the length of the local gradient is larger than unity, the gradient is clipped by the unit circle; otherwise it is unchanged.

\begin{aligned} \nonumber \\ \textbf{anorm}_{i,j} & =\sqrt{dx^{2}_{i,j}+dy^{2}_{i,j}} \\ \textbf{project}\left(\begin{matrix}dx_{i,j}\\ dy_{i,j}\end{matrix}\right) & =\begin{cases}\left(\begin{matrix}dx_{i,j}\\ dy_{i,j}\end{matrix}\right)&\textbf{anorm}_{i,j}\leq 1\\ \frac{1}{\textbf{anorm}_{i,j}}\left(\begin{matrix}dx_{i,j}\\ dy_{i,j}\end{matrix}\right)&\textrm{otherwise}\end{cases}\\ \end{aligned}

The reverse transformation from projected gradients to pixel values is constructed directly from the gradient definition above.

\begin{equation} \nabla^{T}\left(\begin{matrix}dx_{i,j}\\ dy_{i,j}\end{matrix}\right)=dx_{i-1,j}-dx_{i,j}+dy_{i,j-1}-dy_{i,j}\\ \end{equation}

Finally, the shrink function clips the allowable difference of the transformed image from the \(iter\)-th solution \(X_{iter}\). The solution is then updated from this clipped value.