Related work

Linear algebra on GPUs

Graphics processing units (GPUs), driven by the video game market, have evolved to perform certain specialized kinds of computation much faster than general-purpose CPUs. They are particularly well suited to data-parallel computing in a single-instruction, multiple-data (SIMD) fashion \cite{flynn}, and the advantages of such architectures for linear algebra operations were noted early on \cite{survey}.

Today, with the commoditization of powerful GPUs, the scientific computing community has developed several libraries and toolkits for GPU-enabled linear algebra, including cuBLAS \cite{cublas}, CULA \cite{cula}, MAGMA \cite{magma}, and ViennaCL \cite{viennacl}. Paralleling the Basic Linear Algebra Subprograms (BLAS) \cite{blas}, these libraries implement general matrix-vector and matrix-matrix multiplication operators (\texttt{gemv} and \texttt{gemm}); a sketch of a typical \texttt{gemm} call is given at the end of this section. The CUDA documentation \cite{cudadoc} describes a simple implementation of \texttt{gemm} that exploits the structure of device memory (also sketched below). Research efforts have explored the bottlenecks of matrix multiplication and optimized it both in general \cite{volkov2008benchmarking} and for specific cases such as sparse matrices \cite{sparse,bell2008efficient}, but we are not aware of research on implicitly defined matrices with repeated values. In fact, \cite{efficiency} argue that matrix-matrix multiplication can be suboptimal on GPUs because of the higher cost of fetching data; our work investigates whether this problem can be significantly alleviated through hash-based data compression.
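As a point of reference for the BLAS-style interface mentioned above, the snippet below sketches a cuBLAS \texttt{gemm} call computing $C \leftarrow \alpha AB + \beta C$ for $n \times n$ single-precision matrices already resident in device memory. It is a minimal illustration rather than code from any of the cited works; the wrapper name \texttt{gemm\_example} is ours, and cuBLAS assumes column-major storage.

\begin{verbatim}
#include <cublas_v2.h>

/* Compute C = alpha*A*B + beta*C on the device with cuBLAS.
 * A, B, C are n x n, column-major; dA, dB, dC are device
 * pointers already allocated and filled on the GPU. */
void gemm_example(cublasHandle_t handle,
                  const float *dA, const float *dB,
                  float *dC, int n)
{
    const float alpha = 1.0f;  /* plain product: C = A*B */
    const float beta  = 0.0f;
    cublasSgemm(handle,
                CUBLAS_OP_N, CUBLAS_OP_N,  /* no transposition */
                n, n, n,                   /* m, n, k */
                &alpha, dA, n,             /* A, leading dimension */
                dB, n,                     /* B, leading dimension */
                &beta, dC, n);             /* C, leading dimension */
}
\end{verbatim}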
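The shared-memory tiling scheme described in the CUDA documentation can be sketched as follows; this is a simplified version assuming square row-major matrices whose dimension $n$ is a multiple of the tile size, and the kernel and identifier names are ours. Each thread block computes one $16 \times 16$ tile of $C$, staging the corresponding tiles of $A$ and $B$ through fast on-chip shared memory so that each global-memory element is read once per block rather than once per thread.

\begin{verbatim}
#define TILE 16

/* C = A * B for n x n row-major matrices, n divisible by TILE.
 * Launch as: gemm_tiled<<<dim3(n/TILE, n/TILE),
 *                         dim3(TILE, TILE)>>>(A, B, C, n); */
__global__ void gemm_tiled(const float *A, const float *B,
                           float *C, int n)
{
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < n / TILE; ++t) {
        /* Cooperatively load one tile of A and one tile of B. */
        As[threadIdx.y][threadIdx.x] =
            A[row * n + t * TILE + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] =
            B[(t * TILE + threadIdx.y) * n + col];
        __syncthreads();

        /* Partial dot product over this pair of tiles. */
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();
    }
    C[row * n + col] = acc;
}
\end{verbatim}

Staging operands through shared memory reduces global-memory traffic by a factor of the tile size, which is precisely the data-fetching cost that \cite{efficiency} identify as a bottleneck of \texttt{gemm} on GPUs.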