Ben Hirsch edited CUDA_Implementation.tex  about 11 years ago

Commit id: 7d6b4fa69e8f1c32986421f390fbc53ee5e267bd

\item Sample the locus effect variance
\item Accumulate the posterior mean of the probability distribution
\end{enumerate}
\end{enumerate}

\subsection{Parallelizing Linear Algebra using cuBLAS}

A downside of the first approach is that, while each locus's effect computation gets its own thread, every thread performs a wide variety of matrix and vector operations. Linear algebra is ripe for parallelization: each element of an operation such as a matrix multiplication, dot product, or scalar multiplication can be computed independently of the others. cuBLAS is a library that parallelizes the BLAS (Basic Linear Algebra Subprograms) routines using CUDA \cite{cublas}. We chose to examine the speedup obtained by moving the matrix and vector math to the GPU while keeping the sequential loop over loci on the CPU. The algorithm overview stays the same as in the overview of the BayesC algorithm; the only difference is that rather than using the CPU-bound linear algebra library Eigen, we use cuBLAS for all of the data manipulation.
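The structure of this second approach can be sketched as follows. This is a minimal illustration, not the project's actual code: the names (\texttt{d\_X}, \texttt{d\_ycorr}, \texttt{beta}) are hypothetical, the genotype matrix is assumed to be stored column-major on the device, and the Gibbs sampling of the effect and its variance is replaced by a plain least-squares update for brevity. Only the vector arithmetic runs on the GPU; the loop over loci remains sequential on the CPU.

```cuda
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <vector>

// One pass over all loci, with every vector operation done by cuBLAS.
// d_X      : n x p genotype matrix on the device, column-major
// d_ycorr  : length-n phenotype vector corrected for all current effects
// beta     : current effect estimates, held on the host
void locus_pass(cublasHandle_t handle, const double *d_X, double *d_ycorr,
                std::vector<double> &beta, int n, int p) {
    for (int j = 0; j < p; ++j) {
        const double *d_xj = d_X + static_cast<size_t>(j) * n;  // column j

        // Add back this locus's current contribution: ycorr += beta[j] * X_j
        cublasDaxpy(handle, n, &beta[j], d_xj, 1, d_ycorr, 1);

        double rhs = 0.0, xpx = 0.0;
        cublasDdot(handle, n, d_xj, 1, d_ycorr, 1, &rhs);  // X_j' * ycorr
        cublasDdot(handle, n, d_xj, 1, d_xj, 1, &xpx);     // X_j' * X_j

        // The sampling step stays on the CPU; a least-squares update
        // stands in for it here (illustrative only).
        beta[j] = rhs / xpx;

        // Subtract the updated contribution: ycorr -= beta[j] * X_j
        double neg = -beta[j];
        cublasDaxpy(handle, n, &neg, d_xj, 1, d_ycorr, 1);
    }
}
```

Because each \texttt{cublasDdot} and \texttt{cublasDaxpy} call launches a GPU kernel over an $n$-element vector, the per-locus arithmetic is parallel even though the outer loop over loci is not.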