James Shirley edited CUDA_Implementation.tex  about 11 years ago

Commit id: 5ad01f5832847d3407094c6a6fcbe11947802d53

Using cuBLAS is fairly simple. For example, to parallelize a dot product over two single-precision floating-point arrays you can use the cublasSdot() function. This function takes the cuBLAS handle, the length of the vectors, pointers to the two arrays along with their strides, and a pointer for the result. The obvious benefit of using functions such as this is that it takes out the tedious work of writing and launching GPU kernels yourself. For our working set we have used cuBLAS to parallelize a dot product in the sampleEffectsBayesC function, where currentDataLocus needs to be combined with the current RInverseY using a dot product. The original code base did this like so:

\verb|float rhsModeli = (currentDataLocus.dot(RInverseY)) + diagLhs(i) * oldSamplei;|

To do this in cuBLAS you can do the following:

\verb|float rhsModeli; cublasSdot(cublasHandle, numObs, currentDataLocus.data(), 1, RInverseY.data(), 1, &rhsModeli); rhsModeli += (diagLhs(i) * oldSamplei);|

You must also set up the handle that cuBLAS uses to access the GPU, which can be done like so:

\verb|cublasHandle_t cublasHandle = 0; cublasStatus_t cublasStatus; cublasStatus = cublasCreate(&cublasHandle);|

Although in this example using cuBLAS looks more complicated, normally you could not simply call dot() on a float array; that is only possible above because currentDataLocus and RInverseY are matvec objects.
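As a rough illustration of how these pieces fit together end to end, the sketch below is a minimal, self-contained program that computes the same kind of dot product with cublasSdot(). It is an assumption-laden example, not the project's actual code: the small host arrays hostLocus and hostRInvY stand in for the real currentDataLocus and RInverseY matvec objects, the vectors are assumed to be contiguous (stride 1), and, because cublasSdot() operates on device pointers, the data is first copied to the GPU with cudaMemcpy.

\begin{verbatim}
#include <cuda_runtime.h>
#include <cublas_v2.h>
#include <stdio.h>

int main(void)
{
    // Hypothetical stand-ins for currentDataLocus and RInverseY:
    // two small host vectors of length numObs.
    const int numObs = 4;
    float hostLocus[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float hostRInvY[4] = {0.5f, 0.5f, 0.5f, 0.5f};

    // cublasSdot expects device pointers, so copy the vectors to the GPU.
    float *devLocus = 0, *devRInvY = 0;
    cudaMalloc((void**)&devLocus, numObs * sizeof(float));
    cudaMalloc((void**)&devRInvY, numObs * sizeof(float));
    cudaMemcpy(devLocus, hostLocus, numObs * sizeof(float),
               cudaMemcpyHostToDevice);
    cudaMemcpy(devRInvY, hostRInvY, numObs * sizeof(float),
               cudaMemcpyHostToDevice);

    // Create the cuBLAS handle.
    cublasHandle_t cublasHandle = 0;
    cublasStatus_t cublasStatus = cublasCreate(&cublasHandle);
    if (cublasStatus != CUBLAS_STATUS_SUCCESS) {
        fprintf(stderr, "cublasCreate failed\n");
        return 1;
    }

    // Dot product: stride 1 walks each vector element by element.
    // The result comes back to host memory because the default
    // pointer mode is CUBLAS_POINTER_MODE_HOST.
    float rhsModeli = 0.0f;
    cublasSdot(cublasHandle, numObs, devLocus, 1, devRInvY, 1, &rhsModeli);

    printf("dot product = %f\n", rhsModeli);

    // Clean up.
    cublasDestroy(cublasHandle);
    cudaFree(devLocus);
    cudaFree(devRInvY);
    return 0;
}
\end{verbatim}

This sketch would be built with something like \verb|nvcc dot_example.cu -lcublas| (the file name is arbitrary). In the actual code base the host buffers would instead be those returned by currentDataLocus.data() and RInverseY.data().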