James Shirley generating latex version of article  about 11 years ago

Commit id: 9577e441a0e9ffa325e7ac849305850b656d9a0c


\documentclass[]{article}
\usepackage{amssymb,amsmath}
\usepackage{ifxetex,ifluatex}
\ifxetex
  \usepackage{fontspec,xltxtra,xunicode}
  \defaultfontfeatures{Mapping=tex-text,Scale=MatchLowercase}
\else
  \ifluatex
    \usepackage{fontspec}
    \defaultfontfeatures{Mapping=tex-text,Scale=MatchLowercase}
  \else
    \usepackage[utf8]{inputenc}
  \fi
\fi
\ifxetex
  \usepackage[setpagesize=false, % page size defined by xetex
              unicode=false, % unicode breaks when used with xetex
              xetex,
              colorlinks=true,
              linkcolor=blue]{hyperref}
\else
  \usepackage[unicode=true,
              colorlinks=true,
              linkcolor=blue]{hyperref}
\fi
\hypersetup{breaklinks=true, pdfborder={0 0 0}}
\setlength{\parindent}{0pt}
\setlength{\parskip}{6pt plus 2pt minus 1pt}
\setlength{\emergencystretch}{3em} % prevent overfull lines
\setcounter{secnumdepth}{0}
\author{James Shirley}

\begin{document}

\textbf{Abstract}. NVIDIA's CUDA framework has brought supercomputing to
the masses, allowing programmers to take advantage of the highly parallel
capabilities of their Graphics Processing Units. We analyzed the codebase
of a popular Genomic Selection software package and identified key areas
that could benefit from parallelization. We parallelized those areas using
the CUDA C++ language extensions and found X speedup.

\section{Introduction}

Affordable Graphics Processing Units (GPUs) have revolutionized the
personal computing industry. GPUs offer massively parallel, many-core
processing capabilities at an affordable cost. NVIDIA's CUDA (Compute
Unified Device Architecture) is a framework and an extension to the C
language that lets programmers use the parallel architecture of GPUs for
general-purpose programming.
This general-purpose programming model effectively gives the programmer a
commodity supercomputer~\cite{1}.

The high performance of general-purpose graphics processing units
(GPGPUs) has made them an attractive target for numerous numerical
applications in science and engineering. GenSel is a piece of software,
written mainly by Rohan Fernando in C++, that performs analyses related
to Genomic Selection. It uses information about animals' genotypes and
phenotypes to make inferences on the effect of each marker locus on the
phenotypic output (?). It uses Bayesian analyses with Markov chain Monte
Carlo (MCMC) methods to compute the posterior probabilities.

Programmers have previously had success parallelizing algorithms that use
MCMC methods. This paper describes where the GenSel software can be
parallelized and presents some preliminary results of parallelizing the
BayesC method.

\section{BayesC Algorithm}

Genomic selection involves using purebred (PB) animals to improve
performance when crossbreeding or breeding with other PB animals.
Evaluating each animal for crossbreeding performance involves estimating
the effects of single nucleotide polymorphisms (SNPs) on crossbred
performance, using the phenotypes and genotypes from crossbreds, and
correlating them to purebred performance.

\subsection{Bayesian Estimation of SNP Effects}

Marker effects were estimated using the BayesC algorithm presented by
Kizilkaya et al.~\cite{2}.
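
For orientation, a BayesC-type model is commonly written as shown here.
This is a sketch under stated assumptions, not necessarily the exact
specification used by GenSel or in \cite{2}: the symbols $\mathbf{y}$,
$\mu$, $\mathbf{x}_j$, $\alpha_j$, $\delta_j$, $\pi$, $k$,
$\sigma^2_\alpha$, and $\sigma^2_e$ are introduced only for illustration.
\begin{align*}
  \mathbf{y} &= \mathbf{1}\mu + \sum_{j=1}^{k} \mathbf{x}_j \alpha_j \delta_j + \mathbf{e}, &
  \mathbf{e} &\sim N(\mathbf{0}, \mathbf{I}\sigma^2_e), \\
  \alpha_j &\sim N(0, \sigma^2_\alpha), &
  \Pr(\delta_j = 1) &= 1 - \pi .
\end{align*}
In this notation $\mathbf{y}$ holds the phenotypes, $\mu$ is the
intercept, $\mathbf{x}_j$ holds the genotype covariates at locus $j$ of
$k$ loci, $\alpha_j$ is the effect of that locus, and
$\delta_j \in \{0, 1\}$ indicates whether the locus is included in the
model. A single effect variance $\sigma^2_\alpha$ shared by all loci is
assumed.
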
The algorithm uses MCMC methods.

\subsection{Algorithm Overview}

\begin{enumerate}
\item For $i$ in $[1..\text{chainLength}]$:
  \begin{enumerate}
  \item Sample the residual variance
  \item Sample the intercept
  \item For each $j$ in $[0..\text{numberOfLoci}]$:
    \begin{enumerate}
    \item Adjust the phenotypes for the current locus $j$
    \item Calculate the variance for the current locus
    \item Sample from a uniform distribution
    \item If probability is less than random variable: Something
    \item Else: Something else
    \end{enumerate}
  \item Sample the locus effect variance
  \item Accumulate the posterior mean of the probability distribution
  \end{enumerate}
\end{enumerate}

\section{BayesC In CUDA}

CUDA gives programmers flexibility in how they parallelize their code. It
provides powerful thread-level parallelism that allows a loop to be
effectively ``unrolled'' so that, conceptually, every iteration runs
simultaneously. For parallelization we targeted the inner loop, which
samples the effect of each locus.

\subsection{Loci Parallelization Overview}

\begin{enumerate}
\item Move the genotype and phenotype data to the GPU
\item Launch a kernel with one thread per locus
\item Re-form the matrices and vectors on the GPU
\item Compute the effect of each locus
\item Move the data back to the CPU
\end{enumerate}

\subsection{CUDA Library Usage}

A shortcoming of the current CUDA architecture is that kernels cannot
launch other kernels.

\section{Results}

\section{Conclusion}

\end{document}