James Shirley generating latex version of article
about 11 years ago
Commit id: 9577e441a0e9ffa325e7ac849305850b656d9a0c
diff --git a/CUDA Implementation.tex b/CUDA Implementation.tex
index e006927..e69de29 100644
--- a/CUDA Implementation.tex
+++ b/CUDA Implementation.tex
...
\section{BayesC In CUDA}
CUDA gives programmers flexibility in how they parallelize their code. Its thread-level parallelism effectively ``unrolls'' a loop: each iteration runs in its own thread, and conceptually all iterations run simultaneously. For parallelization we targeted the inner loop, which samples the effect of each locus.
\subsection{Loci Parallelization Overview}
\begin{enumerate}
\item Move the genotype and phenotype data to the GPU
\item Launch a kernel with one thread for each locus
\item Reform the matrices and vectors on the GPU
\item Compute the effect of each locus
\item Move the data back to the CPU
\end{enumerate}
\subsection{CUDA Library Usage}
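A shortcoming of CUDA devices with compute capability below 3.5 is that kernels cannot launch other kernels; dynamic parallelism, which removes this restriction, requires compute capability 3.5 or higher.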
diff --git a/Final Paper.tex b/Final Paper.tex
new file mode 100644
index 0000000..30e054f
--- /dev/null
+++ b/Final Paper.tex
...
\documentclass[]{article}
\usepackage{amssymb,amsmath}
\usepackage{ifxetex,ifluatex}
\ifxetex
\usepackage{fontspec,xltxtra,xunicode}
\defaultfontfeatures{Mapping=tex-text,Scale=MatchLowercase}
\else
\ifluatex
\usepackage{fontspec}
\defaultfontfeatures{Mapping=tex-text,Scale=MatchLowercase}
\else
\usepackage[utf8]{inputenc}
\fi
\fi
\ifxetex
\usepackage[setpagesize=false, % page size defined by xetex
unicode=false, % unicode breaks when used with xetex
xetex,
colorlinks=true,
linkcolor=blue]{hyperref}
\else
\usepackage[unicode=true,
colorlinks=true,
linkcolor=blue]{hyperref}
\fi
\hypersetup{breaklinks=true, pdfborder={0 0 0}}
\setlength{\parindent}{0pt}
\setlength{\parskip}{6pt plus 2pt minus 1pt}
\setlength{\emergencystretch}{3em} % prevent overfull lines
\setcounter{secnumdepth}{0}
\author{James Shirley}
\begin{document}
\newcommand{\truncateit}[1]{\truncate{0.8\textwidth}{#1}}
\textbf{Abstract}. NVIDIA's CUDA framework has brought supercomputing to
the masses, allowing programmers to take advantage of the highly
parallel capabilities of their graphics processing units (GPUs). We
analyzed the codebase of GenSel, a popular genomic selection package,
and identified key areas that could benefit from parallelization. We
then parallelized these areas using the CUDA C++ language extensions and
obtained a speedup of X.
\section{Introduction}
Affordable graphics processing units (GPUs) have revolutionized the
personal computing industry by offering massively parallel, many-core
processing capabilities at low cost. NVIDIA's CUDA (Compute Unified
Device Architecture) is a framework and an extension to the C language
that lets programmers use the parallel architecture of GPUs for
general-purpose programming, effectively giving them a commodity
supercomputer~\cite{1}.

The high performance of general-purpose graphics processing units
(GPGPUs) has made them an attractive target for numerous numerical
applications in science and engineering. GenSel is a C++ program,
written mainly by Rohan Fernando, that performs analyses related to
genomic selection: it uses the genotypes and phenotypes of animals to
make inferences about the effect of each marker locus on the phenotype
(?). GenSel relies on Bayesian analyses with Markov chain Monte Carlo
(MCMC) methods to estimate the posterior distributions of these effects.

Algorithms based on MCMC methods have been parallelized successfully in
the past. This paper describes where GenSel can be parallelized and
presents preliminary results from parallelizing its BayesC method.
\section{BayesC Algorithm}
Here, genomic selection involves using purebred (PB) animals to improve
performance in crossbreeding or in breeding with other PB animals.
Evaluating each animal for crossbreeding performance involves estimating
the effects of single nucleotide polymorphisms (SNPs) on crossbred
performance from the phenotypes and genotypes of crossbred animals, and
correlating these effects with purebred performance.
\subsection{Bayesian Estimation of SNP Effects}
Marker effects were estimated using the BayesC algorithm presented by
Kizilkaya et al.~\cite{2}. The algorithm uses MCMC methods to draw
samples from the posterior distribution of the model parameters.
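For reference, a common way to write the BayesC model is shown below.
This standard form is stated as an assumption for illustration, not
transcribed from Kizilkaya et al.\ or from the GenSel source. Here
$\mathbf{y}$ holds the phenotypes, $\mu$ is the intercept,
$\mathbf{x}_j$ holds the genotype covariates of locus $j$ among $k$
loci, $\alpha_j$ is the effect of that locus, $\pi$ is the prior
probability that a locus has no effect, and $\sigma^2_\alpha$ and
$\sigma^2_e$ are the locus effect and residual variances:
\begin{align*}
\mathbf{y} &= \mathbf{1}\mu + \sum_{j=1}^{k} \mathbf{x}_j \alpha_j + \mathbf{e},
  \qquad \mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\sigma^2_e),\\
\alpha_j \mid \pi, \sigma^2_\alpha &=
  \begin{cases}
    0 & \text{with probability } \pi,\\
    \text{a draw from } N(0, \sigma^2_\alpha) & \text{with probability } 1 - \pi.
  \end{cases}
\end{align*}
Each pass of the inner loop in the algorithm overview updates one
$\alpha_j$ given the current values of all the others.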
\subsection{Algorithm Overview}
\begin{enumerate}
\item
For each iteration i in {[}1..chainLength{]}:
\begin{enumerate}
\item
Sample the residual variance
\item
Sample the intercept
\item
For each locus j in {[}1..numberOfLoci{]}:
\begin{enumerate}
\item
Adjust the phenotypes for the current locus j
\item
Calculate the variance for the current locus
\item
Draw a random variable from a uniform distribution
\item
If the probability is less than the random variable: Something
\item
Else: Something else
\end{enumerate}
\item
Sample the locus effect variance
\item
Accumulate the posterior means
\end{enumerate}
\end{enumerate}
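The per-locus steps above can be sketched in C++ as a single Gibbs
update. This is a minimal sketch of a common BayesC formulation, not
GenSel's code, assuming the usual BayesC prior in which each locus
effect is zero with probability $\pi$ and normally distributed with
variance $\sigma^2_\alpha$ otherwise. The function
\texttt{sampleLocusEffect} and its parameters are hypothetical names:
\texttt{ycorr} holds the phenotypes corrected for the intercept and all
current locus effects, and \texttt{pi} is the prior probability of a
zero effect. Reading the placeholder branches as setting the effect to
zero (when the inclusion probability is below the uniform draw) or
sampling a nonzero effect (otherwise) is likewise an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// One Gibbs update for locus j under an assumed BayesC prior (sketch, not GenSel code).
// x:        genotype covariates of locus j, one entry per animal
// xpx:      precomputed x'x for locus j (must be positive)
// ycorr:    phenotypes corrected for the intercept and all current effects;
//           updated in place so it stays consistent with the new effect
// alpha:    current effect of locus j
// varE:     residual variance; varAlpha: locus effect variance
// pi:       prior probability that a locus has zero effect
// Returns the newly sampled effect of locus j.
double sampleLocusEffect(const std::vector<double>& x, double xpx,
                         std::vector<double>& ycorr, double alpha,
                         double varE, double varAlpha, double pi,
                         std::mt19937& rng) {
    assert(x.size() == ycorr.size() && xpx > 0.0);

    // Adjust phenotypes for locus j: add its current effect back, then form x'ycorr.
    double rhs = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        ycorr[i] += x[i] * alpha;
        rhs += x[i] * ycorr[i];
    }

    // Variance of rhs if the locus has no effect (v0) or a normal effect (v1).
    const double v0 = xpx * varE;
    const double v1 = xpx * xpx * varAlpha + xpx * varE;

    // Log posterior weights of the two cases. pi == 0 or pi == 1 gives
    // log(0) = -infinity, which IEEE arithmetic carries through below.
    const double logDelta0 = -0.5 * (std::log(v0) + rhs * rhs / v0) + std::log(pi);
    const double logDelta1 = -0.5 * (std::log(v1) + rhs * rhs / v1) + std::log(1.0 - pi);
    const double probDelta1 = 1.0 / (1.0 + std::exp(logDelta0 - logDelta1));

    // Draw u ~ Uniform[0,1); the locus gets a nonzero effect when u < probDelta1.
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    double newAlpha = 0.0;
    if (unif(rng) < probDelta1) {
        // Full conditional of the effect: N(rhs / lhs, varE / lhs).
        const double lhs = xpx + varE / varAlpha;
        std::normal_distribution<double> normal(rhs / lhs, std::sqrt(varE / lhs));
        newAlpha = normal(rng);
    }

    // Remove the new (possibly zero) effect from the corrected phenotypes.
    for (std::size_t i = 0; i < x.size(); ++i) {
        ycorr[i] -= x[i] * newAlpha;
    }
    return newAlpha;
}
```

In a full iteration this function would be called for each locus in
turn with the same generator. Because each call updates \texttt{ycorr},
the update for one locus depends on the effects just sampled for the
loci before it.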
\section{BayesC In CUDA}
CUDA gives programmers flexibility in how they parallelize their code.
Its thread-level parallelism effectively ``unrolls'' a loop: each
iteration runs in its own thread, and conceptually all iterations run
simultaneously. For parallelization we targeted the inner loop, which
samples the effect of each locus.
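The mapping from loop iterations to threads can be illustrated without
a GPU. The following C++ sketch is a sequential emulation, not CUDA
code: \texttt{emulateLaunch}, \texttt{threadsPerBlock} and
\texttt{body} are hypothetical names, and the two nested loops stand in
for \texttt{blockIdx.x} and \texttt{threadIdx.x}, which a real kernel
combines as \texttt{blockIdx.x * blockDim.x + threadIdx.x} to obtain
its locus index.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential emulation of a one-dimensional CUDA launch with one thread per locus.
// The outer loop plays the role of blockIdx.x, the inner loop of threadIdx.x.
template <typename Body>
void emulateLaunch(std::size_t numLoci, std::size_t threadsPerBlock, Body body) {
    assert(threadsPerBlock > 0);
    // Ceiling division so that every locus is covered by some thread.
    const std::size_t numBlocks = (numLoci + threadsPerBlock - 1) / threadsPerBlock;
    for (std::size_t block = 0; block < numBlocks; ++block) {
        for (std::size_t thread = 0; thread < threadsPerBlock; ++thread) {
            // Global index, as blockIdx.x * blockDim.x + threadIdx.x in a kernel.
            const std::size_t j = block * threadsPerBlock + thread;
            if (j < numLoci) {  // threads past the last locus do nothing
                body(j);
            }
        }
    }
}
```

On the GPU these iterations run concurrently rather than in this
order, so the body for locus $j$ must not rely on results that other
threads produce during the same launch.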
\subsection{Loci Parallelization Overview}
\begin{enumerate}
\item
Move the genotype and phenotype data to the GPU
\item
Launch a kernel with one thread for each locus
\item
Reform the matrices and vectors on the GPU
\item
Compute the effect of each locus
\item
Move the data back to the CPU
\end{enumerate}
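Before the first step the genotypes must sit in one contiguous buffer,
because \texttt{cudaMemcpy} copies a contiguous range of bytes. The C++
sketch below is a host-side illustration with hypothetical names
(\texttt{flattenGenotypes}, \texttt{locusRhs}, \texttt{allLocusRhs});
it assumes the genotypes arrive as one row per animal, which may not
match GenSel's storage. Each simulated thread computes
$\mathbf{x}_j'\mathbf{y}$, the dot product of its locus's genotype
column with the phenotypes, which is independent across loci for a
fixed phenotype vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Flattens a genotype matrix given as one row per animal into a single
// contiguous buffer, storing element (animal i, locus j) at i * numLoci + j.
// A buffer like this is what would be copied to the device in one call.
std::vector<double> flattenGenotypes(const std::vector<std::vector<double>>& rows,
                                     std::size_t numLoci) {
    std::vector<double> flat;
    flat.reserve(rows.size() * numLoci);
    for (const auto& row : rows) {
        assert(row.size() == numLoci);
        flat.insert(flat.end(), row.begin(), row.end());
    }
    return flat;
}

// Work done by the thread for locus j: dot product of that locus's genotype
// column with the phenotype vector y.
double locusRhs(const std::vector<double>& flat, const std::vector<double>& y,
                std::size_t numLoci, std::size_t j) {
    double sum = 0.0;
    for (std::size_t i = 0; i < y.size(); ++i) {
        sum += flat[i * numLoci + j] * y[i];
    }
    return sum;
}

// Sequential stand-in for launching one thread per locus.
std::vector<double> allLocusRhs(const std::vector<double>& flat,
                                const std::vector<double>& y, std::size_t numLoci) {
    assert(flat.size() == y.size() * numLoci);
    std::vector<double> rhs(numLoci);
    for (std::size_t j = 0; j < numLoci; ++j) {
        rhs[j] = locusRhs(flat, y, numLoci, j);  // no dependence between loci
    }
    return rhs;
}
```

Storing element $(i, j)$ at \texttt{i * numLoci + j} means that, at each
step of their loop over animals, threads handling neighbouring loci read
neighbouring addresses, which is the access pattern that coalesced
global memory loads on the GPU favour.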
\subsection{CUDA Library Usage}
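A shortcoming of CUDA devices with compute capability below 3.5 is that
kernels cannot launch other kernels; dynamic parallelism, which removes
this restriction, requires compute capability 3.5 or higher.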
\section{Results}
\section{Conclusion}
\end{document}