\subsection{CUDA Memory Model}

Similar to a CPU's architecture, the GPU has multiple types of memory available to it. At the highest level are global and constant memory. When moving non-primitive values from the host to the GPU, the developer must copy them into either the GPU's global or constant memory. Constant memory is significantly faster (quote to come) than global memory; however, it is immutable from device code. Once data is resident on the GPU, the developer can either access it directly from all threads in all blocks or stage it into local or shared memory. Local memory is private to each thread, and per-thread registers are the fastest memory available, whereas shared memory is still relatively fast and can be accessed by all threads in the same block. There are techniques for loading and storing memory that lead to a speedup in memory operations, but we will not go into them in this paper.
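A minimal sketch of these memory types follows; the kernel name \texttt{scale}, the array size, and the coefficient values are illustrative choices, not taken from the text. The host copies one array into global memory with \texttt{cudaMemcpy} and a small read-only table into constant memory with \texttt{cudaMemcpyToSymbol}; the kernel then stages its block's slice of global memory into shared memory before computing.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 256

// Constant memory: read-only from device code, written from the host.
__constant__ float coeffs[4];

// Kernel that stages global data into per-block shared memory before use.
__global__ void scale(const float *in, float *out, int n)
{
    __shared__ float tile[N];        // shared: visible to all threads in this block
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    if (i < n)
        tile[threadIdx.x] = in[i];   // load from global into shared memory
    __syncthreads();                 // wait until the whole tile is loaded

    if (i < n) {
        float x = tile[threadIdx.x]; // per-thread value, held in a register
        out[i] = coeffs[0] * x + coeffs[1];
    }
}

int main(void)
{
    float h_in[N], h_out[N], h_coeffs[4] = {2.0f, 1.0f, 0.0f, 0.0f};
    for (int i = 0; i < N; ++i) h_in[i] = (float)i;

    float *d_in, *d_out;
    cudaMalloc(&d_in,  N * sizeof(float));  // global memory allocations
    cudaMalloc(&d_out, N * sizeof(float));

    // Host-to-device copies: global memory vs. constant memory.
    cudaMemcpy(d_in, h_in, N * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(coeffs, h_coeffs, sizeof(h_coeffs));

    scale<<<1, N>>>(d_in, d_out, N);
    cudaMemcpy(h_out, d_out, N * sizeof(float), cudaMemcpyDeviceToHost);

    printf("out[3] = %.1f\n", h_out[3]);    // 2*3 + 1 = 7.0

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

Note that \texttt{coeffs} cannot be written from within the kernel, reflecting the immutability of constant memory, while \texttt{tile} is allocated once per block and shared by all of that block's threads.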