terracuda

Summary

We will create a highly abstract CUDA API for Lua, aimed at programmers unfamiliar with GPU-level parallelism.

Background

Lua is a fast, lightweight, and embeddable scripting language found in places like Wikipedia, World of Warcraft, Photoshop Lightroom, and more. Lua's simple syntax and dynamic typing also make it an ideal language for novice programmers. Traditionally, languages like Lua sit miles of abstraction above low-level parallel frameworks like CUDA, and as a consequence GPU parallelism has been limited to programmers using a systems language like C++. Frameworks like Terra work to close that gap by making low-level programming accessible from a high-level interface. These interfaces, however, still require a number of calls into C libraries and intimate knowledge of the CUDA runtime API. For example, the following Terra code launches a simple CUDA kernel in which each of N threads writes its own index into an array:

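-- Assumes Terra was built with CUDA support.
local C = terralib.includecstring [[
#include "cuda_runtime.h"
#include <stdlib.h>
]]
-- Intrinsic returning the x index of the current thread within its block.
local tid = cudalib.nvvm_read_ptx_sreg_tid_x

-- Kernel: each thread writes its own index into result.
terra foo(result : &float)
    var t = tid()
    result[t] = t
end

-- Compile foo for the GPU; it is exposed to host code as R.bar.
local R = terralib.cudacompile({ bar = foo })

terra run_cuda_code(N : int)
    -- Allocate N floats in device memory.
    var data : &float
    C.cudaMalloc([&&opaque](&data), sizeof(float)*N)
    -- Grid of 1x1x1 blocks, each block N x 1 x 1 threads,
    -- 0 bytes of dynamic shared memory, default stream.
    var launch = terralib.CUDAParams { 1,1,1, N,1,1, 0, nil }
    R.bar(&launch, data)
    -- Copy the results back to the host (2 = cudaMemcpyDeviceToHost).
    var results : &float = [&float](C.malloc(sizeof(float)*N))
    C.cudaMemcpy(results, data, sizeof(float)*N, 2)
    C.cudaFree(data)
    return results  -- the caller is responsible for freeing this buffer
end

results = run_cuda_code(16)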

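Other high-level CUDA bindings like PyCUDA and JCuda suffer from a similar problem: they expose CUDA's programming model (kernels, device memory, launch configurations) rather than hiding it.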
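For contrast with the Terra example above, the following is a minimal sketch of the kind of call a highly abstract API might expose. The module name `gpu` and the function `gpu.map` are hypothetical, not part of Terra or any existing library, and the body simply loops on the CPU as a stand-in: a real implementation would have to compile `fn` into a kernel and handle allocation, launch configuration, and host-device copies behind the scenes.

```lua
-- Hypothetical high-level interface (NOT an existing Terra or CUDA API).
-- gpu.map(n, fn) returns a Lua table { fn(0), fn(1), ..., fn(n-1) }.
-- A real backend would compile fn to a CUDA kernel, allocate device
-- memory, launch n threads and copy the results back; this stand-in
-- runs sequentially on the CPU so the interface can be tried anywhere.
local gpu = {}

function gpu.map(n, fn)
  assert(type(n) == "number" and n >= 0, "n must be a non-negative number")
  local results = {}
  for i = 0, n - 1 do
    -- Each iteration plays the role of one GPU thread with index i.
    results[i + 1] = fn(i)  -- Lua tables are conventionally 1-indexed
  end
  return results
end

-- Same computation as the Terra example: thread t writes t.
local results = gpu.map(16, function(t) return t end)
print(#results, results[1], results[16])  -- 16  0  15
```

The point of the sketch is the call site: the user supplies only a problem size and a per-element function, with no pointers, launch parameters, or copy directions to get wrong.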

The Challenge

The problem is challenging first and foremost at the level of architecture. Designing an API is never easy, and exposing GPU-level parallelism to a language as high-level as Lua requires great care to keep the API usable while still making it useful. Creating such an API requires significant knowledge of the abstraction layers between Lua, C, and CUDA, as well as of the typical use cases for high-level parallelism.

My partner and I know neither Terra nor LLVM (the compiler infrastructure Terra targets), so creating these high-level bindings will require a significant upfront investment. The existing interface between Terra and CUDA is rudimentary at best, so we will need to add significant new functionality to Terra for the Circle Renderer to function properly.