We will create a highly abstract CUDA API for Lua with an aim at programmers unfamiliar with GPU-level parallelism.


Lua is a fast, lightweight, and embeddable scripting language found in places like Wikipedia, World of Warcraft, Photoshop Lightroom, and more. Lua's simple syntax and dynamic typing also make it an ideal language for novice programmers. Traditionally, languages like Lua find themselves abstracted miles above low-level parallel frameworks like CUDA, and consequently GPU parallelism was limited to programmers using a systems language like C++. Frameworks like Terra, however, work to close that gap, making low-level programming accessible in a high-level interface. However, these interfaces still require a number of calls to C libraries and intimate knowledge of the CUDA library. For example, the following code runs a simple CUDA kernel in Terra:

terra foo(result : &float)
    var t = tid()
    result[t] = t

local R = terralib.cudacompile({ bar = foo })

terra run_cuda_code(N : int)
    var data : &float
    var launch = terralib.CUDAParams { 1,1,1, N,1,1, 0, nil }
    var results : &float = [&float](C.malloc(sizeof(float)*N))
    return results;

results = run_cuda_code(16)

Other high-level CUDA bindings like PyCUDA and JCuda suffer the same problem.