terracuda

Summary

We will create a highly abstract CUDA API for Lua, aimed at programmers unfamiliar with GPU-level parallelism.

Background

Lua is a fast, lightweight, embeddable scripting language used in Wikipedia, World of Warcraft, Photoshop Lightroom, and many other places. Its simple syntax and dynamic typing also make it an ideal language for novice programmers. Traditionally, languages like Lua have sat many layers of abstraction above low-level parallel frameworks like CUDA, so GPU parallelism has been limited to programmers using a systems language like C++. Frameworks like Terra work to close that gap by making low-level programming accessible from a high-level interface. Even so, these interfaces still require a number of calls into C libraries and detailed knowledge of the CUDA API. For example, the following code runs a simple CUDA kernel in Terra:

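-- Setup assumed by this snippet (not part of the original listing): the
-- thread-index intrinsic and C/CUDA declarations; CUDA headers on the include path.
local tid = cudalib.nvvm_read_ptx_sreg_tid_x
local C = terralib.includecstring [[
    #include <stdlib.h>
    #include <cuda_runtime.h>
]]

-- Kernel: each thread writes its own index into the output array.
terra foo(result : &float)
    var t = tid()
    result[t] = t
end

-- Compile the kernel for the GPU and expose it as R.bar.
local R = terralib.cudacompile({ bar = foo })

terra run_cuda_code(N : int)
    -- Allocate N floats on the device.
    var data : &float
    C.cudaMalloc([&&opaque](&data),sizeof(float)*N)
    -- Launch one block of N threads (grid 1x1x1, block Nx1x1).
    var launch = terralib.CUDAParams { 1,1,1, N,1,1, 0, nil }
    R.bar(&launch,data)
    -- Copy the results back to the host; 2 is cudaMemcpyDeviceToHost.
    var results : &float = [&float](C.malloc(sizeof(float)*N))
    C.cudaMemcpy(results,data,sizeof(float)*N,2)
    C.cudaFree(data)
    return results
end

results = run_cuda_code(16)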

Other high-level CUDA bindings, such as PyCUDA and JCuda, suffer from the same problem.