Sunday, March 9, 2014
Custom OpenCL functions in C++ with Boost.Compute
Due to OpenCL's C99-based programming and compilation model, defining and using custom functions from C++ on the GPU can be challenging. However, Boost.Compute provides a few utilities to simplify function creation and execution from C++ (without ever having to touch a raw source code string!).
The BOOST_COMPUTE_FUNCTION() macro creates function objects in C++ which can then be executed on the GPU by OpenCL with the Boost.Compute algorithms (e.g. transform(), sort()).
As arguments, the macro takes the function's return type, name, argument list, and source. The first three arguments (return type, name, and argument list) are all C++ types/expressions. The source argument contains the body of the function which will be inserted into an OpenCL program when executed with one of the Boost.Compute algorithms.
The return type, name, and argument list are used by Boost.Compute to automatically generate the OpenCL function declaration as well as to instantiate the C++ function<> object with the correct signature. This ensures type-safety in C++ (e.g. calling the function with the wrong number of arguments will result in a C++ compile-time error).
The following example shows how to create a comparison function which can be passed to the sort() algorithm in order to sort a list of vectors by their length.
The BOOST_COMPUTE_CLOSURE() macro is similar to BOOST_COMPUTE_FUNCTION() but additionally allows a set of C++ variables to be "captured" and then used in the OpenCL function source. This is similar to passing variables to C++11 lambda functions through the capture list. For now, only value types (e.g. float, int4) can be captured. In the future I plan on extending this to allow memory-object types (e.g. buffer, image2d) as well.
The following example shows how to create a function which determines if a 2D Cartesian point is contained in a circle with its radius and center given in C++.
As you can see, the C++ center and radius variables have been captured by the closure function and made available for use in the OpenCL source code. Under the hood this is accomplished by invisibly passing the captured values to OpenCL when the function is invoked.
In addition to these macros, Boost.Compute also contains a lambda-expression framework which allows one-line C++ expressions to be converted to OpenCL source code and executed on the GPU. It is similar to the Boost.Lambda library and is built on the Boost.Proto library.
The previous example showing how to sort vectors by their length could also be written using a lambda-expression as follows:
Together these macros and the lambda-expression framework provide a powerful way to create OpenCL functions interspersed with C++ code (and, notably, they don't require a special compiler or any non-standard compiler extensions!).
Edit: As of a commit made on 4/20/2014, the BOOST_COMPUTE_FUNCTION() macro takes a list of arguments that includes both their types and names. The old, auto-generated argument names (e.g. _1, _2) are no longer used. The new version allows for clearer code with more descriptive variable names. The examples above have been updated to reflect the new API.