In this blog post I go over how to use C++ templates directly in OpenCL kernels with this extension. I hope it is useful to others since, to me at least, how to use the extension was not immediately apparent.
First, we define our templated function (the prototypical square() function template):
    template<typename T>
    inline T square(const T x)
    {
        return x * x;
    }
Next we define a templated kernel which calls the square() function:
    template<typename T>
    __kernel void square_kernel(__global T *data)
    {
        const uint i = get_global_id(0);
        data[i] = square(data[i]);
    }
Now, in order to use the templated kernel, we must explicitly instantiate it and give it a name that can be used to call it from the host. We do that by explicitly stating the template types and using the mangled_name attribute like so:
    template __attribute__((mangled_name(square_kernel_int)))
    __kernel void square_kernel(__global int *data);
The kernel can be instantiated multiple times for different types (though each instantiation must be given a unique mangled name). For example, to define the square kernel for floats:
    template __attribute__((mangled_name(square_kernel_float)))
    __kernel void square_kernel(__global float *data);
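When a kernel is needed for several types, the instantiation lines can be generated on the host instead of written by hand. The following is a minimal sketch in host-side C++. The helper make_instantiation() is hypothetical (it is not part of OpenCL or Boost.Compute), and it assumes a kernel with a single __global pointer argument named data and a naming scheme of kernel name, underscore, type name.

```cpp
#include <string>

// Hypothetical helper (not part of OpenCL or Boost.Compute): builds one
// explicit-instantiation declaration for a templated kernel that takes a
// single "__global T *data" argument.
std::string make_instantiation(const std::string &kernel,
                               const std::string &type)
{
    // Assumed naming scheme: "<kernel>_<type>", e.g. "square_kernel_int".
    const std::string mangled = kernel + "_" + type;

    return "template __attribute__((mangled_name(" + mangled + ")))\n"
           "__kernel void " + kernel + "(__global " + type + " *data);\n";
}
```

Calling make_instantiation("square_kernel", "int") yields the instantiation for int shown above, and the returned text can be appended to the OpenCL source string before it is built. Type names that contain spaces (such as "unsigned int") would not produce a valid mangled name with this scheme and would need a separate name argument.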
That's it for the OpenCL code. To compile the program on the host you must pass the "-x clc++" compile option to the clBuildProgram() function. In Boost.Compute, that is done like so (where source is a string containing all of the OpenCL source code from above and context is the OpenCL context object):
    compute::program square_program =
        compute::program::build_with_source(source, context, "-x clc++");
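For reference, one way to hold the OpenCL code in the source string is a C++11 raw string literal, which avoids escaping and keeps the kernel text readable. This is only a sketch: the delimiter CLC is arbitrary, and the name square_kernel_float for the float instantiation is an assumption that follows the pattern of square_kernel_int.

```cpp
#include <string>

// OpenCL C++ source for the templated function, the templated kernel,
// and its explicit instantiations, stored in a raw string literal.
const std::string source = R"CLC(
template<typename T>
inline T square(const T x)
{
    return x * x;
}

template<typename T>
__kernel void square_kernel(__global T *data)
{
    const uint i = get_global_id(0);
    data[i] = square(data[i]);
}

// Explicit instantiations; each needs a unique mangled name.
template __attribute__((mangled_name(square_kernel_int)))
__kernel void square_kernel(__global int *data);

// The name square_kernel_float is assumed here for illustration.
template __attribute__((mangled_name(square_kernel_float)))
__kernel void square_kernel(__global float *data);
)CLC";
```

The string is plain text as far as the host compiler is concerned, so mistakes in the kernel code only show up when the program is built at run time with the "-x clc++" option.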
Once built, the templated kernels can be instantiated by passing their mangled name to the clCreateKernel() function. In Boost.Compute, you would do the following:
    compute::kernel square_int_kernel(square_program, "square_kernel_int");
The square_int_kernel can now be used just like any other kernel object (e.g. passed to clEnqueueNDRangeKernel() for execution on the device).
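After running the kernel and copying the data back, it can be worth checking the results against the same template compiled on the host. The sketch below assumes the input and the device output have already been copied into std::vector objects. The helper matches_host_square() is a hypothetical name, not part of OpenCL or Boost.Compute.

```cpp
#include <cstddef>
#include <vector>

// Host-side counterpart of the device square() function template.
template<typename T>
inline T square(const T x)
{
    return x * x;
}

// Hypothetical helper: returns true if every element of `result` equals
// the square of the corresponding element of `input`.
template<typename T>
bool matches_host_square(const std::vector<T> &input,
                         const std::vector<T> &result)
{
    if (input.size() != result.size()) {
        return false;
    }
    for (std::size_t i = 0; i < input.size(); ++i) {
        if (result[i] != square(input[i])) {
            return false;
        }
    }
    return true;
}
```

The exact comparison is a good fit for integer types. For float data a small tolerance may be the safer choice, since host and device arithmetic are not guaranteed to agree bit for bit.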
A fully implemented and compilable example demonstrating the code above can be found here.