OpenCLLink Programming

Programming OpenCL in the Wolfram Language is simple because the user need not write C wrapper code, which can be verbose, difficult to understand, and hard to debug. Using OpenCLLink also guarantees compatibility as new versions of the standard are released.

This section gives a brief introduction to OpenCL programming. It uses OpenCLFunctionLoad, which lets users load OpenCL code and call it from within the Wolfram Language.

OpenCLFunctionLoad    load an OpenCL function from source into the Wolfram Language

OpenCL programming in the Wolfram Language.

Users are advised to read the much more detailed tutorial "CUDALink Programming".

OpenCL Programming

An OpenCL program is a small piece of code that performs a computation on each element of an input list. This first program will add 2 to each element.

__kernel void addTwo_kernel(__global mint * arry, mint len) {

    int index = get_global_id(0);

    if (index >= len) return;

    arry[index] += 2;
}

The following is the decomposition of the above program into sections.

__kernel void addTwo_kernel(__global mint * arry, mint len) {

The __kernel qualifier declares a function that runs on the OpenCL device. The rest of the line lists the function arguments; pointer arguments carry the __global qualifier.

int index = get_global_id(0);

This gets the index of the thread executing the function. Index values range from 0 to one less than the number of threads launched.

if (index >= len) return;

This ensures that the program does not write to memory beyond the end of the input array. Because the number of threads launched is a multiple of the block size, the check is needed whenever the length of the input array is not a multiple of the block size.

arry[index] += 2;

This adds two to each element.
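The effect of the bounds check can be tried on the host with a small emulation. The following is a minimal sketch in plain C, not OpenCL C: get_global_id, padded_size, launch_addTwo, and the mint typedef are hypothetical stand-ins introduced for illustration, and the loop plays the role of the launch by running the kernel body once per thread.

```c
#include <assert.h>
#include <stddef.h>

typedef long mint;  /* assumption: stand-in for the mint integer type */

/* Hypothetical shim: index of the thread currently being emulated. */
static size_t current_global_id;

size_t get_global_id(unsigned dim) {
    (void)dim;  /* this sketch is one-dimensional */
    return current_global_id;
}

/* Kernel body as in the text. In OpenCL C the signature would read
 *   __kernel void addTwo_kernel(__global mint * arry, mint len)
 */
void addTwo_kernel(mint *arry, mint len) {
    int index = (int)get_global_id(0);

    if (index >= len) return;  /* surplus threads do nothing */

    arry[index] += 2;
}

/* Round n up to the next multiple of the block size. */
size_t padded_size(size_t n, size_t block) {
    assert(block > 0);
    return ((n + block - 1) / block) * block;
}

/* Emulated launch: one kernel call per thread, with the thread count
 * padded to a multiple of the block size. Returns the thread count. */
size_t launch_addTwo(mint *arry, mint len, size_t block) {
    size_t total = padded_size((size_t)len, block);
    for (current_global_id = 0; current_global_id < total; current_global_id++) {
        addTwo_kernel(arry, len);
    }
    return total;
}
```

With a list of length 10 and a block size of 16, this emulation runs 16 threads. Threads 10 through 15 return at the check, so nothing past the tenth element is modified.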

Loading Program into the Wolfram Language

Once the program is written, it can be loaded into the Wolfram Language using OpenCLLink. This is done using OpenCLFunctionLoad.

Load the OpenCLLink application.

First, assign the program to a string.

This loads the function. The arguments to OpenCLFunctionLoad are the source code, the name of the function to load, the function signature, and the block dimension.

This runs the function.

The result is the set of output list elements.

Porting CUDA to OpenCL

Because OpenCLLink handles the C wrapper code that OpenCL programming requires, the user can concentrate on the OpenCL kernel code, which is the only code that needs to be ported from CUDA.

In terms of OpenCL program porting, there are one-to-one function renames between CUDA and OpenCL. The following table gives the correspondence.

In this section, use the above table to port the following CUDA code to OpenCL.

__global__ void myKernel(mint * global0Id, mint * global1Id, mint width, mint height) {
    int xIndex = threadIdx.x + blockDim.x * blockIdx.x;
    int yIndex = threadIdx.y + blockDim.y * blockIdx.y;
    int index = xIndex + yIndex * width;
    if (xIndex < width && yIndex < height) {
        global0Id[index] = threadIdx.x;
        global1Id[index] = threadIdx.y;
    }
}

The following is the translation of the CUDA code into OpenCL.

The following was changed when porting:
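As a hedged sketch of such a port, the standard renames are: __global__ becomes __kernel, pointer arguments gain the __global qualifier, threadIdx.x becomes get_local_id(0), blockDim.x becomes get_local_size(0), and blockIdx.x becomes get_group_id(0). When no global offset is used, the combined expression threadIdx.x + blockDim.x * blockIdx.x can also be shortened to get_global_id(0). The block below is plain C rather than OpenCL C so that it can be compiled and run on the host. The four get_* functions, launch_myKernel, and the mint typedef are hypothetical stand-ins for illustration; only the body of myKernel reflects the port.

```c
#include <assert.h>
#include <stddef.h>

typedef long mint;  /* assumption: stand-in for the mint integer type */

/* Hypothetical host-side shim emulating the OpenCL work-item functions. */
static size_t cur_group[2], cur_local[2], local_size[2];

size_t get_local_id(unsigned d)   { return cur_local[d]; }
size_t get_group_id(unsigned d)   { return cur_group[d]; }
size_t get_local_size(unsigned d) { return local_size[d]; }
size_t get_global_id(unsigned d)  { return cur_group[d] * local_size[d] + cur_local[d]; }

/* Ported kernel body. In OpenCL C the signature would read
 *   __kernel void myKernel(__global mint * global0Id, __global mint * global1Id,
 *                          mint width, mint height)
 */
void myKernel(mint *global0Id, mint *global1Id, mint width, mint height) {
    /* literal rename of threadIdx.x + blockDim.x * blockIdx.x */
    int xIndex = (int)(get_local_id(0) + get_local_size(0) * get_group_id(0));
    /* equivalent shorthand for the y direction */
    int yIndex = (int)get_global_id(1);
    mint index = xIndex + yIndex * width;
    if (xIndex < width && yIndex < height) {
        global0Id[index] = (mint)get_local_id(0);  /* was threadIdx.x */
        global1Id[index] = (mint)get_local_id(1);  /* was threadIdx.y */
    }
}

/* Emulated launch: enough work groups of size bx by by to cover
 * width by height, with the kernel body run once per work item. */
void launch_myKernel(mint *g0, mint *g1, mint width, mint height,
                     size_t bx, size_t by) {
    assert(bx > 0 && by > 0);
    local_size[0] = bx;
    local_size[1] = by;
    size_t groups_x = ((size_t)width + bx - 1) / bx;
    size_t groups_y = ((size_t)height + by - 1) / by;
    for (cur_group[1] = 0; cur_group[1] < groups_y; cur_group[1]++)
        for (cur_group[0] = 0; cur_group[0] < groups_x; cur_group[0]++)
            for (cur_local[1] = 0; cur_local[1] < by; cur_local[1]++)
                for (cur_local[0] = 0; cur_local[0] < bx; cur_local[0]++)
                    myKernel(g0, g1, width, height);
}
```

In this emulation, each in-range element receives the local (within-group) coordinates of the work item that wrote it, which is what the CUDA version stores through threadIdx.x and threadIdx.y.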

Terminology

Users should note that CUDA and OpenCL differ in some of their terminology. The Wolfram Language adopts the most descriptive terms from each.

The following table gives the translation in terminology:

CUDA                        OpenCL
streaming multiprocessor    device
multiprocessor              compute unit
global memory               global memory
shared memory               local memory
local memory                private memory
kernel                      program
block                       work group
thread                      work item

Memory

The memory manager behaves the same way in CUDALink and OpenCLLink. It is discussed in depth in the CUDALink Memory Guide.

Since memory is bound to one link, CUDALink and OpenCLLink memories are not interchangeable.