OpenCLLink Programming

OpenCL Programming	Porting CUDA to OpenCL
Loading Program into the Wolfram Language	Memory

Programming OpenCL in the Wolfram Language is simple since the user need not write C wrapper code—which can be quite verbose, difficult to understand, and hard to debug. Using OpenCLLink also guarantees compatibility as new versions of the standard are released.

In this section a brief introduction is given to OpenCL programming. The section uses OpenCLFunctionLoad, which allows users to load OpenCL code and use it from within the Wolfram Language.

OpenCLFunctionLoad

load an OpenCL function from source into the Wolfram Language

OpenCL programming in the Wolfram Language.

Users are advised to read the much more detailed tutorial "CUDALink Programming".

OpenCL Programming

An OpenCL program is a small piece of code that performs a computation on each element of an input list. This first program will add 2 to each element.

__kernel void addTwo_kernel(__global mint * arry, mint len) {
    
    int index = get_global_id(0);

    if (index >= len) return;
    
    arry[index] += 2;
}

The following is the decomposition of the above program into sections.

_kernel void addTwo_kernel (_global int ⋆ arry, int len) {

The _kernel construct declares the function to be run on the OpenCL GPU. The rest are function arguments with pointers having the _global prefix.

int index = get_global_id (0);

This gets the index value of the thread executing the function. The index values range from 0 to the number of threads launched.

if (index >= len) return;

This makes sure that the program does not write to memory beyond the length of the input array. Since the number of threads launched is in multiples of the block size, this conditional statement is needed if the size of the input array is not a multiple of the block size.

arry[index] += 2;

This adds two to each element.

Loading Program into the Wolfram Language

Once the program is written, it can be loaded into the Wolfram Language using OpenCLLink. This is done using OpenCLFunctionLoad.

Load the OpenCLLink application.

First, assign the program to a string.

This loads the function. The arguments to OpenCLFunctionLoad are the source code, the name of the function to load, the function signature, and the block dimension.

This runs the function.

The result is the set of output list elements.

Porting CUDA to OpenCL

Since OpenCLLink handles the C wrapper code required in OpenCL programming, allowing the user to concentrate on the OpenCL kernel code, this is the only code needed to be ported from CUDA.

In terms of OpenCL program porting, there are one-to-one function renames between CUDA and OpenCL. The following table gives the correspondence.

In this section, use the above table to port the following CUDA code to OpenCL.

__global__ void myKernel(mint * global0Id, mint * global1Id, mint width, mint height) {
    int xIndex = threadIdx.x + blockDim.x * blockIdx.x;
    int yIndex = threadIdx.y + blockDim.y * blockIdx.y;
    int index = xIndex + yIndex*width;
    if (xIndex < width && yIndex < height) {
       global0Id[index] = threadIdx.x;
       global1Id[index] = threadIdx.y;
    }
}

The following is the translation of the CUDA code into OpenCL.

The following was changed when porting:

_global_voidmyKernel became _kernel voidmyKernel

int ⋆ global0Id, int ⋆ global1Id became _global int ⋆ global0Id, _global int ⋆ global1Id

threadIdx.x + blockDim.x ⋆ blockIdx.x became get_global_id (0)

threadIdx.y + blockDim.y ⋆ blockIdx.y became get_global_id (1)

threadIdx.x became get_local_id (0)

threadIdx.y became get_local_id (1)

In terms of loading the code, the only change that is needed is to replace CUDAFunctionLoad with OpenCLFunctionLoad.

This sets the input parameters.

This launches the function.

This visualizes the results.

Terminology

Users should note that there are some differences in terminology between CUDA and OpenCL. In the Wolfram Language, the best description of both is combined.

The following table gives the translation in terminology:

	CUDA	OpenCL
	streaming multiprocessor	device
	multiprocessor	compute unit
	global memory	global memory
	shared memory	local memory
	local memory	private memory
	kernel	program
	block	work group
	thread	work item

Memory

The behavior of the memory manager is the same between CUDALink and OpenCLLink. The memory manager is discussed in depth in the CUDALink Memory Guide.

Since memory is bound to one link, CUDALink and OpenCLLink memories are not interchangeable.

Top