OpenCLFunctionLoad
OpenCLFunctionLoad["src",fun,argtypes,blockdim]
compiles the string src and makes fun available in the Wolfram Language as an OpenCLFunction.
OpenCLFunctionLoad[File[srcfile],fun,argtypes,blockdim]
compiles the source code file srcfile and then loads fun as an OpenCLFunction.
OpenCLFunctionLoad[File[libfile],fun,argtypes,blockdim]
loads fun as an OpenCLFunction from the previously compiled library libfile.
Details and Options
- The OpenCLLink application must be loaded using Needs["OpenCLLink`"].
- If libfile is a dynamic library, then the dynamic library function fun is loaded.
- Possible argument and return types, and their corresponding OpenCL type, include:
  _Integer                      mint              Wolfram Language integer
  "Integer32"                   int               32-bit integer
  "Integer64"                   long/long long    64-bit integer
  _Real                         Real_t            GPU real type
  "Double"                      double            machine double
  "Float"                       float             machine float
  {base, rank, io}              OpenCLMemory      memory of specified base type, rank, and input/output option
  "Local" | "Shared"            mint              local or shared memory parameter
  {"Local" | "Shared", type}    mint              local or shared memory parameter
- In the specification {base, rank, io}, valid settings of io are "Input", "Output", and "InputOutput".
- The argument specification {base} is equivalent to {base,_,"InputOutput"}, and {base,rank} is equivalent to {base,rank,"InputOutput"}.
- The rank can be omitted by using {base,_,io} or {base,io}.
- Possible base types are:
  _Integer              _Real                  _Complex
  "Byte"                "Bit16"                "Integer32"
  "Byte[2]"             "Bit16[2]"             "Integer32[2]"
  "Byte[4]"             "Bit16[4]"             "Integer32[4]"
  "Byte[8]"             "Bit16[8]"             "Integer32[8]"
  "Byte[16]"            "Bit16[16]"            "Integer32[16]"
  "UnsignedByte"        "UnsignedBit16"        "UnsignedInteger"
  "UnsignedByte[2]"     "UnsignedBit16[2]"     "UnsignedInteger[2]"
  "UnsignedByte[4]"     "UnsignedBit16[4]"     "UnsignedInteger[4]"
  "UnsignedByte[8]"     "UnsignedBit16[8]"     "UnsignedInteger[8]"
  "UnsignedByte[16]"    "UnsignedBit16[16]"    "UnsignedInteger[16]"
  "Double"              "Float"                "Integer64"
  "Double[2]"           "Float[2]"             "Integer64[2]"
  "Double[4]"           "Float[4]"             "Integer64[4]"
  "Double[8]"           "Float[8]"             "Integer64[8]"
  "Double[16]"          "Float[16]"            "Integer64[16]"
- OpenCLFunctionLoad can be called more than once with different arguments.
- Functions loaded by OpenCLFunctionLoad run in the same process as the Wolfram Language kernel.
- Functions loaded by OpenCLFunctionLoad are unloaded when the Wolfram Language kernel exits.
- Block dimensions can be either a list or an integer denoting how many threads per block to launch.
- The maximum size of block dimensions is returned by the "Maximum Work Group Size" property of OpenCLInformation.
- On launch, if the number of threads is not specified (as an extra argument to OpenCLFunction), then the dimension of the element with largest rank and dimension is chosen. For images, the rank is set to 2.
- On launch, if the number of threads is not a multiple of the block dimension, then it is incremented to be a multiple of the block dimension.
- The following options can be given:
  "CompileOptions"          {}                 compile options passed directly to the OpenCL compiler
  "Defines"                 Automatic          defines passed to the OpenCL preprocessor
  "Device"                  $OpenCLDevice      OpenCL device used in computation
  "IncludeDirectories"      {}                 directories to include in the compilation
  "Platform"                $OpenCLPlatform    OpenCL platform used in computation
  "ShellCommandFunction"    None               function to call with the shell commands used for compilation
  "ShellOutputFunction"     None               function to call with the shell output of running the compilation commands
  "TargetPrecision"         Automatic          precision used in computation
  "WorkingDirectory"        Automatic          the directory in which temporary files will be generated
Examples
Basic Examples (5)
First, load the OpenCLLink application:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-dvgu9k
Define the OpenCL source code to load:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-dix3zk

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-b2k0qc


https://wolfram.com/xid/0isq3flowdud5n74bny881he6-l8catb
Call the function with the arguments:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-6wjo6
Plot the result using ArrayPlot:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-6c2yj5

Define the path to the OpenCL source file "SupportFiles/vectorAdd.cl":

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-cjxj9c

Compile and load the OpenCL function from the file:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-cr596a


https://wolfram.com/xid/0isq3flowdud5n74bny881he6-jmd2fh

Locate the example OpenCLLink library "addTwo_Double":

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-f1c2nr

Load the library using OpenCLFunctionLoad:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-hsj5m4
The function adds two to an input list:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-ed7tas

The source code for this example is bundled with OpenCLLink:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-go264y

An extra argument can be given when calling OpenCLFunction. The argument denotes the number of threads to launch (or the global work group size). Using the previous example:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-cs6mvn
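As noted in Details, a thread count that is not a multiple of the block dimension is incremented to the next multiple on launch. The rounding can be sketched in plain C as follows. The function names are hypothetical helpers for illustration and are not part of OpenCLLink.

```c
#include <assert.h>
#include <stddef.h>

/* Round a requested thread count up to the next multiple of the block
   dimension. Hypothetical helper, not an OpenCLLink function. */
size_t round_up_threads(size_t threads, size_t block_dim)
{
    assert(block_dim > 0);
    size_t remainder = threads % block_dim;
    return remainder == 0 ? threads : threads + (block_dim - remainder);
}

/* Number of blocks (work-groups) in the rounded launch. */
size_t work_group_count(size_t threads, size_t block_dim)
{
    return round_up_threads(threads, block_dim) / block_dim;
}
```

For example, a request for 1000 threads with a block dimension of 256 becomes 1024 threads in 4 blocks. Because the rounded launch can contain more threads than there are data elements, kernels typically compare their global index against the data length before writing.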

This loads the OpenCL function from the file:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-juf3

This calls the function with 32 threads, which results in only the first 32 values in the vector add being computed:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-hdgtbt
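The effect of a reduced thread count can be emulated on the host. The sketch below is plain C, not the bundled vectorAdd.cl source: the signature, the index guard, and the loop that stands in for the device launch are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Emulation of a vector-add kernel body for one work-item.
   gid plays the role of get_global_id(0); n is the vector length. */
static void vector_add_item(const int *a, const int *b, int *out,
                            size_t n, size_t gid)
{
    assert(a != NULL && b != NULL && out != NULL);
    if (gid < n) {                 /* ignore work-items past the data */
        out[gid] = a[gid] + b[gid];
    }
}

/* Emulate a launch with global_size work-items. Elements at index
   global_size or higher are never visited and keep their old contents. */
void vector_add_launch(const int *a, const int *b, int *out,
                       size_t n, size_t global_size)
{
    for (size_t gid = 0; gid < global_size; ++gid) {
        vector_add_item(a, b, out, n, gid);
    }
}
```

Calling vector_add_launch with n set to 64 and global_size set to 32 fills only the first 32 entries of out, which is the behavior described above.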

If the code contains syntax errors, then a "compilation failed" error is returned:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-ch7ddt

The "ShellOutputFunction" option can be used to print the build log:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-dyz540

The above error states that there is a typo in the code: a z appears after the 0:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-bdyu1p

Scope (2)
Templated Function (1)
Templated functions can be simulated using macros. Leave the type name as an undefined macro in the source:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-nux43o
Set the macro to a concrete type during compilation:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-g7px


https://wolfram.com/xid/0isq3flowdud5n74bny881he6-f3wad
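The macro technique itself is ordinary C preprocessing, so it can be sketched without OpenCL. In the sketch below, ELEM_T and add_two are hypothetical names chosen for illustration. In OpenCLLink the definition would be supplied through the "Defines" option; here a fallback definition keeps the fragment compilable on its own.

```c
#include <assert.h>
#include <stddef.h>

/* ELEM_T is a hypothetical macro standing in for the element type.
   A build would normally define it from outside; the fallback below
   only exists so that this sketch compiles by itself. */
#ifndef ELEM_T
#define ELEM_T float
#endif

/* Add 2 to every element. The element type is fixed only when the
   macro is expanded at compile time. */
void add_two(const ELEM_T *in, ELEM_T *out, size_t n)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; ++i) {
        out[i] = in[i] + (ELEM_T)2;
    }
}
```

Compiling the same source with ELEM_T defined as int or double produces a different specialization without touching the function body.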

Shared or Local Memory (1)
OpenCLFunctionLoad can be used to specify "Local" or "Shared" memory on launch. The following code uses shared memory to store global memory for gradient computation:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-bjuef1
This specifies the input arguments, with the last argument being "Shared" for shared memory. The block size is set to 256:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-km361w

This computes the flattened length of a grayscale image:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-bee3hm
This invokes the function. The shared memory size is set to (blockSize+2)*sizeof(int) and the number of launch threads is set to the flattened length of the image:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-pbnqzg

A nicer way of specifying the shared memory size is using types:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-hacg9m

Using shared memory types, you need not pass in the size of the type:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-bfbkno
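The byte count passed in the untyped form can be checked with a few lines of C. This is a sketch under two assumptions: the two extra elements are one neighbor on each side of the block, and int32_t stands in for the 32-bit OpenCL int. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes of local ("shared") memory for block_size elements plus two
   extra elements, following (blockSize+2)*sizeof(int). */
size_t gradient_tile_bytes(size_t block_size)
{
    assert(block_size > 0);
    return (block_size + 2) * sizeof(int32_t);
}
```

With the block size of 256 used above, this gives 258 elements of 4 bytes each, or 1032 bytes. With the typed specification, only the element count 258 would be passed.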

Applications (10)
Image Input (1)
The input can be images; here you write code that performs linear interpolation between images (this can be done using ImageCompose):

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-ize2d7
This loads OpenCLFunction from the source code above:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-cf90uv

This sets the height, width, and channel values. It also allocates memory for the output:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-fmxebe

This calls the function with {width,height} threads:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-b6jdp

This gets the memory and displays it as an image:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-igf4qv

You can take the above and make a function OpenCLImageLinearCombine:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-lnwzq9
The function now has similar syntax to ImageCompose:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-39wd

A Manipulate can be used to play with the interpolation coefficients:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-cjao5c

Effects can be created; this example shows a smooth animation:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-919e1
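The per-value arithmetic behind the linear interpolation in this subsection can be sketched in plain C. The weighting convention (alpha applied to the first image) and the flat float layout are assumptions made for this sketch and may differ from the OpenCL source used above.

```c
#include <assert.h>
#include <stddef.h>

/* Blend two images stored as flat float arrays of equal length:
   out = alpha * a + (1 - alpha) * b for every channel value.
   count would be width * height * channels. */
void image_lerp(const float *a, const float *b, float *out,
                size_t count, float alpha)
{
    assert(alpha >= 0.0f && alpha <= 1.0f);
    for (size_t i = 0; i < count; ++i) {
        out[i] = alpha * a[i] + (1.0f - alpha) * b[i];
    }
}
```

On a device, each loop iteration would correspond to one work-item, which is why the launch above uses one thread per pixel.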

Uniform Random Number Generation (1)
Uniform random number generators are a common building block in many applications. This implements uniform random number generators in OpenCL:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-6hiqb

This loads the source as an OpenCLFunction. This algorithm uses an image to provide an upper bound to the random number:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-ck1sn0

This calls OpenCLFunction; note that you can pass images directly into an OpenCLFunction so long as it can be interpreted using the appropriate specified type:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-vrim2

Notice that this is not a regular duck image; it is a 4-channel image with alpha channel set to 1 (using SetAlphaChannel):

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-g4ur4z

The random output can be used to detect important edges in an image:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-fm27d1

Random Number Generation Using the Mersenne Twister (1)
The Mersenne Twister is another uniform random number generator algorithm (more sophisticated than the one mentioned above). The implementation is located here:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-d84o29

This loads OpenCLFunction; you specify the type _Real, which means that the real type depends on the capabilities of the OpenCL device (whether it supports double precision or not):

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-u3efc

This sets up the Mersenne Twister's input and output parameters (for more information, refer to the algorithm description):

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-dscs93

This invokes OpenCLFunction:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-flxpbs

This plots the output:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-jwzkjc


https://wolfram.com/xid/0isq3flowdud5n74bny881he6-id9577

There is almost an 11× increase in speed:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-cmnqr0

Prefix Sum Algorithm (1)
The scan, or prefix sum, algorithm is similar to FoldList and is a very useful primitive algorithm that can be used in a variety of scenarios. The OpenCL implementation is found in:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-j0jfac

This loads the three kernels used in computation:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-i0l2j
This generates random input data:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-haj3tp
This allocates the output buffer:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-odg7fy
This computes the block and grid dimensions:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-imsy46
A temporary buffer is needed in computation:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-gv3flm
This performs the scan operation:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-zajes
This retrieves the output buffer:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-bpzz7i

This deallocates the OpenCLMemory elements:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-jb9a64
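A common way to split a scan into three kernels is to scan each block, scan the per-block totals, and then add the resulting offsets back. The plain C sketch below follows that scheme as an illustration. The bundled OpenCL source may be organized differently, and all names and the tiny block size are assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define SCAN_BLOCK 4        /* tiny block size so the phases are easy to trace */
#define SCAN_MAX_BLOCKS 64  /* capacity of the temporary block-sum buffer */

/* Phase 1: inclusive scan inside each block; record each block's total. */
static void scan_blocks(const int *in, int *out, int *block_sums, size_t n)
{
    for (size_t start = 0, blk = 0; start < n; start += SCAN_BLOCK, ++blk) {
        int running = 0;
        for (size_t i = start; i < n && i < start + SCAN_BLOCK; ++i) {
            running += in[i];
            out[i] = running;
        }
        block_sums[blk] = running;
    }
}

/* Phase 2: turn the block totals into exclusive offsets. */
static void scan_block_sums(int *block_sums, size_t blocks)
{
    int running = 0;
    for (size_t b = 0; b < blocks; ++b) {
        int total = block_sums[b];
        block_sums[b] = running;
        running += total;
    }
}

/* Phase 3: add each block's offset to all of its elements. */
static void add_offsets(int *out, const int *block_sums, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        out[i] += block_sums[i / SCAN_BLOCK];
    }
}

/* Inclusive prefix sum built from the three phases. */
void inclusive_scan(const int *in, int *out, size_t n)
{
    int block_sums[SCAN_MAX_BLOCKS];
    size_t blocks = (n + SCAN_BLOCK - 1) / SCAN_BLOCK;
    assert(blocks <= SCAN_MAX_BLOCKS);
    scan_blocks(in, out, block_sums, n);
    scan_block_sums(block_sums, blocks);
    add_offsets(out, block_sums, n);
}
```

The block_sums array plays the role of the temporary buffer mentioned above: it is the only data shared between the three phases.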
Matrix Operations (1)
Matrix transpose is a fundamental algorithm in many applications. This specifies the inputs:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-f2v1b
This loads OpenCLFunction:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-t5ko2

This calls OpenCLFunction:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-brz2lr
This shows the MatrixForm of the result:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-4uqcz

The result agrees with the Wolfram Language:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-cu84qw
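The index mapping that a transpose kernel performs can be written out in plain C. This is a minimal out-of-place sketch assuming row-major storage; it is not the OpenCL source used above, which may use local memory tiles for speed.

```c
#include <assert.h>
#include <stddef.h>

/* Transpose a row-major rows x cols matrix into a cols x rows matrix.
   Each (r, c) pair corresponds to the work of one work-item. */
void transpose(const float *in, float *out, size_t rows, size_t cols)
{
    assert(in != out);   /* out-of-place only */
    for (size_t r = 0; r < rows; ++r) {
        for (size_t c = 0; c < cols; ++c) {
            out[c * rows + r] = in[r * cols + c];
        }
    }
}
```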

Matrix Multiplication (1)
Matrix multiplication is implemented here:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-kbhsg4


https://wolfram.com/xid/0isq3flowdud5n74bny881he6-bc7p4e
This loads OpenCLFunction; note it is specified that the input must be rank 2:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-efumj9

This creates random input and allocates the output:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-bh6566

This calls OpenCLFunction:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-byody3

This gets the output memory using OpenCLMemoryGet:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-g7ryrf

The output agrees with the Wolfram Language:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-mymz7d
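For reference, the computation that each output element requires can be sketched in plain C. The row-major layout and the function name are assumptions for illustration; the OpenCL implementation above may tile the computation differently.

```c
#include <assert.h>
#include <stddef.h>

/* Multiply an m x k matrix a by a k x n matrix b into the m x n matrix c.
   All matrices are row-major. One (row, col) pair is one work-item's job. */
void mat_mul(const float *a, const float *b, float *c,
             size_t m, size_t k, size_t n)
{
    assert(c != a && c != b);   /* output must not alias the inputs */
    for (size_t row = 0; row < m; ++row) {
        for (size_t col = 0; col < n; ++col) {
            float sum = 0.0f;
            for (size_t i = 0; i < k; ++i) {
                sum += a[row * k + i] * b[i * n + col];
            }
            c[row * n + col] = sum;
        }
    }
}
```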

Fast Fourier Transform (1)
The one-dimensional discrete fast Fourier transform can be implemented using OpenCLLink; this implementation assumes that the input length is a power of 2:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-kta8r

This loads OpenCLFunction using OpenCLFunctionLoad:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-b79qf1

This creates input and output lists:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-faw8ln
This calls the function, gets the output memory, and creates a complex list, displaying only the first 50 elements:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-pffo6

The result agrees with Fourier:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-ci7qx3

Financial Derivative (1)
The Black–Scholes model prices financial derivatives; it is implemented here in OpenCL:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-d987ze

This loads OpenCLFunction:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-skspx

This assigns the input parameters:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-l7kojx
This invokes OpenCLFunction:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-ll9lsh


https://wolfram.com/xid/0isq3flowdud5n74bny881he6-c0igvs

The result agrees with the output of FinancialDerivative:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-i4djch

For timing, the number of options to be valuated is increased:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-knztu9
On the C2050, it takes 1/100 of a second to valuate 2048 options:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-c1rwbc

On a Core i7 950, FinancialDerivative takes 1.13 seconds. This is a speedup of 280×. Increasing the number of options yields larger speedups:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-cp8ocy

Gaussian Filter (1)
Recursive Gaussian is used to approximate the Gaussian filter. The Gaussian matrix is separable:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-bcauo

It can be written as the outer product of two 1D Gaussians:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-kdo1w9

Locate the implementation of the recursive Gaussian:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-dqz9rh

Load two functions using OpenCLFunctionLoad:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-q5sxq
Specify the parameter value used in the Gaussian:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-eozwvz
Calculate the normal distribution:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-mirxko

The Wolfram Language can plot the distribution:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-tppv0

Calculate the recursive Gaussian parameters:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-mmxvhv
Allocate OpenCLMemory for the input, output, and temporary storage:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-bnmjyb
Perform the Gaussian horizontally, then transpose, then perform the Gaussian vertically, and finally transpose to get the full Gaussian:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-bivzqd
Reconstruct the image from the data:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-hzpns8


https://wolfram.com/xid/0isq3flowdud5n74bny881he6-caac8i

Notice a 4× performance boost:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-bjizga

Sorting (1)
Bitonic sort sorts a given set of integers. It is similar in principle to merge sort. The OpenCL implementation only works on lists whose length is a power of 2 and can be found here:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-cgwlu6


https://wolfram.com/xid/0isq3flowdud5n74bny881he6-5h3u0

This sets the length of the input and loads it. The direction denotes whether to sort from highest to lowest or lowest to highest. In this case, you sort from lowest to highest:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-mdjhax

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-cvb8s

This calls the bitonic sort function. As with merge sort, multiple passes are needed for a full sort:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-c7hmzl
The output list is retrieved sorted:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-pccb9
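The pass structure of bitonic sort can be sketched in plain C. This is the textbook iterative formulation, given as an illustration rather than a copy of the bundled OpenCL source. Each call to bitonic_pass corresponds to one kernel launch, and the ascending flag plays the role of the direction parameter.

```c
#include <assert.h>
#include <stddef.h>

static int is_power_of_two(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* One compare-exchange pass. On a device, every i would be one work-item.
   k is the size of the bitonic sequences being merged, j the partner distance. */
static void bitonic_pass(int *data, size_t n, size_t k, size_t j, int ascending)
{
    for (size_t i = 0; i < n; ++i) {
        size_t partner = i ^ j;
        if (partner > i) {
            /* Alternate the direction per k-sized group to build bitonic runs. */
            int up = ((i & k) == 0) ? (ascending != 0) : (ascending == 0);
            if ((up && data[i] > data[partner]) ||
                (!up && data[i] < data[partner])) {
                int tmp = data[i];
                data[i] = data[partner];
                data[partner] = tmp;
            }
        }
    }
}

/* Full sort: repeated passes over the whole list. */
void bitonic_sort(int *data, size_t n, int ascending)
{
    assert(is_power_of_two(n));
    for (size_t k = 2; k <= n; k <<= 1) {
        for (size_t j = k >> 1; j > 0; j >>= 1) {
            bitonic_pass(data, n, k, j, ascending);
        }
    }
}
```

For a list of length 8, the two nested loops issue six passes in total, which illustrates why several calls are needed before the output is fully sorted.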

Possible Issues (5)
The maximum work item sizes (block dimensions) are returned by OpenCLInformation:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-h0obz

On some systems, this can be limited to 1.
To use double-precision operations in the OpenCL code, the user must place the following pragmas in the code header:
#ifdef USING_DOUBLE_PRECISIONQ
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#pragma OPENCL EXTENSION cl_amd_fp64 : enable
#endif /* USING_DOUBLE_PRECISIONQ */
Errors in the function call can place OpenCLLink in an unusable state. This is a side effect of allowing users to write arbitrary kernels. Infinite loops, buffer overflows, etc. in the kernel code can make both OpenCLLink and the video driver unstable. In an extreme case, this may crash the display driver, but usually it just makes further evaluation of OpenCL code return invalid results.
Bugs in some OpenCL implementations may cause the kernel to crash if one of the "IncludeDirectories" contains a space.
Use of memory modifiers such as is not supported by OpenCLLink. Memory passed into an OpenCLFunction must be
.
Interactive Examples (5)
Mandelbrot Set (1)
The Mandelbrot set consists of all points $c$ for which the recurrence $z_{n+1} = z_n^2 + c$, with $z_0 = 0$ and $c$ a complex number, stays bounded. The following implements the set in OpenCL (a slightly more complicated coloring strategy is used to ensure colors have smooth transition):

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-fpfe3h

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-daweyh


https://wolfram.com/xid/0isq3flowdud5n74bny881he6-bwo2m6

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-gae8gd


https://wolfram.com/xid/0isq3flowdud5n74bny881he6-dgijpw


https://wolfram.com/xid/0isq3flowdud5n74bny881he6-njf1w
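The per-point work in a Mandelbrot kernel is an escape-time iteration of $z_{n+1} = z_n^2 + c$ starting from $z_0 = 0$. The plain C sketch below shows only that iteration, with a simple iteration count as the result. The function name is hypothetical, and the smooth coloring mentioned above is not reproduced.

```c
#include <assert.h>

/* Iterate z <- z*z + c from z = 0. Return the number of iterations
   completed before |z| exceeds 2, or max_iter if that never happens.
   The test |z|^2 > 4 avoids a square root. */
int mandelbrot_iterations(double cr, double ci, int max_iter)
{
    assert(max_iter >= 0);
    double zr = 0.0, zi = 0.0;
    for (int n = 0; n < max_iter; ++n) {
        if (zr * zr + zi * zi > 4.0) {
            return n;
        }
        double next_zr = zr * zr - zi * zi + cr;
        zi = 2.0 * zr * zi + ci;
        zr = next_zr;
    }
    return max_iter;
}
```

Points that return max_iter are treated as members of the set. On a device, one work-item would evaluate this function for the complex number that corresponds to its pixel.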

Julia Set (1)
The Mandelbrot set is a restricted form of the Julia set; here is the code for the Julia set:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-scs1h
This defines the input memory and parameters:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-mb7u29
This loads OpenCLFunction:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-mqrc2a
This computes the Julia set and plots it using ReliefPlot:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-c5j10z

This computes the Julia set and displays it as a grayscale image:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-2tawu

Image Adjustment (1)
ImageAdjust rescales the image to input high and low values. Gamma correction is also considered. The following defines a simplified version of ImageAdjust in OpenCL:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-bfo9f5
This loads OpenCLFunction:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-c4uzlf
This defines a simple Wolfram Language wrapper function to make the OpenCL function have similar syntax to ImageAdjust:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-bahhhm
This adjusts the image by rescaling the values between 0.3 and 0.8 to 0.0 and 1.0:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-edzxkh

This adjusts the image by rescaling the values using Manipulate:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-gt1sws

This adjusts the image by rescaling the values between 0.3 and 0.8 to 0.0 and 1.0:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-hamjdh
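The rescaling step used in this subsection can be sketched for a single channel value in plain C. This is a simplified illustration: gamma correction is left out, clamping of out-of-range values is an assumption, and the function name is hypothetical.

```c
#include <assert.h>

/* Map the interval [lo, hi] linearly onto [0, 1] and clamp values
   that fall outside it. Gamma correction is not applied here. */
float rescale_level(float v, float lo, float hi)
{
    assert(hi > lo);
    float t = (v - lo) / (hi - lo);
    if (t < 0.0f) {
        return 0.0f;
    }
    if (t > 1.0f) {
        return 1.0f;
    }
    return t;
}
```

With lo set to 0.3 and hi set to 0.8, an input of 0.3 maps to 0.0, an input of 0.8 maps to 1.0, and values in between are stretched linearly.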

Bouncing Ball (1)
In this example, you compute the position of each particle in a box with varying initial forces. You delegate the particle physics simulation to OpenCL, while all the rest is done in the Wolfram Language:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-e42ymr
This defines the OpenCL code and loads the function into the Wolfram Language:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-qlpny

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-ehdx15

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-cp9b56
N-Body Simulation (1)
The N-body simulation is a classic Newtonian problem. This implements it in OpenCL:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-hyehcj
This loads OpenCLFunction:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-iwnx5u

The number of particles, time step, and epsilon distance are chosen:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-d5npco
This sets the input and output memories:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-h0a1ek
This calls the NBody function:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-fhxghw

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-icyd85

This shows the result as a Dynamic:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-ll80vc

Neat Examples (1)
SymbolicC (1)
OpenCLLink can use SymbolicC's code generation capabilities. To use SymbolicC, the user needs to load it:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-0wsz71
Here a function toSymbolicC is defined that takes a Wolfram Language statement and translates it to a SymbolicC expression (it cannot translate all Wolfram Language commands, but more can be added by the user):

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-py2g9o
Wolfram Language expressions can be transformed:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-u8ztub

To translate to C, use ToCCodeString:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-xg7p5u

You can tie this with OpenCLLink's symbolic code generation capabilities to create an OpenCLMapSource function:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-s6x2vz
OpenCLMapSource can work with pure Wolfram Language functions:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-rwr8qn

You can also use the code to work with predefined Wolfram Language functions:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-3ckyaz

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-d94l6i

The above code can then be loaded using OpenCLFunctionLoad:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-hhu9r

The function can be evaluated:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-mv5eq

To make this general, you can implement an OpenCLMap function:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-kk67b7
The function can be evaluated. Here, the addTwo function is implemented:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-hluyiw

Here, the BitNot operator is used:

https://wolfram.com/xid/0isq3flowdud5n74bny881he6-chz99y

Wolfram Research (2010), OpenCLFunctionLoad, Wolfram Language function, https://reference.wolfram.com/language/OpenCLLink/ref/OpenCLFunctionLoad.html.