# Introduction
This feature is not supported on the Wolfram Cloud.

*CUDALink* allows the Wolfram Language to use the CUDA parallel computing architecture on Graphical Processing Units (GPUs). It contains functions that use CUDA-enabled GPUs to boost performance in a number of areas, such as linear algebra, financial simulation, and image processing. *CUDALink* also integrates CUDA with existing Wolfram Language development tools, allowing a high degree of automation and control.

## Getting Started

To use any *CUDALink* functions, the *CUDALink* application must first be loaded.

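The loading step is a standard `Needs` call:

```wolfram
(* load the CUDALink application *)
Needs["CUDALink`"]
```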

CUDAQ tells you whether a CUDA-capable device is available and can be used.

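For example (the result depends on your hardware and drivers):

```wolfram
(* True if a supported CUDA device is present and usable *)
CUDAQ[]
```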

If CUDAQ returns False, then *CUDALink* will not work. For more information, read "*CUDALink* Setup".

CUDAInformation gives more detailed information about the graphics processing unit. In this example, the GPU is a Quadro FX unit with 96 cores and 1 GB of graphics memory.

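For example (the exact properties reported depend on your GPU):

```wolfram
(* properties of the detected CUDA devices, such as core count and memory *)
CUDAInformation[]
```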

In the Wolfram Language you can generate matrices of random real numbers with the RandomReal function.

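For instance, using an illustrative 1000×1000 matrix (the size here is an arbitrary choice):

```wolfram
(* a 1000 x 1000 matrix of uniform random reals in [0, 1] *)
m = RandomReal[1, {1000, 1000}];
```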

You can multiply the matrix by itself with Dot.

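A sketch, assuming the matrix m defined above:

```wolfram
(* CPU matrix product; timings vary by machine *)
AbsoluteTiming[cpu = m.m;]
```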

You can perform the same multiplication on the graphics processor with CUDADot.

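A sketch, again assuming the matrix m from above:

```wolfram
(* the same product computed on the GPU *)
AbsoluteTiming[gpu = CUDADot[m, m];]
```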

It is even faster to load the data onto the GPU with CUDAMemoryLoad. The result is a CUDAMemory expression, which can be used as a handle to the data for further computations.

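A sketch, assuming the matrix m from above:

```wolfram
(* copy the matrix into GPU memory; returns a CUDAMemory handle *)
mem = CUDAMemoryLoad[m]
```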

Now you can pass the CUDAMemory to CUDADot, which stores the result on the GPU and returns a new CUDAMemory expression.

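Assuming the handle mem from the previous step:

```wolfram
(* multiply on the GPU; the result stays in GPU memory *)
mem2 = CUDADot[mem, mem]
```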

You can retrieve the data from the graphics processor with CUDAMemoryGet and confirm that the dimensions of the result are as expected.

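Assuming the handle mem2 from the previous step:

```wolfram
(* copy the result back into the Wolfram Language and check its shape *)
Dimensions[CUDAMemoryGet[mem2]]
```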

## CUDA and *CUDALink*

CUDA is NVIDIA's parallel computing engine for its graphics processing units (GPUs). It provides a programming interface that software applications, such as the Wolfram Language through *CUDALink*, can call, letting CUDA GPUs run many concurrent threads for parallel computation.

*CUDALink* allows Wolfram Language users to call the CUDA programming layer directly; it also provides higher-level functions, built on a number of CUDA libraries, for areas such as high-performance linear algebra and Fourier transforms.

## *CUDALink* Application Areas

*CUDALink* provides functions in various application areas. These include carefully tuned linear algebra, discrete Fourier transforms, and image processing algorithms. This section gives an introduction to some of these applications.

### Image Processing

*CUDALink* offers many image processing algorithms that have been carefully tuned to run on GPUs. These include the binary image operations (CUDAImageAdd, CUDAImageSubtract, CUDAImageMultiply, and CUDAImageDivide), the morphology operators (CUDAErosion, CUDADilation, CUDAOpening, and CUDAClosing), and image convolution (CUDAImageConvolve).

To use any of the *CUDALink* functionality, you first need to load the *CUDALink* application.


Now you can apply CUDA-based image processing functions directly to images. For example, CUDAImageMultiply carries out channel-wise multiplication of two input images.

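A minimal sketch; the test images chosen here are illustrative, and any two images of equal dimensions will do:

```wolfram
Needs["CUDALink`"]

(* two test images, resized to matching dimensions *)
imgA = ExampleData[{"TestImage", "Mandrill"}];
imgB = ImageResize[ExampleData[{"TestImage", "Peppers"}], ImageDimensions[imgA]];

(* channel-wise multiplication on the GPU *)
CUDAImageMultiply[imgA, imgB]
```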

Since the CUDA functions are Wolfram Language functions, they can be used in conjunction with other functions, such as Manipulate. For example, Manipulate can vary the parameter of a linear interpolation between two images.

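One way to sketch such an interpolation, assuming two images imgA and imgB of equal dimensions are already defined (ImageMultiply scales each image by a constant; CUDAImageAdd adds them on the GPU):

```wolfram
Manipulate[
 CUDAImageAdd[ImageMultiply[imgA, 1 - t], ImageMultiply[imgB, t]],
 {t, 0, 1}]
```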

CUDA functions can also be combined with the Wolfram Language's curated data, its Import and Export capabilities, and its visualization functions, with CUDA performing the core computation and the Wolfram Language handling everything else.

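An illustrative sketch, using a built-in test image and a GPU morphology operation as the core step; the image and the erosion radius are arbitrary choices:

```wolfram
(* get a test image, erode it on the GPU, and compare the two *)
img = ExampleData[{"TestImage", "Lena"}];
GraphicsRow[{img, CUDAErosion[img, 5]}]
```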

### Fourier Analysis

The functions CUDAFourier and CUDAInverseFourier carry out Fourier transforms and inverse Fourier transforms using CUDA.

To use any of the *CUDALink* functionality, you first need to load the *CUDALink* application.

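A sketch of a forward transform on illustrative random data:

```wolfram
Needs["CUDALink`"]

(* Fourier transform of a random vector, computed on the GPU *)
data = RandomReal[1, 2^16];
fdata = CUDAFourier[data];
```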

If the input to a CUDA function is a CUDAMemory handle, the result will also be a CUDAMemory handle.

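A sketch of the same computation starting from GPU memory:

```wolfram
(* load data into GPU memory; the transform then returns a CUDAMemory handle *)
mem = CUDAMemoryLoad[RandomReal[1, 2^16]];
fmem = CUDAFourier[mem]
```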

You can then retrieve the data from the GPU with CUDAMemoryGet.

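Assuming the handle fmem from the previous step:

```wolfram
(* copy the transformed values back into the Wolfram Language *)
result = CUDAMemoryGet[fmem];
```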

This general principle applies to all *CUDALink* functions: you can develop and test with ordinary Wolfram Language data, and then keep data on the GPU, which improves efficiency.

## *CUDALink* Programming

A key feature of *CUDALink* is how easy it makes developing new GPU programs and integrating them into your Wolfram Language work. This requires that you have a C compiler installed.

To use any of the *CUDALink* functionality, you first need to load the *CUDALink* application.


Here, a simple CUDA function is loaded with CUDAFunctionLoad; it takes an input vector and doubles each element.

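A sketch of such a kernel and its loading step; the kernel name doubleVec and the block size of 256 are illustrative choices (mint is CUDALink's machine-integer type):

```wolfram
Needs["CUDALink`"]

(* CUDA kernel source: doubles each element of an integer vector *)
code = "
  __global__ void doubleVec(mint *vec, mint len) {
      int index = threadIdx.x + blockIdx.x * blockDim.x;
      if (index < len)
          vec[index] *= 2;
  }";

(* compile and load; InputOutput marks the vector as both input and output *)
doubleFun =
 CUDAFunctionLoad[code, "doubleVec", {{_Integer, "InputOutput"}, _Integer}, 256]
```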

CUDAFunctionLoad requires that you have a C compiler installed. If loading fails, consult "C Compiler".

Here an input vector is created, and the CUDAFunction is called on it just like any other Wolfram Language function.
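Assuming the function doubleFun loaded above:

```wolfram
(* an input vector *)
vec = Range[10];

(* call the CUDAFunction; modified arguments are returned in a list *)
doubleFun[vec, Length[vec]]
```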

This loads data to the GPU and calls your function on the data. The result is a CUDAMemory expression.

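Assuming doubleFun from above:

```wolfram
(* load the vector into GPU memory and run the kernel on it in place *)
mem = CUDAMemoryLoad[Range[10]];
doubleFun[mem, 10]
```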

This retrieves the result from the GPU.

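Assuming the handle mem from the previous step:

```wolfram
(* copy the doubled values back, then free the GPU memory *)
result = CUDAMemoryGet[mem];
CUDAMemoryUnload[mem];
result
```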

You can get information about a loaded function with CUDAFunctionInformation.

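Assuming doubleFun from above:

```wolfram
(* argument types, block dimensions, and other properties of the loaded function *)
CUDAFunctionInformation[doubleFun]
```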