CUDALink`

CUDAMemoryAllocate

CUDAMemoryAllocate[type,dim]

gives CUDAMemory with the specified type and a single dimension of length dim.

CUDAMemoryAllocate[type,{dim1,dim2,…}]

gives CUDAMemory with the specified type and dimensions.

Details

  • The CUDALink application must be loaded using Needs["CUDALink`"].
  • Possible types for CUDAMemoryAllocate are:
    Integer              Real                  Complex
    "Byte"               "Bit16"               "Integer32"
    "Byte[2]"            "Bit16[2]"            "Integer32[2]"
    "Byte[3]"            "Bit16[3]"            "Integer32[3]"
    "Byte[4]"            "Bit16[4]"            "Integer32[4]"
    "UnsignedByte"       "UnsignedBit16"       "UnsignedInteger"
    "UnsignedByte[2]"    "UnsignedBit16[2]"    "UnsignedInteger[2]"
    "UnsignedByte[3]"    "UnsignedBit16[3]"    "UnsignedInteger[3]"
    "UnsignedByte[4]"    "UnsignedBit16[4]"    "UnsignedInteger[4]"
    "Double"             "Float"               "Integer64"
    "Double[2]"          "Float[2]"            "Integer64[2]"
    "Double[3]"          "Float[3]"            "Integer64[3]"
    "Double[4]"          "Float[4]"            "Integer64[4]"
  • The following options can be given:
    "Device"             $CUDADevice    CUDA device used in computation
    "TargetPrecision"    Automatic      precision used in computation

Examples

Basic Examples  (4)

First, load the CUDALink application:

This allocates a rank-3 tensor with each dimension of length 10:

Information about memory can be retrieved via CUDAMemoryInformation:

This unloads the memory:
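The four steps above can be sketched together as follows. This is a minimal sketch that assumes a CUDA-capable GPU with CUDALink installed; the variable name mem is illustrative only, and no output is shown because the memory ID and reported fields depend on the system.

```mathematica
(* Load the CUDALink application *)
Needs["CUDALink`"]

(* Allocate a 10 x 10 x 10 block of integers; mem is an illustrative name *)
mem = CUDAMemoryAllocate[Integer, {10, 10, 10}]

(* Query information about the allocated memory *)
CUDAMemoryInformation[mem]

(* Release the memory when it is no longer needed *)
CUDAMemoryUnload[mem]
```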

For a single dimension, the length can be an integer:
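A minimal sketch of the single-dimension form, under the same assumption of a working CUDALink installation. The length 10 and the name vec are arbitrary choices for illustration.

```mathematica
Needs["CUDALink`"]

(* A bare integer length is used in place of a dimension list *)
vec = CUDAMemoryAllocate[Integer, 10]
```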

As with CUDAMemoryLoad, different types are supported:
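As a sketch, string type names from the table in the Details section can be passed in place of Integer, Real, or Complex. The specific types, dimensions, and variable names below are arbitrary examples, and a CUDA-capable system is assumed.

```mathematica
Needs["CUDALink`"]

(* Single-precision floats in a one-dimensional block *)
floatMem = CUDAMemoryAllocate["Float", 10]

(* Unsigned bytes in a two-dimensional block *)
byteMem = CUDAMemoryAllocate["UnsignedByte", {4, 4}]
```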

Allocating memory as Real or Complex selects the concrete type based on whether the device supports double precision:

In this case, the CUDA device has double-precision support:

The behavior can be forced to change by setting the "TargetPrecision":
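The precision behavior described above can be sketched as follows. The sketch assumes a CUDA-capable system; the variable names are illustrative, and the type that the first call reports depends on the device, so no output is shown.

```mathematica
Needs["CUDALink`"]

(* With the default "TargetPrecision" -> Automatic, the concrete type follows the device *)
autoMem = CUDAMemoryAllocate[Real, 10]

(* Request single precision explicitly *)
singleMem = CUDAMemoryAllocate[Real, 10, "TargetPrecision" -> "Single"]

(* The "Type" entry in the reported information can be compared for the two allocations *)
CUDAMemoryInformation[autoMem]
CUDAMemoryInformation[singleMem]
```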

Applications  (1)

This sets all elements in a list to 0:

This allocates the required memory:

This loads the function:

This runs the function:

This shows information about the memory; note that the "DeviceStatus" is "Synchronized":

This gets the memory from the GPU:

This shows information about the memory; note that the "DeviceStatus" and "HostStatus" are "Synchronized":
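The steps of this application can be sketched end to end as follows. The kernel source, the name zeroOut, the array length 100, and the block dimension 16 are illustrative assumptions rather than the page's original input, and the sketch requires a CUDA-capable GPU plus a C compiler that CUDALink can use.

```mathematica
Needs["CUDALink`"]

(* Hypothetical kernel source: each thread writes 0 to one element *)
code = "
__global__ void zeroOut(mint * arr, mint len) {
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    if (index >= len) return;
    arr[index] = 0;
}";

(* Allocate device memory for 100 integers; the contents are not initialized *)
mem = CUDAMemoryAllocate[Integer, 100];

(* Compile and load the kernel; the first argument is marked as output *)
zeroOut = CUDAFunctionLoad[code, "zeroOut", {{_Integer, "Output"}, _Integer}, 16];

(* Run the kernel on the allocated memory *)
zeroOut[mem, 100];

(* Inspect the memory status before copying back *)
CUDAMemoryInformation[mem]

(* Copy the result from the GPU to the host *)
CUDAMemoryGet[mem]

(* Inspect the memory status again after the copy *)
CUDAMemoryInformation[mem]
```

Allocating with CUDAMemoryAllocate instead of loading a list with CUDAMemoryLoad avoids transferring host data that the kernel would overwrite anyway.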

Possible Issues  (1)

Getting allocated memory from the GPU before it has been set returns arbitrary values, because the memory is not initialized:
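A minimal sketch of this situation, assuming a CUDA-capable system. The name and length are arbitrary, and the returned values are unpredictable, so no output is shown.

```mathematica
Needs["CUDALink`"]

(* Allocate without writing anything to the memory *)
unsetMem = CUDAMemoryAllocate[Integer, 5];

(* The values read back are whatever happened to be in that region of GPU memory *)
CUDAMemoryGet[unsetMem]
```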