CUDAFunctionLoad


loads CUDAFunction from src and makes fun available in Mathematica.


loads CUDAFunction from srcfile and makes fun available in Mathematica.


loads CUDAFunction from libfile and makes fun available in Mathematica.

More Information

  • The CUDALink application must be loaded using Needs["CUDALink`"].
  • Possible argument and return types, and their corresponding CUDA type, include:
    _Integer                      mint              Mathematica integer
    "Integer32"                   int               32-bit integer
    "Integer64"                   long/long long    64-bit integer
    _Real                         Real_t            GPU real type
    "Double"                      double            machine double
    "Float"                       float             machine float
    {base, rank, io}              CUDAMemory        memory of specified base type, rank, and input/output option
    "Local" | "Shared"            mint              local or shared memory parameter
    {"Local" | "Shared", type}    mint              local or shared memory parameter
  • Valid values of io are "Input", "Output", and "InputOutput".
  • If {base} is passed, then {base, _, "InputOutput"} is used by default. If {base, rank} is passed, then {base, rank, "InputOutput"} is used.
  • The rank can be omitted by using {base, _, io} or {base, io}.
  • Possible base types are:
    _Integer             _Real                 _Complex
    "Byte"               "Bit16"               "Integer32"
    "Byte[2]"            "Bit16[2]"            "Integer32[2]"
    "Byte[3]"            "Bit16[3]"            "Integer32[3]"
    "Byte[4]"            "Bit16[4]"            "Integer32[4]"
    "UnsignedByte"       "UnsignedBit16"       "UnsignedInteger"
    "UnsignedByte[2]"    "UnsignedBit16[2]"    "UnsignedInteger[2]"
    "UnsignedByte[3]"    "UnsignedBit16[3]"    "UnsignedInteger[3]"
    "UnsignedByte[4]"    "UnsignedBit16[4]"    "UnsignedInteger[4]"
    "Double"             "Float"               "Integer64"
    "Double[2]"          "Float[2]"            "Integer64[2]"
    "Double[3]"          "Float[3]"            "Integer64[3]"
    "Double[4]"          "Float[4]"            "Integer64[4]"
  • CUDAFunctionLoad can be called more than once with different arguments.
  • Functions loaded by CUDAFunctionLoad run in the same process as the Mathematica kernel.
  • Functions loaded by CUDAFunctionLoad are unloaded when the Mathematica kernel exits.
  • Block dimensions can be either a list or an integer denoting how many threads per block to launch.
  • If libfile is a dynamic library, then the dynamic library function fun is loaded.
  • libfile can be a CUDA PTX, CUDA CUBIN, or a library file.
  • The maximum size of block dimensions is returned by the "Maximum Block Dimensions" property of CUDAInformation.
  • On launch, if the number of threads is not specified (as an extra argument to the CUDAFunction) then the dimension of the element with largest rank and dimension is chosen. For images, the rank is set to 2.
  • On launch, if the number of threads is not a multiple of the block dimension, then it is incremented to be a multiple of the block dimension.
  • The following options can be given:
    "CleanIntermediate"        Automatic      whether temporary files should be deleted
    "CompileOptions"           {}             compile options passed directly to the NVCC compiler
    "CompilerInstallation"     Automatic      location of the CUDA Toolkit installation
    "CreateCUBIN"              True           whether to compile code to a CUDA binary
    "CreatePTX"                False          whether to compile code to CUDA bytecode
    "CUDAArchitecture"         Automatic      architecture for which to compile CUDA code
    "Defines"                  {}             defines passed to the NVCC preprocessor
    "Device"                   $CUDADevice    CUDA device used in computation
    "IncludeDirectories"       {}             directories to include in the compilation
    "ShellCommandFunction"     None           function to call with the shell commands used for compilation
    "ShellOutputFunction"      None           function to call with the shell output of running the compilation commands
    "SystemDefines"            Automatic      system defines passed to the NVCC preprocessor
    "TargetDirectory"          Automatic      the directory in which CUDA files should be generated
    "TargetPrecision"          Automatic      precision used in computation
    "WorkingDirectory"         Automatic      the directory in which temporary files will be generated
    "XCompilerInstallation"    Automatic      the directory in which NVCC will look for the C compiler
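
As an illustration of how the argument types, memory specifications, and block dimension described above combine, here is a minimal sketch. The kernel name scale, its source, and the variable names are made up for this illustration and are not part of CUDALink; the sketch assumes a CUDA-capable device and a working CUDA Toolkit installation.

```mathematica
Needs["CUDALink`"]

(* Hypothetical kernel: out[i] = a*in[i].
   Real_t and mint are the CUDA-side types listed in the table above. *)
src = "
__global__ void scale(Real_t * in, Real_t * out, Real_t a, mint n) {
    int i = threadIdx.x + blockIdx.x*blockDim.x;
    if (i < n)
        out[i] = a*in[i];
}";

scale = CUDAFunctionLoad[src, "scale",
  {{_Real, 1, "Input"},    (* rank-1 memory, only read by the kernel *)
   {_Real, 1, "Output"},   (* rank-1 memory, only written by the kernel *)
   _Real,                  (* scalar passed as Real_t *)
   _Integer},              (* scalar passed as mint *)
  256];                    (* threads per block, given as an integer *)
```

A call would then look like scale[N[Range[10]], ConstantArray[0., 10], 2., 10]. Writing "Float" or "Double" in place of _Real should pin the type, in which case the kernel source would need to use float or double rather than Real_t.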

Examples

Basic Examples (7)

First, load the CUDALink application:


This code adds 2 to a given vector:


This compiles and runs the CUDA code defined above:


This defines the length of the output list:


The following defines the input and output vectors. These are regular Mathematica lists that have the same type as defined in the CUDA kernel code's signature:


This runs the function with the specified input:


This prints the first 20 values of the result:

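The steps of this example can be sketched end to end as follows. The kernel source and the variable names are a reconstruction for illustration, not the original cell contents, and the sketch assumes a CUDA-capable device and a working CUDA Toolkit installation.

```mathematica
Needs["CUDALink`"]

(* Kernel that writes in[i] + 2 into out[i]; mint matches _Integer *)
code = "
__global__ void addTwo(mint * in, mint * out, mint length) {
    int index = threadIdx.x + blockIdx.x*blockDim.x;
    if (index < length)
        out[index] = in[index] + 2;
}";

(* Compile and load the kernel with 256 threads per block *)
addTwo = CUDAFunctionLoad[code, "addTwo",
   {{_Integer, _, "Input"}, {_Integer, _, "Output"}, _Integer}, 256];

(* Ordinary Mathematica lists whose element type matches the kernel signature *)
listSize = 100;
input = Range[listSize];
output = ConstantArray[0, listSize];

(* The call is assumed to return a list holding the output vector *)
res = addTwo[input, output, listSize];

(* First 20 values of the result *)
Take[First[res], 20]
```

The bounds check inside the kernel matters because the number of launched threads is rounded up to a multiple of the block dimension, so some threads fall past the end of the list.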

CUDA files can be passed in. This gets the path to the CUDA function file:


File names are enclosed as lists:


This defines the input parameters:


This calls the function:

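A self-contained sketch of loading from a source file is shown here. Rather than relying on a file shipped with CUDALink, it writes a small kernel to a temporary file first; the file name addTwo_sketch.cu and the variable names are hypothetical, and the list-wrapped file name follows the form described on this page.

```mathematica
Needs["CUDALink`"]

(* In-place kernel: adds 2 to every element of arry *)
code = "
__global__ void addTwo(mint * arry, mint len) {
    int index = threadIdx.x + blockIdx.x*blockDim.x;
    if (index < len)
        arry[index] += 2;
}";

(* Write the source to a temporary .cu file (hypothetical file name) *)
srcFile = FileNameJoin[{$TemporaryDirectory, "addTwo_sketch.cu"}];
Export[srcFile, code, "Text"];

(* The file name is enclosed in a list to mark it as a file rather than source text *)
addTwoFromFile = CUDAFunctionLoad[{srcFile}, "addTwo",
   {{_Integer, _, "InputOutput"}, _Integer}, 256];

(* Input parameters, then the call *)
vec = Range[10];
addTwoFromFile[vec, Length[vec]]
```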

An extra argument can be given when calling the CUDAFunction. The argument denotes the number of threads to launch (or grid dimension times block dimension). This gets the source files containing the CUDA implementation:


This loads the CUDA function from the file:


This calls the function with 32 threads, which results in only the first 32 values in the vector add being computed:

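The effect of the extra thread-count argument can be sketched as follows. For self-containment the kernel is given as a source string instead of a file, and the kernel name vectorAdd and all variable names are made up for this illustration. A block dimension of 32 is chosen so that a request for 32 threads is already a multiple of the block dimension and is not rounded up.

```mathematica
Needs["CUDALink`"]

(* Hypothetical element-wise vector addition kernel *)
code = "
__global__ void vectorAdd(mint * a, mint * b, mint * out, mint length) {
    int index = threadIdx.x + blockIdx.x*blockDim.x;
    if (index < length)
        out[index] = a[index] + b[index];
}";

(* out is declared InputOutput so that its initial zeros are copied to the device *)
vectorAdd = CUDAFunctionLoad[code, "vectorAdd",
   {{_Integer, _, "Input"}, {_Integer, _, "Input"},
    {_Integer, _, "InputOutput"}, _Integer}, 32];

n = 64;
a = Range[n];
b = Range[n];
out = ConstantArray[0, n];

(* The trailing 32 is the extra argument: the number of threads to launch *)
vectorAdd[a, b, out, n, 32]
```

With only 32 threads launched, elements of out past the first 32 are expected to be left untouched.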

CUDA binaries can be passed in. This compiles a CUDA function to a binary using NVCCCompiler. The "UnmangleCode" option must be set to True if the user wishes to use the unmangled name:


For floating-precision support, Real_t is defined based on hardware and "TargetPrecision":


With no options, CUDAFunctionLoad uses the highest floating-point precision available on the device. In this case, it is double precision:


Notice the precision-related macros that are defined in the compile command. To avoid detection, you can set "TargetPrecision" to "Double" or "Single" explicitly. This is equivalent to the above:


To force the use of single precision, pass the value "Single" to "TargetPrecision":


The _Real type is detected based on the target precision. To force the use of a specific type, pass either "Float" or "Double" as type:

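The precision handling described above can be sketched as follows. The kernel sources and variable names are made up for this illustration, and the setting "Single" for "TargetPrecision" is an assumption about the accepted option values. "ShellCommandFunction" -> Print is used only to display the compile command, so that the defines passed to NVCC can be inspected.

```mathematica
Needs["CUDALink`"]

(* Kernel written against Real_t, so its precision follows "TargetPrecision" *)
realCode = "
__global__ void addTwo(Real_t * arry, mint len) {
    int index = threadIdx.x + blockIdx.x*blockDim.x;
    if (index < len)
        arry[index] += 2.0;
}";

(* Request single precision explicitly and print the compile command *)
addTwoSingle = CUDAFunctionLoad[realCode, "addTwo",
   {{_Real, _, "InputOutput"}, _Integer}, 256,
   "TargetPrecision" -> "Single",       (* assumed option value *)
   "ShellCommandFunction" -> Print];

(* Kernel written against float directly, paired with the explicit "Float" type *)
floatCode = "
__global__ void addTwoFloat(float * arry, mint len) {
    int index = threadIdx.x + blockIdx.x*blockDim.x;
    if (index < len)
        arry[index] += 2.0f;
}";

addTwoFloat = CUDAFunctionLoad[floatCode, "addTwoFloat",
   {{"Float", _, "InputOutput"}, _Integer}, 256];
```

Keeping the kernel's C type and the Mathematica-side type specification in agreement is the point of the second variant: an explicit "Float" argument is paired with a kernel that uses float rather than Real_t.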

CUDALink libraries can be loaded. This gets the path to an example CUDA library:


This makes sure that the file exists, since the precompiled library extension is operating system dependent:


This loads the library using CUDAFunctionLoad:


The function adds two to an input list:


The source code for this example is bundled with CUDALink:


The "ShellOutputFunction" option can be used to give information on compile failures. This source code has a syntax error:


This loads the function:

Setting "ShellOutputFunction"->Print gives the build log:

In this case, the variable was misspelled.
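
A minimal sketch of this kind of failure is shown here. The source and the misspelled name indx are made up for this illustration and are not the original cell contents; the sketch assumes a working CUDA Toolkit installation, so that NVCC actually runs and produces a build log.

```mathematica
Needs["CUDALink`"]

(* indx is an intentional misspelling of index, so compilation should fail *)
badCode = "
__global__ void addTwo(mint * arry, mint len) {
    int index = threadIdx.x + blockIdx.x*blockDim.x;
    if (indx < len)
        arry[index] += 2;
}";

(* Print receives the shell output of the compilation, that is, the build log *)
CUDAFunctionLoad[badCode, "addTwo",
  {{_Integer, _, "InputOutput"}, _Integer}, 256,
  "ShellOutputFunction" -> Print]
```

Any function of one argument can be used in place of Print, for example one that appends the log to a list for later inspection.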
