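CUDAFunctionLoad
CUDAFunctionLoad[src, fun, argtypes, blockdim]
loads CUDAFunction from src and makes fun available in Mathematica.
CUDAFunctionLoad[{srcfile}, fun, argtypes, blockdim]
loads CUDAFunction from srcfile and makes fun available in Mathematica.
CUDAFunctionLoad[{libfile}, fun, argtypes, blockdim]
loads CUDAFunction from libfile and makes fun available in Mathematica.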
More Information
- The CUDALink application must be loaded using Needs["CUDALink`"].
- Possible argument and return types, and their corresponding CUDA type, include:
| _Integer | mint | Mathematica integer |
| "Integer32" | int | 32-bit integer |
| "Integer64" | long/long long | 64-bit integer |
| _Real | Real_t | GPU real type |
| "Double" | double | machine double |
| "Float" | float | machine float |
| {base, rank, io} | CUDAMemory | memory of specified base type, rank, and input/output option |
| "Local" \| "Shared" | mint | local or shared memory parameter |
| {"Local" \| "Shared", type} | mint | local or shared memory parameter |
- Valid io is "Input", "Output", and "InputOutput".
- If {base, rank} is passed, then "InputOutput" is used by default. If {base} is passed, then {base, _, "InputOutput"} is used.
- The rank can be omitted by using {base, _, io} or {base, io}.
- Possible base types are:
| _Integer | _Real | _Complex |
| "Byte" | "Bit16" | "Integer32" |
| "Byte[2]" | "Bit16[2]" | "Integer32[2]" |
| "Byte[3]" | "Bit16[3]" | "Integer32[3]" |
| "Byte[4]" | "Bit16[4]" | "Integer32[4]" |
| "UnsignedByte" | "UnsignedBit16" | "UnsignedInteger" |
| "UnsignedByte[2]" | "UnsignedBit16[2]" | "UnsignedInteger[2]" |
| "UnsignedByte[3]" | "UnsignedBit16[3]" | "UnsignedInteger[3]" |
| "UnsignedByte[4]" | "UnsignedBit16[4]" | "UnsignedInteger[4]" |
| "Double" | "Float" | "Integer64" |
| "Double[2]" | "Float[2]" | "Integer64[2]" |
| "Double[3]" | "Float[3]" | "Integer64[3]" |
| "Double[4]" | "Float[4]" | "Integer64[4]" |
- CUDAFunctionLoad can be called more than once with different arguments.
- Functions loaded by CUDAFunctionLoad run in the same process as the Mathematica kernel.
- Functions loaded by CUDAFunctionLoad are unloaded when the Mathematica kernel exits.
- Block dimensions can be either a list or an integer denoting how many threads per block to launch.
- If libfile is a dynamic library, then the dynamic library function fun is loaded.
- libfile can be a CUDA PTX, CUDA CUBIN, or a library file.
- The maximum size of block dimensions is returned by the "Maximum Block Dimensions" property of CUDAInformation.
- On launch, if the number of threads is not specified (as an extra argument to the CUDAFunction), then the dimension of the element with the largest rank and dimension is chosen. For images, the rank is set to 2.
- On launch, if the number of threads is not a multiple of the block dimension, then it is incremented to be a multiple of the block dimension.
- The following options can be given:
| "CleanIntermediate" | Automatic | whether temporary files should be deleted |
| "CompileOptions" | {} | compile options passed directly to the NVCC compiler |
| "CompilerInstallation" | Automatic | location of the CUDA Toolkit installation |
| "CreateCUBIN" | True | whether to compile code to a CUDA binary |
| "CreatePTX" | False | whether to compile code to CUDA bytecode |
| "CUDAArchitecture" | Automatic | architecture for which to compile CUDA code |
| "Defines" | {} | defines passed to the NVCC preprocessor |
| "Device" | $CUDADevice | CUDA device used in computation |
| "IncludeDirectories" | {} | directories to include in the compilation |
| "ShellCommandFunction" | None | function to call with the shell commands used for compilation |
| "ShellOutputFunction" | None | function to call with the shell output of running the compilation commands |
| "SystemDefines" | Automatic | system defines passed to the NVCC preprocessor |
| "TargetDirectory" | Automatic | the directory in which CUDA files should be generated |
| "TargetPrecision" | Automatic | precision used in computation |
| "WorkingDirectory" | Automatic | the directory in which temporary files will be generated |
| "XCompilerInstallation" | Automatic | the directory where NVCC will find the C compiler |
Examples
Basic Examples (7)
First, load the CUDALink application:
| In[1]:= |
This code adds 2 to a given vector:
| In[2]:= |
This compiles and runs the CUDA code defined above:
| In[3]:= |
| Out[3]= |
This defines the length of the output list:
| In[4]:= |
The following defines the input and output vectors. These are regular Mathematica lists that have the same type as defined in the CUDA kernel code's signature:
| In[5]:= |
This runs the function with the specified input:
| In[6]:= |
This prints the first 20 values of the result:
| In[7]:= |
| Out[7]= |
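The input cells for this example are not reproduced here, so the following is a minimal sketch of how the steps above could fit together. The kernel body, the variable names (code, cudaFun, input, output, res), the length 300, and the block dimension 256 are assumptions and may differ from the original inputs. Running it requires a CUDA-capable GPU and a supported compiler.

```mathematica
Needs["CUDALink`"]

(* Assumed kernel: adds 2 to each element of the input vector *)
code = "
__global__ void addTwo(mint * in, mint * out, mint length) {
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    if (index < length)
        out[index] = in[index] + 2;
}";

(* Argument types mirror the kernel signature; 256 threads per block *)
cudaFun = CUDAFunctionLoad[code, "addTwo",
   {{_Integer, _, "Input"}, {_Integer, _, "Output"}, _Integer}, 256];

length = 300;                       (* length of the output list *)
input = Range[length];              (* regular Mathematica lists *)
output = ConstantArray[0, length];

(* The CUDAFunction returns a list of its output arguments *)
res = cudaFun[input, output, length];
Take[First[res], 20]
```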
CUDA files can be passed in. This gets the path to the CUDA function file:
| In[1]:= |
| Out[1]= |
File names are enclosed as lists:
| In[2]:= |
| Out[2]= |
This defines the input parameters:
| In[3]:= |
| Out[3]= |
| In[4]:= |
| Out[4]= |
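Loading from a source file works the same way once the path is wrapped in a list. The sketch below first writes a small kernel to a temporary file so that it is self-contained; the file name addTwoSketch.cu and the kernel are hypothetical and are not the file bundled with CUDALink.

```mathematica
Needs["CUDALink`"]

(* Hypothetical kernel that updates the array in place *)
code = "
__global__ void addTwo(mint * arry, mint len) {
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    if (index < len)
        arry[index] += 2;
}";

(* Write the kernel to a temporary .cu file (assumed file name) *)
srcFile = FileNameJoin[{$TemporaryDirectory, "addTwoSketch.cu"}];
Export[srcFile, code, "Text"];

(* File names are enclosed in a list *)
cudaFun = CUDAFunctionLoad[{srcFile}, "addTwo",
   {{_Integer, _, "InputOutput"}, _Integer}, 256];

cudaFun[Range[10], 10]
```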
An extra argument can be given when calling the CUDAFunction. The argument denotes the number of threads to launch (or grid dimension times block dimension). This gets the source files containing the CUDA implementation:
| In[1]:= |
| Out[1]= |
This loads the CUDA function from the file:
| In[2]:= |
| Out[2]= |
This calls the function with 32 threads, which results in only the first 32 values in the vector add being computed:
| In[3]:= |
| Out[3]= |
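As a rough illustration of the extra thread-count argument, the sketch below uses an inline vector-addition kernel instead of the bundled source file. The kernel, the names (vecAdd, a, b, out, n), and the sizes are assumptions made for this sketch.

```mathematica
Needs["CUDALink`"]

(* Assumed kernel: element-wise sum of two integer vectors *)
code = "
__global__ void vecAdd(mint * a, mint * b, mint * out, mint n) {
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    if (index < n)
        out[index] = a[index] + b[index];
}";

cudaFun = CUDAFunctionLoad[code, "vecAdd",
   {{_Integer, _, "Input"}, {_Integer, _, "Input"},
    {_Integer, _, "InputOutput"}, _Integer}, 16];

n = 64;
a = Range[n];
b = Range[n];
out = ConstantArray[0, n];

(* The trailing 32 is the number of threads to launch, so only the
   first 32 entries of out are written by the kernel *)
cudaFun[a, b, out, n, 32]
```

The output array is declared "InputOutput" here so that the entries beyond the first 32 should keep the zeros they were passed in with.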
CUDA binaries can be passed in. This compiles a CUDA function to a binary using NVCCCompiler. The "UnmangleCode" option must be set to True if the user wishes to use the unmangled name:
| In[1]:= |
| Out[1]= |
| In[2]:= |
| Out[2]= |
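One possible shape of this workflow is sketched below. It assumes that CreateExecutable from CCompilerDriver accepts "Compiler" -> NVCCCompiler together with the "CreateCUBIN" and "UnmangleCode" options; the kernel and the names cubinFile and cudaFun are illustrative, and the option names should be checked against the installed CUDALink version.

```mathematica
Needs["CUDALink`"]
Needs["CCompilerDriver`"]

(* Assumed kernel; plain int is used to match the "Integer32" type *)
code = "
__global__ void addTwo(int * arry, int len) {
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    if (index < len)
        arry[index] += 2;
}";

(* Compile to a CUDA binary; "UnmangleCode" -> True keeps the name addTwo *)
cubinFile = CreateExecutable[code, "addTwo",
   "Compiler" -> NVCCCompiler,
   "CreateCUBIN" -> True,
   "UnmangleCode" -> True];

(* Load the compiled binary by passing its path in a list *)
cudaFun = CUDAFunctionLoad[{cubinFile}, "addTwo",
   {{"Integer32", _, "InputOutput"}, "Integer32"}, 256];

cudaFun[Range[10], 10]
```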
For floating-precision support, Real_t is defined based on hardware and "TargetPrecision":
| In[1]:= |
With no options, CUDAFunctionLoad uses the highest floating-point precision available on the device. In this case, it is double precision:
Notice how the macros
and
are defined. To avoid detection, you can pass in the
or
options. This is equivalent to the above:
To force the use of single precision, pass the "Single" value to "TargetPrecision":
The _Real argument type (Real_t in CUDA code) is detected based on the target precision. To force the use of a specific type, pass either "Float" or "Double" as type:
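The two approaches just described might look roughly as follows. The kernels and the names funSingle and funFloat are assumptions for this sketch; the first variant relies on Real_t and "TargetPrecision", the second fixes the type explicitly with "Float".

```mathematica
Needs["CUDALink`"]

(* Variant 1: Real_t in the kernel, _Real as argument type,
   with single precision requested through "TargetPrecision" *)
code = "
__global__ void addTwo(Real_t * arry, mint len) {
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    if (index < len)
        arry[index] += 2;
}";

funSingle = CUDAFunctionLoad[code, "addTwo",
   {{_Real, _, "InputOutput"}, _Integer}, 256,
   "TargetPrecision" -> "Single"];

funSingle[N[Range[10]], 10]

(* Variant 2: the type is fixed to float on both sides *)
codeFloat = "
__global__ void addTwo(float * arry, mint len) {
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    if (index < len)
        arry[index] += 2;
}";

funFloat = CUDAFunctionLoad[codeFloat, "addTwo",
   {{"Float", _, "InputOutput"}, _Integer}, 256];

funFloat[N[Range[10]], 10]
```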
CUDALink libraries can be loaded. This gets the path to an example CUDA library:
| In[1]:= |
| Out[1]= |
This makes sure that the file exists, since the precompiled library extension is operating system dependent:
| In[2]:= |
| Out[2]= |
This loads the library using CUDAFunctionLoad:
| In[3]:= |
| Out[3]= |
The function adds two to an input list:
| In[4]:= |
| Out[4]= |
The source code for this example is bundled with CUDALink:
| In[5]:= |
| Out[5]= |
The "ShellOutputFunction" option can be used to give information on compile failures. This source code has a syntax error:
| In[1]:= |
Setting "ShellOutputFunction"->Print gives the build log:
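A minimal sketch of this debugging setup is shown below. The kernel and the name badCode are illustrative; the missing semicolon on the index line is the deliberate syntax error. "ShellCommandFunction" -> Print is added as well so that the compile command is printed next to the build log. The load is expected to fail, and the exact messages depend on the installed compiler.

```mathematica
Needs["CUDALink`"]

(* Deliberately broken kernel: the line defining index lacks a semicolon *)
badCode = "
__global__ void addTwo(mint * arry, mint len) {
    int index = threadIdx.x + blockIdx.x * blockDim.x
    if (index < len)
        arry[index] += 2;
}";

(* Print the compile command and the compiler output *)
CUDAFunctionLoad[badCode, "addTwo",
  {{_Integer, _, "InputOutput"}, _Integer}, 256,
  "ShellCommandFunction" -> Print,
  "ShellOutputFunction" -> Print]
```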