SpatialTransformationLayer
SpatialTransformationLayer[{h,w}]
represents a net layer that applies an affine transformation to an input of size c×h0×w0 and returns an output of size c×h×w.
Details and Options
- SpatialTransformationLayer exposes the following ports for use in NetGraph etc.:
    "Input"	a 3-dimensional array
    "Parameters"	a vector of length 6
    "Output"	a 3-dimensional array
- SpatialTransformationLayer[…][<|"Input"->in,"Parameters"->param|>] explicitly computes the output from applying the layer.
- SpatialTransformationLayer[…][<|"Input"->{in1,in2,…},"Parameters"->{param1,param2,…}|>] explicitly computes the output for each of the ini and parami.
- When given a NumericArray as input, the output will be a NumericArray.
- SpatialTransformationLayer is typically used inside NetGraph to focus the attention of a later convolutional network on the best part of the image to perform a specific task.
- When it cannot be inferred from other layers in a larger net, the option "Input"->{d1,d2,d3} can be used to fix the input dimensions of SpatialTransformationLayer.
- The six components of the vector provided to the port "Parameters", {zh,sh,th,sv,zv,tv}, represent the parameters in the affine transformation matrix, where zi represents zoom, si skewness and ti translation, and the subscripts h and v indicate horizontal and vertical. The identity transformation is obtained when "Parameters" is {1,0,0,0,1,0}.
- Options[SpatialTransformationLayer] gives the list of default options to construct the layer.
- Options[SpatialTransformationLayer[…]] gives the list of default options to evaluate the layer on some data.
- Information[SpatialTransformationLayer[…]] gives a report about the layer.
- Information[SpatialTransformationLayer[…],prop] gives the value of the property prop of SpatialTransformationLayer[…]. Possible properties are the same as for NetGraph.
Examples
Basic Examples (2)
Create a SpatialTransformationLayer with output size 30×30:
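A minimal construction following the usage above; the input dimensions are left unspecified, to be inferred later:

```wl
(* output size is fixed at 30×30; input size remains to be inferred *)
layer = SpatialTransformationLayer[{30, 30}]
```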
Create a SpatialTransformationLayer that expects an input of size 1×3×3 and returns an output of size 1×2×2:
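Using the "Input" option described in Details, the input dimensions can be fixed explicitly; with the identity parameters {1,0,0,0,1,0}, the 2×2 output grid simply resamples the full 3×3 input:

```wl
layer = SpatialTransformationLayer[{2, 2}, "Input" -> {1, 3, 3}];

(* identity transformation: the output grid samples the full input *)
layer[<|"Input" -> {{{1., 2., 3.}, {4., 5., 6.}, {7., 8., 9.}}},
  "Parameters" -> {1., 0., 0., 0., 1., 0.}|>]
```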
Scope (1)
Create a SpatialTransformationLayer whose input is an image and whose output is an image:
Apply the SpatialTransformationLayer to an image with a factor-2 zoom transformation:
Apply the SpatialTransformationLayer using a sequence of zooms:
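The three steps above might be sketched as follows. The NetEncoder/NetDecoder attachments, the test image, and the convention that zooming in by a factor of 2 corresponds to halving the grid-scale parameters zh and zv are assumptions consistent with the parameterization described in Details:

```wl
(* image in, image out: attach an encoder and a decoder to the ports *)
layer = SpatialTransformationLayer[{28, 28},
   "Input" -> NetEncoder[{"Image", {28, 28}, ColorSpace -> "Grayscale"}],
   "Output" -> NetDecoder[{"Image", ColorSpace -> "Grayscale"}]];

img = ColorConvert[ExampleData[{"TestImage", "House"}], "Grayscale"];

(* factor-2 zoom: shrink the sampling grid by 1/2 about the center *)
layer[<|"Input" -> img, "Parameters" -> {1/2, 0, 0, 0, 1/2, 0}|>]

(* a sequence of zooms, applied to copies of the same image *)
layer[<|"Input" -> Table[img, 4],
  "Parameters" -> Table[{z, 0, 0, 0, z, 0}, {z, {1., .75, .5, .25}}]|>]
```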
Applications (1)
Train a digit recognizer on the MNIST database of handwritten digits using a convolutional neural network with a SpatialTransformationLayer. First obtain the training and test data:
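The MNIST data is available as a Wolfram data resource; the element names "TrainingData" and "TestData" are assumed here:

```wl
trainData = ResourceData["MNIST", "TrainingData"];  (* Image -> digit rules *)
testData = ResourceData["MNIST", "TestData"];
```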
Define a function to apply extra padding and random translations to the training and test data:
Create new training and test data using the function (this should take about a minute):
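A padding function of the kind described might look like this, assuming trainData and testData from the previous step; the 42×42 canvas size and offset range are illustrative choices, not the documented ones:

```wl
(* place each 28×28 digit at a random offset inside a 42×42 canvas *)
randomPad[image_ -> label_] := Module[{dx, dy},
  {dx, dy} = RandomInteger[{0, 14}, 2];
  ImagePad[image, {{dx, 14 - dx}, {dy, 14 - dy}}] -> label]

paddedTrain = randomPad /@ trainData;
paddedTest = randomPad /@ testData;
```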
Create a network that uses the image to predict the best affine transformation to apply to the image to extract the digit:
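One possible localization net is a small convolutional stack ending in a LinearLayer with 6 outputs, one per affine parameter; this architecture is an illustrative sketch, not the documented one:

```wl
localization = NetChain[{
   ConvolutionLayer[20, {5, 5}], Ramp, PoolingLayer[{2, 2}, 2],
   ConvolutionLayer[20, {5, 5}], Ramp, PoolingLayer[{2, 2}, 2],
   FlattenLayer[], LinearLayer[6]},
  "Input" -> {1, 42, 42}]
```

In practice, the final LinearLayer is often initialized so that the net initially predicts the identity parameters {1,0,0,0,1,0}, letting training start from an untransformed image.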
Create a convolutional classification net to use the subimage extracted by the localization net:
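A plain convolutional classifier operating on the extracted 28×28 sub-image, again as an illustrative sketch:

```wl
classifier = NetChain[{
   ConvolutionLayer[20, {5, 5}], Ramp, PoolingLayer[{2, 2}, 2],
   FlattenLayer[], LinearLayer[10], SoftmaxLayer[]},
  "Input" -> {1, 28, 28},
  "Output" -> NetDecoder[{"Class", Range[0, 9]}]]
```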
Attach the classification network and the localization network to a spatial transformation layer:
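Assuming localization and classifier nets as described in the previous steps, the wiring sends the image to both the localization net and the transformer's "Input" port, and the predicted parameters to its "Parameters" port:

```wl
net = NetGraph[<|
    "localization" -> localization,
    "transformer" -> SpatialTransformationLayer[{28, 28}],
    "classifier" -> classifier|>,
   {NetPort["Input"] -> "localization" -> NetPort["transformer", "Parameters"],
    NetPort["Input"] -> NetPort["transformer", "Input"],
    "transformer" -> "classifier"},
   "Input" -> NetEncoder[{"Image", {42, 42}, ColorSpace -> "Grayscale"}]]
```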
If the classification network is removed, the effect of the spatial transformer can be visualized:
Apply the spatial transformer to some images from the validation set:
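These two steps can be sketched by deleting the classifier from the trained graph and decoding the transformer output as an image; trained and paddedTest are hypothetical names standing in for the results of the earlier steps:

```wl
visualizer = NetReplacePart[NetDelete[trained, "classifier"],
   "Output" -> NetDecoder["Image"]];
visualizer /@ Keys[RandomSample[paddedTest, 5]]
```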
Properties & Relations (1)
Apply an AffineTransform to the coordinates of an image using ImageTransformation:
Construct an equivalent set of parameters for SpatialTransformationLayer:
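The correspondence might be illustrated as follows. The key conventions assumed here are the normalization of coordinates to the [-1,1] square via DataRange, and the fact that both ImageTransformation and the layer use the transform to map output coordinates back to input sampling positions:

```wl
img = ColorConvert[ExampleData[{"TestImage", "House"}], "Grayscale"];

(* resample via ImageTransformation on the normalized [-1,1] square *)
tf = AffineTransform[{{{1/2, 0}, {0, 1/2}}, {0, 0}}];
ImageTransformation[img, tf, DataRange -> {{-1, 1}, {-1, 1}}]

(* equivalent layer parameters {zh, sh, th, sv, zv, tv} *)
params = {1/2, 0, 0, 0, 1/2, 0};
```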
Text
Wolfram Research (2017), SpatialTransformationLayer, Wolfram Language function, https://reference.wolfram.com/language/ref/SpatialTransformationLayer.html.
CMS
Wolfram Language. 2017. "SpatialTransformationLayer." Wolfram Language & System Documentation Center. Wolfram Research. https://reference.wolfram.com/language/ref/SpatialTransformationLayer.html.
APA
Wolfram Language. (2017). SpatialTransformationLayer. Wolfram Language & System Documentation Center. Retrieved from https://reference.wolfram.com/language/ref/SpatialTransformationLayer.html