SpatialTransformationLayer

SpatialTransformationLayer[{h,w}]

represents a net layer that applies an affine transformation to an input of size c×h0×w0 and returns an output of size c×h×w.

Details and Options

  • SpatialTransformationLayer exposes the following ports for use in NetGraph etc.:
    "Input"        a 3-dimensional array
    "Parameters"   a vector of length 6
    "Output"       a 3-dimensional array
  • SpatialTransformationLayer[…][<|"Input"->in,"Parameters"->param|>] explicitly computes the output from applying the layer.
  • SpatialTransformationLayer[…][<|"Input"->{in1,in2,…},"Parameters"->{param1,param2,…}|>] explicitly computes the output for each of the ini and parami.
  • When given a NumericArray as input, the output will be a NumericArray.
  • SpatialTransformationLayer is typically used inside NetGraph to focus the attention of a subsequent convolutional network on the region of the image most relevant to a given task.
  • When it cannot be inferred from other layers in a larger net, the option "Input"->{d1,d2,d3} can be used to fix the input dimensions of SpatialTransformationLayer.
  • The six components of the vector provided to the port "Parameters", {zh,sh,th,sv,zv,tv}, are the entries of the 2×3 affine transformation matrix {{zh,sh,th},{sv,zv,tv}}, where zi represents zoom, si skew and ti translation, and the subscripts h and v indicate horizontal and vertical. The identity transformation is obtained when "Parameters" is {1,0,0,0,1,0}.
  • Options[SpatialTransformationLayer] gives the list of default options to construct the layer. Options[SpatialTransformationLayer[…]] gives the list of default options to evaluate the layer on some data.
  • Information[SpatialTransformationLayer[…]] gives a report about the layer.
  • Information[SpatialTransformationLayer[…],prop] gives the value of the property prop of SpatialTransformationLayer[…]. Possible properties are the same as for NetGraph.

Examples


Basic Examples  (2)

Create a SpatialTransformationLayer with output size 30×30:
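For example (the layer is created with its input dimensions left unspecified, to be inferred later):

```wolfram
SpatialTransformationLayer[{30, 30}]
```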

Create a SpatialTransformationLayer that expects an input of size 1×3×3 and returns an output of size 1×2×2:
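For example:

```wolfram
layer = SpatialTransformationLayer[{2, 2}, "Input" -> {1, 3, 3}]
```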

Apply the layer to an input:
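A sketch of an explicit evaluation, using the identity parameters {1,0,0,0,1,0} so the output samples the input without transformation:

```wolfram
layer = SpatialTransformationLayer[{2, 2}, "Input" -> {1, 3, 3}];
layer[<|
  "Input" -> {{{1., 2., 3.}, {4., 5., 6.}, {7., 8., 9.}}},
  "Parameters" -> {1, 0, 0, 0, 1, 0}|>]
```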

Scope  (1)

Create a SpatialTransformationLayer whose input is an image and whose output is an image:
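One way to do this is to attach an image NetEncoder and NetDecoder; the 128×128 input and 64×64 output sizes below are illustrative choices:

```wolfram
stl = SpatialTransformationLayer[{64, 64},
  "Input" -> NetEncoder[{"Image", {128, 128}}],
  "Output" -> NetDecoder["Image"]]
```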

Apply the SpatialTransformationLayer to an image with a factor-2 zoom transformation:
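Scale parameters smaller than 1 shrink the sampling grid, which zooms in. A sketch using the encoder-equipped layer from the previous step (rebuilt here) and a standard test image:

```wolfram
stl = SpatialTransformationLayer[{64, 64},
  "Input" -> NetEncoder[{"Image", {128, 128}}],
  "Output" -> NetDecoder["Image"]];
img = ExampleData[{"TestImage", "House"}];
(* a sampling grid covering the central half of the input: factor-2 zoom *)
stl[<|"Input" -> img, "Parameters" -> {0.5, 0, 0, 0, 0.5, 0}|>]
```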

Apply the SpatialTransformationLayer using a sequence of zooms:
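A sketch of a batched evaluation: supplying one copy of the image per parameter vector produces one output per zoom level:

```wolfram
stl = SpatialTransformationLayer[{64, 64},
  "Input" -> NetEncoder[{"Image", {128, 128}}],
  "Output" -> NetDecoder["Image"]];
img = ExampleData[{"TestImage", "House"}];
zooms = Table[{s, 0, 0, 0, s, 0}, {s, 1., 0.2, -0.2}];
stl[<|"Input" -> ConstantArray[img, Length[zooms]],
  "Parameters" -> zooms|>]
```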

Applications  (1)

Train a digit recognizer on the MNIST database of handwritten digits using a convolutional neural network with a SpatialTransformationLayer. First obtain the training and test data:
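A sketch of one way to obtain the data, via the MNIST item in the Wolfram Data Repository (assuming network access):

```wolfram
trainingData = ResourceData["MNIST", "TrainingData"];
testData = ResourceData["MNIST", "TestData"];
```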

Define a function to apply extra padding and random translations to the training and test data:
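One possible such function (the name augment and the 56×56 padded size are illustrative choices): each 28×28 digit is placed at a random offset inside a larger black frame:

```wolfram
augment[data_] := Map[
   With[{dx = RandomInteger[{0, 28}], dy = RandomInteger[{0, 28}]},
     (* pad the digit image to 56×56 at a random position; keep its label *)
     ImagePad[First[#], {{dx, 28 - dx}, {dy, 28 - dy}}] -> Last[#]] &,
   data];
```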

Create new training and test data using the function (this should take about a minute):
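Assuming the augmentation function from the previous step is named augment:

```wolfram
SeedRandom[1234];  (* for reproducibility *)
paddedTrainingData = augment[trainingData];
paddedTestData = augment[testData];
```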

Create a network that uses the image to predict the best affine transformation to apply to the image to extract the digit:
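One possible localization network (an illustrative architecture, not the only choice). The final LinearLayer produces the six affine parameters, with biases initialized to the identity transformation {1,0,0,0,1,0} so that training starts from an untransformed image:

```wolfram
localization = NetChain[{
    PoolingLayer[{2, 2}, 2],
    ConvolutionLayer[20, {5, 5}], Ramp,
    PoolingLayer[{2, 2}, 2],
    ConvolutionLayer[20, {5, 5}], Ramp,
    FlattenLayer[],
    LinearLayer[6, "Biases" -> {1., 0., 0., 0., 1., 0.}]},
   "Input" -> NetEncoder[{"Image", {56, 56}, ColorSpace -> "Grayscale"}]];
```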

Create a convolutional classification net to use the subimage extracted by the localization net:
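An illustrative classification network; its input dimensions are left to be inferred from the transformer output when the nets are connected:

```wolfram
classification = NetChain[{
    ConvolutionLayer[32, {5, 5}], Ramp, PoolingLayer[{2, 2}, 2],
    ConvolutionLayer[64, {5, 5}], Ramp, PoolingLayer[{2, 2}, 2],
    FlattenLayer[],
    LinearLayer[256], Ramp,
    LinearLayer[10], SoftmaxLayer[]},
   "Output" -> NetDecoder[{"Class", Range[0, 9]}]];
```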

Attach the classification network and the localization network to a spatial transformation layer:
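A sketch of the combined NetGraph, assuming the localization and classification nets defined in the previous steps: the input image feeds both the localization net (which predicts the six parameters) and the transformer's "Input" port:

```wolfram
net = NetGraph[
   <|"localization" -> localization,
     "transformer" -> SpatialTransformationLayer[{28, 28},
       "Input" -> NetEncoder[{"Image", {56, 56}, ColorSpace -> "Grayscale"}]],
     "classification" -> classification|>,
   {NetPort["Input"] -> "localization",
    "localization" -> NetPort[{"transformer", "Parameters"}],
    NetPort["Input"] -> NetPort[{"transformer", "Input"}],
    "transformer" -> "classification"}];
```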

Train the network:
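A sketch, assuming the net and padded data from the previous steps (the number of training rounds is an illustrative choice):

```wolfram
trained = NetTrain[net, paddedTrainingData,
   ValidationSet -> paddedTestData, MaxTrainingRounds -> 3];
```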

If the classification network is removed, the effect of the spatial transformer can be visualized:
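One way to do this (a sketch): rebuild a graph from the trained localization net and transformer, decoding the transformer output as an image:

```wolfram
transformer = NetGraph[
   <|"localization" -> NetExtract[trained, "localization"],
     "transformer" -> NetReplacePart[
        NetExtract[trained, "transformer"],
        "Output" -> NetDecoder["Image"]]|>,
   {NetPort["Input"] -> "localization",
    "localization" -> NetPort[{"transformer", "Parameters"}],
    NetPort["Input"] -> NetPort[{"transformer", "Input"}]}];
```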

Apply the spatial transformer to some images from the validation set:
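For example, on a few random validation images (assuming the transformer-only net and padded test data from the previous steps):

```wolfram
samples = RandomSample[paddedTestData, 5];
transformer /@ Keys[samples]
```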

Obtain the accuracy of the network on the validation set:
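One direct way to compute the accuracy (assuming the trained net and padded test data from the previous steps):

```wolfram
predictions = trained[Keys[paddedTestData]];
N@Mean@Boole[MapThread[SameQ, {predictions, Values[paddedTestData]}]]
```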

Properties & Relations  (1)

Apply an AffineTransform to the coordinates of an image using ImageTransformation:
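For example, a shear (the particular matrix is an illustrative choice); DataRange -> {{-1, 1}, {-1, 1}} puts the image on a centered coordinate system comparable to the one the layer uses:

```wolfram
img = ExampleData[{"TestImage", "House"}];
tf = AffineTransform[{{{1, 0.5}, {0, 1}}, {0, 0}}];
ImageTransformation[img, tf, DataRange -> {{-1, 1}, {-1, 1}}]
```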

Construct an equivalent set of parameters for SpatialTransformationLayer:
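A sketch of the correspondence: the shear entry of the AffineTransform matrix above maps to the sh component of the layer's parameter vector {zh,sh,th,sv,zv,tv}. Note that the layer works in its own normalized coordinates, so the vertical orientation can differ from the image coordinate convention used by ImageTransformation and the parameters may need adjusting accordingly:

```wolfram
stl = SpatialTransformationLayer[{128, 128},
   "Input" -> NetEncoder[{"Image", {128, 128}}],
   "Output" -> NetDecoder["Image"]];
img = ExampleData[{"TestImage", "House"}];
stl[<|"Input" -> img, "Parameters" -> {1, 0.5, 0, 0, 1, 0}|>]
```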

Introduced in 2017 (11.1)