CTCLossLayer

CTCLossLayer[]

represents a net layer that computes the connectionist temporal classification loss by comparing a sequence of class probability vectors with a sequence of indices representing the target classes.

Details and Options

  • CTCLossLayer[] represents a net that takes an input matrix representing a sequence of vectors and a target vector representing a sequence of integers and outputs a real value.
  • CTCLossLayer is typically used inside NetGraph.
  • CTCLossLayer exposes the following ports for use in NetGraph etc.:
    "Input"     a sequence of probability vectors of size c+1
    "Target"    a sequence of integers between 1 and c
    "Output"    a real number
  • The layer definition is based on Graves et al., "Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks", 2006.
  • The input should be a sequence of probability vectors of size c+1 where each vector sums to 1. The last element of each vector represents the probability of a special blank class, with the remaining elements representing the probability of the indexed classes 1 to c. The target is a sequence of integers between 1 and c. The target sequence cannot be longer than the input sequence.
  • CTCLossLayer[][<|"Input"->in,"Target"->target|>] explicitly computes the output from applying the layer.
  • CTCLossLayer[][<|"Input"->{in1,in2,…},"Target"->{target1,target2,…}|>] explicitly computes an output for each input/target pair (in1, target1), (in2, target2), ….
  • When given a NumericArray as input, the output will be a NumericArray.
  • The size of the input is usually inferred automatically within a NetGraph.
  • CTCLossLayer[n,"Input"->ishape,"Target"->tshape] allows the shape of the input and target to be specified. Possible forms for ishape are:
    NetEncoder[…]            encoder producing a sequence of vectors
    {len,c+1}                sequence of len length-(c+1) vectors
    {len,Automatic}          sequence of len vectors whose length is inferred
    {"Varying",c+1}          varying number of vectors each of length c+1
    {"Varying",Automatic}    varying number of vectors each of inferred length
  • Possible forms for tshape are:
    NetEncoder[…]                              encoder producing a sequence of integers
    {len2}                                     sequence of len2 integers
    {"Varying"}                                varying number of integers
    RepeatingElement[Restricted[Integer,c]]    varying number of integers in the range 1 to c
  • Options[CTCLossLayer] gives the list of default options to construct the layer. Options[CTCLossLayer[]] gives the list of default options to evaluate the layer on some data.
  • Information[CTCLossLayer[]] gives a report about the layer.
  • Information[CTCLossLayer[],prop] gives the value of the property prop of CTCLossLayer[]. Possible properties are the same as for NetGraph.
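
The quantity described in the notes above can be illustrated outside the Wolfram Language. The following Python sketch is not the layer's implementation; it assumes the loss is the negative log-likelihood of the target as defined in Graves et al. (2006), computed with the forward recursion from that paper. The function name ctc_loss is hypothetical. It follows the conventions stated above: the last element of each probability vector is the blank class, and targets are integers between 1 and c.

```python
import math

def ctc_loss(probs, target):
    """Negative log-likelihood of `target` under CTC (illustrative sketch).

    probs  : list of T probability vectors of length c+1; the last entry
             of each vector is the blank class.
    target : list of integers in 1..c with len(target) <= T.
    """
    num_classes = len(probs[0]) - 1            # c
    if num_classes < 1:
        raise ValueError("each probability vector needs at least 2 entries")
    if len(target) > len(probs):
        raise ValueError("target cannot be longer than the input sequence")

    blank = num_classes                        # 0-based position of blank
    # Extended label sequence: blank, t1, blank, t2, ..., blank
    ext = [blank]
    for label in target:
        ext += [label - 1, blank]
    size = len(ext)

    # alpha[s]: total probability of alignments that end at ext[s]
    alpha = [0.0] * size
    alpha[0] = probs[0][ext[0]]
    if size > 1:
        alpha[1] = probs[0][ext[1]]

    for frame in probs[1:]:
        new = [0.0] * size
        for s in range(size):
            total = alpha[s]
            if s >= 1:
                total += alpha[s - 1]
            # Skipping the blank is allowed only between distinct labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * frame[ext[s]]
        alpha = new

    likelihood = alpha[-1] + (alpha[-2] if size > 1 else 0.0)
    return math.inf if likelihood == 0.0 else -math.log(likelihood)

# Two time steps, one real class plus blank; target is the single label 1.
print(ctc_loss([[0.6, 0.4], [0.3, 0.7]], [1]))   # -log(0.72), about 0.3285
```

In this small case the three valid alignments are (1, 1), (1, blank) and (blank, 1), with total probability 0.6×0.3 + 0.6×0.7 + 0.4×0.3 = 0.72. A target such as {1, 1} over two time steps has no valid alignment, because a blank must separate repeated labels, so the sketch returns infinity.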

Examples

Basic Examples  (2)

Create a CTCLossLayer object:

Create a CTCLossLayer where the input is a matrix whose rows are probability vectors and the target is a vector of indices:

Apply it to an input and a target:

Applications  (1)

Train a net that recognizes a sequence of characters in an image. First generate training and test data, consisting of images of words and the corresponding word strings:

Split the dataset into a test and a training set:

Take a RandomSample of the training set:

The list of characters used:

The decoder is a beam search decoder with a beam size of 50:

Define a net that takes an image and then treats the width dimension as a sequence dimension. A matrix whose rows are probability vectors over the width dimension is produced:

Define a CTCLossLayer with a character NetEncoder attached to the target port:

Train the net using the CTC loss:

Evaluate the trained net on images from the test set:

Obtain the top-5 decodings for an image, along with the negative log likelihood of each decoding:
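
For readers who want to see what a ranked list of decodings with negative log-likelihoods looks like, here is a Python sketch of CTC prefix beam search. It is an illustration under stated assumptions, not the decoder used in this example: the function name ctc_beam_search and its parameters beam_size and top_k are hypothetical, the blank class is taken to be the last element of each probability vector, and labels are numbered from 1.

```python
import math
from collections import defaultdict

def ctc_beam_search(probs, beam_size=50, top_k=5):
    """Return up to top_k (labels, negative log-likelihood) pairs.

    probs : list of probability vectors of length c+1, blank class last.
    """
    blank = len(probs[0]) - 1          # 0-based position of the blank class
    # prefix -> (prob. of alignments ending in blank, prob. ending in a label)
    beams = {(): (1.0, 0.0)}
    for frame in probs:
        candidates = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_blank, p_label) in beams.items():
            # Emitting blank leaves the prefix unchanged.
            candidates[prefix][0] += (p_blank + p_label) * frame[blank]
            for k in range(blank):
                label = k + 1
                if prefix and prefix[-1] == label:
                    # A repeated label merges unless a blank separates the two.
                    candidates[prefix][1] += p_label * frame[k]
                    candidates[prefix + (label,)][1] += p_blank * frame[k]
                else:
                    candidates[prefix + (label,)][1] += (p_blank + p_label) * frame[k]
        # Keep the beam_size most probable prefixes with nonzero probability.
        ranked = sorted(candidates.items(),
                        key=lambda kv: kv[1][0] + kv[1][1], reverse=True)
        beams = {p: (b, l) for p, (b, l) in ranked[:beam_size] if b + l > 0.0}
    final = sorted(beams.items(), key=lambda kv: sum(kv[1]), reverse=True)
    return [(list(p), -math.log(b + l)) for p, (b, l) in final[:top_k]]

# Two time steps, one real class plus blank.
for labels, nll in ctc_beam_search([[0.6, 0.4], [0.3, 0.7]]):
    print(labels, nll)
```

When the beam is wide enough that nothing is pruned, as in this toy input, the result is exact: the decoding {1} has probability 0.72 and the empty decoding has probability 0.28. With longer sequences the beam size trades accuracy for speed.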

Possible Issues  (1)

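The size of each input probability vector cannot be 1, since the last element is reserved for the blank class and a single-element vector would leave no target classes (c = 0):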

Introduced in 2018 (11.3)