LongShortTermMemoryLayer

LongShortTermMemoryLayer[n]

represents a trainable recurrent layer that takes a sequence of vectors and produces a sequence of vectors, each of size n.

LongShortTermMemoryLayer[n,opts]

includes options for weights and other parameters.

Details and Options

  • LongShortTermMemoryLayer[n] represents a net that takes a sequence of vectors and outputs a sequence of the same length.
  • Each element of the input sequence is a vector of size k, and each element of the output sequence is a vector of size n.
  • The size k of the input vectors is usually inferred automatically within a NetGraph, NetChain, etc.
  • The input and output ports of the net represented by LongShortTermMemoryLayer[n] are:
  • "Input"a sequence of vectors of size k
    "Output"a sequence of vectors of size n
  • Given an input sequence {x_1, x_2, …, x_T}, the LSTM outputs a sequence of states {s_1, s_2, …, s_T} using the following recurrence relation (sketched in code after the table):
  • input gate     i_t = LogisticSigmoid[W_ix . x_t + W_is . s_(t-1) + b_i]
    output gate    o_t = LogisticSigmoid[W_ox . x_t + W_os . s_(t-1) + b_o]
    forget gate    f_t = LogisticSigmoid[W_fx . x_t + W_fs . s_(t-1) + b_f]
    memory gate    m_t = Tanh[W_mx . x_t + W_ms . s_(t-1) + b_m]
    cell state     c_t = f_t * c_(t-1) + i_t * m_t
    state          s_t = o_t * Tanh[c_t]
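As an illustration, the recurrence above can be reproduced directly in Wolfram Language. This is a minimal sketch with randomly chosen weights; the symbol names (step, Wix, and so on) are ad hoc and not part of the layer's API:

    n = 3; k = 2;  (* output and input vector sizes *)
    {Wix, Wox, Wfx, Wmx} = RandomReal[{-1, 1}, {4, n, k}];  (* input weights *)
    {Wis, Wos, Wfs, Wms} = RandomReal[{-1, 1}, {4, n, n}];  (* state weights *)
    {bi, bo, bf, bm} = RandomReal[{-1, 1}, {4, n}];         (* biases *)

    (* one step of the recurrence: maps {s_(t-1), c_(t-1)} and x_t to {s_t, c_t} *)
    step[{s_, c_}, x_] := Module[{i, o, f, m, cnew},
      i = LogisticSigmoid[Wix . x + Wis . s + bi];
      o = LogisticSigmoid[Wox . x + Wos . s + bo];
      f = LogisticSigmoid[Wfx . x + Wfs . s + bf];
      m = Tanh[Wmx . x + Wms . s + bm];
      cnew = f*c + i*m;
      {o*Tanh[cnew], cnew}];

    inputs = RandomReal[{-1, 1}, {5, k}];  (* a sequence of T = 5 input vectors *)
    (* fold from zero initial state and cell state, then keep the s_t components *)
    outputs = Rest[FoldList[step, {ConstantArray[0., n], ConstantArray[0., n]}, inputs]][[All, 1]];

FoldList is the natural choice here because the layer outputs the whole sequence of states, not just the final one.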
  • LongShortTermMemoryLayer[n] has the following state ports:
  • "State"a vector of size n
    "CellState"a vector of size n
  • Within a NetGraph, a connection of the form src->NetPort[layer,"state"] can be used to provide the initial value of "State" or "CellState" for a LongShortTermMemoryLayer, corresponding to s_0 and c_0 in the recurrence relation. The default initial values are zero vectors.
  • Within a NetGraph, a connection of the form NetPort[layer,"state"]->dst can be used to obtain the final value of "State" or "CellState" for a LongShortTermMemoryLayer, corresponding to s_T and c_T in the recurrence relation.
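For example, a single NetGraph can both supply an initial state and read out the final cell state. A minimal sketch, in which the outer port names "s0" and "cT" are illustrative choices:

    NetGraph[{LongShortTermMemoryLayer[3]},
     {NetPort["s0"] -> NetPort[1, "State"],       (* supply the initial state s_0 *)
      NetPort[1, "CellState"] -> NetPort["cT"]}]  (* expose the final cell state c_T *)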
  • An initialized LongShortTermMemoryLayer[…] that operates on vectors of size k contains the following trainable arrays:
  • "InputGateInputWeights"Wixmatrix of size n×k
    "InputGateStateWeights"Wismatrix of size n×n
    "InputGateBiases"bivector of size n
    "OutputGateInputWeights"Woxmatrix of size n×k
    "OutputGateStateWeights"Wosmatrix of size n×n
    "OutputGateBiases"bovector of size n
    "ForgetGateInputWeights"Wfxmatrix of size n×k
    "ForgetGateStateWeights"Wfsmatrix of size n×n
    "ForgetGateBiases"bfvector of size n
    "MemoryGateInputWeights"Wmxmatrix of size n×k
    "MemoryGateStateWeights"Wmsmatrix of size n×n
    "MemoryGateBiases"bmvector of size n
  • In LongShortTermMemoryLayer[n,opts], initial values can be given to the trainable arrays using a rule of the form "array"->value.
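For instance, the forget-gate biases can be given explicitly (here set to ones, a common initialization heuristic) while NetInitialize fills in the remaining arrays, and NetExtract reads any array back. The variable name net is illustrative:

    net = NetInitialize[
       LongShortTermMemoryLayer[3,
        "ForgetGateBiases" -> ConstantArray[1., 3],  (* b_f set explicitly *)
        "Input" -> {"Varying", 2}]];
    NetExtract[net, "ForgetGateBiases"]  (* returns the explicitly given biases *)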
  • LongShortTermMemoryLayer[n,"Dropout"->spec] indicates that dropout regularization should be applied during training, in which units are probabilistically set to zero.
  • Specifying "Dropout"->None disables dropout during training.
  • Specifying "Dropout"->p uses an automatically chosen dropout method having dropout probability p.
  • Specifying "Dropout"->{"method1"->p1,"method2"->p2,} can be used to combine specific methods of dropout with the corresponding dropout probabilities. Possible methods include:
  • "VariationalInput"dropout applied to the gate contributions from the input, using the same pattern of units at each sequence step
    "VariationalState"dropout applied to the gate contributions from the previous state, using the same pattern of units at each sequence step
    "StateUpdate"dropout applied to the state update vector prior to it being added to the previous state, using a different pattern of units at each sequence step
  • The dropout methods "VariationalInput" and "VariationalState" are based on the Gal and Ghahramani 2016 method, while "StateUpdate" is based on the Semeniuta et al. 2016 method.
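For example, variational dropout on the input contributions can be combined with dropout on the state update; the probabilities 0.1 and 0.2 below are illustrative:

    LongShortTermMemoryLayer[3,
     "Dropout" -> {"VariationalInput" -> 0.1, "StateUpdate" -> 0.2}]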
  • LongShortTermMemoryLayer[n,"Input"->shape] allows the shape of the input to be specified. Possible forms for shape are:
  • NetEncoder[…]             encoder producing a sequence of vectors
    {len, p}                  sequence of len length-p vectors
    {len, Automatic}          sequence of len vectors whose length is inferred
    {"Varying", p}            varying number of vectors, each of length p
    {"Varying", Automatic}    varying number of vectors, each of inferred length

Examples


Basic Examples  (2)

Create a LongShortTermMemoryLayer that produces a sequence of length-3 vectors:

In[1]:= LongShortTermMemoryLayer[3]
Out[1]= (* summary box for the uninitialized layer *)

Create a randomly initialized LongShortTermMemoryLayer that takes a sequence of length-2 vectors and produces a sequence of length-3 vectors:

In[1]:= lstm = NetInitialize[LongShortTermMemoryLayer[3, "Input" -> {"Varying", 2}]]
Out[1]= (* summary box for the initialized layer; lstm is reused below *)

Apply the layer to an input sequence:

In[2]:= lstm[{{0.1, 0.2}, {0.3, 0.4}, {0.5, 0.6}}] // MatrixForm  (* illustrative input values *)
Out[2]//MatrixForm= (* a 3×3 matrix whose entries depend on the random initialization *)

Scope  (4)

Options  (2)

Applications  (2)

See Also

BasicRecurrentLayer  GatedRecurrentLayer  NetMapOperator  SequenceLastLayer  LinearLayer  NetChain  NetGraph  NetExtract

Introduced in 2017 (11.1)