LongShortTermMemoryLayer
LongShortTermMemoryLayer[n]
represents a trainable recurrent layer that takes a sequence of vectors and produces a sequence of vectors, each of size n.
LongShortTermMemoryLayer[n,opts]
includes options for weights and other parameters.
Details and Options
- LongShortTermMemoryLayer[n] represents a net that takes an input matrix representing a sequence of vectors and outputs a sequence of the same length.
- Each element of the input sequence is a vector of size k, and each element of the output sequence is a vector of size n.
- The size k of the input vectors is usually inferred automatically within a NetGraph, NetChain, etc.
- The input and output ports of the net represented by LongShortTermMemoryLayer[n] are:
  "Input"    a sequence of vectors of size k
  "Output"   a sequence of vectors of size n
- Given an input sequence {x_1, x_2, …, x_T}, the LSTM outputs a sequence of states {s_1, s_2, …, s_T} using the following recurrence relation:
  input gate    i_t = LogisticSigmoid[W_ix . x_t + W_is . s_{t-1} + b_i]
  output gate   o_t = LogisticSigmoid[W_ox . x_t + W_os . s_{t-1} + b_o]
  forget gate   f_t = LogisticSigmoid[W_fx . x_t + W_fs . s_{t-1} + b_f]
  memory gate   m_t = Tanh[W_mx . x_t + W_ms . s_{t-1} + b_m]
  cell state    c_t = f_t * c_{t-1} + i_t * m_t
  state         s_t = o_t * Tanh[c_t]
- LongShortTermMemoryLayer[n] has the following state ports:
  "State"       a vector of size n
  "CellState"   a vector of size n
- Within a NetGraph, a connection of the form src->NetPort[layer,"state"] can be used to provide the initial value of "State" or "CellState" for a LongShortTermMemoryLayer, corresponding to s_0 and c_0 in the recurrence relation. The default initial values are zero vectors.
- Within a NetGraph, a connection of the form NetPort[layer,"state"]->dst can be used to obtain the final value of "State" or "CellState" for a LongShortTermMemoryLayer, corresponding to s_T and c_T in the recurrence relation.
- NetStateObject can be used to create a net that remembers the state values of a LongShortTermMemoryLayer; the stored values are updated each time the net is applied to an input.
- An initialized LongShortTermMemoryLayer[…] that operates on vectors of size k contains the following trainable arrays:
  "InputGateInputWeights"     W_ix   matrix of size n×k
  "InputGateStateWeights"     W_is   matrix of size n×n
  "InputGateBiases"           b_i    vector of size n
  "OutputGateInputWeights"    W_ox   matrix of size n×k
  "OutputGateStateWeights"    W_os   matrix of size n×n
  "OutputGateBiases"          b_o    vector of size n
  "ForgetGateInputWeights"    W_fx   matrix of size n×k
  "ForgetGateStateWeights"    W_fs   matrix of size n×n
  "ForgetGateBiases"          b_f    vector of size n
  "MemoryGateInputWeights"    W_mx   matrix of size n×k
  "MemoryGateStateWeights"    W_ms   matrix of size n×n
  "MemoryGateBiases"          b_m    vector of size n
- In LongShortTermMemoryLayer[n,opts], initial values can be given to the trainable arrays using a rule of the form "array"->value.
- The following training parameters can be included:
  "Dropout"                 None        dropout regularization, in which units are probabilistically set to zero
  LearningRateMultipliers   Automatic   learning rate multipliers for the trainable arrays
- Specifying "Dropout"->None disables dropout during training.
- Specifying "Dropout"->p uses an automatically chosen dropout method having dropout probability p.
- Specifying "Dropout"->{"method1"->p1,"method2"->p2,…} can be used to combine specific methods of dropout with the corresponding dropout probabilities. Possible methods include:
  "VariationalWeights"   dropout applied to the recurrent connections between weight matrices (default)
  "VariationalInput"     dropout applied to the gate contributions from the input, using the same pattern of units at each sequence step
  "VariationalState"     dropout applied to the gate contributions from the previous state, using the same pattern of units at each sequence step
  "StateUpdate"          dropout applied to the state update vector prior to it being added to the previous state, using a different pattern of units at each sequence step
- The dropout methods "VariationalInput" and "VariationalState" are based on the Gal et al. 2016 method, while "StateUpdate" is based on the Semeniuta et al. 2016 method and "VariationalWeights" is based on the Merity et al. 2017 method.
- LongShortTermMemoryLayer[n,"Input"->shape] allows the shape of the input to be specified. Possible forms for shape are:
  NetEncoder[…]           encoder producing a sequence of vectors
  {len,k}                 sequence of len length-k vectors
  {len,Automatic}         sequence of len vectors whose length is inferred
  {"Varying",k}           varying number of vectors each of length k
  {"Varying",Automatic}   varying number of vectors each of inferred length
- When given a NumericArray as input, the output will be a NumericArray.
- Options[LongShortTermMemoryLayer] gives the list of default options to construct the layer. Options[LongShortTermMemoryLayer[…]] gives the list of default options to evaluate the layer on some data.
- Information[LongShortTermMemoryLayer[…]] gives a report about the layer.
- Information[LongShortTermMemoryLayer[…],prop] gives the value of the property prop of LongShortTermMemoryLayer[…]. Possible properties are the same as for NetGraph.
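The recurrence relation and the trainable arrays listed above can be cross-checked against each other. The following is a minimal sketch, not the layer's actual implementation: it assumes k = 2 and n = 3, uses hypothetical helper names (arr, gate, step), and relies on the default zero initial state and cell state described above.

```wolfram
(* Assumed sizes for illustration: k = 2 input features, n = 3 output features *)
lstm = NetInitialize[LongShortTermMemoryLayer[3, "Input" -> {"Varying", 2}]];

(* Fetch a trainable array as an ordinary list *)
arr[name_String] := Normal[NetExtract[lstm, name]];

(* Gate value for a gate prefix such as "InputGate" and an activation function *)
gate[g_String, act_, x_, s_] :=
  act[arr[g <> "InputWeights"] . x + arr[g <> "StateWeights"] . s + arr[g <> "Biases"]];

(* One step of the recurrence: previous {s, c} and input x give the new {s, c} *)
step[{s_, c_}, x_] := Module[{i, o, f, m, cNew},
   i = gate["InputGate", LogisticSigmoid, x, s];
   o = gate["OutputGate", LogisticSigmoid, x, s];
   f = gate["ForgetGate", LogisticSigmoid, x, s];
   m = gate["MemoryGate", Tanh, x, s];
   cNew = f*c + i*m;
   {o*Tanh[cNew], cNew}];

xs = {{0.1, 0.2}, {0.3, -0.4}, {0.5, 0.6}};
zero = ConstantArray[0., 3];

(* Fold the step over the sequence and keep only the states s_1, ..., s_T *)
manual = Rest[FoldList[step, {zero, zero}, xs]][[All, 1]];

(* Largest deviation from the layer's own output *)
maxDiff = Max[Abs[manual - lstm[xs]]]
```

If the sketch matches the layer, maxDiff should be close to zero, limited by the single-precision arithmetic that nets typically use.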
Examples
Basic Examples (2)
Create a LongShortTermMemoryLayer that produces a sequence of length-3 vectors:
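A minimal sketch of such a layer. Because no input size is given here, the layer stays uninitialized until the size k can be inferred; the variable name layer is only illustrative.

```wolfram
(* Output vectors have size 3; the input size is left to be inferred *)
layer = LongShortTermMemoryLayer[3]
```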
Create a randomly initialized LongShortTermMemoryLayer that takes a sequence of length-2 vectors and produces a sequence of length-3 vectors:
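One way to write this, as a sketch: the "Input" shape {"Varying", 2} fixes k = 2, and the three-element input sequence is an arbitrary illustration.

```wolfram
(* Random initial weights; input is a variable-length sequence of length-2 vectors *)
lstm = NetInitialize[LongShortTermMemoryLayer[3, "Input" -> {"Varying", 2}]];

(* Apply the layer to a sequence of three vectors *)
out = lstm[{{1, 2}, {3, 4}, {5, 6}}];

(* One length-3 state per input vector *)
Dimensions[out]
```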
Scope (4)
Create a randomly initialized LongShortTermMemoryLayer that takes a string and produces a sequence of length-2 vectors:
Apply the layer to an input string:
Thread the layer over a batch of inputs:
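The three steps above could look roughly as follows. The three-letter alphabet and the sample strings are assumptions made for illustration; the "UnitVector" setting is used so that each character becomes a vector rather than an integer code.

```wolfram
(* Hypothetical alphabet; "UnitVector" turns each character into a one-hot vector *)
enc = NetEncoder[{"Characters", {"a", "b", "c"}, "UnitVector"}];
lstm = NetInitialize[LongShortTermMemoryLayer[2, "Input" -> enc]];

(* A single string gives one length-2 vector per character *)
single = lstm["abcba"];

(* A list of strings is treated as a batch; each string gets its own output sequence *)
batch = lstm[{"abc", "ba"}];
Dimensions /@ batch
```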
Create a randomly initialized net that takes a sequence of length-2 vectors and produces a single length-3 vector:
Thread the layer across a batch of inputs:
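A sketch of one possible construction, assuming that SequenceLastLayer is used to keep only the final state of the sequence; the input values are arbitrary.

```wolfram
(* The LSTM emits one length-3 vector per step; SequenceLastLayer keeps the last one *)
net = NetInitialize[NetChain[
    {LongShortTermMemoryLayer[3], SequenceLastLayer[]},
    "Input" -> {"Varying", 2}]];

(* A single sequence gives a single length-3 vector *)
last = net[{{1, 2}, {3, 4}, {5, 6}}];

(* A batch of two sequences with different lengths gives two length-3 vectors *)
lastBatch = net[{{{1, 2}, {3, 4}}, {{5, 6}}}];
Dimensions[lastBatch]
```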
Create a NetGraph that allows the initial state and cell state of a LongShortTermMemoryLayer to be set:
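A possible sketch, following the src->NetPort[layer,"state"] connection form from the notes. The graph input names "InitialState" and "InitialCellState" are hypothetical and chosen only for readability.

```wolfram
(* Two extra graph inputs feed the layer's "State" and "CellState" ports *)
initGraph = NetGraph[
  {LongShortTermMemoryLayer[3]},
  {NetPort["InitialState"] -> NetPort[1, "State"],
   NetPort["InitialCellState"] -> NetPort[1, "CellState"]}]
```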
Create a NetGraph that allows the final state and cell state of a LongShortTermMemoryLayer to be obtained:
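A possible sketch, following the NetPort[layer,"state"]->dst connection form from the notes. The graph output names "FinalState" and "FinalCellState" are hypothetical.

```wolfram
(* The layer's "State" and "CellState" ports are exposed as extra graph outputs *)
finalGraph = NetGraph[
  {LongShortTermMemoryLayer[3]},
  {NetPort[1, "State"] -> NetPort["FinalState"],
   NetPort[1, "CellState"] -> NetPort["FinalCellState"]}]
```

The regular "Output" port of the layer remains available alongside the two added outputs.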
Options (2)
"Dropout" (2)
Create a LongShortTermMemoryLayer with the dropout method specified:
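For instance, using the "VariationalInput" method listed in the notes. The probability 0.2 is an arbitrary illustrative value.

```wolfram
(* Dropout on the input contributions to the gates, with probability 0.2 *)
dropLayer = LongShortTermMemoryLayer[3, "Dropout" -> {"VariationalInput" -> 0.2}]
```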
Create a randomly initialized LongShortTermMemoryLayer with specified dropout probability:
Evaluate the layer on a sequence of vectors:
Dropout has no effect during evaluation:
Use NetEvaluationMode to force the training behavior of dropout:
Multiple evaluations on the same input can give different results:
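The steps above can be sketched as follows. The dropout probability 0.5 and the input sequence are assumptions made for illustration, and no particular output values are implied.

```wolfram
(* Dropout probability 0.5 with an automatically chosen method *)
lstm = NetInitialize[
   LongShortTermMemoryLayer[3, "Dropout" -> 0.5, "Input" -> {"Varying", 2}]];
xs = {{1, 2}, {3, 4}};

(* Ordinary evaluation: dropout is inactive, so repeated calls should agree *)
evalOut = lstm[xs];
lstm[xs] == evalOut

(* Training-mode evaluation: units are dropped at random, so repeated calls can differ *)
trainOut1 = lstm[xs, NetEvaluationMode -> "Train"];
trainOut2 = lstm[xs, NetEvaluationMode -> "Train"];
{trainOut1, trainOut2}
```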
Applications (2)
Create training data consisting of strings that describe two-digit additions and the corresponding numeric result:
Create a network using stacked LongShortTermMemoryLayer layers that reads the input string and predicts the numeric result:
Apply the trained network to a list of inputs:
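A rough sketch of this workflow. The layer sizes, the number of training rounds, the sample inputs, and the use of UnitVectorLayer, SequenceLastLayer and LinearLayer are all assumptions made for illustration, not necessarily the architecture used for this example originally.

```wolfram
(* All additions i+j with 0 <= i, j <= 99, written as strings such as "12+34" *)
trainingData = Flatten[Table[
    ToString[i] <> "+" <> ToString[j] -> i + j, {i, 0, 99}, {j, 0, 99}]];

(* Characters are encoded as integer codes over the digits and "+" *)
alphabet = Join[CharacterRange["0", "9"], {"+"}];

(* Hypothetical architecture: one-hot encoding, two stacked LSTMs, last state, scalar output *)
net = NetChain[{
    UnitVectorLayer[],
    LongShortTermMemoryLayer[32],
    LongShortTermMemoryLayer[32],
    SequenceLastLayer[],
    LinearLayer[]},
   "Input" -> NetEncoder[{"Characters", alphabet}],
   "Output" -> "Real"];

(* A short training run; more rounds would normally be needed for good accuracy *)
trained = NetTrain[net, trainingData, MaxTrainingRounds -> 10];

(* Predictions are real numbers that should approach the true sums as training improves *)
predictions = trained[{"12+34", "7+8", "99+1"}]
```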
Create training data based on strings containing x's and y's and either Less, Greater or Equal by comparing the number of x's and y's. The training data consists of all possible sentences up to length 8:
Create a network containing a LongShortTermMemoryLayer to read an input string and predict one of Less, Greater or Equal:
Properties & Relations (1)
NetStateObject can be used to create a net that remembers the state of LongShortTermMemoryLayer:
Each evaluation modifies the state stored inside the NetStateObject:
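A minimal sketch of this behavior, assuming a layer with k = 2 and n = 3 and an arbitrary one-element input sequence.

```wolfram
lstm = NetInitialize[LongShortTermMemoryLayer[3, "Input" -> {"Varying", 2}]];

(* Wrap the layer so that "State" and "CellState" persist between evaluations *)
stateful = NetStateObject[lstm];

(* The first call starts from the default zero state *)
first = stateful[{{1, 2}}];

(* The second call starts from the state left behind by the first call *)
second = stateful[{{1, 2}}];

(* The same input generally gives a different output the second time *)
first == second
```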
Text
Wolfram Research (2017), LongShortTermMemoryLayer, Wolfram Language function, https://reference.wolfram.com/language/ref/LongShortTermMemoryLayer.html (updated 2020).