SequenceAttentionLayer[] represents a trainable net layer that generates a sequence of weighted sums of its input vectors, using a sequence of query vectors fed to it.


SequenceAttentionLayer[net] computes each output using net to weight the inputs.

Details and Options

  • SequenceAttentionLayer[net] represents a net that takes two sequences of vectors and outputs another sequence of vectors.
  • SequenceAttentionLayer[net] takes an input sequence {e1,e2,…,en} and a query sequence {q1,q2,…,qm}. For each query element q, it produces the corresponding output element o=w1 e1+w2 e2+…+wn en, where w=softmax(s) is a weight vector and each si=f(ei,q) is a scalar score computed by the subnet net.
  • The above definition of SequenceAttentionLayer is based on the variant described in Bahdanau et al., Neural Machine Translation by Jointly Learning to Align and Translate, 2015.
  • In SequenceAttentionLayer[net], the scoring network net can be one of:
    "Dot"          a NetGraph computing s=Dot[e,q]
    "Bilinear"     a NetGraph computing s=Dot[e,W,q], where W is a learnable matrix
    NetGraph[…]    a specific NetGraph that takes "Input" and "Query" values and produces a scalar "Output" value
  • NetExtract can be used to extract net from a SequenceAttentionLayer[net] object.
  • SequenceAttentionLayer is typically used inside NetGraph.
  • SequenceAttentionLayer exposes the following ports for use in NetGraph etc.:
    "Input"     a sequence of vectors {e1,e2,…,en} of size d1
    "Query"     a sequence of vectors {q1,q2,…,qm} of size d2
    "Output"    a sequence of vectors {o1,o2,…,om} of size d1
  • SequenceAttentionLayer[…][<|"Input"->in,"Query"->query|>] explicitly computes the output from applying the layer.
  • SequenceAttentionLayer[…][<|"Input"->{in1,in2,…},"Query"->{query1,query2,…}|>] explicitly computes outputs for each of the ini and queryi.
  • The size of the input and query vectors is usually inferred automatically within a NetGraph.
  • SequenceAttentionLayer[…,"Input"->shape1,"Query"->shape2] allows the shape of the inputs to be specified. Possible forms for shapei are:
    {len,p}                  sequence of len length-p vectors
    {len,Automatic}          sequence of len vectors whose length is inferred
    {"Varying",p}            varying number of vectors, each of length p
    {"Varying",Automatic}    varying number of vectors, each of inferred length
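
The computation described in the details above can be sketched outside the Wolfram Language. The following pure-Python sketch is an illustration only, not the layer's actual implementation: the function names (softmax, dot_score, make_bilinear_score, sequence_attention) are hypothetical, and the bilinear matrix W is a fixed example value, whereas in the real layer it is learned. It shows how n input vectors of size d1 and m query vectors of size d2 yield m output vectors of size d1.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scalar scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]

def dot_score(e, q):
    """Analogue of "Dot" scoring: s = e . q (assumes len(e) == len(q))."""
    return sum(a * b for a, b in zip(e, q))

def make_bilinear_score(W):
    """Analogue of "Bilinear" scoring: s = e . W . q, with W of shape d1 x d2.
    W is fixed here for illustration; a trained layer would learn it."""
    def score(e, q):
        return sum(e[i] * W[i][j] * q[j]
                   for i in range(len(e)) for j in range(len(q)))
    return score

def sequence_attention(inputs, queries, score=dot_score):
    """Return one output per query: o = sum_i w_i * e_i, with w = softmax(s)."""
    d1 = len(inputs[0])
    outputs = []
    for q in queries:
        weights = softmax([score(e, q) for e in inputs])
        outputs.append([sum(w * e[k] for w, e in zip(weights, inputs))
                        for k in range(d1)])
    return outputs

# n = 2 input vectors of size d1 = 2, m = 3 query vectors of size d2 = 2.
inputs = [[1.0, 0.0], [0.0, 1.0]]
queries = [[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]]
outputs = sequence_attention(inputs, queries)

# A zero query scores every input equally, so the output is the plain average.
print(outputs[0])  # [0.5, 0.5]

# Bilinear scoring allows d1 != d2: inputs of size 2, queries of size 1.
bilinear = make_bilinear_score([[1.0], [-1.0]])
bilinear_outputs = sequence_attention(inputs, [[0.0], [2.0]], score=bilinear)
```

Because the output is a weighted sum of the input vectors, its vectors always have the input size d1, while the number of outputs follows the number of queries m.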


Basic Examples  (2)

Create a SequenceAttentionLayer:

Create a randomly initialized SequenceAttentionLayer that takes a sequence of input vectors of size 2 and a sequence of query vectors of size 1:

Apply the layer to an input:

The layer threads across a batch of sequences of different lengths:

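
As a language-neutral illustration of what threading over a batch means, the sketch below applies a dot-scored attention function independently to each example in a batch whose sequences differ in length. This is plain Python under stated assumptions, not Wolfram Language input: the attend function and the sample data are hypothetical, and dot scoring is assumed, so input and query vectors share the same size here.

```python
import math

def attend(inputs, queries):
    """Dot-scored attention for one example: one output vector per query."""
    size = len(inputs[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(e, q)) for e in inputs]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [x / total for x in exps]
        outputs.append([sum(w * e[k] for w, e in zip(weights, inputs))
                        for k in range(size)])
    return outputs

# Two examples with different sequence lengths (the "Varying" dimension).
batch_inputs = [
    [[1.0, 2.0]],                          # example 1: n = 1 input vector
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],  # example 2: n = 3 input vectors
]
batch_queries = [
    [[0.5, 0.5], [1.0, -1.0]],             # example 1: m = 2 queries
    [[0.0, 0.0]],                          # example 2: m = 1 query
]

# Each example is processed on its own; output lengths follow the query lengths.
batch_outputs = [attend(i, q) for i, q in zip(batch_inputs, batch_queries)]
print([len(o) for o in batch_outputs])  # [2, 1]
```

With a single input vector, every weight vector is {1}, so each output in the first example is just that input vector.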

Scope  (2)

Applications  (1)

Properties & Relations  (1)

Possible Issues  (1)

See Also

BasicRecurrentLayer  LongShortTermMemoryLayer  GatedRecurrentLayer  NetChain  NetGraph  NetExtract

Introduced in 2017