NetUnfold[fnet] produces the elementary net of the folded net fnet, exposing the recurrent states.


  • A folded net iterates over a sequence in one direction by repeating the same operation at each step. Examples include recurrent nets and unidirectional transformers.
  • NetUnfold is typically used to extract the repeated operation so that sequences can be generated efficiently from a trained decoder, for example in text generation, audio generation and text translation.
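  • With a recurrent network with state equations $s_t = f(s_{t-1}, x_t; \theta)$ and output equation $y_t = g(s_t; \theta)$ for $t = 1, \dots, T$ and training parameters $\theta$, the unfolded net corresponds to just a single step of this recurrence: $(s_{t-1}, x_t) \mapsto s_t$ and $s_t \mapsto y_t$.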
  • In particular, NetUnfold exposes the recurrent states of the following folded layers:
      BasicRecurrentLayer[…]                                  one state vector
      GatedRecurrentLayer[…]                                  one state vector
      LongShortTermMemoryLayer[…]                             two state vectors, one of which is the internal cell state
      NetFoldOperator[net,{"out1"->"in1",…,"outn"->"inn"},…]  n state vectors
      AttentionLayer[…,"Mask"->"Causal"]                      two state sequences: the previous keys and values
  • Exposed states of recurrent layers are vectors that are typically initialized with zeros. Exposed states of transformers are sequences of vectors with a variable length, which are typically initialized with empty sequences.
  • NetUnfold can also be applied to a folded net that is followed by an operation on the last element of its output sequence. In such cases, the corresponding SequenceLastLayer is dropped.
  • NetUnfold can be seen as the inverse operation of NetFoldOperator.



Basic Examples  (1)

Get the core operation folded in a GatedRecurrentLayer:
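A minimal sketch of such a call, assuming a small layer whose sizes (output size 3, input vectors of size 2) are chosen here purely for illustration. The port names of the result are queried with Information rather than assumed:

```wolfram
(* Folded net: a GRU reading a variable-length sequence of 2-vectors.
   The sizes 3 and 2 are illustrative assumptions. *)
fnet = GatedRecurrentLayer[3, "Input" -> {"Varying", 2}];

(* Elementary net computing a single step of the recurrence *)
unfolded = NetUnfold[fnet]

(* List the ports instead of assuming their names; the recurrent state
   should appear as an additional input and an additional output *)
Information[unfolded, "InputPorts"]
Information[unfolded, "OutputPorts"]
```

Comparing these ports with those of fnet should show which ports carry the exposed state.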

Scope  (5)

Unfold a single recurrent layer:

Unfold an attention layer with causal masking:

Unfold a chain of recurrent operations:
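One possible chain to try, sketched under the assumption that stacking a GatedRecurrentLayer and a LongShortTermMemoryLayer is representative. The layer sizes are illustrative, and the variable names chain and unfoldedChain are introduced here:

```wolfram
(* Two stacked recurrent layers over a variable-length sequence of 2-vectors;
   all sizes are illustrative assumptions *)
chain = NetChain[
   {GatedRecurrentLayer[4], LongShortTermMemoryLayer[3]},
   "Input" -> {"Varying", 2}];

(* Unfold the whole chain at once *)
unfoldedChain = NetUnfold[chain];

(* According to the table of folded layers, the GRU contributes one state
   vector and the LSTM two, so several state ports are expected here *)
Keys[Information[unfoldedChain, "InputPorts"]]
```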

Unfold a recurrent net model:

Unfold a transformer model:

Applications  (1)

Implementing efficient text generation. First, get a trained language model:

The most straightforward function to stochastically generate text is the following:

The problem with this function is that it has quadratic time complexity, because at each step the model is fed the entire sequence generated so far and reprocesses the same input:

NetUnfold lets you avoid recomputing the same activations by exposing the states:

Write an efficient stochastic text generation function based on this unfolded net:

This efficient text generation has linear time complexity:
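As a rough worked count of where the difference comes from, assume a prompt of $p$ tokens and $n$ generated tokens (both symbols are introduced here for illustration) and charge one unit of work per token position that the net processes. Feeding the whole sequence at every step is compared with feeding one token per step to the unfolded net:

```latex
\begin{align*}
\text{whole sequence at each step:}\quad
  & \sum_{k=0}^{n-1} (p + k) = np + \frac{n(n-1)}{2} \\
\text{one token per step with stored states:}\quad
  & p + (n - 1)
\end{align*}
```

The first count grows quadratically in $n$ and the second linearly. This only counts processed token positions. For a transformer, each unfolded step still attends over the stored keys and values, so the cost of a single step is not strictly constant.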

Properties & Relations  (2)

NetUnfold is the inverse operation of NetFoldOperator:
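A sketch of the round trip, assuming a hand-built one-step core net. The layer names "wi", "ws", "sum" and "act", the port name "State", and all sizes are hypothetical choices made for this illustration:

```wolfram
(* Hypothetical one-step core: new state = Tanh[W1.input + W2.state] *)
core = NetGraph[
   <|"wi" -> LinearLayer[3], "ws" -> LinearLayer[3],
     "sum" -> TotalLayer[], "act" -> ElementwiseLayer[Tanh]|>,
   {NetPort["Input"] -> "wi", NetPort["State"] -> "ws",
    {"wi", "ws"} -> "sum" -> "act"},
   "Input" -> 2, "State" -> 3];

(* Fold the core over a sequence by feeding "Output" back into "State" *)
folded = NetFoldOperator[core, {"Output" -> "State"}];

(* Unfolding should recover a single-step net equivalent to the core *)
unfolded = NetUnfold[folded];

(* Compare the ports of the original core with those of the unfolded net;
   port names may differ even if the computation is the same *)
Information[core, "InputPorts"]
Information[unfolded, "InputPorts"]
```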

Any SequenceLastLayer after a recursion is automatically removed:
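A small sketch for checking this behavior, assuming a GRU followed by a SequenceLastLayer with illustrative sizes. The variable names are introduced here:

```wolfram
(* A recurrent layer whose output sequence is reduced to its last element *)
lastNet = NetChain[
   {GatedRecurrentLayer[3], SequenceLastLayer[]},
   "Input" -> {"Varying", 2}];

unfoldedLast = NetUnfold[lastNet];

(* Inspect which layer types remain after unfolding; if the statement above
   holds, SequenceLastLayer should not be among them *)
Keys[Information[unfoldedLast, "LayerTypeCounts"]]
```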

Wolfram Research (2021), NetUnfold, Wolfram Language function,

