"SubwordTokens" (Net Decoder)

NetDecoder[NetEncoder[{"SubwordTokens",…}]]

represents a decoder that converts a sequence of probability vectors to a string according to the specifications of the given "SubwordTokens" NetEncoder.


  • NetDecoder[…][input] applies the decoder to an input to produce an output.
  • NetDecoder[…][{input1,input2,…}] applies the decoder to a list of inputs to produce a list of outputs.
  • The input to the decoder is either a vector of probabilities or a sequence of probability vectors. Each probability vector sums to 1, and its length equals the number of elements in the token list of the parent NetEncoder.
  • For each input probability vector, the decoder outputs a token by picking the element of the token list with the highest associated probability.
  • NetDecoder[…][input] returns a string.
  • Only the "BPE" method is currently supported. NetDecoder[NetEncoder[{"SubwordTokens",…}]] will produce a "BPE" decoder regardless of the method of the parent encoder.
  • The suboption "WhitespaceTrimming" of the "BPE" method is inherited from the "WhitespacePadding" suboption of the parent encoder, if present. When set to Left or Right, the decoder trims a single whitespace character from the beginning or the end of the output string, respectively, if one is present. When set to None, no trimming is performed.
  • If the parent encoder does not support "WhitespacePadding", "WhitespaceTrimming" will be None.

Properties

  • NetDecoder[…][data,prop] can be used to calculate a specific property for the input data.
  • When a "SubwordTokens" decoder is attached to a net, net[data,prop] or net[data,"oport"->prop] can be used to calculate a specific property of the decoded output.
  • The "SubwordTokens" decoder only supports the bypass property. Setting prop to None bypasses decoding and returns the input to the decoder.
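
The selection rule and the bypass property described above can be sketched as follows. This is an illustration only and is not taken from the original page: the four-token vocabulary is hypothetical, and the sketch assumes that the parent encoder can be constructed from an explicit token list.

```wolfram
(* Hypothetical vocabulary, used only for illustration *)
tokens = {"ab", "c", "de", "f"};

(* Assumption: the parent encoder accepts an explicit token list *)
enc = NetEncoder[{"SubwordTokens", tokens}];
dec = NetDecoder[enc];

(* Two probability vectors, one per output token; each sums to 1 *)
probs = {{0.1, 0.1, 0.7, 0.1}, {0.6, 0.2, 0.1, 0.1}};

(* Decoding picks the highest-probability token for each vector,
   here the third token followed by the first *)
dec[probs]

(* Setting the property to None bypasses decoding and returns the input *)
dec[probs, None]
```

Because the bypass property returns the decoder input unchanged, it is a convenient way to inspect the raw probabilities that a net feeds to the decoder.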


Basic Examples  (2)

Create a BPE decoder:
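
A minimal sketch of this step, assuming a small hypothetical vocabulary supplied as an explicit token list; in practice the vocabulary would normally come from a trained subword model:

```wolfram
(* Hypothetical vocabulary, used only for illustration *)
tokens = {"ab", "c", "de", "f"};

(* Assumption: the parent encoder accepts an explicit token list *)
enc = NetEncoder[{"SubwordTokens", tokens}];

(* The decoder is derived from the encoder and uses the "BPE" method *)
dec = NetDecoder[enc]
```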

Decode a random sequence of probability vectors:
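
One possible way to do this is sketched below. The vocabulary is hypothetical, and the sketch assumes that the parent encoder can be built from an explicit token list. The output is a string that differs from run to run, because the input is random.

```wolfram
(* Hypothetical vocabulary, used only for illustration *)
tokens = {"ab", "c", "de", "f"};

(* Assumption: the parent encoder accepts an explicit token list *)
dec = NetDecoder[NetEncoder[{"SubwordTokens", tokens}]];

(* Five random non-negative vectors, each rescaled to sum to 1;
   the vector length matches the number of tokens *)
probs = #/Total[#] & /@ RandomReal[1, {5, Length[tokens]}];

(* Decode the sequence into a single string *)
dec[probs]
```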