"SubwordTokens" (Net Decoder)
NetDecoder[NetEncoder[{"SubwordTokens",… }]]
represents a decoder that converts a sequence of probability vectors to a string according to the specifications of the given "SubwordTokens" NetEncoder.
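The core decoding rule is to pick the most probable token for each probability vector and join the results into a string. It can be sketched in plain Python. This is a conceptual illustration, not the Wolfram Language API. The function name `decode_argmax` and the toy token list are hypothetical. The sketch also assumes that decoded tokens are concatenated directly, with any whitespace stored literally in the tokens; a real "BPE" vocabulary may encode word boundaries differently.

```python
from typing import Sequence

def decode_argmax(prob_vectors, tokens: Sequence[str]) -> str:
    """Hypothetical sketch: pick the highest-probability token per vector and concatenate."""
    # A single probability vector is treated as a sequence of length 1.
    if prob_vectors and isinstance(prob_vectors[0], (int, float)):
        prob_vectors = [prob_vectors]
    pieces = []
    for probs in prob_vectors:
        if len(probs) != len(tokens):
            raise ValueError("each probability vector needs one entry per token")
        # Index of the largest probability (the first one wins on ties).
        best = max(range(len(tokens)), key=lambda i: probs[i])
        pieces.append(tokens[best])
    # Assumption: tokens are joined as-is, with no extra separator.
    return "".join(pieces)

tokens = ["he", "llo", " world", "!"]
probs = [
    [0.7, 0.1, 0.1, 0.1],
    [0.05, 0.8, 0.1, 0.05],
    [0.1, 0.1, 0.6, 0.2],
]
print(decode_argmax(probs, tokens))  # hello world
```

The output is always a single string, whether the input is one vector or a sequence of vectors.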
Details
- NetDecoder[…][input] applies the decoder to an input to produce an output.
- NetDecoder[…][{input1,input2,…}] applies the decoder to a list of inputs to produce a list of outputs.
- The input to the decoder is either a vector of probabilities or a sequence of probability vectors. Each probability vector sums to 1, and its length equals the number of elements in the token list of the parent NetEncoder.
- For each input probability vector, the decoder outputs a token by picking the element of the token list with the highest associated probability.
- NetDecoder[…][input] returns a string.
- Only the "BPE" method is currently supported. NetDecoder[NetEncoder[{"SubwordTokens",… }]] will produce a "BPE" decoder regardless of the method of the parent encoder.
- The "WhitespaceTrimming" suboption of the "BPE" method is inherited from the "WhitespacePadding" suboption of the parent encoder, if present. When it is set to Left or Right, the decoder trims a single whitespace character from the beginning or the end of the output string, respectively, if one is present. When it is set to None, no trimming is performed.
- If the parent encoder does not support "WhitespacePadding", "WhitespaceTrimming" will be None.
- NetDecoder[…][data,prop] can be used to calculate a specific property for the input data.
- When a "SubwordTokens" decoder is attached to a net, net[data,prop] or net[data,"oport"->prop] can be used to calculate a specific property of the decoded output.
- The "SubwordTokens" decoder supports only the bypass property. Setting prop to None bypasses decoding and returns the decoder's input unchanged.
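The whitespace trimming rule and the bypass property can be illustrated with another Python sketch. This is again a conceptual model, not the Wolfram Language API. The names `trim_one_space`, `decode`, `trimming`, and `bypass` are hypothetical. The strings "Left" and "Right" and Python's None stand in for the Left, Right, and None settings of "WhitespaceTrimming". "Whitespace" is assumed here to mean a single space character, and the toy vocabulary is made up for illustration.

```python
from typing import List, Optional, Sequence

def trim_one_space(text: str, side: Optional[str]) -> str:
    """Remove at most one space from the start ("Left") or the end ("Right") of text."""
    if side == "Left" and text.startswith(" "):
        return text[1:]
    if side == "Right" and text.endswith(" "):
        return text[:-1]
    return text  # side is None, or there is no space to trim

def decode(prob_vectors: Sequence[Sequence[float]], tokens: List[str],
           trimming: Optional[str] = None, bypass: bool = False):
    """Hypothetical sketch: argmax-decode, then trim; bypass=True returns the input as-is."""
    if bypass:
        # Analogous to setting prop to None: skip decoding entirely.
        return prob_vectors
    text = "".join(
        tokens[max(range(len(tokens)), key=lambda i: probs[i])]
        for probs in prob_vectors
    )
    return trim_one_space(text, trimming)

tokens = [" hello", " world"]
probs = [[0.9, 0.1], [0.2, 0.8]]
print(repr(decode(probs, tokens)))                   # ' hello world'
print(repr(decode(probs, tokens, trimming="Left")))  # 'hello world'
```

Only one space is removed even if several are present, which matches the "single whitespace" wording above.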