"UTF8" (Net Encoder)

NetEncoder["UTF8"]

represents an encoder that converts a string to a sequence of integers corresponding to the UTF-8 encoding of its characters.

NetEncoder[{"UTF8",form}]

represents an encoder that converts a string to the output type form according to the UTF-8 encoding of its characters.

Details

  • NetEncoder[…][input] applies the encoder to an input string to produce an output.
  • NetEncoder[…][{input1,input2,…}] applies the encoder to a list of input strings to produce a list of outputs.
  • When form is "Index" (the default), the output of the encoder consists of integer codes in the range 1 to 248, each equal to one byte of the UTF-8 encoding of the input string plus 1. A single character can produce multiple integers.
  • When form is "UnitVector", the output of the encoder consists of 248-dimensional unit vectors, where the i-th vector points in the p_i-th direction and p_i is the i-th integer code of the encoded string.
  • An encoder can be attached to an input port of a net by specifying "port"->NetEncoder[…] when constructing the net.
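
As a rough sketch of the last point, the encoder can be attached to the "Input" port of a small chain so that the net accepts strings directly. The layer choices and the sizes 8 and 16 are illustrative assumptions, not part of this page; 248 matches the code range described above.

```wolfram
(* Illustrative net: the layers and their sizes are assumptions for this sketch *)
net = NetChain[
   {EmbeddingLayer[8, 248], LongShortTermMemoryLayer[16], SequenceLastLayer[]},
   "Input" -> NetEncoder["UTF8"]
   ];

(* With randomly initialized weights, the net can be applied to a string *)
NetInitialize[net]["hello"]
```

The embedding layer is given 248 classes explicitly so that the sketch does not rely on the class count being inferred from the encoder.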

Examples

Basic Examples  (1)

Create a UTF-8 encoder:

Encode a string of characters:
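
A minimal sketch of both steps. The strings are illustrative choices, and the expected result is derived from the equivalence with ToCharacterCode stated under Properties & Relations rather than copied from a recorded output.

```wolfram
(* Create the encoder with the default "Index" form *)
enc = NetEncoder["UTF8"];

(* Encode an ASCII string: each character yields exactly one code *)
enc["hello"]
(* expected: {105, 102, 109, 109, 112}, i.e. ToCharacterCode["hello", "UTF8"] + 1 *)

(* A list of strings yields a list of code sequences *)
enc[{"hi", "yo"}]
```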

Scope  (1)

Create a UTF-8 encoder that returns unit vectors:

Encode a string of characters:

Encode a string of non-ASCII characters:
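
The three steps above could look like the following sketch. The strings are illustrative, and Normal is applied defensively in case the encoder returns an array object rather than a nested list.

```wolfram
(* Encoder that returns one 248-dimensional unit vector per code *)
encUV = NetEncoder[{"UTF8", "UnitVector"}];

(* ASCII input: 5 characters give 5 vectors, so the expected dimensions are {5, 248} *)
vecs = Normal[encUV["hello"]];
Dimensions[vecs]

(* The position of the largest entry in each vector recovers the integer code *)
Flatten[Ordering[#, -1] & /@ vecs]

(* Non-ASCII input: \[EAcute] takes two UTF-8 bytes, so 4 characters give 5 vectors *)
Dimensions[Normal[encUV["caf\[EAcute]"]]]
```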

Properties & Relations  (1)

NetEncoder["UTF8"][input] is equivalent to ToCharacterCode[input,"UTF8"]+1:
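
A short check of this relation on an illustrative string containing a non-ASCII character. If the equivalence holds as stated, the comparison should return True.

```wolfram
s = "caf\[EAcute]";

(* Compare the encoder output with the shifted UTF-8 byte values *)
Normal[NetEncoder["UTF8"][s]] == ToCharacterCode[s, "UTF8"] + 1
```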

Possible Issues  (1)

A UTF-8 encoder will encode some Unicode characters using multiple integers:
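
For instance, the euro sign (U+20AC) occupies three bytes in UTF-8, so a one-character string should produce three codes. The character is an illustrative choice, and the expected codes are derived from its UTF-8 bytes E2 82 AC (226, 130, 172) plus 1.

```wolfram
enc = NetEncoder["UTF8"];

(* A single character outside the ASCII range *)
s = FromCharacterCode[16^^20AC];
StringLength[s]   (* 1 *)

(* One character, three integer codes *)
enc[s]
(* expected: {227, 131, 173} *)
```

As a result, the length of the encoded sequence can exceed StringLength of the input, which matters when a net expects a fixed sequence length.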