"Characters" (Net Encoder)

NetEncoder["Characters"]

represents an encoder that converts characters in an ASCII string to a sequence of integer codes.

NetEncoder[{"Characters",table}]

represents an encoder that converts characters in a string composed of characters in the list table.

NetEncoder[{"Characters",table,form}]

represents an encoder that converts characters in a string to the output type form.

NetEncoder[{"Characters",…,"param"->value,…}]

represents an encoder in which additional parameters have been specified.

Details

  • NetEncoder[…][input] applies the encoder to a string to produce an output.
  • NetEncoder[…][{input1,input2,…}] applies the encoder to a list of strings to produce a list of outputs.
  • The mapping from characters to codes specified by table can have the following forms:
  • "c1c2…"               map each character c_i to successive available codes
    "c1c2…"->n            map all characters c_i to code n
    "c1c2…"->Automatic    map all characters c_i to the next available code
    n;;m->spec            map characters between n and m to spec
    {spec1,spec2,…}       assign codes in sequence from the spec_i
  • The following symbolic character groups can be used in the table:
  • Automatic               all printable ASCII characters, plus space, tab and newline
    LetterCharacter         the letters a through z and A through Z
    DigitCharacter          the digits 0 through 9
    WordCharacter           the union of LetterCharacter and DigitCharacter
    PunctuationCharacter    all visible ASCII punctuation characters
    WhitespaceCharacter     space, tab and newline
    StartOfString           virtual character that occurs before the beginning of the string
    EndOfString             virtual character that occurs after the end of the string
    _                       any otherwise unassigned character
  • NetEncoder["Characters"] is suitable for typical English prose; its alphabet consists of all printable ASCII characters, as well as tab, space and newline.
  • NetEncoder["Characters"] is equivalent to NetEncoder[{"Characters",{"\t","\n",FromCharacterCode[Range[32,126]]}}].
  • When form is "Index" (the default), the output of the encoder consists of integer codes corresponding to characters in the input string.
  • When form is "UnitVector", the output of the encoder consists of n-dimensional unit vectors, where the i-th vector points in the p_i-th direction and p_i is the code corresponding to the i-th character.
  • An encoder can be attached to an input port of a net by specifying "port"->NetEncoder[…] when constructing the net.
  • NetEncoder[{"Characters",…}][["Alphabet"]] produces a list of the characters recognized by the encoder.
  • NetDecoder[NetEncoder[{"Characters",…}]] produces a NetDecoder[{"Characters",…}] with the same encoding as the given encoder.
  • Parameters
  • With the parameter "IgnoreCase"->True, uppercase and lowercase letters will be encoded to the same value. The default value is "IgnoreCase"->False.
  • With the default parameter setting "TargetLength"->All, all characters found in the input string are encoded.
  • With the parameter "TargetLength"->n, the first n characters found in the input string are encoded, with padding applied if fewer than n characters are found. If EndOfString is present in the table, the padding value is the integer code associated with it; otherwise, the code associated with the last character is used.
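
The note above about attaching an encoder to an input port can be sketched as follows. This is only an illustration: the layers and their sizes are hypothetical choices and are not part of this page, and the net is randomly initialized, so its output carries no meaning.

```wolfram
(* Hypothetical toy net; layer choices and sizes are arbitrary assumptions *)
net = NetChain[
   {EmbeddingLayer[8], LongShortTermMemoryLayer[16], SequenceLastLayer[],
    LinearLayer[2], SoftmaxLayer[]},
   "Input" -> NetEncoder["Characters"]  (* attach the encoder to the input port *)
   ];

(* Give the net random weights so that it can be evaluated *)
net = NetInitialize[net];

(* The net now accepts a string directly; the encoder runs first *)
net["hello"]
```

With the encoder attached, the string is converted to integer codes before it reaches the first layer, so no separate preprocessing step should be needed.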

Examples

Basic Examples  (1)

Create a character encoder:
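
A minimal sketch of this step, assuming the default table; the variable name enc is an arbitrary choice:

```wolfram
(* Default encoder over printable ASCII plus tab, space and newline *)
enc = NetEncoder["Characters"]
```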

Encode a string of characters:
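
A minimal sketch, assuming the default encoder; the input string is an arbitrary choice:

```wolfram
enc = NetEncoder["Characters"];

(* Apply the encoder to a string to obtain one integer code per character *)
enc["hello world"]
```

If the default table is as stated in Details (tab, newline, then character codes 32 through 126), a character with ASCII code c would be expected to receive the code c - 29, so "h" would map to 75 and the space to 3.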

Scope  (7)

Use the default character encoder to encode a string:
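
A sketch of how this might look, with arbitrary input strings. The second call illustrates the list form described in Details:

```wolfram
enc = NetEncoder["Characters"];

(* A single string gives a single list of codes *)
enc["Hello, World!"]

(* A list of strings gives a list of code lists, one per string *)
enc[{"abc", "hello"}]
```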

For the default character encoder, non-ASCII letters trigger an error:
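
A sketch with an arbitrary input containing the non-ASCII letter "ï". The exact message text is not reproduced here; the call is only expected to issue a message rather than return a list of codes:

```wolfram
enc = NetEncoder["Characters"];

(* "ï" is outside the default ASCII table, so encoding is expected to fail *)
enc["naïve"]
```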

Create an encoder that sends unknown characters to a special code:
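
One way this could be written, assuming the Automatic group and the _ entry can be combined in a single table as the Details section suggests:

```wolfram
(* Automatic covers the default ASCII set; _ catches everything else *)
enc = NetEncoder[{"Characters", {Automatic, _}}];

(* "ï" is not in the ASCII set, so it should receive the catch-all code *)
enc["naïve"]
```

Because codes are assigned in sequence, the catch-all code would be expected to be the one immediately after the last code used by the Automatic group.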

Specify that the sequence should be padded or trimmed to be 6 elements long:
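
A sketch using a small illustrative table "abc", which is an assumption chosen to keep the codes easy to read:

```wolfram
enc = NetEncoder[{"Characters", "abc", "TargetLength" -> 6}];

(* Shorter than 6 characters: the result is padded to length 6 *)
enc["ab"]

(* Longer than 6 characters: only the first 6 are encoded *)
enc["abcabcabc"]
```

Since this table contains no EndOfString entry, the padding value should follow the fallback rule given under Parameters.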

Specify that case should not matter:
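
A sketch with an illustrative lowercase table "abc" (an assumption). With "IgnoreCase"->True, the uppercase and lowercase spellings should encode identically:

```wolfram
enc = NetEncoder[{"Characters", "abc", "IgnoreCase" -> True}];

(* Mixed-case input; "a" and "A" are expected to share a code *)
enc["aAbB"]

(* Compare the encodings of the two spellings *)
enc["ABC"] === enc["abc"]
```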

Give a specific alphabet for a character encoder:
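
A sketch with a hypothetical four-letter alphabet, using the string form of the table described in Details:

```wolfram
(* Each character of the string is assigned the next available code *)
enc = NetEncoder[{"Characters", "acgt"}];

enc["gattaca"]
```

Under successive assignment, "a", "c", "g" and "t" would be expected to receive the codes 1 through 4 in that order.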

Encode to unit vectors instead:
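
A sketch that passes "UnitVector" as the form, with an illustrative three-letter table (an assumption):

```wolfram
enc = NetEncoder[{"Characters", "abc", "UnitVector"}];

(* One unit vector per character; here each vector should have 3 components *)
enc["cab"]
```

The result should be a 3×3 array in which each row has a single nonzero entry at the position given by that character's code.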

Map sets of characters to single codes:
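
A sketch using the "c1c2…"->n form from Details; the two character sets are arbitrary choices:

```wolfram
(* All of "aeiou" share code 1, all of "xyz" share code 2 *)
enc = NetEncoder[{"Characters", {"aeiou" -> 1, "xyz" -> 2}}];

enc["axe"]
```

Here "a" and "e" would be expected to encode to the same value, since both belong to the first set.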

Map sets of characters to successive codes:
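
A sketch that lists several specifications so that codes are assigned in sequence. It assumes a symbolic group and a literal string can be mixed in one list, as the Details section suggests:

```wolfram
(* Digits take the first codes, then "+" and "-" take the next two *)
enc = NetEncoder[{"Characters", {DigitCharacter, "+-"}}];

enc["1+2"]

(* Inspect the resulting alphabet *)
enc[["Alphabet"]]
```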

Introduce extra codes for the start and end of the string:
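
A sketch that places StartOfString and EndOfString in the table; the letters "abc" are an illustrative choice:

```wolfram
enc = NetEncoder[{"Characters", {StartOfString, "abc", EndOfString}}];

(* The result should contain two more codes than the string has characters *)
enc["cab"]
```

Under the in-sequence assignment described in Details, StartOfString would receive code 1, the letters the codes 2 through 4, and EndOfString code 5, so the result is expected to begin with 1 and end with 5.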

Properties & Relations  (2)

Extract the list of characters recognized by the default "Characters" encoder:
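
A minimal sketch using the "Alphabet" property described in Details:

```wolfram
enc = NetEncoder["Characters"];

(* List of the characters the encoder recognizes *)
enc[["Alphabet"]]
```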

Produce a NetDecoder from a NetEncoder:
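
A sketch with an illustrative table "abc" (an assumption); the variable names are arbitrary:

```wolfram
enc = NetEncoder[{"Characters", "abc"}];

(* Build a decoder that shares the encoder's character table *)
dec = NetDecoder[enc]
```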

Introduced in 2018 (11.3) | Updated in 2019 (12.0)