"Characters" (Net Encoder)
NetEncoder["Characters"]
represents an encoder that converts characters in an ASCII string to a sequence of integer codes.
NetEncoder[{"Characters",table}]
represents an encoder that converts characters in a string composed of characters in the list table.
NetEncoder[{"Characters",table,form}]
represents an encoder that converts characters in a string to the output type form.
NetEncoder[{"Characters",…,"param"->value,…}]
represents an encoder in which additional parameters have been specified.
Details
- NetEncoder[…][input] applies the encoder to a string to produce an output.
- NetEncoder[…][{input1,input2,…}] applies the encoder to a list of strings to produce a list of outputs.
- The mapping from characters to codes specified by table can have the following forms:
    "c1c2…"                map each character ci to successive available codes
    "c1c2…"->n             map all characters ci to code n
    "c1c2…"->Automatic     map all characters ci to the next available code
    n;;m->spec             map characters between n and m to spec
    {spec1,spec2,…}        assign codes in sequence from the speci
- The following symbolic character groups can be used in the table:
    Automatic              all printable ASCII characters, plus space, tab and newline
    LetterCharacter        the letters a through z and A through Z
    DigitCharacter         the digits 0 through 9
    WordCharacter          the union of LetterCharacter and DigitCharacter
    PunctuationCharacter   all visible ASCII punctuation characters
    WhitespaceCharacter    space, tab and newline
    StartOfString          virtual character that occurs before the beginning of the string
    EndOfString            virtual character that occurs after the end of the string
    _                      any otherwise unassigned character
- NetEncoder["Characters"] is suitable for typical English prose and consists of all printable ASCII characters, as well as tab, space and newline.
- NetEncoder["Characters"] is equivalent to NetEncoder[{"Characters",{"\t","\n",FromCharacterCode[Range[32,126]]}}].
- When form is "Index" (the default), the output of the encoder consists of integer codes corresponding to characters in the input string.
- When form is "UnitVector", the output of the encoder consists of n-dimensional unit vectors, where the i-th vector is in the p_i direction and p_i is the code corresponding to the i-th character.
- An encoder can be attached to an input port of a net by specifying "port"->NetEncoder[…] when constructing the net.
- NetEncoder[{"Characters",…}][["Alphabet"]] produces a list of the characters recognized by the encoder.
- NetDecoder[NetEncoder[{"Characters",…}]] produces a NetDecoder[{"Characters",…}] with the same encoding as the given encoder.
- With the parameter "IgnoreCase"->True, uppercase and lowercase letters will be encoded to the same value. The default value is "IgnoreCase"->False.
- With the default parameter setting "TargetLength"->All, all characters found in the input string are encoded.
- With the parameter "TargetLength"->n, the first n characters found in the input string are encoded, with padding applied if fewer than n characters are found. If EndOfString is present in the table, the padding value is the integer code associated with it; otherwise, the code associated with the last character is used.
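The following is a minimal sketch of attaching the encoder to the input port of a net, as described in the details above. The particular layers and their sizes are illustrative assumptions, not part of this page; only the "Input"->NetEncoder[…] specification is the point of the example.

```wolfram
(* Hypothetical small sequence classifier; layer choices are for illustration only *)
net = NetChain[
  {
    EmbeddingLayer[8],              (* embeds each integer character code *)
    LongShortTermMemoryLayer[16],   (* processes the sequence of embeddings *)
    SequenceLastLayer[],            (* keeps the final state of the sequence *)
    LinearLayer[2],
    SoftmaxLayer[]
  },
  "Input" -> NetEncoder["Characters"]  (* strings are encoded before reaching the first layer *)
]
```

With the encoder attached, a net built this way would accept strings directly once it has been initialized or trained.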
Parameters
Examples
Scope (7)
Use the default character encoder to encode a string:
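A minimal sketch of this usage; the sample strings and the variable names enc and codes are arbitrary choices for illustration:

```wolfram
(* Construct the default encoder *)
enc = NetEncoder["Characters"];

(* Apply it to a single string to get one integer code per character *)
codes = enc["hello world"]

(* Apply it to a list of strings to get a list of code sequences *)
enc[{"hello", "world"}]
```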
For the default character encoder, non-ASCII letters trigger an error:
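A sketch of the failing case, assuming an arbitrary sample string that contains one non-ASCII letter:

```wolfram
enc = NetEncoder["Characters"];

(* "é" is not in the default table, so this call is expected to fail
   instead of returning a list of codes *)
result = enc["café"]
```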
Create an encoder that sends unknown characters to a special code:
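One way this might be written, assuming the table combines Automatic (the default character set) with _ (any otherwise unassigned character), both of which are listed among the symbolic character groups:

```wolfram
(* Automatic assigns codes to the default characters;
   _ then takes the next code and catches everything else *)
enc = NetEncoder[{"Characters", {Automatic, _}}];

(* The non-ASCII "é" should now be encoded to the catch-all code *)
codes = enc["café"]
```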
Specify that the sequence should be padded or trimmed to be 6 elements long:
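A minimal sketch using the "TargetLength" parameter; the sample strings are arbitrary:

```wolfram
enc = NetEncoder[{"Characters", "TargetLength" -> 6}];

(* Shorter than 6 characters: the output is padded to length 6 *)
short = enc["hi"]

(* Longer than 6 characters: only the first 6 are encoded *)
long = enc["hello world"]
```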
Specify that case should not matter:
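A minimal sketch using the "IgnoreCase" parameter; the comparison at the end is just one convenient way to check that both spellings produce the same codes:

```wolfram
enc = NetEncoder[{"Characters", "IgnoreCase" -> True}];

(* Uppercase and lowercase letters are encoded to the same value,
   so the two code sequences should be identical *)
enc["Hello"] === enc["hELLO"]
```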
Give a specific alphabet for a character encoder:
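A minimal sketch with a three-character alphabet, chosen arbitrarily for illustration:

```wolfram
(* Codes are assigned in sequence from the entries of the table *)
enc = NetEncoder[{"Characters", {"a", "b", "c"}}];

codes = enc["abacab"]
```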
Encode to unit vectors instead:
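A minimal sketch using the "UnitVector" form with an arbitrary three-character alphabet:

```wolfram
(* The third element of the specification selects the output form *)
enc = NetEncoder[{"Characters", {"a", "b", "c"}, "UnitVector"}];

(* One unit vector per input character *)
vectors = enc["abc"];

(* Inspect the shape: number of characters by number of codes *)
Dimensions[vectors]
```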
Map sets of characters to single codes:
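A sketch using the "c1c2…"->n table form; the grouping into vowels and a few consonants is an arbitrary choice for illustration:

```wolfram
(* Every character in the first string maps to code 1,
   every character in the second string maps to code 2 *)
enc = NetEncoder[{"Characters", {"aeiou" -> 1, "bcdfg" -> 2}}];

codes = enc["abe"]
```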
Properties & Relations (2)
Extract the list of characters recognized by the default "Characters" encoder:
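A minimal sketch of the property extraction; the variable names are arbitrary:

```wolfram
enc = NetEncoder["Characters"];

(* Part extraction with the "Alphabet" property returns the recognized characters *)
alphabet = enc[["Alphabet"]];

(* Count how many characters the encoder recognizes *)
Length[alphabet]
```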
Produce a NetDecoder from a NetEncoder:
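A minimal sketch of the conversion, shown here for the default encoder:

```wolfram
enc = NetEncoder["Characters"];

(* The resulting decoder uses the same character encoding as enc *)
dec = NetDecoder[enc]
```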