ToCharacterCode

ToCharacterCode["string"]

gives a list of the integer codes corresponding to the characters in a string.

ToCharacterCode["string","encoding"]

gives integer codes according to the specified encoding.

Details

  • ToCharacterCode handles both ordinary and special characters.
  • ToCharacterCode["string"] and ToCharacterCode["string","Unicode"] return standard internal character codes used by the Wolfram Language, which are the same on all computer systems.
  • For characters on an ordinary American English keyboard, the character codes follow the ASCII standard.
  • For common European languages, they follow the ISO Latin-1 standard.
  • For other characters, they follow the Unicode standard.
  • The Wolfram System defines various additional characters in private Unicode space, with character codes between 57344 and 63743.
  • Character codes returned by ToCharacterCode["string"] lie between 0 and 1114111 (hexadecimal 10FFFF, the largest Unicode code point).
  • Encodings supported in ToCharacterCode["string","encoding"] include the values in $CharacterEncodings in addition to "Unicode".
  • If a particular character has no character code in a given encoding, ToCharacterCode returns None in place of a character code.
  • ToCharacterCode[{"s1","s2",…}] gives a list of lists of integer codes, one for each of the si.

Examples


Basic Examples  (2)

Find ASCII or Unicode character codes:
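
A minimal sketch with arbitrary sample strings (illustrative inputs, not the original cells):

```wolfram
(* ASCII letters and a space *)
ToCharacterCode["ABCD abcd"]
(* {65, 66, 67, 68, 32, 97, 98, 99, 100} *)

(* Greek letters, entered by their long names *)
ToCharacterCode["\[Alpha]\[Beta]"]
(* {945, 946} *)
```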

Reassemble a string from character codes:
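
For instance, assuming the code list {65, 66, 67, 68} as an illustrative input:

```wolfram
(* FromCharacterCode turns a list of codes back into a string *)
FromCharacterCode[{65, 66, 67, 68}]
(* "ABCD" *)
```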

Get the byte values corresponding to the UTF8 encoding of a string:
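
A sketch using two Greek letters as an assumed sample string; each of them occupies two bytes in UTF-8:

```wolfram
(* "UTF8" is the encoding name used here *)
ToCharacterCode["\[Alpha]\[Beta]", "UTF8"]
(* {206, 177, 206, 178} *)
```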

Reassemble the string from its UTF8 encoding:
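
A sketch of the reverse direction, assuming the byte list {206, 177, 206, 178} (the UTF-8 bytes of the Greek letters alpha and beta) as sample input:

```wolfram
(* Interpret the integers as UTF-8 bytes rather than as code points *)
FromCharacterCode[{206, 177, 206, 178}, "UTF8"]
(* "αβ" *)
```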

Scope  (3)

Find the code points of several strings:
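
For example, with two arbitrary two-letter strings:

```wolfram
(* A list of strings gives one list of codes per string *)
ToCharacterCode[{"ab", "cd"}]
(* {{97, 98}, {99, 100}} *)
```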

Use a character encoding:
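
A small sketch comparing two encodings for the same character. The choice of "UTF8" and "ISOLatin1" is illustrative; both names are assumed to be present in $CharacterEncodings on the system at hand:

```wolfram
(* e-acute needs two bytes in UTF-8 ... *)
ToCharacterCode["\[EAcute]", "UTF8"]
(* {195, 169} *)

(* ... but a single code in ISO Latin-1 *)
ToCharacterCode["\[EAcute]", "ISOLatin1"]
(* {233} *)
```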

Get the codes of all printable ASCII characters:
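
One possible way to do this is to build the string with CharacterRange instead of typing it out; the variable name codes is only for illustration:

```wolfram
(* Characters from space (32) through tilde (126) *)
codes = ToCharacterCode[StringJoin[CharacterRange[" ", "~"]]];

(* The result is the contiguous integer range 32..126 *)
codes === Range[32, 126]
(* True *)
```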

Some ISO Latin-1 letters:
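
For instance, with three accented letters chosen for illustration:

```wolfram
(* a-grave, e-acute, u-umlaut *)
ToCharacterCode["\[AGrave]\[EAcute]\[UDoubleDot]"]
(* {224, 233, 252} *)
```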

Some characters in the private use area:
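
A hedged sketch: the three named characters below are assumed to be among those the Wolfram System places in private Unicode space. The exact codes can depend on the version, so the example only checks that they fall in the range 57344 to 63743 rather than listing them:

```wolfram
(* pua is an illustrative variable name *)
pua = ToCharacterCode["\[ImaginaryI]\[ExponentialE]\[DifferentialD]"]

(* Test whether every code lies in the private use range *)
AllTrue[pua, 57344 <= # <= 63743 &]
```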

Properties & Relations  (5)

ToCharacterCode always returns a list:
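
For example, a single character and the empty string both give lists:

```wolfram
(* One character still yields a one-element list *)
ToCharacterCode["a"]
(* {97} *)

(* The empty string yields an empty list *)
ToCharacterCode[""]
(* {} *)
```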

The default encoding is "Unicode":
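
A quick check with an arbitrary sample string:

```wolfram
(* The one-argument form agrees with an explicit "Unicode" *)
ToCharacterCode["abc"] === ToCharacterCode["abc", "Unicode"]
(* True *)
```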

If a particular character code does not exist in the specified encoding, None is returned for it:
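
A sketch assuming the "ISOLatin1" encoding, which has no code for Greek letters. Going by the rule in the Details section, the second element of the result should be None:

```wolfram
(* "a" is representable in ISO Latin-1; alpha is not *)
ToCharacterCode["a\[Alpha]", "ISOLatin1"]
```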

FromCharacterCode is the inverse of ToCharacterCode:
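
For instance, a round trip on an arbitrary string:

```wolfram
(* Converting to codes and back recovers the original string *)
FromCharacterCode[ToCharacterCode["Hello"]]
(* "Hello" *)
```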

This is also true for lists of strings:
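
A sketch of the same round trip on a list of two sample strings:

```wolfram
(* A list of code lists maps back to a list of strings *)
FromCharacterCode[ToCharacterCode[{"ab", "cd"}]]
(* {"ab", "cd"} *)
```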

The allowed values of the second argument are given by $CharacterEncodings:
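
The list can be inspected directly. Its contents may vary by version and platform, so no output is shown; the membership test for "UTF8" is only an illustrative check:

```wolfram
(* All named encodings known to the system *)
$CharacterEncodings

(* Check whether a particular encoding name is available *)
MemberQ[$CharacterEncodings, "UTF8"]
```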

Neat Examples  (1)

"Plot" a string:

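One way this might look, assuming an arbitrary sample sentence and ListLinePlot as the plotting function (the original cell may have used a different one):

```wolfram
(* Plot the character codes against their position in the string *)
ListLinePlot[ToCharacterCode["The quick brown fox"]]
```
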
Introduced in 1991 (2.0) | Updated in 1996 (3.0), 1999 (4.0), 2007 (6.0), 2019 (12.0)