ToCharacterCode

ToCharacterCode["string"]

gives a list of the integer codes corresponding to the characters in a string.

ToCharacterCode["string","encoding"]

gives integer codes according to the specified encoding.

Details

  • ToCharacterCode handles both ordinary and special characters.
  • ToCharacterCode["string"] and ToCharacterCode["string","Unicode"] return standard internal character codes used by the Wolfram Language, which are the same on all computer systems.
  • For characters on an ordinary American English keyboard, the character codes follow the ASCII standard.
  • For common European languages, they follow the ISO Latin-1 standard.
  • For other characters, they follow the Unicode standard.
  • The Wolfram System defines various additional characters in private Unicode space, with character codes between 57344 and 63743.
  • Character codes returned by ToCharacterCode["string"] lie between 0 and 1114111.
  • Encodings supported in ToCharacterCode["string","encoding"] include the values in $CharacterEncodings in addition to "Unicode".
  • If a particular character has no character code in a given encoding, ToCharacterCode returns None in place of a character code.
  • ToCharacterCode[{"s1","s2",…}] gives a list of lists of integer codes, one list for each string si.
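As a minimal sketch of the three code ranges described above, the following string mixes an ASCII letter, an ISO Latin-1 letter, and a Greek letter. The sample string is an arbitrary choice for illustration.

```wolfram
(* "a" is ASCII, \[EAcute] is ISO Latin-1, \[Alpha] is general Unicode *)
ToCharacterCode["a\[EAcute]\[Alpha]"]
(* {97, 233, 945} *)
```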

Examples

Basic Examples  (2)

Find ASCII or Unicode character codes:
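A minimal sketch with an arbitrarily chosen sample string; for plain keyboard characters the codes are the ASCII values:

```wolfram
ToCharacterCode["ABCD abcd"]
(* {65, 66, 67, 68, 32, 97, 98, 99, 100} *)
```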

Reassemble a string from character codes:
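One way to do this, assuming a hand-picked list of codes, is with FromCharacterCode:

```wolfram
FromCharacterCode[{72, 101, 108, 108, 111}]
(* "Hello" *)
```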

Get the byte values corresponding to the UTF-8 encoding of a string:
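A short sketch using the "UTF8" encoding name; the sample word is an assumption chosen so that one character needs two bytes:

```wolfram
(* \[EAcute] (code point 233) becomes the two bytes 195, 169 in UTF-8 *)
ToCharacterCode["caf\[EAcute]", "UTF8"]
(* {99, 97, 102, 195, 169} *)
```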

Reassemble the string from its UTF-8 encoding:
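A sketch of the reverse direction, assuming the byte list below is valid UTF-8 (it encodes a five-byte, four-character word):

```wolfram
FromCharacterCode[{99, 97, 102, 195, 169}, "UTF8"]
(* "café" *)
```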

Scope  (3)

Find the code points of several strings:
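A minimal illustration with three arbitrary strings; each string yields its own sublist:

```wolfram
ToCharacterCode[{"ab", "cd", "e"}]
(* {{97, 98}, {99, 100}, {101}} *)
```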

Use a character encoding:
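As a sketch, the same Greek letters can be compared under the default codes and under "UTF8"; the choice of letters is only for illustration:

```wolfram
ToCharacterCode["\[Alpha]\[Beta]"]
(* {945, 946} *)

(* each of these letters occupies two bytes in UTF-8 *)
ToCharacterCode["\[Alpha]\[Beta]", "UTF8"]
(* {206, 177, 206, 178} *)
```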

Get the codes of all printable ASCII characters:
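One possible way to write this, assuming the printable ASCII characters are taken to run from the space character to the tilde:

```wolfram
(* build the printable ASCII string, then convert it to codes *)
printable = StringJoin[CharacterRange[" ", "~"]];
ToCharacterCode[printable] === Range[32, 126]
(* True *)
```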

Some ISO Latin-1 letters:
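A small sketch with five accented letters, written as named characters so that the input does not depend on the encoding of the source file:

```wolfram
ToCharacterCode["\[AGrave]\[EAcute]\[IHat]\[OTilde]\[UDoubleDot]"]
(* {224, 233, 238, 245, 252} *)
```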

Some characters in the private use area:
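A hedged sketch: the three named characters below are assumed to be among the additional characters the Wolfram System places in private Unicode space, so their codes are expected to fall in the range 57344 to 63743 stated in Details.

```wolfram
codes = ToCharacterCode["\[ImaginaryI]\[ExponentialE]\[DifferentialD]"];

(* check that every code lies in the private use range *)
AllTrue[codes, 57344 <= # <= 63743 &]
```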

Some emojis:
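A minimal sketch, assuming a version that supports code points above 65535 (consistent with the range up to 1114111 given in Details) and input that is read as Unicode:

```wolfram
(* U+1F600 and U+1F680 *)
ToCharacterCode["😀🚀"]
(* {128512, 128640} *)
```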

Properties & Relations  (6)

ToCharacterCode always returns a list:
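For illustration, a single-character string and the empty string both give lists:

```wolfram
ToCharacterCode["a"]
(* {97} *)

ToCharacterCode[""]
(* {} *)
```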

The default encoding is "Unicode":
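A quick check of this, using an arbitrary pair of Greek letters:

```wolfram
ToCharacterCode["\[Alpha]\[Beta]"] === ToCharacterCode["\[Alpha]\[Beta]", "Unicode"]
(* True *)
```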

If a particular character has no code in the specified encoding, None is returned in its place:
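A sketch of this behavior, assuming "ASCII" is among the available encodings. Since \[Alpha] is not an ASCII character, its position in the result is expected to hold None.

```wolfram
(* "a" has ASCII code 97; \[Alpha] has no ASCII code *)
ToCharacterCode["a\[Alpha]", "ASCII"]
```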

FromCharacterCode is the inverse of ToCharacterCode:
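A minimal round-trip check with an arbitrary sample string:

```wolfram
FromCharacterCode[ToCharacterCode["Wolfram"]] === "Wolfram"
(* True *)
```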

This is also true for lists of strings:
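The same round trip can be sketched for a list of strings; the variable name strs is introduced here only for illustration:

```wolfram
strs = {"one", "two", "three"};

(* a list of code lists converts back to a list of strings *)
FromCharacterCode[ToCharacterCode[strs]] === strs
(* True *)
```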

The allowed values of the second argument are "Unicode" and the values given by $CharacterEncodings:
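One way to inspect the list, shown as a sketch; the exact contents and length may vary between versions, so no output is given here:

```wolfram
(* number of named encodings, and whether "UTF8" is among them *)
Length[$CharacterEncodings]
MemberQ[$CharacterEncodings, "UTF8"]
```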

Some characters are composed of multiple codes:
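As an illustration, an accented letter can be written either as one precomposed character or as a base letter followed by a combining mark. The example below assumes the combining acute accent U+0301, entered with the \: hexadecimal notation.

```wolfram
(* precomposed form: a single code *)
ToCharacterCode["\[EAcute]"]
(* {233} *)

(* "e" followed by the combining acute accent: two codes *)
ToCharacterCode["e\:0301"]
(* {101, 769} *)
```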

Neat Examples  (1)

"Plot" a string:
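A possible sketch, with an arbitrary sample sentence and plot options chosen only for readability:

```wolfram
(* plot each character code against its position in the string *)
ListLinePlot[
  ToCharacterCode["The quick brown fox"],
  Mesh -> All,
  AxesLabel -> {"position", "code"}
]
```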

Wolfram Research (1991), ToCharacterCode, Wolfram Language function, https://reference.wolfram.com/language/ref/ToCharacterCode.html (updated 2019).
