Raw Character Encodings

The Wolfram Language always allows you to refer to special characters by using names such as \[Alpha] or explicit hexadecimal codes such as \:03b1. When the Wolfram Language writes out files, it uses these names or hexadecimal codes by default.

But sometimes you may find it convenient to use raw encodings for at least some special characters. What this means is that rather than representing special characters by names or explicit hexadecimal codes, you instead represent them by raw bit patterns appropriate for a particular computer system or particular font.

$CharacterEncoding=None        use printable ASCII names for all special characters
$CharacterEncoding="name"      use the raw character encoding specified by name
$SystemCharacterEncoding       the default raw character encoding for your particular computer system

Setting up raw character encodings.
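
As a minimal sketch, the settings above might be used as follows in a text-based kernel session. The choice of "ISOLatin1" and the variable name saved are illustrative assumptions, not something the text prescribes; any supported encoding name could be substituted.

```wolfram
(* Inspect the default raw encoding for this computer system *)
$SystemCharacterEncoding

(* Remember the encoding currently in effect so it can be restored later;
   "saved" is a hypothetical variable name *)
saved = $CharacterEncoding;

(* Use printable ASCII names for all special characters *)
$CharacterEncoding = None;

(* Switch to a specific raw encoding; "ISOLatin1" is only an example *)
$CharacterEncoding = "ISOLatin1";

(* Restore the earlier setting when finished *)
$CharacterEncoding = saved;
```

Saving and restoring the previous value keeps the change local, which matters because raw encodings are not portable across systems.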

When you press a key or combination of keys on your keyboard, the operating system of your computer sends a certain bit pattern to the Wolfram System. How this bit pattern is interpreted as a character within the Wolfram System will depend on the character encoding that has been set up.

The notebook front end for the Wolfram System typically takes care of setting up the appropriate character encoding automatically for whatever font you are using. But if you use the Wolfram System with a text-based interface or via files or pipes, then you may need to set $CharacterEncoding explicitly.

By specifying an appropriate value for $CharacterEncoding you will typically be able to get the Wolfram Language to handle raw text generated by whatever language-specific text editor or operating system you use.

You should realize, however, that while the standard representation of special characters used in the Wolfram Language is completely portable across different computer systems, any representation that involves raw character encodings will inevitably not be.

"PrintableASCII"      printable ASCII characters only
"ASCII"               all ASCII including control characters
"ISOLatin1"           characters for common western European languages
"ISOLatin2"           characters for central and eastern European languages
"ISOLatin3"           characters for additional European languages (e.g. Catalan, Turkish)
"ISOLatin4"           characters for other additional European languages (e.g. Estonian, Lappish)
"ISOLatinCyrillic"    English and Cyrillic characters
"AdobeStandard"       Adobe standard PostScript font encoding
"MacintoshRoman"      Macintosh roman font encoding
"WindowsANSI"         Windows standard font encoding
"Symbol"              symbol font encoding
"ZapfDingbats"        Zapf dingbats font encoding
"ShiftJIS"            shift-JIS for Japanese (mixture of 8- and 16-bit)
"EUC"                 extended Unix code for Japanese (mixture of 8- and 16-bit)
"UTF8"                Unicode transformation format encoding
"Unicode"             raw 16-bit Unicode bit patterns

Some raw character encodings supported by the Wolfram Language.

The Wolfram System knows about various raw character encodings, appropriate for different computer systems and different languages. Copying of characters between the Wolfram System notebook interface and the user interface environment on your computer generally uses the native character encoding for that environment. Wolfram Language characters that are not included in the native encoding are written out using standard Wolfram Language full names or hexadecimal codes.

The Wolfram Language kernel can use any character encoding you specify when it writes or reads text files. By default, Put and PutAppend produce an ASCII representation for reliable portability of Wolfram Language files from one system to another.

For example, when a string containing special characters is written to a file such as tmp, the special characters are written out using full names or explicit hexadecimal codes.
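
The default behavior of Put can be sketched as follows. The file name and its location in the temporary directory are hypothetical choices made for illustration, and the sample string is an assumption rather than the original example.

```wolfram
(* Hypothetical file location in the temporary directory *)
file = FileNameJoin[{$TemporaryDirectory, "encoding-example.m"}];

(* Put writes the string to the file as a Wolfram Language expression *)
Put["a b c \[EAcute] \[Alpha] \[Pi]", file];

(* Show the file contents exactly as stored on disk; with the default
   settings the special characters should appear as \[...] names *)
FilePrint[file]

(* Reading the file back should recover the original string *)
Get[file] === "a b c \[EAcute] \[Alpha] \[Pi]"

(* Clean up the example file *)
DeleteFile[file];
```

Because the file contains only ASCII, it can be moved to another system and read back without knowing which raw encoding either system uses.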

The Wolfram Language supports both 8- and 16-bit raw character encodings. In an encoding such as "ISOLatin1", all characters are represented by bit patterns containing 8 bits. But in an encoding such as "ShiftJIS" some characters instead involve bit patterns containing 16 bits.
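
The difference can be explored with ToCharacterCode, as in the sketch below. The sample characters are illustrative assumptions, and the exact form of the "ShiftJIS" result is deliberately not stated here; the point is only to compare how many code values each character produces.

```wolfram
(* A single accented letter occupies one 8-bit code in "ISOLatin1" *)
ToCharacterCode["\[EAcute]", "ISOLatin1"]   (* {233} *)

(* A Japanese character (hiragana "a", \:3042) in "ShiftJIS"; this is
   expected to need more than 8 bits, so compare the length of the
   result with the single value above *)
ToCharacterCode["\:3042", "ShiftJIS"]
```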

Most of the raw character encodings supported by the Wolfram Language include basic ASCII as a subset. This means that even when you are using such encodings, you can still give ordinary Wolfram Language input in the usual way, and you can specify special characters using \[ and \: sequences.

Some raw character encodings, however, do not include basic ASCII as a subset. An example is the "Symbol" encoding, in which the character codes normally used for a and b are instead used for α and β.

For example, ToCharacterCode gives the usual ASCII character codes for English letters, whereas interpreting those same codes in the "Symbol" encoding with FromCharacterCode yields Greek letters.
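
A short sketch of the "Symbol" behavior, using an arbitrary three-letter string chosen for illustration:

```wolfram
(* The usual ASCII character codes for a few English letters *)
codes = ToCharacterCode["abc"]        (* {97, 98, 99} *)

(* Interpreted in the standard encoding, the codes give back the letters *)
FromCharacterCode[codes]              (* "abc" *)

(* Interpreted in the "Symbol" encoding, the same codes stand for
   Greek letters such as \[Alpha] and \[Beta] *)
FromCharacterCode[codes, "Symbol"]
```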
ToCharacterCode["string"]                     generate codes for characters using the standard Wolfram Language encoding
ToCharacterCode["string","encoding"]          generate codes for characters using the specified encoding
FromCharacterCode[{n1,n2,…}]                  generate characters from codes using the standard Wolfram Language encoding
FromCharacterCode[{n1,n2,…},"encoding"]       generate characters from codes using the specified encoding

Handling character codes with different encodings.

The codes that the Wolfram Language assigns to characters by default can differ from the codes assigned to the same characters in the Macintosh roman encoding or in the Windows standard encoding. A character may also have no code at all in a given encoding; for example, there is no code for \[Pi] in the Windows standard encoding.
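
The comparison can be sketched as follows. The sample string is an assumption chosen to mix plain ASCII letters with two special characters; the results for the platform encodings are left for you to inspect except where the encoding tables make them unambiguous.

```wolfram
(* Sample string: three ASCII letters plus two special characters *)
chars = "abc\[EAcute]\[Pi]";

(* Codes assigned by the Wolfram Language itself *)
ToCharacterCode[chars]                     (* {97, 98, 99, 233, 960} *)

(* Codes for the same characters in the Macintosh roman encoding *)
ToCharacterCode[chars, "MacintoshRoman"]   (* {97, 98, 99, 142, 185} *)

(* Codes in the Windows standard encoding; look at the last element to
   see how a character with no code in the encoding is reported *)
ToCharacterCode[chars, "WindowsANSI"]

(* Encoding and decoding with the same encoding should round-trip *)
FromCharacterCode[
  ToCharacterCode["abc\[EAcute]", "MacintoshRoman"],
  "MacintoshRoman"] === "abc\[EAcute]"
```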

The character codes used internally by the Wolfram Language are based on Unicode. But externally the Wolfram Language by default always uses plain ASCII sequences such as \[Name] or \:nnnn to refer to special characters. By telling it to use the raw "Unicode" character encoding, however, you can get the Wolfram Language to read and write characters in raw 16-bit Unicode form.
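
As a hedged sketch of writing raw Unicode, the following uses a stream opened with the CharacterEncoding option. The file name and location are hypothetical, and the byte order of the output may depend on your system, so no particular byte sequence is claimed here.

```wolfram
(* Internal character codes are Unicode code points *)
ToCharacterCode["\[Alpha]"] === {16^^03b1}   (* True: \[Alpha] is U+03B1 *)

(* Hypothetical file location in the temporary directory *)
file = FileNameJoin[{$TemporaryDirectory, "raw-unicode-example.txt"}];

(* Write one character through a stream using the raw "Unicode" encoding *)
stream = OpenWrite[file, CharacterEncoding -> "Unicode"];
WriteString[stream, "\[Alpha]"];
Close[stream];

(* Inspect the raw bytes stored on disk instead of an ASCII name *)
BinaryReadList[file]

(* Clean up the example file *)
DeleteFile[file];
```

Setting the encoding on an individual stream, rather than changing $CharacterEncoding globally, limits the non-portable raw form to the one file that needs it.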