Raw Character Encodings

Mathematica always allows you to refer to special characters by using names such as \[Alpha] or explicit hexadecimal codes such as \:03b1. And when Mathematica writes out files, it by default uses these names or hexadecimal codes.

But sometimes you may find it convenient to use raw encodings for at least some special characters. What this means is that rather than representing special characters by names or explicit hexadecimal codes, you instead represent them by raw bit patterns appropriate for a particular computer system or particular font.

$CharacterEncoding = None      use printable ASCII names for all special characters
$CharacterEncoding = "name"    use the raw character encoding specified by name
$SystemCharacterEncoding       the default raw character encoding for your particular computer system

Setting up raw character encodings.

When you press a key or combination of keys on your keyboard, the operating system of your computer sends a certain bit pattern to Mathematica. How this bit pattern is interpreted as a character within Mathematica will depend on the character encoding that has been set up.

The notebook front end for Mathematica typically takes care of setting up the appropriate character encoding automatically for whatever font you are using. But if you use Mathematica with a text-based interface or via files or pipes, then you may need to set $CharacterEncoding explicitly.

By specifying an appropriate value for $CharacterEncoding you will typically be able to get Mathematica to handle raw text generated by whatever language-specific text editor or operating system you use.
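
As a minimal sketch of setting the encoding explicitly from a text-based session, the following inspects the current values and then selects a raw encoding for the duration of a single evaluation. Localizing the setting with Block is a choice made for this illustration rather than something the text prescribes, and "ISOLatin1" is simply one name taken from the list of supported encodings.

```mathematica
(* The encoding currently in effect, and the default for this system *)
current = $CharacterEncoding;
systemDefault = $SystemCharacterEncoding;

(* Localize a raw encoding with Block so the global value is restored afterwards *)
insideBlock = Block[{$CharacterEncoding = "ISOLatin1"},
   $CharacterEncoding
   ];

(* The global setting is unchanged once Block has finished *)
{insideBlock, $CharacterEncoding === current}
```

Assigning $CharacterEncoding directly at top level has the same effect for the rest of the session; Block merely keeps the change from leaking into later input.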

You should realize, however, that while the standard representation of special characters used in Mathematica is completely portable across different computer systems, any representation that involves raw character encodings will inevitably not be.

"PrintableASCII"printable ASCII characters only
"ASCII"all ASCII including control characters
"ISOLatin1"characters for common western European languages
"ISOLatin2"characters for central and eastern European languages
"ISOLatin3"characters for additional European languages (e.g. Catalan, Turkish)
"ISOLatin4"characters for other additional European languages (e.g. Estonian, Lappish)
"ISOLatinCyrillic"English and Cyrillic characters
"AdobeStandard"Adobe standard PostScript font encoding
"MacintoshRoman"Macintosh roman font encoding
"WindowsANSI"Windows standard font encoding
"Symbol"symbol font encoding
"ZapfDingbats"Zapf dingbats font encoding
"ShiftJIS"shift-JIS for Japanese (mixture of 8- and 16-bit)
"EUC"extended Unix code for Japanese (mixture of 8- and 16-bit)
"UTF-8"Unicode transformation format encoding
"Unicode"raw 16-bit Unicode bit patterns

Some raw character encodings supported by Mathematica.
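
To check which of these names a particular kernel accepts, one option is to query the list of known encodings. This sketch assumes a Mathematica version that provides the $CharacterEncodings variable; the variable names known and latinFamily are introduced here only for illustration.

```mathematica
(* Names of the raw encodings known to this kernel
   (assumes $CharacterEncodings is available in your version) *)
known = $CharacterEncodings;

(* Check whether one encoding from the table is supported *)
hasLatin1 = MemberQ[known, "ISOLatin1"]

(* Pick out the ISO Latin family by name *)
latinFamily = Select[known, StringMatchQ[#, "ISOLatin*"] &]
```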

Mathematica knows about various raw character encodings, appropriate for different computer systems and different languages. Copying of characters between the Mathematica notebook interface and the user interface environment on your computer generally uses the native character encoding for that environment. Mathematica characters that are not included in the native encoding will be written out using standard Mathematica full names or hexadecimal codes.

The Mathematica kernel can use any character encoding you specify when it writes or reads text files. By default, Put and PutAppend produce an ASCII representation for reliable portability of Mathematica language files from one system to another.
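
The following sketch shows one way the kernel might be asked to write a text file in a raw encoding, here "UTF-8" chosen purely as an example. The file name is hypothetical, and the approach (an output stream opened with the CharacterEncoding option, then the bytes read back and decoded) is an illustration under those assumptions rather than the only way to do it.

```mathematica
(* Hypothetical scratch file; the name is only for illustration *)
file = FileNameJoin[{$TemporaryDirectory, "raw-encoding-utf8.txt"}];

(* Open the file so that characters are written in the raw UTF-8 encoding *)
stream = OpenWrite[file, CharacterEncoding -> "UTF-8"];
WriteString[stream, "caf\[EAcute] \[Pi]"];
Close[stream];

(* Look at the bytes actually stored on disk *)
bytes = BinaryReadList[file];

(* Decode those bytes again using the same encoding *)
decoded = FromCharacterCode[bytes, "UTF-8"]
```

A file written this way is only readable by programs that assume the same encoding, which is the portability cost the text describes.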

This writes a string to a file.
Special characters are written out using full names or explicit hexadecimal codes.
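
The input cells for this example are not reproduced here, so the following is a sketch of the same idea rather than the original input. The file name and the particular string are assumptions made for illustration.

```mathematica
(* Hypothetical scratch file in the system temporary directory *)
file = FileNameJoin[{$TemporaryDirectory, "raw-encoding-demo.m"}];

(* Put writes the expression in its default portable form *)
original = "a b c \[EAcute] \[Alpha] \[Pi]";
Put[original, file];

(* Show the literal text stored in the file *)
FilePrint[file]

(* Get reads the expression back unchanged *)
restored = Get[file];
restored === original
```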

Mathematica supports both 8- and 16-bit raw character encodings. In an encoding such as "ISOLatin1", all characters are represented by bit patterns containing 8 bits. But in an encoding such as "ShiftJIS", some characters instead involve bit patterns containing 16 bits.
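
The difference can be seen by asking for character codes in each kind of encoding. In this sketch the two test characters (an accented Latin letter and the hiragana letter at U+3042) are arbitrary choices made for illustration, and no particular output is claimed for the Japanese encoding.

```mathematica
(* An 8-bit encoding: a single code for the character *)
latin = ToCharacterCode["\[EAcute]", "ISOLatin1"]

(* A mixed 8- and 16-bit encoding applied to a Japanese character, U+3042 *)
sjis = ToCharacterCode["\:3042", "ShiftJIS"]

(* Compare how many codes each character needs *)
{Length[latin], Length[sjis]}

(* Decoding with the same encoding recovers the character *)
FromCharacterCode[sjis, "ShiftJIS"] === "\:3042"
```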

Most of the raw character encodings supported by Mathematica include basic ASCII as a subset. This means that even when you are using such encodings, you can still give ordinary Mathematica input in the usual way, and you can specify special characters using \[Name] and \:xxxx sequences.

Some raw character encodings, however, do not include basic ASCII as a subset. An example is the "Symbol" encoding, in which the character codes normally used for a and b are instead used for \[Alpha] and \[Beta].

This gives the usual ASCII character codes for a few English letters.
In the "Symbol" encoding, these character codes are used for Greek letters.

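
A short sketch of this reinterpretation follows; the three-letter string is an arbitrary choice for illustration.

```mathematica
(* Standard codes for a few ASCII letters *)
codes = ToCharacterCode["abc"]            (* {97, 98, 99} *)

(* The same codes interpreted in the Symbol font encoding *)
greek = FromCharacterCode[codes, "Symbol"]

(* The result consists of Greek letters rather than a, b, c *)
greek === "\[Alpha]\[Beta]\[Chi]"
```
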
ToCharacterCode["string"]generate codes for characters using the standard Mathematica encoding
ToCharacterCode["string","encoding"]generate codes for characters using the specified encoding
FromCharacterCode[{n1,n2,...}]generate characters from codes using the standard Mathematica encoding
FromCharacterCode[{n1,n2,...},"encoding"]
generate characters from codes using the specified encoding

Handling character codes with different encodings.
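
As a brief sketch of how the two functions pair up, encoding a string and decoding the result with the same encoding name should return the original string. The string and the choice of "MacintoshRoman" are assumptions made for illustration.

```mathematica
(* A string containing one non-ASCII character *)
s = "abc\[EAcute]";
enc = "MacintoshRoman";

(* Encode to raw codes, then decode with the same encoding *)
codes = ToCharacterCode[s, enc];
roundTrip = FromCharacterCode[codes, enc];

(* The round trip preserves the string *)
roundTrip === s
```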

This gives the codes assigned to various characters by Mathematica.
Here are the codes assigned to the same characters in the Macintosh roman encoding.
Here are the codes in the Windows standard encoding. There is no code for \[Pi] in that encoding.
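
The comparison can be sketched as follows. The sample string is an assumption chosen to include \[Pi], which the text singles out, and the variable names are introduced only for illustration.

```mathematica
(* A sample string with an accented letter and a Greek letter *)
s = "abc\[EAcute]\[Pi]";

(* Codes in the standard Mathematica encoding *)
standard = ToCharacterCode[s]                     (* {97, 98, 99, 233, 960} *)

(* Codes for the same characters in the Macintosh roman encoding *)
macRoman = ToCharacterCode[s, "MacintoshRoman"]   (* {97, 98, 99, 142, 185} *)

(* Codes in the Windows standard encoding, which has no code for \[Pi] *)
windows = ToCharacterCode[s, "WindowsANSI"]

(* Positions of characters that the encoding cannot represent *)
Position[windows, None]
```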

The character codes used internally by Mathematica are based on Unicode. But externally Mathematica by default always uses plain ASCII sequences such as \[Name] or \:xxxx to refer to special characters. By telling it to use the raw "Unicode" character encoding, however, you can get Mathematica to read and write characters in raw 16-bit Unicode form.
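
The link between the internal codes and the ASCII escape forms can be checked directly. This sketch uses \[Alpha] as an arbitrary example character.

```mathematica
(* The internal code for a named character is its Unicode code point *)
code = First[ToCharacterCode["\[Alpha]"]]     (* 945 *)

(* Written as four hexadecimal digits, this is the number used in the \: form *)
hex = IntegerString[code, 16, 4]              (* "03b1" *)

(* The named form and the hexadecimal form denote the same character *)
"\:03b1" === "\[Alpha]"                       (* True *)

(* Converting the code back gives the character again *)
FromCharacterCode[code] === "\[Alpha]"        (* True *)
```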
