**2.7.8 Advanced Topic: Raw Character Encodings**

Mathematica always allows you to refer to special characters by using names such as \[Alpha] or explicit hexadecimal codes such as \:03b1. And by default, when Mathematica writes out files, it uses these names or hexadecimal codes.

But sometimes you may find it convenient to use raw encodings for at least some special characters. This means that, rather than representing special characters by names or explicit hexadecimal codes, you represent them by raw bit patterns appropriate for a particular computer system or a particular font.
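The idea of a raw bit pattern can be illustrated outside Mathematica. The following Python sketch is only an analogy, not Mathematica's own mechanism. It assumes that Python's standard mac_roman and cp1252 codecs correspond to the Macintosh roman and Windows encodings discussed in this section. It shows that one character is stored as different bytes depending on the encoding. The helper name raw_bytes is made up for the example.

```python
# Analogy only: Python codecs standing in for raw character encodings.
# Assumption: "mac_roman" ~ Macintosh roman, "cp1252" ~ Windows encoding.

def raw_bytes(ch: str, encoding: str) -> bytes:
    """Return the raw byte pattern used for ch in the given encoding."""
    return ch.encode(encoding)

for enc in ("mac_roman", "cp1252"):
    # Same character (e acute), different raw byte in each encoding.
    print(enc, raw_bytes("\u00e9", enc).hex())
# mac_roman 8e
# cp1252 e9
```

Under these assumptions, é comes out as the byte 0x8E in Macintosh roman but as 0xE9 in the Windows encoding.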

Setting up raw character encodings.

When you press a key or combination of keys on your keyboard, the operating system of your computer sends a certain bit pattern to Mathematica. How this bit pattern is interpreted as a character within Mathematica will depend on the character encoding that has been set up.
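Going the other way, the same incoming byte can stand for different characters depending on which encoding is assumed. This Python sketch again uses the mac_roman, cp1252 and latin_1 codecs as stand-ins, and decodes the single byte 0x8E three ways. The helper name interpret is made up for the example.

```python
# Analogy only: one raw byte, interpreted under three assumed encodings.

def interpret(data: bytes, encoding: str) -> str:
    """Decode raw bytes as text in the given encoding."""
    return data.decode(encoding)

raw = b"\x8e"
for enc in ("mac_roman", "cp1252", "latin_1"):
    ch = interpret(raw, enc)
    print(enc, f"U+{ord(ch):04X}")
# mac_roman U+00E9, cp1252 U+017D, latin_1 U+008E
```

The byte is é in Macintosh roman, Ž in the Windows encoding, and an unprintable control character in ISO Latin-1. This is why raw text only has a meaning together with the encoding it was produced in.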

The notebook front end for Mathematica typically takes care of setting up the appropriate character encoding automatically for whatever font you are using. But if you use Mathematica with a text-based interface or via files or pipes, then you may need to set $CharacterEncoding explicitly.

By specifying an appropriate value for $CharacterEncoding you will typically be able to get Mathematica to handle raw text generated by whatever language-specific text editor or operating system you use.

You should realize, however, that while the standard representation of special characters used in Mathematica is completely portable across different computer systems, any representation that involves raw character encodings will inevitably not be.

Some raw character encodings supported by *Mathematica*.

Mathematica knows about various raw character encodings, appropriate for different computer systems and different languages.

Any character that is included in a particular raw encoding will be written out in raw form by Mathematica if you specify that encoding. But characters which are not included in the encoding will still be written out using standard Mathematica full names or hexadecimal codes.
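The rule of writing a character in raw form when the encoding contains it, and as an escape otherwise, can be sketched in Python.

- The function write_with_fallback is hypothetical.
- It simplifies Mathematica's behavior by always using \:xxxx hexadecimal codes rather than full names such as \[Alpha].
- It assumes every character fits in four hex digits.

```python
# Hypothetical sketch: raw form where the encoding has the character,
# otherwise a \:xxxx hex escape (Mathematica would prefer full names).
# Assumes characters in the 16-bit range, so four hex digits suffice.

def write_with_fallback(text: str, encoding: str) -> bytes:
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(encoding)  # raw form, if the encoding has it
        except UnicodeEncodeError:
            out += f"\\:{ord(ch):04x}".encode("ascii")  # portable escape
    return bytes(out)

# "a b c é α π ❦" written with the Macintosh roman stand-in
print(write_with_fallback("a b c \u00e9 \u03b1 \u03c0 \u2766", "mac_roman"))
# b'a b c \x8e \\:03b1 \xb9 \\:2766'
```

With the Macintosh roman codec, é and π become the raw bytes 0x8E and 0xB9, while α and ❦ fall back to \:03b1 and \:2766.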

In addition, any character included in a particular encoding can be given in raw form as input to Mathematica if you specify that encoding. Mathematica will automatically translate the character to its own standard internal form.
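Reading is the mirror image. You decode the raw bytes with the assumed encoding, then turn any escapes back into characters, so that both forms end up in a single internal Unicode representation. In this hypothetical Python sketch, read_with_escapes only understands \:xxxx hexadecimal codes. Full names such as \[Alpha] would need a lookup table that is not shown.

```python
import re

# Hypothetical sketch: raw characters and \:xxxx escapes both end up
# as ordinary Unicode characters. Full names like \[Alpha] are not handled.
HEX_ESCAPE = re.compile(r"\\:([0-9a-fA-F]{4})")

def read_with_escapes(data: bytes, encoding: str) -> str:
    text = data.decode(encoding)  # raw bytes -> Unicode characters
    # Replace each \:xxxx escape with the character it names.
    return HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)

decoded = read_with_escapes(b"a b c \x8e \\:03b1 \xb9 \\:2766", "mac_roman")
print([f"U+{ord(c):04X}" for c in decoded if ord(c) > 127])
# ['U+00E9', 'U+03B1', 'U+03C0', 'U+2766']
```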

This writes a string to the file tmp.
In[1]:= **"a b c \[EAcute] \[Alpha] \[Pi] \:2766" >> tmp**

Special characters are by default written out using full names or explicit hexadecimal codes.
In[2]:= **!!tmp**

"a b c \[EAcute] \[Alpha] \[Pi] \:2766"

This tells Mathematica to use a raw character encoding appropriate for Macintosh roman fonts.
In[3]:= **$CharacterEncoding = "MacintoshRoman"**

Out[3]= MacintoshRoman

Now those special characters that exist in the Macintosh roman encoding are written out in raw form.
In[4]:= **"a b c \[EAcute] \[Alpha] \[Pi] \:2766" >> tmp**

You can only read the raw characters if you have a system that uses the Macintosh roman encoding.
In[5]:= **!!tmp**

"a b c é \[Alpha] π \:2766"

This tells Mathematica to use no raw encoding by default.
In[6]:= **$CharacterEncoding = None**

Out[6]= None

You can still explicitly request raw encodings to be used in certain functions.
In[7]:= **Get["tmp", CharacterEncoding->"MacintoshRoman"]**

Out[7]= a b c é α π ❦

Mathematica supports both 8- and 16-bit raw character encodings. In an encoding such as "ISOLatin1", all characters are represented by bit patterns containing 8 bits. But in an encoding such as "ShiftJIS" some characters instead involve bit patterns containing 16 bits.
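The difference between 8- and 16-bit encodings shows up directly in byte counts. This Python sketch assumes the latin_1 and shift_jis codecs stand in for "ISOLatin1" and "ShiftJIS". The helper bits_per_char is made up for the example.

```python
# Analogy only: "latin_1" stands in for "ISOLatin1", "shift_jis" for "ShiftJIS".

def bits_per_char(ch: str, encoding: str) -> int:
    """Number of bits in the raw pattern for one character."""
    return 8 * len(ch.encode(encoding))

print(bits_per_char("\u00e9", "latin_1"))    # e acute -> 8
print(bits_per_char("a", "shift_jis"))       # 8
print(bits_per_char("\u3042", "shift_jis"))  # hiragana a -> 16
```

In Shift JIS an ASCII letter still takes 8 bits, while a kana character such as あ takes 16.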

Most of the raw character encodings supported by Mathematica include basic ASCII as a subset. This means that even when you are using such encodings, you can still give ordinary Mathematica input in the usual way, and you can specify special characters using \[ and \: sequences.
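Including ASCII as a subset means that ASCII text, including \[...] and \:xxxx sequences, maps to exactly the same bytes as in plain ASCII. Here is a Python sketch of that check. It assumes these codecs stand in for ASCII-compatible encodings of the kind described here, and the name same_as_ascii is hypothetical.

```python
# Analogy only: codecs assumed to include ASCII as a subset.
ASCII_COMPATIBLE = ("latin_1", "cp1252", "mac_roman", "utf_8")

def same_as_ascii(text: str, encoding: str) -> bool:
    """True if text has exactly the same bytes as in plain ASCII."""
    return text.encode(encoding) == text.encode("ascii")

source = r'Sin[x] + "\[Alpha] \:03b1"'  # ordinary input with escapes
print(all(same_as_ascii(source, enc) for enc in ASCII_COMPATIBLE))  # True
```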

Some raw character encodings, however, do not include basic ASCII as a subset. An example is the "Symbol" encoding, in which the character codes normally used for a and b are instead used for α and β.
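Python has no codec for the "Symbol" encoding. This sketch therefore uses the EBCDIC code page cp037 as a stand-in that shares the relevant property: the code that means a in ASCII means something else there. The helper means_in is hypothetical.

```python
# Analogy only: Python has no "Symbol" codec, so EBCDIC cp037 stands in
# as an 8-bit encoding that does not contain ASCII as a subset.

def means_in(code: int, encoding: str) -> str:
    """Character that a single-byte code stands for in an 8-bit encoding."""
    return bytes([code]).decode(encoding)

code_for_a = ord("a")                        # 97, the ASCII code for a
print(repr(means_in(code_for_a, "ascii")))   # 'a'
print(repr(means_in(code_for_a, "cp037")))   # '/'
print(list("a".encode("cp037")))             # [129]
```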

This gives the usual ASCII character codes for a few English letters.
In[8]:= **ToCharacterCode["abcdefgh"]**

Out[8]= {97, 98, 99, 100, 101, 102, 103, 104}

In the "Symbol" encoding, these character codes are used for Greek letters.
In[9]:= **FromCharacterCode[%, "Symbol"]**

Out[9]= αβχδεφγη

Handling character codes with different encodings.

This gives the codes assigned to various characters by Mathematica.
In[10]:= **ToCharacterCode["abc\[EAcute]\[Pi]"]**

Out[10]= {97, 98, 99, 233, 960}

Here are the codes assigned to the same characters in the Macintosh roman encoding.
In[11]:= **ToCharacterCode["abc\[EAcute]\[Pi]", "MacintoshRoman"]**

Out[11]= {97, 98, 99, 142, 185}

Here are the codes in the Windows standard encoding. There is no code for \[Pi] in that encoding.
In[12]:= **ToCharacterCode["abc\[EAcute]\[Pi]", "WindowsANSI"]**

Out[12]= {97, 98, 99, 233, None}
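A rough Python analogue of ToCharacterCode, with and without an encoding argument, is sketched here.

- The name to_character_code is hypothetical.
- It is meant for single-byte encodings such as the Macintosh roman and Windows stand-ins (mac_roman, cp1252).
- For this sketch, it marks characters missing from the encoding with None.

```python
from typing import List, Optional

# Hypothetical analogue of ToCharacterCode. With encoding=None it returns
# Unicode code points; otherwise one code per character (single-byte
# encodings intended), with None where the encoding has no code for it.
def to_character_code(s: str, encoding: Optional[str] = None) -> List[Optional[int]]:
    if encoding is None:
        return [ord(ch) for ch in s]
    codes: List[Optional[int]] = []
    for ch in s:
        try:
            codes.append(int.from_bytes(ch.encode(encoding), "big"))
        except UnicodeEncodeError:
            codes.append(None)  # no code for this character
    return codes

text = "abc\u00e9\u03c0"  # "abcéπ"
print(to_character_code(text))               # [97, 98, 99, 233, 960]
print(to_character_code(text, "mac_roman"))  # [97, 98, 99, 142, 185]
print(to_character_code(text, "cp1252"))     # [97, 98, 99, 233, None]
```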

The character codes used internally by Mathematica are based on Unicode. But externally, Mathematica by default uses plain ASCII sequences such as \[Name] or \:xxxx to refer to special characters. By telling it to use the raw "Unicode" character encoding, however, you can get Mathematica to read and write characters in raw 16-bit Unicode form.
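Raw 16-bit Unicode fits naturally with \:xxxx codes. For characters in the 16-bit range, each 16-bit unit has the same value as the hexadecimal code. This Python sketch assumes big-endian UTF-16 without a byte-order mark. The byte order Mathematica actually uses for its "Unicode" encoding is not stated here, and the helper unicode_units is hypothetical.

```python
from typing import List

# Assumption: big-endian UTF-16 ("utf_16_be") without a byte-order mark.
def unicode_units(s: str) -> List[str]:
    """Hex value of each 16-bit unit in the raw UTF-16 form of s."""
    raw = s.encode("utf_16_be")
    return [raw[i:i + 2].hex() for i in range(0, len(raw), 2)]

print(unicode_units("\u03b1\u03c0"))  # ['03b1', '03c0'], matching \:03b1 and \:03c0
```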