# Computer Arithmetic Package

The arithmetic used by *Mathematica* is a mixture of variable-precision software arithmetic and whatever is provided by the manufacturer of the floating-point hardware (or the designer of the compiler, if there is no floating-point hardware). If you want to learn about the basic ideas of computer floating-point arithmetic in general or examine the machine arithmetic on your machine, you can use . This allows you to examine arithmetic with various bases, precisions, and rounding rules.

ComputerNumber[x] | convert the ordinary number x to a computer number in the arithmetic currently in effect |

ComputerNumber[sign,mantissa,exp] | form the computer number with sign sign, mantissa mantissa, and exponent exp |

the complete data object that makes up a computer number | |

NaN | a nonrepresentable number in the current arithmetic |

Computer numbers and non-numbers in .

Much of the information carried around in the data object that makes up a computer number is redundant. In particular, the first three arguments contain exactly the same information as the fourth argument. The redundancy exists partly for the sake of efficiency and partly to allow the user access to the various fields. The fifth argument has nothing to do with the computer number itself. It instead represents what the value of the number would be without the cumulative effects of all the roundoff errors that went into the computer number. It is computed using *Mathematica*'s high-precision arithmetic and can generally be regarded as the correct value of the number. Comparing the computer number with this number gives the error in the computer number.

In[1]:= |

In[2]:= |

Out[2]= |

In[3]:= |

Out[3]= |

In[4]:= |

Out[4]= |

In[5]:= |

Out[5]//InputForm= | |

In[6]:= |

Out[6]= |

In[7]:= |

Out[7]= |

SetArithmetic[n] | set the arithmetic to be n digits, base 10 |

SetArithmetic[n,b] | set the arithmetic to be n digits, base b |

Arithmetic[] | give the parameters of the arithmetic currently in effect |

Changing the type of arithmetic to be used.

In[8]:= |

Out[8]= |

In[9]:= |

Out[9]= |

In[10]:= |

Out[10]= |

There are several options that can be used when setting the arithmetic. In addition to changing the precision and the base, you can control the type of rounding used, the magnitude of the numbers allowed, whether mixed-mode arithmetic is to be allowed, and whether division is to be done with a single rounding operation.

option name | default value | |

RoundingRule | RoundToEven | the type of rounding to be used |

ExponentRange | {-50,50} | the range of exponents that is to be allowed |

MixedMode | False | whether mixed-mode arithmetic is to be allowed |

IdealDivision | False | whether correctly rounded division is to be used |

Options for SetArithmetic.

It should be noted that correctly rounded division is implemented by the function IdealDivide. This function can be used whether or not the arithmetic is set to automatically use correct division. It is difficult to get to give correctly rounded division since the is converted to before the package ever sees it. To get around this, $PreRead is defined to convert into x~IdealDivide~y before the parser has a chance to change it to . If you want to use $PreRead for your own purposes this will interfere with your definition.

RoundToEven | round to the nearest representable number and, in the case of a tie, round to the one represented by an even mantissa |

RoundToInfinity | round to the nearest representable number and, in the case of a tie, round away from 0 |

Truncation | simply discard excess digits, much as Floor does for positive numbers |

In[11]:= |

Out[11]= |

In[12]:= |

Out[12]= |

In[13]:= |

Out[13]= |

There are many things about computer arithmetic that are very different from ordinary arithmetic. Most of these differences are quite easy to demonstrate.

In[14]:= |

Out[14]= |

In[15]:= |

Out[15]= |

In[16]:= |

Out[16]= |

In[17]:= |

Out[17]//FullForm= | |

In[18]:= |

Out[18]//FullForm= | |

In[19]:= |

Out[19]//FullForm= | |

In[20]:= |

Out[20]//FullForm= | |

In[21]:= |

Out[21]= |

In[22]:= |

Out[22]= |

In[23]:= |

Out[23]= |

In[24]:= |

Out[24]= |

In[25]:= |

Out[25]= |

In[26]:= |

Out[26]= |

In[27]:= |

Out[27]= |

In[28]:= |

Out[28]= |

In[29]:= |

Out[29]= |

In[30]:= |

Out[30]= |

In[31]:= |

Out[31]= |

In[32]:= |

Out[32]= |

In[33]:= |

Out[33]= |

In[34]:= |

In[35]:= |

Out[35]= |

In[36]:= |

Out[36]= |

*Mathematica* uses both the machine arithmetic that is provided by the machine and its own arbitrary-precision arithmetic, which is the same on all machines (except for the variations in the size of the largest representable number and the amount of memory on the machine). This package also deals with the floating-point arithmetic provided by the machine.

Numbers on a computer comprise a discrete set. There are gaps between the numbers and when you do arithmetic the result often is not representable. When the result falls between two representable numbers, the best that can be done is to use the closest representable number as the result instead of the correct result. The set of numbers that can be represented on a computer in floating-point format is commonly referred to as the set of machine numbers. This package allows you to vary certain fundamental parameters that control the arithmetic.

The difference between two consecutive machine numbers is called an ulp (unit in the last place, i.e., one digit in the least significant place). The size of an ulp varies depending on where you are in the set of machine numbers. Between 1 and 2 an ulp is equal to $MachineEpsilon. Between 2 and 4 it is equal to 2$MachineEpsilon. Ideally no function should ever return a result with error exceeding half of an ulp since the distance from the true result to the nearest machine number is always less than half of an ulp, the worst case being when it is exactly halfway between two machine numbers. It is relatively easy to achieve this ideal for the four arithmetic operations and for the square root function. For the elementary transcendental functions it becomes more difficult due to the fact that they cannot be evaluated in a single operation and the combined effects of all the rounding errors would have to be made less than half an ulp. Nevertheless, it is quite possible to design algorithms for these functions such that the error is never more than one ulp.

The Computer Arithmetic Package provides four functions that are useful in examining the machine floating-point arithmetic on your machine.

Ulp[a] | give the size of an ulp near a |

MachineError[f,x->a] | give the error in evaluating f at using machine arithmetic |

MicroscopePlot[f,{x,a}] | plot f near using machine arithmetic |

MicroscopicErrorPlot[f,{x,a}] | plot the error in evaluating f near using machine arithmetic |

Functions for examining machine floating-point arithmetic.

The default neighborhood of the point used in MicroscopePlot and MicroscopicErrorPlot is from 30 ulps to the left of a to 30 ulps to the right of a, where the ulp used is that defined at , that is, Ulp[a]. Near powers of 2 and near 0 it is not clear what should be done since the size of an ulp changes. Near powers of 2 it is not much of a problem in the abscissa since an ulp is chosen to be the smaller of the two values and the resulting machine numbers just get included multiple times due to rounding effects. You can change the default size of the neighborhood by including a third value in the neighborhood specification.

A problem arises with MicroscopePlot and MicroscopicErrorPlot near places where the function value becomes infinite. Since the function is changing by orders of magnitude very rapidly, it is difficult to choose a good scale that displays the information in which you are interested. In such cases you may have to examine the function several hundreds of ulps away from the point you really want.

MicroscopePlot[f,{x,a,n}] | plot f from x=a-n Ulp[a] to x=a+n Ulp[a] using machine arithmetic |

MicroscopicErrorPlot[f,{x,a,n}] | plot the error in evaluating f from x=a-n Ulp[a] to x=a+n Ulp[a] using machine arithmetic |

Controlling the size of the neighborhood.

The option Joined controls the way the plot is drawn. It cannot be overemphasized that the functions discussed here map the machine numbers to the machine numbers. For purposes here, the real numbers in between do not even exist. Thus the plots produced by MicroscopePlot and MicroscopicErrorPlot should be nothing but a set of plotted points. However, it is easier to see what is happening if the points are joined with straight line segments. Finally, if you want to see what a function does as a map from the real numbers to the real numbers, that is, rounding the real argument to the nearest machine number and applying the function to this result, you can do so, but when you see large errors, do not conclude that the function is to blame; it cannot do any better than deal with the machine numbers that it gets as arguments.

option name | default value | |

Joined | True | the way the points are to be connected; either True, False, or Real |

Option for MicroscopePlot and MicroscopicErrorPlot.

In[37]:= |

Out[37]= |

In[38]:= |

Out[38]= |

In[39]:= |

Out[39]= |

In[40]:= |

In[41]:= |

Out[41]= |

In[42]:= |

Out[42]= |

In[43]:= |

Out[43]= |

In[44]:= |

Out[44]= |

In[45]:= |

Out[45]= |

In[46]:= |

Out[46]= |

In[47]:= |

Out[47]= |

In[48]:= |

Out[48]= |

In[49]:= |

Out[49]= |

In[50]:= |

Out[50]= |

In[51]:= |

Out[51]= |

In[52]:= |

Out[52]= |

One point needs to be made clear, since rounding errors are often misunderstood and blame is incorrectly placed on an algorithm that has nothing to do with the error. Algorithms are designed to map the set of machine numbers into the set of machine numbers. To consider other numbers does not make sense. As far as the computer is concerned there *are* no other numbers. So, for example, if you want to find the square root of , you might do it as follows.

In[53]:= |

Out[53]= |

In[54]:= |

Out[54]= |

In[55]:= |

Out[55]= |

In[56]:= |

Out[56]= |

Here the error is less than half an ulp. However, this error is due to both the square root function and the rounding error incurred in converting the number to the machine number . In fact, the square root function has an error of less than 0.1 ulps.

In[57]:= |

Out[57]= |

In[58]:= |

Out[58]= |