Computer Arithmetic Package

The arithmetic used by the Wolfram Language is a mixture of variable-precision software arithmetic and whatever is provided by the manufacturer of the floating-point hardware (or the designer of the compiler, if there is no floating-point hardware). If you want to learn about the basic ideas of computer floating-point arithmetic in general or examine the machine arithmetic on your machine, you can use ComputerArithmetic`. This allows you to examine arithmetic with various bases, precisions, and rounding rules.

ComputerNumber[x]
  convert the ordinary number x to a computer number in the arithmetic currently in effect
ComputerNumber[sign, mantissa, exp]
  form the computer number with sign sign, mantissa mantissa, and exponent exp
ComputerNumber[sign, mantissa, exp, value, x]
  the complete data object that makes up a computer number
NaN
  a nonrepresentable number in the current arithmetic

Computer numbers and non-numbers in ComputerArithmetic.

Much of the information carried around in the data object that makes up a computer number is redundant. In particular, the first three arguments contain exactly the same information as the fourth argument. The redundancy exists partly for the sake of efficiency and partly to allow the user access to the various fields. The fifth argument has nothing to do with the computer number itself. It instead represents what the value of the number would be without the cumulative effects of all the roundoff errors that went into the computer number. It is computed using the Wolfram Language's high-precision arithmetic and can generally be regarded as the correct value of the number. Comparing the computer number with this number gives the error in the computer number.
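As a rough illustration in Python (an analogy, not the package itself), the five-argument data object can be modeled with exact rationals standing in for the high-precision "correct value". The four-digit base-10 arithmetic and all names here are assumptions for the sketch:

```python
from fractions import Fraction

# Hypothetical mirror of the five-argument ComputerNumber data object:
# (sign, mantissa, exp, value, exact).  Four digits, base 10, assumed.
BASE, DIGITS = 10, 4

def computer_number(x):
    """Round nonzero x to DIGITS significant base-BASE digits, keeping
    both the rounded value and the exact value for error comparison."""
    exact = Fraction(x)
    if exact == 0:
        return (1, 0, 0, Fraction(0), exact)
    sign = -1 if exact < 0 else 1
    mag = abs(exact)
    # Scale so the mantissa has exactly DIGITS digits.
    exp = 0
    while mag >= BASE**DIGITS:
        mag /= BASE
        exp += 1
    while mag < BASE**(DIGITS - 1):
        mag *= BASE
        exp -= 1
    mantissa = round(mag)  # Python's round is half-even, like RoundToEven
    value = Fraction(sign * mantissa) * Fraction(BASE)**exp
    return (sign, mantissa, exp, value, exact)

sign, mantissa, exp, value, exact = computer_number(Fraction(2, 3))
error = value - exact  # the rounding error carried by the number
```

Comparing the fourth field with the exact fifth field gives the accumulated rounding error directly, just as described above.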

This loads the package.
Here is the computer number representing the ordinary number 2.
This gives the computer number representation of .
Arithmetic works with computer numbers just as with ordinary numbers.
Here is the complete structure of c.
You can also enter just the sign, mantissa, and exponent.
But if your input does not make sense in the arithmetic currently in effect, you get NaN. Here the problem is that the mantissa is only three digits long.
SetArithmetic[n]
  set the arithmetic to be n digits, base 10
SetArithmetic[n, b]
  set the arithmetic to be n digits, base b
Arithmetic[]
  give the parameters of the arithmetic currently in effect

Changing the type of arithmetic to be used.

The default arithmetic is four digits in base 10 with a rounding rule of RoundToEven. With the default ExponentRange of {-50, 50}, only numbers with magnitudes between 10^-50 and 10^50 are allowed. Mixed-mode arithmetic is not allowed, and division is not done with correct rounding: it is performed as two operations, forming the reciprocal and then multiplying by it, and each operation contributes its own rounding error.
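The effect of rounding division twice can be sketched in Python with the decimal module (an analogy for the package's four-digit base-10 arithmetic, not the package itself):

```python
from decimal import Decimal, getcontext, ROUND_HALF_EVEN

# Four significant digits, base 10, round-half-even: the defaults above.
ctx = getcontext()
ctx.prec = 4
ctx.rounding = ROUND_HALF_EVEN

x, y = Decimal(2), Decimal(3)

# Division as two rounded operations: reciprocal, then multiplication.
two_step = x * (Decimal(1) / y)  # 1/3 -> 0.3333, then 2*0.3333 -> 0.6666

# Division with a single rounding, as correctly rounded division would do.
ideal = x / y                    # 2/3 -> 0.6667

print(two_step, ideal)           # the results differ in the last digit
```

The two-step result is off by one ulp, exactly the kind of discrepancy the package's IdealDivide is designed to avoid.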
Now the arithmetic is set to be six digits in base 8.
The result is displayed in octal (with trailing zeros suppressed).

There are several options that can be used when setting the arithmetic. In addition to changing the precision and the base, you can control the type of rounding used, the magnitude of the numbers allowed, whether mixed-mode arithmetic is to be allowed, and whether division is to be done with a single rounding operation.

option name    default value

RoundingRule   RoundToEven
  the type of rounding to be used
ExponentRange  {-50, 50}
  the range of exponents that is to be allowed
MixedMode      False
  whether mixed-mode arithmetic is to be allowed
IdealDivision  False
  whether correctly rounded division is to be used

Options for SetArithmetic.

It should be noted that correctly rounded division is implemented by the function IdealDivide. This function can be used whether or not the arithmetic is set to use correct division automatically. It is difficult to make x/y itself give correctly rounded division, since x/y is converted to x y^(-1) before the package ever sees it. To get around this, $PreRead is defined to convert x/y into x~IdealDivide~y before the parser has a chance to change it to x y^(-1). If you use $PreRead for your own purposes, this definition will interfere with yours.

RoundToEven
  round to the nearest representable number and, in the case of a tie, round to the one represented by an even mantissa
RoundToInfinity
  round to the nearest representable number and, in the case of a tie, round away from 0
Truncation
  simply discard excess digits, much as Floor does for positive numbers

Types of rounding available.
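The three rules can be imitated with Python's decimal rounding modes (an analogy: ROUND_HALF_UP rounds ties away from zero like RoundToInfinity, and ROUND_DOWN discards digits like Truncation):

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP, ROUND_DOWN

# A tie case: 2.5 is exactly halfway between the representable
# integers 2 and 3, so the rounding rule decides the result.
tie = Decimal('2.5')
one = Decimal('1')

round_to_even     = tie.quantize(one, rounding=ROUND_HALF_EVEN)  # -> 2
round_to_infinity = tie.quantize(one, rounding=ROUND_HALF_UP)    # -> 3
truncation        = tie.quantize(one, rounding=ROUND_DOWN)       # -> 2
```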

Mixed-mode arithmetic (any arithmetic operation involving an integer and a computer number) is not allowed.
This turns on mixed-mode arithmetic (and sets the arithmetic to six digits in hexadecimal).
Now the product is computed.

There are many things about computer arithmetic that are very different from ordinary arithmetic. Most of these differences are quite easy to demonstrate.

This sets the arithmetic back to the default.
Expressions are evaluated numerically before they become computer numbers.
If the separate parts are converted to computer numbers first, catastrophic cancellation of digits can occur.
Summing the terms from smallest to largest gives one answer.
As a general rule, it is better to sum from smallest to largest. You can see what the error is by comparing the mantissa (the second argument) to the correct value (the last argument).
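A four-digit decimal sketch in Python (using the decimal module as a stand-in for the package's arithmetic) makes the effect of summation order vivid:

```python
from decimal import Decimal, getcontext, ROUND_HALF_EVEN

# Four-digit decimal arithmetic, as in the package's default setting.
getcontext().prec = 4
getcontext().rounding = ROUND_HALF_EVEN

# 10000 copies of 0.0001 (which total exactly 1) plus the term 1.
terms = [Decimal('0.0001')] * 10000 + [Decimal('1')]

# Smallest to largest: the tiny terms accumulate to 1.000 before
# meeting the large term, giving the correct total.
small_first = sum(terms, Decimal(0))            # -> 2.000

# Largest to smallest: 1.000 + 0.0001 rounds back to 1.000 every
# time, so all 10000 small terms are lost.
large_first = sum(reversed(terms), Decimal(0))  # -> 1.000
```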
Here is an example where summing from largest to smallest gives the better result.
The difference is slight, and such examples are rare.
Basic arithmetic is all that is implemented in the package. You could easily extend things to include elementary functions.
Here is the square root of 47.
It is a theorem that correctly rounded square roots of small integers will always square back to the original integer if the arithmetic is correct.
But a similar theorem for cube roots does not exist.
This changes the arithmetic to seven digits in base 10 with a rounding rule of RoundToEven and an exponent range of -50 to 50.
This number rounds down.
This number rounds up because it rounds toward the mantissa that is even (1000000) rather than the one that is odd (9999999).
Again it rounds toward the even mantissa (1000000 rather than 1000001).
The reciprocal of the reciprocal is not the original number; in fact it may be quite different.
This is multiplication by the reciprocal. It involves two rounding operations.
This is true division. It uses a single rounding operation. You can get this better type of division to work with / by setting the option IdealDivide to True when you use SetArithmetic.
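That the reciprocal of the reciprocal need not return the original number can be checked in a four-digit decimal sketch in Python (an analogy, not the package's code):

```python
from decimal import Decimal, getcontext, ROUND_HALF_EVEN

# Four-digit decimal arithmetic.
getcontext().prec = 4
getcontext().rounding = ROUND_HALF_EVEN

x = Decimal(7)
r = Decimal(1) / x     # 1/7 -> 0.1429, already rounded
back = Decimal(1) / r  # 1/0.1429 -> 6.998, not 7
```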
Normal converts a computer number back to an ordinary rational number with exactly the same value.
Now you change the arithmetic again.
This suppresses error messages that will result from plotting non-numeric values in the following examples.
It is easy to plot the error in the computer number representation of each number.
You can zoom in to see the hole at zero.

The Wolfram Language uses both the floating-point arithmetic provided by the machine and its own arbitrary-precision arithmetic, which behaves the same on all machines (apart from variations in the size of the largest representable number and the amount of memory available). This package also deals with the floating-point arithmetic provided by the machine.

Numbers on a computer comprise a discrete set. There are gaps between the numbers and when you do arithmetic the result often is not representable. When the result falls between two representable numbers, the best that can be done is to use the closest representable number as the result instead of the correct result. The set of numbers that can be represented on a computer in floating-point format is commonly referred to as the set of machine numbers. This package allows you to vary certain fundamental parameters that control the arithmetic.

The difference between two consecutive machine numbers is called an ulp (unit in the last place, i.e., one digit in the least significant place). The size of an ulp varies depending on where you are in the set of machine numbers. Between 1 and 2 an ulp is equal to $MachineEpsilon; between 2 and 4 it is equal to 2 $MachineEpsilon. Ideally no function should ever return a result with error exceeding half an ulp, since the distance from the true result to the nearest machine number is always at most half an ulp, the worst case being when the true result is exactly halfway between two machine numbers. It is relatively easy to achieve this ideal for the four arithmetic operations and for the square root function. For the elementary transcendental functions it is more difficult, since they cannot be evaluated in a single operation and the combined effect of all the rounding errors would have to be kept below half an ulp. Nevertheless, it is quite possible to design algorithms for these functions whose error never exceeds one ulp.
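For IEEE binary arithmetic the same facts can be checked directly in Python, where math.ulp plays the role of Ulp and sys.float_info.epsilon the role of $MachineEpsilon:

```python
import math
import sys

# Between 1 and 2 an ulp equals the machine epsilon; the ulp
# doubles at each power of 2.
ulp_at_1 = math.ulp(1.0)
assert ulp_at_1 == sys.float_info.epsilon  # $MachineEpsilon analog
assert math.ulp(2.0) == 2 * ulp_at_1       # ulp doubles past 2
assert math.ulp(4.0) == 4 * ulp_at_1       # and doubles again past 4
```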

The Computer Arithmetic Package provides four functions that are useful in examining the machine floating-point arithmetic on your machine.

Ulp[a]
  give the size of an ulp near a
MachineError[f, x -> a]
  give the error in evaluating f at x = a using machine arithmetic
MicroscopePlot[f, {x, a}]
  plot f near x = a using machine arithmetic
MicroscopicErrorPlot[f, {x, a}]
  plot the error in evaluating f near x = a using machine arithmetic

Functions for examining machine floating-point arithmetic.

The default neighborhood of the point x = a used in MicroscopePlot and MicroscopicErrorPlot extends from 30 ulps to the left of a to 30 ulps to the right of a, where the ulp used is the one defined at x = a, that is, Ulp[a]. Near powers of 2 and near 0 it is not clear what should be done, since the size of an ulp changes there. Near powers of 2 this causes little trouble in the abscissa: the ulp is chosen to be the smaller of the two values, and the resulting machine numbers simply get included multiple times due to rounding effects. You can change the default size of the neighborhood by including a third value in the neighborhood specification.

A problem arises with MicroscopePlot and MicroscopicErrorPlot near places where the function value becomes infinite. Since the function is changing by orders of magnitude very rapidly, it is difficult to choose a good scale that displays the information in which you are interested. In such cases you may have to examine the function several hundreds of ulps away from the point you really want.

MicroscopePlot[f, {x, a, n}]
  plot f from x = a - n Ulp[a] to x = a + n Ulp[a] using machine arithmetic
MicroscopicErrorPlot[f, {x, a, n}]
  plot the error in evaluating f from x = a - n Ulp[a] to x = a + n Ulp[a] using machine arithmetic

Controlling the size of the neighborhood.
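The sampling such a neighborhood implies can be sketched in Python: the 2n+1 consecutive machine numbers centered on a, stepped off with math.nextafter. (This illustrates the idea only; it is not the package's implementation.)

```python
import math

def machine_neighborhood(a, n=30):
    """Return the 2n+1 consecutive machine numbers centered on a,
    in increasing order, by stepping one machine number at a time."""
    points = [a]
    lo = hi = a
    for _ in range(n):
        lo = math.nextafter(lo, -math.inf)  # next machine number down
        hi = math.nextafter(hi, math.inf)   # next machine number up
        points = [lo] + points + [hi]
    return points

# 61 consecutive doubles around 7.0, the default 30-ulp neighborhood.
pts = machine_neighborhood(7.0, 30)
```

Evaluating a function at exactly these points, and nowhere in between, is what "using machine arithmetic" means in the plots above.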

The option Joined controls the way the plot is drawn. It cannot be overemphasized that the functions discussed here map the machine numbers to the machine numbers; for present purposes, the real numbers in between do not even exist. Thus the plots produced by MicroscopePlot and MicroscopicErrorPlot should, strictly speaking, be nothing but a set of plotted points. It is easier to see what is happening, however, if the points are joined with straight line segments. Finally, if you want to see what a function does as a map from the real numbers to the real numbers (that is, rounding the real argument to the nearest machine number and then applying the function to the result), you can do so. But when you see large errors, do not conclude that the function is to blame: it cannot do any better than deal with the machine numbers it gets as arguments.

option name    default value

Joined         True
  the way the points are to be connected; either True, False, or Real

Option for MicroscopePlot and MicroscopicErrorPlot.

Here is the size of an ulp at 1.
The size of an ulp changes at powers of 2.
There is a large hole near 0.
Here are the combined errors of rounding to the nearest machine number and then taking the square root.
Here is the error of just the square root.
For another number close to , the error in the square root function is less than half an ulp.
The function is not provided by the computer manufacturer, so it is evaluated as a nested sequence of operations that are provided. This shows how much error is involved in the sequence, but this information is rarely useful. For testing purposes you normally want to look at an individual function evaluated at a machine number to avoid the effect of an ill-conditioned function magnifying prior error.
Here is the Log function evaluated at machine numbers near 7.
Here is the Log function evaluated at 10+1+10 machine numbers near 7 with the points joined by straight lines.
And here are the combined effects of rounding real numbers to the nearest machine number and taking the Log of the result. This is not a valid test of the Log function.
This shows the error in evaluating the sine function at 31 machine numbers near 16. Note how the size of an ulp changes at 16. Note also that rounding has moved some points to coincide so that there appear to be fewer than 31 points. The scale on the vertical axis represents ulps.
Here is the error in evaluating sine at 21 machine numbers near 31 with the points joined by straight lines.
And here are the combined effects of rounding real numbers to the nearest machine number and taking the sine of the result. This is not a valid test of the sine function. The error in evaluating the sine function at a machine number near 31 is much smaller than the combination of errors resulting from first rounding a real number to the nearest machine number and then taking the sine of that machine number.
Here you see how bad the square root function really is. It appears that near its error is bounded by about ulps.
You cannot examine this function exactly at 1 since it is singular there. You can, however, examine it near 1. The label for is printed as because the label is limited to 6 characters.

One point needs to be made clear, since rounding errors are often misunderstood and blame is incorrectly placed on an algorithm that has nothing to do with the error. Algorithms are designed to map the set of machine numbers into the set of machine numbers. To consider other numbers does not make sense. As far as the computer is concerned there are no other numbers. So, for example, if you want to find the square root of , you might do it as follows.

Start with a good numerical approximation to .
This finds the nearest machine number to a.
This finds the square root of the machine number b using the machine-precision algorithm for square root.
Here the error in c is measured in ulps. This result gives the error for the particular machine on which this example was run. Your machine may give a very different result.

Here the error is less than half an ulp. However, this error is due to both the square root function and the rounding error incurred in converting the number a to the machine number b. In fact, the square root function has an error of less than 0.1 ulps.

Here is the correct value for the square root of b.
This shows the error in the square root function.
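The same measurement can be made in Python, in the spirit of MachineError: evaluate the machine square root at a machine number and compare it with a high-precision reference value, expressing the difference in ulps. (A sketch under IEEE double arithmetic, not the package's code.)

```python
import math
from decimal import Decimal, getcontext

# High-precision reference arithmetic, 40 digits.
getcontext().prec = 40

b = 2.0                        # a machine number, exactly representable
c = math.sqrt(b)               # machine-precision square root of b

reference = Decimal(b).sqrt()  # the "correct" value of sqrt(b)
error_ulps = (Decimal(c) - reference) / Decimal(math.ulp(c))

# IEEE 754 requires square root to be correctly rounded, so the
# error measured this way is at most half an ulp.
assert abs(error_ulps) <= Decimal('0.5')
```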