The Software Engineering of Mathematica

Mathematica is one of the more complex software systems ever constructed. It is built from several million lines of source code, written in C/C++, Java, and Mathematica.

The C code in Mathematica is actually written in a custom extension of C which supports certain memory management and object-oriented features. The Mathematica code is optimized using Share and DumpSave.

In the Mathematica kernel the breakdown of different parts of the code is roughly as follows: language and system: 30%; numerical computation: 20%; algebraic computation: 20%; graphics and kernel output: 30%.

Most of this code is fairly dense and algorithmic. Parts that are in effect simple procedures or tables take up relatively little code, since they tend to be written at a higher level, often directly in Mathematica.

The source code for the kernel, save a fraction of a percent, is identical for all computer systems on which Mathematica runs.

For the front end, however, a significant amount of specialized code is needed to support each different type of user interface environment. The front end contains about 700,000 lines of system-independent C++ source code, of which roughly 200,000 lines are concerned with expression formatting. Then there are between 50,000 and 100,000 lines of specific code customized for each user interface environment.

Mathematica uses a client-server model of computing. The front end and kernel are connected via MathLink—the same system as is used to communicate with other programs. MathLink supports multiple transport layers, including one based upon TCP/IP and one using shared memory.

The front end and kernel are connected via three independent MathLink connections. One is used for user-initiated evaluations. A second is used by the front end to resolve the values of Dynamic expressions. The third is used by the kernel to notify the front end of Dynamic objects which should be invalidated.
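The division of labor among the three connections can be illustrated with a toy, in-process model. This is a minimal Python sketch under stated assumptions, not a description of MathLink. The class and method names (`Kernel`, `FrontEnd`, `evaluate`, `resolve`, `process_invalidations`) are hypothetical. The three links are modeled as plain method calls and a queue rather than real connections. The dependency rule is also assumed: a Dynamic object is invalidated when a symbol it read is reassigned, and its dependency is recorded again each time it is re-resolved.

```python
from collections import defaultdict


class Kernel:
    """Toy kernel (hypothetical): stores symbol values and tracks which
    Dynamic objects have read each symbol."""

    def __init__(self):
        self.values = {}
        self.dependents = defaultdict(set)  # symbol -> ids of Dynamic objects that read it
        self.invalidation_queue = []        # stands in for link 3 (kernel -> front end)

    def evaluate(self, symbol, value):
        """Link 1 (assumed model): a user-initiated evaluation assigning a symbol."""
        self.values[symbol] = value
        # Every Dynamic object that read this symbol is now stale; notify the front end.
        for dyn_id in sorted(self.dependents.pop(symbol, set())):
            self.invalidation_queue.append(dyn_id)

    def resolve(self, dyn_id, symbol):
        """Link 2 (assumed model): the front end asks for a Dynamic object's value."""
        self.dependents[symbol].add(dyn_id)  # remember the dependency for later invalidation
        return self.values.get(symbol)


class FrontEnd:
    """Toy front end (hypothetical): displays Dynamic objects and refreshes stale ones."""

    def __init__(self, kernel):
        self.kernel = kernel
        self.dynamics = {}  # dyn_id -> symbol it displays
        self.display = {}   # dyn_id -> last resolved value

    def add_dynamic(self, dyn_id, symbol):
        self.dynamics[dyn_id] = symbol
        self.display[dyn_id] = self.kernel.resolve(dyn_id, symbol)

    def process_invalidations(self):
        """Drain the invalidation queue and re-resolve each stale Dynamic object."""
        queue = self.kernel.invalidation_queue
        while queue:
            dyn_id = queue.pop(0)
            self.display[dyn_id] = self.kernel.resolve(dyn_id, self.dynamics[dyn_id])


# Usage: a Dynamic view of x is refreshed after x is reassigned.
kernel = Kernel()
front_end = FrontEnd(kernel)
kernel.evaluate("x", 1)
front_end.add_dynamic("d1", "x")
kernel.evaluate("x", 5)
front_end.process_invalidations()
print(front_end.display["d1"])  # prints 5
```

In this model, the kernel only announces *which* Dynamic objects are stale over the notification channel. The front end decides when to ask for new values over the resolution channel, so neither kind of traffic has to be interleaved with user-initiated evaluations.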

Within the C code portion of the Mathematica kernel, modularity and consistency are achieved by having different parts communicate primarily by exchanging complete Mathematica expressions.
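The idea of components that agree on nothing but a shared expression format can be sketched briefly. The following Python is an illustration only: the real kernel is written in C, and its internal expression representation is not described here. The `Expr` class is modeled loosely on Mathematica's `head[arg1, arg2, ...]` structure. The two "components" (`numeric_plus`, standing in for a numerical part, and `full_form`, standing in for an output part) are hypothetical. Each accepts and returns complete expressions, and neither depends on the other's internals.

```python
from dataclasses import dataclass
from typing import Tuple, Union

Atom = Union[int, float, str]  # numbers, plus strings used as symbol names


@dataclass(frozen=True)
class Expr:
    """An immutable expression head[arg1, arg2, ...] (hypothetical representation)."""
    head: str
    args: Tuple["Node", ...] = ()


Node = Union[Expr, Atom]


def numeric_plus(e: Node) -> Node:
    """Hypothetical numerical component: combines the numeric arguments of Plus[...]."""
    if isinstance(e, Expr) and e.head == "Plus":
        nums = [a for a in e.args if isinstance(a, (int, float))]
        rest = [a for a in e.args if not isinstance(a, (int, float))]
        if len(nums) > 1:
            total = sum(nums)
            # Put the combined number first; return a bare number if nothing else remains.
            return Expr("Plus", (total, *rest)) if rest else total
    return e


def full_form(e: Node) -> str:
    """Hypothetical output component: renders any expression in FullForm-style notation."""
    if isinstance(e, Expr):
        return f"{e.head}[{', '.join(full_form(a) for a in e.args)}]"
    return str(e)


# The two components are connected only by the Expr value passed between them.
print(full_form(numeric_plus(Expr("Plus", (1, "x", 2)))))  # prints Plus[3, x]
```

Because the value passed between components is a complete, immutable expression, either side can be rewritten independently as long as it still consumes and produces well-formed expressions.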

Note, however, that even though different parts of the system are quite independent at the level of source code, they have many algorithmic interdependencies. Thus, for example, it is common for numerical functions to make extensive use of algebraic algorithms, or for graphics code to use fairly advanced mathematical algorithms embodied in quite different Mathematica functions.

Since the beginning of its development in 1986, about a thousand developer-years of effort have been spent directly on creating the source code for Mathematica. In addition, a comparable or somewhat larger effort has been spent on testing and verification.

The source code of Mathematica has changed greatly since Version 1 was released. The total number of lines of code in the kernel grew from 150,000 in Version 1 to 350,000 in Version 2, 600,000 in Version 3, 800,000 in Version 4, 1.5 million in Version 5, and 2.5 million in Version 6. In addition, existing code has been revised at every stage, so that Version 6 has only a small percentage of its code in common with Version 1.

Despite these changes in internal code, however, the user-level design of Mathematica has remained compatible from Version 1 on. Much functionality has been added, but programs created for Mathematica Version 1 will almost always run absolutely unchanged under Version 6.
