Additional functionality related to this tutorial has been introduced in subsequent versions of the Wolfram Language. For the latest information, see Matrices and Linear Algebra.

Performance of Linear Algebra Computation

This tutorial covers issues related to performance.

Packed Arrays

One of the challenges of building Wolfram Language is to make sure that the system is general enough to support symbolic computation and fast enough so that numerical computation is efficient. These and other goals are described under "Design Principles of Wolfram Language". One technique that Wolfram Language provides is to use specialized representations for important types of Wolfram Language expression. In the case of linear algebra this involves what is known as Packed Array Technology, and it was first introduced in version 4.0.

An example shows a vector of real numbers:
Now insert a symbolic quantity into the vector. Updating elements of vectors and matrices is discussed under "Setting Pieces of a Matrix":
When the byte count is measured the vector is shown to use more memory:
In addition, a mathematical operation on the vector is much slower:

The first vector uses less memory and supports faster computations because it is built from a packed array. Packed arrays are a representation for rectangular tensors of machine integers, machine reals, and complex numbers with real and imaginary parts that are machine reals.

machinesized integers
machineprecision reals
machineprecision complex numbers

Packed array types.

There are a number of functions that are useful for working with packed arrays. Developer`PackedArrayQ tests if its argument is a packed array. This demonstrates that the result of the Table computation is a packed array:
If a real number is inserted into the array, the result is still packed:
If a computation is done with the packed array, the result is packed:
However, if a general symbolic quantity is inserted into the packed array, the result is not packed:
All the elements of the packed array must be of the same type. Therefore, if the integer 0 is inserted into a packed array of reals, the result is not packed:

Packed Array Functions

A number of functions for working with packed arrays are summarized.

Developer`PackedArrayQ[expr]return True if expr is a packed array
Developer`ToPackedArray[expr]convert expr to a packed array if possible
Developer`ToPackedArray[expr,t]convert expr to a packed array of type t if possible
Developer`FromPackedArray[expr]convert expr from a packed array
Developer`PackedArrayForm[expr]print packed arrays in expr in a summary form

Packed array functions.

The function Developer`ToPackedArray can be used to make a packed array:
If the elements of the list are not all of the same type the result will not be a packed array:
By specifying a type of Real, the integers are coerced to reals and the result is packed:
This shows that pack is a packed array of three reals:
This does not make a packed array because the argument is not a rectangular tensor:

Packed Array Operations

The important detail of packed arrays is that they work exactly like lists except for the packed array specific functions described in "Packed Array Functions". For example, SameQ on the packed and nonpacked version of a list will return True:
The packed and nonpacked versions will behave the same in the pattern matcher:

Many functions can work more efficiently with packed arrays compared with unpacked lists, but the results will be the same, whether a packed or an unpacked version is used. In order to maintain consistency, the system must, on occasion, unpack a packed array. Because this is a potential source of inefficiency, a message is available to let you know that something has become unpacked. You can enable this message with a system option.

This sets the system option and creates a packed array to use as an example:
Now if a real number is inserted into the packed array, it must be unpacked and a message is produced. This message can warn you that your code may not run as efficiently:
It is usually a good idea to turn the message off again:

Packed Array Summary

The main point about packed arrays is that Wolfram Language uses them automatically where appropriate. This is very much the case for the linear algebra functions, which naturally work with the same internal representation. Thus many functions return packed arrays even if the input is not packed. For example, the function Dot returns a packed array because internally it works with the packed version:

To make use of packed arrays, it is typically hard to exploit functions such as Developer`ToPackedArray. The main issue is to make sure that you have a uniformity of elements in your matrices; for example, that you are not mixing integers and reals.

Programming Efficiency

This section discusses techniques for writing efficient Wolfram Language programs that solve linear algebra problems.

Measuring Performance

If you want to study the performance of your Wolfram Language code there are a number of timing related functions you can use.

Timing[expr]evaluate expr and return a list of the CPU time needed, together with the result obtained
AbsoluteTiming[expr]evaluate expr giving the absolute time taken

Timing Wolfram Language operations.

The Timing function is particularly useful because it returns the CPU time needed for the computation:

Another useful function is TimeConstrained.

TimeConstrained[expr,t]try to evaluate expr, aborting the calculation after t seconds
TimeConstrained[expr,t,failexpr]return failexpr if the time constraint is not met

Timeconstrained calculation.

This helps you to develop and test your code, shutting it down if it takes more than a preset limit:

Vectorizing Loops

A common operation involves some type of iteration over the elements of a matrix. Because the matrices may be large it is obviously important to do this efficiently.

One of the most important ways to write efficient code is to avoid explicit part references, especially in inner loops. This is a major source of inefficiency in user code and can be avoided by using functions that work on all the elements of a matrix or a vector at once. These vectorized operations allow Wolfram Language to use highly optimized code that runs faster.

List Creation

This problem involves creating a list and then initializing it with certain values. For example, computing

        vi  Sin[i]

One way to do this would be to create the list and then to use a For loop that iterates over its elements to set them to a value.

    n = 10;    
    v = Table[0, {n}];
    For[i = 1, i <= n, i++, v[[i]] = Sin[2.0 Pi i]];

Slow way to initialize a list.

It is much faster to create the list and initialize it all at once; it is also neater.

    n = 10;
    v = Table[ Sin[2.0 Pi i], {i, 1, n}]

Faster way to initialize a list.

An even faster way is to use the optimized function Range to create the list and then to use vectorized operations.

    n = 10;
    v = Sin[2.0 Pi Range[1., n]]

An even faster way to initialize a list.

Commands such as Table and Range are very optimized for building lists and matrices. They are discussed under "Building Matrices".

List Updating

This problem involves updating the elements of a list; for example, computing the Sin of every element.

        vi  Sin[vi]

One way to do this involves iterating over the elements of the list replacing each with the updated version.

    Do[v[[i]] = Sin[v[[i]]], {i, n}];

Slow way to update a list.

It is much faster to compute the new values by using a vectorized operation. The code is also neater and tidier.

    v = Sin[v]

Faster way to update a list.

Another form of updating requires another argument that is not constant across the list.

        vi  vi/i^2

One way to do this involves iterating over the elements of the list, dividing each by the square of the index.

    n = Length[v];
    Do[ v[[i]] = v[[i]]/i^2 , {i, n}];

Slow way to update a list.

It is faster to compute the list of updates and then carry out a vectorized division.


Faster way to update a list.

The use of listable operations is discussed under "Listability".

Using Built-in Support

Wolfram Language has a very wide range of functions, including many for list processing that have important applications for writing matrix computations. Before you start to work it is worth looking to see if any of the built-in functions will do what you want. For example, if what you are doing involves a convolution or correlation then you should use the appropriate functions; they will almost certainly be faster.

Another important point to remember is that Wolfram Language can represent a wide range of mathematical objects. For example, Wolfram Language can work with the actual linear equations that matrices are used to represent. Sometimes the equational form can be more expressive and convenient and this can lead to greater efficiency. The topic is discussed under "Converting Equations to Sparse Arrays".

One area of built-in functions with many applications for linear algebra computations are those for structural manipulation of lists. Wolfram Language functions such as Part, Take, Drop, Transpose, PadLeft, PadRight, RotateLeft, and RotateRight are all heavily optimized and have many applications. These are discussed in more detail under "Structural Operations". It is usually good if you can write your computations to use these functions.

A worked example will now be given that discusses different techniques to extend a matrix by tiling copies. For example, the input matrix


should be padded out to return the following matrix.


Three different ways to solve the problem are demonstrated.

A Slow Way

The slowest way to do this is to write this in a step-by-step approach directly manipulating every element:

This way suffers from several speed deficiencies, the most severe being the inner loops over the part indices. These would be much faster if they were replaced by vectorized code. It was explained in "Measuring Performance" that explicit part references in an inner loop can lead to inefficient code.

One thing to note about the function is that it will work when given a sparse matrix as input, but it will return a dense matrix as output:

A Faster Way

A much faster way to replicate the matrix is to write it to use Wolfram Language vectorizing functions:

This method is much faster; it uses the function Part, which is very flexible and heavily optimized. It is described further under "Getting Pieces of a Matrix".

This function works for a sparse matrix as input, and it returns the result as a sparse array which maintains equivalent sparsity:

Also Fast but Neater

Another way to extend the matrix, which involves no programming, is to use the built-in function PadRight:

This method is also neater and is probably the fastest for small input matrices. For large matrices, it is similar to the speed of the Part example already shown. The use of PadRight is described under "Extending Matrices".

This function works for a sparse matrix as input, and it returns the result as a sparse array which maintains equivalent sparsity:

Matrix Contents

Matrices in Wolfram Language can be constructed from all the different types of object that Wolfram Language holds. They can contain machine-precision floating-point numbers, arbitrary-precision floating-point numbers, complex floating-point numbers, integers, rational numbers, and general symbolic quantities. The different types are discussed under "Matrix Types".

That Wolfram Language can work with these different types of matrix is a great strength of the system. However, it has a number of implications for the efficiency of numerical floating-point computation: first from a mix of symbolic and numerical entries, secondly from a mix of different types of numbers, and thirdly from integer matrices.

Mixed Symbolic/Numerical Matrices

If you mix symbolic and numerical elements in a matrix, the linear algebra functions will treat the matrix as symbolic as discussed under "Mixed Mode Matrices". It should be noted that techniques for working with symbolic matrices are typically slower than the heavily optimized techniques available for numerical matrices.

An example here compares matrix multiplication of two 200×200 matrices. The second computation, which involves a symbolic matrix, is much slower:

The solution is to make sure that your matrices only contain numbers.

Mixed Numerical Type Matrices

If you mix different types of numbers in your matrices the linear algebra functions will treat the matrix as numerical and will coerce the matrix to the lowest precision. This is discussed under "Mixed Mode Matrices". For linear algebra functions this will not have as severe an impact on performance as when working with symbolic matrices (this was demonstrated in the previous section).

This example compares matrix multiplication of two 200×200 matrices. The second matrix, which contains an integer, is slower for matrix/matrix multiplication. If the operation was something more costly, this difference might not be significant:
The cost of mixed numerical matrices comes because Wolfram Language cannot use its efficient storage techniques, as discussed in the section "Packed Arrays". This is demonstrated in the following example. Here, a vector of numbers is allocated and this is updated by some process. Because the updated values will be real numbers and the initial vector contained integers, the process cannot use packed arrays:
In this version the vector is initialized with real numbers. Now the computation can use packed arrays throughout and it runs faster:

The solution to this problem is to make sure you initialize with real numbers if you are going to work with real numbers.

Integer Matrices

A final type of problem involves working with purely integer matrices. Wolfram Language makes a distinction between working with integer and approximate real matrices; this is discussed under "Matrices in Wolfram Language". Because an integer matrix will use symbolic computation techniques, it will be slower. Of course, if Wolfram Language did not use integer matrix techniques with matrices of integers it would not be so general:
The difference between the two computations can be compared. The eigenvalues of the integer matrix are returned with the exact values; in this case this involves radicals. The eigenvalues of the real matrix are returned with approximate real values:

Of course, it is a strength of Wolfram Language that it can return the exact result for a computation involving an integer matrix. But it should be noted that many purely numerical systems would return the approximate result.

The solution is that to work with matrices of approximate real numbers, you should start with approximate real numbers.

Expression Efficiency

"Matrices as Wolfram Language Expressions" explained that Wolfram Language represents matrices as expressions, a uniform data type that is used throughout the system. It described the advantages that came from this uniformity in the system. There are a number of efficiency implications that arise from the use of Wolfram Language expressions. This section discusses these.

Updating of Matrices

A general principle of Wolfram Language expressions is that they cannot be changed directly. This immutability of expressions is very useful; it allows an expression to be shared in many locations and each use need not be concerned that some other part of the code will modify the expression. This means that when an expression is updated (this is described under "Setting Pieces of a Matrix"), it may be necessary to make a copy. In this example, a vector is made and assigned to two different symbols:
Now change the contents of one symbol:
However, the other one is left unchanged. This is because the expression was copied before modification:
This copying can have an implication for efficiency, especially if you are iterating over the elements of a matrix and updating it. If this is done and you also have other copies of the matrix, it will need to be copied. The principle is demonstrated in the following two examples. In this first example there are no extra copies of the vector and so it does not need to be copied and the loop runs faster:
In this example there are extra copies of the vector and so it does need to be copied at each step; the loop runs more slowly:
Another slow way to update elements is to use ReplacePart. This also has to copy at every step:

If you use the vectorizing operations described previously, this is a good way to avoid the iteration over the matrix and minimize the number of copying operations. If you really cannot avoid iterating over the matrix, it is a good idea to keep the loop as simple and tidy as possible so that you can avoid extra copies of the matrix.

The time for the overall process will scale with the square of the length of the list for the technique that has to copy at every step. With no copying the time will scale linearly. The longer the input list, the greater the impact, and the performance of the code will start to degrade.

Appending to Matrices

In Wolfram Language, expressions are implemented as arrays. This allows fast access to any particular elements of an expression. Because matrices are regular Wolfram Language expressions, they are also arrays. While arrays are very fast for accessing elements, they are slower for adding elements. Typically this requires a copy, and it should be avoided. The example that follows shows the cost of adding elements.

It is much more efficient to generate the entire vector at once:
If the arguments are not known all at once, other techniques may be used. These are demonstrated next. First, build a list of random numbers:
The first technique is using AppendTo:
It is much faster to accumulate the list, inserting empty lists that can be removed at the end with Flatten:
It may be inconvenient to use Flatten because this might interfere with the structure of the list. In this case it might be convenient to use Sow and Reap. This is more flexible than the Flatten approach:
All three methods generate the same result:

The time for the overall process will scale with the square of the length of the list for the technique that has to copy at every step. With no copying the time will scale linearly. The longer the input list, the greater the impact, and the performance of the code will start to degrade.