Data Structures—Wolfram Documentation

Wolfram Language & System Documentation Center

Data Structures

Non-polymorphic	Operations
Polymorphic

The Wolfram Compiler contains definitions for a variety of data structures. These are the same data structures that are available in the top-level Wolfram Language; they can be seen with a call to $DataStructures;

This section will look at data structures and working with them in compiled code. Many of the concepts here also apply to any product type created with the compiler. What is special about the Wolfram Compiler data structures is that they have already been written, so are ready to use without programmers having to build their own. They can be used in many areas, including building more specialized data structures, which are often built on top of data structures such as these.

The data structures included with the Wolfram Compiler provide various interesting opportunities. One feature is that many of them are mutable, i.e. the state of an instance can be modified. This is in contrast to regular Wolfram expressions, which are immutable other than through the value of expression symbols.

It is also possible to divide data structures between those that are polymorphic and those that are not polymorphic, i.e. they have types that do not take arguments. Examples of these are "BitVector", "BloomFilter" or "Counter". This is in contrast with polymorphic data structures, which have types that take arguments. An example of these is "DynamicArray", which is polymorphic in the data type it contains. This means you can have a dynamic array of expressions or a dynamic array of integers, and so on. "HashTable" is polymorphic in both the type of the keys and the values, so you can have a hash table with keys that are strings and values that are integers.

The discussion here will start with non-polymorphic data structures, since they are simpler. They will use "BitVector" data structure and explore how it can be used in compiled code.

This creates a bit vector:

You can set bits in the bit vector:

You can test bits to see if they are set. The 5 ^th bit was set, so this returns True:

This shows the mutability of the bit vector data structure. Setting the bit changed a specific instance of the data structure.

The 7 ^th bit was not set, so this returns False:

One important use of a bit vector is as a compressed list of Boolean values. For example, the following function iterates through a list; if a test on an element is True, then a bit is set in a bit vector:

This creates a list and generates a bit vector to see which elements are even:

You can extract the elements with the Pick function. Generating a bit vector and then using Pick is like Select:

You can compile the code for testing the list and returning a bit vector. This could be done by referencing the declaration that was made. This declaration takes a container and a function. The function type is written to use TypeEvaluate of Information to extract the type contained in the container:

Alternatively, it would also be possible to put the code for testList into the declaration like below:

This compiles a call to the testing function for a packed array of integers and using EvenQ as the test:

This runs the function on the list that was created above:

This runs Pick on the bit vector that was created to return the same result as above:

Interpreted code and compiled code are returning the same result. However, their relative performance is not the same. This can be demonstrated with a larger sample:

This is the interpreted version:

This is the compiled version:

So the compiled code has a very significant speed advantage.

The compiled version benefits from compilation of the test function. Since this is known to take machine integers, it can go straight to code that works for machine integers. This is not the case for the interpreted code. In addition, the return of a data structure (in this case a bit vector) from the compiled code is particularly efficient.

Non-polymorphic

Some data is not polymorphic. This means that its types do not take any parameters. This section will discuss data structures that are not polymorphic, using "BitVector" as an example.

Type

The type of a data structure is given by the name of the data structure. For a bit vector, the type is "BitVector".

Here, a compiled function that returns a bit vector is created. It uses CreateDataStructure to create the bit vector:

This shows the return type of the function. It is "BitVector":

This function takes a bit vector and returns it. It uses the type "BitVector" for the argument:

Here a bit vector is created by using CreateDataStructure:

When the compiled code is called with the bit vector argument, it gets returned:

Creating

As seen in the previous examples, a data structure can be created in compiled code by using CreateDataStructure:

This runs the compiled code, which returns a bit vector:

The data structure that is created is the same as one created in a top-level call to the Wolfram evaluator:

Operations

Many of the features of data structures are provided by their operations. For example, bit vectors support a range of operations, and these can be seen with Information:

These operations can be called at the top level:

They can also be called in compiled code:

This creates a new bit vector and passes it to the compiled code:

The bit vector modified in evaluated code is the same as that modified in compiled code:

The way that top-level and compiled functionality work in similar ways such as this is one of the strengths of working with the Wolfram Compiler.

You can see the functionality available to the "BitVector" type in the compiler using CompilerInformation:

Polymorphic

Many data structures are polymorphic. This means that they are parametrized in at least one type. For example, container data structures are parametrized by the type that they contain. This adds great power, because you have containers that are optimized to contain a particular type of object. It does add several new issues that have to be dealt with.

The "DynamicArray" data structure will be used as an example of a polymorphic data structure.

This creates a dynamic array data structure:

This appends an element to the data structure:

This defines a function that takes a data structure, an element and a test. If the test returns True, then the element is appended to the data structure:

This creates a dynamic array and iterates over a list, testing the elements:

These are the elements that are larger than 3:

This can be compiled with a suitable declaration. Notice that the type of the elements is given as "InertExpression". The test function is also set to take this type:

A compiled function can be created:

This runs the compiled code:

The reason that "InertExpression" was used for the element type is that the data structure was a dynamic array of "InertExpression" type. In fact, all of the polymorphic data structures are created for a parameter of "InertExpression". This is obviously very sensible: for a top-level feature, one would want it to work in the widest sense possible, and this would be for expressions. However, there are other dynamic arrays, those that contain other types.

You can see more of this if you compile some code that takes a "DynamicArray" and a machine integer and carry out an append operation:

Looking at the type, you can see that the dynamic array is one that takes integers, not expressions:

If you call the compiled code with a dynamic array created at top-level it fails. This is because the dynamic array created at the top level is one that contains expressions, whereas this compiled code requires a dynamic array of integers:

In Compiled Code

You can easily work with polymorphic data structures in compiled Wolfram code. In the following code, a "HashSet" data structure is created and then used to store distinct random numbers. The total number of entries in the hash set is returned:

For 58 random numbers, this shows how many were different:

This makes a plot of the results:

The key thing about this example is that the compiled code made use of the same set of data structures that are available in the top level using the same set of operations, even though the contents were different (integers instead of expressions).

Creating

It is useful to understand how polymorphic data structures are created in compiled code. Here is an example that uses CreateDataStructure:

The return type is actually a "DynamicArray" that contains "InertExpression":

Calling the function returns a dynamic array that is just the same as if it had been created in the top level:

You could equally well use the full type in CreateDataStructure:

You could also use the full type in the top level as well:

In compiled code, you can specify the full type for content that is not an expression. Here a dynamic array that contains the "Integer64" type is created:

This cannot be done in the top level:

In compiled code, if you create a dynamic array and then use it, it will adapt to the way it is being used. For example, in this case, appending a real number results in a dynamic array of "Real64":

You can use the function and get back a reference to a dynamic array, but it is not a dynamic array of "InertExpression", and it does not have all the nice behavior available for dynamic arrays of expressions:

For example, Information does not know about this object:

The result of the compiled code is equivalent to the result of a function that returns a type product with "DefaultSerializable", as was seen earlier. More techniques for working with these types of data structures will be seen in a following section.

Default Arguments

The Wolfram Compiler treats types for polymorphic data structures such as "DynamicArray" by giving them a default argument.

In the following, if the argument is not specified, then it will become an "InertExpression":

You can call the compiled code with a dynamic array created at the top level:

You can also give the fully qualified type, as in the following:

However, if the type was used with some other contents, then it will adapt to that. In the following, the type is given as "DynamicArray", but an "Integer64" is appended to it. This means that it is inferred to be a dynamic array of integers:

Being able to provide a default argument for these types is quite useful.

Non-expression Contents

The previous sections showed that polymorphic data structures could be made for types other than "InertExpression". However, they do not have all the nice support provided for the built-in data structures (which are created for "InertExpression"). This section shows what you can do with them.

First, you can, of course, work with them in purely internal code. The example given earlier showed how this can be done. The point is that you have a data structure that is optimized for the type you want to work with. If you want a hash set of integers, this will have faster operations than a hash set of expressions, because integers have a lower overhead than expressions.

These benefits can also be taken over to work with a mixture of compiled and evaluated code that makes working with the Wolfram Compiler in the Wolfram Language a powerful system.

This compiled code returns a hash set of integers. The only way to create a hash set of integers is returning from compiled code like this:

When the function it called, it returns a reference to the data structure:

To use it, you have to write other compiled functions. It is quite useful to write these as one collection of functions by compiling an association, as in the following:

Now you have code that can create a hash set:

You can test it for membership of an integer:

You can insert an integer:

Now the integer is found:

One thing that makes it different from the hash set created for expressions is that it will only work with integers. If you give it a real number, then there is an error:

If there is other functionality that you need, you can add it with a simple compiled function.

Operations

The previous section discussed how data structures can be passed in and out of compiled code. It also showed how they can be worked with directly in the Wolfram Language evaluator. This section will discuss some of the ways they can be used.

Iterating over Elements

Many of the data structures work by storing collections of data. They may store them in some type of order for ease of access, for example, "DynamicArray", "LinkedList", "HashSet", "HashTable", "Queue", "Stack" or "RedBlackTree".

One feature that the Wolfram Compiler provides is to support iteration over these data structures. This is conveniently set up by a Do loop.

The following compiles a Do loop that iterates over the elements in a "DynamicArray":

Create a dynamic array to use for demonstrating:

Call the function with the array, and the elements are printed out:

This lends itself very well to a polymorphic function declaration. The following is very similar to the previous example, except that it does not say what exactly the type of the argument is:

This compiles for the dynamic array:

The elements can be printed out like previously:

This uses the same declaration, but now it is being passed a "RedBlackTree":

Here, a red-black tree is created, and a number of insertions are made:

Calling the function on the red-black tree prints the elements:

The declaration also works for polymorphic data structures that do not contain expressions. For example, this function prints the elements of a "HashSet" of integers:

It is necessary to create the hash set with some compiled code:

Now the hash set can be created and fed to the function, which prints its elements:

Many of the functional constructs, such as Scan or Fold, are implemented using techniques like this.

Mutability

Many of the Wolfram Language data structures can be modified mutably. This means that their internal state can be changed by some of their operations. This stands in contrast to most Wolfram Language expressions, which cannot be mutated other than a limited of cases, such as through the values of symbols.

For an example, here is a list of numbers assigned to a symbol sym:

The ReplacePart function returns a new list with a modified element:

However, the value of the symbol has not changed. This is an example of expression immutability:

For another example, a new symbol sym1 is assigned to the previous symbol:

Now an assignment to the original symbol is made:

The value of the symbol has changed, and this is a form of mutability. It is the main form of mutability for Wolfram Language expressions:

However, the symbol that had been assigned has not changed its value:

By way of contrast, here is a "Value" data structure:

There is no equivalent of the ReplacePart function, but there are operations for setting the value stored.

This makes an assignment to the data structure:

Now a new value is set:

The data structure has the new value:

The data structure from the assignment also has the new value:

In fact, any product type created with the Wolfram Compiler can choose to have state-mutating operations. When operations for data structures are designed, they are often chosen to be mutating.

Updating Elements

One form of state-mutating operations is for data structures that contain collections of data to update individual elements. You can see examples of this in the top-level interface to data structures, which is often a good way to learn and experiment.

This example will use a "FixedArray" of strings. It should be noted that there is a dedicated data structure for working with lists of strings: "StringVector". A string vector is optimized for holding large collections of small strings, but it does not allow mutation of its contents. If you need a data structure that holds a collection of strings, which is fast to pass in out of the Wolfram evaluator and which allows mutation, then a "FixedArray" of strings is probably useful.

First, a definition of a function that creates an array of strings:

Here is a function that returns the elements from an array of strings:

Now an array of strings for a given length is created:

When the elements are extracted, they are seen to be empty strings. This uses InputForm to show that they are zero-length strings:

This function allows updating of one element. Note how it makes an assignment to a local variable before assigning the elements. This inconvenience is due to a weakness in the compiler, but has no efficiency effect:

In a loop in top-level Wolfram Language, the elements are updated:

This shows how the elements are updated.

Now a function that swaps elements is created and applied to the array:

The elements show that the array was updated in place, with the first two elements swapping positions:

This function updates all the elements in a loop. It uses a loop over the elements but also uses a counter to allow in-place updating of the elements:

The elements are all modified:

These examples show how the contents of container data structures can be updated in a variety of different ways.

Top

More Learning

Tech Support

Wolfram Solutions

Wolfram Solutions For Education

Get Started

Grow Your Skills

Work with Us

Educational Programs for Adults

Educational Programs for Youth

Read

Data Structures

Non-polymorphic

Type

Creating

Operations

Polymorphic

In Compiled Code

Creating

Default Arguments

Non-expression Contents

Operations

Iterating over Elements

Mutability

Updating Elements