Introduction to Neural Nets

LeNet and MNIST

This tutorial gives a brief overview of the Wolfram Language neural net framework by showing how to train a net that takes an input image of a handwritten single-digit number and then predicts the number. The dataset we are training on is the classic MNIST dataset, and we will train a variant of LeNet, one of the first convolutional nets, which is already available in the Wolfram Neural Net Repository.

Obtain the MNIST dataset, which contains 60,000 training and 10,000 test images:
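A minimal sketch of this step, assuming the dataset is available from the Wolfram Data Repository under the resource name "MNIST" with elements called "TrainingData" and "TestData" (the variable names are illustrative):

```wolfram
(* Assumption: "MNIST" is the resource name; trainingData and testData are illustrative names *)
resource = ResourceObject["MNIST"];
trainingData = ResourceData[resource, "TrainingData"];
testData = ResourceData[resource, "TestData"];

(* Each element is a rule of the form image -> digit *)
Length /@ {trainingData, testData}

(* Look at a few random training examples *)
RandomSample[trainingData, 5]
```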
Display a few random examples from the training set:
Obtain a pre-trained version of LeNet from the Wolfram Neural Net Repository:
Classify a list of images using the pre-trained net:
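One way this could look, assuming the pre-trained model is published under the name "LeNet Trained on MNIST Data" and that the test set is used as a source of images (both the names and the sample size are illustrative):

```wolfram
(* Assumption: repository model name and data element name *)
lenetModel = NetModel["LeNet Trained on MNIST Data"];
testData = ResourceData["MNIST", "TestData"];

(* Take five random test examples and keep their images (the left-hand sides of the rules) *)
sample = RandomSample[testData, 5];
images = Keys[sample];

(* Applying the net to a list of images returns a list of predicted digits *)
lenetModel[images]

(* The true labels, for comparison *)
Values[sample]
```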

It is extremely easy to train a network like LeNet from scratch. NetTrain takes care of many details of the training process automatically, such as selecting an appropriate loss function, attaching encoders and decoders and choosing a batch size. Here is what it looks like.

Train LeNet from scratch:
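A hedged sketch of such a call. It assumes the untrained architecture can be obtained through the "UninitializedEvaluationNet" property of the repository model; the number of training rounds is an arbitrary choice for illustration:

```wolfram
trainingData = ResourceData["MNIST", "TrainingData"];
testData = ResourceData["MNIST", "TestData"];

(* Assumption: this property returns the architecture without trained parameters *)
untrained = NetModel["LeNet Trained on MNIST Data", "UninitializedEvaluationNet"];

(* NetTrain chooses the loss function, batch size and other details automatically;
   MaxTrainingRounds -> 3 is an illustrative setting, not a recommendation *)
trained = NetTrain[untrained, trainingData,
  ValidationSet -> testData, MaxTrainingRounds -> 3]
```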

However, this is not the end of the tutorial.

To give an overview of the fundamentals of deep learning in the Wolfram Language, we will now do this the hard way, by building LeNet out of its component layers, picking a loss function, defining a training network, attaching encoders and decoders, and finally training and evaluating the network. Understanding the general principles behind this particular task will put you well on the way to wielding the Wolfram Language to tackle sophisticated learning tasks easily and efficiently.

Layers

The simplest building blocks of neural nets are layers, which you can think of as simple functions that transform tensors (arrays of numbers).

Have a look at the pre-trained LeNet model again (we clicked the button on the display form to show the constituent layers):


The net is composed of a variety of layers, such as ConvolutionLayer and PoolingLayer. Each of these layers accomplishes a different task, in this case one related to computer vision.

Take a look at the last layer of the net.

Extract the last layer of the net using NetExtract:

We can see a lot of information about this layer. For example, it expects as input a vector of length 10 and returns a vector of the same length. As with any layer, we can apply this layer to an input to get an output:


The purpose of a SoftmaxLayer is to produce probabilities that sum to 1.

Sum up the previous output:
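The following sketch illustrates the claim about SoftmaxLayer. It assumes the repository model name used earlier and an arbitrary input vector:

```wolfram
(* Assumption: repository model name; -1 selects the last layer of the chain *)
softmax = NetExtract[NetModel["LeNet Trained on MNIST Data"], -1];

(* Apply the layer to an arbitrary vector of length 10 *)
probabilities = softmax[{1., 2., 3., 4., 5., 6., 7., 8., 9., 10.}]

(* The entries are non-negative and sum to 1, up to floating-point rounding *)
Total[probabilities]
```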

Let us construct a new layer.

Create a LinearLayer that takes as input a vector of length 2 and produces as output a vector of length 3:

Notice that in the summary box above, there is an "uninitialized" caption indicating that the layer contains learnable parameters whose values have not yet been provided.

Apply the uninitialized layer to an input vector. This will fail:

Only certain layers, for example, ConvolutionLayer and LinearLayer, have learnable parameters. In the display form, such layers are marked with a different icon from layers that do not contain any learnable parameters.

We can supply random values for the learnable parameters using NetInitialize.

Initialize the layer:
Apply the initialized layer to an input vector:
Obtain the weights and biases from the initialized layer:
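The steps from creating the layer to reading back its parameters can be sketched as follows. The input vector and variable names are illustrative, and the parameter names "Weights" and "Biases" are the ones LinearLayer uses:

```wolfram
(* A layer mapping length-2 vectors to length-3 vectors; its parameters are not yet set *)
linear = LinearLayer[3, "Input" -> 2];

(* NetInitialize gives the learnable parameters initial values *)
initialized = NetInitialize[linear];

x = {1., 2.};
initialized[x]

(* Extract the learnable arrays; Normal converts them to ordinary lists *)
w = Normal[NetExtract[initialized, "Weights"]];
b = Normal[NetExtract[initialized, "Biases"]];

(* A linear layer computes w.x + b, so this should agree with initialized[x] up to rounding *)
w . x + b
```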

So far, we have seen layers that have exactly one input. Some layers have more than one input. For example, MeanSquaredLossLayer compares two arrays, called the input and the target, and produces a single number that represents Mean[(input-target)^2].


The inputs of the layer are named and must be supplied in an association when the net is applied.

Apply the layer to two inputs:
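A small sketch of a two-input layer, using arbitrary example vectors. The port names "Input" and "Target" are those of MeanSquaredLossLayer:

```wolfram
loss = MeanSquaredLossLayer[];

input = {1., 2., 3.};
target = {1., 2., 5.};

(* Named inputs are supplied as an association *)
loss[<|"Input" -> input, "Target" -> target|>]

(* The same quantity computed directly; the two results should agree up to rounding *)
Mean[(input - target)^2]
```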

While some layers introduce functionality that is unique to the neural net framework, others mirror the functionality of existing Wolfram Language symbols. For example, FlattenLayer behaves similarly to Flatten, DotLayer behaves similarly to Dot, etc.

The full list of available layers is:


More Properties of Layers (Advanced)

This section summarizes a few key properties of neural net layers in the Wolfram Language.

Net Encoders

Fundamentally, because they must be differentiable, neural net layers operate on numeric tensors. However, we often want to train and use nets on other data, such as images, audio, text, etc. To do this, we can use a NetEncoder to translate this data to numeric tensors.

To translate the images of digits in the MNIST dataset, we can use an "Image" encoder. Let us first look at some simple examples of encoders in action.

Create an image NetEncoder that produces a 1×12×12 tensor:
Apply the image NetEncoder to an image:
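A sketch of an encoder in action, assuming the positional form of the "Image" encoder specification (size followed by color space) and using a random color image as a stand-in for real data:

```wolfram
(* Assumption: positional specification {"Image", size, colorspace} *)
encoder = NetEncoder[{"Image", {12, 12}, "Grayscale"}];

(* A 200x150 random RGB image stands in for a real photograph *)
image = RandomImage[1, {200, 150}, ColorSpace -> "RGB"];

(* The encoder resizes and converts the image before producing numbers;
   the dimensions of the result should be channels x height x width *)
Dimensions[encoder[image]]
```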

The image encoder conforms the image to have the specified colorspace, dimensions, etc. before it is converted to a tensor.

Apply the image NetEncoder to a large color image:

While encoders can be used independently from nets, as we have done, it is more common to attach the encoder to a layer. This can be done when creating the layer or afterward. Here is an example of creating a layer with an attached encoder.

Attach an image NetEncoder to a PoolingLayer via the "Input" option:
Apply the PoolingLayer directly to an image, which will use the image NetEncoder to translate the image to a tensor for PoolingLayer to operate on:
Convert the output back to an image:

The actual images in MNIST are grayscale images of size 28×28. Let us create our final image encoder now. Later, we will attach this NetEncoder when we construct LeNet from scratch.

Create an "Image" encoder for MNIST:
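A possible form of this encoder, together with a check against one MNIST image. The resource and element names are assumptions, as is the positional color space argument:

```wolfram
(* Grayscale 28x28 encoder for MNIST digits *)
mnistEncoder = NetEncoder[{"Image", {28, 28}, "Grayscale"}];

(* Pick one training image (the left-hand side of an image -> digit rule) *)
digit = First[Keys[RandomSample[ResourceData["MNIST", "TrainingData"], 1]]];

(* Compare the image size with the shape of the encoded array *)
ImageDimensions[digit]
Dimensions[mnistEncoder[digit]]
```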
The dimensions of this encoder match the dimensions of the images in the MNIST dataset:

Net Decoders

The output of a neural net is often a prediction. For a regression problem, this prediction is typically a point estimate, which means it is a single number representing the value the net thinks is most likely for the task. Such outputs do not typically need to be decoded.

For classification problems, however, the output of the net is typically a vector whose components represent the probability of each class. For example, a net that classifies images of foods as "hot dog", "pizza", or "salad" produces a vector with three components that sum to one, representing the probabilities of those three classes.

For these kinds of probability vector outputs, we typically care about the most likely class rather than the raw probabilities. To determine this, we have to know how classes are associated with particular vector components.

There are other properties we could also compute from a probability vector, such as the top n probabilities (if we have many classes), the probability of a specific class, or a measure of the uncertainty of the prediction.

To make these kinds of queries more convenient, a "Class" NetDecoder can be used to store the mapping between vector components and classes and thus automatically interpret the output of the net. Other types of NetDecoder are also possible, for converting the output to an Image, a Boolean value, etc., though we do not discuss them in detail in this tutorial.

For the MNIST task, the 10 classes we will use are the digits 0 through 9. Let us create an appropriate decoder.

Create a "Class" NetDecoder to interpret a vector of probabilities:

By default, the decoder will decode a probability vector as the most likely class (if this looks confusing, recall that the first class is actually 0).

Apply the decoder to a probability vector:

We can also compute other properties by supplying a named property as the second argument when applying the decoder.

Obtain a list of the most likely classes and their probabilities:
Obtain the probability of a specific class:
Obtain the full list of probabilities as an association:
Obtain a measure of the uncertainty of the prediction:
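The decoder queries described in this section can be sketched as follows. The probability vector is made up for illustration, and the probability of a single class is read from the "Probabilities" association rather than through a dedicated property:

```wolfram
(* Classes are the digits 0 through 9, so vector position 1 corresponds to class 0 *)
decoder = NetDecoder[{"Class", Range[0, 9]}];

(* Illustrative probability vector; position 2 (class 1) has the largest value *)
p = {0.05, 0.6, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.025, 0.025};

(* Default behavior: the most likely class *)
decoder[p]

(* Most likely classes together with their probabilities *)
decoder[p, "TopProbabilities"]

(* All probabilities as an association, and the entry for class 1 *)
all = decoder[p, "Probabilities"];
all[1]

(* Entropy as a measure of the uncertainty of the prediction *)
decoder[p, "Entropy"]
```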

As with NetEncoder, we can attach a NetDecoder to the output of a layer. Here is a more streamlined example of the PoolingLayer we showed before, in which an "Image" NetEncoder is used to interpret the input to the layer, and an "Image" NetDecoder is used to convert the final output of the layer back to an image.

Attach both a NetEncoder and a NetDecoder to a layer:
Applying the layer to an image will produce an image:

Containers

Single neural net layers are generally not useful by themselves. We usually need to combine multiple layers together to do something interesting.

The simplest way to combine layers is to chain them one after another, where the output of the first layer is used as the input for the next layer, and so on. We can use the NetChain container to connect layers in this way, but when more complex forms of connectivity are required, the NetGraph container should be used instead.

For now, let us create a simple chain.

Create a simple NetChain computing Cos[Sin[x]]:
Apply the NetChain to an input and compare the result to applying Sin followed by Cos to the same input:
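A minimal sketch of such a chain and the comparison, using an arbitrary input vector:

```wolfram
(* The first layer applies Sin elementwise, the second applies Cos to the result *)
chain = NetChain[{ElementwiseLayer[Sin], ElementwiseLayer[Cos]}];

x = {1., 2., 3.};

(* Output of the chain *)
chain[x]

(* The same composition with ordinary functions; nets compute in single precision,
   so agreement is expected only up to rounding *)
Cos[Sin[x]]
```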
Create a simple NetChain that consists of an ElementwiseLayer that applies the Ramp function, followed by a SoftmaxLayer:
Apply the NetChain to an input:
The result is the same as applying the individual layers in succession:

An important property of containers is that they act as layers and can even be used as layers within other containers. Let us see an example of that.

Nest the previous NetChain inside another NetChain:
Apply the chain to an input:
Flatten the nested chains together:

Previously, we saw the LeNet model, which was a more complex chain. Here is the code that constructs an uninitialized copy of LeNet.

Construct LeNet from scratch, supplying the previously constructed NetEncoder and NetDecoder:
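A sketch of what such a construction could look like. The layer sizes follow a commonly used LeNet variant and are an assumption here, as are the variable names; a random image stands in for an MNIST digit:

```wolfram
encoder = NetEncoder[{"Image", {28, 28}, "Grayscale"}];
decoder = NetDecoder[{"Class", Range[0, 9]}];

(* Assumption: layer sizes of a common LeNet variant *)
uninitializedLenet = NetChain[{
    ConvolutionLayer[20, 5], ElementwiseLayer[Ramp], PoolingLayer[2, 2],
    ConvolutionLayer[50, 5], ElementwiseLayer[Ramp], PoolingLayer[2, 2],
    FlattenLayer[], LinearLayer[500], ElementwiseLayer[Ramp],
    LinearLayer[10], SoftmaxLayer[]},
   "Input" -> encoder, "Output" -> decoder];

(* Give the learnable parameters initial values so the net can be applied *)
lenet = NetInitialize[uninitializedLenet];

(* With untrained parameters the prediction carries no meaning *)
image = RandomImage[1, {28, 28}];
lenet[image]
lenet[image, "TopProbabilities"]
```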

Note that the layers containing learnable parameters appear in red, indicating that they require initial values before the net can be applied to an input.

As a quick exercise, we will now randomly initialize LeNet and apply it to a sample input from MNIST. The output we get is, of course, also random, but serves to illustrate that things are working properly.

Randomly initialize the learnable parameters of LeNet:
Apply the initialized LeNet to an input image, producing a random classification:
Obtain a sorted list of the top probabilities:

Our ultimate goal, of course, is to teach this randomly initialized network to correctly classify handwritten digits.

Graphs

In order to train LeNet, we need to construct a training network that feeds individual training examples to LeNet. Each training example in the MNIST dataset consists of the combination of an input image and a corresponding target label.

Show a set of examples from MNIST:

NetChain does not allow a net to take more than one input, so we need to use NetGraph to build the training network. The task of the training network is to evaluate the prediction produced by LeNet, producing a small number if the prediction is good, and a large number if it is not. This is called a loss. It is best to think of it as a sort of proxy for prediction error.

Once we have this training network, we can use the NetTrain function to gradually modify the learnable parameters in the net so that the loss decreases over time.

For different learning tasks, different ways of computing the loss must be used. For a classification task such as classifying MNIST digits, a common choice of loss is the cross entropy loss. The layer CrossEntropyLossLayer can compute this loss when given both a prediction and a true label, or target.

The prediction we will use is in the form of a vector of probabilities, where each element of the vector represents the probability of the corresponding digit 0, 1, 2, etc. The target label is the index of the correct class (1 for digit 0, 2 for digit 1, etc.).

Here is a simple example of a CrossEntropyLossLayer being used to score a prediction.

Create a CrossEntropyLossLayer that compares an input prediction vector of length 5 with a target label:
Apply the loss layer to a prediction vector in which the first component has a large probability:
If the target is 1, the loss will be low, as the prediction has assigned high probability to the class 1:
If the target is 5, the loss will be high, as the prediction has assigned low probability to class 5:
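The scoring behavior described above can be sketched with a made-up prediction vector. With an index target, the cross entropy loss is the negative logarithm of the probability assigned to the target class:

```wolfram
(* "Index" means the target is given as the position of the correct class *)
loss = CrossEntropyLossLayer["Index", "Input" -> 5];

(* Illustrative prediction: most of the probability is on the first class *)
prediction = {0.9, 0.025, 0.025, 0.025, 0.025};

(* Low loss: the target class received high probability; compare with -Log[0.9] *)
loss[<|"Input" -> prediction, "Target" -> 1|>]

(* High loss: the target class received low probability; compare with -Log[0.025] *)
loss[<|"Input" -> prediction, "Target" -> 5|>]
```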

The training net we will construct is simple: it applies LeNet to the image of the digit to produce a prediction, and then it compares this prediction with the target class.

Construct a NetGraph by supplying a list of layers and connections. Inputs to the graph are connected to layers using the syntax NetPort["input"]->destination, and the inputs to the loss layer are connected via source->NetPort["loss","input"]:
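A sketch of such a training graph. The LeNet definition repeats an assumed common variant so that the example is self-contained; the names "lenet" and "loss" and the random test images are illustrative:

```wolfram
(* Assumption: layer sizes of a common LeNet variant, randomly initialized *)
lenet = NetInitialize@NetChain[{
     ConvolutionLayer[20, 5], ElementwiseLayer[Ramp], PoolingLayer[2, 2],
     ConvolutionLayer[50, 5], ElementwiseLayer[Ramp], PoolingLayer[2, 2],
     FlattenLayer[], LinearLayer[500], ElementwiseLayer[Ramp],
     LinearLayer[10], SoftmaxLayer[]},
    "Input" -> NetEncoder[{"Image", {28, 28}, "Grayscale"}],
    "Output" -> NetDecoder[{"Class", Range[0, 9]}]];

(* The image goes through LeNet; its prediction and the target index feed the loss layer *)
trainingNet = NetGraph[
   <|"lenet" -> lenet, "loss" -> CrossEntropyLossLayer["Index"]|>,
   {NetPort["Input"] -> "lenet" -> NetPort["loss", "Input"],
    NetPort["Target"] -> NetPort["loss", "Target"]}];

(* Feed two random images with target indices 1 and 5 (digits 0 and 4) *)
images = {RandomImage[1, {28, 28}], RandomImage[1, {28, 28}]};
trainingNet[<|"Input" -> images, "Target" -> {1, 5}|>]
```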

To test things out, let us use our training net, which contains a randomly initialized prediction LeNet, on a set of inputs and corresponding targets.

Feed a list of images to the "Input" port and a list of indices to the "Target" port:

These losses summarize how well LeNet did at predicting the targets when given the images. Because LeNet was randomly initialized, we expect it to be no better than chance (on average). During training, the learnable parameters in LeNet are gradually adjusted to bring the average loss down.

How is this accomplished? The key idea is to use gradients. These are adjustments to the learnable parameters that can be calculated through a process called backpropagation. These adjustments are derived so as to slightly reduce the average loss of the training net on a specific batch of examples.

By repeatedly selecting a batch of examples at random, calculating the adjustment, and applying it to the learnable parameters, the net gradually improves at the desired task.

This process is handled by NetTrain, which offers many ways to adjust and fine-tune the training process. But we can get some insight into the mechanism involved by calculating one of these gradients directly, using NetPortGradient.

Request the gradient produced by a specific input at the biases of the first convolution layer of LeNet:

Now that we have the gradient, we can actually modify the corresponding learnable parameter using this gradient, which reduces the loss on this example.

Modify the corresponding value of the training net by obtaining the bias value, adjusting it slightly using the gradient, and replacing it in the original net:
Compare the loss on the modified net with the original net. The loss has decreased:
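To keep the mechanism easy to follow, the sketch below uses a toy training graph (one LinearLayer plus a MeanSquaredLossLayer) instead of LeNet. The layer names, the example data and the step size 0.1 are all illustrative assumptions; the parameter path given to NetPortGradient follows the same convention as NetExtract:

```wolfram
(* Toy training graph: a linear layer followed by a mean squared loss *)
net = NetInitialize@NetGraph[
    <|"linear" -> LinearLayer[1, "Input" -> 2], "loss" -> MeanSquaredLossLayer[]|>,
    {NetPort["Input"] -> "linear" -> NetPort["loss", "Input"],
     NetPort["Target"] -> NetPort["loss", "Target"]}];

example = <|"Input" -> {1., 2.}, "Target" -> {3.}|>;

(* Gradient of the loss with respect to the biases of the linear layer *)
grad = net[example, NetPortGradient[{"linear", "Biases"}]];

(* Take a small step against the gradient and put the new biases back into the net *)
bias = NetExtract[net, {"linear", "Biases"}];
updated = NetReplacePart[net,
   {"linear", "Biases"} -> Normal[bias] - 0.1*Normal[grad]];

(* Loss before and after the step; the second value should be smaller *)
{net[example], updated[example]}
```

For this toy loss, the step shrinks the residual by a constant factor, so the loss on this example goes down whenever it was not already zero.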

Training

We are now ready to train our network with NetTrain.

Normally, NetTrain performs the construction of the training network automatically. For simple networks with one input and one output, it handles training data of the form {in_1->out_1, …} by feeding each in_i to the net and then comparing the output of the net with the corresponding out_i via a suitable loss function.

But because we have explicitly constructed a training network, we must provide the training data in the form <|"port"->list, …|>, where we explicitly feed the input ports of the training net with lists of data. So we must first convert the data we have, which is in the form of rules in->out.

As one additional complication, we must also account for the fact that the training data contains labels that are integers 0 to 9, whereas the "Target" input of our training net is expecting an index in the range 1 to 10. We could use a "Class" NetEncoder to convert them (which NetTrain would normally automate), but instead we will exploit the fact that we can just add 1 to accomplish the same thing.

Convert the training and test data into association form, using Keys and Values to obtain the images and the labels from the lists of rules in the training and test data:
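The conversion can be sketched as follows, assuming the data is loaded from the "MNIST" resource and that the training net has ports named "Input" and "Target" (variable names are illustrative):

```wolfram
trainingData = ResourceData["MNIST", "TrainingData"];
testData = ResourceData["MNIST", "TestData"];

(* Keys gives the images, Values gives the digit labels 0..9;
   adding 1 turns each label into a class index 1..10 *)
trainingAssoc = <|"Input" -> Keys[trainingData], "Target" -> Values[trainingData] + 1|>;
testAssoc = <|"Input" -> Keys[testData], "Target" -> Values[testData] + 1|>;

(* Show the first three entries of each port *)
Map[Take[#, 3] &, trainingAssoc]
```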
Show a small sample of the training association:

Let us now perform the training using NetTrain. Notice several things about this net example:

Evaluation

Now that we have trained our net, we can derive a lot more information about it. For example, we can obtain the overall accuracy on another dataset. We can also get useful summaries like the confusion matrix, which summarizes how the net misclassifies examples. And we can compare the performance of the net, which uses deep learning, to that of other common machine learning techniques.

Let us jump in by building a ClassifierMeasurementsObject that measures various properties of the net's classification behavior on a dataset. It is important to use a dataset we did not train on, so we use the test set that comes built into the MNIST dataset.

Use ClassifierMeasurements to obtain a ClassifierMeasurementsObject on the test set:

Now that we have the ClassifierMeasurementsObject, we can query for all sorts of properties efficiently.

Obtain the overall accuracy:
Obtain a confusion matrix:
Obtain a list of examples for which the loss was the highest:
Obtain a list of examples for which the net was least certain of the class:
Obtain a list of scores of how well various classes are classified:
Obtain a list of which misclassifications are most common:
Obtain the full set of available properties:
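The queries listed above could look like the following sketch. It uses the pre-trained repository model as a stand-in for the net trained in this tutorial, and the property names are the standard ClassifierMeasurements ones that appear to match each description:

```wolfram
(* Assumption: the pre-trained model stands in for the net trained above *)
net = NetModel["LeNet Trained on MNIST Data"];
testData = ResourceData["MNIST", "TestData"];

(* Measure the net on data it was not trained on *)
cm = ClassifierMeasurements[net, testData];

cm["Accuracy"]                  (* overall accuracy *)
cm["ConfusionMatrixPlot"]       (* confusion matrix *)
cm["WorstClassifiedExamples"]   (* examples with the highest loss *)
cm["LeastCertainExamples"]      (* examples with the least certain prediction *)
cm["FScore"]                    (* per-class scores *)
cm["TopConfusions"]             (* most common misclassifications *)
cm["Properties"]                (* all available properties *)
```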

The easiest way to compare the net to other methods is to use Classify, which automatically applies a range of common methods and chooses the best.

Use Classify to automatically pick an effective machine learning model:
Apply the classifier to an input:

The neural net outperforms the model selected by Classify on the test set.

Compare the classifier with the trained net:
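A sketch of the comparison, again using the pre-trained repository model as a stand-in for the trained net. Training Classify on the full dataset can take a while, and the accuracy it reaches depends on the method it selects:

```wolfram
trainingData = ResourceData["MNIST", "TrainingData"];
testData = ResourceData["MNIST", "TestData"];

(* Let Classify choose a method automatically *)
classifier = Classify[trainingData];

(* Assumption: the pre-trained model stands in for the net trained above *)
net = NetModel["LeNet Trained on MNIST Data"];

(* Accuracy of each model on the same held-out test set *)
ClassifierMeasurements[classifier, testData, "Accuracy"]
ClassifierMeasurements[net, testData, "Accuracy"]
```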
