Computer Vision
MNIST Digit Classification
Train a digit recognizer on the MNIST database of handwritten digits using a convolutional neural network.
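A sketch of the setup this section assumes; the "MNIST" resource name and the LeNet-style architecture are stand-ins, not necessarily the tutorial's exact cells:

    (* obtain training and test data from the Wolfram Data Repository *)
    trainingData = ResourceData["MNIST", "TrainingData"];
    testData = ResourceData["MNIST", "TestData"];

    (* a LeNet-style convolutional net; the decoder maps the output
       probability vector to the digit classes 0 through 9 *)
    lenet = NetChain[{
        ConvolutionLayer[20, 5], Ramp, PoolingLayer[2, 2],
        ConvolutionLayer[50, 5], Ramp, PoolingLayer[2, 2],
        FlattenLayer[], 500, Ramp, 10, SoftmaxLayer[]},
       "Output" -> NetDecoder[{"Class", Range[0, 9]}],
       "Input" -> NetEncoder[{"Image", {28, 28}, "Grayscale"}]];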
Train the network for three training rounds. NetTrain will automatically attach a CrossEntropyLossLayer using the same classes that were provided to the decoder:
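One way this cell might look, using the net and data sketched above:

    trained = NetTrain[lenet, trainingData, MaxTrainingRounds -> 3]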
Use NetMeasurements to test the classification performance of the trained net on the test set:
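For example:

    NetMeasurements[trained, testData, "Accuracy"]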
CIFAR-10 Object Classification
Using the CIFAR-10 database of labeled images, train a convolutional net to predict the class of each object. First, obtain the training data:
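A sketch, assuming the "CIFAR-10" resource in the Wolfram Data Repository; the intervening model-definition and training cells are not shown in this excerpt, so a minimal stand-in net is included here because later steps use a trained net:

    cifarTrain = ResourceData["CIFAR-10", "TrainingData"];
    cifarTest = ResourceData["CIFAR-10", "TestData"];
    RandomSample[cifarTrain, 5]

    (* a small stand-in convolutional classifier, not the tutorial's exact net *)
    classes = Union[Values[cifarTrain]];
    cifarNet = NetChain[{
        ConvolutionLayer[32, 3, "PaddingSize" -> 1], Ramp, PoolingLayer[2, 2],
        ConvolutionLayer[64, 3, "PaddingSize" -> 1], Ramp, PoolingLayer[2, 2],
        FlattenLayer[], 256, Ramp, Length[classes], SoftmaxLayer[]},
       "Input" -> NetEncoder[{"Image", {32, 32}}],
       "Output" -> NetDecoder[{"Class", classes}]];
    trainedCifar = NetTrain[cifarNet, cifarTrain, MaxTrainingRounds -> 10];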
From a random sample, select the images for which the net produces the highest- and lowest-entropy predictions. High-entropy inputs can be interpreted as those for which the net is most uncertain about the correct class:
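One possible realization, using the stand-in net trained above:

    (* score each sampled image by the entropy of its predicted class distribution *)
    imgs = RandomSample[cifarTest[[All, 1]], 500];
    entropy[img_] := With[{p = Values@trainedCifar[img, "Probabilities"]},
       -p . Log[p + $MachineEpsilon]];  (* epsilon guards against zero probabilities *)
    sorted = SortBy[imgs, entropy];
    lowestEntropy = Take[sorted, 5];   (* most confident predictions *)
    highestEntropy = Take[sorted, -5]; (* most uncertain predictions *)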
Learning an Embedding of Digits
Train a net to embed images of digits so that digits with the same label are close together. Create a training set by sampling pairs of images, associating a pair with True if the labels are different and with False if the labels are the same:
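A sketch of one way to build and train on the pairs, reusing the MNIST trainingData from above; the pair format expected by NetPairEmbeddingOperator and the embedding architecture are assumptions:

    (* True means the two digits have different labels *)
    samplePair[] := With[{a = RandomChoice[trainingData], b = RandomChoice[trainingData]},
       {First[a], First[b]} -> UnsameQ[Last[a], Last[b]]];
    pairs = Table[samplePair[], {10000}];

    (* embed each digit image as a point in the plane; NetPairEmbeddingOperator
       trains the shared net on pairs using a contrastive loss *)
    embeddingNet = NetChain[{
        ConvolutionLayer[20, 5], Ramp, PoolingLayer[2, 2],
        FlattenLayer[], 100, Ramp, 2},
       "Input" -> NetEncoder[{"Image", {28, 28}, "Grayscale"}]];
    trainedPairNet = NetTrain[NetPairEmbeddingOperator[embeddingNet], pairs];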
Apply the network to a list of pairs of digits to compute their distances under the embedding. Digits with the same label have small distances:
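For example:

    pairSample = Keys@RandomSample[pairs, 5];
    trainedPairNet[pairSample]  (* one distance per pair; same-label pairs give small values *)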
Take a sample of digits, compute their embeddings, and plot them. Digits with the same label are clustered under the learned embedding:
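A sketch; the "Net" part name used to recover the inner net from the trained operator is an assumption:

    embedder = NetExtract[trainedPairNet, "Net"];
    sample = RandomSample[trainingData, 500];
    groups = GroupBy[sample, Last -> (embedder[First[#]] &)];
    ListPlot[Values[groups], PlotLegends -> Keys[groups]]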
Style Transfer
Create a new image with the content of one image and the style of another. This implementation follows the method described in Gatys et al., "A Neural Algorithm of Artistic Style".
To create an image that mixes the content of one image with the style of the other, start by obtaining a pre-trained image classification network:
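A sketch, with placeholder test images standing in for the tutorial's content and style images:

    (* placeholder content and style images; any two images will do *)
    contentImg = ExampleData[{"TestImage", "House"}];
    styleImg = ExampleData[{"TestImage", "Mandrill"}];

    (* pre-trained classifier whose internal activations define content and style features *)
    vgg = NetModel["VGG-16 Trained on ImageNet Competition Data"];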
Three loss functions are used. The first ensures that the content of the synthesized image is similar to that of the content image:
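In this sketch, the content loss is simply a mean-squared error between the feature arrays of the synthesized and content images:

    contentLoss = MeanSquaredLossLayer[];  (* ports: "Input", "Target"; output: "Loss" *)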
The second loss ensures that the style of the synthesized image is similar to that of the style image. Style similarity is defined as the mean-squared difference between the Gram matrices of the input and target:
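A sketch of a Gram-matrix style loss for a feature array of dimensions {c, h, w}; the port wiring is one possible construction, not necessarily the tutorial's:

    (* flatten {c, h, w} features to a c x (h w) matrix X, then form the
       c x c Gram matrix X . Transpose[X] *)
    gramNet[{c_, h_, w_}] := NetGraph[
       {ReshapeLayer[{c, h w}], TransposeLayer[1 <-> 2], DotLayer[]},
       {1 -> 2, {1, 2} -> 3}];

    (* mean-squared difference between the Gram matrices of input and target features *)
    styleLoss[dims_] := NetGraph[
       <|"gram1" -> gramNet[dims], "gram2" -> gramNet[dims],
         "mse" -> MeanSquaredLossLayer[]|>,
       {NetPort["Input"] -> "gram1", NetPort["Target"] -> "gram2",
        "gram1" -> NetPort[{"mse", "Input"}],
        "gram2" -> NetPort[{"mse", "Target"}]}];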
The third loss ensures that the magnitude of intensity changes across adjacent pixels in the synthesized image is small. This helps the synthesized image look more natural:
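A sketch of such a total variation-style loss for a 3-channel image, using fixed per-channel difference kernels (these weights are frozen later via LearningRateMultipliers):

    tvLoss = With[{
        dx = Table[If[o == i, {{1., -1.}}, {{0., 0.}}], {o, 3}, {i, 3}],
        dy = Table[If[o == i, {{1.}, {-1.}}, {{0.}, {0.}}], {o, 3}, {i, 3}]},
       NetGraph[<|
         "dx" -> ConvolutionLayer[3, {1, 2}, "Weights" -> dx, "Biases" -> None],
         "dy" -> ConvolutionLayer[3, {2, 1}, "Weights" -> dy, "Biases" -> None],
         "absx" -> ElementwiseLayer[Abs], "absy" -> ElementwiseLayer[Abs],
         "sumx" -> SummationLayer[], "sumy" -> SummationLayer[],
         "total" -> TotalLayer[]|>,
        {NetPort["Input"] -> "dx" -> "absx" -> "sumx",
         NetPort["Input"] -> "dy" -> "absy" -> "sumy",
         {"sumx", "sumy"} -> "total"}]];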
Define a function that creates the final training net for any content and style image. This function also creates a random initial image:
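A sketch that, for simplicity, fixes the working resolution at VGG-16's native 224x224 rather than adapting the net to arbitrary image dimensions; the VGG layer names ("relu1_2", "pool1", "relu3_2") are assumptions about this NetModel:

    (* split VGG so an early activation is the style feature and a deeper one the content feature *)
    featureNet = NetGraph[<|
        "block1" -> NetReplacePart[NetTake[vgg, {1, "relu1_2"}], "Input" -> None],
        "block2" -> NetTake[vgg, {"pool1", "relu3_2"}]|>,
       {"block1" -> "block2",
        "block1" -> NetPort["Style"],
        "block2" -> NetPort["Content"]}];

    createTransferNet[] := NetGraph[<|
        "image" -> NetArrayLayer["Array" -> RandomReal[{0., 1.}, {3, 224, 224}]],
        "features" -> featureNet,
        "content" -> contentLoss,
        "style" -> styleLoss[{64, 224, 224}],  (* relu1_2: 64 channels at full resolution *)
        "tv" -> tvLoss|>,
       {"image" -> "features", "image" -> "tv",
        NetPort[{"features", "Content"}] -> NetPort[{"content", "Input"}],
        NetPort[{"features", "Style"}] -> NetPort[{"style", "Input"}],
        NetPort["ContentFeature"] -> NetPort[{"content", "Target"}],
        NetPort["StyleFeature"] -> NetPort[{"style", "Target"}],
        "content" -> NetPort["ContentLoss"],
        "style" -> NetPort["StyleLoss"],
        "tv" -> NetPort["TVLoss"]}];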
Define a NetDecoder for visualizing the predicted image:
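For example:

    imageDecoder = NetDecoder[{"Image"}];  (* interprets a {3, h, w} array as an RGB image *)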
The training data consists of features extracted from the content and style images. Define a feature extraction function:
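A sketch, computing the fixed content and style targets once at the working resolution; the single-element lists make a one-example training dataset:

    imgEncoder = NetEncoder[{"Image", {224, 224}}];
    extractFeatures[content_, style_] := <|
       "ContentFeature" -> {featureNet[imgEncoder[content]]["Content"]},
       "StyleFeature" -> {featureNet[imgEncoder[style]]["Style"]}|>;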
Create the training net whose input dimensions correspond to the content and style image dimensions:
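In this fixed-resolution sketch, the step reduces to evaluating the two functions defined above (the tutorial's version instead resizes the net to the image dimensions):

    transferData = extractFeatures[contentImg, styleImg];
    trainingNet = createTransferNet[];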
When training, the three losses are weighted differently to set the relative importance of the content and style. These values might need to be changed with different content and style images. Create a loss specification that defines the final loss as a combination of the three losses:
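The weights below are placeholders, not tuned values:

    lossSpec = {"ContentLoss" -> Scaled[1.], "StyleLoss" -> Scaled[0.5],
       "TVLoss" -> Scaled[10.]};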
Optimize the image using NetTrain. LearningRateMultipliers are used to freeze all parameters in the net except for the NetArrayLayer. The training is best done on a GPU, as it will take up to an hour to get good results with CPU training. The training can be stopped at any time via Evaluation ▶ Abort Evaluation:
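A sketch of the training call; the optimizer settings and round count are assumptions:

    result = NetTrain[trainingNet, transferData, lossSpec,
       LearningRateMultipliers -> {"image" -> 1, _ -> None},
       BatchSize -> 1, MaxTrainingRounds -> 300,
       Method -> {"ADAM", "LearningRate" -> 0.05},
       TargetDevice -> "GPU"];  (* use "CPU" if no compatible GPU is available *)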
Extract the final image from the NetArrayLayer of the trained net:
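For example (the "Array" part name of NetArrayLayer is an assumption):

    finalImage = imageDecoder@NetExtract[result, {"image", "Array"}]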
Semantic Segmentation on a Toy Text Dataset
Train a net that classifies every pixel in an image of a word as being part of the background or part of one of the letters a through z.
First, generate training and test data, which consists of images of words and the corresponding "mask" integer matrices that label each pixel:
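A simplified toy generator, not the tutorial's exact code: render a random word, then recover a per-pixel mask by re-rendering the word with one letter visible at a time (overlapping glyphs are ignored for simplicity):

    (* render word at a fixed 128x32 size; invisible letters stay white to preserve layout *)
    renderWord[word_String, visible_] := ImageCrop[
       Rasterize[Style[
          Row@MapIndexed[
            Style[#1, If[visible === All || First[#2] == visible, Black, White]] &,
            Characters[word]], FontSize -> 24], "Image"],
       {128, 32}, Padding -> White];

    makeExample[] := Module[{word, ids, mask},
       word = StringJoin@RandomChoice[CharacterRange["a", "z"], 5];
       ids = LetterNumber /@ Characters[word];  (* classes 1..26 for a..z *)
       mask = Total@Table[
          ids[[i]] (1 - ImageData[Binarize@renderWord[word, i], "Bit"]), {i, 5}];
       mask = mask + 27 (1 - Unitize[mask]);    (* class 27 = background *)
       ColorConvert[renderWord[word, All], "Grayscale"] -> mask];

    segTrainData = Table[makeExample[], {2000}];
    segTestData = Table[makeExample[], {200}];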
Define a convolutional net that takes an image and returns a probability vector for every pixel in the image:
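A sketch of such a net: padded convolutions preserve the spatial dimensions, a final 1x1 convolution produces 27 class channels, and the channel dimension is moved last so that SoftmaxLayer normalizes the per-pixel class scores:

    segNet = NetChain[{
        ConvolutionLayer[32, 3, "PaddingSize" -> 1], Ramp,
        ConvolutionLayer[32, 3, "PaddingSize" -> 1], Ramp,
        ConvolutionLayer[64, 3, "PaddingSize" -> 1], Ramp,
        ConvolutionLayer[27, 1],
        TransposeLayer[{1 <-> 3, 1 <-> 2}],  (* {27, h, w} -> {h, w, 27} *)
        SoftmaxLayer[]},
       "Input" -> NetEncoder[{"Image", {128, 32}, "Grayscale"}]];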