Wolfram Language & System 11.0 (2016) | Legacy Documentation


NetTrain

NetTrain[net,{input1,input2,…}->{output1,output2,…}]
trains the specified neural net by giving the inputi as input and minimizing the discrepancy between the outputi and the actual output of the net, using an automatically chosen loss function.

NetTrain[net,{input1->output1,input2->output2,…}]
also trains the specified neural net based on the examples inputi and the outputs outputi given.

NetTrain[net,<|port1->{data11,data12,…},port2->{…},…|>]
trains the specified net by supplying training data at the specified ports, with a loss function defined by the "Loss" port in the net.

NetTrain[net,{<|port1->…,port2->…,…|>,<|port1->…,…|>,…}]
trains the net using a list of instances of training data.

NetTrain[net,examples,loss]
uses loss as a loss function in comparing actual and requested outputs from the net.
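
For example, for a net whose ports are named "Input" and "Output", the port form of training data might look like this (a sketch; the concrete data is hypothetical):

NetTrain[net, <|"Input" -> {1, 2, 3}, "Output" -> {2, 4, 6}|>]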

Details and Options

  • Individual inputs can be scalars, vectors, numeric tensors, Image objects, or strings.
  • If the loss is not given explicitly, a loss function will be chosen automatically based on the final layer or layers in the net.
  • When specifying target outputs using the specification porti->{datai1,datai2,…}, any provided custom loss layers should take the porti as inputs in order to compute the loss.
  • When loss layers are automatically attached by NetTrain to output ports, their "Target" ports will be taken from the training data using the same name as the original output port.
  • When giving training data using the specification inputs->outputs, the network should not already contain any loss layers and should have precisely one input and one output port.
  • The following options are supported:
  • BatchSize            Automatic    how many examples to process in a batch
    MaxTrainingRounds    Automatic    how many times to traverse the training data
    Method               Automatic    the training method to use
    TargetDevice         "CPU"        the target on which to perform training
    ValidationSet        None         the set of data on which to evaluate the model during training
  • The following settings for ValidationSet can be given:
  • data    validation set in the same form as training data
    None    use existing training set to estimate loss
  • With ValidationSet->data, NetTrain will return the network that produced the lowest validation loss of any network during training.
  • In NetTrain[net,data,loss], the following forms can be used for loss:
  • Automatic              automatically attach loss layers to all outputs (default)
    "port"                 interpret the given port as a loss
    {"port1","port2",…}    interpret multiple ports as losses, minimizing them simultaneously
    layer                  use a preconfigured net layer as a loss, attaching it to the output of the net
  • Layers that can be used as preconfigured loss layers include MeanAbsoluteLossLayer, MeanSquaredLossLayer, and CrossEntropyLossLayer.
  • Possible settings for Method include:
  • "StochasticGradientDescent"minimize loss by taking the gradient for randomized batches of input data
    "ADAM"stochastic gradient descent using an adaptive learning rate that is invariant to diagonal rescaling of the gradients
  • Suboptions for specific methods can be specified using Method->{"method",opt1->val1,…}; a usage sketch appears after these tables. The following suboptions are supported for all methods:
  • "InitialLearningRate"0.001the size of steps to take in the direction of the derivative
    "L2Regularization"0the global loss associated with the L2 norm of all learned tensors
  • For the method "StochasticGradientDescent", the following suboptions are supported:
  • "Momentum"0.93how much to preserve the previous step when updating the derivative
    "GradientClipping"Nonethe magnitude above which the gradient should be clipped
    "LearningRateSchedule""Polynomial"how to scale the learning rate as a function of the number of iterations
  • With "LearningRateSchedule"->f, the learning rate for a given batch will be calculated as f[batch,total,initial], where batch is the current batch number, total is the total number of batches that will be visited during training, and initial is the initial learning rate.
  • For the method "ADAM", the following suboptions are supported:
  • "Beta1"0.93exponential decay rate for the first moment estimate
    "Beta2"0.999exponential decay rate for the second moment estimate

Examples

Basic Examples  (3)

Define a single-layer neural network that takes in scalar numeric values and produces scalar numeric values:

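One plausible shape for such a network, sketched with the later LinearLayer name (the 11.0 release called this layer DotPlusLayer); the "Scalar" port shapes are the essential part:

In[1]:=
net = LinearLayer["Input" -> "Scalar", "Output" -> "Scalar"]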

Train this network on input output pairs:

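A hypothetical training set for this step; any list of numeric input -> output rules would do (these pairs sample the doubling function):

In[2]:=
trained = NetTrain[net, {1 -> 2, 2 -> 4, 3 -> 6, 4 -> 8}]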

Predict the value of a new input:

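The trained net is applied like a function; for the data above the result should be near 10:

In[3]:=
trained[5]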

Make several predictions at once:

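Applying the net to a list of inputs evaluates them as a single batch:

In[4]:=
trained[{5, 6, 7}]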

The prediction is a linear function of the input:

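For example (the plot range here is arbitrary):

In[5]:=
Plot[trained[x], {x, 0, 5}]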

Define a network that takes in scalar numeric values and produces vectors of length two that are used as probabilities for the classes True and False:

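A sketch of such a classifier: a linear layer producing two numbers, a softmax turning them into probabilities, and a "Class" decoder naming the classes (layer names follow later Wolfram Language versions):

In[1]:=
net = NetChain[{LinearLayer[2], SoftmaxLayer[]},
  "Input" -> "Scalar",
  "Output" -> NetDecoder[{"Class", {True, False}}]]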

Train this network to predict the class from the input:

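Hypothetical training data in which negative inputs belong to False and positive inputs to True:

In[2]:=
trained = NetTrain[net, {-2 -> False, -1 -> False, 1 -> True, 2 -> True}]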

Predict the class of a new input:

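The "Class" decoder returns the most probable class:

In[3]:=
trained[1.5]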

Make several predictions at once:

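As before, a list of inputs is evaluated as one batch:

In[4]:=
trained[{-1.5, 0.5, 3}]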

Plot the probability of the class True as a function of the input:

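A sketch assuming the "Class" decoder's "Probabilities" property, which returns an association of class probabilities:

In[5]:=
Plot[trained[x, "Probabilities"][True], {x, -3, 3}]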

Define a three-layer network that takes in vectors of length two and produces scalar numeric values:

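One plausible architecture: two linear layers around a Tanh nonlinearity, with the hidden size of 20 chosen arbitrarily:

In[1]:=
net = NetChain[
  {LinearLayer[20], ElementwiseLayer[Tanh], LinearLayer[]},
  "Input" -> 2, "Output" -> "Scalar"]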

Construct training data from the function that maps (x,y) to x*y:

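A sketch using a grid of sample points (the grid spacing is arbitrary; the semicolon suppresses the large output):

In[2]:=
data = Flatten@Table[{x, y} -> x*y, {x, -1, 1, 0.1}, {y, -1, 1, 0.1}];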

Train the network on the data:

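Training with default options; MaxTrainingRounds could be raised if the fit is poor:

In[3]:=
trained = NetTrain[net, data]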

Plot the prediction of the net as a function of x and y:

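The trained net should approximate the saddle-shaped surface x*y:

In[4]:=
Plot3D[trained[{x, y}], {x, -1, 1}, {y, -1, 1}]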
Introduced in 2016 (11.0)