NetTrain

NetTrain[net,{input1,input2,…}->{output1,…}]

trains the specified neural net by giving the inputi as input and minimizing the discrepancy between the outputi and the actual output of the net, using an automatically chosen loss function.

NetTrain[net,{input1->output1,input2->output2,…}]

trains the specified neural net on the example inputs inputi and their corresponding target outputs outputi.

NetTrain[net,port1->{data11,data12,…},port2->{…},…]

trains the specified net by supplying training data at the specified ports, with a loss function defined by the "Loss" port in the net.

NetTrain[net,{<|port1->data1,port2->data2,…|>,…}]

trains the net using a list of instances of training data.

NetTrain[net,f]

calls the function f during training to produce batches of training data.

NetTrain[net,data,loss]

uses loss as a loss function in comparing actual and requested outputs from the net.

NetTrain[net,data,loss,prop]

gives data associated with a specific property prop of the training session.

Details and Options

  • Any input ports of the net whose shapes are not fixed will be inferred from the form of training data, and NetEncoder objects will be attached if the training data contains Image objects, etc.
  • Individual training data inputs can be scalars, vectors, or numeric tensors. If the net has appropriate NetEncoder objects attached, the inputs can include Image objects, strings, etc.
  • If the loss is not given explicitly, a loss function will be chosen automatically based on the final layer or layers in the net.
  • When specifying target outputs using the specification porti->{datai1,datai2,…}, any provided custom loss layers should take the porti as inputs in order to compute the loss.
  • When loss layers are automatically attached by NetTrain to output ports, their "Target" ports will be taken from the training data using the same name as the original output port.
  • When giving training data using the specification inputs->outputs, the network should not already contain any loss layers and should have precisely one input and one output port.
  • The following options are supported:
  • BatchSize    Automatic    how many examples to process in a batch
    LearningRateMultipliers    Automatic    set relative learning rates within the net
    MaxTrainingRounds    Automatic    how many times to traverse the training data
    Method    Automatic    the training method to use
    TargetDevice    "CPU"    the target device on which to perform training
    TrainingProgressCheckpointing    None    how to periodically save partially trained nets
    TrainingProgressFunction    None    function to call periodically during training
    TrainingProgressReporting    Automatic    how to report progress during training
    ValidationSet    None    the set of data on which to evaluate the model during training
  • The following settings for MaxTrainingRounds can be given:
  • Automatic    train for approximately 20 seconds (default)
    n    train for the given number of rounds, where a round is a traversal of the entire dataset
    Quantity[r,"unit"]    train for a specific number of seconds, minutes or hours
  • MaxTrainingRounds->Automatic will never train for less than 10 rounds or more than 10,000 rounds.
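  • For example, a run limited to a fixed amount of wall-clock time might look like this (the layer and training pairs are illustrative, not from the original documentation):

    (* train a linear model for at most 30 seconds *)
    NetTrain[LinearLayer[], {1 -> 2., 2 -> 4., 3 -> 6.},
      MaxTrainingRounds -> Quantity[30, "Seconds"]]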
  • The following settings for ValidationSet can be given:
  • None    use only the existing training set to estimate loss (default)
    data    validation set in the same form as the training data
    Scaled[frac]    reserve a specified fraction of the training set for validation
    {spec,"Interval"->int}    specify the interval at which to calculate validation loss
  • For ValidationSet->{spec,"Interval"->int}, the interval can be an integer n, indicating that validation loss should be calculated every n training rounds, or a Quantity in units of seconds, minutes or hours.
  • If a validation set is specified, NetTrain will return the net that produced the lowest validation loss during training with respect to this set.
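  • For example, one might reserve a tenth of the training data for validation (the net and synthetic data here are illustrative):

    (* noisy linear data; 10% is held out to pick the best net *)
    data = Table[x -> 2 x + RandomReal[0.1], {x, RandomReal[1, 100]}];
    NetTrain[LinearLayer[], data, ValidationSet -> Scaled[0.1]]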
  • In NetTrain[net,f], the function f is applied to <|"BatchSize"->n,"Round"->r|> to generate each batch of training data in the form {input1->output1,…} or <|"port1"->data,…|>.
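  • A sketch of such a generator (the net shape and synthetic data are assumptions for illustration; the port shape is given explicitly rather than left to inference, since there is no fixed dataset to infer it from):

    (* synthesize a fresh batch of (x, 2x) pairs on each call *)
    gen = Function[
       Table[With[{x = RandomReal[]}, {x} -> {2 x}], #BatchSize]];
    NetTrain[LinearLayer[1, "Input" -> 1], gen, MaxTrainingRounds -> 500]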
  • In NetTrain[net,data,loss], the following forms can be used for loss:
  • Automatic    automatically attach loss layers to all outputs, or use existing loss outputs (default)
    "port"    interpret the given port as a loss
    losslayer    attach a preconfigured loss layer to the net's output
    "port"->losslayer    compute loss from one output of a multi-output net
    {lspec1,lspec2,…}    minimize several losses simultaneously
    {…,lspeci->Scaled[r],…}    scale individual losses by a factor r
  • Layers that can be used as preconfigured loss layers include:
  • MeanAbsoluteLossLayer[]    mean of distance between output and target
    MeanSquaredLossLayer[]    mean of squared distance between output and target
    CrossEntropyLossLayer[form]    distance between output class probabilities and target class
    ContrastiveLossLayer[]    how well output is maximized or minimized conditioned on target
    NetGraph[], NetChain[], etc.    any network with an "Input" and optional "Target" port
  • When specifying a single preconfigured loss layer with NetTrain[net,data,losslayer], the net must have exactly one output port.
  • When specifying a loss for a specific port with NetTrain[net,data,"port"->lspec], the loss specification lspec can be a preconfigured loss layer or Automatic.
  • When a loss layer is chosen automatically for a port, the loss layer to use is based on the layer within the net whose output is connected to the port, as follows:
  • SoftmaxLayer[]    use CrossEntropyLossLayer["Index"]
    ElementwiseLayer[LogisticSigmoid]    use CrossEntropyLossLayer["Binary"]
    NetPairEmbeddingOperator[]    use ContrastiveLossLayer[]
    other non-loss layers    use MeanSquaredLossLayer[]
    loss layers    use unchanged
  • If the attached loss layer has one input port ("Input"), it will be attached to the output of the net, and the keys of the training data should supply ports to feed the input of the net only.
  • If the attached loss layer has two input ports ("Input" and "Target"), the input will be attached to the output of the net and the target will be fed from the training data, using the name of the output port of the net. Typically, this name is "Output", and so the usual case is training data of the form <|"Input"->{in1,in2,…},"Output"->{out1,out2,…}|>, which is often written as {in1,in2,…}->{out1,out2,…} or {in1->out1,in2->out2,…}.
  • If multiple such layers are connected, there should be one port in the training data to feed the target of each layer.
  • In NetTrain[net,data,loss,prop], the property prop can be any of the following:
  • "TrainedNet"the trained network (default)
    "BatchLossList"a list of the mean loss after each batch update
    "RoundLossList"a list of the mean loss across each round
    "ValidationLossList"a list of the mean losses obtained on the ValidationSet
    "ValidationLossSeries"a TimeSeries associating total batches trained with validation loss
    "LossEvolutionPlot"a plot of the loss evolution during training
    "RMSGradientLists"RMS of gradients of each learned array during training
    "RMSGradientEvolutionPlot"log-log plot of the evolution all RMS gradients
    "RMSWeightLists"RMS of weights in each learned array during training
    "RMSWeightEvolutionPlot"log-log plot of the evolution of RMS weights
    "LastRoundLoss"the mean loss achieved on the final round
    "LastValidationLoss"the final mean loss achieved on the ValidationSet
    "LowestValidationLoss"the lowest mean loss achieved on the ValidationSet
    "TotalTrainingTime"the total time spent training, in seconds
    "MeanBatchesPerSecond"the mean number of batches processed per second
    "MeanInputsPerSecond"the mean number of inputs processed per second
  • NetTrain[net,data,loss,{prop1,prop2,…}] returns a list of the results for the propi.
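  • For example, one might retrieve both the trained net and its per-round loss (the net and data are illustrative):

    (* Automatic loss; ask for two properties at once *)
    {trained, losses} = NetTrain[
       LinearLayer[], {1 -> 2., 2 -> 4., 3 -> 6.},
       Automatic, {"TrainedNet", "RoundLossList"}];
    ListLinePlot[losses]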
  • Possible settings for Method include:
  • "ADAM"stochastic gradient descent using an adaptive learning rate that is invariant to diagonal rescaling of the gradients
    "RMSProp"stochastic gradient descent using an adaptive learning rate derived from exponentially smoothed average of gradient magnitude
    "SGD"ordinary stochastic gradient descent with momentum
  • Suboptions for specific methods can be specified using Method->{"method",opt1->val1,…}. The following suboptions are supported for all methods:
  • "LearningRate"Automaticthe size of steps to take in the direction of the derivative
    "LearningRateSchedule"Automatichow to scale the learning rate as training progresses
    "L2Regularization"Nonethe global loss associated with the L2 norm of all learned tensors
    "GradientClipping"Nonethe magnitude above which gradients should be clipped
    "WeightClipping"Nonethe magnitude above which weights should be clipped
  • With "LearningRateSchedule"->f, the learning rate for a given batch will be calculated as initial*f[batch,total], where batch is the current batch number, total is the total number of batches that will be visited during training, and initial is the initial learning rate specified using "LearningRate". The value returned by f should be a number between 0 and 1.
  • The suboptions "L2Regularization", "GradientClipping" and "WeightClipping" can be given in the following forms:
  • r    use the value r for all weights in the net
    {lspec1->r1,lspec2->r2,…}    use the value ri for the specific part lspeci of the net
  • The rules lspeci->ri are given in the same form as for LearningRateMultipliers.
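  • For example, a global coefficient applied to all weights might look like this (the net and data are illustrative; per-part rules would follow the LearningRateMultipliers form):

    (* global L2 penalty plus gradient clipping *)
    NetTrain[LinearLayer[], {1 -> 2., 2 -> 4., 3 -> 6.},
     Method -> {"ADAM", "L2Regularization" -> 0.001,
       "GradientClipping" -> 1.}]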
  • For the method "SGD", the following additional suboptions are supported:
  • "Momentum"0.93how much to preserve the previous step when updating the derivative
  • For the method "ADAM", the following additional suboptions are supported:
  • "Beta1"0.9exponential decay rate for the first moment estimate
    "Beta2"0.999exponential decay rate for the second moment estimate
  • For the method "RMSProp", the following additional suboptions are supported:
  • "Beta"0.95exponential decay rate for the moving average of the gradient magnitude
    "Momentum"0.9momentum term
  • If a net already contains initialized or previously trained weights, these will not be reinitialized by NetTrain before training is performed.

Examples


Basic Examples  (4)

Define a single-layer neural network that takes in scalar numeric values and produces scalar numeric values, and train this network on input-output pairs:

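A minimal sketch; the training pairs are illustrative, not the original documentation data:

    (* shapes are inferred from the scalar training data *)
    net = NetTrain[LinearLayer[], {1 -> 1.2, 2 -> 2.4, 3 -> 3.6, 4 -> 4.8}]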

Predict the value of a new input:

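For example, with an arbitrary new input:

    net[5]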

Make several predictions at once:

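Applied to a list of inputs, the net evaluates them as a batch:

    net[{5, 10, 15}]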

The prediction is a linear function of the input:

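For instance:

    Plot[net[x], {x, 0, 5}]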

Train a network that takes in scalar numeric values and produces either True or False:

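One plausible construction, assuming a LinearLayer followed by LogisticSigmoid with a "Boolean" decoder; the training pairs are illustrative:

    (* the decoder turns the output probability into True/False *)
    net2 = NetTrain[
      NetChain[{LinearLayer[], LogisticSigmoid},
       "Output" -> NetDecoder["Boolean"]],
      {1 -> False, 2 -> False, 3 -> True, 4 -> True}]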

Predict whether a new input is True or False:

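For example:

    net2[3.5]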

Obtain the probability of the input being True by disabling the NetDecoder:

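Supplying None as the second argument suppresses the decoder, so the raw probability is returned:

    net2[3.5, None]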

Make several predictions at once:

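For example:

    net2[{1, 2, 3, 4}]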

Plot the probability as a function of the input:

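Again using None to obtain the undecoded probability:

    Plot[net2[x, None], {x, 0, 5}]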

Train a three-layer network to learn a function that maps (x,y) to x*y:

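A sketch of one possible three-layer net and a synthetic training set (the layer size and sampling grid are illustrative):

    (* pairs {x,y} mapped to their product on a grid *)
    data = Flatten@Table[{x, y} -> x y, {x, -1, 1, 0.1}, {y, -1, 1, 0.1}];
    net3 = NetTrain[
      NetChain[{LinearLayer[64], Tanh, LinearLayer[]},
       "Input" -> 2, "Output" -> "Scalar"],
      data]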

Evaluate the network on an input:

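For example:

    net3[{0.4, -0.5}]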

Plot the prediction of the net as a function of x and y:

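For instance:

    Plot3D[net3[{x, y}], {x, -1, 1}, {y, -1, 1}]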

Train a recurrent network that predicts the maximum value seen in the input sequence:

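One possible construction, using GatedRecurrentLayer and SequenceLastLayer over variable-length sequences of 1-vectors (all sizes and the synthetic data are illustrative):

    (* each example is a random-length sequence paired with its maximum *)
    data = Table[
       With[{s = RandomReal[1, {RandomInteger[{2, 8}], 1}]}, s -> Max[s]],
       1000];
    net4 = NetTrain[
      NetChain[{GatedRecurrentLayer[16], SequenceLastLayer[], LinearLayer[]},
       "Input" -> {"Varying", 1}, "Output" -> "Scalar"],
      data]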

Evaluate the network on an input:

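For example, on a three-element sequence:

    net4[{{0.3}, {0.9}, {0.2}}]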

Plot the output of the network as one element of a sequence is varied:

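Varying the middle element of a three-element sequence:

    Plot[net4[{{0.2}, {x}, {0.4}}], {x, 0, 1}]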

Scope  (14)

Options  (16)

Applications  (20)

Properties & Relations  (2)

Interactive Examples  (1)

Neat Examples  (1)

See Also

NetModel  NetInitialize  NetChain  NetGraph  Classify  Predict  NMinimize

Introduced in 2016 (11.0) | Updated in 2017 (11.1)