NetTrain
NetTrain[net,{input_{1},input_{2},…}→{output_{1},…}]
trains the specified neural net by giving the input_{i} as input and minimizing the discrepancy between the output_{i} and the actual output of the net, using an automatically chosen loss function.
NetTrain[net,{input_{1}→output_{1},input_{2}→output_{2},…}]
also trains the specified neural net based on the examples input_{i} and the outputs output_{i} given.
NetTrain[net,port_{1}→{data_{11},data_{12},…},port_{2}→{…},…]
trains the specified net by supplying training data at the specified ports, with a loss function defined by the "Loss" port in the net.
NetTrain[net,{<|port_{1}→…,port_{2}→…|>,<|port_{1}→…,…|>,…}]
trains the net using a list of instances of training data.
NetTrain[net,f]
calls the function f during training to produce batches of training data.
NetTrain[net,data,"prop"]
gives data associated with a specific property prop of the training session.
NetTrain[net,data,All]
gives a NetTrainResultsObject[…] that summarizes information about the training session.
Details and Options
 Any input ports of the net whose shapes are not fixed will have their shapes inferred from the form of the training data, and NetEncoder objects will be attached if the training data contains Image objects, etc.
 Individual training data inputs can be scalars, vectors, or numeric tensors. If the net has appropriate NetEncoder objects attached, the inputs can include Image objects, strings, etc.
 If the loss is not given explicitly, a loss function will be chosen automatically based on the final layer or layers in the net.
 When specifying target outputs using the specification port_{i}→{data_{i1},data_{i2},…}, any provided custom loss layers should take the port_{i} as inputs in order to compute the loss.
 When loss layers are automatically attached by NetTrain to output ports, their "Target" ports will be taken from the training data using the same name as the original output port.
 When giving training data using the specification inputs→outputs, the network should not already contain any loss layers and should have precisely one input and one output port.
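For example, the inputs→outputs form can be used like this (a minimal sketch; the layer, data and values are illustrative, not taken from this page — NetTrain attaches a suitable loss layer automatically):

```wolfram
(* train a single linear layer on scalar input → output pairs *)
net = LinearLayer["Input" -> "Scalar", "Output" -> "Scalar"];
trained = NetTrain[net, {1 -> 2.1, 2 -> 3.9, 3 -> 6.2, 4 -> 7.8}];
trained[5]  (* evaluate the trained net on a new input *)
```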
 The following options are supported:

BatchSize	Automatic	how many examples to process in a batch
LearningRateMultipliers	Automatic	set relative learning rates within the net
LossFunction	Automatic	the loss function for assessing outputs
MaxTrainingRounds	Automatic	how many times to traverse the training data
Method	Automatic	the training method to use
TargetDevice	"CPU"	the target device on which to perform training
TimeGoal	Automatic	number of seconds to train for
TrainingProgressCheckpointing	None	how to periodically save partially trained nets
TrainingProgressFunction	None	function to call periodically during training
TrainingProgressReporting	Automatic	how to report progress during training
ValidationSet	None	the set of data on which to evaluate the model during training

 With the default setting of MaxTrainingRounds→Automatic, training will occur for approximately 20 seconds, but never for more than 10,000 rounds.
 With the setting of MaxTrainingRounds→n, training will occur for n rounds, where a round is defined to be a traversal of the entire training dataset.
 The following settings for ValidationSet can be given:

None	use only the existing training set to estimate loss (default)
data	validation set in the same form as training data
Scaled[frac]	reserve a specified fraction of the training set for validation
{spec,"Interval"→int}	specify the interval at which to calculate validation loss

 For ValidationSet→{spec,"Interval"→int}, the interval can be an integer n, indicating that validation loss should be calculated every n training rounds, or a Quantity in units of seconds, minutes or hours.
 If a validation set is specified, NetTrain will return the net that produced the lowest validation loss during training with respect to this set.
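A sketch of holding out part of the training data for validation (here net and data stand for any net and training data in the forms above):

```wolfram
(* reserve 20% of the training data for validation; the returned net
   is the one with the lowest validation loss seen during training *)
trained = NetTrain[net, data, ValidationSet -> Scaled[0.2]]
```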
 In NetTrain[net,f], the function f is applied to <|"BatchSize"→n,"Round"→r|> to generate each batch of training data in the form {input_{1}→output_{1},…} or <|"port_{1}"→data,…|>.
 NetTrain[net,{f,"RoundLength"→n}] can be used to specify that f should be applied enough times during a training round to produce approximately n examples. The default is to apply f once per training round.
 NetTrain[net,…,ValidationSet→{g,"RoundLength"→n}] can be used to specify that the function g should be applied in an equivalent manner to NetTrain[net,{f,"RoundLength"→n}] in order to produce approximately n examples for the purposes of computing validation loss and accuracy.
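A sketch of generating training data on the fly (the architecture and round length are illustrative assumptions, not from this page):

```wolfram
(* each call to gen produces #BatchSize fresh input → output pairs
   for learning Sin; "RoundLength" sets how many examples make a round *)
gen = Function[
   Table[With[{x = RandomReal[{0, 2 Pi}]}, x -> Sin[x]], #BatchSize]];
trained = NetTrain[
  NetChain[{LinearLayer[32], Tanh, LinearLayer[]},
   "Input" -> "Scalar", "Output" -> "Scalar"],
  {gen, "RoundLength" -> 4096}]
```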
 In NetTrain[net,data,prop], the property prop can be any of the following:

"TrainedNet" the optimal trained network found (default) "FinalNet" the final network generated in the training process "TrainingNet" the network as prepared for training "BatchLossList" a list of the mean loss after each batch update "BatchErrorRateList" a list of the error rate after each batch update "RoundLossList" a list of the mean loss across each round "RoundErrorRateList" a list of the mean error rate across each round "ValidationLossList" a list of the mean losses obtained on the ValidationSet "ValidationLossSeries" a TimeSeries associating total batches trained with validation loss "ValidationErrorRateList" a list of the error rates obtained on the ValidationSet "ValidationErrorRateSeries" a TimeSeries associating total batches trained with error rate "LossEvolutionPlot" a plot of the evolution of the mean loss during training "ErrorRateEvolutionPlot" a plot of the evolution of the mean error rate during training "RMSGradientsHistories" RMS of gradients of each learned array during training "RMSGradientsEvolutionPlot" loglog plot of the evolution all RMS gradients "RMSWeightsHistories" RMS of weights in each learned array during training "RMSWeightsEvolutionPlot" loglog plot of the evolution of RMS weights "FinalRoundLoss" the mean loss achieved on the final round "FinalRoundErrorRate" the mean error rate achieved on the final round "FinalValidationLoss" the final mean loss achieved on the ValidationSet "FinalValidationErrorRate" the final mean error rate achieved on the ValidationSet "LowestValidationLoss" the lowest mean loss achieved on the ValidationSet "LowestValidationErrorRate" the lowest mean error rate achieved on the ValidationSet "BatchSize" the effective value of BatchSize "TotalRounds" the total number of rounds of training performed "TotalBatches" the total number of batches encountered during training "TotalInputs" the total number of individual inputs trained on "TotalTrainingTime" the total time spent training, in seconds 
"MeanBatchesPerSecond" the mean number of batches processed per second "MeanInputsPerSecond" the mean number of inputs processed per second "InitialLearningRate" the learning rate at the start of training "FinalLearningRate" the learning rate at the end of training "WeightsLearningRateMultipliers" an association of the learning rate multiplier used for each weight "OptimizationMethod" the name of the optimization method used "BatchWeightsHistories" an association of values taken for every weight, sampled every batch update "RoundWeightsHistories" an association of values taken for every weight, sampled every round "BatchGradientsHistories" an association of gradients for every weight, sampled every batch "BatchWeightsVectorHistories" a list whose elements are vectors formed by flattening all weights together, sampled every batch update "RoundWeightsVectorHistories" a list whose elements are vectors formed by flattening all weights together, sampled every round "BatchGradientsVectorHistories" a list whose elements are vectors formed by flattening all weight gradients together, sampled every batch update "FinalWeights" association of final value of all weights "FinalWeightsVector" a vector formed by flattening the final value of all weights together "FinalExampleLosses" loss associated with each example after training "FinalExampleErrorRates" error rate associated with each example after training "ExampleLossHistories" history of losses taken by each example during training "ExampleErrorRateHistories" history of error rates taken by each example during training "NetTrainInputForm" an expression representing the originating call to NetTrain "ResultsObject" a NetTrainResultsObject[…] containing a majority of the available properties in this table "Properties" the full list of available properties  NetTrain[net,data,{prop_{1},prop_{2},…}] returns a list of the results for the prop_{i}.
 NetTrain[net,data,All] returns a NetTrainResultsObject[…] that contains values for all properties that do not require significant additional computation or memory.
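For example, several properties can be requested in one call (net and data are placeholders for any net and training data):

```wolfram
(* request two properties at once; a list of results is returned *)
{trained, losses} = NetTrain[net, data, {"TrainedNet", "RoundLossList"}];
ListLogPlot[losses]  (* inspect how the mean loss fell per round *)
```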
 Possible settings for Method include:

"ADAM" stochastic gradient descent using an adaptive learning rate that is invariant to diagonal rescaling of the gradients "RMSProp" stochastic gradient descent using an adaptive learning rate derived from exponentially smoothed average of gradient magnitude "SGD" ordinary stochastic gradient descent with momentum  Suboptions for specific methods can be specified using Method{"method",opt_{1}val_{1},…}. The following suboptions are supported for all methods:

"LearningRate" Automatic the size of steps to take in the direction of the derivative "LearningRateSchedule" Automatic how to scale the learning rate as training progresses "L2Regularization" None the global loss associated with the L2 norm of all learned tensors "GradientClipping" None the magnitude above which gradients should be clipped "WeightClipping" None the magnitude above which weights should be clipped  With "LearningRateSchedule">f, the learning rate for a given batch will be calculated as initial*f[batch,total], where batch is the current batch number, total is the total number of batches that will be visited during training, and initial is the initial learning rate specified using "LearningRate". The value returned by f should be a number between 0 and 1.
 The suboptions "L2Regularization", "GradientClipping" and "WeightClipping" can be given in the following forms:

r	use the value r for all weights in the net
{lspec_{1}→r_{1},lspec_{2}→r_{2},…}	use the value r_{i} for the specific part lspec_{i} of the net

 The rules lspec_{i}→r_{i} are given in the same form as for LearningRateMultipliers.
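A sketch of per-part regularization ("linear1" is a hypothetical layer name in a NetGraph; the pattern rule is assumed to follow the same part specifications as LearningRateMultipliers):

```wolfram
(* apply L2 regularization only to one named layer,
   leaving all other weights unregularized *)
NetTrain[net, data,
 Method -> {"ADAM", "L2Regularization" -> {"linear1" -> 0.01, _ -> None}}]
```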
 For the method "SGD", the following additional suboptions are supported:

"Momentum" 0.93 how much to preserve the previous step when updating the derivative  For the method "ADAM", the following additional suboptions are supported:

"Beta1" 0.9 exponential decay rate for the first moment estimate "Beta2" 0.999 exponential decay rate for the second moment estimate  For the method "RMSProp", the following additional suboptions are supported:

"Beta" 0.95 exponential decay rate for the moving average of the gradient magnitude "Momentum" 0.9 momentum term  If a net already contains initialized or previously trained weights, these will be not be reinitialized by NetTrain before training is performed.
Examples
Basic Examples (5)
Train a single-layer linear net on input→output pairs:
Predict the value of a new input:
Make several predictions at once:
The prediction is a linear function of the input:
Train a perceptron that classifies inputs as either True or False:
Predict whether a new input is True or False:
Obtain the probability of the input being True by disabling the NetDecoder:
Make several predictions at once:
Plot the probability as a function of the input:
Train a three-layer network to learn a 2D function:
Evaluate the network on an input:
Plot the prediction of the net as a function of x and y:
Train a recurrent network that predicts the maximum value seen in the input sequence:
Evaluate the network on an input:
Plot the output of the network as one element of a sequence is varied:
Train a net and produce a results object that summarizes the training process:
Scope (12)
Options (22)
Properties & Relations (2)
Interactive Examples (1)
Neat Examples (1)
See Also
NetTrainResultsObject ▪ NetModel ▪ NetInitialize ▪ NetChain ▪ NetGraph ▪ Classify ▪ Predict ▪ NMinimize