TrainingProgressCheckpointing

is an option for NetTrain that specifies how to save copies of the net during training.

Details

With the default value of TrainingProgressCheckpointing->None, no checkpointing is done.
For TrainingProgressCheckpointing->spec, the following specifications can be used:

	{"File","path/file.wlnet"}	save the net to a file, overwriting any previous version
	{"Directory","path"}	save the net as a uniquely named file in directory path
	{…,subopts}	include additional suboptions
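A minimal sketch of the two basic forms. It assumes a toy single-layer regression net (LinearLayer[] trained on four input->output pairs) so that training finishes in seconds. The temporary directory and the file name latest.wlnet are illustrative choices, not names NetTrain requires.

```wolfram
(* toy regression data: a few input -> output pairs *)
data = {1 -> 1.9, 2 -> 4.1, 3 -> 6.0, 4 -> 8.1};

(* a fresh, uniquely named directory for temporary files *)
dir = CreateDirectory[];

(* "File": every checkpoint overwrites the same file *)
file = FileNameJoin[{dir, "latest.wlnet"}];
NetTrain[LinearLayer[], data,
  TrainingProgressCheckpointing -> {"File", file},
  MaxTrainingRounds -> 5];

(* "Directory": every checkpoint gets its own uniquely named file *)
NetTrain[LinearLayer[], data,
  TrainingProgressCheckpointing -> {"Directory", dir},
  MaxTrainingRounds -> 5];

(* latest.wlnet plus the per-checkpoint files written by the second run *)
FileNames["*.wlnet", dir]
```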

All nets are saved in "WLNet" format.
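Because a checkpoint is an ordinary WLNet file, it can be read back with Import. A small sketch, again assuming the toy LinearLayer[] regression net; the file name is illustrative.

```wolfram
data = {1 -> 1.9, 2 -> 4.1, 3 -> 6.0, 4 -> 8.1};
file = FileNameJoin[{CreateDirectory[], "latest.wlnet"}];

NetTrain[LinearLayer[], data,
  TrainingProgressCheckpointing -> {"File", file},
  MaxTrainingRounds -> 5];

(* the .wlnet extension lets Import pick the WLNet format automatically *)
checkpoint = Import[file];

(* the restored net can be evaluated directly, here on the input 5 *)
checkpoint[5]
```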
If checkpointing is enabled, it will be done by default once every training round.
The suboption "Interval"->Quantity[n,"unit"] specifies the interval at which to do checkpointing. Possible forms for "unit" include:

	"Rounds"	net training rounds
	"Batches"	training data batches
	"Seconds","Minutes","Hours"	absolute time

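A sketch of the "Interval" suboption, assuming the same toy regression net: here a checkpoint is requested every 2 rounds instead of the default of every round.

```wolfram
data = {1 -> 1.9, 2 -> 4.1, 3 -> 6.0, 4 -> 8.1};
dir = CreateDirectory[];

(* checkpoint every 2 rounds rather than every round *)
NetTrain[LinearLayer[], data,
  TrainingProgressCheckpointing ->
    {"Directory", dir, "Interval" -> Quantity[2, "Rounds"]},
  MaxTrainingRounds -> 10];

(* fewer files than training rounds *)
n = Length[FileNames["*.wlnet", dir]]
```

The same position accepts the other documented units, for example Quantity[100, "Batches"] or Quantity[10, "Minutes"]; a time-based interval is usually more convenient when round lengths vary.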
The suboption "MinimumInterval"->n specifies that checkpoints should not be performed more frequently than once every n seconds. If unspecified, there is no limit.
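A sketch combining a very frequent "Interval" with a "MinimumInterval" cap, assuming the toy regression net. Because this run lasts only a few seconds, far less than the 60-second minimum, it writes far fewer files than it processes batches. The exact count depends on timing, so none is claimed here.

```wolfram
data = {1 -> 1.9, 2 -> 4.1, 3 -> 6.0, 4 -> 8.1};
dir = CreateDirectory[];

(* ask for a checkpoint after every batch, but at most once every 60 seconds *)
NetTrain[LinearLayer[], data,
  TrainingProgressCheckpointing -> {"Directory", dir,
    "Interval" -> Quantity[1, "Batches"], "MinimumInterval" -> 60},
  MaxTrainingRounds -> 20];

n = Length[FileNames["*.wlnet", dir]]
```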
To save checkpoints to several files or directories at once, use TrainingProgressCheckpointing->{spec₁,spec₂,…}.
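A sketch of two simultaneous specifications, assuming the toy regression net: one "File" spec that always holds the newest net, and one "Directory" spec with its own "Interval" that keeps a history. The names latest.wlnet and archive are hypothetical.

```wolfram
data = {1 -> 1.9, 2 -> 4.1, 3 -> 6.0, 4 -> 8.1};
root = CreateDirectory[];
latest = FileNameJoin[{root, "latest.wlnet"}];
archive = CreateDirectory[FileNameJoin[{root, "archive"}]];

NetTrain[LinearLayer[], data,
  TrainingProgressCheckpointing -> {
    (* overwritten at every checkpoint *)
    {"File", latest},
    (* a new file every 2 rounds *)
    {"Directory", archive, "Interval" -> Quantity[2, "Rounds"]}
  },
  MaxTrainingRounds -> 6];

{FileExistsQ[latest], Length[FileNames["*.wlnet", archive]]}
```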
With TrainingProgressCheckpointing->{"Directory","dir"}, individual files are named "starttime_counter_round_batch_loss.wlnet", where starttime is generated by DateString["ISODateTime"] at the start of training, counter is a value incremented after each call to NetTrain, round is the current round, batch is the cumulative batch number and loss is the most recent round loss. If a ValidationSet was specified, the most recent validation loss will also be included in the file name.
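The naming scheme can be unpacked with StringSplit. The helper parseCheckpointName below is hypothetical and relies only on the pattern described above. Since DateString["ISODateTime"] contains no underscores, splitting on "_" separates the fields. The losses are kept as strings so that no particular number formatting is assumed, and the sample name is made up for illustration.

```wolfram
(* hypothetical helper: takes a bare file name (no directory part) and
   returns the fields of starttime_counter_round_batch_loss[_validationloss].wlnet *)
parseCheckpointName[name_String] := Module[{parts},
  parts = StringSplit[StringReplace[name, ".wlnet" ~~ EndOfString -> ""], "_"];
  <|
    "StartTime" -> parts[[1]],
    "Counter" -> FromDigits[parts[[2]]],
    "Round" -> FromDigits[parts[[3]]],
    "Batch" -> FromDigits[parts[[4]]],
    "Loss" -> parts[[5]],
    "ValidationLoss" -> If[Length[parts] >= 6, parts[[6]], Missing["NotAvailable"]]
  |>]

(* a made-up name that follows the documented pattern *)
info = parseCheckpointName["2024-01-15T09:30:00_1_3_12_0.0423.wlnet"]
```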
The list of checkpointing files created, if any, can be retrieved from a NetTrainResultsObject via the property "CheckpointingFiles".
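A sketch of retrieving the file list, assuming the toy regression net. Passing All as the third argument of NetTrain makes it return a NetTrainResultsObject instead of the bare trained net.

```wolfram
data = {1 -> 1.9, 2 -> 4.1, 3 -> 6.0, 4 -> 8.1};
dir = CreateDirectory[];

(* All: return a NetTrainResultsObject *)
results = NetTrain[LinearLayer[], data, All,
  TrainingProgressCheckpointing -> {"Directory", dir},
  MaxTrainingRounds -> 3];

files = results["CheckpointingFiles"]
```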

Examples

Take periodic checkpoints of a convolutional network during training on the MNIST dataset:

List all created checkpoints:

Import the final checkpoint:
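The three steps above can be sketched end to end as follows. This sketch assumes the "MNIST" resource from the Wolfram Data Repository (downloaded on first use) and an illustrative LeNet-style architecture. The single training round and the 200-batch interval are arbitrary choices made to keep the run short.

```wolfram
(* MNIST handwritten digits *)
trainingData = ResourceData["MNIST", "TrainingData"];

(* a small LeNet-style convolutional classifier *)
net = NetChain[{
    ConvolutionLayer[20, 5], Ramp, PoolingLayer[2, 2],
    ConvolutionLayer[50, 5], Ramp, PoolingLayer[2, 2],
    FlattenLayer[], LinearLayer[500], Ramp, LinearLayer[10], SoftmaxLayer[]},
  "Input" -> NetEncoder[{"Image", {28, 28}, "Grayscale"}],
  "Output" -> NetDecoder[{"Class", Range[0, 9]}]];

dir = CreateDirectory[];

(* take periodic checkpoints: one every 200 batches, over one round *)
results = NetTrain[net, trainingData, All,
  TrainingProgressCheckpointing ->
    {"Directory", dir, "Interval" -> Quantity[200, "Batches"]},
  MaxTrainingRounds -> 1];

(* list all created checkpoints *)
files = results["CheckpointingFiles"]

(* import the final checkpoint; assumes the list is in creation order *)
final = Import[Last[files]]
```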
