TrainingProgressCheckpointing

TrainingProgressCheckpointing is an option for NetTrain that specifies how to save copies of the net during training.

Details

  • With the default value of TrainingProgressCheckpointing->None, no checkpointing is done.
  • For TrainingProgressCheckpointing->spec, the following specifications can be used:
    {"File","path/file.wlnet"}    save the net to a file, overwriting any previous version
    {"Directory","path"}          save the net as a uniquely named file in directory path
    {…,subopts}                   include additional suboptions
  • All nets are saved in "WLNet" format.
  • If checkpointing is enabled, a checkpoint is saved once per training round by default.
  • The suboption "Interval"->Quantity[n,"unit"] specifies the interval at which to do checkpointing. Possible forms for "unit" include:
    "Rounds"                        net training rounds
    "Batches"                       training data batches
    "Seconds","Minutes","Hours"     absolute time
  • The suboption "MinimumInterval"->n specifies that checkpoints should not be performed more frequently than once every n seconds. If unspecified, there is no limit.
  • Checkpoints can be saved to multiple files or directories with TrainingProgressCheckpointing->{spec1,spec2,…}.
  • With TrainingProgressCheckpointing->{"Directory","dir"}, individual files are named "starttime_counter_round_batch_loss.wlnet", where starttime is generated by DateString["ISODateTime"] at the start of training, counter is a value incremented after each call to NetTrain, round is the current round, batch is the cumulative batch number and loss is the most recent round loss. If a ValidationSet was specified, the most recent validation loss will also be included in the file name.
  • The list of checkpointing files created, if any, can be retrieved from a NetTrainResultsObject via the property "CheckpointingFiles".
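
The specification forms described above can be combined as in the following minimal sketch. The tiny network, the synthetic data, the file name checkpoint_sketch.wlnet and the temporary locations are assumptions chosen only so that training finishes quickly; they are not taken from this page.

```wolfram
(* Hypothetical toy regression net and synthetic data, used only for illustration *)
net = NetChain[{LinearLayer[8], Ramp, LinearLayer[1]}, "Input" -> 1];
data = Table[{x} -> {Sin[x]}, {x, 0., 6., 0.05}];

(* Hypothetical checkpoint locations under the system temporary directory *)
file = FileNameJoin[{$TemporaryDirectory, "checkpoint_sketch.wlnet"}];
dir = CreateDirectory[];

(* Two specifications at once:
   - a single file that is overwritten at each checkpoint (default: every round)
   - a directory that receives a uniquely named file every 2 rounds *)
results = NetTrain[net, data, All,
   MaxTrainingRounds -> 10,
   TrainingProgressCheckpointing -> {
     {"File", file},
     {"Directory", dir, "Interval" -> Quantity[2, "Rounds"]}}];

(* Checkpoint files reported by the results object *)
results["CheckpointingFiles"]
```

A "MinimumInterval"->n suboption could be added to either specification in the same position as "Interval" to cap how often checkpoints are written.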

Examples

Basic Examples  (1)

Take periodic checkpoints of a convolutional network during training on the MNIST dataset:

List all created checkpoints:

Import the final checkpoint:
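
The three steps captioned above can be sketched together as follows. This is an illustrative reconstruction and not necessarily the code from the original page: the LeNet-style architecture, the use of the "MNIST" resource via ResourceData, the round limit and the temporary directory are all assumptions.

```wolfram
(* Assumption: MNIST training data is obtained from the Wolfram Data Repository *)
trainingData = ResourceData["MNIST", "TrainingData"];

(* Illustrative LeNet-style convolutional network; the architecture is an assumption *)
lenet = NetChain[{
    ConvolutionLayer[20, 5], Ramp, PoolingLayer[2, 2],
    ConvolutionLayer[50, 5], Ramp, PoolingLayer[2, 2],
    FlattenLayer[], 500, Ramp, 10, SoftmaxLayer[]},
   "Input" -> NetEncoder[{"Image", {28, 28}, "Grayscale"}],
   "Output" -> NetDecoder[{"Class", Range[0, 9]}]];

(* Hypothetical checkpoint directory, created in the system temporary location *)
dir = CreateDirectory[];

(* Take a checkpoint after every round (the default interval) *)
results = NetTrain[lenet, trainingData, All,
   MaxTrainingRounds -> 3,
   TrainingProgressCheckpointing -> {"Directory", dir}];

(* List all created checkpoints *)
checkpoints = results["CheckpointingFiles"]

(* Import the final checkpoint, assuming the list is in creation order *)
Import[Last[checkpoints]]
```

Requesting All as the third argument of NetTrain returns a NetTrainResultsObject, which is what makes the "CheckpointingFiles" property available; FileNames["*.wlnet", dir] would be an alternative way to list the files directly from disk.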

Introduced in 2017 (11.1) | Updated in 2018 (11.3)