TrainingProgressCheckpointing

TrainingProgressCheckpointing is an option for NetTrain that specifies how to save copies of the net during training.

Details

  • With the default value of TrainingProgressCheckpointing->None, no checkpointing is done.
  • For TrainingProgressCheckpointing->spec, the following specifications can be used:
  • {"File","path/file.wlnet"}    save the net to a file, overwriting any previous version
    {"Directory","path"}          save the net as a uniquely named file in directory path
    {…,subopts}                   include additional suboptions
  • All nets are saved in "WLNet" format.
  • If checkpointing is enabled, it will be done by default once every training round.
  • The suboption "Interval"->Quantity[n,"unit"] specifies the interval at which to do checkpointing. Possible forms for "unit" include:
  • "Rounds"                        net training rounds
    "Batches"                       training data batches
    "Seconds","Minutes","Hours"     absolute time
  • The suboption "MinimumInterval"->n specifies that checkpoints should not be performed more frequently than once every n seconds. If unspecified, the default is 1 second.
  • Checkpoints can be saved to multiple files or directories with TrainingProgressCheckpointing->{spec1,spec2,…}.
  • With TrainingProgressCheckpointing->{"Directory","dir"}, individual files are named "starttime_counter_round_batch_loss.wlnet", where starttime is generated by DateString["ISODateTime"] at the start of training, counter is a value incremented after each call to NetTrain, round is the current round, batch is the cumulative batch number and loss is the most recent round loss. If a ValidationSet was specified, the most recent validation loss will also be included in the file name.
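
The specifications above can be combined as in the following minimal sketch. The toy data, the file name latest.wlnet, the temporary directory and the interval values are assumptions chosen for illustration and do not come from this page:

```wolfram
(* Toy data; LinearLayer[] infers its shapes from it *)
data = {1 -> 1.9, 2 -> 4.1, 3 -> 6.0, 4 -> 8.1};

(* Hypothetical checkpoint locations, chosen for illustration *)
file = FileNameJoin[{$TemporaryDirectory, "latest.wlnet"}];
dir = CreateDirectory[];

(* Two specs at once:
   - overwrite a single file every 10 rounds
   - write uniquely named files every batch, but at most once every 5 seconds *)
NetTrain[LinearLayer[], data,
 TrainingProgressCheckpointing -> {
   {"File", file, "Interval" -> Quantity[10, "Rounds"]},
   {"Directory", dir, "Interval" -> Quantity[1, "Batches"],
    "MinimumInterval" -> 5}
   },
 MaxTrainingRounds -> 100]
```

The "File" form suits keeping only the latest state, while the "Directory" form keeps a history whose file names record the round, batch and loss.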

Examples

Basic Examples  (1)

Take periodic checkpoints of a convolutional network during training on the MNIST dataset:
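
A sketch of what this step might look like. The LeNet-style architecture, the use of ResourceData to fetch MNIST, the directory name mnist-checkpoints and the three-round limit are all assumptions, not the original input:

```wolfram
(* Assumption: MNIST is fetched from the Wolfram Data Repository *)
trainingData = ResourceData["MNIST", "TrainingData"];

(* A LeNet-style convolutional net; the layer sizes are illustrative *)
lenet = NetChain[{
    ConvolutionLayer[20, 5], Ramp, PoolingLayer[2, 2],
    ConvolutionLayer[50, 5], Ramp, PoolingLayer[2, 2],
    FlattenLayer[], LinearLayer[500], Ramp,
    LinearLayer[10], SoftmaxLayer[]},
   "Input" -> NetEncoder[{"Image", {28, 28}, "Grayscale"}],
   "Output" -> NetDecoder[{"Class", Range[0, 9]}]];

(* Hypothetical checkpoint directory, created if it does not exist yet *)
dir = FileNameJoin[{$TemporaryDirectory, "mnist-checkpoints"}];
If[! DirectoryQ[dir], CreateDirectory[dir]];

(* With no "Interval" suboption, a checkpoint is written once per round *)
trained = NetTrain[lenet, trainingData,
   TrainingProgressCheckpointing -> {"Directory", dir},
   MaxTrainingRounds -> 3];
```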


List all created checkpoints:
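
One possible way to do this, assuming the checkpoints were written to a hypothetical directory named mnist-checkpoints under $TemporaryDirectory:

```wolfram
(* Hypothetical directory that was passed to TrainingProgressCheckpointing *)
dir = FileNameJoin[{$TemporaryDirectory, "mnist-checkpoints"}];

(* Checkpoint files carry the .wlnet extension *)
FileNames["*.wlnet", dir]
```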


Import the final checkpoint:
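
A sketch under the same assumption about the directory name. Picking the most recently modified file is a choice made here, since this page does not say whether the round and batch fields in file names are zero-padded for sorting:

```wolfram
(* Hypothetical directory that was passed to TrainingProgressCheckpointing *)
dir = FileNameJoin[{$TemporaryDirectory, "mnist-checkpoints"}];
files = FileNames["*.wlnet", dir];

(* Take the most recently written checkpoint, if there is one *)
If[files =!= {},
 Import[Last[SortBy[files, AbsoluteTime[FileDate[#, "Modification"]] &]]],
 Missing["NoCheckpoints"]]
```

The imported expression is a net that can be evaluated directly or passed back to NetTrain to continue training.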


See Also

NetTrain  TrainingProgressReporting  TrainingProgressFunction

Introduced in 2017 (11.1)