7.2 Examples of Different Training Algorithms
This section uses a small example to illustrate the different training algorithms used by NeuralFit. For examples of more realistic size, see Chapter 8, Dynamic Neural Networks, or Chapter 12, Application Examples, and change the Method option in the calls to NeuralFit.
Read in the Neural Networks package.
Consider the following small example, in which the network has only two parameters. This makes it possible to display the RMSE, the criterion being minimized, as a surface over the two parameters. To do this you need the following package.
Read in a standard package for graphics.
The "true" function is chosen to be an FF network with one input and one output, no hidden layer, and a sigmoidal nonlinearity at the output. The true parameter values are 2 and 1.
Initialize a network of the correct size and insert the true parameter values.
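For readers without the Neural Networks package, the true function can be sketched in plain Python. This is only an illustration, not the package's network object: the text does not say which of the values 2 and 1 is the weight and which is the bias, so the sketch assumes weight w = 2 and bias b = 1, and it assumes the sigmoid is the logistic function 1/(1 + e^(-z)). The names `net`, `TRUE_W`, and `TRUE_B` are hypothetical.

```python
import math

# Assumption: the parameter value 2 is the input weight w and 1 is the bias b.
TRUE_W = 2.0
TRUE_B = 1.0


def sigmoid(z: float) -> float:
    """Logistic sigmoid, assumed here as the output nonlinearity."""
    return 1.0 / (1.0 + math.exp(-z))


def net(x: float, w: float, b: float) -> float:
    """FF network with one input, one output, and no hidden layer."""
    return sigmoid(w * x + b)


# The output crosses 0.5 where w*x + b = 0, i.e. at x = -b/w = -0.5.
print(net(-0.5, TRUE_W, TRUE_B))  # 0.5
```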
Generate data with the true function.
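The data-generation step can be sketched the same way. The number of samples (20), the input range [-3, 3], and the absence of noise are assumptions made for illustration; the original example may use different values. The weight/bias reading of the parameters (2, 1) and the logistic sigmoid are also assumed.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid (assumed output nonlinearity)."""
    return 1.0 / (1.0 + math.exp(-z))


def generate_data(w: float, b: float, n: int = 20,
                  x_min: float = -3.0, x_max: float = 3.0):
    """Return n evenly spaced inputs and noise-free outputs sigmoid(w*x + b).

    n, x_min and x_max are hypothetical choices for illustration.
    """
    step = (x_max - x_min) / (n - 1)
    xs = [x_min + i * step for i in range(n)]
    ys = [sigmoid(w * x + b) for x in xs]
    return xs, ys


# Data from the "true" function with parameters (2, 1).
xs, ys = generate_data(2.0, 1.0)
```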
A two-parameter function is defined to compute the RMSE. Note that this function uses the generated data {x,y} and is needed to generate the plots.
Define the criterion function.
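A plain-Python version of the criterion function is sketched below, assuming 20 noise-free samples on [-3, 3], a weight/bias parameterization with true values (2, 1), and a logistic sigmoid. Because these hypothetical data are noise-free, the RMSE is exactly zero at the true parameters.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical data set: 20 noise-free samples of the true function, (w, b) = (2, 1).
xs = [-3.0 + 6.0 * i / 19 for i in range(20)]
ys = [sigmoid(2.0 * x + 1.0) for x in xs]


def criterion(w: float, b: float) -> float:
    """RMSE of the 2-parameter network sigmoid(w*x + b) on the data {xs, ys}."""
    squared = [(sigmoid(w * x + b) - y) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(squared) / len(squared))


print(criterion(2.0, 1.0))  # 0.0 at the true parameters (noise-free data)
print(criterion(0.5, 5.0))  # larger at the initial parameters (0.5, 5)
```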
The criterion function can be plotted in a neighborhood of the minimum (2,1) using Plot3D.
Look at the criterion function.
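A surface plot samples the criterion on a grid of parameter values. The sketch below computes such a grid of (w, b, RMSE) triples without plotting, using hypothetical noise-free data and the assumed weight/bias parameterization, and checks that the smallest value on the grid sits at the true parameters (2, 1). The grid ranges and spacing are illustrative choices.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical noise-free data from the true function (w, b) = (2, 1).
xs = [-3.0 + 6.0 * i / 19 for i in range(20)]
ys = [sigmoid(2.0 * x + 1.0) for x in xs]


def criterion(w: float, b: float) -> float:
    """RMSE of sigmoid(w*x + b) on the data."""
    return math.sqrt(sum((sigmoid(w * x + b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


# Sample the surface on a 17 x 17 grid around (2, 1); these (w, b, RMSE)
# triples are the kind of values a surface plot would draw.
ws = [0.25 * i for i in range(17)]          # w from 0 to 4
bs = [-1.0 + 0.25 * j for j in range(17)]   # b from -1 to 3
surface = [(w, b, criterion(w, b)) for w in ws for b in bs]

w_min, b_min, e_min = min(surface, key=lambda t: t[2])
print(w_min, b_min, e_min)  # 2.0 1.0 0.0
```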
Now it is time to test the different training methods. The initial parameters are chosen to be (0.5,5). You can repeat the example with different initializations.
Levenberg-Marquardt
Initialize the network and train with the Levenberg-Marquardt method.
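To show what a Levenberg-Marquardt iteration does on this 2-parameter problem, here is a minimal textbook sketch. It is not NeuralFit's implementation: the data (20 noise-free samples on [-3, 3]), the weight/bias parameterization, the initial damping `lam = 1e-3`, and the factor-of-10 damping update are all assumptions. Each iteration solves the damped normal equations (J^T J + λI)δ = -J^T r for the 2×2 case and accepts the step only if the RMSE decreases.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, written to avoid overflow for large |z|."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical noise-free data from the true function (w, b) = (2, 1).
XS = [-3.0 + 6.0 * i / 19 for i in range(20)]
YS = [sigmoid(2.0 * x + 1.0) for x in XS]


def residuals_and_jacobian(w, b):
    """Residuals r_i = sigmoid(w*x_i + b) - y_i and rows (dr_i/dw, dr_i/db)."""
    r, jac = [], []
    for x, y in zip(XS, YS):
        s = sigmoid(w * x + b)
        d = s * (1.0 - s)  # derivative of the logistic sigmoid
        r.append(s - y)
        jac.append((d * x, d))
    return r, jac


def rmse(w, b):
    r, _ = residuals_and_jacobian(w, b)
    return math.sqrt(sum(v * v for v in r) / len(r))


def levenberg_marquardt(w, b, iterations=30, lam=1e-3):
    """Solve (J^T J + lam*I) delta = -J^T r; accept the step and shrink lam if
    the RMSE drops, otherwise grow lam and retry. Returns (w, b, RMSE) per iterate."""
    history = [(w, b, rmse(w, b))]
    for _ in range(iterations):
        r, jac = residuals_and_jacobian(w, b)
        a11 = sum(jw * jw for jw, _ in jac)
        a12 = sum(jw * jb for jw, jb in jac)
        a22 = sum(jb * jb for _, jb in jac)
        g1 = sum(jw * ri for (jw, _), ri in zip(jac, r))
        g2 = sum(jb * ri for (_, jb), ri in zip(jac, r))
        current = history[-1][2]
        accepted = False
        while lam < 1e10:
            m11, m22 = a11 + lam, a22 + lam
            det = m11 * m22 - a12 * a12
            if det <= 0.0:  # guard against rounding; more damping fixes it
                lam *= 10.0
                continue
            dw = (-g1 * m22 + g2 * a12) / det  # Cramer's rule for the 2x2 system
            db = (-g2 * m11 + g1 * a12) / det
            trial = rmse(w + dw, b + db)
            if trial < current:
                w, b, lam = w + dw, b + db, lam / 10.0
                accepted = True
                break
            lam *= 10.0
        if not accepted:
            break  # no decrease found: stop at the current estimate
        history.append((w, b, trial))
    return history


for w, b, e in levenberg_marquardt(0.5, 5.0):
    print(f"w={w:.4f}  b={b:.4f}  RMSE={e:.2e}")
```

Each printed row is one (w, b, RMSE) point of the training trajectory, starting from the initial parameters (0.5, 5).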
The parameter record and the criterion log are included as rules in the training record, which is the second output of NeuralFit. This information can be combined into a list of 3-element sublists, each containing the two parameter values and the corresponding RMSE for one iteration of the training. Viewed as 3-dimensional {x,y,z} points, this list shows the path of the training on the RMSE surface, which is plotted as a function of the parameters using Plot3D.
Form a list of the trajectory in the parameter space.
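Forming the trajectory amounts to pairing each iteration's parameter values with its RMSE. The sketch below assumes the parameter record and criterion log have already been extracted into two plain lists of equal length; it does not reproduce the actual structure of NeuralFit's training record. The hypothetical `segments` helper lists the straight-line pieces that connect consecutive iterates in a trajectory plot.

```python
def trajectory(parameter_record, criterion_log):
    """Pair each iteration's (w, b) with its RMSE to get {x, y, z} points.

    parameter_record: list of (w, b) pairs, one per iteration (assumed format).
    criterion_log:    list of RMSE values, one per iteration.
    """
    if len(parameter_record) != len(criterion_log):
        raise ValueError("need one RMSE value per parameter pair")
    return [(w, b, rmse) for (w, b), rmse in zip(parameter_record, criterion_log)]


def segments(points):
    """Consecutive point pairs, i.e. the straight lines joining the iterates."""
    return list(zip(points, points[1:]))
```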
Form plots of the trajectory and show it together with the criterion surface.
The {x,y,z} iterates of the training process are marked with dots connected by straight lines to show the trajectory. The training converges after about five iterations.
Gauss-Newton Algorithm
The training of the initial neural network is now repeated with the Gauss-Newton algorithm.
Train the same neural network with the Gauss-Newton algorithm.
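A Gauss-Newton sketch drops the damping term of Levenberg-Marquardt and uses the direction δ = -(J^T J)^-1 J^T r directly. Halving the step until the RMSE decreases is an assumed safeguard, not necessarily what NeuralFit does; the data (20 noise-free samples on [-3, 3]), the weight/bias parameterization with true values (2, 1), and the logistic sigmoid are hypothetical choices as well.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, written to avoid overflow for large |z|."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical noise-free data from the true function (w, b) = (2, 1).
XS = [-3.0 + 6.0 * i / 19 for i in range(20)]
YS = [sigmoid(2.0 * x + 1.0) for x in XS]


def residuals_and_jacobian(w, b):
    """Residuals r_i = sigmoid(w*x_i + b) - y_i and rows (dr_i/dw, dr_i/db)."""
    r, jac = [], []
    for x, y in zip(XS, YS):
        s = sigmoid(w * x + b)
        d = s * (1.0 - s)
        r.append(s - y)
        jac.append((d * x, d))
    return r, jac


def rmse(w, b):
    r, _ = residuals_and_jacobian(w, b)
    return math.sqrt(sum(v * v for v in r) / len(r))


def gauss_newton(w, b, iterations=30):
    """Gauss-Newton with step halving; returns (w, b, RMSE) for each iterate."""
    history = [(w, b, rmse(w, b))]
    for _ in range(iterations):
        r, jac = residuals_and_jacobian(w, b)
        a11 = sum(jw * jw for jw, _ in jac)
        a12 = sum(jw * jb for jw, jb in jac)
        a22 = sum(jb * jb for _, jb in jac)
        g1 = sum(jw * ri for (jw, _), ri in zip(jac, r))
        g2 = sum(jb * ri for (_, jb), ri in zip(jac, r))
        det = a11 * a22 - a12 * a12
        if det <= 0.0:
            break  # J^T J is numerically singular
        # Full Gauss-Newton step: solve (J^T J) delta = -J^T r by Cramer's rule.
        dw = (-g1 * a22 + g2 * a12) / det
        db = (-g2 * a11 + g1 * a12) / det
        current = history[-1][2]
        step = 1.0
        while step > 1e-10:
            trial = rmse(w + step * dw, b + step * db)
            if trial < current:
                break
            step /= 2.0
        else:
            break  # no decrease along the Gauss-Newton direction
        w, b = w + step * dw, b + step * db
        history.append((w, b, trial))
    return history


trajectory = gauss_newton(0.5, 5.0)
print(len(trajectory) - 1, "iterations, final RMSE", trajectory[-1][2])
```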
Form a list of the trajectory in the parameter space.
Form plots of the trajectory and show it together with the criterion surface.
The Gauss-Newton algorithm converges in seven iterations.
Steepest Descent Method
Train the same neural network with SteepestDescent.
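Steepest descent moves along the negative gradient -J^T r only, without the curvature information J^T J, so it typically slows down near the minimum where the gradient becomes small. In the sketch below, the step-size rule (double the step before each iteration, then halve it until the RMSE decreases) is an assumption; NeuralFit's own step-size control may differ. The data and the weight/bias parameterization are hypothetical as well.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, written to avoid overflow for large |z|."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical noise-free data from the true function (w, b) = (2, 1).
XS = [-3.0 + 6.0 * i / 19 for i in range(20)]
YS = [sigmoid(2.0 * x + 1.0) for x in XS]


def rmse(w, b):
    return math.sqrt(sum((sigmoid(w * x + b) - y) ** 2 for x, y in zip(XS, YS)) / len(XS))


def gradient(w, b):
    """Gradient J^T r of half the sum of squared residuals."""
    g1 = g2 = 0.0
    for x, y in zip(XS, YS):
        s = sigmoid(w * x + b)
        common = (s - y) * s * (1.0 - s)  # residual times sigmoid derivative
        g1 += common * x
        g2 += common
    return g1, g2


def steepest_descent(w, b, iterations=30, step=1.0):
    """Steepest descent with an assumed adaptive step: double the step before
    each iteration, then halve it until the RMSE decreases."""
    history = [(w, b, rmse(w, b))]
    for _ in range(iterations):
        g1, g2 = gradient(w, b)
        current = history[-1][2]
        step *= 2.0
        while step > 1e-12:
            trial = rmse(w - step * g1, b - step * g2)
            if trial < current:
                break
            step /= 2.0
        else:
            break  # no decrease found even for a tiny step
        w, b = w - step * g1, b - step * g2
        history.append((w, b, trial))
    return history


trajectory = steepest_descent(0.5, 5.0)
print(len(trajectory) - 1, "iterations, final RMSE", trajectory[-1][2])
```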
The training did not converge within the 30 iterations. This is not necessarily a problem, since the parameter values may still be close enough to the minimum.
Form a list of the trajectory in the parameter space.
Form plots of the trajectory and show it together with the criterion surface.
Convergence is particularly slow toward the end of the training, where the steepest descent method is much slower than the Levenberg-Marquardt and Gauss-Newton methods.
Backpropagation Algorithm
When you use the backpropagation algorithm, you have to choose the step size and the momentum. Choosing good values for these parameters may not be easy; this is not an issue with the other methods, which tune the step size automatically. You can repeat the example with different values of these parameters to see their influence.
Train the same neural network with backpropagation.
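A standard form of the backpropagation update with momentum is Δθ_k = -η∇V(θ_k) + αΔθ_{k-1}, where η is the step length and α the momentum. The sketch below implements that rule with fixed η and α. The parameter names `step_length` and `momentum` only mirror the StepLength and Momentum options and are not the package's API; the default values 0.5 and 0.8, like the data, are arbitrary illustrations. Because no step is ever rejected, the RMSE is not guaranteed to decrease: accumulated momentum can carry the estimate up a slope.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, written to avoid overflow for large |z|."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical noise-free data from the true function (w, b) = (2, 1).
XS = [-3.0 + 6.0 * i / 19 for i in range(20)]
YS = [sigmoid(2.0 * x + 1.0) for x in XS]


def rmse(w, b):
    return math.sqrt(sum((sigmoid(w * x + b) - y) ** 2 for x, y in zip(XS, YS)) / len(XS))


def gradient(w, b):
    """Gradient of half the sum of squared residuals with respect to (w, b)."""
    g1 = g2 = 0.0
    for x, y in zip(XS, YS):
        s = sigmoid(w * x + b)
        common = (s - y) * s * (1.0 - s)
        g1 += common * x
        g2 += common
    return g1, g2


def backprop_momentum(w, b, step_length=0.5, momentum=0.8, iterations=30):
    """Fixed-step gradient descent with momentum:
    delta_k = -step_length * gradient + momentum * delta_{k-1}."""
    history = [(w, b, rmse(w, b))]
    dw_prev = db_prev = 0.0  # no previous update before the first step
    for _ in range(iterations):
        g1, g2 = gradient(w, b)
        dw = -step_length * g1 + momentum * dw_prev
        db = -step_length * g2 + momentum * db_prev
        w, b = w + dw, b + db
        dw_prev, db_prev = dw, db
        history.append((w, b, rmse(w, b)))
    return history


for w, b, e in backprop_momentum(0.5, 5.0):
    print(f"w={w:.4f}  b={b:.4f}  RMSE={e:.4f}")
```

Setting `momentum=0.0` reduces the rule to plain fixed-step gradient descent, which makes it easy to compare the two trajectories.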
Form a list of the trajectory in the parameter space.
Form plots of the trajectory and show it together with the criterion surface.
Because of the momentum term used in the training, the parameter estimate moves up the slope adjacent to the initial parameter values. You can repeat the training with different values of the StepLength and Momentum options to see how they influence the minimization.
