When derivatives cannot be computed symbolically, Newton's method will be used, but with a finite difference approximation to the Jacobian. This can have costs in terms of both time and reliability. Just as for minimization, an alternative is to use an algorithm specifically designed to work without derivatives.
In one dimension, the idea of the secant method is to use the slope of the line between two consecutive search points to compute the step instead of the derivative at the latest point. Similarly in dimensions, differences between the residuals at points are used to construct an approximation of sorts to the Jacobian. Note that this is similar to finite differences, but rather than trying to make the difference interval small in order to get as good a Jacobian approximation as possible, it effectively uses an average derivative just like the one-dimensional secant method. Initially, the points are constructed from two starting points that are distinct in all dimensions. Subsequently, as steps are taken, only the points with the smallest merit function value are kept. It is rare, but possible, that steps are collinear and the secant approximation to the Jacobian becomes singular. In this case, the algorithm is restarted with distinct points.
The method requires two starting points in each dimension. In fact, if two starting points are given in each dimension, the secant method is the default method except in one dimension, where Brent's method may be chosen.
Note that, as compared to Newton's method, many more residual function evaluations are required. However, the method is able to follow the relatively narrow valley without directly using derivative information.
However, when compared to Newton's method with finite differences, the number of residual function evaluations is comparable. For sparse Jacobian matrices with larger problems, the finite difference Newton method will usually be more efficient since the secant method does not take advantage of sparsity in any way.