Learning Algorithms for Connectionist Networks: Applied Gradient Methods of Nonlinear Optimization
General Robotics, Automation, Sensing and Perception Laboratory
The problem of learning using connectionist networks, in which network connection strengths are modified systematically so that the response of the network increasingly approximates the desired response can be structured as an optimization problem. The widely used back propagation method of connectionist learning [19, 21, 18] is set in the context of nonlinear optimization. In this framework, the issues of stability, convergence and parallelism are considered. As a form of gradient descent with fixed step size, back propagation is known to be unstable, which is illustrated using Rosenbrock's function. This is contrasted with stable methods which involve a line search in the gradient direction. The convergence criterion for connectionist problems involving binary functions is discussed relative to the behavior of gradient descent in the vicinity of local minima. A minimax criterion is compared with the least squares criterion. The contribution of the momentum term [19, 18] to more rapid convergence is interpreted relative to the geometry of the weight space. It is shown that in plateau regions of relatively constant gradient, the momentum term acts to increase the step size by a factor of 1/1-μ, where μ is the momentum term. In valley regions with steep sides, the momentum constant acts to focus the search direction toward the local minimum by averaging oscillations in the gradient.