Backprop-free methods for training neural networks
2022-09-20 — 2025-07-22
Methods for training neural networks without reverse-mode autodiff (a.k.a. Backprop) and SGD. Once common, now rare, because Backprop is terrifyingly effective.
Why might we care about alternatives? Many reasons. For one, neural networks trained without backprop are “more” biologically plausible. For another, what if I want to handle non-differentiable parameters?
1 Direct feedback alignment
This line of work started with Lillicrap et al. (2016):
The brain processes information through multiple layers of neurons. This deep architecture is representationally powerful, but complicates learning because it is difficult to identify the responsible neurons when a mistake is made. In machine learning, the backpropagation algorithm assigns blame by multiplying error signals with all the synaptic weights on each neuron’s axon and further downstream. However, this involves a precise, symmetric backward connectivity pattern, which is thought to be impossible in the brain. Here we demonstrate that this strong architectural constraint is not required for effective error propagation. We present a surprisingly simple mechanism that assigns blame by multiplying errors by even random synaptic weights. This mechanism can transmit teaching signals across multiple layers of neurons and performs as effectively as backpropagation on a variety of tasks. Our results help reopen questions about how the brain could use error signals and dispel long-held assumptions about algorithmic constraints on learning.
Nøkland (2016) took it from Feedback Alignment to Direct Feedback Alignment.
AFAICT it’s a randomized linear algebra trick: the output error is projected back to each hidden layer through a fixed random matrix, rather than through the transposed forward weights.
Taken up by much subsequent work (Launay, Poli, and Krzakala 2019; Refinetti et al. 2021; Webster, Choi, and Ahn 2020).
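To make the trick concrete, here is a minimal numpy sketch of DFA on a tiny two-hidden-layer MLP: the forward weights `W1`–`W3` are trained, while the fixed random matrices `B1` and `B2` carry the output error straight to each hidden layer. The shapes, the tanh nonlinearity, the squared-error loss and the learning rate are my own illustrative choices, not taken from the papers above.

```python
# Minimal sketch of Direct Feedback Alignment (DFA), illustrating the idea in
# Nøkland (2016). Hyperparameters and loss are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h, d_out, lr = 10, 32, 3, 1e-2

# Forward weights (trained) and fixed random feedback matrices (never trained).
W1 = rng.normal(0, 0.1, (d_h, d_in))
W2 = rng.normal(0, 0.1, (d_h, d_h))
W3 = rng.normal(0, 0.1, (d_out, d_h))
B1 = rng.normal(0, 0.1, (d_h, d_out))   # feedback for layer 1
B2 = rng.normal(0, 0.1, (d_h, d_out))   # feedback for layer 2

def tanh_grad(a):
    return 1.0 - np.tanh(a) ** 2

def dfa_step(x, y):
    global W1, W2, W3
    # Forward pass.
    a1 = W1 @ x; h1 = np.tanh(a1)
    a2 = W2 @ h1; h2 = np.tanh(a2)
    y_hat = W3 @ h2
    e = y_hat - y                       # output error (squared-error loss)
    # DFA: project the *output* error straight into each hidden layer
    # through a fixed random matrix, instead of back through W3.T, W2.T, ...
    delta2 = (B2 @ e) * tanh_grad(a2)
    delta1 = (B1 @ e) * tanh_grad(a1)
    W3 -= lr * np.outer(e, h2)
    W2 -= lr * np.outer(delta2, h1)
    W1 -= lr * np.outer(delta1, x)
    return 0.5 * float(e @ e)

x, y = rng.normal(size=d_in), rng.normal(size=d_out)
for _ in range(100):
    loss = dfa_step(x, y)
print(loss)
```

The surprise in the original feedback-alignment result is that during training the forward weights come to align with the fixed random feedback weights, so the random projections end up delivering useful error signals.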
2 Forward-forward
Hinton (2022):
The aim of this paper is to introduce a new learning procedure for neural networks and to demonstrate that it works well enough on a few small problems to be worth serious investigation. The Forward-Forward algorithm replaces the forward and backward passes of backpropagation by two forward passes, one with positive (i.e. real) data and the other with negative data which could be generated by the network itself. Each layer has its own objective function which is simply to have high goodness for positive data and low goodness for negative data. The sum of the squared activities in a layer can be used as the goodness but there are many other possibilities, including minus the sum of the squared activities. If the positive and negative passes can be separated in time, the negative passes can be done offline, which makes the learning much simpler in the positive pass and allows video to be pipelined through the network without ever storing activities or stopping to propagate derivatives.
See also Ren et al. (2022), What is the “forward-forward” algorithm, Geoffrey Hinton’s new AI technique?
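A minimal sketch of a single Forward-Forward layer, assuming the recipe above: goodness is the sum of squared activities, pushed above a threshold for positive data and below it for negative data via a logistic objective. The threshold, learning rate and the crude permutation-based negative data are assumptions of mine, not Hinton’s exact settings.

```python
# One Forward-Forward layer with a purely local update: no error signal ever
# flows to or from any other layer. Objective and threshold are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h, lr, theta = 20, 64, 0.03, 2.0

W = rng.normal(0, 0.1, (d_h, d_in))

def ff_layer_step(x, positive):
    """One local update for this layer only."""
    global W
    a = W @ x
    h = np.maximum(a, 0.0)               # ReLU activities
    goodness = np.sum(h ** 2)
    sign = 1.0 if positive else -1.0
    # Probability the sample is "positive", as a function of goodness.
    p = 1.0 / (1.0 + np.exp(-sign * (goodness - theta)))
    # Gradient of -log p w.r.t. W, derived by hand for this single layer
    # (local; this is *not* backprop through the whole network).
    dgoodness = -(1.0 - p) * sign         # d(-log p) / d goodness
    dh = dgoodness * 2.0 * h              # d goodness / d h = 2 h
    da = dh * (a > 0)                     # ReLU derivative
    W -= lr * np.outer(da, x)
    return goodness

x_pos = rng.normal(size=d_in)             # "real" (positive) data
x_neg = rng.permutation(x_pos)            # crude negative data
for _ in range(200):
    g_pos = ff_layer_step(x_pos, positive=True)
    g_neg = ff_layer_step(x_neg, positive=False)
print(g_pos, g_neg)
```

A full network stacks such layers, each optimising its own goodness objective on the (normalised) output of the layer below.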
3 NoProp
Li, Teh, and Pascanu (2025):
In this paper, we propose a back-propagation-free approach for training. The method is based on the denoising score matching approach that underlies [diffusion models…], enabling each layer of the neural network to be trained independently. In brief, at training time each layer is trained to predict the target label given a noisy label and the training input, while at inference time each layer takes the noisy label produced by the previous layer, and denoises it by taking a step towards the label it predicts. Of particular note is the observation that at training time the method does not even require a forward pass, hence we call our method NoProp.
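Roughly, and with heavy simplification, the scheme might look like the following sketch: each “layer” is an independent linear predictor trained to recover the clean label embedding from the input plus a noised label, and inference chains the layers as denoising steps. The Gaussian noise schedule, the linear parameterisation and the half-step update rule are assumptions of mine, not the paper’s formulation.

```python
# Rough sketch of the NoProp idea in Li, Teh, and Pascanu (2025): layers are
# trained independently as label denoisers; no forward pass through the rest
# of the network and no backprop across layers. Details are simplified.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_lab, n_layers, lr = 16, 4, 5, 1e-2
noise_levels = np.linspace(1.0, 0.1, n_layers)      # assumed noise schedule

# One independent linear predictor per layer: z_hat = A @ [x; z_noisy].
layers = [rng.normal(0, 0.1, (d_lab, d_in + d_lab)) for _ in range(n_layers)]

def train_layer(t, x, z_clean, steps=200):
    """Train layer t on its own, from (input, noised label) pairs."""
    A = layers[t]
    for _ in range(steps):
        z_noisy = z_clean + noise_levels[t] * rng.normal(size=d_lab)
        inp = np.concatenate([x, z_noisy])
        err = A @ inp - z_clean                      # squared-error residual
        A -= lr * np.outer(err, inp)                 # local gradient step
    layers[t] = A

def infer(x):
    """Start from pure noise and let each layer denoise toward its guess."""
    z = rng.normal(size=d_lab)
    for t in range(n_layers):
        inp = np.concatenate([x, z])
        z_hat = layers[t] @ inp
        z = z + 0.5 * (z_hat - z)                    # step toward prediction
    return z

x = rng.normal(size=d_in)
z_clean = np.eye(d_lab)[2]                           # one-hot label embedding
for t in range(n_layers):                            # layers trained independently
    train_layer(t, x, z_clean)
print(infer(x).round(2))
```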
4 Ensemble Kalman filter
See NN-by-EnKF.
5 Evolution strategies
If you must. See genetic programming.