Backprop-free methods for training neural networks
2022-09-20 — 2025-07-22
Methods for training neural networks without reverse-mode autodiff (a.k.a. Backprop) and SGD. Once common, now rare, because Backprop is terrifyingly effective.
Why might we care about alternatives? Many reasons. For one, neural networks trained without backprop are “more” biologically plausible. For another, what if I want to handle non-differentiable parameters?
1 Direct feedback alignment
This line of work started with Lillicrap et al. (2016):
The brain processes information through multiple layers of neurons. This deep architecture is representationally powerful, but complicates learning because it is difficult to identify the responsible neurons when a mistake is made. In machine learning, the backpropagation algorithm assigns blame by multiplying error signals with all the synaptic weights on each neuron’s axon and further downstream. However, this involves a precise, symmetric backward connectivity pattern, which is thought to be impossible in the brain. Here we demonstrate that this strong architectural constraint is not required for effective error propagation. We present a surprisingly simple mechanism that assigns blame by multiplying errors by even random synaptic weights. This mechanism can transmit teaching signals across multiple layers of neurons and performs as effectively as backpropagation on a variety of tasks. Our results help reopen questions about how the brain could use error signals and dispel long-held assumptions about algorithmic constraints on learning.
Nøkland (2016) took it from Feedback Alignment to Direct Feedback Alignment.
AFAICT it’s a randomized linear algebra trick: the output error is projected back to each hidden layer through a fixed random matrix, rather than through the transposed forward weights.
Taken up by much subsequent work (Launay, Poli, and Krzakala 2019; Refinetti et al. 2021; Webster, Choi, and Ahn 2020).
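To make the trick concrete, here is a minimal numpy sketch of DFA on a tiny two-hidden-layer MLP: the forward weights `W1`–`W3` are trained, while the fixed random matrices `B1` and `B2` carry the output error straight to each hidden layer. The shapes, the tanh nonlinearity, the squared-error loss and the learning rate are my own illustrative choices, not taken from the papers above.

```python
# Minimal sketch of Direct Feedback Alignment (DFA), illustrating the idea in
# Nøkland (2016). Hyperparameters and loss are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h, d_out, lr = 10, 32, 3, 1e-2

# Forward weights (trained) and fixed random feedback matrices (never trained).
W1 = rng.normal(0, 0.1, (d_h, d_in))
W2 = rng.normal(0, 0.1, (d_h, d_h))
W3 = rng.normal(0, 0.1, (d_out, d_h))
B1 = rng.normal(0, 0.1, (d_h, d_out))   # feedback for layer 1
B2 = rng.normal(0, 0.1, (d_h, d_out))   # feedback for layer 2

def tanh_grad(a):
    return 1.0 - np.tanh(a) ** 2

def dfa_step(x, y):
    global W1, W2, W3
    # Forward pass.
    a1 = W1 @ x; h1 = np.tanh(a1)
    a2 = W2 @ h1; h2 = np.tanh(a2)
    y_hat = W3 @ h2
    e = y_hat - y                       # output error (squared-error loss)
    # DFA: project the *output* error straight into each hidden layer
    # through a fixed random matrix, instead of back through W3.T, W2.T, ...
    delta2 = (B2 @ e) * tanh_grad(a2)
    delta1 = (B1 @ e) * tanh_grad(a1)
    W3 -= lr * np.outer(e, h2)
    W2 -= lr * np.outer(delta2, h1)
    W1 -= lr * np.outer(delta1, x)
    return 0.5 * float(e @ e)

x, y = rng.normal(size=d_in), rng.normal(size=d_out)
for _ in range(100):
    loss = dfa_step(x, y)
print(loss)
```

The surprise in the original feedback-alignment result is that during training the forward weights come to align with the fixed random feedback weights, so the random projections end up delivering useful error signals.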
2 Forward-forward
Hinton (2022):
The aim of this paper is to introduce a new learning procedure for neural networks and to demonstrate that it works well enough on a few small problems to be worth serious investigation. The Forward-Forward algorithm replaces the forward and backward passes of backpropagation by two forward passes, one with positive (i.e. real) data and the other with negative data which could be generated by the network itself. Each layer has its own objective function which is simply to have high goodness for positive data and low goodness for negative data. The sum of the squared activities in a layer can be used as the goodness but there are many other possibilities, including minus the sum of the squared activities. If the positive and negative passes can be separated in time, the negative passes can be done offline, which makes the learning much simpler in the positive pass and allows video to be pipelined through the network without ever storing activities or stopping to propagate derivatives.
See also Ren et al. (2022), What is the “forward-forward” algorithm, Geoffrey Hinton’s new AI technique?
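A minimal sketch of a single Forward-Forward layer, assuming the recipe above: goodness is the sum of squared activities, pushed above a threshold for positive data and below it for negative data via a logistic objective. The threshold, learning rate and the crude permutation-based negative data are assumptions of mine, not Hinton’s exact settings.

```python
# One Forward-Forward layer with a purely local update: no error signal ever
# flows to or from any other layer. Objective and threshold are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h, lr, theta = 20, 64, 0.03, 2.0

W = rng.normal(0, 0.1, (d_h, d_in))

def ff_layer_step(x, positive):
    """One local update for this layer only."""
    global W
    a = W @ x
    h = np.maximum(a, 0.0)               # ReLU activities
    goodness = np.sum(h ** 2)
    sign = 1.0 if positive else -1.0
    # Probability the sample is "positive", as a function of goodness.
    p = 1.0 / (1.0 + np.exp(-sign * (goodness - theta)))
    # Gradient of -log p w.r.t. W, derived by hand for this single layer
    # (local; this is *not* backprop through the whole network).
    dgoodness = -(1.0 - p) * sign         # d(-log p) / d goodness
    dh = dgoodness * 2.0 * h              # d goodness / d h = 2 h
    da = dh * (a > 0)                     # ReLU derivative
    W -= lr * np.outer(da, x)
    return goodness

x_pos = rng.normal(size=d_in)             # "real" (positive) data
x_neg = rng.permutation(x_pos)            # crude negative data
for _ in range(200):
    g_pos = ff_layer_step(x_pos, positive=True)
    g_neg = ff_layer_step(x_neg, positive=False)
print(g_pos, g_neg)
```

A full network stacks such layers, each optimising its own goodness objective on the (normalised) output of the layer below.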
3 NoProp
Li, Teh, and Pascanu (2025):
In this paper, we propose a back-propagation-free approach for training. The method is based on the denoising score matching approach that underlies [diffusion models…], enabling each layer of the neural network to be trained independently. In brief, at training time each layer is trained to predict the target label given a noisy label and the training input, while at inference time each layer takes the noisy label produced by the previous layer, and denoises it by taking a step towards the label it predicts. Of particular note is the observation that at training time the method does not even require a forward pass, hence we call our method NoProp.
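Roughly, and with heavy simplification, the scheme might look like the following sketch: each “layer” is an independent linear predictor trained to recover the clean label embedding from the input plus a noised label, and inference chains the layers as denoising steps. The Gaussian noise schedule, the linear parameterisation and the half-step update rule are assumptions of mine, not the paper’s formulation.

```python
# Rough sketch of the NoProp idea in Li, Teh, and Pascanu (2025): layers are
# trained independently as label denoisers; no forward pass through the rest
# of the network and no backprop across layers. Details are simplified.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_lab, n_layers, lr = 16, 4, 5, 1e-2
noise_levels = np.linspace(1.0, 0.1, n_layers)      # assumed noise schedule

# One independent linear predictor per layer: z_hat = A @ [x; z_noisy].
layers = [rng.normal(0, 0.1, (d_lab, d_in + d_lab)) for _ in range(n_layers)]

def train_layer(t, x, z_clean, steps=200):
    """Train layer t on its own, from (input, noised label) pairs."""
    A = layers[t]
    for _ in range(steps):
        z_noisy = z_clean + noise_levels[t] * rng.normal(size=d_lab)
        inp = np.concatenate([x, z_noisy])
        err = A @ inp - z_clean                      # squared-error residual
        A -= lr * np.outer(err, inp)                 # local gradient step
    layers[t] = A

def infer(x):
    """Start from pure noise and let each layer denoise toward its guess."""
    z = rng.normal(size=d_lab)
    for t in range(n_layers):
        inp = np.concatenate([x, z])
        z_hat = layers[t] @ inp
        z = z + 0.5 * (z_hat - z)                    # step toward prediction
    return z

x = rng.normal(size=d_in)
z_clean = np.eye(d_lab)[2]                           # one-hot label embedding
for t in range(n_layers):                            # layers trained independently
    train_layer(t, x, z_clean)
print(infer(x).round(2))
```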
4 Ensemble Kalman filter
See NN-by-EnKF.
5 Evolution strategies
If you must. See genetic programming.