Learning from ranking, learning to predict ranking
Learning preferences, ordinal regression, etc.
2020-09-16 — 2023-11-22
Wherein pairwise preferences are compared using models such as Bradley–Terry, connections to quantile regression are observed, and application to fine‑tuning language models via learning from human preferences is described.
Ordinal data is ordered, but the differences between items need not have a well-defined magnitude: I like mangoes more than apples, but how would I quantify the size of that preference?
Made famous by its role in fine-tuning language models, e.g. RLHF and kin (Zhu, Jordan, and Jiao 2023; Ziegler et al. 2019).
Connection to order statistics, probably, and presumably quantile regression.
If it is not clear already, I do not know much about this topic, I just wanted to keep track of it.
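That said, here is a minimal sketch of the Bradley–Terry idea: model the probability that item $i$ is preferred to item $j$ as $\sigma(s_i - s_j)$ for latent scores $s$, then fit the scores by maximum likelihood on observed pairwise outcomes. The synthetic data, the generic optimiser, and the tiny ridge penalty are illustrative choices of mine, not anyone's reference implementation.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit  # logistic sigmoid

rng = np.random.default_rng(0)
n_items = 4
true_scores = rng.normal(size=n_items)

# Simulate (winner, loser) pairs under the Bradley-Terry model:
# P(i beats j) = sigmoid(s_i - s_j).
comparisons = []
for _ in range(500):
    i, j = rng.choice(n_items, size=2, replace=False)
    if rng.random() < expit(true_scores[i] - true_scores[j]):
        comparisons.append((i, j))
    else:
        comparisons.append((j, i))
comparisons = np.array(comparisons)

def neg_log_likelihood(scores):
    """Negative Bradley-Terry log-likelihood, plus a tiny ridge to pin the arbitrary additive offset."""
    diffs = scores[comparisons[:, 0]] - scores[comparisons[:, 1]]
    return -np.sum(np.log(expit(diffs))) + 1e-3 * np.sum(scores ** 2)

result = minimize(neg_log_likelihood, np.zeros(n_items), method="BFGS")
# Scores are identified only up to an additive constant, so centre them for comparison.
estimated = result.x - result.x.mean()
print("true (centred):     ", np.round(true_scores - true_scores.mean(), 2))
print("estimated (centred):", np.round(estimated, 2))
```

The same logistic pairwise likelihood, with a neural reward model standing in for the score vector, is what turns up in the RLHF reward-modelling step.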
1 Learning with inconsistent preferences
Do we need an ordinal ranking to be consistent? Arrow’s theorem might remind us of an important case in which preferences are not consistent: when they are the preferences of many people in aggregate, i.e., when we are voting. Some algorithms are designed to learn from such inconsistent or aggregated preferences (Adachi et al. 2023; Chau, Gonzalez, and Sejdinovic 2022).
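A tiny worked example of the aggregation problem: three voters, each with perfectly transitive preferences, whose majority-vote aggregate is cyclic (the classic Condorcet cycle). The voters and items here are made up.

```python
from itertools import combinations

# Each ranking lists items from most to least preferred.
voters = [
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["C", "A", "B"],
]

def majority_prefers(x, y):
    """True if a strict majority of voters rank x above y."""
    votes_for_x = sum(r.index(x) < r.index(y) for r in voters)
    return votes_for_x > len(voters) / 2

for x, y in combinations("ABC", 2):
    winner, loser = (x, y) if majority_prefers(x, y) else (y, x)
    print(f"majority prefers {winner} over {loser}")
# Prints: A over B, C over A, B over C -- a cycle, so no consistent aggregate ranking exists.
```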
2 Incoming
- Bradley–Terry model
- Discrete choice models
- Ordinal regression (see the sketch after this list)
- OpenAI, Learning from human preferences
- Connection to contrastive estimation, which has a ranking variant
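For my own reference, a minimal cumulative-link (“proportional odds”) ordinal logit sketch, written from scratch so the likelihood is explicit; the synthetic data and the log-increment parameterisation of the cutpoints are my own illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(1)
n, K = 1000, 4                          # observations, ordered categories 0..K-1
X = rng.normal(size=(n, 2))
beta_true = np.array([1.0, -0.5])
cut_true = np.array([-1.0, 0.0, 1.5])   # K-1 ordered cutpoints

# Latent-variable formulation: y counts how many cutpoints lie below x.beta + logistic noise.
latent = X @ beta_true + rng.logistic(size=n)
y = (latent[:, None] > cut_true[None, :]).sum(axis=1)

def unpack(params):
    """Map unconstrained params to (beta, strictly increasing cutpoints)."""
    beta = params[:2]
    raw = params[2:]
    cuts = np.cumsum(np.concatenate([[raw[0]], np.exp(raw[1:])]))
    return beta, cuts

def neg_log_likelihood(params):
    beta, cuts = unpack(params)
    eta = X @ beta
    # Cumulative probabilities P(y <= k | x), padded with 0 and 1 at the ends.
    cum = expit(cuts[None, :] - eta[:, None])
    cum = np.hstack([np.zeros((n, 1)), cum, np.ones((n, 1))])
    probs = cum[np.arange(n), y + 1] - cum[np.arange(n), y]
    return -np.sum(np.log(np.clip(probs, 1e-12, None)))

result = minimize(neg_log_likelihood, np.zeros(2 + K - 1), method="BFGS")
beta_hat, cuts_hat = unpack(result.x)
print("beta estimate:", np.round(beta_hat, 2), "true:", beta_true)
print("cutpoints:    ", np.round(cuts_hat, 2), "true:", cut_true)
```

The latent-variable trick here (a continuous score pushed through ordered thresholds) is one way to see the link back to the pairwise-comparison models above.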