Learning from ranking, learning to predict ranking
Learning preferences, ordinal regression, etc.
2020-09-16 — 2023-11-22
Wherein pairwise preferences are compared using models such as Bradley–Terry, connections to quantile regression are observed, and application to fine‑tuning language models via learning from human preferences is described.
Ordinal data is ordered, but the differences between items need not have a well-defined magnitude: I like mangoes more than apples, but how would I quantify the size of that preference?
Made famous by its role in fine-tuning language models, e.g. RLHF and kin (Zhu, Jordan, and Jiao 2023; Ziegler et al. 2019).
Connection to order statistics, probably, and presumably quantile regression.
If it is not clear already, I do not know much about this topic, I just wanted to keep track of it.
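That said, here is a minimal sketch of the Bradley–Terry idea: model the probability that item $i$ is preferred to item $j$ as $\sigma(s_i - s_j)$ for latent scores $s$, then fit the scores by maximum likelihood on observed pairwise outcomes. The synthetic data, the generic optimiser, and the tiny ridge penalty are illustrative choices of mine, not anyone's reference implementation.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit  # logistic sigmoid

rng = np.random.default_rng(0)
n_items = 4
true_scores = rng.normal(size=n_items)

# Simulate (winner, loser) pairs under the Bradley-Terry model:
# P(i beats j) = sigmoid(s_i - s_j).
comparisons = []
for _ in range(500):
    i, j = rng.choice(n_items, size=2, replace=False)
    if rng.random() < expit(true_scores[i] - true_scores[j]):
        comparisons.append((i, j))
    else:
        comparisons.append((j, i))
comparisons = np.array(comparisons)

def neg_log_likelihood(scores):
    """Negative Bradley-Terry log-likelihood, plus a tiny ridge to pin the arbitrary additive offset."""
    diffs = scores[comparisons[:, 0]] - scores[comparisons[:, 1]]
    return -np.sum(np.log(expit(diffs))) + 1e-3 * np.sum(scores ** 2)

result = minimize(neg_log_likelihood, np.zeros(n_items), method="BFGS")
# Scores are identified only up to an additive constant, so centre them for comparison.
estimated = result.x - result.x.mean()
print("true (centred):     ", np.round(true_scores - true_scores.mean(), 2))
print("estimated (centred):", np.round(estimated, 2))
```

The same logistic pairwise likelihood, with a neural reward model standing in for the score vector, is what turns up in the RLHF reward-modelling step.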
1 Learning with inconsistent preferences
Do we need an ordinal ranking to be consistent? Arrow’s theorem might remind us of an important case in which preferences are not consistent: when they are the preferences of many people in aggregate, i.e., when we are voting. Some algorithms are designed to learn from such inconsistent or aggregated preferences (Adachi et al. 2023; Chau, Gonzalez, and Sejdinovic 2022).
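A tiny worked example of the aggregation problem: three voters, each with perfectly transitive preferences, whose majority-vote aggregate is cyclic (the classic Condorcet cycle). The voters and items here are made up.

```python
from itertools import combinations

# Each ranking lists items from most to least preferred.
voters = [
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["C", "A", "B"],
]

def majority_prefers(x, y):
    """True if a strict majority of voters rank x above y."""
    votes_for_x = sum(r.index(x) < r.index(y) for r in voters)
    return votes_for_x > len(voters) / 2

for x, y in combinations("ABC", 2):
    winner, loser = (x, y) if majority_prefers(x, y) else (y, x)
    print(f"majority prefers {winner} over {loser}")
# Prints: A over B, C over A, B over C -- a cycle, so no consistent aggregate ranking exists.
```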
2 Incoming
- Bradley–Terry model
- Discrete choice models
- Ordinal regression (see the sketch after this list)
- OpenAI, Learning from human preferences
- Connection to contrastive estimation, which has a ranking variant
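For my own reference, a minimal cumulative-link (“proportional odds”) ordinal logit sketch, written from scratch so the likelihood is explicit; the synthetic data and the log-increment parameterisation of the cutpoints are my own illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(1)
n, K = 1000, 4                          # observations, ordered categories 0..K-1
X = rng.normal(size=(n, 2))
beta_true = np.array([1.0, -0.5])
cut_true = np.array([-1.0, 0.0, 1.5])   # K-1 ordered cutpoints

# Latent-variable formulation: y counts how many cutpoints lie below x.beta + logistic noise.
latent = X @ beta_true + rng.logistic(size=n)
y = (latent[:, None] > cut_true[None, :]).sum(axis=1)

def unpack(params):
    """Map unconstrained params to (beta, strictly increasing cutpoints)."""
    beta = params[:2]
    raw = params[2:]
    cuts = np.cumsum(np.concatenate([[raw[0]], np.exp(raw[1:])]))
    return beta, cuts

def neg_log_likelihood(params):
    beta, cuts = unpack(params)
    eta = X @ beta
    # Cumulative probabilities P(y <= k | x), padded with 0 and 1 at the ends.
    cum = expit(cuts[None, :] - eta[:, None])
    cum = np.hstack([np.zeros((n, 1)), cum, np.ones((n, 1))])
    probs = cum[np.arange(n), y + 1] - cum[np.arange(n), y]
    return -np.sum(np.log(np.clip(probs, 1e-12, None)))

result = minimize(neg_log_likelihood, np.zeros(2 + K - 1), method="BFGS")
beta_hat, cuts_hat = unpack(result.x)
print("beta estimate:", np.round(beta_hat, 2), "true:", beta_true)
print("cutpoints:    ", np.round(cuts_hat, 2), "true:", cut_true)
```

The latent-variable trick here (a continuous score pushed through ordered thresholds) is one way to see the link back to the pairwise-comparison models above.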