Learning on tabular data

2020-11-30 — 2021-06-21

Wherein tabular data is presented as the common substrate for models, and gradient boosting machines are recommended as the practical recourse, while neural PFNs are noted for recent in‑context feats.

Figure 1

Machine learning on tabular data, i.e. the stuff you generally store in spreadsheets and relational databases: rows of records with columns of mixed numeric and categorical features.

Popular in many areas, notably recommender systems.

Wide & Deep models (Cheng et al. 2016) are one option. See jrzaurin/pytorch-widedeep, a flexible package to combine tabular data with text and images using Wide and Deep models in PyTorch, and the accompanying post pytorch-widedeep, deep learning for tabular data IV: Deep Learning vs LightGBM.
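The Wide & Deep architecture of Cheng et al. (2016) sums two logits before a single sigmoid: a linear "wide" model over sparse features and their cross-products, which memorizes specific feature co-occurrences, and a "deep" feed-forward network over learned embeddings of the categorical columns, which generalizes to unseen combinations. Below is a minimal forward-pass sketch in plain Python, assuming all-categorical inputs, pairwise crosses of every column, one hidden layer and untrained random weights. The class `WideAndDeep` and its parameters are hypothetical illustrations, not the pytorch-widedeep API.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class WideAndDeep:
    """Hypothetical toy Wide & Deep binary classifier: forward pass only, untrained."""

    def __init__(self, n_categories, emb_dim=4, hidden=8, seed=0):
        rng = random.Random(seed)
        # Deep part: one embedding table per categorical column ...
        self.emb = [
            [[rng.gauss(0.0, 0.1) for _ in range(emb_dim)] for _ in range(n)]
            for n in n_categories
        ]
        # ... feeding a single ReLU hidden layer and a linear output unit.
        d_in = emb_dim * len(n_categories)
        self.w_hidden = [[rng.gauss(0.0, 0.1) for _ in range(d_in)] for _ in range(hidden)]
        self.w_out = [rng.gauss(0.0, 0.1) for _ in range(hidden)]
        # Wide part: sparse linear weights keyed by feature; missing keys count as 0.
        self.wide = {}
        self.bias = 0.0

    @staticmethod
    def wide_features(cats):
        # Raw one-hot features (column, value) plus all pairwise crosses
        # ((col_i, col_j), (value_i, value_j)), i.e. conjunctions of one-hot indicators.
        feats = [(i, c) for i, c in enumerate(cats)]
        feats += [
            ((i, j), (cats[i], cats[j]))
            for i in range(len(cats))
            for j in range(i + 1, len(cats))
        ]
        return feats

    def predict_proba(self, cats):
        wide_logit = sum(self.wide.get(f, 0.0) for f in self.wide_features(cats))
        # Concatenate the embeddings of each column's category index.
        x = [v for i, c in enumerate(cats) for v in self.emb[i][c]]
        h = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in self.w_hidden]
        deep_logit = sum(w * hi for w, hi in zip(self.w_out, h))
        # Joint prediction: one sigmoid over the summed wide and deep logits.
        return sigmoid(wide_logit + deep_logit + self.bias)


model = WideAndDeep(n_categories=[3, 4])
p_before = model.predict_proba([1, 2])
model.wide[((0, 1), (1, 2))] = 5.0  # hand-set weight on the cross "col0=1 AND col1=2"
p_after = model.predict_proba([1, 2])
```

Setting one cross-product weight by hand shows the memorizing role of the wide part: only rows with exactly that pair of category values are affected, while the deep part is shared across all rows. In a real model both parts are trained jointly.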

Note that the author of pytorch-widedeep nonetheless advises using gradient boosting machines (e.g. LightGBM) to get this job done.
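For comparison, a gradient boosting machine builds an additive model $F(x) = F_0 + \nu \sum_m h_m(x)$ stagewise, each weak learner $h_m$ regressing the negative gradient of the loss at the current predictions; for squared error that negative gradient is simply the residual. The sketch below assumes squared-error loss, depth-one regression stumps and a shrinkage rate $\nu$, and all function names are hypothetical. Production libraries such as LightGBM or XGBoost add deeper trees, histogram-based split finding, second-order gradient information and regularization, none of which appear here.

```python
def fit_stump(X, r):
    """Least-squares regression stump (one split on one feature) fitted to targets r."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:
        # No valid split (every feature constant): predict the mean everywhere.
        m = sum(r) / len(r)
        return (0, float("inf"), m, m)
    _, j, t, lm, rm = best
    return (j, t, lm, rm)


def predict_stump(stump, row):
    j, t, lm, rm = stump
    return lm if row[j] <= t else rm


def fit_gbm(X, y, n_rounds=100, learning_rate=0.1):
    """Gradient boosting for squared-error loss with stump base learners."""
    f0 = sum(y) / len(y)  # constant initial model minimizing squared error
    pred = [f0] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of 0.5 * (y - F)^2 with respect to F is the residual y - F.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        # Shrunken stagewise update of the current predictions.
        pred = [pi + learning_rate * predict_stump(stump, row) for pi, row in zip(pred, X)]
    return f0, stumps, learning_rate


def predict_gbm(model, row):
    f0, stumps, learning_rate = model
    return f0 + learning_rate * sum(predict_stump(s, row) for s in stumps)


X = [[float(i)] for i in range(10)]
y = [0.0] * 5 + [1.0] * 5
model = fit_gbm(X, y)
```

On this step-function toy every round picks the same split and shrinks the remaining residual by a factor of $1 - \nu = 0.9$, so after 100 rounds the predictions sit within about $0.5 \times 0.9^{100}$ of the targets.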

Recently, in-context inference with prior-data fitted networks (PFNs), notably TabPFN (Hollmann et al. 2023, 2025), has had much-hyped success on tabular data.
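The in-context idea can be illustrated with a deliberately crude stand-in. A PFN is a transformer pre-trained on many synthetic datasets sampled from a prior; at prediction time the labelled training set is fed in as context alongside the test rows, and predictive probabilities come out of a forward pass with no gradient steps on the new data. The toy below keeps only that interface: the "network" is a single fixed softmax-attention step whose scores are negative squared distances (effectively a kernel smoother), with no pre-training at all, so it says nothing about why TabPFN works well. The function name and its `temperature` parameter are hypothetical.

```python
import math


def in_context_predict_proba(context_X, context_y, query_X, n_classes, temperature=1.0):
    """Toy in-context classifier: one fixed softmax-attention step, no trained weights.

    The labelled context set is an input, not something fitted beforehand. Each
    query attends to every context row with score -||q - x||^2 / temperature and
    returns the attention-weighted average of one-hot context labels.
    """
    probs_out = []
    for q in query_X:
        scores = [-sum((a - b) ** 2 for a, b in zip(q, x)) / temperature for x in context_X]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        probs = [0.0] * n_classes
        for w, label in zip(weights, context_y):
            probs[label] += w / total
        probs_out.append(probs)
    return probs_out


context_X = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
context_y = [0, 0, 1, 1]
query_X = [[0.05, 0.0], [0.95, 1.0]]
probs = in_context_predict_proba(context_X, context_y, query_X, n_classes=2, temperature=0.1)
```

Appending a labelled row to the context changes later predictions immediately, with nothing retrained; that is the sense in which, for a PFN, learning from new data amounts to conditioning on it.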

1 Incoming

2 References

Arik, and Pfister. 2019. “TabNet: Attentive Interpretable Tabular Learning.”
Cheng, Koc, Harmsen, et al. 2016. “Wide & Deep Learning for Recommender Systems.” arXiv:1606.07792 [Cs, Stat].
Gorishniy, Rubachev, Khrulkov, et al. 2023. “Revisiting Deep Learning Models for Tabular Data.”
Grinsztajn, Oyallon, and Varoquaux. 2022. “Why Do Tree-Based Models Still Outperform Deep Learning on Tabular Data?”
Hollmann, Müller, Eggensperger, et al. 2023. “TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second.”
Hollmann, Müller, Purucker, et al. 2025. “Accurate Predictions on Small Data with a Tabular Foundation Model.” Nature.
Montanari, and Weiner. 2023. “Compressing Tabular Data via Latent Variable Estimation.”
Richetti, Diakogianis, Bender, et al. 2023. “A Methods Guideline for Deep Learning for Tabular Data in Agriculture with a Case Study to Forecast Cereal Yield.” Computers and Electronics in Agriculture.
Shwartz-Ziv, and Armon. 2021. “Tabular Data: Deep Learning Is Not All You Need.” arXiv:2106.03253 [Cs].