Distributed optimization for regression



How do you design statistical procedures that can be computed across many nodes? Many algorithms factorise nicely over nodes: in particular, any objective that is a sum of per-node losses has a gradient that is a sum of per-node gradients. I might list some here. An obvious one is message passing.
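To make the factorisation concrete, here is a minimal sketch of distributed gradient descent for least-squares regression, where each hypothetical node computes the gradient on its own data shard and a coordinator sums the results. The shard layout, step size, and iteration count are illustrative assumptions, not a tuned implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=120)

def local_gradient(X_k, y_k, w):
    # Each node needs only its own shard to compute its gradient term.
    return X_k.T @ (X_k @ w - y_k)

w = np.zeros(5)
shards = np.array_split(np.arange(120), 4)  # 4 hypothetical nodes
for _ in range(500):
    g = sum(local_gradient(X[idx], y[idx], w) for idx in shards)
    w -= 0.001 * g  # the coordinator applies the summed gradient

# The objective is a sum over shards, so the distributed gradient
# matches the centralised one (up to floating-point rounding).
full_g = X.T @ (X @ w - y)
dist_g = sum(local_gradient(X[idx], y[idx], w) for idx in shards)
```

Only gradients, never raw data, cross node boundaries here, which is the basic appeal of this decomposition.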

If you wish to solve this with heterogeneous, untrustworthy, or ad hoc nodes, as opposed to a nice orderly campus HPC cluster, then it might be better to think of this as swarm sensing.
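One building block that needs no coordinator at all is randomised gossip averaging: peers repeatedly average pairwise, and every local estimate converges to the global mean. A minimal sketch, assuming a complete communication graph and honest (if ad hoc) peers; the node count and iteration budget are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
# Each of 8 ad hoc nodes holds a noisy local estimate of some quantity.
estimates = rng.normal(loc=3.0, scale=1.0, size=8)
target = estimates.mean()  # what a central averager would compute

# Randomised gossip: repeatedly pick two peers and average them.
# The sum (hence the mean) is conserved at every step, while the
# spread across nodes shrinks geometrically in expectation.
for _ in range(2000):
    i, j = rng.choice(8, size=2, replace=False)
    m = 0.5 * (estimates[i] + estimates[j])
    estimates[i] = estimates[j] = m
```

Note that plain gossip offers no protection against adversarial nodes; robustifying it is part of what makes the swarm setting harder.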

This is mostly a placeholder; I have little to say about it right now, although I should mention that message-passing algorithms based on variational inference and graphical models are one possible avenue.
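Another classic avenue is consensus ADMM in the style of Boyd (2010), which splits a regularised regression across nodes: each node solves a small local subproblem, and only local solutions and dual variables are exchanged. A rough sketch for a distributed lasso; the data, penalty λ, step ρ, and fixed iteration count are my own illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
n_nodes, d = 4, 6
w_true = np.array([1.5, 0.0, -2.0, 0.0, 0.7, 0.0])
# Each node holds its own shard of the regression data.
Xs = [rng.normal(size=(50, d)) for _ in range(n_nodes)]
ys = [X @ w_true + 0.01 * rng.normal(size=50) for X in Xs]

rho, lam = 50.0, 1.0  # rho chosen roughly at the scale of X_k^T X_k
z = np.zeros(d)
us = [np.zeros(d) for _ in range(n_nodes)]

for _ in range(200):
    # Local step: each node solves a small ridge-like subproblem.
    ws = [np.linalg.solve(X.T @ X + rho * np.eye(d),
                          X.T @ y + rho * (z - u))
          for X, y, u in zip(Xs, ys, us)]
    # Global step: average the local solutions, then soft-threshold.
    avg = np.mean([w + u for w, u in zip(ws, us)], axis=0)
    kappa = lam / (rho * n_nodes)
    z = np.sign(avg) * np.maximum(np.abs(avg) - kappa, 0.0)
    # Dual step: each node tracks its disagreement with the consensus.
    us = [u + w - z for u, w in zip(us, ws)]
```

The consensus iterate `z` should recover the sparse support of `w_true`, and again no raw data ever leaves a node.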

Tools

Apache Spark.

CoCoA, the communication-efficient distributed primal-dual framework (Jaggi et al. 2014; Ma et al. 2015; Smith et al. 2015).

References

Acemoglu, Daron, Victor Chernozhukov, and Muhamet Yildiz. 2006. β€œLearning and Disagreement in an Uncertain World.” Working Paper 12648. National Bureau of Economic Research.
Battey, Heather, Jianqing Fan, Han Liu, Junwei Lu, and Ziwei Zhu. 2015. β€œDistributed Estimation and Inference with Statistical Guarantees.” arXiv:1509.05457 [Math, Stat], September.
Bianchi, P., and J. Jakubowicz. 2013. β€œConvergence of a Multi-Agent Projected Stochastic Gradient Algorithm for Non-Convex Optimization.” IEEE Transactions on Automatic Control 58 (2): 391–405.
Bieniawski, Stefan, and David H. Wolpert. 2004. β€œAdaptive, Distributed Control of Constrained Multi-Agent Systems.” In Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems-Volume 3, 4:1230–31. IEEE Computer Society.
Bottou, LΓ©on, Frank E. Curtis, and Jorge Nocedal. 2016. β€œOptimization Methods for Large-Scale Machine Learning.” arXiv:1606.04838 [Cs, Math, Stat], June.
Boyd, Stephen, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. 2011. β€œDistributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers.” Foundations and Trends in Machine Learning 3 (1): 1–122. Now Publishers.
Calderhead, Ben. 2014. β€œA General Construction for Parallelizing Metropolis–Hastings Algorithms.” Proceedings of the National Academy of Sciences 111 (49): 17408–13.
Christ, Maximilian, Andreas W. Kempa-Liehr, and Michael Feindt. 2016. β€œDistributed and Parallel Time Series Feature Extraction for Industrial Big Data Applications.” arXiv:1610.07717 [Cs], October.
Gaines, Brian R., and Hua Zhou. 2016. β€œAlgorithms for Fitting the Constrained Lasso.” arXiv:1611.01511 [Stat], October.
Heinze, Christina, Brian McWilliams, and Nicolai Meinshausen. 2016. β€œDUAL-LOCO: Distributing Statistical Estimation Using Random Projections.” In, 875–83.
Heinze, Christina, Brian McWilliams, Nicolai Meinshausen, and Gabriel Krummenacher. 2014. β€œLOCO: Distributing Ridge Regression with Random Projections.” arXiv:1406.3469 [Stat], June.
Jaggi, Martin, Virginia Smith, Martin Takac, Jonathan Terhorst, Sanjay Krishnan, Thomas Hofmann, and Michael I Jordan. 2014. β€œCommunication-Efficient Distributed Dual Coordinate Ascent.” In Advances in Neural Information Processing Systems 27, edited by Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, 3068–76. Curran Associates, Inc.
Jain, Prateek, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli, and Aaron Sidford. 2016. β€œParallelizing Stochastic Approximation Through Mini-Batching and Tail-Averaging.” arXiv:1610.03774 [Cs, Stat], October.
Krummenacher, Gabriel, Brian McWilliams, Yannic Kilcher, Joachim M Buhmann, and Nicolai Meinshausen. 2016. β€œScalable Adaptive Stochastic Optimization Using Random Projections.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 1750–58. Curran Associates, Inc.
Ma, Chenxin, Jakub Konečný, Martin Jaggi, Virginia Smith, Michael I. Jordan, Peter RichtΓ‘rik, and Martin TakÑč. 2015. β€œDistributed Optimization with Arbitrary Local Solvers.” arXiv Preprint arXiv:1512.04039.
Ma, Chenxin, Virginia Smith, Martin Jaggi, Michael I. Jordan, Peter RichtΓ‘rik, and Martin TakÑč. 2015. β€œAdding vs. Averaging in Distributed Primal-Dual Optimization.” arXiv:1502.03508 [Cs], February.
Mann, Richard P., and Dirk Helbing. 2016. β€œMinorities Report: Optimal Incentives for Collective Intelligence.” arXiv:1611.03899 [Cs, Math, Stat], November.
Mateos, G., J. A. Bazerque, and G. B. Giannakis. 2010. β€œDistributed Sparse Linear Regression.” IEEE Transactions on Signal Processing 58 (10): 5262–76.
McLachlan, Geoffrey J., and T. Krishnan. 2008. The EM Algorithm and Extensions. Hoboken, NJ: Wiley-Interscience.
Nathan, Alexandros, and Diego Klabjan. 2016. β€œOptimization for Large-Scale Machine Learning with Distributed Features and Observations.” arXiv:1610.10060 [Cs, Stat], October.
Shalev-Shwartz, Shai, and Tong Zhang. 2013. β€œStochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization.” Journal of Machine Learning Research 14 (Feb): 567–99.
Shamir, Ohad, Nathan Srebro, and Tong Zhang. 2014. β€œCommunication-Efficient Distributed Optimization Using an Approximate Newton-Type Method.” In ICML, 32:1000–1008.
Smith, Virginia, Simone Forte, Michael I. Jordan, and Martin Jaggi. 2015. β€œL1-Regularized Distributed Optimization: A Communication-Efficient Primal-Dual Framework.” arXiv:1512.04011 [Cs], December.
Thanei, Gian-Andrea, Christina Heinze, and Nicolai Meinshausen. 2017. β€œRandom Projections For Large-Scale Regression.” arXiv:1701.05325 [Math, Stat], January.
Trofimov, Ilya, and Alexander Genkin. 2015. β€œDistributed Coordinate Descent for L1-Regularized Logistic Regression.” In Analysis of Images, Social Networks and Texts, edited by Mikhail Yu Khachay, Natalia Konstantinova, Alexander Panchenko, Dmitry I. Ignatov, and Valeri G. Labunets, 243–54. Communications in Computer and Information Science 542. Springer International Publishing.
β€”β€”β€”. 2016. β€œDistributed Coordinate Descent for Generalized Linear Models with Regularization.” arXiv:1611.02101 [Cs, Stat], November.
Yang, Tianbao. 2013. β€œTrading Computation for Communication: Distributed Stochastic Dual Coordinate Ascent.” In Advances in Neural Information Processing Systems, 629–37.
