Distributed optimization for regression

How do you design statistical procedures that can be carried out over many nodes? Many algorithms factorise nicely over nodes; I might list some here. An obvious one is message passing.
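As a concrete instance of an estimator that factorises exactly, consider ordinary least squares: each node only needs to ship its small Gram matrix and cross-product vector, never its raw data. A minimal sketch (the function names and the coordinator/worker split are my own illustration, not any particular library's API):

```python
import numpy as np

def local_stats(X, y):
    """Each node reduces its data shard to O(d^2) sufficient statistics."""
    return X.T @ X, X.T @ y

def solve_from_stats(stats):
    """A coordinator sums the per-node Gram matrices and cross-products,
    then solves the pooled normal equations -- no raw data is shipped."""
    G = sum(G_i for G_i, _ in stats)
    c = sum(c_i for _, c_i in stats)
    return np.linalg.solve(G, c)

# Simulate four nodes, each holding a shard of a common regression problem.
rng = np.random.default_rng(0)
beta_true = np.array([1.0, -2.0, 0.5])
shards = []
for _ in range(4):
    X = rng.normal(size=(50, 3))
    y = X @ beta_true + 0.01 * rng.normal(size=50)
    shards.append((X, y))

beta_hat = solve_from_stats([local_stats(X, y) for X, y in shards])
```

The communication cost here is O(d²) per node regardless of shard size, which is why this trick is the baseline against which fancier schemes (random projections, dual coordinate ascent) are usually compared.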

If you wish to solve this with heterogeneous, untrustworthy or ad hoc nodes, as opposed to a nice orderly campus HPC cluster, then perhaps it would be better to think of this as swarm sensing.

Placeholder; I have little to say about this right now, except to note that message-passing algorithms built on variational inference and graphical models are one possible avenue.
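Another well-trodden avenue, from the ADMM literature cited below (Boyd 2010), is global-variable consensus: each node repeatedly solves a small local regression, and the nodes agree on a shared parameter only through an averaging step. A sketch for ridge regression (my own minimal implementation under those assumptions, not code from any cited package):

```python
import numpy as np

def consensus_ridge_admm(shards, lam=1.0, rho=1.0, iters=500):
    """Global-variable consensus ADMM for
        minimise  sum_i ||A_i x - b_i||^2 + lam * ||x||^2.
    Node i keeps a local copy xs[i] and a scaled dual us[i]; the only
    per-round communication is the average of (xs[i] + us[i])."""
    d = shards[0][0].shape[1]
    n = len(shards)
    # Each node pre-factorises its local system once, up front.
    local = [(np.linalg.inv(2 * A.T @ A + rho * np.eye(d)), 2 * A.T @ b)
             for A, b in shards]
    xs = np.zeros((n, d))
    us = np.zeros((n, d))
    z = np.zeros(d)
    for _ in range(iters):
        for i, (Minv, q) in enumerate(local):
            xs[i] = Minv @ (q + rho * (z - us[i]))  # local x-update
        avg = (xs + us).mean(axis=0)                # the one communication step
        z = n * rho * avg / (2 * lam + n * rho)     # z-update: ridge shrinkage
        us += xs - z                                # dual updates, per node
    return z

# Three nodes, each with a private shard (A_i, b_i).
rng = np.random.default_rng(1)
shards = [(rng.normal(size=(40, 3)), rng.normal(size=40)) for _ in range(3)]
z_hat = consensus_ridge_admm(shards, lam=1.0, rho=1.0, iters=500)
```

At the fixed point all local copies equal the shared `z`, which satisfies the pooled ridge normal equations; the appeal for distributed settings is that each round moves only O(d) numbers per node.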





Acemoglu, Daron, Victor Chernozhukov, and Muhamet Yildiz. 2006. “Learning and Disagreement in an Uncertain World.” Working Paper 12648. National Bureau of Economic Research. https://doi.org/10.3386/w12648.
Battey, Heather, Jianqing Fan, Han Liu, Junwei Lu, and Ziwei Zhu. 2015. “Distributed Estimation and Inference with Statistical Guarantees.” arXiv:1509.05457 [math, Stat], September. http://arxiv.org/abs/1509.05457.
Bianchi, P., and J. Jakubowicz. 2013. “Convergence of a Multi-Agent Projected Stochastic Gradient Algorithm for Non-Convex Optimization.” IEEE Transactions on Automatic Control 58 (2): 391–405. https://doi.org/10.1109/TAC.2012.2209984.
Bieniawski, Stefan, and David H. Wolpert. 2004. “Adaptive, Distributed Control of Constrained Multi-Agent Systems.” In Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems-Volume 3, 4:1230–31. IEEE Computer Society. https://ti.arc.nasa.gov/m/profile/dhw/papers/7.pdf.
Bottou, Léon, Frank E. Curtis, and Jorge Nocedal. 2016. “Optimization Methods for Large-Scale Machine Learning.” arXiv:1606.04838 [cs, Math, Stat], June. http://arxiv.org/abs/1606.04838.
Boyd, Stephen. 2010. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers. Vol. 3. Now Publishers Inc. https://doi.org/10.1561/2200000016.
Calderhead, Ben. 2014. “A General Construction for Parallelizing Metropolis−Hastings Algorithms.” Proceedings of the National Academy of Sciences 111 (49): 17408–13. https://doi.org/10.1073/pnas.1408184111.
Christ, Maximilian, Andreas W. Kempa-Liehr, and Michael Feindt. 2016. “Distributed and Parallel Time Series Feature Extraction for Industrial Big Data Applications.” arXiv:1610.07717 [cs], October. http://arxiv.org/abs/1610.07717.
Gaines, Brian R., and Hua Zhou. 2016. “Algorithms for Fitting the Constrained Lasso.” arXiv:1611.01511 [stat], October. http://arxiv.org/abs/1611.01511.
Heinze, Christina, Brian McWilliams, and Nicolai Meinshausen. 2016. “DUAL-LOCO: Distributing Statistical Estimation Using Random Projections.” In Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, 875–83. http://www.jmlr.org/proceedings/papers/v51/heinze16.html.
Heinze, Christina, Brian McWilliams, Nicolai Meinshausen, and Gabriel Krummenacher. 2014. “LOCO: Distributing Ridge Regression with Random Projections.” arXiv:1406.3469 [stat], June. http://arxiv.org/abs/1406.3469.
Jaggi, Martin, Virginia Smith, Martin Takac, Jonathan Terhorst, Sanjay Krishnan, Thomas Hofmann, and Michael I Jordan. 2014. “Communication-Efficient Distributed Dual Coordinate Ascent.” In Advances in Neural Information Processing Systems 27, edited by Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, 3068–76. Curran Associates, Inc. http://papers.nips.cc/paper/5599-communication-efficient-distributed-dual-coordinate-ascent.pdf.
Jain, Prateek, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli, and Aaron Sidford. 2016. “Parallelizing Stochastic Approximation Through Mini-Batching and Tail-Averaging.” arXiv:1610.03774 [cs, Stat], October. http://arxiv.org/abs/1610.03774.
Krummenacher, Gabriel, Brian McWilliams, Yannic Kilcher, Joachim M Buhmann, and Nicolai Meinshausen. 2016. “Scalable Adaptive Stochastic Optimization Using Random Projections.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 1750–58. Curran Associates, Inc. http://papers.nips.cc/paper/6054-scalable-adaptive-stochastic-optimization-using-random-projections.pdf.
Ma, Chenxin, Jakub Konečnỳ, Martin Jaggi, Virginia Smith, Michael I. Jordan, Peter Richtárik, and Martin Takáč. 2015. “Distributed Optimization with Arbitrary Local Solvers.” arXiv Preprint arXiv:1512.04039. http://arxiv.org/abs/1512.04039.
Ma, Chenxin, Virginia Smith, Martin Jaggi, Michael I. Jordan, Peter Richtárik, and Martin Takáč. 2015. “Adding Vs. Averaging in Distributed Primal-Dual Optimization.” arXiv:1502.03508 [cs], February. http://arxiv.org/abs/1502.03508.
Mann, Richard P., and Dirk Helbing. 2016. “Minorities Report: Optimal Incentives for Collective Intelligence.” arXiv:1611.03899 [cs, Math, Stat], November. http://arxiv.org/abs/1611.03899.
Mateos, G., J. A. Bazerque, and G. B. Giannakis. 2010. “Distributed Sparse Linear Regression.” IEEE Transactions on Signal Processing 58 (10): 5262–76. https://doi.org/10.1109/TSP.2010.2055862.
McLachlan, Geoffrey J, and T Krishnan. 2008. The EM algorithm and extensions. Hoboken, N.J.: Wiley-Interscience. http://site.ebrary.com/id/10296227.
Nathan, Alexandros, and Diego Klabjan. 2016. “Optimization for Large-Scale Machine Learning with Distributed Features and Observations.” arXiv:1610.10060 [cs, Stat], October. http://arxiv.org/abs/1610.10060.
Shalev-Shwartz, Shai, and Tong Zhang. 2013. “Stochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization.” Journal of Machine Learning Research 14 (Feb): 567–99. http://www.jmlr.org/papers/v14/shalev-shwartz13a.html.
Shamir, Ohad, Nathan Srebro, and Tong Zhang. 2014. “Communication-Efficient Distributed Optimization Using an Approximate Newton-Type Method.” In ICML, 32:1000–1008. http://www.jmlr.org/proceedings/papers/v32/shamir14.pdf.
Smith, Virginia, Simone Forte, Michael I. Jordan, and Martin Jaggi. 2015. “L1-Regularized Distributed Optimization: A Communication-Efficient Primal-Dual Framework.” arXiv:1512.04011 [cs], December. http://arxiv.org/abs/1512.04011.
Thanei, Gian-Andrea, Christina Heinze, and Nicolai Meinshausen. 2017. “Random Projections For Large-Scale Regression.” arXiv:1701.05325 [math, Stat], January. http://arxiv.org/abs/1701.05325.
Trofimov, Ilya, and Alexander Genkin. 2015. “Distributed Coordinate Descent for L1-Regularized Logistic Regression.” In Analysis of Images, Social Networks and Texts, edited by Mikhail Yu Khachay, Natalia Konstantinova, Alexander Panchenko, Dmitry I. Ignatov, and Valeri G. Labunets, 243–54. Communications in Computer and Information Science 542. Springer International Publishing. https://doi.org/10.1007/978-3-319-26123-2_24.
———. 2016. “Distributed Coordinate Descent for Generalized Linear Models with Regularization.” arXiv:1611.02101 [cs, Stat], November. http://arxiv.org/abs/1611.02101.
Yang, Tianbao. 2013. “Trading Computation for Communication: Distributed Stochastic Dual Coordinate Ascent.” In Advances in Neural Information Processing Systems, 629–37. http://papers.nips.cc/paper/5114-tra.
