# Garbled highlights from NIPS 2016

Snippets noted for future reference.

## Time series workshop

Luminaries:

• Mehryar Mohri
• Yan Liu
• Andrew Nobel
• Inderjit Dhillon
• Stephen Roberts

Vitaly Kuznetsov and Mehryar Mohri introduced me to learning theory for time series.

Mehryar Mohri presented his online-learning approach to time series analysis, using mixtures of experts and empirical discrepancy. He had me up until the model-selection phase, where I got lost in a recursive argument. Will come back to this.

Yan Liu: FDA approaches, Hawkes models, clustering of time series. A large section on subspace clustering, which I guess I need to comprehend at some point. Time is special because it reflects the arrow of entropy. It can also give us a notion of real causality.

Andrew B. Nobel: importance of mis-specification in time series models, w.r.t. the compounding of the problem over time and the increased difficulty of validating assumptions. Time is special because it compounds error. P.S. why not more focus on algorithm failure cases? The NIPS conference dynamic doesn’t encourage falsification.

Mohri: time is special because i.i.d. is a special case thereof. “Prediction” here really is about future states. (How do you do inference of “true models” in his formalism?)

I missed the name of one Bayesian presenter, who asked:

Why not use DNNs to construct features? How can the feature construction of DNNs be plugged into Bayesian models? BTW, Bayesian nonparametrics are still state of the art for general time series.

## MetaGrad: Multiple Learning Rates in Online Learning

Tim van Erven, Wouter M Koolen

Learn the correct learning rate by simultaneously trying many.

Question: Why is this online-specific?
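The try-many-rates-at-once idea can be caricatured in a few lines. This is a Hedge-style toy of my own, not MetaGrad’s actual second-order aggregation: one gradient-descent “expert” per learning rate on an exponential grid, downweighted by cumulative loss.

```python
import numpy as np

# Toy sketch only, not the actual MetaGrad update: MetaGrad aggregates its
# learning-rate experts with a second-order potential, while this just uses
# Hedge-style exponential weights. The point is the try-many-rates-at-once idea.
etas = np.array([2.0 ** -k for k in range(6)])  # exponential grid of candidate rates
experts = np.zeros(len(etas))                   # one scalar iterate per expert
weights = np.ones(len(etas)) / len(etas)
cum_loss = np.zeros(len(etas))

def loss(x, target=3.0):
    return (x - target) ** 2

def grad(x, target=3.0):
    return 2 * (x - target)

for t in range(200):
    prediction = weights @ experts              # the master combines the experts
    cum_loss += loss(experts)
    weights = np.exp(-0.1 * cum_loss)           # downweight badly tuned rates
    weights /= weights.sum()
    experts -= etas * grad(experts)             # each expert uses its own rate

print(prediction)  # the master homes in on the minimiser, 3.0
```

Experts whose rate is too large diverge and lose all weight; the grid guarantees some rate is near-optimal, and the master inherits its performance without knowing the right rate in advance.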

## Structured Orthogonal Random Features

I forget who presented it:

We present an intriguing discovery related to Random Fourier Features: replacing multiplication by a random Gaussian matrix with multiplication by a properly scaled random orthogonal matrix significantly decreases kernel approximation error. We call this technique Orthogonal Random Features (ORF), and provide theoretical and empirical justification for its effectiveness. Motivated by the discovery, we further propose Structured Orthogonal Random Features (SORF), which uses a class of structured discrete orthogonal matrices to speed up the computation. The method reduces the time cost from $$\mathcal{O}(d^2)$$ to $$\mathcal{O}(d \log d)$$, where $$d$$ is the data dimensionality, with almost no compromise in kernel approximation quality compared to ORF.

Leads naturally to the question: how to handle other types of correlation? How about time series?
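A minimal sketch of the ORF construction (names are mine, not the paper’s): orthogonalize a Gaussian matrix via QR, then rescale the rows by chi-distributed norms so each row keeps the marginal distribution of a Gaussian row, and compare the kernel estimate against plain RFF.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16                       # data dimensionality; one square orthogonal block

def rff_matrix(rng, d):
    # Plain Random Fourier Features: i.i.d. Gaussian projection.
    return rng.standard_normal((d, d))

def orf_matrix(rng, d):
    # ORF: orthogonalize a Gaussian matrix via QR, then rescale each row by a
    # chi-distributed norm so its marginal matches the Gaussian case.
    G = rng.standard_normal((d, d))
    Q, _ = np.linalg.qr(G)
    S = np.sqrt(rng.chisquare(df=d, size=d))
    return S[:, None] * Q

def kernel_estimate(W, x, y):
    # Monte Carlo estimate of the Gaussian kernel exp(-||x - y||^2 / 2).
    return np.mean(np.cos(W @ (x - y)))

x, y = rng.standard_normal(d), rng.standard_normal(d)
true_k = np.exp(-np.sum((x - y) ** 2) / 2)
print(true_k,
      kernel_estimate(rff_matrix(rng, d), x, y),
      kernel_estimate(orf_matrix(rng, d), x, y))
```

The paper’s claim is that the orthogonal version has strictly lower approximation variance; SORF then replaces the QR step with structured (Hadamard-like) matrices to get the $$\mathcal{O}(d \log d)$$ cost.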

## Universal Correspondence Network

I forget who presented this one, which integrates geometric transforms into CNNs in a reasonably natural way:

We present a deep learning framework for accurate visual correspondences and demonstrate its effectiveness for both geometric and semantic matching, spanning across rigid motions to intra-class shape or appearance variations. In contrast to previous CNN-based approaches that optimize a surrogate patch similarity objective, we use deep metric learning to directly learn a feature space that preserves either geometric or semantic similarity.

Cries out for a musical implementation.

## Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks

Tim Salimans presents the simplest paper at NIPS:

We present weight normalization: a reparameterization of the weight vectors in a neural network that decouples the length of those weight vectors from their direction. By reparameterizing the weights in this way we improve the conditioning of the optimization problem and we speed up convergence of stochastic gradient descent. Our reparameterization is inspired by batch normalization but does not introduce any dependencies between the examples in a minibatch. This means that our method can also be applied successfully to recurrent models such as LSTMs and to noise-sensitive applications such as deep reinforcement learning or generative models, for which batch normalization is less well suited. Although our method is much simpler, it still provides much of the speed-up of full batch normalization. In addition, the computational overhead of our method is lower, permitting more optimization steps to be taken in the same amount of time.

An elaborate motivation for a conceptually and practically simple way (a couple of lines of code) of fixing up batch normalisation.
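The couple of lines in question, sketched explicitly: $w = g \, v / \lVert v \rVert$, so the scalar $g$ carries the length of the weight vector and $v$ only its direction. The gradient chain rule below follows the paper; a nice consequence is that the gradient w.r.t. $v$ is automatically orthogonal to $v$.

```python
import numpy as np

def weightnorm(v, g):
    # The reparameterisation: w = g * v / ||v||.
    return g * v / np.linalg.norm(v)

def weightnorm_grads(v, g, grad_w):
    # Chain rule mapping a gradient w.r.t. w back to gradients w.r.t. (v, g).
    norm_v = np.linalg.norm(v)
    grad_g = grad_w @ v / norm_v
    grad_v = (g / norm_v) * grad_w - (g * grad_g / norm_v ** 2) * v
    return grad_v, grad_g

rng = np.random.default_rng(0)
v, g = rng.standard_normal(5), 2.0
grad_w = rng.standard_normal(5)
grad_v, grad_g = weightnorm_grads(v, g, grad_w)
print(np.dot(grad_v, v))  # ~0: updating v never changes the length of w
```

That orthogonality is the conditioning win: direction and scale are optimized along decoupled axes, without any minibatch statistics.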

## Relevant sparse codes with variational information bottleneck

Matthew Chalk presents:

In many applications, it is desirable to extract only the relevant aspects of data. A principled way to do this is the information bottleneck (IB) method, where one seeks a code that maximises information about a relevance variable, Y, while constraining the information encoded about the original data, X. Unfortunately however, the IB method is computationally demanding when data are high-dimensional and/or non-Gaussian. Here we propose an approximate variational scheme for maximising a lower bound on the IB objective, analogous to variational EM. Using this method, we derive an IB algorithm to recover features that are both relevant and sparse. Finally, we demonstrate how kernelised versions of the algorithm can be used to address a broad range of problems with non-linear relation between X and Y.

This one is a cool demo machine.
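For reference, the IB objective the abstract alludes to, in standard IB notation (my transcription): choose a stochastic encoding $p(t \mid x)$ of the data $X$ that trades relevance against compression, then bound the intractable relevance term with an approximate decoder $q(y \mid t)$.

```latex
% Information bottleneck with trade-off parameter \gamma:
\max_{p(t \mid x)} \; I(T; Y) \;-\; \gamma \, I(T; X)
% Variational lower bound on the relevance term, with decoder q(y | t):
I(T; Y) \;\ge\; H(Y) + \mathbb{E}_{p(x, y)\, p(t \mid x)}\!\left[ \log q(y \mid t) \right]
```

Maximising the bound over both the encoder and $q$ is what gives the variational-EM flavour: the E-like step improves the encoding, the M-like step refits the decoder.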

## Dense Associative Memory for Pattern recognition

Dmitry Krotov presents (a.k.a. Hopfield 2.0):

We propose a model of associative memory having an unusual mathematical structure. Contrary to the standard case, which works well only in the limit when the number of stored memories is much smaller than the number of neurons, our model stores and reliably retrieves many more patterns than the number of neurons in the network. We propose a simple duality between this dense associative memory and neural networks commonly used in models of deep learning. On the associative memory side of this duality, a family of models that smoothly interpolates between two limiting cases can be constructed. One limit is referred to as the feature-matching mode of pattern recognition, and the other one as the prototype regime. On the deep learning side of the duality, this family corresponds to neural networks with one hidden layer and various activation functions, which transmit the activities of the visible neurons to the hidden layer. This family of activation functions includes logistics, rectified linear units, and rectified polynomials of higher degrees. The proposed duality makes it possible to apply energy-based intuition from associative memory to analyze computational properties of neural networks with unusual activation functions — the higher rectified polynomials which until now have not been used for training neural networks. The utility of the dense memories is illustrated for two test cases: the logical gate XOR and the recognition of handwritten digits from the MNIST data set.
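A toy sketch of the energy-based picture (my own minimal version, not the paper’s training setup): energy $E(\sigma) = -\sum_\mu F(\xi^\mu \cdot \sigma)$ with a rectified polynomial $F(x) = \max(x, 0)^n$, and greedy sign flips to retrieve a stored pattern from a corrupted cue. Higher $n$ sharpens the “prototype” behaviour and raises capacity beyond the classical Hopfield limit.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K, n = 64, 20, 3   # neurons, stored patterns, rectified-polynomial degree

patterns = rng.choice([-1.0, 1.0], size=(K, N))

def F(x):
    # Rectified polynomial interaction; n = 2 recovers the classical Hopfield energy.
    return np.maximum(x, 0.0) ** n

def energy(sigma):
    return -np.sum(F(patterns @ sigma))

def update(sigma):
    # One sweep of greedy asynchronous sign updates minimising the energy.
    sigma = sigma.copy()
    for i in rng.permutation(N):
        for s in (+1.0, -1.0):
            cand = sigma.copy()
            cand[i] = s
            if energy(cand) < energy(sigma):
                sigma = cand
    return sigma

# Start from a corrupted copy of pattern 0 and let the dynamics clean it up.
noisy = patterns[0].copy()
flip = rng.choice(N, size=N // 8, replace=False)
noisy[flip] *= -1
recovered = update(noisy)
print(np.mean(recovered == patterns[0]))  # fraction of bits recovered
```

Note 20 patterns in 64 neurons already exceeds the ~0.14N capacity of the quadratic energy; the cubic interaction handles it comfortably.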

## Density estimation using Real NVP

Laurent Dinh explains:

Unsupervised learning of probabilistic models is a central yet challenging problem in machine learning. Specifically, designing models with tractable learning, sampling, inference and evaluation is crucial in solving this task. We extend the space of such models using real-valued non-volume preserving (real NVP) transformations, a set of powerful invertible and learnable transformations, resulting in an unsupervised learning algorithm with exact log-likelihood computation, exact sampling, exact inference of latent variables, and an interpretable latent space. We demonstrate its ability to model natural images on four datasets through sampling, log-likelihood evaluation and latent variable manipulations.

This ultimately feeds into the reparameterisation trick literature.
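The workhorse is the affine coupling layer, which is worth seeing in code: split the input, transform one half with a scale and shift computed from the other half. The map is invertible by construction, and its log-determinant is just the sum of the scales. `s_net` and `t_net` here are stand-in fixed functions, not the paper’s trained networks.

```python
import numpy as np

def s_net(x):  # stand-in for a learned scale network
    return np.tanh(x)

def t_net(x):  # stand-in for a learned shift network
    return 0.5 * x

def coupling_forward(x):
    # y1 = x1;  y2 = x2 * exp(s(x1)) + t(x1)
    x1, x2 = np.split(x, 2)
    s, t = s_net(x1), t_net(x1)
    y2 = x2 * np.exp(s) + t
    log_det = np.sum(s)           # exact log |det Jacobian|: triangular structure
    return np.concatenate([x1, y2]), log_det

def coupling_inverse(y):
    # Inversion never needs to invert s_net or t_net, only re-evaluate them.
    y1, y2 = np.split(y, 2)
    s, t = s_net(y1), t_net(y1)
    x2 = (y2 - t) * np.exp(-s)
    return np.concatenate([y1, x2])

x = np.random.default_rng(0).standard_normal(6)
y, log_det = coupling_forward(x)
assert np.allclose(coupling_inverse(y), x)   # exact inversion
```

Stacking such layers (alternating which half passes through untouched) gives the exact log-likelihood, exact sampling, and exact inference the abstract advertises.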

## InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

Xi Chen presents:

This paper describes InfoGAN, an information-theoretic extension to the Generative Adversarial Network that is able to learn disentangled representations in a completely unsupervised manner. InfoGAN is a generative adversarial network that also maximizes the mutual information between a small subset of the latent variables and the observation. We derive a lower bound to the mutual information objective that can be optimized efficiently, and show that our training procedure can be interpreted as a variation of the Wake-Sleep algorithm. Specifically, InfoGAN successfully disentangles writing styles from digit shapes on the MNIST dataset, pose from lighting of 3D rendered images, and background digits from the central digit on the SVHN dataset. It also discovers visual concepts that include hair styles, presence/absence of eyeglasses, and emotions on the CelebA face dataset. Experiments show that InfoGAN learns interpretable representations that are competitive with representations learned by existing fully supervised methods.

Usable parameterizations of GAN by structuring the latent space.
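The structuring happens through the objective, transcribed from the paper’s setup ($V$ is the usual GAN value function, $Q$ an auxiliary distribution over codes $c$):

```latex
% InfoGAN: the minimax GAN objective regularised by a variational lower bound
% L_I on the mutual information between latent code c and sample G(z, c):
\min_{G, Q} \max_{D} \; V(D, G) - \lambda \, L_I(G, Q)
% where Q(c | x) approximates the intractable posterior P(c | x):
L_I(G, Q) = \mathbb{E}_{c \sim P(c),\, x \sim G(z, c)}\!\left[ \log Q(c \mid x) \right] + H(c) \;\le\; I(c; G(z, c))
```

Because $L_I$ is a lower bound on the mutual information, maximising it forces the generator to keep the designated code dimensions recoverable from the output, which is what makes them interpretable.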

## Parameter Learning for Log-supermodular Distributions

Tatiana Shpakova presents.

Hack of note:

In order to minimize the expectation […], we propose to use the projected stochastic gradient method, not on the data as usually done, but on our own internal randomization.

## Recovery Guarantee of Non-negative Matrix Factorization via Alternating Updates

Non-negative matrix factorization is a popular tool for decomposing data into feature and weight matrices under non-negativity constraints. It enjoys practical success but is poorly understood theoretically. This paper proposes an algorithm that alternates between decoding the weights and updating the features, and shows that assuming a generative model of the data, it provably recovers the ground-truth under fairly mild conditions. In particular, its only essential requirement on features is linear independence. Furthermore, the algorithm uses ReLU to exploit the non-negativity for decoding the weights, and thus can tolerate adversarial noise that can potentially be as large as the signal, and can tolerate unbiased noise much larger than the signal. The analysis relies on a carefully designed coupling between two potential functions, which we believe is of independent interest.
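A loose sketch of the decode/update alternation (my simplification: ReLU of a least-squares decode, then a least-squares feature update; the paper’s actual updates and conditions differ). The nonnegativity of the true weights is what makes the ReLU clipping harmless at the true features.

```python
import numpy as np

rng = np.random.default_rng(0)
m, r, T = 30, 5, 400   # data dim, number of features, samples

A_true = np.abs(rng.standard_normal((m, r)))          # nonnegative features
W_true = np.maximum(rng.standard_normal((r, T)), 0)   # nonnegative, sparse-ish weights
X = A_true @ W_true

def decode(A, X):
    # ReLU decoding of the weights: least squares, then clip negatives.
    # At A = A_true this recovers W_true exactly, since relu(W_true) = W_true.
    return np.maximum(np.linalg.pinv(A) @ X, 0)

def update_features(W, X):
    # Least-squares feature update given the decoded weights.
    return X @ np.linalg.pinv(W)

# Alternate decode / update from a perturbed start.
A = A_true + 0.2 * rng.standard_normal((m, r))
for _ in range(30):
    W = decode(A, X)
    A = update_features(W, X)

print(np.linalg.norm(A @ W - X) / np.linalg.norm(X))  # relative residual
```

The provable version replaces these least-squares steps with carefully thresholded updates, which is where the adversarial-noise tolerance comes from.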

## High dimensional learning with structure

Luminaries:

• Richard Samworth
• Po-Ling Loh
• Sahand Negahban
• Mark Schmidt
• Kai-Wei Chang
• Allen Yang
• Chinmay Hegde
• Rene Vidal
• Guillaume Obozinski
• Lorenzo Rosasco

Several applications necessitate learning a very large number of parameters from small amounts of data, which can lead to overfitting, statistically unreliable answers, and large training/prediction costs. A common and effective method to avoid the above mentioned issues is to restrict the parameter-space using specific structural constraints such as sparsity or low rank. However, such simple constraints do not fully exploit the richer structure which is available in several applications and is present in the form of correlations, side information or higher order structure. Designing new structural constraints requires close collaboration between domain experts and machine learning practitioners. Similarly, developing efficient and principled algorithms to learn with such constraints requires further collaborations between experts in diverse areas such as statistics, optimization, approximation algorithms etc. This interplay has given rise to a vibrant research area.

The main objective of this workshop is to consolidate current ideas from diverse areas such as machine learning, signal processing, theoretical computer science, optimization and statistics, clarify the frontiers in this area, discuss important applications and open problems, and foster new collaborations.

Chinmay Hegde:

We consider the demixing problem of two (or more) high-dimensional vectors from nonlinear observations when the number of such observations is far less than the ambient dimension of the underlying vectors. Specifically, we demonstrate an algorithm that stably estimates the underlying components under general structured sparsity assumptions on these components. Specifically, we show that for certain types of structured superposition models, our method provably recovers the components given merely n = O(s) samples where s denotes the number of nonzero entries in the underlying components. Moreover, our method achieves a fast (linear) convergence rate, and also exhibits fast (near-linear) per-iteration complexity for certain types of structured models. We also provide a range of simulations to illustrate the performance of the proposed algorithm.

This ends up being sparse recovery over given bases (e.g. Dirac deltas plus a Fourier basis). The interesting problem is recovering the correct decomposition with insufficient incoherence (they have a formalism for this).

Rene Vidal: “Deep learning is nonlinear tensor factorization.” Various results on tensor factorizations regularized with various norms. They have proofs, for a generalized class of matrix factorizations, that “sufficiently wide” factorization matrices have no spurious local minima. Conclusion: increase the size of the factorization during the optimisation procedure.

Guillaume Obozinski: hierarchical sparsity penalties for DAG inference.

## Doug Eck

Presents Magenta.

## NIPS 2016 End-to-end Learning for Speech and Audio Processing Workshop


## Adaptive and Scalable Nonparametric Methods in Machine Learning

Looked solidly amazing, but I was caught up elsewhere.


## Brains and Bits: Neuroscience Meets Machine Learning

Max Welling: Making Deep Learning Efficient Through Sparsification.

## Constructive machine learning

Rus Salakhutdinov

On Multiplicative Integration with Recurrent Neural Networks, by Yuhuai Wu, Saizheng Zhang, Ying Zhang, Yoshua Bengio, and Ruslan R. Salakhutdinov.


## References

Allen-Zhu, Zeyuan, and Elad Hazan. 2016. “Optimal Black-Box Reductions Between Optimization Objectives.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 1606–14. Curran Associates, Inc. http://papers.nips.cc/paper/6364-optimal-black-box-reductions-between-optimization-objectives.pdf.
Ba, Jimmy, Geoffrey E Hinton, Volodymyr Mnih, Joel Z Leibo, and Catalin Ionescu. 2016. “Using Fast Weights to Attend to the Recent Past.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 4331–39. Curran Associates, Inc. http://papers.nips.cc/paper/6057-using-fast-weights-to-attend-to-the-recent-past.pdf.
Bhojanapalli, Srinadh, Behnam Neyshabur, and Nati Srebro. 2016. “Global Optimality of Local Search for Low Rank Matrix Recovery.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 3873–81. Curran Associates, Inc. http://papers.nips.cc/paper/6271-global-optimality-of-local-search-for-low-rank-matrix-recovery.pdf.
Chalk, Matthew, Olivier Marre, and Gasper Tkacik. 2016. “Relevant Sparse Codes with Variational Information Bottleneck.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 1957–65. Curran Associates, Inc. http://papers.nips.cc/paper/6101-relevant-sparse-codes-with-variational-information-bottleneck.pdf.
Chen, Xi, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, and Pieter Abbeel. 2016. “InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 2172–80. Curran Associates, Inc. http://papers.nips.cc/paper/6399-infogan-interpretable-representation-learning-by-information-maximizing-generative-adversarial-nets.pdf.
Choy, Christopher B, JunYoung Gwak, Silvio Savarese, and Manmohan Chandraker. 2016. “Universal Correspondence Network.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 2406–14. Curran Associates, Inc. http://papers.nips.cc/paper/6487-universal-correspondence-network.pdf.
David, Ofir, Shay Moran, and Amir Yehudayoff. 2016. “Supervised Learning Through the Lens of Compression.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 2784–92. Curran Associates, Inc. http://papers.nips.cc/paper/6490-supervised-learning-through-the-lens-of-compression.pdf.
Dinh, Laurent, Jascha Sohl-Dickstein, and Samy Bengio. 2016. “Density Estimation Using Real NVP.” In Advances In Neural Information Processing Systems. https://openreview.net/forum?id=SyPNSAW5.
Dumoulin, Vincent, Jonathon Shlens, and Manjunath Kudlur. 2016. “A Learned Representation For Artistic Style.” October 24, 2016. http://arxiv.org/abs/1610.07629.
Ellis, Kevin, Armando Solar-Lezama, and Josh Tenenbaum. 2016. “Sampling for Bayesian Program Learning.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 1289–97. Curran Associates, Inc. http://papers.nips.cc/paper/6082-sampling-for-bayesian-program-learning.pdf.
Erven, Tim van, and Wouter M Koolen. 2016. “MetaGrad: Multiple Learning Rates in Online Learning.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 3666–74. Curran Associates, Inc. http://papers.nips.cc/paper/6268-metagrad-multiple-learning-rates-in-online-learning.pdf.
Finn, Chelsea, Ian Goodfellow, and Sergey Levine. 2016. “Unsupervised Learning for Physical Interaction Through Video Prediction.” In Advances In Neural Information Processing Systems 29, edited by D. D. Lee, U. V. Luxburg, I. Guyon, and R. Garnett, 64–72. Curran Associates, Inc. http://papers.nips.cc/paper/6160-unsupervised-learning-for-physical-interaction-through-video-prediction.pdf.
Flamary, Rémi, Cédric Févotte, Nicolas Courty, and Valentin Emiya. 2016. “Optimal Spectral Transportation with Application to Music Transcription.” In, 703–11. Curran Associates, Inc. http://papers.nips.cc/paper/6479-optimal-spectral-transportation-with-application-to-music-transcription.pdf.
Fraccaro, Marco, Søren Kaae Sønderby, Ulrich Paquet, and Ole Winther. 2016. “Sequential Neural Models with Stochastic Layers.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 2199–2207. Curran Associates, Inc. http://papers.nips.cc/paper/6039-sequential-neural-models-with-stochastic-layers.pdf.
Ge, Rong, Jason D Lee, and Tengyu Ma. 2016. “Matrix Completion Has No Spurious Local Minimum.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 2973–81. Curran Associates, Inc. http://papers.nips.cc/paper/6048-matrix-completion-has-no-spurious-local-minimum.pdf.
Genevay, Aude, Marco Cuturi, Gabriel Peyré, and Francis Bach. 2016. “Stochastic Optimization for Large-Scale Optimal Transport.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 3432–40. Curran Associates, Inc. http://papers.nips.cc/paper/6566-stochastic-optimization-for-large-scale-optimal-transport.pdf.
Gruslys, Audrunas, Remi Munos, Ivo Danihelka, Marc Lanctot, and Alex Graves. 2016. “Memory-Efficient Backpropagation Through Time.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 4125–33. Curran Associates, Inc. http://papers.nips.cc/paper/6221-memory-efficient-backpropagation-through-time.pdf.
Haarnoja, Tuomas, Anurag Ajay, Sergey Levine, and Pieter Abbeel. 2016. “Backprop KF: Learning Discriminative Deterministic State Estimators.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 4376–84. Curran Associates, Inc. http://papers.nips.cc/paper/6090-backprop-kf-learning-discriminative-deterministic-state-estimators.pdf.
Haeffele, Benjamin D., and Rene Vidal. 2015. “Global Optimality in Tensor Factorization, Deep Learning, and Beyond.” June 24, 2015. http://arxiv.org/abs/1506.07540.
Hazan, Elad, and Tengyu Ma. 2016. “A Non-Generative Framework and Convex Relaxations for Unsupervised Learning.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 3306–14. Curran Associates, Inc. http://papers.nips.cc/paper/6533-a-non-generative-framework-and-convex-relaxations-for-unsupervised-learning.pdf.
He, Xinran, Ke Xu, David Kempe, and Yan Liu. 2016. “Learning Influence Functions from Incomplete Observations.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 2065–73. Curran Associates, Inc. http://papers.nips.cc/paper/6181-learning-influence-functions-from-incomplete-observations.pdf.
Horel, Thibaut, and Yaron Singer. 2016. “Maximization of Approximately Submodular Functions.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 3045–53. Curran Associates, Inc. http://papers.nips.cc/paper/6236-maximization-of-approximately-submodular-functions.pdf.
Jia, Xu, Bert De Brabandere, Tinne Tuytelaars, and Luc V Gool. 2016. “Dynamic Filter Networks.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 667–75. Curran Associates, Inc. http://papers.nips.cc/paper/6578-dynamic-filter-networks.pdf.
Kingma, Diederik P., Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, and Max Welling. 2016. “Improving Variational Inference with Inverse Autoregressive Flow.” In Advances in Neural Information Processing Systems 29. Curran Associates, Inc. http://arxiv.org/abs/1606.04934.
Krotov, Dmitry, and John J. Hopfield. 2016. “Dense Associative Memory for Pattern Recognition.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 1172–80. Curran Associates, Inc. http://papers.nips.cc/paper/6121-dense-associative-memory-for-pattern-recognition.pdf.
Krummenacher, Gabriel, Brian McWilliams, Yannic Kilcher, Joachim M Buhmann, and Nicolai Meinshausen. 2016. “Scalable Adaptive Stochastic Optimization Using Random Projections.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 1750–58. Curran Associates, Inc. http://papers.nips.cc/paper/6054-scalable-adaptive-stochastic-optimization-using-random-projections.pdf.
Kuznetsov, Vitaly, and Mehryar Mohri. 2014a. “Forecasting Non-Stationary Time Series: From Theory to Algorithms.” http://www.cims.nyu.edu/~munoz/multitask/Paper_22_fts.pdf.
———. 2014b. “Generalization Bounds for Time Series Prediction with Non-Stationary Processes.” In Algorithmic Learning Theory, edited by Peter Auer, Alexander Clark, Thomas Zeugmann, and Sandra Zilles, 260–74. Lecture Notes in Computer Science. Bled, Slovenia: Springer International Publishing. https://doi.org/10.1007/978-3-319-11662-4_19.
———. 2015. “Learning Theory and Algorithms for Forecasting Non-Stationary Time Series.” In Advances in Neural Information Processing Systems, 541–49. Curran Associates, Inc. http://papers.nips.cc/paper/5836-learning-theory-and-algorithms-for-forecasting-non-stationary-time-series.
———. 2016. “Generalization Bounds for Non-Stationary Mixing Processes.” Machine Learning Journal. http://www.cs.nyu.edu/~mohri/pub/nonstatj.pdf.
Li, Yuanzhi, Yingyu Liang, and Andrej Risteski. 2016. “Recovery Guarantee of Non-Negative Matrix Factorization via Alternating Updates.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 4988–96. Curran Associates, Inc. http://papers.nips.cc/paper/6417-recovery-guarantee-of-non-negative-matrix-factorization-via-alternating-updates.pdf.
Lindgren, Erik, Shanshan Wu, and Alexandros G Dimakis. 2016. “Leveraging Sparsity for Efficient Submodular Data Summarization.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 3414–22. Curran Associates, Inc. http://papers.nips.cc/paper/6382-leveraging-sparsity-for-efficient-submodular-data-summarization.pdf.
Luo, Haipeng, Alekh Agarwal, Nicolò Cesa-Bianchi, and John Langford. 2016. “Efficient Second Order Online Learning by Sketching.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 902–10. Curran Associates, Inc. http://papers.nips.cc/paper/6207-efficient-second-order-online-learning-by-sketching.pdf.
Yamada, Makoto, Koh Takeuchi, Tomoharu Iwata, John Shawe-Taylor, and Samuel Kaski. 2016. “Localized Lasso for High-Dimensional Regression.” In. https://www.cs.utexas.edu/~rofuyu/lhds-nips16/papers/2.pdf.
Soltani, Mohammadreza, and Chinmay Hegde. 2016. “Iterative Thresholding for Demixing Structured Superpositions in High Dimensions.” In.
Ostrovsky, Dmitry, Zaid Harchaoui, Anatoli Juditsky, and Arkadi S Nemirovski. 2016. “Structure-Blind Signal Recovery.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 4817–25. Curran Associates, Inc. http://papers.nips.cc/paper/6063-structure-blind-signal-recovery.pdf.
Poole, Ben, Subhaneil Lahiri, Maithreyi Raghu, Jascha Sohl-Dickstein, and Surya Ganguli. 2016. “Exponential Expressivity in Deep Neural Networks Through Transient Chaos.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 3360–68. Curran Associates, Inc. http://papers.nips.cc/paper/6322-exponential-expressivity-in-deep-neural-networks-through-transient-chaos.pdf.
Ritchie, Daniel, Anna Thomas, Pat Hanrahan, and Noah Goodman. 2016. “Neurally-Guided Procedural Models: Amortized Inference for Procedural Graphics Programs Using Neural Networks.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 622–30. Curran Associates, Inc. http://papers.nips.cc/paper/6353-neurally-guided-procedural-models-amortized-inference-for-procedural-graphics-programs-using-neural-networks.pdf.
Salimans, Tim, and Diederik P Kingma. 2016. “Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 901–1. Curran Associates, Inc. http://papers.nips.cc/paper/6114-weight-normalization-a-simple-reparameterization-to-accelerate-training-of-deep-neural-networks.pdf.
Schein, Aaron, Hanna Wallach, and Mingyuan Zhou. 2016. “Poisson-Gamma Dynamical Systems.” In Advances In Neural Information Processing Systems, 5006–14. http://papers.nips.cc/paper/6082-poisson-gamma-dynamical-systems.
Shpakova, Tatiana, and Francis Bach. 2016. “Parameter Learning for Log-Supermodular Distributions.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 3234–42. Curran Associates, Inc. http://papers.nips.cc/paper/6402-parameter-learning-for-log-supermodular-distributions.pdf.
Sinha, Aman, and John C Duchi. 2016. “Learning Kernels with Random Features.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 1298–1306. Curran Associates, Inc. http://papers.nips.cc/paper/6180-learning-kernels-with-random-features.pdf.
Soltani, Mohammadreza, and Chinmay Hegde. 2016a. “Demixing Sparse Signals from Nonlinear Observations.” Statistics 7: 9. http://home.engineering.iastate.edu/~chinmay/files/papers/demix_ISUTR.pdf.
———. 2016b. “Fast Algorithms for Demixing Sparse Signals from Nonlinear Observations.” August 3, 2016. http://arxiv.org/abs/1608.01234.
Surace, Simone Carlo, and Jean-Pascal Pfister. 2016. “Online Maximum Likelihood Estimation of the Parameters of Partially Observed Diffusion Processes.” In.
Wang, Yunhe, Chang Xu, Shan You, Dacheng Tao, and Chao Xu. 2016. “CNNpack: Packing Convolutional Neural Networks in the Frequency Domain.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 253–61. Curran Associates, Inc. http://papers.nips.cc/paper/6390-cnnpack-packing-convolutional-neural-networks-in-the-frequency-domain.pdf.
Wu, Shanshan, Srinadh Bhojanapalli, Sujay Sanghavi, and Alexandros G Dimakis. 2016. “Single Pass PCA of Matrix Products.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 2577–85. Curran Associates, Inc. http://papers.nips.cc/paper/6075-single-pass-pca-of-matrix-products.pdf.
Wu, Yuhuai, Saizheng Zhang, Ying Zhang, Yoshua Bengio, and Ruslan R Salakhutdinov. 2016. “On Multiplicative Integration with Recurrent Neural Networks.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 2856–64. Curran Associates, Inc. http://papers.nips.cc/paper/6215-on-multiplicative-integration-with-recurrent-neural-networks.pdf.
Yu, Felix X, Ananda Theertha Suresh, Krzysztof M Choromanski, Daniel N Holtmann-Rice, and Sanjiv Kumar. 2016. “Orthogonal Random Features.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 1975–83. Curran Associates, Inc. http://papers.nips.cc/paper/6246-orthogonal-random-features.pdf.
Yu, Hsiang-Fu, Nikhil Rao, and Inderjit S Dhillon. 2016. “Temporal Regularized Matrix Factorization for High-Dimensional Time Series Prediction.” In Advances In Neural Information Processing Systems 29, edited by D. D. Lee, U. V. Luxburg, I. Guyon, and R. Garnett, 847–55. Curran Associates, Inc. http://papers.nips.cc/paper/6159-temporal-regularized-matrix-factorization-for-high-dimensional-time-series-prediction.pdf.
Yuan, Xiaotong, Ping Li, Tong Zhang, Qingshan Liu, and Guangcan Liu. 2016. “Learning Additive Exponential Family Graphical Models via $\ell_{\lbrace 2,1\rbrace}$ -Norm Regularized M-Estimation.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 4367–75. Curran Associates, Inc. http://papers.nips.cc/paper/6106-learning-additive-exponential-family-graphical-models-via-ell_21-norm-regularized-m-estimation.pdf.
Zhang, Huishuai, and Yingbin Liang. 2016. “Reshaped Wirtinger Flow for Solving Quadratic System of Equations.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 2622–30. Curran Associates, Inc. http://papers.nips.cc/paper/6319-reshaped-wirtinger-flow-for-solving-quadratic-system-of-equations.pdf.
Zhang, Matt, Peng Lin, Ting Guo, Yang Wang, and Fang Chen. 2016. “Infinite Hidden Semi-Markov Modulated Interaction Point Process.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 3900–3908. Curran Associates, Inc. http://papers.nips.cc/paper/6243-infinite-hidden-semi-markov-modulated-interaction-point-process.pdf.
