On the importance of initialization and momentum in deep learning I Sutskever, J Martens, G Dahl, G Hinton International conference on machine learning, 1139-1147, 2013 | 4480 | 2013 |
Generating text with recurrent neural networks I Sutskever, J Martens, GE Hinton ICML, 2011 | 1632 | 2011 |
Deep learning via Hessian-free optimization J Martens ICML 27, 735-742, 2010 | 1022 | 2010 |
Learning recurrent neural networks with Hessian-free optimization J Martens, I Sutskever ICML, 2011 | 709 | 2011 |
Optimizing neural networks with Kronecker-factored approximate curvature J Martens, R Grosse International conference on machine learning, 2408-2417, 2015 | 577 | 2015 |
Adding gradient noise improves learning for very deep networks A Neelakantan, L Vilnis, QV Le, I Sutskever, L Kaiser, K Kurach, J Martens arXiv preprint arXiv:1511.06807, 2015 | 423 | 2015 |
New insights and perspectives on the natural gradient method J Martens arXiv preprint arXiv:1412.1193, 2014 | 318 | 2014 |
The mechanics of n-player differentiable games D Balduzzi, S Racaniere, J Martens, J Foerster, K Tuyls, T Graepel International Conference on Machine Learning, 354-363, 2018 | 218 | 2018 |
Training deep and recurrent networks with Hessian-free optimization J Martens, I Sutskever Neural networks: Tricks of the trade, 479-535, 2012 | 187 | 2012 |
A Kronecker-factored approximate Fisher matrix for convolution layers R Grosse, J Martens International Conference on Machine Learning, 573-582, 2016 | 176 | 2016 |
Adversarial robustness through local linearization C Qin, J Martens, S Gowal, D Krishnan, K Dvijotham, A Fawzi, S De, ... Advances in Neural Information Processing Systems 32, 2019 | 164 | 2019 |
On the representational efficiency of restricted Boltzmann machines J Martens, A Chattopadhyay, T Pitassi, R Zemel Advances in Neural Information Processing Systems 26, 2013 | 70 | 2013 |
Distributed second-order optimization using Kronecker-factored approximations J Ba, R Grosse, J Martens | 67 | 2016 |
Fast convergence of natural gradient descent for over-parameterized neural networks G Zhang, J Martens, RB Grosse Advances in Neural Information Processing Systems 32, 2019 | 64 | 2019 |
Which algorithmic choices matter at which batch sizes? Insights from a noisy quadratic model G Zhang, L Li, Z Nado, J Martens, S Sachdeva, G Dahl, C Shallue, ... Advances in Neural Information Processing Systems 32, 2019 | 59 | 2019 |
On the expressive efficiency of sum product networks J Martens, V Medabalimi arXiv preprint arXiv:1411.7717, 2014 | 59 | 2014 |
Estimating the Hessian by back-propagating curvature J Martens, I Sutskever, K Swersky arXiv preprint arXiv:1206.6464, 2012 | 57 | 2012 |
Second-order optimization for neural networks J Martens University of Toronto (Canada), 2016 | 50 | 2016 |
Differentiable game mechanics A Letcher, D Balduzzi, S Racaniere, J Martens, J Foerster, K Tuyls, ... The Journal of Machine Learning Research 20 (1), 3032-3071, 2019 | 49 | 2019 |
Kronecker-factored curvature approximations for recurrent neural networks J Martens, J Ba, M Johnson International Conference on Learning Representations, 2018 | 49 | 2018 |