Suvrit Sra (∯∐∇⊉I⊤ ∳⊋⟑)

OPTIMIZATION FOR MACHINE LEARNING

• Geometric Optimization

Geometry plays a deep role in optimization, be it for computational speed, algorithm design, or complexity theory. My research in geometric optimization comprises two main topics:

Algorithms and complexity theory for optimization on non-Euclidean spaces such as Riemannian manifolds, CAT(0) geometries, Wasserstein spaces, etc., inspired by the usual theory of polynomial-time convex optimization (which can be recovered, for instance, when the manifold curvature goes to zero).
New applications of non-Euclidean optimization as well as scaling up old ones via geometric insights.

Selected publications

S. Sra, R. Hosseini. Conic geometric optimization. SIAM J. Optimization, 2015.
H. Zhang, S. Sra. First-order methods for geodesically convex optimization. Conference on Learning Theory (COLT), 2016.
H. Zhang, S. Reddi, S. Sra. Riemannian SVRG: Fast stochastic optimization on Riemannian manifolds. NIPS, 2016.
R. Hosseini, S. Sra. An Alternative to EM for Gaussian Mixture Models: Batch and Stochastic Riemannian Optimization. arXiv, 2017.
S. Sra. On the Matrix Square Root via Geometric Optimization. Electronic J. Linear Algebra, 2016.
P. Zadeh, R. Hosseini, S. Sra. Geometric mean metric learning. ICML, 2016.

• Nonconvex optimization

Ancient and modern machine learning rely on solving a variety of nonconvex optimization problems, for instance:
\begin{eqnarray*}
\mathrm{(ERM)}\quad &&\min_x\ f(x) := \frac{1}{n}\sum_{i=1}^n f_i(x),\\
\mathrm{(Stochastic)}\quad &&\min_x\ f(x) := \mathbb{E}_\xi[F(x,\xi)],
\end{eqnarray*}
where all the functions involved can be nonconvex (e.g., in deep learning).
My work on solving these problems focuses on the following key aspects:

Design, analysis, and implementation of algorithms that provably obtain approximate critical points fast.
Methods that can provably escape saddle points rapidly (e.g., by mixing first- and second-order information).
General nonconvex problems with other types of structure that enable global optimality.
Design and development of distributed and parallel methods for the above problems.
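
To make the finite-sum (ERM) setting concrete, here is a minimal sketch of stochastic gradient descent on a toy nonconvex problem. The component functions f_i(x) = (x^2 - a_i)^2 / 4, the data, the step-size schedule, and all names are assumptions made for illustration; this is plain SGD, not one of the variance-reduced or proximal methods from the publications on this page.

```python
import math
import random

def grad_fi(x, a_i):
    """Gradient of the nonconvex component f_i(x) = (x^2 - a_i)^2 / 4."""
    return x * (x * x - a_i)

def full_grad(x, data):
    """Gradient of the full objective f(x) = (1/n) * sum_i f_i(x)."""
    return sum(grad_fi(x, a) for a in data) / len(data)

def sgd(data, x0, base_step=0.05, iters=2000, seed=0):
    """Plain SGD: one uniformly sampled component per step, step size ~ 1/sqrt(t)."""
    rng = random.Random(seed)
    x = x0
    for t in range(iters):
        a_i = data[rng.randrange(len(data))]  # sample index i uniformly
        x -= base_step / math.sqrt(t + 1) * grad_fi(x, a_i)
    return x

# Hypothetical data: a_i spread evenly over [0.5, 1.5], so mean(a_i) = 1.
data = [0.5 + i / 10 for i in range(11)]
x_hat = sgd(data, x0=2.0)

# Norm of the full gradient at the output measures how close x_hat is
# to a critical point of f.
grad_norm = abs(full_grad(x_hat, data))
```

Here f has critical points at 0 and at plus or minus the square root of the mean of the a_i, so a small grad_norm certifies an approximate critical point rather than a global minimum, which is the usual goal in the nonconvex setting.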

Selected publications

S. Sra. Scalable nonconvex inexact proximal splitting. NIPS, 2012.
S. Reddi, S. Sra, B. Poczos, A. Smola. Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. NIPS, 2016.
S. Reddi, A. Hefny, S. Sra, B. Poczos, A. Smola. Stochastic variance reduction for nonconvex optimization. ICML, 2016.
S. Reddi, S. Sra, B. Poczos, A. Smola. Stochastic Frank-Wolfe Methods for Nonconvex Optimization. Allerton CC, 2016.
S. Reddi, S. Sra, B. Poczos, A. Smola. Fast incremental method for smooth nonconvex optimization. IEEE CDC, 2016.

• Convex optimization

Interior point methods (new)
Derivative free optimization (new)
Stochastic convex optimization, constrained and unconstrained
Parallel and distributed convex optimization
Fast proximity solvers; Fenchel conjugates
Semidefinite programming and large-scale conic programming

Selected publications

Á. Barbero, S. Sra. Modular proximal optimization with application to total variation regularization. arXiv, 2017.
Y. Wang, V. Sadhanala, W. Dai, W. Neiswanger, S. Sra, E. P. Xing. Asynchronous Parallel Block-Coordinate Frank-Wolfe. ICML, 2016.
S. Sra, A. Wei Yu, M. Li, A. Smola. AdaDelay: Delay sensitive distributed stochastic convex optimization. AISTATS, 2016.
S. Reddi, A. Hefny, S. Sra, B. Poczos, A. Smola. Asynchronous variance reduced stochastic gradient descent. NIPS, 2015.
S. Reddi, A. Hefny, C. Downey, A. Dubey, S. Sra. Large-scale randomized-coordinate descent methods with non-separable linear constraints. UAI, 2015.
M. Wytock, S. Sra, Z. Kolter. Fast Newton methods for the group fused Lasso. UAI, 2014.
S. Azadi, S. Sra. Towards stochastic alternating direction method of multipliers. ICML, 2014.
S. Jegelka, F. Bach, S. Sra. Reflection methods for user-friendly submodular optimization. NIPS, 2013.

THEORY OF DEEP LEARNING

Our group is studying several theoretical questions in deep learning. In particular, we are studying

Global and local optimality properties
Stability, robustness, and generalization theory
New models for learning representations, better training algorithms

Selected publications

C. Yun, S. Sra, A. Jadbabaie. A critical view of global optimality in deep learning.
C. Yun, S. Sra, A. Jadbabaie. Global optimality conditions for deep neural networks. arXiv, 2017.
C. Li, D. Alvarez-Melis, K. Xu, S. Jegelka, S. Sra. Distributional Adversarial Networks. arXiv, 2017.
A. Cherian, S. Sra, R. Hartley. Sequence Summarization Using Order-constrained Kernelized Feature Subspaces. arXiv, 2017.
Z. Mariet, S. Sra. Diversity Networks: Neural Network Compression Using Determinantal Point Processes. ICLR, 2016.

DISCRETE PROBABILITY FOR ML

A measure \(\mu\) on the subsets of \(\{1,\ldots,n\}\) can be encoded by its generating polynomial
\[ f_\mu(z) := \sum_{S \subseteq \{1,\ldots,n\}} \mu(S)\prod_{i\in S}z_i. \]
Important examples include: Determinantal Point Processes, Dual-Volume distribution, etc. Our work here focuses on

Selected publications

C. Li, S. Jegelka, S. Sra. Polynomial Time Algorithms for Dual Volume Sampling. arXiv, 2017.
C. Li, S. Jegelka, S. Sra. Fast DPP sampling for Nyström with application to kernel methods. ICML, 2016.
C. Li, S. Jegelka, S. Sra. Fast Mixing Markov Chains for Strongly Rayleigh Measures, DPPs, and Constrained Sampling. NIPS, 2016.
Z. Mariet, S. Sra. Fixed-point algorithms for learning determinantal point processes. Int. Conf. on Machine Learning (ICML), 2015.
Z. Mariet, S. Sra. Kronecker Determinantal Point Processes. NIPS, 2015.
Z. Mariet, S. Sra. Elementary Symmetric Polynomial based Optimal Design. arXiv, 2017.

PURE & APPLIED MATH

Probability theory, convex geometry, Brunn-Minkowski theory
Algebraic combinatorics, positivity questions, symmetric polynomials
Noncommutative algebra, matrix inequalities
Stable polynomials, hyperbolic polynomials, conic geometries
Differential geometry, metric geometry

Selected publications

S. Sra. Positive Definite Matrices and the S-Divergence. Proc. American Math. Society (PAMS), 2016.
S. Sra. On inequalities for normalized Schur functions. Eur. J. Combinatorics, 2016.
L. Borisov, P. Neff, S. Sra, C. Thiel. The sum of squared logarithms inequality in arbitrary dimensions. LAA, 2017.
W. Berndt, S. Sra. Hlawka-Popoviciu inequalities on positive definite tensors. LAA, 2015.
S. Sra. Logarithmic inequalities under an elementary symmetric polynomial dominance order. arXiv, 2015.

APPLICATIONS

Computer Vision; Graphics; Healthcare; Materials Design; Statistics; etc.

Selected publications

P. Zadeh, R. Hosseini, S. Sra. Geometric mean metric learning. ICML, 2016.
S. Sra. Directional Statistics in Machine Learning: a Brief Review. arXiv, 2016.
A. Cherian, S. Sra. Riemannian dictionary learning and sparse coding for positive definite matrices. IEEE TNNLS, 2016.
R. Hosseini, S. Sra, L. Theis, M. Bethge. Inference and mixture modelling with the Elliptical Gamma Distribution. CSDA, 2016.
A. Wei Yu, W. Ma, Y. Yu, J. Carbonell, S. Sra. Efficient Structured Matrix Rank Minimization. NIPS, 2014.
A. Cherian, S. Sra, V. Morellas, N. Papanikolopoulos. Efficient nearest neighbors via robust sparse hashing. IEEE TIP, 2014.
A. Cherian, S. Sra, A. Banerjee, N. Papanikolopoulos. Jensen-Bregman LogDet Divergence for Efficient Similarity Computations on Positive Definite Tensors. IEEE TPAMI, 2013.
