ABSTRACT

This tutorial provides an introduction to a rapidly evolving topic: the theory of negative dependence and its numerous ramifications in machine learning. Negatively dependent probability measures offer a powerful tool for modeling non-i.i.d. data, and can thus impact all aspects of learning, including supervised, unsupervised, interpretable, interactive, and large-scale setups. The best-known examples of negatively dependent distributions are perhaps Determinantal Point Processes (DPPs), which have already found numerous ML applications. But DPPs are just the tip of the iceberg; the class of negatively dependent measures is much broader, and given the vast web of mathematical connections it enjoys, it holds great promise as a tool for machine learning. This tutorial exposes the ML audience to this rich mathematical toolbox, while outlining key theoretical ideas and motivating fundamental applications. Tasks that profit from negative dependence include anomaly detection, information maximization, experimental design, validation of black-box systems, architecture learning, fast MCMC sampling, dataset summarization, and interpretable learning. Time permitting, we will also touch upon recent related breakthroughs on the broader class of so-called "completely log-concave" polynomials.

1. Introduction

This tutorial will provide an introduction to the rich area of negatively dependent probability measures. In particular, it will outline key conceptual points in the theory of negative dependence in discrete probability, while motivating the topic through fundamental machine learning tasks (including supervised, unsupervised, interpretable, interactive, and large-scale setups). The foundations established by the tutorial will help the audience gain a new perspective on many ML problems, and equip them with a new set of tools for their modeling and algorithmic repertoire.

Specifically, we focus on Strongly Rayleigh (SR) probability measures. These were introduced in the landmark paper of Borcea et al. (2009), who developed a rich theory of negatively dependent (ND) measures. The ND property reflects a "repelling" nature of items, a property that occurs more broadly across probability theory, combinatorial optimization, physics, and other fields. However, negative association is a subtle property, and the class of SR measures provides its most useful characterization by grounding it in the fascinating theory of multivariate stable polynomials. The most notable examples of SR measures are Determinantal Point Processes (DPPs), which have also received significant attention in machine learning. However, the full potential of SR measures remains to be developed and discovered; by grounding our tutorial in the mature use of DPPs while outlining SR measures more broadly, we hope to raise awareness in the community and spur future developments. A key driver behind this thinking is the theoretical observation of Borcea et al. that the class of SR measures is exponentially larger than that of DPPs.
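
For concreteness, here is a minimal formal sketch of the two central definitions, following Pemantle (2000) and Borcea et al. (2009); the tutorial will develop both at length. A probability measure $\mu$ on subsets of $[n] = \{1,\dots,n\}$ is negatively associated (NA) if
\[
  \mathbb{E}_\mu[F\,G] \;\le\; \mathbb{E}_\mu[F]\,\mathbb{E}_\mu[G]
\]
for all increasing functions $F, G$ that depend on disjoint subsets of the coordinates. The measure $\mu$ is Strongly Rayleigh if its generating polynomial
\[
  f_\mu(z_1,\dots,z_n) \;=\; \sum_{S \subseteq [n]} \mu(S) \prod_{i \in S} z_i
\]
is real stable, i.e., $f_\mu(z) \neq 0$ whenever $\mathrm{Im}(z_i) > 0$ for all $i$. A central result of Borcea et al. is that every SR measure is negatively associated. DPPs are the canonical example: for a marginal kernel $K$ with $0 \preceq K \preceq I$, a DPP satisfies $\Pr(S \subseteq Y) = \det(K_S)$ for every $S$, where $K_S$ is the submatrix of $K$ indexed by $S$, and its generating polynomial is real stable.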

Our key emphasis, however, will be on core concepts. Hence, we will not cover detailed proofs, but will strive to highlight the rich variety of mathematical ideas underlying the theory of negative dependence, DPPs, SR measures, and related concepts, thereby helping the audience understand, appreciate, apply, and enhance the material presented. We outline below a rough table of contents of the material this tutorial plans to cover.

2. Outline

  • Introduction:
    • Definitions of positive and negative dependence (ND)
    • Motivating examples in machine learning tasks where ND is useful
    • Illustration of the difficulties of modeling ND
  • Strongly Rayleigh (SR) Measures:
    • Basics, definitions and theory
    • Determinantal Point Processes
    • Volume Sampling, Dual Volume Sampling
    • Learning SR measures from data
    • Fast Markov-Chain Monte Carlo Sampling (see the code sketch after this outline)
  • Applications in ML that benefit from Negative Dependence:
    • Active learning, Interactive learning
    • Kernelized regression
    • Experimental Design
    • Architecture Learning for Neural Networks
    • Column subset selection
    • Scaling up Kernel methods via Nyström
    • Speeding up SGD via Negative Dependence
    • Recommender Systems
    • Adversarial models, negative mining
  • Perspectives:
    • Generalizations of SR measures
    • Connections to log-submodularity
    • Open theoretical problems
    • Other ML settings and uses of ND
  • Summary and Wrap up
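
As a taste of the MCMC item above, the following is a minimal Python sketch (not a polished implementation) of the lazy "exchange" walk whose fast mixing for homogeneous SR measures is established by Anari et al. (2016) and Li et al. (2016), instantiated here for a k-DPP, i.e., mu(S) proportional to det(L_S) for a PSD kernel L. The function names and the RBF test kernel are our own illustrative choices, not from any particular library.

import numpy as np

def log_prob(L, S):
    """log det(L_S): unnormalized log-probability of subset S under mu(S) ~ det(L_S)."""
    sign, logdet = np.linalg.slogdet(L[np.ix_(S, S)])
    return logdet if sign > 0 else -np.inf

def exchange_walk(L, k, n_steps, rng):
    """Lazy Metropolis exchange ('swap') chain over k-subsets, here targeting a k-DPP."""
    n = L.shape[0]
    S = list(rng.choice(n, size=k, replace=False))  # arbitrary initial k-subset
    cur = log_prob(L, S)
    for _ in range(n_steps):
        if rng.random() < 0.5:                      # lazy step: stay put
            continue
        i = rng.integers(k)                         # position in S to swap out
        j = int(rng.choice(np.setdiff1d(np.arange(n), S)))  # element to swap in
        T = S.copy()
        T[i] = j
        new = log_prob(L, T)
        if np.log(rng.random()) < new - cur:        # Metropolis accept/reject
            S, cur = T, new
    return sorted(S)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 2))                # 50 points in the plane
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    L = np.exp(-D2)                                 # RBF kernel gives a valid PSD L
    print(exchange_walk(L, k=5, n_steps=2000, rng=rng))

Each step touches only a k-by-k determinant, and since the proposal is symmetric the chain is reversible with respect to the target; for SR targets it mixes in polynomially many steps, a point the tutorial will discuss in detail.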

References

  1. Alex Kulesza and Ben Taskar. Determinantal Point Processes for Machine Learning. Foundations and Trends in Machine Learning, 5(2–3):123–286, 2012.
  2. Chengtao Li, Stefanie Jegelka, and Suvrit Sra. Polynomial time algorithms for dual volume sampling. NIPS 2017.
  3. Michal Derezinski and Manfred K. Warmuth. Unbiased estimates for linear regression via volume sampling. NIPS 2017.
  4. Zelda Mariet and Suvrit Sra. Diversity Networks. ICLR 2016.
  5. Julius Borcea, Petter Brändén, and Thomas M. Liggett. Negative dependence and the geometry of polynomials. Journal of the American Mathematical Society, 22 (2009).
  6. Robin Pemantle. Towards a theory of negative dependence. Journal of Mathematical Physics, 41:1371, 2000.
  7. John Urschel, Victor-Emmanuel Brunel, Ankur Moitra, and Philippe Rigollet. Learning Determinantal Point Processes with Moments and Cycles. ICML 2017.
  8. Zelda Mariet and Suvrit Sra. Fixed-point algorithms for learning determinantal point processes. ICML 2015.
  9. Jennifer Gillenwater, Alex Kulesza, Emily Fox, and Ben Taskar. Expectation-Maximization for learning Determinantal Point Processes. NIPS 2014.
  10. Mike Gartrell, Ulrich Paquet, and Noam Koenigstein. Bayesian low-rank determinantal point processes. ACM Conference on Recommender Systems, 2016.
  11. Nima Anari, Shayan Oveis Gharan, and Alireza Rezaei. Monte Carlo Markov Chain Algorithms for Sampling Strongly Rayleigh Distributions and Determinantal Point Processes. COLT 2016.
  12. Chengtao Li, Stefanie Jegelka, and Suvrit Sra. Fast DPP Sampling for Nyström with Application to Kernel Methods. ICML 2016.
  13. Chengtao Li, Suvrit Sra, and Stefanie Jegelka. Fast Mixing Markov Chains for Strongly Rayleigh Measures, DPPs, and Constrained Sampling. NIPS 2016.
  14. Tarun Kathuria, Amit Deshpande, and Pushmeet Kohli. Batched Gaussian Process Bandit Optimization via Determinantal Point Processes. NIPS 2016.

Copyright © Suvrit Sra 2018