Stanford Theory Seminar (formerly known as Stanford Algorithms Seminar/AFLB) is usually held in Gates 498 or the Theory Lounge (Gates 463A).

Faculty Contact: Gregory Valiant

Student Contact: Weihao Kong


This is joint work with Chandra Chekuri.

Equivalently, the algorithm takes as input a subspace $W \subseteq \mathbb{R}^{n^2}$ and distinguishes between the case that $W$ contains a rank-one matrix and the case that every rank-one matrix is at least $\epsilon$-far (in Euclidean distance) from $W$.
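As a toy illustration of the decision problem (not the speakers' algorithm), the distance from a candidate rank-one matrix to a subspace $W$ of $\mathbb{R}^{n^2}$ can be measured by orthogonal projection onto an orthonormal basis of $W$; the construction of $W$ below is an assumption for the example.

```python
import numpy as np

# Illustrative "yes" instance: build W to contain a planted rank-one matrix
# (matrices are flattened to vectors in R^{n^2}).
n = 3
rng = np.random.default_rng(0)
u, v = rng.standard_normal(n), rng.standard_normal(n)
rank_one = np.outer(u, v).ravel()
basis = np.column_stack([rank_one] + [rng.standard_normal(n * n) for _ in range(2)])
Q, _ = np.linalg.qr(basis)  # orthonormal basis for W

def dist_to_subspace(x, Q):
    """Euclidean distance from vector x to span(Q) via orthogonal projection."""
    return np.linalg.norm(x - Q @ (Q.T @ x))

# The planted rank-one matrix lies in W, so its distance is numerically zero.
print(dist_to_subspace(rank_one, Q))
```

The hard part of the talk's problem is of course searching over all rank-one matrices; this sketch only shows how "$\epsilon$-far from $W$" is measured.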

At the heart of our result is a general rounding paradigm that uses polynomial reweightings to round solutions to Sum-of-Squares (SoS) semidefinite programming relaxations. Somewhat surprisingly, the construction of such a polynomial reweighting scheme uses an argument inspired by the recent breakthrough on the log-rank conjecture by Lovett (STOC '14, JACM '16), who showed that the communication complexity of every rank-$n$ Boolean matrix is bounded by $\sqrt{n} \cdot \mathrm{polylog}(n)$.

I will assume no specialized background in the talk.

Based on joint work with Boaz Barak (Harvard) and David Steurer (Cornell, IAS).

Our second main contribution is showing that the structural properties required for our oracle-efficient online algorithm are present in a large class of problems. As an example, we discuss applications of our oracle-efficient learning results to the adaptive optimization of a large class of auctions, including (1) VCG auctions with bidder-specific reserves in single-parameter settings, (2) envy-free item pricing in multi-item auctions, and (3) Myerson-like auctions for single-item settings.

Bio: Nika Haghtalab is a Ph.D. student at the Computer Science department of Carnegie Mellon University, co-advised by Avrim Blum and Ariel Procaccia. Her research interests include machine learning theory, computational aspects of economics, and algorithms. Nika is a recipient of the IBM and Microsoft Research Ph.D. fellowships.

As a running example we employ the stochastic block model -- a widely studied family of random graph models that contain latent community structure. We recover and unify the proofs of the best-known sample complexity bounds for the partial recovery problem in this model. We also give the first provable guarantees for partial recovery of community structure in constant-degree graphs where nodes may participate in many communities simultaneously. The stochastic block model is known to exhibit a sharp sample complexity threshold: with fewer than a certain critical number of samples, recovering community structure becomes impossible. While previous explanations for this phenomenon appeal to sophisticated ideas from statistical mechanics, we give a new and simple explanation based on properties of low-degree polynomials.

Joint work with David Steurer.

In this talk, I will present examples of such new data structure design ideas and implementations. In particular, I will discuss some inherent limitations of parallelizing classic data structures, and then focus on approaches to circumvent these limitations. The first approach is to relax the software semantics, to allow for approximation, randomization, or both. The second is to modify the underlying hardware architecture to unlock more parallelism. Time permitting, I will also cover results showing that both approaches can improve real-world performance, and touch upon some of the major open questions in the area.

Short bio: Dan Alistarh is an Assistant Professor at IST Austria, currently visiting ETH Zurich on an SNF Ambizione Fellowship. Previously, he was a Researcher at Microsoft Research, Cambridge, UK, and a Postdoctoral Associate at MIT CSAIL. He received his PhD from the EPFL, under the guidance of Prof. Rachid Guerraoui. His research focuses on distributed algorithms and concurrent data structures, and spans from algorithms and lower bounds to practical implementations.

Gopalan et al. [CCC 2016] conjectured a robust analogue of the sensitivity conjecture: if most of the nodes have low sensitivity, then most of the Fourier mass is supported on levels with low Hamming weight. They proved it under the stronger assumption that all nodes have low sensitivity. In this work, we prove the robust sensitivity conjecture with near-optimal quantitative bounds. This provides new insights on the relation between smoothness and the Fourier structure of Boolean functions.
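To make the two quantities in the conjecture concrete, the following sketch computes pointwise sensitivity and the Fourier mass per level for a small Boolean function by brute force (majority on 3 bits is an illustrative choice, not an example from the talk).

```python
from itertools import product

def fourier_coeffs(f, n):
    """Fourier coefficients hat{f}(S) of f: {0,1}^n -> {-1,+1}, where
    hat{f}(S) = E_x[f(x) * chi_S(x)] and chi_S(x) = (-1)^{sum_{i in S} x_i}."""
    points = list(product([0, 1], repeat=n))
    coeffs = {}
    for S in product([0, 1], repeat=n):  # S as a 0/1 indicator vector
        chi = lambda x: (-1) ** sum(xi for xi, si in zip(x, S) if si)
        coeffs[S] = sum(f(x) * chi(x) for x in points) / 2 ** n
    return coeffs

def sensitivity(f, x):
    """Number of coordinates whose flip changes f(x)."""
    return sum(f(x) != f(tuple(1 - b if j == i else b for j, b in enumerate(x)))
               for i in range(len(x)))

maj3 = lambda x: 1 if sum(x) >= 2 else -1
coeffs = fourier_coeffs(maj3, 3)

# Fourier mass by level (Hamming weight of S); Parseval: total mass is 1.
mass = {}
for S, c in coeffs.items():
    mass[sum(S)] = mass.get(sum(S), 0.0) + c * c
print(mass)  # majority on 3 bits: mass 3/4 at level 1, 1/4 at level 3
```

The conjecture relates averages of `sensitivity` over inputs to how fast the level masses decay; for low-sensitivity functions, `mass` should concentrate on small levels.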

The talk does not assume any previous knowledge.

Joint work with Avishay Tal (IAS) and Jiapeng Zhang (UCSD).

This is a joint work with Omer Angel, Sebastien Bubeck, and Yuval Peres.

We give provable bounds for algorithms that are based on variational methods: a technique which is very popular in practice but rather poorly understood theoretically in comparison to MCMC methods.

We make use of recent tools from combinatorial optimization, namely the Sherali-Adams and Lasserre convex programming hierarchies, to obtain algorithms for "dense" Ising models. These techniques give new, non-trivial approximation guarantees for the partition function beyond the regime of correlation decay. They also generalize some classical results from statistical physics about the Curie-Weiss ferromagnetic Ising model, as well as provide a partition function counterpart of classical results about max-cut on dense graphs. With this, we connect techniques from two apparently disparate research areas -- optimization and counting/partition function approximations.
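To fix notation, the object being approximated is the (log-)partition function; the brute-force computation below defines it exactly for a tiny instance (the Curie-Weiss parameters are an illustrative assumption), which is exponential in $n$ and is precisely what variational and hierarchy-based methods avoid.

```python
from itertools import product
import math

def log_partition(J, h):
    """Exact log-partition function of a small Ising model:
    Z = sum over s in {-1,+1}^n of exp(sum_{i<j} J[i][j] s_i s_j + sum_i h[i] s_i).
    Brute force over all 2^n spin configurations."""
    n = len(h)
    Z = 0.0
    for s in product([-1, 1], repeat=n):
        energy = sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))
        energy += sum(h[i] * s[i] for i in range(n))
        Z += math.exp(energy)
    return math.log(Z)

# Curie-Weiss toy instance: ferromagnetic couplings on the complete graph,
# scaled by 1/n, with no external field.
n = 4
J = [[0.5 / n] * n for _ in range(n)]
h = [0.0] * n
print(log_partition(J, h))
```

With all couplings and fields zero, every configuration has weight 1 and the log-partition function is exactly $n \log 2$, a useful sanity check.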

Time permitting, we will also show coarse (multiplicative) approximation guarantees for the log-partition function in the worst-case setting (i.e. for arbitrary Ising models).

In the continuous setting, if the signal is sampled over a duration T, it is impossible to robustly identify frequencies that lie too close to each other (within 1/T). We show that one can achieve this bound to within a log factor while maintaining constant approximation factor and good sample complexity. We furthermore show that the barrier does not apply to estimating the signal as a whole: we give a sample-efficient algorithm to robustly estimate Fourier-sparse signals, regardless of the frequency gap.

As a special case, we get an algorithm to interpolate degree-d polynomials from noisy measurements, using O(d) samples and \tilde{O}(d) time, while increasing the error by a constant factor in the L2 norm.
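As a baseline for this special case (and emphatically not the talk's algorithm), ordinary least squares already recovers a degree-d polynomial from O(d) noisy samples; the instance below is an assumed toy setup.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3
coeffs = np.array([1.0, -2.0, 0.5, 3.0])       # true degree-3 polynomial (highest first)
xs = np.linspace(-1, 1, 4 * (d + 1))           # O(d) sample points on [-1, 1]
noise = 0.01 * rng.standard_normal(xs.shape)   # small measurement noise
ys = np.polyval(coeffs, xs) + noise

fit = np.polyfit(xs, ys, d)                    # least-squares degree-d fit, O(d^3) here
print(fit)
```

The talk's contribution is doing this robustly in $\tilde{O}(d)$ time with only a constant-factor loss in L2 error, rather than the cubic cost of a dense least-squares solve.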

Joint work with Xue Chen, Daniel Kane, and Zhao Song. Based on https://arxiv.org/abs/1609.00896 and https://arxiv.org/abs/1609.01361 .

1. A notion of SQ dimension of a problem that lower bounds the SQ complexity.

2. Lower bounds on the SQ dimension of constraint satisfaction problems.

3. SQ algorithms for stochastic convex optimization.

I'll overview these results and give some of their additional applications.

* Statistical query algorithms (Kearns, 1993) are algorithms that, instead of drawing random samples from an input distribution D over a domain X, have access to an SQ oracle for D. Given a real-valued query function g over X, the oracle returns an estimate of the expectation of g on a sample drawn randomly from D.
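The definition can be made concrete with a small simulated SQ oracle; the tolerance model and the uniform-distribution example below are illustrative assumptions (a real SQ oracle's error may be adversarial within the tolerance, here it is just a random perturbation).

```python
import random

def make_sq_oracle(sample, tolerance, seed=0):
    """Toy SQ oracle for a distribution D given by a sampler: on query g,
    return an estimate of E_{x~D}[g(x)], simulated by an empirical average
    plus a bounded perturbation standing in for the oracle's tolerance."""
    rng = random.Random(seed)
    def oracle(g, n_samples=10000):
        est = sum(g(sample(rng)) for _ in range(n_samples)) / n_samples
        return est + rng.uniform(-tolerance, tolerance)
    return oracle

# Example: D = uniform over {0, ..., 9}; the query g(x) = x estimates the mean 4.5.
oracle = make_sq_oracle(lambda rng: rng.randrange(10), tolerance=0.05)
print(oracle(lambda x: x))
```

An SQ algorithm only ever interacts with D through calls like `oracle(g)`, which is exactly what makes SQ dimension lower bounds (item 1 above) possible.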

Based on joint works with C. Guzman, W. Perkins and S. Vempala.

In this work, we give the first efficient algorithms for robustly learning a high-dimensional Gaussian that are able to tolerate a constant fraction of corruptions. Our techniques also yield robust estimators for several other high-dimensional models, including Bayesian networks, and various mixture models.

The talk will be based on joint works with (different subsets of) G. Kamath, D. Kane, J. Li, A. Moitra, and A. Stewart.

Note that our algorithm is also faster than the fastest randomized algorithm for this problem, which is Karger's 1996 O(m log^3 n)-time algorithm.

This is joint work with Satish Rao and Di Wang and will appear at SODA 2017.

The work in this talk has been supported by Google, Yahoo and the NSF.

Moreover, we introduce a new framework that we call pseudocalibration to construct sum-of-squares lower bounds. It yields a general recipe for constructing good pseudo-distributions (i.e. dual certificates for the sum-of-squares semidefinite program) and sheds further light on the ways in which this hierarchy differs from others.

Joint work with Boaz Barak, Jon Kelner, Pravesh Kothari, Ankur Moitra, and Aaron Potechin.

Joint work with Constantinos Daskalakis. Link to paper: https://arxiv.org/abs/1511.01411

The Fourier transform is one of the most important linear operators in engineering. Computing it for a given input vector x in n-space can be done in time Theta(n log n) using Cooley and Tukey's FFT (Fast Fourier Transform). A lower bound better than the trivial \Omega(n) has been elusive. In this talk I will show the surprising connection between speeding up FFT and uncertainty: if you could (theoretically) speed up FFT without changing the word size on the machine, then you would create uncertainty in the data in the form of numerical overflow or underflow. Alas, increasing the machine word size costs more time per multiplication/addition operation.
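For reference, here is the textbook radix-2 Cooley-Tukey recursion alongside the naive quadratic DFT it speeds up; this is standard material, included only to fix what "FFT" computes in the talk, and the test vector is an arbitrary choice.

```python
import cmath

def fft(x):
    """Radix-2 Cooley-Tukey FFT; len(x) must be a power of two.
    Runs in O(n log n) versus O(n^2) for the naive DFT below."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    twiddle = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(n // 2)]
    return ([even[k] + twiddle[k] for k in range(n // 2)] +
            [even[k] - twiddle[k] for k in range(n // 2)])

def dft(x):
    """Naive O(n^2) discrete Fourier transform, for comparison."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

x = [1.0, 2.0, 3.0, 4.0, 0.0, -1.0, -2.0, -3.0]
print(fft(x))
```

The talk's point concerns the numerical values flowing through exactly these butterfly additions: speeding the computation up further while keeping the word size fixed would force overflow or underflow somewhere in the intermediate data.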

I will explain and prove this principle, and present two very recent related bounds together with a main conjecture and some related open problems.

The talk requires standard linear algebra and very basic familiarity with Shannon entropy.

In this work, we show that, in fact, Reed-Solomon (RS) codes are much better for distributed storage than you might think! Our main result is that, in some parameter regimes, RS codes themselves are optimal regenerating codes among MDS codes with linear repair schemes. Moreover, we give a characterization of MDS codes with good linear repair schemes which holds in any parameter regime, and which can be used to give non-trivial repair schemes for RS codes in other settings.

Along the way, we'll develop a surprising fact about low-degree polynomials (at least, surprising to me). A fundamental fact about polynomial interpolation is that k evaluations determine a degree k-1 polynomial. This is necessary in a strong sense: if we have only k-1 evaluations, we don't learn anything at all about the evaluation on a k-th point. However, our repair scheme for RS codes shows that if we are allowed to consider partial evaluations (in the sense that we only get a few bits of information about the evaluation), then in fact we can do better (in the sense that we can use fewer bits total to determine an unknown evaluation).
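The classical interpolation fact can be checked directly with Lagrange interpolation over the rationals (the cubic below is an arbitrary example; the partial-evaluation trick from the talk, which works over finite fields, is not shown here).

```python
from fractions import Fraction

def lagrange_eval(points, x):
    """Evaluate at x the unique polynomial of degree <= k-1 passing through
    the k given (xi, yi) points, using exact rational arithmetic."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        term = Fraction(yi)
        for j, (xj, _) in enumerate(points):
            if j != i:
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total

# p(x) = x^3 - 2x + 1 has degree 3, so k = 4 full evaluations pin it down.
p = lambda x: x ** 3 - 2 * x + 1
points = [(x, p(x)) for x in [0, 1, 2, 3]]
print(lagrange_eval(points, 10))  # recovers p(10) = 981 from just 4 evaluations
```

The talk's observation is that when each evaluation only needs to contribute a few bits rather than a full field element, the total number of bits required to determine an unknown evaluation can drop below what this full-evaluation picture suggests.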

This is joint work with Venkat Guruswami.

To achieve these improved running times I will provide an improved algorithm for computing solutions to directed Laplacian systems, a natural generalization of symmetric or undirected Laplacian systems. In particular, I will discuss how to solve directed Laplacian systems associated with a directed graph with m edges and n vertices in time $\tilde{O}(m^{3/4}n + mn^{2/3})$, where the $\tilde{O}$ notation suppresses polylogarithmic factors in n, m, the desired accuracy, and the appropriate condition number of the problem. For sparse graphs with polynomial condition numbers, this gives a running time of $\tilde{O}(n^{7/4})$, which breaks the $O(n^{2})$ barrier that would be the best one could hope to achieve using fast matrix multiplication. Along the way, I will introduce some general techniques for reasoning about directed Laplacian systems that may be of independent interest.
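To make the object concrete, the sketch below builds a directed Laplacian under one common convention (L = D_out - A^T, so each column sums to zero; conventions vary) and solves a system with a generic pseudoinverse, the slow O(n^3) step that the talk's solver replaces; the small digraph is an assumed example.

```python
import numpy as np

# Small strongly connected digraph on 3 vertices.
edges = [(0, 1), (1, 2), (2, 0), (0, 2)]
n = 3
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = 1.0

# Directed Laplacian (one convention): out-degree diagonal minus A transposed,
# so every column of L sums to zero and L is singular.
L = np.diag(A.sum(axis=1)) - A.T

# Solve Lx = b for b orthogonal to the all-ones vector; a generic
# pseudoinverse stands in for the fast directed-Laplacian solver.
b = np.array([1.0, 0.0, -1.0])
x = np.linalg.pinv(L) @ b
print(np.allclose(L @ x, b))
```

For undirected graphs A is symmetric and this reduces to the usual symmetric Laplacian; the asymmetry of A is exactly what makes the directed case harder to solve quickly.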

This talk reflects joint work with Michael B. Cohen, Jonathan Kelner, John Peebles, Richard Peng, and Adrian Vladu.