Stanford Theory Seminar (formerly known as Stanford Algorithms Seminar/AFLB) is usually held in Gates 463A.

Faculty Contact: Mary Wootters

Student Contact: Shashwat Silas

Click here for directions.

Machine learning techniques based on neural networks are achieving remarkable results in a wide variety of domains. Often, the training of models requires large, representative datasets, which may be crowd-sourced and contain sensitive information. The models should not expose private information in these datasets. Differential Privacy is a standard privacy definition that implies a strong and concrete guarantee on protecting such information. In this talk, I'll then outline two recent approaches to training deep neural networks while providing a differential privacy guarantee, and some new analysis tools we developed in the process. Our implementation and experiments demonstrate that we can train deep neural networks with non-convex objectives, under a modest privacy budget, and at a manageable cost in software complexity, training efficiency, and model quality. Based on joint works with Martin Abadi, Andy Chu, Úlfar Erlingsson, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Nicolas Papernot, Ananth Raghunathan, Daniel Ramage, Shuang Song and Li Zhang.

We give an explicit pseudorandom generator with seed length poly(log m, 1/\delta) * log n that \delta-fools the class of all m-facet polytopes over {0,1}^n. The previous best seed length had linear dependence on m. As a corollary, we obtain a deterministic quasipolynomial time algorithm for approximately counting the number of feasible solutions of general {0,1}-integer programs.

Joint work with Ryan O'Donnell (CMU) and Rocco Servedio (Columbia)

Locally decodable codes (LDCs) and locally correctable codes (LCCs) are error-correcting codes in which individual bits of the message and codeword, respectively, can be recovered by querying only few bits from a noisy codeword. These codes have found numerous applications both in theory and in practice.

A natural relaxation of LDCs, introduced by Ben-Sasson et al. (SICOMP, 2006), allows the decoder to reject (i.e., refuse to answer) in case it detects that the codeword is corrupt. They call such a decoder a relaxed decoder and construct a constant-query relaxed LDC with almost-linear blocklength, which is sub-exponentially better than what is known for (full-fledged) LDCs in the constant-query regime.

We consider an analogous relaxation for local correction. Thus, a relaxed local corrector reads only few bits from a (possibly) corrupt codeword and either recovers the desired bit of the codeword, or rejects in case it detects a corruption.

We give two constructions of relaxed LCCs in two regimes, where the first optimizes the query complexity and the second optimizes the rate:

1. Constant Query Complexity: A relaxed LCC with polynomial blocklength whose corrector only reads a constant number of bits of the codeword. This is a sub-exponential improvement over the best constant query (full-fledged) LCCs that are known.

2. Constant Rate: A relaxed LCC with constant rate (i.e., linear blocklength) with quasi-polylogarithmic query complexity. This is a nearly sub-exponential improvement over the query complexity of a recent (full-fledged) constant-rate LCC of Kopparty et al. (STOC, 2016).

Joint work with Govind Ramnarayan and Ron Rothblum.

Graphons were invented to model the limit of large, dense graphs. While this led to interesting applications in combinatorics and property testing, most applications require limits of sparse graphs. In this talk, I will review recent progress on graph limits for sparse graphs, and then discuss a couple of applications: non-parametric modelling and estimation of sparse graphs, and recommendation systems where the matrix of known ratings is so sparse that two typical users have never rated the same item, making standard similarity based recommendation algorithms challenging. This is joint work with Jennifer Chayes, Henry Cohn, and many others.

Is matching in NC? In other words, is there a deterministic fast parallel algorithm for it? This has been an open question for over three decades, ever since the discovery of Randomized NC matching algorithms. Within this question, the case of planar graphs has remained an enigma: On the one hand, counting the number of perfect matchings is generally believed to be harder than finding one (the former is #P-complete and the latter is in P), and on the other, for planar graphs, counting has long been known to be in NC whereas finding one has resisted a solution!

The case of bipartite planar graphs was solved by Miller and Naor in 1989 via a flow-based algorithm. In 2000, Mahajan and Varadarajan gave an elegant way of using counting matchings to finding one, hence giving a different NC algorithm.

However, non-bipartite planar graphs still didn't yield: the stumbling block being tight odd cuts. Interestingly enough, these are also a key to the solution: a balanced odd tight cut leads to a straight-forward divide and conquer NC algorithm. The remaining task is to find such a cut in NC. This requires several algorithmic ideas, such as finding a point in the interior of the minimum weight face of the perfect matching polytope, and uncrossing tight odd cuts.

Paper available at: https://arxiv.org/pdf/1709.07822.pdf

Joint work with Vijay Vazirani.

The widespread use of machine learning systems creates a new class of computer security vulnerabilities where, rather than attacking the integrity of the software itself, malicious actors exploit the statistical nature of the learning algorithms. For instance, attackers can add fake data (e.g. by creating fake user accounts), or strategically manipulate inputs to the system once it is deployed.

So far, attempts to defend against these attacks have focused on empirical performance against known sets of attacks. I will argue that this is a fundamentally inadequate paradigm for achieving meaningful security guarantees. Instead, we need algorithms that are provably secure by design, in line with best practices for traditional computer security.

To achieve this goal, we take inspiration from robust optimization and robust statistics, but with an eye towards the security requirements of modern machine learning systems. In particular, we will develop new algorithms for robust learning in high-dimensional settings, as well as for certifiably robust optimization of non-convex models.

Random matrices have become a very active area of research in the recent years and have found enormous applications in modern mathematics, physics, engineering, biological modeling, and other fields. In this work, we focus on symmetric sign (+/-1) matrices (SSMs) that were originally utilized by Wigner to model the nuclei of heavy atoms in mid-50s. Assuming the entries of the upper triangular part to be independent +/-1 with equal probabilities, Wigner showed in his pioneering works that when the sizes of matrices grow, their empirical spectra converge to a non-random measure having a semicircular shape. Later, this fundamental result was improved and substantially extended to more general families of matrices and finer spectral properties. In many physical phenomena, however, the entries of matrices exhibit significant correlations. At the same time, almost all available analytical tools heavily rely on the independence condition making the study of matrices with structure (dependencies) very challenging. The few existing works in this direction consider very specific setups and are limited by particular techniques, lacking a unified framework and tight information-theoretic bounds that would quantify the exact amount of structure that matrices may possess without affecting the limiting semicircular form of their spectra.

From a different perspective, in many applications one needs to simulate random objects. Generation of large random matrices requires very powerful sources of randomness due to the independence condition, the experiments are impossible to reproduce, and atypical or non-random looking outcomes may appear with positive probability. Reliable deterministic construction of SSMs with random-looking spectra and low algorithmic and computational complexity is of particular interest due to the natural correspondence of SSMs and undirected graphs, since the latter are extensively used in combinatorial and CS applications e.g. for the purposes of derandomization. Unfortunately, most of the existing constructions of pseudo-random graphs focus on the extreme eigenvalues and do not provide guaranties on the whole spectrum. In this work, using binary Golomb sequences, we propose a simple completely deterministic construction of circulant SSMs with spectra converging to the semicircular law with the same rate as in the original Wigner ensemble. We show that this construction has close to lowest possible algorithmic complexity and is very explicit. Essentially, the algorithm requires at most 2log(n) bits implying that the real amount of randomness conveyed by the semicircular property is quite small.

We consider the problem of learning a function from samples with L2-bounded noise. In the simplest agnostic learning setting, the number of samples required for robust estimation depends on a condition number that can be arbitrarily large. We show how to improve this dependence in two natural extensions of the setting: a query access setting, where we can estimate the function at arbitrary points, and an active learning setting, where we get a large number of unlabeled points and choose a small subset to label. For linear spaces of functions, such as the family of n-variate degree-d polynomials, this eliminates the dependence on the condition number. The technique can also yield improvements for nonlinear spaces, as we demonstrate for the family of k-Fourier-sparse signals with continuous frequencies.

We study the A-optimal design problem, in which we are given n rank-1 d-by-d positive semidefinite (PSD) matrices, and an integer k, and our goal is to find a set S of k matrices, so that the trace of the inverse of the sum of the matrices in S is minimized. This problem is one variant of the optimal experimental design problem in statistics, where each rank-1 matrix corresponds to a noisy linear measurement of an unknown vector, and the goal is to pick k measurements to take, so as to minimize the average variance of the error of the maximum likelihood estimator of the unknown vector. The problem also finds applications in sensor placement in wireless networks, sparse least squares regression, feature selection for k-means clustering, and low-rank matrix approximation. We introduce proportional volume sampling to obtain improved approximation algorithms for A-optimal design.

In traditional volume sampling we pick a subset of k (rank-1 PSD) matrices with probability proportional to the determinant of their sum. In proportional volume sampling this probability distribution is modified by multiplying the probability of each size k subset S by the probability mu(S) assigned to S by another probability distribution mu. In order to obtain improved approximation algorithms for the A-optimal design problem, we appeal to hard-core distributions mu based on the solution of a convex relaxation of the problem. Our results include a d-approximation when k=d, and a (1+epsilon)-approximation when k is on the order of (d/epsilon) + log(1/epsilon)/epsilon^2. When we are allowed repetitions of the selected rank-1 matrices, we achieve a k/(k-d+1)-approximation, which matches the integrality gap of the natural convex relaxation of the problem. We also show that our results cannot be extended to the related E-optimal design problem in which we maximize the minimum eigenvalue of the sum of selected matrices. In particular, we show that for E-optimal design the integrality gap of the natural convex relaxation is at least 1-epsilon for k as large as d/epsilon^2. We also derive generalizations of the problem to selecting fewer than d matrices, and consider an application to restricted invertibility principles.

Joint work with Mohit Singh and Uthaipon Tao Tantipongpipat

Traditional multi-armed bandit models posit that the payoff distribution of each action (or "arm") is stationary over time, and hence that the goal of learning is to identify the arm with the highest expected payoff and choose that one forever after. However, in many applications the efficacy of an action depends on the amount of time that has elapsed since it was last performed. Examples arise in precision agriculture, online education, and music recommendations. In this talk we introduce a generalization of the multi-armed bandit problem that models such applications. In the course of analyzing algorithms for this problem, we will encounter some interesting combinatorial questions about coloring the integers subject to bounds on the sizes of subintervals that exclude a given color.

This talk is based on joint work with Nicole Immorlica.

We give a constant-factor approximation algorithm for the asymmetric traveling salesman problem. Our approximation guarantee is analyzed with respect to the standard LP relaxation, and thus our result confirms the conjectured constant integrality gap of that relaxation. Our techniques build upon the constant-factor approximation algorithm for the special case of node-weighted metrics. Specifically, we give a generic reduction to structured instances that resemble but are more general than those arising from node-weighted metrics. For those instances, we then solve Local-Connectivity ATSP, a problem known to be equivalent (in terms of constant-factor approximation) to the asymmetric traveling salesman problem. This is joint work with Ola Svensson and László Végh.

Dimensionality reduction in Euclidean space, as attainable by the Johnson-Lindenstrauss lemma, has been a fundamental tool in algorithm design and machine learning. The JL lemma states that any n point subset of l_2 can be mapped to l_2^m with distortion 1+epsilon, where m = O((log n) / epsilon^2). In this talk, I discuss our recent proof that the JL lemma is optimal, in the sense that for any n, d, epsilon, where epsilon is not too small, there is a point set X in l_2^d such that any f:X->l_2^m with 1+epsilon must have m = Omega(epsilon^{-2} log n). I will also discuss some subsequent work and future directions.

Joint work with Kasper Green Larsen (Aarhus University).

No theory seminar today, instead go to the...

Thanks to major advances in neuroscience, we are on the brink of a scientific understanding of how the brain achieves consciousness. This talk will describe neuroscientist Bernard Baars's Global Workspace Model (GWM) of the brain and propose a formal Turing-Machine-like computational model inspired by it for understanding consciousness. One of several consequences of this Model is the possibility of free will in a completely deterministic world. Another deals with the possibility of building machines that are conscious.

More info about the Motwani Colloquium is available here

We study the general problem of testing whether an unknown discrete probability distribution belongs to a specified family of distributions. More specifically, given a distribution family $\mathcal{P}$ and sample access to an unknown discrete distribution $p$, we want to distinguish (with high probability) between the case that $p$ is in $\mathcal{P}$ and the case that $p$ is eps-far, in total variation distance, from every distribution in $\mathcal{P}$. This is the archetypal hypothesis testing problem that has received significant attention in statistics and, over the past decade and roughly a half, in theoretical computer science.

Of course, the sample complexity of this general inference task depends on the underlying family $\mathcal{P}$. The gold standard in distribution testing is to design sample-optimal and computationally efficient algorithms for this task, as a function of $\mathcal{P}$. The main contribution of this work is a simple and general testing technique that is applicable to /all/ distribution families, and is particularly suited to those whose /Fourier spectrum/ satisfies a certain approximate /sparsity/ property. To the best of our knowledge, ours is the first use of the Fourier transform in the context of distribution testing.

We apply our Fourier-based framework to obtain near sample-optimal and computationally efficient testers for the following fundamental distribution families: Sums of Independent Integer Random Variables (SIIRVs), Poisson Multinomial Distributions (PMDs), and Discrete Log-Concave Distributions. For the first two, ours are the first non-trivial testers in the literature, vastly generalizing previous work on testing Poisson Binomial Distributions. For the third, our tester improves on prior work in both sample and time complexity.

Joint work with Ilias Diakonikolas (USC) and Alistair Stwart (USC).

A de-Morgan formula over Boolean variables x_1, ..., x_n is a binary tree whose internal nodes are marked with AND or OR gates and whose leaves are marked with variables or their negation. We define the size of the formula as the number of leaves in it. Proving that some explicit function (in P or NP) requires large formulas is a central open question in computational complexity.

In this work, we introduce a size-amplification hardness reduction for de-Morgan formulas. We show that average-case hardness implies worst-case hardness for a larger size. More precisely, if a function f cannot be computed correctly on more than 1/2 + eps of the inputs by any formula of size s, then computing f correctly on all inputs requires size ~s*log(1/eps). The tradeoff is essentially tight. Quite surprisingly, the proof relies on a result from quantum query complexity by Reichardt [SODA, 2011].

As an application, we improve the best known formula size lower bounds for explicit functions by logarithmic factors to ~n^3/log(n). In addition, we propose candidates for explicit functions that we believe have formula size ~n^4, and prove non trivial super-quadratic formula size lower bounds for them using our reduction. Learning discrete Markov Random Fields with nearly optimal runtime and sample complexity.

Our approach is new and uses a multiplicative-weight update algorithm. Our algorithm-- Sparsitron-- is easy to implement (has only one parameter) and holds in the online setting. It also gives the first provably efficient solution to the problem of learning sparse Generalized Linear Models (GLMs).

Joint work with Adam Klivans.

Joint work with Sushant Sachdeva.

The crucial techniques are about expanders: 1) an algorithm for decomposing a graph into a collection of expanders in near-linear time, and 2) an algorithm for "repairing" the expansion property of an expander after deleting some edges of it. These techniques can be of independent interest.

This talk is based on results by [Nanongkai, Saranurak and Wulff-Nilsen, FOCS'17], [Nanongkai and Saranurak, STOC'17] and [Wulff-Nilsen, STOC'17].

This talk is based on a joint-work with Shaddin Dughmi, Jason Hartline, and Bobby Kleinberg.