Stanford Theory Seminar

Stanford Theory Seminar (formerly known as Stanford Algorithms Seminar/AFLB) is usually held in Gates 463A.
Faculty Contact: Mary Wootters
Student Contact: Shashwat Silas
Click here for directions.

March 8, 2018

Gates 463A, 4:15PM

Kunal Talwar (Google Brain)

Two approaches to (Deep) learning with Differential Privacy

Machine learning techniques based on neural networks are achieving remarkable results in a wide variety of domains. Often, the training of models requires large, representative datasets, which may be crowd-sourced and contain sensitive information. The models should not expose private information in these datasets. Differential Privacy is a standard privacy definition that implies a strong and concrete guarantee on protecting such information. In this talk, I'll then outline two recent approaches to training deep neural networks while providing a differential privacy guarantee, and some new analysis tools we developed in the process. Our implementation and experiments demonstrate that we can train deep neural networks with non-convex objectives, under a modest privacy budget, and at a manageable cost in software complexity, training efficiency, and model quality. Based on joint works with Martin Abadi, Andy Chu, Úlfar Erlingsson, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Nicolas Papernot, Ananth Raghunathan, Daniel Ramage, Shuang Song and Li Zhang.

(Tuesday!) March 6, 2018

Gates 415, 2PM Note the unusual room, day, and time!

Li-Yang Tan (TTIC)

Fooling Polytopes

We give an explicit pseudorandom generator with seed length poly(log m, 1/\delta) * log n that \delta-fools the class of all m-facet polytopes over {0,1}^n. The previous best seed length had linear dependence on m. As a corollary, we obtain a deterministic quasipolynomial time algorithm for approximately counting the number of feasible solutions of general {0,1}-integer programs.

Joint work with Ryan O'Donnell (CMU) and Rocco Servedio (Columbia)

February 22, 2018

Gates 463A, 4:15PM

Tom Gur (Berkeley)

Relaxed Locally Correctable Codes

Locally decodable codes (LDCs) and locally correctable codes (LCCs) are error-correcting codes in which individual bits of the message and codeword, respectively, can be recovered by querying only few bits from a noisy codeword. These codes have found numerous applications both in theory and in practice.

A natural relaxation of LDCs, introduced by Ben-Sasson et al. (SICOMP, 2006), allows the decoder to reject (i.e., refuse to answer) in case it detects that the codeword is corrupt. They call such a decoder a relaxed decoder and construct a constant-query relaxed LDC with almost-linear blocklength, which is sub-exponentially better than what is known for (full-fledged) LDCs in the constant-query regime.

We consider an analogous relaxation for local correction. Thus, a relaxed local corrector reads only few bits from a (possibly) corrupt codeword and either recovers the desired bit of the codeword, or rejects in case it detects a corruption.

We give two constructions of relaxed LCCs in two regimes, where the first optimizes the query complexity and the second optimizes the rate:

1. Constant Query Complexity: A relaxed LCC with polynomial blocklength whose corrector only reads a constant number of bits of the codeword. This is a sub-exponential improvement over the best constant query (full-fledged) LCCs that are known.

2. Constant Rate: A relaxed LCC with constant rate (i.e., linear blocklength) with quasi-polylogarithmic query complexity. This is a nearly sub-exponential improvement over the query complexity of a recent (full-fledged) constant-rate LCC of Kopparty et al. (STOC, 2016).

Joint work with Govind Ramnarayan and Ron Rothblum.

February 15, 2018

Gates 463A, 4:15PM

Christian Borgs (MSR)

Graphons: From Graph Limits to Non-Parametric Estimation and Recommendation Systems

Graphons were invented to model the limit of large, dense graphs. While this led to interesting applications in combinatorics and property testing, most applications require limits of sparse graphs. In this talk, I will review recent progress on graph limits for sparse graphs, and then discuss a couple of applications: non-parametric modelling and estimation of sparse graphs, and recommendation systems where the matrix of known ratings is so sparse that two typical users have never rated the same item, making standard similarity based recommendation algorithms challenging. This is joint work with Jennifer Chayes, Henry Cohn, and many others.

February 8, 2018

Gates 463A, 4:15PM

Nima Ahmadipouranari (Stanford)

Planar Graph Perfect Matching is in NC

Is matching in NC? In other words, is there a deterministic fast parallel algorithm for it? This has been an open question for over three decades, ever since the discovery of Randomized NC matching algorithms. Within this question, the case of planar graphs has remained an enigma: On the one hand, counting the number of perfect matchings is generally believed to be harder than finding one (the former is #P-complete and the latter is in P), and on the other, for planar graphs, counting has long been known to be in NC whereas finding one has resisted a solution!

The case of bipartite planar graphs was solved by Miller and Naor in 1989 via a flow-based algorithm. In 2000, Mahajan and Varadarajan gave an elegant way of using counting matchings to finding one, hence giving a different NC algorithm.

However, non-bipartite planar graphs still didn't yield: the stumbling block being tight odd cuts. Interestingly enough, these are also a key to the solution: a balanced odd tight cut leads to a straight-forward divide and conquer NC algorithm. The remaining task is to find such a cut in NC. This requires several algorithmic ideas, such as finding a point in the interior of the minimum weight face of the perfect matching polytope, and uncrossing tight odd cuts.

Paper available at:

Joint work with Vijay Vazirani.

(Tuesday!) February 6, 2018

Gates 463A, 3:00PM (Note the unusual day and time!)

Jacob Steinhardt (Stanford)

Provably Secure Machine Learning

The widespread use of machine learning systems creates a new class of computer security vulnerabilities where, rather than attacking the integrity of the software itself, malicious actors exploit the statistical nature of the learning algorithms. For instance, attackers can add fake data (e.g. by creating fake user accounts), or strategically manipulate inputs to the system once it is deployed.

So far, attempts to defend against these attacks have focused on empirical performance against known sets of attacks. I will argue that this is a fundamentally inadequate paradigm for achieving meaningful security guarantees. Instead, we need algorithms that are provably secure by design, in line with best practices for traditional computer security.

To achieve this goal, we take inspiration from robust optimization and robust statistics, but with an eye towards the security requirements of modern machine learning systems. In particular, we will develop new algorithms for robust learning in high-dimensional settings, as well as for certifiably robust optimization of non-convex models.

February 1, 2018

Gates 463A, 4:15PM

Ilya Soloveychik (Harvard)

Deterministic Random Matrices

Random matrices have become a very active area of research in the recent years and have found enormous applications in modern mathematics, physics, engineering, biological modeling, and other fields. In this work, we focus on symmetric sign (+/-1) matrices (SSMs) that were originally utilized by Wigner to model the nuclei of heavy atoms in mid-50s. Assuming the entries of the upper triangular part to be independent +/-1 with equal probabilities, Wigner showed in his pioneering works that when the sizes of matrices grow, their empirical spectra converge to a non-random measure having a semicircular shape. Later, this fundamental result was improved and substantially extended to more general families of matrices and finer spectral properties. In many physical phenomena, however, the entries of matrices exhibit significant correlations. At the same time, almost all available analytical tools heavily rely on the independence condition making the study of matrices with structure (dependencies) very challenging. The few existing works in this direction consider very specific setups and are limited by particular techniques, lacking a unified framework and tight information-theoretic bounds that would quantify the exact amount of structure that matrices may possess without affecting the limiting semicircular form of their spectra.

From a different perspective, in many applications one needs to simulate random objects. Generation of large random matrices requires very powerful sources of randomness due to the independence condition, the experiments are impossible to reproduce, and atypical or non-random looking outcomes may appear with positive probability. Reliable deterministic construction of SSMs with random-looking spectra and low algorithmic and computational complexity is of particular interest due to the natural correspondence of SSMs and undirected graphs, since the latter are extensively used in combinatorial and CS applications e.g. for the purposes of derandomization. Unfortunately, most of the existing constructions of pseudo-random graphs focus on the extreme eigenvalues and do not provide guaranties on the whole spectrum. In this work, using binary Golomb sequences, we propose a simple completely deterministic construction of circulant SSMs with spectra converging to the semicircular law with the same rate as in the original Wigner ensemble. We show that this construction has close to lowest possible algorithmic complexity and is very explicit. Essentially, the algorithm requires at most 2log(n) bits implying that the real amount of randomness conveyed by the semicircular property is quite small.

January 18, 2018

Gates 463A, 4:15PM

Eric Price (UT Austin)

Condition number-free query and active learning of linear families

We consider the problem of learning a function from samples with L2-bounded noise. In the simplest agnostic learning setting, the number of samples required for robust estimation depends on a condition number that can be arbitrarily large. We show how to improve this dependence in two natural extensions of the setting: a query access setting, where we can estimate the function at arbitrary points, and an active learning setting, where we get a large number of unlabeled points and choose a small subset to label. For linear spaces of functions, such as the family of n-variate degree-d polynomials, this eliminates the dependence on the condition number. The technique can also yield improvements for nonlinear spaces, as we demonstrate for the family of k-Fourier-sparse signals with continuous frequencies.

December 7, 2017

Gates 463A, 4:15PM

Aleksandar Nikolov (Toronto)

Proportional Volume Sampling and Approximation Algorithms for A-Optimal Design

We study the A-optimal design problem, in which we are given n rank-1 d-by-d positive semidefinite (PSD) matrices, and an integer k, and our goal is to find a set S of k matrices, so that the trace of the inverse of the sum of the matrices in S is minimized. This problem is one variant of the optimal experimental design problem in statistics, where each rank-1 matrix corresponds to a noisy linear measurement of an unknown vector, and the goal is to pick k measurements to take, so as to minimize the average variance of the error of the maximum likelihood estimator of the unknown vector. The problem also finds applications in sensor placement in wireless networks, sparse least squares regression, feature selection for k-means clustering, and low-rank matrix approximation. We introduce proportional volume sampling to obtain improved approximation algorithms for A-optimal design.

In traditional volume sampling we pick a subset of k (rank-1 PSD) matrices with probability proportional to the determinant of their sum. In proportional volume sampling this probability distribution is modified by multiplying the probability of each size k subset S by the probability mu(S) assigned to S by another probability distribution mu. In order to obtain improved approximation algorithms for the A-optimal design problem, we appeal to hard-core distributions mu based on the solution of a convex relaxation of the problem. Our results include a d-approximation when k=d, and a (1+epsilon)-approximation when k is on the order of (d/epsilon) + log(1/epsilon)/epsilon^2. When we are allowed repetitions of the selected rank-1 matrices, we achieve a k/(k-d+1)-approximation, which matches the integrality gap of the natural convex relaxation of the problem. We also show that our results cannot be extended to the related E-optimal design problem in which we maximize the minimum eigenvalue of the sum of selected matrices. In particular, we show that for E-optimal design the integrality gap of the natural convex relaxation is at least 1-epsilon for k as large as d/epsilon^2. We also derive generalizations of the problem to selecting fewer than d matrices, and consider an application to restricted invertibility principles.

Joint work with Mohit Singh and Uthaipon Tao Tantipongpipat

December 7, 2017

Gates 463A, 12:15PM (special (full-length) theory seminar during theory lunch!)

Bobby Kleinberg (Cornell)

Recharging Bandits: Learning to Schedule Recurring Interventions

Traditional multi-armed bandit models posit that the payoff distribution of each action (or "arm") is stationary over time, and hence that the goal of learning is to identify the arm with the highest expected payoff and choose that one forever after. However, in many applications the efficacy of an action depends on the amount of time that has elapsed since it was last performed. Examples arise in precision agriculture, online education, and music recommendations. In this talk we introduce a generalization of the multi-armed bandit problem that models such applications. In the course of analyzing algorithms for this problem, we will encounter some interesting combinatorial questions about coloring the integers subject to bounds on the sizes of subintervals that exclude a given color.

This talk is based on joint work with Nicole Immorlica.

November 30, 2017

Gates 463A, 4:15PM

Jakub Tarnawski (EPFL)

A Constant-Factor Approximation Algorithm for the Asymmetric Traveling Salesman Problem

We give a constant-factor approximation algorithm for the asymmetric traveling salesman problem. Our approximation guarantee is analyzed with respect to the standard LP relaxation, and thus our result confirms the conjectured constant integrality gap of that relaxation. Our techniques build upon the constant-factor approximation algorithm for the special case of node-weighted metrics. Specifically, we give a generic reduction to structured instances that resemble but are more general than those arising from node-weighted metrics. For those instances, we then solve Local-Connectivity ATSP, a problem known to be equivalent (in terms of constant-factor approximation) to the asymmetric traveling salesman problem. This is joint work with Ola Svensson and László Végh.

Tuesday November 28, 2017


Jelani Nelson (Harvard)

Optimality of the Johnson-Lindenstrauss lemma

Dimensionality reduction in Euclidean space, as attainable by the Johnson-Lindenstrauss lemma, has been a fundamental tool in algorithm design and machine learning. The JL lemma states that any n point subset of l_2 can be mapped to l_2^m with distortion 1+epsilon, where m = O((log n) / epsilon^2). In this talk, I discuss our recent proof that the JL lemma is optimal, in the sense that for any n, d, epsilon, where epsilon is not too small, there is a point set X in l_2^d such that any f:X->l_2^m with 1+epsilon must have m = Omega(epsilon^{-2} log n). I will also discuss some subsequent work and future directions.

Joint work with Kasper Green Larsen (Aarhus University).

November 16, 2017

No theory seminar today, instead go to the...

Motwani Theory Colloquium: Manuel Blum (CMU)

Huang Mackenzie Center Room 300, 4:15PM (followed by light refreshments)

Can a Machine be Conscious? Towards a Computational Model of Consciousness

Thanks to major advances in neuroscience, we are on the brink of a scientific understanding of how the brain achieves consciousness. This talk will describe neuroscientist Bernard Baars's Global Workspace Model (GWM) of the brain and propose a formal Turing-Machine-like computational model inspired by it for understanding consciousness. One of several consequences of this Model is the possibility of free will in a completely deterministic world. Another deals with the possibility of building machines that are conscious.

More info about the Motwani Colloquium is available here

November 9, 2017

Gates 463A, 4:15PM

Clement Canonne (Stanford)

When Fourier SIIRVs: Fourier-Based Testing for Families of Distributions

We study the general problem of testing whether an unknown discrete probability distribution belongs to a specified family of distributions. More specifically, given a distribution family $\mathcal{P}$ and sample access to an unknown discrete distribution $p$, we want to distinguish (with high probability) between the case that $p$ is in $\mathcal{P}$ and the case that $p$ is eps-far, in total variation distance, from every distribution in $\mathcal{P}$. This is the archetypal hypothesis testing problem that has received significant attention in statistics and, over the past decade and roughly a half, in theoretical computer science.

Of course, the sample complexity of this general inference task depends on the underlying family $\mathcal{P}$. The gold standard in distribution testing is to design sample-optimal and computationally efficient algorithms for this task, as a function of $\mathcal{P}$. The main contribution of this work is a simple and general testing technique that is applicable to /all/ distribution families, and is particularly suited to those whose /Fourier spectrum/ satisfies a certain approximate /sparsity/ property. To the best of our knowledge, ours is the first use of the Fourier transform in the context of distribution testing.

We apply our Fourier-based framework to obtain near sample-optimal and computationally efficient testers for the following fundamental distribution families: Sums of Independent Integer Random Variables (SIIRVs), Poisson Multinomial Distributions (PMDs), and Discrete Log-Concave Distributions. For the first two, ours are the first non-trivial testers in the literature, vastly generalizing previous work on testing Poisson Binomial Distributions. For the third, our tester improves on prior work in both sample and time complexity.

Joint work with Ilias Diakonikolas (USC) and Alistair Stwart (USC).

November 2, 2017

Gates 463A, 4:15PM

Avishay Tal (Stanford)

Computing Requires Larger Formulas than Approximating

A de-Morgan formula over Boolean variables x_1, ..., x_n is a binary tree whose internal nodes are marked with AND or OR gates and whose leaves are marked with variables or their negation. We define the size of the formula as the number of leaves in it. Proving that some explicit function (in P or NP) requires large formulas is a central open question in computational complexity.

In this work, we introduce a size-amplification hardness reduction for de-Morgan formulas. We show that average-case hardness implies worst-case hardness for a larger size. More precisely, if a function f cannot be computed correctly on more than 1/2 + eps of the inputs by any formula of size s, then computing f correctly on all inputs requires size ~s*log(1/eps). The tradeoff is essentially tight. Quite surprisingly, the proof relies on a result from quantum query complexity by Reichardt [SODA, 2011].

As an application, we improve the best known formula size lower bounds for explicit functions by logarithmic factors to ~n^3/log(n). In addition, we propose candidates for explicit functions that we believe have formula size ~n^4, and prove non trivial super-quadratic formula size lower bounds for them using our reduction. Learning discrete Markov Random Fields with nearly optimal runtime and sample complexity.

October 26, 2017

Gates 463A, 4:15PM

Raghu Meka (UCLA)

Learning discrete Markov Random Fields with nearly optimal runtime and sample complexity.

We give an algorithm for learning the structure of an undirected graphical model that has essentially optimal sample complexity and running time. We make no assumptions on the structure of the graphical model. For Ising models, this subsumes and improves on all prior work. For general t-wise MRFs, these are the first results of their kind.

Our approach is new and uses a multiplicative-weight update algorithm. Our algorithm-- Sparsitron-- is easy to implement (has only one parameter) and holds in the online setting. It also gives the first provably efficient solution to the problem of learning sparse Generalized Linear Models (GLMs).

Joint work with Adam Klivans.

October 19, 2017

Gates 463A, 4:15PM

Rasmus Kyng (Simons)

Approximate Gaussian Elimination for Laplacians

We show how to perform sparse approximate Gaussian elimination for Laplacian matrices. We present a simple, nearly linear time algorithm that approximates a Laplacian matrix by a matrix with a sparse LU factorization. We compute this factorization by subsampling standard Gaussian elimination. This gives the simplest known nearly linear time solver for Laplacian equations. It is a significant step towards practical and provably correct algorithms for this problem. The crux of our proof is the use of matrix martingales to analyze the algorithm.

Joint work with Sushant Sachdeva.

October 12, 2017

Gates 463A, 4:15PM

Thatchaphol Saranurak (KTH)

Dynamic Spanning Forest: Techniques and Connections to Other Fields

I will first give an overview of dynamic algorithms and their connections to other fields. Then, I will present our recent progress on the question "is there a dynamic algorithm with small worst-case update time" for the spanning forest problem, which is among central problems in dynamic algorithms on graphs. Our result guarantees an n^{o(1)} worst-case update time with high probability, where n is the number of nodes. The best worst-case bounds prior to our work are (i) the long-standing O(\sqrt{n}) bound of [Frederickson STOC'83, Eppstein, Galil, Italiano and Nissenzweig FOCS'92] (which is slightly improved by a O(\sqrt{\log(n)}) factor by [Kejlberg-Rasmussen, Kopelowitz, Pettie, Thorup ESA'16]) and (ii) the polylogarithmic bound of [Kapron, King and Mountjoy SODA'13] which works under an oblivious adversary assumption (our result does not make such assumption).

The crucial techniques are about expanders: 1) an algorithm for decomposing a graph into a collection of expanders in near-linear time, and 2) an algorithm for "repairing" the expansion property of an expander after deleting some edges of it. These techniques can be of independent interest.

This talk is based on results by [Nanongkai, Saranurak and Wulff-Nilsen, FOCS'17], [Nanongkai and Saranurak, STOC'17] and [Wulff-Nilsen, STOC'17].

October 5, 2017

Gates 463A, 4:15PM

Rad Niazadeh (Stanford)

Bernoulli Factories and Blackbox Reductions in Mechanism Design

In this talk, I am going to talk about a recent polynomial-time reduction from Bayesian incentive-compatible mechanism design to Bayesian algorithm design for welfare maximization problems. Unlike prior results, our reduction achieves exact incentive compatibility for problems with multi-dimensional and continuous type spaces. The key technical barrier preventing exact incentive compatibility in prior black-box reductions is that repairing violations of incentive constraints requires understanding the distribution of the mechanism’s output, which is typically #P-hard to compute. Reductions that instead estimate the output distribution by sampling inevitably suffer from sampling error, which typically precludes exact incentive compatibility. We overcome this barrier by employing and generalizing the computational model in the literature on Bernoulli Factories. In a Bernoulli factory problem, one is given a function mapping the bias of an “input coin” to that of an “output coin”, and the challenge is to efficiently simulate the output coin given only sample access to the input coin. I will show how to incorporate Bernoulli factories to design a polynomial time algorithm for the following selection problem: a ground set of elements and sampling oracles for each element are given (for unknown input distributions), expected values of the input distributions correspond to the weights of the elements in the set, and we wish to select an element with probability proportional to an exponential function of its weight by efficient sampling. I then show how this algorithm solves a simple single agent truthful mechanism design problem. This truthful mechanism is the key ingredient that can be used to make the approximately incentive compatible reduction of Hartline et al. (2015) exactly incentive compatible.

This talk is based on a joint-work with Shaddin Dughmi, Jason Hartline, and Bobby Kleinberg.

September 28, 2017

Gates 463A, 4:15PM

Shengwu Li (Harvard)

Credible Mechanism Design

Consider an extensive-form mechanism, run by an auctioneer who communicates sequentially and privately with agents. Suppose the auctioneer can make any deviation that no single agent can detect. We study the mechanisms such that it is incentive-compatible for the auctioneer not to deviate -- the credible mechanisms. Consider the ex post individually-rational optimal auctions. The first-price auction is the unique sealed-bid credible mechanism. The ascending auction is the unique strategy-proof credible mechanism.