Stanford Theory Seminar (formerly known as Stanford Algorithms Seminar/AFLB) is usually held in Gates 498 or the Theory Lounge (Gates 463A).

Faculty Contact: Virginia Vassilevska Williams

Student Contact: Huacheng Yu


Abstract: We study the classic problem of prediction with expert advice. Focusing on settings with a constant number of experts, we develop optimal algorithms and obtain precisely optimal regret values for the cases of 2 and 3 experts. Further, we develop regret lower bounds for the multiplicative weights algorithm that exactly match the known upper bounds for an arbitrary number of experts k. This establishes a constant-factor separation between the regrets achieved by the optimal algorithm and the multiplicative weights algorithm for a constant number of experts.

The optimal algorithms have a crisp interpretation. For instance, in the case of 2 experts, the optimal algorithm follows the advice of a particular expert with exactly the probability that the expert will turn out to be the best expert.

Our main tool is the minimax principle, which lets us analyze the optimal adversary to compute optimal regret values. While analyzing the optimal adversary, we establish deep connections with non-trivial aspects of random walks. We further use this connection to develop an improved regret bound for the case of 4 experts.
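
The multiplicative weights algorithm against which the optimal algorithms are compared can be sketched in a few lines. This is a minimal illustration only: the learning rate eta = 0.1 is an arbitrary choice, and the optimal algorithms in the abstract are different, more delicate constructions.

```python
import math

def multiplicative_weights(loss_rounds, eta=0.1):
    """Multiplicative weights over a sequence of loss vectors
    (one entry per expert, losses in [0, 1]).
    Returns the algorithm's total expected loss."""
    k = len(loss_rounds[0])                 # number of experts
    weights = [1.0] * k
    total_loss = 0.0
    for losses in loss_rounds:
        z = sum(weights)
        probs = [w / z for w in weights]    # follow expert i with prob. probs[i]
        total_loss += sum(p * l for p, l in zip(probs, losses))
        # exponentially downweight experts that incurred loss this round
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    return total_loss

# regret = algorithm's loss minus the loss of the best expert in hindsight
rounds = [[0, 1], [1, 0], [0, 1], [0, 1]]
alg_loss = multiplicative_weights(rounds)
best_loss = min(sum(col) for col in zip(*rounds))
regret = alg_loss - best_loss
```

The regret lower bounds in the abstract show that, even with the best possible tuning of eta, this scheme loses a constant factor relative to the optimal algorithm.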

Joint work with Nick Gravin and Yuval Peres.

Bio: Balasubramanian Sivan is a postdoctoral researcher at Microsoft Research, Redmond. He completed his PhD in Computer Science at the University of Wisconsin-Madison in 2013. His PhD thesis on "Prior Robust Optimization" was awarded the 2014 ACM SIGecom doctoral dissertation award and the University of Wisconsin-Madison Computer Science department's outstanding graduate student researcher award. His research interests are in algorithmic game theory, online and approximation algorithms, and expert learning. More details can be found here: http://research.microsoft.com/en-us/um/people/bsivan/

Abstract: In this talk, I will present a new algorithm for solving linear programs. Given a linear program with n variables, m > n constraints, and bit complexity L, our algorithm runs in Õ(sqrt(n) L) iterations each consisting of solving Õ(1) linear systems and additional nearly linear time computation. Our method improves upon the convergence rate of previous state-of-the-art linear programming methods which required solving either Õ(sqrt(m)L) linear systems [R88] or consisted of Õ((mn)^(1/4)) steps of more expensive linear algebra [VA93].

Interestingly, our algorithm not only nearly matches the convergence rate of the universal barrier of Nesterov and Nemirovskii [NN94], but in the special case of the linear programming formulation of various flow problems our methods converge at a rate faster than that predicted by any self-concordant barrier. In particular, we achieve a running time of Õ(|E| sqrt(|V|) log^2 U) for solving the maximum flow problem on a directed graph with |E| edges, |V| vertices, and capacity ratio U, thereby improving upon the previous fastest running time for solving this problem when |E| > Ω(|V|^(1+epsilon)) for any constant epsilon.

This talk will assume little exposure to linear programming algorithms, convex optimization, or graph theory.

This talk reflects joint work with Aaron Sidford.

See http://arxiv.org/abs/1312.6677 and http://arxiv.org/abs/1312.6713.

Abstract: We give a new and improved proof that the shrinkage exponent of De Morgan formulae is 2. Namely, we show that for any Boolean function $f: \{0,1\}^n \to \{0,1\}$, setting each variable out of $x_1, \ldots, x_n$ with probability $1-p$ to a randomly chosen constant reduces the expected formula size of the function by a factor of $O(p^2)$. This result is tight and improves the work of Håstad [SIAM J. Comput., 1998] by removing logarithmic factors.

As a consequence of our results, we strengthen the best formula size lower bounds for functions in P.

The proof mainly uses discrete Fourier analysis and relies on a result from quantum query complexity by Laplante et al. [CC, 2006], Høyer et al. [STOC, 2007] and Reichardt [SODA, 2011].

No prior knowledge about quantum computing is assumed.

Short bio: I am a PhD student at the Weizmann Institute under the guidance of Ran Raz. Before that, I did my MSc in computer science at the Technion, under the guidance of Amir Shpilka. My main areas of interest are analysis of Boolean functions, circuit lower bounds, pseudorandomness/derandomization, and combinatorics.

A great deal of effort has been made to reduce the risk of spurious scientific discoveries, from the use of holdout sets and sophisticated cross-validation techniques, to procedures for controlling the false discovery rate in multiple hypothesis testing. However, there is a fundamental disconnect between the theoretical results and the practice of science: the theory assumes a fixed collection of hypotheses to be tested, or learning algorithms to be applied, selected non-adaptively before the data are gathered, whereas science is by definition an adaptive process, in which data are shared and re-used, and hypotheses and new studies are generated on the basis of data exploration and previous outcomes.

Surprisingly, the challenges of adaptivity can be addressed using insights from differential privacy, a field of study supporting a definition of privacy tailored to private data analysis. As a corollary we show how to safely reuse a holdout set a great many times without undermining its power of "correctness protection," even when hypotheses and computations are chosen adaptively. Armed with this technique, the analyst is free to explore the data ad libitum, generating and evaluating hypotheses, verifying results on the holdout, and backtracking as needed.

Joint work with Cynthia Dwork, Vitaly Feldman, Moritz Hardt, Toni Pitassi and Aaron Roth.

We study the problem of sketching an input graph so that, given the sketch, one can estimate the value (capacity) of any cut in the graph up to a small approximation, 1+epsilon. The classic construction of [Benczur, Karger 1996] implies a sketch of size essentially proportional to the number of vertices and the *square* of 1/epsilon.

We show that the dependence on epsilon can be brought down to only linear in 1/epsilon, if one tolerates a slightly weaker guarantee. Namely, we give a randomized scheme which, given accuracy epsilon and an n-vertex graph G=(V,E), produces a sketch of size O~(n/epsilon), from which one can report the capacity of any fixed cut (S,V\S) within approximation factor (1+epsilon) with high probability. This "for each" guarantee remains useful in some applications nonetheless.
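
To pin down the quantity being sketched, here is the exact (unweighted) cut capacity as a minimal illustration; the point of the sketch is to estimate this value without storing the edge list.

```python
def cut_capacity(edges, S):
    """Capacity of the cut (S, V \\ S): the number of edges with exactly
    one endpoint in S (unweighted case). The sketches in the abstract
    estimate this to within a 1+epsilon factor from O~(n/epsilon) bits,
    without access to the full edge list."""
    S = set(S)
    return sum((u in S) != (v in S) for u, v in edges)

# 4-cycle: the cut ({0,1}, {2,3}) is crossed by edges (1,2) and (3,0)
square = [(0, 1), (1, 2), (2, 3), (3, 0)]
value = cut_capacity(square, {0, 1})
```

The "for each" guarantee means the (1+epsilon) estimate holds with high probability for any one fixed cut S, rather than simultaneously for all 2^n cuts.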

To complement the above, we also show that the weaker guarantee is necessary for achieving the improved dependence on epsilon. In particular, if a sketch succeeds in estimating the capacity of *all* cuts in the graph simultaneously, it must have size at least Omega(n/epsilon^2) bits.

Joint work with Robert Krauthgamer and David Woodruff.

Bio: Alexandr Andoni is a researcher with interests broadly revolving around big-data algorithmics, in particular sublinear algorithms and theoretical machine learning. Alexandr graduated from MIT in 2009 with a PhD thesis on Nearest Neighbor Search, under the supervision of Prof. Piotr Indyk. From 2009 to 2010, he was a postdoc at the Center for Computational Intractability at Princeton, as well as a visitor at NYU and IAS. He was then a researcher at Microsoft Research Silicon Valley, until its closure in 2014.

At the heart of every local search procedure is a directed graph on candidate solutions (states) such that every unsatisfying state has at least one outgoing arc. In randomized local search the hope is that a random walk on the graph reaches a satisfying state (sink) quickly. We give a general algorithmic local lemma by establishing a sufficient condition for this to be true. Our work is inspired by Moser's entropic method proof of the Lov\'{a}sz Local Lemma (LLL) for satisfiability and completely bypasses the Probabilistic Method formulation of the LLL. In particular, our method allows both the search space and the optimality conditions to be entirely amorphous, enabling the analysis of far more sophisticated algorithms than the LLL. Similarly to Moser's argument, the key point is that algorithmic progress is measured in terms of entropy rather than in terms of energy (number of violated constraints) so that termination can be established even under the proliferation of local optima. The talk assumes no familiarity with the LLL or the Probabilistic Method but will introduce you to the idea of seeing good algorithms as compressors.

Two-prover games are fundamental to theoretical computer science. Their applications range from cryptography to probabilistically checkable proofs to quantum computing. The parallel repetition theorem analyzes repeated two-prover games, and is notorious for its difficulty.

In this work we give a simple transformation on games -- "fortification" -- and show that for fortified games a parallel repetition theorem holds with ideal parameters. Our proof is combinatorial and short. As corollaries, we obtain simple combinatorial proofs of state-of-the-art PCP Theorems.

We achieve essentially the largest possible separation between quantum and classical query complexities. We do so using a property-testing problem called Forrelation, where one needs to decide whether one Boolean function is highly correlated with the Fourier transform of a second function. This problem can be solved using 1 quantum query, yet we show that any randomized algorithm needs Omega(sqrt(N) / log(N)) queries (improving an Omega(N^{1/4}) lower bound of Aaronson). Conversely, we show that this 1 versus ~Omega(sqrt(N)) separation is optimal: indeed, any t-query quantum algorithm whatsoever can be simulated by an O(N^{1-1/2t})-query randomized algorithm. Thus, resolving an open question of Buhrman et al. from 2002, there is no partial Boolean function whose quantum query complexity is constant and whose randomized query complexity is linear. We conjecture that a natural generalization of Forrelation achieves the optimal t versus ~Omega(N^{1-1/2t}) separation for all t. As a bonus, we show that this generalization is BQP-complete. This yields what's arguably the simplest BQP-complete problem yet known, and gives a second sense in which Forrelation "captures the maximum power of quantum computation."

Joint work with Andris Ambainis.

Can we prove the correctness of a polynomial-time computation to a weak verifier who cannot re-execute the computation? Such proof systems can be used in cloud computing scenarios, allowing weak devices (from phones and tablets to wearable or embedded devices) to delegate work and storage to a third party without compromising the correctness of delegated computations.

I will survey a line of work that attempts to answer this question using the machinery of interactive proofs and cryptography. The focus will be on interactive proofs with sublinear time verifiers, where with high probability the verifier accepts every input in the language, and rejects every input that is far from the language. The verifier's query complexity (and computation complexity), as well as the communication, should all be sublinear in the input length (much smaller than the complexity of deciding the language). We call such a proof system an Interactive Proof of Proximity, and we show general upper and lower bounds.

I plan to discuss the hypothesis that CNF-SAT essentially requires 2^n time to solve, my failed attempts to refute this hypothesis over 15(!) years of work on it, and the research program(s) that arose out of these failed attempts.

We present new algorithms for listing triangles in dense and sparse graphs: given a graph G and a parameter t, we want to list t triangles of G if G has at least t triangles, or all the triangles in G otherwise. Our algorithms rely on fast matrix multiplication. If the matrix multiplication exponent omega is 2, the running times of the algorithms are O(n^2 + n t^{2/3}) and O(m^{4/3} + m t^{1/3}), for graphs with n nodes and m edges. In particular, if omega = 2, our algorithm lists m triangles in O(m^{4/3}) time. Patrascu (STOC 2010) showed that m^{4/3-o(1)} time is required for listing m triangles, unless there exist subquadratic algorithms for 3SUM. We provide further conditional lower bounds showing that our running times are tight also for graphs with more triangles.
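
For contrast with the matrix-multiplication-based algorithms of the talk, a brute-force listing baseline (not the authors' method) might look like the following sketch, which stops as soon as t triangles have been found.

```python
def list_t_triangles(edges, t):
    """Naive baseline: list up to t triangles of the graph given as an
    edge list, by intersecting the neighborhoods of each edge's endpoints.
    (The algorithms in the abstract are asymptotically faster and rely on
    fast matrix multiplication; this is only for illustration.)"""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    found = set()
    for u, v in edges:
        for w in adj[u] & adj[v]:           # common neighbour closes a triangle
            found.add(tuple(sorted((u, v, w))))
            if len(found) >= t:
                return sorted(found)
    return sorted(found)

# complete graph K4 contains exactly 4 triangles
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```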

(From ICALP'14, joint work with A. Bjorklund, R. Pagh and U. Zwick)

Given a k-edge-connected graph G=(V,E), a spanning tree T is \alpha-thin w.r.t. G if, for any S \subset V, |T(S,V-S)| \leq \alpha \cdot |E(S,V-S)|. The thin tree conjecture asserts that for a sufficiently large k (independent of the size of G), every k-edge-connected graph has a 1/2-thin tree. This conjecture is intimately related to designing approximation algorithms for the Asymmetric Traveling Salesman Problem (ATSP).
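
The thinness inequality can be checked for a single cut as follows (a sketch only; thinness requires the inequality for *every* S, and checking all cuts naively takes exponential time).

```python
def crossing_edges(edges, S):
    """Number of edges with exactly one endpoint in the vertex set S."""
    return sum((u in S) != (v in S) for u, v in edges)

def is_alpha_thin_for(tree_edges, graph_edges, S, alpha):
    """Check |T(S, V-S)| <= alpha * |E(S, V-S)| for one cut S."""
    S = set(S)
    return crossing_edges(tree_edges, S) <= alpha * crossing_edges(graph_edges, S)

# K4 with the star at vertex 0 as spanning tree:
# for S = {0}, both the tree cut and the graph cut have 3 edges
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
star = [(0, 1), (0, 2), (0, 3)]
```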

We show that any k-edge-connected graph has a polyloglog(n)/k-thin spanning tree. This implies that the integrality gap of the natural LP relaxation of ATSP is at most polyloglog(n). In other words, there is a polynomial time algorithm that approximates the value of the optimum tour within a factor of polyloglog(n). This is the first significant improvement over the \tilde{O}(\log n) approximation factor for ATSP. Our proof builds on the method of interlacing polynomials of Marcus, Spielman and Srivastava and employs several tools from spectral theory and high dimensional geometry.

Based on joint work with Nima Anari.

We prove that finding a Nash equilibrium of a game is hard, assuming the existence of indistinguishability obfuscation and injective one-way functions with sub-exponential hardness. We do so by showing how these cryptographic primitives give rise to a hard computational problem that lies in the complexity class PPAD, for which finding Nash equilibrium is known to be complete.

Previous proposals for basing PPAD-hardness on program obfuscation considered a strong “virtual black-box” notion that is subject to severe limitations and is unlikely to be realizable for the programs in question. In contrast, for indistinguishability obfuscation no such limitations are known, and recently, several candidate constructions of indistinguishability obfuscation were suggested based on different hardness assumptions on multilinear maps.

Our result provides further evidence of the intractability of finding a Nash equilibrium, one that is extrinsic to the evidence presented so far.

Joint work with Omer Paneth and Alon Rosen (http://eccc.hpi-web.de/report/2015/001/)

The edit distance (a.k.a. the Levenshtein distance) between two strings is defined as the minimum number of insertions, deletions or substitutions of symbols needed to transform one string into another. The problem of computing the edit distance between two strings is a classic computational task, with a well-known algorithm based on dynamic programming. Unfortunately, all known algorithms for this problem run in nearly quadratic time.
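
The well-known dynamic program mentioned above can be sketched in a few lines; this is the classic quadratic-time algorithm (here in a space-efficient single-row variant), the one whose near-quadratic running time the work below argues may be inherent.

```python
def edit_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic program for Levenshtein
    distance, keeping only one row of the DP table at a time."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))                 # distance from "" to b[:j]
    for i in range(1, m + 1):
        prev_diag, dp[0] = dp[0], i         # dp[0] = distance from a[:i] to ""
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,          # delete a[i-1]
                        dp[j - 1] + 1,      # insert b[j-1]
                        prev_diag + (a[i - 1] != b[j - 1]))  # substitute
            prev_diag = cur
    return dp[n]
```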

In this paper we provide evidence that the near-quadratic running time bounds known for the problem of computing edit distance might be tight. Specifically, we show that, if the edit distance can be computed in time O(n^{2-delta}) for some constant delta>0, then the satisfiability of conjunctive normal form formulas with N variables and M clauses can be solved in time M^{O(1)} 2^{(1-epsilon)N} for a constant epsilon>0. The latter result would violate the Strong Exponential Time Hypothesis, which postulates that such algorithms do not exist.

Joint work with Piotr Indyk. To appear at STOC 15.

Machine learning relies on the assumption that unseen test cases of a classification problem follow the same distribution as observed training data. However, this principle can break down when machine learning is used to make important decisions about the welfare (employment, education, health) of strategic individuals. Knowing information about the classifier, such individuals may manipulate their attributes in order to obtain a better classification outcome. As a result of this behavior---often referred to as gaming---the performance of the classifier may deteriorate sharply. Indeed, gaming is a well-known obstacle for using machine learning methods in practice; in financial policy-making, the problem is widely known as Goodhart's law. In this paper, we formalize the problem, and pursue algorithms for learning classifiers that are robust to gaming.

We model classification as a sequential game between a player called "Jury" and a player called "Contestant". Jury designs a classifier, and Contestant receives an input to the classifier drawn from a distribution. Before being classified, Contestant may change his input based on Jury's classifier. However, Contestant incurs a cost for these changes according to a cost function. Jury's goal is to achieve high classification accuracy with respect to Contestant's original input and a known ideal classification function, assuming Contestant plays best response. Contestant's goal is to achieve a favorable classification outcome while taking into account the cost of achieving it.
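
Contestant's best response can be sketched on a toy instance; the threshold classifier and linear movement cost below are invented for illustration and are not from the paper.

```python
def best_response(x, classifier, cost, candidates):
    """Contestant's best response: choose the reported input y maximizing
    classification payoff minus the cost of moving from the true input x
    to y. (Toy version: payoff 1 for a positive label, 0 otherwise, over
    a finite candidate set.)"""
    return max(candidates, key=lambda y: classifier(y) - cost(x, y))

# hypothetical threshold classifier and separable linear cost
accept = lambda y: 1 if y >= 5 else 0
move_cost = lambda x, y: 0.3 * max(0, y - x)

# a Contestant with true input 4 games the classifier by moving up to 5:
# payoff 1 - cost 0.3 beats reporting truthfully (payoff 0, cost 0)
gamed = best_response(4, accept, move_cost, range(10))
```

This is exactly the gaming behavior the Jury must anticipate when designing the classifier.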

For a natural class of "separable" cost functions, and certain generalizations, we obtain computationally efficient learning algorithms which are near optimal, achieving a classification error that is arbitrarily close to the theoretical minimum. Surprisingly, the only assumption we place on the target function is that it stems from a statistically learnable concept class, a minimal assumption even permitting concept classes that are computationally hard to learn. For general cost functions, designing an approximately optimal strategy-proof classifier, for an inverse polynomial approximation, is NP-hard.

Joint work with Nimrod Megiddo, Christos Papadimitriou and Mary Wootters.

We consider the question of whether provable guarantees are obtainable in combinatorial optimization in the presence of error and noise. We provide initial answers by focusing on the question of maximizing monotone submodular functions under matroid constraints.

Based on joint work with Avinatan Hassidim.

Graph matching is one of the most well-studied problems in combinatorial optimization, with applications ranging from scheduling and object recognition to numerical analysis and computational chemistry. Nevertheless, until recently very little was known about this problem in real-life dynamic networks, which aim to model the constantly changing physical world.

In the first part of the talk we will discuss our work on the dynamic maximum matching problem. In the second part of the talk we will highlight some of our work on a few related problems.

Expander codes are error correcting codes which are notable for their efficient decoding algorithms. Expander codes are formed from two ingredients: an expander graph, and a smaller "inner code." Since they were introduced by Sipser and Spielman in 1996, there's been a long line of work establishing and refining the statement, "If the inner code has good distance then so does the expander code, and further you can decode it very efficiently." In this talk, I'll discuss two recent works establishing similar statements for two other properties, local-correctability and list-recoverability. That is, if the inner code has decent locality, the expander code is locally correctable; if the inner code has good list-recoverability, the expander code has a linear-time list-recovery algorithm. Corollaries include constructions of high-rate locally-correctable codes with polynomial query complexity, and a new step towards linear-time, optimally-list-decodable codes. This is based on joint work with Brett Hemenway and Rafail Ostrovsky.

Bio: Mary Wootters is an NSF postdoctoral fellow at Carnegie Mellon University, and is currently also a visiting scientist at the Simons Institute at Berkeley. She received her PhD in math from the University of Michigan in 2014, and her B.A. in math and computer science from Swarthmore College in 2008.

Semantic word embeddings represent words as vectors in an attempt to capture their meaning, and are useful for many NLP tasks. The simplest ones use SVD and LSA, but more recently they have been constructed using nonlinear/nonconvex techniques such as deep nets and energy-based models. Recently Mikolov et al. (2013) showed that such embeddings exhibit linear structure that can be used to solve "word analogy tasks" such as man : woman :: king : ??. Subsequently, Levy and Goldberg (2014) and Pennington et al. (2014) tried to explain why such linear structure should arise in embeddings derived from nonlinear methods. We point out gaps in these explanations and provide a more complete explanation in the form of a loglinear generative model for text corpora that directly models latent semantic structure in words and involves a random walk in the context space. A rigorous mathematical analysis is performed to obtain explicit expressions for word co-occurrence probabilities, which leads to a surprising explanation for why word embeddings exist and why they exhibit linear structure that allows solving analogies. Our model and its analysis lead to several counterintuitive predictions, which are also empirically verified.
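
The linear structure behind analogy solving can be illustrated on toy, hand-made vectors; the two-dimensional embeddings below are invented purely for illustration (real embeddings are high-dimensional and learned from corpora).

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv)

def analogy(emb, a, b, c):
    """Solve a : b :: c : ? by returning the word whose vector is closest
    (in cosine similarity) to emb[b] - emb[a] + emb[c]."""
    target = [y - x + z for x, y, z in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))

# invented vectors: first axis "gender", second axis "royalty"
emb = {
    "man":   [1.0, 0.0],
    "woman": [-1.0, 0.0],
    "king":  [1.0, 1.0],
    "queen": [-1.0, 1.0],
    "apple": [0.0, -1.0],
}
answer = analogy(emb, "man", "woman", "king")
```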

We think our methodology and generative model may be useful in other settings.

Joint work with Yuanzhi Li, Yingyu Liang, Tengyu Ma, and Andrej Risteski (listed in alphabetical order).

We prove an average-case depth hierarchy theorem for Boolean circuits over the standard basis of AND, OR, and NOT gates. Our hierarchy theorem says that for every $d \geq 2$, there is an explicit $n$-variable Boolean function $f$, computed by a linear-size depth-$d$ formula, which is such that any depth-$(d-1)$ circuit that agrees with $f$ on $(1/2 + o_n(1))$ fraction of all inputs must have size $\exp(n^{\Omega(1/d)}).$ This answers an open question posed by Håstad in his Ph.D. thesis.

Our average-case depth hierarchy theorem implies that the polynomial hierarchy is infinite relative to a random oracle with probability 1, confirming a conjecture of Håstad, Cai, and Babai. We also use our result to show that there is no "approximate converse" to the results of Linial, Mansour, Nisan and Boppana on the total influence of small-depth circuits, thus answering a question posed by O'Donnell, Kalai, and Hatami.

A key ingredient in our proof is a notion of random projections, which generalizes random restrictions.

Joint work with Ben Rossman and Rocco Servedio.

Persuasion is intrinsic to many human endeavors. For example, merchants persuade customers through advertising campaigns which selectively highlight a product's positive attributes and obscure its defects, lawyers persuade judges and juries through oral arguments which selectively highlight some pieces of evidence and downplay others, and political candidates persuade voters by framing the debate on issues of the day in order to further their political aspirations. In all these cases, and many others, persuasion can be viewed as the act of exploiting an informational advantage in order to effect change in the decisions of others through communication. For such communication to be credible, and hence persuasive, it must be grounded in truth, and yet it may cherry-pick the truths revealed while obscuring others.

The ubiquity of persuasive communication has spawned a vast literature concerned with understanding and exploring the space of information structures in strategic interactions. Perhaps no model is more basic and fundamental than the Bayesian Persuasion model of Kamenica and Gentzkow. Here there are two players, a sender and a receiver. The receiver must take one of a number of actions with a-priori unknown payoff. The sender, on the other hand, has access to additional information regarding the payoffs of the various actions for both players. The sender can commit to revealing a noisy signal regarding the realization of the payoffs of various actions, and in doing so would like to maximize her payoff in expectation assuming that the receiver rationally acts to maximize his own payoff.

This paper studies the optimization task facing the sender, that of designing a near-optimal signaling scheme, from an algorithmic perspective. Despite being the fundamental building block for the study of signaling in strategic settings, the Bayesian Persuasion problem has not been examined algorithmically prior to our work. A priori, it is unclear whether the sender's optimal signaling strategy can be succinctly represented, let alone efficiently computed. Somewhat surprisingly, we nevertheless discover that optimal or near-optimal signaling schemes can be computed efficiently --- in time polynomial in the number of actions --- under fairly general conditions. We consider two models: one in which action payoffs are i.i.d. from an explicitly described marginal distribution, and another in which the joint distribution of action payoffs is arbitrary and given by a black-box sampling oracle.

We show two results for the case in which the payoffs of different actions are i.i.d. and given explicitly: a polynomial-time (exact) algorithmic solution, and a “simple” (1 - 1/e)-approximation algorithm. Both results hinge on a “symmetry characterization” of the optimal signaling scheme. The former result also involves a connection to auction theory, and in particular to Border’s characterization of the space of feasible reduced-form auctions. We then consider general distributions given by a black-box sampling oracle. We show that, even in this very general setting, the persuasion problem admits a fully polynomial-time approximation scheme (FPTAS) in a bi-criteria sense. As our main technical tool to prove this theorem, we study the algorithmics of a natural and abstract question on vector spaces, which we term dispersion testing: given two distributions A and B supported on a finite-dimensional vector space, decide whether A is a mean-preserving spread of B, and if so compute the inverse of the associated spreading map. We employ tools from convex optimization and optimal transport theory to design an approximate test for the dispersion testing problem, and relate the persuasion problem to dispersion testing via a randomized Turing reduction employing the Ellipsoid method.

This is joint work with Haifeng Xu.

Bio: Shaddin Dughmi is an Assistant Professor in the Department of Computer Science at USC, where he is a member of the Theory Group. He received a B.S. in computer science, summa cum laude, from Cornell University in 2004, and a PhD in computer science from Stanford University in 2011. He is a recipient of the NSF CAREER award, the Arthur L. Samuel best doctoral thesis award, and the ACM EC best student paper award.

Unaggregated data, in streamed or distributed form, is prevalent and comes from diverse sources such as interactions of users with web services and IP traffic. Data elements have keys (cookies, users, queries), and elements with different keys interleave.

Analytics on such data typically utilizes statistics expressed as a sum, over keys in a specified segment, of a function $f$ applied to the frequency (the total number of occurrences) of the key. In particular, Distinct is the number of active keys in the segment, Sum is the sum of their frequencies, and both are special cases of frequency cap statistics, which cap the frequency by a parameter $T$. One important application of cap statistics is staging advertisement campaigns, where the cap parameter is the limit on the maximum number of impressions per user and we estimate the total number of qualifying impressions.
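
As a concrete reading of these definitions, the exact cap statistic over a fully aggregated stream might look like the sketch below. This toy version aggregates the whole stream, which is precisely what the work avoids; it is shown only to pin down the quantity being estimated.

```python
from collections import Counter

def cap_statistic(elements, T):
    """Exact frequency-cap statistic over a stream of keys:
    sum over keys of min(frequency, T). T = 1 gives Distinct;
    T = infinity gives Sum."""
    freqs = Counter(elements)
    return sum(min(f, T) for f in freqs.values())

# hypothetical impression stream keyed by user
stream = ["u1", "u2", "u1", "u3", "u1", "u2"]
distinct = cap_statistic(stream, 1)           # 3 active keys
total = cap_statistic(stream, float("inf"))   # 6 impressions
capped = cap_statistic(stream, 2)             # u1's frequency 3 capped at 2
```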

The number of distinct active keys in the data can be very large, making exact computation of queries costly. Instead, we can estimate these statistics from a sample. An optimal sample for a given function $f$ would include a key with frequency $x$ with probability roughly proportional to $f(x)$. But while such a "gold-standard" sample can be easily computed over the aggregated data (the set of key-frequency pairs), exact aggregation itself is costly and slow. Ideally, we would like to compute and maintain a sample without aggregation.

We present a sampling framework for unaggregated data that uses a single pass (for streams) or two passes (for distributed data) and state proportional to the desired sample size. Our design unifies classic solutions for Distinct and Sum. Specifically, our $\ell$-capped samples provide nonnegative unbiased estimates of any monotone non-decreasing frequency statistics, and close to gold-standard estimates for frequency cap statistics with $T=\Theta(\ell)$. Furthermore, our design facilitates multi-objective samples, which provide tight estimates for a specified set of statistics using a single smaller sample.

This talk should be accessible to a wider audience. The results will appear in KDD 2015.

Submodular functions appear in many places, ranging from algorithmic game theory to machine learning and information theory. I will survey some of my work in this area, in particular algorithms using two relaxations that are useful for submodular optimization: the Lovasz extension (for minimization) and the multilinear extension (for maximization). I will also describe some recent work whose goal is to make these algorithms more efficient.