Stanford Theory Seminar (formerly known as Stanford Algorithms Seminar/AFLB) is usually held in Gates 463A.

Faculty Contact: Mary Wootters

Student Contact: Ofir Geri

Click here for directions.

It is well-known that the expressivity of a neural network depends on its architecture, with deeper networks expressing more complex functions. For ReLU networks, which are piecewise linear, the number of distinct linear regions is a natural measure of expressivity. It is possible to construct networks for which the number of linear regions grows exponentially with depth. However, we show that the expressivity of networks is in practice far below the theoretical maximum. At initialization, we prove that the average number of regions along any one-dimensional subspace grows only linearly, instead of exponentially, in the total number of neurons. More generally, the average number of regions in a k-dimensional subspace is upper bounded by the kth power of the number of neurons, irrespective of network architecture. Our theory and empirical results suggest that this behavior persists during training. We conclude that inductive bias may play a more significant role than expressivity in the success of deep networks. Joint work with Boris Hanin.

We introduce a new type of algebra, the Expedient Algebra EXP, for which computations satisfy tight distribution control. Algorithms satisfying such distribution control are guaranteed to support modular time analysis–drastically simplifying static timing. The property ensures that problematic algorithms, for which the exact time is too hard or impossible to analyze with current means, can be distinguished at code level from algorithms supporting an elegant modular time analysis. Little is known about the intrinsic properties of algorithms exhibiting such distribution control. We show that EXP-computations, made reversible through history-keeping, act as closed systems in which entropy is conserved. We refer to such algorithms as "thermodynamic algorithms". Thus modularity of timing is linked to entropy conservation of data flow, sharpening traditional entropy preservation guaranteed by the second law of thermodynamics for reversible systems. Conservation typically holds for the case of energy, but not for entropy. A salient point is that for algorithms satisfying distribution control, entropy is neither created nor destroyed, merely transferred from one form, i.e. quantitative entropy, to another, i.e. positional entropy.

We establish an Entropy Conservation Law ECL. The law expresses the inverse proportionality of positional and quantitative entropy for distributions over series-parallel orders and their duals. We also present a duality theorem which expresses that order established by the expedient product (with history) on labels is proportionally destroyed on indices by a shadow transformation in the dual space. The duality theorem supports the derivation of a computational version of ECL (obtained jointly with M. Fiore during a recent research visit at Cambridge).

The results shed new light on the properties of algorithms for which distribution control, and hence modular timing, is guaranteed.

We consider the problem of finding large cuts in graphs that contain no small cliques. We show that for any r >= 3, every k_r-free graphs with maximum degree d has a cut that cuts at least a 1/2 + Ω(1/d^{1-1/2^{r-2}}) fraction of its edges. This generalizes a result of Shearer that shows every triangle-free d-regular graph has a cut that cuts at least a 1/2 + Ω(1/\sqrt{d}) fraction of its edges. Our proof yields a randomized polynomial-time algorithm.

A 2-player coordination game is a game where both players receive the same payoff, and are therefore collaborating. A network coordination game is a multi-player game where the players are nodes in a graph, and each edge models a 2-player coordination game. The strategy chosen by each node applies simultaneously for all incident edges. We study the PLS-complete problem of finding a pure-strategy Nash equilibrium in such games, when the instance is smoothed. (The payoff values are independent random variables, where the distribution is chosen adversarially under a density constraint.) We show that the natural best-response dynamic finds an equilibrium in polynomial time when the number of strategies that each player may choose from is a constant.

We first introduce smoothness-preserving reductions, and show that the problem admits such a reduction to finding a locally maximal cut in a weigthed graph, which allows us to inherit recent developments in the smoothed analysis of the local-search algorithm for this problem. Note that local-max-cut is also PLS-complete. This reduction, however, only holds when each player is allowed to choose from exactly 2 strategies. In the general case, we show a reduction to finding a cut maximal up to flipping 2 nodes; this variant has not been shown to be efficient in the smoothed case, and so the reduction is conditional.

Second, we explain how the framework for the analysis of local-max-cut may be applied directly to our problem, rather than going through reductions. When players are allowed to choose from k strategies, we show that best-response searches converge in polynomial time, where the exponent is linear in k. The conditional reduction above seeks to eliminate the exponential dependence on k.

(Joint work with Rucha Kulkarni and Ruta Mehta)

We develop theory for using heuristics to solve computationally hard problems in differential privacy. Heuristic approaches have enjoyed tremendous success in machine learning, for which performance can be empirically evaluated. However, privacy guarantees cannot be evaluated empirically, and must be proven --- without making heuristic assumptions. We show that learning problems over broad classes of functions can be solved privately and efficiently, assuming the existence of a non-private oracle for solving the same problem. Our first algorithm yields a privacy guarantee that is contingent on the correctness of the oracle. We then give a reduction which applies to a class of heuristics which we call certifiable, which allows us to convert oracle-dependent privacy guarantees to worst-case privacy guarantee that hold even when the heuristic standing in for the oracle might fail in adversarial ways. Finally, we consider a broad class of functions that includes most classes of simple boolean functions studied in the PAC learning literature, including conjunctions, disjunctions, parities, and discrete half-spaces. We show that there is an efficient algorithm for privately constructing synthetic data for any such class, given a non-private learning oracle. This in particular gives the first oracle-efficient algorithm for privately generating synthetic data for contingency tables. The most intriguing question left open by our work is whether or not every problem that can be solved differentially privately can be privately solved with an oracle-efficient algorithm. While we do not resolve this, we give a barrier result that suggests that any generic oracle-efficient reduction must fall outside of a natural class of algorithms (which includes the algorithms given in this paper).

For a vector space F^n over a field F, an (η, ß)-dimension expander of degree d is a collection of d linear maps Γ_j : F^n \to F^n such that for every subspace U of F^n of dimension at most ηn, the image of U under all the maps, ∑_{j=1}^d Γ_j(U), has dimension at least ßdim(U). Over a finite field, a random collection of d=O(1) maps Γ_j over excellent “lossless” expansion with high probability: ß ≈ d for η ≥ Ω(1/\eta). When it comes to a family of explicit constructions (for growing n), however, achieving even expansion factor β = 1 + ε with constant degree is a non-trivial goal.

We present an explicit construction of dimension expanders over finite fields based on linearized polynomials and subspace designs, drawing inspiration from recent progress on list decoding in the rank-metric. Our approach yields the following: Lossless expansion over large fields; more precisely ß ≥ (1–ε)d and η ≥ (1–ε)/d with d=O_ε(1), when |F| ≥ Ω(n). Optimal up to constant factors expansion over fields of arbitrarily small polynomial size; more precisely ß ≥ Ω(δd) and η ≥ Ω(1/(δd)) with d = O_δ(1), when |F| ≥ n^δ.

Previously, an approach reducing to monotone expanders (a form of vertex expansion that is highly non-trivial to establish) gave (Ω(1), 1+Ω(1))-dimension expanders of constant degree over all fields. An approach based on “rank condensing via subspace designs” led to dimension expanders with ß ≥ Ω(√d) over large finite fields. Ours is the first construction to achieve lossless dimension expansion, or even expansion proportional to the degree.

Based on joint work with Venkatesan Guruswami and Chaoping Xing.

We consider a generalization of the third degree price discrimination problem studied in Bergemann et al. 2015, where an intermediary between the buyer and the seller can design market segments to maximize any linear combination of consumer surplus and seller revenue. Unlike in Bergemann et al. 2015, we assume that the intermediary only has partial information about the buyer's value. We consider three different models of information, with increasing order of difficulty. In the first model, we assume that the intermediary's information allows him to construct a probability distribution of the buyer's value. Next we consider the sample complexity model, where we assume that the intermediary only sees samples from this distribution. Finally, we consider a bandit online learning model, where the intermediary can only observe past purchasing decisions of the buyer, rather than her exact value. For each of these models, we present algorithms to compute optimal or near optimal market segmentation.

(joint work with Nikhil Devanur, Zhiyi Huang, and Xiangning Wang)

Modern cloud storage systems need to store vast amounts of data in a fault tolerant manner, while also preserving data reliability and accessibility in the wake of frequent server failures. Traditional MDS (Maximum Distance Separable) codes provide the optimal trade-off between redundancy and number of worst-case erasures tolerated. Minimum storage regenerating (MSR) codes are a special sub-class of MDS codes that provide mechanisms for exact regeneration of a single code-block by downloading the minimum amount of information from the remaining code-blocks. As a result, MSR codes are attractive for use in distributed storage systems to ensure node repairs with optimal repair-bandwidth. However, all known constructions of MSR codes require large sub-packetization levels (which is a measure of the granularity to which a single vector codeword symbol needs to be divided into for efficient repair). This restricts the applicability of MSR codes in practice.

This talk will present a lower bound that exponentially large sub-packetization is inherent for MSR codes. We will also propose a natural relaxation of MSR codes that allows one to circumvent this lower bound, and present a general approach to construct MDS codes that significantly reduces the required sub-packetization level by incurring slightly higher repair-bandwidth as compared to MSR codes.

The lower bound is joint work with Omar Alrabiah, and the constructions are joint work with Ankt Rawat, Itzhak Tamo, and Klim Efremenko.

Corruption in data can be in the form of erasures (missing data) or errors (wrong data). Erasure-resilient property testing (Dixit, Raskhodnikova, Thakurta, Varma '16) and tolerant property testing (Parnas, Ron, Rubinfeld '06) are two models of sublinear algorithms that account for the presence of erasures and errors in input data, respectively. We discuss the erasure-resilient model and what can and cannot be computed in it. We separate this model from tolerant testing by showing that some properties can be tested in the erasure-resilient model with query complexity independent of the input size n, but require n^{Omega(1)} queries to be tested tolerantly.

To prove the separation, we initiate the study of the role of erasures in local decoding. Local decoding in the presence of errors has been extensively studied, but has not been considered explicitly in the presence of erasures. Motivated by the application in property testing, we begin our investigation with local list decoding in the presence of erasures. We prove an analog of the famous result of Goldreich and Levin on local list decodability of the Hadamard code. Specifically, we show that the Hadamard code is locally list decodable in the presence of a constant fraction of erasures, arbitrary close to 1, with list sizes and query complexity better than in the Goldreich-Levin theorem. The main tool used in proving the separation in property testing is an approximate variant of a locally list decodable code that works against erasures. We conclude with some open questions on the general relationship between local decoding in the presence of errors and in the presence of erasures.

Joint work with Noga Ron-Zewi (Haifa University) and Nithin Varma (Boston University)

In the field of knowledge compilation, a class of Boolean circuits has been studied that strikes a balance between compactness and good algorithmic properties: circuits in Decomposable Negation Normal Form (DNNF). In this talk, I will visit some results about the possibilities and limits of using DNNF circuits in a social choice setting. This includes explaining how one can encode certain voting domains using DNNF circuits in polynomial time, and how this enables efficient use of some voting rules for these domains. It also includes showing lower bounds on the size of DNNF circuits for encoding other voting domains.

How does the brain beget the mind? How do molecules, cells and synapses effect intelligence, reasoning, and language? Despite dazzling progress in experimental neuroscience we don't seem to be making progress in the overarching question -- the gap is huge and a completely new approach seems to be required. As Richard Axel recently put it: "We don't have a logic for the transformation of neural activity into thought."

What kind of formalism would qualify as this "logic"? I will sketch a possible answer.

(Joint work with Santosh Vempala, Dan Mitropolsky, and Mike Collins.)

In this talk we will focus on the high dimensional linear regression problem. The goal is to recover a hidden k-sparse binary vector \beta under n noisy linear observations Y=X\beta+W where X is an n \times p matrix with iid N(0,1) entries and W is an n-dimensional vector with iid N(0,\sigma^2) entries. In the literature of the problem, an apparent asymptotic gap is observed between the optimal sample size for information-theoretic recovery, call it n*, and for computationally efficient recovery, call it n_alg.

We will discuss several new contributions on studying this gap. We first identify tightly the information limit of the problem using a novel analysis of the Maximum Likelihood Estimator (MLE) performance. Furthermore, we establish that the algorithmic barrier n_alg coincides with the phase transition point for the appearance of a certain Overlap Gap Property (OGP) over the space of k-sparse binary vectors. The presence of such an Overlap Gap Property phase transition, which originates in spin glass theory, is known to provide evidence of an algorithmic hardness. Finally, we show that in the extreme case where the noise level is zero, i.e. \sigma=0, the computational-statistical gap closes by proposing an optimal polynomial-time algorithm using the Lenstra-Lenstra-Lov\'asz lattice basis reduction algorithm.

This is joint work with David Gamarnik.

We present an algorithm for approximating the edit distance ed(x,y) between two strings x and y in time parameterized by the degree to which one of the strings x satisfies a natural pseudorandomness property. The pseudorandomness model is asymmetric in that no requirements are placed on the second string y, which may be constructed by an adversary with full knowledge of x. We say that x is (p,B)-pseudorandom if all pairs a and b of disjoint B-letter substrings of x satisfy ed(a,b) >= pB. Given parameters p and B, our algorithm computes the edit distance between a (p,B)-pseudorandom string x and an arbitrary string y within a factor of O(1/p) in time \tilde{O}(nB), with high probability. If x is generated at random, then with high probability it will be (\Omega(1), O(log n))-pseudorandom, allowing us to compute ed(x,y) within a constant factor in near linear time.

We study polynomial-time algorithms for a fundamental statistics problem: estimating the mean of a random vector from i.i.d. samples. Focusing on the heavy-tailed case, we assume only that the random vector X has finite mean and covariance. In this setting, the radius of confidence intervals achieved by the empirical mean are large compared to the case that X is Gaussian or sub-Gaussian. On the other hand, estimators based on high-dimensional medians can achieve tighter confidence intervals, at the cost of potential computational intractability.

We offer the first polynomial time algorithm to estimate the mean with sub-Gaussian-size confidence intervals under such mild assumptions. Our algorithm is based on a new semidefinite programming relaxation of a high-dimensional median. Previous estimators which assumed only existence of finitely-many moments of X either sacrifice sub-Gaussian performance or are only known to be computable via brute-force search procedures requiring time exponential in the dimension.

String similarity measures are among the most fundamental problems in computer science. The notable examples are edit distance (ED) and longest common subsequence (LCS). These problems find their applications in various contexts such as computational biology, text processing, compiler optimization, data analysis, image analysis, etc. In this talk, I'll present fast and parallel algorithms for both problems. In the first part of my talk, I will present an algorithm for approximating edit distance within a constant factor in truly subquadratic time. This question has been open for 3 decades and only recently we were able to give positive answers to it.

In the second part of my talk, I will present MPC algorithms for both edit distance and longest common subsequence. These algorithms can be seen as extensions of the previous ideas to the MPC model. The algorithms are optimal with respect to round complexity, time complexity, and approximation factor.

We present the first protocol allowing a classical computer to interactively verify the result of an efficient quantum computation. We achieve this by constructing a measurement protocol, which allows a classical string to serve as a commitment to a quantum state. The protocol forces the prover to behave as follows: the prover must construct an n qubit state of his choice, non-adaptively measure each qubit in the Hadamard or standard basis as directed by the verifier, and report the measurement results to the verifier. The soundness of this protocol is enforced based on the assumption that the learning with errors problem is computationally intractable for efficient quantum machines.

A random (binary) linear code is a dimension lambda*n (0 < \lambda < 1) linear subspace of the binary n-dimensional hypercube, chosen uniformly from among all such subspaces. Such codes play an important role in the theory of error correcting codes, since they achieve the best known rate vs. distance trade-off, i.e., the Gilbert-Varshamov lower bound. Under a random errors regime, the problem of decoding these codes is known as Learning Parity with Noise, and has many cryptographic applications. This work is motivated by the contrast between the importance of random linear codes and how little we know about them.

Much of the interesting information about a code C is captured by its weight distribution. This is the vector (w_0,w_1,...,w_n) where w_i counts the elements of C with Hamming weight i. In this work we study the weight distribution of random linear codes. In our main result we compute the moments of the random variable w_(gamma*n), where 0 < \gamma < 1 is a fixed constant and n goes to infinity.

This is joint work with Nati Linial.

FinTech can be seen as the meeting of minds between economics and computer science in the digital age. One of its major intellectual foundations is the interdisciplinary subject of multiparty computation, which involves reliable distributed computing and cryptography from the side of computer science, and efficient mechanism design for financial activities from the side of economics. In this talk we discuss some recent work in auction and blockchain from this perspective. For example, is it true that more revenue can always be extracted from an auction where the bidders are more willing to pay than otherwise? Can more revenue be extracted when the bidders are more risk-tolerant than otherwise? We also present some new results on blockchain fees. These results help shed light on some structural questions in economics whose answers are non-obvious.

- 11:00-11:45 Avishay Tal, Simons Institute and Stanford University. Oracle Separation of BQP and the Polynomial Hierarchy
- 11:45-12:30 Badih Ghazi, Google. Resource-Efficient Common Randomness and Secret Key Generation
- 12:30-2:00 lunch
- 2:00-3:15 Motwani colloquium, Shafi Goldwasser, Simons Institute, UC Berkeley. Pseudo Deterministic Algorithms and Proofs
- 3:15-5:30 Happy hour and student talks

**Avishay Tal, Oracle Separation of BQP and the Polynomial Hierarchy**

In their seminal paper, Bennett, Bernstein, Brassard and Vazirani [SICOMP, 1997] showed that relative to an oracle, quantum algorithms are unable to solve NP-complete problems in sub-exponential time (i.e., that Grover's search is optimal in this setting).

In this work, we show a strong converse to their result. Namely, we show that, relative to an oracle, there exist computational tasks that can be solved efficiently by a quantum algorithm, but require exponential time for any algorithm in the polynomial hierarchy (a hierarchy of complexity classes that captures P, NP, coNP, and their generalizations).

The tasks that exhibit this "quantum advantage" arise from a pseudo-randomness approach initiated by Aaronson [STOC, 2010]. Our core technical result is constructing a distribution over Boolean strings that "look random" to constant-depth circuits of quasi-polynomial size, but can be distinguished from the uniform distribution by very efficient quantum algorithms.

Joint work with Ran Raz

**Badih Ghazi, Resource-Efficient Common Randomness and Secret Key Generation**

The task of manipulating randomness has been a subject of intense investigation in computational complexity with dispersers, extractors, pseudorandom generators, condensers, mergers being just a few of the objects of interest. All these tasks consider a single processor massaging random samples from an unknown source.

In this talk, I will discuss a less studied setting where randomness is distributed among different players who would like to convert it to other forms in an efficient manner and with little communication. For instance players may be given access to a source of biased correlated bits and their goal may be to get a common random string out of this source. Even the setting where the source is known leads to some interesting questions that have been explored since the 70s with striking constructions and some surprisingly hard questions. After giving some background, I will describe recent work which explores the task of extracting common randomness from correlated sources with bounds on either the sample complexity or on the number of rounds of interaction.

Based on joint works with T.S. Jayram, Mitali Bafna, Noah Golowich and Madhu Sudan

**Shafi Goldwasser, Pseudo Deterministic Algorithms and Proofs**

Probabilistic algorithms for both decision and search problems can offer significant complexity improvements over deterministic algorithms. One major difference, however, is that they may output different solutions for different choices of randomness. This makes correctness amplification impossible for search algorithms and is less than desirable in setting where uniqueness of output is important such as generation of system wide cryptographic parameters or distributed setting where different sources of randomness are used. Pseudo-deterministic algorithms are a class of randomized search algorithms, which output a unique answer with high probability. Intuitively, they are indistinguishable from deterministic algorithms by an polynomial time observer of their input/output behavior. In this talk I will describe what is known about pseudo-deterministic algorithms in the sequential, sub-linear and parallel setting. We will also describe n extension of pseudo-deterministic algorithms to interactive proofs for search problems where the veri fier is guaranteed with high probability to output the same output on different executions, regardless of the prover strategies.

In a distributed graph algorithm, we have a network of computing nodes, where each node initially knows only its own local neighborhood; the nodes communicate over the network edges in order to solve some problem on the network graph. We are interested in algorithms that are fast, but also do not require a lot of communication between the network nodes.

In this talk I will describe recent algorithms and lower bounds for two graph problems: exact maximum bipartite matching, and testing whether the network contains an even-length cycle of a specific length. Both problems do not have matching upper and lower bounds, so their complexity remains open. The talk will not assume any prior knowledge about distributed computing.

Hardness escalation or query-to-communication lifting is a method of proving tight upper and lower bounds on the complexity of a composed function or relation by a reduction to the query complexity of the base function. This talk will primarily be a tutorial. We will give many applications of lifting, including new and improved circuit lower bounds, as well as lower bounds in game theory, proof complexity and extended formulation complexity. We will sketch the main ideas in the proofs of lifting for randomized communication complexity, and conclude with new directions. This is joint work with Mika Goos and Thomas Watson.

I will present recent advances in approximate nearest neighbor search data structures for general normed spaces. I will explain what non-linear spectral gaps are, and how to use estimates on non-linear spectral gaps to partition large graphs whose vertices lie in a normed space. As a result, I will present the first sub-linear time data structure for approximate nearest neighbor search in high-dimensional normed spaces with approximation which is sub-polynomial in the dimension.

Based on joint work with Alex Andoni, Assaf Naor, Sasho Nikolov, and Ilya Razenshteyn.

We come up with explicit functions for which we can prove tight (up to polynomial factors) upper and lower bounds in the AC^0[2] circuit model. In particular, this implies the first Fixed-Depth Size Hierarchy theorem for this model.

The explicit functions are obtained by constructing explicit AC^0[2] circuits for solving the coin problem, which is defined as follows. For a parameter delta, we have to decide whether a given coin has bias (1+delta)/2 or (1-delta)/2.

Our upper bounds are proved by derandomizing a circuit construction of O'Donnell-Wimmer (2007) and Amano (2009) to reduce the number of samples. Our lower bounds follow from a modification of the Razborov-Smolensky polynomial method.

Joint work with Nutan Limaye (IITB CSE), Karteek Sreenivasiah (IITH CSE), Utkarsh Tripathi (IITB Math) and S Venkitesh (IITB Math).

Linear dynamical systems, a.k.a. Kalman filtering, are a class of time-series models widely used in robotics, finance, engineering, and meteorology. In it's general form (unknown system), learning LDS is a classic non-convex problem, typically tackled with heuristics like gradient descent ("backpropagation through time") or the EM algorithm.

I will present our new "spectral filtering" approach to the identification and control of discrete-time general linear dynamical systems with multi-dimensional inputs, outputs, and a latent state. This approach yields a simple and efficient algorithms for low-regret prediction (i.e. asymptotically vanishing MSE) as well as finite-time control.

We consider a data analyst's problem of purchasing data from strategic agents to compute an unbiased estimate of a statistic of interest. Agents incur private costs to reveal their data and the costs can be {\em arbitrarily correlated} with their data. Once revealed, data are verifiable. This paper focuses on {\em linear} unbiased estimators. We design an individually rational and incentive compatible mechanism that optimizes the worst-case mean-squared error of the estimation, where the worst-case is over the unknown correlation between costs and data, subject to a budget constraint in expectation. We characterize the form of the optimal mechanism in closed-form. We further extend our results to acquiring data for estimating a parameter in regression analysis, where private costs can correlate with the values of the dependent variable but not with the values of the independent variables.

In this talk I will tell you how analyzing economic markets where agents have cognitive biases has lead to better understanding of the communication complexity of local search procedures.

We begin the talk with studying combinatorial auctions with bidders that exhibit endowment effect. In most of the previous work on cognitive biases in algorithmic game theory (e.g., [Kleinberg and Oren, EC'14] and its follow-ups) the focus was on analyzing the implications and mitigating their negative consequences. In contrast, we show how cognitive biases can sometimes be harnessed to improve the outcome.

Specifically, we study Walrasian equilibria in combinatorial markets. It is well known that a Walrasian equilibrium exists only in limited settings, e.g., when all valuations are gross substitutes, but fails to exist in more general settings, e.g., when the valuations are submodular. We consider combinatorial settings in which bidders exhibit the endowment effect, that is, their value for items increases with ownership. Our main result here shows that when the valuations are submodular even a mild level of endowment effect suffices to guarantee the existence of Walrasian equilibrium. In fact, we show that in contrast to Walrsian equilibria with standard utility maximizers bidders -- in which the equilibrium allocation must be a global optimum -- when bidders exhibit endowment effect any local optimum can be an equilibrium allocation.

This raises the natural question of understanding the complexity of computing a local maximum in combinatorial markets. We reduce it to the following communication variant of local search: there is some fixed, commonly known graph G. Alice holds f_A and Bob holds f_B, both are functions that specify a value for each vertex. The goal is to find a local maximum of f_A+f_B, i.e., a vertex v for which f_A(v)+f_B(v) >= f_A(u)+f_B(u) for every neighbor u of v. We prove that finding a local maximum requires polynomial (in the number of vertices) bits of communication.

Based on joint works with Moshe Babaioff, Yakov Babichenko, Noam Nisan, and Sigal Oren.

A martingale is a sequence of random variables that maintain their future expected value conditioned on the past. A $[0,1]$-bounded martingale is said to polarize if it converges in the limit to either $0$ or $1$ with probability $1$. A martingale is said to polarize strongly, if in $t$ steps it is sub-exponentially close to its limit with all but exponentially small probability. In 2008, Arikan built a powerful class of error-correcting codes called Polar codes. The essence of his theory associates a martingale with every invertible square matrix over a field (and a channel) and showed that polarization of the martingale leads to a construction of codes that converge to Shannon capacity. In 2013, Guruswami and Xia, and independently Hassani et al. showed that strong polarization of the Arikan martingale leads to codes that converge to Shannon capacity at finite block lengths, specifically at lengths that are inverse polynomial in the gap to capacity, thereby resolving a major mathematical challenge associated with the attainment of Shannon capacity.

We show that a simple necessary condition for an invertible matrix to polarize over any non-trivial channel is also sufficient for strong polarization over all symmetric channels over all prime fields. Previously the only matrix which was known to polarize strongly was the $2\times 2$ Hadamard matrix. In addition to the generality of our result, it also leads to arguably simpler proofs. The essence of our proof is a ``local definition'' of polarization which only restricts the evolution of the martingale in a single step, and a general theorem showing the local polarization suffices for strong polarization.

In this talk I will introduce polarization and polar codes and, time permitting, present a full proof of our main theorem. No prior background on polar codes will be assumed.

Based on joint work with Jaroslaw Blasiok, Venkatesan Guruswami, Preetum Nakkiran and Atri Rudra.

The language edit distance is a significant generalization of two basic problems in computer science: parsing and string edit distance computation. Given any context free grammar, it computes the minimum number of insertions, deletions and substitutions required to convert a given input string into a valid member of the language. In 1972, Aho and Peterson gave a dynamic programming algorithm that solves this problem in time cubic in the string length. Despite its vast number of applications, in forty years there has been no improvement over this running time.

Computing (min,+)-product of two n by n matrices in truly subcubic time is an outstanding open question, as it is equivalent to the famous All-Pairs-Shortest-Paths problem. Even when matrices have entries bounded in [1,n], obtaining a truly subcubic (min,+)-product algorithm will be a major breakthrough in computer science.

In this presentation, I will explore the connection between these two problems which led us to develop the first truly subcubic algorithms for the following problems with improvements coming for each of these problems after several decades: (1) language edit distance, (2) RNA-folding-a basic computational biology problem and a special case of language edit distance computation, (3) stochastic grammar parsing—fundamental to natural language processing, and (4) (min,+)-product of integer matrices with entries bounded in n^(3-ω-c) where c >0 is any constant and, ω is the exponent of the fast matrix multiplication, believed to be 2.