Workshop on Beyond Worst-Case Analysis September 19-21, 2011, Mackenzie Room, Huang Engineering Center, Stanford University

[Home] [Topics] [Registration] [Program] [Video] [Monday Abstracts] [Tuesday Abstracts] [Wednesday Abstracts] [Short Talks] [Local Arrangements] [Organizers]

## Topics

This workshop will bring together researchers interested in the design and rigorous analysis of algorithms in models that complement the standard worst-case model, to learn about each other's research and identify the most promising avenues for advancing the field. Topics include, but are not limited to: smoothed analysis; planted and semi-random models; average-case analysis; robust models of data; novel input parameterizations and parameterized guarantees; self-improving and prior-independent algorithms. There will be a mix of plenary talks, regular invited talks, and rump/discussion sessions over the 3 days. This workshop will be the first of several taking place during Stanford's 2011-2012 special year on theory.

September 19-21, 2011, Mackenzie Room, Huang Engineering Center, Stanford University.

## Registration

The organizers thank you for registering by sending email with your name and affiliation in the subject line (please use the following format "Subject: Register BWCA: John Smith (Stanford University)") to bwca.cfp@gmail.com by September 5, 2011. There is no cost for attending if you register by this deadline. Lunch and coffee will be provided.

## Program

September
19th (Monday)
20th (Tuesday)
21st (Wednesday)
9:00-9:30 Breakfast and opening remarks Breakfast (provided) Breakfast (provided)
9:30-10:30 Avrim Blum Uri Feige Michael Mitzenmacher
10:30-11:00 Coffee (provided)
11:00-12:00 Nicole Immorlica
Nina Balcan
Kevin Leyton-Brown
Short talks: Yan, Daniely,
Lopez-Ortiz, Mahoney
12:00-1:00 Lunch (provided)
1:00-2:00 Dan Spielman Richard Karp Shang-Hua Teng
2:00-2:30 Coffee (provided)
2:30-4:00 Andrea Montanari
Andrew Goldberg
Susanne Albers
Anupam Gupta
Ryan Williams
Ankur Moitra
David Freeman
4:00-4:30 Coffee (provided)
4:30-5:30 Luca Trevisan Panel Discussion Bernard Chazelle
5:30- Workshop reception

## Video

Video of all lectures and the panel discussion are here (through Stanford) or here (through youtube).

## Abstracts (Monday)

It is often the case in a problem formulation that the objective being optimized is a proxy for some other underlying goal. For example, a distance-based clustering objective (such as k-means or k-median) in the case of clustering protein sequences is really a proxy for a true goal of correctly identifying the functions of the proteins; computing an (approximate) equilibrium in a game may be a proxy for a goal of predicting behavior. In this case, we implicitly assume (or hope) that our instance has the property that a good solution to the objective being measured implies a good solution to the final goal. This talk will discuss how this property in itself can provide structure that an algorithm can potentially use. As one example, for clustering we show that given an instance such that a 1.1 approximation to the k-means or k-median objective would imply being epsilon-close to a desired correct clustering, we can produce solutions that are O(epsilon)-close to that clustering, even though obtaining a 1.1-approximation in general is NP-hard. Furthermore if clusters are large, we can actually get epsilon-close -- as good as if we could approximate the objective to the NP-hard value. We also discuss relaxations of these conditions, as well as work in this direction for the problem of estimating Nash equilibria, and other problems where this approach may be fruitful.

This talk includes work joint with Pranjal Awasthi, Nina Balcan, Anupam Gupta, Or Sheffet, and Santosh Vempala

We introduce a new framework for designing and analyzing algorithms. Our framework applies best to problems that are inapproximable according to the standard worst-case analysis. We circumvent such negative results by designing guarantees for classes of instances, parameterized according to properties of the optimal solution. Given our parameterized approximation, called PArametrized by the Signature of the Solution (PASS) approximation, we design algorithms with optimal approximation ratios for problems with additive and submodular objective functions such as the capacitated maximum facility location problems. We consider two types of algorithms for these problems. For greedy algorithms, our framework provides a justi???cation for preferring a certain natural greedy rule over some alternative greedy rules that have been used in similar contexts. For LP-based algorithms, we show that the natural LP relaxation for these problems is not optimal in our framework. We design a new LP relaxation and show that this LP relaxation coupled with a new randomized rounding technique is optimal in our framework.

Joint work with Uri Feige, Vahab Mirrokni, and Hamid Nazerzadeh.

In Machine Learning, there has been significant interest in using unlabeled data together with labeled data for learning (aka semi-supervised learning) due to the availability of large amounts of unlabeled data in many modern applications. This approach has been intensely explored in the machine learning community, with many heuristics and specific algorithms, as well as various successful experimental results being reported. However, the assumptions these methods are based on are often quite distinct and not captured by standard theoretical models. In this work we describe a PAC-style model designed with semi-supervised learning in mind, that can be used to help thinking about the problem of learning from labeled and unlabeled data and many of the different approaches taken. Our model provides a unified framework for analyzing when and why unlabeled data can help, and in which one can discuss both algorithmic and sample complexity issues. Conceptually, this work highlights the importance of using data-dependent concept spaces for learning.

Smoothed analysis considers the behavior of algorithms assuming that their inputs are subject to slight random perturbations. We will survey some interesting results on the smoothed analysis of algorithms for solving systems of linear equations, systems of algebraic equations, and systems of linear inequalities.

We hope to convey some of the techniques applied in these analyses, and to discuss how they might be improved.

The XOR-satisfiability (XORSAT) problem requires finding an assignment of n Boolean variables that satisfies m exclusive OR (XOR) clauses, whereby each clause constrains a subset of the variables. I will consider random XORSAT instances, drawn uniformly at random from the ensemble of formulae containing n variables and m clauses of size k. This model presents several structural similarities to other ensembles of constraint satisfaction problems, such as k-satisfiability (k-SAT). For many of these ensembles, as the number of constraints per variable grows, the set of solutions shatters into an exponential number of well-separated components. This phenomenon appears to be related to the difficulty of solving random instances of such problems.

I will discuss a precise characterization of the space of solution of random XORSAT instances, its relation with algorithms behavior, and to various notions of spatial mixing and correlation decay. [based on joint work with Morteza Ibrahimi, Yashodhan Kanoria, Matt Kraning]

In this talk we examine the interaction between theoretical analysis and practical evaluation of algorithms. Empirical observations lead to theoretical models, algorithm design and theoretical analysis. Experimental evaluation of the resulting algorithm can either validate the theory or refine it. Refinements can lead to algorithms with better correspondence between theoretical bounds and practical performance. This theory -- experimentation cycle is an example of the scientific method.

We illustrate this process using point-to-point shortest path algorithms for road networks as an example. We describe how some of the recent algorithms with good experimental and practical performance leads to the theoretical notion of highway dimension, which allows theoretical analysis of the underlying methods. The analysis gives insight into the algorithm performance and introduces an unexpected relationship between shortest paths and VC-dimension.

The analysis also suggests that the hub labeling algorithm, never previously studied in the context of large road networks, has better runtime bound than those for the best previous implementations. This motivates an implementation and an empirical study of the hub labeling algorithm, which turns out to be faster in practice as well.

We also apply the some of the above-mentioned techniques to the one-to-all shortest path problem. In theory, this problem can be solved by Dijkstra's algorithm in (essentially) linear time. However, this time bound is for a word RAM model of computation that ignores modern computer architecture features such as locality and parallelism. We describe a new algorithm for the problem that takes advantage of these features to obtain a speedup of up to three orders of magnitude over Dijkstra's algorithm on continental-size networks. This work shows the limitations of theoretical analysis for predicting real-life algorithm performance and the power of the scientific method applied to algorithm design.

Joint work with Ittai Abraham, Daniel Delling, Amos Fiat, Andreas Nowatzyk, and Renato Werneck.

In this survey talk, we review the many open questions and the few things that are known about the average-case complexity of computational problems.

We will talk about questions such as:

- Can we derive the existence of hard-on-average problems from the existence of hard-in-worst-case problems? What is stopping us from proving statements like "Unless P=NP, there are hard-on-average problems in NP under the uniform distribution" or "Unless P=NP, public-key cryptography is possible"?

- What is going on in Levin's famously concise one-page paper on average-case NP-complete problems?

- For some, but not all, optimization problems, the hardest instances to approximate are random ones. What does this have to do with integrality gaps and inapproximability results?

## Abstracts (Tuesday)

The factor graph of an instance of a symmetric constraint satisfaction problem on $n$ Boolean variables and $m$ constraints (CSPs such as k-SAT, k-AND, k-LIN) is a bipartite graph describing which variables appear in which constraints. The factor graph describes the instance up to the polarity of the variables, and hence there are $2^n$ instances of the CSP that share the same factor graph. It is well known that factor graphs with certain structural properties make the underlying CSP easier to either solve exactly (e.g., for tree structures) or approximately (e.g., for planar structures). We are interested in the following question: is there a factor graph for which if one can solve every instance of the CSP with this particular factor graph, then one can solve every instance of the CSP regardless of the factor graph (and similarly, for approximation)? We call such a factor graph {\em universal}. As one needs different factor graphs for different values of $n$ and $m$, this gives rise to the notion of a family of universal factor graphs. This notion is similar in spirit to previously studied notions of preprocessing for closest codeword and closest lattice-vector problems.

The talk will discuss our goals for a systematic study of universal factor graphs, and present some results for max-kSAT. Many questions remain open. Joint work with Shlomo Jozeph.

One way of going beyond worst-case analysis is to ask the question: "How hard is it to solve a given problem in practice, using the best known methods?" We show that, even in settings where formal analysis seems hopeless---e.g., where algorithms are complex black boxes, and -- --where instance distributions are heterogeneous or -- --richly structured---it is possible to apply rigorous statistical methods to answer such questions with high levels of confidence. Specifically, we advocate building "empirical hardness models" which, given a novel problem instance, return an algorithm's predicted runtime. We show that these models can be extremely accurate, describing case studies on state-of-the-art solvers for satisfiability, combinatorial auction winner determination, and mixed integer programming. Our models can be analyzed to determine problem parameters that most affect empirical hardness, opening avenues for further theoretical investigation. Further, we show how models can be extended to capture the (either discrete or continuous) sets of algorithms described by arbitrarily parameterized solvers.

In many practical situations heuristic algorithms reliably give satisfactory solutions to real-life instances of optimization problems, despite evidence from computational complexity theory that the problems are intractable in general. Our long-term goal is to contribute to an understanding of this seeming contradiction, and to put the construction of heuristic algorithms on a firmer footing. In particular, we are interested in methods for tuning and validating heuristic algorithms by testing them on a training set of "typical" instances.

As a step in this direction we describe the evolution and validation of three heuristic algorithms.

(1) A generic algorithm for the class of implicit hitting set problems, which includes feedback vertex set and feedback edge set problems, the max-cut problem, the Steiner tree problem, the problem of finding a maximum feasible subset of a set of linear inequalities, matroid intersection problems and a problem of global genome alignment (joint work with Erick Moreno Centeno).

(2) The Colorful Subgraph Problem: given a graph in which each vertex is assigned a color from a set S, find the smallest connected subgraph containing at least one vertex of each color in S. (joint work with Shuai Li)

(3) The problem of clustering the vertices of a graph into small near-cliques. (joint work with Shuai Li).

In the design of online algorithms standard competitive analysis can lead to overly pessimistic performance guarantees. In the first part of this talk we will review various techniques to overcome such negative results. In the second part we will study list update, a cornerstone problem in the theory of online algorithms. We introduce a new model of locality of reference. Using this model we present a combined theoretical and experimental study in which the theoretically proven and experimentally observed performance guarantees match or nearly match.

Consider the online Steiner tree problem: given a metric space, "demand" points arrive online and have to be connected to some previous demand point immediately and irrevocably. This problem admits an O(log k) competitive algorithm for k requests, and this is optimal. But what if requests are not given by an adversary, but are independent random points from the metric space? What if the *set* of requests is decided by the adversary, but the demands in this set are presented in random order? And what about other optimization problems: facility location, set cover, matchings, etc. In this talk I will survey some work done (much of it in the past decade) on these online problems.

An algorithm is almost never asked to execute on only a single input, and is usually designed to deal with a large set of instances. Therein lies one of the problems with worst-case analysis: just because there exist few (or a set of) instances where the algorithm runs poorly, we know nothing about its performance on our data. But this large set of instances is often unknown, and probably has no simple closed-form description. Methods like average-case analysis assume way too much structure in the distribution of instances we wish to solve. The model of algorithmic self-improvement tries to bridge this gap.

A self-improving algorithm assumes that the inputs it receives are (for now, let's assume) independent samples from a fixed but unknown distribution D. Initially, the algorithm simply behaves as a standard worst-case algorithm, but it collects information about D. Eventually, it exploits the structure of D to run more quickly. Intuitively, if D has low entropy, then the algorithm should be able to run faster. If, on the other hand, D is like a uniform distribution, the algorithm just guarantees a standard average-case running time. A self-improving algorithm tries to figure out which situation it is in (regarding D), and optimizes itself for inputs coming from D. An optimal self-improving algorithm is one that can "turn into" the optimal algorithm for D (regardless of what D is).

For any new model, what better problem to study than that of sorting n numbers. I will talk about an optimal self-improving sorter that can deal with any D that is a product distribution. The amazing feature of the algorithm is that it "self-improves" to become an optimal (upto constant factors) comparison tree for D. There exist lower bounds showing that some restriction of D is necessary. Self-improving algorithms have been extended to computational geometry, and I will give a brief explanation of these results.

Joint work with Nir Ailon, Ken Clarkson, Bernard Chazelle, Ding Liu, and Wolfgang Mulzer.

## Abstracts (Wednesday)

Worst case analysis is fundamental to theoretical computer science, and it has proven incredibly successful. So successful, in fact, that it has proven difficult for us to move away from. In this talk we'll look at other fields for inspiration, see how their analysis begins from other mindsets outside of worst case analysis, and how it connects to CS theory. We'll then consider how the emphasis on worst-case analysis both strengthens and weakens our research process, and whether there is significant room for change.

I will discuss some recent work on the smoothed analysis of multi-objective optimization, PAC learning, and data clustering. I will then conclude the talk by drawing the contrast between these optimization problems and equilibrium computation (in game theory & mathematical economics) from the lens of smoothed analysis.

Joint work with Daniel Spielman (Yale), Heiko Roglin (Bonn University), Adam Kalai (Microsoft New England Lab), Alex Samorodnitsky (Hebrew University), Xi Chen (Columbia University), and Xiaotie Deng (University of Liverpool).

In areas such as planning and finite model-checking, current solution techniques can solve combinatorial problems with millions of variables and constraints. The good scaling behavior of these methods appears to defy what one would expect based on worst-case complexity. This talk will survey a framework for understanding the tractability of real-world problem instances, developed in collaboration with Gomes and Selman. The central concept is that of a "backdoor" to an instance: a small variable subset "sufficient" for solving that instance. Informally, a backdoor measures the "distance" of a problem instance from a polynomial-time solvable subset of instances. Empirical results have shown that many real-world problems possess small backdoors (with respect to polynomial-time heuristics built into the solvers), and much theoretical work on parametrized and exact algorithms can be reinterpreted in the language of backdoors.

Consider an optimization problem with $n$ binary variables and $d+1$ linear objective functions. Each valid solution $x \in \{0,1\}^n$ gives rise to an objective vector in $\R^{d+1}$, and one often wants to enumerate the Pareto optima among them. In the worst case there may be exponentially many Pareto optima; however, it was recently shown that in (a generalization of) the smoothed analysis framework, the expected number is polynomial in $n$ (Roeglin, Teng, FOCS 2009). Unfortunately, the bound obtained had a rather bad dependence on $d$; roughly $n^{d^d}$. In this paper we show a significantly improved bound of $n^{2d}$.

Our proof is based on analyzing two algorithms. The first algorithm, on input a Pareto optimal $x$, outputs a "testimony" containing clues about $x$'s objective vector, $x$'s coordinates, and the region of space $B$ in which $x$'s objective vector lies. The second algorithm can be regarded as a SPECULATIVE execution of the first -- it can uniquely reconstruct $x$ from the testimony's clues and just SOME of the probability space's outcomes. The remainder of the probability space's outcomes are just enough to bound the probability that $x$'s objective vector falls into the region $B$.

This is joint work with Ryan O'Donnell.

Network coding is a routing mechanism that offers increased throughput and improved robustness to random faults in completely decentralized networks. In contrast to traditional routing schemes, however, network coding requires intermediate nodes to modify data packets en route; for this reason, standard signature schemes are unable to ensure message integrity, and it is a challenge to provide resilience to tampering by malicious nodes.

In this talk we will survey cryptographic methods for ensuring integrity in network coding. We will discuss three different methods, whose security depends on the difficulty of (variants of) of three different computational problems, respectively: computing discrete logarithms, factoring large integers, and finding short vectors in lattices. These schemes are examples of "homomorphic signatures," which are schemes that can authenticate computations on signed data. We will conclude with a more general example of homomorphic signatures.

This talk will cover joint research with Dan Boneh, Jon Katz, and Brent Waters.

I will discuss the merits of an algorithmic approach to the analysis of complex self-organizing systems. I will argue that computer science, and algorithms in particular, offer a fruitful perspective on the complex dynamics of multiagent systems: for example, opinion dynamics, bird flocking, firefly synchronization, etc. I will give many examples and try to touch on some of the theory behind them, with an emphasis on their algorithmic nature.

## Short Talks

Pranjal Awasthi: On Nash Equilibria of Approximation-Stable Games

One of the main reasons for wanting to compute a Nash equilibrium of a game is to predict how players will play. However, if the game has multiple equilibria that are far apart, or approximate-equilibria that are far in variation distance from the true Nash equilibrium strategies, then this prediction may not be possible even in principle. Motivated by this, we define the notion of games that are approximation stable, meaning that all epsilon-approximate equilibria are contained inside a small ball of radius delta around a true equilibrium, and investigate a number of their properties. Many natural small games such as matching pennies and rock-paper-scissors are indeed approximation stable. We show furthermore there exist 2-player n-by-n approximation-stable games in which the Nash equilibrium and all approximate equilibria have support Omega(log n). On the other hand, we show all (epsilon,delta) approximation-stable games must have an epsilon-equilibrium of small support yielding an algorithm which improves over the bound of Lipton et al. for games satisfying this condition. We in addition give improved bounds for the case that epsilon and delta are sufficiently close together.

An instance of an optimization problem is stable if the optimal solution does not change upon a small perturbation of the instance. This is a very natural notion for optimization problems where the input is generated some physical measurement. In particular, this notion is very natural in the context of clustering algorithms. Bilu and Linial asked whether stable instances of NP-hard problems may be computationally feasible and this is the question we address here. In this talk I will survey the work of Bilu and Linial and of Awasthi-Blum and Sheffet as well as some of our own recent work.

This is joint work with Nati Linial.

It is well-established that input sequences for paging and list update have locality of reference.

In this paper we analyze the performance of paging and list update algorithms in terms of the amount of locality in the input sequence. This introduces parameterized-style analysis to online algorithms. The idea is to rather than normalizing the performance of an online algorithm by an (optimal) offline algorithm, we explicitly express the behavior of the algorithm in terms of two more natural parameters: the size of the cache and Denning???s working set measure. This technique creates a performance hierarchy of paging algorithms which better reflects their intuitive relative strengths. Also it reflects the intuition that a larger cache leads to a better performance. We obtain similar separation for list update algorithms. Lastly, we show that, surprisingly, certain randomized algorithms which are superior to MTF in the classical model are not so in the parameterized case, which matches experimental results.

Traditional methods of worst-case analysis often fail to provide even qualitative guidance as to what algorithms are appropriate in many large-scale data analysis applications. This fact is usually blamed on the exclusive focus on running time for arbitrary input. A deeper reason is that most problems arising in modern data applications are intractable and since the relaxations and embeddings underlying most worst-case approximation algorithms are not "close enough" to the data to provide such qualitative guidance. Recent empirical work on very large social and information networks, however, has clearly demonstrated that by exploiting the geometric properties associated with these relaxations and embeddings, one can extract insight and perform inference from large, sparse, and noisy graphs in a scalable and robust way. Performing such inference is typically the domain of a statistical technique known as regularization, which usually involves modifying the original objective with a geometric constraint and then exactly optimizing the modified objective. Thus, these results demonstrate a manner in which approximate computation can implicitly lead to statistical regularization, i.e., in which the solutions from worst-case approximation algorithms are not only faster but also "better" for downstream applications than the solutions to their intractable counterparts. This will be illustrated in the context of using spectral versus flow-based approximation algorithms for the graph partitioning problem; and it will be explained in terms of how the geometric embeddings underlying spectral methods versus flow methods provide a form of capacity control to "smooth" or "regularize" the input graph in very different ways. The implications of these results for worst-case approximation algorithms more generally will be discussed.

A prior-independent mechanism has no knowledge about the underlying input prior distribution, yet its expected revenue approximates that of the optimal mechanism tailored for the distribution. This prior-independent approach aims to be a balanced mix of computer science's worst-case analysis approach, and economics' distribution-based average case analysis approach.

We show that a general approach to obtain prior-independent mechanisms is via a reduction to statements about "resource augmentation". Such a statement (like the classic Bulow-Klemperer theorem) says that running the VCG mechanism over an expanded environment with more bidders approximates the optimal mechanism in the original environment. Using this reduction, we show that a simple supply-limiting strategy gives prior-independent mechanisms for several single-parameter and multi-parameter settings.

Based on joint work with Inbal Talgam-Cohen and Tim Roughgarden.

Motivated by applications in online ad allocation, we study the problem of simultaneous approximations for the adversarial and stochastic online budgeted allocation problem. This problem consists of a bipartite graph G = (X,Y,E), where the nodes of Y along with their corresponding capacities are known beforehand to the algorithm, and the nodes of X arrive online. When a node of X arrives, its incident edges, and their respective weights are revealed, and the algorithm can match it to a neighbor in Y . The objective is to maximize the weight of the final matching, while respecting the capacities.

When nodes arrive in an adversarial order, the best competitive ratio is known to be 1 - 1/e, and it can be achieved by the Ranking [16], and its generalizations (Balance Algorithm [14, 19]). On the other hand, if the nodes arrive through a random permutation, it is possible to achieve a competitive ratio of 1 - ?? [9]. In this paper we design algorithms that achieve a competitive ratio better than 1 - 1/e on average, while preserving a nearly optimal worst case competitive ratio. Ideally, we want to achieve the best of both worlds, i.e, to design an algorithm with the optimal competitive ratio in both the adversarial and random arrival models. We achieve this for unweighted graphs, but show that it is not possible for weighted graphs. In particular, for unweighted graphs, under some mild assumptions, we show that Balance Algorithm achieves a competitive ratio of 1 - ?? in a random permutation model. For weighted graphs, however, we show that this is not possible; we prove that no online algorithm that achieves an approximation factor of 1 - 1/e for the worst-case inputs may achieve an average approximation factor better than 97.6% for random inputs. In light of this hardness result, we aim to design algorithms with improved approximation ratios in the random arrival model while preserving the competitive ratio of 1 - \eps in the worst case. To this end, we show an algorithm achieving a competitive ratio of 0.76 for the random arrival model, while having a 1-\eps ratio in the worst case.

This is based on joint work with Vahab Mirrokni and Shayan Oveis Gharan.

## Local Arrangements

For general information about getting to Stanford, see here or here.

Three nearby hotels are the Sheraton, the Westin, and Hotel Keen. Ask for Stanford University or corporate rate.

The workshop venue is in the Huang Center, see the map here. The Mackenzie Room is the "top of the octogon".

You have various options to get to the Huang Center. One is take the Marguerite (Stanford's free shuttle). See here for instructions. Note that the recommended hotels are all very close to the main (University Ave) Palo Alto Caltrain station. A second option is to park at the meters in closest parking lot, see here. We will also have a very limited number of "C" permits available for that lot.

## Organizers

Tim Roughgarden and Luca Trevisan.