Balanced Allocation on Graphs
It is well known that if n balls are inserted into n bins, with high
probability, the bin with maximum load contains (1+o(1)) log n / loglog
n balls. Azar, Broder, Karlin, and Upfal showed that if, instead of
choosing one bin, d >= 2 bins are chosen at random and the ball is
inserted into the least loaded of the d bins, the maximum load drops
drastically to loglog n / log d + O(1). However, in the context of hashing
applications, the bins may represent hash buckets of a hash table stored
on a disk and hence the cost of accessing the two bins (for d=2) depends
on the distance between them.
We study the two-choice balls-and-bins process when a ball is not allowed to pick any two random bins, but only two bins that are connected by an edge in an underlying graph. We show that for n balls and n bins, if the graph is almost regular with degree n^\eps, where \eps is not too small, the previous bounds on the maximum load continue to hold: precisely, the maximum load is loglog n + O(1/\eps). So even if the graph has degree n^{\Omega(1/loglog n)}, the maximum load is O(loglog n). For general \Delta-regular graphs, we show that the maximum load is loglog n + O(\frac{log n}{log (\Delta/log^4 n)}) and provide an almost matching lower bound of loglog n + \Omega(\frac{log n}{log (\Delta log n)}). Further, these bounds fail for non-regular graphs, even when the minimum degree is high.
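As a quick illustration of the graph-restricted process, here is a minimal simulation sketch of our own (names and parameters are illustrative, not from the paper): each ball picks a uniformly random edge of the graph and is placed in the less loaded of its two endpoints.

    import random

    def two_choice_on_graph(edges, n, balls=None):
        """Each ball picks a uniformly random edge of the graph -- its two
        allowed bins -- and goes to the less loaded endpoint (ties random)."""
        load = [0] * n
        for _ in range(balls or n):
            u, v = random.choice(edges)
            if load[v] < load[u] or (load[v] == load[u] and random.random() < 0.5):
                u = v
            load[u] += 1
        return max(load)

    # a sparse random multigraph with average degree 16 on n bins
    n = 1 << 12
    edges = [(random.randrange(n), random.randrange(n)) for _ in range(8 * n)]
    print("max load:", two_choice_on_graph(edges, n))

For comparison, the classical unrestricted two-choice process corresponds to running the same experiment on the complete graph.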
Voecking showed that the maximum load under d-choice load balancing can be further improved to O(loglog n / d) by breaking ties to the left. This, however, requires d random bin choices. We show that the same bound can be achieved with only two random accesses, querying d/2 contiguous bins in each access. Group a sequence of n bins into 2n/d aligned groups of d/2 consecutive bins each; if each ball chooses two groups at random and is inserted into the least loaded bin of the lesser-loaded group, then the maximum load is O(loglog n / d) with high probability. Furthermore, this partitioning into aligned groups of size d/2 turns out to be essential for achieving this bound.
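Here is a sketch of the grouped variant as well (our reading: groups are compared by total load, which is one natural interpretation of "lesser loaded"; the paper's precise rule may differ).

    import random

    def grouped_two_choice(n, d):
        """Two random accesses simulating d choices: bins are split into
        2n/d aligned groups of d/2 consecutive bins; each ball samples two
        groups, takes the lesser-loaded one (here: smaller total load),
        and goes to its least loaded bin, breaking ties to the left."""
        half = d // 2
        assert n % half == 0
        load = [0] * n
        groups = [range(g * half, (g + 1) * half) for g in range(n // half)]
        for _ in range(n):
            g1, g2 = random.randrange(len(groups)), random.randrange(len(groups))
            g = min(g1, g2, key=lambda g: sum(load[b] for b in groups[g]))
            b = min(groups[g], key=lambda b: (load[b], b))  # leftmost on ties
            load[b] += 1
        return max(load)

    print("max load:", grouped_two_choice(n=1 << 12, d=8))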
Joint work with Rina Panigrahy. The talk is based on our SODA 2006 paper.
The Complexity of ICP (Iterative Closest Point)
and k-means: Lower Bounds and Smoothed Analysis
The k-means method and the ICP algorithm are two widely used local
search heuristics. While both run stunningly fast in practice, their
theoretical complexity has been poorly understood. We begin by
presenting exponential time lower bounds for both algorithms. To
reconcile the difference between the observed and theoretical running
times, we turn to the smoothed analysis of Spielman and Teng. In this
setting all points are randomly perturbed before running the
algorithm. This step breaks brittle lower bounds and allows us to
prove that the smoothed complexity of both algorithms is independent
of the dimensionality of the data, thus yielding a polynomial smoothed
upper bound for ICP, and an O(n^k) smoothed upper bound for k-means.
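To make the smoothed setting concrete, here is a minimal sketch (an illustration of ours, not the paper's experiments): perturb every input point with independent Gaussian noise, then run the standard Lloyd/k-means local search and count iterations until convergence.

    import numpy as np

    def smoothed_kmeans(points, k, sigma, seed=0):
        """Perturb each point by independent N(0, sigma^2) noise, then run
        Lloyd's k-means local search until the clustering stops changing;
        returns the number of iterations taken."""
        rng = np.random.default_rng(seed)
        X = points + rng.normal(scale=sigma, size=points.shape)
        centers = X[rng.choice(len(X), size=k, replace=False)]
        assign, iters = None, 0
        while True:
            iters += 1
            # assignment step: nearest center for every point
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            new_assign = d2.argmin(axis=1)
            if assign is not None and np.array_equal(new_assign, assign):
                return iters
            assign = new_assign
            # update step: each center moves to its cluster's mean
            for j in range(k):
                if (assign == j).any():
                    centers[j] = X[assign == j].mean(axis=0)

    points = np.random.default_rng(1).random((500, 10))
    print("Lloyd iterations on perturbed input:",
          smoothed_kmeans(points, k=5, sigma=0.01))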
The Complexity of Nash Equilibrium
In 1951, Nash showed that every game has a mixed Nash equilibrium.
His proof is a reduction to Brouwer's fixpoint theorem and places the
problem of finding equilibria into the realm of 'exponential existence
proofs'. In fact, whether Nash equilibria can be computed in polynomial
time has remained open since that time, and has come under increased
scrutiny during the past two decades. In this talk, I will present some
recent results (joint work with Paul Goldberg and Christos Papadimitriou)
that shed light on this problem, in the negative direction. I will also
present positive results (joint work with Alex Fabrikant and Christos
Papadimitriou) suggesting how "the game world is flat".
Oblivious Network Design
Consider the following network design problem: given a network $G = (V,E)$, source-sink pairs $\{s_i, t_i\}$ arrive and desire to send
a unit of flow between themselves. The cost of the routing is this: if edge $e$ carries a total of $f_e$ flow (from all the terminal
pairs), the cost is given by $\sum_e \load(f_e)$, where $\load$ is some concave cost function; the goal is to minimize the total cost
incurred. However, we want the routing to be oblivious: when terminal pair $(s_i, t_i)$ makes its routing decisions, it does not know
the current flow on the edges of the network, nor the identity of the other pairs in the system. Moreover, it does not even know the
identity of the function $\load$, merely knowing that $\load$ is a concave function of the total flow on the edge. How should it
(obliviously) route its one unit of flow? Can we get competitive algorithms for this problem?
We develop a framework to model oblivious network design problems (of which the above problem is a special case), and give algorithms with poly-logarithmic competitive ratio for problems in this framework (and hence for this problem).
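To make the model concrete, here is a toy sketch of ours (the instance, candidate paths, and load function are all made up): an oblivious routing fixes a per-pair choice of path in advance, and only afterwards is the outcome scored under some concave load function.

    import random
    from collections import Counter

    def oblivious_route(pair, paths):
        """An oblivious scheme fixes, for every terminal pair, a distribution
        over paths in advance; the choice ignores the other pairs and the
        load function. Here paths[pair] lists candidate paths (edge lists)."""
        return random.choice(paths[pair])

    def total_cost(routed, load):
        """Cost of the outcome: sum over edges e of load(f_e), where f_e is
        the total flow that ends up on e (edges are undirected here)."""
        f = Counter(frozenset(e) for path in routed for e in path)
        return sum(load(fe) for fe in f.values())

    # toy instance: 4-cycle a-b-c-d, two pairs, two candidate paths each
    paths = {("a", "c"): [[("a", "b"), ("b", "c")], [("a", "d"), ("d", "c")]],
             ("b", "d"): [[("b", "c"), ("c", "d")], [("b", "a"), ("a", "d")]]}
    routed = [oblivious_route(p, paths) for p in paths]
    print(total_cost(routed, load=lambda f: f ** 0.5))  # e.g. concave sqrt cost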
Joint work with MohammadTaghi Hajiaghayi and Harald Raecke.
Achieving Channel Capacity Against Malicious Errors
Suppose you want to communicate a message of k packets on a noisy communication channel. So you judiciously encode it as a redundant collection of n = ck packets and transmit these. What is the minimum number of correct packets one needs to receive in order to have any hope of recovering the message?
Well, clearly one needs at least k correct packets. In this talk, I will describe recent work that gives an encoding scheme that attains this information-theoretic limit: for any desired eps > 0, it enables recovery of the message as long as at least k(1+eps) packets are received intact. The location of the correct packets and the errors on the remaining packets can be picked adversarially by the channel.
This achieves the optimal trade-off (called "capacity") between redundancy and error-resilience for a malicious noise model where the channel can corrupt the transmitted symbols arbitrarily subject to a bound on the total number of errors. Previously, capacity-achieving error-correcting codes were only known for weaker, stochastic models of the channel noise.
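For intuition about the k-packet limit, here is a minimal sketch of the erasure analogue (a textbook Reed-Solomon argument, not the codes of the talk): if the n packets are values of a polynomial of degree less than k, then any k intact packets at known positions determine the message by interpolation. The talk's setting, where surviving packets may be maliciously corrupted rather than just missing, is far harder.

    import random

    P = 2**31 - 1  # a prime; all arithmetic is over the field GF(P)

    def interpolate_at(points, x0):
        """Evaluate at x0 the unique degree < len(points) polynomial
        through the given (x, y) points (Lagrange interpolation mod P)."""
        total = 0
        for xi, yi in points:
            num, den = 1, 1
            for xj, _ in points:
                if xj != xi:
                    num = num * (x0 - xj) % P
                    den = den * (xi - xj) % P
            total = (total + yi * num * pow(den, P - 2, P)) % P
        return total

    def rs_encode(message, n):
        """Systematic Reed-Solomon: packet x is p(x), where p is the unique
        degree < k polynomial with p(0), ..., p(k-1) equal to the message."""
        return [interpolate_at(list(enumerate(message)), x) for x in range(n)]

    k, n = 4, 12
    message = [7, 13, 42, 99]
    packets = rs_encode(message, n)
    survivors = random.sample(list(enumerate(packets)), k)  # channel erases the rest
    recovered = [interpolate_at(survivors, x) for x in range(k)]
    assert recovered == message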
These results are obtained in an error-recovery model called list decoding. The talk will introduce and motivate the problem of list decoding, and then give a peek into the algebraic ideas and constructions that lead to the above result. We will also describe some challenging questions that still remain open.
The talk will be self-contained. Based on joint work with Atri Rudra.
New Market Models and Algorithms
The notion of a ``market'' has undergone a paradigm shift with the Internet
-- totally new and highly successful markets have been defined and launched
by companies such as Google, Yahoo!, Amazon, MSN and eBay. Another major
change is the availability of massive computational power for running these
markets in a centralized or distributed manner.
In view of these new realities, the study of market equilibria, an essentially non-algorithmic theory within Mathematical Economics, needs to be revived and rejuvenated with new models, ideas, and an inherently algorithmic approach.
In this work we embark on an in-depth study of resource allocation markets. We give strongly polynomial algorithms for finding equilibria in several of these markets; interestingly enough, these algorithms also solve certain nonlinear convex programs. Our algorithms shed light on several aspects of these markets, including efficiency, fairness, competition monotonicity, and rationality of solutions.
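For context, the canonical example of such a nonlinear convex program is the Eisenberg-Gale program for linear Fisher markets (our guess at the relevant class, suggested by the first paper linked below), whose optimal solutions are exactly the equilibrium allocations:

    \max \sum_i m_i \log u_i
    \quad \text{subject to} \quad
    u_i = \sum_j u_{ij} x_{ij} \ \forall i, \qquad
    \sum_i x_{ij} \le 1 \ \forall j, \qquad
    x_{ij} \ge 0,

where buyer i has budget m_i and derives utility u_{ij} per unit of good j.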
Based on joint work with Kamal Jain:
http://www-static.cc.gatech.edu/fac/Vijay.Vazirani/EG.pdf
and Deeparnab Chakrabarty and Nikhil Devanur:
http://eccc.hpi-web.de/eccc-reports/2006/TR06-029/index.html
LOTs (Leaf Only Trees): Fast, Efficient and Robust In-Network Computation
The need for efficient computation of functions of global state
lies at the heart of a wide range of problems in distributed
systems. Examples include load balancing, leader election and barrier
synchronization, distributed file system maintenance, sensor fusion,
coordinated intrusion detection, and top-K queries in stream-oriented
databases. More precisely, we wish to compute a function g on the
values held by the nodes. In practice, computing such functions
involves in-network computation of partial functions, to avoid the
overhead of exchanging Omega(n)-sized messages.
Epidemic (or gossip) algorithms are known as the most robust and efficient way to disseminate information in a network. However, they suffer from the "double counting" problem. Solutions to this problem either make gossip brittle or depend on variants of the Flajolet-Martin algorithm, which results in errors of up to 50%.
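For background on the double-counting pitfall, the sketch below shows the standard push-sum repair, included only for contrast (it is not the mechanism of this paper): gossip aggregation stays correct only if "mass" is conserved, e.g. by having each node keep a (sum, weight) pair and split it when gossiping.

    import random

    def push_sum(values, rounds=60):
        """Gossip averaging without double counting: each node holds a
        (sum, weight) pair and, per round, keeps half and sends half to a
        random node. Total mass is conserved, so sum/weight at every node
        converges to the true average; naively re-adding forwarded values
        would count them many times over."""
        n = len(values)
        s, w = list(map(float, values)), [1.0] * n
        for _ in range(rounds):
            ns, nw = [0.0] * n, [0.0] * n
            for i in range(n):
                j = random.randrange(n)
                ns[i] += s[i] / 2; nw[i] += w[i] / 2  # keep half
                ns[j] += s[i] / 2; nw[j] += w[i] / 2  # gossip half
            s, w = ns, nw
        return [si / wi for si, wi in zip(s, w)]

    vals = [random.random() for _ in range(200)]
    print("true average:   ", sum(vals) / len(vals))
    print("node 0 estimate:", push_sum(vals)[0])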
In this paper, we present the first efficient, accurate and robust
algorithm for in-network computation of functions of global state. We
provide a simple abstraction of a B-tree on which arbitrary functions
can be evaluated. The scheme is fast and efficient: it terminates in
O(ln n) communication rounds and every node sends most of its
messages to nearby nodes. Most useful functions like sum or approximate
histogram can be embedded in the tree, resulting in messages of size
O(ln n). For functions that need to be distorted to be embedded
into the tree, the price is not loss of precision but larger messages,
of size proportional to the degree of distortion. Our analysis includes
bounds on the degree of load imbalance and on tolerance to node failures.
Testing properties of distributions in sublinear time
The focus of this talk will be sublinear time algorithms for testing
properties of distributions. The input is a discrete distribution
accessible via sampling and the question is to determine if the
distribution has a certain property or is far from every distribution with
that property. Interesting properties include closeness in various norms,
independence, monotonicity in one and higher dimensions, entropy, etc.
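As a taste of the techniques, here is a standard collision-based uniformity tester (given purely as an illustration of ours): the probability that two independent samples collide equals the squared L2 norm of the distribution, which is minimized by the uniform distribution, so a collision count separates uniform from far-from-uniform; in theory O(sqrt(n)) samples suffice, though the toy below uses more for a stable estimate.

    import random
    from itertools import combinations

    def collision_statistic(samples):
        """Fraction of sample pairs that collide: an unbiased estimate of
        sum_x p(x)^2, which equals 1/n for the uniform distribution on a
        domain of size n and is strictly larger for anything far from it."""
        pairs = list(combinations(samples, 2))
        return sum(a == b for a, b in pairs) / len(pairs)

    n, m = 10_000, 2_000  # domain size; number of samples
    uniform = [random.randrange(n) for _ in range(m)]
    skewed = [random.randrange(n // 2) for _ in range(m)]  # half the domain
    print("uniform:", collision_statistic(uniform), "  vs 1/n =", 1 / n)
    print("skewed: ", collision_statistic(skewed), "  vs 2/n =", 2 / n)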
Group-theoretic Algorithms for Matrix Multiplication
We present a group-theoretic approach to producing fast algorithms for
matrix multiplication. In this framework, one devises algorithms by
constructing non-abelian groups with certain properties. The algorithms
themselves are natural and are based on taking the discrete Fourier
transform over these groups.
We construct several families of groups that achieve matrix multiplication exponent significantly less than 3 (but not better than the current best bound, 2.376...). This leads to two appealing conjectures, one combinatorial and the other algebraic. Either one would imply that the exponent of matrix multiplication is 2.
This is joint work with Henry Cohn, Bobby Kleinberg, and Balazs Szegedy.
The talk is based on two papers:
1. "A group-theoretic approach to fast matrix multiplication" (FOCS 2003) http://front.math.ucdavis.edu/math.GR/0307321
2. "Group-theoretic algorithms for matrix multiplication" (FOCS 2005)
http://www.cs.caltech.edu/~umans/papers/CKSU05.pdf
Playing the k-armed bandit with adaptive payouts
The "k-armed bandit" is a modified slot machine, which has k arms
you can pull, rather than just one. Each arm has a different payout,
which can change every time you play, but is always in the range [0,1].
If we get to play this machine T times, how well can we do, compared
to the best single arm in hindsight? Specifically, the goal is to minimize
"regret", defined as the random variable Payout(algorithm) minus
Payout(best arm).
The character of this problem depends strongly on how much information is given to the algorithm. In the "full information" version of the game, the algorithm is told the payout for all k arms after each round. This version of the problem is well understood, and randomized algorithms are known which, with high probability, guarantee regret O(\sqrt{T log k}), which is optimal. In the "bandit" version of the game, the algorithm is only told the payout for the arm it chose in the last round. This setting is harder, with a lower bound of Omega(\sqrt{T k}) and a previous best upper bound of O(T^{2/3}) (for fixed k).
We present a new randomized algorithm for the bandit version, which achieves regret O(\sqrt{T k log k}) with high probability.
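For reference, here is a minimal sketch of the classic Exp3 algorithm, which guarantees O(\sqrt{T k log k}) regret in expectation; the contribution above is an algorithm achieving such a bound with high probability, which plain Exp3 does not.

    import math, random

    def exp3(payout, k, T, gamma):
        """Exp3 for the adversarial bandit: keep multiplicative weights
        over the k arms, mix in gamma-uniform exploration, and update only
        the pulled arm with an importance-weighted payout estimate
        (payouts are assumed to lie in [0, 1])."""
        w = [1.0] * k
        total = 0.0
        for t in range(T):
            W = sum(w)
            p = [(1 - gamma) * wi / W + gamma / k for wi in w]
            arm = random.choices(range(k), weights=p)[0]
            x = payout(arm, t)  # only the chosen arm's payout is observed
            total += x
            # x / p[arm] is an unbiased estimate of this arm's payout
            w[arm] *= math.exp(gamma * (x / p[arm]) / k)
        return total

    k, T = 10, 20_000
    gamma = min(1.0, math.sqrt(k * math.log(k) / T))
    payout = lambda arm, t: 0.5 + 0.4 * math.sin(arm + t / 3000)  # toy drift
    print("total payout:", exp3(payout, k, T, gamma))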
This is joint work with Varsha Dani.
Bypassing the Embedding: Algorithms for low dimensional metrics
The doubling dimension of a metric is the smallest $k$ such that any
ball of radius $2r$ in the metric can be covered using $2^k$ balls of
radius $r$. This concept for abstract metrics has been proposed as a
natural analog to the dimension of a Euclidean space. If we could
embed metrics with low doubling dimension into low dimensional
Euclidean spaces, they would inherit several algorithmic and
structural properties of the Euclidean spaces. Unfortunately, however,
such a restriction on dimension does not suffice to guarantee
embeddability.
In this talk, we'll explore the option of bypassing the embedding for
several applications. We'll derive results analogous to Euclidean
spaces for low dimensional abstract metric spaces. In particular,
we'll show that low dimensional abstract spaces admit quasipolynomial-time
approximation schemes for various optimization problems, including the
TSP; short approximate distance labels; and compact approximate
shortest-path routing schemes.
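Many of these algorithms rest on r-nets rather than on any embedding; here is a minimal sketch of the greedy net construction (our own illustration): in a metric of doubling dimension k, every ball of radius 2r is covered by 2^k balls of radius r, so nets stay small and can play the role that grids play in Euclidean space.

    def greedy_net(points, r, dist):
        """Greedy r-net: a subset of points pairwise more than r apart such
        that every point lies within r of some net point. In doubling
        metrics the net's size inside any ball is bounded in terms of the
        doubling dimension, which is what the algorithms exploit."""
        net = []
        for p in points:
            if all(dist(p, q) > r for q in net):
                net.append(p)
        return net

    # toy example: points on a line with the absolute-difference metric
    pts = [0.0, 0.3, 0.9, 1.0, 2.2, 2.3, 5.0]
    print(greedy_net(pts, r=0.5, dist=lambda a, b: abs(a - b)))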
Some New Results for An Old Data Structure, Bloom Filters
We present some new results for Bloom filters, a randomized data
structure for set membership queries based on hashing. Bloom
filters require very small amounts of space, but allow false
positives; they may return that a query item is in the set even
if it is not.
We first show that Bloom filters, which normally require k hash functions, can be modified so that only 2 hash functions are required, with no effect on the asymptotic false positive probability. This may simplify the use of Bloom filters in many practical situations.
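Here is a minimal sketch of the two-hash-function idea, using the double-hashing combination g_i(x) = h1(x) + i*h2(x) mod m to simulate k hash functions (the table size, k, and choice of underlying hash are illustrative):

    import hashlib

    class TwoHashBloom:
        """Bloom filter deriving k probe positions from just two hashes,
        g_i(x) = h1(x) + i * h2(x) mod m, with asymptotically the same
        false positive probability as k independent hash functions."""
        def __init__(self, m, k):
            self.m, self.k = m, k
            self.bits = bytearray(m)  # one byte per bit, for clarity
        def _h(self, item):
            d = hashlib.sha256(item.encode()).digest()
            h1 = int.from_bytes(d[:8], "big") % self.m
            h2 = int.from_bytes(d[8:16], "big") % self.m or 1  # avoid h2 = 0
            return h1, h2
        def add(self, item):
            h1, h2 = self._h(item)
            for i in range(self.k):
                self.bits[(h1 + i * h2) % self.m] = 1
        def __contains__(self, item):
            h1, h2 = self._h(item)
            return all(self.bits[(h1 + i * h2) % self.m] for i in range(self.k))

    bf = TwoHashBloom(m=1 << 16, k=5)
    bf.add("hello")
    print("hello" in bf, "world" in bf)  # True, (almost surely) False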
We then show how Bloom filters can be used in conjunction with the power of two choices to design very effective hash tables, suitable for use in routers and other situations that require highly efficient and effective hash tables under strict memory limitations.
Time permitting, we will also present further results of our ongoing
work on the Bloom filter data structure.
Approximation Algorithms for Confluent Flow
Packet flows in IP networks are confluent, meaning that at any
node all the flow for a given destination exits along one edge. To what
extent does this affect the congestion properties of the flow? This
problem has a combinatorial component --- the problem of characterizing
the worst-case gap between ordinary and confluent flow --- as well as an
algorithmic component: the problem of computing a confluent flow which
approximately minimizes congestion. We will present nearly tight
solutions to both problems in the case of single-sink confluent flows, by
demonstrating:
(A) For any single-sink instance of the confluent flow problem, one can compute, in polynomial time, a solution whose congestion is at most 1+ln(k) times the minimum congestion of an ordinary (splittable) flow, where k is the degree of the sink node.
(B) The gap between confluent and splittable flow can be as large as ln(k) in some instances.
(C) The minimum congestion of a confluent flow is Omega(log k)-hard to approximate, even on single-sink instances.
For single-sink confluent flow instances in k-connected graphs, these results can be substantially improved: there exists a confluent flow whose congestion is within a factor of 2 of the optimal splittable flow. Surprisingly, the proof of this fact uses techniques from algebraic topology.
These results suggest many attractive open problems regarding confluent flow. At the end of the talk we will survey some of these problems.
This is joint work with Jiangzhuo Chen, Laszlo Lovasz, Rajmohan
Rajaraman, Ravi Sundaram, and Adrian Vetta.
An Algorithmic Study of Ad Auctions
The main source of the dramatic revenues of Google, Yahoo and MSN is the
innovative auction used by these search engine companies for advertising.
These auctions allow other companies to bid for ads displayed along with
the results of any search keyword.
There are a number of algorithmic and game theoretic issues that arise in
this context. I will talk about several abstractions which lead to very
interesting questions in the context of algorithms and mechanism design. I
will briefly talk about the problem of online allocation of advertising
space to achieve optimal performance. Then I will discuss designing
truthful auctions when the bidders have budget restrictions or when the
goods are arriving in an online fashion.
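As one concrete instance, here is a sketch of a balance-style rule for the online allocation problem mentioned above (the tradeoff function psi(f) = 1 - e^{f-1} comes from the known 1 - 1/e analysis of online budgeted allocation; this is background of ours, not necessarily the algorithm of the talk):

    import math

    def allocate_queries(bids, budgets, queries):
        """Online budgeted allocation: assign each arriving query to the
        bidder maximizing bid * psi(spent fraction), psi(f) = 1 - e^(f-1),
        so bidders with more remaining budget are favored."""
        spent = {a: 0.0 for a in budgets}
        psi = lambda f: 1 - math.exp(f - 1)
        revenue = 0.0
        for q in queries:
            # feasible bidders: those whose remaining budget covers the bid
            feas = [a for a in budgets
                    if q in bids[a] and spent[a] + bids[a][q] <= budgets[a]]
            if not feas:
                continue
            a = max(feas, key=lambda a: bids[a][q] * psi(spent[a] / budgets[a]))
            spent[a] += bids[a][q]
            revenue += bids[a][q]
        return revenue

    bids = {"adv1": {"shoes": 2.0, "books": 1.0}, "adv2": {"shoes": 1.5}}
    budgets = {"adv1": 3.0, "adv2": 4.0}
    print(allocate_queries(bids, budgets, ["shoes", "shoes", "books", "shoes"]))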
Algorithms for path-planning
Consider a FedEx delivery man who has to deliver packages to various
addresses in the city. Each package has a value and a time-window within
which it must be delivered. The goal is to maximize the total value of
packages delivered within their time-windows. "Path-planning" problems
such as the above arise in a variety of applications such as robot
navigation and supply chain management. We present the first
approximation algorithm for this "time-windows" problem, achieving an
O(log^2 n) approximation ratio, as well as a constant-factor
approximation for its special case, Orienteering, in which every package
has the same delivery deadline.
Rational Secure Computation and Ideal Mechanism Design
We put forward and implement Rational Secure Computation, a stronger
notion of secure computation that does not depend on players’ honesty,
but solely on their rationality.
The key to our result is showing that the ballot box---the venerable
device used throughout the world to privately and correctly compute the
tally of secret votes---can actually be used to securely compute ANY
function of secret inputs.
Our work bridges the fields of Game Theory and Cryptography, and has
broad implications for Mechanism Design. In particular, we exhibit a
polynomial-time algorithm that, given ANY social-choice function, finds
a mechanism that implements it, if any exists.
Joint work with Sergei Izmalkov and Matt Lepinski.