SAS is the Stanford Algorithms Seminar (formerly the AFLB -- Algorithms for Lunch Bunch). It is run by Piotr Indyk. Meetings are generally Thursdays at 4:15 p.m.

Talks are held in the Gates Building, near the Main Quad of Stanford's campus. Click here for directions.

## Contents:

1. 24 Sep 1998: Jeffrey D. Oldham (Stanford). Combinatorial Generalized Circulation Algorithms.
2. 5 Nov 1998: Yevgeniy Dodis (MIT). Space-Time Tradeoffs for Graph Properties.
3. 12 Nov 1998: Kasturi Varadarajan (DIMACS). A Divide-and-Conquer Algorithm for Min-Cost Perfect Matching in the Plane.
4. 14 Jan 1999: Abhiram Ranade (IIT Mumbai). On Partitioning Geometrically Embedded Graphs.
5. 21 Jan 1999: SODA'99 week - no talk
6. 4 Feb 1999: Uriel Feige (Weizmann and Compaq SRC). Noncryptographic Selection Protocols.
7. 11 Feb 1999: Ronitt Rubinfeld (Cornell and IBM Almaden). Highly Efficient PCPs for Optimization Problems.
8. 18 Feb 1999: Valerie King (Victoria and Berkeley). Fully Dynamic Algorithms for Directed Graphs.
9. 4 Mar 1999: Madhu Sudan (MIT). Chinese Remaindering with Errors.
10. 11 Mar 1999: Sridhar Rajagopalan (IBM Almaden). Trawling the web for cyber-communities: the Campfire project.
11. 18 Mar 1999: Spring break - no talk.
12. 25 Mar 1999: Adam Klivans (MIT). Graph Nonisomorphism Has Subexponential Size Proofs Unless The Polynomial-Time Hierarchy Collapses.
13. 8 Apr 1999: David Zuckerman (UT Austin and UC Berkeley). TBA.
14. 15 Apr 1999: Lieven Vandersypen (Stanford). Experimental Demonstration of Quantum Computing.
15. 22 Apr 1999: Miklos Ajtai (IBM Almaden). Determinism versus Non-determinism for Linear-time RAMs with memory restrictions.
16. 29 Apr 1999: Santosh Vempala (MIT and UC Berkeley). Fast Algorithms for Low-rank Approximation.
17. 6 May 1999: STOC'99 week - no talk.
18. 13 May 1999: Amir Ronen (Hebrew U). Algorithmic Mechanism Design.
19. 20 May 1999: Soumen Chakrabarti (IIT Mumbai). Focused Crawling: A New Approach to Topic-Specific Web Resource Discovery.
20. 27 May 1999: Edith Cohen (AT&T). Connection caching.

## 24 September 1998

Gates Building 498. 4:15 p.m.

## Jeffrey D. Oldham (Stanford University)

### Combinatorial Generalized Circulation Algorithms

Generalized network circulation problems generalize normal network flow problems by specifying a flow multiplier mu(a) for each arc a. For every unit of flow entering arc a, mu(a) units of flow exit. Using multipliers permit a flow problem to model transforming one type into another, e.g., currency exchange, and modification of the amount of flow, e.g., water evaporation from canals or accrual of interest in bank accounts.

We present a strongly polynomial algorithm for the restricted generalized uncapacitated transshipment problem, a generalized version of the shortest path problem. We present a left-distributive closed semiring which permits use of the Bellman-Ford algorithm to solve this problem given a guess for the value of the optimal solution. Using Megiddo's parametric search scheme, we can compute the optimal value in strongly polynomial time.

Using the restricted generalized uncapacitated transshipment algorithm, we present fully polynomial-time approximation schemes for the generalized versions of the maximum flow problem, the minimum-cost flow problem, the concurrent flow problem, the multicommodity maximum-flow problem, and the multicommodity minimum-cost flow problem. For all of these problems except the maximum flow variant, these combinatorial algorithm schemes are the first polynomial-time algorithms not based on interior point methods.

If there is time, we may present preliminary implementation results for the restricted generalized transshipment problem. The implementation is joint work with Amit Patel.

Email: oldham@cs.stanford.edu
WWW: http://theory.stanford.edu/~oldham/
Related paper: http://Theory.Stanford.EDU/~oldham/publications/generalized/

## 6 November 1998

Gates Building 498. 3:15 p.m.

## Yevgeniy Dodis (MIT)

### Space-Time Tradeoffs for Graph Properties

We initiate a study of space-time tradeoffs in the cell-probe model under restricted preprocessing power. In this setup, we are given the following:

• (a) a source function f(y,q) where y is the static input and q is a dynamic query,
• (b) a function family F for preprocessing, and
• (c) a parameter s indicating the amount of preprocessing space.
The goal is to preprocess the static input y to create a data structure D so that the dynamic queries q can be quickly answered. The data structure D contains s bits of information about y and each bit corresponds to a function in F applied to y. The time'' t to answer a query is measured in terms of the number of bits read from D. Intuitively, this models the situation where the data access is slow/unreliable. We study the dependence between the space s and the time t for a given source function f and preprocessing family F.

Classically, space-time tradeoffs have been studied in this model under the assumption that family F is an unrestricted function family. In this setting, a large gap exists between the expected and the provable lower bounds. Augmenting the model with parameter F that characterizes the preprocessing power, makes for a more realistic computational model and unifies many fundamental questions previously studied in isolated settings. In particular, the extreme settings of our model correspond to the cell probe and the generalized decision tree complexities. Thus our general model combines together the issues of reusing'' limited space and adaptive expressibility'' in computing f. Using graph properties as candidates for the source function f, we develop new techniques for deriving space-time tradeoffs in this model. In doing so, we illustrate various facets of our model and strengthen some existing results.

This is joint work with Sanjeev Khanna (Bell Labs).

Email: yevgen@theory.lcs.mit.edu
WWW: http://theory.lcs.mit.edu/~yevgen/
Related paper:

## 12 November 1998

Gates Building 498. 4:15 p.m.

### A Divide-and-Conquer Algorithm for Min-Cost Perfect Matching in the Plane

Given a set V of 2n points in the plane, the min-cost perfect matching problem is to pair up the points (into n pairs) so that the sum of the Euclidean distances between the paired points is minimized. We present an O(n^(3/2) log^5 n)-time algorithm for computing a min-cost perfect matching in the plane, which is an improvement over the previous best algorithm of Vaidya by nearly a factor of n. Vaidya's algorithm is an implementation of the algorithm of Edmonds, which runs in n phases, and computes a matching with i edges at the end of the i-th phase. Vaidya shows that geometry can be exploited to implement a single phase in roughly O(n^(3/2)) time, thus obtaining an O(n^(5/2) \log^4 n)-time algorithm. We improve upon this in two major ways. First, we develop a variant of Edmonds' algorithm that uses geometric divide-and-conquer, so that in the conquer step we need only O(n^(1/2)) phases. Second, we show that a single phase can be implemented in O(n \log^5 n) time.

Email: krv@dimacs.rutgers.edu
Related paper: FOCS'98

## 14 January 1999

Gates Building 498. 4:15 p.m.

### On Partitioning Geometrically Embedded Graphs

We define the class of geometrically sparse graphs and show that the class contains graphs arising in finite element computations and n-body computations based on the Fast Multipole Method. An extremely simple randomized algorithm is presented for partitioning geometrically sparse graphs. In case of graphs arising from the finite element method in 3 dimensions, our algorithm gives optimal O(N^{2/3}) size bisectors where N is the number of vertices. This matches previous constructions based on a completely different approaches. Our algorithm also gives optimal O(N^{2/3}) size bisectors for graphs arising from the Fast Multipole Method; this improves the previous results.

## 4 February 1999

Gates Building 498. 4:15 p.m.

## Uriel Feige (Weizmann and Compaq SRC)

### Noncryptographic Selection Protocols

We consider leader election in the full information model when almost half the players are faulty (as in Ben-Or and Linial 1989, Alon and Naor 1993, Russel and Zuckerman FOCS98 etc.). Boppana and Narayanan showed that whenever the fraction of good players is bounded away from 1/2, the probability of choosing a good leader is bounded away from 0. We give a simple protocol and a simple proof for this theorem. We also prove that the probability of choosing a good leader tends to 0 as the fraction of good players tends to 1/2, resolving a known open question. Moreover, we show that the rate of convergence is polynomial.

Email: feige@pa.dec.com
Related paper: available upon request.

## 11 February 1999

Gates Building 498. 4:15 p.m.

## Ronitt Rubinfeld (Cornell and IBM Almaden)

### Highly Efficient PCPs for Optimization Problems

Consider the following scenario: A client sends a computational request to a consulting'' company on the internet, by specifying an input and a computational problem to be solved. The company then computes the answer and sends it back to the client. An obvious issue that arises, especially in the case that the company does not have a well established reputation, is: why should the client believe the answer to be correct? We consider the case when it is enough for the client to know that the answer is close to correct. We show that for a number of optimization functions, a very short dialogue between the client (the verifier) and the company (the prover) can convince the client of the approximate correctness of the answer. In fact, in our protocols, the running time of the verifier is significantly less than the size of the input. In contrast, results such as IP=PSPACE, MIP=NEXPTIME and NP=PCP(log n,1) require a verifier which runs in Omega(n) time. We give protocols that convince sublinear time verifiers of approximate lower bounds on the solution quality of fractional packing problems, a subclass of linear programming problems. We give other protocols which allow a prover to convince a polylogarithmic time verifier of the existence of a large cut or a large matching in a graph. Our techniques also apply to several other optimization problems such as bin packing, max flow and scheduling. The protocols are simple to describe, and the talk will be essentially self-contained.

Joint work with Funda Ergun (Bell Labs) and S. Ravi Kumar (IBM Almaden).

## 18 February 1999

Gates Building 498. 4:15 p.m.

## Valerie King (Victoria and Berkeley)

### Fully Dynamic Algorithms for Directed Graphs

A dynamic graph algorithm is a data structure which stores information about an incrementally changing graph and processes queries about a given property of the current graph. A fully'' dynamic graph algorithm permits update operations which change the graph by inserting or deleting an edge, and possibly inserting or deleting a vertex. A dynamic graph algorithm is efficient'' if it performs updates in time faster than that needed to compute the graph property from scratch. The study of dynamic graph problems on undirected graphs has met with much success, for a number of graph properties. Directed graph problems have proven to be much tougher and very little is known. In this talk, I'll try to give an overview of the techniques applicable to directed graphs, as well as explain my recent work with Garry Sagert on maintaining the transitive closure of a directed graph.

Email: val@csr.uvic.ca
WWW: http://csr.uvic.ca/~val/val.html
Related paper: STOC'99, available upon request.

## 4 March 1999

Gates Building 498. 4:15 p.m.

### Chinese Remaindering with Errors

The Chinese Remainder Theorem states that a positive integer m is uniquely specified by its remainder modulo k relatively prime integers p_1,...,p_k, provided m < p_1 x p_2 ... p_k. Thus the residues of m modulo relatively prime integers p_1 < p_2 < ... < p_n form a redundant representation of m if m < p_1 x ... x p_k and k < n. This suggests a number-theoretic construction of an error-correcting code'' that has been implicitly considered often in the past. Such a code may be of use in cryptographic settings as an alternate mechanism (to the polynomial-based scheme) for secret sharing. However to use such a code for sharing secrets in the presence of dishonest verifiers an efficient error-correcting algorithm is required (and was not known previously). In this talk we describe a polynomial-time algorithm for error-correction. Specifically, given n residues r_1,...,r_n and an agreement parameter t, we find a list of all integers m < \prod_{i=1}^k p_i such that (m mod p_i) = r_i for at least t values of i in {1,...,n}, provided t = Omega(sqrt{kn (log p_n)/(log p_1)}). (In English, t may be very small compared to n.) Joint work with Oded Goldreich and Dana Ron.

.

## 11 March 1999

Gates Building 498. 4:15 p.m.

### Trawling the web for cyber-communities: the Campfire project

The web harbors a large number of communities -- groups of content-creators sharing a common interest -- each of which manifests itself as a set of interlinked web pages. Newgroups and commercial web directories together contain of the order of 20000 such communities; our particular interest here is on emerging communities -- those that have little or no representation in such fora. The subject of this talk is the systematic enumeration of over 100,000 such emerging communities from a web crawl: we call our process trawling. We motivate a graph-theoretic approach to locating such communities, and describe the algorithms, and the algorithmic engineering necessary to find structures that subscribe to this notion, the challenges in handling such a huge data set, and the results of our experiment. We also present a probabilistic model for the evolution of the web graph based on our experimental observations. We show that our algorithms run efficiently in this model, and use the model to explain several statistical phenomena on the web that emerged during our experiments.

This is joint work with Ravi Kumar, Prabhakar Raghavan and Andrew Tomkins.

## 25 March 1999

Gates Building 498. 4:15 p.m.

### Graph Nonisomorphism Has Subexponential Size Proofs Unless The Polynomial-Time Hierarchy Collapses

We establish hardness versus randomness trade-offs for a broad class of randomized procedures. In particular, we create efficient nondeterministic simulations of constant round interactive proofs under a complexity theoretic assumption. We show that every language with a bounded round Arthur-Merlin game has subexponential size membership proofs unless the polynomial-time hierarchy collapses. This provides the first strong evidence that graph nonisomorphism has subexponential size proofs.

We extend the pseudo-random generators of Nisan and Wigderson and Impagliazzo and Wigderson to create an "all purpose" derandomization technique applicable to any randomized procedure with the following property: for any fixed input, the complexity of checking whether the procedure succeeds on a given random bit sequence is not too high. To demonstrate the generality of our framework, we derandomize a variety of probabilistic complexity theoretic constructions (in each case assuming a sufficiently strong worst-case hardness assumption) including the Valiant-Vazirani random hashing technique, the construction of matrices with high rigidity, and universal traversal sequences.

Critical to our argument is the "black-box" nature of the proofs found in the construction of the pseudo-random generator of Nisan and Wigderson [NW94]. We show how the ideas from [NW94] and our work relate to the recent breakthrough extractor construction of Trevisan.

Join work with Dieter van Melkebeek.

Email: klivans@math.mit.edu
Related paper: ftp://ftp.eccc.uni-trier.de/pub/eccc/reports/1998/TR98-075/index.html

## 8 April 1999

Gates Building 498. 4:15 p.m.

## David Zuckerman (UT Austin and UC Berkeley)

### Collective Sampling and Leader Election in the Perfect Information Model

This talk concerns three related problems in distributed computing: leader election, collective sampling, and collective coin-flipping. In collective coin-flipping, n players wish to generate a random bit. The difficulty is that some subset of players collude to bias the output of the bit. In the perfect information model, all communication is by broadcast, and the bad players have unlimited computational power and may wait to see the inputs of the good players. Thus, for example, PARITY can be controlled by any 1 player, while in MAJORITY no coalition of O(sqrt n) players can force a particular output to occur with probability 1-o(1). In general, protocols may have many rounds: the players are synchronized between rounds. In leader election, the goal is to elect a good leader with probability bounded away from 0. We give a leader election protocol that is resilient against coalitions of size beta n, for any beta < 1/2, and takes log^* n + O(1) rounds. We will focus on the main ingredient of this protocol: a new collective sampling protocol. For a set S of large enough (polynomial) size, this protocol generates an element s \in S in a single round so that for any subset T of S,

Pr [s \in T] <= |T| |S|^{-alpha(1-beta)}

for a constant alpha > 0. This collective sampling protocol is not subsumed by Feige's recent leader election protocol.

We also give a lower bound for both collective coin-flipping and leader election in the case where each player can broadcast only 1 bit per round (and the good players bits are unbiased). In particular, we show that any protocol with linear resilience must take at least [1/2 - o(1)] log^* n rounds. The primary component of this result is a new bound on the influence of random sets of variables on Boolean functions.

This is joint work with Alex Russell and Mike Saks.

## 15 April 1999

Gates Building 498. 4:15 p.m.

## Lieven Vandersypen (Stanford U)

### Experimental Demonstration of Quantum Computing

Quantum computers can in principle exploit quantum-mechanical effects to perform computations (such as factoring large numbers or searching an unsorted database) more rapidly than classical computers. But loss of coherence, noise, and manufacturing problems make constructing large-scale quantum computers difficult. In this talk, I will describe how we built a simple quantum computer using nuclear spins as quantum bits and how the nuclear spins can be initialized, manipulated and read out using nuclear magnetic resonance techniques. I will then take Grover's algorithm as an example to show how an abstract algorithm can be translated into a sequence of physical operations and will present the results of recent experiments demonstrating quantum computing and quantum error detection.

Email: lieven@snowmass.stanford.edu
WWW: http://www-snow.stanford.edu/~lieven/
Related resources: http://squint.stanford.edu/

## 22 April 1999

Gates Building 498. 4:15 p.m.

### Determinism versus Non-determinism for Linear-time RAMs with memory restrictions

Our computational model is a random access machine with n read only input registers each containing c log n bits of information and a read and write memory. We measure the time by the number of accesses to the input registers. We show that for all k there is an epsilon> 0 so that if n is sufficiently large then the elements distinctness problem cannot be solved in time kn with epsilon n bits of read and write memory, that is, there is no machine with this values of the parameters which decides whether there are two different input registers whose contents are identical. We also show that there is a simple decision problem that can be solved in constant time (actually in two steps) using non-deterministic computation, while there is no deterministic linear time algorithm with epsilon nlog n bits read and write memory which solves the problem. More precisely if we allow kn time for some fixed constant k, then there is an epsilon>0 so that the problem cannot be solved with epsilon nlog n bits of read and write memory if n is sufficiently large. The decision problem is the following:

Find two different input registers, so that the Hamming distance of their contents is at most 1/4 clog n.

1/4 can be replaced by any fixed gamma less than 1/2 if c is sufficiently large with respect to gamma. We actually show that the promise problem :

Decide whether all occurring Hamming distances are greater than (1/2 -gamma)clog n or there is at least one which is smaller than gamma clog n

where gamma>0 is an arbitrarily small constant, cannot be solved by a nonlinear algorithm with the described limitations even if we know that we only get inputs where one of these conditions hold. (In this case epsilon may depend on gamma too).

## 29 April 1999

Gates Building 498. 4:15 p.m.

## Santosh Vempala (MIT and UC Berkeley)

### Fast Algorithms for Low-rank Approximation

We present randomized algorithms for approximating a given m x n matrix by a low-rank matrix. In general, low-rank approximations serve two purposes: they reduce space requirements (always), and they provide more transparent representations (often). Standard methods to compute low-rank approximations via the singular value decomposition take time that is polynomial in m and n. This can be prohibitively expensive for applications such as information retrieval. In contrast, the running time of our first algorithm depends only on the quality of the desired approximation and not on the size of the matrix, but it assumes that the entries of the matrix can be sampled in a specific manner. Our second algorithm needs no assumptions and has a running time that is linear in the number of non-zero entries.

We will then show how to apply the algorithms to text and image retrieval, VLSI design, clustering point sets and searching the web, along with some (preliminary) experimental results.

Joint work with Alan Frieze (CMU), Ravi Kannan, Petros Drineas (Yale), and V.Vinay (IISc).

## 13 May 1999

Gates Building 498. 4:15 p.m.

## Amir Ronen (Hebrew U)

### Algorithmic Mechanism Design

One interesting aspect of algorithms designed for the Internet environment is that the computers that are to participate belong to different users and thus will not necessarily follow the algorithm but may manipulate it according to their self interest. The algorithm designer thus should attempt to design the distributed algorithm in a way that will ensure that the participants' self interest is maximized by behaving correctly.

Following notions from game theory, micro-economics, and, in particular, from mechanism design, we suggest a framework for studying such algorithms. We also suggest extensions to the basic framework -- extensions that are motivated by the algorithmic domain. We apply the standard tools of mechanism design to algorithmic problems and observe their limitations.

We proceed by analyzing a representative problem, task scheduling, that can not be handled by the standard tools. We present both upper and lower bounds as well as theorems regarding several key issues including approximations, the power of randomness, and the implications of our model extensions.

Joint work with Noam Nisan

## 20 May 1999

Gates Building 498. 4:15 p.m.

## Soumen Chakrabarti (IIT Mumbai)

### Focused Crawling: A New Approach to Topic-Specific Web Resource Discovery

The next generation of Web databases will answer questions that combine page contents, meta-data, and hyperlink structure in powerful ways, such as find links from a finance page to an environmental protection page made over the last year.'' A key problem in answering such queries is to discover web resources related to the topics involved in such queries. We demonstrate that for distributed hypertext, a keyword-based find similar'' search based on a giant all-purpose crawler is neither necessary nor adequate for resource discovery. Instead, we propose a system called a "Focused Crawler". The goal of a focused crawler is to selectively seek out pages that are relevant to a pre-defined set of "topics". The topics are specified not using keywords, but using exemplary documents. Rather than collecting and indexing all accessible web documents to be able to answer all possible ad-hoc queries, a focused crawler analyzes its crawl boundary to find the links that are likely to be most relevant for the crawl, and avoids irrelevant regions of the web. This leads to significant savings in hardware and network resources, and helps keep the crawl more up-to-date.

To achieve such goal-directed crawling, we design two hypertext mining programs that guide our crawler: a "classifier" that evaluates the relevance of a hypertext document with respect to the focus topics, and a "distiller" that identifies hypertext nodes that are great access points to many relevant pages within a few links. We report on extensive focused-crawling experiments using several topics at different levels of specificity. Focused crawling acquires relevant pages steadily while standard crawling quickly loses its way, even though they are started from the same root set. Focused crawling is robust against large perturbations in the starting set of URLs. It discovers largely overlapping sets of resources in spite of these perturbations. It is also capable of exploring out and discovering valuable resources that are dozens of links away from the start set, while carefully pruning the millions of pages that may lie within this same radius. Our anecdotes suggest that focused crawling is very effective for building high-quality collections of web documents on specific topics, using modest desktop hardware.

## 27 May 1999

Gates Building 498. 4:15 p.m.

## Edith Cohen (AT&T)

### Connection caching

Communication between clients and servers in the Web is performed by sending HTTP (HyperText Transfer Protocol) messages using TCP (Transmission Control Protocol) connections. When an HTTP request is issued and there is no already-established connection, a new connection is established. The emerging HTTP/1.1 protocol supports connection re-use and states that TCP connections remain open idle in anticipation of future requests, until explicitly closed. Fewer connection-establishments reduce latency and overhead. The number of open connections a host can support, however, is limited. This tradeoff pose connection-management as a caching problem.

A fundamental difference between connection-caching and standard caching is that connection caching involves multiple interacting caches: Each open connection is cached by two hosts, and may be terminated by either; the other node is forced to accept closing decision. Policies may also assume different levels of inter-host communication.

We propose and study a theoretical model for connection caching. While the ordinary off-line caching problem is easily solved in polynomial time, we show that the off-line connection caching problem is NP-complete, and in fact hard to approximate. Single-cache replacement policies can be cast as connection-caching policies when implemented locally at each node. We provide a tight 2-approximation bound for a distributed Belady's algorithm (which is optimal for a single cache). We show that any (on-line) c-competitive single-cache policy can be converted into a distributed (on-line) 2c-competitive connection-caching algorithm. We provide a tight 2k-1 bound for the competitiveness of connection-caching LRU (single-cache LRU is k-competitive), where k is the size of the largest cache in the network. Unlike the situation in a single-cache, LRU is not the best deterministic algorithm: we construct a family of k-competitive connection-caching policies. We also discuss variants of the model allowing different levels of inter-host communication.

This is joint work with Haim Kaplan and Uri Zwick.