SAS is the Stanford Algorithms Seminar (formerly the AFLB -- Algorithms for Lunch Bunch). It is run by Mayur Datar and Kamesh Munagala. Meetings are generally Thursdays at 4:15 p.m.

Talks are held in the Gates Building, near the Main Quad of Stanford's campus. Click here for directions.

## Contents:

1. 7 Oct 1999: Uri Zwick (Tel Aviv University). All Pairs Shortest Paths in Undirected Graphs.
2. 9 Dec 1999: Marek Karpinski (University of Bonn). Sorting by Reversals is Hard to Approximate Within Certain Constant.
3. 6 Jan 2000: Matthew Andrews (Bell Labs). The effects of connection duration on network performance.
4. 7 Jan 2000 (Friday !): Chandra Chekuri (Bell Labs). A PTAS for the Multiple Knapsack Problem.
5. 13 Jan 2000 : Eran Halperin (Tel Aviv Univ.). Approximation algorithms for the vertex cover problem in graphs and hypergraphs.
6. 20 Jan 2000 : Martin Strauss (AT&T Research). Testing and Spot-Checking of Data Streams.
7. 3 Feb 2000 : Andrew Goldberg (Intertrust STAR Labs). Gomory-Hu Algorithms: an Experimental Study.
8. 17 Feb 2000 : Ron Fagin (IBM Almaden). Random Walks with "Back Buttons".
9. 24 Feb 2000 : Andris Ambainis (UC Berkeley). Quantum lower bounds by quantum arguments.
10. 2 Mar 2000 : Sridhar Rajagopalan (IBM Almaden). The web graph: structure and interpretation.
11. 9 Mar 2000 : Anupam Gupta (UC Berkeley). Constant Factor Approximation Algorithms for a Class of Classification Problems
12. 6 Apr 2000 : S. Cenk Sahinalp (Case Western). Approximate Sequence Nearest Neighbours with Block Operations.
13. 13 Apr 2000 : Cynthia Dwork (IBM Almaden). Zaps and Apps.
14. 20 Apr 2000 : T. C. Hu (UC San Diego). Local Indexing Algorithms.
15. 4 May 2000 : Mark Huber (Dept of Statistics, Stanford). A new approach to perfect sampling from nasty distributions.
16. 11 May 2000 : Alon Rosen (Weizmann). On the round-complexity of concurrent zero-knowledge.
17. 18 May 2000 : Robert Krauthgamer (Weizmann) . Approximation algorithms for minimum bisection.
18. 25 May 2000 : Sampath Kannnan (AT&T and Univ of Pennsylvania) . New Instability Results for Contention Resolution Protocols.
19. 8 June 2000 : Dharmendra S. Modha (IBM Almaden) . Clustering Hypertext with Applications to Web Searching.

## 7 October 1999

Gates Building 498. 4:15 p.m.

## Uri Zwick (Tel Aviv University)

### All Pairs Shortest Paths in Undirected Graphs

We show that the ALL PAIRS SHORTEST PATHS (APSP) problem for UNDIRECTED graphs with INTEGER edge weights taken from the range {1,2,...,M} can be solved using only a logarithmic number of DISTANCE PRODUCTS (also known as min/plus products) of matrices with elements in the range {1,2,...,M}. As a result, we get an algorithm for the APSP problem in such graphs that runs in about M*n^omega time, where n is the number of vertices in the input graph, M is the largest edge weight in the graph, and omega<2.376 is the exponent of matrix multiplication. This improves, and also simplifies, an M^{(omega+1)/2}*n^omega time algorithm of Galil and Margalit. The result may also be seen as an extension of Seidel's algorithm for the unweighted version of the problem.
Joint work with Avi Shoshan

## 9 December 1999

Gates Building 498. 4:15 p.m.

## Marek Karpinski (University of Bonn)

### Sorting by Reversals is Hard to Approximate Within Certain Constant

We prove that the problem MIN-SBR of sorting a permutation by the minimum number of reversals is hard to approximate (NP-hard by randomized reductions) within any constant factor less than some explicit threshold. This excludes an existence of a PTAS for this problem under usual assumptions, thus settling a question which was open for some time. The proof method uses certain new explicit approximation hardness techniques for bounded dependency, and bounded degree optimization problems. The MIN-SBR problem has been motivated and extensively studied in computational molecular biology, but existence of PTASs remained an open issue. This problem connects also to the well known problem of sorting by prefix reversals (or, so called, pancake sorting). Our nonapproximability result for MIN-SBR is also in sharp contrast to its signed version for which efficient exact algorithms have been designed recently.

(Joint work with P.Berman.)

## 6 January 2000

Gates Building 498. 4:15 p.m.

## Matthew Andrews (Bell Labs)

### The effects of connection duration on network performance

In this talk we examine the dependence of network performance on the duration of the connections. If the connections have a short duration then the paths can change rapidly, thereby making it more difficult to achieve good performance. We begin by showing that the FIFO scheduling discipline can create network instability even if the connections are fixed (and hence have infinite duration). This extends an earlier result which states that FIFO can be unstable if the connection durations are short. We then consider the Generalized Processor Sharing (GPS) discipline and show that it can be unstable if the connection durations are short. This is in contrast to the fixed connection case where GPS is known to be stable and have good delay properties. For the case of short durations, the best known delay bounds are exponential in the maximum path length of a connection. We show that in fact, the standard method for deriving delay bounds can only give exponential bounds in this case. We also show that a large class of scheduling disciplines can produce exponential delays.

This work is joint with Lisa Zhang and will be presented at SODA '00.

## 7 January 2000

Gates Building 498. 4:15 p.m.

## Chandra Chekuri (Bell Labs)

### A PTAS for the Multiple Knapsack Problem

The Multiple Knapsack problem (MKP) is a natural and well known generalization of the single knapsack problem and is defined as follows. We are given a set of items and bins (knapsacks) such that each item has a profit and a size associated with it, and each bin has its own capacity constraint. The goal is to pack items into the bins so as to maximize the profit of packed items. MKP is a special case of the Generalized Assignment problem (GAP) where the profit and the size of an item can vary based on the bin that it is assigned to. GAP is APX-hard and a $2$-approximation for it is implicit in the work of Shmoys and Tardos. Thus far, this was also the best known approximation for MKP. We will present a polynomial time approximation scheme (PTAS) for MKP. The main technical idea underlying our scheme is an approximation-preserving reduction from a general instance of MKP with $n$ itmes to an instance with $n$ itmes and $O(\log n)$ distinct sizes and profits. We also discuss the complexity of simple generalizations of MKP which turn out to be APX-hard.

Joint work with Sanjeev Khanna.

## 13 January 2000

Gates Building 498. 4:15 p.m.

## Eran Halperin (Tel Aviv Univ)

### Approximation algorithms for the vertex cover problem in graphs and hypergraphs.

We present improved algorithms for finding small vertex covers in bounded degree graphs and hyprgraphs. We use semidefinite programming to relax the problems, and introduce new rounding techniques for these relaxations. On graphs with maximum degree at most \Delta, the algorithm achieves a performance ratio of roughly 2 - 2 \ln\ln \Delta / \ln \Delta for large \Delta, which improves the previously known ratio of 2 - (\ln \Delta + O(1)) / \Delta obtained by Halldorsson and Radhakrishnan. Using similar techniques, we also present improved approximations for the vertex cover problem in hypergraphs. For k-uniform hypergraphs with n vertices, we achieve a ratio of roughly k - (k-1)\ln\ln n / \ln n for large n, and for k-uniform hypergraphs with maximum degree at most \Delta, the algorithm achieves a ratio of roughly k - k(k-1)\ln\ln \Delta / \ln \Delta for large \Delta.

## 20 January 2000

Gates Building 498. 4:15 p.m.

## Martin Strauss (AT&T Research)

### Testing and Spot-Checking of Data Streams.

We consider the tasks of *testing* and *spot-checking* for *data streams*. These testers and spot-checkers are potentially useful in real-time or near real-time applications that process huge datasets. Crucial aspects of the computational model include the space complexity of the testers and spot-checkers (ideally much lower than the size of the input stream) and the number of passes that the tester or spot-checker must make over the input stream (ideally one, because the original stream may be too large to store for a second pass). A sampling-tester [Goldreich-Goldwasser-Ron] for a property P samples some (but usually not all) of its input and, with high probability, outputs PASS if the input has property P and FAIL if the input is *far* from having P, for an appropriate sense of "far." A streaming-tester for a property P of one or more input streams takes as input one or more data streams and, with high probability, outputs PASS if the streams have property P and FAIL if the streams are far from having P. A sampling-tester can make its samples in any order; a streaming-tester sees the input from left to right. We consider the *groupedness property* (a natural relaxation of the sortedness property). We also revisit the sortedness property, first considered in [Ergun-Kannan-Kumar-Rubinfeld-Viswanathan] in the context of sampling spot-checkers, and the property of detecting whether one stream is a permutation of another (either directly or via the SORTED-SUPERSET property, a technical property that is equivalent to PERMUTATION under some conditions). We show that there are properties efficiently testable by a streaming-tester but not by a sampling-tester and other (promise) problems for which the reverse is true.

## 3 February 2000

Gates Building 498. 4:15 p.m.

## Andrew Goldberg (Intertrust STAR Labs)

### Gomory-Hu Algorithms: an Experimental Study.

Gomory-Hu trees, which represent all pair-wise cuts in an undirected graph, have important applications, in particular in reliability theory. In theory, the best algorithms for the problem require n-1 minimum s-t cut computations, where n is the number of vertices in the graph. We conduct an experimental study of two algorithms for the problem, one due to Gomory and Hu and another one due to Gusfield. We show that although obtaining an efficient implementation of the former algorithm is not easy, such an implementation is more robust than a good implementation of Gusfield's algorithm. We also introduce heuristics which in further improve robustness of the Gomory-Hu algorithm.

This is joint work with Kostas Tsioutsiouliklis (Princeton University)

## 17 February 2000

Gates Building 498. 4:15 p.m.

### Random Walks with "Back Buttons"

We introduce "backoff processes", an idealized stochastic model of browsing on the world-wide web, which incorporates both hyperlink traversals and use of the "back button." With some probability the next state is generated by a distribution over out-edges from the current state, as in a traditional Markov chain. With the remaining probability, however, the next state is generated by clicking on the back button, and returning to the state from which the current state was entered by a "forward move". Repeated clicks on the back button require access to increasingly distant history. This process has fascinating similarities to and differences from Markov chains. In particular, like Markov chains, backoff processes always have a limit distribution, and there is a polynomial-time algorithm to compute this distribution. Unlike Markov chains, the limit distribution may depend on the start state.

This talk represents joint research with Anna Karlin, Jon Kleinberg, Prabhakar Raghavan, Sridhar Rajagopalan, Ronitt Rubinfeld, Madhu Sudan, and Andrew Tomkins. The talk will be self-contained.

## 24 February 2000

Gates Building 498. 4:15 p.m.

## Andris Ambainis (UC Berkeley)

### Quantum lower bounds by quantum arguments

Almost all known quantum algorithms can be analyzed in a query model where the algorithm accesses input by querying it in a specific position and the complexity of the algorithm is measured by the number of queries that it makes. The most famous such algorithm is Grover's algorithm for searching an unordered list of $n$ elements in $O(\sqrt{n})$ time. Another example is period-finding which is the basis of Shor's factoring algorithm. In my talk, I will show a new method for proving lower bounds on quantum query algorithms. Traditional methods use a classical adversary that runs the algorithm with one input and then modifies the input. We use a quantum adversary that runs the input with a superposition of inputs. Using this method, we give a new proof that Grover's algorithm is optimal and show $\Omega(\sqrt{n})$ lower bounds for 2 other problems (AND of ORs and inverting a permutation) for which only weaker bounds have been known before. Besides giving better bounds, our method also allows to treat all of these problems in a uniform way. (The previous results were proven via variety of different techniques.)

## 2 March 2000

Gates Building 498. 4:15 p.m.

### The web graph: structure and interpretation.

The study of the web as a graph is not only fascinating in its own right, but also yields valuable insight into web algorithms for crawling, searching and community discovery, and the sociological phenomena which characterize its evolution. We report on experiments on local and global properties of the web graph using two crawls each with over 200M pages and 1.5 billion links. Our study indicates that the macroscopic structure of the web is considerably more intricate than suggested by earlier experiments on a smaller scale.

Collaborators: Andrei Broder, Ravi Kumar, Farzin Maghoul, Prabhakar Raghavan, Raymie Stata, Andrew Tomkins, Eli Upfal and Janet Wiener.

## 9 March 2000

Gates Building 498. 4:15 p.m.

## Anupam Gupta (UC Berkeley)

### Constant Factor Approximation Algorithms for a Class of Classification Problems

n a traditional classification problem, we wish to assign labels from a set $L$ to each of $n$ objects so that the labeling is consistent with some observed data that includes pairwise relationships among the objects. Kleinberg and Tardos recently formulated a general classification problem of this type, the metric labeling problem'', and gave an $O(\log |L| \log \log |L|)$ approximation algorithm for it based on rounding the solution of a linear program.

We consider an important case of the metric labeling problem, in which the metric is the truncated linear metric. This is a natural non-uniform and robust metric which arises in a number of applications, especially in vision. We give a combinatorial 4-approximation algorithm for this metric. Our algorithm is a natural local search algorithm, where the local steps are based on minimum cut computations in an appropriately constructed graph. Our method extends previous work by Boykov, Veksler and Zabih on more restricted classes of metrics.

Joint work with Eva Tardos.

## 6 April 2000

Gates Building 498. 4:15 p.m.

## S. Cenk Sahinalp (Case Western)

### Approximate Sequence Nearest Neighbours with Block Operations.

We study sequence nearest neighbors problem. Let D be a database of n sequences; we would like to preprocess D so that given any on-line query sequence Q we can quickly find a sequence S in D for which d(S,Q) < d(S,T) for any other sequence T in D. Here d(S,Q) denotes the distance between sequences S and Q, and is defined to be the minimum number of edit operations needed to transform one to another (all edit operations will be reversible so that d(S,T) = d(T,S) for any two sequences T and S). These operations correspond to the notion of similarity between sequences in intended application. Such edit operations include character edits (inserts, replacements, deletes etc), block edits (moves, copies, deletes, reversals) and block numerical transformations (scaling by an additive or a multiplicative constant). We present the first known efficient algorithm for approximate'' nearest neighbor search for sequences with preprocessing time and space polynomial in size of D and query time near-linear in size of Q. We assume the distance d(S,T) between two sequences S and T is the minimum number of character edits and block operations needed to transform one to the other; the approximation factor we achieve is O(log l log*^2 l), where l is the size of the longest sequence in D.

Joint work with S. Muthukrishnan.

## 13 April 2000

Gates Building 498. 4:15 p.m.

### Zaps and Apps.

An interactive proof system is witness-independent if it is infeasible to determine which of two or more witnesses is used in performing the proof. This talk introduces the notion of a zap: a two-round (one message from the verifier and a response from the prover) witness-indistinguishable protocol, where the verifier's message can be fixed once-and-for-all" and applied to any instance. We present a zap for every language in NP, and discuss some surprising applications of zaps.

Joint work with M. Naor.

## 20 April 2000

Gates Building 498. 4:15 p.m.

## T. C. Hu (UC San Diego)

### Local Indexing Algorithms.

Many graph and network algorithms,such as DFS, BFS, shortest path, minimum spanning tree, maximum adjacency (Nagamochi & Ibaraki) can be unified into a general framework. We can also find hybrid algorithms. We will discuss results on getting minimum cuts in a network directly.

## 4 May 2000

Gates Building 498. 4:15 p.m.

## Mark Huber (Dept of Statistics, Stanford)

### A new approach to perfect sampling from nasty distributions.

The problem of how to generate samples from high dimensional distributions has applications in a wide variety of areas, from statistical mechanics to selforganizing lists. Commonly, Monte Carlo Markov chain techniques are used, where a Markov chain is run "for a long time". Our approach to these problems is fundamentally different, and requires no knowledge of how long to run the chain. Unlike other perfect sampling techniques, our method is not bound to the classic Markov chains for these problems, and so is the first to acheive linear time algorithms for sampling from these distributions.

## 11 May 2000

Gates Building 498. 4:15 p.m.

## Alon Rosen (Weizmann)

### On the round-complexity of concurrent zero-knowledge

Designing protocol that retain their zero-knowledge properties when executed concurrently is a challenging problem. We present a lower bound on the number of rounds required by Concurrent Zero-Knowledge proofs for languages in NP. It is shown that at least eight rounds of interaction are essential for black-box simulation of non-trivial proof systems (i.e. for languages that are not in BPP). This improves previously known lower bounds, and rules out several candidates for constant-round Concurrent Zero-Knowledge. In particular, we investigate the Richardson-Kilian protocol (which is the only protocol known to be Concurrent Zero-Knowledge in the vanilla model), and show that for an apparently natural choice of its main parameter (which yields a 9-round protocol), the protocol is not likely to become constant-round Concurrent Zero-Knowledge.

## 18 May 2000

Gates Building 498. 4:15 p.m.

## Robert Krauthgamer (Weizmann)

### Approximation algorithms for minimum bisection

A bisection of a graph with $n$ vertices is a partition of its vertices into two sets, each of size $n/2$. The bisection size is the number of edges connecting the two sets. Finding a bisection of minimum size is NP-hard. We devise approximation algorithms for the minimum bisection problem. One algorithm (joint work with Uri Feige and Kobbi Nissim [STOC 2000]) finds a bisection that is within roughly $\sqrt{n}$ of optimal. A more involved algorithm (joint work with Uri Feige [manuscript]) achieves a better approximation ratio which is polylogarithmic in $n$. For planar graphs, we further improve the approximation ratio to $O(\log n)$. No sublinear (in $n$) approximation ratio was previously known for bisection.

## 25 May 2000

Gates Building 498. 4:15 p.m.

## Sampath Kannnan (AT&T and University of Pennsylvania)

### New Instability Results for Contention Resolution Protocols

Contention resolution is the problem of designing protocols for use in a multiple-access channel such as the ethernet. In such a channel, a message gets through as long as it is the only one attempting to use the channel at a particular time. The goal of protocols is some form of stability, for example, the requirement that the channel clears periodically. Commonly studied classes of protocols include acknowledgement-based protocols and the more specialized backoff protocols. In this talk we will show that all backoff protocols are unstable if messages arrive at a rate higher than .42 and all acknowledgement-based protocols are unstable if the arrival rate exceeds .53.

Joint work with Leslie Goldberg, Mike Paterson (Univ. of Warwick) and Mark Jerrum (Univ. of Edinburgh)

## 8 June 2000

Gates Building 498. 4:15 p.m.

## Dharmendra S. Modha (IBM Almaden)

### Clustering Hypertext with Applications to Web Searching

Clustering separates unrelated documents and groups related documents, and is useful for discrimination, disambiguation, summarization, organization, and navigation of unstructured collections of hypertext documents. We propose a novel clustering algorithm that clusters hypertext documents using words (contained in the document), out-links (from the document), and in-links (to the document). The algorithm automatically determines the relative importance of words, out-links, and in-links for a given collection of hypertext documents. We annotate each cluster using six information nuggets: summary, breakthrough, review, keywords, citation, and reference. We employ web searching as an application to illustrate our results. Anecdotally, when applied to the documents returned by AltaVista in responses to the query abduction, our algorithm separates documents about alien abduction'' from those about child abduction.''

This is joint work with W. Scott Spangler and will appear in the ACM Hypertext 2000 Conference.