Stanford Theory Lunch

Stanford Theory Lunch Speaker List
Spring 2004

Thursdays, 12:15pm, in the Theory Lounge (wing 4B in the Gates Computer Science building)

Schedule

Apr 1	Benny Pinkas, HP Labs, Princeton Efficient Private Matching and Set Intersection
Apr 8	T. C. Hu, University of California, San Diego Shortest paths and the Knapsack problem
Apr 15	Gagan Aggarwal Secure computation of the k-th ranked element
Apr 22	Zoe Abrams Set K-Cover Algorithms for Energy Efficient Monitoring in Wireless Sensor Networks
Apr 29	Sriram Sankaranarayanan Generating Invariants for Petri Nets
May 6	Mihaela Enachescu Scale Free Aggregation in Sensor Networks
May 13	Aaron Bradley Analysis of Metabolic Pathways
May 20	Sergei Vassilvitskii Distinct Value Estimation
May 27	Shubha Nabar Symmetric vs. Asymmetric Multiple-Choice Algorithms
Jun 4	Sanatan Rai Talagrand's inequality and stochastic combinatorial optimisation problems

Abstracts

Efficient Private Matching and Set Intersection

Benny Pinkas, HP Labs, Princeton

We consider the problem of computing the intersection of private datasets of two or more parties, where the datasets contain lists of elements taken from a large domain. The computation must be private in the sense that the parties do not learn any information in addition to the intersection itself.

We present protocols, based on the use of homomorphic encryption and balanced hashing, for both semi-honest and malicious environments. For lists of length $k$, we obtain $O(k)$ communication overhead and $O(k\ln\ln k)$ computation. The protocol for the semi-honest environment is secure in the standard model, while the protocol for the malicious environment is secure in the random oracle model.

We also consider the problem of approximating the size of the intersection, show a linear lower-bound for the communication overhead of solving this problem, and provide a suitable secure protocol. Lastly, we briefly discuss some other variants of the matching problem.

Shortest paths and the Knapsack problem

T.C. Hu, University of California, San Diego

The knapsack problem is a NP-complete number problem and the shortest path algorithm of Dijkstra is polynomial. We introduce a new algorithm like the shortest paths to solve all instances of b of the knapsack(all possible capacities of the knaspack).

The time complexity is polynomial on n(the no. of items) and on the weight of the best item,indepdent of b.

Secure computation of the k-th ranked element

Gagan Aggarwal

The problem of secure computation is defined as follows: several parties with private inputs wish to compute a function of their joint inputs, and require that the process of computing the function does not reveal to an adversarial party (or a coalition of such parties) any information that cannot be computed using the input of the adversary and the output of the function. There exist well known solutions for secure computation of any function. The general method employed by these solutions is to construct a combinatorial circuit that computes the required function, and then run a distributed protocol that securely evaluates the circuit. These generic solutions tend to be inefficient for many real-world problems.

In this talk, we focus on a specific problem: Given two or more parties possessing large, confidential datasets, we consider the problem of securely computing the k-th ranked element of the union of the datasets, e.g. the median of the values in the datasets. We investigate protocols with sublinear computation and communication costs. In the two-party case, we show that the k-th ranked element can be computed in log k rounds, where the computation and communication costs of each round are O(log M), where log M is the number of bits needed to represent each element of the input data. The protocol can be made secure against a malicious adversary, and can hide the sizes of the original datasets. In the multi-party setting, we show that the k-th ranked element can be computed in log M rounds, with O(s log M) overhead per round, where s is the number of parties. The multi-party protocol can be used in the two-party case and can also be made secure against a malicious adversary.

This is joint work with Nina Mishra and Benny Pinkas at HP Labs.

Set K-Cover Algorithms for Energy Efficient Monitoring in Wireless Sensor Networks

Zoe Abrams

We present a strategy for energy efficient monitoring in WSNs that partitions the sensors into covers, and then activates the covers iteratively in a round-robin fashion. This approach takes advantage of the overlap created when many sensors monitor a single area. Our work builds upon previous work by Slijepcevic and Potkonjak, where the model is first formulated. We have designed three approximation algorithms for a variation of the SET K-COVER problem, where the objective is to partition the sensors into covers such that the number of covers that include an area, summed over all areas, is maximized. The first algorithm is randomized and partitions the sensors, in expectation, within a fraction 1 - 1/e (~.63) of the optimum. We present two other deterministic approximation algorithms. One is a distributed greedy algorithm with a 1/2 approximation ratio and the other is a centralized greedy algorithm with a 1 - 1/e approximation ratio.

We show that it is NP-Complete to guarantee better than 15/16 of the optimal coverage, indicating that all three algorithms perform well with respect to the best approximation algorithm possible in polynomial time, assuming P not equal NP. Simulations indicate that in practice, the deterministic algorithms perform far above their worst case bounds, consistently covering more than 72% of what is covered by an optimum solution. Simulations also indicate that the increase in longevity is proportional to the amount of overlap amongst the sensors. The algorithms are fast, easy to use, and according to simulations, significantly increase the longevity of sensor networks. The randomized algorithm in particular seems quite practical.

This is joint work with Ashish Goel and Serge Plotkin.

Generating Invariants for Petri Nets

Sriram Sankaranarayanan

Petri nets are useful for modelling concurrent systems. They have been used widely in the modelling of a variety of protocols, complex manufacturing systems, metabolic networks, gene regulatory nets and so on. The analysis of Petri nets seeks to discover key properties like boundedness, reachability, deadlock-freedom, liveness and so on. The classical theory of Petri nets provides answers if the Petri net satisfies some special structural properties. Many analysis problems are undecidable/intractable for "general" Petri nets. In this talk, I will describe a new technique for generating linear invariants for general Petri nets using techniques from linear programming. I will also present some exciting applications to the analysis of biochemical reaction mechanisms.

This is joint work with Henny Sipma and Zohar Manna.

Scale Free Aggregation in Sensor Networks

Mihaela Enachescu

Sensor networks are distributed data collection systems, frequently used for monitoring environments in which ``nearby'' data has a high degree of correlation. This induces opportunities for data aggregation, that are crucial given the severe energy constraints of the sensors. Thus it is very desirable to take advantage of data correlations in order to avoid transmitting redundancy. In our model, we formalize a notion of correlation, that can vary according to a parameter $k$.

We propose a very simple randomized algorithm for routing information on a grid of sensors in a way which promotes data aggregation. We prove that this simple scheme is a constant factor approximation (in expectation) to the optimum aggregation tree {\em simultaneously} for all correlation parameters $k$.

The key idea of our randomized analysis is to relate the expected collision time of random walks on the grid to scale free aggregation.

This is joint work with Ashish Goel, Ramesh Govindan, and Rajeev Motwani.

Analysis of Metabolic Pathways

Aaron Bradley

Computer scientists and control engineers view metabolic pathways as complex nonlinear transition systems. In principle, this perspective facilitates the answering of queries, for example, about reachable or unreachable spaces, about equilibrium points, and about behavior under perturbation. However, the nonlinearity of the rates of reactions precludes direct static analysis for all but the most trivial of systems. Thus, most prior work has analyzed pathways via simulation.

One initial approach to static analysis ignores rate information, modeling just the reactions as a Petri net. This approach yields biologicially relevant invariants; however, these invariants are weak. We propose a linear abstraction of metabolic pathways called a reaction network, which incorporates rate information via concentration partitioning. To address uncertainty in rate or concentration data, the abstraction allows intervals in the system description. Additionally, we present an invariance analysis of reaction networks based on polyhedra.

In the talk, we will discuss the simulation, Petri net, and reaction network analyses in the context of several metabolic pathways. Concurrently, billions of such pathways will be busily working in the audience.

Distinct Value Estimation (Theory & some Practice)

Sergei Vassilvitskii

The number of distinct elements in a dataset plays an important role in query optimization, particularly in large databases; and a correct answer can speed up a query by orders of magnitude. The problem has a rich history and dozens of different heuristic estimators have been proposed. In the first analytical paper on the problem Charikar, Chaudhuri, Motwani & Narasayya showed that the problem is hard and gave a strong lower bound together with a matching algorithm. We make one powerful (yet very reasonable) relaxation by assuming that the values in the dataset follow a Zipfian distribution. This assumption allows us to break the lower bound, and give an estimator with much better error guarantees. Finally, we test the new estimator on several synthetic and real world datasets to see its performance in practice.

This is join work with Rajeev Motwani.

Symmetric vs. Asymmetric Multiple-Choice Algorithms

Shubha Nabar

Abstract from the paper on "Symmetric vs. Asymmetric Multiple-Choice Algorithms" by Berthold Voecking which will be presented:

Multiple-choice allocation algorithms have been studied intensively over the last decade. These algorithms have several applications in areas of load balancing, routing, resource allocation and hashing. The underlying idea is simple and can be explained best in the balls-and-bins model: Instead of assigning balls (jobs, requests or keys) simply at random to bins (machines, servers, or positions in a hash table), choose first a small set of bins at random, inspect these bins and place the ball in to one of the bins containing the smallest number of balls among them.

The simple idea of first selecting a small set of alternatives at random and then making the final choice after careful inspection of these alternatives leads to great improvements against algorithms that place their decisions simply at random. We illustrate the power of this principle in terms of simple balls-and-bins processes. In particular, we study recently presented algorithms that treat bins asymmetrically in order to obtain a better load balancing. We compare the behavior of these asymmetric schemes with symmetrics schemes and prove that asymmetric schemes achieve a better load balancing than their symmetric counterparts.

Talagrand's inequality and stochastic combinatorial optimisation problems

Sanatan Rai

Consider $n$ points distributed iid randomly uniformly in the unit cube. Form the complete graph on these points, with edge weights equal to the Euclidean distance between the terminal vertices. It is well known that the optimal travelling salesman tour converges to median value in probability as $n$ grows large. Talagrand's inequality is an example of a concentration of measure inequality that is very useful for proving such results. In this talk we shall describe Talagrand's inequality and its applications to the TSP and MST problems on $n$ randomly distributed points.

(archive of past announcements)