Efficient Private Matching and Set Intersection
Benny Pinkas, HP Labs, Princeton
We consider the problem of computing the intersection
of private datasets of two or more parties, where the datasets contain lists of elements
taken from a large domain. The computation must be private in the sense that the parties do
not learn any information in addition to the intersection itself.
We present protocols, based on the use of homomorphic encryption and balanced hashing,
for both semi-honest and malicious environments. For lists of length $k$, we obtain $O(k)$
communication overhead and $O(k\ln\ln k)$ computation. The protocol for the semi-honest
environment is secure in the standard model, while the protocol for the malicious
environment is secure in the random oracle model.
We also consider the problem of approximating the size of the intersection, show a
linear lower-bound for the communication overhead of solving this problem, and provide a
suitable secure protocol. Lastly, we briefly discuss some other variants of the matching
problem.
|
Shortest paths and the Knapsack problem
T.C. Hu, University of California, San Diego
The knapsack problem is a NP-complete number problem and the shortest path algorithm of
Dijkstra is polynomial. We introduce a new algorithm like the shortest paths to solve all
instances of b of the knapsack(all possible capacities of the knaspack).
The time complexity is polynomial on n(the no. of items) and on the weight of the best
item,indepdent of b.
|
Secure computation of the k-th ranked element
Gagan Aggarwal
The problem of secure computation is defined as follows: several parties with private
inputs wish to compute a function of their joint inputs, and require that the process of
computing the function does not reveal to an adversarial party (or a coalition of such
parties) any information that cannot be computed using the input of the adversary and the
output of the function. There exist well known solutions for secure computation of any
function. The general method employed by these solutions is to construct a combinatorial
circuit that computes the required function, and then run a distributed protocol that
securely evaluates the circuit. These generic solutions tend to be inefficient for many
real-world problems.
In this talk, we focus on a specific problem: Given two or more parties possessing
large, confidential datasets, we consider the problem of securely computing the k-th ranked
element of the union of the datasets, e.g. the median of the values in the datasets. We
investigate protocols with sublinear computation and communication costs. In the two-party
case, we show that the k-th ranked element can be computed in log k rounds, where the
computation and communication costs of each round are O(log M), where log M is the number
of bits needed to represent each element of the input data. The protocol can be made secure
against a malicious adversary, and can hide the sizes of the original datasets. In the
multi-party setting, we show that the k-th ranked element can be computed in log M rounds,
with O(s log M) overhead per round, where s is the number of parties. The multi-party
protocol can be used in the two-party case and can also be made secure against a malicious
adversary.
This is joint work with Nina Mishra and Benny Pinkas at HP Labs.
|
Set K-Cover Algorithms for Energy Efficient Monitoring in
Wireless Sensor Networks
Zoe Abrams
We present a strategy for energy efficient monitoring in WSNs that partitions the
sensors into covers, and then activates the covers iteratively in a round-robin fashion.
This approach takes advantage of the overlap created when many sensors monitor a single
area. Our work builds upon previous work by Slijepcevic and Potkonjak, where the model is
first formulated. We have designed three approximation algorithms for a variation of the
SET K-COVER problem, where the objective is to partition the sensors into covers such that
the number of covers that include an area, summed over all areas, is maximized. The first
algorithm is randomized and partitions the sensors, in expectation, within a fraction 1 -
1/e (~.63) of the optimum. We present two other deterministic approximation algorithms. One
is a distributed greedy algorithm with a 1/2 approximation ratio and the other is a
centralized greedy algorithm with a 1 - 1/e approximation ratio. We show that it is
NP-Complete to guarantee better than 15/16 of the optimal coverage, indicating that all
three algorithms perform well with respect to the best approximation algorithm possible in
polynomial time, assuming P not equal NP. Simulations indicate that in practice, the
deterministic algorithms perform far above their worst case bounds, consistently covering
more than 72% of what is covered by an optimum solution. Simulations also indicate that the
increase in longevity is proportional to the amount of overlap amongst the sensors. The
algorithms are fast, easy to use, and according to simulations, significantly increase the
longevity of sensor networks. The randomized algorithm in particular seems quite
practical.
This is joint work with Ashish Goel and Serge Plotkin.
|
Generating Invariants for Petri Nets
Sriram Sankaranarayanan
Petri nets are useful for modelling concurrent systems. They have been
used widely in the modelling of a variety of protocols, complex
manufacturing systems, metabolic networks, gene regulatory nets and so
on. The analysis of Petri nets seeks to discover key properties like
boundedness, reachability, deadlock-freedom, liveness and so on. The
classical theory of Petri nets provides answers if the Petri net
satisfies some special structural properties. Many analysis problems
are undecidable/intractable for "general" Petri nets. In this talk, I
will describe a new technique for generating linear invariants for
general Petri nets using techniques from linear programming. I will
also present some exciting applications to the analysis of biochemical
reaction mechanisms.
This is joint work with Henny Sipma and Zohar Manna.
|
Scale Free Aggregation in Sensor
Networks
Mihaela Enachescu
Sensor networks are distributed data collection systems,
frequently used for monitoring environments in which
``nearby'' data has a high degree of correlation. This
induces opportunities for data aggregation, that are
crucial given the severe energy constraints of the
sensors. Thus it is very desirable to take advantage of
data correlations in order to avoid transmitting
redundancy. In our model, we formalize a notion of
correlation, that can vary according to a parameter $k$.
We propose a very simple randomized algorithm for routing
information on a grid of sensors in a way which promotes
data aggregation. We prove that this simple scheme is a
constant factor approximation (in expectation) to the
optimum aggregation tree {\em simultaneously} for all
correlation parameters $k$.
The key idea of our randomized analysis is to relate the
expected collision time of random walks on the grid to
scale free aggregation.
This is joint work with Ashish Goel, Ramesh Govindan, and Rajeev
Motwani.
|
Analysis of Metabolic Pathways
Aaron Bradley
Computer scientists and control engineers view metabolic pathways as
complex nonlinear transition systems. In principle, this perspective
facilitates the answering of queries, for example, about reachable or
unreachable spaces, about equilibrium points, and about behavior under
perturbation. However, the nonlinearity of the rates of reactions
precludes direct static analysis for all but the most trivial of systems.
Thus, most prior work has analyzed pathways via simulation.
One initial approach to static analysis ignores rate information, modeling
just the reactions as a Petri net. This approach yields biologicially
relevant invariants; however, these invariants are weak. We propose a
linear abstraction of metabolic pathways called a reaction network, which
incorporates rate information via concentration partitioning. To address
uncertainty in rate or concentration data, the abstraction allows
intervals in the system description. Additionally, we present an
invariance analysis of reaction networks based on polyhedra.
In the talk, we will discuss the simulation, Petri net, and reaction
network analyses in the context of several metabolic pathways.
Concurrently, billions of such pathways will be busily working in the
audience.
|
Distinct Value Estimation (Theory & some Practice)
Sergei Vassilvitskii
The number of distinct elements in a dataset plays an important role in
query optimization, particularly in large databases; and a correct
answer can speed up a query by orders of magnitude. The problem has a
rich history and dozens of different heuristic estimators have been
proposed. In the first analytical paper on the problem Charikar,
Chaudhuri, Motwani & Narasayya showed that the problem is hard and gave
a strong lower bound together with a matching algorithm. We make one
powerful (yet very reasonable) relaxation by assuming that the values in
the dataset follow a Zipfian distribution. This assumption allows us to
break the lower bound, and give an estimator with much better error
guarantees. Finally, we test the new estimator on several synthetic and
real world datasets to see its performance in practice.
This is join work with Rajeev Motwani.
|
Symmetric vs. Asymmetric Multiple-Choice
Algorithms
Shubha Nabar
Abstract from the paper on "Symmetric vs. Asymmetric Multiple-Choice
Algorithms" by Berthold Voecking which will be presented:
Multiple-choice allocation algorithms have been studied intensively over
the last decade. These algorithms have several applications in areas of
load balancing, routing, resource allocation and hashing. The underlying
idea is simple and can be explained best in the balls-and-bins model:
Instead of assigning balls (jobs, requests or keys) simply at random to
bins (machines, servers, or positions in a hash table), choose first a
small set of bins at random, inspect these bins and place the ball in to
one of the bins containing the smallest number of balls among them.
The simple idea of first selecting a small set of alternatives at random
and then making the final choice after careful inspection of these
alternatives leads to great improvements against algorithms that place
their decisions simply at random. We illustrate the power of this
principle in terms of simple balls-and-bins processes. In particular, we
study recently presented algorithms that treat bins asymmetrically in
order to obtain a better load balancing. We compare the behavior of
these asymmetric schemes with symmetrics schemes and prove that
asymmetric schemes achieve a better load balancing than their symmetric
counterparts.
|
Talagrand's inequality and stochastic combinatorial
optimisation problems
Sanatan Rai
Consider $n$ points distributed iid randomly uniformly in the unit
cube. Form the complete graph on these points, with edge weights equal
to the Euclidean distance between the terminal vertices. It is well
known that the optimal travelling salesman tour converges to median
value in probability as $n$ grows large. Talagrand's inequality is an
example of a concentration of measure inequality that is very useful
for proving such results. In this talk we shall describe Talagrand's
inequality and its applications to the TSP and MST problems on $n$
randomly distributed points.
|
|