We consider hash tables with timeouts. The context is a stream of keyed items that arrive over time. When a new item arrives, we want to know whether it has been seen recently, where "recently" is defined by a fixed lifespan. In contrast to regular hash tables, there is no explicit deletion of keys; a key dies when it has not been seen for a lifespan.
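As a point of reference, here is a minimal Python sketch of this interface, ignoring space reclamation entirely; the class and method names are ours, chosen for illustration, and not meant to reflect any particular implementation from the talk.

    import time

    class TimeoutTable:
        """Reference semantics for a hash table with timeouts: a key is alive
        iff it was last seen within `lifespan` time units."""

        def __init__(self, lifespan):
            self.lifespan = lifespan
            self.last_seen = {}  # key -> time of most recent arrival

        def arrive(self, key, now=None):
            """Process an arriving item; report whether its key was seen recently."""
            now = time.time() if now is None else now
            prev = self.last_seen.get(key)
            seen_recently = prev is not None and now - prev <= self.lifespan
            self.last_seen[key] = now  # each arrival refreshes the key's lifespan
            return seen_recently

The catch is that this plain dictionary never reclaims the space of dead keys; the implementations discussed below differ precisely in how they recover that space.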
We can think of hash tables with timeouts as a model for short-term memory or as a dictionary for a sliding window. Hash tables with timeouts have numerous applications as front ends for processing Internet traffic, e.g., maintaining sessions with timeouts, tracing packets through networks, aggregating traffic information from routers, and helping firewalls reuse decisions from recent packets with the same key. Often we need very fast implementations to keep up with the traffic.
Linear probing is one of the most popular implementations of regular hash tables. It uses simple, portable code and exploits the fact that sequential access is much more efficient than random access on a normal computer. Linear probing first makes a random access to a hash location, and then probes the following cells sequentially. Knuth's 1963 study of the number of probes initiated the area of analysis of algorithms.
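For concreteness, a textbook sketch of linear probing without deletions (our own illustration, assuming the load factor stays below 1 so that probing always terminates):

    class LinearProbingTable:
        """Textbook linear probing, no deletions; assumes the table never fills up."""

        def __init__(self, capacity):
            self.capacity = capacity
            self.keys = [None] * capacity
            self.vals = [None] * capacity

        def _slot(self, key):
            # One random access to the hash location, then cheap sequential probes.
            i = hash(key) % self.capacity
            while self.keys[i] is not None and self.keys[i] != key:
                i = (i + 1) % self.capacity
            return i

        def insert(self, key, val):
            i = self._slot(key)
            self.keys[i], self.vals[i] = key, val

        def lookup(self, key):
            i = self._slot(key)
            return self.vals[i] if self.keys[i] == key else None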
The most popular software implementation of hash tables with timeouts appears to be tumbling windows: time is divided into consecutive windows, each spanning a lifespan. Separate regular hash tables are maintained for the current and for the previous window, so when a new item arrives, we have at most two regular hash tables to consider.
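A sketch of the tumbling-window scheme as we read it (the names and the decision to store exact timestamps are ours; the abstract only fixes the two-table structure):

    class TumblingWindowTable:
        """Two regular hash tables: one for the current window, one for the previous.
        When a window of length `lifespan` ends, the oldest table is discarded."""

        def __init__(self, lifespan, start=0.0):
            self.lifespan = lifespan
            self.window_end = start + lifespan
            self.current = {}   # key -> last arrival time, current window
            self.previous = {}  # key -> last arrival time, previous window

        def _roll(self, now):
            # Advance whole windows until `now` falls inside the current one.
            while now >= self.window_end:
                self.previous, self.current = self.current, {}
                self.window_end += self.lifespan

        def arrive(self, key, now):
            self._roll(now)
            prev = self.current.get(key, self.previous.get(key))
            seen_recently = prev is not None and now - prev <= self.lifespan
            self.current[key] = now
            return seen_recently

Note that space for dead keys is reclaimed only when a whole window is discarded, which is why two tables must be kept around at any time.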
We propose time-reversed linear probing tables to deal with timeouts. In many ways this is a perfect adaptation of regular linear probing in which keys are cleared at no cost. We inherit the theoretical bounds known for single regular insertions---no deletions are needed. We performed an experimental comparison with alternatives such as tumbling windows. Our time-reversed linear probing is simpler, faster, and more space-efficient. Relative to tumbling windows, we use only half the space while gaining 20--28% in speed. Our cost per item is close to that of a single random memory access.
In contrast to tumbling windows, our time-reversal works directly in the more general case where different items may have different lifespans.
This talk is based on joint work with Chinmay Karande, Pushkar Tripathi and Wang Lei.
Linial, London, and Rabinovich, and independently Aumann and Rabani, proved that the min-cut max-flow ratio for general maximum concurrent flow problems (when there are $k$ commodities) is $O(\log k)$. This ratio is independent of the size of the graph and depends only on the number of commodities. Here we attempt to derive a more general theory of Steiner cut and flow problems. Our structural results are motivated by the following meta question: suppose we are given a $poly(\log n)$ approximation algorithm, integrality gap, or competitive ratio for a flow or cut problem; when can we give a $poly(\log k)$ guarantee for a generalization of this problem to a Steiner cut or flow problem?
Thus we require that these guarantees be independent of the size of the graph, and only depend on the number of commodities (or the number of terminal nodes in a Steiner cut problem). For many natural applications (when $k = n^{o(1)}$) this yields much stronger guarantees.
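For concreteness, the flow/cut gap above can be stated as follows (a standard formulation; the notation here is ours): for a maximum concurrent flow instance with capacities $c$ and $k$ commodities with demands $d$,
\[
\lambda^{*} \;\le\; \min_{S \subseteq V} \frac{c(S, \bar{S})}{d(S, \bar{S})} \;\le\; O(\log k) \cdot \lambda^{*},
\]
where $\lambda^{*}$ is the maximum concurrent flow, $c(S,\bar{S})$ is the capacity crossing the cut $(S,\bar{S})$, and $d(S,\bar{S})$ is the demand separated by it.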
We construct vertex sparsifiers that approximately preserve the value of all terminal multicommodity flows. We prove that such sparsifiers exist using zero-sum games and metric geometry, and we construct them using oblivious routing guarantees. These results let us reduce a broad class of multicommodity-type problems to a uniform case (on $k$ nodes) at the cost of a $poly(\log k)$ factor loss in the approximation guarantee. We then give $poly(\log k)$ approximation algorithms for a number of problems for which such results were previously unknown, such as requirement cut, $\ell$-multicut, oblivious $0$-extension, and natural generalizations of oblivious routing, min-cut linear arrangement, and minimum linear arrangement.
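One standard way to make "approximately preserves the value of all terminal multicommodity flows" precise (the notation is ours): given $G=(V,E)$ with terminal set $K \subseteq V$, $|K| = k$, a graph $H$ on vertex set $K$ is a flow sparsifier of quality $q \ge 1$ if for every demand function $d$ on pairs of terminals,
\[
\lambda_{G}(d) \;\le\; \lambda_{H}(d) \;\le\; q \cdot \lambda_{G}(d),
\]
where $\lambda_{G}(d)$ denotes the maximum concurrent flow for demands $d$ in $G$. The $poly(\log k)$ losses mentioned above correspond to sparsifiers with $q = poly(\log k)$.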