CS 361A - Autumn Quarter 2005-06
(Advanced Data Structures and Algorithms)


 

News Flash

Please note the change in office hours for Dilys Thomas.

 

Administrivia
        Instructor:   Rajeev Motwani
        Teaching Assistant: Dilys Thomas (dilys@stanford.edu)
        Class Schedule: Mon/Wed, 3:15-4:30, Gates B08

        Office Hours:
                Dilys Thomas: Tue/Thu 2-4pm (Location: Gates 482, Phone: 723-4532)
                Rajeev Motwani: Mon/Wed, 4:30pm (right after the lectures)

Course URL:  http://theory.stanford.edu/~rajeev/cs361.html

 

Class Sign-up: To sign up for this course, please send email to Dilys Thomas with the following information: name, department, status (PhD/MS/UG, year), area (Databases, Systems, Theory, etc.), and email address. Please also include your registration status (credit, pass/fail, or audit).

Mailing Lists and Newsgroup: We have set up a class mailing list to help you get the latest information regarding the class. The email lists are auto-populated using current course enrolment information. The main list is cs361a-aut0506-all@lists.stanford.edu. Those who audit the course can subscribe by sending an email to majordomo@lists.stanford.edu with the following text in the body of the mail: subscribe cs361a-aut0506-guests. Those who are auditing the course and filled in the signup sheet in class on September 28 have already been added to the guest list. We also have a newsgroup, su.class.cs361a, for the class.

Grading: Since this course will be treated as a graduate research seminar, we expect that most students will register pass/fail (and not for a letter grade). If you do choose to sign up for a letter grade, be sure to mention this in your sign-up email to the TA. We will give out 3-4 homeworks, one of which will serve as a take-home midterm exam. There will be no final exam. The scores on these homeworks as well as class participation will determine your final grade.

 

Course Overview

Efficient strategies for complex data-structuring problems are essential in the design of fast algorithms for a variety of applications, including combinatorial optimization, databases and data mining, information retrieval and web search, and geometric applications. We will give a systematic exposition of the central ideas in the design of such data structures. The second main theme of this course will be the design and analysis of online algorithms and data stream algorithms. The field of competitive analysis of online algorithms got its start in the amortized analysis for data structures and forms a natural extension of some of the ideas we will discuss in the earlier part of the course. We will present some of the main ideas and motivating applications for this class of algorithms. Time permitting, we will also cover some topics in the related area of algorithms and data structures in the stream model of computation. The material to be covered will be drawn from the following list:

Advanced Data Structures: hash tables (universal hashing, perfect hashing, locality-sensitive hashing, Bloom filters); data structures for combinatorial optimization (union-find, Fibonacci heaps, dynamic trees, dynamic graph structures); self-adjusting data structures (lists, splay trees); search trees (red-black trees, self-adjusting trees, treaps, skip lists, finger search trees, biased search trees); fault-tolerant and persistent data structures; suffix trees and string searching; databases/data-mining/data-stream (histograms, indexes, hashing, synopses and sketches, sliding windows); geometric and kinetic data structures.

Online and Data Stream Algorithms: paging/caching problems; abstractions (k-server problem, request-answer games, and metrical task systems); scheduling and load balancing; network algorithms; data migration/replication in distributed computing; stream algorithms and data structures for database problems.

This course should be of interest to graduate students in computer science and related fields, especially those with a mathematical bent of mind. We will assume familiarity with basic material in algorithms, combinatorics, and probability theory (at the level of the core undergraduate courses on these topics).

 

Handouts and Homeworks

 

Handout  Date         Topic                             Download
1        Mon, Sep 26  Reading List                      ps or pdf
2        Mon, Sep 26  Notes for Lectures 1 & 2          ps or pdf
3        Mon, Oct 3   Notes for Lectures 3 & 4          hard-copy only
4        Mon, Oct 10  Notes for Lecture 5               hard-copy only
5        Mon, Oct 10  Notes for Lecture 6               hard-copy only
6        Mon, Oct 10  Homework 1                        ps or pdf
7        Wed, Oct 12  Notes for Lectures 7 & 8          hard-copy only
9        Wed, Oct 19  Notes for Lectures 9 & 10         hard-copy only
10       Mon, Oct 24  Solutions for Homework 1          hard-copy only
11       Mon, Oct 24  Homework 2                        ps or pdf
12       Wed, Oct 26  Notes for Lecture 11              hard-copy only
13       Wed, Oct 26  Notes for Lecture 12              hard-copy only
14       Wed, Nov 2   Notes for Lecture 13              hard-copy only
15       Wed, Nov 2   Notes for Lecture 14              hard-copy only
16       Wed, Nov 9   Additional slides for Lecture 14  ppt
17       Wed, Nov 9   Notes for Lecture 15              ppt
18       Wed, Nov 9   Notes for Lecture 16/17           ppt
19       Mon, Nov 14  Solutions for Homework 2          hard-copy only
20       Mon, Nov 14  Homework 3                        ps or pdf
21       Mon, Nov 28  Notes for Lecture 18              ppt
22       Wed, Nov 30  Notes for Lecture 19              ppt
23       Wed, Nov 30  Notes for Lecture 20              ppt
24       Wed, Nov 30  Homework 4                        ps or pdf

 

 

Lecture Schedule

Lecture  Date         Topic                                              Lecture Notes
1        Mon, Sep 26  Should Tables be Sorted?                           Handout 2 (ps, pdf)
2        Wed, Sep 28
3        Mon, Oct 3   Hashing: Universal and Perfect                     Handout 2 (ps, pdf)
4        Wed, Oct 5                                                      Handout 3 (hard-copy only)
5        Mon, Oct 10  Amortization and List Update                       Handout 4 (hard-copy only)
6        Wed, Oct 12  Disjoint Sets and Union-Find                       Handout 5 (hard-copy only)
7        Mon, Oct 17  Competitive Analysis and Paging                    Handout 7 (hard-copy only)
8        Wed, Oct 19
9        Mon, Oct 24  Randomized Online Algorithms                       Handout 9 (hard-copy only)
10       Wed, Oct 26
11       Mon, Oct 31  Self-Adjusting Search Trees                        Handout 12 (hard-copy only)
12       Wed, Nov 2   Treaps: Randomized Search Trees                    Handout 13 (hard-copy only)
13       Mon, Nov 7   Skip Lists                                         Handout 14 (hard-copy only)
14       Wed, Nov 9   Caching Queues                                     Handout 16 (ppt)
                      Self-Adjusting & Fibonacci Heaps                   Handout 15 (hard-copy only)
15       Mon, Nov 14  Hashing for Massive/Streaming Data                 Handout 17 (ppt)
16       Wed, Nov 16  Synopses, Samples, and Sketches                    Handout 18 (ppt)
17       Mon, Nov 28
18       Wed, Nov 30  Fingerprints, Min-Hashing and Document Similarity  Handout 21 (ppt)
19       Mon, Dec 5   Near Neighbors                                     Handout 22 (ppt)
20       Wed, Dec 7   Data Mining: Association Rules                     Handout 23 (ppt)

 

 

Reading List

Text-books: There is no required text-book for this course, but the following may be useful for some of the material we plan to cover. At least the first three are recommended (but not required); the rest are useful reading if you are interested in delving further into the topics.

Randomized Algorithms, R. Motwani and P. Raghavan, Cambridge University Press, 1995.

Data Structures and Network Algorithms, R.E. Tarjan, SIAM, 1983.

Online Computation and Competitive Analysis, A. Borodin and R. El-Yaniv, Cambridge University Press, 1998.

Introduction to Algorithms, T.H. Cormen, C.E. Leiserson, R.L. Rivest, and C. Stein, McGraw-Hill, 2002.

Managing Gigabytes: Compressing and Indexing Documents and Images, I.H. Witten, A. Moffat and T.C. Bell, Morgan Kaufmann, 1999.

Modern Information Retrieval, Baeza-Yates and Ribeiro-Neto, Addison-Wesley, 1999.

 

Lectures 1 and 2 - Should tables be sorted?

Lectures 3 and 4 - Hashing: Universal and Perfect

 

Lecture 5 - Amortization and List Update Problem

 

Lecture 6 - Disjoint Sets and Union-Find

 

Lectures 7 and 8 - Competitive Analysis and Paging

 

Lectures 9 and 10 - Randomized Online Algorithms

 

Lecture 11 - Self-Adjusting Search Trees

 

Lecture 12 - Treaps: Randomized Search Trees

 

Lecture 13 - Skip Lists

 

Lecture 14 (part 1) - Caching Queues

 

Lecture 14 (part 2) - Self-Adjusting and Fibonacci Heaps

 

Lecture 15 - Hashing for Massive/Streaming Data

 

Lectures 16 and 17 - Synopses, Samples, and Sketches

Lecture 18 - Fingerprints, Min-Hashing, and Document Similarity

 

Lecture 19 - Near Neighbors

  Nearest Neighbors

    Random Projections

 

Lecture 20 - Data Mining: Association Rules