CS 369
A Study of Perturbation Techniques for Data Privacy


Instructors:    Cynthia Dwork, Nina Mishra, and Kobbi Nissim
Class Schedule: Tuesdays, 1:00-3:00pm, Gates 259
Office Hours:   email to schedule
Number of Units: 2

Course Overview

The digital age has enabled widespread access to and collection of data. While there are several advantages to ubiquitous access to data, there is also the potential for breaching the privacy of individuals.

A statistical database typically stores personal information about n individuals, where n is usually very large. A statistical database system lets users obtain aggregate statistical information (such as medians, averages, and counts) while preserving the privacy of individuals. Typical applications include medical, financial, and census data.

The course will study techniques for simultaneously enabling access to aggregate data and preserving privacy. Data perturbation is a classical technique for solving this problem, and it comes in two flavors. In the first, the data are perturbed once and the perturbed values are published. In the second, the data are held secret; the database algorithm computes the true response to each query and adds noise to it, reporting only the noisy answer. Both versions of the problem have a rich literature.
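The two flavors of perturbation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the salary values, and the choice of Gaussian noise are hypothetical and are not taken from the course materials.

```python
import random
import statistics


def perturb_inputs(values, noise_scale, rng):
    """Input perturbation: add noise to each record once.

    The noisy records are what gets published; the originals stay private.
    Gaussian noise is an assumption made for this sketch.
    """
    return [v + rng.gauss(0.0, noise_scale) for v in values]


def noisy_mean(values, noise_scale, rng):
    """Output perturbation: answer a mean query on the secret data.

    The true mean is computed internally, and only the true answer
    plus noise is reported to the user.
    """
    return statistics.fmean(values) + rng.gauss(0.0, noise_scale)


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical private records (for example, salaries).
salaries = [52_000.0, 61_500.0, 48_250.0, 75_000.0, 58_900.0]

# Flavor 1: perturb once, publish the perturbed values; anyone can query them.
published = perturb_inputs(salaries, 1000.0, rng)
print("mean of published (perturbed) data:", statistics.fmean(published))

# Flavor 2: keep the data secret and add fresh noise to each query answer.
print("noisy answer to a mean query:", noisy_mean(salaries, 1000.0, rng))
```

In the first flavor the noise is drawn once and fixed in the published data; in the second, each query receives its own noise, so repeated queries return different answers.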
The goal of the course is to

Background in probability, statistics, cryptography, and algorithms would be helpful.


The course can be taken only pass/fail, not for a letter grade. There will be no exams. The grade will be based on a presentation given on the last day of class (June 1) on a paper selected from the list below. Please tell the instructors which paper(s) you plan to present by May 27, 2004. If you'd like to discuss a paper that's not on the list, please email the instructors.





Week Dates  Topics                                                    Lecturers
March 30    Introduction to Privacy - Motivating Applications         Kobbi Nissim
            Introduction to Cryptography, Secure Function Evaluation
April 6     Introduction to Cryptography, Semantic Security           Kobbi Nissim
            Query Restriction                                         Nina Mishra
April 13    Query Auditing                                            Nina Mishra
April 20    Sampling Contingency Tables                               Cynthia Dwork
                                                                      Persi Diaconis
April 27    Cell Suppression                                          Nina Mishra
May 4       Input Perturbation                                        Cynthia Dwork
May 11      Input Perturbation: Randomization Approach                Kobbi Nissim
            Limits of Perturbation, Output Perturbation
May 18      Output Perturbation                                       Kobbi Nissim
May 25      Synthetic Datasets                                        Cynthia Dwork
June 1      Students discuss their writeups



Reading List



Secure Function Evaluation

Semantic Security

Query Restriction/Auditing

Sampling Contingency Tables

Cell Suppression

Data Swapping

Input Perturbation

Output Perturbation

Synthetic Datasets