CS 369
A Study of Perturbation Techniques for Data Privacy
Instructors: Cynthia Dwork, Nina Mishra, and Kobbi Nissim
Class Schedule: Tuesdays, 1-3:00pm, Gates 259
Office Hours: email to schedule
Number of Units: 2
The digital age has enabled widespread access to and collection of data. While there are several advantages to ubiquitous access to data, there is also the potential for breaching the privacy of individuals.
A statistical database typically stores personal information about n individuals, where n is usually very large. A statistical database system aims to let users obtain aggregate statistical information (such as medians, averages, and counts) while preserving the privacy of individuals. Typical applications include medical, financial, and census data.
The course will study techniques for simultaneously enabling access to aggregate data and preserving privacy. Data perturbation is a classical technique for solving this problem, and it comes in two flavors. In the first, the data are perturbed once, and the perturbed values are published. In the second, the data are held secret; the database algorithm computes the true response to each query and adds noise to it, reporting only the noisy answer. Both versions of the problem have a rich literature.
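To make the two flavors concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a method taken from the course: the function names `perturb_input` and `perturb_output`, the choice of zero-mean Gaussian noise, the `scale` parameter, and the sample values are all hypothetical, and the noise is not calibrated to any particular privacy definition.

```python
import random
from statistics import mean


def perturb_input(data, scale, rng):
    """Input perturbation: add noise to every record once.

    The returned list is what would be published; the original
    values are never released. Gaussian noise is an assumption.
    """
    return [x + rng.gauss(0.0, scale) for x in data]


def perturb_output(data, query, scale, rng):
    """Output perturbation: the data stay secret.

    The true answer to `query` is computed on the raw data and
    noise is added before the answer is reported.
    """
    return query(data) + rng.gauss(0.0, scale)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
salaries = [52.0, 61.5, 48.0, 75.0, 66.5]  # hypothetical records

# Flavor 1: publish a perturbed copy; users compute statistics on it.
published = perturb_input(salaries, scale=5.0, rng=rng)
mean_from_published = mean(published)

# Flavor 2: keep the data secret; answer each query with fresh noise.
mean_from_query = perturb_output(salaries, mean, scale=5.0, rng=rng)
```

In the first flavor the noise is fixed at publication time, so every later statistic is computed from the same perturbed values. In the second, each query draws fresh noise, which is one reason repeated queries need separate analysis.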
The goals of the course are to
- understand the different definitions of data privacy
- understand the techniques for achieving privacy
- assess to what extent the suggested measures actually provide data privacy
- suggest new definitions of privacy
- suggest new algorithms for providing privacy
Background in probability, statistics, cryptography, and algorithms would be helpful.
Grading
The course can be taken only pass/fail, not for a letter grade. There will not be any exams. The grade will be based on a presentation given on the last day of class (June 1st) on a paper selected from the list below. Please tell the instructors which paper(s) you plan to present by May 27, 2004. If you'd like to discuss a paper that's not on the list, please email the instructors.
Query Restriction/Auditing
Input Perturbation
Week | Dates | Topic | Lecturer |
1 | March 30 | Introduction to Privacy - Motivating Applications; Introduction to Cryptography, Secure Function Evaluation | Kobbi Nissim |
2 | April 6 | Introduction to Cryptography, Semantic Security; Query Restriction | Kobbi Nissim, Nina Mishra |
3 | April 13 | Query Auditing | Nina Mishra |
4 | April 20 | Sampling Contingency Tables | Cynthia Dwork, Persi Diaconis |
5 | April 27 | Cell Suppression | Nina Mishra |
6 | May 4 | Input Perturbation | Cynthia Dwork |
7 | May 11 | Input Perturbation: Randomization Approach; Limits of Perturbation, Output Perturbation | Kobbi Nissim |
8 | May 18 | Output Perturbation | Kobbi Nissim |
9 | May 25 | Synthetic Datasets | Cynthia Dwork |
10 | June 1 | Students discuss their writeups | All |
Books/Surveys
Applications
Cryptography
Secure Function Evaluation
Semantic Security
Query Restriction/Auditing
Sampling Contingency Tables
Cell Suppression
Data Swapping
Input Perturbation
Output Perturbation
Revealing Information while Preserving Privacy. I. Dinur and K. Nissim. PODS 2003.
Privacy Preserving Data Mining on Vertically Partitioned Databases. C. Dwork and K. Nissim. Manuscript. 2004.
Synthetic Datasets