My interests are in the design and analysis
of algorithms for unearthing patterns in massive, dynamic data sets.
There are several aspects of data mining that I find particularly intriguing:
- Internet: Many new data sets are generated by virtue of the Internet. My
work investigates algorithms for mining this data, for example, to improve
web search or to discover communities in social networks.
- Scalability: Modern data sets are massive and often
streaming. Given a known algorithm that works only on a small, static
data set, I study how best to adapt it to a large, dynamic data set
while approximately retaining its original functionality.
- Clustering: Given a collection of points, clustering groups
similar points together and places dissimilar points apart. The points
can range from vertices in a graph to points in a metric space. I study
algorithms for efficiently discovering good clusters.
- Privacy: Many data sets that are mined today contain confidential
information. My research seeks to strike a balance between
enabling the discovery of large-scale statistical patterns
and preventing the recovery of private information.
Keywords: Data Mining, Machine Learning, and Privacy-Preserving Algorithms