Nina Mishra


Search Labs, Microsoft Research
Former Associate Professor

Department of Computer Science
University of Virginia

 

Research Interests
My interests are in the design and analysis of algorithms for unearthing patterns in massively large, dynamic datasets.   There are several aspects of data mining that I find particularly intriguing:
  • Internet: Many new data sets are generated by virtue of the Internet.  My work investigates algorithms for mining this data, for example, to improve web search or to discover communities in social networks.
  • Scalability: Modern data sets are massively large and often streaming.  Given a known algorithm that only works on a small, static data set, I think about how best to modify the algorithm for a large, dynamic data set, while also approximately retaining the original functionality.
  • Clustering: A process that, given a collection of points, groups similar points together and places dissimilar points apart.  The points can vary from vertices in a graph to points in a metric space.  I study algorithms for efficiently discovering good clusters.
  • Privacy: Many data sets that are mined today contain confidential information.  My research seeks to strike a fine balance between simultaneously enabling the discovery of large-scale statistical patterns while disabling the recovery of private information.


Keywords: Data Mining, Machine Learning, and Privacy-Preserving Algorithms

 

Biography
 

Search Labs, Microsoft Research. 2007-present.
Associate Professor, CS Department, University of Virginia, 2005-2008.
Acting Faculty, CS Department, Stanford University, 2002-2005.
Senior Research Scientist, HP Labs, 1997-2005.
PhD Computer Science, University of Illinois at Urbana-Champaign, 1997.

 


Edited Proceedings  
Theoretical Advances in Data Clustering.  Special Issue of the Machine Learning Journal.  With Rajeev Motwani.  2004.

Proceedings of the Twentieth International Conference on Machine Learning.  With Tom Fawcett.  2003.

Journal Articles  

Secure Computation of the Median (and Other Elements of Specified Ranks).
With Gagan Aggarwal and Benny Pinkas.
Journal of Cryptology.
To Appear.

Finding Strongly-Knit Clusters in Social Networks.
With Robert Schreiber, Isabelle Stanton and Robert E. Tarjan.
Internet Mathematics.
5(2):155-174, 2009.

Privately Finding Specifications.
With Westley Weimer.
IEEE Transactions on Software Engineering.
34(1):21-32, 2008.

A New Conceptual Clustering Framework.
With Dana Ron and Ram Swaminathan.
Machine Learning journal.  Kluwer Academic Publishers.
56 (1-3): 115-151, 2004.

Version Spaces and the Consistency Problem.
With Haym Hirsh and Leonard Pitt.
Artificial Intelligence Journal (AIJ).
156(2):115-138, 2004.

Clustering Data Streams: Theory and Practice.
With Sudipto Guha, Adam Meyerson, Rajeev Motwani and Liadan O'Callaghan.
IEEE Transactions on Knowledge and Data Engineering (IEEE TKDE)
Special issue on Online Analysis and Querying of Continuous Data Streams.
15(3): 515-528, 2003.

Efficient Read-Restricted Monotone CNF/DNF Dualization by Learning with Membership Queries.
With Carlos Domingo, Leonard Pitt.
Machine Learning journal. Kluwer Academic Publishers.
37(1): 89-110 (1999).

Learning from a Consistently Ignorant Teacher.
With Michael Frazier, Sally A. Goldman, Leonard Pitt.
Journal of Computer and System Sciences (JCSS).
Special Issue on Computational Learning Theory.
52(3): 471-492 (1996).


Conference/Workshop/Other  
 

Adapting to the Shifting Intent of Search Queries
with Umar Syed and Alex Slivkins.
Neural Information Processing Systems (NIPS). 2009.

Releasing Search Queries and Clicks Privately
with Aleksandra Korolova, Krishnaram Kenthapadi and Alex Ntoulas.
Proceedings of the 18th International World Wide Web Conference (WWW). 2009.

Generating Labels from Clicks
with Rakesh Agrawal, Alan Halverson, Krishnaram Kenthapadi and Panayiotis Tsaparas.
Proceedings of the 2nd ACM International Conference on Web Search and Data Mining (WSDM). 2009.

Spatial Scan Statistics for Graph Clustering
with Bei Wang, Jeff Phillips, Robert Schreiber, Dennis Wilkinson and Robert E. Tarjan.
SIAM Conference on Data Mining. 2008.

A Survey of Query Auditing Techniques for Data Privacy
with Shubha Nabar, Krishnaram Kenthapadi and Rajeev Motwani.
Book chapter in Privacy-Preserving Data Mining: Models and Algorithms.
To be published by Kluwer Academic Publishers. 2008.

Clustering Data Streams
with Sudipto Guha.
Book chapter in Data Stream Management: Processing High-Speed Data Streams.
To be published by Springer-Verlag. 2007.

Clustering Social Networks
with Robert Schreiber, Isabelle Stanton and Robert E. Tarjan.
5th Workshop On Algorithms And Models For The Web-Graph (WAW). 2007.

Towards Robustness in Query Auditing
with Shubha Nabar, Bhaskara Marthi, Krishnaram Kenthapadi and Rajeev Motwani.
32nd International Conference on Very Large Data Bases (VLDB). 2006.

When Random Sampling Preserves Privacy
with Kamalika Chaudhuri.
26th Annual International Cryptology Conference (CRYPTO). 2006.

Privacy via Pseudorandom Sketches
With Mark Sandler.
25th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS). 2006.

Finding Closely-Related Groups of Objects in Very Large Datasets
With Robert Schreiber and Robert Tarjan.
Hewlett-Packard Technical Conference (HP TechCon), 2006.

Sublinear Projective Clustering with Outliers
with Rajeev Motwani and Sergei Vassilvitskii.
15th Annual Fall Workshop on Computational Geometry and Visualization, 2005.

Simulatable Auditing
With Krishnaram Kenthapadi and Kobbi Nissim.
24th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS), 2005.
Earlier version "How Auditors May Inadvertently Compromise Your Privacy" appeared in PORTIA Workshop on Sensitive Data in Medical, Financial, and Content-Distribution Systems, 2004.

Enabling Privacy for the Paranoids (Vision Paper)
With Gagan Aggarwal, Mayank Bawa, Prasanna Ganesan, Hector Garcia-Molina, Krishnaram Kenthapadi, Rajeev Motwani, Utkarsh Srivastava, Dilys Thomas, Jennifer Widom and Ying Xu
Proceedings of the 30th Intl Conference on Very Large Databases (VLDB), 2004.

On Identifying Stable Ways to Configure Systems.
With Gagan Aggarwal, Mayur Datar and Rajeev Motwani.
Proceedings of the 1st International Conference on Autonomic Computing (ICAC). 2004.

Secure Computation of the kth Ranked Element.
With Gagan Aggarwal and Benny Pinkas.
Advances in Cryptology - EUROCRYPT 2004: International Conference on the Theory and Applications of Cryptographic Techniques. 2004.

On Finding Large Conjunctive Clusters..
With Dana Ron and Ram Swaminathan.
Proceedings of the 16th Annual Conference on Learning Theory (COLT). 2003.

Towards the Visualization of Overlapping Sets.
With Xavier Boyen and Liadan O'Callaghan.
DIMACS Workshop on Computational Geometry. 2002.

Large Clusters of Web Pages.
With Dana Ron and Ram Swaminathan.
Workshop on Algorithms and Models for the Web Graph (WAW). 2002.

Streaming Data Algorithms for High-Quality Clustering.
With Liadan O'Callaghan, Adam Meyerson, Sudipto Guha, and Rajeev Motwani.
Proceedings of the 18th International Conference on Data Engineering (ICDE). 2002.

Sublinear Time Approximate Clustering.
With Dan Oblinger and Leonard Pitt.
Proceedings of the 12th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA). 2001.

Clustering Data Streams.
With Sudipto Guha, Rajeev Motwani and Liadan O'Callaghan.
41st Annual Symposium on Foundations of Computer Science (FOCS), IEEE Press, 2000.

Learning from a Monotonous, Ignorant Teacher.
PhD Thesis.  UIUC Technical Report, UIUCDCS-R-97-2023, October 1997.

Generating all Maximal Independent Sets of Bounded-Degree Hypergraphs.
With Leonard Pitt.
Proceedings of the 10th Annual Conference on Computational Learning Theory (COLT).  pp. 211-217, ACM Press, 1997.

Version Spaces without Boundary Sets.
With Haym Hirsh, Leonard Pitt.
Proceedings of the 14th National Conference on Artificial Intelligence (AAAI). pp. 491-496. 1997.

Learning from a Consistently Ignorant Teacher.
With Michael Frazier, Sally A. Goldman, Leonard Pitt.
Proceedings of the Annual Conference on Computational Learning Theory (COLT). 1995.



Professional Activities  
Editorial Boards


Program Chair


Program Committees

  • WSDM'10 Intl Conference on Web Search and Data Mining
  • PODS'09 Principles of Database Systems
  • AAAI'08: Conference on Artificial Intelligence
  • KDD'08: Knowledge Discovery and Data Mining
  • KDD'07: Knowledge Discovery and Data Mining
  • KDD'06: Knowledge Discovery and Data Mining
  • ICML'06: International Conference on Machine Learning
  • ICML'05: International Conference on Machine Learning
  • KDD'05: Knowledge Discovery and Data Mining
  • SDM'05: SIAM International Conference on Data Mining
  • KDD'04: Knowledge Discovery and Data Mining
  • SDM'03: SIAM International Conference on Data Mining
  • ICDM'02: IEEE International Conference on Data Mining
  • SDM'02: SIAM International Conference on Data Mining
  • KDD'01: Knowledge Discovery and Data Mining
  • IJCAI'01: International Joint Conference on Artificial Intelligence
  • ICML'00:. International Conference on Machine Learning

NSF Panelist: 2002, 2004, 2007.

Advisory Board, ICML'04


Teaching  
 

CS302: Theory of Computation.  University of Virginia.  Spring 2007.

CS651: Internet Algorithms.  University of Virginia.  Fall, 2006.

CS851: Data Mining Algorithms.  University of Virginia.  Spring, 2006.

CS369C: Clustering Algorithms. Stanford University. Spring, 2005.

CS369: A Study of Perturbation Techniques for Data Privacy. With Cynthia Dwork and Kobbi Nissim. Stanford University. Spring, 2004.

CS361A: Advanced Algorithms for Internet ApplicationsWith Rajeev Motwani. Stanford University. Autumn, 2002.