Workshop program (April 8, 2013)

08:50 - 09:00


09:00 - 09:15

Opening Speech

09:15 - 10:30

Keynote

Challenges to De-anonymization and Privacy Protection in Online Advertising
Peng Liu, Sohu Inc.

10:30 - 11:00

Coffee Break

11:00 - 12:30

Research Session 1


Empirical Privacy and Empirical Utility of Anonymized Data
Graham Cormode, Cecilia Procopiuc, Entong Shen, Divesh Srivastava, Ting Yu

Privacy-Protecting Index for Outsourced Databases
Chung-Min Chen, Andrzej Cichocki, Allen McIntosh, Euthimios Panagos

On Syntactic Anonymity and Differential Privacy
Tamir Tassa, Chris Clifton

12:30 - 13:30

Lunch Break
13:30 - 14:30

Tutorial Session


Building Blocks of Privacy: Differentially Private Mechanisms
Graham Cormode

14:30 - 15:00

Invited Talk


Accurate Analysis of Large Private Datasets
Vibhor Rastogi

15:00 - 15:30

Coffee Break

15:30 - 16:30

Research Session 2


On Information Leakage by Indexes over Data Fragments
Sabrina De Capitani di Vimercati, Sara Foresti, Sushil Jajodia, Stefano Paraboschi, Pierangela Samarati

Privacy against Aggregate Knowledge Attacks
Olga Gkountouna, Katerina Lepenioti, Manolis Terrovitis



KEYNOTE: 09:15 - 10:30, April 8, 2013

Challenges to De-anonymization and Privacy Protection in Online Advertising, Peng Liu, Sohu Inc.


ABSTRACT: Privacy protection, as well as its opposite problem, de-anonymization, has attracted serious attention in the research community. Online advertising is one of the most important web products that rely heavily on users' behavioral data, and we encounter interesting applications that resort to privacy technologies. In this talk, we introduce two typical challenges, one arising on the supply side and one on the demand side. On the supply side, a de-anonymization requirement arises when we try to link cookies in an ad network with accounts in an open SNS. This problem suffers from having few quasi-identifiers, but is solvable given a small percentage of supervised data. On the demand side, protecting advertisers' data privacy is important for establishing a stable and advertiser-beneficial market. To prevent an advertiser from inferring the valuable audience segments of its competitors, we need a systematic study of the corresponding privacy protection technology. We will go over the background and applications of these problems, and provide our perspective on potential solutions.

BIO: Dr. Peng Liu joined Microsoft Research Asia after receiving his Ph.D. from Tsinghua University in January 2005, and focused on human-computer interface research in the speech group. In January 2009, Peng joined the Yahoo! Beijing R&D Center as one of its key founders and served as a senior scientist. In September 2010, he co-founded Yahoo! Beijing Labs and headed the advertising and recommendation group. Peng was responsible for the algorithms and infrastructure of several Yahoo! global advertising products, and made significant contributions to a couple of Yahoo!'s key markets. In June 2011, Peng joined MediaV, a leading demand-side advertising technology company in China, as Chief Scientist, and helped the company achieve 500 million RMB in annual revenue. He is now the General Manager of the Advertising Technology Center at Sohu Group, driving display advertising and personalized recommendation products.


TUTORIAL: 13:30 - 14:30, April 8, 2013

Building Blocks of Privacy: Differentially Private Mechanisms, Graham Cormode

ABSTRACT: The model of differential privacy has become widely accepted as a suitable way to release information about private data while preserving the privacy of the individuals contributing to the data.  One reason for its popularity is the availability of several "mechanisms" that produce output meeting the differential privacy guarantee.  A second reason is that there are simple rules for reasoning about the composition of multiple differentially private outputs.  Taken together, this provides a set of building blocks and cement which can be used to produce algorithms for private data release. In this tutorial, I will review some of the popular mechanisms for differential privacy, and show some of the ways that they can be used for a variety of different data releases.
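Among the mechanisms such a tutorial typically covers is the classic Laplace mechanism. As a minimal illustration (my own sketch, not material from the tutorial itself), the following Python snippet adds Laplace noise calibrated to a query's sensitivity and privacy budget:

```python
import math
import random

def laplace_mechanism(true_value, sensitivity, epsilon):
    """Release true_value + Laplace(0, sensitivity/epsilon) noise.

    For a query whose answer changes by at most `sensitivity` when one
    individual's record is added or removed, this release satisfies
    epsilon-differential privacy.
    """
    scale = sensitivity / epsilon
    u = random.random() - 0.5          # uniform in [-0.5, 0.5)
    # Inverse-CDF sampling of the Laplace distribution.
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_value + noise

# A count query has sensitivity 1; spending epsilon = 0.5 of the budget:
noisy_count = laplace_mechanism(42, sensitivity=1.0, epsilon=0.5)

# Simple sequential composition: k such releases on the same data
# together consume k * epsilon of the total privacy budget.
```

The composition comment reflects the "simple rules for reasoning about composition" the abstract mentions: privacy losses of independently noised releases add up.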

BIO: Graham Cormode is a Principal Member of Technical Staff in the Database Management Group at AT&T Shannon Laboratories in New Jersey. Previously, he was a researcher at Bell Labs, after postdoctoral study at the DIMACS center at Rutgers University from 2002 to 2004. His PhD was granted by the University of Warwick in 2002. His work considers aspects of managing and working with large amounts of data, with particular emphasis on privacy and anonymization, and large scale analytics.


INVITED TALK: 14:30 - 15:00, April 8, 2013

Accurate Analysis of Large Private Datasets, Vibhor Rastogi

ABSTRACT: Today, no individual has full control over access to his personal information. Private data collected by various hospitals and universities, and also by websites like Google and Facebook, contain valuable statistical facts that can be mined for research and analysis, e.g., to analyze disease outbreaks, detect traffic patterns on the road, or understand browsing trends on the web. But concerns about individual privacy severely restrict their use; for example, privacy attacks led AOL to withdraw its published search-log data.

To remedy this, much recent work focuses on data analysis with formal privacy guarantees. This has given rise to differential privacy, considered by many to be the gold standard of privacy. However, few practical techniques satisfying differential privacy exist for complex analysis tasks (e.g., analysis involving complex query operators) or new data models (e.g., data having temporal correlations or uncertainty). In this talk, I will discuss techniques that fill this void. I will first discuss a query answering algorithm that can handle joins (previously, no private technique could accurately answer join queries arising in many analysis tasks). This algorithm makes several privacy-preserving analyses over social network graphs possible for the first time. Then I will discuss a query-answering technique over time-series data, which enables private analysis of GPS traces and other temporally-correlated data. Third, I will discuss an access control mechanism for uncertain data, which enables enforcing security policies on RFID-based location data.

BIO: Vibhor Rastogi is currently a Research Scientist at Google (since June 2013), before which he was a Research Scientist at Yahoo (Oct 2010-May 2013). Prior to joining Yahoo, Vibhor did his Ph.D. in Computer Science at the University of Washington, advised by Prof. Dan Suciu. Vibhor received a B.Tech. in Computer Science and Engineering from the Indian Institute of Technology (IIT) Bombay in 2005, and an M.S. in Computer Science from the University of Washington in 2007.


RESEARCH SESSION 1: 11:00 - 12:30, April 8, 2013

Empirical Privacy and Empirical Utility of Anonymized Data, Graham Cormode, Cecilia Procopiuc, Entong Shen, Divesh Srivastava, Ting Yu

ABSTRACT: Procedures to anonymize data sets are vital for companies, government agencies and other bodies to meet their obligations to share data without compromising the privacy of the individuals contributing to it. Despite much work on this topic, the area has not yet reached stability. Early models (k-anonymity and L-diversity) are now thought to offer insufficient privacy. Noise-based methods like differential privacy are seen as providing stronger privacy, but less utility. However, across all methods, the sensitive information of some individuals can often be inferred with relatively high accuracy. In this paper, we reverse the idea of a 'privacy attack' by incorporating it into a measure of privacy. Hence, we advocate the notion of empirical privacy, based on the posterior beliefs of an adversary and their ability to draw inferences about sensitive values in the data. This is not a new model, but rather a unifying view: it allows us to study several well-known privacy models which are not directly comparable otherwise. We also consider an empirical approach to measuring utility, based on a workload of queries. Consequently, we are able to place different privacy models, including differential privacy and early syntactic models, on the same scale, and compare their privacy/utility tradeoff. We learn that, in practice, the difference between differential privacy and various syntactic models is less dramatic than previously thought, but there are still clear domination relations between them.
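The workload-based view of utility can be made concrete with a small sketch. The data, the queries, and the naive perturbation below are illustrative stand-ins of my own, not the paper's experimental setup:

```python
import random

def empirical_utility_error(original, anonymized, workload):
    """Mean relative error of a query workload on anonymized vs. original
    data; 0.0 means the anonymized data answers the workload perfectly."""
    errors = []
    for query in workload:
        true_ans = query(original)
        anon_ans = query(anonymized)
        errors.append(abs(true_ans - anon_ans) / max(abs(true_ans), 1.0))
    return sum(errors) / len(errors)

ages = [23, 35, 35, 41, 52, 60]
# Stand-in "anonymizer": a small random perturbation of each value.
noisy_ages = [a + random.choice([-2, -1, 0, 1, 2]) for a in ages]
workload = [
    lambda d: sum(1 for a in d if a < 40),        # count of under-40s
    lambda d: sum(1 for a in d if 40 <= a < 60),  # count in [40, 60)
]
error = empirical_utility_error(ages, noisy_ages, workload)
```

Running different anonymizers through the same workload puts their utility on one comparable scale, which is the point the abstract makes.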

Privacy-Protecting Index for Outsourced Databases, Chung-Min Chen, Andrzej Cichocki, Allen McIntosh, Euthimios Panagos

ABSTRACT: In this paper, we present the dithered B-tree, a B-tree index structure that can serve as a building block for efficient system implementations in the area of secure and private database outsourcing. A dithered B-tree prevents a third party that searches this index structure from learning whether or not the search term (i.e., key) is present in the database. This privacy-related property is crucial in application domains where the party responsible for answering a query is not allowed to learn whether a specific value exists in the database.

On Syntactic Anonymity and Differential Privacy, Tamir Tassa, Chris Clifton

ABSTRACT: Recently, there has been a growing debate over approaches for handling and analyzing private data. Research has identified issues with syntactic anonymity models. Differential privacy has been promoted as the answer to privacy-preserving data mining. We discuss here issues involved and criticisms of both approaches, and conclude that both have their place. We identify research directions that will enable greater access to data while improving privacy guarantees.


RESEARCH SESSION 2: 15:30 - 16:30, April 8, 2013

On Information Leakage by Indexes over Data Fragments, Sabrina De Capitani di Vimercati, Sara Foresti, Sushil Jajodia, Stefano Paraboschi, Pierangela Samarati

ABSTRACT: Data fragmentation has recently emerged as a complementary approach to encryption for protecting the confidentiality of sensitive associations when storing data at external parties. In this paper, we discuss how the use of indexes, typically associated with the encrypted portion of the data, while desirable for effective and efficient query execution, can, combined with fragmentation, cause potential leakage of confidential (encrypted or fragmented) information. We illustrate how the exposure to leakage varies depending on the kind of index. Such observations can prove useful for the design of approaches to assessing information exposure and for the definition of safe (inference-free) indexes over fragmented data.

Privacy against Aggregate Knowledge Attacks, Olga Gkountouna, Katerina Lepenioti, Manolis Terrovitis

ABSTRACT: This paper focuses on protecting the privacy of individuals in publication scenarios where the attacker is expected to have only abstract or aggregate knowledge about each record. While data privacy research usually focuses on defining stricter privacy guarantees that assume increasingly sophisticated attack scenarios, it is also important to have anonymization methods and guarantees that address each attack scenario; enforcing a stricter guarantee than required unnecessarily increases the information loss.

Consider, for example, the publication of tax records, where attackers might only know the total income and not its constituent parts. Traditional anonymization methods would protect user privacy by creating equivalence classes of identical records. In this work, we instead propose an anonymization technique that generalizes attributes only as much as needed to guarantee that aggregate values over the complete record create equivalence classes of size at least k. The experimental evaluation on real data shows that the proposed method produces anonymized data that lie closer to the original data than those produced by traditional anonymization algorithms.
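A minimal sketch of the grouping idea (the greedy partition below is my own illustration under the abstract's tax-record example, not the authors' algorithm): sort records by their aggregate value, then generalize each value to the range spanned by its group of at least k records.

```python
def generalize_aggregates(totals, k):
    """Partition sorted aggregate values into groups of at least k records,
    replacing each value with its group's (min, max) range."""
    order = sorted(range(len(totals)), key=lambda i: totals[i])
    ranges = [None] * len(totals)
    for start in range(0, len(order), k):
        group = order[start:start + k]
        if len(group) < k and start > 0:
            # Fold a short tail into the previous group so every
            # equivalence class keeps size >= k.
            group = order[start - k:]
        lo = min(totals[i] for i in group)
        hi = max(totals[i] for i in group)
        for i in group:
            ranges[i] = (lo, hi)
    return ranges

# Each published total income becomes a range shared by >= k records.
incomes = [10, 52, 11, 49, 50, 12]
published = generalize_aggregates(incomes, k=3)
```

Because only the aggregate is generalized, and only as far as the k-sized grouping requires, the published values stay close to the originals, which is the utility argument the abstract makes.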