Secure Data Sharing and Distributed Computations

This action explores the problem of sharing user “information” (rather than “raw data”) while preserving the user’s privacy. There are two aspects to this problem: secure data-intensive computations and secure data sharing. With the former, the user’s raw data can be transformed (e.g., aggregated) to compute the information to be shared. Privacy is provided by the transformation algorithm, which can be seen as an anonymization scheme. Most state-of-the-art work investigates the tradeoff between the privacy provided by such schemes and their utility, measured by how much the results of algorithms (e.g., data mining algorithms) differ when run on the raw data and on the anonymized data. The approach followed in SMIS is orthogonal: we do not contribute new anonymization models but provide means to securely execute the anonymization process thanks to distributed trusted hardware devices used in conjunction with an untrusted Cloud. Compared to traditional Secure Multiparty Computation, our approach leads to important performance gains, backed by generic and simple protocols. It is, however, not always possible to provide anonymous data for a given task. Secure data sharing tackles this issue by allowing the user to keep control over the dissemination of her data, e.g., by minimizing the data shared. SMIS focuses on the simplicity of the sharing models instead of trying to increase the expressiveness (and hence the complexity) of privacy models.

Asymmetric Architecture Computing: This research direction studies the secure execution of various algorithms on data stored in an unstructured network of Trusted Cells, so that each user can keep control over her data. The data can be stored locally in a trusted cell or encrypted on some external cloud. Execution takes place on a specific infrastructure called the Asymmetric Architecture: the network of trusted cells, supported by an untrusted cloud offering IaaS or PaaS. Our objective is to show that many different algorithms and computing paradigms can be executed on the Asymmetric Architecture, thus achieving secure and private computation. Our initial work showed that it is possible to securely compute PPDP (Privacy Preserving Data Publishing) algorithms implementing most state-of-the-art anonymization models (k-anonymity, ℓ-diversity, differential privacy), and proved the security of the proposed protocols even against malicious adversaries. We have pursued this stream of work by extending it to the secure execution of different types of computing paradigms: we have studied how to support SQL (a declarative language) and MapReduce (a distributed programming model) on the Asymmetric Architecture. A typical application would be to compute aggregates over smart meters without disclosing any individual’s raw data (e.g., compute the mean energy consumption per time period and district). This can be done using an SQL query or a MapReduce process. Distributed queries identifying individuals also make sense, assuming the identified subjects consent to disclose the reduced set of information participating in the query result (i.e., a small subset/aggregation/reorganization of their raw data).
Computing SQL queries on this infrastructure leads to two major and different problems: (1) securely computing aggregates over this data and (2) securely computing joins between data hosted by different trusted cells, which can be generalized to the computation of graph queries on this data. Computing MapReduce processes on the Asymmetric Architecture means maintaining the flexibility and efficiency of MapReduce, while adding security into the mix. We have shown that it is possible to achieve seamless integration of distributed MapReduce processing using tokens, while maintaining reasonable performance.
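
To make the smart-meter example concrete, the following minimal Python sketch computes the mean consumption per district and time period while the untrusted cloud only ever handles opaque blobs. It is an illustration under stated assumptions, not the protocol developed in SMIS: the function names, the shared key `CELL_KEY`, the fixed-size partitioning and the HMAC-based toy cipher (unauthenticated, not suitable for real use) are all hypothetical choices made for this sketch.

```python
import hashlib
import hmac
import json
import os
from collections import defaultdict

# Hypothetical key shared by all trusted cells; the cloud never sees it.
CELL_KEY = b"demo-key-known-only-to-trusted-cells"


def _keystream(nonce: bytes, length: int) -> bytes:
    """Toy keystream: HMAC-SHA256 in counter mode (illustration only)."""
    blocks = [
        hmac.new(CELL_KEY, nonce + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        for counter in range((length + 31) // 32)
    ]
    return b"".join(blocks)[:length]


def seal(obj) -> bytes:
    """Trusted cell: serialize and encrypt a value before it leaves the cell."""
    plaintext = json.dumps(obj).encode()
    nonce = os.urandom(16)
    stream = _keystream(nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def unseal(blob: bytes):
    """Trusted cell: decrypt and deserialize a blob received from the cloud."""
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(nonce, len(body))
    return json.loads(bytes(b ^ s for b, s in zip(body, stream)))


def cloud_partition(blobs, size):
    """Untrusted cloud: split opaque blobs into partitions without reading them."""
    return [blobs[i:i + size] for i in range(0, len(blobs), size)]


def cell_partial_aggregate(partition) -> bytes:
    """Trusted cell: decrypt one partition, return a sealed (sum, count) per group."""
    partial = defaultdict(lambda: [0.0, 0])
    for blob in partition:
        district, period, kwh = unseal(blob)
        entry = partial[f"{district}|{period}"]
        entry[0] += kwh
        entry[1] += 1
    return seal(partial)


def cell_final_merge(sealed_partials):
    """Trusted cell: merge the partial results and compute the mean per group."""
    totals = defaultdict(lambda: [0.0, 0])
    for blob in sealed_partials:
        for group, (total, count) in unseal(blob).items():
            totals[group][0] += total
            totals[group][1] += count
    return {group: total / count for group, (total, count) in totals.items()}


# Made-up readings: (district, time period, consumption in kWh).
readings = [
    ("north", "08h", 1.0), ("north", "08h", 3.0), ("south", "08h", 2.0),
    ("north", "09h", 4.0), ("south", "08h", 6.0),
]
sealed = [seal(r) for r in readings]                      # done by each meter's cell
partials = [cell_partial_aggregate(p) for p in cloud_partition(sealed, 2)]
means = cell_final_merge(partials)                        # only aggregates are revealed
print(means)
```

In this sketch the cloud still learns the number of contributions and the partition sizes; a real protocol would also have to address such leakage, as well as authentication of the exchanged blobs.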

Secure spatio-temporal distributed processing: Mobile participatory sensing could be used in many applications such as vehicular traffic monitoring, pollution tracking, or even health surveying (e.g., to measure in real time the individual exposure to environmental risk factors or the propagation of an epidemic). However, its success depends on finding a way to query a large number of users that protects users’ location privacy and works in real time. We addressed these issues and proposed PAMPAS, a privacy-aware mobile distributed system for efficient data aggregation in mobile participatory sensing. In PAMPAS, mobile devices enhanced with secure hardware, called secure probes, perform distributed query processing while preventing users from accessing other users’ data. Secure probes exchange data in encrypted form with the help of an untrusted supporting server infrastructure. PAMPAS uses two efficient, parallel, and privacy-aware protocols for location-based aggregation and adaptive spatial partitioning of secure probes. Our experimental results and security analysis demonstrate that these protocols are able to collect, aggregate and share statistics or derived data in real time, without any privacy leakage.
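
The idea of adaptive spatial partitioning combined with location-based aggregation can be illustrated with a small Python sketch. It assumes a quadtree-style split driven by probe density, which is our choice for the illustration and not a description of the PAMPAS protocols; it is also centralized and unencrypted, whereas PAMPAS distributes the work among secure probes. The names `Region`, `split`, `aggregate` and the `capacity` parameter are hypothetical.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Region:
    x0: float
    y0: float
    x1: float
    y1: float
    probes: list = field(default_factory=list)  # (x, y, sensed_value) tuples


def split(region, capacity, max_depth=8):
    """Split a region into quadrants until each leaf holds at most `capacity`
    probes, or until `max_depth` levels of splitting have been applied."""
    if len(region.probes) <= capacity or max_depth == 0:
        return [region]
    mx = (region.x0 + region.x1) / 2
    my = (region.y0 + region.y1) / 2
    quadrants = [
        Region(region.x0, region.y0, mx, my),   # index 0: low x, low y
        Region(mx, region.y0, region.x1, my),   # index 1: high x, low y
        Region(region.x0, my, mx, region.y1),   # index 2: low x, high y
        Region(mx, my, region.x1, region.y1),   # index 3: high x, high y
    ]
    for probe in region.probes:
        x, y, _ = probe
        index = (1 if x >= mx else 0) + (2 if y >= my else 0)
        quadrants[index].probes.append(probe)
    leaves = []
    for quadrant in quadrants:
        leaves.extend(split(quadrant, capacity, max_depth - 1))
    return leaves


def aggregate(leaves):
    """Mean sensed value for each non-empty leaf region, keyed by its bounds."""
    return {
        (r.x0, r.y0, r.x1, r.y1): mean(p[2] for p in r.probes)
        for r in leaves
        if r.probes
    }


# Made-up probes: three clustered in one corner, two isolated.
area = Region(0.0, 0.0, 100.0, 100.0, probes=[
    (10.0, 10.0, 40.0), (12.0, 14.0, 60.0), (15.0, 11.0, 50.0),
    (80.0, 20.0, 30.0), (70.0, 90.0, 20.0),
])
leaves = split(area, capacity=2)
for bounds, value in aggregate(leaves).items():
    print(bounds, value)
```

Dense areas end up covered by small regions and sparse areas by large ones, so the amount of work per region stays bounded, which is the property that makes parallel aggregation practical.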

Minimum Exposure: When users request a service, the service provider usually asks for personal documents to tailor its service to the specific situation of the applicant. For example, the rate and duration of consumer loans are usually adapted to the risk, assessed from the income, assets or past lines of credit of the borrower. In practice, an excessive amount of personal data is collected and stored. Indeed, a paradox is at the root of this problem: service providers require users to expose data in order to determine whether that data is needed or not to achieve the purpose of the service. We explore a reverse approach, where service providers would publicly describe the data they require to complete their task, and where software (placed, depending on the context, on the client, on the server, or in a trusted hardware component) would use those descriptions to determine a minimum subset of information to expose.
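
The reverse approach can be sketched as follows, under assumptions of our own: the provider publishes, for each benefit, a list of alternative rules, each rule being a conjunction of predicates over attributes, and the software searches exhaustively for the smallest set of attributes that still proves every benefit the user qualifies for. The rule format, the function `minimum_exposure` and the loan thresholds are hypothetical, and the brute-force search is only meant for small rule sets.

```python
from itertools import product


def minimum_exposure(profile, offers):
    """Return a smallest set of attributes that proves every obtainable benefit.

    profile: {attribute: value} held by the user.
    offers:  {benefit: [rule, ...]}, where a rule is {attribute: predicate}
             and a benefit is granted if all predicates of one rule hold.
    """
    choices = []
    for rules in offers.values():
        satisfied = [
            frozenset(rule)
            for rule in rules
            if all(attr in profile and pred(profile[attr]) for attr, pred in rule.items())
        ]
        if satisfied:  # benefits the user cannot obtain require no exposure
            choices.append(satisfied)

    best = None
    # Try every way of picking one satisfied rule per benefit; keep the smallest union.
    for combination in product(*choices):
        exposed = frozenset().union(*combination)
        if best is None or len(exposed) < len(best):
            best = exposed
    return best if best is not None else frozenset()


# Made-up loan example.
profile = {"income": 42000, "assets": 150000, "age": 35, "tenure_years": 6}
offers = {
    "low_rate": [
        {"income": lambda v: v >= 40000, "tenure_years": lambda v: v >= 5},
        {"assets": lambda v: v >= 100000},
    ],
    "long_duration": [
        {"age": lambda v: v < 45, "assets": lambda v: v >= 50000},
        {"income": lambda v: v >= 30000, "age": lambda v: v < 45},
    ],
}
to_expose = minimum_exposure(profile, offers)
print(sorted(to_expose))
```

Here exposing only the age and the assets is enough to obtain both benefits, so income and job tenure never need to leave the user’s side. The number of rule combinations grows exponentially with the number of benefits, so larger rule sets would call for heuristics rather than exhaustive search.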