This internship may open the way to a PhD thesis or an engineer position in the SMIS/PETRUS research team.
An increasing amount of personal data is automatically gathered and stored on servers by administrations, hospitals, insurance companies, etc. Citizens themselves often rely on Internet companies for data storage because of the high reliability and availability offered by Internet services. However, these benefits are counterbalanced by the privacy risks incurred by centralization. We propose a radically different way of managing personal data. It builds upon the emergence of new portable and secure devices combining the security of smart cards with the storage capacity of NAND Flash chips. By embedding a full-fledged Personal Data Server (PDS) in such devices, the user’s control over how her sensitive data is shared with others (by whom, for how long, according to which rules, for which purpose) can be fully reestablished and convincingly enforced (cf. the Trusted Cells vision paper).
The PlugDB engine is a Personal Data Server capable of storing data (currently tuples and documents) in tables and BLOBs, indexing them, and querying them in SQL (tables) and through keyword queries (documents). The PlugDB engine is embedded in specific secure devices called smart tokens (see figure below), designed by the SMIS research team and assembled by electronics SMEs. The personal database is stored encrypted in NAND Flash, and the PlugDB engine code runs on the microcontroller.
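To illustrate what a keyword query over documents involves, here is a minimal Java sketch of an inverted index that maps each keyword to the set of documents containing it. It is a hypothetical, in-memory illustration: the class `KeywordIndex` and its methods are invented for this example, and PlugDB's actual indexes, which must cope with NAND Flash and the constraints of the microcontroller, are not reproduced here.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical in-memory inverted index illustrating keyword queries over documents.
 *  This is NOT PlugDB's (Flash-oriented) index structure. */
public class KeywordIndex {
    // keyword -> sorted set of identifiers of the documents containing it
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    /** Splits the text on non-letter/non-digit characters and records docId under each keyword. */
    public void add(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Returns the ids of the documents containing ALL the given keywords (conjunctive query). */
    public SortedSet<Integer> query(String... keywords) {
        SortedSet<Integer> result = null;
        for (String kw : keywords) {
            SortedSet<Integer> docs =
                postings.getOrDefault(kw.toLowerCase(Locale.ROOT), new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(docs);   // first keyword: copy its postings
            } else {
                result.retainAll(docs);         // next keywords: intersect
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        KeywordIndex index = new KeywordIndex();
        index.add(1, "EDF electricity bill, March 2017");
        index.add(2, "Bank account statement, March 2017");
        index.add(3, "Electricity bill, April 2017");
        System.out.println(index.query("electricity", "bill")); // [1, 3]
        System.out.println(index.query("march"));               // [1, 2]
    }
}
```

A conjunctive query simply intersects the posting sets of its keywords; adapting such a structure to encrypted storage in NAND Flash is precisely the kind of storage and indexing question raised by the internship.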
The goal of this internship is to design a secure personal document manager on top of PlugDB. The idea is to provide a service equivalent to Digiposte.fr in a decentralized version, with all the user’s data managed from the secure device mentioned above, thanks to PlugDB. Digiposte is a document manager offered by “La Poste”, which automatically retrieves a set of personal documents from a large number of commercial web sites (banks, sellers, administrations, etc.). While the service offered is undoubtedly interesting, the user has no choice but to hand over his own credentials (login and password) to Digiposte, so that the service can log on to each web site on the user’s behalf and retrieve the associated documents (bills, account statements, tax forms, etc.). Our goal is to develop a version of this service running on a device owned by the user, thus avoiding the disclosure of the user’s credentials to a third party, where they could be exposed to attacks.
More precisely, the goal is to develop a set of web site “scrapers” that can log in on behalf of the user, retrieve the available documents, encrypt them and index them in PlugDB. In addition, documents may be wrapped (i.e., analyzed to extract data such as the amount of a bill or a list of products), and the extracted data could be inserted into PlugDB for further use (value-added services based on queries over the data and/or metadata).
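The encrypt and wrap steps described above can be sketched as follows. This Java example assumes AES-GCM (through the standard `javax.crypto` API) as the encryption scheme and a single regular-expression rule as the wrapper; these choices, the class `DocumentPipeline`, and the invoice line format `Total amount: 42.50 EUR` are hypothetical and say nothing about how PlugDB actually encrypts or stores documents. The scraping step itself (logging in and downloading) is left out.

```java
import java.math.BigDecimal;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/** Hypothetical sketch: encrypt a retrieved document, then "wrap" it to extract a bill total. */
public class DocumentPipeline {
    private static final int IV_BYTES = 12;   // 96-bit nonce, the usual size for GCM
    private static final int TAG_BITS = 128;  // authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Encrypts a document with AES-GCM; the random IV is prepended to the ciphertext. */
    public static byte[] encrypt(SecretKey key, byte[] plaintext) throws GeneralSecurityException {
        byte[] iv = new byte[IV_BYTES];
        RANDOM.nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ct = cipher.doFinal(plaintext);              // ciphertext followed by the tag
        byte[] out = new byte[IV_BYTES + ct.length];
        System.arraycopy(iv, 0, out, 0, IV_BYTES);
        System.arraycopy(ct, 0, out, IV_BYTES, ct.length);
        return out;
    }

    /** Decrypts the output of encrypt(); fails with AEADBadTagException if the blob was altered. */
    public static byte[] decrypt(SecretKey key, byte[] blob) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, blob, 0, IV_BYTES));
        return cipher.doFinal(blob, IV_BYTES, blob.length - IV_BYTES);
    }

    // Hypothetical wrapping rule for a bill line such as "Total amount: 42.50 EUR".
    private static final Pattern TOTAL =
        Pattern.compile("Total amount:\\s*(\\d+(?:\\.\\d{2})?)\\s*EUR");

    /** Extracts the bill total from the document text, if the rule matches. */
    public static Optional<BigDecimal> extractTotal(String text) {
        Matcher m = TOTAL.matcher(text);
        return m.find() ? Optional.of(new BigDecimal(m.group(1))) : Optional.empty();
    }

    public static void main(String[] args) throws GeneralSecurityException {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(256);
        // Assumption: in the envisioned setting, such a key would stay inside the secure device.
        SecretKey key = kg.generateKey();

        String bill = "ACME Telecom - Invoice 2017-03\nTotal amount: 42.50 EUR\n";
        byte[] blob = encrypt(key, bill.getBytes(StandardCharsets.UTF_8));
        String restored = new String(decrypt(key, blob), StandardCharsets.UTF_8);

        System.out.println(restored.equals(bill));        // true
        System.out.println(extractTotal(restored).get()); // 42.50
    }
}
```

An authenticated mode such as GCM is used here so that tampering with the stored blob is detected at decryption time; the extracted total could then be stored as a separate attribute for value-added queries.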
The intern could use available open-source libraries for scraping/wrapping, such as Weboob (http://weboob.org/ allows extracting documents from a rather large number of commercial web sites).
Required skills: Java/C programming, storage and indexing data structures; knowledge of cryptography is a plus.
Advisors: Luc Bouganim and Philippe Pucheral
Location: PETRUS team https://www.inria.fr/en/teams/petrus (ex SMIS https://www.inria.fr/equipes/smis), located at UVSQ, 45 avenue des Etats Unis – 78035 Versailles (http://tinyurl.com/comeUVSQ)
N. Anciaux, P. Bonnet, L. Bouganim, B. Nguyen, I. Sandu Popa, P. Pucheral, Trusted Cells: A Sea Change for Personal Data Services, in: Proceedings of the 6th biennial Conference on Innovative Data Systems Research (CIDR 2013), Asilomar, United States, January 2013.