Internship proposal: Defining and implementing a keyword-faceted search query engine

This internship may open the way for a PhD thesis.

An increasing amount of personal data is automatically gathered and stored on servers by administrations, hospitals, insurance companies, etc. Citizen themselves often rely on Internet companies for data storage due to the high reliability and availability offered by Internet services. However, these benefits are counterbalanced by the privacy risks incurred by centralization. We propose a radically different way of considering the management of personal data. It builds upon the emergence of new portable and secure devices combining the security of smart cards and the storage capacity of NAND Flash chips. By embedding a full-fledged Personal Data Server (PDS) in such devices, the user’s control of how her sensitive data is shared by others (by whom, for how long, according to which rule, for which purpose) can be fully reestablished and convincingly enforced (cf. the Trusted Cell vision paper [1]).

The PlugDB engine is a Personal Data Server capable of storing data (currently tuples and documents) in tables and BLOBs, indexing them and querying them in SQL (tables) and keyword queries (documents). PlugDB engine is embedded in specific secure devices called smart token (see figure here below) designed by SMIS research team and assembled by electronic SMEs. The personal DB is hosted encrypted in NAND Flash and the PlugDB engine code runs in the microcontroller.


Designing a storage and indexing engine for PDSs is challenging, because of the inherent hardware constraints of smart tokens. First, existing secure microcontrollers (SMCUs) have a tiny RAM (up to 64 KB today). Second, the NAND Flash stable store badly supports random writes. Third, the NAND Flash storage does not benefit from the SMCU’s tamper-resistance. These constraints lead to contradictory objectives: executing queries with acceptable performance with a tiny RAM entails indexing massively the embedded database in Flash, while index updates generate fine-grain random writes, and then unacceptable NAND Flash write costs and cryptographic overhead. We have already addressed these issues in the context of relational databases [2] and keyword search on documents [3].


The goal of this internship is to design and implement a faceted search storage, indexing and query engine [4] proposing similar functionalities to those of NoSQL systems (e.g., see the keyword and faceted search on Amazon). The solution can be inspired by the techniques proposed in [2] and [3], though the problem to be solved is slightly different. In faceted search, the data are documents (files) and queries may combine keywords and facet values to which the documents belong. Despite these differences, the proposed design should cope with the same aforementioned constraints (cryptographic issues could be ignored at least in a first step). Real implementation on the smart token is not required (simulation and extensive evaluation would be then mandatory) but would be a great result for the internship.


Required skills

C Programming, storage and indexing data structures, knowledge in cryptography is a plus.

Advisors: Luc Bouganim and Iulian Sandu Popa.


Localization:     PETRUS team (ex SMIS,                    localized at UVSQ, 45 avenue des Etats Unis – 78035 Versailles (



[1]   N. Anciaux, P. Bonnet, L. Bouganim, B. Nguyen, I. Sandu Popa, P. Pucheral, Trusted Cells : A Sea Change for Personnal Data Services, in : Proceedings of the 6th biennal Conference on Innovative Database Research (CIDR 2013), Asilomar, United States, January 2013.

[2]   N. Anciaux, L. Bouganim, P. Pucheral, Y. Guo, L. Le Folgoc, S. Yin, MILo-DB: a personal, secure and portable database machine, Distributed and Parallel Databases 32, 1, March 2014.

[3]   N. Anciaux, S. Lallali, I. Sandu Popa, P. Pucheral, A Scalable Search Engine for Mass Storage Smart Objects, in: Proceedings of the 41th International Conference on Very Large Databases (VLDB), Kohala Coast, Hawaii, United States, August 2015.

[4]   O. Ben-Yitzhak, N. Golbandi, N. Har’El, R. Lempel, A. Neumann, S. Ofek-Koifman, D. Sheinwald, E. J. Shekita, B. Sznajder, S. Yogev: Beyond basic faceted search. WSDM 2008: 33-44