Focus on a joint research project: OAKSAD

OAKSAD (2013-2015)

Languages and techniques for efficient large-scale web data management

Principal Investigators: 

  • Dr. Ioana Manolescu, OAK project-team, Inria Saclay Île de France
  • Prof. Alin Deutsch, University of California San Diego

Research objectives:

Data with complex structure and semantics is increasingly created by users and applications and shared through a variety of means, mostly based on the Web. Popular Web formats are W3C’s XML for structured documents and RDF for Semantic Web data; the JSON model is also increasingly used as a flexible way of encoding and sharing data. Efficient data management algorithms and tools are needed to help users understand and exploit such large and varied volumes of data, based on high-level languages, optimization algorithms, and efficient evaluation primitives.

Scientific achievements:

The most significant results of OAKSAD so far concern: (i) models, algorithms and techniques for highly efficient data stores and (ii) massively parallel algorithms for processing Web data, in particular Linked Open Data through SPARQL and nested data through Pig Latin.

Publications and Awards:

  • 2 Conference papers, 2 Technical reports.
  • Software: CliqueSquare, a platform for massively parallel RDF processing; PigReuse, an optimization library deployed within the Apache Pig project; Estocada (development ongoing), a platform for efficiently storing massive data on top of heterogeneous stores.

Selected publication:

Ioana Ileana, Bogdan Cautis, Alin Deutsch, Yannis Katsis: “Complete yet Practical Search for Minimal Query Reformulation under Constraints”, ACM Conference of the Special Interest Group on Management of Data (SIGMOD), 2014.