Coordinator: Claudia-Lavinia Ignat (COAST)
Participants: Claudia-Lavinia Ignat (COAST), Shadi Ibrahim (MYRIADS), Gérald Oster (COAST), Guillaume Pierre (MYRIADS), Amine Ismail (hive)
Context
For availability and performance reasons, data is replicated.
Several users must be able to concurrently update replicas of the same data without losing their modifications. The hive platform relies on IPFS, and mutable data support is offered through the Mutable File System (MFS) API of IPFS. However, there is no support for merging concurrent changes.
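As an illustration of the kind of merge step that is currently missing, the following TypeScript sketch shows one possible state-based, last-writer-wins merge of two concurrently updated replica states. This is only a sketch under simple assumptions; the names (Entry, ReplicaState, mergeReplicas) are hypothetical and do not correspond to the IPFS or hive APIs.

// Hypothetical state-based merge of two concurrent replica versions.
// Each replica tracks, per key, a value together with a (timestamp, replicaId)
// pair; the merge keeps the entry with the highest timestamp (last writer wins).
interface Entry {
  value: string;
  timestamp: number;  // logical or wall-clock time of the write
  replicaId: string;  // tie-breaker when timestamps are equal
}

type ReplicaState = Map<string, Entry>;

function mergeReplicas(a: ReplicaState, b: ReplicaState): ReplicaState {
  const merged: ReplicaState = new Map(a);
  for (const [key, entry] of b) {
    const current = merged.get(key);
    const entryWins =
      current === undefined ||
      entry.timestamp > current.timestamp ||
      (entry.timestamp === current.timestamp && entry.replicaId > current.replicaId);
    if (entryWins) {
      merged.set(key, entry);
    }
  }
  return merged;
}

// Two replicas edited concurrently, then merged deterministically on both sides.
const r1: ReplicaState = new Map([["doc.txt", { value: "v1", timestamp: 10, replicaId: "A" }]]);
const r2: ReplicaState = new Map([["doc.txt", { value: "v2", timestamp: 12, replicaId: "B" }]]);
console.log(mergeReplicas(r1, r2).get("doc.txt")?.value); // prints "v2"

Because the merge is commutative and deterministic, both replicas converge to the same state regardless of the order in which they exchange updates; richer merge policies (e.g. CRDTs that preserve all concurrent edits) follow the same pattern.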
Data replication also raises the question of replica placement. Depending on the nature and use of the data, placement is a critical issue. The performance of many data-driven applications improves when data is local (i.e. when data does not need to be moved in order to execute a task). Data locality can be approached in two ways: either take the placement of data as given and place tasks accordingly, or place data so as to improve the future jobs that use this data. Replica placement also matters for data consistency. In highly distributed environments, maintaining data consistency has a cost that depends on the distance between replicas.
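To make this cost argument concrete, here is a minimal TypeScript sketch assuming that the cost of keeping replicas consistent grows with the pairwise latency between the sites hosting them; the site names and latency values are purely illustrative, not measurements from the hive platform.

// Hypothetical cost model: the consistency-maintenance cost of a placement is
// the sum of pairwise latencies between the chosen replica sites.
type Site = string;

function placementCost(sites: Site[], latency: (a: Site, b: Site) => number): number {
  let cost = 0;
  for (let i = 0; i < sites.length; i++) {
    for (let j = i + 1; j < sites.length; j++) {
      cost += latency(sites[i], sites[j]);
    }
  }
  return cost;
}

// Illustrative latencies (ms) between three sites.
const latencies: Record<string, number> = {
  "paris-rennes": 12,
  "paris-nancy": 8,
  "rennes-nancy": 15,
};
const latency = (a: Site, b: Site): number =>
  latencies[`${a}-${b}`] ?? latencies[`${b}-${a}`] ?? 0;

// Placing replicas on nearby sites lowers the cost of keeping them consistent.
console.log(placementCost(["paris", "rennes", "nancy"], latency)); // 35
console.log(placementCost(["paris", "nancy"], latency));           // 8

Under such a model, replica placement amounts to choosing the set of sites that minimises this cost while still satisfying the locality and redundancy requirements of the applications.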
To ensure a degree of data redundancy as well as to improve overall performance, data on the hive platform is sharded. Sharding is a storage technique in which a large piece of data is broken up into smaller chunks that are stored on different machines. To ensure data confidentiality, the chunks are encrypted.
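As a rough illustration of this pipeline, the TypeScript sketch below splits a buffer into fixed-size chunks and encrypts each chunk independently with AES-256-GCM from Node's built-in crypto module; the chunk size, key handling and cipher choice are assumptions made for the example and do not describe the actual hive implementation.

// Hypothetical sketch: shard data into fixed-size chunks, then encrypt each
// chunk independently so that it can be stored on a different machine.
import { createCipheriv, randomBytes } from "node:crypto";

const CHUNK_SIZE = 256 * 1024; // 256 KiB chunks (illustrative value)

interface EncryptedChunk {
  index: number;
  iv: Buffer;        // per-chunk initialisation vector
  authTag: Buffer;   // GCM authentication tag
  ciphertext: Buffer;
}

function shardAndEncrypt(data: Buffer, key: Buffer): EncryptedChunk[] {
  const chunks: EncryptedChunk[] = [];
  for (let offset = 0, index = 0; offset < data.length; offset += CHUNK_SIZE, index++) {
    const plain = data.subarray(offset, offset + CHUNK_SIZE);
    const iv = randomBytes(12); // fresh IV for every chunk
    const cipher = createCipheriv("aes-256-gcm", key, iv);
    const ciphertext = Buffer.concat([cipher.update(plain), cipher.final()]);
    chunks.push({ index, iv, authTag: cipher.getAuthTag(), ciphertext });
  }
  return chunks;
}

// Each encrypted chunk can be placed on a different machine; only holders of
// the key can decrypt and reassemble the original data.
const key = randomBytes(32); // 256-bit key
const chunks = shardAndEncrypt(randomBytes(1_000_000), key);
console.log(`${chunks.length} encrypted chunks`); // 4 encrypted chunks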
Objective
In this axis, we plan to propose a replication mechanism over sharded, encrypted data that merges concurrent changes and optimises the cost of this merging through suitable replica placement.