Reliable and cost-efficient data placement and repair in P2P storage over immutable data

Coordinator: Shadi Ibrahim (MYRIADS)
Participants: Shadi Ibrahim (MYRIADS), Thomas Lambert (COAST), Frédéric Giroire (COATI), Claudia-Lavinia Ignat (COAST), Guillaume Pierre (MYRIADS), Stéphane Pérennes (COATI), Amine Ismail (hive)

Context

There is a growing trend toward highly distributed storage solutions that store and share data across geo-distributed connected devices, from the edge of the network to large-scale data centres. An appealing solution, which we are exploring within the Inria-HIVE collaborative framework, is to use the available storage and compute resources of connected devices (mobile phones and desktops) across the world to form a P2P storage system that provides data storage and sharing in a cost-efficient way. However, this requires dealing with several issues, including node failures, node availability (churn), and guaranteeing data availability while avoiding data loss.

Erasure coding (EC) is increasingly used in storage systems to provide high data availability at a lower storage and energy cost than replication. With replication, several copies of the same data (typically three) are stored on different storage nodes, one copy per node. EC instead splits data into fixed-size fragments, which are encoded and stored across several storage nodes; the original data can then be recovered from a subset of the encoded fragments. For example, under the well-known Reed-Solomon code RS(n,k), a data block is split into n smaller blocks called data chunks, which are then used to compute k parity chunks. Any n out of the n+k chunks are sufficient to rebuild the original data block, so the block survives the loss of up to k chunks while occupying (n+k)/n times its original size, compared with three times for three-way replication.
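To make the RS(n,k) description concrete, here is a minimal Python sketch of a systematic Reed-Solomon-style code that follows the convention used here (n data chunks, k parity chunks, any n of the n+k chunks suffice). It is an illustration under stated assumptions, not the codec used in this project: arithmetic is done in the prime field GF(257) through polynomial evaluation and Lagrange interpolation for readability (production codecs usually work in GF(2^8)), each chunk is reduced to a single symbol, and the names `encode` and `decode` are hypothetical.

```python
# Sketch of a systematic Reed-Solomon-style code RS(n, k):
# n data symbols, k parity symbols, any n of the n + k symbols rebuild the data.
# Assumption: arithmetic in the prime field GF(257) for readability only.

P = 257  # prime modulus; every byte value 0..255 is a valid field element


def _interpolate_at(points, x):
    """Evaluate at x the unique polynomial of degree < len(points)
    passing through the (xi, yi) pairs in `points` (Lagrange form, mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, P - 2, P) is the modular inverse (Fermat's little theorem)
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total


def encode(data, k):
    """Return n + k chunks: the n data symbols unchanged (systematic),
    followed by k parity symbols evaluated at x = n .. n + k - 1."""
    n = len(data)
    if n + k >= P:
        raise ValueError("n + k must stay below the field size")
    points = list(enumerate(data))  # data symbol i sits at x = i
    parity = [_interpolate_at(points, x) for x in range(n, n + k)]
    return list(data) + parity


def decode(available, n):
    """Rebuild the n data symbols from any n surviving chunks.
    `available` maps chunk index (its x coordinate) -> symbol."""
    if len(available) < n:
        raise ValueError("need at least n chunks to recover the data")
    points = sorted(available.items())[:n]
    return [_interpolate_at(points, x) for x in range(n)]


# Usage: n = 4 data symbols, k = 2 parity symbols, lose chunks 0 and 3.
data = [104, 105, 33, 7]
chunks = encode(data, k=2)
survivors = {i: c for i, c in enumerate(chunks) if i not in (0, 3)}
assert decode(survivors, n=4) == data
```

In a real system each chunk is a byte array and the same computation is applied position by position; the only structural requirement in this sketch is that n + k stays below 257, so that every chunk gets a distinct evaluation point.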

In recent years, EC has been deployed in data analytics systems and in in-memory storage systems for cached (hot) data. EC is a promising candidate for large-scale peer-to-peer storage systems: it exploits parallel reads and writes, and it spreads the work of storing and repairing data over a large number of nodes. However, previous efforts in P2P systems mainly used EC for archived data. Performing EC on the critical path of data access in a large-scale P2P storage system that holds hot, frequently accessed data, as this project does, raises many research challenges: how to ensure high data availability while coping with data and node dynamicity, and how to provide cost-effective and heterogeneity-aware data repair.
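The tension between storage cost, availability under churn, and repair cost can be illustrated with a simple back-of-the-envelope model. The sketch below is hypothetical and deliberately simplified: it assumes every peer holding a chunk is online independently with probability p (real churn is often correlated, so the figures are optimistic), it counts an object as available when at least one replica or at least n of its n+k chunks are reachable, and it uses the classic RS repair that reads n chunks to rebuild a single lost chunk. The function names are illustrative only.

```python
from math import comb


def replication_profile(r, p):
    """Storage overhead, availability and repair amplification of r full copies,
    assuming each hosting peer is independently online with probability p."""
    return {
        "overhead": float(r),
        # the object is reachable if at least one copy is online
        "availability": 1 - (1 - p) ** r,
        # rebuilding a lost copy reads one full copy: 1 byte read per byte rebuilt
        "repair_amplification": 1.0,
    }


def rs_profile(n, k, p):
    """Same metrics for RS(n, k): n data + k parity chunks, any n suffice."""
    total = n + k
    # binomial probability that at least n of the n + k chunk holders are online
    availability = sum(
        comb(total, i) * p**i * (1 - p) ** (total - i) for i in range(n, total + 1)
    )
    return {
        "overhead": total / n,
        "availability": availability,
        # classic RS repair reads n chunks to rebuild a single chunk
        "repair_amplification": float(n),
    }


for name, profile in [
    ("3-way replication", replication_profile(3, 0.9)),
    ("RS(6,3)", rs_profile(6, 3, 0.9)),
]:
    print(name, profile)
```

Under this model, with p = 0.9, three-way replication reaches about 0.999 availability at 3x storage, whereas RS(6,3) stores only 1.5x but stays around 0.992, and each chunk repair reads six chunks. As a sanity check, RS(1,k) degenerates into replication with k+1 copies.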

Objective

The objective of this axis is to determine how to provide cost-efficient yet reliable data management when deploying erasure codes (EC) in large-scale trusted peer-to-peer cloud storage systems.
