Results

Descent Context

“DeSceNt aims to ease the writing of distributed programs on a federation of plug computers.” 
Plug computers are a generation of low-cost computers such as the Raspberry Pi ($25), HummingBoard, Banana Pi, BeagleBone, MinnowBoard MAX, Odroid-U3, Udoo, Radxa Rock, and low-cost smartphones. They offer a cheap and readily available infrastructure for deploying domestic online software.
A federation interconnects a large number of socially organized, autonomous, and heterogeneous participants that agree on a minimal set of services. The concept of federation already exists in various domains: federated databases, federated social networks, and federations of clouds.
Why a federation of plugs? 
A federation of plug computers opens the opportunity for everyone to create cheap nano-clusters of domestic servers, host data and services, and federate these resources with friends, colleagues, and families based on social links. This project’s vision is that plug computers can form the core component of a social-based infrastructure organized around the concept of federation.
Compared to existing services, a federation of plugs allows users to keep better control over their data. Instead of spreading information across multiple isolated providers and wasting time synchronizing them, data can be defragmented on personal plugs and made available for global search. This vision is clearly promoted by personal information systems, with companies and organizations offering home-server solutions such as YunoHost, Amahi, ArkOS, OwnCloud, CozyCloud, Tonido, and FreedomBox.
Use-cases
A federation of plugs should be able to host services comparable to Google Drive (with Google Docs and Google Slides), Dropbox, distributed social networks (Diaspora, MediaGoblin), distributed recommendation systems (WhatsUp), and federated stores.

What we need and how Descent is organized

To be able to build a federation of plugs, we need several basic components:

  • Reliable communication between plugs:
    • Publish-subscribe and multicast are basic communication primitives for programming on a federated infrastructure.
    • In the context of plugs, we should be able to foster and delineate small-scale sub-topologies (e.g. a node and its 2-hop neighbors) and optimize communication costs.
  • Federated data structures and consistency criteria to write programs:
    • Distributed infrastructures have their own data structures: DHTs for P2P networks, NoSQL stores for cloud computing. Federations need their own data structures.
    • Consistency in distributed systems is ruled by the CAP theorem (Consistency, Availability, and Partition tolerance). Our constraints push for keeping availability and partition tolerance.
    • If strong consistency has to be sacrificed, what kind of consistency can we reach? On which data structures?
  • Security enforcement and usage control for this social infrastructure:
    • Switching from a centralized architecture to a distributed one does not solve privacy issues; it transforms them. In some way, we switch from one big brother to thousands of potential attackers. How can we monitor federations and at least detect malicious behaviors?
    • A basic motivation for federations is control over data usage, so we should be able to know how data hosted on a plug is really used.

These needs motivated the organization of Descent into the following tasks:

  • Task 1: Federated Social Infrastructure. We want to provide federated program developers with a communication component for handling reliable communication on a federation of plugs. Such a component should be able to foster and delineate small-scale sub-topologies (e.g. a node and its 2-hop neighbors) that demonstrate a high level of locality, either in the data space (similar users, similar changes, similar interests), on the network plane (close to each other), or geographically.
  • Task 2: Quasi-causality and quasi-CRDT. Given the federated social infrastructure produced by Task 1, the objective is to provide probabilistic causal delivery and probabilistic Conflict-free Replicated Data Types (CRDTs) such as sequences, sets, and graphs.
  • Task 3: Non-monotonic disorderly programming. Given the quasi-CRDT data structures produced by Task 2, the objective is to deliver a language able to compose quasi-CRDTs and verify properties such as confluence. We aim to integrate quasi-CRDTs into dedicated languages such as BloomL and go beyond monotonicity.
  • Task 4: Securing the federation of plugs. Given the federated social infrastructure produced by Task 1, the objective is to secure the federation of plugs by monitoring how the divergence of streams evolves on each node.
  • Task 5: Usage control in the federation of plugs. Given the federated social infrastructure produced by Task 1, the objective is to attach usage control policies to each piece of data retrieved from the federation and to enforce these policies at the plug level.

Task 1: Federated Social Infrastructure

“We want to provide federated program developers with a communication component for handling reliable communication on a federation of plugs…”
The goal of Task 1 is to construct overlay topologies connecting plug computers so as to support a federation of plug computers for personal cloud services. We detail two selected results.

  • Spray: An Adaptive Peer-Sampling Protocol for Networks of Browsers

Peer-sampling protocols constitute a fundamental mechanism for a number of large-scale distributed applications. The recent introduction of WebRTC has facilitated the deployment of decentralized applications over a network of browsers. However, deploying existing peer-sampling protocols on top of WebRTC raises issues about their lack of adaptiveness to sudden bursts of popularity over a network that does not manage addressing or routing. Spray is a novel random peer-sampling protocol that dynamically, quickly, and efficiently self-adapts to the network size. Our experiments show the flexibility of Spray and highlight its efficiency improvements at the cost of a small overhead. We embedded Spray in a real-time decentralized editor running in browsers and ran experiments involving up to 600 communicating web browsers. The results demonstrate that Spray significantly reduces network traffic according to the number of participants and saves bandwidth.
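
To make the adaptive behavior concrete, here is a minimal, single-process Python sketch of the kind of shuffle at the heart of a Spray-like protocol (class and method names are ours): on join, every neighbor of the contact gains an arc towards the newcomer, and on each shuffle two peers swap halves of their partial views, so view sizes grow and shrink with the network instead of being fixed a priori. The real protocol additionally handles WebRTC signalling, crashes, duplicates, and self-loops.

```python
import random

class Peer:
    """Toy, single-process sketch of a Spray-like adaptive peer-sampling protocol."""

    def __init__(self, pid):
        self.pid = pid
        self.view = []            # partial view: neighbour ids, duplicates allowed

    def join(self, contact, network):
        # The newcomer only knows its contact; every neighbour of the contact
        # adds an arc towards the newcomer, so view sizes grow roughly as ln(N).
        self.view.append(contact.pid)
        for nid in contact.view:
            network[nid].view.append(self.pid)
        if not contact.view:      # very first join: the contact has no neighbour yet
            contact.view.append(self.pid)

    def shuffle(self, network):
        # Periodic exchange: send half of our view to a random neighbour
        # (replacing its id by ours so it learns about us), take half of its view back.
        candidates = [v for v in self.view if v != self.pid]
        if not candidates:
            return
        partner = network[random.choice(candidates)]
        sample = random.sample(self.view, max(1, len(self.view) // 2))
        sent = [self.pid if v == partner.pid else v for v in sample]
        back = random.sample(partner.view, len(partner.view) // 2)
        for v in sample:
            self.view.remove(v)
        for v in back:
            partner.view.remove(v)
        self.view.extend(back)
        partner.view.extend(sent)

# Sketch usage: after sequential joins and a few shuffle rounds, the average
# view size is expected to stay around ln(N) for N peers.
network = {0: Peer(0)}
for i in range(1, 200):
    p = Peer(i)
    network[i] = p
    p.join(network[random.randrange(i)], network)
for _ in range(10):
    for p in network.values():
        p.shuffle(network)
print(sum(len(p.view) for p in network.values()) / len(network))  # expected around ln(200) ≈ 5.3
```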

  • Polystyrene: The Decentralized Data Shape that Never Dies

One key building block is the construction of emerging topologies created using decentralized topology construction protocols. Such protocols organize nodes along a predefined topology (e.g. a torus, ring, or hypercube) and can be used in many contexts, ranging from routing and storage systems to publish-subscribe and event dissemination. These protocols, however, typically assume no correlation between the physical location of nodes and their positions in the topology and, as a result, do not handle well catastrophic failures in which a whole region of the topology disappears. When this occurs, the overall shape of the system typically gets lost. This is highly problematic in applications such as the ones considered in the Descent project, in which overlay nodes are used to map a virtual data space, be it for routing, indexing, or storage. In this work, we proposed a novel decentralized approach that can recover from catastrophic correlated failures and re-form the system’s original topology, even if a large (contiguous) portion of the topology fails. Our approach relies on the dynamic decoupling between physical nodes and virtual ones, enabling fast reshaping. Our results show that a 51,200-node torus converges back to a full torus in only 10 rounds after 50% of the nodes have crashed. Our protocol is both simple and flexible and provides a novel form of collective survivability that goes beyond the current state of the art.

More precisely, the shape-preserving decentralized protocol we propose, called Polystyrene, comes in the form of an add-on layer that can be plugged into any decentralized topology construction algorithm. The simple intuition behind our work consists in decoupling the positions of the nodes in the topology from the nodes themselves. As in existing epidemic topology construction protocols (e.g. T-Man, Vicinity), each Polystyrene node starts with one position. However, contrary to traditional topology construction systems, we allow Polystyrene nodes to change their positions (i.e. migrate) when nodes fail, and to redistribute themselves around the target shape. As a result, the original shape is maintained, albeit at a lower sampling density resulting from the lower number of surviving nodes.

We enable this migratory behavior by storing two sets of data points per node: a first set of guest data points holds the points the node is in charge of, either as an initial assignment or as a result of failures. We say that the node is the primary holder of these guest data points. A second set of ghost data points contains copies of data held elsewhere in the network. When Polystyrene starts, there are no ghosts, and only one guest data point per node: the node’s original position. At any given time, guest data points are used to derive a node’s actual position, which is then fed to the underlying topology construction protocol. We use a simple projection mechanism, but this is an independent piece of our protocol that can easily be adapted to more complex situations.

Decoupling nodes from data points allows us to implement the migration we need to redistribute nodes around the target shape when catastrophic failures occur. Nodes migrate by periodically exchanging guest data points in order to reach a density-aware tessellation of the data space, i.e. a partition of data points across physical nodes that seeks to maximize locality. After each exchange, each node recomputes the position it provides to the underlying topology construction algorithm, moving in effect around the shape. In other words, nodes migrate by following the migration of their data points.

The result is a decentralized, collectively resilient overlay that, unlike existing algorithms, can maintain its systemic topological properties even in the face of extreme failures. The resulting protocol is also highly scalable, showing a convergence time that is logarithmic in the size of the system: for instance, a 51,200-node torus converges back to a full torus in only 10 rounds after 50% of the nodes have crashed.
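
The guest/ghost bookkeeping described above can be summarized with a small, hedged Python sketch (the class and method names are ours, and the projection used here is a plain centroid): each node advertises a position derived from its guest points, adopts orphaned points after a crash, and periodically hands over points that are closer to a neighbor, approximating a density-aware tessellation.

```python
import math

def _dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

class PolyNode:
    """Hedged sketch of the guest/ghost bookkeeping described above (2-D points).
    The real protocol sits on top of an epidemic topology-construction layer
    (e.g. T-Man or Vicinity); here we only show how a node's advertised position
    is derived from the points it currently hosts and how points migrate."""

    def __init__(self, position):
        self.guests = [position]   # points this node is the primary holder of
        self.ghosts = []           # copies of points primarily held elsewhere

    def advertised_position(self):
        # Simple projection: the centroid of the guest points. This is the
        # position handed to the underlying topology-construction protocol.
        xs, ys = zip(*self.guests)
        return (sum(xs) / len(xs), sum(ys) / len(ys))

    def backup_to(self, neighbour):
        # Periodically copy guest points to a neighbour as ghosts, so that the
        # points survive if this node crashes.
        neighbour.ghosts = list(self.guests)

    def take_over(self, crashed_points):
        # After a neighbour crashes, the points known through our ghost copies
        # become guests here; the advertised position migrates accordingly.
        self.guests.extend(crashed_points)

    def exchange(self, other):
        # Pairwise exchange: hand over the guest points that are closer to the
        # other node's position, approximating a density-aware tessellation.
        mine, theirs = self.advertised_position(), other.advertised_position()
        keep = [p for p in self.guests if _dist(p, mine) <= _dist(p, theirs)]
        give = [p for p in self.guests if _dist(p, mine) > _dist(p, theirs)]
        if keep:                   # never give away every point
            self.guests = keep
            other.guests.extend(give)
```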

  • Task 1 Results:
    • Nédelec, B., Tanke, J., Frey, D., Molli, P., and Mostéfaoui, A. An adaptive peer-sampling protocol for building networks of browsers. World Wide Web Journal, pp. 1-33, 2017.
    • Damien Imbs, Achour Mostéfaoui, Matthieu Perrin, and Michel Raynal. Set-Constrained Delivery Broadcast: Definition, Abstraction Power, and Computability Limits. ICDCN 2018, pp. 7:1-7:10.
    • Simon Bouget, Yérom-David Bromberg, François Taïani, and Anthony Ventresque. Scalable Anti-KNN: Decentralized Computation of k-Furthest-Neighbor Graphs with HyFN. 17th IFIP WG 6.1 International Conference on Distributed Applications and Interoperable Systems (DAIS 2017), Neuchâtel, Switzerland, June 2017, pp. 101-114, doi:10.1007/978-3-319-59665-5_7.
    • Stéphane Delbruel, Davide Frey, and François Taïani. Mignon: A Fast Decentralized Content Consumption Estimation in Large-Scale Distributed Systems. 16th IFIP International Conference on Distributed Applications and Interoperable Systems (DAIS 2016), Heraklion, Greece, June 2016, LNCS vol. 9687, pp. 32-46, Springer, doi:10.1007/978-3-319-39577-7_3.
    • Ariyattu C. Resmi and François Taïani. Filament: A Cohort Construction Service for Decentralized Collaborative Editing Platforms. 17th IFIP WG 6.1 International Conference on Distributed Applications and Interoperable Systems (DAIS 2017), Neuchâtel, Switzerland, June 2017, pp. 146-160, doi:10.1007/978-3-319-59665-5_11.

Task 2: Quasi-causality and quasi-CRDT

“Given the federated social infrastructure produced by Task 1, the objective is to provide probabilistic causal delivery and probabilistic Conflict-free Replicated Data Types (CRDTs) such as sequences, sets, and graphs.”
Selected results in this task include:

 

Major improvements in sequence replicated data types:

Google Docs made distributed collaborative editors popular: there were more than 190 million Google Drive users in 2015. While such a service is really useful, it raises issues about privacy, economic intelligence, and service limitations. Currently, Google Docs is limited to 50 users editing in real time. In this work, we aim to build an editor that can scale to millions of simultaneous users. This raises the issue of concurrently editing a sequence of elements in real time. Real-time constraints force us to replicate the shared sequence of characters on each participant. The main problem is to maintain the consistency of this massively replicated sequence.

Distributed collaborative editors follow the optimistic replication approach. Even with a high number of different participants, each participant has a copy of the sequence of characters.

  • Each site freely updates its copy locally, with no lock and no communication with any other site.
  • Then, it broadcasts its operations to the others. Thanks to Task 1, we make the hypothesis that all operations eventually arrive.
  • Each site integrates remote operations.

The system is correct if it ensures eventual consistency (every site sees the same value when the system is idle) and preserves intentions (effects observed at generation time are preserved at integration time, i.e. if ‘x’ is inserted between ‘a’ and ‘b’, then it will be integrated between ‘a’ and ‘b’ whatever the concurrent operations). Consistency can be formally defined thanks to Task 3.
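
The three steps above can be made concrete with a minimal Python sketch, using a grow-only set as the simplest data type whose concurrent updates commute (this illustrates the replication loop only, not the sequence type discussed next; all names are ours):

```python
class Replica:
    """Minimal sketch of the optimistic replication loop: local update,
    broadcast, integration. A grow-only set is used so that integration
    order does not matter and all sites trivially converge."""

    def __init__(self, site_id, peers):
        self.site_id = site_id
        self.state = set()
        self.peers = peers        # outboxes to remote sites (reliable delivery assumed, cf. Task 1)

    def local_update(self, element):
        # 1. Update the local copy freely: no lock, no coordination.
        self.state.add(element)
        # 2. Broadcast the operation; Task 1 guarantees it eventually arrives everywhere.
        for outbox in self.peers:
            outbox.append(("add", element))

    def integrate(self, inbox):
        # 3. Integrate remote operations.
        for op, element in inbox:
            self.state.add(element)

# Two sites, each with an outbox towards the other (delivery simulated by lists):
a_to_b, b_to_a = [], []
site_a, site_b = Replica("A", [a_to_b]), Replica("B", [b_to_a])
site_a.local_update("x")
site_b.local_update("y")
site_a.integrate(b_to_a)
site_b.integrate(a_to_b)
assert site_a.state == site_b.state == {"x", "y"}   # eventual consistency
```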

Conflict-free replicated data types can be used to build collaborative editors. The general principle is to attach to every element a unique identifier encoding the position of the element in the sequence, much like line numbers identify lines in a BASIC program. The complexity of the sequence type is mainly determined by the function that computes these identifiers.

Imagine you want to type “QWERTY” and internally have a sequence of 32 slots. Each time a character is typed, you have to choose an index to store it. If the user starts by typing ‘Q’, a simple function can allocate index 1 for ‘Q’, 8 for ‘W’, and so on up to 21 for ‘Y’. But imagine you started by typing ‘Y’ and the function allocated index 1 for ‘Y’; then ‘T’ has to be inserted before it, but there is no room. A naive function can just create a new sequence of 32 slots to extend the sequence at this position. But ‘T’ will take index 1 in this new sequence, and ‘R’ is coming next. Such an allocation function produces identifiers whose size grows linearly with the size of the sequence. Consequently, a sequence CRDT built on it will not scale in space.

The problem is to craft a function that creates small identifiers whatever the order in which the sequence is created.

The LSEQ function proposed in the paper relies on an exponential tree and random allocation. Each time a new allocation space has to be created, the size of this new space is doubled, and the strategy of allocation in this space (incremental or decremental) is chosen randomly. We demonstrated that this function allocates identifiers of size (log n)² in the average case. Consequently, an editor built with LSEQ does not require costly procedures to rebalance identifiers.
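
The following hedged Python sketch illustrates an LSEQ-like allocation between two identifiers, seen as digit paths in an exponential tree whose arity doubles at every level, with the incremental / decremental (boundary+ / boundary-) strategy picked at random per level. The constants and helper names are ours; the real LSEQ additionally embeds site identifiers and logical clocks to disambiguate concurrent allocations.

```python
import random

BASE0 = 32       # arity at the first level of the tree; it doubles at every deeper level
BOUNDARY = 10    # maximum step used by the boundary+ / boundary- strategies

def _prefix(ident, depth):
    """Value of the first `depth` digits of `ident`, padding with 0 (mixed radix)."""
    value = 0
    for d in range(depth):
        value = value * (BASE0 << d) + (ident[d] if d < len(ident) else 0)
    return value

def _digits(value, depth):
    """Inverse of _prefix: turn a value back into `depth` digits."""
    digits = []
    for d in reversed(range(depth)):
        digits.append(value % (BASE0 << d))
        value //= (BASE0 << d)
    return list(reversed(digits))

def allocate(p, q):
    """Hedged sketch of an LSEQ-like allocation: an identifier strictly between p and q."""
    depth, gap = 0, 0
    while gap < 1:                       # descend until there is room between p and q
        depth += 1
        gap = _prefix(q, depth) - _prefix(p, depth) - 1
    step = min(BOUNDARY, gap)
    if random.random() < 0.5:            # boundary+: allocate close to the left bound
        value = _prefix(p, depth) + random.randint(1, step)
    else:                                # boundary-: allocate close to the right bound
        value = _prefix(q, depth) - random.randint(1, step)
    return _digits(value, depth)

# e.g. allocating between the virtual bounds of an empty document:
begin, end = [0], [BASE0 - 1]
first = allocate(begin, end)             # a short identifier near one of the bounds
second = allocate(begin, first)          # still short: room is found at a shallow depth
```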

Based on this contribution, we plan to demonstrate how one million people can edit a shared document in real time. The CRATE prototype allows testing LSEQ in an editor; CRATE relies on the Spray gossip protocol produced by Task 1. You can try it online.

Major results in causality tracking

  • Probabilistic causality tracking.
  • A new way to ensure deterministic causality: full causal delivery without communicating vector clocks.
    • Brice Nédelec, Pascal Molli, and Achour Mostéfaoui. Breaking the Scalability Barrier of Causal Broadcast for Large and Dynamic Systems. Submitted to ICDCS 2018.
    • Many distributed protocols and applications rely on causal broadcast to ensure consistency criteria. However, none of the state-of-the-art causality tracking approaches scale in large and dynamic systems. This paper presents a new causal broadcast protocol that outperforms the state of the art in message size, execution time complexity, and local space complexity. Most importantly, messages piggyback control information whose size is constant. We prove this for both static and dynamic systems. Consequently, large and dynamic systems can finally afford causal broadcast.
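
For contrast, here is a minimal Python sketch of the classical baseline that this protocol improves on: a causal broadcast in which every message piggybacks a full vector clock, so the per-message control information grows linearly with the number of processes (names and the in-memory "network" are ours, for illustration only).

```python
class CausalBroadcastVC:
    """Classical causal broadcast baseline: each message carries a full vector
    clock, so control information grows linearly with the number of processes."""

    def __init__(self, pid, n):
        self.pid = pid
        self.vc = [0] * n            # number of messages delivered per origin
        self.pending = []            # received messages waiting for causal delivery

    def broadcast(self, payload, network):
        self.vc[self.pid] += 1                      # deliver locally
        msg = (self.pid, list(self.vc), payload)    # the whole vector travels with the message
        self.deliver(payload)
        for proc in network:
            if proc is not self:
                proc.receive(msg)

    def receive(self, msg):
        self.pending.append(msg)
        self._deliver_ready()

    def _deliver_ready(self):
        progress = True
        while progress:
            progress = False
            for msg in list(self.pending):
                origin, vc, payload = msg
                # Deliverable once all causal predecessors have been delivered locally.
                if vc[origin] == self.vc[origin] + 1 and all(
                        vc[k] <= self.vc[k] for k in range(len(vc)) if k != origin):
                    self.vc[origin] += 1
                    self.pending.remove(msg)
                    self.deliver(payload)
                    progress = True

    def deliver(self, payload):
        print(f"process {self.pid} delivers {payload}")
```

With n processes this baseline ships n integers on every message; the contribution above shows that the piggybacked information can instead be kept constant-size.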

Task 3 results: Disorderly Programming

“Given the quasi-CRDT data structures produced by Task 2, the objective is to deliver a language able to compose quasi-CRDTs and verify properties such as confluence. We aim to integrate quasi-CRDTs into dedicated languages such as BloomL and go beyond monotonicity.”

Major results for this task include a new formal framework able to express weak consistency criteria. We expressed and compared all existing criteria and proposed new ones.

Reliability of large-scale systems is a big challenge when building massive distributed applications over the Internet. At this scale, data replication is essential in order to ensure availability and convergence. Formal specifications of shared objects are a requirement for any attempt at verification. This problem is made harder by the CAP theorem (see Slide 2): it is impossible to provide the three following properties together for a shared object:
  • strong consistency: the object behaves as if there was a unique physical instance of it in the network,
  • availability: the methods of the object always return when they are called,
  • partition tolerance: separate parts of the network may be unable to communicate with each other for an unbounded amount of time.
As partition tolerance and availability are required in decentralized federations, dealing with weak consistency is unavoidable. But if shared objects do not behave like their sequential counterparts, tools developed for sequential specifications cannot be used unchanged in large-scale distributed systems.
Our general approach is to split concurrent specifications into two complementary aspects:
  • an abstract data type, defined by a transition system, which produces a sequential specification;
  • a consistency criterion that transforms distributed histories (i.e. partially ordered sets of events) into sequential histories in order to match them against the sequential specification.
One advantage of doing so is that sequential specifications are easy to understand and to produce: they are already used in object-oriented programming languages, and they are at the base of most verification tools. Consistency criteria are relatively few and independent of the application: they can be explained and understood once and for all.
Sequential consistency is an example of a strong consistency criterion. All the events of a history admitted by a sequentially consistent object can be totally ordered by an ordering that respects the partial order of the distributed history (lin(H)) as well as the sequential specification of the object (L(T)).
Eventual consistency is much weaker: it only requires that, if all the participants stop updating, they eventually converge to a common state. What it does not specify is to which states the object can converge. Because the definition of eventual consistency hardly mentions the sequential specification, fully specifying eventually consistent shared objects requires additional counterintuitive and error-prone concurrent specification techniques.
We propose a new consistency criterion, *update consistency*, that strengthens eventual consistency by imposing that the common state can be obtained by a linearization of the history. In other words, if we remove the reads made before convergence, the history is sequentially consistent. We also proved that any abstract data type has an update-consistent implementation.
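
The definition can be illustrated with a hedged, brute-force Python sketch (the encoding of histories and all names are ours, not the paper's formalism): a converged state is update consistent if some linearization of the updates, compatible with the partial order of the distributed history, produces it.

```python
from itertools import permutations

def is_update_consistent(updates, order, apply_update, initial, converged):
    """Toy check on a tiny history: is there a linearization of the updates,
    compatible with the partial order `order` (a set of (before, after) index
    pairs), whose sequential application on `initial` yields `converged`?"""
    for perm in permutations(range(len(updates))):
        rank = {op: i for i, op in enumerate(perm)}
        if any(rank[a] > rank[b] for a, b in order):
            continue                       # violates the history's partial order
        state = initial
        for op in perm:
            state = apply_update(state, updates[op])
        if state == converged:
            return True
    return False

# Toy example: 'add x' and 'remove x' issued concurrently on a replicated set.
# Both converged states ({x} and {}) are update consistent, since each one
# corresponds to one linearization of the two concurrent updates.
def apply_set(state, update):
    op, x = update
    return state | {x} if op == "add" else state - {x}

history = [("add", "x"), ("remove", "x")]
print(is_update_consistent(history, set(), apply_set, frozenset(), frozenset({"x"})))  # True
print(is_update_consistent(history, set(), apply_set, frozenset(), frozenset()))       # True
```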

Task 4: Securing the Federation of Plugs

“Given the federated social infrastructure produced by Task 1, the objective is to secure the federation of plugs by monitoring how the divergence of streams evolves on each node.”

Major results of Task 4 concern which metrics can be computed on streams with limited memory. Many such metrics can be applied to detect attacks. The main contribution is summarized below.

The performance of many complex monitoring applications, including Internet monitoring, data mining, sensor networks, and network intrusion/anomaly detection, depends on the detection of correlated events. For instance, detecting correlated network anomalies should drastically reduce the number of false positive or negative alerts that network operators currently face when using network management tools such as SNMP or NetFlow. Indeed, to cope with the complexity and the amount of raw data, current network management tools analyze their input streams in isolation. Diagnosing flooding attacks through the detection of correlated flows should improve intrusion detection tools.

The point is that, in all these monitoring applications, data streams arrive at nodes at a very high rate and may contain up to several billion data items per day. Thus, computing statistics with traditional methods is impractical due to constraints on both available processing capacity and memory. The problem tackled in this work is the online estimation of the correlation between data streams. More precisely, we propose a distributed algorithm that approximates, in a single pass and with guaranteed error bounds, the linear relation between massive distributed sequences of data.

Two main approaches exist to monitor massive data streams in real time. The first one consists in regularly sampling the input streams so that only a limited amount of data items is kept locally. This allows functions to be computed exactly on these samples. However, the accuracy of this computation with respect to the stream in its entirety fully depends on the volume of data items that has been sampled and on their order in the stream. Furthermore, an adversary may easily take advantage of the sampling policy to hide its attacks among data items that are not sampled, or in a way that prevents its “malicious” data items from being correlated.
In contrast, the streaming approach consists in scanning each piece of data of the input stream on the fly, and in locally keeping only compact synopses or sketches that contain the most important information about these data. This approach makes it possible to derive some data stream statistics with guaranteed error bounds without making any assumption on the order in which data items are received at nodes. Most of the research done so far with this approach has focused on computing functions or statistical measures with very small error using space sublinear in the item domain size.

On the other hand, very few works have tackled the distributed streaming model, also called the functional monitoring problem, which combines features of both the streaming model and communication complexity models. As in the streaming model, the input data is read on the fly and processed with minimal workspace and time.
As in the communication complexity model, each node receives an input data stream, performs some local computation, and communicates only with a coordinator who wishes to continuously compute or estimate a given function of the union of all the input streams. The challenging issue in this model is for the coordinator to compute the given function while minimizing the number of communicated bits.

In this task, we go a step further by studying the dispersion matrix of distributed streams.
Specifically, we have proposed a novel metric that approximates in real time the correlation between distributed and massive streams. This metric, called the sketch* metric, allows us to quantify how observed data items change together, and in which proportion. We are convinced that such a network-wide traffic monitoring tool should allow monitoring applications to get significant information on traffic behavior changes and subsequently inform more detailed detection tools about where DDoS attacks are currently active.
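
As a toy illustration of the streaming constraint (one pass, constant memory), the following Python sketch maintains running moments to estimate the linear correlation between two paired streams. It is not the sketch* metric itself, which additionally handles distributed streams and very large item domains; all names are ours.

```python
class StreamingCorrelation:
    """Single-pass, constant-memory estimate of the linear (Pearson) correlation
    between two streams of paired values. Illustration of the streaming model
    only, not of the sketch* metric proposed in Task 4."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = 0.0                 # running sums
        self.sxx = self.syy = self.sxy = 0.0    # running second-order moments

    def update(self, x, y):
        # One item from each stream; state stays O(1) whatever the stream length.
        self.n += 1
        self.sx += x;  self.sy += y
        self.sxx += x * x;  self.syy += y * y;  self.sxy += x * y

    def correlation(self):
        if self.n < 2:
            return 0.0
        cov = self.sxy - self.sx * self.sy / self.n
        vx = self.sxx - self.sx ** 2 / self.n
        vy = self.syy - self.sy ** 2 / self.n
        return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0

# e.g. feeding two traffic counters observed item by item:
est = StreamingCorrelation()
for x, y in [(1, 2), (2, 4), (3, 6), (4, 8)]:
    est.update(x, y)
print(est.correlation())   # close to 1.0 for linearly related streams
```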

Task 5: Usage Control

“Given the federated social infrastructure produced by Task 1, the objective is to attach usage control policies to each piece of data retrieved from the federation and to enforce these policies at the plug level.”

Major results concern how it is possible to recompose queries from traces in logs. This allows us to understand how distributed data are used.

 

Following the principles of Linked Data (LD), data providers are producing thousands of interlinked datasets in multiple domains, including life science, government, social networking, media, and publications. Federated query engines allow data consumers to query several datasets through a federation of SPARQL endpoints. However, data providers only receive the subqueries resulting from the decomposition of the original federated query. Consequently, they do not know how their data are crossed with other datasets of the federation. We propose FETA, a Federated quEry TrAcking system for LD. We consider that data providers collaborate by sharing their query logs. Then, from a federated log, FETA infers Basic Graph Patterns (BGPs) containing joined triple patterns executed among endpoints. We experimented with FETA on logs produced by FedBench queries executed with the Anapsid and FedX federated query engines. Experiments show that FETA is able to infer BGPs of joined triple patterns with good precision and recall.
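
As a hedged illustration of the idea (not FETA's actual heuristics), the following Python sketch groups logged triple-pattern subqueries that are close in time and share a variable into candidate BGPs. The log format and all names are ours; the real system uses much finer evidence, such as nested-loop bindings and answer flows between endpoints.

```python
def infer_bgps(log, window=2.0):
    """Toy sketch: from a federated log of (timestamp, endpoint, triple_pattern)
    entries, group patterns executed within `window` seconds that share a
    variable into candidate Basic Graph Patterns."""
    bgps = []                                  # each entry: [last_ts, shared_vars, patterns]
    for ts, endpoint, pattern in sorted(log, key=lambda e: e[0]):
        vars_in = {t for t in pattern if t.startswith("?")}
        for bgp in bgps:
            if ts - bgp[0] <= window and vars_in & bgp[1]:
                bgp[0] = ts
                bgp[1] |= vars_in
                bgp[2].append((endpoint, pattern))
                break
        else:
            bgps.append([ts, vars_in, [(endpoint, pattern)]])
    return [patterns for _, _, patterns in bgps]

# e.g. two subqueries sent to different endpoints that join on ?drug:
log = [
    (0.0, "endpointA", ("?drug", "rdf:type", "db:Drug")),
    (0.5, "endpointB", ("?drug", "db:interactsWith", "?other")),
]
print(infer_bgps(log))   # one candidate BGP containing both joined triple patterns
```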

Conclusion & Perspectives

Descent achieved important results:

  1. Task 1 on the federated social infrastructure demonstrated how it is possible to establish reliable communications that adapt to the number of participants.
  2. Task 2 on quasi-CRDTs showed how it is possible to build CRDTs that fully scale, breaking the scalability barrier of causality tracking.
  3. Task 3 reformalized all weak consistency criteria, making them comparable, and proposed new ones.
  4. Task 4 proposed stream-based approaches to detect malicious behaviors in a federation. The different contributions are compatible with the limited memory and low CPU capacity of plug computers.
  5. Task 5 proposed an original approach able to recompose distributed queries and thus detect how scattered data are used.

As perspectives:

  • These works are continuing through the project “Web of Browsers” in order to disseminate results to local companies (APIZEE).
  • Work on usage control is continuing through a CIFRE contract with the company OpenDataSoft.
  • Work with browsers is continuing within the ANR project “O’Browser”.
