
Workplan 2015

We have identified three major work directions to be investigated as part of the DALHIS associated team in 2015:

Scientific Workflows
The development of the integrated workflow engine using HOCL and Tigres continues. We recently started to explore the expressiveness of HOCL for describing self-adaptive behaviors. Next year, Tigres will focus specifically on failure recovery and a fault-tolerance API. Additionally, Tigres will investigate decentralized execution and optimizations to enable execution on next-generation HPC systems with deeper I/O-memory hierarchies. Our work plan is organized as follows.

  •  Complete the implementation of the whole set of templates in HOCL-WMS (2 templates already done)
  •  Produce logs compliant with the Tigres format
  •  Study fault tolerance and recovery mechanisms on top of the HOCL-TS/Tigres integration
  •  Release HOCL-WMS as open source
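The kind of workflow templates mentioned above can be illustrated with a minimal sketch. The runners below are hypothetical stand-ins, not the actual Tigres or HOCL-WMS API:

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative template runners; not the actual Tigres or HOCL-WMS API.

def run_sequence(tasks, data):
    """Feed each task's output into the next task (a 'sequence' template)."""
    for task in tasks:
        data = task(data)
    return data

def run_parallel(tasks, data):
    """Run independent tasks on the same input (a 'parallel' template)."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda t: t(data), tasks))

# Example: a sequence feeding a parallel fan-out.
staged = run_sequence([lambda x: x + 1, lambda x: x * 2], 3)       # (3+1)*2 = 8
fanned = run_parallel([lambda x: x - 1, lambda x: x ** 2], staged)
print(staged, fanned)  # 8 [7, 64]
```

A full engine composes such templates recursively (a parallel of sequences, and so on); the integration work above maps each template onto HOCL rewrite rules.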


  •  Support failure recovery and repeated executions
  •  Support fault tolerance through the API
  •  Evaluate the need to include loops in the Tigres library
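The failure-recovery items above can be sketched as a simple bounded-retry wrapper. This is an illustrative sketch only; the actual Tigres fault-tolerance API is still to be designed:

```python
import time

def with_retries(task, retries=3, backoff=0.1):
    """Re-execute a failing task a bounded number of times (simple recovery)."""
    def wrapped(*args, **kwargs):
        for attempt in range(retries):
            try:
                return task(*args, **kwargs)
            except Exception:
                if attempt == retries - 1:
                    raise                              # exhausted: surface the failure
                time.sleep(backoff * (2 ** attempt))   # exponential backoff
    return wrapped

# Example: a task that fails twice with transient errors, then succeeds.
calls = {"n": 0}
def flaky(x):
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return x * 2

result = with_retries(flaky)(21)
print(result)  # 42, after two retried failures
```

Repeated execution of a whole workflow adds one requirement on top of this: completed tasks must be skipped on re-run, which is why the log format mentioned earlier matters for recovery.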

Joint work

  •  Develop and evaluate example workflows, such as Montage, MODIS, and light source workflows, in the integrated system
  •  Validate a larger set of workflows expressed in Tigres
  •  Evaluate the system using large-scale experiments on HPC and cloud testbeds

Energy-efficient cloud elasticity for data-driven applications
Distributed and parallel systems offer users tremendous computing capacity. They rely on distributed
computing resources linked by networks, and they require algorithms and protocols to manage these
resources transparently for users. Recently, the maturity of virtualization techniques has enabled
the emergence of virtualized infrastructures (Clouds). These infrastructures provide resources to users
dynamically, adapted to their needs. By benefiting from economies of scale, Clouds can efficiently
manage and offer a virtually unlimited number of resources, reducing costs for users.
However, the rapid growth of Cloud demand is leading to a worrying and uncontrolled increase in
their electricity consumption. In this context, we will focus on data-driven applications that must
process large amounts of data. These applications have elastic needs in terms of computing resources,
as their workload varies over time. Since reducing energy consumption and improving performance are
often conflicting goals, this internship aims to study possible trade-offs for energy-efficient data
processing without impacting performance. As elasticity comes at the cost of reconfigurations, these
trade-offs will account for the time and energy the infrastructure needs to dynamically adapt the
resources to the application's needs.
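The time/energy trade-off described above can be made concrete with a toy cost model. All function names and numbers below are illustrative assumptions, not measurements or the project's actual algorithm:

```python
def option_cost(work, nodes, rate, power, setup_time=0.0, setup_power=0.0):
    """Return (completion time in s, energy in J) to finish `work` units on
    `nodes` homogeneous nodes, after an optional reconfiguration phase during
    which no application work is done."""
    runtime = work / (nodes * rate)
    return (setup_time + runtime,
            setup_time * setup_power + runtime * nodes * power)

def choose(work, nodes, extra, rate, power,
           reconfig_time, reconfig_power, deadline):
    """Pick the least-energy option that still meets the deadline."""
    stay = option_cost(work, nodes, rate, power)
    grow = option_cost(work, nodes + extra, rate, power,
                       reconfig_time, reconfig_power)
    options = [(stay, "stay"), (grow, "grow")]
    feasible = [(e, name) for (t, e), name in options if t <= deadline]
    if feasible:
        return min(feasible)[1]
    # No option meets the deadline: fall back to the fastest one.
    return min((t, name) for (t, e), name in options)[1]

# Illustrative numbers: 1000 work units, 4 nodes at 1 unit/s drawing 100 W each;
# 4 extra nodes cost 30 s of reconfiguration at 200 W before work resumes.
print(choose(1000, 4, 4, 1.0, 100.0, 30.0, 200.0, deadline=200.0))  # grow
print(choose(1000, 4, 4, 1.0, 100.0, 30.0, 200.0, deadline=300.0))  # stay
```

With a tight deadline the reconfiguration energy is worth paying; with a loose one, staying put is cheaper — exactly the kind of trade-off the real algorithms must weigh with measured, rather than assumed, costs.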
Validation of the proposed algorithms may rely on the French experimental platform named
Grid'5000. This platform comprises about 8,000 cores geographically distributed across 10 sites linked
by a dedicated gigabit network. Some of these sites have wattmeters that report the power consumption
of the computing nodes in real time. This validation step is essential, as it will ensure that the
selected criteria are met: energy efficiency, performance, and elasticity.

Data Ecosystem
Science now routinely generates large and complex datasets as the result of experiments, observations,
or simulations, and the number of scientists analyzing these datasets is also growing. These datasets
and their relationships have become increasingly difficult to manage and analyze. We will continue our
work on identifying, developing, and implementing the data ecosystem for scientific applications.
In addition to the above activities, we will continue our work on FRIEDA to explore the elasticity of
data and its impact on execution. Elasticity and auto-scaling of compute resources have been explored
before; however, their interaction with storage and data management is not well understood. There are a
number of open issues in this space. First, it is unclear how growing or shrinking data volumes should
be managed in virtualized environments. Second, when a compute resource is added to or removed from
the pool, how should the data on its volumes be managed? In the context of FRIEDA, we will design,
implement, and evaluate data management strategies for elasticity. This work will be evaluated
on Amazon, Grid'5000, and other cloud testbeds.
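One standard way to limit data movement when the compute pool grows or shrinks is consistent hashing, where only the keys adjacent to the joining or leaving node relocate. The sketch below is an illustration of that general technique, not the FRIEDA design:

```python
import hashlib
from bisect import bisect

def _h(key):
    """Deterministic 128-bit hash of a string key."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    """Consistent-hash data placement: when a node joins or leaves, only the
    keys falling on its ring segments move. Illustrative sketch, not FRIEDA."""
    def __init__(self, nodes, vnodes=64):
        # Each node gets `vnodes` positions on the ring for load balance.
        self.ring = sorted((_h(f"{n}#{i}"), n)
                           for n in nodes for i in range(vnodes))
        self._keys = [h for h, _ in self.ring]

    def node_for(self, key):
        # First ring position clockwise of the key's hash owns the key.
        i = bisect(self._keys, _h(key)) % len(self.ring)
        return self.ring[i][1]

# Growing the pool by one node relocates only part of the data, instead of
# reshuffling almost everything as naive modulo placement would.
blocks = [f"block-{i}" for i in range(1000)]
before = HashRing(["n1", "n2", "n3"])
after = HashRing(["n1", "n2", "n3", "n4"])
moved = sum(before.node_for(b) != after.node_for(b) for b in blocks)
print(moved, "of", len(blocks), "blocks move")
```

The open questions above go beyond placement, e.g. whether the relocated blocks should be copied eagerly at reconfiguration time or lazily on first access, which this sketch does not address.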
The DALHIS associated team has been in deep discussions about the appropriate data ecosystem
and the components suitable for scientific applications. In the coming year, we will write a paper
outlining what a data ecosystem comprises.