
Workplan 2015

We have identified three major work directions to be investigated as part of the DALHIS associated team in 2015:

Scientific Workflows
The development of the integrated workflow engine using HOCL and Tigres continues. We recently started to explore the expressiveness of HOCL for describing self-adaptive behaviors. Next year, Tigres will focus specifically on failure recovery and a fault-tolerance API. Additionally, Tigres will investigate decentralized execution and optimizations to enable execution on next-generation HPC systems with deeper I/O-memory hierarchies. Our work plan is organized as follows.

  •  Complete the implementation of the whole set of templates in HOCL-WMS (2 templates already done)
  •  Produce logs compliant with the Tigres format
  •  Study fault tolerance and recovery mechanisms on top of the HOCL-TS/Tigres integration
  •  Release HOCL-WMS as open source
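The kind of workflow templates mentioned above can be illustrated with a minimal sketch. The runners below are hypothetical stand-ins, not the actual Tigres or HOCL-WMS API:

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative template runners; not the actual Tigres or HOCL-WMS API.

def run_sequence(tasks, data):
    """Feed each task's output into the next task (a 'sequence' template)."""
    for task in tasks:
        data = task(data)
    return data

def run_parallel(tasks, data):
    """Run independent tasks on the same input (a 'parallel' template)."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda t: t(data), tasks))

# Example: a sequence feeding a parallel fan-out.
staged = run_sequence([lambda x: x + 1, lambda x: x * 2], 3)       # (3+1)*2 = 8
fanned = run_parallel([lambda x: x - 1, lambda x: x ** 2], staged)
print(staged, fanned)  # 8 [7, 64]
```

A full engine composes such templates recursively (a parallel of sequences, and so on); the integration work above maps each template onto HOCL rewrite rules.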


  •  Support failure recovery and repeated executions
  •  Support fault tolerance through the API
  •  Evaluate the need to include loops in the Tigres library
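The failure-recovery items above can be sketched as a simple bounded-retry wrapper. This is an illustrative sketch only; the actual Tigres fault-tolerance API is still to be designed:

```python
import time

def with_retries(task, retries=3, backoff=0.1):
    """Re-execute a failing task a bounded number of times (simple recovery)."""
    def wrapped(*args, **kwargs):
        for attempt in range(retries):
            try:
                return task(*args, **kwargs)
            except Exception:
                if attempt == retries - 1:
                    raise                              # exhausted: surface the failure
                time.sleep(backoff * (2 ** attempt))   # exponential backoff
    return wrapped

# Example: a task that fails twice with transient errors, then succeeds.
calls = {"n": 0}
def flaky(x):
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return x * 2

result = with_retries(flaky)(21)
print(result)  # 42, after two retried failures
```

Repeated execution of a whole workflow adds one requirement on top of this: completed tasks must be skipped on re-run, which is why the log format mentioned earlier matters for recovery.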

Joint work

  •  Develop and evaluate example workflows, such as Montage, MODIS, and light source workflows, in the integrated system
  •  Validate a larger set of workflows expressed in Tigres
  •  Evaluate the system using large-scale experiments on HPC and cloud testbeds

Energy-efficient cloud elasticity for data-driven applications
Distributed and parallel systems offer users tremendous computing capacity. They rely on distributed
computing resources linked by networks, and they require algorithms and protocols to manage these
resources transparently for users. Recently, the maturity of virtualization techniques has enabled
the emergence of virtualized infrastructures (Clouds). These infrastructures provide resources to users
dynamically, adapted to their needs. By benefiting from economies of scale, Clouds can efficiently
manage and offer a virtually unlimited number of resources, reducing costs for users.
However, the rapid growth of Cloud demand is leading to a worrying and uncontrolled increase in
their electricity consumption. In this context, we will focus on data-driven applications that must
process large amounts of data. These applications have elastic needs in terms of computing resources,
as their workload varies over time. Since reducing energy consumption and improving performance are
often conflicting goals, this internship aims to study possible trade-offs for energy-efficient data
processing without impacting performance. As elasticity comes at the cost of reconfigurations, these
trade-offs will account for the time and energy the infrastructure needs to dynamically adapt the
resources to the application's needs.
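The time/energy trade-off described above can be made concrete with a toy cost model. All function names and numbers below are illustrative assumptions, not measurements or the project's actual algorithm:

```python
def option_cost(work, nodes, rate, power, setup_time=0.0, setup_power=0.0):
    """Return (completion time in s, energy in J) to finish `work` units on
    `nodes` homogeneous nodes, after an optional reconfiguration phase during
    which no application work is done."""
    runtime = work / (nodes * rate)
    return (setup_time + runtime,
            setup_time * setup_power + runtime * nodes * power)

def choose(work, nodes, extra, rate, power,
           reconfig_time, reconfig_power, deadline):
    """Pick the least-energy option that still meets the deadline."""
    stay = option_cost(work, nodes, rate, power)
    grow = option_cost(work, nodes + extra, rate, power,
                       reconfig_time, reconfig_power)
    options = [(stay, "stay"), (grow, "grow")]
    feasible = [(e, name) for (t, e), name in options if t <= deadline]
    if feasible:
        return min(feasible)[1]
    # No option meets the deadline: fall back to the fastest one.
    return min((t, name) for (t, e), name in options)[1]

# Illustrative numbers: 1000 work units, 4 nodes at 1 unit/s drawing 100 W each;
# 4 extra nodes cost 30 s of reconfiguration at 200 W before work resumes.
print(choose(1000, 4, 4, 1.0, 100.0, 30.0, 200.0, deadline=200.0))  # grow
print(choose(1000, 4, 4, 1.0, 100.0, 30.0, 200.0, deadline=300.0))  # stay
```

With a tight deadline the reconfiguration energy is worth paying; with a loose one, staying put is cheaper — exactly the kind of trade-off the real algorithms must weigh with measured, rather than assumed, costs.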
Validation of the proposed algorithms may rely on the French experimental platform named
Grid'5000. This platform comprises about 8,000 cores geographically distributed across 10 sites linked
by a dedicated gigabit network. Some of these sites have wattmeters that report the power consumption
of the computing nodes in real time. This validation step is essential, as it will ensure that the
selected criteria are met: energy efficiency, performance, and elasticity.

Data Ecosystem
Science now routinely generates large and complex datasets as the result of experiments, observations,
or simulations, and the number of scientists analyzing these datasets is also growing. These datasets
and their relationships have become increasingly difficult to manage and analyze. We will continue our
work on identifying, developing, and implementing the data ecosystem for scientific applications.
In addition to the above activities, we will continue our work on FRIEDA to explore the elasticity of
data and its impact on execution. Elasticity and auto-scaling of compute resources have been explored
before; however, their interaction with storage and data management is not well understood. There are a
number of open issues in this space. First, it is unclear how growing or shrinking data volumes should
be managed in virtualized environments. Second, when a compute resource is added to or removed from
the pool, how should the data on its volumes be managed? In the context of FRIEDA, we will design,
implement, and evaluate data management strategies for elasticity. This work will be evaluated
on Amazon, Grid'5000, and other cloud testbeds.
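One standard way to limit data movement when the compute pool grows or shrinks is consistent hashing, where only the keys adjacent to the joining or leaving node relocate. The sketch below is an illustration of that general technique, not the FRIEDA design:

```python
import hashlib
from bisect import bisect

def _h(key):
    """Deterministic 128-bit hash of a string key."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    """Consistent-hash data placement: when a node joins or leaves, only the
    keys falling on its ring segments move. Illustrative sketch, not FRIEDA."""
    def __init__(self, nodes, vnodes=64):
        # Each node gets `vnodes` positions on the ring for load balance.
        self.ring = sorted((_h(f"{n}#{i}"), n)
                           for n in nodes for i in range(vnodes))
        self._keys = [h for h, _ in self.ring]

    def node_for(self, key):
        # First ring position clockwise of the key's hash owns the key.
        i = bisect(self._keys, _h(key)) % len(self.ring)
        return self.ring[i][1]

# Growing the pool by one node relocates only part of the data, instead of
# reshuffling almost everything as naive modulo placement would.
blocks = [f"block-{i}" for i in range(1000)]
before = HashRing(["n1", "n2", "n3"])
after = HashRing(["n1", "n2", "n3", "n4"])
moved = sum(before.node_for(b) != after.node_for(b) for b in blocks)
print(moved, "of", len(blocks), "blocks move")
```

The open questions above go beyond placement, e.g. whether the relocated blocks should be copied eagerly at reconfiguration time or lazily on first access, which this sketch does not address.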
The DALHIS associated team has been in deep discussions about the appropriate data ecosystem
and the components suitable for scientific applications. In the coming year, we will write a paper
outlining what a data ecosystem comprises.