Results (2013-2015)

The work of the associate team in 2013-2014 centered around three areas: energy efficiency, scientific workflow management, and data life-cycle management in clouds. More precisely, the following results have been achieved.

Energy efficiency

Performance and energy-efficiency evaluation of Hadoop

The exponential growth of scientific and business data has driven the evolution of cloud computing and of the MapReduce parallel programming model. Cloud computing emphasizes increased utilization and power savings through consolidation, while MapReduce enables large-scale data analysis. The Hadoop framework is the most popular open source software implementing the MapReduce model. In our work, we evaluated Hadoop performance in two modes: the traditional mode, in which data and compute services are collocated, and a separated mode, in which the two services are deployed separately. The separation of data and compute services provides more flexibility in environments where data locality might not have a considerable impact, such as virtualized environments and clusters with advanced networks. We also conducted an energy efficiency evaluation of Hadoop on physical and virtual clusters in different configurations. The experiments were performed on the Grid’5000 experimentation testbed. To enable virtual machine management, the Snooze cloud stack developed by the Myriads project-team was used. Our extensive evaluation shows that: (1) performance on physical clusters is significantly better than on virtual clusters; (2) performance degradation due to separation of the services depends on the data to compute ratio; (3) application completion progress correlates with power consumption, and power consumption is heavily application specific. This work was published at the IEEE BigData 2013 conference. Please see HAL for more details. In 2014, we extended and revised this paper under the title “Performance and Energy Efficiency of Big Data Applications in Cloud Environments: A Hadoop Case Study” and submitted it to the Journal of Parallel and Distributed Computing (JPDC). The paper has been accepted for publication and will appear in 2015.

Energy consumption models and predictions for large-scale systems

Responsible, efficient and well-planned power consumption is becoming a necessity for the monetary returns and scalability of computing infrastructures. While power data can be obtained from a variety of sources, analyzing this data is an intrinsically hard task. In our work, we described a generic approach to analyze large power consumption datasets collected from computing infrastructures. As a first step, we proposed a data analysis pipeline that can handle the large-scale collection of energy consumption logs, apply sophisticated modeling to enable accurate prediction, and evaluate the efficiency of the analysis approach. We presented the analysis of a power consumption dataset collected over a 6-month period from two clusters of the Grid’5000 experimentation platform used in production. We used Hadoop with Pig to handle the large volume of data. Our data processing generated a summary of the data that provides basic statistical aggregations over different time scales. The aggregate data was then analyzed as a time series using sophisticated modeling methods with the R statistical software. We exploited the time series to detect outliers and derive hourly and daily predictive models of power consumption. We demonstrated the accuracy of the predictive models and the efficiency of the data processing performed on a 55-node cluster at NERSC. Energy models derived from such a large dataset can help in understanding the evolution of consumption patterns, predicting future energy trends, and providing a basis for generalizing the energy models to similar large-scale systems. Please see HAL for more details.
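The aggregation and outlier detection steps described above can be illustrated at a small scale. The following is a minimal sketch, not the Hadoop/Pig and R pipeline used in the study: it assumes power samples are available as (timestamp, watts) pairs, and it substitutes a simple median absolute deviation test for the more sophisticated time series modeling. The function names, the threshold, and the sample values are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, median


def hourly_means(samples):
    """Group (timestamp, watts) samples by hour and average each group."""
    buckets = defaultdict(list)
    for ts, watts in samples:
        buckets[ts.replace(minute=0, second=0, microsecond=0)].append(watts)
    return {hour: mean(values) for hour, values in sorted(buckets.items())}


def mad_outliers(series, threshold=3.5):
    """Return the keys whose modified z-score (based on the MAD) exceeds threshold."""
    values = list(series.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # all values (nearly) identical: nothing can be flagged
    return [k for k, v in series.items() if 0.6745 * abs(v - med) / mad > threshold]


# Hypothetical samples: two readings per hour, with an abnormal spike in hour 4.
readings = [(100.0, 102.0), (99.0, 101.0), (100.0, 104.0),
            (98.0, 100.0), (300.0, 310.0), (101.0, 101.0)]
samples = []
for hour, (first, second) in enumerate(readings):
    samples.append((datetime(2014, 1, 1, hour, 0), first))
    samples.append((datetime(2014, 1, 1, hour, 30), second))

hourly = hourly_means(samples)
outliers = mad_outliers(hourly)
```

On this toy input, only the hour containing the spike is flagged. The same grouping key could be changed to a day to obtain the daily aggregates mentioned above.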

Energy efficiency analysis of resource management systems

First, we investigated the energy consumption of the deployment and termination phases of virtual clusters in IaaS clouds. Second, we studied the energy consumption of an IaaS cloud deployment from the provider's standpoint. We performed our experiments with two IaaS cloud technologies: CloudStack and OpenNebula. We have submitted a journal paper on this work. We also plan to conduct a similar study with the Snooze IaaS cloud management system developed in the Myriads team, but this could not be done in 2014: our efforts on Snooze were devoted to the design and implementation of a checkpointing service enabling Snooze to automatically recover applications executed in virtual clusters in the event of server failures. One of our next steps will be to evaluate the power consumption of virtual cluster checkpointing and recovery in different application failure and migration scenarios.
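The per-phase energy accounting mentioned in this subsection can be sketched as follows. This is an illustration only, assuming that power is sampled as (time in seconds, watts) pairs and that phase boundaries are known; the trapezoidal rule, the function names, and the sample values are assumptions and do not describe the measurement setup used in the experiments.

```python
def energy_joules(samples):
    """Integrate (time_s, watts) samples with the trapezoidal rule."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += (t1 - t0) * (p0 + p1) / 2.0
    return total


def energy_by_phase(samples, phases):
    """Energy per named phase; each phase is a (name, start_s, end_s) tuple."""
    return {
        name: energy_joules([(t, p) for t, p in samples if start <= t <= end])
        for name, start, end in phases
    }


# Hypothetical power trace of a server during a virtual cluster life cycle.
trace = [(0, 100.0), (10, 150.0), (20, 150.0), (30, 100.0)]
phases = [("deployment", 0, 20), ("termination", 20, 30)]
per_phase = energy_by_phase(trace, phases)
```

Because the two phases share only the boundary sample at 20 seconds, their energies add up to the energy of the whole trace.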

Energy-efficient Cloud Elasticity for Data-driven Applications

Cloud infrastructures consume enormous amounts of energy, and this consumption is still growing. An existing solution to lower it is to turn off as many servers as possible, but such solutions do not involve the user as a main lever to save energy. We proposed an approach that provides users with an easy way to participate in the reduction of data centers' energy consumption. The targeted users are scientists executing massive workflows in the Cloud. The key idea is to give them the choice between different execution modes that affect the size of the VMs used for executing applications. The execution modes range from energy efficiency to performance. In energy-efficient mode, the VMs are smaller than normally expected, which may lengthen the execution time of the application but offers more opportunities to increase the number of idle servers. Indeed, in moderately loaded data centers, consolidating the VMs on the smallest possible number of servers decreases the total energy consumption of the infrastructure hosting the VMs. We evaluated our system on Grid’5000, using the Montage workflow as a benchmark. The experimental results are promising: in energy-efficient mode, the energy consumed can be significantly reduced at the cost of a small increase in execution time [5, 6]. Our contributions are: 1) an easy-to-use interface to involve the user in saving energy, 2) an algorithm to select the VM size depending on the execution mode chosen by the user, 3) an algorithm for the placement of VMs on the servers, and 4) an evaluation of the benefits of our approach through the implementation of a prototype tested on a real platform with data-driven workflow applications.
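The interplay between execution modes, VM sizing, and consolidation can be sketched as follows. This is an illustration under stated assumptions, not the algorithms of the prototype: the mode names, the scaling factors, and the server capacity are hypothetical, and the placement uses the standard first-fit decreasing heuristic as a stand-in for the actual placement algorithm.

```python
# Hypothetical scaling factors: fraction of the requested vCPUs granted per mode.
MODE_FACTOR = {"energy": 0.5, "balanced": 0.75, "performance": 1.0}


def select_vm_size(requested_vcpus, mode):
    """Shrink the requested VM size according to the chosen execution mode."""
    return max(1, int(requested_vcpus * MODE_FACTOR[mode]))


def place_vms(vm_sizes, server_capacity):
    """First-fit decreasing placement; returns one list of VM sizes per server used."""
    servers = []
    for size in sorted(vm_sizes, reverse=True):
        if size > server_capacity:
            raise ValueError("VM does not fit on any server")
        for server in servers:
            if sum(server) + size <= server_capacity:
                server.append(size)
                break
        else:
            # No already-used server has room: power on a new one.
            servers.append([size])
    return servers


# Hypothetical workload: four VM requests (in vCPUs) on 8-vCPU servers.
requests = [4, 4, 6, 2]
perf = place_vms([select_vm_size(r, "performance") for r in requests], 8)
eco = place_vms([select_vm_size(r, "energy") for r in requests], 8)
```

With these assumed values, the performance mode occupies two servers while the energy-efficient mode fits on one, leaving the other server idle and eligible to be turned off.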

Scientific workflow management

Chemical runtime support for TIGRES workflows

The TIGRES workflow language, developed at LBNL, is targeted at the high level specification of scientific workflow-based applications. One objective of the DALHIS project is to offer a runtime layer for this language. The HOCL runtime developed at INRIA is a promising approach for this purpose, as it provides a ground for a distributed and dynamic workflow engine. A software prototype was developed prior to the DALHIS project and validated through experiments with scientific workflow-based applications [1].

The first step towards the actual deployment of the TIGRES language on top of the HOCL runtime was to validate the precise part of the TIGRES grammar to be used in the targeted runtime system. This first objective was reached through the use of Xtext [2], an Eclipse-based tool that eases the development of domain-specific languages, in particular by generating dedicated development environments that simplify the validation and usage of the languages.

The second step is the automated generation of the targeted (HOCL) code starting from a TIGRES specification.

The Tigres and HOCL runtime systems target different application domains (i.e., scientific workflows running on HPC systems versus web services workflows) and operate under completely different environments. Their integration requires the implementation of a consistent interface that matches the Tigres data model. Our objective is to take portions of Tigres workflows and translate them so that they can be executed in the HOCL-TS engine. In 2014 we focused on the design of the integrated system and its early implementation. It consists of an interface at the template level (each Tigres template being sent to the HOCL environment as a distinct workflow). In addition, a REST API for workflow enactment on top of the HOCL engine has been designed. The integrated system enables the execution of templates in a decentralized manner using HOCL’s runtime system. The design of this integration has been described in a paper published at the WORKS workshop, held in conjunction with Supercomputing 2014.

Adaptiveness at the workflow level, meaning the online modification of a running workflow, has also been designed and prototyped in GinFlow. Upon detection of a problem (i.e., the failure of a task to deliver an appropriate result), and given the programmer's recommendation about how to reshape the workflow should such a problem arise, an alternate workflow is enabled on the fly. An article on this aspect is planned for submission to a top conference in the field by the end of 2015.
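The adaptation principle can be illustrated with a deliberately simplified sketch. It is written in plain Python and does not use the GinFlow or HOCL interfaces: the workflow is reduced to an ordered list of tasks, and the programmer's recommendation is reduced to one replacement task per task that may fail. All names are hypothetical.

```python
def run_workflow(tasks, order, alternates):
    """Run tasks in order; when a task fails, enable its programmer-supplied alternate.

    tasks:      name -> callable taking the dict of results produced so far
    alternates: name -> replacement callable, used only if the original task raises
    Returns the results and the list of tasks that had to be replaced.
    """
    results, replaced = {}, []
    for name in order:
        try:
            results[name] = tasks[name](results)
        except Exception:
            if name not in alternates:
                raise  # no recommendation was given for this task
            results[name] = alternates[name](results)
            replaced.append(name)
    return results, replaced


# Hypothetical three-task workflow whose middle task fails at run time.
def fetch(results):
    return [3, 1, 2]

def fast_sort(results):
    raise RuntimeError("task failed to deliver a result")

def safe_sort(results):
    return sorted(results["fetch"])

def report(results):
    return results["sort"][0]

tasks = {"fetch": fetch, "sort": fast_sort, "report": report}
results, replaced = run_workflow(tasks, ["fetch", "sort", "report"], {"sort": safe_sort})
```

The downstream task consumes the result of the alternate without being restarted, which is the property that on-the-fly adaptation aims for. A real engine would reshape a sub-workflow rather than a single task and would do so in a decentralized manner.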

References

[1] A Chemistry-Inspired Workflow Management System for Scientific Applications in Clouds. Héctor Fernández, Cédric Tedeschi, and Thierry Priol. 7th IEEE International Conference on e-Science (e-Science 2011), Stockholm, Sweden, December 5-8, 2011.

[2] Xtext. http://www.eclipse.org/Xtext/ (retrieved September 2013).

Data life-cycle management in clouds

Infrastructure as a Service (IaaS) clouds provide a flexible environment where users can choose and control various aspects of the machines of interest. However, the flexibility of IaaS clouds presents unique challenges for storage and data management in these environments, where users currently rely on manual and/or ad hoc methods. FRIEDA is a Flexible Robust Intelligent Elastic Data Management framework that employs a range of data management strategies in elastic environments. In the context of this collaboration, we have been able to evaluate the importance of this framework on multiple cloud testbeds. Our evaluation showed that storage planning needs to be performed in coordination with compute planning and that the specific configuration of the virtual machines had a strong impact on the application (e.g., some applications performed better on small instances than on large instances).

The work in 2014 on data life-cycle management in clouds focused on the extended design and evaluation of the FRIEDA data management system. FRIEDA was tested on Amazon EC2 resources. In addition, we layered a command-line utility atop FRIEDA that allows users to plug in applications to run in FRIEDA. These tools have been adopted by the LBL-ATLAS group to run their experiments on Amazon.

Mobile Application for Reliable Collection of Field Data for Fluxnet

Critical to the interpretation of the global Fluxnet carbon flux dataset are the ancillary information and measurements taken at the measurement tower sites (e.g., vegetation species, leaf area index, instrument calibrations). The submission and update of this data using Excel sheets is difficult and error prone. In 2015, the team developed initial sketches of the user interface design, and A. Sinha, D. Agarwal, and C. Morin conducted an initial usability feedback interview with Chris Flechard (INRA Rennes), a CarboEurope participant who collects carbon flux data at several sites in Brittany. M. Sandesh simultaneously conducted a couple of usability interviews at Berkeley. We updated the design based on the combined feedback. Development of the mobile application prototype is currently in progress. The design was presented by Dario Papale at the ICOS meeting in September 2015. The expectation is that the design will be adopted by ICOS (European flux towers) and AmeriFlux (flux towers in the Americas). The target is to have a first working demonstration prototype by the end of November 2015.