Workplan 2016

1- Distributed Infrastructure Support for Workflow and Data Management

We plan to carry on our joint work on workflow management and to initiate a new joint research activity on distributed data integration.

Energy-Efficient Data-intensive Workflow Execution: Scientific workflows are composed of many computational tasks, and the dependencies among them are represented as a task graph. The resources used to run these workflows consume large amounts of energy, and this consumption needs to be reduced. Clustering short tasks has proven to be an efficient way of reducing the overall execution time. Workflow scheduling also has a non-negligible influence on quality of service and execution time. This work will revisit classical clustering and scheduling techniques for scientific workflows with energy efficiency as the primary target. Our algorithms will trade off energy consumption against performance. This work will rely on GinFlow, a workflow executor developed in the Myriads team, and TIGRES, a template description language from DST.
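To make the two ideas above concrete, the following is a minimal sketch, not the planned algorithm and not the GinFlow or TIGRES API. It assumes the tasks passed in are mutually independent (for example, tasks at the same level of the task graph). The names Task, cluster_short_tasks and weighted_cost, the runtime threshold, the cluster size cap, the reference values and the weight alpha are all hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    runtime: float  # estimated runtime in seconds (assumed to be known)


def cluster_short_tasks(tasks, threshold, max_cluster_runtime):
    """Greedily group independent short tasks into clusters.

    Tasks with runtime >= threshold stay alone. Shorter tasks are packed
    into the open cluster until adding one more would exceed
    max_cluster_runtime, at which point a new cluster is opened.
    """
    clusters = []
    current, current_runtime = [], 0.0
    for task in tasks:
        if task.runtime >= threshold:
            clusters.append([task])  # long task: its own cluster
            continue
        if current and current_runtime + task.runtime > max_cluster_runtime:
            clusters.append(current)  # close the full cluster
            current, current_runtime = [], 0.0
        current.append(task)
        current_runtime += task.runtime
    if current:
        clusters.append(current)
    return clusters


def weighted_cost(energy, makespan, energy_ref, makespan_ref, alpha):
    """Illustrative tradeoff objective (an assumption, not from the plan).

    Energy and makespan are normalized by reference values so that they
    are comparable; alpha in [0, 1] weights energy against performance.
    """
    return alpha * energy / energy_ref + (1 - alpha) * makespan / makespan_ref


if __name__ == "__main__":
    tasks = [Task("a", 1), Task("b", 2), Task("c", 50), Task("d", 3), Task("e", 4)]
    for cluster in cluster_short_tasks(tasks, threshold=10, max_cluster_runtime=6):
        print([t.name for t in cluster])
```

A scheduler could then compare candidate clusterings or placements by their weighted_cost, with alpha set according to how strongly energy is prioritized over execution time.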

Design of a Cloud Approach for Dataset Integration: Next-generation scientific discoveries are at the boundaries of datasets, e.g., across multiple science disciplines, institutions, and spatial and temporal scales. Today, data integration processes and methods are largely ad hoc or manual. What is needed is a generalized resource infrastructure that integrates knowledge of the data and of the processing tasks performed by the user, in the context of the data and resource lifecycle. Clouds provide an important infrastructure platform that can be leveraged for distributed data integration by incorporating this knowledge; this will be the focus of this research area. In 2016, we plan to work on the system design.

2- Deep Partnerships with Scientific Collaborations

Partnership across scientific disciplines depends on developing a common understanding of the problem and its priorities. User research in the form of ethnography and usability studies provides techniques that help with understanding culture and practices and with testing interfaces. In 2016, we will focus on further developing our partnerships with the SNFactory and Fluxnet.

Mobile Application for Reliable Collection of Field Data for Fluxnet: Critical to the interpretation of Fluxnet carbon flux data are the ancillary information and measurements taken at the tower sites. Submitting and updating these data using Excel spreadsheets is difficult and error-prone. In partnership with ICOS and INRA personnel, we (LBL and Inria, Myriads) are redesigning the data submission and organization method around a web User Interface (UI). The UI will be responsive and able to run on desktop and mobile devices, easing data lookup and entry from anywhere, including the field sites. We plan to develop the application and work on the UI design.

SNFactory Pipeline: a User-centric Approach: The Supernova Factory (SNFactory) data processing pipeline was developed several years ago and requires extensive manual operation to run. In this collaboration, we (LBL and Inria, Avalon) will work with the cosmologists to revamp the pipeline, reimplementing parts of it as needed, to enable robust execution. We also plan to contribute to the design of the data processing pipeline for the LSST.