Home

Damaris is a middleware for asynchronous I/O and data management targeting large-scale, MPI-based HPC simulations. It was initially designed to dedicate cores to asynchronous I/O in the multicore nodes of recent HPC platforms, with an emphasis on ease of integration into existing simulations, efficient resource usage (through shared memory) and simplicity of extension through plug-ins.
Over the years, Damaris has evolved into a more elaborate system that can use dedicated cores or dedicated nodes to carry out in situ data processing and visualization. It offers a seamless connection to the VisIt visualization framework, enabling in situ visualization with minimal impact on run time. Damaris provides an extremely simple API and can be easily integrated into existing large-scale simulations.
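As a minimal sketch of that integration, assuming the client-side C interface, a simulation's main loop looks like the following (the configuration file name, variable name and loop bounds are illustrative placeholders; see the Damaris documentation for the exact setup):

    #include <mpi.h>
    #include "Damaris.h"

    int main(int argc, char** argv)
    {
        MPI_Init(&argc, &argv);

        /* Read the XML configuration and set up the shared-memory buffer. */
        damaris_initialize("simulation.xml", MPI_COMM_WORLD);

        /* Dedicated cores enter the Damaris server loop inside
           damaris_start() and return only once the clients call
           damaris_stop(); simulation processes get is_client != 0. */
        int is_client;
        damaris_start(&is_client);

        if (is_client) {
            MPI_Comm comm;
            damaris_client_comm_get(&comm); /* communicator without dedicated cores */

            double temperature[64 * 64];
            for (int step = 0; step < 100; step++) {
                /* ... compute one iteration into temperature ... */
                damaris_write("temperature", temperature); /* copied to shared memory */
                damaris_end_iteration(); /* dedicated cores process the step */
            }
            damaris_stop();
        }

        damaris_finalize();
        MPI_Finalize();
        return 0;
    }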

Short History

Damaris was at the core of the PhD thesis of Matthieu Dorier, who received an Honorable Mention (accessit) for the Gilles Kahn Ph.D. Thesis Award of the SIF and the French Academy of Sciences in 2015. Developed in the framework of our collaboration with the JLESC – Joint Laboratory for Extreme-Scale Computing, Damaris was the first software resulting from this joint lab to be validated, in 2011, for integration into the Blue Waters supercomputer project. It scaled up to 16,000 cores on Oak Ridge’s leadership supercomputer Titan (then ranked first in the Top500 list) before being validated on other top supercomputers. Active development continues within the KerData team at Inria, where Damaris is at the center of several collaborations with industry as well as with national and international academic partners.

Why Damaris?

Most HPC simulations proceed through a series of iterations, each generating a large dataset. As a result:

  • They trigger a heavy I/O burst at each iteration, which leads to inefficient I/O management and unpredictable variability in performance and execution time, also known as jitter.
  • In the usual approach, the datasets are shipped to auxiliary post-processing platforms for analysis and visualization.
  • This data transfer is very costly, and no output is available until the post-processing analysis and visualization phase completes.

Solution

Damaris, a middleware for data management targeting large-scale HPC simulations, was designed as a solution to these problems. HPC simulations can benefit from Damaris through:

  • “In situ” data analysis and visualization on dedicated cores/nodes of the simulation platform, in parallel with the computation
  • Asynchronous, fast data transfer from the simulation to Damaris through the Damaris API
  • Semantic-aware processing of simulation datasets by extending Damaris through plug-ins, as sketched below
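Plug-ins are functions loaded from a shared library and invoked by Damaris on the dedicated cores/nodes when an event fires. The following is a hypothetical C sketch: the function name and its argument list are illustrative assumptions, not the exact callback signature defined by the Damaris plug-in interface.

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical plug-in callback, compiled into a shared library and
       bound to an event in the Damaris XML configuration. The argument
       list below is an illustrative assumption. */
    void process_temperature(const char* event, int32_t source, int32_t iteration)
    {
        /* Runs on a dedicated core, overlapping with the computation:
           e.g. reduce the "temperature" variable of this iteration to a
           few statistics, so that only processed results leave the node. */
        printf("event %s from rank %d at iteration %d\n", event, source, iteration);
    }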

Benefits

Any HPC simulation can benefit from Damaris to optimize its I/O:

  • Data analysis and visualization during the simulation, without any need for external data post-processing
  • Effective usage of processing cores, by overlapping data processing and I/O with computation
  • No need to transfer huge simulation datasets to an auxiliary post-processing platform; only processed results are transferred
  • Easy integration with existing simulation applications through a simple API
  • Integration with existing data analysis and visualization tools through plug-ins

Use Cases

Simulation applications that model complex structures, dynamics, phenomena or behaviors, in order to predict quantities of interest with the highest possible precision, can benefit from Damaris. Examples include:

  • Computer Aided Engineering
  • Geophysics and Oil Applications
  • Weather Prediction and Tornado Simulation
  • Numerical Analysis
  • Aerospace Studies
  • Chemical and Pharmaceutical Studies
  • Energy Research
  • Computational Fluid Dynamics

Technologies

The following technologies have been adopted for the development, benchmarking and validation of Damaris:

  • Development Technologies: C++, MPI, Fortran (around 27,000 LOC)
  • Supported platforms: From commodity clusters to supercomputers
  • Extensibility: Through plug-ins (C++, Fortran, Shell scripts, Python)
  • Interface: Simple API in C++ and Fortran
  • Validated on: Top500-class supercomputers (Titan, Jaguar, Kraken), IBM Blue Gene platforms, Cray Blue Waters, the French Grid'5000 testbed
  • Simulation codes: Tornadoes (CM1), Ocean-Land-Atmosphere (OLAM), Navier-Stokes equations (Nek5000)
  • Visualization toolkits: VisIt, ParaView
