IO days – Oct 8/9, 2018

Preliminary Program

The core of the event (talks and discussions) will take place on the 8th. Attendees are welcome to stay on the 9th for more discussions.

There are no strict constraints on the talks; however, as the goal of this meeting is collaboration, we’d like to keep the talks to 30 minutes or less, with room for discussion during the talks.

  • I/O tracing, performance analysis, study of I/O patterns, etc. (Univ. of Warwick)
  • I/O software stack for HPC (Univ. Carlos III of Madrid)
  • Two-phase I/O for bandwidth improvement (Inria Bordeaux, LaBRI)
  • Data scheduling in NEXTGenIO (BSC)
  • Scheduling strategies to use burst buffers (Inria Bordeaux, LaBRI)
  • I/O scheduling strategies to deal with congestion at the I/O bandwidth level (Inria Bordeaux, LaBRI)
  • I/O-based algorithms in resilience solutions (ENS Lyon)
  • Discussions about collaborations and projects

The workshop will take place at Inria Bordeaux Sud-Ouest, 200 Avenue de la vieille tour, 33405 Talence. Specifically, it will be in rooms Alan Turing 1 and Alan Turing 2 (3rd floor).

Lunch on Oct 8 will be at Carpe diem in Talence.
Dinner on Oct 8 will be at Au bon Jaja.
Lunch on Oct 9 will be at Yamato.

List of attendees

If you are interested in attending, please contact me.

  • Guillaume Aupy, Inria Bordeaux (France)
  • Olivier Beaumont, Inria Bordeaux (France)
  • Jesus Carretero, Univ. Carlos III, Madrid (Spain)
  • Dean Chester, Univ. of Warwick (UK)
  • Lionel Eyraud-Dubois, Inria Bordeaux (France)
  • Emmanuel Jeannot, Inria Bordeaux (France)
  • Valentin Le Fèvre, ENS Lyon (France)
  • Ramon Nou, BSC (Spain)
  • Yves Robert, ENS Lyon (France)
  • David E. Singh, Univ. Carlos III, Madrid (Spain)
  • Nicolas Vidal, Inria Bordeaux (France)

Link to register for meals (deadline: Sept 15)

Details of talks:

I/O Performance Analysis with Proxy Applications, Dean Chester (Warwick)

Related works:

  1. Enabling portable I/O analysis of commercially sensitive HPC applications through workload replication

Experiences combining malleability and I/O control mechanisms, David Singh (UC3M) [slides]
In this talk I will introduce a common framework that integrates CLARISSE, a cross-layer runtime for the I/O software stack, and FlexMPI, a runtime that provides dynamic load balancing and malleability capabilities for MPI applications. This integration is performed both at the application level, as libraries executed within the application, and at the central-controller level, as external components that manage the execution of different applications. We show that cooperation between the two runtimes improves both application I/O and overall system performance.

Related works:

  1. CLARISSE: A Middleware for Data-Staging Coordination and Control on Large-Scale HPC Platforms (CCGrid’16)
  2. Enhancing the performance of malleable MPI applications by using performance-aware dynamic reconfiguration (ParCo 2015)
  3. Clarisse Project
  4. Flexmpi

Recent advances in two-phase I/O — Emmanuel Jeannot [slides]

Discussion on the use of aggregator nodes to improve I/O performance

Related works:

  1. TAPIOCA: An I/O Library for Optimized Topology-Aware Data Aggregation on Large-Scale Supercomputers
  2. Topology-Aware Data Aggregation for Intensive I/O on Large-Scale Supercomputers
  3. Tapioca Software

Data Scheduling in NEXTGenIO, Ramon Nou (BSC) [slides]
In this presentation, we explore different solutions developed in the NEXTGenIO project to improve data management, together with a new hardware architecture that fits NVRAM inside compute nodes. On the one hand, the Data Scheduler, coordinated with SLURM, controls data transfers between the nodes and the PFS, reducing interference. On the other hand, we present two ephemeral file systems (one local, echofs, and one distributed, GekkoFS) that are brought up with the job and use the NVRAM space available to it. These file systems can work with the Data Scheduler to coordinate transfers and reduce stress on the PFS.

Related works:

  1. Data scheduler – asynchronous transfers coordinated with SLURM
  2. Echofs: A Scheduler-guided Temporary Filesystem to leverage Node-local NVMs (SBAC-PAD 2018)
  3. GekkoFS – A temporary distributed file system for HPC applications (Cluster 2018)

Modeling Burst-Buffers to reduce inter-application contention — Lionel Eyraud-Dubois [slides]

We will discuss different usages and configurations of burst buffers to reduce I/O contention between applications.

Related works:

  1. What size should your buffers to disks be? (IPDPS’18)

IO Scheduling in Supercomputers — Guillaume Aupy [slides]

We will discuss the use of I/O schedulers for supercomputers.

Related works:

  1. Scheduling the I/O of HPC applications under congestion. (IPDPS’15)
  2. Periodic I/O scheduling for super-computers (PMBS’17)
  3. Dash Project

Meals and breaks will be covered in part by the French National Research Agency (ANR) through the DASH project (ANR-17-CE25-0004) and the “Investments for the Future” program IdEx Bordeaux – SysNum (ANR-10-IDEX-03-02).
