rse_hpc – TEACHING RSE AT THE UNIVERSITY LEVEL

Authors

Affiliations

Gesellschaft für Informatik

deRSE

Julian Dehne

Gesellschaft für Informatik

deRSE

Florian Goth

Jan Phillip Thiele

Jan Linxweiler

Anna-Lena Lamprecht

Maja Toebs

High‑Performance Computing for RSE

RSEs with expertise in HPC and other performance-critical computing domains specialize in optimizing code for efficient execution across various platforms, including clusters, cloud, edge, and embedded systems. They understand parallel programming models, hardware-specific optimizations, profiling tools, and platform constraints such as memory, energy, and latency. Their skills enable them to adapt software to diverse infrastructures, manage complex dependencies, and support researchers in accessing and using advanced computing resources effectively and sustainably.

Module Overview

Building on “Basic Scientific Computing”, this module dives deeper into scalable algorithms, architectures, and software engineering techniques required to run large‑scale simulations and data analysis on high‑performance computing (HPC) systems.

Intended Learning Outcomes

Participants who successfully complete the module will be able to:

Classify scientific problems by their dominant parallel pattern (memory‑parallel, compute‑parallel, task‑parallel).
Map each class to appropriate numerical libraries and hardware architectures.
Analyse floating‑point and algorithmic approximation errors at scale.
Explain modern HPC hardware features (GPUs, SIMD/AVX, NUMA, high‑speed interconnects) and select relevant optimisation strategies.
Design code that is both portable and performance‑portable, using MPI, OpenMP, and accelerator frameworks.
Use continuous benchmarking to guide sustainable performance evolution of research software.
Plan for long‑term maintenance, archival, and FAIR publication of large‑scale codes and data.
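
To illustrate the outcome on floating‑point errors at scale, the following minimal sketch compares a naive accumulation loop with Kahan (compensated) summation. The input of one million copies of 0.1 and the function names are illustrative assumptions, not part of the module material; `math.fsum` from the Python standard library serves as the correctly rounded reference.

```python
import math


def naive_sum(values):
    """Accumulate left to right; rounding error grows with the number of terms."""
    total = 0.0
    for v in values:
        total += v
    return total


def kahan_sum(values):
    """Compensated summation: carry the rounding error of each addition forward."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for v in values:
        y = v - compensation
        t = total + y
        compensation = (t - total) - y
        total = t
    return total


# Illustrative input (an assumption): 0.1 is not exactly representable in binary.
values = [0.1] * 1_000_000
reference = math.fsum(values)  # correctly rounded sum from the standard library

naive_error = abs(naive_sum(values) - reference)
kahan_error = abs(kahan_sum(values) - reference)

print(f"naive error: {naive_error:.3e}")
print(f"Kahan error: {kahan_error:.3e}")
```

In a parallel reduction the order of additions typically depends on the number of processes, so an uncompensated sum can change from run to run; this is one reason the error analysis matters more as problem sizes grow.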

Syllabus (Indicative Content)

Week	Theme	Topics
1	Parallel Problem Taxonomy	Sparse vs. dense linear algebra · embarrassingly parallel workloads
2	Distributed Memory (MPI)	Domain decomposition · halo exchange · scalability metrics
3	Shared Memory & SIMD	OpenMP · threading pitfalls · AVX intrinsics
4	Accelerator Programming	Multi‑GPU kernels · unified memory · portability layers (Kokkos, SYCL)
5	Advanced I/O & Checkpointing	Parallel file systems · burst buffers · HDF5/ADIOS‑based workflows
6	Performance Engineering	Roofline model · continuous & comparative benchmarking · autotuning
7	Sustainable HPC Software	Release engineering · long‑term archiving · community governance
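
As a hedged illustration of the week 2 topics, the sketch below simulates 1D domain decomposition with a one‑cell halo exchange inside a single Python process. It does not use MPI: the "ranks" are plain list slices, and the function names and the 3‑point averaging stencil are assumptions chosen for brevity. In an actual MPI code, each halo read would be a message between neighbouring processes.

```python
def smooth_serial(u):
    """One 3-point averaging step on the whole array; end points stay fixed."""
    out = list(u)
    for i in range(1, len(u) - 1):
        out[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return out


def smooth_decomposed(u, n_ranks):
    """Same step, computed chunk by chunk as simulated ranks would do it."""
    n = len(u)
    bounds = [(r * n // n_ranks, (r + 1) * n // n_ranks) for r in range(n_ranks)]
    result = []
    for lo, hi in bounds:
        local = u[lo:hi]
        # Halo exchange: each chunk needs one ghost cell from each neighbour.
        has_left = lo > 0
        has_right = hi < n
        padded = ([u[lo - 1]] if has_left else []) + local + ([u[hi]] if has_right else [])
        offset = 1 if has_left else 0
        new_local = list(local)
        for j in range(len(local)):
            g = lo + j  # global index of this cell
            if g == 0 or g == n - 1:
                continue  # physical boundary: value stays fixed
            p = j + offset  # position of the cell inside the padded chunk
            new_local[j] = (padded[p - 1] + padded[p] + padded[p + 1]) / 3.0
        result.extend(new_local)
    return result


grid = [float(i * i) for i in range(10)]
serial = smooth_serial(grid)
decomposed = smooth_decomposed(grid, 3)
print(decomposed == serial)
```

The decomposed result matches the serial one because every interior cell sees the same three input values; only the ghost cells cross chunk boundaries, which is why communication volume scales with the surface of a subdomain rather than its volume.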

Teaching & Learning Methods

Blended delivery: interactive lectures (40%), coding workshops on the national cluster (50%), expert seminars (10%).

Assessment

Component	Weight	Details
Cluster labs	30%	Submission of working MPI/OpenMP/GPU exercises
Performance study	30%	Roofline + scaling analysis of an existing code
Capstone project	40%	Implement & optimise a solver or ML pipeline at scale, plus written report
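
The quantities behind a roofline and strong‑scaling analysis can be sketched in a few lines. All numbers below (peak rate, memory bandwidth, wall times) are hypothetical placeholders, not measurements from any real system, and the function names are illustrative assumptions.

```python
def roofline_gflops(arithmetic_intensity, peak_gflops, bandwidth_gbs):
    """Attainable performance: the lower of the compute and memory ceilings.

    arithmetic_intensity is in FLOP per byte moved to or from main memory.
    """
    return min(peak_gflops, arithmetic_intensity * bandwidth_gbs)


def strong_scaling(timings):
    """Return (processes, speedup, parallel efficiency) for a fixed problem size.

    timings maps a process count to a wall time in seconds; the smallest
    process count is taken as the baseline.
    """
    base_p = min(timings)
    base_t = timings[base_p]
    rows = []
    for p in sorted(timings):
        speedup = base_t / timings[p]
        efficiency = speedup * base_p / p
        rows.append((p, speedup, efficiency))
    return rows


# Hypothetical machine: 1000 GFLOP/s peak, 100 GB/s memory bandwidth.
PEAK_GFLOPS = 1000.0
BANDWIDTH_GBS = 100.0

memory_bound = roofline_gflops(0.25, PEAK_GFLOPS, BANDWIDTH_GBS)   # below the ridge
compute_bound = roofline_gflops(50.0, PEAK_GFLOPS, BANDWIDTH_GBS)  # above the ridge

# Hypothetical wall times for a fixed problem size.
rows = strong_scaling({1: 100.0, 2: 52.0, 4: 28.0, 8: 16.0})
for p, speedup, efficiency in rows:
    print(f"{p:>2} processes: speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```

With these placeholder values the ridge point sits at 10 FLOP/byte: kernels with lower arithmetic intensity are limited by memory bandwidth, so optimisation effort is better spent on data movement than on arithmetic.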

Prerequisites

Completion of “Basic Scientific Computing” or equivalent experience
Familiarity with Linux command line and version control
