FDD3258 HT20-1 Introduction to High Performance Computing

DD2356 Instructors – Who are we?

  • Dirk Pleiter - Course Responsible / Teacher / Examiner

What is this Course About?

  • Making effective use of the most powerful parallel computer systems for problems in science and engineering
  • Doing this requires paying attention to every part of the parallel system
  • It also requires a scientific and rigorous approach to performance

Intended Learning Outcomes

At the end of this course, you will be able to:

  • Describe and list HPC architecture design choices and features
  • Connect to the supercomputer, build and run codes on multicore systems and clusters
  • Use HPC tools to measure and analyze the performance of computer codes running on supercomputers
  • Design, implement and optimize a parallel engineering application for HPC systems using OpenMP and MPI
  • Discuss GPU architecture and its advantages
  • Program Nvidia GPUs with CUDA


Prerequisites

  • Knowledge of Linux (supercomputers run a Linux OS)
  • Good knowledge of C/C++ or Fortran
  • Knowledge of command line editors (emacs, vi, ...)
    • Using an IDE is fine, but you will need to move your code back and forth between your laptop and the supercomputers

Course Organization: Lectures

Lectures will be held online only

Four main modules:

  1. Performance Analysis & Engineering
  2. Shared Memory Programming with OpenMP
  3. Distributed Memory Programming with MPI
  4. GPU Architecture and GPU programming with CUDA
  • Slides, articles to read, and code will be available online in Canvas
  • Lectures will be given online and made available in Canvas

Course Organization: Assignments

There will be 4 assignments, each graded P/F:

  1. Performance Analysis & Engineering
  2. Programming with OpenMP
  3. Programming with MPI
  4. GPU Architecture and Programming with CUDA

The assignments consist of:

  • Submission of a short report as a PDF file via Canvas
  • Submission of a GitHub repository with the code for the assignment

Course Organization: Final Project = Exam

The submission of a final project report and code, graded A-F, is required to pass the course.

The final project consists of 3 submissions:

  1. Code Specification Document (P/F)
  2. Report on Initial Prototype / Results (P/F)
  3. Final Report and Code Submission (A-F)

The topic of the project:

  • Implement and optimize a parallel version of the 2D Heat Equation Solver on a Cartesian Grid with OpenMP and MPI
  • Complete performance and scaling analysis
  • Study the propagation of idle periods between MPI processes

Course Organization: Grading Criteria

To pass the course, all four assignments and the final project need to be completed


The final grade will be determined by:

  • The correctness of approach, design, implementation, and optimization
  • Quality of the final report in terms of clarity, organization, and readability
  • Critical analysis of performance and scaling results
  • A detailed grading rubric for the project, covering features, optimization, and report quality, will be provided later

Course Organization: Computer Resources

  • You will be given access to the PDC Tegner cluster with a compute-time allocation
  • For the first two modules and their assignments, you can use any Linux machine

Course Material and Textbooks

  • Lecture slides, suggested readings and additional material will be progressively posted in Canvas during the course.
  • Lectures and materials are adapted from Bill Gropp’s teaching material at UIUC
      • Designing and Building Applications for Extreme-Scale Systems

A textbook covering all the topics presented in this course is High-Performance Computing: Modern Systems and Practices by Thomas Sterling, Matthew Anderson, and Maciej Brodowicz.

On-line copy for KTH students available here (you need to log in with your KTH account).

A recent good book on OpenMP's advanced features is Using OpenMP - The Next Step by Ruud van der Pas, Eric Stotzer, and Christian Terboven.

Excellent books about MPI:

  • Using MPI: Portable Parallel Programming with the Message-Passing Interface, William Gropp, Ewing Lusk, and Anthony Skjellum, available to KTH students here
  • Using Advanced MPI: Modern Features of the Message-Passing Interface, William Gropp, Torsten Hoefler, and Rajeev Thakur, available to KTH students here

Books about GPU programming:

  • CUDA for Engineers by Duane Storti and Mete Yurtoglu
  • Programming Massively Parallel Processors by David Kirk and Wen-mei W. Hwu


Tentative Course Outline I

All assignments will be due on 2021-09-24.

  • HPC Architecture (1st Assignment Due)
  • Shared Memory Programming with OpenMP (2nd Assignment Due)
  • Programming with MPI (3rd Assignment Due)
  • GPU Architecture and Programming GPUs (Final Assignment Due)
