In this first assignment of the course, we would like to challenge your knowledge of GPU architecture. The goal is to make sure that you understand some of the main concepts that motivate the use of graphics cards today, including why these devices are massively parallel and how their memory hierarchy differs from that of a CPU.
To submit your assignment, please prepare and upload a PDF document that answers all the questions asked below. You must name the file following this format:
The assignment is individual, but students are encouraged to discuss these topics with their classmates and also join the discussion on Canvas. The idea is that we all have fun together and learn the most about programming GPUs!
Questions about GPU Architecture
- Why did GPUs emerge as suitable hardware for computer graphics (e.g., games)?
- Why do we talk about a throughput-oriented architecture when we talk about GPUs?
- List the main differences between GPUs and CPUs in terms of architecture.
- Use the Internet to find and list the number of SMs, the number of cores per SM, the clock frequency, and the memory size of the NVIDIA GPU that you plan to use during the course. It might be the GPU of your laptop / workstation, or one of the GPUs on Tegner (NVIDIA Quadro K420 or NVIDIA Tesla K80). Please make sure that you also mention the specific model.
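Once you have found the datasheet numbers, a common sanity check is to derive the theoretical peak memory bandwidth from the memory clock, bus width, and transfer rate. The sketch below uses hypothetical placeholder values (a 900 MHz DDR memory clock on a 128-bit bus), not the specs of any particular card; substitute the numbers you find for your own GPU.

```shell
#!/bin/sh
# Estimate theoretical peak memory bandwidth in GB/s:
#   clock (Hz) * bus width (bytes) * transfers per clock / 1e9
# The values below are hypothetical placeholders, not real GPU specs.
MEM_CLOCK_MHZ=900      # memory clock in MHz (placeholder)
BUS_WIDTH_BITS=128     # memory bus width in bits (placeholder)
RATE=2                 # transfers per clock (2 for DDR-type memory)

# Use awk for floating-point arithmetic, since sh only does integers
awk -v c="$MEM_CLOCK_MHZ" -v w="$BUS_WIDTH_BITS" -v r="$RATE" \
    'BEGIN { printf "%.1f GB/s\n", c * 1e6 * (w / 8) * r / 1e9 }'
```

Running this with the placeholder values prints 28.8 GB/s; comparing such an estimate against the measured bandwidthTest results later in the assignment shows how far real transfers fall from the theoretical peak.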
We have learned that one of the current weaknesses of GPU programming is the link between the host and device (GPU) memories. Measure the bandwidth of host-to-device, device-to-host, and device-to-device transfers on Tegner using the bandwidthTest utility.
- Important: Due to the scheduled maintenance period on Tegner between November 1st and November 2nd, the cluster might not be accessible during the laboratory session. Moreover, we might experience issues that prevent us from using the cluster during the following days. If that is the case, we will ask you to obtain the bandwidth information from the Internet instead.
Use Google Scholar to find a scientific paper reporting on work that uses GPUs in your main domain area (HPC, image processing, machine learning, ...). Report the title, the authors, the conference name / journal, the GPU type that was used, and the programming approach that was employed.
Instructions for measuring bandwidth on Tegner
To connect to Tegner, use SSH with the username assigned to you by the PDC Supercomputing Center. You must first request a Kerberos ticket:
kinit --forwardable your_username@NADA.KTH.SE
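Once the ticket has been granted, open the SSH session to the login node. The hostname below is the Tegner login node name used in the PDC documentation; replace your_username with your own account name:

```shell
# Connect to the Tegner login node (hostname as given in the PDC docs)
ssh your_username@tegner.pdc.kth.se
```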
Important note: If you are using one of the computers in the laboratory room, do not forget to use pdc-kinit and pdc-ssh instead. More information can be found here.
Change the current directory to your Klemming folder and copy the bandwidthTest utility from the CUDA SDK examples:
cp -rf /pdc/vol/cuda/cuda-8.0/samples/1_Utilities/bandwidthTest ./bandwidthTest
To compile the bandwidth test, you have to load the GNU Compiler and CUDA modules, and compile the "bandwidthTest.cu" file using nvcc (do not use the Makefile that is provided inside the folder!):
module load gcc/4.9.2 cuda/8.0
nvcc -arch=sm_30 -I/pdc/vol/cuda/cuda-8.0/samples/common/inc bandwidthTest.cu -o bandwidthTest
The last step is to allocate a node with a GPU on Tegner and use the srun command to execute the bandwidth test:
salloc --nodes=1 --gres=gpu:K420:1 -t 00:05:00 -A edu17.dd2360
srun -n 1 ./bandwidthTest
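By default, bandwidthTest runs a quick test with pageable host memory. If you want to explore further, the utility accepts a few options; the flags below come from the CUDA samples version of the tool, and you can run ./bandwidthTest --help on Tegner to confirm they are available in your build:

```shell
# Measure using pinned (page-locked) host memory, which typically
# achieves higher host-device bandwidth than the default pageable memory
srun -n 1 ./bandwidthTest --memory=pinned

# Sweep a range of transfer sizes (shmoo mode) to see how the achieved
# bandwidth varies with the size of the copied buffer
srun -n 1 ./bandwidthTest --mode=shmoo
```

Comparing the pinned and pageable results is a good way to see the cost of the extra staging copy that pageable transfers require.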
If there is an active reservation, you can add it to the salloc command as well. Note that we are asking for 5 minutes of computation time on one single node and that we are specifying that we want to get access to the GPU resource of the node with the --gres=gpu:K420:1 option. Check the Canvas pages "Introduction to PDC environment" and "Reserved allocation time on Tegner" for additional details on how to connect and run jobs on Tegner.
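For example, if a reservation is active, the allocation command would look like the following; dd2360-lab1 is a hypothetical reservation name used only for illustration, so use the actual name announced on Canvas:

```shell
# --reservation=dd2360-lab1 is a placeholder; substitute the real
# reservation name announced for your laboratory session
salloc --nodes=1 --gres=gpu:K420:1 -t 00:05:00 -A edu17.dd2360 --reservation=dd2360-lab1
```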
Please request a node with salloc only when your code compiles without errors and you are ready to run your program on Tegner. After you finish executing, and if you are not going to run anything for some time, type exit to release your allocation and allow other students to get quick access to the cluster. This way, we will share the resources efficiently and everyone will be able to run immediately.