GPU Programming with Julia - course at the Swiss National Supercomputing Centre (CSCS), ETH Zurich

Overview

Course Description

The Julia programming language is increasingly adopted in High Performance Computing (HPC) because it combines performance with simplicity and interactivity, enabling unprecedented productivity in HPC development. This course covers both basic and advanced topics relevant to single- and multi-GPU computing with Julia. It focuses on the CUDA.jl package, which enables writing native Julia code for GPUs. Topics covered include the following:

  • GPU array programming;
  • GPU kernel programming;
  • kernel launch parameters;
  • usage of on-chip memory;
  • multi-GPU computing;
  • code reflection and introspection; and
  • various advanced optimization techniques.

This course combines lectures and hands-on sessions.

Target audience

This course is aimed at scientists interested in doing HPC with Julia. No prior knowledge of Julia or GPU computing is required, but a good general understanding of programming is advantageous.

Instructors

  • Dr. Tim Besard (Lead developer of CUDA.jl, Julia Computing Inc.)
  • Dr. Samuel Omlin (Computational Scientist | Responsible for Julia computing, CSCS)

Course material

This Git repository contains the material for days 1 and 2 (speaker: Dr. Samuel Omlin, CSCS). The material for days 3 and 4 (speaker: Dr. Tim Besard, Julia Computing Inc.) is found in a separate Git repository.

Course recording

The edited course recording is found here. The following list provides key entry points into the video.

Day 1:

00:00:00: Introduction to the course

00:05:02: General introduction to supercomputing

00:14:06: High-speed introduction to GPU computing

00:32:57: Walk-through of the introduction notebook on memory copy and performance evaluation

Day 2:

01:24:53: Introduction to day 2

01:39:12: Walk-through of the solutions to exercises 1 and 2 (data "transfer" optimisations)

02:34:12: Walk-through of the solutions to exercises 3 and 4 (data "transfer" optimisations and distributed parallelization)

Day 3:

03:31:57: Introduction to day 3

03:32:59: Presentation of notebook 1: CUDA libraries

04:24:31: Presentation of notebook 2: programming models

05:30:46: Presentation of notebook 3: memory management

06:03:48: Presentation of notebook 4: concurrent computing

Day 4:

06:27:15: Introduction to day 4

06:28:13: Presentation of notebook 5: application analysis and optimisation

07:35:08: Presentation of notebook 6: kernel analysis and optimisation

Owner

Samuel Omlin, Computational Scientist | Responsible for Julia computing, CSCS - Swiss National Supercomputing Centre