Basic repository showing how to use Hydra + Hydra launchers on a SLURM cluster

Overview

Slurm-Hydra-Submitit

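This repository is a minimal working example of how to set up Hydra, launch jobs on a SLURM cluster with the Hydra submitit launcher, and launch arrays of jobs on the cluster.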

Set up Hydra

⚠️ You need to install hydra-core for this step.

Hydra is fairly easy to set up.

By simply running python slurm_hydra_submitit/script.py, you'll see how the main function takes the arguments from the configuration file and passes them to the underlying functions.
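As a rough illustration of that flow, the sketch below mimics what such a script does using only the standard library. It is not the repository's script.py: the config values, the train and log_project helpers, and the use of SimpleNamespace are all assumptions made for illustration. In a real Hydra script, the config object is built by Hydra from the configuration file and handed to a main function decorated with @hydra.main.

```python
from types import SimpleNamespace

# Stand-in for the config Hydra would load from the configuration file.
# The keys project_name and train.epochs mirror the overrides used later
# in this README; the values are placeholders.
cfg = SimpleNamespace(
    project_name="P1",
    train=SimpleNamespace(epochs=30),
)


def train(epochs: int) -> str:
    """Hypothetical underlying function that receives one config value."""
    return f"training for {epochs} epochs"


def log_project(name: str) -> str:
    """Hypothetical underlying function that receives another config value."""
    return f"project: {name}"


def main(cfg: SimpleNamespace) -> list[str]:
    # main only unpacks the config and forwards values to plain functions,
    # which keeps those functions independent of the config system.
    return [log_project(cfg.project_name), train(cfg.train.epochs)]


if __name__ == "__main__":
    for line in main(cfg):
        print(line)
```

Keeping the underlying functions free of any config object makes them easy to call and test outside Hydra.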

Launch jobs on a SLURM cluster with Hydra submitit launcher

Launch a job on the cluster

⚠️ You need to install hydra-submitit-launcher for this step.

Now that our Hydra configuration is set up, we want to run the job on a SLURM cluster instead of our local computer. For that, we need to:

  • specify the hydra launcher to work on the SLURM cluster
  • specify the hardware specifications for the SLURM job

Once you are connected to your SLURM cluster's scheduler node and have installed hydra-submitit-launcher, you can already launch jobs on the cluster with:

python slurm_hydra_submitit/script.py --multirun hydra/launcher=submitit_slurm

To test locally before sending to the cluster, you can switch the hydra/launcher argument to submitit_local.

Adapt node parameters

You can easily adapt the SLURM parameters by overriding the SLURM launcher arguments on the command line.

For example, the following command requests 10 CPUs per task:

python slurm_hydra_submitit/script.py --multirun hydra/launcher=submitit_slurm hydra.launcher.cpus_per_task=10

Launch array of jobs on the SLURM cluster

Grid Search

You can launch multiple jobs at once by passing comma-separated values for one or more parameters in the launch command.

For example, the following command launches 4 jobs, which correspond to all possible combinations of the two project_name values and the two train.epochs values.

python slurm_hydra_submitit/script.py --multirun hydra/launcher=submitit_slurm project_name=P1,P2 train.epochs=30,40
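To make the count of 4 concrete, the sketch below enumerates the same combinations with the standard library. It only mirrors the Cartesian product that a grid sweep produces; it is not Hydra's implementation, and the expand_grid helper is a hypothetical name introduced here.

```python
from itertools import product

# Sweep values taken from the command above.
sweep = {
    "project_name": ["P1", "P2"],
    "train.epochs": [30, 40],
}


def expand_grid(sweep: dict[str, list]) -> list[dict]:
    """Return one dict of overrides per combination of sweep values."""
    keys = list(sweep)
    return [dict(zip(keys, values)) for values in product(*sweep.values())]


jobs = expand_grid(sweep)
for job in jobs:
    print(job)
```

The number of jobs is the product of the number of values per parameter, so adding a third parameter with three values would raise the total to 12.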

Specific Parameter Combinations

Alternatively, you can pass sets of parameters to test together:

python slurm_hydra_submitit/script.py --multirun hydra/launcher=submitit_slurm +compile="{project_name:P1,train.epochs:30}, {project_name:P2,train.epochs:40}"

To make this command easier to read, we can move the parameter sets into a bash script similar to this:

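#!/bin/bash
# One quoted entry per parameter set; every entry except the last ends with a comma.
params=(
    '{project_name:P1,train.epochs:10},'
    '{project_name:P2,train.epochs:20}'
)

# "${params[*]}" joins the entries with spaces into a single argument.
python slurm_hydra_submitit/script.py --multirun hydra/launcher=submitit_slurm +compile="${params[*]}"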
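
This README does not show how script.py consumes the compile entry. One plausible approach, sketched below with plain dicts standing in for the Hydra config, is to treat each set as dotted-key overrides applied to a base config, which gives one job per set (two here, rather than the four a full grid would produce). The base_cfg values and the apply_overrides helper are hypothetical and are not taken from the repository.

```python
import copy

# Placeholder base config; the real one lives in the Hydra configuration file.
base_cfg = {"project_name": "default", "train": {"epochs": 1}}

# The two parameter sets from the bash script above.
param_sets = [
    {"project_name": "P1", "train.epochs": 10},
    {"project_name": "P2", "train.epochs": 20},
]


def apply_overrides(cfg: dict, overrides: dict) -> dict:
    """Return a copy of cfg with each dotted key set to its override value."""
    result = copy.deepcopy(cfg)
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            # Walk down the nested dicts, creating levels that are missing.
            node = node.setdefault(name, {})
        node[leaf] = value
    return result


# One resolved config per parameter set; base_cfg itself is left untouched.
job_cfgs = [apply_overrides(base_cfg, params) for params in param_sets]
for job_cfg in job_cfgs:
    print(job_cfg)
```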

Owner

Raphael Meudec, PhD candidate @Parietal-INRIA