The starter repository for submissions to the GeneDisco challenge for optimized experimental design in genetic perturbation experiments

Overview

GeneDisco ICLR-22 Challenge Starter Repository

Python version Library version

The starter repository for submissions to the GeneDisco challenge for optimized experimental design in genetic perturbation experiments.

GeneDisco (to be published at ICLR-22) is a benchmark suite for evaluating active learning algorithms for experimental design in drug discovery. GeneDisco contains a curated set of multiple publicly available experimental data sets as well as open-source implementations of state-of-the-art active learning policies for experimental design and exploration.

Install

pip install -r requirements.txt

Use

Setup

  • Create a cache directory. This will hold any preprocessed and downloaded datasets for faster future invocation.
    • $ mkdir /path/to/genedisco_cache
    • Replace the above with your desired cache directory location.
  • Create an output directory. This will hold all program outputs and results.
    • $ mkdir /path/to/genedisco_output
    • Replace the above with your desired output directory location.

How to Run the Full Benchmark Suite?

Experiments (all baselines, acquisition functions, input and target datasets, multiple seeds) included in GeneDisco can be executed sequentially for e.g. acquired batch size 64, 8 cycles and a bayesian_mlp model using:

run_experiments \
  --cache_directory=/path/to/genedisco_cache  \
  --output_directory=/path/to/genedisco_output  \
  --acquisition_batch_size=64  \
  --num_active_learning_cycles=8  \
  --max_num_jobs=1

Results are written to the folder at /path/to/genedisco_cache, and processed datasets will be cached at /path/to/genedisco_cache (please replace both with your desired paths) for faster startup in future invocations.

Note that due to the number of experiments being run by the above command, we recommend execution on a compute cluster.
The GeneDisco codebase also supports execution on slurm compute clusters (the slurm command must be available on the executing node) using the following command and using dependencies in a Python virtualenv available at /path/to/your/virtualenv (please replace with your own virtualenv path):

run_experiments \
  --cache_directory=/path/to/genedisco_cache  \
  --output_directory=/path/to/genedisco_output  \
  --acquisition_batch_size=64  \
  --num_active_learning_cycles=8  \
  --schedule_on_slurm \
  --schedule_children_on_slurm \
  --remote_execution_virtualenv_path=/path/to/your/virtualenv

Other scheduling systems are currently not supported by default.

How to Run A Single Isolated Experiment (One Learning Cycle)?

To run one active learning loop cycle, for example, with the "topuncertain" acquisition function, the "achilles" feature set and the "schmidt_2021_ifng" task, execute the following command:

active_learning_loop  \
    --cache_directory=/path/to/genedisco/genedisco_cache \
    --output_directory=/path/to/genedisco/genedisco_output \
    --model_name="bayesian_mlp" \
    --acquisition_function_name="topuncertain" \
    --acquisition_batch_size=64 \
    --num_active_learning_cycles=8 \
    --feature_set_name="achilles" \
    --dataset_name="schmidt_2021_ifng" 

How to Evaluate a Custom Acquisition Function?

To run a custom acquisition function, set --acquisition_function_name="custom" and --acquisition_function_path to the file path that contains your custom acquisition function (e.g. main.py in this repo).

active_learning_loop  \
    --cache_directory=/path/to/genedisco/genedisco_cache \
    --output_directory=/path/to/genedisco/genedisco_output \
    --model_name="bayesian_mlp" \
    --acquisition_function_name="custom" \
    --acquisition_function_path=/path/to/src/main.py \
    --acquisition_batch_size=64 \
    --num_active_learning_cycles=8 \
    --feature_set_name="achilles" \
    --dataset_name="schmidt_2021_ifng" 

...where "/path/to/custom_acquisition_function.py" contains code for your custom acquisition function corresponding to the BaseBatchAcquisitionFunction interface, e.g.:

import numpy as np
from typing import AnyStr, List
from slingpy import AbstractDataSource
from slingpy.models.abstract_base_model import AbstractBaseModel
from genedisco.active_learning_methods.acquisition_functions.base_acquisition_function import \
    BaseBatchAcquisitionFunction

class RandomBatchAcquisitionFunction(BaseBatchAcquisitionFunction):
    def __call__(self,
                 dataset_x: AbstractDataSource,
                 batch_size: int,
                 available_indices: List[AnyStr], 
                 last_selected_indices: List[AnyStr] = None, 
                 model: AbstractBaseModel = None,
                 temperature: float = 0.9,
                 ) -> List:
        selected = np.random.choice(available_indices, size=batch_size, replace=False)
        return selected

Note that the last class implementing BaseBatchAcquisitionFunction is loaded by GeneDisco if there are multiple valid acquisition functions present in the loaded file.

Submission instructions

For submission, you will need two things:

Please note that all your submitted code must either be loaded via a dependency in requirements.txt or be present in the src/ directory in this starter repository for the submission to succeed.

Once you have set up your submission environment, you will need to create a lightweight container image that contains your acquisition function.

Submission steps

  • Navigate to the directory to which you have cloned this repo to.
    • $ cd /path/to/genedisco-starter
  • Ensure you have ONE acquisition function (inheriting from BaseBatchAcquisitionFunction) in main.py
    • This is your pre-defined program entry point.
  • Build your container image
    • $ docker build -t submission:latest .
  • Save your image name to a shell variable
    • $ IMAGE="submission:latest"
  • Use the EvalAI-CLI command to submit your image
    • Run the following command to submit your container image:
      • $ evalai push $IMAGE --phase gsk-genedisco-challenge-1528
      • Please note that you have a maximum number of submissions that any submission will be counted against.

That’s it! Our pipeline will take your image and test your function.

If you have any questions or concerns, please reach out to us at [email protected]

Citation

Please consider citing, if you reference or use our methodology, code or results in your work:

@inproceedings{mehrjou2022genedisco,
    title={{GeneDisco: A Benchmark for Experimental Design in Drug Discovery}},
    author={Mehrjou, Arash and Soleymani, Ashkan and Jesson, Andrew and Notin, Pascal and Gal, Yarin and Bauer, Stefan and Schwab, Patrick},
    booktitle={{International Conference on Learning Representations (ICLR)}},
    year={2022}
}

License

License

Authors

Arash Mehrjou, GlaxoSmithKline plc
Jacob A. Sackett-Sanders, GlaxoSmithKline plc
Patrick Schwab, GlaxoSmithKline plc

Acknowledgements

PS, JSS and AM are employees and shareholders of GlaxoSmithKline plc.

Mad-cookiecutter - Cookiecutter templates for MaD projects

MaD Cookiecutter Templates A set of templates that can be used to quickly get st

Machine Learning and Data Analytics Lab FAU 1 Jan 10, 2022
Boilerplate for starting a python project

Python Project Boilerplate Simple boilerplate for starting a python proect. Using the repo Follow following steps to install client on server Create a

Prajwal Dahal 1 Nov 19, 2021
King is a simple boilerplate from a bigger Discord Bot project created for my Discord Server.

King A simple Discord bot boilerplate. King is a simple boilerplate from a bigger Discord Bot project created for my Discord Server. I intend to showc

Xminent 0 Aug 21, 2021
Cookiecutter-allpurpose-minimal-python - A simple cookiecutter template for general-purpose python projects.

cookiecutter-allpurpose-minimal-python A simple cookiecutter template for general-purpose python projects. To use, run pip install cookiecutter cookie

E. Tolga Ayan 2 Jan 24, 2022
Django starter project with 🔋

A batteries-included Django starter project. For a production-ready version see the book Django for Professionals. 🚀 Features Django 3.1 & Python 3.8

William Vincent 1.5k Jan 08, 2023
Template to quickly start your playwright-python project

Playwright-python template 🍪 Template to quickly start your playwright-python project Getting started • Demo • Configuration Getting started Clone th

Constantin 1 Dec 13, 2021
Cookiecutter Django is a framework for jumpstarting production-ready Django projects quickly.

Cookiecutter Django Powered by Cookiecutter, Cookiecutter Django is a framework for jumpstarting production-ready Django projects quickly. Documentati

Daniel Roy Greenfeld 10k Jan 01, 2023
The starter for the Flask React project

Flask React Project This is the starter for the Flask React project. Getting started Clone this repository (only this branch) git clone https://github

Parker Bolick 2 May 14, 2022
A framework for launching new Django Rest Framework projects quickly.

DRFx A framework for launching new Django Rest Framework projects quickly. Comes with a custom user model, login/logout/signup, social authentication

William Vincent 400 Dec 29, 2022
Um template para quem quiser usar o Docker + PGSQL + Django.

Um template para quem quiser usar o Docker + PGSQL + Django.

Drack 2 Mar 11, 2022
This is a boilerplate for a basic backend app using Python, Django and SQLite, as developed after tutorials with Programming with Mosh

This is a boilerplate for a basic backend app using Python, Django and SQLite, as developed after tutorials with Programming with Mosh

Gustavo Catala Sverdrup 1 Jan 07, 2022
A low dependency and really simple to start project template for Python Projects.

Python Project Template A low dependency and really simple to start project template for Python Projects. HOW TO USE THIS TEMPLATE DO NOT FORK this is

Yurii Dubinka 5 Jan 21, 2022
A simple cookiecutter to create Python Telegram bots, wrapped with Django.

PTB Django cookiecutter A simple cookiecutter to create Python Telegram bots, wrapped with Django. Based on this cool projects python-telegram-bot (PT

Carlos Lugones 20 Nov 12, 2022
Cookiecutter to create a Google Function. Powered by Poetry, GitHub actions, and Google Cloud Platform

Cookiecutter Google Function Cookiecutter template for a Google Function. Powered by Poetry, and GitHub actions. Quickstart Install the latest Cookiec

Arthur 1 Jan 07, 2022
simple flask starter app utilizing docker

Simple flask starter app utilizing docker to showcase seasonal anime using jikanpy (myanimelist unofficial api).

Kennedy Ngugi Mwaura 5 Dec 15, 2021
Django sample app with users including social auth via Django-AllAuth

demo-allauth-bootstrap Simple, out-of-the-box Django all-auth demo app A "brochure" or visitor (no login required) area A members-only (login required

Andrew E 215 Dec 20, 2022
Django project/application starter for lazybones :)

Django Project Starter Template My custom project starter for Django! I’ll try to support every upcoming Django releases as much as I can! Requirement

Uğur Özyılmazel 40 Jul 16, 2022
A template repo for use in the Advent of Code

AoC-Template A template repo for use in the Advent of Code The README_template.md must contain "STATS_TABLE" to be replaced by the generated table, an

0 Jan 14, 2022
A Boilerplate repo for Scientific Python Open Science projects

A Boilerplate repo for Scientific Python Open Science projects Installation Clone this repo If you need a fresh python environment, run $ conda env cr

Vincent Choqueuse 2 Dec 23, 2021
Basic Docker Compose template application with Flask, Celery, Redis, MySQL, SocketIO, Nginx and Gunicorn.

Nginx / Gunicorn / Flask 🐍 / Celery / SocketIO / MySQL / Redis / Docker 🐳 sample application Basic Docker Compose template application for orchestat

Alex Oarga 8 Aug 06, 2022