Tandem Mass Spectrum Prediction with Graph Transformers

Overview

MassFormer

This is the original implementation of MassFormer, a graph transformer for small molecule MS/MS prediction. Check out the preprint on arxiv.

Setting Up Environment

We recommend using conda. Three conda yml files are provided in the env/ directory (cpu.yml, cu101.yml, cu102.yml), providing different pytorch installation options (CPU-only, CUDA 10.1, CUDA 10.2). They can be trivially modified to support other versions of CUDA.

To set up an environment, run the command conda env create -f ${CONDA_YAML}, where ${CONDA_YAML} is the path to the desired yaml file.

Downloading NIST Data

Note: this step requires a Windows System or Virtual Machine

The NIST 2020 LC-MS/MS dataset can be purchased from an authorized distributor. The spectra and associated compounds can be exported to MSP/MOL format using the included lib2nist software. There is a single MSP file which contains all of the mass spectra, and multiple MOL files which include the molecular structure information for each spectrum (linked by ID). We've included a screenshot describing the lib2nist export settings.

Alt text

There is a minor bug in the export software that sometimes results in errors when parsing the MOL files. To fix this bug, run the script python mol_fix.py ${MOL_DIR}, where ${MOL_DIR} is a path to the NIST export directory with MOL files.

Downloading Massbank Data

The MassBank of North America (MB-NA) data is in MSP format, with the chemical information provided in the form of a SMILES string (as opposed to a MOL file). It can be downloaded from the MassBank website, under the tab "LS-MS/MS Spectra".

Exporting and Preparing Data

We recommend creating a directory called data/ and placing the downloaded and uncompressed data into a folder data/raw/.

To parse both of the datasets, run parse_and_export.py. Then, to prepare the data for model training, run prepare_data.py. By default the processed data will end up in data/proc/.

Setting Up Weights and Biases

Our implementation uses Weights and Biases (W&B) for logging and visualization. For full functionality, you must set up a free W&B account.

Training Models

A default config file is provided in "config/template.yml". This trains a MassFormer model on the NIST HCD spectra. Our experiments used systems with 32GB RAM, 1 Nvidia RTX 2080 (11GB VRAM), and 6 CPU cores.

The config/ directory has a template config file template.yml and 8 files corresponding to the experiments from the paper. The template config can be modified to train models of your choosing.

To train a template model without W&B with only CPU, run python runner.py -w False -d -1

To train a template model with W&B on CUDA device 0, run python runner.py -w True -d 0

Reproducing Tables

To reproduce a model from one of the experiments in Table 2 or Table 3 from the paper, run python runner.py -w True -d 0 -c ${CONFIG_YAML} -n 5 -i ${RUN_ID}, where ${CONFIG_YAML} refers to a specific yaml file in the config/ directory and ${RUN_ID} refers to an arbitrary but unique integer ID.

Reproducing Visualizations

The explain.py script can be used to reproduce the visualizations in the paper, but requires a trained model saved on W&B (i.e. by running a script from the previous section).

To reproduce a visualization from Figures 2,3,4,5, run python explain.py ${WANDB_RUN_ID} --wandb_mode=online, where ${WANDB_RUN_ID} is the unique W&B run id of the desired model's completed training script. The figues will be uploaded as PNG files to W&B.

Reproducing Sweeps

The W&B sweep config files that were used to select model hyperparameters can be found in the sweeps/ directory. They can be initialized using wandb sweep ${PATH_TO_SWEEP}.

Owner
Röst Lab
Röst lab at U of T -- join us at https://gitter.im/Roestlab/Lobby
Röst Lab
[RSS 2021] An End-to-End Differentiable Framework for Contact-Aware Robot Design

DiffHand This repository contains the implementation for the paper An End-to-End Differentiable Framework for Contact-Aware Robot Design (RSS 2021). I

Jie Xu 60 Jan 04, 2023
Two-Stream Adaptive Graph Convolutional Networks for Skeleton-Based Action Recognition in CVPR19

2s-AGCN Two-Stream Adaptive Graph Convolutional Networks for Skeleton-Based Action Recognition in CVPR19 Note PyTorch version should be 0.3! For PyTor

LShi 547 Dec 26, 2022
Code release for NeX: Real-time View Synthesis with Neural Basis Expansion

NeX: Real-time View Synthesis with Neural Basis Expansion Project Page | Video | Paper | COLAB | Shiny Dataset We present NeX, a new approach to novel

536 Dec 20, 2022
Self-Supervised Learning for Domain Adaptation on Point-Clouds

Self-Supervised Learning for Domain Adaptation on Point-Clouds Introduction Self-supervised learning (SSL) allows to learn useful representations from

Idan Achituve 66 Dec 20, 2022
Fast, general, and tested differentiable structured prediction in PyTorch

Fast, general, and tested differentiable structured prediction in PyTorch

HNLP 1.1k Dec 16, 2022
This is an official pytorch implementation of Fast Fourier Convolution.

Fast Fourier Convolution (FFC) for Image Classification This is the official code of Fast Fourier Convolution for image classification on ImageNet. Ma

pkumi 199 Jan 03, 2023
You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling

You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling Transformer-based models are widely used in natural language processi

Zhanpeng Zeng 12 Jan 01, 2023
Unofficial Tensorflow 2 implementation of the paper Implicit Neural Representations with Periodic Activation Functions

Siren: Implicit Neural Representations with Periodic Activation Functions The unofficial Tensorflow 2 implementation of the paper Implicit Neural Repr

Seyma Yucer 2 Jun 27, 2022
PyTorch code for our paper "Image Super-Resolution with Non-Local Sparse Attention" (CVPR2021).

Image Super-Resolution with Non-Local Sparse Attention This repository is for NLSN introduced in the following paper "Image Super-Resolution with Non-

143 Dec 28, 2022
Deepparse is a state-of-the-art library for parsing multinational street addresses using deep learning

Here is deepparse. Deepparse is a state-of-the-art library for parsing multinational street addresses using deep learning. Use deepparse to Use the pr

GRAAL/GRAIL 192 Dec 20, 2022
IGCN : Image-to-graph convolutional network

IGCN : Image-to-graph convolutional network IGCN is a learning framework for 2D/3D deformable model registration and alignment, and shape reconstructi

Megumi Nakao 7 Oct 27, 2022
Learning to Communicate with Deep Multi-Agent Reinforcement Learning in PyTorch

Learning to Communicate with Deep Multi-Agent Reinforcement Learning This is a PyTorch implementation of the original Lua code release. Overview This

Minqi 297 Dec 12, 2022
FairMOT for Multi-Class MOT using YOLOX as Detector

FairMOT-X Project Overview FairMOT-X is a multi-class multi object tracker, which has been tailored for training on the BDD100K MOT Dataset. It makes

Jonathan Tan 33 Dec 28, 2022
Pytorch implementation of paper Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data

Pytorch implementation of paper Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data

Hrishikesh Kamath 31 Nov 20, 2022
基于Pytorch实现优秀的自然图像分割框架!(包括FCN、U-Net和Deeplab)

语义分割学习实验-基于VOC数据集 usage: 下载VOC数据集,将JPEGImages SegmentationClass两个文件夹放入到data文件夹下。 终端切换到目标目录,运行python train.py -h查看训练 (torch) Li Xiang 28 Dec 21, 2022

Bachelor's Thesis in Computer Science: Privacy-Preserving Federated Learning Applied to Decentralized Data

federated is the source code for the Bachelor's Thesis Privacy-Preserving Federated Learning Applied to Decentralized Data (Spring 2021, NTNU) Federat

Dilawar Mahmood 25 Nov 30, 2022
PyTorch implementation for View-Guided Point Cloud Completion

PyTorch implementation for View-Guided Point Cloud Completion

22 Jan 04, 2023
M3DSSD: Monocular 3D Single Stage Object Detector

M3DSSD: Monocular 3D Single Stage Object Detector Setup pytorch 0.4.1 Preparation Download the full KITTI detection dataset. Then place a softlink (or

mumianyuxin 64 Dec 27, 2022
Synthetic LiDAR sequential point cloud dataset with point-wise annotations

SynLiDAR dataset: Learning From Synthetic LiDAR Sequential Point Cloud This is official repository of the SynLiDAR dataset. For technical details, ple

78 Dec 27, 2022
Systematic generalisation with group invariant predictions

Requirements are Python 3, TensorFlow v1.14, Numpy, Scipy, Scikit-Learn, Matplotlib, Pillow, Scikit-Image, h5py, tqdm. Experiments were run on V100 GPUs (16 and 32GB).

Faruk Ahmed 30 Dec 01, 2022