Kaggle | 9th place (part of) solution for the Bristol-Myers Squibb – Molecular Translation challenge

Overview

Part of the 9th place solution for the Bristol-Myers Squibb – Molecular Translation challenge translating images containing chemical structures into InChI (International Chemical Identifier) texts.

This repo is partially based on the following resources:

Requirements

  • install and activate the conda environment
  • download and extract the data into /data/bms/
  • extract and move sample_submission_with_length.csv.gz into /data/bms/
  • tokenize training inputs: python datasets/prepocess2.py
  • if you want to use pseudo labeling, execute: python datasets/pseudo_prepocess2.py your_submission_file.csv
  • if you want to use external images, you can create with the following commands:
python r09_create_images_from_allowed_inchi.py
python datasets/extra_prepocess2.py 
  • and also install apex

Training

This repo supports training any VIT/SWIN/CAIT transformer models from timm as encoder together with the fairseq transformer decoder.

Here is an example configuration to train a SWIN swin_base_patch4_window12_384 as encoder and 12 layer 16 head fairseq decoder:

python -m torch.distributed.launch --nproc_per_node=N train.py --logdir=logdir/ \
    --pipeline --train-batch-size=50 --valid-batch-size=128 --dataload-workers-nums=10 --mixed-precision --amp-level=O2  \
    --aug-rotate90-p=0.5 --aug-crop-p=0.5 --aug-noise-p=0.9 --label-smoothing=0.1 \
    --encoder-lr=1e-3 --decoder-lr=1e-3 --lr-step-ratio=0.3 --lr-policy=step --optim=adam --lr-warmup-steps=1000 --max-epochs=20 --weight-decay=0 --clip-grad-norm=1 \
    --verbose --image-size=384 --model=swin_base_patch4_window12_384 --loss=ce --embed-dim=1024 --num-head=16 --num-layer=12 \
    --fold=0 --train-dataset-size=0 --valid-dataset-size=65536 --valid-dataset-non-sorted

For pseudo labeling, use --pseudo=pseudo.pkl. If you want subsample the pseudo dataset, use: --pseudo-dataset-size=448000. For using external images, use --extra (--extra-dataset-size=448000).

After training, you can also use Stochastic Weight Averaging (SWA) which gives a boost around 0.02:

python swa.py --image-size=384 --input logdir/epoch-17.pth,logdir/epoch-18.pth,logdir/epoch-19.pth,logdir/epoch-20.pth

Inference

Evaluation:

python -m torch.distributed.launch --nproc_per_node=N eval.py --mixed-precision --batch-size=128 swa_model.pth

Inference:

python -m torch.distributed.launch --nproc_per_node=N inference.py --mixed-precision --batch-size=128 swa_model.pth

Normalization with RDKit:

./normalize_inchis.sh submission.csv
Owner
Erdene-Ochir Tuguldur
Берлиний Техникийн Их Сургууль
Erdene-Ochir Tuguldur
Inference code for "StylePeople: A Generative Model of Fullbody Human Avatars" paper. This code is for the part of the paper describing video-based avatars.

NeuralTextures This is repository with inference code for paper "StylePeople: A Generative Model of Fullbody Human Avatars" (CVPR21). This code is for

Visual Understanding Lab @ Samsung AI Center Moscow 18 Oct 06, 2022
[CVPR 2022] Pytorch implementation of "Templates for 3D Object Pose Estimation Revisited: Generalization to New objects and Robustness to Occlusions" paper

template-pose Pytorch implementation of "Templates for 3D Object Pose Estimation Revisited: Generalization to New objects and Robustness to Occlusions

Van Nguyen Nguyen 92 Dec 28, 2022
Towards Rolling Shutter Correction and Deblurring in Dynamic Scenes (CVPR2021)

RSCD (BS-RSCD & JCD) Towards Rolling Shutter Correction and Deblurring in Dynamic Scenes (CVPR2021) by Zhihang Zhong, Yinqiang Zheng, Imari Sato We co

81 Dec 15, 2022
Keras community contributions

keras-contrib : Keras community contributions Keras-contrib is deprecated. Use TensorFlow Addons. The future of Keras-contrib: We're migrating to tens

Keras 1.6k Dec 21, 2022
Cartoon-StyleGan2 🙃 : Fine-tuning StyleGAN2 for Cartoon Face Generation

Fine-tuning StyleGAN2 for Cartoon Face Generation

Jihye Back 520 Jan 04, 2023
This repository contains the needed resources to build the HIRID-ICU-Benchmark dataset

HiRID-ICU-Benchmark This repository contains the needed resources to build the HIRID-ICU-Benchmark dataset for which the manuscript can be found here.

Biomedical Informatics at ETH Zurich 30 Dec 16, 2022
OneShot Learning-based hotword detection.

EfficientWord-Net Hotword detection based on one-shot learning Home assistants require special phrases called hotwords to get activated (eg:"ok google

ANT-BRaiN 102 Dec 25, 2022
CLEAR algorithm for multi-view data association

CLEAR: Consistent Lifting, Embedding, and Alignment Rectification Algorithm The Matlab, Python, and C++ implementation of the CLEAR algorithm, as desc

MIT Aerospace Controls Laboratory 30 Jan 02, 2023
YolactEdge: Real-time Instance Segmentation on the Edge

YolactEdge, the first competitive instance segmentation approach that runs on small edge devices at real-time speeds. Specifically, YolactEdge runs at up to 30.8 FPS on a Jetson AGX Xavier (and 172.7

Haotian Liu 1.1k Jan 06, 2023
Code repository for the paper "Doubly-Trained Adversarial Data Augmentation for Neural Machine Translation" with instructions to reproduce the results.

Doubly Trained Neural Machine Translation System for Adversarial Attack and Data Augmentation Languages Experimented: Data Overview: Source Target Tra

Steven Tan 1 Aug 18, 2022
Here is the diagnostic tool for BMVC 2021 paper Diagnosing Errors in Video Relation Detectors.

Here is the diagnostic tool for BMVC 2021 paper Diagnosing Errors in Video Relation Detectors. We provide a tiny ground truth file demo_gt.json, and t

Shuo Chen 3 Dec 26, 2022
GLODISMO: Gradient-Based Learning of Discrete Structured Measurement Operators for Signal Recovery

GLODISMO: Gradient-Based Learning of Discrete Structured Measurement Operators for Signal Recovery This is the code to the paper: Gradient-Based Learn

3 Feb 15, 2022
Official code implementation for "Personalized Federated Learning using Hypernetworks"

Personalized Federated Learning using Hypernetworks This is an official implementation of Personalized Federated Learning using Hypernetworks paper. [

Aviv Shamsian 121 Dec 25, 2022
KwaiRec: A Fully-observed Dataset for Recommender Systems (Density: Almost 100%)

KuaiRec: A Fully-observed Dataset for Recommender Systems (Density: Almost 100%) KuaiRec is a real-world dataset collected from the recommendation log

Chongming GAO (高崇铭) 70 Dec 28, 2022
Offical implementation for "Trash or Treasure? An Interactive Dual-Stream Strategy for Single Image Reflection Separation".

Trash or Treasure? An Interactive Dual-Stream Strategy for Single Image Reflection Separation (NeurIPS 2021) by Qiming Hu, Xiaojie Guo. Dependencies P

Qiming Hu 31 Dec 20, 2022
Pretrained Pytorch face detection (MTCNN) and recognition (InceptionResnet) models

Face Recognition Using Pytorch Python 3.7 3.6 3.5 Status This is a repository for Inception Resnet (V1) models in pytorch, pretrained on VGGFace2 and

Tim Esler 3.3k Jan 04, 2023
a practicable framework used in Deep Learning. So far UDL only provide DCFNet implementation for the ICCV paper (Dynamic Cross Feature Fusion for Remote Sensing Pansharpening)

UDL UDL is a practicable framework used in Deep Learning (computer vision). Benchmark codes, results and models are available in UDL, please contact @

Xiao Wu 11 Sep 30, 2022
Laplacian Score-regularized Concrete Autoencoders

Laplacian Score-regularized Concrete Autoencoders Requirements: torch = 1.9 scikit-learn = 0.24 omegaconf = 2.0.6 scipy = 1.6.0 matplotlib How to

JS 6 Dec 07, 2022
CAST: Character labeling in Animation using Self-supervision by Tracking

CAST: Character labeling in Animation using Self-supervision by Tracking (Published as a conference paper at EuroGraphics 2022) Note: The CAST paper c

15 Nov 18, 2022
[ICCV'21] Official implementation for the paper Social NCE: Contrastive Learning of Socially-aware Motion Representations

CrowdNav with Social-NCE This is an official implementation for the paper Social NCE: Contrastive Learning of Socially-aware Motion Representations by

VITA lab at EPFL 125 Dec 23, 2022