Codebase to experiment with a hybrid Transformer that combines conditional sequence generation with regression

Last update: Jan 05, 2023

Regression Transformer

Development setup

conda env create -f conda.yml
conda activate terminator
pip install -e .

Generate some data

Example data for QED can be generated using scripts/generate_example_data.py.

python scripts/generate_example_data.py examples/example.smi examples/qed_property_example.txt

If you need to create a new vocabulary for a dataset, you can use scripts/create_vocabulary.py. It also automatically adds some special tokens at the top of your vocabulary file.

python scripts/create_vocabulary.py examples/qed_property_example.txt examples/vocab.txt
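
To illustrate what "special tokens at the top" means, here is a minimal sketch of a vocabulary builder. It is not the code in scripts/create_vocabulary.py: the function name `build_vocabulary`, the special-token list, and the frequency-based ordering are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical special tokens; the real script may use a different set or order.
SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]


def build_vocabulary(token_lists, special_tokens=SPECIAL_TOKENS):
    """Return special tokens first, then unique tokens by descending frequency."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    # Sort by frequency, breaking ties alphabetically so the result is deterministic.
    regular = sorted(
        (t for t in counts if t not in special_tokens),
        key=lambda t: (-counts[t], t),
    )
    return list(special_tokens) + regular


# Two already-tokenized toy molecules.
vocab = build_vocabulary([["C", "Br"], ["C", "C", "O"]])
```

Writing one entry of `vocab` per line would give a file where a token's line number is its id, which is the usual convention for BERT-style vocabulary files.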

At this point, the folder containing the vocabulary file can be used to load an ExpressionBertTokenizer:

>>> from terminator.tokenization import ExpressionBertTokenizer
>>> tokenizer = ExpressionBertTokenizer.from_pretrained('examples')
>>> text = '<qed>0.3936|CBr'
>>> tokens = tokenizer.tokenize(text)
>>> print(tokens)
['<qed>', '_0_0_', '_._', '_3_-1_', '_9_-2_', '_3_-3_', '_6_-4_', '|', 'C', 'Br']
>>> token_indexes = tokenizer.convert_tokens_to_ids(tokenizer.tokenize(text))
>>> print(token_indexes)
[16, 17, 18, 28, 45, 34, 35, 19, 15, 63]
>>> tokenizer.build_inputs_with_special_tokens(token_indexes)
[12, 16, 17, 18, 28, 45, 34, 35, 19, 15, 63, 13]
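
The numeric tokens in the output above appear to pair each digit with its decimal position, for example `_3_-1_` for a 3 in the tenths place. The sketch below reproduces that pattern for a plain decimal string. It is an illustration inferred from the printed tokens, not the tokenizer's actual implementation, and `tokenize_number` is a hypothetical helper name.

```python
def tokenize_number(number):
    """Split a decimal string into tokens of the form _<digit>_<position>_."""
    integer_part, _, fractional_part = number.partition(".")
    tokens = []
    # Integer digits get non-negative positions, counting down to 0.
    for i, digit in enumerate(integer_part):
        tokens.append(f"_{digit}_{len(integer_part) - 1 - i}_")
    if fractional_part:
        # The decimal point is kept as its own token.
        tokens.append("_._")
        # Fractional digits get negative positions: -1, -2, ...
        for i, digit in enumerate(fractional_part, start=1):
            tokens.append(f"_{digit}_-{i}_")
    return tokens


print(tokenize_number("0.3936"))
# ['_0_0_', '_._', '_3_-1_', '_9_-2_', '_3_-3_', '_6_-4_']
```

Encoding the position inside the token lets the same digit carry a different meaning depending on where it sits in the number. The sketch does not handle signs or exponents.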

Prepare some train/eval data line by line:

head -n 900 examples/qed_property_example.txt > examples/train.txt
tail -n +901 examples/qed_property_example.txt > examples/eval.txt
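
The head/tail commands split the file by position, so the first 900 lines go to training and the rest to evaluation. A rough Python equivalent is sketched below. The helper `split_lines` and its optional `seed` argument for shuffling are assumptions added for illustration and are not part of the repository.

```python
import random


def split_lines(lines, n_train, seed=None):
    """Split lines into (train, eval); shuffle first only if a seed is given."""
    lines = list(lines)
    if seed is not None:
        # Seeded shuffle so the split is reproducible.
        random.Random(seed).shuffle(lines)
    return lines[:n_train], lines[n_train:]


# Toy stand-in for the 1000-line property file.
example = [f"line {i}" for i in range(1000)]
train, evaluation = split_lines(example, n_train=900)
```

Without a seed, this matches the positional split of the shell commands. Passing a seed may be preferable if the input file is sorted in some way.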

Launch the training:

python scripts/run_language_modeling.py --output_dir examples/models/xlnet_selfies \
    --config_name configs/xlnet_selfies.json --tokenizer_name ./examples/vocab.txt \
    --do_train --do_eval --learning_rate 1e-4 --num_train_epochs 5 --save_total_limit 2 \
    --save_steps 500 --per_gpu_train_batch_size 16 --evaluate_during_training --eval_data_file ./examples/eval.txt \
    --train_data_file ./examples/train.txt --line_by_line --block_size 510 --seed 42 --logging_steps 250

Example model configurations (number of heads, layers, etc.) can be found in the configs folder.

Owner

International Business Machines