THOR: Transformer with Stochastic Experts

This PyTorch package implements the paper "Taming Sparsely Activated Transformer with Stochastic Experts".

Installation

The most convenient way to run the code is to use this docker image: tartarusz/adv-train:azure-pytorch-apex-v1.7.0. The image supports running on Microsoft Azure.
Our implementation is based on Fairseq.

Instructions

Download Fairseq (v1.0.0+) to the current directory.
Run pip install -e . to install the package locally.
To run a sample translation task on IWSLT'14 De-En, first follow the instructions here to download and tokenize the data, then use bash preprocess.sh to pre-process the tokenized data.
Run bash run.sh to train a THOR model.
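The local steps above can be chained in a small helper script. This is a minimal sketch, not part of the package: the name run_steps and the dry_run flag are hypothetical, and the script assumes it is started from the repository root, where preprocess.sh and run.sh live. Downloading Fairseq and the IWSLT'14 data is left out because those steps depend on external instructions.

```python
import shlex
import subprocess

# Commands copied from the instructions above. They assume the current
# directory is the repository root, Fairseq has already been downloaded
# there, and the IWSLT'14 De-En data has been downloaded and tokenized.
STEPS = [
    ["pip", "install", "-e", "."],   # install the package locally
    ["bash", "preprocess.sh"],       # pre-process the tokenized data
    ["bash", "run.sh"],              # train a THOR model
]


def run_steps(steps=STEPS, dry_run=True):
    """Print each command in order; execute it only when dry_run is False."""
    shown = []
    for cmd in steps:
        line = shlex.join(cmd)
        print("$ " + line)
        if not dry_run:
            # check=True stops the sequence at the first failing command.
            subprocess.run(cmd, check=True)
        shown.append(line)
    return shown


if __name__ == "__main__":
    # Dry run by default: only prints the commands.
    run_steps(dry_run=True)
```

Call run_steps(dry_run=False) to actually execute the commands. The dry-run default makes it easy to check the sequence before starting a long training job.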

Notes

Contact Information

For personal communication related to this package, please contact Simiao Zuo ([email protected]), Xiaodong Liu ([email protected]), or Jian Jiao ([email protected]).

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.
