GluonMM is a library of transformer models for computer vision and multi-modality research

Last update: Dec 02, 2022

Overview

GluonMM

GluonMM is a library of transformer models for computer vision and multi-modality research. It contains reference implementations of widely adopted baseline models as well as research work from Amazon Research.

Install

First, clone the repository locally:

git clone https://github.com/amazon-research/gluonmm.git

Then install the dependencies:

conda create -n gluonmm python=3.7
conda activate gluonmm
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
pip install timm tensorboardX yacs tqdm requests pandas decord scikit-image opencv-python

# Install apex for half-precision training (optional)
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

We have tested extensively with PyTorch 1.8.1, torchvision 0.9.1, and CUDA 10.2.
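To confirm that an environment matches the tested versions listed above, a small check like the following may help. This is a minimal sketch and not part of GluonMM: the names `TESTED_VERSIONS`, `matches_tested`, and `check_environment` are hypothetical helpers introduced here for illustration. It relies only on the `__version__` attribute that `torch` and `torchvision` expose and on `torch.cuda.is_available()`.

```python
import importlib

# Versions the README reports as extensively tested.
TESTED_VERSIONS = {"torch": "1.8.1", "torchvision": "0.9.1"}


def matches_tested(installed, tested):
    """Return True if `installed` equals `tested`, ignoring a local
    build suffix such as "+cu102"."""
    return installed.split("+")[0] == tested


def check_environment(tested_versions=TESTED_VERSIONS):
    """Map each package name to "ok", "untested (<version>)", or "missing"."""
    report = {}
    for name, tested in tested_versions.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = "missing"
            continue
        installed = getattr(module, "__version__", "unknown")
        if matches_tested(installed, tested):
            report[name] = "ok"
        else:
            report[name] = "untested ({})".format(installed)
    return report


if __name__ == "__main__":
    for package, status in check_environment().items():
        print("{}: {}".format(package, status))
    try:
        import torch
        print("CUDA available:", torch.cuda.is_available())
    except ImportError:
        print("CUDA available: unknown (torch is not installed)")
```

A status of "untested" does not mean the installation is broken, only that the installed version differs from the combination reported as tested.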

Model zoo

Image classification

Video action recognition

VidTr

Usage

For detailed usage, please refer to the README file in each model family. For example, training, evaluation, and model zoo information for the video transformer VidTr can be found in the VidTr README.

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.

Acknowledgement

Parts of the code are heavily derived from pytorch-image-models, DeiT, Swin-transformer, vit-pytorch and vision_transformer.
