A collection of multi-modal transformer architectures, including image transformers, video transformers, image-language transformers, and video-language transformers, along with related datasets

Last update: Dec 21, 2022

Overview

Reading list in Transformer

We are a team from the KAUST Vision-CAIR group, and we focus on multi-modal representation learning.

This repo aims to collect recent popular Transformer papers, code, and learning resources in the domains of Vision Transformers, NLP, and multi-modal learning.

Recent News

CVPR multi-modal papers are collected here

The code of VisualGPT is open sourced. It can be found here

The code and paper of LeViT are open sourced. They can be found here

The paper MLP-Mixer: An all-MLP Architecture for Vision is available here

The code and paper of MDETR are open sourced. They can be found here

The code and paper of RelTransformer are open sourced. They can be found here

The code and paper of Twins-SVT are open sourced. They can be found here

A Vision Transformer for deepfake detection can be found here

The code of VideoGPT is open sourced. It can be found here

The code of CoaT is open sourced. It can be found here

The code of Kaleido-BERT is open sourced. It can be found here

The code of TimeSformer is open sourced. It can be found here

The code of SwinTransformer is open sourced. It can be found here

Topics (paper and code)

Review Paper in multi-modal

Video-language

Tutorials and workshop

Datasets

Multi-modal Datasets

Blogs

Lil's blogs

Tools

PyTorchVideo: a deep learning library for video understanding research
horovod: a tool for multi-GPU parallel processing
accelerate: an easy API for mixed precision and any kind of distributed computing
optuna: hyperparameter search
AI Conference Deadlines


Owner

Jun Chen

