Swin Transformer

By Ze Liu*, Yutong Lin*, Yue Cao*, Han Hu*, Yixuan Wei, Zheng Zhang, Stephen Lin and Baining Guo.

This repo is the official implementation of "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows". The code is coming soon.

Introduction

Swin Transformer was first described in an arXiv preprint (arXiv:2103.14030) and capably serves as a general-purpose backbone for computer vision. Challenges in adapting Transformer from language to vision arise from differences between the two domains, such as large variations in the scale of visual entities and the high resolution of pixels in images compared to words in text. To address these differences, we propose a hierarchical Transformer whose representation is computed with shifted windows. The shifted windowing scheme brings greater efficiency by limiting self-attention computation to non-overlapping local windows while also allowing for cross-window connection. This hierarchical architecture has the flexibility to model at various scales and has linear computational complexity with respect to image size. These qualities make Swin Transformer compatible with a broad range of vision tasks, including image classification (86.4 top-1 accuracy on ImageNet-1K) and dense prediction tasks such as object detection (58.7 box AP and 51.1 mask AP on COCO test-dev) and semantic segmentation (53.5 mIoU on ADE20K val).
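The two claims above (linear complexity in image size, and cross-window connection through shifting) can be illustrated with a small, dependency-free sketch. It is not the official implementation. The cost formulas follow the complexity analysis in the paper (softmax cost omitted), the function names are hypothetical, and the values C=96, M=7 and a 56x56 token map are assumed here for illustration only. The shift is modeled as a cyclic shift of the token map, and the attention masking a real implementation needs for wrapped-around tokens is ignored.

```python
def global_msa_flops(h, w, c):
    """Approximate cost of global self-attention on an h x w token map, c channels."""
    n = h * w
    # Projections cost 4*n*c^2; attention over all token pairs costs 2*n^2*c.
    return 4 * n * c ** 2 + 2 * n ** 2 * c


def window_msa_flops(h, w, c, m):
    """Approximate cost when attention is limited to non-overlapping m x m windows."""
    n = h * w
    # Each token attends to the m*m tokens of its own window only.
    return 4 * n * c ** 2 + 2 * m ** 2 * n * c


def window_id(i, j, m, h, w, shift=0):
    """Window (row, col) of token (i, j) after cyclically shifting the map by `shift`."""
    return ((i - shift) % h) // m, ((j - shift) % w) // m


if __name__ == "__main__":
    c, m = 96, 7  # assumed channel width and window size
    for side in (56, 112, 224):
        print(side,
              global_msa_flops(side, side, c),
              window_msa_flops(side, side, c, m))

    # Tokens (3, 3) and (4, 4) on an 8 x 8 map with 4 x 4 windows:
    # separate windows in the regular layout, the same window after
    # shifting by half a window (shift = 2).
    print(window_id(3, 3, 4, 8, 8), window_id(4, 4, 4, 8, 8))
    print(window_id(3, 3, 4, 8, 8, shift=2), window_id(4, 4, 4, 8, 8, shift=2))
```

Doubling the side length of the token map multiplies the windowed cost by exactly four (linear in the number of tokens), while the global cost grows faster because of its quadratic term. Alternating the regular and shifted layouts in consecutive layers is what lets information cross window boundaries.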

Citing Swin Transformer

@article{liu2021Swin,
  title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
  author={Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
  journal={arXiv preprint arXiv:2103.14030},
  year={2021}
}

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.
