Official implementation of TMANet.

Related tags

Deep LearningTMANet
Overview

Temporal Memory Attention for Video Semantic Segmentation, arxiv

PWC PWC

Introduction

We propose a Temporal Memory Attention Network (TMANet) to adaptively integrate the long-range temporal relations over the video sequence based on the self-attention mechanism without exhaustive optical flow prediction. Our method achieves new state-of-the-art performances on two challenging video semantic segmentation datasets, particularly 80.3% mIoU on Cityscapes and 76.5% mIoU on CamVid with ResNet-50. (Accepted by ICIP2021)

If this codebase is helpful for you, please consider give me a star ⭐ 😊 .

image

Updates

2021/1: TMANet training and evaluation code released.

2021/6: Update README.md:

  • adding some Camvid dataset download links;
  • update 'camvid_video_process.py' script.

Usage

  • Install mmseg

    • Please refer to mmsegmentation to get installation guide.
    • This repository is based on mmseg-0.7.0 and pytorch 1.6.0.
  • Clone the repository

    git clone https://github.com/wanghao9610/TMANet.git
    cd TMANet
    pip install -e .
  • Prepare the datasets

    • Download Cityscapes dataset and Camvid dataset.

    • For Camvid dataset, we need to extract frames from downloaded videos according to the following steps:

      • Download the raw video from here, in which I provide a google drive link to download.
      • Put the downloaded raw video(e.g. 0016E5.MXF, 0006R0.MXF, 0005VD.MXF, 01TP_extract.avi) to ./data/camvid/raw .
      • Download the extracted images and labels from here and split.txt file from here, untar the tar.gz file to ./data/camvid , and we will get two subdirs "./data/camvid/images" (stores the images with annotations), and "./data/camvid/labels" (stores the ground truth for semantic segmentation). Reference the following shell command:
        cd TMANet
        cd ./data/camvid
        wget https://drive.google.com/file/d/1FcVdteDSx0iJfQYX2bxov0w_j-6J7plz/view?usp=sharing
        # or first download on your PC then upload to your server.
        tar -xf camvid.tar.gz 
      • Generate image_sequence dir frame by frame from the raw videos. Reference the following shell command:
        cd TMANet
        python tools/convert_datasets/camvid_video_process.py
    • For Cityscapes dataset, we need to request the download link of 'leftImg8bit_sequence_trainvaltest.zip' from Cityscapes dataset official webpage.

    • The converted/downloaded datasets store on ./data/camvid and ./data/cityscapes path.

      File structure of video semantic segmentation dataset is as followed.

      β”œβ”€β”€ data                                              β”œβ”€β”€ data                              
      β”‚   β”œβ”€β”€ cityscapes                                    β”‚   β”œβ”€β”€ camvid                        
      β”‚   β”‚   β”œβ”€β”€ gtFine                                    β”‚   β”‚   β”œβ”€β”€ images                    
      β”‚   β”‚   β”‚   β”œβ”€β”€ train                                 β”‚   β”‚   β”‚   β”œβ”€β”€ xxx{img_suffix}       
      β”‚   β”‚   β”‚   β”‚   β”œβ”€β”€ xxx{img_suffix}                   β”‚   β”‚   β”‚   β”œβ”€β”€ yyy{img_suffix}       
      β”‚   β”‚   β”‚   β”‚   β”œβ”€β”€ yyy{img_suffix}                   β”‚   β”‚   β”‚   β”œβ”€β”€ zzz{img_suffix}       
      β”‚   β”‚   β”‚   β”‚   β”œβ”€β”€ zzz{img_suffix}                   β”‚   β”‚   β”œβ”€β”€ annotations               
      β”‚   β”‚   β”‚   β”œβ”€β”€ val                                   β”‚   β”‚   β”‚   β”œβ”€β”€ train.txt             
      β”‚   β”‚   β”œβ”€β”€ leftImg8bit                               β”‚   β”‚   β”‚   β”œβ”€β”€ val.txt               
      β”‚   β”‚   β”‚   β”œβ”€β”€ train                                 β”‚   β”‚   β”‚   β”œβ”€β”€ test.txt              
      β”‚   β”‚   β”‚   β”‚   β”œβ”€β”€ xxx{seg_map_suffix}               β”‚   β”‚   β”œβ”€β”€ labels                    
      β”‚   β”‚   β”‚   β”‚   β”œβ”€β”€ yyy{seg_map_suffix}               β”‚   β”‚   β”‚   β”œβ”€β”€ xxx{seg_map_suffix}   
      β”‚   β”‚   β”‚   β”‚   β”œβ”€β”€ zzz{seg_map_suffix}               β”‚   β”‚   β”‚   β”œβ”€β”€ yyy{seg_map_suffix}   
      β”‚   β”‚   β”‚   β”œβ”€β”€ val                                   β”‚   β”‚   β”‚   β”œβ”€β”€ zzz{seg_map_suffix}   
      β”‚   β”‚   β”œβ”€β”€ leftImg8bit_sequence                      β”‚   β”‚   β”œβ”€β”€ image_sequence            
      β”‚   β”‚   β”‚   β”œβ”€β”€ train                                 β”‚   β”‚   β”‚   β”œβ”€β”€ xxx{sequence_suffix}  
      β”‚   β”‚   β”‚   β”‚   β”œβ”€β”€ xxx{sequence_suffix}              β”‚   β”‚   β”‚   β”œβ”€β”€ yyy{sequence_suffix}  
      β”‚   β”‚   β”‚   β”‚   β”œβ”€β”€ yyy{sequence_suffix}              β”‚   β”‚   β”‚   β”œβ”€β”€ zzz{sequence_suffix}  
      β”‚   β”‚   β”‚   β”‚   β”œβ”€β”€ zzz{sequence_suffix}              
      β”‚   β”‚   β”‚   β”œβ”€β”€ val                                   
      
  • Evaluation

    • Download the trained models for Cityscapes and Camvid. And put them on ./work_dirs/{config_file}
    • Run the following command(on Cityscapes):
    sh eval.sh configs/video/cityscapes/tmanet_r50-d8_769x769_80k_cityscapes_video.py
  • Training

    • Please download the pretrained ResNet-50 model, and put it on ./init_models .
    • Run the following command(on Cityscapes):
    sh train.sh configs/video/cityscapes/tmanet_r50-d8_769x769_80k_cityscapes_video.py

    Note: the above evaluation and training shell commands execute on Cityscapes, if you want to execute evaluation or training on Camvid, please replace the config file on the shell command with the config file of Camvid.

Citation

If you find TMANet is useful in your research, please consider citing:

@misc{wang2021temporal,
    title={Temporal Memory Attention for Video Semantic Segmentation}, 
    author={Hao Wang and Weining Wang and Jing Liu},
    year={2021},
    eprint={2102.08643},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

Acknowledgement

Thanks mmsegmentation contribution to the community!

Owner
wanghao
wanghao
Active window border replacement for window managers.

xborder Active window border replacement for window managers. Usage git clone https://github.com/deter0/xborder cd xborder chmod +x xborders ./xborder

deter 250 Dec 30, 2022
Your interactive network visualizing dashboard

Your interactive network visualizing dashboard Documentation: Here What is Jaal Jaal is a python based interactive network visualizing tool built usin

Mohit 177 Jan 04, 2023
SeMask: Semantically Masked Transformers for Semantic Segmentation.

SeMask: Semantically Masked Transformers Jitesh Jain, Anukriti Singh, Nikita Orlov, Zilong Huang, Jiachen Li, Steven Walton, Humphrey Shi This repo co

Picsart AI Research (PAIR) 186 Dec 30, 2022
Open-Domain Question-Answering for COVID-19 and Other Emergent Domains

Open-Domain Question-Answering for COVID-19 and Other Emergent Domains This repository contains the source code for an end-to-end open-domain question

7 Sep 27, 2022
High frequency AI based algorithmic trading module.

Flow Flow is a high frequency algorithmic trading module that uses machine learning to self regulate and self optimize for maximum return. The current

59 Dec 14, 2022
πŸ”… Shapash makes Machine Learning models transparent and understandable by everyone

πŸŽ‰ What's new ? Version New Feature Description Tutorial 1.6.x Explainability Quality Metrics To help increase confidence in explainability methods, y

MAIF 2.1k Dec 27, 2022
Deep Sketch-guided Cartoon Video Inbetweening

Cartoon Video Inbetweening Paper | DOI | Video The source code of Deep Sketch-guided Cartoon Video Inbetweening by Xiaoyu Li, Bo Zhang, Jing Liao, Ped

Xiaoyu Li 37 Dec 22, 2022
"Domain Adaptive Semantic Segmentation without Source Data" (ACM MM 2021)

LDBE Pytorch implementation for two papers (the paper will be released soon): "Domain Adaptive Semantic Segmentation without Source Data", ACM MM2021.

benfour 16 Sep 28, 2022
Deep Markov Factor Analysis (NeurIPS2021)

Deep Markov Factor Analysis (DMFA) Codes and experiments for deep Markov factor analysis (DMFA) model accepted for publication at NeurIPS2021: A. Farn

Sarah Ostadabbas 2 Dec 16, 2022
A hobby project which includes a hand-gesture based virtual piano using a mobile phone camera and OpenCV library functions

Overview This is a hobby project which includes a hand-gesture controlled virtual piano using an android phone camera and some OpenCV library. My moti

Abhinav Gupta 1 Nov 19, 2021
Face Detection & Age Gender & Expression & Recognition

Face Detection & Age Gender & Expression & Recognition

Sajjad Ayobi 188 Dec 28, 2022
This repository contains the official implementation code of the paper Improving Multimodal Fusion with Hierarchical Mutual Information Maximization for Multimodal Sentiment Analysis, accepted at EMNLP 2021.

MultiModal-InfoMax This repository contains the official implementation code of the paper Improving Multimodal Fusion with Hierarchical Mutual Informa

Deep Cognition and Language Research (DeCLaRe) Lab 89 Dec 26, 2022
Experiments on Flood Segmentation on Sentinel-1 SAR Imagery with Cyclical Pseudo Labeling and Noisy Student Training

Flood Detection Challenge This repository contains code for our submission to the ETCI 2021 Competition on Flood Detection (Winning Solution #2). Acco

Siddha Ganju 108 Dec 28, 2022
Official repository of OFA. Paper: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

Paper | Blog OFA is a unified multimodal pretrained model that unifies modalities (i.e., cross-modality, vision, language) and tasks (e.g., image gene

OFA Sys 1.4k Jan 08, 2023
Dense Contrastive Learning (DenseCL) for self-supervised representation learning, CVPR 2021.

Dense Contrastive Learning for Self-Supervised Visual Pre-Training This project hosts the code for implementing the DenseCL algorithm for se

Xinlong Wang 491 Jan 03, 2023
Coursera - Quiz & Assignment of Coursera

Coursera Assignments This repository is aimed to help Coursera learners who have difficulties in their learning process. The quiz and programming home

ζ΅…ζ’¦ 828 Jan 04, 2023
Code for the paper "Multi-task problems are not multi-objective"

Multi-Task problems are not multi-objective This is the code for the paper "Multi-Task problems are not multi-objective" in which we show that the com

Michael Ruchte 5 Aug 19, 2022
nfelo: a power ranking, prediction, and betting model for the NFL

nfelo nfelo is a power ranking, prediction, and betting model for the NFL. Nfelo take's 538's Elo framework and further adapts it for the NFL, hence t

6 Nov 22, 2022
Attentional Focus Modulates Automatic Finger‑tapping Movements

"Attentional Focus Modulates Automatic Finger‑tapping Movements", in Scientific Reports

Xingxun Jiang 1 Dec 02, 2021
Robust Lane Detection via Expanded Self Attention (WACV 2022)

Robust Lane Detection via Expanded Self Attention (WACV 2022) Minhyeok Lee, Junhyeop Lee, Dogyoon Lee, Woojin Kim, Sangwon Hwang, Sangyoun Lee Overvie

Min Hyeok Lee 18 Nov 12, 2022