Puzzle-CAM: Improved localization via matching partial and full features.

Overview

Puzzle-CAM

The official implementation of "Puzzle-CAM: Improved localization via matching partial and full features".

Citation

Please cite our paper (arXiv:2101.11253) if the code is helpful to your research.

@article{jo2021puzzle,
  title={Puzzle-CAM: Improved localization via matching partial and full features},
  author={Jo, Sanghyun and Yu, In-Jae},
  journal={arXiv preprint arXiv:2101.11253},
  year={2021}
}

Abstract

Weakly-supervised semantic segmentation (WSSS) is introduced to narrow the gap for semantic segmentation performance from pixel-level supervision to image-level supervision. Most advanced approaches are based on class activation maps (CAMs) to generate pseudo-labels to train the segmentation network. The main limitation of WSSS is that the process of generating pseudo-labels from CAMs, which use an image classifier, is mainly focused on the most discriminative parts of the objects. To address this issue, we propose Puzzle-CAM, a process that minimizes the differences between the features from separate patches and the whole image. Our method consists of a puzzle module (PM) and two regularization terms to discover the most integrated region in an object. Without requiring extra parameters, Puzzle-CAM can activate the overall region of an object using image-level supervision. In experiments, Puzzle-CAM outperformed previous state-of-the-art methods using the same labels for supervision on the PASCAL VOC 2012 test dataset.
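
The idea in the abstract (compare what the model produces for separate patches with what it produces for the whole image) can be illustrated with a toy, single-channel sketch. Everything here is an assumption made for illustration: the function names are hypothetical, `toy_cam` is only a stand-in for a classifier's activation map, and the real method operates on CNN feature maps rather than Python lists.

```python
# Toy sketch of tile -> process -> merge -> compare.
# All names are hypothetical; this is not the repository's API.

def tile(grid):
    """Split a 2D grid (list of rows) into four equal tiles: TL, TR, BL, BR."""
    h, w = len(grid), len(grid[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    hh, hw = h // 2, w // 2
    return [
        [row[:hw] for row in grid[:hh]],
        [row[hw:] for row in grid[:hh]],
        [row[:hw] for row in grid[hh:]],
        [row[hw:] for row in grid[hh:]],
    ]

def merge(tiles):
    """Inverse of tile(): stitch TL, TR, BL, BR back into one grid."""
    tl, tr, bl, br = tiles
    top = [a + b for a, b in zip(tl, tr)]
    bottom = [a + b for a, b in zip(bl, br)]
    return top + bottom

def toy_cam(grid):
    """Stand-in for an activation map: scale by the input's own maximum,
    so the output depends on how much context is visible.
    Assumes the grid contains at least one positive value."""
    peak = max(max(row) for row in grid)
    return [[value / peak for value in row] for row in grid]

def l1_distance(a, b):
    """Mean absolute difference between two grids of the same shape."""
    diffs = [abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)

def reconstruction_loss(image):
    """L1 gap between the full-image map and the map rebuilt from tiles."""
    full = toy_cam(image)
    rebuilt = merge([toy_cam(t) for t in tile(image)])
    return l1_distance(full, rebuilt)

image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
loss = reconstruction_loss(image)  # positive: tiles and full image disagree
```

A loss of zero would mean the per-patch maps already agree with the full-image map; training that drives this gap down is what pushes activations to cover the whole object instead of only its most discriminative part.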

Overview

Figure: Overall architecture of Puzzle-CAM.


Prerequisite

  • Python 3.8, PyTorch 1.7.0, and more in requirements.txt
  • CUDA 10.1, cuDNN 7.6.5
  • 4 x Titan RTX GPUs

Usage

Install Python dependencies

python3 -m pip install -r requirements.txt

Download PASCAL VOC 2012 devkit

Follow instructions in http://host.robots.ox.ac.uk/pascal/VOC/voc2012/#devkit

1. Train an image classifier for generating CAMs

CUDA_VISIBLE_DEVICES=0,1,2,3 python3 train_classification_with_puzzle.py --architecture resnest101 --re_loss_option masking --re_loss L1_Loss --alpha_schedule 0.50 --alpha 4.00 --tag [email protected]@optimal --data_dir $your_dir

2. Apply Random Walk (RW) to refine the generated CAMs

2.1. Make affinity labels to train AffinityNet.

CUDA_VISIBLE_DEVICES=0 python3 inference_classification.py --architecture resnest101 --tag [email protected]@optimal --domain train_aug --data_dir $your_dir
python3 make_affinity_labels.py --experiment_name [email protected]@[email protected]@scale=0.5,1.0,1.5,2.0 --domain train_aug --fg_threshold 0.40 --bg_threshold 0.10 --data_dir $your_dir
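
One plausible reading of the `--fg_threshold` and `--bg_threshold` flags above is a three-way split of pixels into confident foreground, confident background, and "ignore". The sketch below follows that reading, which is common in AffinityNet-style pipelines; it is an assumption, not a transcription of `make_affinity_labels.py`, and the function name and the ignore value 255 are hypothetical.

```python
IGNORE = 255  # hypothetical "don't care" label for uncertain pixels

def confident_label(cam_scores, fg_threshold=0.40, bg_threshold=0.10):
    """Label one pixel from its normalized CAM scores.

    cam_scores maps a foreground class id (>= 1) to a score in [0, 1].
    Returns the class id, 0 for background, or IGNORE when uncertain.
    """
    best_class = max(cam_scores, key=cam_scores.get)
    best_score = cam_scores[best_class]
    if best_score > fg_threshold:
        return best_class      # strong activation: confident foreground
    if best_score <= bg_threshold:
        return 0               # nothing activates: confident background
    return IGNORE              # in between: excluded from affinity training

labels = [
    confident_label({15: 0.90, 8: 0.20}),
    confident_label({15: 0.05, 8: 0.02}),
    confident_label({15: 0.25, 8: 0.10}),
]
```

Under this reading, widening the gap between the two thresholds trades label coverage for label reliability, since more pixels fall into the ignored band.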

2.2. Train AffinityNet.

CUDA_VISIBLE_DEVICES=0 python3 train_affinitynet.py --architecture resnest101 --tag [email protected]@Puzzle --label_name [email protected]@opt[email protected]@scale=0.5,1.0,1.5,[email protected]_fg=0.40_bg=0.10 --data_dir $your_dir

3. Train the segmentation model using the pseudo-labels

3.1. Make segmentation labels to train segmentation model.

CUDA_VISIBLE_DEVICES=0 python3 inference_rw.py --architecture resnest101 --model_name [email protected]@Puzzle --cam_dir [email protected]@op[email protected]@scale=0.5,1.0,1.5,2.0 --domain train_aug --data_dir $your_dir
python3 make_pseudo_labels.py --experiment_name [email protected]@[email protected]@[email protected][email protected] --domain train_aug --threshold 0.35 --crf_iteration 1 --data_dir $your_dir

3.2. Train segmentation model.

CUDA_VISIBLE_DEVICES=0,1,2,3 python3 train_segmentation.py --backbone resnest101 --mode fix --use_gn True --tag [email protected]@[email protected] --label_name [email protected]@[email protected]@[email protected][email protected]@crf=1 --data_dir $your_dir

4. Evaluate the models

CUDA_VISIBLE_DEVICES=0 python3 inference_segmentation.py --backbone resnest101 --mode fix --use_gn True --tag [email protected]@[email protected] --scale 0.5,1.0,1.5,2.0 --iteration 10

python3 evaluate.py --experiment_name [email protected]@[email protected]@[email protected]=0.5,1.0,1.5,[email protected]=10 --domain val --data_dir $your_dir/SegmentationClass

5. Results

Qualitative segmentation results on the PASCAL VOC 2012 validation set. Top: original images. Middle: ground truth. Bottom: prediction of the segmentation model trained using the pseudo-labels from Puzzle-CAM.

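| Methods | background | aeroplane | bicycle | bird | boat | bottle | bus | car | cat | chair | cow | diningtable | dog | horse | motorbike | person | pottedplant | sheep | sofa | train | tvmonitor | mIoU |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Puzzle-CAM with ResNeSt-101 | 88.9 | 87.1 | 38.7 | 89.2 | 55.8 | 72.8 | 89.8 | 78.9 | 91.3 | 26.8 | 84.4 | 40.3 | 88.9 | 81.9 | 83.1 | 34.0 | 60.1 | 83.6 | 47.3 | 59.6 | 38.8 | 67.7 |
| Puzzle-CAM with ResNeSt-269 | 91.1 | 87.2 | 37.3 | 86.8 | 61.4 | 71.2 | 92.2 | 86.2 | 91.8 | 28.6 | 85.0 | 64.1 | 91.8 | 82.0 | 82.5 | 70.7 | 69.4 | 87.7 | 45.4 | 67.0 | 37.7 | 72.2 |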
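
The mIoU column in the results above is consistent with the unweighted mean of the 21 per-class IoU values. The sketch below shows how per-class IoU and mIoU are commonly computed and checks that relationship against the two rows. The function names are hypothetical and the code is not taken from `evaluate.py`; the ignore label 255 follows the PASCAL VOC convention for void pixels.

```python
def per_class_iou(pred, gt, num_classes, ignore_label=255):
    """IoU per class from flat lists of predicted and ground-truth labels."""
    ious = []
    for c in range(num_classes):
        inter = union = 0
        for p, g in zip(pred, gt):
            if g == ignore_label:
                continue  # void pixels do not count for or against
            inter += (p == c and g == c)
            union += (p == c or g == c)
        ious.append(inter / union if union else float("nan"))
    return ious

def mean_iou(ious):
    """Unweighted mean over classes, skipping classes that never occur."""
    valid = [v for v in ious if v == v]  # NaN is the only value != itself
    return sum(valid) / len(valid)

# Tiny example: 4 pixels, 2 classes.
toy = per_class_iou(pred=[0, 1, 1, 1], gt=[0, 0, 1, 1], num_classes=2)

# Per-class values copied from the results (background first).
RESNEST101 = [88.9, 87.1, 38.7, 89.2, 55.8, 72.8, 89.8, 78.9, 91.3, 26.8, 84.4,
              40.3, 88.9, 81.9, 83.1, 34.0, 60.1, 83.6, 47.3, 59.6, 38.8]
RESNEST269 = [91.1, 87.2, 37.3, 86.8, 61.4, 71.2, 92.2, 86.2, 91.8, 28.6, 85.0,
              64.1, 91.8, 82.0, 82.5, 70.7, 69.4, 87.7, 45.4, 67.0, 37.7]

miou_101 = round(mean_iou(RESNEST101), 1)
miou_269 = round(mean_iou(RESNEST269), 1)
```

Because the mean is unweighted, a weak class such as chair or bicycle pulls the score down as much as a strong class such as background lifts it.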

For any issues, please contact Sanghyun Jo, [email protected]

Comments
  • ModuleNotFoundError: No module named 'core.sync_batchnorm'

    ModuleNotFoundError                       Traceback (most recent call last)
          1 from core.puzzle_utils import *
    ----> 2 from core.networks import *
          3 from core.datasets import *
          4
          5 from tools.general.io_utils import *

    /working/PuzzleCAM/core/networks.py in
         24 # Normalization
         25 #######################################################################
    ---> 26 from .sync_batchnorm.batchnorm import SynchronizedBatchNorm2d
         27
         28 class FixedBatchNorm(nn.BatchNorm2d):

    ModuleNotFoundError: No module named 'core.sync_batchnorm'

    opened by Ashneo07 2
  • performance issue

    When I used the released weights for inference and evaluation, the mIoU I got was different from the mIoU reported in the paper. Do these weights correspond to the paper? If so, how can I reproduce the results in your paper? Looking forward to your reply.

    opened by linjiatai 0
  • Evaluation in classifier training is using supervised segmentation maps?

    Hello, thank you for the great repository! It's pretty impressive how organized it is.

    I have a criticism (or maybe a question, in case I got it wrong) regarding the training of the classifier, though. I understand the importance of measuring and logging the mIoU during training (especially when creating the ablation section in your paper). However, it doesn't strike me as correct to save the model with the best mIoU. This procedural decision relies on fully supervised segmentation information, which should not be available in a truly weakly supervised problem, and it results in a model better suited for segmentation. The paper doesn't address this. Am I right to assume all models were trained like this? Were there any training runs where other metrics were considered when saving the model (e.g. classification loss or Eq. (7) in the paper)?

    opened by lucasdavid 0
  • error occurred when image size isn't 512 * n

    Dear author: I noticed that an error occurs when the image size isn't 512 x 512. With an image size of 1280 x 496, I got a tensor size error in the puzzle module: the original feature has 31 dims and re_feature has 32 dims. After changing the image size to 1280 x 512, it works. I think this may be a small bug. It would be better to fix it or add a note in the code. Thanks for your work!

    opened by hazy-wu 0
  • the backbone of Affinitynet is resnet38. Why did you write resnet50?

    In Table 2 of your paper, the backbone of AffinityNet is ResNet-38. Why did you write ResNet-50? In my experiment, the RW result reached 65.42% with an AffinityNet based on ResNet-50, which is higher than yours.

    opened by songyukino1 0
  • Ask for details of the training process!

    I am trying to train with ResNeSt101, and I also added the affinity and RW steps. Training runs with the specified commands, but the resulting affinity labels are not effective, and the pseudo-labels are almost invisible, close to all black. I don't know where the problem is. Can anyone explain the details? Help!

    opened by YuYue26 1
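
The 31-versus-32 size mismatch reported above for a 1280 x 496 input can be reproduced with simple arithmetic, under two assumptions that are not confirmed by the source: the backbone downsamples by four stride-2 stages that each round up (overall stride 16), and the puzzle module splits each side into two tiles. The helper names are hypothetical.

```python
def feature_size(n, num_stride2_stages=4):
    """Output length after repeated stride-2 stages with ceil rounding
    (assumed: a padded 3x3, stride-2 layer maps n to ceil(n / 2))."""
    for _ in range(num_stride2_stages):
        n = (n + 1) // 2
    return n

def puzzle_sizes(n, tiles=2):
    """Return (full-image feature length, merged tile feature length)."""
    full = feature_size(n)
    merged = tiles * feature_size(n // tiles)
    return full, merged

def round_up(n, multiple=32):
    """Smallest value >= n that is divisible by `multiple`."""
    return ((n + multiple - 1) // multiple) * multiple

sizes_496 = puzzle_sizes(496)    # (31, 32): full and merged maps disagree
sizes_512 = puzzle_sizes(512)    # (32, 32): sizes match
sizes_1280 = puzzle_sizes(1280)  # (80, 80): sizes match
safe_height = round_up(496)      # 512
```

Under these assumptions, any side length divisible by 32 (tile count times stride) keeps the two feature maps aligned, which would explain why resizing 496 to 512 resolved the error.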
Releases(v1.0)
Owner
Sanghyun Jo