Code for 'Single Image 3D Shape Retrieval via Cross-Modal Instance and Category Contrastive Learning', ICCV 2021

Last update: Nov 17, 2022

Overview

CMIC-Retrieval

Code for Single Image 3D Shape Retrieval via Cross-Modal Instance and Category Contrastive Learning. ICCV 2021.

Introduction

In this work, we tackle the problem of single image-based 3D shape retrieval (IBSR), where we seek the best-matching shape for a given single 2D image from a shape repository. Most existing works learn to embed 2D images and 3D shapes into a common feature space and perform metric learning using a triplet loss. Inspired by the recent success of contrastive learning in self-supervised representation learning, we propose a novel IBSR pipeline that leverages contrastive learning. Adopting such cross-modal contrastive learning between 2D images and 3D shapes for IBSR is non-trivial: contrastive learning requires very strong data augmentation when constructing positive pairs in order to learn feature invariance, whereas traditional metric learning does not have this requirement. Moreover, object shape and appearance are entangled in 2D query images, which makes the learning task more difficult than contrasting single-modal data. To mitigate these challenges, we propose to use multi-view grayscale images rendered from the 3D shapes as the shape representation. We then introduce a strong data augmentation technique based on color transfer, which can significantly but naturally change the appearance of the query image, effectively satisfying the needs of contrastive learning. Finally, in addition to the classic instance-level contrastive loss, we propose a novel category-level contrastive loss that helps distinguish similar objects from different categories. Our experiments demonstrate that our approach achieves the best performance on all three popular IBSR benchmarks, namely Pix3D, Stanford Cars, and Comp Cars, outperforming the previous state of the art by 4% to 15% in retrieval accuracy.
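To make the two loss terms mentioned above more concrete, here is a minimal NumPy sketch of what an instance-level and a category-level cross-modal contrastive loss could look like. It is an illustration under assumptions, not the code in this repository: the function names, the temperature value, and in particular the supervised-contrastive form used for the category-level term are all hypothetical, and the paper's exact formulation may differ.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def log_softmax(logits):
    """Numerically stable row-wise log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def instance_contrastive_loss(img_emb, shape_emb, temperature=0.1):
    """InfoNCE-style loss (assumed form): image i's positive is shape i,
    and every other shape in the batch acts as a negative."""
    logits = l2_normalize(img_emb) @ l2_normalize(shape_emb).T / temperature
    return -np.mean(np.diag(log_softmax(logits)))

def category_contrastive_loss(img_emb, shape_emb, img_labels, shape_labels,
                              temperature=0.1):
    """Category-level loss (assumed supervised-contrastive form): every shape
    that shares the image's category label counts as a positive."""
    logits = l2_normalize(img_emb) @ l2_normalize(shape_emb).T / temperature
    log_prob = log_softmax(logits)
    positives = (img_labels[:, None] == shape_labels[None, :]).astype(float)
    # Guard against images with no same-category shape in the batch.
    pos_count = np.maximum(positives.sum(axis=1), 1.0)
    return -np.mean((positives * log_prob).sum(axis=1) / pos_count)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img_emb = rng.normal(size=(8, 16))      # hypothetical image embeddings
    shape_emb = rng.normal(size=(8, 16))    # hypothetical shape embeddings
    labels = np.array([0, 0, 1, 1, 2, 2, 3, 3])  # hypothetical category ids
    total = (instance_contrastive_loss(img_emb, shape_emb)
             + category_contrastive_loss(img_emb, shape_emb, labels, labels))
    print("combined loss:", total)
```

In this sketch the two terms are simply summed; how they are weighted against each other in practice is a design choice that the text above does not specify.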

About this repository

This repository provides data, pre-trained models and code.

Citations

@inProceedings{lin2021cmic,
	title={Single Image 3D Shape Retrieval via Cross-Modal Instance and Category Contrastive Learning},
	author={Lin, Ming-Xian and Yang, Jie and Wang, He and Lai, Yu-Kun and Jia, Rongfei and Zhao, Binqiang and Gao, Lin},
	year={2021},
	booktitle={International Conference on Computer Vision (ICCV)}
}

Updates

[Oct 1, 2021] Preliminary version of the data and code released. More code and data are coming soon; please follow our updates.
