PyTorch code for: Learning to Generate Grounded Visual Captions without Localization Supervision

Last update: Nov 17, 2022

Overview

This is the PyTorch implementation of our paper:

Learning to Generate Grounded Visual Captions without Localization Supervision
Chih-Yao Ma, Yannis Kalantidis, Ghassan AlRegib, Peter Vajda, Marcus Rohrbach, Zsolt Kira
European Conference on Computer Vision (ECCV), 2020

[arXiv] [GitHub] [Project]

10-min YouTube Video

How to start

Clone the repo recursively:

git clone --recursive git@github.com:chihyaoma/cyclical-visual-captioning.git

If you didn't clone with the --recursive flag, then you'll need to manually clone the pybind submodule from the top-level directory:

git submodule update --init --recursive

Installation

The proposed cyclical method can be applied directly to image and video captioning tasks.

Currently, the installation guide and our code for video captioning on the ActivityNet-Entities dataset are provided in anet-video-captioning.

Acknowledgments

Chih-Yao Ma and Zsolt Kira were partly supported by DARPA’s Lifelong Learning Machines (L2M) program, under Cooperative Agreement HR0011-18-2-0019, as part of their affiliation with Georgia Tech. We thank Chia-Jung Hsu for her valuable and artistic help with the figures.

Citation

If you find this repository useful, please cite our paper:

@inproceedings{ma2020learning,
    title={Learning to Generate Grounded Visual Captions without Localization Supervision},
    author={Ma, Chih-Yao and Kalantidis, Yannis and AlRegib, Ghassan and Vajda, Peter and Rohrbach, Marcus and Kira, Zsolt},
    booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
    year={2020},
    url={https://arxiv.org/abs/1906.00283},
}
