Simple image captioning model - CLIP prefix captioning.

Overview

CLIP prefix captioning.


Inference Notebook:

🥳 New: 🥳 Integrated to Huggingface Spaces with Gradio. See demo: Hugging Face Spaces

🥳 New: 🥳 Run it in the browser using replicate.ai UI

Description

Image captioning is a complicated task, where usually a pretrained detection network is used, requires additional supervision in the form of object annotation. The features of the detected objects are then fed to an additional network that is trained to output the correct caption. We present a new approach that does not requires additional information (i.e. requires only images and captions), thus can be applied to any data. In addition, our model's training time is much faster than similar methods while achieving close to state-of-the-art results, even for the Conceptual Captions dataset contains over 3M images.

In our work, we use the CLIP model, which was already trained over an extremely large number of images, thus is capable of generating semantic encodings for arbitrary images without additional supervision. To produce meaningful sentences we fine-tune a pretrained language model, which has been proven to be successful for other natural language tasks. The key idea is to use the CLIP encoding as a prefix to the textual captions by employing a simple Multi-Layer Perceptron (MLP) over the raw encoding, and then fine-tune our language model to generate a valid caption.

COCO Examples

A couple of people standing next to an elephant. A wooden table sitting in front of a window. A bunch of bananas sitting on top of a table.
A woman holding a plate with a piece of cake in front of her face. A wooden table topped with lots of wooden utensils. A red motorcycle parked on top of a dirt field.

Conceptual Captions Examples

3D render of a man holding a globe. Students enjoing the cherry blossoms Green leaf of lettuce on a white plate.
The hotel and casino on the waterfront. The triangle is a symbol of the soul. Cartoon boy in the bath.

Inference Notebooks

To help visualize the results we provide a Colab notebook found in notebooks/clip_prefix_captioning_inference.ipynb.
The notebook will download the pretrained models and run inference on a sample images or on images of your choosing. It is recommended to run this in Google Colab. Both COCO and Conceptual Captions pretrained models are available.

Inference GUI

Run it in the browser using replicate.ai UI.

COCO training

Clone, create environment and install dependencies:

git clone https://github.com/rmokady/CLIP_prefix_caption && cd CLIP_prefix_caption
conda env create -f environment.yml
conda activate clip_prefix_caption

Download train_captions to data/coco/annotations.

Download training images and validation images and unzip (We use Karpathy et el. split).

Extract CLIP features using (output is data/coco/oscar_split_train.pkl):

python parse_coco.py

Train:

python train.py --data ./data/coco/oscar_split_train.pkl --out_dir ./coco_train/

Qualitative results

COCO dataset

Method [email protected] [email protected] [email protected] [email protected] METEOR ROUGE-L CIDEr SPICE
Oscar* 75.59 60.09 46.89 36.58 30.40 58.56 124.12 23.17
Ours 74.12 57.40 43.11 32.15 27.10 55.02 108.35 20.12

* uses additional object annotations for training.

Conceptual Captions dataset

Method ROUGE-L CIDEr SPICE
VLP 24.35 77.57 16.59
Ours 26.71 87.26 18.5

Acknowledgments

This project was created by Ron Mokady and Amir Hertz for the Advanced-NLP course by Omer Levy @ TAU. This repository is heavily based on CLIP and Hugging-faces repositories. For training we used the data of COCO dataset and Conceptual Captions. The project was also inspired from this paper.

Contact

For any inquiry please contact us at our email addresses: [email protected] or [email protected].

Hybrid Neural Fusion for Full-frame Video Stabilization

FuSta: Hybrid Neural Fusion for Full-frame Video Stabilization Project Page | Video | Paper | Google Colab Setup Setup environment for [Yu and Ramamoo

Yu-Lun Liu 430 Jan 04, 2023
🚀 PyTorch Implementation of "Progressive Distillation for Fast Sampling of Diffusion Models(v-diffusion)"

PyTorch Implementation of "Progressive Distillation for Fast Sampling of Diffusion Models(v-diffusion)" Unofficial PyTorch Implementation of Progressi

Vitaliy Hramchenko 58 Dec 19, 2022
Random-Afg - Afghanistan Random Old Idz Cloner Tools

AFGHANISTAN RANDOM OLD IDZ CLONER TOOLS Install $ apt update $ apt upgrade $ apt

MAHADI HASAN AFRIDI 5 Jan 26, 2022
Object tracking using YOLO and a tracker(KCF, MOSSE, CSRT) in openCV

Object tracking using YOLO and a tracker(KCF, MOSSE, CSRT) in openCV File YOLOv3 weight can be downloaded

Ngoc Quyen Ngo 2 Mar 27, 2022
Reinforcement learning framework and algorithms implemented in PyTorch.

Reinforcement learning framework and algorithms implemented in PyTorch.

Robotic AI & Learning Lab Berkeley 2.1k Jan 04, 2023
Python scripts for performing lane detection using the LSTR model in ONNX

ONNX LSTR Lane Detection Python scripts for performing lane detection using the Lane Shape Prediction with Transformers (LSTR) model in ONNX. Requirem

Ibai Gorordo 29 Aug 30, 2022
Implementation of our NeurIPS 2021 paper "A Bi-Level Framework for Learning to Solve Combinatorial Optimization on Graphs".

PPO-BiHyb This is the official implementation of our NeurIPS 2021 paper "A Bi-Level Framework for Learning to Solve Combinatorial Optimization on Grap

<a href=[email protected]"> 66 Nov 23, 2022
Library for fast text representation and classification.

fastText fastText is a library for efficient learning of word representations and sentence classification. Table of contents Resources Models Suppleme

Facebook Research 24.1k Jan 01, 2023
Transformer model implemented with Pytorch

transformer-pytorch Transformer model implemented with Pytorch Attention is all you need-[Paper] Architecture Self-Attention self_attention.py class

Mingu Kang 12 Sep 03, 2022
An All-MLP solution for Vision, from Google AI

MLP Mixer - Pytorch An All-MLP solution for Vision, from Google AI, in Pytorch. No convolutions nor attention needed! Yannic Kilcher video Install $ p

Phil Wang 784 Jan 06, 2023
This is an official pytorch implementation of Lite-HRNet: A Lightweight High-Resolution Network.

Lite-HRNet: A Lightweight High-Resolution Network Introduction This is an official pytorch implementation of Lite-HRNet: A Lightweight High-Resolution

HRNet 675 Dec 25, 2022
Two types of Recommender System : Content-based Recommender System and Colaborating filtering based recommender system

Recommender-Systems Two types of Recommender System : Content-based Recommender System and Colaborating filtering based recommender system So the data

Yash Kumar 0 Jan 20, 2022
The second project in Python course on FCC

Assignment Write a function named add_time that takes in two required parameters and one optional parameter: a start time in the 12-hour clock format

Denise T 1 Dec 13, 2021
Some pvbatch (paraview) scripts for postprocessing OpenFOAM data

pvbatchForFoam Some pvbatch (paraview) scripts for postprocessing OpenFOAM data For every script there is a help message available: pvbatch pv_state_s

Morev Ilya 2 Oct 26, 2022
Pytorch implement of 'Unmixing based PAN guided fusion network for hyperspectral imagery'

Pgnet There's a improved version compared with the publication in Tgrs with the modification in the deduction of the PDIN block: https://arxiv.org/abs

5 Jul 01, 2022
NAS-FCOS: Fast Neural Architecture Search for Object Detection (CVPR 2020)

NAS-FCOS: Fast Neural Architecture Search for Object Detection This project hosts the train and inference code with pretrained model for implementing

Ning Wang 180 Dec 06, 2022
Graph Analysis From Scratch

Graph Analysis From Scratch Goal In this notebook we wanted to implement some functionalities to analyze a weighted graph only by using algorithms imp

Arturo Ghinassi 0 Sep 17, 2022
The repository contains reproducible PyTorch source code of our paper Generative Modeling with Optimal Transport Maps, ICLR 2022.

Generative Modeling with Optimal Transport Maps The repository contains reproducible PyTorch source code of our paper Generative Modeling with Optimal

Litu Rout 30 Dec 22, 2022
We have made you a wrapper you can't refuse

We have made you a wrapper you can't refuse We have a vibrant community of developers helping each other in our Telegram group. Join us! Stay tuned fo

20.6k Jan 09, 2023