Unofficial implementation of Pix2SEQ

Overview

Unofficial-Pix2seq: A Language Modeling Framework for Object Detection

Unofficial implementation of Pix2SEQ. Please use this code with causion. Many implemtation details are not following original paper and significantly simplified.

Aim

This project aims for a step by step replication of Pix2Seq starting from DETR codebase.

Step 1

Starting from DETR, we add bounding box quantization over normalized coordinate, sequence generator from normalized coordinate, auto-regressive decoder and training code for Pix2SEQ.

How to use?

Install packages following original DETR and command line is same as DETR.

By setting image size to 512, each epoch takes 3 minutes on 8 A100 GPU.

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --coco_path ../../data/coco/

Released at 8pm, 26th, Seq

Problem to be solved : 1) better logging 2) correct padding, end of sentence, start of sentence token 3) efficient padding 4) better organization of code 5) fixed order of bounding box 6) shared dictionary between position and category

Released at 10pm, 26th, Seq

Problem to be solved: 1) better organization of code 2) fixed order of bounding box

Step 2

Finish inference code of pix2seq and report performance on object detection benchmark. Note that we are going to write an inefficent greedy decoding. The progress can be significantly accelerated by following cache previous state in Fairseq. The quality can be improved by nucleus sampling and beam search. We leave these complex but engineering tricks for future implementation and keep the project as simple as possible for understanding language modeling object detection.

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --coco_path ../../data/coco/  --eval --resume checkpoint.pth --batch_size 4

After 30 epoches training, our replication of pix2seq can achieve 12.1 mAP on MSCOCO. Image resolution 512 for fast training.

COCO bbox detection val5k evaluation with maximum 25 boundingx box predictions (Original paper 100 bounding box):

IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.121
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.239
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.107
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.007
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.091
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.267
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.144
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.166
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.166
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.011
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.128
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.350

After 107 epoches training, our replication of pix2seq can achieve 17.9 mAP on MSCOCO. Image resolution 512 for fast training. Checkpoint can be downloaded at here.

COCO bbox detection val5k evaluation with maximum 25 boundingx box predictions (Original paper 100 bounding box):

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.179
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.314
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.177
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.021
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.157
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.375
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.191
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.233
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.233
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.028
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.210
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.469

Observation

(1). The sequence is tend to generate End of Sentence(EOS) early. After generating EOS token, langauge modeling will still genrate boudning box. (2). Repeatable sequence which is a common problem in seq2seq modeling.

Released at 10am, 28th, Seq

Problem to be solved : 1). Add sequence likelihood evaluationn on validation dataset. 2) Better organization of code. 3) FP16 support. 4) Beam Search

Step 3

Add tricks proposed in Pix2SEQ like droplayer, bounding box augmentation, multiple crop augmentation and so on.

Acknowledegement

This codebase heavily borrow from DETR, CART, minGPT and Fairseq and motivated by the method explained in Pix2Seq

AfriBERTa: Exploring the Viability of Pretrained Multilingual Language Models for Low-resourced Languages

AfriBERTa: Exploring the Viability of Pretrained Multilingual Language Models for Low-resourced Languages This repository contains the code for the pa

Kelechi 40 Nov 24, 2022
Python Implementation of Chess Playing AI with variable difficulty

Chess AI with variable difficulty level implemented using the MiniMax AB-Pruning Algorithm

Ali Imran 7 Feb 20, 2022
Sample code and notebooks for Vertex AI, the end-to-end machine learning platform on Google Cloud

Google Cloud Vertex AI Samples Welcome to the Google Cloud Vertex AI sample repository. Overview The repository contains notebooks and community conte

Google Cloud Platform 560 Dec 31, 2022
Out-of-boundary View Synthesis towards Full-frame Video Stabilization

Out-of-boundary View Synthesis towards Full-frame Video Stabilization Introduction | Update | Results Demo | Introduction This repository contains the

25 Oct 10, 2022
Baseline and template code for node21 detection track

Nodule Detection Algorithm This codebase implements a baseline model, Faster R-CNN, for the nodule detection track in NODE21. It contains all necessar

node21challenge 11 Jan 15, 2022
Monitor your ML jobs on mobile devices📱, especially for Google Colab / Kaggle

TF Watcher TF Watcher is a simple to use Python package and web app which allows you to monitor 👀 your Machine Learning training or testing process o

Rishit Dagli 54 Nov 01, 2022
CR-Fill: Generative Image Inpainting with Auxiliary Contextual Reconstruction. ICCV 2021

crfill Usage | Web App | | Paper | Supplementary Material | More results | code for paper ``CR-Fill: Generative Image Inpainting with Auxiliary Contex

182 Dec 20, 2022
Deeprl - Standard DQN and dueling network for simple games

DeepRL This code implements the standard deep Q-learning and dueling network with experience replay (memory buffer) for playing simple games. DQN algo

Yao Zhou 6 Apr 12, 2020
Implementation of DropLoss for Long-Tail Instance Segmentation in Pytorch

[AAAI 2021]DropLoss for Long-Tail Instance Segmentation [AAAI 2021] DropLoss for Long-Tail Instance Segmentation Ting-I Hsieh*, Esther Robb*, Hwann-Tz

Tim 37 Dec 02, 2022
Official implementation for Multi-Modal Interaction Graph Convolutional Network for Temporal Language Localization in Videos

Multi-modal Interaction Graph Convolutioal Network for Temporal Language Localization in Videos Official implementation for Multi-Modal Interaction Gr

Zongmeng Zhang 15 Oct 18, 2022
PyG (PyTorch Geometric) - A library built upon PyTorch to easily write and train Graph Neural Networks (GNNs)

PyG (PyTorch Geometric) is a library built upon PyTorch to easily write and train Graph Neural Networks (GNNs) for a wide range of applications related to structured data.

PyG 16.5k Jan 08, 2023
Code for Deep Single-image Portrait Image Relighting

Deep Single-Image Portrait Relighting [Project Page] Hao Zhou, Sunil Hadap, Kalyan Sunkavalli, David W. Jacobs. In ICCV, 2019 Overview Test script for

438 Jan 05, 2023
The Easy-to-use Dialogue Response Selection Toolkit for Researchers

Easy-to-use toolkit for retrieval-based Chatbot Recent Activity Our released RRS corpus can be found here. Our released BERT-FP post-training checkpoi

GMFTBY 32 Nov 13, 2022
Code for paper: Group-CAM: Group Score-Weighted Visual Explanations for Deep Convolutional Networks

Group-CAM By Zhang, Qinglong and Rao, Lu and Yang, Yubin [State Key Laboratory for Novel Software Technology at Nanjing University] This repo is the o

zhql 98 Nov 16, 2022
Breast Cancer Classification Model is applied on a different dataset

Breast Cancer Classification Model is applied on a different dataset

1 Feb 04, 2022
Data and code from COVID-19 machine learning paper

Machine learning approaches for localized lockdown, subnotification analysis and cases forecasting in São Paulo state counties during COVID-19 pandemi

Sara Malvar 4 Dec 22, 2022
YOLOX + ROS(1, 2) object detection package

YOLOX + ROS(1, 2) object detection package

Ar-Ray 158 Dec 21, 2022
Learning to Map Large-scale Sparse Graphs on Memristive Crossbar

Release of AutoGMap:Learning to Map Large-scale Sparse Graphs on Memristive Crossbar For reproduction of our searched model, the Ubuntu OS is recommen

2 Aug 23, 2022
Minimalist Error collection Service compatible with Rollbar clients. Sentry or Rollbar alternative.

Minimalist Error collection Service Features Compatible with any Rollbar client(see https://docs.rollbar.com/docs). Just change the endpoint URL to yo

Haukur Rósinkranz 381 Nov 11, 2022
Public repository containing materials used for Feed Forward (FF) Neural Networks article.

Art041_NN_Feed_Forward Public repository containing materials used for Feed Forward (FF) Neural Networks article. -- Illustration of a very simple Fee

SolClover 2 Dec 29, 2021