Replication of Pix2Seq with Pretrained Model

Last update: Nov 22, 2022

Overview

Pretrained-Pix2Seq

We provide a pre-trained Pix2Seq model. This version uses new data augmentation. The model was trained for 300 epochs and achieves 37 mAP on COCO without beam search or nucleus sampling.
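To make the decoding remark concrete, here is a minimal, standard-library sketch contrasting greedy (argmax) token selection with nucleus (top-p) sampling over a single next-token distribution. It is not taken from this repository: the function names `greedy_pick`, `nucleus_filter`, and `nucleus_pick` are hypothetical, and whether the released checkpoint is decoded exactly this way is an assumption.

```python
import random


def greedy_pick(probs):
    """Return the index of the most probable token (argmax decoding)."""
    return max(range(len(probs)), key=probs.__getitem__)


def nucleus_filter(probs, top_p):
    """Keep the smallest set of highest-probability tokens whose cumulative
    mass reaches top_p, and renormalize their probabilities to sum to 1."""
    ranked = sorted(enumerate(probs), key=lambda kv: kv[1], reverse=True)
    kept, mass = [], 0.0
    for idx, p in ranked:
        kept.append((idx, p))
        mass += p
        if mass >= top_p:
            break
    return {idx: p / mass for idx, p in kept}


def nucleus_pick(probs, top_p, rng=random):
    """Sample one token index from the renormalized nucleus."""
    nucleus = nucleus_filter(probs, top_p)
    indices = list(nucleus)
    weights = [nucleus[i] for i in indices]
    return rng.choices(indices, weights=weights, k=1)[0]
```

Greedy selection is deterministic, while nucleus sampling draws from a truncated distribution, so repeated runs can yield different token sequences.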

Installation

Install PyTorch 1.5+ and torchvision 0.6+ (recommended: torch 1.8.1, torchvision 0.8.0).

Install pycocotools (for evaluation on COCO):

pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'

That's it. You should now be able to train and evaluate detection models.
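The version minimums listed above can be checked before training. The sketch below is only an illustration: `parse_version` and `meets_minimum` are hypothetical helpers, not part of this repository, and they compare numeric components so that, for example, 1.10 is correctly treated as newer than 1.5.

```python
import re


def parse_version(text):
    """Turn a version string such as '1.8.1+cu111' into (1, 8, 1).

    Anything after '+' or '-' (local/build suffixes) is ignored.
    """
    core = re.split(r"[+\-]", text, maxsplit=1)[0]
    parts = []
    for piece in core.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum`, compared numerically."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# Example usage once PyTorch is installed:
#   import torch, torchvision
#   meets_minimum(torch.__version__, "1.5")
#   meets_minimum(torchvision.__version__, "0.6")
```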

Data preparation

Download and extract COCO 2017 train and val images with annotations from http://cocodataset.org. We expect the directory structure to be the following:

path/to/coco/
  annotations/  # annotation json files
  train2017/    # train images
  val2017/      # val images
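A quick way to confirm that the layout above is in place is to check for the three expected subdirectories. This is a small sketch, not part of the repository; `missing_coco_dirs` is a hypothetical helper name.

```python
from pathlib import Path

# Subdirectories named in the expected layout.
EXPECTED_SUBDIRS = ("annotations", "train2017", "val2017")


def missing_coco_dirs(root):
    """Return the expected subdirectories that do not exist under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]
```

An empty list means the directory matches the expected structure. Because `is_dir()` follows symbolic links, the same check also works on a symlinked dataset folder.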

Training

First, link the COCO dataset into the project folder:

ln -s /path/to/coco ./coco

Then start training:

sh train.sh --model pix2seq --output_dir /path/to/save

Evaluation

sh train.sh --model pix2seq --output_dir /path/to/save --resume /path/to/checkpoints --eval

COCO

Method	Backbone	Epochs	Batch Size	AP	AP50	AP75	Weights
Pix2Seq	R50	300	32	37.0	53.4	39.4	weight
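In COCO evaluation, AP50 and AP75 are the average precision at IoU thresholds of 0.50 and 0.75, while AP averages over thresholds from 0.50 to 0.95. The sketch below shows only the IoU test behind those thresholds, for boxes in COCO's `[x, y, width, height]` format. The real numbers come from pycocotools, which also handles score ranking and one-to-one matching; `box_iou_xywh` and `is_match` are hypothetical names used for illustration.

```python
def box_iou_xywh(a, b):
    """Intersection over union of two boxes given as [x, y, width, height]."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width and height of the overlapping region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def is_match(pred, gt, threshold):
    """True if a predicted box overlaps a ground-truth box at the given IoU threshold."""
    return box_iou_xywh(pred, gt) >= threshold
```

For example, a prediction with an IoU of 0.6 against its ground-truth box counts as a hit for AP50 but as a miss for AP75.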

Contributors

Qiu Han, Peng Gao, Jingqiu Zhou (Beam Search)

Acknowledgement

Pix2Seq, DETR

Owner

Peng Gao
