[CVPR 21] Vectorization and Rasterization: Self-Supervised Learning for Sketch and Handwriting, IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2021.

Last update: Dec 12, 2022

Overview

Vectorization and Rasterization: Self-Supervised Learning for Sketch and Handwriting, CVPR 2021.

Ayan Kumar Bhunia, Pinaki nath Chowdhury, Yongxin Yang, Timothy Hospedales, Tao Xiang, Yi-Zhe Song, “Vectorization and Rasterization: Self-Supervised Learning for Sketch and Handwriting”, IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2021.

Abstract

Self-supervised learning has gained prominence due to its efficacy at learning powerful representations from unlabelled data that achieve excellent performance on many challenging downstream tasks. However, supervision-free pre-text tasks are challenging to design and usually modality specific. Although there is a rich literature of self-supervised methods for either spatial (such as images) or temporal data (sound or text) modalities, a common pre-text task that benefits both modalities is largely missing. In this paper, we are interested in defining a self-supervised pre-text task for sketches and handwriting data. This data is uniquely characterised by its existence in dual modalities of rasterized images and vector coordinate sequences. We address and exploit this dual representation by proposing two novel cross-modal translation pre-text tasks for self-supervised feature learning: Vectorization and Rasterization. Vectorization learns to map image space to vector coordinates and rasterization maps vector coordinates to image space. We show that our learned encoder modules benefit both raster-based and vector-based downstream approaches to analysing hand-drawn data. Empirical evidence shows that our novel pre-text tasks surpass existing single and multi-modal self-supervision methods.

Outline

Figure: Schematic of our proposed self-supervised method for sketches. Vectorization drives representation learning for sketch images; rasterization is the pre-text task for sketch vectors.

Architecture

Figure: Illustration of the architecture used for our self-supervised task for sketches and handwritten data (a,c), and how it can subsequently be adopted for downstream tasks (b,d). Vectorization involves translating sketch image to sketch vector (a), and the convolutional encoder used in the vectorization process acts as a feature extractor over sketch images for downstream tasks (b). On the other side, rasterization converts sketch vector to sketch image (c), and provides an encoding for vector-based recognition tasks downstream (d).

Citation

If you find this article useful in your research, please consider citing:

@InProceedings{sketch2vec,
author = {Ayan Kumar Bhunia and Pinaki Nath Chowdhury and Yongxin Yang and Timothy Hospedales and Tao Xiang and Yi-Zhe Song},
title = {Vectorization and Rasterization: Self-Supervised Learning for Sketch and Handwriting},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2021}
}

** More polished code is coming **

[CVPR 21] Vectorization and Rasterization: Self-Supervised Learning for Sketch and Handwriting, IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2021.

Related tags

Overview

Vectorization and Rasterization: Self-Supervised Learning for Sketch and Handwriting, CVPR 2021.

Abstract

Outline

Architecture

Citation

Work done at SketchX Lab, CVSSP, University of Surrey.

Owner

Ayan Kumar Bhunia

LightSeq is a high performance training and inference library for sequence processing and generation implemented in CUDA

Official repository of the paper "A Variational Approximation for Analyzing the Dynamics of Panel Data". Mixed Effect Neural ODE. UAI 2021.

AI-UPV at IberLEF-2021 DETOXIS task: Toxicity Detection in Immigration-Related Web News Comments Using Transformers and Statistical Models

An LSTM for time-series classification

🔥🔥High-Performance Face Recognition Library on PaddlePaddle & PyTorch🔥🔥

SiT: Self-supervised vIsion Transformer

This thesis is mainly concerned with state-space methods for a class of deep Gaussian process (DGP) regression problems

Pytorch implementation of "Training a 85.4% Top-1 Accuracy Vision Transformer with 56M Parameters on ImageNet"

Deep Video Matting via Spatio-Temporal Alignment and Aggregation [CVPR2021]

Official code of paper "PGT: A Progressive Method for Training Models on Long Videos" on CVPR2021

Capstone-Project-2 - A game program written in the Python language

This is the reference implementation for "Coresets via Bilevel Optimization for Continual Learning and Streaming"

This project aims to segment 4 common retinal lesions from Fundus Images.

Data manipulation and transformation for audio signal processing, powered by PyTorch

EASY - Ensemble Augmented-Shot Y-shaped Learning: State-Of-The-Art Few-Shot Classification with Simple Ingredients.

Python scripts for performing stereo depth estimation using the MobileStereoNet model in Tensorflow Lite.

codes for "Scheduled Sampling Based on Decoding Steps for Neural Machine Translation" (long paper of EMNLP-2022)

Official code for the CVPR 2022 (oral) paper "Extracting Triangular 3D Models, Materials, and Lighting From Images".

This is the official implement of paper "ActionCLIP: A New Paradigm for Action Recognition"

Graph Convolutional Neural Networks with Data-driven Graph Filter (GCNN-DDGF)